We present two new benchmarks, MBXP and Multilingual HumanEval, designed to evaluate code completion models in over 10 programming languages. These datasets are generated using a conversion framework that transpiles prompts and test cases from the original MBPP and HumanEval datasets into the corresponding data in each target language. Using these benchmarks, we assess the performance of code generation models in a multi-lingual fashion and uncover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach a model new languages, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks.
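As a rough illustration of the kind of transpilation the conversion framework performs, the sketch below maps a toy Python-style function name, docstring, and test cases into a Java prompt and assert-based tests. The helper names (`python_signature_to_java`, `convert_prompt`, `convert_tests`) and the int-only typing are assumptions made for illustration, not the framework's actual interface.

```python
# Illustrative sketch only; function names and the int-only typing are
# hypothetical and do not reflect the paper's actual conversion framework.

def python_signature_to_java(func_name: str, params: list[str]) -> str:
    """Render a Java method signature from a Python-style function name and
    parameter list, defaulting every type to int purely for illustration."""
    java_name = "".join(
        w.capitalize() if i else w for i, w in enumerate(func_name.split("_"))
    )
    args = ", ".join(f"int {p}" for p in params)
    return f"public static int {java_name}({args})"

def convert_prompt(func_name: str, params: list[str], docstring: str) -> str:
    """Build a Java-style prompt: a Javadoc comment followed by an open
    method body for the model to complete."""
    return (
        "class Problem {\n"
        f"    /** {docstring} */\n"
        f"    {python_signature_to_java(func_name, params)} {{\n"
    )

def convert_tests(func_name: str, cases: list[tuple[list[int], int]]) -> str:
    """Turn (inputs, expected) pairs into Java assert statements that call
    the converted method."""
    java_name = "".join(
        w.capitalize() if i else w for i, w in enumerate(func_name.split("_"))
    )
    return "\n".join(
        f"assert Problem.{java_name}({', '.join(map(str, ins))}) == {out};"
        for ins, out in cases
    )

if __name__ == "__main__":
    print(convert_prompt("add_two", ["a", "b"], "Return the sum of a and b."))
    print(convert_tests("add_two", [([1, 2], 3), ([0, 5], 5)]))
```

In this toy setting, the generated prompt is handed to the model for completion in the target language, while the converted assert statements serve as the functional-correctness check, mirroring at a small scale how prompts and test cases are paired in the benchmarks.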