An exciting application area of machine learning and deep learning methods is the completion, repair, synthesis, and automatic explanation of program code. This field has received a fair amount of attention in the last decade, yet the recent application of large-scale language modelling techniques to the domain of code arguably holds tremendous promise to revolutionize this area. The new large pretrained models excel at completing code and synthesizing code from natural language descriptions; they work across a wide range of domains, tasks, and programming languages. The excitement about new possibilities is spurring tremendous interest in both industry and academia. Yet we are just beginning to explore the potential of large-scale deep learning for code, and state-of-the-art models still struggle with correctness and generalization. This calls for platforms to exchange ideas and discuss the challenges in this line of work. Deep Learning for Code (DL4C) is a workshop that will provide a platform for researchers to share their work on deep learning for code. DL4C welcomes researchers interested in a number of topics, including but not limited to: AI code assistants, representations and model architectures for code, pretraining methods, methods for producing code from natural language, static code analysis, and evaluation of deep learning for code techniques.
Fri 5:00 a.m. - 5:15 a.m. | Opening Remarks (Announcement)
Fri 5:15 a.m. - 6:00 a.m. | Deep Learning Models for Bug Detection and Repair (Invited Talk)
While generative models for code completion are currently popular, code construction is only a small part of software development. Instead, code maintenance spans a much larger proportion of software development. One way to support such activities is through learned program analyses. However, token-based representations of code have been shown to underperform for such tasks. In this talk, I discuss graph and hypergraph representations of code that can be used with deep learning models for program analyses. Then, I illustrate how such models can be used towards finding and fixing seemingly simple but hard-to-find bugs. I conclude by discussing open challenges and opportunities in this area.
Miltiadis Allamanis
Fri 6:00 a.m. - 6:45 a.m. | Learning to Program by Learning to Read (Invited Talk)
In the age of deep networks, "learning" almost invariably means "learning from examples". Image classifiers are trained with large datasets of (labeled or unlabeled) images, machine translation systems with corpora of translated sentences, and robot policies with demonstrations. But when human learners acquire new concepts and skills, we often do so with richer supervision, especially in the form of language: we learn new concepts from exemplars accompanied by descriptions or definitions, and new skills from demonstrations accompanied by instructions. In natural language processing, recent years have seen a number of successful approaches to learning from task definitions and other forms of auxiliary language-based supervision. But these successes have been largely confined to tasks that also involve language as an input and an output. What will it take to make language-based training useful for other learning problems? In this talk, I'll present some recent results on using natural language to guide both search and library learning in inductive program synthesis, and discuss connections to the role of language in human concept learning.
Jacob Andreas
Fri 6:45 a.m. - 7:00 a.m. | Coffee Break
Fri 7:00 a.m. - 7:10 a.m. | Learning to Superoptimize Real-World Programs (Best Paper Spotlight)
Program optimization is the process of modifying software to execute more efficiently. Superoptimizers attempt to find the optimal program by employing significantly more expensive search and constraint-solving techniques. Generally, these methods do not scale well to programs in real development scenarios, and as a result superoptimization has largely been confined to small-scale, domain-specific, and/or synthetic program benchmarks. In this paper, we propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models. We created a dataset consisting of over 25K real-world x86-64 assembly functions mined from open-source projects and propose an approach, Self Imitation Learning for Optimization (SILO), that is easy to implement and outperforms a standard policy gradient learning approach on our dataset. Our method, SILO, superoptimizes 5.9% of our test set when compared with the gcc version 10.3 compiler's aggressive optimization level -O3. We also report that SILO's rate of superoptimization on our test set is over five times that of a standard policy gradient approach and a model pre-trained on compiler optimization demonstrations.
Alexander Shypula · Pengcheng Yin · Jeremy Lacomis · Claire Le Goues · Edward Schwartz · Graham Neubig
Fri 7:10 a.m. - 7:20 a.m. | CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code (Spotlight)
Recent work has widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives, and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models: the effect of different subtokenization options. It aims at identifying the most effective and length-efficient subtokenizations, taking into account source code specifics. We propose a subtokenization that reduces average length by 17--40% without a drop in downstream performance, and show that a carefully chosen subtokenization may significantly improve quality by 0.5-2%, possibly with some length increase.
Nadezhda Chirkova · Sergei Troshin
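For readers unfamiliar with the design space, the sketch below (not the paper's setup) trains a small byte-level BPE subtokenizer on a placeholder code corpus with the HuggingFace tokenizers library and reports how many subtokens a snippet needs; the corpus, vocabulary size, and snippet are assumptions for illustration only.

```python
# Minimal sketch: train a BPE subtokenizer on code and measure encoded length.
# Corpus and vocab size are placeholders, not the paper's configuration.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer

code_corpus = [
    "def add(a, b):\n    return a + b",
    "for i in range(10):\n    print(i * i)",
    "class Point:\n    def __init__(self, x, y):\n        self.x, self.y = x, y",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = ByteLevel()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(code_corpus, trainer=trainer)

snippet = "def multiply(a, b):\n    return a * b"
encoding = tokenizer.encode(snippet)
print(len(encoding.tokens), encoding.tokens)  # fewer tokens means a more length-efficient subtokenization
```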
Fri 7:20 a.m. - 7:30 a.m. | NS3: Neuro-Symbolic Semantic Code Search (Spotlight)
Semantic code search is the task of retrieving a code snippet given a textual description of its functionality. Recent work has focused on using similarity metrics between neural embeddings of text and code. However, current language models are known to struggle with longer, compositional sentences and multi-step reasoning. To overcome this limitation, we propose supplementing the query sentence with a layout of its semantic structure. The semantic layout is used to break down the final reasoning decision into a series of lower-level decisions. We use a Neural Module Network architecture to implement this idea. We compare our model, $NS^3$ (Neuro-Symbolic Semantic Search), to a number of baselines, including state-of-the-art semantic code retrieval methods such as CodeBERT, CuBERT, and GraphCodeBERT, and evaluate on two datasets: CodeSearchNet (CSN) and Code Search and Question Answering (CoSQA). On these datasets, we demonstrate that our approach results in higher performance. We also perform additional studies to show the effectiveness of our modular design when handling compositional queries.
Shushan Arakelyan · Anna Hakhverdyan · Miltiadis Allamanis · Christophe Hauser · Luis Garcia · Xiang Ren
Fri 7:30 a.m. - 8:15 a.m. | In-IDE Code Generation from Natural Language: Promise and Challenges (Invited Talk)
One major difficulty of programming is turning concepts into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have primarily been evaluated purely on execution accuracy or overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow remains surprisingly under-studied. In this talk, I will describe a user study in which we performed a comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking questions such as "does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?"
Graham Neubig
Fri 8:15 a.m. - 9:00 a.m. | Competitive Programming with AlphaCode (Invited Talk)
Programming is a powerful problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent large-scale language models have demonstrated impressive abilities to generate code; however, they still perform poorly on more complex tasks that require problem-solving skills, such as competitive programming problems. In this talk we'll present AlphaCode, the motivations of the project, and the design decisions we made. AlphaCode is a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on popular, recent programming competitions on the Codeforces platform. AlphaCode's success stemmed from: large transformer-based models, using a novel combination of architectural, training, and prompting modifications; extensive datasets; efficient large-scale sampling; and filtering and clustering-based sample selection. This marks the first time an artificial intelligence system has performed competitively in programming competitions.
David Choi · Yujia Li
Fri 9:00 a.m. - 10:00 a.m. | Lunch Break
Fri 10:00 a.m. - 11:00 a.m. | Panel Discussion (Discussion Panel)
Miltiadis Allamanis · Jacob Andreas · Graham Neubig · David Choi · Yujia Li · Jerry Tworek · Xinyun Chen
Fri 11:00 a.m. - 12:15 p.m. | Code Summarization: Do Transformers Really Understand Code? (Poster)
Recent approaches for automatic code summarization rely on fine-tuned Transformer-based language models, often injected with program analysis information. We perform empirical studies to analyze the extent to which these models understand the code they attempt to summarize. We observe that these models rely heavily on the textual cues present in comments, function names, and variable names, and that masking this information negatively impacts the generated summaries. Further, subtle code transformations which drastically alter program logic have no corresponding impact on the generated summaries. Overall, the quality of the generated summaries even from state-of-the-art models is quite poor, raising questions about the utility of current approaches and datasets.
Ankita Sontakke · Manasi Patwardhan · Lovekesh Vig · Raveendra Kumar Medicherla · Ravindra Naik · Gautam Shroff
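The masking experiments described above can be approximated in a few lines of Python; the sketch below is not the authors' code, just a simplified illustration that replaces identifier tokens and comments with generic placeholders using the standard tokenize module.

```python
# Sketch: strip the textual cues (identifiers, comments) a summarizer might lean on.
import io
import keyword
import tokenize

def mask_identifiers(source: str, placeholder: str = "VAR") -> str:
    """Replace every identifier (except Python keywords) with a generic placeholder."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append((tok.type, placeholder))
        elif tok.type == tokenize.COMMENT:
            out.append((tok.type, "#"))  # drop comment text as well
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

print(mask_identifiers("def area(width, height):\n    # rectangle area\n    return width * height\n"))
```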
Fri 11:00 a.m. - 12:15 p.m. | Patch Generation with Language Models: Feasibility and Scaling Behavior (Poster)
Large language models have shown a propensity for generating correct, multi-line programs from natural language prompts. Given past findings highlighting that bugs and patches can be distinguished by predictability according to simple language models, it is natural to ask if modern, large neural models lend themselves especially well to program repair without any calibration. We study this in the context of one-line bugs, providing a series of models of varying scales (from 160M to 12B parameters) with the context preceding a buggy line in 72 Java and Python programs, and analyzing the rank at which the correct patch (and original buggy line) is generated, if at all. Our results highlight a noticeable correlation of model size with test-passing accuracy and patch ranking quality, as well as several other findings related to the differences between the two languages and the propensity for especially the largest models to generate candidate patches that closely resemble (if not exactly match) the original developer patch.
Sophia Kolak · Ruben Martins · Claire Le Goues · Vincent Hellendoorn
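A hedged sketch of the ranking measurement described above: given a model's sampled candidate lines, already sorted by the model's own score, find the rank at which the known developer patch appears. The function names, normalization, and deduplication policy are assumptions for the example, not the paper's implementation.

```python
from typing import List, Optional

def normalize(line: str) -> str:
    """Whitespace-insensitive comparison of single-line patches."""
    return " ".join(line.split())

def patch_rank(candidates: List[str], developer_patch: str) -> Optional[int]:
    """Return the 1-based rank of the developer patch among model samples,
    assuming `candidates` is sorted by decreasing model score.
    Returns None if the patch is never generated."""
    target = normalize(developer_patch)
    seen, rank = set(), 0
    for cand in candidates:
        c = normalize(cand)
        if c in seen:  # deduplicate repeated samples before ranking
            continue
        seen.add(c)
        rank += 1
        if c == target:
            return rank
    return None

# Example: rank of the correct fix among hypothetical model samples.
samples = ["return a - b;", "return a + b;", "return a + b;", "return b;"]
print(patch_rank(samples, "return a + b;"))  # -> 2
```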
Fri 11:00 a.m. - 12:15 p.m. | Compositional Generalization and Decomposition in Neural Program Synthesis (Poster)
When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what we can measure is whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize. We first characterize several different axes along which program synthesis methods would be desired to generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data. Based on this characterization, we introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets, SCAN and RobustFill. Finally, we make first attempts to improve the compositional generalization ability of Transformer models along these axes through novel attention mechanisms that draw inspiration from a human-like decomposition strategy. Empirically, we find our modified Transformer models generally perform better than natural baselines, but the tasks remain challenging.
Kensen Shi · Joey Hong · Manzil Zaheer · Pengcheng Yin · Charles Sutton
Fri 11:00 a.m. - 12:15 p.m. | Learning to Walk over Relational Graphs of Source Code (Poster)
Information-rich relational graphs have shown great potential in designing effective representations of code for program-understanding tasks. However, the wealth of structural and semantic information in such graphs can overwhelm models, because of their limited input size. A promising approach for overcoming this challenge is to gather presumed-relevant but smaller context from a larger graph, and random walks over graphs were among the first such approaches. We propose a deep-learning approach that improves upon random walks by learning task-specific walk policies that guide the traversal of the graph towards the most relevant context. In the setting of relational graphs representing programs and their semantic properties, we observe that models that employ learned policies for guiding walks are 6-36 percentage points more accurate than models that employ uniform random walks, and 0.2-3.5 percentage points more accurate than models that employ expert knowledge for guiding the walks.
Pardis Pashakhanloo · Aaditya Naik · Hanjun Dai · Petros Maniatis · Mayur Naik
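To make the contrast between uniform and policy-guided walks concrete, here is a minimal sketch: neighbours are sampled in proportion to a scoring function, which defaults to a constant (a uniform random walk) and could instead be a learned, task-specific scorer. The graph, node names, and scorer interface are made up for the example.

```python
import random
from typing import Callable, Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # adjacency list over relational-graph nodes

def walk(graph: Graph, start: Hashable, steps: int,
         score: Callable[[Hashable, Hashable], float] = lambda u, v: 1.0) -> List[Hashable]:
    """Sample a walk of `steps` edges; each neighbour is chosen with probability
    proportional to score(current, neighbour). The default constant score gives a
    uniform random walk; a learned policy would supply a task-specific scorer."""
    path, node = [start], start
    for _ in range(steps):
        neighbours = graph.get(node, [])
        if not neighbours:
            break
        weights = [score(node, v) for v in neighbours]
        node = random.choices(neighbours, weights=weights, k=1)[0]
        path.append(node)
    return path

# Toy relational graph: edges between statements, a variable, and its type.
g = {"stmt1": ["var_x", "stmt2"], "var_x": ["type_int", "stmt1"],
     "stmt2": ["stmt1"], "type_int": ["var_x"]}
print(walk(g, "stmt1", steps=4))
```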
Fri 11:00 a.m. - 12:15 p.m. | Scotch: A Semantic Code Search Engine for IDEs (Poster)
Code search is the task of finding relevant code snippets given a natural language query. In order to facilitate real-time code search, we introduce Scotch, a semantic code search tool that runs within an IDE. The semantic nature of code search in Scotch allows us to leverage the semantic meaning of code via learned vector representations, while the in-IDE nature helps to improve developers' productivity by eliminating the need to switch to a web browser to search for code. The query used for code search is oftentimes ambiguous without the surrounding context of the search. In direct contrast to traditional search engines tailored to take a single line of input, the in-IDE nature of Scotch allows it to automatically infer code context during search and utilize it for search results. Hence, we propose the task of "contextual code search" and present an analysis of how this code context can help improve the relevance of search results. Since no existing dataset could fit our task of contextual code search, we collect and contribute a dataset of about 19M functions from GitHub repositories with permissive licenses, which is the first large-scale dataset openly available for the task of contextual code search. We also present a small, manually curated test set to assess the code ranking quality for code search. We finetune the CodeBERT model to perform code search given a natural language query with and without surrounding code context. Results from automated as well as human evaluation suggest that the inclusion of code context in search significantly improves the retrieval of the correct code snippet and slightly hinders the ranking quality among annotated code snippets. Our work provides motivation and resources for future research into contextual code search.
Samip Dahal · Adyasha Maharana · Mohit Bansal
Fri 11:00 a.m. - 12:15 p.m. | Generating Programming Puzzles to Train Language Models (Poster)
This work shows how one can use large-scale Language Models (LMs) to automatically generate programming problems with verified solutions, in the form of "programming puzzles," which can then in turn be used to fine-tune other LMs to solve more difficult programming puzzles. This work builds on two recent developments. First, LMs have achieved breakthroughs in non-trivial reasoning and algorithm implementation, generating code that can solve some intermediate-level competitive programming problems. However, training code LMs involves curated sets of natural-language problem descriptions and source-code tests and solutions, which are limited in size. Second, a new format of programming challenge called a programming puzzle was introduced, which does not require a natural-language description and is directly specified by a source-code test. In this work we show how generating synthetic programming puzzles and solutions, verified for correctness by a Python interpreter, can be used to improve performance in solving test puzzles from P3, a public benchmark set of Python Programming Puzzles. It also opens the door to iterative self-improvement for LMs in future work.
Patrick Haluptzok · Matthew Bowers · Adam Tauman Kalai
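As a rough illustration of the puzzle-and-verifier loop described above: a puzzle is a Boolean test function, and a candidate solution is accepted only when the interpreter evaluates the test to True. The puzzle below is made up, and a real pipeline would sandbox execution and impose timeouts rather than calling exec directly.

```python
def verify(puzzle_src: str, solution_src: str) -> bool:
    """Run a candidate solution against a puzzle's source-code test.
    A puzzle defines f(x) -> bool; a solution defines g() -> x."""
    env: dict = {}
    try:
        exec(puzzle_src, env)    # defines f
        exec(solution_src, env)  # defines g
        return env["f"](env["g"]()) is True
    except Exception:
        return False             # crashing solutions simply fail verification

# A made-up puzzle: find a string whose reverse is "code".
puzzle = "def f(s: str) -> bool:\n    return s[::-1] == 'code'\n"
good = "def g():\n    return 'edoc'\n"
bad = "def g():\n    return 'code'\n"
print(verify(puzzle, good), verify(puzzle, bad))  # True False
```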
Fri 11:00 a.m. - 12:15 p.m. | Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar (Poster)
We introduce NSEdit (neural-symbolic edit), a novel Transformer-based code repair method. Given only the source code that contains bugs, NSEdit predicts an editing sequence that can fix the bugs. The edit grammar is formulated as a regular language, and the Transformer uses it as a neural-symbolic scripting interface to generate editing programs. We modify the Transformer and add a pointer network to select the edit locations. An ensemble of rerankers is trained to re-rank the editing sequences generated by beam search. We fine-tune the rerankers on the validation set to reduce overfitting. NSEdit is evaluated on various code repair datasets and achieved a new state-of-the-art accuracy (24.04%) on the Tufano small dataset of the CodeXGLUE benchmark. NSEdit performs robustly when programs vary from package to package and when buggy programs are concrete. We conduct detailed analysis of our methods and demonstrate the effectiveness of each component.
Yaojie Hu · Xingjian Shi · Qiang Zhou · Lee Pike
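The edit-sequence idea above can be illustrated with a small interpreter for token-level edits. The operation names and tuple format here are a hypothetical simplification, not the edit grammar from the paper.

```python
from typing import List, Tuple

# Each edit is (op, position, tokens) with op in {"insert", "delete", "replace"}.
Edit = Tuple[str, int, List[str]]

def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply a predicted editing sequence to a buggy token sequence.
    Edits are applied right-to-left so earlier positions stay valid."""
    out = list(tokens)
    for op, pos, payload in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "insert":
            out[pos:pos] = payload
        elif op == "delete":
            del out[pos:pos + 1]
        elif op == "replace":
            out[pos:pos + 1] = payload
        else:
            raise ValueError(f"unknown edit op: {op}")
    return out

buggy = ["if", "(", "x", "=", "0", ")", "return", ";"]
fix: List[Edit] = [("replace", 3, ["=="])]
print(" ".join(apply_edits(buggy, fix)))  # if ( x == 0 ) return ;
```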
Fri 11:00 a.m. - 12:15 p.m. | Neural Instruction Combiner (Poster)
The instruction combiner (IC) is a critical compiler optimization pass, which replaces a sequence of instructions with an equivalent and optimized instruction sequence at the basic block level. There can be thousands of instruction-combining patterns, which need to be frequently updated as new coding styles, idioms, applications, and novel hardware evolve over time. This results in frequent updates to the IC optimization pass, thereby incurring considerable human effort and high software maintenance costs. To mitigate these challenges associated with the traditional IC, we design and implement a Neural Instruction Combiner (NIC) and demonstrate its feasibility by integrating it into the standard LLVM compiler optimization pipeline. NIC leverages neural Seq2Seq model techniques for generating an optimized encoded Intermediate Representation (IR) sequence from the unoptimized encoded IR sequence. To the best of our knowledge, ours is the first work demonstrating the feasibility of a neural instruction combiner built into a full-fledged compiler pipeline. Given the novelty of this task, we built a new dataset for training our NIC neural model. We show that NIC achieves an exact-match rate of 72% for optimized sequences as compared to traditional IC and a BLEU precision score of 0.94, demonstrating its feasibility in a production compiler pipeline.
Sandya Mannarswamy · Dibyendu Das
Fri 11:00 a.m. - 12:15 p.m. | On-the-fly Discovery of Local Bugs using Inconsistency Analysis (Poster)
Traditional bug detection mechanisms have focused on a limited set of important issues and have specialized detectors for each of them. As code corpora continue to grow in size and complexity, newer opportunities for a developer to make mistakes emerge, leading to a long tail of local bugs. Hence, we must investigate generalizable approaches that can detect such bugs. In this paper, we formulate and use the inconsistency principle, which can be applied to discover bugs at arbitrary code granularity, for example at the package level. We experiment with two types of formulations: Pointwise Mutual Information (PMI)-based and sequence-based approaches, which respectively model smaller and larger contexts. The techniques learn code usage patterns from the code under analysis and apply the learnings on the same code, thereby enabling on-the-fly bug detection. Experiments are conducted with two different program representations: token-based and graph-based. We show how the different variations capture diverse and complementary types of issues. The system is deployed in an industrial setting and has detected 12 types of bugs with 70% acceptance by developers in real-world code reviews.
Srinivasan Sengamedu · Qiang Zhou · Hangqi Zhao
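For reference, the PMI signal mentioned above scores how strongly two code elements co-occur relative to chance, PMI(x, y) = log( p(x, y) / (p(x) p(y)) ), and unusually low scores flag inconsistent, potentially buggy usage. Below is a small sketch under the assumption that the "events" are co-occurring pairs counted over a corpus; the toy pairs are invented.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

def pmi_table(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Compute PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ) from co-occurrence pairs."""
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    x_counts = Counter(x for x, _ in pair_counts.elements())
    y_counts = Counter(y for _, y in pair_counts.elements())
    return {
        (x, y): math.log((c / total) / ((x_counts[x] / total) * (y_counts[y] / total)))
        for (x, y), c in pair_counts.items()
    }

# Toy corpus of (API call, argument type) co-occurrences.
corpus = [("open", "str")] * 9 + [("open", "int")] * 1 + [("sqrt", "int")] * 10
scores = pmi_table(corpus)
print(min(scores, key=scores.get))  # the rarest pairing, ('open', 'int'), is flagged as inconsistent
```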
Fri 11:00 a.m. - 12:15 p.m. | COBRA: Enhancing DNN Latency Prediction with Language Models trained on Source Code (Poster)
With the recent developments of Deep Learning, having an accurate and device-specific latency prediction for Deep Neural Networks (DNNs) has become important for both the manual and automatic design of efficient DNNs. Directly predicting the latency of DNNs from their source code yields significant practical benefits. It opens a way towards profilers that can instantly feed back the latency of a given piece of deep learning code to the developer. In this paper, we conduct a preliminary study of source code based latency prediction for DNNs. We introduce Code Based Runtime Approximation (COBRA), which leverages a transformer encoder to learn representations of short code snippets. These representations are then aggregated by a Graph Convolutional Network (GCN) that captures the algorithmic dependencies and estimates the latency of the implemented DNN. Our experiments with COBRA show promising results and indicate that latency prediction from code can be competitive with traditional latency prediction methods for DNNs.
Robin Zbinden · Lukas Mauch · Fabien Cardinaux
Fri 11:00 a.m. - 12:15 p.m. | Code Editing from Few Exemplars by Adaptive Multi-Extent Composition (Poster)
This paper considers source code editing from a few exemplars. An editing exemplar, containing the original and modified support code snippets, showcases a certain editorial pattern, and code editing adapts the common pattern derived from a few support exemplars to a query code snippet. In this work, we propose a compositional deep learning approach to solve this code editing problem automatically. Our learning approach combines edit representations extracted from support exemplars and compositionally generalizes them to the query code snippet editing via an ensemble of multi-extent similarities. Specifically, we parse the support and query code snippets using language-specific grammar into abstract syntax trees. We apply similarity measurements at multiple extents, from individual nodes to collective tree representations, for query and support sample matching, and ensemble the matching results through a similarity-ranking error estimator. We evaluate the proposed method on C# and Python datasets, and show up to 8.6% accuracy improvements compared to non-composition baselines.
Peizhao Li · Xuchao Zhang · Ziyu Yao · Wei Cheng · Haifeng Chen · Hongfu Liu
Fri 11:00 a.m. - 12:15 p.m. | A Systematic Evaluation of Large Language Models of Code (Poster)
Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models do achieve close results in some programming languages, although targeted mainly for natural language modeling. We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models including Codex. Our trained models are open-source and publicly available at https://anonymized.for.review, which enables future research and application in this area.
Frank F Xu · Uri Alon · Graham Neubig · Vincent Hellendoorn
Fri 11:00 a.m. - 12:15 p.m. | Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions (Poster)
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a "static" setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and "learns to execute" descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
David Bieber · Rishab Goel · Daniel F Zheng · Hugo Larochelle · Danny Tarlow
Fri 11:00 a.m. - 12:15 p.m. | Show Your Work: Scratchpads for Intermediate Computation with Language Models (Poster)
Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations - even in the few-shot regime - when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train Transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". We hypothesize that by providing supervision on the intermediate computation steps, the model gains additional learning signal on how to systematically generalize from small computations to larger ones. On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations, even when we care only about the final result. Even though the model is required to predict many more tokens, it is still better at predicting the final results, because the individual prediction steps are easier. We believe that this result provides an early indication of the potential power of intermediate computation within language models.
Maxwell Nye · Anders J Andreassen · Guy Gur-Ari · Henryk Michalewski · Jacob Austin · David Bieber · David Dohan · Aitor Lewkowycz · Maarten Bosma · David Luan · Charles Sutton · Augustus Odena
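As a concrete, simplified illustration of the scratchpad idea for long addition, the snippet below builds a training target that spells out the digit-by-digit carries before the final answer; the exact scratchpad format used in the paper may differ.

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Render a + b as a step-by-step scratchpad string, one carry step per line,
    ending with the final answer the model is ultimately scored on."""
    xs, ys = str(a)[::-1], str(b)[::-1]
    carry, digits, lines = 0, [], []
    for i in range(max(len(xs), len(ys))):
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        total = dx + dy + carry
        new_carry, digit = divmod(total, 10)
        lines.append(f"step {i}: {dx} + {dy} + {carry} = {total} -> digit {digit}, carry {new_carry}")
        digits.append(str(digit))
        carry = new_carry
    if carry:
        digits.append(str(carry))
    lines.append("answer: " + "".join(reversed(digits)))
    return "\n".join(lines)

print(addition_scratchpad(478, 256))  # intermediate steps, then "answer: 734"
```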
Fri 11:00 a.m. - 12:15 p.m. | ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection (Poster)
Identifying vulnerabilities in source code is essential to protect software systems from cybersecurity attacks. It, however, is also a challenging step that requires specialized expertise in security and code representation. To this end, we aim to develop a general, practical, and programming language-independent model capable of running on various source codes and libraries without difficulty. Therefore, we consider vulnerability detection as an inductive text classification problem and propose ReGVD, a simple yet effective graph neural network-based model for the problem. In particular, ReGVD views each raw source code as a flat sequence of tokens to build a graph, wherein node features are initialized by only the token embedding layer of a pre-trained programming language (PL) model. ReGVD then leverages residual connections among GNN layers and examines a mixture of graph-level sum and max poolings to return a graph embedding for the source code. Experimental results demonstrate that ReGVD outperforms the existing state-of-the-art models and obtains the highest accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection.
Van-Anh Nguyen · Dai Quoc Nguyen · Van Nguyen · Trung Le · Quan Tran · Dinh Phung
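A rough sketch of the graph construction described above: treat the code as a flat token sequence and connect tokens that co-occur within a fixed-size sliding window. The window size, tokenization, and example snippet are placeholder choices for illustration, not necessarily those of the paper.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def token_graph(tokens: List[str], window: int = 3) -> Tuple[List[str], Set[Tuple[int, int]]]:
    """Build an undirected graph whose nodes are the unique tokens and whose edges
    connect tokens co-occurring within a sliding window over the sequence."""
    nodes = sorted(set(tokens))
    index: Dict[str, int] = {tok: i for i, tok in enumerate(nodes)}
    edges: Set[Tuple[int, int]] = set()
    for start in range(len(tokens) - window + 1):
        for a, b in combinations(tokens[start:start + window], 2):
            if a != b:
                edges.add(tuple(sorted((index[a], index[b]))))
    return nodes, edges

code_tokens = ["char", "buf", "[", "8", "]", ";", "strcpy", "(", "buf", ",", "input", ")", ";"]
nodes, edges = token_graph(code_tokens)
print(len(nodes), "nodes,", len(edges), "edges")
```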
Fri 11:00 a.m. - 12:15 p.m. | CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code (Poster)
Recent work has widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives, and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models: the effect of different subtokenization options. It aims at identifying the most effective and length-efficient subtokenizations, taking into account source code specifics. We propose a subtokenization that reduces average length by 17--40% without a drop in downstream performance, and show that a carefully chosen subtokenization may significantly improve quality by 0.5-2%, possibly with some length increase. [Note: This contribution also has a spotlight talk. Please find the paper here: https://openreview.net/forum?id=rd-G1nO-Jbq]
Nadezhda Chirkova · Sergei Troshin
Fri 11:00 a.m. - 12:15 p.m. | Learning to Superoptimize Real-World Programs (Poster)
Program optimization is the process of modifying software to execute more efficiently. Superoptimizers attempt to find the optimal program by employing significantly more expensive search and constraint-solving techniques. Generally, these methods do not scale well to programs in real development scenarios, and as a result superoptimization has largely been confined to small-scale, domain-specific, and/or synthetic program benchmarks. In this paper, we propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models. We created a dataset consisting of over 25K real-world x86-64 assembly functions mined from open-source projects and propose an approach, Self Imitation Learning for Optimization (SILO), that is easy to implement and outperforms a standard policy gradient learning approach on our dataset. Our method, SILO, superoptimizes 5.9% of our test set when compared with the gcc version 10.3 compiler's aggressive optimization level -O3. We also report that SILO's rate of superoptimization on our test set is over five times that of a standard policy gradient approach and a model pre-trained on compiler optimization demonstrations. [Note: This contribution also has a spotlight talk. Please find the paper here: https://openreview.net/forum?id=H8q40ouZJWc]
Alexander Shypula · Pengcheng Yin · Jeremy Lacomis · Claire Le Goues · Edward Schwartz · Graham Neubig
Fri 11:00 a.m. - 12:15 p.m. | NS3: Neuro-Symbolic Semantic Code Search (Poster)
Semantic code search is the task of retrieving a code snippet given a textual description of its functionality. Recent work has focused on using similarity metrics between neural embeddings of text and code. However, current language models are known to struggle with longer, compositional sentences and multi-step reasoning. To overcome this limitation, we propose supplementing the query sentence with a layout of its semantic structure. The semantic layout is used to break down the final reasoning decision into a series of lower-level decisions. We use a Neural Module Network architecture to implement this idea. We compare our model, $NS^3$ (Neuro-Symbolic Semantic Search), to a number of baselines, including state-of-the-art semantic code retrieval methods such as CodeBERT, CuBERT, and GraphCodeBERT, and evaluate on two datasets: CodeSearchNet (CSN) and Code Search and Question Answering (CoSQA). On these datasets, we demonstrate that our approach results in higher performance. We also perform additional studies to show the effectiveness of our modular design when handling compositional queries. [Note: This contribution also has a spotlight talk. Please find the paper here: https://openreview.net/forum?id=rubeJ2ObyWc]
Shushan Arakelyan · Anna Hakhverdyan · Miltiadis Allamanis · Christophe Hauser · Luis Garcia · Xiang Ren
Fri 12:15 p.m. - 12:30 p.m. | Coffee Break
Fri 12:30 p.m. - 1:15 p.m. | Where generative models meet search: a brief history of recent advancements in neural program synthesis (Invited Talk)
In this talk I'll offer a bird's-eye view of recent progress in generative language models and how it applies to the domain of synthesizing small computer programs according to a natural language specification. I'll pay special attention to various techniques of search and how they can greatly increase the accuracy of model generations at the cost of test-time compute. I will conclude the talk with some speculations on trend extrapolation and directions for future progress in this domain.
Jerry Tworek
Fri 1:15 p.m. - 2:00 p.m. | Learning to Model Structures and Execution for Program Synthesis (Invited Talk)
Deep neural networks have achieved remarkable success in natural language processing and code modeling, especially with the advancement of pre-training techniques. In this talk, I will discuss my neural program synthesis research, with a focus on developing program synthesizers that learn to infer user intent from different specification formats and can be deployed in production. First, I will discuss my SpreadsheetCoder work, where we aim to predict spreadsheet formulas from user-written tabular data. The SpreadsheetCoder model was integrated into Google Sheets and is available to all Google users. In the second part of my talk, I will discuss my work on execution-guided techniques for program synthesis from input-output examples. We show that utilizing and modeling partial program execution significantly improves program synthesis performance, especially for programming languages that include control flow constructs such as conditionals and loops.
Xinyun Chen
Fri 2:00 p.m. - 2:15 p.m. | Closing Remarks (Announcement)