
Workshop: Deep Learning for Code

Scotch: A Semantic Code Search Engine for IDEs

Samip Dahal · Adyasha Maharana · Mohit Bansal


Code search is the task of finding relevant code snippets given a natural language query. To facilitate real-time code search, we introduce Scotch, a semantic code search tool that runs within an IDE. Its semantic nature lets Scotch leverage the meaning of code via learned vector representations, while running inside the IDE improves developer productivity by eliminating the need to switch to a web browser to search for code. A search query is often ambiguous without the surrounding context of the search. In contrast to traditional search engines, which take a single line of input, Scotch can automatically infer the code context at search time and use it to produce results. Hence, we propose the task of 'contextual code search' and present an analysis of how code context can help improve the relevance of search results. Since no existing dataset fits this task, we collect and contribute a dataset of about 19M functions from GitHub repositories with permissive licenses, the first large-scale dataset openly available for contextual code search. We also present a small, manually curated test set to assess ranking quality for code search. We finetune the CodeBERT model to perform code search given a natural language query, with and without surrounding code context. Results from automated as well as human evaluation suggest that including code context in search significantly improves retrieval of the correct code snippet but slightly hinders ranking quality among annotated code snippets. Our work provides motivation and resources for future research into contextual code search.
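To make the idea of contextual code search concrete, here is a minimal sketch of retrieval by vector similarity, with and without code context. It is not Scotch's implementation: the bag-of-words `embed` function is a toy stand-in for a learned encoder such as a finetuned CodeBERT, and the names `embed`, `cosine`, `search`, and the two-snippet corpus are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: a bag-of-words count vector.

    Assumption: a real system would use a neural model here instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, corpus: List[str], context: str = "") -> List[str]:
    """Rank snippets by similarity to the query plus optional code context."""
    query_vec = embed(query + " " + context)
    return sorted(corpus, key=lambda code: cosine(query_vec, embed(code)), reverse=True)


# Hypothetical two-snippet corpus and an ambiguous query.
corpus = [
    "def read_json(path): return json.load(open(path))",
    "def read_csv(path): return list(csv.reader(open(path)))",
]
query = "read file"
context = "import csv\nrows = []"  # code surrounding the cursor in the editor

without_context = search(query, corpus)
with_context = search(query, corpus, context)
print(with_context[0])
```

The query "read file" alone does not say which file format is meant; appending the surrounding code, which mentions `csv`, moves the CSV reader to the top of the ranking in this toy setup.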
