What is it about?

This paper presents DiffSearch, a search engine that, given a query that describes a code change, returns a set of changes that match the query. The approach is enabled by three key contributions. First, we present a query language that extends the underlying programming language with wildcards and placeholders, providing an intuitive way of formulating queries that is easy to adapt to different programming languages. Second, to ensure scalability, the approach indexes code changes in a one-time preprocessing step, mapping them into a feature space, and then performs an efficient search in the feature space for each query. Third, to guarantee precision, i.e., that any returned code change indeed matches the given query, we present a tree-based matching algorithm that checks whether a query can be expanded to a concrete code change.
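The three steps described above (a query with placeholders, a one-time indexing pass, and an exact matching check) can be illustrated with a minimal sketch. This is not the DiffSearch implementation. It assumes a simplified query syntax in which `ID` stands for any identifier, `LT` for any integer literal, and `-->` separates the old and new side of a change. It works on token lists rather than parse trees, and it filters candidates with a hashed token-set test rather than a nearest-neighbor search in a feature space. All function names are hypothetical.

```python
import re
import zlib

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
PLACEHOLDERS = {"ID", "LT"}  # assumption: a reduced placeholder set for illustration


def tokenize(code):
    """Split code into identifiers, integer literals, and single symbols."""
    return TOKEN.findall(code)


def feature_set(tokens, dim=256):
    """Hash every concrete token into one of `dim` buckets.

    Placeholders contribute no feature because they can stand for any token.
    """
    return {zlib.crc32(t.encode()) % dim for t in tokens if t not in PLACEHOLDERS}


def index_changes(changes):
    """One-time preprocessing: compute the features of each (old, new) change."""
    return [
        (feature_set(tokenize(old) + tokenize(new)), (old, new))
        for old, new in changes
    ]


def matches_side(pattern, code):
    """Check whether `pattern` can be expanded to exactly the tokens of `code`."""
    p_tokens, c_tokens = tokenize(pattern), tokenize(code)
    if len(p_tokens) != len(c_tokens):
        return False
    for p, c in zip(p_tokens, c_tokens):
        if p == "ID":
            if not IDENTIFIER.fullmatch(c):
                return False
        elif p == "LT":
            if not c.isdigit():
                return False
        elif p != c:
            return False
    return True


def search(index, query):
    """Filter candidates by features, then keep only exact matches."""
    q_old, q_new = (side.strip() for side in query.split("-->"))
    q_feats = feature_set(tokenize(q_old) + tokenize(q_new))
    results = []
    for feats, (old, new) in index:
        if not q_feats <= feats:  # cheap test on the precomputed features
            continue
        if matches_side(q_old, old) and matches_side(q_new, new):
            results.append((old, new))
    return results


changes = [
    ("x.foo(1)", "x.bar(1)"),
    ("y.foo(2)", "y.foo(3)"),
    ("a = b", "a = c"),
]
index = index_changes(changes)
print(search(index, "ID.foo(LT) --> ID.bar(LT)"))
# -> [('x.foo(1)', 'x.bar(1)')]
```

In this sketch a hash collision can only let an extra candidate through the filter, and the final matching step then rejects it, so every returned change matches the query. That mirrors the division of labor described above: the feature-based step provides speed, and the matching step provides precision.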

Why is it important?

We present implementations for Java, JavaScript, and Python. The evaluation shows that the approach responds within seconds to queries across one million code changes and achieves a recall of 80.7% for Java, 89.6% for Python, and 90.4% for JavaScript. It also enables users to find relevant code changes more effectively than a regular expression-based search or GitHub’s search feature, and it is helpful for gathering a large-scale dataset of real-world bug fixes.

Perspectives

We envision DiffSearch to serve as a tool useful to both practitioners and researchers, and to provide a basis for future work on searching for code changes.

Luca Di Grazia
Universität Stuttgart

Read the Original

This page is a summary of: DiffSearch: A Scalable and Precise Search Engine for Code Changes, IEEE Transactions on Software Engineering, January 2022, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/tse.2022.3218859.