What is it about?

A wide range of code intelligence (CI) tools, powered by deep neural networks, have recently been developed to improve programming productivity and perform program analysis. To use such tools reliably, developers often need to reason about the behavior of the underlying models and the factors that affect them. This is especially challenging for tools backed by deep neural networks. Various methods have tried to reduce this opacity in the vein of "transparent/interpretable AI". However, these approaches are often specific to a particular set of network architectures, and some even require access to the network's parameters. This makes them difficult for the average programmer to use, which hinders the reliable adoption of neural CI systems. In this paper, we propose a simple, model-agnostic approach for identifying the critical input features of models in CI systems by drawing on software debugging research, specifically delta debugging. Our approach uses simplification techniques that reduce the size of a CI model's input programs while preserving the model's predictions. We show that this approach yields remarkably small outputs and is broadly applicable across many model architectures and problem domains. We find that the models in our experiments often rely heavily on just a few syntactic features of the input programs. We believe that the features extracted with our approach may help in understanding the predictions and learned behavior of neural CI systems.

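To make the idea concrete, the reduction step can be sketched as the classic ddmin delta-debugging loop applied to a program's tokens, where the "test" simply asks whether the model still returns its original prediction on the shrunken program. The Python sketch below is a minimal illustration under our own assumptions: model_predict, the whitespace tokenizer, and the example snippet are hypothetical stand-ins, not the paper's actual tool or model API.

def split(seq, n):
    """Split seq into at most n contiguous, non-empty chunks."""
    k, m = divmod(len(seq), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < m else 0)
        if end > start:
            chunks.append(seq[start:end])
        start = end
    return chunks

def ddmin(tokens, test, n=2):
    """Classic ddmin: return a 1-minimal token list for which test still holds."""
    while len(tokens) >= 2:
        chunks = split(tokens, min(n, len(tokens)))
        reduced = False
        for chunk in chunks:  # try each chunk on its own ("reduce to subset")
            if test(chunk):
                tokens, n, reduced = chunk, 2, True
                break
        if not reduced:
            for i in range(len(chunks)):  # try removing one chunk ("reduce to complement")
                complement = [t for j, c in enumerate(chunks) if j != i for t in c]
                if complement and test(complement):
                    tokens, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(tokens):  # already at single-token granularity: 1-minimal
                break
            n = min(len(tokens), 2 * n)  # otherwise split into finer chunks and retry
    return tokens

# Hypothetical stand-in for the CI model's inference call (e.g., a method-name
# predictor); a real setup would run the trained neural model here.
def model_predict(program: str) -> str:
    return "setValue" if "value" in program and "=" in program else "unknown"

original_program = "void set ( int v ) { this . value = v ; count = count + 1 ; }"
tokens = original_program.split()           # naive whitespace tokenizer
target = model_predict(original_program)    # the prediction we want to preserve

def keeps_prediction(candidate):
    return model_predict(" ".join(candidate)) == target

reduced = ddmin(tokens, keeps_prediction)
print("prediction:     ", target)           # setValue
print("reduced program:", " ".join(reduced))  # value =

On this toy example, the loop shrinks a 20-token method down to the two tokens "value =" while the stand-in model keeps predicting "setValue", mirroring the kind of small, prediction-preserving programs that expose which input features a model actually relies on.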

Why is it important?

Deep neural models seem capable of discovering many non-trivial properties of source code, even ones that are beyond the reach of traditional static analyzers. Although this may be reminiscent of a software developer's ability to intuit the properties of programs, there is a sharp contrast in interpretability: developers can explain their deductions and formulate falsifiable hypotheses about the behavior of their code. Deep neural models offer no such capability; rather, they remain stubbornly opaque black boxes. This opacity is already a concern in non-critical applications, where the lack of explainability frustrates efforts to build useful tools. It is substantially more problematic in safety-critical applications, where deep learners could play a key role in preventing defects and adversarial attacks that are hard for traditional analyzers to detect. Therefore, in this work, we propose a simple yet effective methodology to better analyze the input (over)reliance of neural models in software engineering applications. Our approach is model-agnostic: rather than studying the network itself, it reduces the model's input programs using delta debugging. The main insight is that, by removing the parts of an input program that are irrelevant to a prediction, we may better understand which features matter to the model's inference and the reasoning behind its prediction.

Perspectives

We propose a model-agnostic methodology for interpreting a wide range of code intelligence models, which works by reducing the size of input programs using the well-known delta debugging algorithm. We show that our approach can significantly reduce the size of input programs while preserving the model's prediction, thereby exposing the input features that are most significant to the various models. Our results suggest that these models often rely on just a few simple syntactic shortcuts when making their predictions. This sets the stage for the broader use of transparency-enhancing techniques to better understand and develop neural code intelligence models.

Md Rafiqul Islam Rabin
University of Houston

Read the Original

This page is a summary of: Understanding neural code intelligence through program simplification, August 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3468264.3468539.
