What is it about?

The prevalence of publicly available source code repositories has made it possible to train deep neural network models on code. The resulting neural program models can do well in source code analysis tasks, such as predicting method names in given programs, that cannot be easily done by traditional program analysis techniques. Although such neural program models have been tested on various existing datasets, the extent to which they generalize to unforeseen source code is largely unknown. Since it is very challenging to test neural program models on all unforeseen programs, in this paper we propose to evaluate their generalizability with respect to semantic-preserving transformations: a generalizable neural program model should perform equally well on programs that have the same semantics but different lexical appearances and syntactic structures. We compare the results of various neural program models for the method name prediction task on programs before and after automated semantic-preserving transformations. Our results show that even small semantic-preserving changes to the programs often cause these neural program models to fail to generalize their performance. Our results also suggest that neural program models based on data and control dependencies in programs generalize better than neural program models based only on abstract syntax trees. On the positive side, we observe that as the training dataset grows in size and diversity, the generalizability of correct predictions produced by the neural program models improves as well. Our results on the generalizability of neural program models provide insights to measure their limitations and provide a stepping stone for their improvement.
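
To make the evaluation idea concrete, here is a minimal sketch of one semantic-preserving transformation (renaming a variable) and of a "how often does the prediction change" measurement. It is an illustration only, not the tooling, transformations, or metric used in the paper. The names `rename_variable`, `prediction_change_rate`, and `toy_model` are hypothetical, the example uses Python programs for convenience, and `toy_model` is a keyword-matching stand-in rather than a neural program model. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename one identifier wherever it appears as a name or parameter."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_variable(source, old, new):
    """Return `source` with `old` renamed to `new` (assumes `new` is unused)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def run(source, func_name, arg):
    """Execute a program and call one of its functions (behavior spot check)."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](arg)


def prediction_change_rate(model, originals, transformed):
    """Fraction of programs whose predicted name differs after transformation."""
    changed = sum(model(o) != model(t) for o, t in zip(originals, transformed))
    return changed / len(originals)


def toy_model(source):
    """Hypothetical stand-in for a method name predictor: keys on one identifier."""
    return "sum" if "total" in source else "unknown"


ORIGINAL = """
def f(values):
    total = 0
    for v in values:
        total += v
    return total
"""

# Same behavior, different lexical appearance.
RENAMED = rename_variable(ORIGINAL, "total", "var0")
```

Running both versions on the same input, for example `run(ORIGINAL, "f", [1, 2, 3])` and `run(RENAMED, "f", [1, 2, 3])`, is a quick spot check that behavior is unchanged, while `prediction_change_rate(toy_model, [ORIGINAL], [RENAMED])` reports how often the stand-in predictor's output changes. A predictor that relies on surface identifiers, like this toy one, changes its answer even though the program's meaning does not.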

Why is it important?

The performance of neural networks has encouraged researchers to adopt them in program analysis tasks, giving rise to growing use of neural program models. While the performance of neural program models continues to improve, the extent to which they can generalize to new, unseen programs is still unknown, even when the programs are written in the same programming language. This problem matters even more if we want to use these models in downstream safety-critical tasks, such as malware detection and automated defect repair. It is also particularly hard, because interpreting the neural models that form the core reasoning engine of neural program models remains challenging, especially for complex neural networks. Without knowledge of the limits of neural program models, their capability may be overestimated, leading to careless application in domains they are not suited for.


We find that even semantic-preserving program transformations frequently sway the predictions of these neural program models, indicating serious generalization issues that could negatively impact the wider applications of deep neural networks in program analysis tasks. A comprehensive understanding of the extent of generalizability of neural program models would help developers to know when to use data-driven approaches and when to resort to traditional deductive methods of program analysis. It would also help researchers to focus their efforts on devising new techniques to alleviate the shortcomings of existing neural program models.

Md Rafiqul Islam Rabin
University of Houston

Read the Original

This page is a summary of: On the generalizability of Neural Program Models with respect to semantic-preserving program transformations, Information and Software Technology, July 2021, Elsevier,
DOI: 10.1016/j.infsof.2021.106552.