What is it about?
DLInfer is a deep learning approach to type inference for Python variables that uses static slicing. It incorporates logical information into the context it learns from through static slicing, which relies on data flow analysis to gather the contextual information related to a variable. DLInfer collects slice statements for each variable through static analysis and then vectorizes them with the Unigram Language Model algorithm. We designed a bidirectional gated recurrent unit (GRU) model that learns type propagation information from these vectorized slicing features and uses it for inference. To validate the effectiveness of DLInfer, we conducted an extensive empirical study on 700 open-source projects and evaluated its accuracy in inferring three fundamental kinds of types: built-in, library, and user-defined. Trained on a large-scale dataset, DLInfer achieves an average Top-1 accuracy of 98.79% for variables whose type information can be obtained through static analysis and manual annotation. For variables whose type information can only be obtained through dynamic analysis, it achieves an average type inference accuracy of 83.03%. These results indicate that DLInfer is highly effective at inferring types and that applying it to assist various software engineering tasks for Python programs is promising.
Why is it important?
Python's dynamic nature and flexible syntax make type annotation of programs extremely difficult, and both traditional and machine learning-based type inference approaches have had little success with Python. To improve the accuracy of Python type inference, we propose a deep neural network type inference approach based on static slicing that combines logical and contextual information. DLInfer achieves an average Top-1 accuracy of 98.79%. It also achieves a recall of more than 70% for variables whose types can only be collected through dynamic analysis.
Perspectives
I hope this article helps developers infer and annotate the types in their own programs and understand other open-source programs. In addition, it may offer another way to combine the traditional static slicing technique with powerful deep neural networks to solve software engineering research problems.
Yanyan Yan
Nanjing University
Read the Original
This page is a summary of: DLInfer: Deep Learning with Static Slicing for Python Type Inference, May 2023, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/icse48619.2023.00170.