All Stories

  1. Understanding the potentially confounding effect of test suite size in test effectiveness evaluation
  2. Boosting Code-line-level Defect Prediction with Spectrum Information and Causality Analysis
  3. Less is More: Feature Engineering for Fairness and Performance of Machine Learning Software
  4. Weighted Suspiciousness and Balanced Aggregation to Boost Spectrum-based Fault Localization of Deep Learning Models
  5. Optimizing Search-Based Unit Test Generation with Large Language Models: An Empirical Study
  6. Risky Dynamic Typing-related Practices in Python: An Empirical Study
  7. Knowledge Graph Driven Inference Testing for Question Answering Software
  8. Generating Python Type Annotations from Type Inference: How Far Are We?
  9. Assessing effectiveness of test suites: what do we know and what should we do?
  10. Towards Better Dependency Scope Settings in Maven Projects
  11. Back Deduction Based Testing for Word Sense Disambiguation Ability of Machine Translation Systems
  12. Code-line-level bugginess identification: How far have we come, and how far have we yet to go?
  13. Accelerating OCR-Based Widget Localization for Test Automation of GUI Applications
  14. Training data debugging for the fairness of machine learning software
  15. Mutant reduction evaluation: what is there and what is missing?
  16. How Far Have We Progressed in Identifying Self-admitted Technical Debts? A Comprehensive Empirical Study
  17. Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models
  18. Stay professional and efficient
  19. Multiple-boundary clustering and prioritization to promote neural network retraining
  20. An Empirical Study on Dynamic Typing Related Practices in Python Systems
  21. An Empirical Study on Critical Blocking Bugs
  22. RoScript
  23. Impact analysis of cross-project bugs on software ecosystems
  24. How C++ Templates Are Used for Generic Programming
  25. Python probabilistic type inference with natural language support