Research Project

Empirical Software Engineering: Identifying Bug-Prone Java Files via Metrics

Description

  • Implemented scripts to extract 18 source-code metrics (e.g., coupling, complexity via CKJM) and 7 change metrics (e.g., LOC added/deleted) from 835+ SVN diffs.
  • Trained scikit-learn classifiers that flagged bug-prone files with >91% accuracy; the predictions guided Agile policy updates that reduced post-release defects.
  • Automated the metrics pipeline and addressed multicollinearity among the extracted metrics, improving reproducibility and model reliability across iterations.

Stack: Java, Python, scikit-learn, SVN
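
Illustrative sketch: one common way to address multicollinearity among code and change metrics is to prune features by variance inflation factor (VIF). The entry above does not say which method the project used, so this is an assumption. The function names, the threshold of 10, and the NumPy-only implementation are also hypothetical choices, not the project's code.

```python
import numpy as np


def vif_scores(X):
    """Return the variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept. Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        # lstsq tolerates rank-deficient designs, unlike a plain matrix inverse
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly drop the highest-VIF column until all VIFs are <= threshold.

    The threshold of 10 is a common rule of thumb, used here as an assumption.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names
```

In a pipeline like the one described, the pruned matrix would be what gets passed to the classifier, so that near-duplicate metrics (for example, total churn alongside lines added and lines deleted) do not destabilize the fitted model.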