A search-based training algorithm for cost-aware defect prediction

Annibale Panichella, Carol V. Alexandru, Sebastiano Panichella, Alberto Bacchelli, Harald C. Gall

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

17 Citations (Scopus)
59 Downloads (Pure)

Abstract

Research has yielded approaches to predict future defects in software artifacts based on historical information, thus assisting companies in effectively allocating limited development resources and developers in reviewing each other's code changes. Developers are unlikely to devote the same effort to inspecting each software artifact predicted to contain defects, since the effort varies with the artifact's size (cost) and the number of defects it exhibits (effectiveness). We propose to use Genetic Algorithms (GAs) for training prediction models to maximize their cost-effectiveness. We evaluate the approach on two well-known models, Regression Tree and Generalized Linear Model, and predict defects between multiple releases of six open source projects. Our results show that regression models trained by GAs significantly outperform their traditional counterparts, improving the cost-effectiveness by up to 240%. Often the top 10% of predicted lines of code contain up to twice as many defects.
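
The idea of training a model with a GA so that it maximizes cost-effectiveness can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain linear model as a stand-in for the Generalized Linear Model, and it assumes one simple cost-effectiveness measure: the number of defects found when artifacts are inspected in descending predicted order until a fixed fraction of the total lines of code has been reviewed. All function names, parameters, and GA settings (`defects_within_budget`, `train_ga`, population size, mutation strength) are hypothetical choices made for illustration.

```python
import random

def defects_within_budget(scores, loc, defects, budget=0.10):
    """Defects found when inspecting artifacts in descending score order
    until `budget` (a fraction of total lines of code) is spent.
    Assumed metric, chosen for illustration only."""
    limit = budget * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    spent, found = 0.0, 0
    for i in order:
        if spent + loc[i] > limit:
            break  # next artifact would exceed the inspection budget
        spent += loc[i]
        found += defects[i]
    return found

def predict(weights, row):
    # Linear model: intercept plus weighted sum of the artifact's metrics.
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

def fitness(weights, X, loc, defects):
    # Fitness of a candidate model = cost-effectiveness of its ranking.
    scores = [predict(weights, row) for row in X]
    return defects_within_budget(scores, loc, defects)

def train_ga(X, loc, defects, pop_size=30, generations=40, seed=0):
    """Evolve linear-model coefficients that maximize `fitness`."""
    rng = random.Random(seed)
    n = len(X[0]) + 1  # one coefficient per metric, plus the intercept
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, X, loc, defects),
                        reverse=True)
        elite = ranked[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            j = rng.randrange(n)
            child[j] += rng.gauss(0, 0.3)  # Gaussian mutation of one gene
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, X, loc, defects))
```

Calling `train_ga(X, loc, defects)` with `X` as a list of per-artifact metric rows returns the best coefficient vector found. Because the fitness function ranks artifacts directly by inspection payoff, the search optimizes the quantity of interest rather than a prediction-error proxy such as squared error.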

Original language: English
Title of host publication: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2016
Place of Publication: New York, NY
Publisher: Association for Computing Machinery (ACM)
Pages: 1077-1084
Number of pages: 8
ISBN (Electronic): 978-1-4503-4206-3
Publication status: Published - 2016
Event: 2016 Genetic and Evolutionary Computation Conference, GECCO 2016 - Denver, United States
Duration: 20 Jul 2016 to 24 Jul 2016

Conference

Conference: 2016 Genetic and Evolutionary Computation Conference, GECCO 2016
Country/Territory: United States
City: Denver
Period: 20/07/16 to 24/07/16

Bibliographical note

Accepted Author Manuscript

Keywords

  • Defect prediction
  • Genetic algorithm
  • Machine learning
