Regression testing is the activity developers perform to check whether new modifications have introduced bugs. A crucial requirement for effective regression testing is that test cases are deterministic. Unfortunately, this is not always the case: some tests suffer from so-called flakiness, i.e., they exhibit both passing and failing outcomes on the same code. Flaky tests are widely recognized as a serious issue, since they hide real bugs and increase software inspection costs. While previous research has focused on understanding the root causes of test flakiness and on devising techniques that fix flaky tests automatically, in this paper we explore an orthogonal perspective: the relationship between flaky tests and test smells, i.e., suboptimal design or implementation choices made when developing tests. Relying on (1) an analysis of the state of the art and (2) interviews with industrial developers, we first identify five flakiness-inducing test smell types, namely Resource Optimism, Indirect Testing, Test Run War, Fire and Forget, and Conditional Test Logic, and automate their detection. Then, we perform a large-scale empirical study on 19,532 JUnit test methods from 18 software systems and find that the five considered test smells causally co-occur with flaky tests in 75% of the cases. Furthermore, we evaluate the effect of refactoring and show that it not only removes the design flaws but also fixes all of the flaky tests (75%) that causally co-occur with test smells.
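To make the idea of automatically detecting a flakiness-inducing smell more concrete, here is a minimal sketch of a check for Conditional Test Logic, commonly described as test code whose behavior depends on branching or looping statements. It is not the detector used in the study. It assumes the body of a single test method is available as a string and flags it when control-flow keywords (`if`, `else`, `for`, `while`, `do`, `switch`) appear outside string literals and comments. The class and method names are hypothetical, and a regular-expression scan is only an approximation; a practical tool would work on a parsed syntax tree.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified detector for the Conditional Test Logic smell.
 * It scans the source text of a single test method body for control-flow
 * keywords. This is NOT the detector used in the study; a real tool would
 * parse the code into an AST instead of relying on regular expressions.
 */
public class ConditionalTestLogicDetector {

    // Control-flow keywords whose presence in a test body suggests branching or looping logic.
    private static final Pattern CONTROL_FLOW =
            Pattern.compile("\\b(if|else|for|while|do|switch)\\b");

    // Blanks out string literals, then removes comments, so keywords inside them are ignored.
    static String stripLiteralsAndComments(String body) {
        return body
                .replaceAll("\"(\\\\.|[^\"\\\\])*\"", "\"\"") // string literals -> ""
                .replaceAll("//[^\\n]*", "")                  // line comments
                .replaceAll("(?s)/\\*.*?\\*/", "");           // block comments
    }

    /** Returns true if the given test method body contains conditional logic. */
    public static boolean hasConditionalTestLogic(String testBody) {
        Matcher m = CONTROL_FLOW.matcher(stripLiteralsAndComments(testBody));
        return m.find();
    }

    public static void main(String[] args) {
        // A test body whose assertion depends on a runtime branch.
        String smelly = "int r = compute();\n"
                + "if (r > 0) { assertEquals(1, r); } else { assertEquals(0, r); }";
        // Keywords appear only inside a comment and a string literal.
        String clean = "// no if here\n"
                + "assertEquals(\"for loop\", describe());";
        System.out.println(hasConditionalTestLogic(smelly)); // true
        System.out.println(hasConditionalTestLogic(clean));  // false
    }
}
```

Running `main` prints `true` for the first body and `false` for the second, because in the second body `if` occurs only in a comment and `for` only in a string literal. Stripping literals and comments before matching keeps such occurrences from producing false positives.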

Original language: English
Pages (from-to): 2907-2946
Number of pages: 40
Journal: Empirical Software Engineering
Issue number: 5
Publication status: Published - 2019

Research areas: Flaky tests, Refactoring, Test smells
