OffSide: Learning to Identify Mistakes in Boundary Conditions

Jón Arnar Briem; Jordi Smit; Hendrig Sellik; Pavel Rapoport; Georgios Gousios; Maurício Aniche

doi:10.1145/3387940.3391464

Software Engineering

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

5 Citations (Scopus)

261 Downloads (Pure)

Abstract

Mistakes in boundary conditions are the cause of many bugs in software. These mistakes happen when, e.g., developers make use of '<' or '>' in cases where they should have used '<=' or '>='. Mistakes in boundary conditions are often hard to find and manually detecting them might be very time-consuming for developers. While researchers have been proposing techniques to cope with mistakes in the boundaries for a long time, the automated detection of such bugs still remains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes in boundary conditions, it should be able to capture the overall context of the source code under analysis. In this work, we propose a deep learning model that learns mistakes in boundary conditions and, later, is able to identify them in unseen code snippets. We train and test a model on over 1.5 million code snippets, with and without mistakes in different boundary conditions. Our model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41 real-world bugs; however, with a high false positive rate. The existing state-of-the-practice linter tools are not able to detect any of the bugs. We hope this paper can pave the road towards deep learning models that will be able to support developers in detecting mistakes in boundary conditions.
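
As a minimal illustration of the kind of mistake the abstract describes ('>' used where '>=' was intended), the sketch below compares a buggy and a corrected check. The function names and the age-18 rule are hypothetical and chosen only for illustration; they are not taken from the paper or its dataset.

```python
def is_adult_buggy(age: int) -> bool:
    # Assumed intent (hypothetical): 18 and older counts as adult.
    # Using '>' silently excludes the boundary value 18.
    return age > 18


def is_adult_fixed(age: int) -> bool:
    # '>=' includes the boundary value, matching the assumed intent.
    return age >= 18


def disagreements(lo: int, hi: int) -> list:
    # Inputs in [lo, hi] on which the two versions give different answers.
    return [a for a in range(lo, hi + 1)
            if is_adult_buggy(a) != is_adult_fixed(a)]


print(disagreements(0, 120))  # prints [18]
```

The two versions differ on exactly one input, the boundary itself, which is one reason such mistakes can be hard to spot without a test that targets that value.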

Original language	English
Title of host publication	ICSEW'20
Subtitle of host publication	Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops
Place of Publication	New York
Publisher	Association for Computing Machinery (ACM)
Pages	203-208
Number of pages	6
ISBN (Print)	978-1-4503-7963-2
DOIs	https://doi.org/10.1145/3387940.3391464
Publication status	Published - 2020
Event	ICSEW'20: The IEEE/ACM 42nd International Conference on Software Engineering Workshops - Seoul, Korea, Republic of; Duration: 23 May 2020 → 29 May 2020

Conference

Conference	ICSEW'20
Country/Territory	Korea, Republic of
City	Seoul
Period	23/05/20 → 29/05/20

Keywords

boundary testing
deep learning for software testing
machine learning for software engineering
machine learning for software testing
software engineering
software testing

Access to Document

10.1145/3387940.3391464

deeptest-2020 (Accepted author manuscript, 594 KB)

Cite this

@inproceedings{5474f7e41a1c4950a63d6960b847ba94,

title = "OffSide: Learning to Identify Mistakes in Boundary Conditions",

abstract = "Mistakes in boundary conditions are the cause of many bugs in software. These mistakes happen when, e.g., developers make use of '<' or '>' in cases where they should have used '<=' or '>='. Mistakes in boundary conditions are often hard to find and manually detecting them might be very time-consuming for developers. While researchers have been proposing techniques to cope with mistakes in the boundaries for a long time, the automated detection of such bugs still remains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes in boundary conditions, it should be able to capture the overall context of the source code under analysis. In this work, we propose a deep learning model that learns mistakes in boundary conditions and, later, is able to identify them in unseen code snippets. We train and test a model on over 1.5 million code snippets, with and without mistakes in different boundary conditions. Our model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41 real-world bugs; however, with a high false positive rate. The existing state-of-the-practice linter tools are not able to detect any of the bugs. We hope this paper can pave the road towards deep learning models that will be able to support developers in detecting mistakes in boundary conditions.",

keywords = "boundary testing, deep learning for software testing, machine learning for software engineering, machine learning for software testing, software engineering, software testing",

author = "{Arnar Briem}, J{\'o}n and Jordi Smit and Hendrig Sellik and Pavel Rapoport and Georgios Gousios and Maur{\'i}cio Aniche",

year = "2020",

doi = "10.1145/3387940.3391464",

language = "English",

isbn = "978-1-4503-7963-2",

pages = "203--208",

booktitle = "ICSEW'20",

publisher = "Association for Computing Machinery (ACM)",

address = "United States",

note = "ICSEW'20 : The IEEE/ACM 42nd International Conference on Software Engineering Workshops ; Conference date: 23-05-2020 Through 29-05-2020",

}

Arnar Briem, J, Smit, J, Sellik, H, Rapoport, P, Gousios, G & Aniche, M 2020, OffSide: Learning to Identify Mistakes in Boundary Conditions. in ICSEW'20: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. Association for Computing Machinery (ACM), New York, pp. 203-208, ICSEW'20, Seoul, Korea, Republic of, 23/05/20. https://doi.org/10.1145/3387940.3391464

OffSide: Learning to Identify Mistakes in Boundary Conditions. / Arnar Briem, Jón; Smit, Jordi; Sellik, Hendrig et al.
ICSEW'20: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. New York: Association for Computing Machinery (ACM), 2020. p. 203-208.

TY - GEN

T1 - OffSide

T2 - ICSEW'20

AU - Arnar Briem, Jón

AU - Smit, Jordi

AU - Sellik, Hendrig

AU - Rapoport, Pavel

AU - Gousios, Georgios

AU - Aniche, Maurício

PY - 2020

Y1 - 2020

N2 - Mistakes in boundary conditions are the cause of many bugs in software. These mistakes happen when, e.g., developers make use of '<' or '>' in cases where they should have used '<=' or '>='. Mistakes in boundary conditions are often hard to find and manually detecting them might be very time-consuming for developers. While researchers have been proposing techniques to cope with mistakes in the boundaries for a long time, the automated detection of such bugs still remains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes in boundary conditions, it should be able to capture the overall context of the source code under analysis. In this work, we propose a deep learning model that learns mistakes in boundary conditions and, later, is able to identify them in unseen code snippets. We train and test a model on over 1.5 million code snippets, with and without mistakes in different boundary conditions. Our model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41 real-world bugs; however, with a high false positive rate. The existing state-of-the-practice linter tools are not able to detect any of the bugs. We hope this paper can pave the road towards deep learning models that will be able to support developers in detecting mistakes in boundary conditions.

AB - Mistakes in boundary conditions are the cause of many bugs in software. These mistakes happen when, e.g., developers make use of '<' or '>' in cases where they should have used '<=' or '>='. Mistakes in boundary conditions are often hard to find and manually detecting them might be very time-consuming for developers. While researchers have been proposing techniques to cope with mistakes in the boundaries for a long time, the automated detection of such bugs still remains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes in boundary conditions, it should be able to capture the overall context of the source code under analysis. In this work, we propose a deep learning model that learns mistakes in boundary conditions and, later, is able to identify them in unseen code snippets. We train and test a model on over 1.5 million code snippets, with and without mistakes in different boundary conditions. Our model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41 real-world bugs; however, with a high false positive rate. The existing state-of-the-practice linter tools are not able to detect any of the bugs. We hope this paper can pave the road towards deep learning models that will be able to support developers in detecting mistakes in boundary conditions.

KW - boundary testing

KW - deep learning for software testing

KW - machine learning for software engineering

KW - machine learning for software testing

KW - software engineering

KW - software testing

UR - http://www.scopus.com/inward/record.url?scp=85093079917&partnerID=8YFLogxK

U2 - 10.1145/3387940.3391464

DO - 10.1145/3387940.3391464

M3 - Conference contribution

SN - 978-1-4503-7963-2

SP - 203

EP - 208

BT - ICSEW'20

PB - Association for Computing Machinery (ACM)

CY - New York

Y2 - 23 May 2020 through 29 May 2020

ER -
