Standard

Search-Based Test Data Generation for SQL Queries. / Castelein, Jeroen; Aniche, Maurício; Soltani, Mozhan; Panichella, Annibale; van Deursen, Arie.

Proceedings of the 40th International Conference on Software Engineering. 2018. p. 1220-1230.

Research output: Scientific - peer-review; Conference contribution

Harvard

Castelein, J, Aniche, M, Soltani, M, Panichella, A & van Deursen, A 2018, Search-Based Test Data Generation for SQL Queries. in Proceedings of the 40th International Conference on Software Engineering. pp. 1220-1230, ICSE 2018, Gothenburg, Sweden, 27/05/18. DOI: 10.1145/3180155.3180202

APA

Castelein, J., Aniche, M., Soltani, M., Panichella, A., & van Deursen, A. (2018). Search-Based Test Data Generation for SQL Queries. In Proceedings of the 40th International Conference on Software Engineering (pp. 1220-1230). DOI: 10.1145/3180155.3180202

Vancouver

Castelein J, Aniche M, Soltani M, Panichella A, van Deursen A. Search-Based Test Data Generation for SQL Queries. In Proceedings of the 40th International Conference on Software Engineering. 2018. p. 1220-1230. Available from: doi:10.1145/3180155.3180202

Author

Castelein, Jeroen ; Aniche, Maurício ; Soltani, Mozhan ; Panichella, Annibale ; van Deursen, Arie. / Search-Based Test Data Generation for SQL Queries. Proceedings of the 40th International Conference on Software Engineering. 2018. pp. 1220-1230

BibTeX

@inproceedings{90a6431ff78f4ac3bf87c052cd9cd5d4,
title = "Search-Based Test Data Generation for SQL Queries",
abstract = "Database-centric systems strongly rely on SQL queries to manage and manipulate their data. These SQL commands can range from very simple selections to queries that involve several tables, subqueries, and grouping operations. And, as with any important piece of code, developers should properly test SQL queries. In order to completely test a SQL query, developers need to create test data that exercise all possible coverage targets in a query, e.g., JOINs and WHERE predicates. And indeed, this task can be challenging and time-consuming for complex queries. Previous studies have modeled the problem of generating test data as a constraint satisfaction problem and, with the help of SAT solvers, generate the required data. However, such approaches have strong limitations, such as partial support for queries with JOINs, subqueries, and strings (which are commonly used in SQL queries). In this paper, we model test data generation for SQL queries as a search-based problem. Then, we devise and evaluate three different approaches based on random search, biased random search, and genetic algorithms (GAs). The GA, in particular, uses a fitness function based on information extracted from the physical query plan of a database engine as search guidance. We then evaluate each approach in 2,135 queries extracted from three open source software and one industrial software system. Our results show that GA is able to completely cover 98.6\% of all queries in the dataset, requiring only a few seconds for each query. Moreover, it does not suffer from the limitations affecting state-of-the-art techniques.",
keywords = "search-based software engineering, automated test data generation, SQL, Database",
author = "Jeroen Castelein and Maurício Aniche and Mozhan Soltani and Annibale Panichella and {van Deursen}, Arie",
year = "2018",
doi = "10.1145/3180155.3180202",
pages = "1220--1230",
booktitle = "Proceedings of the 40th International Conference on Software Engineering",

}

RIS

TY - CHAP

T1 - Search-Based Test Data Generation for SQL Queries

AU - Castelein,Jeroen

AU - Aniche,Maurício

AU - Soltani,Mozhan

AU - Panichella,Annibale

AU - van Deursen,Arie

PY - 2018

Y1 - 2018

AB - Database-centric systems strongly rely on SQL queries to manage and manipulate their data. These SQL commands can range from very simple selections to queries that involve several tables, subqueries, and grouping operations. And, as with any important piece of code, developers should properly test SQL queries. In order to completely test a SQL query, developers need to create test data that exercise all possible coverage targets in a query, e.g., JOINs and WHERE predicates. And indeed, this task can be challenging and time-consuming for complex queries. Previous studies have modeled the problem of generating test data as a constraint satisfaction problem and, with the help of SAT solvers, generate the required data. However, such approaches have strong limitations, such as partial support for queries with JOINs, subqueries, and strings (which are commonly used in SQL queries). In this paper, we model test data generation for SQL queries as a search-based problem. Then, we devise and evaluate three different approaches based on random search, biased random search, and genetic algorithms (GAs). The GA, in particular, uses a fitness function based on information extracted from the physical query plan of a database engine as search guidance. We then evaluate each approach in 2,135 queries extracted from three open source software and one industrial software system. Our results show that GA is able to completely cover 98.6% of all queries in the dataset, requiring only a few seconds for each query. Moreover, it does not suffer from the limitations affecting state-of-the-art techniques.

KW - search-based software engineering

KW - automated test data generation

KW - SQL

KW - Database

U2 - 10.1145/3180155.3180202

DO - 10.1145/3180155.3180202

M3 - Conference contribution

SP - 1220

EP - 1230

BT - Proceedings of the 40th International Conference on Software Engineering

ER -

ID: 38773133