Standard

Parsing Excel formulas: A grammar and its application on 4 large datasets. / Aivaloglou, Efthimia; Hoepelman, David; Hermans, Felienne.

In: Journal of Software: Evolution and Process, Vol. 29, No. 12, 01.12.2017, p. 1-19.

Research output: Scientific - peer-review; Special issue

Harvard

Aivaloglou, E, Hoepelman, D & Hermans, F 2017, 'Parsing Excel formulas: A grammar and its application on 4 large datasets', Journal of Software: Evolution and Process, vol. 29, no. 12, pp. 1-19. DOI: 10.1002/smr.1895

APA

Aivaloglou, E., Hoepelman, D., & Hermans, F. (2017). Parsing Excel formulas: A grammar and its application on 4 large datasets. Journal of Software: Evolution and Process, 29(12), 1-19. DOI: 10.1002/smr.1895

Vancouver

Aivaloglou E, Hoepelman D, Hermans F. Parsing Excel formulas: A grammar and its application on 4 large datasets. Journal of Software: Evolution and Process. 2017 Dec 1;29(12):1-19. DOI: 10.1002/smr.1895

Author

Aivaloglou, Efthimia; Hoepelman, David; Hermans, Felienne / Parsing Excel formulas: A grammar and its application on 4 large datasets.

In: Journal of Software: Evolution and Process, Vol. 29, No. 12, 01.12.2017, p. 1-19.

Research output: Scientific - peer-review; Special issue

BibTeX

@article{782ebf329a724ea49b7bc12de6d50e5b,
title = "Parsing Excel formulas: A grammar and its application on 4 large datasets",
keywords = "formula grammar, spreadsheets, syntax",
author = "Efthimia Aivaloglou and David Hoepelman and Felienne Hermans",
note = "Special Issue on Source Code Analysis and Manipulation (SCAM 2015)",
year = "2017",
month = "12",
doi = "10.1002/smr.1895",
volume = "29",
pages = "1--19",
journal = "Journal of Software: Evolution and Process",
issn = "2047-7473",
number = "12",
}

RIS

TY - JOUR

T1 - Parsing Excel formulas

T2 - Journal of Software: Evolution and Process

AU - Aivaloglou,Efthimia

AU - Hoepelman,David

AU - Hermans,Felienne

N1 - Special Issue on Source Code Analysis and Manipulation (SCAM 2015)

PY - 2017/12/1

Y1 - 2017/12/1


AB - Spreadsheets are popular end user programming tools, especially in the industrial world. This makes them interesting research targets. However, there does not exist a reliable grammar that is concise enough to facilitate formula parsing and analysis and to support research on spreadsheet codebases. This paper presents a grammar for spreadsheet formulas that can successfully parse 99.99% of more than 8 million unique formulas extracted from 4 spreadsheet datasets. Our grammar is compatible with the spreadsheet formula language, recognizes the spreadsheet formula elements that are required for supporting spreadsheets research, and produces parse trees aimed at further manipulation and analysis. Additionally, we use the grammar to analyze the characteristics of the formulas of the 4 datasets in 3 different dimensions: complexity, functionality, and data utilization. Our results show that (1) most Excel formulas are simple, however formulas with more than 50 functions or operations exist, (2) almost all formulas use data from other cells, which is often not local, and (3) a surprising number of referring mechanisms are used by less than 1% of the formulas.

KW - formula grammar

KW - spreadsheets

KW - syntax

U2 - 10.1002/smr.1895

DO - 10.1002/smr.1895

M3 - Special issue

VL - 29

SP - 1

EP - 19

JO - Journal of Software: Evolution and Process

JF - Journal of Software: Evolution and Process

SN - 2047-7473

IS - 12

ER -
