Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants

S. Roy; F. Hermans; E. Aivaloglou; J. Winter; Arie van Deursen

doi:10.1109/SANER.2016.98

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Spreadsheets are popular end-user computing applications, and one reason for their popularity is that they give users a large degree of freedom in how they structure their data. However, this flexibility also makes spreadsheets difficult to understand. Textual documentation can address this issue, yet an important prerequisite for generating such documentation automatically is extracting the metadata inside spreadsheets. Distinguishing between data and metadata is challenging, though, because spreadsheets lack universally accepted structural patterns. Two existing approaches for automatic extraction of spreadsheet metadata had not been evaluated on large datasets consisting of user inputs. Hence, in this paper, we describe the collection of a large number of user responses regarding the identification of spreadsheet metadata from participants of a MOOC. We describe the use of this large dataset to understand how users identify metadata in spreadsheets and to evaluate two existing approaches to automatic metadata extraction from spreadsheets. The results, drawing on insights about user perception of metadata, provide directions for improving metadata extraction approaches. We also learn on which types of spreadsheet patterns the existing approaches perform well and on which they perform poorly, and thus which problem areas to focus on in order to improve them.

Original language	English
Title of host publication	2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER)
Editors	A. Jiu
Place of Publication	Los Alamitos, CA
Publisher	IEEE Society
Pages	135-145
Number of pages	11
Volume	2
ISBN (Print)	978-1-5090-1855-0
DOIs	https://doi.org/10.1109/SANER.2016.98
Publication status	Published - 2016
Event	SANER 2016: 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, Osaka, Japan. Duration: 14 Mar 2016 → 18 Mar 2016

Conference

Conference	SANER 2016
Country/Territory	Japan
City	Osaka
Period	14/03/16 → 18/03/16

Keywords

computer aided instruction
meta data
personal computing
spreadsheet programs
text analysis
MOOC participants
automatic spreadsheet metadata extraction
automatic textual documentation generation
data structure
end-user computing applications
metadata user perception
spreadsheet patterns
Computers
Conferences
Data mining
Documentation
Metadata
Reliability
Software
Empirical evaluation
MOOC
Meta-data extraction
Spreadsheet
User-study

Access to Document

10.1109/SANER.2016.98

TUD-SERG-2016-002: Submitted manuscript, 685 KB

1 Conference contribution

On the Effectiveness of Automatically Inferred Invariants in Detecting Regression Faults in Spreadsheets
Roy, S., van Deursen, A. & Hermans, F., Jul 2018, Companion of the 18th IEEE International Conference on Software Quality, Reliability, and Security. Piscataway, NJ: IEEE, p. 199-206 8 p.
Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Cite this

Roy, S., Hermans, F., Aivaloglou, E., Winter, J., & van Deursen, A. (2016). Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants. In A. Jiu (Ed.), 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (Vol. 2, pp. 135-145). IEEE Society. https://doi.org/10.1109/SANER.2016.98

@inproceedings{34045a3402db44418846b9f094680d7d,

title = "Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants",

abstract = "Spreadsheets are popular end-user computing applications and one reason behind their popularity is that they offer a large degree of freedom to their users regarding the way they can structure their data. However, this flexibility also makes spreadsheets difficult to understand. Textual documentation can address this issue, yet for supporting automatic generation of textual documentation, an important pre-requisite is to extract metadata inside spreadsheets. It is a challenge though, to distinguish between data and metadata due to the lack of universally accepted structural patterns in spreadsheets. Two existing approaches for automatic extraction of spreadsheet metadata were not evaluated on large datasets consisting of user inputs. Hence in this paper, we describe the collection of a large number of user responses regarding identification of spreadsheet metadata from participants of a MOOC. We describe the use of this large dataset to understand how users identify metadata in spreadsheets, and to evaluate two existing approaches of automatic metadata extraction from spreadsheets. The results provide us with directions to follow in order to improve metadata extraction approaches, obtained from insights about user perception of metadata. We also understand what type of spreadsheet patterns the existing approaches perform well and on what type poorly, and thus which problem areas to focus on in order to improve.",

keywords = "computer aided instruction, meta data, personal computing, spreadsheet programs, text analysis, MOOC participants, automatic spreadsheet metadata extraction, automatic textual documentation generation, data structure, end-user computing applications, metadata user perception, spreadsheet patterns, Computers, Conferences, Data mining, Documentation, Metadata, Reliability, Software, Empirical evaluation, MOOC, Meta-data extraction, Spreadsheet, User-study",

author = "S. Roy and F. Hermans and E. Aivaloglou and J. Winter and {van Deursen}, Arie",

year = "2016",

doi = "10.1109/SANER.2016.98",

language = "English",

isbn = "978-1-5090-1855-0",

volume = "2",

pages = "135--145",

editor = "A. Jiu",

booktitle = "2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER)",

publisher = "IEEE Society",

note = "SANER 2016 : 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering ; Conference date: 14-03-2016 Through 18-03-2016",

}

Roy, S, Hermans, F, Aivaloglou, E, Winter, J & van Deursen, A 2016, Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants. in A Jiu (ed.), 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). vol. 2, IEEE Society, Los Alamitos, CA, pp. 135-145, SANER 2016, Osaka, Japan, 14/03/16. https://doi.org/10.1109/SANER.2016.98

Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants. / Roy, S.; Hermans, F.; Aivaloglou, E. et al.
2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). ed. / A. Jiu. Vol. 2 Los Alamitos, CA: IEEE Society, 2016. p. 135-145.

TY - GEN

T1 - Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants

AU - Roy, S.

AU - Hermans, F.

AU - Aivaloglou, E.

AU - Winter, J.

AU - van Deursen, Arie

PY - 2016

Y1 - 2016

N2 - Spreadsheets are popular end-user computing applications and one reason behind their popularity is that they offer a large degree of freedom to their users regarding the way they can structure their data. However, this flexibility also makes spreadsheets difficult to understand. Textual documentation can address this issue, yet for supporting automatic generation of textual documentation, an important pre-requisite is to extract metadata inside spreadsheets. It is a challenge though, to distinguish between data and metadata due to the lack of universally accepted structural patterns in spreadsheets. Two existing approaches for automatic extraction of spreadsheet metadata were not evaluated on large datasets consisting of user inputs. Hence in this paper, we describe the collection of a large number of user responses regarding identification of spreadsheet metadata from participants of a MOOC. We describe the use of this large dataset to understand how users identify metadata in spreadsheets, and to evaluate two existing approaches of automatic metadata extraction from spreadsheets. The results provide us with directions to follow in order to improve metadata extraction approaches, obtained from insights about user perception of metadata. We also understand what type of spreadsheet patterns the existing approaches perform well and on what type poorly, and thus which problem areas to focus on in order to improve.

KW - computer aided instruction

KW - meta data

KW - personal computing

KW - spreadsheet programs

KW - text analysis

KW - MOOC participants

KW - automatic spreadsheet metadata extraction

KW - automatic textual documentation generation

KW - data structure

KW - end-user computing applications

KW - metadata user perception

KW - spreadsheet patterns

KW - Computers

KW - Conferences

KW - Data mining

KW - Documentation

KW - Metadata

KW - Reliability

KW - Software

KW - Empirical evaluation

KW - MOOC

KW - Meta-data extraction

KW - Spreadsheet

KW - User-study

U2 - 10.1109/SANER.2016.98

DO - 10.1109/SANER.2016.98

M3 - Conference contribution

SN - 978-1-5090-1855-0

VL - 2

SP - 135

EP - 145

BT - 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

A2 - Jiu, A.

PB - IEEE Society

CY - Los Alamitos, CA

T2 - SANER 2016

Y2 - 14 March 2016 through 18 March 2016

ER -
