Enron's Spreadsheets and Related Emails: A Dataset and Analysis

Felienne Hermans, Emerson Murphy-Hill

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review

69 Citations (Scopus)

Abstract

Spreadsheets are used extensively in business processes around the world and, as such, are a topic of research interest. Over the past few years, many spreadsheet studies have been performed on the EUSES spreadsheet corpus. While this corpus has served the spreadsheet community well, the spreadsheets it contains were mainly gathered using search engines and therefore might not represent spreadsheets used in companies. This paper presents an analysis of a new dataset, extracted from the Enron email archive, containing over 15,000 spreadsheets used within the Enron Corporation. In addition to the spreadsheets, we also present an analysis of the associated emails, in which we look into spreadsheet-specific email behavior. Our analysis shows that 1) 24% of Enron spreadsheets with at least one formula contain an Excel error, 2) there is little diversity in the functions used in spreadsheets: 76% of spreadsheets in the presented corpus use the same 15 functions, and 3) the spreadsheets are substantially more smelly than those in the EUSES corpus, especially in terms of long calculation chains. Regarding the emails, we observe that 1) spreadsheets are a frequent topic of email conversation, with 10% of emails either referring to or sending spreadsheets, and 2) the emails frequently discuss errors in and updates to spreadsheets.

Original language: English
Title of host publication: Proceedings - 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, ICSE 2015
Publisher: IEEE
Pages: 7-16
Number of pages: 10
Volume: 2
ISBN (Electronic): 978-1-4799-1934-5
Publication status: Published - 2015
Event: ICSE 2015: 37th IEEE/ACM International Conference on Software Engineering - Florence, Italy
Duration: 16 May 2015 to 24 May 2015
Conference number: 37

Conference

Conference: ICSE 2015
Country/Territory: Italy
City: Florence
Period: 16/05/15 to 24/05/15

Keywords

  • Electronic mail
  • Software engineering
  • Software
  • Measurement
  • Companies
  • Economics
  • Industries
