Knowledge about a (Web) document's creation time has been shown to be an important factor in various temporal information retrieval settings. Commonly, it is assumed that such documents were created at a single point in time. While this assumption may hold for news articles and similar document types, it is a clear oversimplification for general Web documents. In this paper, we investigate to what extent (i) this simplifying assumption is violated for a corpus of Web documents, and (ii) it is possible to accurately estimate the creation time of individual Web documents' components (so-called sub-documents).
Original language: English
Title of host publication: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Editors: R Baeza-Yates, M Lalmas, A Moffat, B Ribeiro-Neto
Place of publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 4
ISBN (Print): 978-1-4503-3621-5
Publication status: Published - 2015
Event: SIGIR 2015, Santiago, Chile
Duration: 9 Aug 2015 to 13 Aug 2015


Research areas: timestamping, sub-documents, Web archiving

ID: 3858862