
Knowledge about a (Web) document's creation time has been shown to be an important factor in various temporal information retrieval settings. Commonly, it is assumed that such documents were created at a single point in time. While this assumption may hold for news articles and similar document types, it is a clear oversimplification for general Web documents. In this paper, we investigate (i) to what extent this simplifying assumption is violated for a corpus of Web documents, and (ii) whether it is possible to accurately estimate the creation time of the individual components of Web documents (so-called sub-documents).
Original language: English
Title of host publication: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Editors: R. Baeza-Yates, M. Lalmas, A. Moffat, B. Ribeiro-Neto
Place of publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 1023-1026
Number of pages: 4
ISBN (Print): 978-1-4503-3621-5
State: Published, 2015
Event: SIGIR 2015, Santiago, Chile

Publication series

Publisher: ACM

Conference

Conference: SIGIR 2015, Santiago, Chile
Period: 9/08/15 to 13/08/15

Research areas

• timestamping, sub-documents, Web archiving

ID: 3858862