The creation time of documents is an important kind of information in temporal information retrieval, especially for document clustering, timeline construction, and search engine improvements. Considering the manner in which content on the Web is created, updated, and deleted, the common assumption that each document has only one creation time is not suitable for Web documents. In this paper, we investigate to what extent this assumption is wrong. We introduce two methods to timestamp individual parts (sub-documents) of Web documents and analyze in detail the creation and update dynamics of three classes of Web documents.
Original language: English
Title of host publication: Research and Advanced Technology for Digital Libraries
Subtitle of host publication: 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016
Editors: N. Fuhr, L. Kovács, T. Risse, W. Nejdl
Place of Publication: Cham
Publisher: Springer International Publishing
Pages: 203-214
Number of pages: 12
ISBN (Electronic): 978-3-319-43997-6
ISBN (Print): 978-3-319-43996-9
State: Published - 2016

Publication series

Name: Lecture Notes in Computer Science
Volume: 9819
ISSN (Print): 0302-9743

    Research areas

  • Timestamping, Sub-documents, Internet Archive
