Sub-document Timestamping: A Study on the Content Creation Dynamics of Web Documents

Yue Zhao, Claudia Hauff

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

The creation time of documents is an important kind of information in temporal information retrieval, especially for document clustering, timeline construction and search engine improvements. Considering the manner in which content on the Web is created, updated and deleted, the common assumption that each document has only one creation time is not suitable for Web documents. In this paper, we investigate to what extent this assumption is wrong. We introduce two methods to timestamp individual parts (sub-documents) of Web documents and analyze in detail the creation and update dynamics of three classes of Web documents.
Original language: English
Title of host publication: Research and Advanced Technology for Digital Libraries
Subtitle of host publication: 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016
Editors: N. Fuhr, L. Kovács, T. Risse, W. Nejdl
Place of Publication: Cham
Publisher: Springer
Pages: 203-214
Number of pages: 12
ISBN (Electronic): 978-3-319-43997-6
ISBN (Print): 978-3-319-43996-9
Publication status: Published - 2016

Publication series

Name: Lecture Notes in Computer Science
Volume: 9819
ISSN (Print): 0302-9743

Keywords

  • Timestamping
  • Sub-documents
  • Internet Archive
