Proximity of Terms, Texts and Semantic Vectors in Information Retrieval

Jeroen Vuurens

Research output: Thesis › Dissertation (TU Delft)

Abstract

Information Retrieval (IR) is the task of finding content of an unstructured nature that satisfies an information need. A retrieval system typically uses a retrieval model to rank the available content by its estimated relevance to an information need. For decades, state-of-the-art retrieval models have assumed that terms appear independently in text documents. Chapter 1 of this thesis describes how the relevance likelihood of a document changes with the observed distance between co-occurring query terms in its text.
Nowadays, news is abundantly available online, allowing users to discover and follow news events. However, online news is often very redundant; most sources base their stories on previously published works and add only limited new information. Thus, a user often ends up spending a significant amount of effort re-reading the same parts of a story before finding relevant and novel information. In Chapter 2 and Chapter 3, we present a novel approach to constructing an online news summary for a given topic. Salient sentences are identified by clustering the sentences in the news stream based on the relative proximity of the sentences and the temporal proximity of their publication times. To improve the coherence of a long summary that describes a news topic, we propose in Chapter 4 to automatically cluster sentences by subtopic. In Chapter 5, we show how new topics can be detected in the news stream using the same clustering technique.
In real-life decision making, people are often faced with an overload of choices. A recommender system aids the user by reducing the available choices to a shortlist of items that are of interest to the user. In Chapter 6, we learn high-dimensional representations for movies that allow us to effectively recommend movies based on a user’s most recently rated movies.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Delft University of Technology
Supervisors/Advisors
  • de Vries, Arjen, Supervisor
Award date: 26 Apr 2017
Print ISBNs: 978-94-6186-803-9
DOIs
Publication status: Published - 2017

Bibliographical note

SIKS Dissertation Series No. 2017-19. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems.

Keywords

  • Information retrieval
  • retrieval algorithms
  • clustering
  • recommender systems