Search computing meets data extraction

Tim Furche, Giorgio Orsi, Alessandro Bozzon, Chiara Pasini, Luca Tettamanti, Salvatore Vadacca

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Thanks to the Web, access to an increasing wealth and variety of information has become near instantaneous. To make informed decisions, however, we often need to access data from many different sources and integrate different types of information. Manually collecting data from scores of websites and combining that data remains a daunting task. The ERC projects SeCo (Search Computing) and DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology) address two aspects of this problem: SeCo supports complex search processes drawing on data from multiple domains with a user interface capable of refining and exploring the search results. DIADEM aims to automatically extract structured data from a domain's websites. In this paper, we outline a first approach for integrating SeCo and DIADEM. We discuss how to use the DIADEM methodology to automatically turn nearly any website from a given domain into a SeCo search service. We describe how such services can be registered and exploited by the SeCo framework in combination with services from other domains (and possibly developed with other methodologies).
Original language: English
Pages (from-to): 58-61
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 880
Publication status: Published - 2011
Externally published: Yes
Event: 1st International Workshop on Searching and Integrating New Web Data Sources: Very Large Data Search - Seattle, United States
Duration: 2 Sept 2011 to 2 Sept 2011
http://ceur-ws.org/
