Standard

TSE-NER : An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. / Mesbah, Sepideh; Lofi, Christoph; Valle Torre, Manuel; Bozzon, Alessandro; Houben, Geert-Jan.

The Semantic Web – ISWC 2018: Proceedings of the 17th International Semantic Web Conference. ed. / D. Vrandečić; K. Bontcheva; M.C. Suárez-Figueroa; V. Presutti; I. Celino; M. Sabou; L.M. Kaffee; E. Simperl. Cham : Springer, 2018. p. 127-143 (Lecture Notes in Computer Science (LNCS); Vol. 11136).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Harvard

Mesbah, S, Lofi, C, Valle Torre, M, Bozzon, A & Houben, G-J 2018, TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. in D Vrandečić, K Bontcheva, MC Suárez-Figueroa, V Presutti, I Celino, M Sabou, LM Kaffee & E Simperl (eds), The Semantic Web – ISWC 2018: Proceedings of the 17th International Semantic Web Conference. Lecture Notes in Computer Science (LNCS), vol. 11136, Springer, Cham, pp. 127-143, ISWC 2018, Monterey, CA, United States, 8/10/18. https://doi.org/10.1007/978-3-030-00671-6_8

APA

Mesbah, S., Lofi, C., Valle Torre, M., Bozzon, A., & Houben, G-J. (2018). TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. In D. Vrandečić, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, L. M. Kaffee, ... E. Simperl (Eds.), The Semantic Web – ISWC 2018: Proceedings of the 17th International Semantic Web Conference (pp. 127-143). (Lecture Notes in Computer Science (LNCS); Vol. 11136). Cham: Springer. https://doi.org/10.1007/978-3-030-00671-6_8

Vancouver

Mesbah S, Lofi C, Valle Torre M, Bozzon A, Houben G-J. TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. In Vrandečić D, Bontcheva K, Suárez-Figueroa MC, Presutti V, Celino I, Sabou M, Kaffee LM, Simperl E, editors, The Semantic Web – ISWC 2018: Proceedings of the 17th International Semantic Web Conference. Cham: Springer. 2018. p. 127-143. (Lecture Notes in Computer Science (LNCS)). https://doi.org/10.1007/978-3-030-00671-6_8

Author

Mesbah, Sepideh ; Lofi, Christoph ; Valle Torre, Manuel ; Bozzon, Alessandro ; Houben, Geert-Jan. / TSE-NER : An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. The Semantic Web – ISWC 2018: Proceedings of the 17th International Semantic Web Conference. editor / D. Vrandečić ; K. Bontcheva ; M.C. Suárez-Figueroa ; V. Presutti ; I. Celino ; M. Sabou ; L.M. Kaffee ; E. Simperl. Cham : Springer, 2018. pp. 127-143 (Lecture Notes in Computer Science (LNCS)).

BibTeX

@inproceedings{91b0bf6013044b2fba55f58f04351381,
title = "TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications",
abstract = "Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. “WebKB”, “StatSnowball”) are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases; this approach is not suitable for long-tail entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets, Methods in computer science publications, and Proteins in biomedical publications.",
author = "Sepideh Mesbah and Christoph Lofi and {Valle Torre}, Manuel and Alessandro Bozzon and Geert-Jan Houben",
note = "Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",
year = "2018",
doi = "10.1007/978-3-030-00671-6_8",
language = "English",
isbn = "978-3-030-00670-9",
series = "Lecture Notes in Computer Science (LNCS)",
publisher = "Springer",
pages = "127--143",
editor = "{Vrandečić}, D. and K. Bontcheva and M.C. Su{\'a}rez-Figueroa and V. Presutti and I. Celino and M. Sabou and L.M. Kaffee and E. Simperl",
booktitle = "The Semantic Web – ISWC 2018",

}

RIS

TY - GEN

T1 - TSE-NER

T2 - An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications

AU - Mesbah, Sepideh

AU - Lofi, Christoph

AU - Valle Torre, Manuel

AU - Bozzon, Alessandro

AU - Houben, Geert-Jan

N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2018

Y1 - 2018

N2 - Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. “WebKB”, “StatSnowball”) are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases; this approach is not suitable for long-tail entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets, Methods in computer science publications, and Proteins in biomedical publications.

AB - Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. “WebKB”, “StatSnowball”) are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases; this approach is not suitable for long-tail entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets, Methods in computer science publications, and Proteins in biomedical publications.

U2 - 10.1007/978-3-030-00671-6_8

DO - 10.1007/978-3-030-00671-6_8

M3 - Conference contribution

SN - 978-3-030-00670-9

T3 - Lecture Notes in Computer Science (LNCS)

SP - 127

EP - 143

BT - The Semantic Web – ISWC 2018

A2 - Vrandečić, D.

A2 - Bontcheva, K.

A2 - Suárez-Figueroa, M.C.

A2 - Presutti, V.

A2 - Celino, I.

A2 - Sabou, M.

A2 - Kaffee, L.M.

A2 - Simperl, E.

PB - Springer

CY - Cham

ER -

ID: 45302869