Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data needed for fine-tuning NER algorithms is typically lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, thus limiting human interaction to providing only a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited training data size. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach which incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific rare long-tail NER extractors to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.
Original language: English
Title of host publication: International Conferences on Theory and Practice of Digital Libraries (TPDL)
Place of Publication: Oslo, Norway
Publisher: Springer
Number of pages: 15
Publication status: Accepted/In press - 1 Sep 2019
Event: 23rd International Conference on Theory and Practice of Digital Libraries - Oslo, Norway
Duration: 9 Sep 2019 to 12 Sep 2019
Conference number: 23
http://www.tpdl.eu/tpdl2019/

Conference

Conference: 23rd International Conference on Theory and Practice of Digital Libraries
Abbreviated title: TPDL 2019
Country: Norway
City: Oslo
Period: 9/09/19 to 12/09/19

ID: 54783918