BERT Downstream Task Analysis: Named Entity Recognition in Serbian


Type
Paper in conference proceedings
Paper version
published
Language
English
Creator
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković
Source
Lecture Notes in Networks and Systems
Publisher
Springer Nature Switzerland
Date of issue
2024
Abstract
This paper compares different architectures and techniques for preparing named entity recognition (NER) models for the Serbian language by integrating BERT with spaCy. The models were trained to recognize seven named entity types (persons, locations, organisations, professions, events, demonyms, and artworks) on a dataset containing Serbian novels published between 1840 and 1920, publicly available newspaper articles, and sentences generated from the Wikidata knowledge base and the Leximirka lexical database. We explore various configurations and several training pipelines that differ in complexity and functionality: some are dedicated solely to NER, while others also include part-of-speech tagging and lemmatization. A key aspect of this work is testing different versions of BERT with varied architectures, sizes, and pre-training corpora that include Serbian. This approach allows us to evaluate the trade-offs between model complexity and performance and offers a nuanced understanding of how different configurations affect the efficiency and effectiveness of NER in Serbian.
Start page
333
End page
347
DOI
10.1007/978-3-031-71419-1_29
ISSN
2367-3370
Broader paper category
М30
Narrower paper category
М33
Rights
Closed access
License
All rights reserved
Format
.pdf

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković. "BERT Downstream Task Analysis: Named Entity Recognition in Serbian" in Lecture Notes in Networks and Systems, Springer Nature Switzerland (2024). https://doi.org/10.1007/978-3-031-71419-1_29

This item was submitted on 9 January 2025 by [anonymous user] using the form “Рад у зборнику радова” (Paper in conference proceedings) on the site “Радови” (Papers): http://drug.rgf.bg.ac.rs/s/repo
