BERT Downstream Task Analysis: Named Entity Recognition in Serbian
- Type
- Paper in conference proceedings
- Version of the paper
- published
- Language
- English
- Creator
- Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković
- Source
- Lecture Notes in Networks and Systems
- Publisher
- Springer Nature Switzerland
- Publication date
- 2024
- Abstract
- This paper compares different architectures and techniques for preparing named entity recognition (NER) models for the Serbian language by integrating BERT with spaCy. The models were trained to recognize seven named entity types (persons, locations, organisations, professions, events, demonyms, and artworks) on a dataset containing Serbian novels published between 1840 and 1920, publicly available newspaper articles, and sentences generated from the Wikidata knowledge base and the Leximirka lexical database. We explore various configurations and several training pipelines that differ in complexity and functionality. Some are dedicated solely to NER, while others encompass additional features such as part-of-speech tagging and lemmatization. A key aspect of this work is testing different versions of BERT, with varied architectures, sizes, and pre-training corpora that contain the Serbian language. This approach allows us to evaluate the trade-offs between model complexity and performance and offers a nuanced understanding of how different configurations affect the efficiency and effectiveness of the NER task in Serbian.
- Start page
- 333
- End page
- 347
- doi
- 10.1007/978-3-031-71419-1_29
- issn
- 2367-3370
- Broader category of the paper
- M30
- Narrower category of the paper
- M33
- Rights
- Closed access
- License
- All rights reserved
- Format
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković. "BERT Downstream Task Analysis: Named Entity Recognition in Serbian" in Lecture Notes in Networks and Systems, Springer Nature Switzerland (2024). https://doi.org/10.1007/978-3-031-71419-1_29
This item was submitted on 9 January 2025 by [anonymous user] using the form “Рад у зборнику радова” (Paper in conference proceedings) on the site “Радови” (Papers): http://drug.rgf.bg.ac.rs/s/repo