SrpELTeC: A Serbian Literary Corpus for Distant Reading

Type: Journal article
Paper version: published version
Language: English
Creator: Ranka Stanković, Cvetana Krstev, Duško Vitas
Source: Primerjalna književnost
Publisher: Research Centre of the Slovenian Academy of Sciences and Arts
Publication date: 2024
Abstract: The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared, and annotated using the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions in preparing SrpELTeC from scratch are outlined. All novels were manually encoded in TEI with rich metadata and structural annotation. The automatic annotation included POS-tagging, lemmatization, and named entity recognition, relying on Natural Language Processing resources developed and maintained by the JeRTeh Language Resources and Technologies Society. The integration of SrpELTeC with Wikidata was supported with a set of SPARQL queries for the retrieval of metadata with different visualization options. Recent activities within the COST Action NexusLinguarum (European Network for Web-centred Linguistic Data Science, CA18209) are related to the linked data version of SrpELTeC using the NLP Interchange Format. All versions of SrpELTeC are freely available under the CC-BY license.
Volume: 47 (2024)
Issue: 2
Start page: 45
End page: 63
doi: 10.3986/pkn.v47.i2.03
issn: 0351-1189
Subject: digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics
uri: https://ojs-gr.zrc-sazu.si/primerjalna_knjizevnost/article/view/9411/8803
Broader category of work: M20
Narrower category of work: M23
Rights: Open access
License: Creative Commons – Attribution 4.0 International
Format: .pdf

Item sets: Ranka Stanković; Radovi istraživača (Researchers' Papers)

Media: 4-ts-en-PKn-2024-2.pdf

Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03 (category M23)

This item was submitted on 25 June 2024 by [anonymous user] using the form "Journal article" on the site "Radovi" (Papers): http://drug.rgf.bg.ac.rs/s/repo
