Search
50 items
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...... this set in PWN, either by merging it into an existing synset, or adding it as a new hyponym synset. Keywords: WordNet, bilingual resources, term alignment, parallel lists Five teams submitted 13 systems, with all teams performing better than chance, but only one team surpassing a simple ...
... correlated with the comprehensiveness of the resource used in the alignment process (Hristea, 2007). Different methods and resources can be used for alignment. One of the common approaches is to take PWN as the source for alignment, and a bilingual dictionary of English and the target language. There ...
... aligned term pairs. The structure of lexical entries in GeolISSTerm, Rudonto and Termi is similar. Each term comes with a name, definition, an optional list of synonyms, abbreviations and a bibliographic source. Each term, except the top term in the dictionary tree, has only one hyperonym term, but it ...Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...... 77--84. Tadić, M., Šojat, K. (2003). Finding multiword term candidates in Croatian. In Proc. of IESL2003 Workshop, Borovets: Context, pp. 102-107. Vintar, Š. (2010). Bilingual term recognition revisited the bag-of-equivalents term alignment approach and its evaluation. Terminology, 16(2), pp.141--158 ...
... Rule-based Automatic Multi-word Term Extraction and Lemmatization Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac Дигитални репозиторијум Рударско-геолошког факултета Универзитета у Београду [ДР РГФ] Rule-based Automatic Multi-word Term Extraction and Lemmatization ...
... among them 97% were associated with correct lemmas. Keywords: term extraction, terminology, multi-word units, lemmatization, finite-state transducers 1. Motivation Various approaches have been proposed for multi-word term (MWT) extraction as this problem has been gaining in importance ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...... constructing an n-way alignment of LSRs and applied it to the production of a three-way alignment of the English WordNet, Wikipedia and Wiktionary. Niemann and Gurevych (2011) propose a threshold-based Personalized PageRank method for extracting a set of Wikipedia articles as alignment candidates and ...
... further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA. Keywords: lexical semantic resources, sense alignment, lexicography, language ...
... word sense alignment. Different dictionaries and related resources such as wordnets and encyclopedia have significant differences in structure and heterogeneity in content, which makes aligning information across resources and languages a challenging task. Word sense alignment (WSA) is a more ...Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
-
Towards translation of educational resources using GIZA++
... terminology look-up, display and insertion of the search results into the text being translated. 4. ENVIRONMENT FOR TEXT ALIGNMENT Preliminary phase for the text alignment (parallelization) consists of XML document (eXtensible Markup Language) preparation according to TEI (Text Encoding Initiative) ...
... translation variants in large parallel corpora [17]. Volk et al. argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but they describe the system for efficiently searching ...
... texts: English left and Serbian right Document with the extension _fs contains the information about paired segments. The method used in the alignment is based on the number of characters (length of the segment). This approach is very successful (on the average as much as 96% correctly paired ...Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection... and chunk alignment. In this first experiment a source language is English, a target language is Serbian, a domain is Library and Information Science for which a bilingual terminological dictionary exists. Our term extractor is based on e-dictionaries and shallow parsing, and for word alignment we use GIZA++ ...
... extracted Serbian term (SRP_EXTRACTED) and from the English part of the aligned chunk (GIZA_ENG_LEMM) and its Dictionary entry (ENG_DICT). These features are: 1) total number of words in Serbian and English chunks, extracted term and English Dictionary term (*_WC), 2) extracted term frequency (*_FREQ) ...
... idiomatic expressions using automatic word-alignment. In Proceedings of the EACL 2006 Workshop on Multi-word expressions in a multilingual context, pages 33–40. Och, F. J. and Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational linguistics, 29(1):19–51 ...Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
The shear strength evaluation of rough and infilled joints and its indications for stability of rock cutting in schist rock mass
Construction of E75 highway section through Grdelica gorge was one of the most demanding projects realized in recent Serbian history. The alignment approximately 25 km long consists of several tens of bridges, two tunnels, three galleries and cuts with total length of 6 km. The alignment passes through highly anisotropic Palaeozoic schist rock formation of different weathering grades. This study focuses on shear strength properties of discontinuities, which are found to be the critical feature contributing to the occurrence ...Dušan Berisavljević, Zoran Berisavljević, Svetlana Melentijević. "The shear strength evaluation of rough and infilled joints and its indications for stability of rock cutting in schist rock mass" in Bulletin of engineering geology and the environment, Springer (2022). https://doi.org/10.1007/s10064-022-02580-8
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry and many research fields, terminology is developing rapidly. Most often, the language that serves as the “lingua franca” for most of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach for automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a tool for chunk alignment. We examine the performance of the method on the domain ...... e”. The alignment works in the following way. GIZA++ reads the two input texts in parallel. Whenever two bilingual chunks appear together, their co-occurrence is written into a text file (dubbed f_phrases). Afterwards, f_phrases is sorted in two ways (by the target term and by the source term), and that’s ...
... The first line is a header, each line contains a term and its frequency (for filtering later), separated with | (“pipe” character). The interface of this module is displayed in Figure 3. Figure 3. Input module of the BiLTe Web application Alignment and Post-Processing Module Aligning with GIZA++ ...
... from Lexical Resources”. Natural Language Engineering, 2019 Xu, Yan, Luoxin Chen, Junsheng Wei, Sophia Ananiadou, Yubo Fan et al. “Bilingual Term Alignment from Comparable Corpora in English Discharge Summary and Chinese Discharge Summary”. BMC bioinformatics Vol. 16, no. 1 (2015): 149 ...Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
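The term-list input format mentioned in the snippet above (a header line, then one term and its frequency per line, separated by a pipe character) can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `read_term_list`, the `min_freq` threshold and the sample terms are hypothetical and are not taken from the BiLTe application.

```python
def read_term_list(text, min_freq=1):
    """Parse 'term|frequency' lines; the first line is treated as a header."""
    terms = {}
    lines = text.splitlines()
    for line in lines[1:]:  # skip the header line
        line = line.strip()
        if not line or "|" not in line:
            continue  # ignore blank or malformed lines
        # Split on the last pipe so the frequency is always the final field.
        term, freq_text = line.rsplit("|", 1)
        freq = int(freq_text)
        if freq >= min_freq:  # frequency is kept "for filtering later"
            terms[term.strip()] = freq
    return terms


# Hypothetical sample data, for illustration only.
sample = "term|frequency\ndigital library|12\ninformation retrieval|3\ncatalogue|1\n"
frequent_terms = read_term_list(sample, min_freq=3)
```

Here `frequent_terms` keeps only the entries whose frequency reaches the chosen threshold.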
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Integrisano okruženje za pripremu paralelizovanog korpusa
The development of aligned corpora requires the preparation of parallel texts for their integration into an aligned corpus. This is a complex task that can be solved in different ways and that has to be carried out in several steps. In this paper we first describe the procedure for preparing parallel texts for the aligned corpus used by the Language Technology Group of the University of Belgrade. We then give a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in the process. The absence of a comfortable environment ...... files in TMX format into files for individual languages • verticalization of the text. All of these functions are available through the menus Alignment, Tools and TMX. The Alignment menu provides a GUI for the alignment software packages of the Loria laboratory. Individual menu items enable the use of each ...
... Encoding Initiative) consortium recommendations, and their alignment is performed at the level of paragraphs and sentences. We then give an overview of the software, namely programs (XAlign, Concordancier, WS4LR) that are used for alignment. The absence of a comfortable environment with a graphical ...
... construction of this environment we chose the C# programming language. Among other things, ACIDE provides a graphical user interface (GUI) for alignment and visualization of aligned texts, their control and correction, as well as generation of files in TMX format. ACIDE also enables the decomposition ...Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded, both semantically, morphologically and in other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...... two alignment tools developed by LORIA (Laboratoire lorrain de recherche en informatique et ses applications), one for automatic sentence alignment of texts (Xalign, http://led.loria.fr/outils/ALIGN/align.html), and another for alignment visualization and manual correction of alignment errors ...
... by the “OR” operator. This way, the user can obtain text segments in which either the Serbian or the English term was translated in an unexpected way. For instance, for the English term browser the Dictionary of library and information science terminology offers the Serbian translations pretraživač ...
... of search retrieves only segments that contain both the English and the Serbian term, thus excluding unwanted hits resulting from the ambiguity of terms. For instance, when initiating a search with the Serbian term novine, and expanding it with both the Wordnets and the Dictionary of librarianship ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
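The two search modes described in the snippets above (an “OR” search that returns segments containing either term, and a stricter search that returns only segments containing both the English and the Serbian term) can be illustrated with a small sketch. It assumes aligned segments are available as (Serbian, English) string pairs and uses plain case-insensitive substring matching; the function name `search_segments`, the `mode` parameter and the sample sentences are hypothetical and do not describe how Bibliša itself is implemented.

```python
def search_segments(segments, sr_term, en_term, mode="and"):
    """Filter aligned (Serbian, English) segment pairs by a bilingual query.

    mode="and": keep pairs where both terms occur (excludes ambiguous hits).
    mode="or":  keep pairs where at least one of the terms occurs.
    """
    sr_term, en_term = sr_term.lower(), en_term.lower()
    hits = []
    for sr_text, en_text in segments:
        has_sr = sr_term in sr_text.lower()
        has_en = en_term in en_text.lower()
        matched = (has_sr and has_en) if mode == "and" else (has_sr or has_en)
        if matched:
            hits.append((sr_text, en_text))
    return hits


# Hypothetical aligned segments, for illustration only.
segments = [
    ("Pretraživač prikazuje rezultate.", "The browser displays the results."),
    ("Program otvara stranicu.", "The browser opens the page."),
    ("Novine izlaze svakog dana.", "The newspaper is published daily."),
]
both = search_segments(segments, "pretraživač", "browser", mode="and")
either = search_segments(segments, "pretraživač", "browser", mode="or")
```

In this toy data the second pair is returned only by the “or” mode, which is the kind of segment where a term was rendered in an unexpected way.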
-
Long-term planning methodology for improving wood biomass utilization
The insufficiently developed forest management system is often followed by undeveloped forest resources supply chain and insufficient institutional support. These cause inefficient usage of fuel-wood as well as huge amounts of unused forest residues. In order to achieve optimal and long-term sustainable utilisation of biomass, an original methodology based on the interaction of mathematical optimization and backcasting approach has been developed. Mathematical optimization is used for both generation and consideration of techno-economic parameters of the forest biomass supply chain. ...Vladimir Vukašinović, Dušan Gordić, Marija Živković, Davor Koncalović, Dubravka Živković. "Long-term planning methodology for improving wood biomass utilization" in Energy, Elsevier BV (2019). https://doi.org/10.1016/j.energy.2019.03.105
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bilingual, with papers simultaneously being in English ...... Concordancier, developed in Loria laboratory in France (Laboratoire Lorrain de Recherche en Informatique et ses Applications) are used for alignment. The alignment method is based on the number of characters (length of the segment). Utvić reports that this approach is very successful (as much as 96% ...
... journal INFOtheca and 3 Baektel deliverables. The inspection of these lines reveals that the Serbian term “e-učenje” always has as its equivalence the English term “e-learning”, while the English term e-learning is translated in most cases with “elektronsko učenje” (43), but also with “e-učenje” (38) ...
... “sinteza”, “učenje”, “znanje”. The final form of the query is obtained by the morphological expansion of each individual term, if requested by the user. For example, the original term “sticanje znanja” is expanded with inflectional forms “sticanja znanja”, “sticanju znanja”, “sticanjem znanja”, “sticanjima ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
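The morphological expansion step described above can be sketched as a lookup of inflected forms followed by assembling a query string. This is an illustrative sketch only: the `INFLECTED_FORMS` table stands in for a real morphological e-dictionary, it lists just the forms quoted in the snippet, and joining the alternatives with `OR` is an assumption rather than the documented query syntax of the system.

```python
# Hypothetical stand-in for a morphological dictionary lookup; it contains
# only the inflectional forms quoted in the snippet above.
INFLECTED_FORMS = {
    "sticanje znanja": ["sticanja znanja", "sticanju znanja", "sticanjem znanja"],
}


def expand_query(terms, forms=INFLECTED_FORMS):
    """Expand each term with its known inflected forms and join them with OR."""
    alternatives = []
    for term in terms:
        for candidate in [term] + forms.get(term, []):
            if candidate not in alternatives:  # avoid duplicate alternatives
                alternatives.append(candidate)
    # Quote each alternative so multi-word terms stay together as phrases.
    return " OR ".join('"{}"'.format(alt) for alt in alternatives)


query = expand_query(["sticanje znanja"])
```

A term with no entry in the table is passed through unchanged, so the expansion degrades gracefully when the dictionary has no data for it.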
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...... Similarly, comparison and partial alignment of the DSA tag set was done with Ontolex and LexInfo, but a more precise and detailed alignment is envisaged. The dictionary article ...
... guidelines for dictionary writing were used to define the rules for the segmentation of the dictionary articles, the pattern recognition, and the alignment of the recognized markers with the predefined categories, as described in the previous section. The dictionary article units that were recognized ...
... predefined article structure. This information is then used for the manual postediting of the digitized text. Within this research, the partial alignment of the XML tag set, defined for the DSA with the TEI dictionary module, was made. For instance, the element is used to denote the grammatical ... Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data... much as possible term entries from dictionaries and other resources covering raw material domain terminology. Besides the aim of aggregating terms from different resources, one of the reasons for alignment of terms from multiple dictionaries (paper and electronic) was to assess term usage, which determines ...
... the other hand, alignment of terms with SrpMD was necessary, since these dictionaries are a base resource for lemmatization and multiword term extraction. Since SrpMD are already in the lexical database Leximirka [32], developed and managed by the same research team, this type of alignment was possible ...
... Section 3.2 using terminology extractors for Serbian and English, and Bilte [38]), a tool for chunk alignment [39,40]. The method combines the approach with existing domain terminology lexicons with term extraction tools. For English, FlexiTerm [41] was used with threshold 3 and TermSuite [42] with threshold ...Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
WS4LR - a Workstation for Lexical Resources
... as smaller. The standard method for representing aligned texts is the Translation Memory eXchange format (TMX) that is XML-compliant. The alignment itself can be performed by different methods and tools (Veronis, 2000). Of particular interest are programs that use XML tagged input texts and ...
... with or without synset hypernyms. 3.4 Working with Aligned Texts The module uses texts which have previously been aligned using Xalign as an alignment tool and converts them to TMX format, or texts that are already in that format. By choosing the appropriate XSLT stylesheet various visualizations ...
... for Balkan Languages, in Proc. of 1st International Wordnet Conference, Mysore, India Veronis, J. (ed.) (2000) Parallel Text processing: Alignment and Use of Translation Corpora, Dordrecht: Kluwer Academic Publishers Vossen, P. (ed.) (1998) EuroWordNet: A Multilingual Database with Lexical ...Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
-
Softverski alati za korišćenje resursa za srpski jezik
Ivan Obradović, Ranka Stanković (2008) ... smaller, such as words. The second step is the alignment of segmented parallel texts by means of one of the available alignment methods. The goal is to connect equivalent segments in two or more parallel texts. The method usually used for alignment at the sentence level, which is the most common ...
... and its hypernyms 3.4 Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text alignment tool XAlign (Bonhomme et al., 2001). The module enables the transformation of texts aligned by XAlign into different formats: textual, XML, tabular ...
... depending on the type of visualization required. Besides the possibility of working with specific file structures which result from the alignment with XAlign, this module also accepts other files already in TMX format as input. The panel in Figure 8 depicts an aligned text in TMX format. ...Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
E-Connecting Balkan Languages
In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. This tool relies for its work on several sophisticated textual and lexical resources that were developed for most of Balkan languages. These resources are based on several de facto standards in natural language processing.... text logical layout. At the beginning of the alignment process all segments coincided with sentences automatically tagged by Unitex. The XAlign system [1] was used for the alignment process. Starting from the French version, the goal of the alignment was to establish 1:1 relations on the segment ...Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...... the same) or weighted, where partial overlapping is taken into account, but with some weighted value to measure overlapping segment. To indicate alignment type, one can choose among the two options: the first option is greedyMatching, where the matching of annotations in the first and second files ...
... 2 × 3 × 4 evaluation rounds: two test sets, three NERs and four models per each. All trials were run with strict matching type and maxMatching alignment type. To indicate the chosen score type to evaluate the correspondence between one annotation from the first file and one annotation from the second ...
... entities, classes, attributes per document and collection; Gemini tool allows comparison of two text annotation files and provides different alignment scores. It is possible to compare a pair of XML files, a pair of files in BRAT format and one XML file against a file in BRAT format. The first ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122