Search
92 items
-
Combining Heterogeneous Lexical Resources
... ones are: • The system of morphological dictionaries of Serbian (SMD) in Intex format (Silberztein, 2000), that consists of a dictionary of simple lemmas, a dictionary of compounds (under construction), the corresponding dictionaries of word forms, and morphological finite-state automata that ...
... Serbian DELAS has been postponed until the two resources will become comparable in size. Besides that, the development of the Serbian morphological dictionary of compounds is in its initial phase, which is a serious drawback for the enhancement of the WN with morphosyntactic information, where ...
... classes of lemmas. The current size of SMD of simple lemmas is around 65,000, and they produce a dictionary of word forms with more than 930,000 entries. An example of an entry in the dictionary of simple lemmas (DELAS) is: (1) devojcyin,A1+Pos+Ek The information that has to be assigned to ... Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
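The DELAS entry quoted in the snippet above, `devojcyin,A1+Pos+Ek`, can be read as a lemma, followed by a comma, an inflectional class code, and `+`-separated markers. The following is a minimal Python sketch of that reading. The names `DelasEntry` and `parse_delas_line` are hypothetical, and the sketch assumes no escaped commas or plus signs in the line, which real dictionaries may contain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelasEntry:
    lemma: str
    inflection_code: str  # first code after the comma, e.g. "A1"
    markers: List[str]    # remaining "+"-separated markers, e.g. ["Pos", "Ek"]


def parse_delas_line(line: str) -> DelasEntry:
    """Split a simplified DELAS line of the form 'lemma,CODE+marker+marker'."""
    lemma, _, rest = line.strip().partition(",")
    if not lemma or not rest:
        raise ValueError(f"not a DELAS-style line: {line!r}")
    code, *markers = rest.split("+")
    return DelasEntry(lemma, code, markers)


entry = parse_delas_line("devojcyin,A1+Pos+Ek")
print(entry.lemma, entry.inflection_code, entry.markers)
```

The sketch deliberately does not interpret what `Pos` or `Ek` mean; it only separates the fields so that later tools can look the codes up.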
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020) In this paper, we will present an approach to transforming Serbian morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ...... ish, Greek, Russian etc. The system of morphological dictionaries is based on the theory of finite-state automata, namely on morphological and local grammars in the form of finite-state transducers that generate all morphological forms of words in the dictionary (Krstev, 2008). 2 Laboratoire d’Automatique ...
... of speech, morphological class, etc.) is in the LexicalEntry table. Inflectional class information is in the MorfPattern table, while the information about the dictionary to which the lexical entry belongs is in the Lexicon table. For one entry in the Lexicon table, that is one dictionary, one or more ...
... from different morphological dictionaries in DELA format. The only difference compared to the Serbian example is that Serbian nouns use a morphological class that is written in the MorfPattern table. Infotheca Vol. 19, No. 2, December 2019 93 Lazić B., Škorić M., “From DELA based dictionary to . . . ”, ... Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if domain terminology exists for a source language, as well as a domain-aligned corpus for the source and target languages, then it is possible to extract the terminology for the target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ... aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection ... Serbian WordNet can be browsed at http://sm.jerteh.rs/. order to obtain inflected forms with grammatical categories we used the English morphological dictionary from the Unitex distribution and the MULTEXT-East English lexicon. Grammatical codes from these two sources were harmonized. 4. In the ...
... additional result we enriched the Dictionary of Library and Information Sciences with 515 synonyms in the Serbian part. Another by-product is the bilingual Serbian/English list of inflected word forms and MWE pairs derived from bilingual dictionaries and morphological dictionaries. We will apply the same ...
... languages producing many different forms for each lemma. 4.2. Dictionary of Library and Information Science The development of the Dictionary of Librarianship: English-Serbian and Serbian-English (in this text referred to as ‘Dictionary’) (Ljiljana Kovačević, 2014) has started in 2001 at the National ...Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...... models. The SR BASIC annotated dataset will also be published. Keywords: Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary 1. Introduction The task of assigning to each token its Part-of-Speech category (noun, verb, adjective, etc.) is a common Natural Language ...
... model for Serbian are: (a) Serbian morphological dictionaries (Cvetana Krstev, Duško Vitas, 2015) (SMD); (b) pre-annotated texts (Duško Vitas, Cvetana Krstev, Ranka Stanković, Miloš Utvić, 2019). 2.1. Serbian morphological dictionaries Serbian morphological dictionaries represent a rich lexical ...
... especially for the Novels test set. This comes as no surprise, due to the fact that it is a very specific text, which is fully covered by the new dictionary used for the TT19 model. Figure 3: Precision of lemmatization per token, obtained by two TreeTagger based taggers ... Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%). ... Analytics The main stages of thesaurus-based document processing include: • Tokenization and lemmatization, that is, the transfer of word forms to dictionary forms (lemmas); • Matching with the thesaurus based on the lemma representation of the document. Multiword terms from a thesaurus are matched with ... Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
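The abstract above says that dictionaries are used to identify possible restoration candidates. A rough Python sketch of that first step might look like the following. It is not the authors' implementation: the word list is a toy stand-in for a real dictionary of word forms, the function names are hypothetical, and the degradation mapping (including đ written as "dj") is an assumption about how the degraded Latin text was produced. The disambiguation step based on corpus data and local grammars is not shown.

```python
from collections import defaultdict

# Assumed degradation mapping for Serbian Latin diacritics.
DEGRADE = str.maketrans({"š": "s", "č": "c", "ć": "c", "ž": "z", "đ": "dj"})


def degrade(word: str) -> str:
    """Lowercase a word and strip its diacritics."""
    return word.lower().translate(DEGRADE)


def build_index(forms):
    """Map each degraded spelling to the dictionary forms that produce it."""
    index = defaultdict(set)
    for form in forms:
        index[degrade(form)].add(form.lower())
    return index


def candidates(token: str, index) -> list:
    """Return dictionary forms the token could stand for, or the token itself."""
    return sorted(index.get(degrade(token), {token.lower()}))


toy_forms = ["kuća", "kuca", "reč"]  # toy word list, not real dictionary data
index = build_index(toy_forms)
print(candidates("kuca", index))  # ambiguous: two candidates
print(candidates("rec", index))   # unambiguous: one candidate
```

A token with more than one candidate, such as "kuca" here, is exactly the ambiguous case that the abstract says is resolved with corpus data and local grammars.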
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management, and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied to a web portal that gathers various texts originating from the journal of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the “Tara” and “Reiss” conferences, as well as from some doctoral dissertations related to this research area. After text processing, a corpus containing over 5500 pages of plain text was created and ...... without changing the form of words. Extended search includes morphological and semantic search. Morphological search covers all inflected forms of the specified word, which are retrieved from SrpMD (Serbian morphological dictionary). For nouns, grammatical forms include case and number, for example ...
... metadata search are customized and improved by query expansion via web service relying on the Serbian morphological dictionary and the Serbian WordNet semantic network for providing morphological and semantic text search expansion. The paper outlines possibilities for further use and analysis on a ...
... components of the language support system. Main lexical resources include morphological dictionaries for the Serbian language, Serbian and English WordNets, terminological databases: Termi, GeolISSTerm, RudOnto and Librarian dictionary. Apart from the grammars in the form of finite-state automata and transducers ... Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
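The morphological search described in this entry expands a query word to all of its inflected forms. As an illustration only, the following Python sketch does this over a handful of DELAF-style lines (`form,lemma.CODE`). The toy data, the function names, and the convention that an empty lemma means "same as the form" are assumptions made for the example; the portal's actual web service is not described in the snippet.

```python
from collections import defaultdict

# Toy DELAF-style lines; grammatical features are omitted for brevity.
TOY_DELAF = """\
zakon,.N
zakona,zakon.N
zakonu,zakon.N
zakonom,zakon.N
zakoni,zakon.N
"""


def load_forms(text: str):
    """Build lemma -> forms and form -> lemmas lookups from DELAF-style lines."""
    forms_by_lemma = defaultdict(set)
    lemmas_by_form = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        form, _, rest = line.partition(",")
        lemma = rest.split(".", 1)[0] or form  # empty lemma: same as the form
        forms_by_lemma[lemma].add(form)
        lemmas_by_form[form].add(lemma)
    return forms_by_lemma, lemmas_by_form


def expand_query(word: str, forms_by_lemma, lemmas_by_form) -> list:
    """Return the word plus every inflected form sharing one of its lemmas."""
    expanded = {word}
    for lemma in lemmas_by_form.get(word, set()):
        expanded |= forms_by_lemma[lemma]
    return sorted(expanded)


forms_by_lemma, lemmas_by_form = load_forms(TOY_DELAF)
print(expand_query("zakonu", forms_by_lemma, lemmas_by_form))
```

A word that is not in the dictionary is returned unchanged, so the search degrades to an exact-match query instead of failing.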
-
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019) In this paper we present a model for selecting good examples for a dictionary of Serbian, together with the development of the model's initial components. The method is based on a detailed analysis of various lexical and syntactic features in a corpus compiled from examples from five digitized volumes of the SASA Dictionary. The initial set of features was inspired by a similar approach used for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora containing various texts. The analysis showed that ... Serbian, good dictionary examples, automation of dictionary production, feature extraction, machine learning ... lexis (dialectal, archaic or dated, jargon, etc.), as well as non-standard phonetic, morphological and syntactic forms and types of complements are labelled. 249 Proceedings of eLex 2019 Each dictionary entry contains (or may contain) several subentries (one subentry for each lexical unit) ...
... selection of dictionary examples from corpora, and the presented approach supports the selection of dictionary examples making the process of dictionary development faster and more productive. 1.2 The role of dictionary examples Dictionary examples play an important role in dictionary entries and ...
... paper. 2. SASA Dictionary 2.1 SASA Dictionary retro-digitization The first ideas how to modernize the work on the SASA dictionary came many years ago (Sabo & Vitas, 1989). These ideas were later revitalized and various possibilities for updating the work on this dictionary were considered (Vitas ... Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...... of an article in the Dictionary is presented in Figure 1. The same entry has been taken as an example in Table 1 and Figure 2, which illustrates the result of the parsing process. 3.2 Dictionary markers Beside various semantic, accentual and grammatical (phonetic, morphological and, more recently, ...
... a similar approach was used as in Stanković et al. (2018) for the Serbian morphological electronic dictionary. The main class, in the core of this dictionary model, is LexicalEntry, representing a headword of the dictionary article, which encompasses the set of senses that are associated with this ...
... Figure 1: The microstructure of dictionary articles. 4 The transformation from the dictionary article text form to the lexical database The guidelines for dictionary writing were used to define the rules for the segmentation of the dictionary articles, the pattern recognition, and the ... Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...... [Flowchart labels: language morphological transformations for lemma generation; Extract definitions of verbs in a dialect dictionary, given in standard language; Index inverting; Table: dictionary verb entry related with equivalent standard language lemma of a verb; Table: dictionary verb ...]
... concepts of Google Maps; the etymological origin of the words, morphological information like part of speech, and additional semantic data. The content of the dictionary can be shared through social networks. Another important aspect of the dictionary is that it allows Web users to expand and complement it. ...
... digital dialect dictionary by using terms in the standard language. In Section 2 we discuss some previous approaches to searching digital dialect dictionaries. In Section 3 we represent resources used to improve searching performances of the digital dialect dictionary: Serbian morphological e-dictionaries ... Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
-
Towards Automatic Definition Extraction for Serbian
The paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SASA) were used to model different types of definitions (descriptive, grammatical, referential, and synonymous), which have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in an open ...... The definitions of nouns from the SASA dictionary were analysed using Serbian morphological e-dictionaries and local grammars in the form of Finite-State Transducers (Krstev 2008) and implemented in the Unitex corpus processing suite. Electronic morphological dictionaries of Serbian intended for automatic ...
... of nouns, which were analysed using Serbian morphological e-dictionaries and local grammars implemented as finite state transducers in an open-source corpus processing suite Unitex. The 21 models developed up to the present moment cover 57% of dictionary definitions, 83% of which were fully recognized ...
... SASA Dictionary The first step in our research was a thorough analysis of various lexical and syntactic features of definitions in the Serbian Academy of Sciences and Arts (SASA) dictionary; this part of a dictionary entry is presented in italics. The definition structure in the SASA dictionary is ... Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion ... construction of a morphological dictionary of multi-word units Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF] Automatic construction of a morphological dictionary of multi-word units ...
... open source software distributed under the terms of LGPL, we easily incorporated its modules in LeXimir for many tasks that involve manipulation of e-dictionaries, including dictionary look-up used in the module for (automated) production ...
... open access, as well as the employees' publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Automatic Construction of a Morphological Dictionary of Multi-Word Units Cvetana Krstev1, Ranka Stanković2, Ivan Obradović2, Duško Vitas3, and Miloš Utvić1 1 Faculty of Philology, University ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Possibilities of retro-digitalized German-Serbian Mining Dictionary
The paper describes the process of retro-digitization of the bilingual German-Serbian Mining Dictionary from 1923, authored by the mining engineer Dragutin Stepanović (Степановић, 1923). The dictionary is based on almost 4,000 lexical entries that are translation equivalents or cross-references. In place of a preface, the author provides his letter addressed to the “Minister of Forests and Mines”, in which he writes of his intention to record words used by the people in order to avoid the use of German words. Although the number of headwords is not that large, the dictionary ... Biljana Lazić, Olivera Kitanović, Ivan Obradović. "Possibilities of retro-digitalized German-Serbian Mining Dictionary" in E-dictionaries and E-lexicography, Zagreb, 10-11 May 2019, Zagreb : Institut za hrvatski jezik i jezikoslovlje (2019)
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... e-dictionaries of multi-word units. Development of morphological dictionaries of MWUs is a tedious task, especially in the case of Serbian and other languages featuring complex morphological structures. After realizing that the development of such a dictionary manually is an extremely slow process, we endeavored ...
... Borovetz, Bulgaria, 2009, pp. 23–29. [12] C. Krstev, R. Stanković, I. Obradović, D. Vitas, and M. Utvić, “Automatic Construction of a Morphological Dictionary of Multi-Word Units,” in IceTAL. Reykjavik, Iceland: Springer, August 2010, pp. 226–237. [13] I. Alegria, O. Ansa, X. Artola, N. Ezeiza, K ...
... can be briefly described in the following way: in a dictionary of lemmas (DELAS) every lemma is described in full detail so that a dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it. The dictionary of forms is used in NLP tasks. Two corpus processing ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
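The snippet above describes how a dictionary of forms (DELAF) is generated from a fully described dictionary of lemmas (DELAS). A very small Python sketch of that idea follows. It is an illustration under stated assumptions: the inflectional class `N_TOY`, its suffix list, and the feature tags are invented for the example and are not codes from the Serbian dictionaries, and real systems use finite-state transducers, not a suffix table.

```python
# Hypothetical inflectional class: (suffix, feature tag) pairs added to the lemma.
TOY_CLASSES = {
    "N_TOY": [("", "nom_sg"), ("a", "gen_sg"), ("u", "dat_sg"), ("i", "nom_pl")],
}


def generate_delaf(delas_line: str) -> list:
    """Expand one 'lemma,CODE+markers' line into 'form,lemma.CODE+markers:tag' lines."""
    lemma, _, info = delas_line.strip().partition(",")
    code = info.split("+", 1)[0]
    if code not in TOY_CLASSES:
        raise KeyError(f"unknown inflectional class: {code!r}")
    return [
        f"{lemma}{suffix},{lemma}.{info}:{tag}"
        for suffix, tag in TOY_CLASSES[code]
    ]


for line in generate_delaf("prozor,N_TOY"):
    print(line)
```

Keeping all inflection knowledge in the class table means a new lemma only needs a class code to receive its full set of forms, which is the property the snippet attributes to the DELAS/DELAF design.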
-
Развој геолошког терминолошког речника ГеолИССТерм
... Serbian, one of the ways of improving the searches related to the geology resources on the web would be through the integration of the morphological dictionary of Serbian (Vitas et al. 2003) and GeolISSTerm by adding inflectional class codes to the terms featured in GeolISSTerm. 6. Conclusion ...
... ing the dictionary The data entry interface (Figure 6) displays the structure, i.e. organization, of concepts and terms already existing in the dictionary on the left-hand side, while the right-hand side of the interface shows the attributes of the selected concept, namely, dictionary entry. given ...
... initiative are planned to be provided. The electronic edition of the dictionary is complemented by the printed version. Keywords. Terminological resources, geology, GIS, Geologic Information System, geologic vocabulary, electronic dictionary. INFOtheca, № 1, vol XII, August 2011 50 Ranka Stanković et ... Ranka Stanković, Branislav Trivić, Olivera Kitanović, Branislav Blagojević, Velizar Nikolić. "Развој геолошког терминолошког речника ГеолИССТерм" in INFOteka: časopis za informatiku i bibliotekarstvo, Beograd : Zajednica biblioteka univerziteta u Srbiji (2011)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ... raw materials, mining, terminology, dictionary, terminology application, mobile application, digitization, lexical data, corpora, linked open data ... domain, as well as general lexica morphological dictionaries. Resource preparation started with dictionary (retro)digitisation and corpora enlargement, followed by adding new Serbian terms to general lexica dictionaries, as well as adding bilingual terms. Dictionary development is relying on corpus analysis ...
... corpora, adding domain terms to general purpose morphological e-dictionaries and extraction of bilingual lists. The process of terminology compilation, from the perspective of monolingual and bilingual extraction, as well as the web and mobile form of the dictionary are given in Section 4. The last section ... Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...... paraphrases. broader: The sense in the first dictionary completely covers the meaning of the sense in the second dictionary and is applicable to further meanings. narrower: The sense in the first dictionary is entirely covered by the sense of the second dictionary, which is applicable to further meanings ...
... version of Webster’s dictionary from 1913. Estonian: We used the EKS Dictionary of Estonian and the PSV Basic Estonian Dictionary (Kallas et al., 2014). German: We used the German versions of OmegaWiki and Wiktionary. Hungarian: We linked the Explanatory Dictionary of Hungarian (1959-1962) ...
... 4,500 DDO lemmas (of 97,500 in the dictionary). The lemma intersection (86%) with ODS was selected for our task. Dutch: We used the Woordenboek der Nederlandsche Taal (Dictionary of the Dutch Language, WNT) and the Algemeen Nederlands Woordenboek (Dictionary of Contemporary Dutch, ANW). The Dutch ... Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...... designed and implemented for the purpose of further development and management of morphological electronic dictionaries of Serbian (SMD), presented in more detail in Section 3. However, with the growing number of dictionary developers, and given the variety of dictionaries and information stored in ...
... lexically-based corpus processing suite that offers strong support for finite-state processing using morphological dictionaries – http://unitexgramlab.org/ Figure 1: Data categories (markers) dictionary. The main class of the core of the lexicon model is the class LexicalEntry, representing a unit ...
... DELA format: in the dictionary of lemmas each lemma is described in full detail, so that the dictionary of forms containing all necessary grammatical information can be generated from it, and subsequently used in various NLP tasks (Courtois and Silberztein, 1990). A dictionary of lemmas can con- ... Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...... possible to use the Serbian morphological dictionary. Serbian morphological dictionaries include semantic markers which allow the distinction between ijekavian, ekavian and ikavian pronunciation. Dictionaries cover both general lexica and proper names. Serbian morphological dictionaries are found in ...
... noun and human entity. According to data from 2014, the Serbian morphological dictionary of simple words consists of 133,361 lemmas, from which 4,581,657 word forms are produced. The number of units covered by the Serbian morphological dictionary of compounds is 13,717, or 262,686 word forms [7]. RudOnto and ...
... org/rest/dc/4024 http://www.macmillandictionary.com/dictionary/british/terminology http://www.isocat.org/rest/dc/4024 ... Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
-
An Approach to Development of Bilingual Lexical Resources
... resources, we had at our disposal Serbian morphological e-dictionaries [Krstev, 2008], Serbian and English wordnets (SrpWN and EWN), and a bilingual Serbian-English Dictionary of Library and Information Science technology (further referred to as Dictionary of Librarianship) [Kovačević et al., 2004] ...
... bilingual lexical resource. The approach relies on already available resources, Serbian morphological e-dictionaries, Serbian and English wordnets connected via the interlingual index, and a bilingual Dictionary of Librarianship, as well as on a TMX document collection generated from aligned Ser ...
... Serbian and English wordnets and the bilingual dictionary of Librarianship. The user formulates the initial query as one or more keywords (simple or multiword). If the user so specifies, Bibliša forwards this query for further morphological and semantic expansion. This is essentially handled ... Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionary is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... University of Belgrade, Belgrade (2008) 7. Krstev, C., Stanković, R., Obradović, I., Vitas, D., Utvić, M.: Automatic Construction of a Morphological Dictionary of Multi-Word Units. In: IceTAL, pp. 226–237. Springer, Reykjavik, Iceland (2010) 8. Krstev, C., Stanković, R., Vitas, D., Koeva, S.: E- ...
... resources in any part of the system, wherever they are needed. Thus, for example, morphological dictionaries can be used for adding additional morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned ...
... been produced for many other languages. This format can be briefly described in the following way: in a dictionary of lemmas (DELAS) every lemma is described in full detail so that a dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it, and subsequently ... Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6