Search ⚒ Papers ⚒ DR RGF - RGF Repository

Search

78 items

Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Branislava Šandrih, Cvetana Krstev, Ranka Stanković (2019)

In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. The gold standard was further used to prepare training sets for four different levels of annotation, which were then used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on ...

NER, Named Entity Recognition Systems, Serbian, Personal Names

... of personal names (approximately one third of the number of personal names in our test sets). Jiang et al. (2016) compared 4 NER systems, two of which were STANFORD NER and SPACY NER, for English. Their test set consisting of Wiki articles contained approximately the same number of personal names as ...
... time (moments and periods), money expressions, measurement expressions, geopolitical names (countries, settlements, oronyms and hydronyms), and personal names (one or more last names with or without first names and nicknames). The presented evaluation results for the recognition of all mentioned ...
... Serbian - The Case of Personal Names. Branislava Šandrih, Cvetana Krstev, Ranka Stanković. Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF] ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
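
The entry above states that all models, together with the rule- and lexicon-based system, were evaluated, but the scoring protocol is not spelled out here. A common choice for NER is exact-match, entity-level precision, recall and F1. The following is a minimal sketch under that assumption; the `Span` type and the `entity_prf` helper are hypothetical, and token offsets are end-exclusive:

```python
from typing import NamedTuple


class Span(NamedTuple):
    """An entity mention: token offsets [start, end) and its type, e.g. "PERS"."""
    start: int
    end: int
    label: str


def entity_prf(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1: a span counts only if offsets and type all match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {Span(0, 2, "PERS"), Span(5, 6, "PERS")}
predicted = {Span(0, 2, "PERS"), Span(7, 8, "PERS")}
print(entity_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Under exact matching, a prediction with the right boundaries but the wrong type counts as both a false positive and a false negative; partial-match schemes score such cases differently.
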
Serbian NER&Beyond: The Archaic and the Modern Intertwinned

Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić (2021)

In this paper we present a Serbian literary corpus that is being developed within the COST Action “Distant Reading for European Literary History” CA16204. Using this corpus of novels written more than a century ago, we developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities, based on a Convolutional Neural Network (CNN) that achieves an F1 score of ≈91% on the test dataset. This model was further evaluated on a separate evaluation dataset. We conclude with a comparison ...

... dates and time (moments and periods), money and measurement expressions, geopolitical names (countries, settlements, oronyms and hydronyms), and personal names (one or more last names with or without first names and nicknames). The system was developed to recognize NEs in newspapers and similar ...
... that this tool supports. Table (Entity | Explanation): PERS, Personal names: First names, surnames, nicknames and their combinations (of real people and fictional characters, including gods and saints). Possessive adjectives from personal names should not be annotated. ROLE, Occupations and titles: Occupations ...
... literary texts. The enhanced version of SrpNER was later utilized by Šandrih et al. (2019) for the preparation of a gold standard annotated with personal names, which was used for building training sets for 4 different levels of annotation, on which two ML-based NE recognizers were trained and evaluated ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
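
Statistical NER models such as the CNN-based recognizer described in the entry above typically emit one tag per token, and those tags must be decoded into entity spans. Assuming a BIO tagging scheme (an assumption; the entry does not say which scheme was used) and entity types such as PERS and ROLE from the table excerpt, the decoding step can be sketched as follows; `bio_to_spans` and the example tokens are hypothetical:

```python
def bio_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Decode BIO tags (e.g. B-PERS, I-PERS, O) into (start, end, label) spans, end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # a sentinel "O" closes an entity at the end
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.append((start, i, label))  # close the currently open entity
            # B- (or a stray I- of another type) opens a new entity; O opens nothing
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


tokens = ["Ivan", "Petrović", ",", "profesor", "."]
tags = ["B-PERS", "I-PERS", "O", "B-ROLE", "O"]
for start, end, label in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# PERS Ivan Petrović
# ROLE profesor
```

Treating a stray `I-` tag as the start of a new entity is one common convention among several; stricter decoders discard such tags instead.
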
A Lexical Approach to Acronyms and their Definitions

Cvetana Krstev, Duško Vitas, Ranka Stanković (2015)

In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions, which are usually Multi-Word Units (MWUs); shallow parsing of MWUs that enables MWU lemmatization; and production of entries in morphological electronic dictionaries, both for MWUs and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables a representation that reflects the complex relations between acronyms and their definitions.

... agencija za telekomunikacije ‘Republic Agency for Telecommunications’), sometimes letters that are not initial are used, e.g. for ... [Figure 1: A many-to-many relation between entity names and acronyms. Names and acronyms given in italics are possibilities that are not realized for the given example.]
... Fraktion’. With respect to orthography, acronyms differ from other words in a text. Namely, they can be common names, e.g. EKG – elektrokardiogram ‘electrocardiogram’, or proper names, e.g. MOK – Medjunarodni olimpijski komitet ‘International Olympic Committee’, but the distinction cannot be ...
... Zoran Djindjić represents the Democratic Party, not that DS is an acronym for Zoran Djindjić. 3. Lemmatizing the MWU names from the list obtained in Step 2 in order to obtain names in a dictionary form, normally in the singular, nominative case, sometimes in the plural. (3) KFOR - Medjunarodna ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
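
The first step of the procedure in the entry above, extracting acronyms together with their definitions, can be illustrated with a deliberately simple heuristic: take a parenthesized all-caps word and check whether the initials of the same number of preceding words spell it. This is a sketch, not the authors' method; `acronym_pairs` is hypothetical, and it misses cases mentioned in the excerpts, such as acronyms built from non-initial letters or definitions containing function words:

```python
import re

# An acronym candidate is a single word in parentheses, e.g. "(MOK)".
PARENTHESIZED = re.compile(r"\((\w+)\)")


def acronym_pairs(text: str) -> list[tuple[str, str]]:
    """Pair each parenthesized acronym with the preceding words whose initials spell it."""
    pairs = []
    for match in PARENTHESIZED.finditer(text):
        acronym = match.group(1)
        if not acronym.isupper():  # skips "(2015)", "(mok)" and similar
            continue
        words = text[: match.start()].split()
        if len(words) < len(acronym):
            continue
        candidate = words[-len(acronym):]
        initials = "".join(word[0] for word in candidate).upper()
        if initials == acronym:
            pairs.append((acronym, " ".join(candidate)))
    return pairs


print(acronym_pairs("Odluku je doneo Medjunarodni olimpijski komitet (MOK)."))
# [('MOK', 'Medjunarodni olimpijski komitet')]
```

A definition found this way would still need to be lemmatized (Step 3 in the excerpt) before it can enter a dictionary.
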
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology

Mihailo Škorić, Mauro Dragoni (2019)

This paper is the result of a task presented to attendees of the Keyword Search in Big Linked Data summer school, organized by Vienna University of Technology under the KEYSTONE COST Action in the summer of 2017. It presents a specific approach to classification via the creation of minimal document surrogates based on the MeSH ontology of the US National Library of Medicine, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...

document classification, MeSH, ontology, information extraction
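
The idea of reducing documents to minimal surrogates made of MeSH concept names, and using the concepts' tree numbers to place them in the classification hierarchy, can be sketched as below. The concept table and its made-up tree numbers, the naive substring matching, and the majority vote over top-level branches are all illustrative assumptions, not the paper's procedure:

```python
from collections import Counter
from typing import Optional

# Hypothetical table: concept name -> tree numbers. In the paper's setting this
# comes from SPARQL queries over MeSH; these codes are invented for illustration.
TREE_NUMBERS = {
    "myocardial infarction": ["CARDIO.1"],
    "arrhythmia": ["CARDIO.2"],
    "fracture": ["INJURY.1"],
}


def surrogate(text: str, concepts: dict[str, list[str]] = TREE_NUMBERS) -> list[str]:
    """Reduce a document to the concept names it mentions (naive substring matching)."""
    lowered = text.lower()
    return [name for name in concepts if name in lowered]


def classify(text: str, concepts: dict[str, list[str]] = TREE_NUMBERS) -> Optional[str]:
    """Return the top-level branch (tree-number prefix) hit most often by the surrogate."""
    branches = Counter()
    for name in surrogate(text, concepts):
        for tree_number in concepts[name]:
            branches[tree_number.split(".")[0]] += 1
    return branches.most_common(1)[0][0] if branches else None


doc = "Arrhythmia was observed after myocardial infarction; no fracture."
print(surrogate(doc))  # ['myocardial infarction', 'arrhythmia', 'fracture']
print(classify(doc))   # CARDIO
```
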

... syllables describing the hierarchy (Figure 4). The data is used in a subsequent query that aims to derive all the names of the concepts and their meshv:treeNumber values. The concept names are derived from the rdfs:label, followed by the mesh:treeNumber of the same concept. The returned concepts are sorted ...
... level of ontology) were reduced to a single term – their name. On the other hand, the documents were reduced only to the occurrences of terms (concept names from MeSH ontology) that, with the simple mapping (stored in ...
... 2.1 Extraction of taxonomy of concepts from MeSH ontology. Extracting the matrix of the concept names and their identifiers in the classification tree is done using another SPARQL query. Since in this ontology there are triples consisting of the concept ...
Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca 19, no. 1, September 2019, Faculty of Philology, University of Belgrade (2019): 55-69. https://doi.org/10.18485/infotheca.2019.19.1.3
E-Connecting Balkan Languages

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva (2009)

In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. The tool relies on several sophisticated textual and lexical resources that were developed for most Balkan languages. These resources are based on several de facto standards in natural language processing.

Query expansion, e-dictionary, wordnet, proper name, aligned text

... simple proper names [11]. The Bulgarian Grammar dictionary (DELAS dictionary) consists of 127,000 lemmas distributed as follows: app. 85,000 simple lemmas belong to general lexis, app. 6,000 lemmas represent domain specific lexis and app. 36,000 lemmas are simple proper names. The corresponding ...
... aim of appropriately processing proper names in natural language applications [16]. This work has been pursued by development of a Serbian version, which finally led to the design and construction of a relational multilingual dictionary of Proper Names, Prolexbase, in a form of relational database ...
... resources are based on several de facto standards in natural language processing. Keywords Query expansion, e-dictionaries, wordnets, proper names, aligned texts 1. Introduction The software tool WS4LR (shortened for WorkStation for Language Resources) is being developed by the Language ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
An Approach to Efficient Processing of Multi-Word Units

Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas (2013)

Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...

Natural Language Processing, Grammatical Category, Lexical Representation, MWU, multi-word unit

... belonging to specific library and information science terminology. In addition to that, we used a smaller set of 152 MWU proper names, mostly geographic names and event names. As in the case of the first evaluation the results varied depending on the type of data used: for the first data set of general ...
... Hence, we will refrain from a general conclusion and just point out that in the case of relatively comparable sets of geographic names from the first evaluation and proper names from the second, a considerable improvement was reached beyond doubt. In the second evaluation we also looked at the relation ...
... easily described using Multiflex graphs [18]. The Multiflex system is incorporated into Unitex, but it was also successfully used for Polish proper names in another environment [19]. For the inflection of Serbian MWUs 104 such transducers were developed — 18 for adjectives and 86 for nouns. By analogy ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
SrpELTeC: A Serbian Literary Corpus for Distant Reading

Ranka Stanković, Cvetana Krstev, Duško Vitas (2024)

The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared and annotated using the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges encountered in preparing SrpELTeC from scratch, and their solutions, are described. All novels were manually encoded in TEI with rich metadata and structural annotations. Automatic annotation included POS tagging, lemmatization and named entities, relying on resources for processing ...

digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics

Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03
On the compatibility of lexical resources for NooJ

Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović (2012)

Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions.

... that the text contains nouns that belong to these semantic categories. In French, N+PRENOM extracts both given names and family names, although the code used suggests family names only. In addition to that, PRENOM as a category does not appear in the *.def file, whereas the category PR, which ...
... y applied to appropriate texts, and the results are shown in Table 5. There are considerable differences, as for example in the case of proper names, which range from 122 in Croatian to 1364 in Serbian. However, these data are also not fully comparable, as semantic categories in Serbian are ...
... number of recognized “states” in the French and Serbian texts (Table 5). On the other hand, inhabitants of states and cities are not marked as proper names in the French dictionary (e.g. Anglais,N+z1+m+s) as opposed to the Serbian dictionary (e.g. Englez,N+m+s+1+v+Hum+NProp+Top+Inh). Another interesting ...
Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
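
The dictionary entries quoted in the entry above follow a `lemma,CATEGORY+code+...` pattern. One way to start comparing code inventories across language modules is sketched below, assuming no escaped commas or plus signs inside lemmas; `DictEntry` and `parse_entry` are hypothetical names:

```python
from typing import NamedTuple


class DictEntry(NamedTuple):
    lemma: str
    category: str
    codes: tuple[str, ...]


def parse_entry(line: str) -> DictEntry:
    """Split a 'lemma,CATEGORY+code+code...' entry (escaped characters not handled)."""
    lemma, _, description = line.partition(",")
    category, *codes = description.split("+")
    return DictEntry(lemma, category, tuple(codes))


serbian = parse_entry("Englez,N+m+s+1+v+Hum+NProp+Top+Inh")
french = parse_entry("Anglais,N+z1+m+s")
print("NProp" in serbian.codes, "NProp" in french.codes)  # True False
print(sorted(set(serbian.codes) & set(french.codes)))     # ['m', 's']
```

A shared code string is not evidence of shared meaning, which is the kind of mismatch the excerpts describe for `N+PRENOM` in French, so a comparison like this is only a first filter before checking each module's meta-data descriptions.
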
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines

Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan (2008)

In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words for a query, which is of paramount importance for the quality of the results it returns, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...

LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval

... 1998). 4. In a similar way queries can be expanded by Prolex, a multilingual database of proper names which represents the implementation of an elaborate four-layered ontology of proper names (Krstev, et al., 2005) organized around a conceptual proper name that represents the same concept ...
... Mars, etc. if meronymy is used for query expansion. 4. The expansion of proper names using Prolex which offers to the user the option of adding proper name aliases, its synonyms, but also other proper names which are semantically related to the initial proper name through holonym and meronym ...
... lexical words. More than 85,000 simple lemmas belong to general lexica, while the remaining 32,000 lemmas represent various kinds of simple proper names. The Serbian morphological dictionary of compounds contains approximately 2,700 lemmas (yielding more than 60,000 different forms) and it is being ...
Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
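
Query expansion with a morphological dictionary and a wordnet, as described in the entry above, can be sketched as follows for a highly inflected language. The tiny lemma-to-forms table and the related-term table are hypothetical stand-ins for the real resources (only some case forms are listed), and the `OR`/`AND` syntax is an assumption about the target search engine:

```python
# Hypothetical toy resources: a morphological e-dictionary (lemma -> some forms)
# and a wordnet-like table of related terms. Real resources are far larger.
INFLECTED_FORMS = {
    "knjiga": ["knjiga", "knjige", "knjizi", "knjigu", "knjigom", "knjigama"],
    "publikacija": ["publikacija", "publikacije", "publikaciji", "publikaciju",
                    "publikacijom", "publikacijama"],
}
RELATED = {"knjiga": ["publikacija"]}


def expand_term(lemma: str) -> list[str]:
    """All listed forms of the lemma and of its related terms, without duplicates."""
    variants: list[str] = []
    for related in [lemma, *RELATED.get(lemma, [])]:
        for form in INFLECTED_FORMS.get(related, [related]):  # unknown words pass through
            if form not in variants:
                variants.append(form)
    return variants


def build_query(lemmas: list[str]) -> str:
    """OR together the variants of each lemma, AND across lemmas."""
    groups = ["(" + " OR ".join(f'"{form}"' for form in expand_term(lemma)) + ")"
              for lemma in lemmas]
    return " AND ".join(groups)


print(build_query(["knjiga"]))
```
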
Automatic construction of a morphological dictionary of multi-word units

Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić (2010)

The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
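
A minimal sketch of how an MWU paradigm can be produced from data in simple-word e-dictionaries, for the common class of an adjective followed by a noun that agree in gender, number and case. The dictionary excerpt is hypothetical and singular-only, `inflect_axn` is a hypothetical helper, and the real procedure relies on dedicated inflectional transducers rather than anything like this:

```python
from typing import Optional

CASES = ("nom", "gen", "dat", "acc", "voc", "ins", "loc")

# Hypothetical excerpt of a simple-word e-dictionary:
# (lemma, part of speech, gender) -> singular form for each case.
SIMPLE_WORDS = {
    ("geološki", "A", "f"): dict(zip(CASES, ["geološka", "geološke", "geološkoj", "geološku",
                                              "geološka", "geološkom", "geološkoj"])),
    ("karta", "N", "f"): dict(zip(CASES, ["karta", "karte", "karti", "kartu",
                                           "karto", "kartom", "karti"])),
}


def inflect_axn(adjective: str, noun: str, gender: str) -> Optional[dict[str, str]]:
    """Singular paradigm of an adjective + noun MWU, the adjective agreeing with the noun."""
    adj_forms = SIMPLE_WORDS.get((adjective, "A", gender))
    noun_forms = SIMPLE_WORDS.get((noun, "N", gender))
    if adj_forms is None or noun_forms is None:
        return None  # a component is missing from the simple-word dictionary
    return {case: f"{adj_forms[case]} {noun_forms[case]}" for case in CASES}


paradigm = inflect_axn("geološki", "karta", "f")
print(paradigm["nom"], "|", paradigm["ins"])  # geološka karta | geološkom kartom
```

Returning nothing when a component is not in the simple-word dictionary mirrors the kind of failure the excerpts report for units such as Dar es Salam.
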

electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion

... words from this procedure, for instance, dictionaries of personal names, in order to alleviate similar problems. However, that might not be such a good idea: these very dictionaries successfully processed many items, among them three very specific names of small towns in Serbia named after famous Serbian ...
... not in the dictionary of simple words (in 255 cases, or 80% of all failures). The latter case occurred frequently due to MWUs representing proper names, where components are often not words in Serbian, e.g. Dar es Salam, or due to the fact that some words are used only in MWUs (like domali in domali ...
... simply because we have not collected enough new adjectives. Our list of new MWU nouns came from several different sources: the official list of MWU names of settlements in Serbia (236), MWUs extracted from a log file of a Serbian professional journal that deals with economic issues (162), from Verne’s ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2017)

Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...

... system of Serbian e-dictionaries covers both general lexica and proper names and all inflected forms are generated from 135,000 simple forms and 13,000 MWU lemmas. Approximately 28.5% of these lemmas represent proper names: personal, geopolitical, organizational, etc. Named Entity Recognition. According ...
... represents one document from our collection in which recognized NEs are highlighted—toponyms are underlined, personal names (with roles) are underlined with a double line, organization names are framed. Determination of weights for terms within the indexes of a document is a complex process and there ...
... handcrafted rule-based system that relies on comprehensive lexical resources for Serbian. For recognition of some types of named entities, e.g. personal names and locations, e-dictionaries and information within them is crucial; for others, like temporal expressions, local grammars in the form of FSTs ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić (2023)

The paper presents the results of the project “It-Sr-NER: Web services for named entities recognition, linking and mapping,” in which teams from the University of Turin and the Society for Language Resources and Technologies JeRTeh participated, and whose goal was to develop the It-Sr-NER web service for annotating named entities in text and displaying them on a map. The named entities covered by these services are names of persons, places, organizations, demonyms (ethnicities), events and works of art.

General Engineering

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić. "It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map" in Infotheca, Belgrade : Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.3
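
Displaying recognized place names on a web map requires coordinates, for example obtained through entity linking. Assuming each recognized place already comes with a latitude and longitude (the tuple layout and the `LOC` label are hypothetical, not the It-Sr-NER output format), a GeoJSON FeatureCollection that common web-mapping libraries can render might be built like this:

```python
import json


def to_geojson(entities: list[tuple[str, str, float, float]]) -> dict:
    """entities: (surface form, entity label, latitude, longitude) -> GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name, "label": label},
        }
        for name, label, lat, lon in entities
    ]
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson([("Beograd", "LOC", 44.82, 20.46)])  # approximate coordinates
print(json.dumps(collection, ensure_ascii=False))
```

The easy mistake here is the axis order: GeoJSON puts longitude first, while most gazetteers and linked-data sources list latitude first.
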
Improvement of geodatabase queries within GeolISS

Ranka Stanković (2008)

... cross-language information retrieval. For expansion of queries with proper names WS4LR is using Prolex, a multilingual database of proper names which represents the implementation of an elaborate four-layered ontology of proper names [12] organized around a conceptual proper name that represents the ...
... Romanian Academy, vol. 7, No. 1–2, pp. 147–161, (2004) [12] Krstev, C., Vitas, D., Maurel, D., Tran, M. (2005). “Multilingual Ontology of Proper Names”. In Proc. of Second Language & Technology Conference, Poznań, Poland, April 21–23, Wydawnictwo Poznańskie Sp. z o.o, Poznań... [13] TMX 1.4b ...
... significantly influenced by the Web Ontology Language (OWL). GeolISS is implemented using ESRI ArcGIS technology [5], and designed to function as a personal geodatabase and SDE enterprise geodatabase on MS SQL Server 2000. The logical framework of GeolISS implementation is based on five packages of ...
Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
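
Prolex, as described in the excerpts of the entry above, is organized around a conceptual proper name that links language-specific names, aliases and semantically related names (holonyms, meronyms). A heavily simplified data structure for meronym-based query expansion might look like this; the class, field names, pivot numbers and example relation are hypothetical and do not reflect the actual Prolex schema:

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualProperName:
    """Simplified pivot: names per language (first = preferred) plus meronym links."""
    pivot_id: int
    names: dict[str, list[str]]
    meronyms: list["ConceptualProperName"] = field(default_factory=list)


def expand_proper_name(concept: ConceptualProperName, language: str,
                       include_meronyms: bool = False) -> list[str]:
    """Names and aliases in one language, optionally extended with names of the concept's parts."""
    terms = list(concept.names.get(language, []))
    if include_meronyms:
        for part in concept.meronyms:
            terms.extend(part.names.get(language, []))
    return terms


belgrade = ConceptualProperName(2, {"sr": ["Beograd"], "en": ["Belgrade"]})
serbia = ConceptualProperName(1, {"sr": ["Srbija", "Republika Srbija"],
                                  "en": ["Serbia", "Republic of Serbia"]},
                              meronyms=[belgrade])
print(expand_proper_name(serbia, "sr", include_meronyms=True))
# ['Srbija', 'Republika Srbija', 'Beograd']
```

Holonym expansion would follow the links in the opposite direction; keeping names per language on one pivot is what lets the same structure serve cross-language retrieval.
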
Towards Automatic Definition Extraction for Serbian

Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić (2021)

The paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured Serbian texts, with the aim of speeding up dictionary development. Definitions from the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential and synonymous), which have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analyzed using morphological e-dictionaries and local grammars implemented as finite-state transducers in an open ... corpus processing package ...

... contain automatically generated word forms of more than 200,000 lemmas, and their content covers both general vocabulary and proper names - personal, geopolitical, names of organizations and the like. Moreover, the dictionary contains multi-word units, which are recorded in traditional dictionaries ...
... the basic word (“diminutive of …”). These definitions are easy to model. The second group consists of “definitions” of some types of proper names, e.g. names of holidays, saints, monasteries, etc. These definitions are similar to the explanations given in encyclopaedias, they are expressed freely, ...
... tured texts. In addition to the basic concept (Term) and its main definitions (Definition), sentence segments containing pseudonyms or additional names (Alias Term) are also annotated and associated with the basic term. Likewise, noun phrases (Referential Term) that refer to the previously marked term ...
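
Some definition types described in the entry above, such as grammatical definitions of the form “diminutive of …”, are easy to model with patterns. The rule-based classifier below is only a sketch: the work itself used local grammars implemented as finite-state transducers over Serbian text, while the English patterns and the `definition_type` helper here are hypothetical:

```python
import re

# Hypothetical English patterns standing in for Serbian local grammars.
GRAMMATICAL = re.compile(r"^(diminutive|augmentative|feminine) of\s+\w+", re.IGNORECASE)
REFERENTIAL = re.compile(r"^see\s+\w+", re.IGNORECASE)


def definition_type(definition: str) -> str:
    """Classify a dictionary definition as grammatical, referential, synonymous or descriptive."""
    text = definition.strip().rstrip(".")
    if GRAMMATICAL.match(text):
        return "grammatical"
    if REFERENTIAL.match(text):
        return "referential"
    # A comma-separated list of single words is treated as a synonym definition.
    if all(part.strip() and " " not in part.strip() for part in text.split(",")):
        return "synonymous"
    return "descriptive"


for d in ["diminutive of kuća.", "see kuća", "dom, stan", "a building in which people live"]:
    print(d, "->", definition_type(d))
```

The order of the checks matters: the cheap, highly specific patterns run first, and "descriptive" is the fallback for everything else.
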
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking

Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović (2024)

The paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as of the process of part-of-speech tagging, lemmatization and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data to RDF, and the inclusion of NIF annotations. The produced NIF files were evaluated by exploring a triplestore with SPARQL queries. Finally, we discuss linking Linked ...

parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata

Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
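
The conversion to RDF with NIF annotations described in the entry above can be sketched as plain Turtle generation. The sketch assumes one NIF context per document, RFC 5147-style character offsets in the URIs, and `itsrdf:taIdentRef` for the linked Wikidata item; the document URI is a placeholder, and the paper's exact modelling choices are not given here:

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def nif_turtle(doc_uri: str, text: str, mentions: list[tuple[int, int, str]]) -> str:
    """Serialize a NIF context plus linked mentions (begin, end, entity URI) as Turtle."""
    context = f"<{doc_uri}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{context} a nif:Context , nif:RFC5147String ;",
        f"    nif:isString {literal(text)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    for begin, end, entity_uri in mentions:
        lines += [
            "",
            f"<{doc_uri}#char={begin},{end}> a nif:Phrase , nif:RFC5147String ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:anchorOf {literal(text[begin:end])} ;",
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{entity_uri}> .",
        ]
    return "\n".join(lines)


sentence = "Beograd je glavni grad Srbije."
print(nif_turtle("http://example.org/doc1", sentence,
                 [(0, 7, "http://www.wikidata.org/entity/Q3711")]))
```

Offsets are counted in characters of the Python string; a pipeline that counts bytes or tokens would need to convert them before building the URIs.
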
Indexing of textual databases based on lexical resources: A case study for Serbian

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2015)

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...

... system of Serbian e-dictionaries covers both general lexica and proper names and all inflected forms are generated from 135,000 simple forms and 13,000 MWU lemmas. Approximately 28.5% of these lemmas represent proper names: personal, geopolitical, organizational, etc. Another lexical resource that is ...
... relies on comprehensive lexical resources for Serbian described in the previous subsection. For recognition of some types of named entities, e.g. personal names and locations, e-dictionaries and information within them is crucial; for others, like temporal expressions, local grammars in the form of FSTs ...
... average, 4 NEs of all types were recognized per document, with as many as 47 NEs for one of them. For indexing we used only three top level types: personal names, locations and organizations and their distribution is presented in Table 1. Table 1. Distribution of three top-level NEs: persons, locations ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
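
Combining bag-of-words terms and recognized named entities in one document index can be sketched with an inverted index. TF-IDF weighting is a swapped-in, standard choice here, since the excerpts only say that determining term weights is complex. The `LOC:` prefix that keeps entity terms apart from ordinary words, and the tiny document collection, are also assumptions:

```python
import math
from collections import Counter


def build_index(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Inverted index term -> {doc_id: tf-idf weight} over words and prefixed NE terms."""
    document_frequency = Counter()
    for terms in docs.values():
        document_frequency.update(set(terms))
    n_docs = len(docs)
    index: dict[str, dict[str, float]] = {}
    for doc_id, terms in docs.items():
        for term, count in Counter(terms).items():
            weight = count * math.log(n_docs / document_frequency[term])
            index.setdefault(term, {})[doc_id] = weight
    return index


def search(index: dict[str, dict[str, float]], query: list[str]) -> list[str]:
    """Rank documents by the summed weights of matching query terms."""
    scores = Counter()
    for term in query:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return [doc_id for doc_id, score in scores.most_common() if score > 0]


docs = {
    "d1": ["geologija", "ležište", "LOC:Bor", "LOC:Bor"],
    "d2": ["geologija", "bušotina", "LOC:Majdanpek"],
}
index = build_index(docs)
print(search(index, ["LOC:Bor"]))    # ['d1']
print(search(index, ["geologija"]))  # []
```

A term that occurs in every document, like `geologija` in this example, gets weight zero and cannot rank documents on its own, which is one reason entity terms such as place names are useful index entries in a domain-specific collection.
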
Rule-based Automatic Multi-word Term Extraction and Lemmatization

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac (2016)

In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...

term extraction, terminology, multi-word units, lemmatization, finite-state transducers

... because linguistic resources and tools they used were underdeveloped. In (Małyszko et al., 2015) authors lemmatize multiword entity names (organization names and similar named entities found in a corpus of legislative acts) by using rules generated on the basis of corpora analysis. For tackling ...
... lemmatized MWT, that is, a MWT in the form of a dictionary head-word. The problem of lemmatization of a special kind of MWU, person names, was tackled for Polish (Piskorski et al., 2007). The authors used several statistical approaches that outperformed the approach relying on heuristics ...
... structure and inflectional and other properties (omission of a constituent, reverse order, exchangeability of constituent separators, etc.). Class names correspond to FSTs used for inflection of MWUs belonging to that class. For example, MWUs composed of an adjective (A) followed by a noun (N) ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
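
Extraction and lemmatization of a two-word term of the adjective + noun type can be sketched with dictionary lookups and an agreement check. This is a stand-in for the finite-state transducers used in the paper; the DELAF-like readings, the adjective form table, and both helpers below are hypothetical and tiny:

```python
from typing import Optional

# Hypothetical DELAF-like readings: form -> (lemma, POS, gender, number, case).
LEXICON = {
    "izrada": [("izrada", "N", "f", "s", "nom")],
    "geološke": [("geološki", "A", "f", "s", "gen"), ("geološki", "A", "f", "p", "nom"),
                 ("geološki", "A", "f", "p", "acc")],
    "karte": [("karta", "N", "f", "s", "gen"), ("karta", "N", "f", "p", "nom"),
              ("karta", "N", "f", "p", "acc")],
}
# Nominative singular of adjective lemmas by gender, used to build the MWT lemma.
ADJ_NOM_SG = {("geološki", "m"): "geološki", ("geološki", "f"): "geološka",
              ("geološki", "n"): "geološko"}


def agreeing_reading(first: str, second: str) -> Optional[tuple[str, str, str]]:
    """(adjective lemma, noun lemma, gender) if some readings form an agreeing A + N pair."""
    for a_lemma, a_pos, *a_feats in LEXICON.get(first, []):
        for n_lemma, n_pos, *n_feats in LEXICON.get(second, []):
            if a_pos == "A" and n_pos == "N" and a_feats == n_feats:
                return a_lemma, n_lemma, n_feats[0]
    return None


def extract_an_terms(tokens: list[str]) -> list[tuple[str, str]]:
    """Return (surface form, lemma) for each agreeing adjective + noun bigram."""
    terms = []
    for first, second in zip(tokens, tokens[1:]):
        reading = agreeing_reading(first, second)
        if reading is not None:
            a_lemma, n_lemma, gender = reading
            adjective = ADJ_NOM_SG.get((a_lemma, gender), a_lemma)
            terms.append((f"{first} {second}", f"{adjective} {n_lemma}"))
    return terms


print(extract_an_terms("izrada geološke karte".split()))
# [('geološke karte', 'geološka karta')]
```

The lemma is always built in the nominative singular here; terms that are normally used in the plural would need a separate rule.
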
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov (2024)

This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a repository of word senses. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating European languages with less developed and moderately developed resources. The focus of this paper is on multi-word expressions and named entities, their recognition in the ELEXIS-sr sentence set, and a comparison with the annotations in other languages. We discuss the first steps ...

multi-word units, named entity, word sense ambiguity, sense inventory, LLOD

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
Production of morphological dictionaries of multi-word units using a multipurpose tool

Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas (2011)

The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...

electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion

... easily described using Multiflex graphs [8]. The Multiflex system is incorporated into Unitex, but it was also successfully used for Polish proper names in another environment [9]. By analogy with entries in a dictionary of simple word lemmas, an entry in a DELAC dictionary consists of a MWU lemma ...
... (including a separator), with some additional digits and letters added to differentiate transducers. This is illustrated in Table I by four classes (names of inflectional transducers) all belonging to the same AXN super-class and used for the inflection of MWUs consisting of an adjective followed by ...
... First we removed all MWUs that already existed in DELAC which resulted in a list of approximately 1000 MWUs. We separated the list into proper names or toponyms (about 20%) and common nouns (about 80%). The rationale for such an approach was the fact, indicated by the analysis of the first set ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, PTI - Polish Information Processing Society (2011)
Regional Slope Stability Analysis in Landslide Hazard Assessment Context, North Macedonia Example

Miloš Marjanović, Biljana Abolmasov, Igor Peshevski, James Reeves, Irena Georgievska (2020)

Miloš Marjanović, Biljana Abolmasov, Igor Peshevski, James Reeves, Irena Georgievska. "Regional Slope Stability Analysis in Landslide Hazard Assessment Context, North Macedonia Example" in Understanding and Reducing Landslide Disaster Risk, Springer International Publishing (2020). https://doi.org/10.1007/978-3-030-60227-7_29


Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Serbian NER&Beyond: The Archaic and the Modern Intertwinned

A Lexical Approach to Acronyms and their Definitions

Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology

E-Connecting Balkan Languages

An Approach to Efficient Processing of Multi-Word Units

SrpELTeC: A Serbian Literary Corpus for Distant Reading

On the compatibility of lexical resources for NooJ

The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines

Automatic construction of a morphological dictionary of multi-word units

Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map

Improvement of geodatabase queries within GeolISS

Towards Automatic Definition Extraction for Serbian

Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking

Indexing of textual databases based on lexical resources: A case study for Serbian

Rule-based Automatic Multi-word Term Extraction and Lemmatization

Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities

Production of morphological dictionaries of multi-word units using a multipurpose tool

Regional Slope Stability Analysis in Landslide Hazard Assessment Context, North Macedonia Example