Search
219 items
-
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019) This paper is the result of a task presented to attendees of the Keyword Search in Big Linked Data summer school, organized by the Vienna University of Technology under the Keystone COST action in the summer of 2017. It presents a specific approach to classification via the creation of minimal document surrogates based on the US National Library of Medicine’s MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...
... Training set to build explicit models for each MeSH concept (‘Concept-oriented’ classifiers); – Manually created document annotations, like ordinary text classifiers, to determine the appropriate concept (‘K-Nearest Neighbor’ classifier); – Hybrid and hand-refined systems that combine multiple approaches ...
... stored in the MeSH ontology. The goal was to create a classifier that would be quick and simple, in order to solve the problem of the large amount of text that needed to be classified. A drastic summarization of documents and the classes themselves was applied. Classes (concepts of the second level of ...
... occurs most often in it, thus avoiding a large amount of computation and reducing the task to finding the most frequent term in the surrogate of the text. 2 Experiment setting The aim of the experiment was to test the possibility and success of classification of medical documents based on taxonomy ...
Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
-
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present a Serbian literary corpus that is being developed under the umbrella of the COST Action “Distant Reading for European Literary History” (CA16204). Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities, with a convolutional neural network (CNN), which has an F1 score of ≈91% on the test dataset. This model was further assessed on a separate evaluation dataset. We conclude with a comparison ...
... evaluation. Web users can navigate to http://ner.jerteh.rs/ in order to apply the SrpCNNER model directly on input text. The model can also be applied to a custom-size collection of text files using the previously mentioned NER&Beyond web platform. story), https://zenodo.org/communities/eltec 7 SrpELTeC ...
... entity, so the evaluators were asked to identify and annotate them when they occur in text. SrpNER does not recognize WORK entity either, but these annotations were in many cases added by volunteer readers during text correction. Afterwards, students were given different novel chapters along with the ...
... distribution of different entity types over SrpELTeC-gold novels. The first four digits of text identifiers represent the year of the first publication of a novel. For some novels, NER was not performed on the whole text, but rather on randomly selected chapters. These annotated samples were also included ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry and many research fields, terminology develops rapidly. Most often, the language that serves as the “lingua franca” for most of these fields is English. As a consequence, in many fields the domain terms are conceived in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a tool for chunk alignment. We examine the performance of the method on the domain ...
... source language part of the aligned input corpus; 3. The extraction of the set of MWTs in the target language by Serb-TE (Input iii) was done: (a) on the target language part of the aligned chunks (chunk); (b) on the target language part of the aligned input sentences (text). Infotheca Vol. 19, No. 2 ...
... steps; C Number of distinct, lemmatised Serbian MWTs extracted from the target language part of the aligned chunks (for chunk) or from the target language part of the aligned input corpus (for text). Table 1. Numerical data that describes the results of the term extraction system Experiment A B C ...
... need to examine several settings of the experiment, which are conducted and discussed in the later text. The proposed approach is based on the following hypothesis: On the basis of bilingual, aligned, domain-specific textual resources, a terminological list and/or a term extraction tool in a source ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... texts used in this research are shown in Table 2. The text 1984, Serbian translation of Orwell’s novel, was annotated according to the MULTEXT-East specification and included in MULTEXT-East resources (version 3) (Krstev et al., 2004). The text Verne, Serbian translation of the novel Around the ...
... on four different manually annotated sets of texts. The test set was compiled from 10% of each text used for training, and it can give a rough idea of how models perform when tagging similar, already familiar text. Verne, History and Novels represent texts previously unknown to the taggers and show their ...
... result when tagging unfamiliar text. Although TreeTagger TT19 seems to have better overall results, the performance of both taggers drops significantly when tagging unknown text. Figure 1: Part-of-Speech tagging accuracy per token on test sets, for each of trained models. Figure 2: nPoS-tagging accuracy ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for morphologically complex languages such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... resources for linguistic text processing; 2.5 Repeated linguistic preprocessing with expanded dictionaries for verification of recognition of new lemmas. 3. MWUs extraction 3.1. Application of syntactic graphs to extract MWUs with different syntactic structures from the same text (detailed description ...
... bager kašikar (case 6, NXN) is detected in the analyzed text in the genitive case bagera kašikara it may be erroneously interpreted as a MWU of a form NNg (case 3) in the genitive case. Consequently, all NNg constructions in an analyzed text that appear in the genitive case (which happens very ...
... domains in terminological dictionaries using lexical resources and local grammars in our approach are: 1. Linguistic preprocessing of the input plain text file from the chosen domain using Unitex. 2. Analysis of unrecognized words as the most probable source of terminology and expanding the dictionary ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)
... in a given text, with the possibility of adding hypernym literals. D. Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text alignment tool XAlign. The module enables the transformation of texts aligned by XAlign into ...
... glasshouse” from the corresponding synsets in English wordnet were included in query. B. Aligned text search When a bilingual query is applied to an aligned text, WS4QE generates a filtered aligned document in TMX format. Namely, based on the expansion of the query, which can be mo ...
... are extracted from aligned text and inserted in the filtered document. As we have already mentioned, documents in different formats, such as XML, TXT and HTML, can subsequently be generated from the TMX document filtered in this way. Fig. 7 Aligned segments with highlighted ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)
... linguists. This is a very long and therefore costly process. [Figure 10: Machine translation (left: statistical; right: rule-based)] In the late 1980s ...
... behind the scenes of larger software systems. Text summarisation and text generation are two borderline areas that can act either as standalone applications or play a supporting role. Summarisation attempts to give the essentials of a long text in a short form, and is one of the features available ...
... of sentence extraction, and the text is reduced to a subset of its sentences. An alternative approach, for which some research has been carried out, is to generate brand new sentences that do not exist in the source text. This requires a deeper understanding of the text, which means that so far this ap- ...
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020) While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part of speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on a tagged ...
... of tagger models packaged in NLTK that can be trained. Every tagger has an evaluation procedure that strips down the tags from the given text, tags the text with the newly created tagger and reports the accuracy on all tokens. This measure will be used for comparing different taggers. The simplest ...
... 83 90.51 86.95 Training Time 1143s 1343s 3074s A useful tagger model is one that generalizes well to text from other domains. That’s why we tested our best taggers on text that was left out of the training and validation phases. Results can be seen in Figure 3. Fig. 3. Accuracy ...
... performed later in the pipeline. One basic task is PoS (Part of Speech) tagging, a process of assigning a part of speech category to each token in the text. The program that performs tagging is called a tagger. Taggers can be created in multiple ways. In this paper, we will create a tagger for Serbian ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
-
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SASA) were used to model different types of definitions (descriptive, grammatical, referential and synonym-based) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in a corpus processing package with open ...
... A finite state transducer “passes” through the text it analyses to compare a text chunk with the model it represents. In the case of successful recognition, a finite state transducer produces some result, which can be a modification of the source text by adding tags for types of recognized 1 Un ...
... result of OCR errors that remained in the text, but we are working on correcting them.3 4.2 Recognition of Candidates in the Textbook Corpus To determine whether it is possible to recognize definitions of domain-specific terms in the domain corpus text, a subset of local grammars presented in Section ...
... definition extraction in free- and semi-structured text. In Proceedings of the 13th Linguistic Annotation Workshop, 2019, pp. 124–131. Stanković, R., Stijović, R., Vitas, D., Krstev, C. & Sabo O. (2018). The Dictionary of the Serbian Academy: from the Text to the Lexical Database. In: Proceedings of the ...
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007) In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... Management of aligned parallel texts Parallel texts, which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph, sentence, etc) by matching the corresponding segments of the original and its translation. Aligned parallel texts ...
... candidate words for a synset by searching aligned texts with words from the original PWN synset and words he/she has already selected for the target synset. Then, if a highlighted word found in the text in English does not have a highlighted match in the text in the target language, the lexicographer ...
... WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool [3]. The module converts these texts to the Translation Memory eXchange (TMX) format, which is becoming the standard format for aligned texts. Figure 4 depicts the ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
GIS Application Improvement with Multilingual Lexical and Terminological Resources
... placement of descriptive text, or label, onto or next to features on a map is known as labelling. In ArcGIS, it refers specifically to the process of automatically generating and placing descriptive text for map features. A label in ArcGIS is dynamically placed and its text string is derived from ...
... al., 2008). Concept represents the core of GeolISS, and is implemented as an aggregation of geological vocabularies, collections of terms and text definitions of domain objects or collections of possible values for properties. Terms in the vocabularies are used to classify observations/i ...
... GeolISS. GeolISSTerm represents the core of GeolISS, and it is implemented as an aggregation of geological vocabularies, collections of terms and text definitions of things thought to exist in a domain or collections of possible values for properties. The terms in the vocabularies are used to ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta, May 2010, Valetta, Malta : European Language Resources Association (2010)
-
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019) In this paper we present a model for selecting good examples for a dictionary of Serbian and the development of the model's initial components. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus compiled from examples from five digitized volumes of the SASA dictionary. The initial set of features was inspired by a similar approach used for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora containing various texts. The analysis showed that ...
Keywords: Serbian, good dictionary examples, automation of dictionary compilation, feature extraction, machine learning
... corpus of contemporary Serbian (Vitas & Krstev, 2012; Utvić, 2014) and Serbian ELTeC Collection. It consists of several text collections of different types, which reflect text variability. For the first collection with contemporary novels (labelled CN), the sentences were extracted from seven novels ...
... digitized volumes was reported in Stijović and Stanković (2017). Dictionary entries from five volumes were automatically parsed and stored as a structured text in a lexical database, which offers the opportunity to use this data for extraction of different kinds of knowledge, as well as knowledge about examples ...
... entry was produced, and a lexical database model was developed (Stanković et al., 2018). The conversion of the SASA dictionary from unstructured text into a lexical database consisted of a thorough analysis of formatting conventions that were used for typesetting dictionary entries, as well as ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... place with very little human intervention, starting from the tokenization and lexical analysis of a raw text up to production of dictionary entries. The system relies on Unitex routines for text analysis and FST application, while one of the many functionalities of LeXimir is used to produce dictionary ...
... English and Chinese corpora is described in (Pantel&Lin, 2001), while Chen and his associates present a MWT extraction system based on co-related text-segments within a set of documents (Chen et al., 2006). Statistical measures of co-occurrence (MI3 – mutual information) were used for finding ...
... for the evaluation, without deleting any candidate lemmas from it. In general, the longest match for the MWU is looked for. For example, if a text sequence matches the AXAXN pattern (a noun preceded by two adjectives that agree with it in gender, number, case and animateness), then a lower ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020) In this paper, we will present an approach to transforming Serbian morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ...
... dictionary to . . . ”, pp. 81–98 of terms, the extraction of time expressions and advanced search of text repositories and libraries. The morphological dictionaries were developed in the DELA text format (fr. Dictionnaires électroniques du LADL), which will be discussed in Section 2.1. As the ...
... and to make them interoperable and reusable. Three standards for lexical information have been considered: Guidelines for Electronic Text Encoding and Interchange, Text Encoding Initiative (TEI), Lexical Markup Framework (LMF) and the Lemon model. Although Chapter 9 of the TEI Guidelines addresses ...
... DOI 10.18485/infotheca.2019.19.2.4 ABSTRACT: In this paper, we will present an approach for transforming morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020) This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.
... subset SrpLemKor2; – SrpEngKor3, aligned English-Serbian corpus including subcorpus SELFEH (Serbian-English Law Finance Education and Health) with documents on finance, health, law and education; – SrpFranKor4, aligned French-Serbian corpus; – SrpNemKor5, aligned German-Serbian corpus; – RudKor6, a ...
... concerning the functional style to which the text belongs, as well as an indicator whether corpus text is written in Serbian or represents a translation from another language. The SrpKor2013 is not structurally annotated, although some or all levels of the text structure (section, title, paragraph, sentence) ...
... sentence) are annotated in some particular corpus texts, especially those which are part of aligned corpora. The SrpKor2013 corpus is used by more than 700 users, mostly Slavists. 2.2 RudKor Systematic collection and preparation of texts from the mining domain started with English-Serbian alignment ...
Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
-
Improvement of geodatabase queries within GeolISS
Ranka Stanković (2008)
... handles aligned texts. A pair of semantically equivalent texts in different languages, such as an original text and its translation, that are aligned on a structural level (paragraph, sentence, phrase, etc.) is known as an aligned text or bitext. The standard format for representing aligned texts ...
... is the Translation Memory eXchange format (TMX) that is XML-compliant [13]. An expanded query can be applied to TMX documents in order to retrieve aligned segments that correspond to search criteria in the source and target language. A filtered TMX document is transformed into XML, TXT and HTML output ...
... metadata [6]. Concept represents the core of GeolISS, and it is implemented as an aggregation of geological vocabularies, collections of terms and text definitions of things thought to exist in a domain or collections of possible values for properties. The terms in the vocabularies are used to classify ...
Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
-
Measuring semantic relevance of words in synsets
Obradović Ivan, Krstev Cvetana, Vitas Duško. "Measuring semantic relevance of words in synsets" in Text and Language, Structures · Functions · Interrelations. Quantitative Perspectives, P. Grzybek, E. Kelih, J. Mačutek (eds.), Wien:Praesens Verlag (2010): 133-144
-
Distribution of canonical syllable types in Serbian
Obradović Ivan, Obuljen Aljoša, Vitas Duško, Krstev Cvetana, Radulović Vanja. "Distribution of canonical syllable types in Serbian" in Text and Language, Structures · Functions · Interrelations. Quantitative Perspectives, P. Grzybek, E. Kelih, J. Mačutek (eds.), Wien:Praesens Verlag (2010): 145-157
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...
... , "gender": "", "meta_ID": "", "resource_1_senses": [ { "#text": "of or relating to the spleen", "external_ID": "splenic.a.01"}, { "#text": "very irritable", "external_ID": "bristly.s.01"} ], "resource_2_senses": [ { "#text": "affected with spleen; malicious; spiteful; peevish; fretful ...
... represents the results of our evaluations on the aligned senses. The degree indicates the distribution of the alignments with respect to the senses. For instance, a degree of 1.182 (k1) in the case of Russian shows that every sense is at least aligned with another one. On the other hand, a low degree ...
... speech (given in parentheses after the headword); • The sense text (definition) in the first resource; • An interactive drop-down to specify one of the 5 semantic relations (see below) from the sense in the first resource; • The sense text (abbreviated) in a drop-down list from the second resource ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)