Search
49 items
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... units, and among them 97% were associated with correct lemmas. Keywords: term extraction, terminology, multi-word units, lemmatization, finite-state transducers 1. Motivation Various approaches have been proposed for multi-word term (MWT) extraction as this problem has been gaining in importance ...
... MWUs and four contain 4-component MWUs. As the thirteen classes cover the large majority of MWUs, lexical rules and the corresponding finite state transducers (FSTs) have been developed for the extraction of MWTs belonging to these classes, with the assumption that structures used most frequently ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020) In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correction of texts after OCR, restoration of diacritics, and switching between different language variants. Keywords: text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers ... lists of candidates for replacement is done by finite-state transducers implemented in Unitex software (Paumier et al., 2016). – All presented systems consist of two independent parts, both of which are implemented as cascades of finite-state transducers – in these cascades each FST works on a texts ...
... (texts) UDC 811.163.41’322.2: 004.9 DOI 10.18485/infotheca.2019.19.2.3 ABSTRACT: In this paper we present how e-dictionaries and cascades of finite-state transducers, as implemented in Unitex, can be used to solve three text transformation problems: correction of texts after OCR, restoration of diacritics ...
... between different language variants. KEYWORDS: text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers. PAPER SUBMITTED: 13 October 2019 PAPER ACCEPTED: 08 December 2019 Cvetana Krstev University of Belgrade, Faculty of Mathematics cvetana@matf ...Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
-
GIS Application Improvement with Multilingual Lexical and Terminological Resources
... 000 different forms) and it is being constantly upgraded. Inflectional finite state transducers (FST) for the inflection of both simple and compound words have been developed within the Unitex system 2 . These transducers play an important role in the query expansion application WS4QE, by enabling ...
... long period, and they have reached a considerable volume to date (Vitas et al., 2003). They include morphological e-dictionaries and finite state transducers, which offer the possibilities for solving the problem of flections in queries, and electronic thesauri, ontologies and wordnets which ...
... ly improve retrieval performances. The use of transducers is especially important in the case of compounds. For instance, if a query is performed with the compound keyword kvarcna stena ‘quartzose, quartz rock’, three inflectional transducers are used: one for the inflection of the adjective ...Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, May 2010, Valletta, Malta : European Language Resources Association (2010)
-
WS4LR - a Workstation for Lexical Resources
... input and output to XAlign, as well as the corresponding TMX format are given. 2.4 Finite Transducers As described in section 2.1, inflectional paradigms are represented by appropriate finite state transducers usually produced by the graph management tool in Intex/Unitex environment. The produced ...
... code uniquely determine the finite transducer that generates all the forms in a lemma paradigm. A finite transducer, being capable of producing the output, adds to all these forms their possible grammatical categories. The DELAS dictionary and the set of transducers describing inflectional properties ...
... has developed in the course of many years and within different projects. The tool handles morphological dictionaries, wordnets, aligned texts and transducers equally and has already proved very useful for various tasks. Although it has so far been used mainly for Serbian, WS4LR is not language dependent ...Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... are by no means trivial. After considering several options, we concluded that Multiflex, a finite-state tool for MWUs, developed by A. Savary [5], suits our needs best. This tool supports finite-state transducers that can model a number of combining conditions. As for the inflection of constituent simple ...
... (2008) 4. Krstev, C., Vitas, D.: Finite State Transducers for Recognition and Generation of Compound Words. In Erjavec, T., Žganec Gros, J., eds.: IS-LTC 2006, Ljubljana, Slovenia, Institut “Jožef Stefan”(2006) 192–197 5. Savary, A.: Multiflex: A Multilingual Finite-State Tool for Multi-Word Units. In: ...
... productive classes of MWUs, like different types of numerals and named entities (time and duration, measures and currencies), we have developed finite-state transducers (FSTs) that rely on morphological e-dictionaries of simple words to model these MWUs correctly [4]. When applied to a text in automatic text ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... Krstev and D. Vitas, “Finite State Transducers for Recognition and Generation of Compound Words,” in IS-LTC 2006, T. Erjavec and J. Žganec Gros, Eds. Ljubljana, Slovenia: Institut “Jožef Stefan”, October 2006, pp. 192–197. [7] A. Savary, “Multiflex: A Multilingual Finite-state Tool for Multi-Word Units ...
... like numerals and various named entities that rely on them (e.g. measurement phrases) can best be described by dictionaries in the form of finite-state transducers (FST), and a number of them were produced for Serbian as well [6]. Other contiguous MWUs that are idiosyncratic in nature, namely nouns ...
... processing systems that support work with this dictionary format were developed, Unitex [2] and Nooj [3], both of which are based on the use of finite-state technology. Serbian e-dictionaries of simple forms have reached a considerable size: they have a total of more than 126,000 lemmas [4] generating ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval... various language resources we have developed for Serbian (Krstev et al., 2008). These resources include morphological e-dictionaries and finite state transducers, which offer the possibilities for solving the problem of flections in queries, and electronic thesauri, ontologies and wordnets which ...
... contains approximately 2,700 lemmas (yielding more than 60,000 different forms) and it is being constantly upgraded. 2. Inflectional finite state transducers (FST) for the inflection of both simple and compound words have been developed for the Unitex system (http://www-igm.univ-mlv.fr/~unitex/) ...
... compounds words, but on the inflectional transducers as well. This enables a more elaborate query expansion that can significantly improve retrieval performances. For instance, if a query is performed with the keyword beli luk, three inflectional transducers are used: one for inflection of the adjective ...Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
-
Development of Open Educational Resources (OER) for Natural Language Processing
In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...... developed. This course covers a broad range of topics such as pattern recognition using regular expressions, electronic dictionaries, Finite-state automata and transducers, etc. Within the course different didactic forms were used including text, video tutorials and some useful practical exercises that ...
... sophisticated approaches. Besides simple regular expressions, graphs that represent the visualization of finite state automata (FSA) can be used for complex queries. Moreover, finite state transducer FSTs (FSAs with output) can be used for text transformation. The concept of e-dictionaries ...
... system consisting of a collection of programs developed for text analysis by using linguistic resources.[11] Unitex is based on finite-state technology. It enables application of morphological electronic dictionaries and grammars to texts for a number of different languages: 10 ...Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... numerals and various named entities that rely on them (e.g. measurement phrases) can best be described by dictionaries in the form of finite-state transducers (FST), and a number of them were produced for Serbian as well [10]. Other contiguous MWUs that are idiosyncratic in nature, namely nouns ...
... Marocco (2008) 10. Krstev, C., Vitas, D., Obradović, I., Utvić, M.: E-dictionaries and finite-state automata for the recognition of named entities. In: Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pp. 48–56. Association for Computational Linguistics ...
... tasks. Two corpus processing systems that support work with this dictionary format were developed, Unitex [13] and Nooj [20], both of which use finite-state technology as initially introduced in [5]. Serbian e-dictionaries of simple forms have reached a considerable size: they have a total of more ...Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...... morphological electronic dictionaries and finite state transducers for Serbian [6]. 4.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [1], ...
... [2]. The role of electronic dictionaries, covering both simple words and multi-word units, and dictionary finite-state transducers (FSTs) is text tagging. Each e-dictionary of forms consists of a list of entries supplied with their lemmas, morphosyntactic, semantic and other information. The forms ...
... keywords, abstract, geographical location and the like. A bag of words was produced from these metadata using morphological dictionaries and transducers, and named entities within the metadata were recognized using a rule-based system. Both were then used for indexing documents and ranking was based ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential and synonymous) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 definitions of nouns, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in a corpus processing suite with open ...... codes of its possible morphosyntactic realization. Finite state transducers (FSTs) are abstract mathematical constructions that allow modelling of local grammars to describe some linguistic constructions, for example, noun phrases. A finite state transducer “passes” through the text it analyses to ...
... syntactic structure (Vitas & Krstev 2012). Finite state transducers are visualized by graphs for easier development and use. A local grammar and its corresponding graph that models (and recognizes) definitions of nouns that represent attributes and/or state is given in Figure 1. Below the graph, six ...
... consists of 61,213 definitions of nouns, which were analysed using Serbian morphological e-dictionaries and local grammars implemented as finite state transducers in an open-source corpus processing suite Unitex. The 21 models developed up to the present moment cover 57% of dictionary definitions, 83% ...Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...... nouns grammatical gender and number, case, and animateness are given. Grammatical inflectional rules are encoded by 854 inflectional Finite-State Transducers (FST). Inflectional FSTs are a special kind of FSTs used for modeling inflectional paradigms, that is, inflectional classes. Each FST of ...
... the neck for NLP Computational Linguistics and Intelligent Text Processing (1-15): Springer. Savary, A. (2009). Multiflex: A Multilingual Finite-State Tool for Multi-Word Units. In S. Maneth (Ed.), Implementation and Application of Automata (Vol. 5642, pp. 237-240): Springer Berlin Heidelberg ...
... 2012; Zhang et al., 2006) and the other is based on linguistic rules. A rule-based approach for the extraction of terms based on a cascade of transducers using CasSys tool incorporated in Unitex1 corpus processing platform, as well as the use of TMF standard for the representation of terms is proposed ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...... two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite state transducers, can also be used to aid the user in developing and refining the wordnet. Keywords Wordnet development, language resource integration ...
... format was a system called Intex [12]. Intex uses dictionaries in combination with regular expressions and inflectional and morphological finite state transducers (FSTs) to locate morphological, lexical and syntactic patterns, remove ambiguities, and tag simple and compound words in texts. The ...
... resources, of which wordnets are just one type. The tool enables integrated handling of electronic dictionaries, wordnets, aligned texts and transducers equally, and has already proved very useful for various tasks. Although the tool has a module especially developed for manipulating wordnets ...Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
Constitutive Model for Analysis of Long-Term Municipal Solid Waste Landfill Settlement
Irena Basarić Ikodinović, Dragoslav Rakić, Mirjana Vukićević, Sanja Jocković, Jovana Janković Pantić (2022)Large long-term settlement occurs at the municipal solid waste landfills over an extended period of time which may lead to breakage of the geomembrane, damage of the cover systems, other protective systems or facilities constructed on top of a landfill. Also, municipal solid waste is an extremely heterogeneous material and its properties vary over location and time within a landfill. These material characteristics require the formulation of a new constitutive model to predict the long-term settlement of municipal solid ...Irena Basarić Ikodinović, Dragoslav Rakić, Mirjana Vukićević, Sanja Jocković, Jovana Janković Pantić. "Constitutive Model for Analysis of Long-Term Municipal Solid Waste Landfill Settlement" in ICEG 2022: International Conference on Environmental Geotechnics, World Academy of Science, Engineering and Technology (2022)
-
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions.... (lower part of the same graph): (8) mirovne,mirovan.A:aefs2g mirovne,mirovan.A:aefp1g 4In Unitex complex grammars can be modelled by using finite-state transducers and e-dictionaries (http://www-igm.univ-mlv.fr/~unitex/) Figure 2: Two paths from a graph that filters AANprepNp constructions and performs ...
... and A. Barreiro, 2004. Portuguese Large-scale Language Resources for NLP Applications. In 4st LREC. Savary, A., 2009. Multiflex: a multilingual finite-state tool for multi-word units. In Implementation and Application of Automata. Springer, pages 237–240. Schwartz, A. S. and M. A. Hearst, 2003. ...
... look in dictionaries to confirm, for instance, the occurrence of prepositions and/or conjunctions. These patterns are implemented as Unitex4 transducers, which produce input for the next step by recognizing modelled patterns. Graphs for filtering and simple word lemmatization: They are used in ...Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
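The two dictionary lines quoted in example (8) of the snippet above have the shape `form,lemma.POS:codes`. As a rough illustration only, the sketch below splits such lines into their parts. The function name `parse_inflected_entry` is hypothetical, and the sketch assumes nothing about what the individual letters in a code such as `aefs2g` mean; it is not the authors' implementation.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, List[str]]]

def parse_inflected_entry(line: str) -> Entry:
    """Split one 'form,lemma.POS:codes' line into its parts.

    Assumption: the form contains no comma and the POS contains no dot,
    which holds for the two lines quoted in the snippet.
    """
    form, rest = line.strip().split(",", 1)
    # Everything after the first ':' is one or more grammatical code strings.
    lemma_pos, *codes = rest.split(":")
    lemma, pos = lemma_pos.rsplit(".", 1)
    return {"form": form, "lemma": lemma, "pos": pos, "codes": codes}

# The two lines from example (8), separated by whitespace in the snippet.
sample = "mirovne,mirovan.A:aefs2g mirovne,mirovan.A:aefp1g"
entries = [parse_inflected_entry(item) for item in sample.split()]

for entry in entries:
    print(entry)
```

Both lines share the same form and lemma and differ only in the code string, which is the kind of ambiguity the filtering graphs mentioned in the snippet would have to resolve.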
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...... morphological electronic dictionaries and finite-state transducers for Serbian [12]. 3.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [3,7] ...
... 7]. The role of electronic dictionaries, covering both simple words and multi-word units, and dictionary finite-state transducers (FSTs) is text tagging. Each e-dictionary of forms consists of a list of entries supplied with their lemmas, morphosyntactic, semantic and other information. The forms are ...
... and geographical location. These metadata were used for generating a bag of words for each document with the aid of morphological dictionaries and transducers. Named entities within metadata were also recognized with the help of a rule-based system. Both the bag of words and the metadata were then used ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
The Nooj System as Module within an Integrated Language Processing Environment
... developed in the course of many years and within different projects. The tool handles morphological dictionaries, wordnets, aligned texts and transducers and has already proved very useful for various tasks. Although it has so far been used mainly for Serbian, WS4LR is not language dependent and ...
... which can be in different formats. Figure 8 presents a form with query expansion for lemma: dokument, with morphological dictionaries and transducers as the morphological resource and Serbian wordnet as the resource for semantic expansion. The user can also use the translation equivalence option ...Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...... patterns that represent structure of MWU terms. They are represented in form of transducers applied on domain corpus to extract terminology. Examples of patterns are presented in [15]. After applying these transducers on domain text extracted potential terms were evaluated. Results presented in previous ...
... in all languages involved, which would lead to equivalent opportunities and up to date education materials. Firstly, a brief history and current state of the art of terminological resources are presented, followed by an overview of BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology ...
... Electronic Dictionaries with Terms from the Culinary Domain,” in Proc. 7th Global WordNet Conference, 2014, pp. 127–132. [10] M. Hepp, “Ontologies: State of the art, business potential, and grand challenges,” in Ontology Management, 1 st ed., M. Hepp, P. DeLeenheer, A. De Moor and Y.Sure, Ed. Springer ...Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
-
An Approach to Development of Bilingual Lexical Resources
... in ISBN 978-86-7031-200-5, Faculty of Sciences, University of Novi Sad. 102 language resources such as grammars in the form of finite automata and transducers, as well as various lexical resources. Bibliša is able to expand search queries both morphologically and semantically, as well as ...
... language. One type of lexical resources, morphological e-dictionaries, together with the system of rules for compound inflection, finite automata and transducers, represent the basis for morphological expansion of queries. As for semantic and bilingual expansion, the system relies on Serbian ...Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...... in e-dictionaries (850 possessive adjectives and 25 feminine surnames). To systematically produce these derived lemmas we developed finite-state transducers (16 different FSTs), similar to those used for inflection, to derive possessive adjectives and feminine counterparts from all surnames ...
... structure of a simple word lemma is: lemma,POS#fst[+Marker]* Mandatory parts of this structure are a lemma, its POS, and identification of a finite-state transducer that will produce all lemma’s inflected forms with associated grammatical information (e.g. case, number, gender, etc.). Markers ...
... (e.g., to denote gender, number, part of speech, etc.). 3Unitex is a lexically-based corpus processing suite that offers strong support for finite-state processing using morphological dictionaries –http://unitexgramlab.org/ Figure 1: Data categories (markers) dictionary. The main class of ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
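The lemma structure quoted in the snippet above, `lemma,POS#fst[+Marker]*`, can be read as a small grammar: a mandatory lemma, POS and transducer identifier, followed by zero or more `+`-prefixed markers. The following is a minimal sketch of a parser for that pattern, taken literally from the snippet. The names `SimpleLemma` and `parse_simple_lemma` and the sample entries are hypothetical, chosen for illustration; this is not code from the paper or from Unitex.

```python
import re
from typing import List, NamedTuple

class SimpleLemma(NamedTuple):
    lemma: str          # dictionary headword
    pos: str            # part of speech
    fst: str            # identifier of the inflectional transducer
    markers: List[str]  # optional markers, possibly empty

# Literal reading of the quoted pattern: lemma,POS#fst[+Marker]*
ENTRY_RE = re.compile(
    r"^(?P<lemma>[^,]+),(?P<pos>[^#+]+)#(?P<fst>[^+]+)(?P<markers>(?:\+[^+]+)*)$"
)

def parse_simple_lemma(entry: str) -> SimpleLemma:
    """Parse one entry; raise ValueError if a mandatory part is missing."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"not a lemma,POS#fst[+Marker]* entry: {entry!r}")
    # The marker group looks like '+A+B'; drop the empty piece before the first '+'.
    markers = [m for m in match.group("markers").split("+") if m]
    return SimpleLemma(match.group("lemma"), match.group("pos"),
                       match.group("fst"), markers)

# Hypothetical entries, invented for illustration only.
with_markers = parse_simple_lemma("kuca,N#600+Conc+Bldg")
without_markers = parse_simple_lemma("dom,N#1")
print(with_markers)
print(without_markers)
```

Keeping the three mandatory parts in the regular expression means an entry without a transducer identifier is rejected rather than silently accepted, which mirrors the snippet's statement that those parts are mandatory.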