Search
2342 items
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...
... 313(1):93–104. Ralph Grishman and Beth Sundheim. 1996. Message Understanding Conference-6: A Brief History. In Proceedings of the 16th International Conference on Computational Linguistics (COLING 1996). volume 1. Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural Language Understanding with Bloom ...
... Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics, pages 141–150. Nathalie Friburger and Denis Maurel. 2004. Finite-state Transducer Cascades to Extract ...
... Task: Language-independent Named Entity Recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002). Satoshi Sekine, Kiyoshi Sudo, and Chikashi Nobata. 2002. Extended Named Entity Hierarchy. In Proceedings of the Third International Conference on Language Resources ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. Keyword queries initiated in Serbian or English can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...
Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and exploring verbal aspectual pairs in BCS (Bosnian, Croatian and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since there is no resource that helps learners of Bosnian, Croatian and Serbian as foreign languages recognize the aspect of a verb or its pairs, we created a new resource that gives users information about the aspect, as well as a link to the verb's aspectual pairs. This resource also contains external links to monolingual dictionaries, Wordnet and BabelNet. ...
Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
E-Connecting Balkan Languages
In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. This tool relies for its work on several sophisticated textual and lexical resources that were developed for most Balkan languages. These resources are based on several de facto standards in natural language processing. ...
... for Serbian, and in bilingual context, for Serbian and English. In this paper we will show that the tools WS4LR and WS4QE are truly independent both from Serbian, for which they were initially developed, and from English, which seems to be in the background of many natural language processing tools ...
... Information Science and Technology. Bucureşti: Publishing house of the Romanian academy, Vol. 7, No.1-2, 2004. [21] D. Tufiş, S. Koeva, T. Erjavec, M. Gavrilidou, and C. Krstev. Building Language Resources and Translation Models for Machine Translation focused on South Slavic and Balkan Languages ...
... for them. 2. Integrated Language Resources In order to prove the usability of WS4LR and WS4QE for languages other than Serbian and English we used various resources, both textual and lexical. In the following sections we will briefly present these resources, what methodological framework ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... Faculty of Mining and Geology, Đušina 7, 11000 Belgrade, Serbia ranka@rgf.bg.ac.yu ...
... developed on the basis of PWN and the top-ontology accepted in EuroWordNet, and aligned by using ILI. From a lexicographer’s point of view, the development of a wordnet, perceived as a specific form of dictionary and hierarchical thesaurus for a particular language, opens two critical issues ...
... abandoned. However, language specific concepts were also developed for each particular wordnet, as well as a set of concepts common to BalkaNet languages and unknown to PWN [10]. Once a concept has been accepted and placed within the conceptual framework of a particular language, the lexicographer ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
Towards Automatic Definition Extraction for Serbian
The paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential and synonymous) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing package of open ...
... the Association for Computational Linguistics, 4, 17-30. Jin, Y., Kan, M. Y., Ng, J. P., & He, X. (2013). Mining scientific terms and their definitions: A study of the ACL anthology. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 780-790. Tissier ...
... Natural Language Processing (EMNLP 2017), Sep 2017, Copenhague, Denmark. pp. 254-263. Navigli, R. & Velardi, P. (2010). Learning Word-Class Lattices for Definition and Hypernym Extraction. In Proceedings of the Forty-Eighth Annual Meeting of the Association for Computational Linguistics. Uppsala ...
... embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1522-1532. Barnbrook, G. (2002). Defining Language, A local grammar of definition sentences, Studies in Corpus Linguistics, (Vol. 11). John Benjamins Publishing. Gortan Premk, D. (1980). O gramatičkoj ...
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... Computational Linguistics - EACL. Rodríguez, F. M. B., Noya, E. D., Otero, P. G., Martínez, M. L., Mato, E. M. M., Rojo, G., Docío, S. S. (2007). A Corpus and Lexical Resources for Multi-word Terminology Extraction in the Field of Economy in a Minority Language. Proc. of 3rd Language & Technology ...
... being created and introduced in Serbian making important the automation of their retrieval and incorporation in Serbian terminological dictionaries. Due to specific features of Serbian grammar, especially its rich morphology, this is a complex task, and corresponding language resources in the ...
... & H. Li (Eds.), Natural Language Processing and Information Systems (Vol. 6177, pp. 248-255): Springer Berlin Heidelberg. Justeson, J. S., & Katz, S. M. (1995). Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1 (01): ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age, digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia, 18 of the 308 regularly published scientific journals are bilingual, with papers simultaneously being in English ...
... the keyword and optionally selects a text collection to search (the default is all collections). Besides the keyword itself, it is necessary to choose the keyword language, and then click on the “Preview and modify terms for query” link. The system uses web services to find synonyms and translations ...
... the language and the collection (it is possible to simultaneously search through all available text collections). The user enters the search criteria in the search field, then adds additional criteria by clicking the “+” sign. Boolean operators “OR” and “AND” build the search query. “AND” is used ...
... Assistant Professor of Mathematics and Informatics at Faculty of Mining and Geology at University of Belgrade. Her scientific field is Human Language Technologies (HLT). She is teaching several courses related to informatics (traditional, online and blended) and she is head of the Computing Centre ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics - IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
-
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020)
In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between language variants.
Keywords: text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers
... problem of OCR error detection and correction is still not considered solved (see, for example, (Kolak and Resnik, 2002)), especially for more “demanding” scripts and languages (Cyrillic, Arabic, etc.). Transformation from one language variant to another is usually not perceived as an error/correction ...
... true for problems of diacritic restoration, OCR errors correction and language variants transformation. In this paper we present an approach to solving three text mending problems for Serbian: OCR errors, diacritics omission and language variant switching. The common characteristic of these problems ...
... on Human Language Technology Research, 257–262. Morgan Kaufmann Publishers Inc., 2002 Krstev, Cvetana. Processing of Serbian – Automata, Texts and Electronic dictionaries. Faculty of Philology, University of Belgrade, 2008 Krstev, Cvetana, Ranka Stanković and Duško Vitas. “Knowledge and Rule-Based ...
Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
-
Towards translation of educational resources using GIZA++
... on quantitative linguistics (QUALICO) in Belgrade, Serbia, April 26-29, 2012. University of Belgrade, 2013. [20] D. Vitas and C. Krstev. “Construction and Exploitation of X-Serbian Bitexts”. In Cristina Vertan and Walther v. Hahn (eds.) Multilingual Processing in Eastern and Southern EU Languages: ...
... of them hard to translate into and with relatively weak machine translation (MT) support. Phrase-based and syntax-based SMT models are developed to address language diversity and support the language independent nature of the methodology. For high-quality MT and to add value to existing infrastructure ...
... ~/corpus/edX.clean 1 80 Language Model Training A language model (LM), built on the target language (in our case English), is used to ensure fluent output. The following script creates the lm folder, moves into it, and finally executes the command that builds a 3-gram language model. mkdir ~/lm cd ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
-
A Mathematical Learning Environment Based on Serbian Language Resources
In recent years, in line with the ever-growing use of information technology, learning environments are changing. The amount of available learning materials in various forms has increased. These new environments demand comprehensive learning systems, which enable management of the learning corpus with special attention paid to relevant lexical resources. In this paper we present the concept of a Mathematical Learning Environment in Serbian (MLES), which is based on a corpus of mathematical materials and various lexical resources, enabling ...
... for C# programming language and MVC design pattern, as well as HTML and JavaScript, whereas SQL Server served as support for the database. The application is located at http://termi.rgf.bg.ac.rs/ and consists of 5 specific units: browse, search, update, bibliography and profiles. Termi currently ...
... corpus of mathematical content and provides mechanisms for processing and search of this content. It relies on existing lexical resources, morphological e-dictionaries and WordNet of Serbian, which have been developed within the University of Belgrade Human Language Technology group for several ...
... millennium. Proceedings of the Corpus Linguistics 2011 conference. Birmingham: University of Birmingham. [15] Hardie, A. (2012). CQPweb - combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics. 17 (3), pp. 380–409. [16] Stanković ...
Radojičić Marija, Obradović Ivan, Stanković Ranka, Utvić Miloš, Kaplar Sebastijan. "A Mathematical Learning Environment Based on Serbian Language Resources" in Proceedings of the 7th International Scientific Conference Technics and Informatics in Education, Faculty of Technical Sciences, Čačak (2018)
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)
... ly used for management and exploitation of linguistic resources. Both the tools and the resources were developed within the University of Belgrade Human Language Technology Group. The tools we describe are WS4LR, a software tool that has been developed and used for solving different ...
... “Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases”, Polibits, Special section: Natural Language Processing, Journal of Research and Development in Computer Science and Engineering, ed. G. Sidorov (ed.), Centro Innovacion y Desarrollo Tecnologico ...
... spoken, but also from France and the Netherlands. A national development team was formed for each language, which in the case of Serbian was the University of Belgrade HLT Group. Upon the termination of this project, the development of SWN continued, and this network to date contains ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Integrisano okruženje za pripremu paralelizovanog korpusa
The development of aligned corpora requires preparing parallel texts for their integration into an aligned corpus. This is a complex task that can be solved in different ways and that must be carried out in several steps. This paper first outlines the procedure for preparing parallel texts for the aligned corpus that is used by the Language Technology Group of the University of Belgrade. It then gives a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in the process. The lack of a comfortable environment ...
... bilingual corpora, Computational Linguistics”, Vol. 19/1, pp. 75–102. [3] Krstev Cvetana, Ranka Stanković, Duško Vitas, Ivan Obradović (2006): “WS4LR - a Workstation for Lexical Resources”, in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May ...
... Kaalep, V. Petkevič, D. Tufiş (1998): “Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages”, in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics COLING-ACL '98. Montréal, Québec, Canada, pp. 315-319. [5] ...
... aligned parallel corpus with 20+ languages. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC'06, ELRA, Paris, 2006. [6] Tomaž Erjavec: Compiling and Using the IJS-ELAN Parallel Corpus. Informatica, 26(3), pp. 299-307, 2002. SUMMARY The development ...
Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
-
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...
Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
-
Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction
Velislava Stoykova, Ranka Stanković (2018)
Velislava Stoykova, Ranka Stanković. "Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction" in Advances in Intelligent Systems and Computing, Springer International Publishing (2018). https://doi.org/10.1007/978-3-319-91189-2_16
-
Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса
The paper considers a hybrid approach to corpus search, illustrated with the tools OCWB and NoSketch Engine, applied to a specialized corpus from the mining domain (RudKor) and the Corpus of Contemporary Serbian (SrpKor). The approach combines the existing capabilities of OCWB and NoSketch Engine, which base their search on the linguistic annotation of the corpus, with new search capabilities in the form of consulting external language resources (morphological electronic dictionaries of Serbian and the lexical database Serbian WordNet). The hybrid approach is implemented by upgrading the web interfaces that these tools use ...
... Evert-Hardie 2011: Stefan Evert and Andrew Hardie, „Twenty-first Century Corpus Workbench: Updating a Query Architecture for the New Millennium”, In: Proceedings of the Corpus Linguistics 2011 Conference, Birmingham, University of Birmingham. Evert 2019: Stefan Evert and The OCWB Development Team, CQP ...
... Belgrade. Krstev et al. 2018: Cvetana Krstev, Ranka Stanković, Duško Vitas, ”Knowledge and Rule-Based Diacritic Restoration in Serbian”, In: Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, ISSN 2367-5675 (on-line) ...
... Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev, “Resource based WordNet augmentation and enrichment”, In: Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27–29, 2018, Sofia, Bulgaria, ISSN 2367-5675 (on-line), 104–114, http://dcl ...
Милош Утвић, Ранка Станковић, Александра Томашевић, Михаило Шкорић, Биљана Лазић. "Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса" in Научни састанак слависта у Вукове дане - Vol. 48/3 Српски језик и његови ресурси, Међународни славистички центар, Филолошки факултет, Универзитет у Београду (2019). https://doi.org/10.18485/msc.2019.48.3.ch12
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... , G., and Krstev, C. (1993). Electronic dictionary and text processing in Serbo-Croatian. Sprache–Kommunikation–Informatik, 1:225. 10. Language Resource References Krstev, Cvetana and Vitas, Duško. (2015). Serbian Morphological Dictionary - SMD. University of Belgrade, HLT Group and Jerteh ...
... other (lexical) data and the possibility to access data by using the standardized SPARQL query language. The model presented is based on the lemon model, but some modifications and extensions were necessary to enable full migration of complex grammatical structures and numerous inflected ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...
... M., Perrin, D. (eds.) Electronic Dictionaries and Automata in Computational Linguistics, Lecture Notes in Computer Science, vol. 377, pp. 34–50. Springer Berlin / Heidelberg (1989), http://dx.doi.org/10.1007/3-540-51465-1_3 3. Hiemstra, D.: Using language models for information retrieval. Taaluitgeverij ...
... extracting and selecting terms (words) that appear in the text of documents. To that end, many natural language processing (NLP) methods and techniques are used: determining the boundaries of sentences, tokenization, stemming, tagging, recognition of nominal phrases and named entities and, finally, parsing ...
... results are shown in the original script, and that is Cyrillic. Query processing on the server side expands the query by creating a matrix of key words, fields that are searched, and weight factors, and then translates this query into SQL (Structured Query Language) form. The query generated in such a way ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Речник САНУ као база терминолошких речника (на примеру речника кулинарства)
... 115-130. 6. Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić and Aleksandra Trtovac. „Rule based automatic multi-word term extraction and lemmatization.” In: 10th edition of the Language Resources and Evaluation Conference (LREC), 23-28 May 2016, Portorož. 7. Ranka Stanković, Ivan ...
... Lexicography 1(1), pp. 7–36. 4. Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), pp. 61–74. 5. Frantzi, K., Ananiadou, S., and Mima, H. (2000). Automatic recognition of multi-word terms: the C-value/NC-value method. Int ...
... Katarina Popovic Midžine (processed using Unitex tool and morphological dictionaries for Serbian language in DELA format). After this we applied Leximir tool to extract a list of the lemma frequencies in the cookbook, which are then filtered and classified by applying semantic markers. The obtained ...
Рада Стијовић, Олга Сабо, Ранка Станковић. "Речник САНУ као база терминолошких речника (на примеру речника кулинарства)" in Словенска терминологија данас, Београд : Српска академија наука и уметности (2017)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...
Keywords: raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data
... Utvić, M. Developing Termbases for Expert Terminology under the TBX Standard. In Natural Language Processing for Serbian-Resources and Applications, Proceedings of the 35th Anniversary of Computational Linguistics in Serbia, Belgrade, Serbia, 12 November 2013; Pavlović Lažetić, G., Vitas, D., Krstev ...
... using human language technology [31] and used within this research in the web and mobile applications. 2.3. General Purpose Morphological Dictionaries Serbian has an extensive system of inflection and a complex agreement system that makes extraction of terminology more complicated, and thus the use ...
... the corpora and in the dictionaries. Finally, candidates are harmonised and assembled to the microstructure of the lexical database Termi, which consists of a headword, synonyms, abbreviations, definition, for each language, bibliographic source and possibility to include illustration and other external ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892