Search
180 items
-
Fourth Summer Datathon on Linguistic Linked Open Data
Tijana Radović, Ranka Stanković (2023)
The 4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22) was held in Cercedilla, near Madrid, Spain, in May 2022, and was organized by the COST Action NexusLinguarum. The school gathered researchers, academics, and students who wanted to acquire or expand their knowledge in the field of linguistic linked data science. During the school, a spectrum of topics from the field of linked data was presented, from various ontologies, through document integration, annotation and natural language text processing tools ...
Tijana Radović, Ranka Stanković. "Fourth Summer Datathon on Linguistic Linked Open Data" in Infotheca, Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.6
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... by using the standardized SPARQL query language. The model presented is based on the lemon model, but some modifications and extensions were necessary to enable full migration of complex grammatical structures and numerous inflected forms for Serbian. MULTEXT-East lexicons (Krstev et al., ...
... dictionaries. Therefore, in our model the class Form is used for inflected forms instead of variant forms, which is important for Serbian as a highly inflective language. Also, we adapted the lemon model to store all existing markers as a thesaurus of data categories and their values, which enabled ...
... Resources. In Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2006, pages 1692–1697. Krstev, C., Stanković, R., and Vitas, D. (2010). A Description of Morphological Features of Serbian: a Revision using Feature System Declaration. In Nicoletta Calzo- ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Building Terminological Resources in an e-Learning Environment
... repository for different types of terms: Serbian synonyms of the basic term, its available translational equivalent in the chosen language, and the inflectional forms of the Serbian term and its synonyms. Namely, as Serbian is a morphologically very rich language, there was a need to provide for all ...
... technologies. For each concept separate Serbian and English entries were created. In line with the standard requirements for glossaries, besides the basic Serbian and English terms, each entry contained a short definition of the term in the respective language. However, no synonyms were taken into ...
... functionality within the information system, a UML (Unified Modeling Language) engineering model with a special structure has been developed, whose main features are depicted in Figure 2. Assuming basic familiarity with this language we will briefly comment on this model. The class Rečnik in the model ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
-
Parallel Bidirectionally Pretrained Taggers as Feature Generators
In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation systems as feature generators for a stacked classifier. It also explores automatic resource expansion via dataset augmentation and bidirectional training in order to increase the number of taggers and to maximize the impact of the composite system, which ...
Ranka Stanković, Mihailo Škorić, Branislava Šandrih Todorović. "Parallel Bidirectionally Pretrained Taggers as Feature Generators" in Applied Sciences, MDPI AG (2022). https://doi.org/10.3390/app12105028
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionary is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...
... Cvetana Krstev, Ivan Obradović, Ranka Stanković, and Duško Vitas. 1 Introduction. Morphological electronic dictionaries of Serbian for natural language processing (NLP) have been under development for many years now. Their development follows the methodology and format (known as DELAS/DELAF) presented ...
... for some languages this complex procedure can be skipped and a list of MWU forms can be produced from scratch. Serbian is, however, like all Slavic languages a highly inflectional language and such a shortcut procedure cannot be applied. We will illustrate this with two examples. The nominal MWUs ...
... languages other than Serbian and English, namely, for Bulgarian [8]. The new functionality for production of DELAC entries is also expected to perform successfully without any modifications for other languages. The prerequisites are that there exists a Unitex module for that language including: a dictionary ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
-
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020)
In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between language variants.
Keywords: text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers
... true for problems of diacritic restoration, OCR errors correction and language variants transformation. In this paper we present an approach to solving three text mending problems for Serbian: OCR errors, diacritics omission and language variant switching. The common characteristic of these problems is ...
... print of the original text, and its language and alphabet. OCR software today is of good quality compared to its first versions, even when produced for personal rather than professional use, and it is applicable to a large number of languages and scripts, including Serbian Cyrillic. However, OCR of old printed ...
... using a noisy channel model”. In Proceedings of the second international conference on Human Language Technology Research, 257–262. Morgan Kaufmann Publishers Inc., 2002 Krstev, Cvetana. Processing of Serbian – Automata, Texts and Electronic dictionaries. Faculty of Philology, University of Belgrade ...
Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
-
Towards ELTeC-LLOD: European Literary Text Collection Linguistic Linked Open Data
This paper describes a case study on generating linked data created from annotated text corpora using the NLP Interchange Format (NIF). A subset of the ELTeC corpus, consisting of 900 novels from the period 1840-1920 in 9 European languages, served as the basis for this research. The annotated version of the novels, in the so-called TEI level-2 format, was transformed into NIF, an RDF/OWL-based format that aims to achieve interoperability between natural language processing tools, language resources and ...
Ranka Stanković, Christian Chiarcos, Miloš Utvić, Olivera Kitanović. "Towards ELTeC-LLOD: European Literary Text Collection Linguistic Linked Open Data" in LDK 2023 – 4th Conference on Language, Data and Knowledge, 12-15 September in Vienna, Austria, Lisabon : NOVA FCSH - CLUNL (2023). https://doi.org/10.34619/srmk-injj
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...
... [22]. When language processing methods and techniques are used for generating a document surrogate, they rely heavily on lexical resources, which is especially important in the case of languages with rich morphology, such as Serbian, and South-Slavic languages in general. Although Serbian belongs to ...
... (corpora and e-dictionaries), as well as applications for basic language processing (tokenization, Part-Of-Speech (POS) tagging, morphological analysis), information retrieval and extraction [26]. Several successful applications of Serbian language resources and tools in tasks related to document indexing ...
... in this paper is based on morphological electronic dictionaries and finite-state transducers for Serbian [12]. 3.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
Srbija u OneGeology Europe [Serbia in OneGeology Europe]
The Geological Survey of Serbia, as the lead institution of the OneGeology Europe project, together with the Faculty of Mining and Geology and the Ministry of Natural Resources, Mining and Spatial Planning, joined the international OneGeology Europe project in May 2013, when the project was already at an advanced stage. By the end of 2013 they had completed the activities intended to lead to full membership in the project, through which the Republic of Serbia found its place on the Geological Map of Europe 1:1M. The Geological Map of Serbia 1:1M is a compiled, that is, simplified version of the OGK 1:500 ...
... obtained in Serbian, as shown in Figure 8. Fig 8: 1GE Portal with interface in Serbian language. Completing the project involves the remaining final activities, after which the map should be available on the 1G-E Portal, and this includes: - Validation ...
... Serbia. The tasks of each member of the OneGeology-Europe Project were as follows: - Metadata entry (in English and each native language) - Translation (to each native language: common geological vocabulary, keywords, portal components, metadata titles and abstract of all existing records) - Harmonization ...
Danka Blagojević, Ranka Stanković, Petar Stejić, Velizar Nikolić. "Srbija u OneGeology Europe" in Zapisnici Srpskog geološkog društva za 2013. godinu, Beograd : Srpsko geološko društvo (2014)
-
Corpus-based bilingual terminology extraction in the power engineering domain
This paper presents the resources and tools used for the extraction and evaluation of bilingual English-Serbian terminology in the power engineering domain. The resources consist of existing general and domain lexica and a domain parallel corpus; the tools include term extractors for both languages and a tool for aligning segments (chunks) that belong to corpus sentences. The system was tested by varying the matching function that determines the presence of an extracted term in an aligned segment (chunk), ranging from very loose to strict. Evaluation of the results showed that the precision of term extraction ...
Tanja Ivanović, Ranka Stanković, Branislava Šandrih Todorović, Cvetana Krstev. "Corpus-based bilingual terminology extraction in the power engineering domain" in Terminology, John Benjamins Publishing Company (2022). https://doi.org/10.1075/term.20038.iva
-
Football terminology: compilation and transformation into OntoLex-Lemon resource
This paper presents a project under development, the creation of the first digital football dictionary in Serbian, and demonstrates the application of the OntoLex model and its modules. The OntoLex-FrAC module includes information on frequency and usage examples extracted from a corpus. In this case, a domain-specific corpus named СрФудКо was created, containing news articles about football in Serbian. Multi-word terms were automatically extracted from the Serbian corpus, then manually evaluated and classified as sports or ...
Jelena Lazarević, Ranka Stanković, Mihailo Škorić, Biljana Rujević. "Football terminology: compilation and transformation into OntoLex-Lemon resource" in LDK 2023 – 4th Conference on Language, Data and Knowledge, 12-15 September in Vienna, Austria, Lisabon : NOVA FCSH - CLUNL (2023). https://doi.org/10.34619/srmk-injj
-
Језички модели, шта је то? [Language Models, What Are They?]
Михаило Шкорић (2023)
Михаило Шкорић. "Језички модели, шта је то?" in Језик данас, Нови Сад : Матица српска (2023)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... especially its rich morphology, this is a complex task, and corresponding language resources in the form of morphological e-dictionaries and grammars need to be applied (Vitas et al., 2012). For that reason, in the case of Serbian, it is not enough to extract terminology from the domain, but it also ...
... acquisition in Serbian. Rapid changes in many knowledge domains mean that new terms are continuously being created and introduced in Serbian, making important the automation of their retrieval and incorporation in Serbian terminological dictionaries. Due to specific features of Serbian grammar, especially ...
... Terminological Database Using a Transducer Cascade. Proc. of Recent Advances in Natural Language Processing. (pp. 17-23). Baldwin, T., & Kim, S. N. (2010). Multiword expressions. Handbook of Natural Language Processing, second edition. (267-292): CRC Press. Cerbah, F., & Daille, B. (2007). A ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied to a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...
... boundaries are not taken into consideration. This can partially solve the problem of the rich morphology that characterizes Serbian, as a language belonging to the South-Slavic language family. For instance, scanning with lignit ‘lignite’ will also retrieve inflected forms lignita, lignitu, lignitom, etc ...
... bases lemmatization on morphological electronic dictionaries and finite state transducers for Serbian [6]. 4.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state ...
... query into SQL (Structured Query Language) form. The query generated in such a way searches the text of the subset of attributes in the database that correspond to the selected criteria of search. 4 The Improved Solution One of the problems of full text search in Serbian is its rich morphology, where ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Развој геолошког терминолошког речника ГеолИССТерм [Development of the Geological Terminological Dictionary GeolISSTerm]
... from one language to another is done by choosing the appropriate tab, namely, the hierarchical tree structure on the right-hand side of the interface. Figure 8. GeolISSTerm on the web – browsing the dictionary in Serbian and English. The search by key word can be performed in Serbian and English ...
... lexical database SWN (Serbian WordNet), as a semantic network of words of Serbian (Krstev et al., 2008) is indispensable in terms of the development of terminological resources with a rich semantic description. The first network of this kind was built for the English language at Princeton and provided ...
... etić, Gordana. Krstev, Cvetana. Popović, Ljuba. Obradović, Ivan. 2003: Processing Serbian Written Texts: An Overview of Resources and Basic Tools, Proceedings of the International Workshop on Balkan Language Resources and Tools, Thessaloniki, Greece, November 2003, S. Piperidis, V. Karakaletsis ...
Ranka Stanković, Branislav Trivić, Olivera Kitanović, Branislav Blagojević, Velizar Nikolić. "Развој геолошког терминолошког речника ГеолИССТерм" in INFOteka: časopis za informatiku i bibliotekarstvo, Beograd : Zajednica biblioteka univerziteta u Srbiji (2011)
-
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions, which are usually Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization, and production of entries in morphological electronic dictionaries, both for MWUs and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables a representation that reflects complex relations between acronyms and their definitions.
... much more complex than the one represented at the beginning of this section and given in many Serbian orthography textbooks. Namely, the acronym in use may be derived from the name in a foreign language (the original name), while the translation of the name is in use, some functional words need ...
... rule. Acronyms, much as other words in Serbian, are characterized by grammatical categories of number and gender, and they may inflect in case. The inflection is expressed by inflectional endings added after a hyphen. However, according to the Serbian orthography as well as practice the inflection ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
-
Integrisano okruženje za pripremu paralelizovanog korpusa [An Integrated Environment for the Preparation of an Aligned Corpus]
The development of aligned corpora requires the preparation of parallel texts for their integration into an aligned corpus. This is a complex task that can be solved in different ways and that must be carried out in several steps. This paper first sets out the procedure for preparing parallel texts for the aligned corpus used by the Language Technology Group of the University of Belgrade. It then gives a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in the process. The lack of a comfortable environment ...
... would be united, motivated the Human Language Technology Group to embark on the task of developing an integrated environment for the preparation of aligned corpora, under the name of ACIDE. For the construction of this environment we chose the C# programming language. Among other things, ACIDE provides ...
... of the text into smaller units (paragraphs, sentences, words) and the marking up of text units. The markup itself is done using XML (eXtensible Markup Language) tags, in accordance with the recommendations of the TEI (Text Encoding Initiative) consortium. In practice this mostly comes down to marking up paragraphs and sentences ...
... the user is able to generate a text record and an XML form of the integrated data. Finally, using an appropriate XSLT (Extensible Stylesheet Language Transformations) transformation, the user can convert the XML form of the transformed data into HTML and other formats, depending on the type of visualization ...
Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
-
English for Geology Students. 2
Lidija Beko (2023)
... previous textbook with this one, putting their own principles of clarity and coherence as a way in which they wish to teach the subject of English language and geology. Six thematic units: 1. Landslides 2. Metamorphic rocks 3. Mineral deposits 4. Hydrological cycle and groundwater 5. Surface ...
... aking between registers while at the same time referring to active learning within the given context. Teaching vocabulary, which is the base of language knowledge, can be continued by creating paper vocabulary cards, and later even electronic cards, which would ensure continuity in vocabulary learning ...
Lidija Beko. English for Geology Students. 2, Belgrade : The Faculty of Mining and Geology, 2023
-
Development of terminological resources for expert knowledge: a case study in mining
Ljiljana Kolonja, Ranka Stanković, Ivan Obradović, Olivera Kitanović, Aleksandar Cvjetić. "Development of terminological resources for expert knowledge: a case study in mining" in Knowledge Management Research & Practice, Palgrave Macmillan (2015). https://doi.org/10.1057/kmrp.2015.10
-
Developing Termbases for Expert Terminology under the TBX Standard
... geology, both monolingual for Serbian and multilingual with Serbian as one of the languages. The resources have been developed within the scope of various projects, but using the same platform, namely RDBMS SQL Server, with MS Visual Studio .NET and the C# programming language for application development ...
... intensively developed. Serbian e-dictionaries are being widely used for various language technology tasks, including termbase applications. However, the system of dictionaries still lacks domain specific terms for some areas, such as mining and geology, which motivated the language technology team at ...
... section in the header of the file. Thus we added a new element with a two-character language code “sr” for ... in compliance with the IETF (Internet Engineering Task Force) language tag and “Serbian” for *. Figure 8 represents the header of a TBX file generated by export from ...
Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade : University of Belgrade, Faculty of Mathematics (2014)