Веб-алат за управљање грађом Речника САНУ и анотација листића (A web tool for managing the material of the SASA Dictionary and annotating dictionary slips)
The material from which the Dictionary of the Serbo-Croatian Standard and Vernacular Language of the Serbian Academy of Sciences and Arts (SASA) is compiled is recorded on about 5,000,000 slips. It draws on more than 4,500 written sources and 300 manuscript word collections from the vernaculars of the Shtokavian dialect area. This rich lexical material covers the standard and vernacular language of the past two centuries, and at least 15 more volumes of the Dictionary are yet to be written from it. It also offers opportunities for diverse linguistic and extralinguistic research. For this reason, work began ...
Рада Стијовић, Ранка Станковић, Михаило Шкорић. "Веб-алат за управљање грађом Речника САНУ и анотација листића" in Rasprave Instituta za hrvatski jezik i jezikoslovlje, Institute of Croatian Language and Linguistics (2020). https://doi.org/10.31724/rihjj.46.2.32
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Corpus-based bilingual terminology extraction in the power engineering domain
This paper presents the resources and tools used for the extraction and evaluation of bilingual English-Serbian terminology in the power engineering domain. The resources consist of existing general and domain lexica and a domain parallel corpus; the tools include term extractors for both languages and a tool for aligning the chunks that make up corpus sentences. The system was tested by varying the matching function that determines whether an extracted term is present in an aligned chunk, ranging from very loose to strict. Evaluation of the results showed that the precision of term extraction ...
Tanja Ivanović, Ranka Stanković, Branislava Šandrih Todorović, Cvetana Krstev. "Corpus-based bilingual terminology extraction in the power engineering domain" in Terminology, John Benjamins Publishing Company (2022). https://doi.org/10.1075/term.20038.iva
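The matching function described in this abstract can be pictured as a graded predicate over tokens. The sketch below is a hypothetical illustration, not the authors' implementation. The function name `term_in_chunk` and its three levels (`any`, `all`, `contiguous`) are assumptions. For Serbian, the tokens would presumably need to be lemmatized first, since inflection changes word forms.

```python
from typing import Sequence


def term_in_chunk(term: Sequence[str], chunk: Sequence[str], mode: str) -> bool:
    """Return True if the tokenized term is considered present in the chunk.

    Hypothetical matching levels, from loosest to strictest:
      "any"        - at least one term token occurs in the chunk
      "all"        - every term token occurs, in any order and position
      "contiguous" - the term occurs as an exact contiguous token sequence
    Tokens are compared exactly as given; for Serbian they would presumably
    be lemmatized beforehand so that inflected forms still match.
    """
    present = set(chunk)
    if mode == "any":
        return any(tok in present for tok in term)
    if mode == "all":
        return all(tok in present for tok in term)
    if mode == "contiguous":
        n = len(term)
        target = list(term)
        # Slide a window of the term's length over the chunk.
        return any(list(chunk[i:i + n]) == target for i in range(len(chunk) - n + 1))
    raise ValueError(f"unknown matching mode: {mode!r}")
```

Take the chunk `rated voltage of the transformer`. The reversed term `voltage rated` passes the `all` level but fails `contiguous`. Stricter levels trade recall for precision.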
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. It presents the conception of this network and the possibilities for its application. It also presents the lexical analysis used in the FrameNet development project and points out the differences between frame-based and word-based analysis. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...
Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
-
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School
The first training school organized by the COST Action NexusLinguarum was held from 8 to 12 February 2021. Its aim was to teach students, researchers and practitioners the basics of linguistic data science. During the school, participants were introduced to a wide range of topics, from the Semantic Web, RDF and ontologies to modelling and querying language data with state-of-the-art ontological models and tools. The school was held as part of the EUROLAN summer school series and was organized virtually (online) by several institutes; ...
Keywords: linguistic data science, linked data in linguistics, language data, EUROLAN, NexusLinguarum, COST Action, training school
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units (MWUs) while developing morphological MWU dictionaries is not easy to achieve. This is especially true for languages with complex morphological structures, such as Serbian. Manual development of this type of dictionary is a tedious and extremely slow process. To alleviate this problem, we turned to our multipurpose software tool, dubbed LeXimir, to produce lemmas for e-dictionaries of multi-word units. In addition, we developed a procedure aimed at making ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence, vol. 458, Berlin Heidelberg: Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
-
Managing mining project documentation using human language technology
Purpose: This paper aims to develop a system that would enable efficient management and exploitation of electronic documentation related to mining projects. The system offers information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and information extraction. These resources are integrated with a set of Web services and applications for different user profiles and use cases. Findings: The ...
Keywords: Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation
Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitizing the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used during the almost 60-year-long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles. This makes it possible to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...
Keywords: computer lexicography, lexical database, language resources, dictionary, Serbian language
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana: Ljubljana University Press, Faculty of Arts (2018)
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic levels. A system capable of detecting such texts could help make the internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents the work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
Part of Speech Tagging for Serbian language using Natural Language Toolkit
While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We attempt to use this library to create part-of-speech (PoS) tagging for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are then transformed with the Brill tagger to improve accuracy. We trained the models on a tagged ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
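NLTK combines its taggers through a `backoff` argument, for example `UnigramTagger(train_sents, backoff=DefaultTagger("N"))`. A tagger that has not seen a context defers to the next tagger in the chain. The following dependency-free sketch illustrates only that backoff idea. It is not one of the eleven models from the paper. The class names mirror NLTK's, but the code, the toy training sentences and the tag names `N`, `V` and `ADV` are all hypothetical. The Brill refinement step, which NLTK provides through `BrillTaggerTrainer`, is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

TaggedSentence = Sequence[Tuple[str, str]]


class DefaultTagger:
    """Last resort in the chain: assigns the same tag to every token."""

    def __init__(self, tag: str) -> None:
        self.default_tag = tag

    def choose(self, tokens: Sequence[str], i: int, history: List) -> Optional[str]:
        return self.default_tag


class NgramTagger:
    """Tags token i from the context (previous n-1 tags, current word) seen in
    training; defers to `backoff` when that context was never observed."""

    def __init__(self, n: int, train: List[TaggedSentence], backoff=None) -> None:
        self.n = n
        self.backoff = backoff
        counts: Dict[tuple, Counter] = defaultdict(Counter)
        for sent in train:
            tags = [tag for _, tag in sent]
            for i, (word, tag) in enumerate(sent):
                counts[self._context(word, tags, i)][tag] += 1
        # Keep only the most frequent tag observed for each context.
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def _context(self, word: str, tags: Sequence, i: int) -> tuple:
        return (tuple(tags[max(0, i - self.n + 1):i]), word)

    def choose(self, tokens: Sequence[str], i: int, history: List) -> Optional[str]:
        tag = self.table.get(self._context(tokens[i], history, i))
        if tag is None and self.backoff is not None:
            return self.backoff.choose(tokens, i, history)
        return tag

    def tag(self, tokens: Sequence[str]) -> List[Tuple[str, Optional[str]]]:
        history: List[Optional[str]] = []  # tags assigned so far, left to right
        for i in range(len(tokens)):
            history.append(self.choose(tokens, i, history))
        return list(zip(tokens, history))
```

A bigram tagger backed off to a unigram tagger and then to a default tag would be built as `NgramTagger(2, train, backoff=NgramTagger(1, train, backoff=DefaultTagger("N")))`. Unseen words such as `novine` then fall through to the default tag.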
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within the BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of the Western Balkans, Russian and English. The University of Belgrade (UB) hosts a central repository based on the following components:
- the BAEKTEL Metadata Portal (BMP);
- a terminological web application for managing, browsing and searching terminological resources;
- web services for linguistic support (query expansion, information retrieval, OER indexing, etc.);
- annotation of selected resources;
- an OER repository on a local edX ...
Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade: Belgrade Metropolitan University (2016)
-
Development of Open Educational Resources (OER) for Natural Language Processing
In this paper we present the development of an online course on the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there are none in Serbian for either CL or NLP. We have developed this course in order to improve this ...
Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade: Belgrade Metropolitan University (2015)
-
Frequency and Length of Syllables in Serbian
This paper presents basic analyses of several properties of syllables in Serbian: the rank-frequency distribution, the distribution of length, and the relation between length and frequency. The syllabification algorithm used combines the maximum onset principle and the sonority hierarchy. Results indicate that syllables behave similarly to words as far as mathematical models are concerned, but the values of the parameters in models for syllables are quite different from those for words.
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová. "Frequency and Length of Syllables in Serbian" in Glottometrics (2019)
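The following sketch shows how the maximum onset principle can be combined with a sonority hierarchy. It syllabifies lowercase Latin-script Serbian words and then counts syllable frequencies and lengths. Every detail is an assumption, not the algorithm used in the paper:
- the simplified four-level sonority scale;
- the requirement that onsets rise strictly in sonority;
- the treatment of `r` as syllabic when no neighbour is a vowel;
- the handling of the digraphs `lj`, `nj` and `dž`;
- measuring syllable length in phonemes.

```python
from collections import Counter
from typing import List

# Digraphs of Serbian Latin script that denote a single phoneme
# (morpheme-boundary cases such as d+ž in "nadživeti" are not handled).
DIGRAPHS = ("lj", "nj", "dž")
VOWELS = {"a", "e", "i", "o", "u"}

# Hypothetical, simplified sonority scale (higher = more sonorous):
# obstruents < nasals < liquids < glides.
SONORITY = {
    **dict.fromkeys(["p", "b", "t", "d", "k", "g", "c", "č", "ć", "dž", "đ",
                     "f", "s", "z", "š", "ž", "h"], 1),
    **dict.fromkeys(["m", "n", "nj"], 2),
    **dict.fromkeys(["l", "lj", "r"], 3),
    **dict.fromkeys(["v", "j"], 4),
}


def phonemes(word: str) -> List[str]:
    """Split a lowercase word into phonemes, keeping digraphs whole."""
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        out.append(word[i:i + step])
        i += step
    return out


def nucleus_positions(ph: List[str]) -> List[int]:
    """Vowels are nuclei; 'r' is treated as syllabic when no neighbour is a vowel."""
    positions = []
    for i, p in enumerate(ph):
        if p in VOWELS:
            positions.append(i)
        elif p == "r":
            prev_vowel = i > 0 and ph[i - 1] in VOWELS
            next_vowel = i + 1 < len(ph) and ph[i + 1] in VOWELS
            if not prev_vowel and not next_vowel:
                positions.append(i)
    return positions


def rising(onset: List[str]) -> bool:
    """True if sonority strictly increases towards the nucleus."""
    return all(SONORITY.get(a, 1) < SONORITY.get(b, 1) for a, b in zip(onset, onset[1:]))


def syllabify(word: str) -> List[str]:
    ph = phonemes(word.lower())
    nuclei = nucleus_positions(ph)
    if not nuclei:
        return ["".join(ph)]
    starts = [0]  # index of the first phoneme of each syllable
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = ph[left + 1:right]  # consonants between two nuclei
        # Maximum onset: give the next syllable the longest suffix of the
        # cluster whose sonority still rises towards its nucleus.
        k = len(cluster)
        while k > 0 and rising(cluster[k - 1:]):
            k -= 1
        starts.append(left + 1 + k)
    starts.append(len(ph))
    return ["".join(ph[s:e]) for s, e in zip(starts, starts[1:])]


if __name__ == "__main__":
    words = ["majka", "knjiga", "jabuka", "prozor", "srce", "ljubav"]
    syllables = [s for w in words for s in syllabify(w)]
    # Rank-frequency list and length distribution (length in phonemes).
    print(Counter(syllables).most_common())
    print(Counter(len(phonemes(s)) for s in syllables))
```

With these rules, `majka` splits as `maj-ka`. The cluster `jk` falls in sonority, so only `k` can start the second syllable. `srce` splits as `sr-ce` because the `r` is treated as a nucleus.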
-
Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian
Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek. "Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian" in Language and Text: Data, models, information and applications, John Benjamins Publishing Company (2021). https://doi.org/10.1075/cilt.356.04ruj
-
Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC
OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data and Semantic Web technologies, is currently being extended with a dedicated module for Frequency, Attestation and Corpus Information (OntoLex-FrAC). We propose a novel component for OntoLex-FrAC that addresses the incorporation of corpus queries for three purposes:
(a) linking dictionaries with corpus engines;
(b) enabling RDF-based web services to exchange corpus queries and response data dynamically;
(c) using conventional query languages to formalize the internal structure of collocations, word sketches and ...
Keywords: standardization, digital lexicography, OntoLex, corpus queries, linked data, Linguistic Linked Open Data
Christian Chiarcos, Ranka Stanković, Maxim Ionov, Gilles Sérasset. "Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, 20-25 May 2024, LREC (2024)
-
E-Dictionaries and Finite-State Automata for the Recognition of Named Entities
Cvetana Krstev, Duško Vitas, Ivan Obradović, Miloš Utvić. "E-Dictionaries and Finite-State Automata for the Recognition of Named Entities" in Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, FSMNLP 2011, July 2011, Blois, France, A. Maletti and M. Constant (eds.), Association for Computational Linguistics (2011): 48-56
-
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
The paper presents the results of research on preparing parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study. We also describe the process of part-of-speech tagging, lemmatization and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data into RDF, and the incorporation of NIF annotations. The resulting NIF files were evaluated by exploring the triplestore with SPARQL queries. Finally, the linking of Linked ...
Keywords: parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata
Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
A Method for Extracting Translational Equivalents from Aligned Texts
Ivan Obradović. "A Method for Extracting Translational Equivalents from Aligned Texts" in Methods and Applications of Quantitative Linguistics, Selected papers of the 8th International Conference on Quantitative Linguistics (QUALICO) in Belgrade, Serbia, April 26-29, 2012, Ivan Obradović, Emmerich Kelih, Reinhard Köhler (eds.), University of Belgrade & Academic Mind (2013): 119-129