Search ⚒ Papers ⚒ Dr RGF - RGF Repository

339 items

It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić (2023)

The paper will present the results of the project “It-Sr-NER: Web services for named entities recognition, linking and mapping,” in which teams from the University of Turin and the Society for Language Resources and Technologies JeRTeh participated, and whose goal was the development of the It-Sr-NER web service for annotating named entities in text and displaying them on a map. Named entities in these services are names of persons, places, organizations, demonyms (ethnicities), events and works of art.

General Engineering

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić. "It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map" in Infotheca, Belgrade : Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.3
Keyword-Based Search on Bilingual Digital Libraries

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović (2017)

This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. Keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
Bilingual lexical extraction based on word alignment for improving corpus search

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović (2019)

Library and Information Sciences,Computer Science Applications

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević (2024)

This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the Mistral large language model (LLM) with zero-shot and few-shot learning techniques. The main approach innovates by designing prompts that include the text together with classification instructions, either without examples or with a few examples, enabling the language model to classify sentiment into positive, negative, or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...

zero-shot, few-shot, sentiment, Serbian, Mistral model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Sofia, Bulgaria, 9-10 September 2024, LREC | COLING (2024)
Rule-based Automatic Multi-word Term Extraction and Lemmatization

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac (2016)

In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...

term extraction, terminology, multi-word units, lemmatization, finite-state transducers

... from Serbian texts we have chosen a rule-based approach, which relies on a system of language resources such as morphological e-dictionaries and grammars developed within the University of Belgrade Human Language Technology Group (Vitas et al., 2012). For our approach, production of lemmas for ...
... Preece, A., Li, H. (Eds.), Natural Language Processing and Information Systems. Berlin: Springer, pp. 248--255. Koeva, S. (2007). Multi-word term extraction for Bulgarian. In Proc. of the Workshop on BSNLP: Information Extraction and Enabling Technologies, pp. 59--66. Krstev, C., Obradović ...
... and Kupść, A. (2007). Lemmatization of Polish person names. In Proc. of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies, Stroudsburg: Association for Computational Linguistics, pp. 27--34. Savary, A., Zaborowski, B., Krawczyk-Wieczorek ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
Open Educational Resources in Serbia

Ivan Obradović, Ranka Stanković, Marija Blagojević, Danijela Milošević (2020)

Open educational resources, BAEKTEL, Metadata portal

... learning platform. She published more than 100 papers in journals and proceedings of scientific conferences, most of them in the area of human language technologies and more than 15 related to TEL. Marija Blagojević University of Kragujevac, Faculty of Technical Sciences Čačak Svetog Save 65 ...
... incorporate knowledge from various language and lexical resources. She is head of Computer Centre for the Mining department, Chairman of Technical committee A037 Terminology in Institute for Standardisation of Serbia and vice president of Language Resources and Technologies Society (JERTEH). She actively ...
... Topics of content (titles and keywords) are visualised by a word cloud in Figure 3. It can be seen that computer science, modeling and language technologies are dominant. Figure 3. Word cloud of ...
Ivan Obradović, Ranka Stanković, Marija Blagojević, Danijela Milošević. "Open Educational Resources in Serbia" in Current State of Open Educational Resources in the “Belt and Road” Countries, Springer Singapore (2020). https://doi.org/10.1007/978-981-15-3040-1_10
Resource-based WordNet Augmentation and Enrichment

Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev (2018)

In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...

WordNet, bilingual resources, term alignment, parallel lists

... of this approach to wordnet enrichment. 1. Introduction Semantic networks, such as wordnets, are among the most important resources in Human Language Technologies. Thus, for example, the Princeton WordNet - PWN (Fellbaum, 1998), has been in use for more than two decades as the standard lexical database ...
... management and semantic web technologies compliant to W3C recommendations, as well as latest trends in thesaurus standards. For this research we used the bilingual en-sr version 4.7 in xls format, with 6,939 term entries, and 6,971 aligned pairs of terms. Microsoft language portal10 has published Microsoft ...
... form of a .tbx (ISO 30042:2008) file containing: Concept ID, Definition, Source term, Source language identifier, Target term, Target language identifier. The number of terms differ from language to language, due to varying levels of localization. The Microsoft Terminology Collection is a set of standard ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian

Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić (2020)

The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...

Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary

... Computational Linguistics: Human Language Technologies, pages 271–281. Constant, M., Krstev, C., and Vitas, D. (2018). Lexical analysis of serbian with conditional random fields and large-coverage finite-state resources. In Zygmunt Vetulani, et al., editors, Human Language Technology. Challenges ...
... taggers as well as new tagging technologies will be taken into consideration and tested in order to find the best solution for Serbian, a highly-inflected language without fixed word order, for instance RNNTagger.9 Since CRF tagger for Serbian and Croatian language obtained the accuracy over 98% ...
... (2009). Coupling an annotated corpus and a morphosyntactic lexicon for state-of-the-art POS tagging with less human effort. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23, Hong Kong, China, December 3-5, 2009, pages 110–119. Erjavec, T. (2012) ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
A Data Driven Approach for Raw Material Terminology

Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)

The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...

raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data

... documentation using human language technology. Electron. Libr. 2018, 36, 993–1009. [CrossRef] 32. Stanković, R.; Krstev, C.; Lazić, B.; Škorić, M. Electronic Dictionaries—from File System to lemon Based Lexical Database. In Proceedings of the Eleventh International Conference on Language Resources and ...
... and L1 in the CLIL Classroom. In Proceedings of the Second International Conference on Teaching English for Specific Purposes and New Language Learning Technologies, Niš, Serbia, 22–24 May 2015; Faculty of Electronic Engineering, University of Niš: Niš, Serbia, 2015. 24. Termi—Terminological Web A ...
... approach. A monolingual corpus from the mining domain was developed as part of a project related to managing mining project documentation using human language technology [31] and used within this research in the web and mobile applications. 2.3. General Purpose Morphological Dictionaries Serbian has ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
An approach to Implementation of blended learning in a university setting

Ivan Obradović, Ranka Stanković, Olivera Kitanović, Jelena Prodanović (2011)

... systems, Information system design, and GIS (Geographic Information System) technologies. InfoTech course material is hence not organized on a weekly basis but rather by different topics. Namely, GIS technologies are, for example, one of the topics of the Information system design course ...
... staff of 143 professors and teaching assistants. The development environment for the production of the portal was based on the PHP scripting language, and the portal database was implemented on MS SQL Server 2008. For each of the several hundred courses available at FMG CMS two types of ...
... platform. From a technical point of view, it is a web application for creating Internet-based courses and web sites developed using the PHP scripting language (Hypertext Preprocessor), with a SQL type data base (for example MySQL, PostgreSQL, Microsoft SQL Server or Oracle). It can be run on Windows ...
Ivan Obradović, Ranka Stanković, Olivera Kitanović, Jelena Prodanović. "An approach to Implementation of blended learning in a university setting" in Proceedings of the Second International Conference on e-Learning, eLearning 2011, September 2011, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2011)
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain

Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović (2021)

The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The conception of this network is presented, along with the possibilities for its application. The lexical analysis applied in the FrameNet development project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...

Serbian language, frame semantics, FrameNet, risk scenario, mining corpus, natural language processing

... different language resources, as well as the Sketch Engine corpus analysis tool. We have shown that FrameNet offers a detailed and structured mapping, which can then be used in different ways for language processing, especially in text extraction and organizing, as well as in an effort to make human-computer ...
... Serbian.” In Proceedings of The 12th LREC – Language Resources and Evaluation Conference, 3954–3962. Tomašević, Aleksandra, Ranka Stanković, Miloš Utvić, Ivan Obradović, and Božo Kolonja. 2018. “Managing mining project documentation using human language technology.” The Electronic Library, https://doi ...
... domain of mining started as part of a mining project documentation management project using language technologies (Tomašević et al. 2018, 996). Back then, the corpus contained texts from the domain of ...
Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration

Cvetana Krstev, Ranka Stanković, Duško Vitas (2010)

In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...

Morphology, Lexicon, lexical database, Standards for LRs

... 33-40. Savary, A. (2008). Computational Inflection of Multi-Word Units – A Contrastive Study of Lexical Approach, In: Linguistic Issues in Language Technologies, Vol. 1, No. 2, CSLI Publications. 819 ...
... 07 Language resource management - Feature Structures – Part 2: Feature System Declaration, ISO/TC 37/SC 4. ISO. (2009) ISO 12620 Terminology and other language and content resources – Data Categories – Specification of data categories and management of a data category registry for language resources ...
... the satisfactory solution. 1. Motivation Description of morphological features of a language is a prerequisite for many NLP applications. This description can be simple or complex depending both on a language and application in question. Considerable efforts in standardizing such a description ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
An Approach to Efficient Processing of Multi-Word Units

Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas (2013)

Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...

Natural Language Processing, Grammatical Category, Lexical Representation, MWU, multi-word unit

... A.: Slavonic information extraction and partial parsing. In: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies, ACL ’07, pp. 1–10. Association for Computational Linguistics, Stroudsburg, PA, USA (2007). URL http://dl.acm.or ...
... ée (2000) 16. Savary, A.: Computational Inflection of Multi-Word Units — A Contrastive Study of Lexical Approaches. Linguistic Issues in Language Technologies 1(2) (2008) 17. Savary, A.: Multiflex: A Multilingual Finite-state Tool for Multi-Word Units. In: CIAA, pp. 237–240 (2009) 18. Savary, A. ...
... before, most of these conditions are satisfied for many languages. However, in order to apply this functionality to a new language it would be necessary to develop a new language-dependent strategy, that is, a new XML document. It is also worth mentioning that the system can be easily modified to ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
Improvement of geodatabase queries within GeolISS

Ranka Stanković (2008)

... Belgrade) IMPROVEMENT OF GEODATABASE QUERIES WITHIN GEOLISS Abstract: We present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for improvement of queries for the geodatabase within the Geological information ...
... languages, such as Serbian. The geological dictionary, developed within GeolISS, supports semantic and multilingual expansions of the query. The Human Language Technology group at the University of Belgrade (HLT) has been developing various lexical resources over a long period, the resources reaching ...
... adjustable tool, a workstation for language resources, labeled WS4LR, which greatly enhances the potential of manipulating each particular resource as well as several resources simultaneously [9]. This tool has already been successfully used for various language processing related tasks including ...
Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
Combining Heterogeneous Lexical Resources

Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić (2004)

development of lexical resources, morphological dictionaries, WordNet

... (IDE), which allows them to share tools and facilitates the creation of mixed-language solutions. In addition, these languages leverage the functionality of the .NET Framework, which provides access to key technologies that simplify the development of ASP Web applications and XML Web services ...
... the other hand, the XML Schema definition language (XSD) enables the definition of the structure and data types of XML documents. Figure 1 shows the graphical representation of XSD schema of Serbian WN. The XML Path Language (XPath) provides a language for addressing parts of an XML document ...
... Resources | Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić | Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4 | 2004 | | http://dr.rgf.bg.ac.rs/s/repo/item/0004863 Digital Repository ...
Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++

Branislava Šandrih, Ranka Stanković (2020)

In science, industry, and many research fields, terminology develops rapidly. Most often, the language that serves as the "lingua franca" for most of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain ...

terminology extraction, terminology validation, GIZA++, graphs, Unitex, text classification

... Texts”. Natural Language Engineering Vol. 22, no. 4 (2016): 517–548 Koehn, Philipp, Franz Josef Och and Daniel Marcu. “Statistical Phrase-based Translation”. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology ...
... “debugger”, the transcribed version is adopted for everyday use in Information Technologies domain. It is a challenge to produce and maintain up-to-date terminology resources, especially for an under-resourced language, such as Serbian. Today, Serbian terminology is transferred mainly from English ...
... domain terms for the source language (Input ii) is (a) the source language part of LIS-dict including SWTs; (b) the output of the extractor Eng-TE applied to the source language part of the aligned input corpus; 3. The extraction of the set of MWTs in the target language by Serb-TE (Input iii) was ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
An Italian-Serbian Sentence Aligned Parallel Literary Corpus

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić (2023)

This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...

Aligned corpus, parallel corpus, Serbian, Italian, literature

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian

Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih (2021)

Abusive speech on social media, including profanity, derogatory speech, and hate speech, has reached pandemic levels. A system able to detect such texts could help make the internet and social media a better, more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...

abusive language, hate speech, Serbian, Twitter, lexicon, corpus

... n for Computational Linguistics: Human Language Technologies, June 1–June 6, 2018, New Orleans, Louisiana, Vol. 1, 2018. 47 Michael Wiegand, Melanie Siegel, and Josef Ruppenhofer. Overview of the germeval 2018 shared task on the identification of offensive language. In Proceedings of GermEval 2018, ...
... information is a crucial component in human language technology, the FrAC module facilitates sharing and utilising this valued information [9], as presented in Listing 3. 4 Discussion and conclusion In this paper, we presented AbCoSER 1.0, the first corpus of abusive language in Serbian which consists of tweets ...
... as a language successfully, and thus the language column of a tweet could not be relied upon, the annotators were given one more task – to check the language of a tweet and whether it could be interpreted. They needed to mark tweets with meaningless content, tweets written in a foreign language or m ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment

Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020)

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...

lexical semantic resources, sense alignment, lexicography, language resource

... Bulgarian were partially funded by the Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, part of the EU infrastructures CLARIN and DARIAH – CLaDA-BG, Grant number DO1-272/16.12.2019. This work ...
... - The Repository is available at: www.dr.rgf.bg.ac.rs Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3232–3242 Marseille, 11–16 May 2020 © European Language Resources Association (ELRA), licensed under CC-BY-NC 3232 A Multilingual Evaluation Dataset ...
... eu/MWSA. Keywords: lexical semantic resources, sense alignment, lexicography, language resource 1. Introduction Lexical semantic resources (LSRs) are knowledge repositories that provide the vocabulary of a language in a descriptive and structured way. One of the famous examples of LSRs are ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
Using Lexical Resources for Irony and Sarcasm Classification

Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković (2017)

The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...

... the 49th Annual Meeting of the ACL: Human Language Technologies: short papers – Volume 2. Association for Computational Linguistics, 564–568. [10] Matthieu Constant, Cvetana Krstev, and Duško Vitas. 2015. Hybrid Lexical Tagging in Serbian. In Proc. of 7th Language & Technology Conference. Fundacja U ...
... 1145/3136273.3136298 1 INTRODUCTION There are many different theories on what irony is and what role it plays in language understanding. According to [33] “Irony is . . . a uniquely human mode of communication, curious in that the speaker says something other than what he or she intends”. Likewise ...
... annotators were asked to decide whether the language of the tweet was recognized and whether the tweet represents an ironic statement.13 The results of the language tagging were used to estimate a binary language classifier (BCMS or not_BCMS). After the language classification we obtained a subset of 1 ...
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
