Search
88 items
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry and many research fields, terminology develops rapidly. Most often, the lingua franca for the majority of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a tool for chunk alignment. We examine the performance of the method on the domain ... ... technique used in (Naguib Sabtan, 2016), groups of aligned sentences (verses) were used. In (Irvine and Callison-Burch, 2016) authors performed two experiments, the first one relying on the existence of a bilingual dictionary with no parallel texts and the second one requiring only the existence of ...
... options, thus obtaining 8 different experimental settings: 1. The input domain aligned corpus (Input i) consists of: (a) the aligned corpus LIS-corpus; (b) the aligned corpus LIS-corpus extended with the bilingual aligned pairs bi-list (LIS-corpus+); 2. The list of domain terms for the source language ...
... sentence-aligned domain-specific corpus involving a source and a target language, denoted as S(text.align) ↔ T(text.align). In this paper we refer to this tool as LIS-corpus. As a textual resource, twelve issues with a total of 84 papers were aligned at the sentence level resulting in 14,710 aligned segments ... Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ... Keywords: raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data ... (32%). The bilingual corpus of texts aligned on the sentence level was produced from the bilingual digital library Bibliša. The initial set of 55 documents containing 4831 aligned Serbian-English sentences [29] was enlarged with 44 new documents containing 12,657 aligned sentences from the raw material ...
... Underground Mining, published both in Serbian and English, stored in the bilingual digital library Bibliša, as one of the collections of aligned English-Serbian bi-texts [29,30], were also used in our approach. A monolingual corpus from the mining domain was developed as part of a project related to managing ...
... characteristics of the distribution of the sample sentences extracted from the corpus that contains different texts. The approach was adapted to work also for English and to be applied for bilingual aligned sentences. For ranking, we have used a weighted score derived from lexical features (e.g., sentence length ... Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the Mistral large language model (LLM) with zero-shot and few-shot learning techniques. The main approach innovates by devising research prompts that include text with classification instructions, either without examples or with a few examples, enabling the language model to classify sentiment into positive, negative or objective categories. This methodology aims to simplify sentiment analysis by constraining responses, thereby increasing precision ... Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Sofia, Bulgaria, 9-10 September 2024, LREC | COLING (2024)
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bilingual, with papers simultaneously being in English ... ... Architecture and Urbanism presently has just 10 papers. Project reports included originate from the BAEKTEL Tempus project. All bilingual texts are aligned at the sentence level, represented in a TMX (Translation Memory eXchange) format, and stored in the MarkLogic NoSQL database. Text collections ...
... lexical resources, access to aligned resources, etc.) 4 System components In designing Bibliša special attention is given to its language support component. It supports various aspects of multilingual libraries: its content is not only multilingual, but also aligned and it can be searched in any ...
... tool ACIDE (Aligned Corpora Integrated Development Environment) (Utvić et al., 2007). The TMX document consists of TU (Translation Unit) and TUV (Translation Unit Variant) elements, where each TUV is a segment in one of the languages. The following example illustrates a single aligned segment (TU) ... Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics - IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017) The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ... ... the system would be language-independent as well. If it turns out to be valid, this method could allow machine learning the usage of huge corpus of texts that are pre-labeled with determiners. 1.1 Review of their former similar studies In 2005 a series of experiments with the classification of mood ...
... The main idea of this experiment is to prove that it is possible to: – build an inverted index of terms in a language-neutral way using a corpus of texts that contain known determiners. – automatically assign values to terms on positive-negative scale using those determiners, so that specific values ...
... successful and its final outcome satisfactory, three prerequisites should be met: – collected corpus must be organized in a certain way; – collected texts and messages must contain determiners that would help assign a value to a nearby term; – determiners must have a predetermined value. In the following ... Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ... ... Internet. This component consists of morphological dictionaries, WordNet, domain specific terminological resources such as GeolISSterm, RudOnto, aligned texts in TMX format, corpora etc. Special attention will be given to Termi, newly developed application for terminology management. Keywords: Open ...
... development in scientific research which constantly produces new terms which need to be translated in other languages. There is a huge amount of texts available on the Internet which is growing daily and needs to be translated for different purposes, at the same time paying attention to terminology ... Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
-
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019) In this paper we present a model for selecting good examples for a dictionary of the Serbian language and the development of the model's initial components. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus composed of examples from five digitized volumes of the SASA dictionary. The initial feature set was inspired by similar approaches for other languages. The distribution of features of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora comprising various texts. The analysis showed that ... Keywords: Serbian, good dictionary examples, automation of dictionary compilation, feature extraction, machine learning ... distribution of examples from this corpus is compared with the feature distribution of sentence samples extracted from corpora comprising various texts. The analysis showed that there is a group of features which are strong indicators that a sentence should not be used as an example. The remaining ...
... dictionary is conceived as a thesaurus, meant primarily for native speakers. Its primary goal is to help understanding words from different kinds of texts (receptive use of dictionary). It covers a large portion of the vocabulary of the Serbian language, standard and vernacular, for the last 200 years ...
... from the beginning of the 19th century to the present day, as well as about 300-word collections (for details see Stanković et al., 2018). Written texts, as well as word collections, come from what used to be the SC language territory. According to the Style Guide, lexicographers have to choose two ... Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
-
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions that can be considered offensive, since such lexical entries are very important for achieving good results in a variety of abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and lexicon creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the construction of a system for ... ... some resources that will facilitate abusive language detection already exist. Serbian Morphological Dictionaries are certainly a staple in processing texts in Serbian (Krstev, 2008). In order to process implicitly abusive language, we need to take into account the usage of non-literal language, the rhetorical ...
... words in each category. 4.2 Lexical Representation of Multi-Word Abusive Expressions In order to enable the detection of abusive language in Serbian texts it is necessary to represent in a lexicon both simple- and multi-word abusive expressions. Lexical representation should address various aspects of ...
... complemented with finite-state automata (FSA) that deal with word order, model complements, etc. and that are used to retrieve verbal expressions in texts. So far three classes of V N were modelled, covering 68 verbal MWEs. This approach enables formulation of elaborate retrieval queries, similar to ... Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ... ... conversion of the lexicon. Przepiórkowski and associates (2007) present results of automatic extraction of term definitions from unstructured texts in Bulgarian, Czech and Polish by use of regular grammars. There are also combinations of the two approaches (Rodríguez et al., 2007). Sag et al ...
... the corresponding DELAS word is assigned to the lemma. 4. For thresholds 80 and less steps 1 and 2 only are repeated. From a sample of domain texts and dictionaries we manually filtered 623 new terms from domains of mining, geology and e-learning and applied the described procedure for FST class ...
... transporting device measured from the vertical excavator rotation axis to the front edge of the caterpillar”. 4.2 Extraction of MWUs from domain texts The extraction of MWUs from a text is preceded by the retrieval of new simple word terms from it and their incorporation in the existing system ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages, which can be used as a multilingual benchmark for evaluating low- and medium-resourced European languages. The focus of this paper is on multi-word expressions and named entities, their recognition in the ELEXIS-sr sentence set and comparison with annotations in other languages. The first steps are discussed ... Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ... ... normalization for logarithm of tflog (the log-number of times the given word appears in a document) for calculating semantic similarity of short texts. Graovac [6] applies lexical resources for ...
... retrieved documents. As we have already pointed out, a Serbian keyword in a search query is almost always entered in the nominative singular, while in the texts that are searched it can occur in different inflectional forms. Thus, for languages such as Serbian, some kind of normalization of morphological forms ...
... in [13] showed that F-measure of recognition was 0.96 for types and 0.92 for tokens. (Footnote 3: Tokens are all occurrences (in this case, NEs) in a given text, types are different occurrences.) Table 2 ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ... ... singular). However, a large number of other forms cannot be found by scanning the text, for example, the form zlata (genitive singular) cannot be aligned with the query keyword zlato (nominative singular). The disadvantage of the system based on text scanning which affects the precision is especially ...
... problems of full text search in Serbian is its rich morphology, where the keyword for search is always entered in the nominative singular, while in the texts that are searched it can occur in different inflectional forms. For languages such as Serbian, some kind of normalization of morphological forms has ...
... text from several records and fields in the database related to a particular document or project; 2. Lemmatizing and Part-Of-Speech tagging of all texts D_i, where i = 1, ..., N and N is the size of text collection; 3. Recognizing NEs and assigning the chosen types to documents; 4. Selecting ungrammatical ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Integracija heterogenih tekstualnih resursa
Ranka Stanković, Ivan Obradović (2007) The paper describes an approach to the integration of heterogeneous textual resources for Serbian using a complex software tool developed specifically for this purpose. The structure and main components of the developed system are described. The possibilities for improving the resources through mutual exchange of information, offered by the developed integrated environment, are also presented. Finally, the possibility of applying integrated heterogeneous resources to query expansion, as well as to text search in general, is described, and some directions for further development are indicated. ... processing of texts, namely resource combining, in particular the combining of morphological information from the dictionaries and semantic information from the wordnet. Finally, we explain how integrated heterogeneous resources can be used for query expansion, as well as for searching texts in general ...
... the system we developed under the name of WS4LR (WorkStation for Lexical Resources), which synchronously handles corpora of Serbian, multilingual aligned corpora, a system of morphological dictionaries for Serbian, the Serbian wordnet and the multilingual ontology of proper names Prolex. We describe ...
... Journal on Information Science and Technology. Bucureş of the Romanian academy. t al. 2003 – Vitas, D. et al. (2003): Processing Serbian Written Texts: An Overview of Resources an (Hg.): Proceedings of the International Workshop on Balkan Language Resources and Tools. Thessaloniki, November 2003 ... Ranka Stanković, Ivan Obradović. "Integracija heterogenih tekstualnih resursa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
-
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ... ... Serbian. In Informatica, No. 28, pp. 431-436, The Slovene Society Informatika, Ljubljana. Krstev, C. (2008). Processing of Serbian – Automata, Texts and Electronic dictionaries. Faculty of Philology, University of Belgrade, Belgrade. Krstev, C. and Vitas, D. (2009) An Effective Methode for Developing ...
... 373-376. Paumier, S. (2008). Unitex 2.1 User Manual, http://www-igm.univ-mlv.fr/~unitex/UnitexManual2.1.pdf. Popović, Z. (2009) Taggers Applied On Texts On Serbian Language, Language Tools And Machine Learning. In Infotheca, Vol. X, No. 2, (to appear). Przepiórkowski, A. and Woliński, M. (2003) A ... Cvetana Krstev, Ranka Stanković, Duško Vitas. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
-
Увођење доменских и семантичких маркера за област рударства у српске електронске речнике
... Jurafsky & James H. Martin, Speech and Language Processing, Draft of November 7, 2016. Krstev 2008: Cvetana Krstev, Processing of Serbian – Automata, Texts and Electronic dictionaries, Faculty of Philology, University of Belgrade, Belgrade. Krstev et al., 2008: Cvetana Krstev, Duško Vitas, Gordana Pav ...
... retrieval and extraction, and proposes an expansion of the set of these markers for the field of mining. A brief description of the developed corpus of texts from the field of mining is also given, for the search of which the proposed markers are extremely important. ... Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Београд : Међународни славистички центар на Филолошком факултету, Филолошки факултет (2017). https://doi.org/10.18485/msc.2017.46.3.ch10
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012) ... among Serbian companies. The first corpus of contemporary Serbian, an electronic morphological dictionary of Serbian, aligned French-Serbian and English-Serbian corpora of literary texts, as well as different software tools were developed in the scope of joint projects of the Faculty of Mathematics and ...
... the electronic dictionary of simple words was finalised, the development of a dictionary of compounds was initiated. Aligned French-Serbian and English-Serbian corpora of literary texts were developed, as well as local grammars for certain segments of Serbian (especially for named entities). Different ...
... countries as well, but also due to the fact that in harmonisation with the European Union the source texts used are texts in English. The use of the Latin alphabet is increasing (except in official texts). Texts in Serbian are increasingly realised in digital form (use of computers, electronic publishing ... Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ... ... One of the first challenges one encounters while trying to solve tasks of automatic recognition of verbal irony is selection of the collection of texts and marking ironic statements in it. For that purpose, online resources, such as Twitter, are used very frequently, where the hashtag #irony can be used ...
... lexical resources. Although resources we are using were developed for Serbian primarily, their development was based on traditional resources and texts covering to certain extent other related languages as well, making them suitable for this task. A language classifier was built and assessed in the ...
... Serbian that combines three NLP tasks: PoS tagging, compound and named-entity recognition [10] (step 5 in Fig. 1) that was trained on various annotated texts – literary, newspaper and textbooks. Tagging results are represented by two previously given sentences (double-underlined are incorrectly tagged words ... Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
-
Preparation of Multimedia Document “YU Rock Scene”
SUMMARY: This study will present the preparation process of a multimedia document entitled YU ROCK SCENE in which participants were senior students of undergraduate studies of the Department of Library and Information Science at the University of Belgrade Faculty of Philology during the academic year 2014/2015, as a part of the subject Multimedia Documents. This study gives an overview of the historical development of rock and roll in the territory of the former Yugoslavia, rock scene in Yugoslav republics, ... ... sound and rock style) and the western, mostly Anglo-American rock. Unlike the world rock scene whose texts were a powerful means of propaganda against wars and class conflicts, original domestic rock texts had visual and musical identity related to youth fantasies, dreams about success, as well as to the ...
... music had much greater presence. Due to the conflict with the Soviet Union during the Cold War, Yugoslavia, one of the founding countries of the Non-Aligned Movement, was more open to the West and all products of pop-culture, especially American pop-culture. Yugoslavia was thus the only Communist country ...
... system of Yugoslavia. Until the emergence of punk and the New Wave the main topic of lyrics was love. With the development of punk and the New Wave the texts gained new breadth and complexity which was at variance with the social, cultural and political norms of the time. Although the New Wave was equated ... Milena Obradović, Aleksandra Arsenijević, Mihailo Škorić. "Preparation of Multimedia Document “YU Rock Scene”" in Infotheca - Journal for Digital Humanities, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2016.16.1_2.6
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanities, derogatory speech and hate speech, has reached the level of a pandemic. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ... ... in social media, including profanities, derogatory and hate speech, has reached the level of a pandemic. A system that would be able to detect such texts could help in making the Internet and social media a better and more respectful virtual space. Research and commercial application in this area were ...
... written in Serbian. The resulting data set had 6,436 tweets and this set was used for annotation. Twitter data differs significantly from other types of texts, e.g. books or newspaper articles, meaning that there are specific issues that have to be considered when processing such data. Some of them are: 1 ...
... and current circumstances to understand and annotate the message. In the next phase, we plan to extend the AbCoSER corpus with new tweets and with texts from other sources e.g. online news comments. Meanwhile, we started developing models for the automatic classification of abusive tweets and the first ... Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... Keywords: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion ... run on any personal computer under Windows and supports simultaneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... and the wordnet can be used in production of concordances for aligned texts. (Footnote 1: LeXimir is available under CC NC BY licence. For more information see http://korpus.matf.bg.ac.rs/soft/LeXimir.html) (Fig. 3. LeXimir’s editor for MWU dictionaries.) On the other hand, it enables the use of LeXimir Core in different ...
... ” in Proceedings of HLT/EMNLP on Interactive Demonstrations, ser. HLT-Demo ’05, 2005, pp. 10–11. [4] C. Krstev, Processing of Serbian – Automata, Texts and Electronic Dictionaries. Belgrade: Faculty of Philology, University of Belgrade, 2008. [5] A. Savary, “Computational Inflection of Multi-Word Units ... Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)