Search
385 items
-
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the Hurtlex lexicon, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (polylexemic units) that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the construction of a system for ...... aggressive language detection. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 106–112. Cvetana Krstev, Sandra Gucul, Duško Vitas, and Vanja Radulović. 2007. Can we make the bell ring? In Proceedings of the Workshop on a Common Natural Language Processing Paradigm ...
... statements, or actions. This might include hate speech, derogatory language, profanity, toxic comments, racist and sexist statements.’ Computational processing of such language requires usage of finely-tuned, task specific language tools and resources, especially for morphologically rich and low-resource ...
... that will facilitate abusive language detection already exist. Serbian Morphological Dictionaries are certainly a staple in processing texts in Serbian (Krstev, 2008). In order to process implicitly abusive language, we need to take into account the usage of non-literal language, the rhetorical devices that ...Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
-
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020) This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.... First Workshop on Recent Advances in Slavonic Natural Language Processing, Sojka, P. and A. Horák, 65–70. Brno: Masaryk University, 2007 Schmid, Helmut. “Probabilistic Part-of-Speech Tagging Using Decision Trees”. In New Methods In Language Processing, Jones, D. B. and H. Somers, Chapter 12, 154–164 ...
... Stanković R. and Utvić M., “Vebran Web Service . . . ”, pp. 99–118 Sections 2 and 3 describe language resources for Serbian, corpora that we can search and lexical resources that Natural Language Processing (NLP) applications can consult. Vebran web services and their usage of lexical resources are ...
... 1997 Schmid, Helmut. “Improvements in Part-of-Speech Tagging with an Application to German”. In Natural Language Processing Using Very Large Corpora, Armstrong, S. et al. Text, Speech and Language Technology, Vol. 11, Chapter 12, 154–164. Dordrecht: Springer, 1999, Stanković, Ranka, Cvetana Krstev ...Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
-
GIS Application Improvement with Multilingual Lexical and Terminological Resources
... a workstation for language resources, named WS4LR, which greatly enhances the potential of manipulating each particular resource as well as several resources simultaneously (Krstev et al., 2008). This tool has already been successfully used for various language processing related tasks including ...
... Vitas D., G. Pavlović-Lažetić, C. Krstev, Lj. Popović, I. Obradović (2003): „Processing Serbian Written Texts: An Overview of Resources and Basic Tools“, Proceedings of the International Workshop on Balkan Language Resources and Tools, Thessaloniki, Greece, November 2003, S. Piperidis, V. K ...
... „Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases“, Polibits (37) 2008, Special section: Natural Language Processing, Journal of Research and Development in Computer Science and Engineering, ed. Grigori Sidorov, Centro Innovacion y Desarrollo T ...Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta, May 2010, Valetta, Malta : European Language Resources Association (2010)
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bi-lingual, with papers simultaneously being in English ...... of Philology, Her scientific field is Human Language Technologies (HLT) and technology enhanced learning (TEL). She published one book and more than 100 scientific papers, most of them related to natural language processing, more specifically to language resources development and their application ...
... infinite amount of digital information, new methods such as data mining, text mining, content management, search engines, spidering programs, natural language searching, linguistic analysis, semantic networks, knowledge extraction, etc. should be a part of recent developments in knowledge management ...
... components In designing Bibliša special attention is given to its language support component. It supports various aspects of multilingual libraries: its content is not only multilingual, but also aligned and it can be searched in any language. The proposed tool basically consists of the following components: ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
-
Towards translation of educational resources using GIZA++
... Koehn. “The Edinburgh/JHU Phrase-based Machine Translation Systems for WMT 2015”. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, September 2015 [17] G. Johannes, S. Clematide, and M. Volk. "Efficient Exploration of Translation Variants ...
... the program GIZA which was developed by the Statistical Machine Translation team during the summer workshop in 1999 at the Center for Language and Speech Processing at Johns-Hopkins University (CLSP/JHU). GIZA++ includes a lot of additional features. The extensions of GIZA++ were designed ...
... ~/corpus/edX.clean 1 80 Language Model Training A language model (LM) is used to ensure fluent output, built with the target language, in our case English. Following script creates lm folder, positions in it and finally execute command that will build an 3-gram language model. mkdir ~/lm cd ...Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...... Przepiórkowski (eds.), Polish Information Processing Society, ISBN 978-83-60810-47-7 Tiedemann, J. (2009). News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In: Recent Advances in Natural Language Processing (vol. 5) (pp 237-248), N. Nicolov and K. Bontcheva ...
... Corpora for Cross-Language Information Retrieval - US Patent 7,146,358 B1 - Google Patents. Kovačević, Lj., Injac, V., Begenišić, D. (2004). Bibliotekarski terminološki rečnik - englesko-srpski, srpsko-engleski, Beograd: Narodna biblioteka Srbije. Krstev, C. (2008). Processing of Serbian – Automata ...
... text in the first TUV is usually in the source language, and the texts in the remaining TUVs are in one or more target languages. Although the order of languages is the same in each TU, there is a TUV attribute xml:lang that denotes the language of the text within the TUV. The performance of ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...... the general public. References Ahačič, K., Ledinek, N., & Perdih, A. (2015). Fran: The Next Generation Slovenian Dictionary Portal. In Natural Language Processing, Corpus Linguistics, Lexicography. Eighth International Conference Bratislava, Slovakia, pp. 21-22. Berg, D. L., Gonnet, G. H., & Tompa ...
... Речника, Београд: Институт за српск(охрватск)и језик САНУ (рукопис), 1959. и (допуњено) 2017 [A Handbook for Dictionary Processing, Belgrade: Institute for Serbo(-Croatian) language SASA (manuscript), 1959 and (supplement) 2017]. 1 / 9 942 Proceedings of the XViii ...
... database, language resources, dictionary, Serbian language 1 Introduction The first volume of the Dictionary of the Serbo-Croatian Standard and Vernacular Language (referred to as the Dictionary of Serbian Academy or DSA), prepared and compiled by the Institute for the Serbian Language of the Serbian ...Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...... a Contextual Pragmatic Model to Detect Irony in Tweets, In 53rd Annual Meeting of the ACL and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of NLP (ACL 2015). Volume 2: Short Papers, 644–650. [20] Svetla Koeva, Cvetana Krstev, and Duško Vitas. 2008. M ...
... either independent of or specific to a particular natural language that is being investigated. For example, authors in [31] used a corpus of tweets in Portuguese and patterns specific to the Portuguese language so que, sim, na boa, as well as language independent ones, like (ADV+ADV|ADJ+ADJ)3 and ...
... ironic constructs. There are not many direct antonyms in a natural language, therefore, their number is also small in the lexical-semantic network WordNet, compared to other relations. Also, indirect antonyms are often used in natural language, that is to say, synonyms of direct antonyms – e.g. in Princeton ...Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
-
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
The paper presents the results of research related to the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization, and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data to RDF, and the incorporation of NIF annotations. The produced NIF files were evaluated by exploring the triplestore with SPARQL queries. Finally, the linking of Linked ... parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata ... Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
WS4LR - a Workstation for Lexical Resources
... Spanish, Norwegian, Arabic, German, Polish, Bulgarian, and Serbian. The Intex2 , Unitex3 and Nooj4 systems for natural language processing based on linguistic resources provide for text processing using this type of dictionaries, but offer no facilities for dictionary development and management. WS4LR ...
... criteria in the source language are highlighted (Figure 5). Figure 4. The form for expansion of the search criteria The user can also use the translation equivalence option which is aimed at locating equivalences in target language for occurrences found in the source language. This is done on ...
... ble + target_TextRow + target_TextRowChangeEvent 1695 5. Conclusions Although WS4LR has been used mainly for Serbian language resources, it is by no means language dependent. The only prerequisite is that the resources exist or are being developed according to the described formats and ...Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...... guages with a deficient natural language processing support. Decis. Support Syst. 55(3), 710–719 (2013) 6. Graovac, J.: Wordnet-based serbian text categorization. INFOtheca 14(2), 2a–17a (2013) 7. Gross, M.: The use of finite automata in the lexical representation of natural language. In: Gross, M ...
... Serbian Language in the Digital Age. In: Rehm and Uszkoreit [20] (2012). http://www.meta-net.eu/whitepapers 27. Zečević, A., Stanković-Vujičić, S.: Language identification–the case of Serbian. In: Pavlović-Lažetić, G., Krstev, C., Vitas, D., Obradović, I. (eds.) Natural Language Processing for ...
... morphological electronic dictionaries and finite-state transducers for Serbian [12]. 3.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [3,7] ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...... assigning of surrogates is usually done by extracting and selecting terms (words) that appear in the text of documents. To that end, many natural language processing (NLP) methods and techniques are used: determining the boundaries of sentences, tokenization, stemming, tagging, recognition of nominal phrases ...
... morphological electronic dictionaries and finite state transducers for Serbian [6]. 4.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [1] ...
... models for information retrieval. Taaluitgeverij Neslia Paniculata (2001) 4. Jackson, P., Moulinier, I.: Natural language processing for online applications: Text retrieval, extraction and categorization, vol. 5. John Benjamins Publishing (2007) 5. Kešelj, V., Šipka, D.: A Suffix Subsumption-Based Approach ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Fourth Summer Datathon on Linguistic Linked Open Data
Tijana Radović, Ranka Stanković (2023) The 4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22) was held in Spain, in Cercedilla near Madrid, in May 2022, and organized by the COST Action NexusLinguarum. The school gathered interested researchers, academics, students who wanted to acquire and/or expand their knowledge in the field of linguistic linked data science. During the school, a spectrum of topics from the field of linked data was presented, from various ontologies, through document integration, annotation and natural language text processing tools ...Tijana Radović, Ranka Stanković. "Fourth Summer Datathon on Linguistic Linked Open Data" in Infotheca, Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.6
-
Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom
... limited ability to process but a vast ability to store things in our brains. By having a huge store of ready-made language, available through flashcards, we are saving our limited processing capacity for dealing with other cognitive tasks. If this position holds then computer assisted learning with ...
... Littlewood, William, and Baohua Yu. 2011. First language and target language in the foreign language classroom. Language Teacher 44: 64-77. Llach, Agustin 2009. The role of Spanish L1 in the vocabulary use of CLIL and non-CLIL EFL learners. In Content and language integrated learning: Evidence from research ...
... and Ali Aldosari. 2010. Learners’ use of first language (Arabic) in pair work in an EFL class. Language Teaching Research 14: 355-375. Swain, Merrill, and Sharon Lapkin. 2000. Task-based second language learning: the uses of the first language. Language Teaching Research 4: 251-274. ...Lidija Beko, Ivan Obradović, Ranka Stanković. "Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom" in Proceedings of the Second International Conference on Teaching English for Specific Purposes and New Language Learning Technologies, May, 22-24, 2015, Niš, Serbia, Faculty of Electronic Engineering, University of Niš, Niš : Faculty of Electronic Engineering (2015)
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion ... Finally, we discuss some further possible applications of our procedure and LeXimir in language processing tasks. I. INTRODUCTION MORPHOLOGICAL electronic dictionaries of Serbian for natural language processing (NLP) are being developed for many years now. Their development follows the methodology ...
... Their Automatic Processing,” Bulag — Bulletin de Linguistique Appliquée et Générale, vol. 32, pp. 73–94, 2007. [9] A. Savary, J. Rabiega-Wisniewska, and M. Wolinski, “Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex,” in Aspects of Natural Language Processing, ser. Lecture Notes ...
... even greater in the forthcoming period, as many new MWU lists are being prepared. The benefits obtained by including the MWU dictionary in language processing tasks for Serbian are already clearly visible. Besides the benefits that were to be expected, it has been already shown that the MWU dictionary ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
Речници у дигиталном добу - информатичка подршка за српски језик [Dictionaries in the Digital Age: IT Support for the Serbian Language]
Биљана Рујевић (2022) Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were kept as files whose number kept growing, which made the dictionaries harder to manage, a need arose to store the dictionary information in a lexicographic database. To allow several users to work on dictionary development simultaneously, a need arose for a web application based on the lexicographic database. In order to consider ...Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating low- and medium-resourced European languages. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and their comparison with annotations in other languages. The paper discusses the first steps ...Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
Improvement of geodatabase queries within GeolISS
Ranka Stanković (2008)... adjustable tool, a workstation for language resources, labeled WS4LR, which greatly enhances the potential of manipulating each particular resource as well as several resources simultaneously [9]. This tool has already been successfully used for various language processing related tasks including query ...
... [8] Vitas D., G. Pavlović-Lažetić, C. Krstev, Lj. Popović, I. Obradović (2003): „Processing Serbian Written Texts: An Overview of Resources and Basic Tools“, Proceedings of the International Workshop on Balkan Language Resources and Tools, Thessaloniki, Greece, November 2003, S. Piperidis, V. Ka ...
... Belgrade) IMPROVEMENT OF GEODATABASE QUERIES WITHIN GEOLISS Abstract: We present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for improvement of queries for the geodatabase within the Geological information system ...Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
-
Keyword Extraction from Parallel Abstracts of Scientific Publications
... Organ. Sci. 39(1), 1–20 (2015) 2. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of Empirical Methods in Natural Language Processing - EMNLP 2004, pp. 404–411. ACL, Barcelona (2004) 3. Marujo, L., Viveiros, M., Neto, J.P.: Keyphrase cloud generation of broadcast news. ...
... LNCS, vol. 10151, pp. 124–135. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53640-8_11 11. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media, Inc., Sebastopol (2009) 12. Balakrishnan, V., Ethel, L.-Y.: Stemming and lemmatization: a comparison of retrieval ...
... particular language. Keyword Extraction from Parallel Abstracts of Scientific Publications 47 2.2 Text Preprocessing Tools Serbian is a highly inflectional Slavic language. Although we use the keyword extraction method designed with light or no linguistic knowledge, some text preprocessing is needed ...Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...... wordnet with wn-toolkit and cro-deriv. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pages 480–487. Simões, A., Gómez, X. G., and Almeida, J. J. (2016). Enriching a portuguese wordnet using synonyms from a monolingual dictionary. In Chair), N. C. C. ...
... International Language Resources and Evaluation (LREC’10), Valletta, Malta, may. European Language Resources Association (ELRA). Matuschek, M. and Gurevych, I. (2013). Dijkstra-WSA: A graph-based approach to word sense alignment. TACL, 1:151–164. Mladenović, M. and Mitrović, J. (2014). Natural Language ...
... Microsoft language portal10 has published Microsoft Terminology Collection data in the form of a .tbx (ISO 30042:2008) file containing: Concept ID, Definition, Source term, Source language identifier, Target term, Target language identifier. The number of terms differ from language to language, due to ...Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)