Search
385 items
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012) ... sounds uttered by a user. [Figure 9: Speech-based dialogue system; labels: Speech Input, Signal Processing, Speech Output, Speech Synthesis, Phonetic Lookup & Intonation Planning, Natural Language Understanding & Dialogue, Recognition] 2. Natural language understanding analyses the syntactic structure of a ...
... http://transpoetika.org. [31] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Prentice Hall, 2nd edition, 2009. [32] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. [33] Ronald Cole, Joseph Mariani, Hans Uszkoreit ...
... linguistics and psychology studies at the Universities of Belgrade and Novi Sad. Courses offered to students cover the basic concepts of natural language processing, but they aim to educate students for other professions. As part of undergraduate studies at the Faculty of Mathematics in Belgrade, courses ... Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007) In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ... ... Grass, D. Maurel and O. Piton. Description of a Multilingual Database of Proper Names. Lecture Notes in Computer Science, Advances in Natural Language Processing, Third International Conference, PorTAL, June 2002, Faro, Portugal, 23-26, Springer, Berlin, Vol. 2389, pp. 31-36, 2002. [8] A. Horák ...
... Wordnet Development Using a Multifunctional Tool | Ivan Obradović, Ranka Stanković | Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007 | 2007 | | http://dr.rgf.bg.ac.rs/s/repo/item/0005258 Дигитални репозиторијум Рударско-геолошког ...
... match in the target language, regardless of whether these target language synsets have previously been retrieved from the wordnet by the user or not, and which PWN synsets do not have a match. The latter are obviously candidates for new synsets in the target language. Figure 9. ... Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007 (2007)
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020) In this paper, we present an approach to transforming Serbian morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ... ... used for natural language processing - NLP. [Footnotes: 3 TEI, 4 LMF, 5 Lemon] [Infotheca Vol. 19, No. 2, December 2019, Scientific paper] The LMF prescribes a standardized framework for recording linguistic information in computer lexicons and is based on the Standard ISO 24613: 2008 (Language Resource Management ...
... Markup Framework - LMF). LMF is designed for lexicons specially designed for Natural Language Processing and Machine-Readable Dictionaries. LMF specification is represented as a subset of UML (Unified Modeling Language) language that provides linguistic description. The LMF consists of mandatory Core ...
... linguistic resource for languages with rich flexion. Therefore, Serbian morphological dictionaries represent a significant resource for Serbian language processing. The importance of this resource is in its multiple applications. Although Serbian morphological dictionaries (SMD) were initially developed ... Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ... Keywords: raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data ... of The 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; European Language Resources Association: Marseille, France, 2020; pp. 3954–3962. 36. Schmid, H. Improvements in Part-of-Speech Tagging with an Application to German. In Natural Language Processing Using Very ...
... presents a data driven approach aimed at using opportunities offered by electronic lexicography, as well as various available techniques of Natural Language Processing (NLP), to develop a semi-automatic pipeline for dictionary production. The approach is focused on raw material terminology, with an emphasis ...
... M., Pereira, I., Kallas, J., Jakubíček, M., Krek, S., Tiberius, C., Eds.; Lexical Computing: Brno, Czechia, 2019; pp. 1–3. 5. Krek, S. Natural Language Processing and Automatic Knowledge Extraction for Lexicography. Int. J. Lexicogr. 2019, 32, 115–118. 6. Van der Merwe, M.F.; Horn, K. Mobile ... Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
Нове технологије за оживљавање старих текстова
Keywords: distant reading, literary corpus, Serbian language processing, part-of-speech annotation, lemmatization, named entities. Цветана Крстев, Ранка Станковић, Бранислава Шандрих Тодоровић, Милица Иконић Нешић. "Нове технологије за оживљавање старих текстова" in Зборник радова Међународне научне конференције Дигитална хуманистика и словенско културно наслеђе II, Београд, 28-29 јуни 2021., Београд : Савез славистичких друштава Србије (2023)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ... ... field of Natural Language Processing. Initially, MWT extraction from domain texts has been tackled mainly using the statistical approach based on different statistical measures, following the seminal work of Kenneth Church and Patrick Hanks (1990; 1991) and Frank Smadja (1993). A language independent ...
... Arabic multi-word term extraction. Proc. Of the Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009, pp. 1--8. Broda, B., Derwojedowa, M., and Piasecki, M. (2008). Recognition of structured collocations in an inflective language. Systems Science, 34, pp. 27--36. Chen, J., Yeh ...
... Using Log-Likelihood Based Comparison with General Reference. In: Hopfe, C. J., Rezgui, Y., Métais, E., Preece, A., Li, H. (Eds.), Natural Language Processing and Information Systems. Berlin: Springer, pp. 248--255. Koeva, S. (2007). Multi-word term extraction for Bulgarian. In Proc. of the ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ... Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection ... ing terminology. The work presented in this paper is motivated by our belief that Natural Language Processing (NLP) resources, methods and tools can help in the development of terminology in the Serbian language. Our work relies on the following presuppositions: 1. Serbian terminology is today ...
... The English Language in the Digital Age. META-NET White Paper Series. Georg Rehm and Hans Uszkoreit (Series Editors). Springer. Available online at http://www.meta-net.eu/whitepapers. Baldwin, T. and Kim, S. N. (2010). Multiword expressions. Handbook of natural language processing, 2:267–292. ...
... http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases [Footnote 14: More about Natural Language Toolkit for Python and its WordNet interface can be found at http://www.nltk.org/howto/wordnet.html] Before continuing to the next processing step, we defined “match” relation between chunks as follows: Let a chunk ... Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Softverski alati za korišćenje resursa za srpski jezik
Ivan Obradović, Ranka Stanković (2008) ... solution of various tasks related to processing of texts in e-form much easier. One of the tools, with the acronym WS4LR (Workstation for Lexical Abstract: In this paper we describe how lexical resources for Serbian, developed within the Human Language Technology Group, such as various types ...
... Serbian, its usage is not language dependent. The only precondition is that resources exist, that is, that they are being developed in the appropriate formats. 3.2 Management of dictionaries Initially, the Intex system (Silberztein, 1993) was used for text processing using dictionaries in LADL ...
... Leipzig 2003, Zybatow, Gerhild et al. (eds.), Peter Lang: Frankfurt am Main, pp. 3-17. Mohri, M. (1997) “Finite-state transducers in language and speech processing”, Computational Linguistics, vol. 23, no. 2, pp. 269–311. Ohmori K., Higashida M. (1999) ... Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction
Velislava Stoykova, Ranka Stanković (2018) Velislava Stoykova, Ranka Stanković. "Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction" in Advances in Intelligent Systems and Computing, Springer International Publishing (2018). https://doi.org/10.1007/978-3-319-91189-2_16
-
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present a Serbian literary corpus that is being developed under the umbrella of the COST Action “Distant Reading for European Literary History” (CA16204). Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different named entity types, with a convolutional neural network (CNN), which achieves an F1 score of ≈91% on the test dataset. This model was further assessed on a separate evaluation dataset. We conclude with a comparison ... ... Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić | Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications | 2021 | | 10.26615/978-954-452-072-4_141 http://dr.rgf.bg.ac.rs/s/repo/item/0005139 ...
... access, as well as the employees' publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Proceedings of Recent Advances in Natural Language Processing, pages 1252–1260 Sep 1–3, 2021. https://doi.org/10.26615/978-954-452-072-4_141 1252 Serbian NER&Beyond: The Archaic and the Modern In ...
... leção ELTeC-por. Linguamática, 12(2):29–49. Helmut Schmid. 1999. Improvements in Part-of-Speech Tagging with an Application to German. In Natural language processing using very large corpora, pages 13–25. Springer. Satoshi Sekine, Masako Nomoto, Kouta Nakayama, Asuka Sumida, Koji Matsuda, and Maya Ando ... Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
-
An Approach to Development of Bilingual Lexical Resources
... Ivan, Trtovac Aleksandra | Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012 | 2012 | | http://dr.rgf.bg.ac.rs/s/repo/item/0001462 Дигитални репозиторијум Рударск ...
... Polish Information Processing Society, ISBN 978-83-60810-47-7 [8] Stanković, R., Krstev, C., Obradović, I., Trtovac, A., Utvić, M. 2012. A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation ...
... Multilingual textual repositories, such as digital libraries of e-journals represent a specific type of language resources. Efficient search of these resources usually relies on specific language tools, which often use other available resources, such as e-dictionaries, wordnets and the like. An ... Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
Combining Heterogeneous Lexical Resources
... Gordana Pavlović-Lažetić, professor, Faculty of Mathematics, Belgrade, gordana@matf.bg.ac.yu Abstract One of the main tasks of the Natural Language Processing Group at the Faculty of Mathematics, University of Belgrade is the development of various lexical resources. Among them the two most important ...
... their more effective retrieval, integration, and reuse across various Web applications. 1 Introduction One of the main tasks of the Natural Language Processing Group at the Faculty of Mathematics, University of Belgrade is the development of various lexical resources. Among them the two most ...
... International Wordnet Conference, Mysore, India. - Vitas, D. et al. (2003). Resources and Basic Tools for the Processing of Serbian Written Texts. Proc. of the Workshop on Balkan Language Resources, 1st Balkan Conference in Informatics. - Vossen, P. (ed.) (1998). EuroWordNet: A Multilingual Database ... Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ... ... Terminological Database Using a Transducer Cascade. Proc. of Recent Advances in Natural Language Processing. (pp. 17-23). Baldwin, T., & Kim, S. N. (2010). Multiword expressions. Handbook of Natural Language Processing, second edition. (267-292): CRC Press. Cerbah, F., & Daille, B. (2007). A Service ...
... Preece & H. Li (Eds.), Natural Language Processing and Information Systems (Vol. 6177, pp. 248-255): Springer Berlin Heidelberg. Justeson, J. S., & Katz, S. M. (1995). Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1 (01): ...
... approaches (Rodríguez et al., 2007). Sag et al. reported that modern statistical Natural Language Processing (NLP) is in great need of better language models and linguistic tools must come to [Footnote 1: Corpus processing System Unitex: http://www-igm.univ-mlv.fr/~unitex/] Proceedings of the conference ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ... ... Entity Recognition Systems for Serbian - The Case of Personal Names | Branislava Šandrih, Cvetana Krstev, Ranka Stanković | Proceedings - Natural Language Processing in a Deep Learning World | 2019 | | 10.26615/978-954-452-056-4_122 http://dr.rgf.bg.ac.rs/s/repo/item/0005243 Дигитални репозиторијум ...
... access, as well as the employees' publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Proceedings of Recent Advances in Natural Language Processing, pages 1060–1068, Varna, Bulgaria, Sep 2–4, 2019. https://doi.org/10.26615/978-954-452-056-4_122 1060 Development and Evaluation of Three ...
... Finkel and Christopher D. Manning. 2009. Nested Named Entity Recognition. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics, pages 141–150. [Footnote 10: Gemini, https://github.com/fyh828/gemini/] Nathalie ... Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009) ... Stanković, “Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases”, Polibits, Special section: Natural Language Processing, Journal of Research and Development in Computer Science and Engineering, G. Sidorov (ed.), Centro Innovacion y Desarrollo T ...
... it offers possibilities for processing and visualization of aligned texts, conversions from one coding scheme to another, and migration from one resource format to another. Although WS4LR has mainly been used for resources in Serbian, it is not a language dependent tool. All of its func- ...
... R. Stanković, “Wordnet Development Using a Multifunctional Tool”, Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, C. Orasan, S. Kuebler (eds.), pp. 25-32, September 2007. [12] C. Krstev, R. Stanković, D. Vitas, I ... Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017) The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ... ... of the field and proposal of research framework. In The Seventeenth Annual Meeting of The Association for Natural Language Processing, 1159–1162. The Association for Natural Language Processing, 2011. Ptaszynski, Michael, Pawel Dybala, Radosalw Komuda, Rafal Rzepka, and Kenji Araki. Development of Emoticon ...
... according to the mood that is expressed in them to help find the necessary information. – Natural language understanding and analysis: understanding of written text and text queries, analysis of moods in the text, processing of digital linguistic resources such as automatic parallelization and automation of ...
... expanded, and in 2011 a paper was published on the research in which emoticons were defined as parts of natural language, so it was suggested that their research should be included in natural language research (Ptaszynski et al., 2011). 1.2 Basic information about the experiment Goal of the experiment ... Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ... ... token its Part-of-Speech category (noun, verb, adjective, etc.) is a common Natural Language Processing (NLP) task, known as Part-of-Speech tagging (PoS-tagging). PoS-tagging precedes many other Natural Language Processing tasks, such as Text Classification, Named Entity Recognition, Sentiment Analysis ...
... Conference on Language Resources and Evaluation (LREC’16), pages 4264–4270, Portorož, Slovenia, May. European Language Resources Association (ELRA). Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., and McClosky, D. (2014). The Stanford CoreNLP Natural Language Processing Toolkit. ...
... the scope of the project “Integrated European language data Repository Area” (Gavrilidou et al., 2006). It contains texts from law, health and education domains. Švejk, Floods, History are three short [Footnote 1: Unitex/GramLab - Cross Platform Corpus Processing Suite, https://unitexgramlab.org/] [Footnote 2: The category ...] Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
SrpELTeC: A Serbian Literary Corpus for Distant Reading
This article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared and annotated following the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions in preparing SrpELTeC from scratch are outlined. All novels were manually encoded in TEI with rich metadata and structural annotation. Automatic annotation included POS tagging, lemmatization and named entities, relying on resources for processing ... Keywords: digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics. Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ... ... database. In Natural language processing, multilinguality, pages 72–80. Gliha Komac, N., Jakop, N., Ježovnik, J., Kern, B., Klemenčič, S., Krvina, D., Ledinek, N., Meterc, M., Michelizza, M., Pavlič, M., et al. (2016). eSSKJ: Dictionary of the Slovenian Standard Language. ZRC SAZU, 3rd ...
... are dictionaries. Dictionaries form an important foundation of numerous natural language processing (NLP) tasks, including word sense disambiguation, machine translation, question answering and automatic summarization. However, the task of combining dictionaries from different sources is difficult ...
... WordNet for increased domain coverage. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 883–892. Meyer, C. M. (2010). How web communities analyze human language: Word senses in Wiktionary. In Second Web Science Conference. Miles, A. and Bechhofer, S ... Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos (2024) We present ongoing work on defining a lexicon-corpus interface that will serve as a reference for representing multiword expressions (of various types: nominal, verbal, etc.) in dedicated lexicons and for linking these entries to their occurrences in corpora. The final aim is to use such resources for the automatic identification of multiword expressions in text. The involvement of several natural languages aims at a universal solution that is not centered on a particular language, as well as at accommodating idiosyncrasies. Challenges in the lexicographic description of multiword ... Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos. "Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)