Search ⚒ Papers ⚒ DR RGF - RGF Repository

Search

1017 items

A System for Named Entity Recognition Based on Local Grammars

Krstev Cvetana, Obradović Ivan, Utvić Miloš, Vitas Duško (2014)

Krstev Cvetana, Obradović Ivan, Utvić Miloš, Vitas Duško. "A System for Named Entity Recognition Based on Local Grammars" in Journal of Logic and Computation 24, no. 2, Oxford University Press (2014): 473-489. https://doi.org/10.1093/logcom/exs079
Combining Heterogeneous Lexical Resources

Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić (2004)

development of lexical resources, morphological dictionaries, WordNet

... Lexical Resources Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF] Combining Heterogeneous Lexical Resources | Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović ...
... Heterogeneous Lexical Resources Cvetana Krstev, professor, Faculty of Philology, Belgrade, cvetana@matf.bg.ac.yu Duško Vitas, professor, Faculty of Mathematics, Belgrade, vitas@matf.bg.ac.yu Ranka Stanković, assistant, Faculty of Mining and Geology, Đušina 7, Belgrade, ranka@rgf.bg.ac.yu Ivan ...
... automatically. Bibliography - Silberztein, M. (2000). INTEX Manual, Paris: Asstril. - Stamou S., et al.: (2002). BALKANET: A Multilingual Semantic Network for Balkan Languages. Proceedings of 1st International Wordnet Conference, Mysore, India. - Vitas, D. et al. (2003). Resources and Basic Tools ...
Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
Distribution of canonical syllable types in Serbian

Obradović Ivan, Obuljen Aljoša, Vitas Duško, Krstev Cvetana, Radulović Vanja (2010)

Obradović Ivan, Obuljen Aljoša, Vitas Duško, Krstev Cvetana, Radulović Vanja. "Distribution of canonical syllable types in Serbian" in Text and Language, Structures · Functions · Interrelations. Quantitative Perspectives, P. Grzybek, E. Kelih, J. Mačutek (eds.), Wien:Praesens Verlag (2010): 145-157
Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić (2022)

In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of re-thinking the European literary history. We present the various steps that led to the production of the Serbian sub-collection: the novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published ...

Corpus, Distant Reading, Digital Humanities, Linked Data, Named Entity Recognition, Text Analytics

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić. "Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection" in Proceedings of the Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022)
Bilingual lexical extraction based on word alignment for improving corpus search

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović (2019)

Library and Information Sciences, Computer Science Applications

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
Resource-based WordNet Augmentation and Enrichment

Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev (2018)

In this paper we present an approach to support the production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...

WordNet, bilingual resources, term alignment, parallel lists

... Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF] Resource-based WordNet Augmentation and Enrichment | Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev | Proceedings of the ...
... Krstev, C., Stanković, R., Vitas, D., and Obradović, I. (2006). WS4LR: A Workstation for Lexical Resources. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, pages 1692–1697. Krstev, C., Stanković, R., and Vitas, D. (2010). A Description of M ...
... ac.rs ivano@rgf.bg.ac.rs Miljana Mladenović College for Preschool Teachers Bujanovac, Serbia ml.miljana@gmail.com Cvetana Krstev and Marko Vitas Faculty of Philology University of Belgrade, Serbia cvetana@matf.bg.ac.rs vitas.marko@gmail.com Abstract In this paper we present an approach ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
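The abstract above says that candidate literals for Serbian synsets were obtained from a list of translational equivalents. The Python sketch below shows one simple way such candidates could be proposed; it is not the authors' system. The toy `BILINGUAL` list, the function name `candidate_literals`, and the ranking heuristic (prefer Serbian words offered for several English literals of the same synset) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical English-Serbian translational equivalents (toy data).
BILINGUAL = {
    "car": ["automobil", "kola"],
    "automobile": ["automobil"],
    "auto": ["auto", "automobil"],
}

def candidate_literals(english_literals, bilingual):
    """Propose Serbian literals for one PWN synset, most-supported first.

    Assumed heuristic: a Serbian word offered as a translation of several
    English literals of the same synset is a stronger candidate.
    """
    support = Counter()
    for literal in english_literals:
        # set() so that one English literal supports a candidate at most once
        for serbian in set(bilingual.get(literal, [])):
            support[serbian] += 1
    # Descending support, then alphabetical order for a stable result.
    return sorted(support, key=lambda w: (-support[w], w))

# English literals of a synset; "motorcar" has no entry in the toy list.
print(candidate_literals(["car", "auto", "automobile", "motorcar"], BILINGUAL))
```

In a workflow like the one described, such a ranked list would only be a starting point for expert post-editing.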
From DELA Based Dictionary to Leximirka Lexical Database

Biljana Lazić, Mihailo Škorić (2020)

In this paper, we present an approach to transforming Serbian morphological dictionaries from the DELA text format into a lexical database dubbed Leximirka. Considering the benefits of storing data in a database rather than in text documents, we outline some of the functionality that the database has made possible. We also show how hand-made rules that use the category labels assigned to lexical entries can be used to link lexical entries. ...

Morphological dictionaries, language resources, Leximirka

... Mining and Geology Belgrade, Serbia 1 Introduction Prof. Dr. Dusko Vitas and Prof. Dr. Cvetana Krstev started working on the development of Serbian morphological dictionaries more than 25 years ago (Vitas, 1993; Krstev, 1997; Vitas et al., 1993). Morphological dictionaries represent a significant linguistic ...
... no. 6 (2018): 993–1009, URL https://doi.org/10.1108/EL-11-2017-0239 Vitas, Duško. “Matematički model morfologije srpskohrvatskog jezika (imenska fleksija)” [A mathematical model of Serbo-Croatian morphology (nominal inflection)]. PhD thesis, University of Belgrade, Faculty of Mathematics, 1993 Vitas, Duško, Gordana Pavlovic-Lažetić and Cvetana Krstev. “Electronic ...
... “bibliotekar” is among the 10,000 most frequent words in the Serbian Corpus of the Serbian Language SrbCorp (version of 122 million words by Duško Vitas and Miloš Utvić)6. Information about the Corpus is stored in the KorpusMeta table. The LexicalRelation table stores information 6 Corpus of the Serbian ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
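The paper above describes moving DELA-format dictionaries into a database. As a rough sketch of that kind of conversion, the Python code below parses simplified DELAF-style lines (`form,lemma.CATEGORY+semantic:inflection1:inflection2`) and loads them into an in-memory SQLite database. The two-table schema, the function names, the sample entries and their grammatical codes are hypothetical and do not reproduce Leximirka's actual tables; escaped characters that the real format allows are ignored.

```python
import sqlite3

def parse_delaf(line):
    """Split a simplified DELAF-style line 'form,lemma.CAT+sem:infl1:infl2'.

    Assumption: no escaped ',' or '.' characters inside forms or lemmas.
    """
    form, rest = line.split(",", 1)
    lemma, codes = rest.split(".", 1)
    if not lemma:  # an empty lemma means "same as the inflected form"
        lemma = form
    grammatical, *inflections = codes.split(":")
    category, *semantic = grammatical.split("+")
    return form, lemma, category, semantic, inflections

def load(lines):
    """Load DELAF-style lines into a hypothetical lemma/form schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lemma (
            id INTEGER PRIMARY KEY,
            lemma TEXT NOT NULL,
            category TEXT NOT NULL,
            semantic TEXT,
            UNIQUE (lemma, category)
        );
        CREATE TABLE form (
            form TEXT NOT NULL,
            lemma_id INTEGER NOT NULL REFERENCES lemma(id),
            inflection TEXT
        );
    """)
    for line in lines:
        form, lemma, category, semantic, inflections = parse_delaf(line)
        # One row per (lemma, category); later duplicates are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO lemma (lemma, category, semantic) VALUES (?, ?, ?)",
            (lemma, category, "+".join(semantic)),
        )
        (lemma_id,) = conn.execute(
            "SELECT id FROM lemma WHERE lemma = ? AND category = ?",
            (lemma, category),
        ).fetchone()
        # One row per inflectional code attached to this form.
        for code in inflections or [None]:
            conn.execute("INSERT INTO form VALUES (?, ?, ?)", (form, lemma_id, code))
    return conn

# Illustrative entries; the codes are placeholders, not the real Serbian tagset.
conn = load([
    "knjiga,.N+Conc:fs1v",
    "knjige,knjiga.N+Conc:fs2v:fp1v",
    "knjigama,knjiga.N+Conc:fp3v",
])
forms = [row[0] for row in conn.execute(
    "SELECT DISTINCT f.form FROM form f JOIN lemma l ON f.lemma_id = l.id "
    "WHERE l.lemma = ? ORDER BY f.form", ("knjiga",))]
print(forms)  # ['knjiga', 'knjigama', 'knjige']
```

Keeping lemmas and inflected forms in separate tables lets one lemma row be joined to all of its forms, a query that is awkward to run against a flat text file.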
Electronic Dictionaries - from File System to lemon Based Lexical Database

Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić (2018)

In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...

... Electronic dictionary encoding: Customizing the TEI guidelines. In Proc. Euralex. Villegas, M. and Bel, N. (2015). PAROLE/SIMPLE ‘lemon’ ontology and lexicons. Semantic Web, 6:363–369. Vitas, D., Pavlović-Lažetić, G., and Krstev, C. (1993). Electronic dictionary and text processing in ...
... al electronic dictionaries Morphological electronic dictionaries of Serbian for NLP are being developed for many years now (Vitas et al., 1993) (Krstev, Cvetana and Vitas, Duško, 2015). They cover general lexica, proper names (persons and toponyms), general knowledge (famous or fictitious persons ...
... udžbenike. Koeva, S., Krstev, C., and Vitas, D. (2008). Morpho-semantic relations in wordnet–a case study for two slavic languages. In Proceedings of Global WordNet Conference 2008, pages 239–253. University of Szeged, Department of Informatics. Krstev, C. and Vitas, D. (2007). Extending the Serbian ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
Rule-based Automatic Multi-word Term Extraction and Lemmatization

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac (2016)

In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...

term extraction, terminology, multi-word units, lemmatization, finite-state transducers

... evaluation. Terminology, 16(2), pp.141--158. Vitas, D., Popović, Lj., Krstev, C., Obradović, I., Pavlović-Lažetić, G. and Stanojević, M. (2012). The Serbian Language in the Digital Age. Berlin; Springer-Verlag. 8. Language Resource References Vitas D., Utvić M. (2015). SrpKor22M, Serbian automatically ...
... Extraction and Enabling Technologies, pp. 59--66. Krstev, C., Obradović, I., Stanković, R., and Vitas, D. (2013). An Approach to Efficient Processing of Multi-Word Units. In: Przepiórkowski, A., Piasecki, M., Jassem, K., Fuglewicz, P. (Eds.) Computational Linguistics. Berlin: Springer, pp. 109--129 ...
... Krstev, C., and Vitas, D. (2011). Production of morphological dictionaries of multi-word units using a multipurpose tool. In Proc. of the Computational Linguistics-Applications Conference, October 17-19, 2011, Jachranka: Polskie Towarzystwo Informatyczne, pp. 77--84. Tadić, M., Šojat, K. (2003) ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
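The method in the abstract above relies on electronic dictionaries and finite-state transducers. The much simpler Python sketch below only illustrates the underlying idea for a single pattern, adjective + noun agreeing in gender, number and case, and shows how the extracted term can be lemmatized by putting the adjective into the nominative singular of the noun's gender. The tag names, the `Token` fields, and the `ADJ_FORMS` table standing in for an e-dictionary are assumptions, not the authors' resources.

```python
from typing import NamedTuple

class Token(NamedTuple):
    form: str
    lemma: str
    pos: str      # hypothetical tags: "A" adjective, "N" noun, "C" conjunction
    gender: str
    number: str
    case: str

# Hypothetical stand-in for an inflectional e-dictionary:
# (adjective lemma, gender, number, case) -> inflected adjective form.
ADJ_FORMS = {
    ("podzemni", "m", "sg", "nom"): "podzemni",
    ("podzemni", "f", "sg", "nom"): "podzemna",
}

def extract_adj_noun_terms(tokens):
    """Return (surface form, term lemma) pairs for agreeing ADJ+NOUN sequences."""
    terms = []
    for adj, noun in zip(tokens, tokens[1:]):
        if adj.pos != "A" or noun.pos != "N":
            continue
        if (adj.gender, adj.number, adj.case) != (noun.gender, noun.number, noun.case):
            continue  # no agreement, so not a well-formed ADJ+NOUN phrase
        # Term lemma: the adjective in the nominative singular of the head
        # noun's gender, followed by the noun's own lemma.
        adj_lemma_form = ADJ_FORMS.get((adj.lemma, noun.gender, "sg", "nom"))
        if adj_lemma_form is None:
            continue  # required form missing from the toy dictionary
        terms.append((f"{adj.form} {noun.form}", f"{adj_lemma_form} {noun.lemma}"))
    return terms

sentence = [
    Token("podzemnog", "podzemni", "A", "m", "sg", "gen"),
    Token("rudnika", "rudnik", "N", "m", "sg", "gen"),
    Token("i", "i", "C", "", "", ""),
    Token("podzemne", "podzemni", "A", "f", "sg", "gen"),
    Token("eksploatacije", "eksploatacija", "N", "f", "sg", "gen"),
]
print(extract_adj_noun_terms(sentence))
```

Here "podzemnog rudnika" is lemmatized as "podzemni rudnik" and "podzemne eksploatacije" as "podzemna eksploatacija"; simply lemmatizing each word separately would give the wrong adjective form for the feminine term.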
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian

Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić (2020)

The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...

Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary

... Utvić, M. (2011). Annotating the Corpus of Contemporary Serbian. INFOtheca, 12(2):36a–47a, December. 8. Language Resource References Cvetana Krstev, Duško Vitas. (2015). Serbian Morphological Dictionary - SMD. University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0. Duško Vitas, Cvetana ...
... of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 271–281. Constant, M., Krstev, C., and Vitas, D. (2018). Lexical analysis of serbian with conditional random fields and large-coverage finite-state resources. In Zygmunt Vetulani ...
... 389–398. Honnibal, M. and Montani, I. (2017). spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing. To appear. Huang, Z., Xu, W., and Yu, K. (2015). Bidirectional LSTM-CRF Models for Sequence Tagging. Krstev, C., Vitas, D., and Erjavec ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
An Italian-Serbian Sentence Aligned Parallel Literary Corpus

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić (2023)

This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...

Aligned corpus, parallel corpus, Serbian, Italian, literature

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis

Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović (2017)

This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied to a web portal that collects various texts from the journals of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the "Tara" and "Reiss" conferences, as well as some doctoral dissertations related to this research field. After text processing, a corpus containing over 5,500 pages of plain text was created and ...

Omeka, Wordnet, full-text search, morphological and semantic text search, query expansion

... Krstev, Duško Vitas, “Corpus and Lexicon - Mutual Incompletness ”, in Proceedings of the Corpus Linguistics Conference, 14-17 July 2005, Birmingham, eds. Pernilla Danielsson and Martijn Wagenmakers, ISSN 1747-9398, http://www.corpus.bham.ac.uk/PCLC/, 2005 10 Cvetana Krstev, Ranka Stanković, Duško Vitas ...
... library. 15 Cvetana Krstev. Processing of Serbian – Automata, Text and Electronic Dictionaries, Faculty of philology, Belgrade, 2008 16 Duško Vitas, Cvetana Krstev, Ivan Obradović, Ljubomir Popović, Gordana Pavlović-Lažetić”, An Processing Serbian Written Texts: An Overview of Resources and ...
... be reached via a synchronized synsets. Figure 4. Sequence diagram a multilingual query expansions 17 Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović, “The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines”, in Proceedings of the Sixth ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
Managing mining project documentation using human language technology

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja (2018)

Purpose: This paper aims to develop a system, which would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ...

Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
Corpus Search Based on the Use of External Lexical Resources via Web Services

Милош Утвић, Ранка Станковић, Александра Томашевић, Михаило Шкорић, Биљана Лазић (2019)

The paper discusses a hybrid approach to corpus search, illustrated with the OCWB and NoSketch Engine tools applied to a specialized mining corpus (RudKor) and the Corpus of Contemporary Serbian (SrpKor). The approach combines the existing capabilities of OCWB and NoSketch Engine, which base their search on the linguistic annotation of the corpus, with new search capabilities that consult external language resources (the morphological electronic dictionaries of Serbian and the Serbian WordNet lexical database). The hybrid approach was implemented by upgrading the web interfaces used by these tools ...

corpus, mining, information retrieval, query expansion, lexical resources, lexical relations

... annotation, was created as a derivative of the system of morphological electronic dictionaries of Serbian (hereinafter: SMR), authored by Cvetana Krstev and Duško Vitas (Krstev 2008). The partial morphological annotation in the SrpKor2013 corpus was implemented with the positional attributes pos (part-of-speech tag) and lemma (lemma) ...
... extending the Serbian WordNet is certainly one of the priorities for improving the system for semantic query expansion. SOURCES Corpus 2013: Duško Vitas and Miloš Utvić, “Corpus of Contemporary Serbian (SrpKor), version SrpKor2013”, Language Technology Group of the University of Belgrade, http://www ...
... Workbench (CWB 3.4.16), May 2019, http://cwb.sourceforge.net/files/CQP_Tutorial.pdf Krstev et al. 2004: Cvetana Krstev, Gordana Pavlović-Lažetić, Duško Vitas and Ivan Obradović, “Using Textual and Lexical Resources in Developing Serbian Wordnet”, Romanian Journal of Information Science and Technology ...
Милош Утвић, Ранка Станковић, Александра Томашевић, Михаило Шкорић, Биљана Лазић. "Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса" [Corpus search based on the use of external lexical resources via web services] in Научни састанак слависта у Вукове дане [Scientific Meeting of Slavists at Vuk's Days] - Vol. 48/3 Српски језик и његови ресурси [The Serbian Language and Its Resources], International Slavic Center, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/msc.2019.48.3.ch12
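The hybrid approach above consults external resources (morphological dictionaries and Serbian WordNet) before querying the corpus. A minimal Python sketch of such an expansion, assuming toy `FORMS` and `SYNONYMS` tables in place of the real web services, turns a lemma into a CQP-style token query of the form `[word="form1|form2"]` that lists all inflected forms of the lemma and of its synonyms. The function names and data are hypothetical.

```python
# Toy stand-ins for the external resources (hypothetical data):
# a morphological e-dictionary (lemma -> inflected forms) and
# WordNet-style synonyms (lemma -> synonymous lemmas).
FORMS = {
    "rudnik": ["rudnik", "rudnika", "rudniku", "rudnikom", "rudnici"],
    "majdan": ["majdan", "majdana"],
}
SYNONYMS = {"rudnik": ["majdan"]}

# Characters that would otherwise be read as regex syntax or end the string.
CQP_SPECIAL = set('.^$*+?{}[]|()\\"')

def cqp_escape(text):
    """Backslash-escape regex metacharacters inside a CQP string."""
    return "".join("\\" + ch if ch in CQP_SPECIAL else ch for ch in text)

def expand_query(lemma, use_synonyms=True):
    """Build a CQP token query matching all forms of a lemma (and its synonyms)."""
    lemmas = [lemma] + (SYNONYMS.get(lemma, []) if use_synonyms else [])
    forms = []
    for lem in lemmas:
        for form in FORMS.get(lem, [lem]):  # unknown lemma: search for it as-is
            if form not in forms:  # keep first-seen order, drop duplicates
                forms.append(form)
    return '[word="' + "|".join(cqp_escape(f) for f in forms) + '"]'

print(expand_query("rudnik"))
```

Generating `word` alternatives does not depend on the corpus's own lemma annotation; where that annotation is reliable, a real deployment might instead build `[lemma="..."]` queries.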
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking

Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović (2024)

The paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data to RDF, and the inclusion of NIF annotations. The resulting NIF files were evaluated by exploring a triplestore with SPARQL queries. Finally, we discuss linking Linked ...

parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata

Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
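To make the NIF conversion described above more concrete, the Python sketch below serializes one sentence and one linked named entity as NIF 2.0 Turtle, using the `nif:` core vocabulary for character offsets and `itsrdf:taIdentRef` for the link to a Wikidata entity (here Q3711, Belgrade). The document IRI `http://example.org/doc1`, the example sentence and the helper names are hypothetical; the paper's actual output may use additional properties, for example for POS tags and lemmas.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"

def lit(s):
    """Quote a Python string as a Turtle string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'

def idx(n):
    """Typed literal for a character offset."""
    return f'"{n}"^^xsd:nonNegativeInteger'

def nif_turtle(base, text, mentions):
    """Serialize a text and its linked entity mentions as NIF 2.0 Turtle.

    base: document IRI; mentions: (begin, end, entity IRI) character offsets.
    """
    ctx = f"<{base}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{ctx} a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {lit(text)} ;",
        f"    nif:beginIndex {idx(0)} ;",
        f"    nif:endIndex {idx(len(text))} .",
    ]
    for begin, end, entity in mentions:
        lines += [
            "",
            f"<{base}#char={begin},{end}> a nif:Phrase, nif:RFC5147String ;",
            f"    nif:referenceContext {ctx} ;",
            f"    nif:anchorOf {lit(text[begin:end])} ;",
            f"    nif:beginIndex {idx(begin)} ;",
            f"    nif:endIndex {idx(end)} ;",
            f"    itsrdf:taIdentRef <{entity}> .",
        ]
    return "\n".join(lines)

text = "Došao je u Beograd."
begin = text.find("Beograd")
ttl = nif_turtle("http://example.org/doc1", text,
                 [(begin, begin + len("Beograd"), "http://www.wikidata.org/entity/Q3711")])
print(ttl)
```

Once loaded into a triplestore, mentions produced this way can be retrieved with SPARQL by matching `itsrdf:taIdentRef`, which is the kind of exploration the abstract mentions.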
Using English Baits to Catch Serbian Multi-Word Terminology

Cvetana Krstev, Branislava Šandrih, Ranka Stanković (2018)

In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if domain terminology exists for a source language, together with a domain corpus aligned between the source and a target language, then it is possible to extract the terminology for the target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...

aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection

... nor we lemmatize and POS-tag text. Instead we use shallow parsing relying on extensive morphological e-dictionaries of Serbian (Cvetana Krstev, Duško Vitas, 2015) that not only helps to identify terminology precisely, but also enables production of correct MWT lemmas and consequently all its inflected ...
... Hans Uszkoreit (Series Editors). Springer. Available online at http://www.meta-net.eu/whitepapers. 8. Language Resource References Cvetana Krstev, Duško Vitas. (2015). Serbian Morphological Dictionary - SMD. University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0. Cvetana Krstev. (2013) ...
... Morocco, may. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/. Vitas, D., Popović, L., Krstev, C., Obradović, I., Pavlović-Lažetić, G., and Stanojević, M. (2012). Srpski jezik u digitalnom dobu – The Serbian Language in the Digital Age. META-NET White Paper ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
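The approach above uses source-language terms and aligned texts to locate target-language terms. One basic step such a pipeline could include is projecting an English term's token span onto the Serbian sentence through word-alignment links; the Python sketch below shows that step only. The alignment links, the sentence pair and the name `project_span` are illustrative assumptions, and a real system would also validate candidates, for example against the Serbian terminology extractor the abstract mentions.

```python
def project_span(src_span, alignment):
    """Project a source token span [start, end) onto the target sentence.

    alignment: (source index, target index) word-alignment links.
    Returns the smallest contiguous target span covering every target
    token linked to the source span, or None if nothing is aligned.
    """
    start, end = src_span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None
    return min(targets), max(targets) + 1

english = "the open pit mine was closed".split()
serbian = "površinski kop je zatvoren".split()
# Hypothetical alignment links (English index, Serbian index).
links = [(1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]

term = (1, 4)  # "open pit mine", assumed to come from an English term list
span = project_span(term, links)
print(" ".join(english[term[0]:term[1]]), "->", " ".join(serbian[span[0]:span[1]]))
```

Taking the minimal covering span keeps the candidate contiguous, but it can also absorb unaligned words that happen to fall inside it.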
Part of Speech Tagging for Serbian language using Natural Language Toolkit

Ranka Stanković, Boro Milovanović (2020)

While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We attempt to use this library to perform PoS (part-of-speech) tagging of contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are then refined with the Brill tagger to improve accuracy. We trained the models on a tagged ...

natural language processing, machine learning, neural networks

... Slovenia, May 2016 [9] C. Krstev, D. Vitas, and T. Erjavec, “MorphoSyntactic Descriptions in MULTEXT-East - the Case of Serbian,” Informatica, vol. 28 no. 4 pp. 431–436, Dec. 2004. [10] M. Gavrilidou, P. Labropoulou, S. Piperidis, V. Giouli, N. Calzolari, M. Monachini, C. Soria, and K. Choukri ...
... Corpus of Contemporary Serbian,” INFOtheca, vol. 12 no. 2 pp 36a-47a, Dec. 2011 [7] M. Constant, C. Krstev, and D. Vitas “Lexical Analysis of Serbian with Conditional Random Fields and Large-Coverage Finite-State Resources”, Proc. 7th Language and Technology Conference (LTC), Poznan, Poland, Nov ...
... [13] M. de Marneffe, T. Dozat, N. Silveira, K. Haverinen, F. Ginter, J. Nivre, and C. D. Manning, “Universal Dependencies: A cross-linguistic typology,” Proc. Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [14] C. Krstev and D. Vitas, “Serbian ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
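The abstract above builds taggers with NLTK, whose `DefaultTagger`, `UnigramTagger` and `BigramTagger` classes can be chained through a `backoff` argument, with `BrillTaggerTrainer` available for the transformation step. To stay self-contained, the Python sketch below reimplements only the backoff idea with the standard library (bigram, then unigram, then a default tag); it omits the Brill step, and the toy training sentences and tag names are hypothetical.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each lowercased word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, pos in sent:
            counts[word.lower()][pos] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def train_bigram(tagged_sents):
    """Map each (previous tag, lowercased word) context to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = "<S>"  # sentence-start marker
        for word, pos in sent:
            counts[(prev, word.lower())][pos] += 1
            prev = pos
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def tag(words, bigram, unigram, default="N"):
    """Tag with the bigram model, backing off to unigram, then to a default tag."""
    tagged, prev = [], "<S>"
    for w in words:
        t = bigram.get((prev, w.lower())) or unigram.get(w.lower()) or default
        tagged.append((w, t))
        prev = t
    return tagged

# Toy training data with a hypothetical tagset.
train = [
    [("Marko", "N"), ("čita", "V"), ("knjigu", "N")],
    [("Ana", "N"), ("čita", "V"), ("novu", "A"), ("knjigu", "N")],
]
unigram, bigram = train_unigram(train), train_bigram(train)

print(tag("Marko čita novu knjigu".split(), bigram, unigram))
print(tag("Ana piše knjigu".split(), bigram, unigram))
```

The unseen verb `piše` falls through to the default tag `N`, which is the kind of systematic error a Brill transformation pass is designed to correct.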
The SASA Dictionary as a Basis for Terminological Dictionaries (the Example of a Culinary Dictionary)

Рада Стијовић, Олга Сабо, Ранка Станковић (2017)

... Krstev, Cvetana, Duško Vitas and Gordana Pavlović-Lažetić. „Resources and methods in the morphosyntactic processing of Serbo-Croatian.” In Gerhild Zybatow et al. (eds.) Formal Description of Slavic Languages: The Fifth Conference, Leipzig 2003, pp. 3-17. Frankfurt am Main. 2. Vitas, D., Popović, Lj ...
... dictionaries are used in language research and in the creation of language tools. The morphological dictionaries of Serbian were developed by Prof. Dr. Cvetana Krstev and Prof. Dr. Duško Vitas with the help of the Language Technology Group of the University of Belgrade. The analysis of the processed corpus included the extraction of words and phrases based on ...
... edition of the Language Resources and Evaluation Conference (LREC), 23-28 May 2016, Portorož. 7. Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas, “Production of morphological dictionaries of multi-word units using a multipurpose tool”, In: Proceedings of the Computational Linguistics-Ap ...
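Рада Стијовић, Олга Сабо, Ранка Станковић. "Речник САНУ као база терминолошких речника (на примеру речника кулинарства)" [The SASA dictionary as a basis for terminological dictionaries (the example of a culinary dictionary)] in Словенска терминологија данас [Slavic Terminology Today], Belgrade : Serbian Academy of Sciences and Arts (2017)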
Query Expansion Based on Lexical Resources

Ranka Stanković, Ivan Obradović, Cvetana Krstev (2009)

The paper describes how lexical resources for Serbian and software tools developed by the Language Technology Group of the University of Belgrade can be used to improve query formulation. Search results can be significantly improved by using various lexical resources, such as morphological dictionaries and semantic networks. The presented approach can also be applied in the System of Scientific, Technological and Business Information, since efficient searching of this valuable resource, given its heterogeneity and volume, as well as its predominantly textual content, ...

... , Bucureşti, Publishing house of the Romanian academy. [3] Vitas D., Krstev C. (2007), „Extending Serbian E dictionary by the Use of the Lexical Transducers“, Proc. of the 6th and 7th INTEX/Nooj Workshop, S. Koeva, D. Maurel, M. Silberztein (eds.), Formaliser les langues avec l'ordinateur: ...
... Lexical Database, The MIT Press. [5] Maurel D., Vitas D., Krstev C., Koeva S., (2007) „Prolex: a lexical model for translation of proper names. Application to French, Serbian and Bulgarian“, BULAG n°32, 2007. [6] Krstev C., Stanković R., Vitas D., Obradović I., “WS4LR: A Workstation for Lexical ...
... faculty of the University of Belgrade for many years, so that a large number of different resources, developed on a considerable scale, are available today (Vitas et al., 2003). In addition to the corpus of Serbian and multilingual parallel corpora, of particular importance are the system of morphological dictionaries of Serbian ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev. "Proširivanje upita zasnovano na leksičkim resursima" [Query expansion based on lexical resources] in SNTPI 09 - Naučno-stručni skup Sistem naučnih, tehnoloških i poslovnih informacija [Scientific and Professional Meeting: System of Scientific, Technological and Business Information], Belgrade, 19-20 June 2009, Belgrade : Faculty of Information Technology (2009)
Towards Automatic Definition Extraction for Serbian

Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić (2021)

The paper presents preliminary results of the automatic extraction of dictionary definition candidates from unstructured Serbian texts, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SASA) were used to model different types of definitions (descriptive, grammatical, referential and synonymous), which have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analyzed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing package with open ...

... nitexgramlab.org/) 2 A part of this lexicon is publicly available for use within the Unitex system words or a recognized syntactic structure (Vitas & Krstev 2012). Finite state transducers are visualized by graphs for easier development and use. A local grammar and its corresponding graph that ...
... Learning to define word embeddings in natural language. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 31, No. 1). Krstev, C., Vitas, D. & Stanković, R. (2015). A Lexical Approach to Acronyms and their Definitions. In Proceedings of 7th Language & Technology Conference, November ...
... extraction in free-and semi-structured text. In Proceedings of the 13th Linguistic Annotation Workshop, 2019, pp. 124–131. Stanković, R., Stijović, R., Vitas, D., Krstev, C. & Sabo O. (2018). The Dictionary of the Serbian Academy: from the Text to the Lexical Database. In: Proceedings of the XVIII EURALEX ...
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
