Search ⚒ Papers ⚒ Dr RGF - RGF Repository

98 items

Development Of The Serbian Geological Resources Portal

Ranka Stanković, Jelena Prodanović, Olivera Kitanović, Velizar Nikolić (2011)

... ences related to the select dictionary entry are displayed, as well as terms of hyponym and hypernym concepts. The dictionary can also be searched with the use of key words. After entering a string of characters (word or part of a word), the user is offered a list of dictionary entries where the given ...
... of web services and web applications which consume them. Further steps encompass the creation of a lexicon of mapped units, and integration of the dictionary and cartographic representation of spatial objects in which they appear. Further publication of results of both recent, as well as older projects ...
... Apatin. (In Serbian). STANKOVIĆ, R., TRIVIĆ, B., KITANOVIĆ, O., BLAGOJEVIĆ, B., NIKOLIĆ, V., 2011. “The Development of the GeolISSTerm Terminological Dictionary”, INFOteka: časopis za informatiku i bibliotekarstvo, 12/1, Belgrade. ESRI: GIS and mapping software, http://www.esri.com, ESRI Developer network ...
Ranka Stanković, Jelena Prodanović, Olivera Kitanović, Velizar Nikolić. "Development Of The Serbian Geological Resources Portal" in Proceedings of the 17th Meeting of the Association of European Geological Societies, Belgrade, Serbia : The Serbian Geological Society (2011)
Bilingual lexical extraction based on word alignment for improving corpus search

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović (2019)

Library and Information Sciences,Computer Science Applications

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
GIS Application Improvement with Multilingual Lexical and Terminological Resources

Ranka Stanković, Ivan Obradović, Olivera Kitanović (2010)

... automatic production of all inflectional forms. The Serbian morphological dictionary of simple words contains 122,000 lemmas, which can generate approximately 1,400,000 different lexical words. The Serbian morphological dictionary of compounds contains about 4,300 lemmas (generating more than 70,000 ...
... stenama AND kvarcnima stenama thus disabling false retrieval. Due to the abundance of compounds in Serbian, the development of a comprehensive dictionary of Serbian compounds is a tedious task. In the attempt to alleviate this problem, we have developed a procedure for automatic creation of lemmas ...
... applications, as is the case here. Namely, it often happens that a technical term, which is frequently a compound, is not in the morphological e-dictionary of compounds. For example, in order to determine the third inflectional transducer for kvarcna stena ‘quartz rock’, the following rule of the ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta, May 2010, Valetta, Malta : European Language Resources Association (2010)
A Lexical Approach to Acronyms and their Definitions

Cvetana Krstev, Duško Vitas, Ranka Stanković (2015)

In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions.

... Party, not that DS is an acronym for Zoran Djindjić. 3. Lemmatizing the MWU names from the list obtained in Step 2 in order to obtain names in a dictionary form, normally in the singular, nominative case, sometimes in the plural. (3) KFOR - Medjunarodna mirovna snaga na Kosovu (nominative, singular) ...
... (A:fp1). This becomes a value of a variable $a$ (upper part of the graph in Fig. 2), and its lemma ($a.LEMMA$) is retrieved from the following e-dictionary lines (lower part of the same graph): (8) mirovne,mirovan.A:aefs2g mirovne,mirovan.A:aefp1g 4In Unitex complex grammars can be modelled by using ...
... input is different and the used e-dictionaries as well. For the same example as before and the form (simple word lemma) mirovan the following e-dictionary lines are used: (10) mirovan,mirovne.A:aefs2g mirovan,mirovne.A:aefp1g This form of e-dictionaries is obtained from the previous form by exchanging ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov (2024)

This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating European languages with less and medium-developed resources. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and comparison with annotations in other languages. The first steps are discussed ...

multiword units, named entity, word sense ambiguity, sense repository, LLOD

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
Managing mining project documentation using human language technology

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja (2018)

Purpose: This paper aims to develop a system, which would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ...

Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration

Cvetana Krstev, Ranka Stanković, Vitas Duško (2010)

In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...

Morphology, Lexicon, lexical database, Standards for LRs

... language resources Kešelj, V., Kešelj, T., and Zlatić, L. (2004). R{j}ecnik.com: English-Serbo-Croatian electronic dictionary. In Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries (Geneva, Switzerland, August 29 - 29, 2004). ACL Workshops. ACL, Morristown, NJ, 61-64. ...
... MULTEXT-East description for Slavic languages (Przepiórkowski, 2003). On the other hand, several applications developed in the frame of LADL e-dictionary format use their own morphological descriptions, most notably ELAG for morphological disambiguation (Laporte & Monceaux, 1999) and Multiflex for ...
... t a Any (b) Multiflex, as a part of the Unitex (Paumier 2008), that we use for the inflection of Serbian compounds and the production of e-dictionary of compound forms uses a very simple morphosyntactic description. The following line describes the inflection of Serbian nouns: noun:(Nb,) ...
Cvetana Krstev, Ranka Stanković, Vitas Duško. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta : European Language Resources Association (2010)
An aproach to Implementation of blended learning in a university setting

Ivan Obradović, Ranka Stanković, Olivera Kitanović, Jelena Prodanović (2011)

... terms, we have developed an electronic dictionary of basic GIS terms. Besides the English and the Serbian term, each dictionary entry contains a short definition of the term in both languages, but without any relations between equivalents. An example of a dictionary entry, in English, and then ...
... or in a multi-user relational database ... http://www.esri.com/ http://edn.esri.com/ There is also a rather developed dictionary of statistical terms, organized in a somewhat different manner, containing both the Serbian and the English equivalent within the same entry ...
... Given that a Serbian thesaurus of geological terms is already developed (http://geoliss.ekoplan.gov.rs/term) and that it contains more than 3000 dictionary entries and the same number of English equivalents, and that the development of an ontology related to mining is underway, we now plan to connect ...
Ivan Obradović, Ranka Stanković, Olivera Kitanović, Jelena Prodanović. "An aproach to Implementation of blended learning in a university setting" in Proceedings of the Second International Conference on e-Learning, eLearning 2011, September 2011, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2011)
Improvement of geodatabase queries within GeolISS

Ranka Stanković (2008)

... obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and a geological dictionary. These lexical resources used within WS4QE (Workstation for query expansion) enable semantic and morphological expansion of the query, the latter ...
... Morphological dictionaries enable morphological expansion of the query, very important in highly inflective languages, such as Serbian. The geological dictionary, developed within GeolISS, supports semantic and multilingual expansions of the query. The Human Language Technology group at the University ...
... expansion is implemented in GeolISS search functions. Apart from HLT lexical resources mentioned, for semantic query expansion the geological dictionary developed within GeolISS can also be used, as a taxonomy with definitions for each entry, synonyms and bibliographical references, as well as ...
Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian

Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih (2021)

Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic levels. A system able to detect such texts could help make the internet and social media a better, more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...

abusive language, hate speech, Serbian, Twitter, lexicon, corpus

... the relation with the Serbian electronic dictionaries and the management platform Leximirka (Figure 6) [22], which enables the recognition of all inflected forms of trigger words. For the ranking and selection of illustrative tweets (or its parts) as a kind of dictionary usage examples, we have used ...
... and lexicons from other languages, lexicons of sentiment words and expressions, rhetorical figures, etc. To expand the dictionary, synsets from the Serbian WordNet and the dictionary of synonyms will be used for linking with Twitter examples. Regarding the categorization of terms in the lexicon, the ...
... and it is calculated based on the number of different meanings in the comprehensive explanatory dictionary of Serbian, and need to match neither corpus nor probability of use. An excerpt from the dictionary for the word lopov (thief) is presented in Listing 1. It can be seen that this word can be used ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
Sentiment Analysis of Serbian Old Novels

Ranka Stanković, Miloš Košprdić, Milica Ikonić Nešić, Tijana Radović (2022)

In this paper we present the first study of Sentiment Analysis (SA) of Serbian novels from the 1840-1920 period. The preparation of the sentiment lexicon was based on three existing lexicons: NRC, AFINN and Bing, with additional extensive corrections. The first phase of dataset refinement included filtering out the words that are not found in the Serbian morphological dictionary, and in the second, automatic POS tagging and lemmas were manually corrected. The polarity lexicon was extracted and transformed into ontolex-lemon and published as initial ...

sentiment lexicon, sentiment analysis, distant-reading, machine learning, old novels

Ranka Stanković, Miloš Košprdić, Milica Ikonić Nešić, Tijana Radović. "Sentiment Analysis of Serbian Old Novels" in Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data, June 2022, Marseille, France, European Language Resources Association (2022)
Indexing of textual databases based on lexical resources: A case study for Serbian

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2015)

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...

... using the finite-state methodology as described in [1], [2]. The role of electronic dictionaries, covering both simple words and multi-word units, and dictionary finite-state transducers (FSTs) is text tagging. Each e-dictionary of forms consists of a list of entries supplied with their lemmas, morp ...
... several categories: cartographic content, multimedia, dictionaries and textual databases. The “core” is the whole information system of the Geological Dictionary (Thesaurus) containing about 4,000 geological terms described by definitions, of which about 3,000 have a translation into English. The most important ...
... integration of created indexes will enable the realization of a query expansion by adding synonyms from available resources, such as the geologic dictionary [15] for terminological query terms and WordNet for more general terms. Acknowledgement. This research was supported by the Serbian Ministry of ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
Keyword-Based Search on Bilingual Digital Libraries

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović (2017)

This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines

Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan (2008)

In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...

LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval

... of lemmas accompanied with inflectional class codes which enables a precise production of all inflectional forms. The Serbian morphological dictionary of simple words contains 117,000 lemmas which yields the production of approximately 1,400,000 different lexical words. More than 85,000 simple ...
... lemmas belong to general lexica, while the remaining 32,000 lemmas represent various kinds of simple proper names. The Serbian morphological dictionary of compounds contains approximately 2,700 lemmas (yielding more than 60,000 different forms) and it is being constantly upgrading. 2. Inflectional ...
... process. The prediction of the phrase structure is also based on the frequencies of compound structures that we have obtained from our existing dictionary of compounds. This analysis shows that, not surprisingly, the most frequent structure for compounds with two components is adjective+noun, ...
Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
Microstructural and magnetic properties of electrospun hematite/cuprospinel composites

Mira Ristić, Aleksandar Kremenović, Michael Reissner, Željka Petrović, Svetozar Musić (2020)

Phase composition, microstructural and magnetic properties of electrospun hematite/cuprospinel composites were investigated. Samples were synthesized starting with 0 to 10 mol% of copper relative to iron. The round shape of reference electrospun fibres was preserved upon their heating up to 600 °C in air, whereas at 700 °C hollow substructure was additionally formed. In these reference samples the presence of hematite phase was detected by XRPD. A small amount (traces) of Fe3O4/γ-Fe2O3 was also found, due to the ...

Electrical and Electronic Engineering, Condensed Matter Physics, Atomic and Molecular Physics and Optics, Electronic, Optical and Magnetic Materials

Mira Ristić, Aleksandar Kremenović, Michael Reissner, Željka Petrović, Svetozar Musić. "Microstructural and magnetic properties of electrospun hematite/cuprospinel composites" in Journal of Materials Science: Materials in Electronics, Springer Science and Business Media LLC (2020). https://doi.org/10.1007/s10854-020-03526-0
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis

Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović (2017)

This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied on a web portal that gathers various texts originating from the journal of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the “Tara” and “Reiss” conferences, as well as some doctoral dissertations related to this research area. After text processing, a corpus containing over 5,500 pages of plain text was created and ...

Omeka, WordNet, full-text search, morphological and semantic text search, query expansion

... collection of digital objects stored as electronic documents can vary in size and scope, and can be maintained by individuals, organizations or institutions. The digital content may be stored locally, or accessed remotely via computer networks. An electronic library is a type of information retrieval ...
... capabilities, both full text and metadata search are customized and improved by query expansion via web service relaying on the Serbian morphological dictionary and the Serbian WordNet semantic network for providing morphological and semantic text search expansion. The paper outlines possibilities for further ...
... morphological dictionaries for Serbian language15, Serbian and English WordNets, terminological databases: Termi, GeolISSTerm, RudOnto and Librarian dictionary. Apart from the grammars in the form finite state automata and transducers, system is using rules for inflection of multiword units. Among textual ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian

Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić (2020)

The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of a gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...

Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary

... The SR BASIC annotated dataset will also be published. Keywords: Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary 1. Introduction The task of assigning to each token its Part-of-Speech category (noun, verb, adjective, etc.) is a common Natural Language ...
... especially for the Novels test set. This comes as no surprise, due to the fact that it is a very specific text, which is fully covered by the new dictionary used for the TT19 model. Figure 3: Precision of lemmatization per token, obtained by two TreeTagger based taggers 3959 sentences tokens words ...
... y Serbian. INFOtheca, 12(2):36a–47a, December. 8. Language Resource References Cvetana Krstev, Duško Vitas. (2015). Serbian Morphological Dictionary - SMD. University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0. Duško Vitas, Cvetana Krstev, Ranka Stanković, Miloš Utvić. (2019) ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain

Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović (2021)

The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, along with the possibilities of its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...

Serbian language, frame semantics, FrameNet, risk scenario, mining corpus, natural language processing

... also downloaded and used locally. As the website states, it can be used for different purposes: as a dictionary for language learning (since it contains more than 13,000 LUs); as a valence dictionary; as a training dataset for semantic role labeling14 which makes it a rich digital language resource (with ...
... used. This was the motivation for creating an online dictionary whose entries are frames rather than lexemes, as found in paper dictionaries, providing a notation better suited to such a complex system. Conceived in such a manner, an online dictionary allows for representation of individual frame elements ...
... frequency lists, collocations, concordances with a narrower and broader context. Figure 5 shows the concordances extracted from the Leximirka20 digital dictionary management web app (Stanković et al. 2018) of the adjective-noun pattern containing the noun ризик (risk), while in Figure 6 there is a histogram ...
Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2017)

Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...

... developed using the finite-state methodology as described in [3,7]. The role of electronic dictionaries, covering both simple words and multi-word units, and dictionary finite-state transducers (FSTs) is text tagging. Each e-dictionary of forms consists of a list of entries supplied with their lemmas, morp ...
... length, k1 = 1.2, k2 = 0.75 length normalisation; 5. Creating a dictionary of the whole document collection from all words selected in Step 4. For each term Tk in the document collection, k = 1, . . . M , where M is the size of the dictionary of document collection: (a) calculating document frequency dfk ...
... grouped into several categories: cartographic content, multimedia, dictionaries and textual databases. The “core” of GeolISS is the Geological Dictionary (Thesaurus) containing 5,152 geological terms described by definitions, of which 4,839 have a translation into English. The cartographic content ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
Part of Speech Tagging for Serbian language using Natural Language Toolkit

Ranka Stanković, Boro Milovanović (2020)

While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated ...

natural language processing, machine learning, neural networks

... of low-resource languages so there’s a modest research on this topic. First attempts to create an automatic PoS tagger for Serbian relied on a dictionary. Delić et al. used custom transformations and rules [5]. Utvić created a parameter file TT11 for a TreeTagger Boro Milovanović is a PhD student ...
... two different tagsets. Tagset is a collection of tags. UD_POS is a Universal Dependency tagset [13]. N_POS is a tagset used in Serbian Morphology Dictionary [14] expanded with a gender category. From the given data we extracted token, N_POS and UD_POS tag. We stripped gender from the N_POS and got ...
... l Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [14] C. Krstev and D. Vitas, “Serbian Morphological Dictionary – SMD,” University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0, 2015 [15] A. Balvet, D. Stošić, and A. Miletić, (2014). TALC-Sef ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
