In science, industry, and many research fields, terminology evolves rapidly. Most often, the "lingua franca" of these fields is English. As a consequence, domain terms in many fields are coined in English and only later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We evaluate the performance of the method on the domain ...
... well as the application that implements the method, are available online.
KEYWORDS: terminology extraction, terminology validation, GIZA++, graphs, Unitex, text classification.
PAPER SUBMITTED: 30 September 2019
PAPER ACCEPTED: 20 December 2019
Branislava Šandrih
branislava.sandrih@fil.bg.ac.rs
University ...
... of two modules for terminology extraction. The first module is a rule-based
system relying on e-dictionaries and local grammars developed in Unitex,6 which
are implemented as finite-state transducers (FSTs). The second module
implements various statistical measures used for ranking of term candi- ...
... containing aligned English/Serbian Single and Multi-Word literals was compiled.
This list was then merged with the bilingual list, yielding a new list.
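
The merging step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample entries, and the choice to treat pairs that differ only in letter case or spacing as duplicates are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

TermPair = Tuple[str, str]  # (English term, Serbian term)


def _key(pair: TermPair) -> TermPair:
    """Normalise a pair for comparison.

    Assumption: case and repeated whitespace are ignored when
    deciding whether two pairs are duplicates.
    """
    en, sr = pair
    return (" ".join(en.split()).casefold(), " ".join(sr.split()).casefold())


def merge_term_lists(*lists: Iterable[TermPair]) -> List[TermPair]:
    """Concatenate the lists, keeping the first occurrence of each pair."""
    seen = set()
    merged = []
    for pairs in lists:
        for pair in pairs:
            k = _key(pair)
            if k not in seen:
                seen.add(k)
                merged.append(pair)
    return merged


# Hypothetical sample entries, not taken from the paper's resources.
wordnet_pairs = [
    ("library", "biblioteka"),
    ("digital library", "digitalna biblioteka"),
]
bilingual_pairs = [
    ("Library", "biblioteka"),
    ("catalogue", "katalog"),
]

merged = merge_term_lists(bilingual_pairs, wordnet_pairs)
```

Here the entries of the first list take precedence, so a pair that appears in both lists is kept in the form it has in the bilingual list. Whether the original procedure deduplicates this way is not stated in the text.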
6 Unitex/GramLab, a lexical-based corpus processing suite
7 Serbian WordNet
124 Infotheca Vol. 19, No. 2, December 2019
Scientific paper
2. To each Serbian ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6