In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers that model the various syntactic structures of multi-word terms. The same technology is used to lemmatize the extracted multi-word terms, a step that is unavoidable for highly inflected languages if the extracted data are to be passed to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... other nouns, namely oblog ‘stupe’ and trak ‘tentacle’,
yielding various interpretations for various inflected
forms, as presented in Table 1. The data in Table 1 show that
a correct lemma can be associated with certainty only if the
forms oblogo trake and oblogama trake are extracted from a
text. But ...
... them 97% were associated with correct lemmas.
Keywords: term extraction, terminology, multi-word units, lemmatization, finite-state transducers
1. Motivation
Various approaches have been proposed for multi-word
term (MWT) extraction, as this problem has been gaining
in importance in the field ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)