
Collected Item: “Transformer-Based Composite Language Models for Text Evaluation and Classification”

Publication type

Journal article

Version

Published version

Language

English

Author(s)

Mihailo Škorić, Miloš Utvić, Ranka Stanković

Title (Title - Subtitle)

Transformer-Based Composite Language Models for Text Evaluation and Classification

Journal title

Mathematics

Publisher

MDPI AG

Year of publication

2023

Abstract (English)

Parallel natural language processing systems were previously tested successfully on part-of-speech tagging and authorship attribution through mini-language modeling, where they achieved significantly better results than independent methods for seven European languages. The aim of this paper is to present the advantages of using composite language models to process and evaluate texts written in any highly inflective, morphologically rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for assessing the methodology, was created using a series of generative pre-trained transformers trained on different representations of the Serbian language corpus, together with a set of sentences classified into three groups (expert translations, corrupted translations, and machine translations). The paper presents a comparative analysis of the calculated perplexities in order to measure the classification capability of different models on two binary classification tasks. In the experiment, we tested three standalone language models (the baseline) and two composite language models, both based on the perplexities output by all three standalone models. The results single out a complex stacked classifier that uses a multitude of features extracted from perplexity vectors as the optimal composite language model architecture for both tasks.
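The pipeline the abstract outlines (per-sentence perplexities from several standalone models, features extracted from the resulting perplexity vector, and a classifier on top) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: per-token natural-log probabilities are assumed to be already available from each model, the feature set in the hypothetical `perplexity_features` function is illustrative only, and a simple nearest-centroid rule is swapped in for the stacked classifier the paper reports. The class labels and numbers in the usage example are toy values, not data or results from the paper.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Sentence perplexity from per-token natural-log probabilities:
    PPL = exp(-(1/N) * sum(log p_i))."""
    if not token_logprobs:
        raise ValueError("perplexity needs at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_features(ppl_vector: Sequence[float]) -> List[float]:
    """Hypothetical features from a perplexity vector (one perplexity per
    standalone model): the log-perplexities plus their min, max, spread, mean."""
    logs = [math.log(p) for p in ppl_vector]
    mean = sum(logs) / len(logs)
    return logs + [min(logs), max(logs), max(logs) - min(logs), mean]


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Mean feature vector per class label (nearest-centroid stand-in
    for the paper's stacked classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))


if __name__ == "__main__":
    # Toy perplexity vectors from three hypothetical standalone models;
    # illustrative numbers only, not results from the paper.
    train = [
        ([10.0, 12.0, 11.0], "expert"),
        ([9.0, 10.0, 13.0], "expert"),
        ([60.0, 80.0, 70.0], "machine"),
        ([55.0, 90.0, 75.0], "machine"),
    ]
    X = [perplexity_features(p) for p, _ in train]
    y = [label for _, label in train]
    model = fit_centroids(X, y)
    print(predict(model, perplexity_features([11.0, 11.0, 12.0])))  # expert
```

Working in log-perplexity keeps sentences with very high perplexity from dominating the distance computation; a real stacked classifier would instead learn how to weight these features from labeled data.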

Journal volume

11

Issue

22

DOI

10.3390/math11224660

Journal ISSN

2227-7390

Keywords in Serbian (separated by ", ")

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Keywords in English (separated by ", ")

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/11/22/4660/pdf

Broader category of the work under the MPNT rulebook

M20

Narrower category of the work under the MPNT rulebook

M21a

Access level

Open access

License

All rights reserved

Digital object format

.pdf