ALGORITHMS FOR AUTOMATIC EXTRACTION OF DOMAIN TERMS IN BILINGUAL PARALLEL TEXTS AND IDENTIFYING THEIR SEMANTIC EQUIVALENCE

Zumrad Gafarova

Authors

Zumrad Gafarova, Asian International University

Keywords:

parallel corpora, mono- and multilingual embeddings, neural approaches, bilingual terms, alignment, semantic equivalent alignment.

Abstract

This article proposes an integrated algorithmic framework for automatic term extraction (ATE) and for aligning the extracted terms with their semantic equivalents (bilingual term alignment / bilingual lexicon induction) in bilingual parallel and comparable corpora. The framework combines traditional statistical and morphological methods (C-value, TF–IDF, Alban) with modern neural approaches (mono- and multilingual embeddings, contextual transformer models, and word alignment). The experimental section evaluates the framework with precision, recall, and mean average precision (MAP) on parallel corpora and domain-specific comparable corpora.
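The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the evaluation protocol, so the setup here is an assumption: each source term has a ranked list of candidate target terms, the top-ranked candidate is taken as the predicted alignment, and a gold list maps each source term to its accepted equivalents. All identifiers and the toy data (`s1`, `t1`, and so on) are hypothetical placeholders, not data from the article.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(predicted: Dict[str, str],
                     gold: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Precision and recall of top-1 alignments against a gold term list."""
    correct = sum(1 for src, tgt in predicted.items()
                  if tgt in gold.get(src, set()))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked candidate list."""
    hits = 0
    total = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: Dict[str, List[str]],
                           gold: Dict[str, Set[str]]) -> float:
    """Mean of the per-term average precision over all gold source terms."""
    if not gold:
        return 0.0
    scores = [average_precision(rankings.get(src, []), relevant)
              for src, relevant in gold.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: placeholder source terms and candidate target terms.
gold = {"s1": {"t1"}, "s2": {"t2", "t3"}, "s3": {"t4"}}
rankings = {"s1": ["t1", "t9"], "s2": ["t8", "t2", "t3"], "s3": []}

# Assumption: the top-ranked candidate is the system's alignment decision;
# source terms with no candidates yield no prediction.
predicted = {src: ranked[0] for src, ranked in rankings.items() if ranked}

precision, recall = precision_recall(predicted, gold)
map_score = mean_average_precision(rankings, gold)
print(f"precision={precision:.3f} recall={recall:.3f} MAP={map_score:.3f}")
```

Precision and recall here judge only the single best candidate, while MAP also rewards a correct equivalent that appears lower in the ranking, which is why the metrics are usually reported together for term alignment.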
