ALGORITHMS FOR AUTOMATIC EXTRACTION OF DOMAIN TERMS FROM BILINGUAL PARALLEL TEXTS AND FOR IDENTIFYING THEIR SEMANTIC EQUIVALENCE
Keywords:
parallel corpora, monolingual and multilingual embeddings, neural approach, bilingual term, alignment, equivalent matching.
Abstract
This paper proposes a unified algorithmic framework for two tasks on bilingual parallel and comparable corpora: automatic term extraction (ATE) of domain terms, and alignment of their semantic equivalents (bilingual term alignment / bilingual lexicon induction). We integrate traditional statistical and morphological methods (C-value, TF-IDF, alban) with modern neural approaches (monolingual and multilingual embeddings, contextual transformer models, word alignment). The experimental part analyzes the results on parallel corpora and on domain-specific comparable corpora using precision, recall, and MAP as evaluation measures.
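The embedding-based alignment step and the evaluation measures named above can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes that source and target terms already have vectors in a shared cross-lingual space, ranks target candidates by cosine similarity, and scores the ranking with precision@1 and mean average precision (MAP). The function names, the toy vectors, and the small gold lexicon are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(src_vec, tgt_vecs):
    """Return target terms sorted by decreasing cosine similarity to src_vec."""
    return sorted(tgt_vecs, key=lambda t: cosine(src_vec, tgt_vecs[t]), reverse=True)

def average_precision(ranked, gold):
    """Average precision of one ranked list against a set of gold translations."""
    hits, total = 0, 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0

def evaluate(src_vecs, tgt_vecs, gold_lexicon):
    """Precision@1 and MAP over all source terms in the gold lexicon."""
    p1, ap = [], []
    for src, gold in gold_lexicon.items():
        ranked = rank_candidates(src_vecs[src], tgt_vecs)
        p1.append(1.0 if ranked[0] in gold else 0.0)
        ap.append(average_precision(ranked, gold))
    n = len(gold_lexicon)
    return sum(p1) / n, sum(ap) / n

# Toy, hand-made vectors (assumption: both languages share one embedding space).
src_vecs = {"neural network": [1.0, 0.0], "corpus": [0.0, 1.0]}
tgt_vecs = {"neyron tarmoq": [0.9, 0.1], "korpus": [0.8, 0.2], "atama": [0.1, 0.9]}
gold_lexicon = {"neural network": {"neyron tarmoq"}, "corpus": {"korpus"}}

p_at_1, mean_ap = evaluate(src_vecs, tgt_vecs, gold_lexicon)
print(f"P@1 = {p_at_1:.2f}, MAP = {mean_ap:.2f}")
```

In this toy setup the second source term is deliberately misaligned at rank 1, so MAP comes out higher than precision@1: MAP still credits a correct translation found lower in the ranking.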