Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task

Xiang Geng; Zhejian Lai; Yu Zhang; Shimin Tao; Hao Yang; Jiajun Chen; Shujian Huang

doi:10.18653/v1/2023.wmt-1.71

Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task

Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang

Abstract

We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that covert the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin.

Anthology ID:: 2023.wmt-1.71
Volume:: Proceedings of the Eighth Conference on Machine Translation
Month:: December
Year:: 2023
Address:: Singapore
Editors:: Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:: WMT
SIG:: SIGMT
Publisher:: Association for Computational Linguistics
Note:
Pages:: 829–834
Language:
URL:: https://aclanthology.org/2023.wmt-1.71
DOI:: 10.18653/v1/2023.wmt-1.71
Bibkey:
Cite (ACL):: Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, and Shujian Huang. 2023. Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 829–834, Singapore. Association for Computational Linguistics.
Cite (Informal):: Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task (Geng et al., WMT 2023)
Copy Citation:
PDF:: https://aclanthology.org/2023.wmt-1.71.pdf

PDF Cite Search