Chinese National Conference on Computational Linguistics (2023)



pdf (full)
bib (full)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

pdf bib
基于推理链的多跳问答对抗攻击和对抗增强训练方法(Reasoning Chain Based Adversarial Attack and Adversarial Augmentation Training for Multi-hop Question Answering)
Jiayu Ding (丁佳玙) | Siyuan Wang (王思远) | Zhongyu Wei (魏忠钰) | Qin Chen (陈琴) | Xuanjing Huang (黄萱菁)

“This paper proposes an adversarial attack method based on multi-hop reasoning chains: adversarial attack text is added to the input, and the accuracy of the QA model’s answers under this interference is tested, in order to probe whether QA models genuinely perform multi-hop reasoning and to assess their interpretability. The method first extracts the reasoning chain from the question entity to the answer entity in the input text, classifies multi-hop questions into different reasoning types based on features of the chain, and proposes a model that automates question decomposition and reasoning-type prediction; it then modifies the original question according to its reasoning type to construct adversarial distractor sentences. Adversarial attack experiments on several multi-hop QA models show that the performance of all models drops significantly, validating the effectiveness of the attack and exposing shortcomings of current QA models. After augmenting the original training set with adversarial examples, model performance recovers, demonstrating that the proposed adversarial augmentation training improves model robustness.”

pdf bib
基于不完全标注的自监督多标签文本分类(Self-Training With Incomplete Labeling For Multi-Label Text Classification)
Junfei Ren (任俊飞) | Tong Zhu (朱桐) | Wenliang Chen (陈文亮)

“Multi-Label Text Classification (MLTC) aims to select one or more categories for a text from a predefined candidate label set and is a fundamental task in Natural Language Processing (NLP). Most previous work relies on well-curated, comprehensively annotated datasets, which require strict quality control and are generally hard to obtain. In real annotation, some relevant labels are inevitably missed, leading to the incomplete-labeling problem. This paper therefore proposes a self-training framework based on partial annotation (Partial Self-Training, PST), in which a teacher model automatically assigns pseudo labels to large-scale unlabeled data and supplements missing labels for incompletely labeled data, and these data are then used in turn to update the teacher model. Experiments on synthetic and real datasets show that PST is compatible with existing multi-label text classification models and mitigates the impact of incompletely labeled data.”

pdf bib
融合汉越关联关系的多语言事件观点对象识别方法(A Multilingual Event Opinion Target Recognition Method Incorporating Chinese and Vietnamese Association Relations)
Gege Li (李格格) | Junjun Guo (郭军军) | Zhengtao Yu (余正涛) | Yan Xiang (相艳)

“Opinion target recognition in Vietnamese is an important part of Vietnamese event opinion analysis. Differences in grammatical structure between Chinese and Vietnamese make multilingual event association complex and opinion targets hard to represent. Existing methods only produce Chinese-Vietnamese bilingual representations and fail to capture and exploit the associations between elements of Chinese and Vietnamese events. This paper therefore proposes a multilingual event opinion target recognition method that incorporates Chinese-Vietnamese association relations: a Chinese-Vietnamese multilingual event representation network is built from element co-occurrence and overall semantic association between Chinese and Vietnamese events; a multilingual pre-trained language model produces feature vectors for element nodes; and a graph convolutional network aggregates node information to obtain a common bilingual representation in a shared semantic space, enabling recognition of opinion targets in Chinese and Vietnamese events. Experimental results show that the model constructs multilingual association information more effectively, with clear F1 improvements over multiple baselines.”

pdf bib
基于网络词典的现代汉语词义消歧数据集构建(Construction of a Modern Chinese Word Sense Dataset Based on Online Dictionaries)
Fukang Yan (严福康) | Yue Zhang (章岳) | Zhenghua Li (李正华)

“Word sense disambiguation (WSD), one of the most classic tasks in natural language processing, aims to identify the correct sense of a polysemous word in a given context. Polysemy is more pervasive in Chinese than in English, yet few publicly released Chinese WSD datasets exist. We crawled and merged two public online dictionaries and selected 1,083 words and their senses as annotation targets. We then extracted relevant sentences from web data and specialized corpora. Finally, annotation was carried out by multiple annotators with expert review. The dataset contains nearly 20,000 sentences, i.e., about 20 sentences per word on average. We split the data into training, validation, and test sets and compare several models experimentally.”

pdf bib
基于多意图融合框架的联合意图识别和槽填充(A Multi-Intent Fusion Framework for Joint Intent Detection and Slot Filling)
Shangjian Yin (尹商鉴) | Peijie Huang (黄沛杰) | Dongzhu Liang (梁栋柱) | Zhuoqi He (何卓棋) | Qianer Li (黎倩尔) | Yuhong Xu (徐禹洪)

“In recent years, multi-intent spoken language understanding (SLU) has become a research hotspot in natural language processing. State-of-the-art multi-intent SLU models adopt a graph-interactive framework for joint multiple intent detection and slot filling, effectively capturing fine-grained intent information for token-level slot filling and achieving good performance. However, they ignore the rich information carried by jointly acting intents and do not fully use multi-intent information to guide slot filling. This paper proposes a Multi-Intent Fusion Framework (MIFF) for joint multiple intent detection and slot filling that lets the model accurately recognize different intents while using intent information to provide stronger guidance for slot filling. Experiments on the MixATIS and MixSNIPS public datasets show that our model surpasses current state-of-the-art methods in both performance and efficiency, and generalizes effectively from single-domain to multi-domain datasets.”

pdf bib
基于词频效应控制的神经机器翻译用词多样性增强方法(Improving Word-level Diversity in Neural Machine Translation by Controlling the Effects of Word Frequency)
Xuewen Shi (史学文) | Ping Jian (鉴萍) | Yikun Tang (唐翼琨) | Heyan Huang (黄河燕)

“Neural machine translation (NMT) optimized by maximum likelihood estimation suffers from problems such as non-maximizable tokens and poor accuracy on low-frequency words, which leads to translations lacking word-level diversity. The imbalanced distribution of word frequency in the training data is one cause of this phenomenon. This paper aims to alleviate these problems by limiting the influence of word frequency on the probabilities NMT estimates at decoding time. Specifically, we adopt a half-sibling regression denoising framework grounded in causal inference theory, combined with a proposed adaptive denoising coefficient, to control the effect of word frequency on the model’s estimated probabilities, yielding more accurate estimates and enriching word choice in NMT output. Experiments are conducted on four translation tasks representing different resource scales: Uyghur-Chinese, Chinese-English, English-German, and English-French. Results show that the proposed method improves the word-level diversity of NMT output without hurting translation quality. The method is also model-agnostic and highly interpretable.”

pdf bib
基于语音文本跨模态表征对齐的端到端语音翻译(End-to-end Speech Translation Based on Cross-modal Representation Alignment of Speech and Text)
Guojiang Zhou (周国江) | Ling Dong (董凌) | Zhengtao Yu (余正涛) | Shengxiang Gao (高盛祥) | Wenjun Wang (王文君) | Houli Ma (马候丽)

“End-to-end speech translation must map source-language speech to target-language text across both languages and modalities. With limited labeled data, establishing a unified mapping between speech and text representations and reducing the cross-modal gap is key to improving speech translation. This paper proposes a cross-modal representation alignment method for speech and text that aligns the two modalities at multiple granularities, mixes them as parallel inputs, and performs multi-task joint training under a consistency constraint on the multimodal representations. Experiments on the MuST-C dataset show that the proposed method outperforms existing cross-modal representation methods for end-to-end speech translation and effectively improves the model’s cross-modal mapping ability and translation performance.”

pdf bib
基于离散化自监督表征增强的老挝语非自回归语音合成方法(A Discretized Self-Supervised Representation Enhancement based Non-Autoregressive Speech Synthesis Method for Lao Language)
Zijian Feng (冯子健) | Linqin Wang (王琳钦) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛) | Ling Dong (董凌)

“Speech synthesis for Lao is of great significance for cooperation and exchange between China and Laos, but Lao pronunciation is complex, involving tones, syllables, and phonemes, and existing speech synthesis methods perform unsatisfactorily on it. Attention-based autoregressive models struggle to fit complex Lao speech, generalize poorly, and are prone to catastrophic errors such as dropped or skipped characters, so synthesized audio lacks naturalness and fluency. This paper proposes a non-autoregressive Lao speech synthesis method enhanced with discretized self-supervised representations. Building on the linguistic and phonetic characteristics of Lao, we construct a non-autoregressive acoustic model using phoneme-level annotated duration information, extract discretized representations of speech content and tone information with a self-supervised pre-trained speech model, and fuse them into the acoustic model to strengthen speech generation and improve the fluency and naturalness of synthesized audio. Experiments show that audio synthesized by our method reaches a MOS of 4.03, and that the discretized self-supervised, non-autoregressive approach better captures Lao speech characteristics at fine-grained levels such as tone, phoneme duration, and pitch.”

pdf bib
面向机器翻译的汉英小句复合体转换生成能力调查(Investigation of the Clause Complexes Transfer and Generation Capability from Chinese to English for Machine Translation)
Fukun Xing (邢富坤) | Jianing Xu (徐佳宁)

“A clause complex is composed of clauses, and languages differ in how clauses combine; how this difference affects machine translation remains unclear. Taking Chinese-English machine translation as a case, this paper selects Chinese clause complexes from multiple registers together with expert translations, and investigates mainstream machine translation systems as well as ChatGPT in terms of naming-sharing relations and sharing types. Results show that, compared with expert translations, machine translation falls notably short in transferring and generating clause complexes: it is weaker at naming completion, transformation, and distillation, its clause combination patterns are monotonous and carry clear traces of the Chinese source, and the idiomaticity of the output suffers considerably.”

pdf bib
基于端到端预训练模型的藏文生成式文本摘要(Abstractive Summarization of Tibetan Based on end-to-end Pre-trained Model)
Shuo Huang (黄硕) | Xiaodong Yan (闫晓东) | Xinpeng OuYang (欧阳新鹏) | Jinpeng Yang (杨金鹏)

“Pre-trained language models have attracted wide attention in recent years and have greatly advanced natural language processing across downstream tasks. Text summarization, an important branch of NLP, effectively reduces redundant information and speeds up reading. Tibetan, a low-resource language, lacks large-scale training corpora, and research on abstractive Tibetan summarization is still in its infancy. To address this, we are the first to apply the end-to-end pre-trained language model CMPT (Chinese Minority Pre-Trained Language Model) to abstractive Tibetan summarization. CMPT performs denoising and contrastive learning on texts of other low-resource languages and, to improve the encoder’s understanding, adds a single-layer masked language model (MLM) decoder on top of the encoder output for joint Seq2Seq generation-and-understanding pre-training. Further fine-tuning effectively improves performance on Tibetan summarization. To validate the model, we experiment on a self-built dataset of 50,000 Tibetan text-summary pairs and on the public Ti-SUM dataset; on both, our method yields significant gains on abstractive Tibetan summarization metrics. Moreover, the method is not limited to Tibetan and can be extended to summarization in other languages, giving it good generalization value.”

pdf bib
融合多粒度特征的缅甸语文本图像识别方法(Burmese Language Recognition Method Fused with Multi-Granularity Features)
Enyu He (何恩宇) | Rui Chen (陈蕊) | Cunli Mao (毛存礼) | Yuxin Huang (黄于欣) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛)

“Burmese is a low-resource Southeast Asian language, and Burmese text image recognition matters for downstream tasks such as Burmese machine translation. Because Burmese is a typical character-combining language, multiple characters are nested within one receptive field; existing Burmese recognition methods mainly recognize at the character granularity, so some characters are misrecognized at decoding time, producing locally garbled output. Considering Burmese’s special character-combination rules, this paper proposes a Burmese text image recognition method that fuses multi-granularity features: sequences are modeled at both the finer character granularity and the coarser character-cluster granularity, the two feature sequences are fused, and a decoder then decodes the result. Experimental results show that the method effectively alleviates garbled recognition output and, on a manually constructed dataset, improves recognition accuracy by 2.4% over a “VGG16+BiLSTM+Transformer” baseline, reaching 97.35%.”

pdf bib
TiKEM:基于知识增强的藏文预训练语言模型(TiKEM: Knowledge Enhanced Tibetan Pre-trained Language Model)
Junjie Deng (邓俊杰) | Long Chen (陈龙) | Yan Zhang (张廷) | Yuan Sun (孙媛) | Xiaobin Zhao (赵小兵)

“Pre-trained language models perform excellently for Chinese and English, but data for low-resource languages is hard to obtain, and research on pre-trained models for low-resource languages such as Tibetan has only made initial progress. Existing Tibetan pre-trained language models learn self-supervisedly from large unstructured text corpora, lack guidance from external knowledge, and are limited in knowledge memorization and knowledge reasoning. To address this, we construct a knowledge-enhanced Tibetan pre-training dataset containing 500,000 triples, combine structured knowledge representations with unstructured text representations, and train TiKEM, a knowledge-enhanced Tibetan pre-trained language model, to improve knowledge memorization and reasoning. Finally, we verify the model’s effectiveness on three downstream tasks: text classification, entity-relation classification, and machine reading comprehension.”

pdf bib
TiKG-30K:基于表示学习的藏语知识图谱数据集(TiKG-30K: A Tibetan Knowledge Graph Dataset Based on Representation Learning)
Wenhao Zhuang (庄文浩) | Ge Gao (高歌) | Yuan Sun (孙媛)

“Representation learning for knowledge graphs aims to learn the complex semantic associations in knowledge graph data by mapping entities and relations into a low-dimensional vector space, supporting research in information retrieval, question answering, and knowledge reasoning. Current work focuses mainly on languages such as English and Chinese, where public high-quality datasets (e.g., FB15k-237, WN18RR) have played a crucial role. For low-resource languages such as Tibetan, however, the lack of public knowledge graph datasets leaves related research at an early stage. This paper therefore presents TiKG-30K, a public Tibetan knowledge graph dataset with 146,679 triples, 30,986 entities, and 641 relations, applicable to representation learning and downstream tasks. To address the small size and sparsity of existing Tibetan knowledge graphs, we exploit coreference among entities in Tibetan triples, expand the knowledge base using rich knowledge bases in other languages and non-textual media, and apply multi-layer optimization through cross-lingual near-synonym retrieval, merging of synonymous entities and relations, and correction of erroneous triples. Finally, we run multiple classic representation learning models on TiKG-30K and compare with the English datasets FB15k-237 and WN18RR and the Tibetan dataset TD50K; results show TiKG-30K is comparable to FB15k-237 and WN18RR. The TiKG-30K dataset is publicly available at http://tikg-30k.cmli-nlp.com”

pdf bib
噪声鲁棒的蒙古语语音数据增广模型结构(Noise robust Mongolian speech data augmentation model structure)
Zhiqiang Ma (马志强) | Jiaqi Sun (孙佳琦) | Jinyi Li (李晋益) | Jiatai Wang (王嘉泰)

“Speech diversity in Mongolian corpora is scarce. Spending labor and funds on data collection increases the amount of speech to some extent, but the whole process is very time-consuming. Data augmentation can address this scarcity, but the environmental noise in the augmentation model’s training data cannot be controlled, leaving background noise in the augmented speech. This paper proposes a speech data augmentation method combining TTS with speech enhancement, enhancing speech in both the frequency and time domains on the basis of spectrograms. Multiple experiments show that the pass rate of augmented Mongolian speech reaches 70%, CBAK and COVL of the augmented speech drop by 0.66 and 0.81, and WER and SER drop by 2.75% and 2.05%, respectively.”

pdf bib
基于数据增强的藏文机器阅读有难度问题的生成(Difficult Question Generation of Tibetan Machine Reading Based on Data Enhancement)
Zhengcuo Dan (旦正错) | Long Chen (陈龙) | Junjie Deng (邓俊杰) | Xian Pang (庞仙) | Yuan Sun (孙媛)

“Question generation, a subtask of building machine reading comprehension datasets, asks a computer to generate fluent questions from a given context with (or without) answers. For Chinese and English, end-to-end question generation models are well developed and have produced large numbers of high-quality QA pairs. In the low-resource Tibetan setting, however, data-driven tasks such as machine reading comprehension and question answering still commonly suffer from small data volumes and overly simple QA pairs. This paper therefore proposes three methods for generating difficult questions for Tibetan machine reading: (1) masking and replacing keywords with a Tibetan pre-trained language model to generate unanswerable questions; (2) crossing questions from similar passages to generate unanswerable questions; (3) generating knowledge-reasoning questions from triples. Experiments on the constructed dataset show that a reading comprehension dataset containing unanswerable and knowledge-reasoning questions places higher demands on models’ comprehension. We also validate the quality of the constructed unanswerable questions in terms of readability, relevance, and answerability.”

pdf bib
融合预训练模型的端到端语音命名实体识别(End-to-End Speech Named Entity Recognition with Pretrained Models)
Tianwei Lan (兰天伟) | Yuhang Guo (郭宇航)

“Speech Named Entity Recognition (SNER) aims to identify the boundaries, types, and content of named entities directly from audio and is an important task in spoken language understanding. The end-to-end approach, recognizing entities directly from speech, is currently mainstream. However, SNER training corpora are scarce, and end-to-end models have two problems: (1) recognition quality degrades sharply in cross-domain settings; (2) homophones and similar phenomena cause entities to be missed or mislabeled, further hurting accuracy. For problem (1), we propose using a pre-trained entity recognition model to construct training corpora for speech entity recognition. For problem (2), we propose rescoring the N-best list of SNER with a pre-trained language model, using the external knowledge in the pre-trained model to help the end-to-end model select the best result. To evaluate domain transfer, we annotate MAGICDATA-NER, a small spoken-style dataset; experiments on it show our method improves F1 by 43.29% over the traditional approach.”
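
A minimal sketch of the N-best rescoring idea this abstract describes, assuming a generic Hugging Face causal LM; the model name and scoring details are placeholders, not the authors' setup:

```python
# Rescore ASR N-best hypotheses by language-model likelihood:
# the hypothesis with the lowest average negative log-likelihood wins.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def lm_score(text: str) -> float:
    """Average per-token negative log-likelihood of `text` (lower is better)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()          # mean token NLL

def rescore(nbest: list[str]) -> str:
    """Pick the N-best hypothesis the LM finds most fluent."""
    return min(nbest, key=lm_score)
```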

pdf bib
基于词向量的自适应领域术语抽取方法(An Adaptive Domain-Specific Terminology Extraction Approach Based on Word Embedding)
Xi Tang (唐溪) | Dongchen Jiang (蒋东辰) | Aoyuan Jiang (蒋翱远)

“Term distributions have a long tail. To extract low-frequency terms effectively, this paper proposes an adaptive term extraction method based on word embeddings. The method uses hypothesis-testing statistics to determine filtering thresholds adaptively and obtains candidate terms by progressively merging strongly associated character strings in the text, avoiding the loss of low-frequency terms caused by fixed thresholds. It then derives embeddings for out-of-vocabulary candidate terms with a masked language model, assigns each candidate to a domain cluster via density clustering fused with dictionary knowledge, and accepts candidates assigned to the target domain cluster as domain terms. Experimental results show that our method not only outperforms the baselines in F1 but is also more stable across texts of different genres, extracting low-frequency terms comprehensively and achieving high-quality domain term extraction.”

pdf bib
基于句法特征的事件要素抽取方法(Syntax-aware Event Argument Extraction )
Zijian Yu (余子健) | Tong Zhu (朱桐) | Wenliang Chen (陈文亮)

“Event Argument Extraction (EAE) aims to extract event participants from unstructured text. The encoder-decoder framework is a common strategy for this task, but previous work mostly feeds only word-level information into the encoder, weakening generalization and long-range dependency handling. This paper proposes an EAE model that incorporates syntactic information. We first obtain a constituency parse tree of the text and encode part-of-speech tags and the syntactic-constituent labels of each node to strengthen text representations. We then propose a tree-structured attention mechanism (Tree-Attention) that helps the model perceive structured semantic information and better handle long-distance dependencies. Experimental results show our method improves F1 by 2.02% over the baseline system, demonstrating its effectiveness.”

pdf bib
相似音节增强的越汉跨语言实体消歧方法(Similar syllable enhanced cross-lingual entity disambiguation for Vietnamese-Chinese)
Yujuan Li (李裕娟) | Ran Song (宋燃) | Cunli Mao (毛存礼) | Yuxin Huang (黄于欣) | Shengxiang Gao (高盛祥) | Shan Lu (陆杉)

“Cross-lingual entity disambiguation finds the target-language entity corresponding to a mention in a source-language sentence and underpins many cross-lingual NLP tasks. Existing methods work well for resource-rich languages but poorly for resource-scarce ones, and Vietnamese-Chinese is a typical low-resource pair. Moreover, Chinese and Vietnamese are unrelated languages with large differences, making cross-lingual representation difficult, so existing methods are hard to apply to Vietnamese-Chinese entity disambiguation. In fact, Chinese and Vietnamese share similar syllable characteristics, which can strengthen Vietnamese-Chinese cross-lingual entity representations. To better fuse syllable features, we propose a similar-syllable-enhanced method for Vietnamese-Chinese cross-lingual entity disambiguation, alleviating the poor performance caused by data scarcity and language differences. Experiments show the proposed method outperforms existing entity disambiguation methods, improving R@1 by 5.63%.”

pdf bib
英汉动物词的认知属性计量研究(Quantitative studies of cognitive attributes of English and Chinese animal words)
Ling Hua (华玲) | Bin Li (李斌) | Minxuan Feng (冯敏萱) | Haibo Kuang (匡海波)

“Animal words carry a great deal of socially grounded cognitive mapping, and different peoples perceive the same word with both similarities and differences. Studying the cognitive differences of animal words through metaphor has become popular in recent years, and cognitive attributes, which reflect people’s cognitive impressions of words, offer a convenient entry point. This paper selects 54 animals from the Cognitive Attribute Database of Traditional Chinese Cultural Terms and, using Chinese and English cognitive attribute databases, compares cognitive attribute differences between English and Chinese. We find clear differences between the English and Chinese cognitive attributes of animal words, concentrated mainly in subjective attributes, and we characterize the overall similarities and differences of animal-word cognitive attributes in the two languages.”

pdf bib
融合词典信息的古籍命名实体识别研究(A Study on the Recognition of Named Entities of Ancient Books Using Lexical Information)
Wenjun Kang (康文军) | Jiali Zuo (左家莉) | Anquan Jie (揭安全) | Wenbin Luo (罗文兵) | Mingwen Wang (王明文)

“Named entity recognition for ancient books has clear practical significance for building entity knowledge bases and corpora of classical texts. Research on it remains limited, mainly for lack of training corpora. Starting from the Zizhi Tongjian, we manually construct a named entity recognition dataset for ancient books and use it to study the task. Given that classical Chinese mostly expresses meaning with single characters and contains extensive ellipsis, we use pre-trained word embeddings as dictionary information, fully exploiting the lexical information they contain. Experiments show this method effectively handles person-name entity recognition in classical texts.”

pdf bib
结合全局对应矩阵和相对位置信息的古汉语实体关系联合抽取(Joint Extraction of Ancient Chinese Entity Relations by Combining Global Correspondence Matrix and Relative Position Information)
Yiyu Hu (胡益裕) | Jiali Zuo (左家莉) | Xueqiang Zeng (曾雪强) | Zhongying Wan (万中英) | Mingwen Wang (王明文)

“Entity relation extraction is an important task in information extraction. Current work focuses mainly on English and modern Chinese, and there has been little research on dataset construction and methods for classical Chinese. To address this, after studying the open-source Zizhi Tongjian corpus, we manually construct a classical Chinese entity relation dataset and design a joint entity-relation extraction method combining a global correspondence matrix with relative position information. Experiments on our dataset demonstrate the method’s effectiveness for classical Chinese entity relation extraction.”

pdf bib
数字人文视域下的青藏高原文旅知识图谱构建研究——以塔尔寺为例(Research on the Construction of Cultural and Tourism Knowledge Atlas on the Qinghai-Tibet Plateau from the Perspective of Digital Humanity——A case study of Kumbum Monastery)
Xinhao Li (李鑫豪) | Weina Zhao (赵维纳) | Wanyi Zhao (赵婉亦) | Chaoqun Li (李超群)

“The diverse ethnic composition and long history of the Qinghai-Tibet region have nurtured a rich and distinctive culture, making this snowy land a veritable treasure house of plateau culture. However, constrained by poor transport and a lagging economy, the preservation and promotion of the region’s cultural and tourism resources have long fallen behind. Guided by digital humanities, this paper extracts entities and relations from text through joint learning under a prompt-learning framework, achieving knowledge extraction under low-resource conditions and forming a construction paradigm for cultural-tourism knowledge graphs. Taking Kumbum Monastery, a major national cultural heritage site, as a case, we describe the full pipeline from ontology design, raw data acquisition, and knowledge extraction to visualization. The resulting Kumbum Monastery knowledge graph contains 4,705 nodes and 17,386 relations. This work fills the gap in structured data on Qinghai-Tibet culture in the humanities and offers a reference for digital-humanities research on the region’s culture and tourism.”

pdf bib
基于互信息最大化和对比损失的多模态对话情绪识别模型(Multimodal Emotion Recognition in Conversation with Mutual Information Maximization and Contrastive Loss)
Qianer Li (黎倩尔) | Peijie Huang (黄沛杰) | Jiawei Chen (陈佳炜) | Jialin Wu (吴嘉林) | Yuhong Xu (徐禹洪) | Peiyuan Lin (林丕源)

“Multimodal emotion recognition in conversation (ERC) is key to building emotional dialogue systems. Graph-based fusion methods, which dynamically aggregate multimodal context features within a conversation, have recently improved multimodal ERC performance. However, these methods do not fully preserve and exploit the valuable information in the input: they preserve no task-relevant information from input to fusion result, and they ignore the information carried by the labels themselves. This paper proposes MMIC, a multimodal ERC model based on mutual information maximization and contrastive loss. The model maximizes mutual information between modalities hierarchically at both the input and fusion levels, so task-relevant information is preserved during fusion and richer multimodal representations are generated. We also introduce supervised contrastive learning into the graph-based dynamic fusion network; by fully exploiting label information, different emotions repel each other, strengthening the model’s ability to recognize similar emotions. Extensive experiments on two English and one Chinese public datasets demonstrate the model’s effectiveness and superiority. Case studies further confirm that the model preserves task-relevant information and better distinguishes similar emotions, and ablation and visualization results validate each module.”

pdf bib
基于语义任务辅助的方面情感分析(Semantic Task-assisted Aspect-based Sentiment Analysis)
Zhaozhen Wu (吴肇真) | Hui Zhao (赵晖) | Tiquan Gu (谷体泉) | Guoyi Cao (曹国义)

“Aspect-Based Sentiment Analysis (ABSA) judges the fine-grained sentiment polarity of different aspects in a sentence; capturing the sentence’s semantics effectively is the key. Most existing classification methods introduce external knowledge and design complex modules to understand sentence semantics, overlooking the noise of external parsers and the added model complexity. In this paper, we propose a multi-task learning network based on semantic understanding that captures sentence semantics from the raw corpus through multi-task learning. From a multi-task perspective, on the original dataset with shared parameters, we propose two semantic auxiliary tasks: aspect-context order prediction and aspect-context syntactic dependency prediction. The auxiliary tasks are trained jointly with the original aspect sentiment classification task to obtain an encoder with enhanced semantic understanding, which is then used for aspect sentiment classification. Experimental results show the model performs well in accuracy and Macro-F1 on three major public datasets: Rest14, Lap14, and Twitter.”

pdf bib
中国社会道德变化模型与发展动因探究——基于70年《人民日报》的计量与分析 (The Model of Moral Change and Motivation in Chinese Society——The Vocabulary Analysis of the 70-year “People’s Daily”)
Hongrui Wang (王弘睿) | Dong Yu (于东) | Pengyuan Liu (刘鹏远) | Liying Zeng (曾立英)

“The diachronic study of social morality is of great significance. Observing the diachronic links between language use and moral change helps trace the trends and patterns of social morality, grasp its dynamics, and advance moral development. There is currently a lack of systematic, comprehensive computational studies of moral change over large diachronic corpora from a lexical perspective. This paper therefore proposes a diachronic quantitative model of moral theme words and, via quantitative indicators, computes and analyzes 70 years (1946-2015) of the People’s Daily corpus, observing the selection and change of moral theme words over those 70 years. The results reveal an interactive relationship between the diachronic use of moral vocabulary and social morality, reflecting the diachronic transformation and development of Chinese social morality over 70 years.”

pdf bib
动词视角下的汉语性别表征研究——基于多语体语料库与依存分析(Gendered Representation in Chinese via Verbal Analysis —Based on a Multi-register Corpus and Dependency Parsing)
Yingshi Chen (陈颖诗) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)

“Action is an important form in which gender socialization manifests. Studying the gendered representation of verbs in Chinese reveals the paths by which language constructs different gender identities, i.e., the means and forms it adopts. Using dependency syntax, we extract verbs that form dependency structures with gendered words from corpora of four registers, identify verbs with significant gender differences, and analyze them quantitatively and qualitatively according to the syntactic role of the gendered word and its semantics. Overall, most Chinese verbs are gender-neutral and only a minority are gender-marked; as a language bearing Chinese wisdom and deep cultural heritage, Chinese represents gender neutrally and equally, reflecting China’s concept of gender equality. Among gender-marked verbs, two different paths of constructing male and female identities emerge. Verbs significantly representing women outnumber those representing men in all registers, but the semantic distribution of male-marked verbs is more balanced, reflecting a “male default, female specialized” pattern. Among judicial verbs, women often appear as victims of violence while the male perpetrators are rendered invisible, reflecting a pattern of “male dominance, female submission.” Verbs in different registers play different roles in constructing gender: news shapes relatively traditional gender norms, while traditional and online literature break fixed norms in different ways.”

pdf bib
基于多任务多模态交互学习的情感分类方法(Sentiment classification method based on multitasking and multimodal interactive learning)
Peng Xue (薛鹏) | Yang Li (李旸) | Suge Wang (王素格) | Jian Liao (廖健) | Jianxing Zheng (郑建兴) | Yujie Fu (符玉杰) | Deyu Li (李德玉)

“With the rapid development of social media, multimodal data has grown explosively, and mining and understanding sentiment information from it has become a popular research direction. Existing multimodal sentiment analysis methods based on text, video, and audio often fuse high-level features of one modality with low-level features of another, ignoring differences between feature levels across modalities. This paper therefore proposes a self-supervised dynamic fusion model with multi-task, multimodal interactive learning, centered on the text modality and supplemented by the audio and video modalities. A multi-layer structure builds unimodal representations and hierarchical pairwise fusion representations, letting the model fuse features at different levels, with a fusion strategy that moves gradually from low-level to high-level features. To further strengthen multimodal fusion, a distribution-similarity loss and a heterogeneity loss are used to learn common and specific modality representations. On this basis, multi-task learning yields consistency and difference features across modalities. Experiments on CMU-MOSI and CMU-MOSEI show that the model’s sentiment classification outperforms baseline models.”

pdf bib
基于动态常识推理与多维语义特征的幽默识别(Humor Recognition based on Dynamically Commonsense Reasoning and Multi-Dimensional Semantic Features)
Tuerxun Tunike (吐妮可·吐尔逊) | Hongfei Lin (林鸿飞) | Dongyu Zhang (张冬瑜) | Liang Yang (杨亮) | Changrong Min (闵昶荣)

“With the rapid development of social media, humor recognition has attracted wide attention from researchers in recent years. The task judges whether a given text expresses humor. Existing methods, grounded in theories of humor production, use rules or neural models to extract various humor-related features such as incongruity, sentiment, and phonetic features. These methods show both the importance of sentiment information in modeling humorous semantics and that humorous semantics is built from features of multiple dimensions. However, they do not fully capture sentiment features inside the text and ignore implicit sentiment expression in humorous text, hurting recognition accuracy. To address this, we propose CMSOR, a humor recognition method driven by dynamic commonsense and multi-dimensional semantic features. It first uses external commonsense to dynamically infer the speaker’s implicit sentiment from the text, then introduces the external lexicon WordNet to compute word-level semantic distance within the text and thus capture incongruity, while also computing textual ambiguity features. Finally, humorous semantics is constructed from these three feature dimensions for recognition. Experiments on three public datasets show that CMSOR clearly improves over current baselines.”

pdf bib
融合Synonyms 词库的专利语义相似度计算研究(Patent Semantic Similarity Calculation by Fusing Synonyms Database)
Xinyu Tong (佟昕瑀) | Jialun Liao (廖佳伦) | Yonghe Lu (路永和)

“Patent similarity computation and comparison have long been performed manually by patent examiners, who make accurate judgments. However, manually analyzing patents’ originality, utility, and potential infringement consumes substantial human and material resources and is inefficient. This paper applies the ALBERT pre-trained model to patent text representation, strengthens the semantic expressiveness of patent text by introducing the Synonyms near-synonym lexicon, and explores a patent text representation model and similarity computation method based on a semantic knowledge base and deep learning. Experimental results show that adding Synonyms-based disambiguation improves the accuracy of patent text similarity measurement.”

pdf bib
中医临床切诊信息抽取与词法分析语料构建及联合建模方法(On Corpus Construction and Joint Modeling for Clinical Pulse Feeling and Palpation Information Extraction and Lexical Analysis of Traditional Chinese Medicine)
Yaqiang Wang (王亚强) | Wen Jiang (蒋文) | Yongguang Jiang (蒋永光) | Hongping Shu (舒红平)

“Pulse feeling and palpation is the most distinctively Chinese-medicine diagnostic method among the four clinical diagnostic methods of TCM, providing important evidence for clinical syndrome differentiation and treatment, so research on extracting and lexically analyzing clinical pulse-diagnosis information has significant clinical application value. This paper presents the first study on corpus construction and joint modeling for clinical pulse-diagnosis information extraction and lexical analysis in TCM. Working from over ten thousand TCM clinical records, we propose a corpus construction framework, formulate annotation guidelines for pulse-diagnosis information extraction, Chinese word segmentation, and part-of-speech tagging, and produce a corpus that supports joint multi-task modeling, with final annotation agreement above 0.94. Based on a model sharing encoder parameters across same-level tasks, we explore joint modeling of pulse-diagnosis information extraction and lexical analysis and verify its effectiveness.”

pdf bib
大规模语言模型增强的中文篇章多维度阅读体验量化研究(Quantitative Research on Multi-dimensional Reading Experience of Chinese Texts Enhanced by Large Language Model)
Jiadai Sun (孙嘉黛) | Siyi Tang (汤思怡) | Shike Wang (王诗可) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)

“Existing research on graded reading usually proceeds from text readability and recommends books to readers via discrete difficulty levels. There is still no framework for studying the multifaceted, deep reading experiences readers have while reading. We survey the different reading experiences readers have with Chinese texts and propose a quantification system for the multi-dimensional reading experience of Chinese texts. We categorize the continuous experiences arising during reading into multiple classes and build a multi-dimensional reading experience dataset of Chinese texts on this basis. We also probe the ability of ChatGPT, built on a large language model, to quantify reading experience, finding that despite strong information extraction and semantic understanding, it performs poorly at quantifying reading experience. We find, however, that the capabilities of large language models can assist the quantification of deep attributes via knowledge distillation; on this basis we implement an LLM-enhanced model for quantifying the multi-dimensional reading experience of Chinese texts. The model’s average F1 over the experience dimensions reaches 0.72, above ChatGPT’s few-shot result of 0.48.”

pdf bib
融合文本困惑度特征和相似度特征的推特机器人检测方法(Twitter robot detection method based on text perplexity feature and similarity feature)
Zhongjie Wang (王钟杰) | Zhaowen Zhang (张朝文) | Wenqi Ding (丁文琪) | Yumeng Fu (付雨濛) | Lili Shan (单丽莉) | Bingquan Liu (刘秉权)

“Twitter bot detection determines whether an account is operated by a human or by automation. As human-mimicking algorithms for automated accounts iterate rapidly, detecting the newest generations of automated accounts grows harder. Recently, pre-trained language models have performed impressively on natural language generation and other tasks; when they are used to auto-generate tweets, they pose a serious challenge for bot detection. This paper finds that abnormally low perplexity and abnormally high similarity consistently appear in the historical tweets of automated accounts across eras, regardless of the pre-trained language model used. Based on these findings, we propose a method for extracting perplexity and similarity features from historical tweets and design a feature fusion strategy to better apply these new features to existing models. Our method outperforms existing baselines on the chosen datasets and won first place in the social bot detection competition organized by People’s Daily Online and hosted by the State Key Laboratory of Communication Content Cognition.”

pdf bib
差比句结构及其缺省现象的识别补全研究(A Study on Identification and Completion of Comparative Sentence Structures with Ellipsis Phenomenon)
Pengfei Zhou (周鹏飞) | Weiguang Qv (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)

“Comparative sentences express similarities or differences between two or more things, commonly in the pattern “X 比 Y + comparison result”. Comparative sentences have many structural variants and exhibit extensive ellipsis, which complicates Chinese grammar research and NLP tasks, so recognizing comparative structures and completing their elided parts is highly valuable. We build a comparative sentence corpus with sequence labeling, propose a LatticeBERT-BiLSTM-CRF model that fuses character and word information to automatically recognize comparative structures, and automatically complete elided units; experimental results verify the method’s effectiveness.”

pdf bib
基于框架语义场景图的零形式填充方法(A Null Instantiation Filling Method based Frame Semantic Scenario Graph)
Yuzhi Wang (王俞智) | Ru Li (李茹) | Xuefeng Su (苏雪峰) | Zhichao Yan (闫智超) | Juncai Li (李俊材)

“Null instantiation filling finds, in the discourse context, the content that fills the implicit frame semantic roles of a given sentence. Traditional methods use pipeline models, which propagate errors and ignore the importance of explicit semantic roles and their fillers. To address this, we propose an end-to-end null instantiation filling method that builds a frame semantic scenario graph using Chinese FrameNet, models it with a GAT, and obtains candidate-filler representations fused with explicit frame element information, strengthening the model’s recognition of implicit semantic components. Experiments on a Chinese null instantiation filling dataset show our model improves F1 by 9.16% over a BERT-based baseline, demonstrating its effectiveness.”

pdf bib
基于FLAT的农业病虫害命名实体识别(Named Entity Recognition of Agricultural Pests and Diseases based on FLAT)
Yi Ren (任义) | Jie Shen (沈洁) | Shuai Yuan (袁帅)

“Traditional named entity recognition methods suffer from word embeddings that cannot represent polysemy and from character-word fusion models whose feature extraction is insufficiently accurate. This paper proposes an interactive feature fusion model based on FLAT: character and word vectors are obtained by external dictionary matching and, after BERT pre-training, an interactive feature fusion module fully mines character-word dependencies. Adversarial training is introduced to improve robustness. A special relative position encoding feeds the data into self-attention, and a CRF finally produces the globally optimal sequence. On an agricultural pest and disease dataset, the model reaches precision, recall, and F1 of 93.76%, 92.14%, and 92.94%, respectively.”

pdf bib
基于结构树库的补语位形容词语义分析及搭配库构建(Semantic analysis of complementary adjectives and construction of collocation database based on structural tree library)
Siyu Tian (田思雨) | Tian Shao (邵田) | Endong Xun (荀恩东) | Gaoqi Rao (饶高琦)

“In cohesive predicate-complement constructions where an adjective serves as the complement, two predicative elements typically appear in succession (“adjective + adjective”, “verb + adjective”). Because the construction has no formal marker, automatic recognition is difficult; moreover, the complement use of adjectives is not their most basic or typical use (attributive or predicative), and it has received little attention in linguistics or computational linguistics. This paper therefore takes complement-position adjectives as its object: it extracts predicate-complement structures where adjectives directly serve as complements from a large-scale syntactic treebank, denoises the corpus through programming plus manual verification, exhaustively retrieves complement-position adjectives to obtain a word list, further subclassifies their semantics, and builds a corresponding semantic collocation database. This can improve the accuracy of syntactic segmentation, provide semantic information for deep syntactic-semantic analysis, and serve as a reference for research on the linguistic ontology.”

pdf bib
基于BiLSTM聚合模型的汉语框架语义角色识别(Chinese Frame Semantic Role Identification Based on BiLSTM Aggregation Model)
Xuefei Cao (曹学飞) | Hongji Li (李济洪) | Ruibo Wang (王瑞波) | Qian Niu (牛倩)

“The performance of neural models for Chinese frame semantic role identification remains low. Since neural network performance depends on hyperparameters, this paper unifies hyperparameter tuning and predictive performance improvement under a BiLSTM-based aggregation framework. We run regularized cross-validation, constraining the distribution gap between training and validation sets to avoid performance fluctuations from distribution mismatch. Cross-validation results are combined by majority (mode) voting; the voted results are used to evaluate different hyperparameter configurations, and several configurations without significant differences form the optimal configuration set. The sub-models corresponding to this set are then aggregated into the final model for Chinese frame semantic role identification. Experimental results show our method significantly improves performance over the baseline model by 9.56%.”
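
A minimal sketch of the mode-voting step described above: predictions from repeated cross-validation runs are combined per sample by majority vote, and the voted labels can then score a hyperparameter configuration. Array shapes and names are illustrative, not the authors' code:

```python
import numpy as np

def mode_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_runs, n_samples) of integer labels -> per-sample mode."""
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, predictions)

# three cross-validation runs over five samples
runs = np.array([[0, 1, 2, 1, 0],
                 [0, 1, 1, 1, 0],
                 [1, 1, 2, 0, 0]])
print(mode_vote(runs))  # -> [0 1 2 1 0]
```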

pdf bib
L2到L1的跨语言激活路径研究——基于词汇识别的ERP数据(Cross-lingual Activation Path from L2 to L1——Based on ERP Data during Word Recognition)
Siqin Yang (杨思琴) | Minghu Jiang (江铭虎)

“Cross-lingual lexical activation models are a hot topic in language cognition and computation. Using event-related potentials (ERPs), this study explores the path by which second language (L2) learners activate native language (L1) lexical representations when recognizing L2 words. We design an implicit priming paradigm across two experiments, inferring activation by observing whether participants perceive a hidden condition, repetition of translation equivalents, that can only be detected if L1 lexical representations are activated. The EEG results show that in Experiment 1, during a semantic judgment task, translation-equivalent repetition produced a significant N400 difference, indicating that participants activated L1 lexical representations via conceptual representations and establishing activation path Path-1 (L2>L1). In Experiment 2, during a written-form judgment task without semantic priming, participants still perceived the translation-equivalent condition, indicating direct activation from L2 to L1 lexical representations and establishing activation path Path-2 (L2>L1). Overall, the activation path from L2 to L1 lexical representations during word recognition resembles the path the Revised Hierarchical Model (RHM) describes for word production. We therefore conjecture that although the brain uses different processing mechanisms for word recognition and word production, cross-lingual lexical activation shares some common ground between them.”

pdf bib
汉语语义构词的资源建设与计算评估(Construction of Chinese Semantic Word-Formation and its Computing Applications)
Yue Wang (王悦) | Yang Liu (刘扬) | Qiliang Liang (梁启亮) | Hansi Wang (王涵思)

“Chinese is a paratactic language, and the ways morphemes form words are an important factor in describing and understanding word meaning. Linguists hold two views of morpheme-based word formation, grammatical and semantic, with the semantic view expressing inter-morpheme relations more deeply. This paper takes the semantic word-formation route: from a linguistic perspective and considering the characteristics of Chinese word formation, we propose a computation-oriented semantic word-formation structure scheme, build a Chinese semantic word-formation knowledge base via random-forest automatic annotation combined with manual verification, and evaluate the resource computationally on a word-meaning generation task. The experiments achieve good results: definition generation based on the semantic word-formation knowledge base reaches a BLEU of 25.07, a 3.17% improvement over the earlier grammatical word-formation approach, preliminarily validating this knowledge representation. The representation and resource offer new ideas for applications in the humanities and in information processing.”

pdf bib
基于多尺度建模的端到端自动语音识别方法(An End-to-End Automatic Speech Recognition Method Based on Multiscale Modeling)
Hao Chen (陈昊) | Runlai Zhang (张润来) | Yuhao Zhang (张裕浩) | Chenghao Gao (高成浩) | Chen Xu (许晨) | Anxiang Ma (马安香) | Tong Xiao (肖桐) | Jingbo Zhu (朱靖波)

“In recent years, end-to-end automatic speech recognition models based on deep learning, which model speech and text directly, have become mainstream thanks to their simple structure and strong performance. However, because continuous speech signals and discrete text differ hugely in length and representation scale, the modality gap between them has been a persistent difficulty for this task. To address it, this paper proposes multiscale modeling for speech recognition: exploiting fine-grained distributional knowledge, we build text information at multiple scales and progressively align feature sequences from fine-grained, low-level sequences up to the predicted text sequence. This stepwise prediction lowers prediction difficulty and mitigates the modality gap, and fusing features across scales enriches and completes the information, further strengthening the model’s inference ability. Experiments on the LibriSpeech small-scale and large-scale setups and on the TED-LIUM 2 dataset reduce word error rate by 1.7, 0.45, and 0.76 on average over the baseline systems, validating the method’s effectiveness.”

pdf bib
基于血缘关系结构的亲属关系推理算法研究(A Study on Kinship Inference Algorithm Based on Blood Relationship Structure)
Dawei Lu (卢达威) | Siqin Yang (杨思琴)

“Previous kinship reasoning systems could not guarantee the correctness of inference, were error-prone for complex kin relations, and struggled with reasoning problems that take multiple kin relations as premises. Building on Lu Dawei et al. (2019), this paper first formalizes the inference rules and procedure as algorithms; it then compares the system with one based on first-order predicate logic, finding that kinship reasoning based on blood-relationship structure has advantages in both knowledge representation and inference rules, chiefly higher execution efficiency and fewer errors when writing and checking rules; finally, it analyzes time complexity, finding the reasoning system runs in linear time. The algorithm and its effectiveness analysis are supported by experimental results.”

pdf bib
基于深加工语料库的《唐诗三百首》难度分级(The difficulty classification of ‘ Three Hundred Tang Poems ’ based on the deep processing corpus)
Yuyu Huang (黄宇宇) | Xinyu Chen (陈欣雨) | Minxuan Feng (冯敏萱) | Yunuo Wang (王禹诺) | Beiyuan Wang (王蓓原) | Bin Li (李斌)

“To assist the selection of Tang poems for primary and secondary school textbooks and readers, this paper builds on a deeply processed corpus of the Three Hundred Tang Poems annotated with word segmentation, part of speech, and allusions, and innovatively constructs a difficulty grading standard based on verse readability. The standard has 4 layers with 8 quantifiable indicators: the character layer (phonetic loan characters), the word layer (two-character words), the sentence layer (special constructions, title length, verse length), and the artistic layer (allusions, other rhetoric, descriptive devices). We score the 313 poems in the corpus on these 8 indicators, build a vector space model over the quantified features, and use K-means clustering to group the poems into three bands corresponding to primary, junior secondary, and senior secondary study of Tang poetry.”
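
A minimal sketch of the clustering step described above: each poem becomes an 8-dimensional feature vector, and K-means groups the poems into three difficulty bands. The feature values below are illustrative, not from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# rows: poems; columns: the 8 quantified indicators
# (phonetic loan chars, two-char words, special constructions, title length,
#  verse length, allusions, other rhetoric, descriptive devices)
features = np.array([
    [0, 3, 1, 4, 5, 0, 1, 2],
    [2, 7, 3, 6, 7, 4, 2, 3],
    [1, 5, 2, 5, 5, 2, 1, 2],
])

X = StandardScaler().fit_transform(features)          # put features on one scale
levels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(levels)  # cluster id per poem, mapped to the three school stages
```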

pdf bib
基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM)
Xiaojun Rao (饶晓俊) | Yangsen Zhang (张仰森) | Shuang Peng (彭爽) | Qilong Jia (贾启龙) | Xueyang Liu (刘雪阳)

“With the spread of the internet, social media provides a platform for exchanging views, but its virtuality and anonymity have also intensified the spread of hate speech, so automatic hate speech detection is vital for the healthy development of social media platforms. To address this, we construct a Chinese hate speech dataset, CHSD, and propose a Chinese hate speech detection model, RoBERTa-CHHSD. The model first serializes Chinese hate speech with the RoBERTa pre-trained language model to extract text features, then feeds them into a TextCNN and a Bi-GRU to extract multi-level local semantic features and global inter-sentence dependencies, respectively; the two outputs are fused to extract deeper hate speech features and classify Chinese hate speech. Experimental results show an F1 of 89.12% on CHSD, 1.76% above the current best mainstream model RoBERTa-WWM.”

pdf bib
汉语被动结构解析及其在CAMR中的应用研究(Parsing of Passive Structure in Chinese and Its Application in CAMR)
Kang Hu (胡康) | Weiguang Qv (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)

“The Chinese passive sentence is an important linguistic phenomenon. Using BIO tagging combined with indices, we annotate the passive structures within passive sentences at fine granularity and propose a CRF sequence labeling model based on the BERT-wwm-ext pre-trained model and biaffine attention, achieving automatic parsing of the internal structure of Chinese passive sentences with an F1 of 97.31%. The model generalizes well; experiments show that post-processing CAMR graphs with our parsed passive structures effectively improves CAMR parsing of passive sentences.”

pdf bib
人工智能生成语言与人类语言对比研究——以ChatGPT为例(A Comparative Study of Language between Artificial Intelligence and Human: A Case Study of ChatGPT)
Junhui Zhu (朱君辉) | Mengyan Wang (王梦焰) | Erhong Yang (杨尔弘) | Jinran Nie (聂锦燃) | Yujie Wang (王誉杰) | Yan Yue (岳岩) | Liner Yang (杨麟儿)

“ChatGPT, a chatbot based on natural language generation technology, produces answers quickly, but how the language machines use in answering differs from genuine human language has not yet been fully studied. This study extracts and computes the distribution of 159 linguistic features in human and ChatGPT answers to open-domain Chinese questions, trains AI detectors with three machine learning algorithms (random forest, logistic regression, and support vector machine (SVM)), and evaluates the models. Experimental results show that random forest and SVM both reach high classification accuracy. Comparative analysis reveals the strengths and weaknesses of the two kinds of text along five dimensions: descriptive features, word commonness, lexical diversity, syntactic complexity, and discourse cohesion. The results show the differences concentrate mainly in descriptive features, word commonness, and lexical diversity.”

pdf bib
古汉语通假字资源库的构建及应用研究(The Construction and Application of an Ancient Chinese Language Resource on Tongjiazi)
Zhaoji Wang (王兆基) | Shirui Zhang (张诗睿) | Xuetao Zhang (张学涛) | Renfen Hu (胡韧奋)

“Tongjiazi (phonetic loan characters) are common in ancient texts; they hinder human understanding of the text and pose an important challenge for classical Chinese NLP. To support both manual identification and machine processing of tongjiazi, we build and release a multi-dimensional tongjiazi resource comprising three sub-resources: a corpus, a knowledge base, and an evaluation dataset. The corpus contains over 11,000 examples with detailed annotation of loan phenomena. The knowledge base takes Chinese characters as nodes and loan and phono-semantic relations as edges, characterizing loan characters and their standard counterparts from the angles of pronunciation, form, and meaning, with 4,185 character nodes and 8,350 relation pairs. The evaluation dataset targets classical Chinese NLP needs, supporting two subtasks, loan character detection and standard character identification, with 19,678 evaluation items. On this basis, we build a series of baseline models for automatic tongjiazi identification and, from the experimental results, analyze the factors affecting it and possible improvements. We further discuss applications of the resource in ancient text collation, humanities research, and classical Chinese teaching.”

pdf bib
SpaCE2022中文空间语义理解评测任务数据集分析报告(A Quality Assessment Report of the Chinese Spatial Cognition Evaluation Benchmark)
Liming Xiao (肖力铭) | Chunhui Sun (孙春晖) | Weidong Zhan (詹卫东) | Dan Xing (邢丹) | Nan Li (李楠) | Chengwen Wang (王诚文) | Fangwei Zhu (祝方韦)

“The second Chinese Spatial Cognition Evaluation (SpaCE2022) tests machines’ spatial semantic understanding with three subtasks: (1) judging the correctness of Chinese spatial semantics; (2) attributing spatial-semantic anomalies and identifying anomalous text; (3) recognizing spatial entities and annotating spatial orientation relations. This paper introduces the annotation guidelines and dataset construction process of SpaCE2022 and summarizes methods for improving dataset quality: building the STEP annotation scheme to describe spatial semantic information in a standardized way; generating spatially anomalous sentences from linguistic knowledge to increase data diversity; strengthening quality control via double annotation, rule-based real-time checks, and manual sampling review; and managing annotated data in tiers so only high-quality data enters the dataset. Examining the dataset’s distribution and machine versus human performance, we find that SpaCE2022’s label distribution is clearly skewed and that the correctness-judgment and anomaly-attribution tasks are subjective with low agreement; these issues await further optimization in future SpaCE task designs.”

pdf bib
基于预训练语言模型的端到端概念体系构建方法(End to End Taxonomy Construction Method with Pretrained Language Model)
Siyi Wang (王思懿) | Shizhu He (何世柱) | Kang Liu (刘康) | Jun Zhao (赵军)

“A taxonomy describes hypernym-hyponym relations between concepts and organizes them into a hierarchy; it is an important class of knowledge resource. This paper studies automatic taxonomy construction: organizing a given set of concepts (words) into a tree-structured taxonomy (concept tree) by hypernymy. Traditional approaches decompose the task into two independent subtasks, judging hypernym relations between concepts and building the concept hierarchy, which lack information feedback between them and thus accumulate errors. Recently, many tasks have used pre-trained language models to obtain lexical semantic features and judge semantic relations between words; although this has helped taxonomy construction, it models only the first subtask and still accumulates errors. To solve the error accumulation of stepwise methods and effectively capture the semantics of words and their relations, we propose an end-to-end taxonomy construction method based on pre-trained language models: on one hand, the PLM captures semantic information about concepts and their hypernym relations along with structural information of partial taxonomies; on the other, reinforcement learning models relation judgment and full-structure generation end to end. Experiments on WordNet datasets show our method performs well; under equal conditions, our F1 is a 7.3% relative improvement over the best model.”

pdf bib
Ask to Understand: Question Generation for Multi-hop Question Answering
Li Jiawei | Ren Mucheng | Gao Yang | Yang Yizhe

“Multi-hop Question Answering (QA) requires the machine to answer complex questions by finding scattered clues and reasoning from multiple documents. Graph Network (GN) and Question Decomposition (QD) are two common approaches at present. The former uses a “black-box” reasoning process to capture the potential relationship between entities and sentences, thus achieving good performance. At the same time, the latter provides a clear reasoning logical route by decomposing multi-hop questions into simple single-hop sub-questions. In this paper, we propose a novel method to complete multi-hop QA from the perspective of Question Generation (QG). Specifically, we carefully design an end-to-end QG module on the basis of a classical QA module, which could help the model understand the context by asking inherently logical sub-questions, thus inheriting interpretability from the QD-based method and showing superior performance. Experiments on the HotpotQA dataset demonstrate the effectiveness of our proposed QG module, human evaluation further clarifies its interpretability quantitatively, and thorough analysis shows that the QG module could generate better sub-questions than QD methods in terms of fluency, consistency, and diversity.”

pdf bib
Learning on Structured Documents for Conditional Question Answering
Wang Zihan | Qian Hongjin | Dou Zhicheng

“Conditional question answering (CQA) is an important task in natural language processing that involves answering questions that depend on specific conditions. CQA is crucial for domains that require the provision of personalized advice or making context-dependent analyses, such as legal consulting and medical diagnosis. However, existing CQA models struggle with generating multiple conditional answers due to two main challenges: (1) the lack of supervised training data with diverse conditions and corresponding answers, and (2) the difficulty of producing output in a complex format that involves multiple conditions and answers. To address the challenge of limited supervision, we propose LSD (Learning on Structured Documents), a self-supervised learning method on structured documents for CQA. LSD involves a conditional problem generation method and a contrastive learning objective. The model is trained with LSD on massive unlabeled structured documents and is fine-tuned on a labeled CQA dataset afterwards. To overcome the limitation of outputting answers with complex formats in CQA, we propose a pipeline that enables the generation of multiple answers and conditions. Experimental results on the ConditionalQA dataset demonstrate that LSD outperforms previous CQA models in terms of accuracy both in providing answers and conditions.”

pdf bib
Overcoming Language Priors with Counterfactual Inference for Visual Question Answering
Ren Zhibo | Wang Huizhen | Zhu Muhua | Wang Yichao | Xiao Tong | Zhu Jingbo

“Recent years have seen a lot of efforts in attacking the issue of language priors in the field of Visual Question Answering (VQA). Among the extensive efforts, causal inference is regarded as a promising direction to mitigate language bias by weakening the direct causal effect of questions on answers. In this paper, we follow the same direction and attack the issue of language priors by incorporating counterfactual data. Moreover, we propose a two-stage training strategy which is deemed to make better use of counterfactual data. Experiments on the widely used benchmark VQA-CP v2 demonstrate the effectiveness of the proposed approach, which improves the baseline by 21.21% and outperforms most of the previous systems.”
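
A minimal sketch of the general counterfactual-debiasing idea this line of work builds on: at inference, subtract the question-only ("language prior") branch from the fused vision+question prediction, keeping the part of the answer that vision contributes. The tensor names and scaling factor are assumptions, not the paper's exact formulation:

```python
import torch

def debiased_logits(fused_logits: torch.Tensor,
                    question_only_logits: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    """Approximate the total effect minus the direct question->answer effect."""
    return fused_logits - alpha * question_only_logits
```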

pdf bib
Rethinking Label Smoothing on Multi-hop Question Answering
Yin Zhangyue | Wang Yuxin | Hu Xiannian | Wu Yiguang | Yan Hang | Zhang Xinyu | Cao Zhao | Huang Xuanjing | Qiu Xipeng

“Multi-Hop Question Answering (MHQA) is a significant area in question answering, requiring multiple reasoning components, including document retrieval, supporting sentence prediction, and answer span extraction. In this work, we present the first application of label smoothing to the MHQA task, aiming to enhance generalization capabilities in MHQA systems while mitigating overfitting of answer spans and reasoning paths in the training set. We introduce a novel label smoothing technique, F1 Smoothing, which incorporates uncertainty into the learning process and is specifically tailored for Machine Reading Comprehension (MRC) tasks. Moreover, we employ a Linear Decay Label Smoothing Algorithm (LDLA) in conjunction with curriculum learning to progressively reduce uncertainty throughout the training process. Experiments on the HotpotQA dataset confirm the effectiveness of our approach in improving generalization and achieving significant improvements, leading to new state-of-the-art performance on the HotpotQA leaderboard.”
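
A minimal sketch of linearly decaying label smoothing in the spirit of the LDLA idea above; the schedule and loss wiring are assumptions, not the authors' code:

```python
import torch
import torch.nn.functional as F

def smoothed_ce(logits: torch.Tensor, target: torch.Tensor, epsilon: float):
    """Cross-entropy with uniform label smoothing of strength epsilon."""
    logp = F.log_softmax(logits, dim=-1)
    nll = -logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)  # gold-class NLL
    uniform = -logp.mean(dim=-1)                              # uniform-target term
    return ((1 - epsilon) * nll + epsilon * uniform).mean()

def epsilon_at(step: int, total_steps: int,
               eps_start: float = 0.1, eps_end: float = 0.0) -> float:
    """Linearly decay the smoothing strength as training progresses."""
    frac = min(step / max(total_steps, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

# inside a training loop: loss = smoothed_ce(logits, labels, epsilon_at(step, T))
```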

pdf bib
Improving Zero-shot Cross-lingual Dialogue State Tracking via Contrastive Learning
Xiang Yu | Zhang Ting | Di Hui | Huang Hui | Li Chunyou | Ouchi Kazushige | Chen Yufeng | Xu Jinan

“Recent works in dialogue state tracking (DST) focus on a handful of languages, as collecting large-scale manually annotated data in different languages is expensive. Existing models address this issue by code-switched data augmentation or intermediate fine-tuning of multilingual pre-trained models. However, these models can only perform implicit alignment across languages. In this paper, we propose a novel model named Contrastive Learning for Cross-Lingual DST (CLCL-DST) to enhance zero-shot cross-lingual adaptation. Specifically, we use a self-built bilingual dictionary for lexical substitution to construct multilingual views of the same utterance. Then our approach leverages fine-grained contrastive learning to encourage representations of specific slot tokens in different views to be more similar than negative example pairs. By this means, CLCL-DST aligns similar words across languages into a more refined language-invariant space. In addition, CLCL-DST uses a significance-based keyword extraction approach to select task-related words to build the bilingual dictionary for better cross-lingual positive examples. Experimental results on Multilingual WoZ 2.0 and parallel MultiWoZ 2.1 datasets show that our proposed CLCL-DST outperforms existing state-of-the-art methods by a large margin, demonstrating the effectiveness of CLCL-DST.”
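
A minimal InfoNCE-style sketch of the fine-grained contrastive objective described above: paired representations of the same slot token in two language views are pulled together against in-batch negatives. Batching and the encoder are placeholders:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """anchor/positive: (batch, dim) paired token representations."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature      # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0))      # diagonal pairs are the positives
    return F.cross_entropy(logits, labels)
```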

pdf bib
Unsupervised Style Transfer in News Headlines via Discrete Style Space
Liu Qianhui | Gao Yang | Yang Yizhe

“The goal of headline style transfer in this paper is to make a headline more attractive while maintaining its meaning. The absence of parallel training data is one of the main problems in this field. In this work, we design a discrete style space for unsupervised headline style transfer, short for D-HST. This model decomposes the style-dependent text generation into content-feature extraction and style modelling. Then, the generation decoder receives input from content, style, and their mixing components. In particular, it is considered that the textual style signal is more abstract than the text itself. Therefore, we propose to model the style representation space as a discrete space, where each discrete point corresponds to a particular category of the styles that can be elicited by syntactic structure. Finally, we provide a new style-transfer dataset, named TechST, which focuses on transferring news headlines into those that are more eye-catching in technical social media. In the experiments, we develop two automatic evaluation metrics, style transfer rate (STR) and style-content trade-off (SCT), along with a few traditional criteria to assess the overall effectiveness of the style transfer. In addition, the human evaluation is thoroughly conducted in terms of assessing the generation quality and creatively mimicking a scenario in which a user clicks on appealing headlines to determine the click-through rate. Our results indicate that D-HST achieves state-of-the-art results in these comprehensive evaluations.”

pdf bib
Lexical Complexity Controlled Sentence Generation for Language Learning
Nie Jinran | Yang Liner | Chen Yun | Kong Cunliang | Zhu Junhui | Yang Erhong

“Language teachers spend a lot of time developing good examples for language learners. For this reason, we define a new task for language learning, lexical complexity controlled sentence generation, which requires precise control over the lexical complexity in keywords-to-examples generation together with better fluency and semantic consistency. The challenge of this task is to generate fluent sentences using only words of given complexity levels. We propose a simple but effective approach for this task based on complexity embedding, while controlling sentence length and syntactic complexity at the decoding stage. Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity. We demonstrate the feasibility of the approach both for training models from scratch and for fine-tuning pre-trained models. To facilitate the research, we develop two datasets in English and Chinese respectively, on which extensive experiments are conducted. Experimental results show that our approach provides more precise control over lexical complexity, as well as better fluency and diversity.”

pdf bib
Dynamic-FACT: A Dynamic Framework for Adaptive Context-Aware Translation
Chen Linqing | Wang Weilei

“Document-level neural machine translation (NMT) has garnered considerable attention since the emergence of various context-aware NMT models. However, these static NMT models are trained on fixed parallel datasets, thus lacking awareness of the target document during inference. In order to alleviate this limitation, we propose a dynamic adapter-translator framework for context-aware NMT, which adapts the trained NMT model to the input document prior to translation. Specifically, the document adapter reconstructs the scrambled portion of the original document from a deliberately corrupted version, thereby reducing the performance disparity between training and inference. To achieve this, we employ an adaptation process in both the training and inference stages. Our experimental results on document-level translation benchmarks demonstrate significant enhancements in translation performance, underscoring the necessity of dynamic adaptation for context-aware translation and the efficacy of our methodologies.”

pdf bib
TERL: Transformer Enhanced Reinforcement Learning for Relation Extraction
Wang Yashen | Shi Tuo | Ouyang Xiaoye | Guo Dayu

“The Relation Extraction (RE) task aims to discover the semantic relation that holds between two entities and contributes to many applications such as knowledge graph construction and completion. Reinforcement Learning (RL) has been widely used for the RE task and has achieved SOTA results, mainly by designing rewards to choose the optimal actions during the training procedure, improving RE’s performance especially under low-resource conditions. Recent work has shown that offline or online RL can be flexibly formulated as a sequence understanding problem and solved via approaches similar to large-scale pre-training language modeling. To strengthen the ability to understand the semantic interactions within a given text sequence, this paper leverages the Transformer architecture for RL-based RE methods and proposes a generic framework called Transformer Enhanced RL (TERL) for the RE task. Unlike prior RL-based RE approaches that usually fit value functions or compute policy gradients, TERL only outputs the best actions by utilizing a masked Transformer. Experimental results show that the proposed TERL framework can improve many state-of-the-art RL-based RE methods.”

pdf bib
P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition
Wang Zhuang | Zhang Yijia | An Kang | Zhou Xiaoying | Lu Mingyu | Lin Hongfei

“Multimodal Named Entity Recognition (MNER) is a challenging task in social media due to the combination of text and image features. Previous MNER work has focused on predicting entity information after fusing visual and text features. However, pre-trained language models have already acquired vast amounts of knowledge during their pre-training process. To leverage this knowledge, we propose a prompt network for MNER tasks (P-MNER). To minimize the noise generated by irrelevant areas in the image, we design a visual feature extraction model (FRR) based on Faster R-CNN and ResNet, which uses fine-grained visual features to assist MNER tasks. Moreover, we introduce a text correction fusion module (TCFM) into the model to address visual bias during modal fusion. We employ the idea of a residual network to modify the fused features using the original text features. Our experiments on two benchmark datasets demonstrate that our proposed model outperforms existing MNER methods. P-MNER’s ability to leverage pre-training knowledge from language models, incorporate fine-grained visual features, and correct for visual bias makes it a promising approach for multimodal named entity recognition in social media posts.”

pdf bib
Self Question-answering: Aspect Sentiment Triplet Extraction via a Multi-MRC Framework based on Rethink Mechanism
Zhang Fuyao | Zhang Yijia | Wang Mengyi | Yang Hong | Lu Mingyu | Yang Liang

“The purpose of Aspect Sentiment Triplet Extraction (ASTE) is to extract a triplet, including the target or aspect, its associated sentiment, and the related opinion terms that explain the underlying cause of the sentiment. Some recent studies fail to capture the strong interdependence between ATE and OTE, while others fail to effectively introduce the relationship between aspects and opinions into sentiment classification tasks. To solve these problems, we construct a multi-round machine reading comprehension framework based on a rethink mechanism to solve ASTE tasks efficiently. The rethink mechanism allows the framework to model complex relationships between entities, and exclusive classifiers and probability generation algorithms can reduce query conflicts and unilateral drops in probability. Besides, the multi-round structure can fuse explicit semantic information flow between aspect, opinion and sentiment. Extensive experiments show that the proposed model achieves the most advanced effect and can be effectively applied to ASTE tasks.”

pdf bib
Enhancing Ontology Knowledge for Domain-Specific Joint Entity and Relation Extraction
Xiong Xiong | Wang Chen | Liu Yunfei | Li Shengyang

“Pre-trained language models (PLMs) have been widely used in entity and relation extraction methods in recent years. However, due to the semantic gap between general-domain text used for pre-training and domain-specific text, these methods encounter semantic redundancy and domain semantics insufficiency when it comes to domain-specific tasks. To mitigate this issue, we propose a low-cost and effective knowledge-enhanced method to facilitate domain-specific semantics modeling in joint entity and relation extraction. Precisely, we use ontology and entity type descriptions as domain knowledge sources, which are encoded and incorporated into the downstream entity and relation extraction model to improve its understanding of domain-specific information. We construct a dataset called SSUIE-RE for Chinese entity and relation extraction in the space science and utilization domain of China Manned Space Engineering, which contains a wealth of domain-specific knowledge. The experimental results on SSUIE-RE demonstrate the effectiveness of our method, achieving a 1.4% absolute improvement in relation F1 score over the previous best approach.”

pdf bib
Document Information Extraction via Global Tagging
He Shaojie | Wang Tianshu | Lu Yaojie | Lin Hongyu | Han Xianpei | Sun Yingfei | Sun Le

“Document Information Extraction (DIE) is a crucial task for extracting key information from visually-rich documents. The typical pipeline approach for this task involves Optical Character Recognition (OCR), serializer, Semantic Entity Recognition (SER), and Relation Extraction (RE) modules. However, this pipeline presents significant challenges in real-world scenarios due to issues such as unnatural text order and error propagation between different modules. To address these challenges, we propose a novel tagging-based method, Global TaggeR (GTR), which converts the original sequence labeling task into a token relation classification task. This approach globally links discontinuous semantic entities in complex layouts, and jointly extracts entities and relations from documents. In addition, we design a joint training loss and a joint decoding strategy for SER and RE tasks based on GTR. Our experiments on multiple datasets demonstrate that GTR not only mitigates the issue of text in the wrong order but also improves RE performance.”

pdf bib
A Distantly-Supervised Relation Extraction Method Based on Selective Gate and Noise Correction
Chen Zhuowei | Tian Yujia | Wang Lianxi | Jiang Shengyi

“Entity relation extraction, as a core task of information extraction, aims to predict the relation of entity pairs identified in text, and its research results are applied in various fields. To address the problem that current distantly supervised relation extraction (DSRE) methods based on large-scale corpus annotation generate a large amount of noisy data, a DSRE method that incorporates a selective gate and a noise correction framework is proposed. The selective gate is used to reasonably select the sentence features in the sentence bag, while the noise correction is used to correct the labels of small-class samples that are misclassified into large classes during model training, reducing the negative impact of noisy data on relation extraction. The results on the English datasets clearly demonstrate that our proposed method outperforms other baseline models. Moreover, the experimental results on the Chinese dataset indicate that our method surpasses other models, providing further evidence that our proposed method is both robust and effective.”
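
A minimal sketch of a selective gate over a sentence bag, as the abstract describes: each sentence representation is reweighted by a learned gate before bag-level pooling. The dimensions and the exact gate form are assumptions:

```python
import torch
import torch.nn as nn

class SelectiveGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        """bag: (n_sentences, dim) -> gated, mean-pooled bag vector (dim,)."""
        g = self.gate(bag)          # per-sentence, per-feature gates in (0, 1)
        return (g * bag).mean(dim=0)
```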

pdf bib
Improving Cascade Decoding with Syntax-aware Aggregator and Contrastive Learning for Event Extraction
Sheng Zeyu | Liang Yuanyuan | Lan Yunshi

“The cascade decoding framework has shown superior performance on event extraction tasks. However, it treats a sentence as a sequence and neglects the potential benefits of the syntactic structure of sentences. In this paper, we improve cascade decoding with a novel module and a self-supervised task. Specifically, we propose a syntax-aware aggregator module to model the syntax of a sentence on top of the cascade decoding framework such that it captures event dependencies as well as syntactic information. Moreover, we design a type discrimination task to learn better syntactic representations of different event types, which could further boost the performance of event extraction. Experimental results on two widely used event extraction datasets demonstrate that our method could improve the original cascade decoding framework by up to 2.2 percentage points of F1 score and outperform a number of competitive baseline methods.”

pdf bib
Learnable Conjunction Enhanced Model for Chinese Sentiment Analysis
Zhao Bingfei | Zan Hongying | Wang Jiajia | Han Yingjie

“Sentiment analysis is a crucial text classification task that aims to extract, process, and analyze opinions, sentiments, and subjectivity within texts. In current research on Chinese text, sentence- and aspect-based sentiment analysis is mainly tackled through well-designed models. However, despite the importance of word order and function words as essential means of semantic expression in Chinese, they are often underutilized. This paper presents a new Chinese sentiment analysis method that utilizes a Learnable Conjunctions Enhanced Model (LCEM). The LCEM adjusts the general structure of the pre-trained language model and incorporates conjunctions’ location information into the model’s fine-tuning process. Additionally, we discuss a variant structure of residual connections to construct a residual structure that can learn critical information in the text and optimize it during training. We perform experiments on the public datasets and demonstrate that our approach enhances performance on both sentence- and aspect-based sentiment analysis datasets compared to the baseline pre-trained language models. These results confirm the effectiveness of our proposed method.”

pdf bib
Improving Affective Event Classification with Multi-Perspective Knowledge Injection
Yi Wenjia | Zhao Yanyan | Yuan Jianhua | Zhao Weixiang | Qin Bing

“In recent years, many researchers have recognized the importance of associating events with sentiments. Previous approaches focus on generalizing events and extracting sentimental information from a large-scale corpus. However, since context is absent and sentiment is often implicit in the event, these methods are limited in comprehending the semantics of the event and capturing effective sentimental clues. In this work, we propose a novel Multi-perspective Knowledge-injected Interaction Network (MKIN) to fully understand the event and accurately predict its sentiment by injecting multi-perspective knowledge. Specifically, we leverage contexts to provide sufficient semantic information and perform context modeling to capture the semantic relationships between events and contexts. Moreover, we also introduce human emotional feedback and sentiment-related concepts to provide explicit sentimental clues from the perspective of human emotional state and word meaning, filling the reasoning gap in the sentiment prediction process. Experimental results on the gold standard dataset show that our model achieves better performance over the baseline models.”

pdf bib
Enhancing Implicit Sentiment Learning via the Incorporation of Part-of-Speech for Aspect-based Sentiment Analysis
Wang Junlang | Li Xia | He Junyi | Zheng Yongqiang | Ma Junteng

“Implicit sentiment modeling in aspect-based sentiment analysis is a challenging problem due to complex expressions and the lack of opinion words in sentences. Recent efforts focusing on implicit sentiment in ABSA mostly leverage the dependency between aspects and pretrain on extra annotated corpora. We argue that linguistic knowledge can be incorporated into the model to better learn implicit sentiment knowledge. In this paper, we propose a PLM-based, linguistically enhanced framework by incorporating Part-of-Speech (POS) for aspect-based sentiment analysis. Specifically, we design an input template for PLMs that focuses on both aspect-related contextualized features and POS-based linguistic features. By aligning with the representations of the tokens and their POS sequences, the introduced knowledge is expected to guide the model in learning implicit sentiment by capturing sentiment-related information. Moreover, we also design an aspect-specific self-supervised contrastive learning strategy to optimize aspect-based contextualized representation construction and assist PLMs in concentrating on target aspects. Experimental results on public benchmarks show that our model can achieve competitive and state-of-the-art performance without introducing extra annotated corpora.”

pdf bib
Case Retrieval for Legal Judgment Prediction in Legal Artificial Intelligence
Zhang Han | Dou Zhicheng

“Legal judgment prediction (LJP) is a basic task in legal artificial intelligence. It consists of three subtasks, namely relevant law article prediction, charge prediction, and term of penalty prediction, and gives the judgment results to assist the work of judges. In recent years, many deep learning methods have emerged to improve the performance of the legal judgment prediction task. The previous methods mainly improve the performance by integrating law articles and the fact description of a legal case. However, they rarely consider that the judges usually look up historical cases before making a judgment in the actual scenario. To simulate this scenario, we propose a historical case retrieval framework for the legal judgment prediction task. Specifically, we select some historical cases which include all categories from the training dataset. Then, we retrieve the most similar Top-k historical cases of the current legal case and use the vector representation of these Top-k historical cases to help predict the judgment results. On two real-world legal datasets, our model achieves better results than several state-of-the-art baseline models.”
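
A minimal sketch of the retrieval step described above: embed the current fact description and fetch the Top-k most similar historical cases by cosine similarity. The encoder producing the vectors is a placeholder; the paper does not commit to this implementation:

```python
import numpy as np

def top_k_cases(query_vec: np.ndarray, case_vecs: np.ndarray, k: int = 5):
    """query_vec: (dim,); case_vecs: (n_cases, dim); returns Top-k case indices."""
    q = query_vec / np.linalg.norm(query_vec)
    c = case_vecs / np.linalg.norm(case_vecs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every stored case
    return np.argsort(-sims)[:k]      # indices of the k most similar cases
```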

pdf bib
SentBench: Comprehensive Evaluation of Self-Supervised Sentence Representation with Benchmark Construction
Liu Xiaoming | Lin Hongyu | Han Xianpei | Sun Le

“Self-supervised learning has been widely used to learn effective sentence representations. Previous evaluation of sentence representations mainly focuses on a limited combination of tasks and paradigms, failing to evaluate their effectiveness in a wider range of application scenarios. Such divergences prevent us from understanding the limitations of current sentence representations, as well as the connections between learning approaches and downstream applications. In this paper, we propose SentBench, a new comprehensive benchmark to evaluate sentence representations. SentBench covers 12 kinds of tasks and evaluates sentence representations with three types of different downstream application paradigms. Based on SentBench, we re-evaluate several frequently used self-supervised sentence representation learning approaches. Experiments show that SentBench can effectively evaluate sentence representations from multiple perspectives, and the performance on SentBench leads to some novel findings which enlighten future research.”

pdf bib
Adversarial Network with External Knowledge for Zero-Shot Stance Detection
Wang Chunling | Zhang Yijia | Yu Xingyu | Liu Guantong | Chen Fei | Lin Hongfei

“Zero-shot stance detection intends to detect previously unseen targets’ stances in the testing phase. However, achieving this goal can be difficult, as it requires minimizing the domain transfer between different targets and improving the model’s inference and generalization abilities. To address this challenge, we propose an adversarial network with external knowledge (ANEK) model. Specifically, we adopt adversarial learning based on pre-trained models to learn transferable knowledge from the source targets, thereby enabling the model to generalize well to unseen targets. Additionally, we incorporate sentiment information and common sense knowledge into the contextual representation to further enhance the model’s understanding. Experimental results on several datasets reveal that our method achieves excellent performance, demonstrating its validity and feasibility.”

pdf bib
The Contextualized Representation of Collocation
Liu Daohuan | Tang Xuri

“Collocate lists and collocation networks are two widely used representation methods for collocations, but they have significant weaknesses in representing contextual information. To solve this problem, we propose a new representation method, namely the contextualized representation of collocates (CRC), which highlights the importance of the position of the collocates and pins a collocate as the interaction of two dimensions: association strength and co-occurrence position. With a full image of all the collocates surrounding the node word, CRC carries the contextual information and makes the representation more informative and intuitive. Through three case studies, i.e., synonym distinction, image analysis, and efficiency in lexical use, we demonstrate the advantages of CRC in practical applications. CRC is also a new quantitative tool to measure lexical usage pattern similarities for corpus-based research. It can provide a new representation framework for language researchers and learners.”

pdf bib
Training NLI Models Through Universal Adversarial Attack
Lin Jieyu | Liu Wei | Zou Jiajie | Ding Nai

“Pre-trained language models are sensitive to adversarial attacks, and recent works have demonstrated universal adversarial attacks that can apply input-agnostic perturbations to mislead models. Here, we demonstrate that universal adversarial attacks can also be used to harden NLP models. Based on the NLI task, we propose a simple universal adversarial attack that can mislead models to produce the same output for all premises by replacing the original hypothesis with an irrelevant string of words. To defend against this attack, we propose Training with UNiversal Adversarial Samples (TUNAS), which iteratively generates universal adversarial samples and utilizes them for fine-tuning. The method is tested on two datasets, i.e., MNLI and SNLI. It is demonstrated that TUNAS can reduce the mean success rate of the universal adversarial attack from above 79% to below 5%, while maintaining similar performance on the original datasets. Furthermore, TUNAS models are also more robust to attacks targeting individual samples: when searching for hypotheses that are best entailed by a premise, the hypotheses found by TUNAS models are more compatible with the premise than those found by baseline models. In sum, we use universal adversarial attacks to yield more robust models.”

pdf bib
MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset
Shi Xiaorui

“Multimodal summarization, which aims to generate summaries with multimodal inputs, e.g., text and visual features, has attracted much attention in the research community. However, previous studies only focus on monolingual multimodal summarization and neglect non-native readers who need to understand cross-lingual news in practical applications. This inspires us to present a new task, named Multimodal Cross-Lingual Summarization for news (MCLS), which generates cross-lingual summaries from multi-source information. To this end, we present a large-scale multimodal cross-lingual summarization dataset, which consists of 1.1 million article-summary pairs with 3.4 million images in 44 * 43 language pairs. To generate a summary in any language, we propose a unified framework that jointly trains the multimodal monolingual and cross-lingual summarization tasks, where a bi-directional knowledge distillation approach is designed to transfer knowledge between both tasks. Extensive experiments on many-to-many settings show the effectiveness of the proposed model.”

pdf bib
CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection
Wei Congcong | Feng Zhenbing | Huang Shutan | Li Wei | Shao Yanqiu

“Event detection (ED) is a crucial area of natural language processing that automates the extraction of specific event types from large-scale text, and studying historical ED in classical Chinese texts helps preserve and inherit historical and cultural heritage by extracting valuable information. However, characteristics of classical Chinese, such as ambiguous word classes and complex semantics, have posed challenges and led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in classical Chinese. To address these issues, we constructed a logical event schema for classical Chinese historical texts and annotated the resulting dataset, called the classical Chinese Historical Event Dataset (CHED). The main challenges in our work on classical Chinese historical ED are accurately identifying and classifying events within cultural and linguistic contexts and addressing ambiguity resulting from multiple meanings of words in historical texts. Therefore, we developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple rounds of cross-validation is 68.49%, indicating high quality and consistency. We conducted various tasks and comparative experiments on established baseline models for historical ED in classical Chinese. The results show that BERT+CRF performed best on the sequence labeling task, with an F1-score of 76.10%, indicating potential for further improvement.”

pdf bib
Revisiting k-NN for Fine-tuning Pre-trained Language Models
Li Lei | Chen Jing | Tian Botzhong | Zhang Ningyu

“Pre-trained Language Models (PLMs), as parametric-based eager learners, have become the de-facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (k-NN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit k-NN classifiers for augmenting PLMs-based classifiers. At the methodological level, we propose to adopt k-NN with textual representations of PLMs in two steps: (1) utilize k-NN as prior knowledge to calibrate the training process; (2) linearly interpolate the probability distribution predicted by k-NN with that of the PLMs’ classifier. At the heart of our approach is the implementation of k-NN-calibrated training, which treats predicted results as indicators of easy versus hard examples during the training process. From the perspective of the diversity of application scenarios, we conduct extensive experiments on fine-tuning and prompt-tuning paradigms and on zero-shot, few-shot, and fully-supervised settings, respectively, across eight diverse end-tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP.”
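
Step (2), the linear interpolation, is simple enough to sketch directly; the k-NN distribution below is built from stored PLM representations, with k, the similarity kernel, and lambda as tuning assumptions.

    import numpy as np

    def knn_distribution(query, keys, labels, n_classes, k=8):
        # negative L2 distance over stored PLM representations as similarity
        d = np.linalg.norm(keys - query, axis=1)
        nearest = np.argsort(d)[:k]
        w = np.exp(-d[nearest])                 # closer neighbors weigh more
        p = np.zeros(n_classes)
        for idx, weight in zip(nearest, w):
            p[labels[idx]] += weight
        return p / p.sum()

    def interpolate(p_plm, p_knn, lam=0.3):
        # final distribution = lam * kNN + (1 - lam) * PLM classifier
        return lam * p_knn + (1 - lam) * p_plm

    keys = np.random.randn(100, 16)             # datastore of training representations
    labels = np.random.randint(0, 3, size=100)
    query = np.random.randn(16)
    p_knn = knn_distribution(query, keys, labels, n_classes=3)
    p_plm = np.array([0.2, 0.5, 0.3])
    print(interpolate(p_plm, p_knn))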

pdf bib
Adder Encoder for Pre-trained Language Model
Ding Jianbang | Zhang Suiyun | Li Linlin

“BERT, a pre-trained language model entirely based on attention, has proven to be highly performant for many natural language understanding tasks. However, pre-trained language models (PLMs) are often computationally expensive and can hardly be implemented with limited resources. To reduce the energy burden, we introduce adder operations into the Transformer encoder and propose a novel AdderBERT with powerful representation capability. Moreover, we adopt mapping-based distillation to further improve its energy efficiency with assured performance. Empirical results demonstrate that AdderBERT6 achieves highly competitive performance against that of its teacher BERTBASE on the GLUE benchmark while obtaining a 4.9x reduction in energy consumption.”
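
A minimal sketch of an adder-style layer in the AdderNet spirit the abstract builds on: similarity is the negative L1 distance between inputs and weight rows, so the core operation needs no multiplications. Bias, normalization, and scaling details of the real AdderBERT are omitted here.

    import torch
    import torch.nn as nn

    class AdderLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

        def forward(self, x):                                  # x: (batch, in)
            # |x - w| summed over the input dimension, one output per weight row
            diff = x.unsqueeze(1) - self.weight.unsqueeze(0)   # (batch, out, in)
            return -diff.abs().sum(dim=-1)

    layer = AdderLinear(768, 768)
    print(layer(torch.randn(4, 768)).shape)                    # torch.Size([4, 768])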

pdf bib
FinBART: A Pre-trained Seq2seq Language Model for Chinese Financial Tasks
Dong Hongyuan | Che Wanxiang | He Xiaoyu | Zheng Guidong | Wen Junjie

“Pretrained language models are making a more profound impact on our lives than ever before. They exhibit promising performance on a variety of general-domain Natural Language Processing (NLP) tasks. However, little work focuses on Chinese financial NLP tasks, which comprise a significant portion of social communication. To this end, we propose FinBART, a pretrained seq2seq language model for Chinese financial communication tasks. Experiments show that FinBART outperforms baseline models on a series of downstream tasks including text classification, sequence labeling, and text generation. We further pretrain the model on customer service corpora, and results show that our model outperforms baseline models and achieves promising performance on various real-world customer service text mining tasks.”

pdf bib
Exploring Accurate and Generic Simile Knowledge from Pre-trained Language Models
Zhou Shuhan | Ma Longxuan | Shao Yanqiu

“A simile is an important linguistic phenomenon in daily communication and an important task in natural language processing (NLP). In recent years, pre-trained language models (PLMs) have achieved great success in NLP since they learn generic knowledge from a large corpus. However, PLMs still have hallucination problems: they may generate unrealistic or context-unrelated information. In this paper, we aim to explore more accurate simile knowledge from PLMs. To this end, we first fine-tune a single model to perform three main simile tasks (recognition, interpretation, and generation). In this way, the model gains a better understanding of simile knowledge. However, this understanding may be limited by the distribution of the training data. To explore more generic simile knowledge from PLMs, we further add semantic dependency features to the three tasks. The semantic dependency feature serves as a global signal and helps the model learn simile knowledge that can be applied to unseen domains. We test on seen and unseen domains after training. Automatic evaluations demonstrate that our method helps PLMs explore more accurate and generic simile knowledge for downstream tasks. Our method of exploring more accurate knowledge is useful not only for simile study but also for other NLP tasks leveraging knowledge from PLMs. Our code and data will be released on GitHub.”

up

pdf (full)
bib (full)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)

pdf bib
基座模型训练中的数据与模型架构(Data and Model Architecture in Base Model Training)
Hang Yan (航 颜) | Yang Gao (扬 高) | Chaoye Fei (朝烨 费) | Xiaopeng Yang (小珪 杨) | Xipeng Qiu (锡鹏 邱)

“ChatGPT’s conversational interface lowered the barrier to using large models, and it therefore spread rapidly around the world. Although OpenAI has not disclosed ChatGPT’s technical approach, a number of follow-up works claim to have reproduced ChatGPT’s performance on top of open-source base models. However, even though these models match ChatGPT on certain benchmarks, they still fall short of it in actual knowledge and reasoning ability. To approach the performance of ChatGPT or even GPT-4, deeper research into base model training is needed. This paper discusses the data and model architecture used in base model training: it first summarizes the sources of current pre-training data and the basic processing pipeline, with a closer analysis of two comparatively neglected areas, code pre-training data and Chinese pre-training data; it then reviews the network architectures of existing base models and explains the motivations behind their architectural adjustments.”

pdf bib
Unleashing the Power of Large Models: Exploring Human-Machine Conversations
Liu Yuhan | Chen Xiuying | Yan Rui

“In recent years, large language models (LLMs) have garnered significant attention across various domains, resulting in profound impacts. In this paper, we aim to explore the potential of LLMs in the field of human-machine conversations. We begin by examining the rise and milestones of these models, tracing their origins from neural language models to the transformative impact of the Transformer architecture on conversation processing. Next, we discuss the emergence of large pre-training models and their utilization of contextual knowledge at a large scale, as well as the scaling to billion-parameter models that push the boundaries of language generation. We further highlight advancements in multi-modal conversations, showcasing how LLMs bridge the gap between language and vision. We also introduce various applications in human-machine conversations, such as intelligent assistant-style dialogues and emotionally supportive conversations, supported by successful case studies in diverse fields. Lastly, we explore the challenges faced by LLMs in this context and provide insights into future development directions and prospects. Overall, we offer a comprehensive overview of the potential and future development of LLMs in human-machine conversations, encompassing their milestones, applications, and the challenges ahead.”

pdf bib
机器翻译和大语言模型研究进展(Research Development of Machine Translation and Large Language Models)
Wenhao Zhu (文昊 朱) | Hao Zhou (昊 周) | Changjiang Gao (长江 高) | Sizhe Liu (斯哲 刘) | Shujian Huang (书剑 黄)

“Machine translation aims to automatically translate one natural language into another by computer, a process that places extremely high demands on a model’s language understanding and generation abilities; it has therefore long been an NLP task of great research value and difficulty. Recent studies show that large language models can follow human instructions to perform many tasks, including translation, exhibiting strong language understanding and generation capabilities in the process and opening up new possibilities for a paradigm shift in NLP. To better accomplish machine translation with the support of large language models, researchers have extensively studied and analyzed the translation and multilingual abilities of these models. This paper surveys the related research hotspots and latest progress from three perspectives: evaluating the translation ability of large language models, eliciting that translation ability, and characterizing their performance across different languages.”

pdf bib
A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks
Ni Xuanfan | Li Piji

“Recent efforts have evaluated large language models (LLMs) in areas such as commonsense reasoning, mathematical reasoning, and code generation. However, to the best of our knowledge, no work has specifically investigated the performance of LLMs in natural language generation (NLG) tasks, a pivotal criterion for determining model excellence. Thus, this paper conducts a comprehensive evaluation of well-known and high-performing LLMs, namely ChatGPT, ChatGLM, T5-based models, LLaMA-based models, and Pythia-based models, in the context of NLG tasks. We select English and Chinese datasets encompassing dialogue generation and text summarization. Moreover, we propose a common evaluation setting that incorporates input templates and post-processing strategies. Our study reports automatic results, accompanied by a detailed analysis.”

pdf bib
生成式信息检索前沿进展与挑战(Challenges and Advances in Generative Information Retrieval)
Yixing Fan (意兴 范) | Yubao Tang (钰葆 唐) | Jiangui Chen (建贵 陈) | Ruqing Zhang (儒清 张) | Jiafeng Guo (嘉丰 郭)

“Information Retrieval (IR) aims to find information relevant to a user query in a large-scale corpus and has become one of the most important tools people use to solve problems in daily work and life. Existing IR systems mainly rely on an “index-retrieve-rerank” framework that models the complex retrieval task as a multi-stage, coupled search process. This decoupled modeling improves retrieval efficiency, allowing systems to easily handle corpora of billions of documents, but it also increases architectural complexity and precludes end-to-end joint optimization. To address this problem, researchers have recently begun exploring a single unified model of the whole search process and have proposed a new generative information retrieval paradigm, which encodes the entire corpus into the retrieval model, enables end-to-end optimization, and removes the system’s dependence on external indexes. Generative retrieval has become one of the hottest research directions in the IR field. Given the rapid progress in this direction, this paper presents a systematic survey of generative information retrieval, covering basic concepts, document identifiers, and model capacity. We also discuss open challenges and promising research directions, hoping to inspire and promote future work on these topics.”

pdf bib
大模型与知识图谱(Large Language Models and Knowledge Graphs)
Yubo Chen (玉博 陈) | Shaoru Guo (少茹 郭) | Kang Liu (康 刘) | Jun Zhao (军 赵)

“As an important form of knowledge organization, knowledge graphs are often regarded as part of the infrastructure of next-generation artificial intelligence and have attracted wide attention from industry and academia. Traditional knowledge graph representations use symbols to explicitly describe concepts and the structural relations between them, offering clear semantics and good interpretability, but their knowledge types are limited and they struggle with open-domain application scenarios. With the development of large-scale pre-trained language models (large models), treating parameterized large models as knowledge graphs has become a research hotspot. Against this background, this paper focuses on the role of large models across the knowledge graph life cycle, summarizing and analyzing research progress in knowledge modeling, knowledge acquisition, knowledge fusion, knowledge management, knowledge reasoning, and knowledge application. Finally, we offer an outlook on future trends for large models and knowledge graphs.”

pdf bib
大语言模型对齐:概念、挑战、路线、评测及趋势(Large Language Model Alignment: Concepts, Challenges, Roadmaps, Evaluations and Trends)
Xiong Deyi (德意 熊)

“Both the “intelligence-goal” orthogonality thesis and the “instrumental convergence” argument for general intelligence require that the development of general intelligence couple capability with values. Large language models are currently advancing rapidly in capability (“intelligence”), but research on the more challenging problem of value alignment (“goodness”) lags behind. This survey outlines the basic concepts of alignment and why it is necessary, sketches its social and technical challenges, analyzes the main technical routes and methods for aligning large language models, discusses how alignment can be evaluated, and looks ahead to future trends.”

pdf bib
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
Zhuang Ziyu | Chen Qiguang | Ma Longxuan | Li Mingda | Han Yi | Qian Yushan | Bai Haopeng | Zhang Weinan | Ting Liu

“From pre-trained language models (PLMs) to large language models (LLMs), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical use. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to evaluate thoroughly for two reasons. First, traditional NLP tasks become inadequate due to the excellent performance of LLMs. Second, existing evaluation tasks have difficulty keeping up with the wide range of applications in real-world scenarios. To tackle these problems, existing works have proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluation. We summarize four core competencies of LLMs: reasoning, knowledge, reliability, and safety. For each competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect the corresponding ability, while new tasks can easily be added to the system. Finally, we give our suggestions on the future direction of LLM evaluation.”

pdf bib
Frontier Review of Multimodal AI
Duan Nan

“Pre-training techniques have enabled foundation models (such as BERT, T5, GPT) to achieve remarkable success in natural language processing (NLP) and multimodal tasks that involve text, audio, and visual content. Some of the latest multimodal generative models, such as DALL·E and Stable Diffusion, can synthesize novel visual content from text or video inputs, which greatly enhances the creativity and productivity of content creators. However, multimodal AI also faces some challenges, such as adding new modalities or handling diverse tasks that require signals beyond its understanding. Therefore, a new trend in multimodal AI is to build a compositional AI system that connects existing foundation models with external modules and tools. This way, the system can perform more varied tasks by leveraging different modalities and signals. In this paper, we give a brief overview of state-of-the-art multimodal AI techniques and the direction of building compositional AI systems. We also discuss potential future research topics in multimodal AI.”

up

pdf (full)
bib (full)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

pdf bib
CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph-based Classical Named Entity Recognition)
Xinghua Zhang (张兴华) | Tianjun Liu (刘天昀) | Wenyuan Zhang (张文源) | Tingwen Liu (柳厅文)

“Named entity recognition aims to automatically identify entities with specific meaning in text (e.g., person and place names); in ancient books, recognizing entities such as person names, book titles, and official titles provides crucial support for deeply mining and organizing the humanistic knowledge in classical Chinese. Existing Chinese NER methods focus mainly on modern texts, while entity recognition in ancient books poses greater challenges in two respects: entity ambiguity and blurred entity boundaries. Because classical texts are terse, single-character expressions aggravate the ambiguity problem, and the greater difficulty of punctuation and word segmentation makes boundary identification harder. To handle these problems, this paper proposes an ancient-book NER method based on information theory and paragraph-level information. We retrieve the source information of a passage to inject paragraph-level prior knowledge, and apply sliding-window sampling augmentation over passages from the same text to introduce background context, effectively alleviating entity ambiguity. In addition, from an information-theoretic perspective, we constrain the encoding of an entity’s context and of the entity’s own features so as to retain maximally generalizable features and remove redundant information, alleviating blurred entity boundaries and improving NER in classical texts with complex word senses and difficult punctuation. Under a base NER framework with both token-wise and span-level awareness, our method achieved the best performance in the evaluation.”

pdf bib
CCL23-Eval 任务1系统报告:基于持续预训练方法与上下文增强策略的古籍命名实体识别(System Report for CCL23-Eval Task 1: Named Entity Recognition for Ancient Books based on Continual Pre-training Method and Context Augmentation Strategy)
Shiquan Wang (王士权) | Lingling Shi (石玲玲) | Luwen Pu (蒲璐汶) | Ruiyu Fang (方瑞玉) | Yu Zhao (赵宇) | Shuangyong Song (宋双永)

“This paper describes the system submitted by team “Yizhituan” (翼智团) to the CCL23 evaluation on named entity recognition for ancient books. The task aims to automatically identify important entities that constitute basic event elements in ancient texts, such as person names, book titles, and official titles, with an open track and a closed track depending on whether the model used exceeds 10B parameters. We first continually pre-train and fine-tune an open-source pre-trained model on domain data related to ancient books and on the task data, which markedly improves the base model on this task. We then propose a pair-wise-voting algorithm for filtering unconfident entities to obtain candidate entities, and correct the recognition of those candidates with a context augmentation strategy. In the final evaluation, our system ranked second in the closed track with an F1 score of 95.8727.”
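
A minimal sketch of the pair-wise-voting filter the abstract outlines, under the assumption that an entity on which any pair of models disagrees counts as unconfident and becomes a candidate for the context-augmented second pass.

    from itertools import combinations

    def unconfident_entities(predictions):
        # predictions: list of entity sets, one per model
        flagged = set()
        for a, b in combinations(predictions, 2):
            flagged |= a ^ b          # symmetric difference = pair disagreement
        return flagged

    m1 = {("王安石", "PER"), ("资治通鉴", "BOOK")}
    m2 = {("王安石", "PER"), ("资治通鉴", "BOOK"), ("参知政事", "OFFICE")}
    m3 = {("王安石", "PER"), ("参知政事", "OFFICE")}
    print(unconfident_entities([m1, m2, m3]))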

pdf bib
CCL23-Eval 任务1系统报告:基于增量预训练与对抗学习的古籍命名实体识别(System Report for CCL23-Eval Task 1: GuNER Based on Incremental Pretraining and Adversarial Learning)
Jianlong Li (李剑龙) | Youren Yu (于右任) | Xueyang Liu (刘雪阳) | Siwen Zhu (朱思文)

“Named entity recognition for ancient books is a basic step toward correctly analyzing and processing classical Chinese text, and an important prerequisite for deeply mining and organizing humanistic knowledge. Classical Chinese has high information entropy and is difficult to read, so technical progress in this area has been slow. To address the weak noise robustness and inaccurate boundary recognition of existing entity recognition models, this paper combines NEZHA-TCN with a global pointer for ancient-book NER. We also build a corpus of classical texts covering a variety of works from the official histories, totaling 87M with 397,995 text segments, for incremental pre-training of the NEZHA-TCN model. During training, to strengthen robustness, we apply the fast gradient method to perturb the word embedding layer. Experimental results show that the proposed method can effectively mine the entity information hidden in ancient texts, reaching an F1 of 95.34%.”

pdf bib
CCL23-Eval任务1总结报告:古籍命名实体识别(GuNER2023)(Overview of CCL23-Eval Task 1: Named Entity Recognition in Ancient Chinese Books)
Qi Su (苏祺) | Yingying Wang (王莹莹) | Zekun Deng (邓泽琨) | Hao Yang (杨浩) | Jun Wang (王军)

“The CCL23 evaluation campaign of the China National Conference on Computational Linguistics (CCL) set up ten evaluation tasks in Chinese information processing. Task 1 is the evaluation on named entity recognition in ancient books, organized by the Digital Humanities Research Center and the Institute for Artificial Intelligence of Peking University. Its main goal is to automatically identify the important entities that constitute basic event elements in ancient texts, providing a foundation for analyzing and processing classical Chinese. The evaluation released a dataset drawn from the Twenty-Four Histories, covering multiple dynasties and domains, with more than 150,000 characters and over ten thousand entities of three types: person names, book titles, and official titles. Closed and open tracks were set up, focusing on the application of pre-trained models of different scales. A total of 127 teams registered for the task. On the closed track, the best system reached an F1 of 96.15% on the test set; on the open track, the best reached 95.48%.”

pdf bib
CCL23-Eval 任务2系统报告:基于图融合的自回归和非自回归中文AMR语义分析(System Report for CCL23-Eval Task 2: Autoregressive and Non-autoregressive Chinese AMR Semantic Parsing based on Graph Ensembling)
Yanggan Gu (辜仰淦) | Shilin Zhou (周仕林) | Zhenghua Li (李正华)

“This paper describes the system we submitted to the Chinese Abstract Meaning Representation parsing evaluation at the 22nd China National Conference on Computational Linguistics. Abstract Meaning Representation (AMR) represents the semantics of a sentence as a directed acyclic graph. This evaluation targets Chinese AMR (CAMR), so participating systems must predict not only the regular AMR graph but also the concept-node alignments, function-word relation alignments, and concept coreference specific to CAMR data. We use several autoregressive models together with several non-autoregressive models and merge their outputs with a graph-ensembling method. In the end we won first place on five of the six test sets across the two tracks, and second place on the remaining one.”

pdf bib
CCL23-Eval 任务2系统报告:WestlakeNLP,基于生成式大语言模型的中文抽象语义表示解析(System Report for CCL23-Eval Task 2: WestlakeNLP, Investigating Generative Large Language Models for Chinese AMR Parsing)
Wenyang Gao (高文炀) | Xuefeng Bai (白雪峰) | Yue Zhang (张岳)

“This paper describes the system we submitted to the Chinese Abstract Meaning Representation parsing evaluation at the 22nd China National Conference on Computational Linguistics. Chinese AMR (CAMR) not only represents sentence semantics as a graph but also guarantees concept alignment and relation alignment. Recently, generative large language models have shown excellent generation and generalization abilities on many natural language processing tasks. Inspired by this, we fine-tune the Baichuan-7B model to generate serialized CAMR directly from text in an end-to-end fashion. Experimental results show that our system achieves performance comparable to existing methods without relying on part-of-speech or dependency information or on complex rules.”

pdf bib
Overview of CCL23-Eval Task 2: The Third Chinese Abstract Meaning Representation Parsing Evaluation
Zhixing Xu | Yixuan Zhang | Bin Li | Zhou Junsheng | Weiguang Qu

“Abstract Meaning Representation has emerged as a prominent area of research in sentence-level semantic parsing within the field of natural language processing in recent years. Substantial progress has been made in various NLP subtasks through the application of AMR. This paper presents the third Chinese Abstract Meaning Representation Parsing Evaluation, held as part of the Technical Evaluation Task Workshop at the 22nd Chinese Computational Linguistics Conference. The evaluation was specifically tailored for Chinese and utilized the Align-smatch metric as the standard evaluation criterion. Building upon high-quality semantic annotation schemes and annotated corpora, this evaluation introduced a new test set comprising interrogative sentences for comprehensive evaluation. The results indicate notable performance achievements: the top-performing team attained an F-score of 0.8137 in the closed test and 0.8261 in the open test under the Align-smatch metric. Notably, the leading result surpassed the SOTA performance at CoNLL 2020 by 3.64 percentage points when evaluated using the MRP metric. Further analysis revealed that this significant progress primarily stemmed from improved relation prediction between concepts. However, effectively utilizing semantic relation alignments remains an area that requires further enhancement.”

pdf bib
CCL23-Eval 任务3系统报告:苏州大学CFSP系统(System Report for CCL23-Eval Task 3: SUDA CFSP System)
Yahui Liu (刘亚慧) | Zhenghua Li (李正华) | Min Zhang (张民)

“This paper describes the system we submitted to the Chinese Frame Semantic Parsing evaluation at the 22nd China National Conference on Computational Linguistics. Frame semantic parsing is an important task in natural language processing whose goal is to extract frame semantic structures from sentences. For the three subtasks of this evaluation (frame identification, argument-span identification, and argument-role identification), we use different end-to-end frameworks for parsing, and we further improve prediction accuracy with data augmentation and voting. We ultimately ranked second on test set A and third on test set B.”

pdf bib
CCL23-Eval 任务3系统报告:基于旋转式位置编码的实体分类在汉语框架语义解析中的应用(System Report for CCL23-Eval Task 3: Application of Entity Classification Model Based on Rotary Position Embedding in Chinese Frame Semantic Parsing)
Zuoheng Li (李作恒) | Xuanzhi Guo (郭炫志) | Dengjian Qiao (乔登俭) | Fan Wu (吴钒)

“Chinese Frame Semantic Parsing (CFSP) is an important task in Chinese natural language processing whose goal is to extract frame semantic structures from sentences, enabling a deep understanding of the events or situations they involve. This paper focuses on the frame identification and argument-role identification subtasks. Methods commonly used in NLP lose the positional relation between the target word and the sentence as a whole, as well as the target word’s internal information. We therefore propose an entity classification model based on rotary position embedding that computes attention between entities before classifying them, and we won first place on both leaderboards A and B of the Tianchi “CCL2023-Eval Chinese Frame Semantic Parsing” competition.”
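
For reference, rotary position embedding (RoPE) itself rotates pairs of feature dimensions by position-dependent angles, so that dot products between rotated vectors depend on relative position; a minimal sketch with assumed dimensions follows.

    import torch

    def apply_rope(x):                       # x: (seq, dim), dim even
        seq, dim = x.shape
        pos = torch.arange(seq, dtype=torch.float32).unsqueeze(1)   # (seq, 1)
        idx = torch.arange(0, dim, 2, dtype=torch.float32)          # (dim/2,)
        theta = pos * (10000 ** (-idx / dim))                       # rotation angles
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = torch.empty_like(x)
        out[:, 0::2] = x1 * theta.cos() - x2 * theta.sin()
        out[:, 1::2] = x1 * theta.sin() + x2 * theta.cos()
        return out

    q = torch.randn(8, 64)
    print(apply_rope(q).shape)               # torch.Size([8, 64])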

pdf bib
CCL23-Eval 任务3系统报告:基于多任务pipeline策略的汉语框架语义解析(System Report for CCL23-Eval Task 3: Chinese Frame Semantic Parsing Based on Multi-task Pipeline Strategy)
Shutan Huang (黄舒坦) | Qiuyan Shao (邵艳秋) | Wei Li (李炜)

“This paper presents our approach to the CCL 2023 Chinese Frame Semantic Parsing evaluation. Since the task is multi-task in nature and its subtasks are strongly sequential and interrelated, our method adopts a multi-task pipeline framework with three modules: frame classification, argument identification, and role classification, corresponding to the frame identification, argument-span identification, and argument-role identification subtasks. We model frame identification and argument-role identification as text classification and argument-span identification as entity recognition. Each module makes full use of the features and information extracted while solving the other subtasks; for example, role classification uses the frame category predicted by the frame classification module and the argument spans identified by the argument identification module. Given the importance of the target word and its context, we fine-tune pre-trained language models. Observing unstable model behavior, we use adversarial training and other strategies during training to improve performance. Our final scores were 71.91 on leaderboard A and 70.60 on leaderboard B, ranking second and validating the effectiveness of our method.”

pdf bib
CCL23-Eval 任务3总结报告:汉语框架语义解析评测(Overview of CCL23-Eval Task 1:Chinese FrameNet Semantic Parsing)
Juncai Li (李俊材) | Zhichao Yan (闫智超) | Xuefeng Su (苏雪峰) | Boxiang Ma (马博翔) | Peiyuan Yang (杨沛渊) | Ru Li (李茹)

“The Chinese Frame Semantic Parsing evaluation aims to improve machines’ ability to understand fine-grained semantic information. The evaluation dataset includes 20,000 annotated example sentences for frame semantic parsing and information on nearly 700 frames. The evaluation comprises three subtasks: frame identification, argument-span identification, and argument-role identification, with the final score computed from the scores on these three tasks. The evaluation attracted wide attention from industry and academia: 55 teams registered and 12 submitted results. We reproduced the results of 5 teams’ models; in the end, Li Zuoheng from Sichuan ranked first with a score of 71.49. More information about the task, including system submissions, evaluation results, and data resources, is available on the CCL-2023 Chinese Frame Semantic Parsing evaluation website.”

pdf bib
System Report for CCL23-Eval Task 3: UIR-ISC Pre-trained Language Model for Chinese Frame Semantic Parsing
Yingxuan Guan | Xunyuan Liu | Lu Zhang | Zexian Xie | Binyang Li

“Chinese Frame Semantic Parsing (CFSP) is a semantic parsing task based on Chinese FrameNet (CFN). This paper presents a solution for CCL2023-Eval Task 3. We first attempt various pre-trained models for different sub-tasks. Then, we explore multiple approaches to solving each task from the perspectives of feature engineering, model structure, and other tricks. Finally, we provide prospects for the task and propose potential alternative solutions. We conducted extensive comparative experiments to validate the effectiveness of our system.”

pdf bib
CCL23-Eval任务4系统报告:基于深度学习的空间语义理解(System Report for CCL23-Eval Task 4: Spatial Semantic Understanding Based on Deep Learning)
ChenKun Tan (谭臣坤) | XianNian Hu (胡先念) | XiPeng Qiu (邱锡鹏)

“This paper describes the technical approach our system took in the third Chinese Spatial Semantic Understanding evaluation (SpaCE2023): we propose an extraction method for the spatial semantic anomaly identification task, combine it with a generator to complete the spatial semantic role labeling task, and use large language model generation for the spatial scene similarity judgment task. We further explore applying large language models to the evaluation datasets and find that instruction design is the key focus and difficulty for future work. The code and models of our system are available at https://github.com/ShacklesLay/Space2023.”

pdf bib
CCL23-Eval任务4总结报告:第三届中文空间语义理解评测(Overview of CCL23-Eval Task 4: The 3rd Chinese Spatial Cognition Evaluation)
Liming Xiao (肖力铭) | Weidong Zhan (詹卫东) | Zhifang Sui (穗志方) | Yuhang Qin (秦宇航) | Chunhui Sun (孙春晖) | Dan Xing (邢丹) | Nan Li (李楠) | Fangwei Zhu (祝方韦) | Peiyi Wang (王培懿)

“The third Chinese Spatial Semantic Understanding evaluation (SpaCE2023) aims to test machines’ spatial semantic understanding through three subtasks: (1) spatial information anomaly identification; (2) spatial semantic role labeling; and (3) spatial scene similarity judgment. Building on SpaCE2022, this edition refines the design of subtasks 1 and 2 and introduces subtask 3 as a brand-new evaluation task. In the end, one team submitted results, outperforming the baseline model on subtask 1. This paper also reports the performance of the large language model ChatGPT on the three SpaCE2023 subtasks and, in light of the observed problems, suggests directions for improving instruction design.”

pdf bib
CCL23-Eval 任务5总结报告:跨领域句子级别中文省略消解(Overview of CCL23-Eval Task 5: Sentence Level Multi-domain Chinese Ellipsis Resolution)
Wei Li (李炜) | Qiuyan Shao (邵艳秋) | Jialu Qi (祁佳璐)

“Ellipsis is a linguistic phenomenon that occurs in many languages, including Chinese. Although humans can generally understand text containing ellipsis correctly, ellipsis hampers machine understanding at the syntactic and semantic levels, so automatically recovering elided material matters for automatic text analysis and understanding. This task proposes an application-oriented ellipsis recovery task aimed at recovering elided content that occupies a valid position in the sentence’s syntactic structure while serving as a semantic constituent of the sentence. The task is divided into two subtasks, ellipsis position detection and ellipsis content generation, and we describe baseline methods that achieve good results on each. In addition, to advance research on large language models, we also attempt to complete the task with ChatGPT via in-context learning and provide an analysis.”

pdf bib
CCL23-Eval 任务6系统报告:基于深度学习的电信网络诈骗案件分类(System Report for CCL23-Eval Task 6: Classification of Telecom Internet Fraud Cases Based on Deep Learning)
Chenyang Li (李晨阳) | Long Zhang (张龙) | Zhongjie Zhao (赵中杰) | Hui Guo (郭辉)

“As a basic task in natural language processing, text classification plays a vital role in classifying cases in the telecom network fraud domain and is of great significance and far-reaching impact for intelligent case analysis. The goal of this task is to classify a given case description, which contains a de-identified overall account of the case. We first fine-tune the Ernie pre-trained model on case content to predict each case’s category, then use pseudo-labeling and model fusion to raise the F1 score. We ultimately ranked second in CCL23-Eval Task 6 on telecom network fraud case classification with an F1 of 0.8628, a fairly strong detection result.”

pdf bib
CCL23-Eval 任务6系统报告:面向电信网络诈骗案件分类的优化策略(CCL23-Eval Task 6 System Report: Research on Optimization Strategies for Telecom Internet Fraud Case Classification)
Junhui Yu (余俊晖) | Zhi Li (李智)

“The surge in telecom network fraud cases poses a huge security threat to society, making accurate and efficient classification and detection of telecom network fraud urgent. This work explores a series of optimization strategies for telecom network fraud case classification and ultimately ranked first in the “Telecom Network Fraud Case Classification” technical evaluation. Built on a text classification model, our approach combines continued pre-training of BERT, FreeLB adversarial training, and model fusion. Continued pre-training gives the model better semantic understanding and feature extraction; FreeLB adversarial training strengthens robustness, allowing the model to better cope with noise and perturbation; and model fusion combines the predictions of multiple models to further improve classification accuracy. Experimental results show that these optimization strategies achieved a strong competition result, demonstrating their effectiveness and superiority for telecom network fraud case classification. The findings are significant for improving classification performance on such cases and offer a useful reference for research and practice in related fields.”
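
A minimal sketch of a FreeLB-style step as referenced in the abstract: ascend on an embedding perturbation for a few inner steps while accumulating parameter gradients. The model interface, zero initialization of the perturbation, and the hyperparameters are simplifying assumptions.

    import torch
    import torch.nn.functional as F

    def freelb_step(model, embeds, labels, adv_steps=3, adv_lr=0.1, eps=0.01):
        # zero init is a simplification; FreeLB initializes the perturbation randomly
        delta = torch.zeros_like(embeds, requires_grad=True)
        for _ in range(adv_steps):
            logits = model(inputs_embeds=embeds + delta)
            loss = F.cross_entropy(logits, labels) / adv_steps
            loss.backward()  # parameter gradients accumulate across ascent steps
            with torch.no_grad():
                g = delta.grad
                delta += adv_lr * g / (g.norm() + 1e-12)  # ascend on the perturbation
                delta.clamp_(-eps, eps)                   # stay inside the eps-ball
            delta.grad.zero_()
        # the caller then runs optimizer.step() / optimizer.zero_grad()

    # toy model so the sketch runs end to end
    W = torch.randn(16, 3, requires_grad=True)
    def toy_model(inputs_embeds):
        return inputs_embeds.mean(dim=1) @ W

    embeds = torch.randn(4, 10, 16)
    labels = torch.randint(0, 3, (4,))
    freelb_step(toy_model, embeds, labels)
    print(W.grad.norm())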

pdf bib
CCL23-Eval 任务6系统报告:基于CLS动态加权平均和数据增强的电信网络诈骗案件分类(System Report for CCL23-Eval Task 6: Classification of Telecom Internet Fraud Cases Based on CLS Dynamic Weighted Average and Data Augmentation)
Tianjun Liu (刘天昀) | Xinghua Zhang (张兴华) | Mengxiao Song (宋梦潇) | Tingwen Liu (柳厅文)

“As an applied instance of text classification, case classification in the telecom network fraud domain aims to analyze cases intelligently, helping public security departments grasp the characteristics of fraud cases for targeted prevention, deterrence, and investigation. Starting from this problem, we study model design, the training process, and data augmentation, and significantly improve the model’s classification of fraud case descriptions through a dynamic weighted average of CLS representations, Multi-Sample Dropout, FGM adversarial training, and back-translation.”
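
The abstract does not define the “dynamic weighted average of CLS”; one plausible reading, sketched below, learns softmax-normalized weights over the [CLS] vectors of all encoder layers instead of using only the last layer. The layer count and dimensions are assumptions.

    import torch
    import torch.nn as nn

    class DynamicCLSPool(nn.Module):
        def __init__(self, num_layers=12):
            super().__init__()
            self.w = nn.Parameter(torch.zeros(num_layers))  # one weight per layer

        def forward(self, all_hidden_states):
            # all_hidden_states: list of (batch, seq, hidden), one per layer
            cls = torch.stack([h[:, 0] for h in all_hidden_states], dim=1)  # (B, L, H)
            weights = torch.softmax(self.w, dim=0)                          # (L,)
            return (weights.view(1, -1, 1) * cls).sum(dim=1)                # (B, H)

    pool = DynamicCLSPool(num_layers=12)
    hidden = [torch.randn(2, 20, 768) for _ in range(12)]
    print(pool(hidden).shape)   # torch.Size([2, 768])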

pdf bib
CCL23-Eval 任务6系统报告:基于预训练语言模型的双策略分类优化算法(System Report for CCL23-Eval Task 6:Double-strategy classification optimization algorithm based on pre-training language model)
Yongqing Huang (黄永清) | Hailong Yang (杨海龙) | Fu Xuelin (傅薛林)

“Classifying fraud cases is a key link in combating telecom network fraud crime: categorizing cases by fraud method and technique makes it easier to track the current situation and helps public security departments grasp the distribution of telecom network fraud cases, enabling targeted prevention, supervision, deterrence, and investigation for each category. Fraud case classification is a text classification task in natural language processing. Traditional classification models based on LSTM and CNN achieve some effect, but the limited parameter capacity of their architectures makes ideal results hard to reach. Building on the pre-trained language model NEZHA, we combine adversarial perturbation with an exponential moving average strategy, which benefits the fraud case classification task and makes full use of the case data. Our team did not use multi-model fusion and ultimately ranked third in this evaluation with a metric score of 0.8625.”

pdf bib
CCL23-Eval 任务6总结报告:电信网络诈骗案件分类(Overview of CCL23-Eval Task 6: Telecom Network Fraud Case Classification)
Chengjie Sun (孙承杰) | Jie Ji (纪杰) | Boyue Shang (尚伯乐) | Binguan Liu (刘秉权)

“In recent years, telecom network fraud has become a serious problem, and automated case classification can help combat such crime. This paper introduces the classification taxonomy for the task and then presents the evaluation’s dataset, task definition, and competition results. Sixty teams registered for the task and 34 submitted results; 15 teams scored above the baseline, with a top score of 0.8660, 1.6% above the baseline. Analysis of the results shows that most teams adopted BERT-style models.”

pdf bib
CCL23-Eval任务6系统报告:基于原型监督对比学习和模型融合的电信网络诈骗案件分类(System Report for CCL23-Eval Task 6: Classification of Telecom Network Fraud Cases Based on Prototypical Supervised Contrastive Learning and Model Fusion)
Site Xiong (熊思诗) | Jili Zhang (张吉力) | Yu Zhao (赵宇) | Xinzhang Liu (刘欣璋) | Yongshuang Song (宋双永)

“This paper proposes a telecom network fraud case classification method based on prototypical supervised contrastive learning and model fusion. To strengthen the model’s ability to distinguish easily confused categories, we adopt a two-branch neural network training framework in which feature learning and classifier learning proceed in parallel, and we optimize classification with domain pre-training, model fusion, and post-hoc classification strategies. Our method achieved a Macro-F1 of 0.8601 on the CCL2023-FCC evaluation task.”
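
A minimal sketch of a prototype-based supervised contrastive objective in the spirit the abstract describes: pull each representation toward its class prototype and away from the others. Using batch class means as prototypes and the temperature value are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def proto_supcon_loss(features, labels, tau=0.1):
        z = F.normalize(features, dim=-1)
        classes = labels.unique()
        protos = torch.stack([z[labels == c].mean(dim=0) for c in classes])
        protos = F.normalize(protos, dim=-1)
        logits = z @ protos.t() / tau            # similarity to every prototype
        idx = {int(c): i for i, c in enumerate(classes)}
        targets = torch.tensor([idx[int(y)] for y in labels])
        return F.cross_entropy(logits, targets)  # own prototype = positive class

    feats = torch.randn(16, 128)
    labels = torch.randint(0, 4, (16,))
    print(proto_supcon_loss(feats, labels))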

pdf bib
System Report for CCL23-Eval Task 6: A Method For Telecom Network Fraud Case Classification Based on Two-stage Training Framework and Within-task Pretraining
Guangyu Zheng | Tingting He | Zhenyu Wang | Haochang Wang

“Domain-specific text classification often needs more external knowledge, and fraud cases have only brief descriptions. Existing methods usually utilize single-stage deep models to extract semantic features, which are less reusable. To tackle this issue, we propose a two-stage training framework based on within-task pretraining and multi-dimensional semantic enhancement for CCL23-Eval Task 6 (Telecom Network Fraud Case Classification, FCC). Our training framework is divided into two stages. First, we pre-train on the training corpus to obtain a task-specific BERT. The semantic mining ability of the model is enhanced from the feature-space perspective by introducing adversarial training and multiple random sampling. Pseudo-labeled data is generated from test data above a certain confidence threshold. Second, pseudo-labeled samples are added to the training set for semantic enhancement along the sample-space dimension. We utilize the same backbone for prediction to obtain the results. Experimental results show that our proposed method outperforms the single-stage benchmarks and achieves competitive performance with 0.859259 F1. It also performs better on the few-shot patent classification task with 65.160% F1, which indicates robustness.”
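
The thresholded pseudo-labeling step between the two stages can be sketched directly; the 0.95 threshold and the data layout are assumptions.

    import numpy as np

    def select_pseudo_labels(probs, texts, threshold=0.95):
        conf = probs.max(axis=1)       # model confidence per test example
        labels = probs.argmax(axis=1)  # predicted class per test example
        keep = conf >= threshold
        return [(t, int(y)) for t, y, k in zip(texts, labels, keep) if k]

    probs = np.array([[0.97, 0.02, 0.01],
                      [0.40, 0.35, 0.25],
                      [0.05, 0.94, 0.01]])
    texts = ["case A", "case B", "case C"]
    print(select_pseudo_labels(probs, texts))  # only "case A" clears 0.95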

pdf bib
CCL23-Eval 任务7赛道一系统报告:基于序列到序列模型的自动化文本纠错系统(System Report for CCL23-Eval Task 7 Track 1: Automated text error correction pipeline based on sequence-to-sequence models)
Shixuan Liu (刘世萱) | Xinzhang Liu (刘欣璋) | Yuyao Huang (黄钰瑶) | Chao Wang (王超) | Yongshuang Song (宋双永)

“This paper describes the system our team submitted to Track 1 of the CCL-2023 Chinese Learner Text Correction evaluation. In recent years, large Chinese pre-trained models have performed excellently across a variety of tasks, and different pre-trained models have their own strengths on specific tasks. However, because learner text correction features complex grammatical errors and scarce correction corpora, adopting a pre-trained text correction model based on sequence tagging is a natural choice. Our team adopts a sequence-to-sequence correction model with a two-stage training strategy and designs a pipeline around it: we first clean the training data; in the first training stage we apply data augmentation on the training set; in the second stage we fine-tune on the validation set; finally we post-process with a voting ensemble over multiple models. In the official evaluation, our submission exceeded the baseline on the closed-task leaderboard by 17.01 points (40.59 -> 57.6).”

pdf bib
CCL23-Eval任务7赛道一系统报告:Suda & Alibaba 文本纠错系统(CCL23-Eval Task 7 Track 1 System Report: Suda & Alibaba Team Text Error Correction System)
Haochen Jiang (蒋浩辰) | Yumeng Liu (刘雨萌) | Houquan Zhou (周厚全) | Ziheng Qiao (乔子恒) | Bo Zhang (章波) | Chen Li (李辰) | Zhenghua Li (李正华) | Min Zhang (张民)

“This report describes the system submitted by the Suda & Alibaba correction team to Track 1 (Multidimensional Chinese Learner Text Correction) of the CCL2023 Chinese Learner Text Correction evaluation. For models, we use both sequence-to-sequence and sequence-to-edit correction models. For data, we train in three stages using pseudo data built from confusion sets, real Lang-8 data, and the YACLC development set; on the open task we additionally train on HSK, CGED, and other data. We also apply a series of effective performance-boosting techniques, including rule-based data augmentation, data cleaning, post-processing, and model ensembling. In addition, we explore how large models such as GPT-3.5 and GPT-4 can assist Chinese text correction, propose a method that effectively avoids the over-correction problem of large models, and experiment with a variety of prompts. On both the closed and open tasks, our team ranked first in minimal edit, fluency, and average F0.5 scores.”

pdf bib
CCL23-Eval 任务7系统报告:基于序列标注和指针生成网络的语法纠错方法(System Report for CCL23-Eval Task 7: A Syntactic Error Correction Approach Based on Sequence Labeling and Pointer Generation Networks)
Youren Yu (于右任) | Yangsen Zhang (张仰森) | Guanguang Chang (畅冠光) | Beibei Gao (高贝贝) | Yushan Jiang (姜雨杉) | Tuo Xiao (肖拓)

“To address the inaccurate error-boundary recognition and over-correction common in current Chinese grammatical error correction models, we propose a Chinese grammatical error correction model based on sequence labeling and a pointer-generator network. For data, we use the officially provided Lang-8 dataset and CGED datasets from previous years, converting traditional to simplified characters and cleaning the data. For models, we use an ERNIE + Global Pointer sequence labeling model, an ERNIE + CRF sequence labeling model, a BART + pointer-generator correction model, and a GECToR-based correction model. For model ensembling, we use voting and ERNIE-based perplexity scoring to produce the final predictions. On the test set, our COM score reached 48.68, ranking second.”

pdf bib
CCL23-Eval 任务7总结报告: 汉语学习者文本纠错(Overview of CCL23-Eval Task 7: Chinese Learner Text Correction)
Hongxiang Chang | Yang Liu | Meng Xu | Yingying Wang | Cunliang Kong | Liner Yang | Yang Erhong | Maosong Sun | Gaoqi Rao | Renfen Hu | Zhenghao Liu | 鸿翔 常 | 洋 刘 | 萌 徐 | 莹莹 王 | 存良 孔 | 麟儿 杨 | 尔弘 杨 | 茂松 孙 | 高琦 饶 | 韧奋 胡 | 正皓 刘

“The Chinese Learner Text Correction evaluation is a technical evaluation held under the auspices of the 22nd China National Conference on Computational Linguistics. Targeting texts written by learners of Chinese, it sets up two tracks: multidimensional Chinese learner text correction and Chinese grammatical error detection. Reflecting the continuing progress of artificial intelligence, each track offers an open task and a closed task, with the open task allowing large models. The evaluation dataset is built on YACLC, a multidimensionally annotated corpus of Chinese learner texts, with an evaluation standard based on multiple reference answers and a benchmark evaluation framework, further advancing research on Chinese learner text correction. Thirty-eight teams registered, of which 5 achieved strong results and submitted technical reports.”

pdf bib
System Report for CCL23-Eval Task 7: Chinese Grammatical Error Diagnosis Based on Model Fusion
Yanmei Ma | Laiqi Wang | Zhenghua Chen | Yanran Zhou | Ya Han | Jie Zhang

“The purpose of the Chinese Grammatical Error Diagnosis task is to identify the positions and types of grammar errors in Chinese texts. In Track 2 of CCL2023-CLTC, Chinese grammar errors are classified into four categories: Redundant Words, Missing Words, Word Selection, and Word Ordering Errors. We conducted data filtering, model research, and model fine-tuning in sequence. Then, we performed weighted fusion of models based on perplexity calculations and introduced various post-processing strategies. As a result, the performance of the model on the test set, measured by COM, reached 49.12.”
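
One plausible form of the perplexity-based weighted fusion the abstract mentions, sketched with inverse-perplexity weights; the actual weighting scheme is not specified in the abstract.

    import numpy as np

    def fuse_predictions(prob_list, perplexities):
        # models with lower perplexity on the input receive larger weights
        w = 1.0 / np.asarray(perplexities, dtype=float)
        w = w / w.sum()
        fused = sum(wi * p for wi, p in zip(w, prob_list))
        return fused.argmax(axis=-1), fused

    p1 = np.array([[0.6, 0.3, 0.1]])
    p2 = np.array([[0.2, 0.7, 0.1]])
    labels, fused = fuse_predictions([p1, p2], perplexities=[12.0, 30.0])
    print(labels, fused.round(3))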

pdf bib
System Report for CCL23-Eval Task 7: THU KELab (sz) - Exploring Data Augmentation and Denoising for Chinese Grammatical Error Correction
Jingheng Ye | Yinghui Li | Haitao Zheng

“This paper explains the GEC system submitted by THU KELab (sz) to CCL2023-Eval Task 7 CLTC (Chinese Learner Text Correction) Track 1: Multidimensional Chinese Learner Text Correction. Recent studies have demonstrated that GEC performance can be improved by increasing the amount of training data. However, high-quality public GEC data is much less abundant. To address this issue, we propose two data-driven techniques, data augmentation and data denoising, to improve GEC performance. Data augmentation creates pseudo data to enhance generalization, while data denoising removes noise from the realistic training data. The results on the official evaluation dataset YACLC demonstrate the effectiveness of our approach. Finally, our GEC system ranked second in both the closed and open tasks. All of our datasets and code are available at https://github.com/THUKElab/CCL2023-CLTC-THU_KELab.”
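
Data augmentation for GEC is typically rule-based corruption of clean text into (noisy, clean) pairs; the sketch below is one such corruptor with assumed noise rates and operations, not necessarily the paper's.

    import random

    def corrupt(sentence, p=0.1, seed=None):
        rng = random.Random(seed)
        tokens = list(sentence)              # character-level, as in Chinese GEC
        out = []
        i = 0
        while i < len(tokens):
            r = rng.random()
            if r < p:                        # deletion -> "missing word" error
                i += 1
                continue
            if r < 2 * p:                    # duplication -> "redundant word"
                out += [tokens[i], tokens[i]]
            elif r < 3 * p and i + 1 < len(tokens):  # swap -> "word order" error
                out += [tokens[i + 1], tokens[i]]
                i += 1
            else:
                out.append(tokens[i])
            i += 1
        return "".join(out)

    clean = "他昨天去图书馆借了一本书"
    print((corrupt(clean, seed=3), clean))   # a (noisy, clean) training pair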

pdf bib
System Report for CCL23-Eval Task 8: Chinese Grammar Error Detection and Correction Using Multi-Granularity Information
Yixuan Wang | Yijun Liu | Bo Sun | Wanxiang Che

“This paper introduces our system for the CCL-2023 task Chinese Essay Fluency Evaluation (CEFE). The CEFE task aims to study the identification and correction of grammatical errors in primary and middle school students’ test compositions. The evaluation has three tracks examining the recognition of wrong sentence types, character-level error correction, and wrong sentence rewriting. According to the task characteristics and data distribution of each track, we propose a token-level discriminative model based on sequence labeling for the multi-label classification of wrong sentences, an auto-encoder model based on edit labels for character-level error correction, and a seq2seq model obtained by pre-training on pseudo data and fine-tuning on labeled data for the wrong sentence rewriting task. In the final evaluation, our proposed method won first place in all three tracks according to the corresponding evaluation metrics.”

pdf bib
Overview of CCL23-Eval Task 8: Chinese Essay Fluency Evaluation (CEFE) Task
Xinshu Shen | Hongyi Wu | Xiaopeng Bai | Yuanbin Wu | Aimin Zhou | Shaoguang Mao | Tao Ge | Yan Xia

“This paper provides a comprehensive review of CCL23-Eval Task 8, i.e., Chinese Essay Fluency Evaluation (CEFE). The primary aim of this task is to systematically identify the types of fine-grained grammatical errors that affect the readability and coherence of essays written by Chinese primary and secondary school students, and then to suggest suitable corrections to enhance the fluidity of their written expression. The task consists of three distinct tracks: (1) coarse-grained and fine-grained error identification; (2) character-level error identification and correction; (3) error sentence rewriting. In the end, we received 44 completed registration forms, leading to a total of 130 submissions from 11 dedicated participating teams. We present the results of all participants and our analysis of these results. Both the dataset and the evaluation tool used in this task are publicly available.”

pdf bib
CCL23-Eval 任务9系统报告:基于重叠片段生成增强阅读理解模型鲁棒性的方法(System Report for CCL23-Eval Task 9: Improving MRC Robustness with Overlapping Segments Generation for GCRC_advRobust)
Suzhe He (何苏哲) | Chongsheng Yang (杨崇盛) | Shumin Shi (史树敏)

“Extracting semantically complete option evidence remains challenging for machine reading comprehension. Existing work on unsupervised evidence extraction falls into two classes: one uses static word vectors and beam search to extract relevant sentences iteratively; the other uses instance-level supervision, covering both independent and end-to-end evidence extraction. The former has a cumbersome pipeline, while the latter is unstable under joint training, which makes stable performance gains difficult. For CCL23-Eval Task 9, this paper proposes an adaptive end-to-end evidence extraction method based on overlapping segment generation. To handle unclear evidence-sentence boundaries, it splits the document into multiple overlapping sentence segments and extracts the key parts as evidence, capturing the overall semantics. We also optimize the evidence-extraction embedding module so that the confidence of evidence segments is adjusted automatically. Experimental results show that the proposed method greatly reduces interference from redundant content and needs only a single hyperparameter to stably improve reading comprehension performance, enhancing model robustness.”

pdf bib
CCL23-Eval 任务9总结报告:汉语高考阅读理解对抗鲁棒评测 (Overview of CCL23-Eval Task 9: Adversarial Robustness Evaluation for Chinese Gaokao Reading Comprehension)
Yaxin Guo (郭亚鑫) | Guohang Yan (闫国航) | Hongye Tan (谭红叶) | Ru Li (李茹)

“The adversarial robustness evaluation for Chinese Gaokao reading comprehension is dedicated to improving the robustness of machine reading comprehension models in complex, realistic adversarial settings. The task designed four adversarial attack strategies (keyword perturbation, reasoning-logic perturbation, spatiotemporal-attribute perturbation, and causal-relation perturbation) and built the adversarial robustness subset GCRC advRobust. Systems must choose the correct answer from four options given a passage and a question. The evaluation attracted wide attention from industry and academia, with 29 teams registering; owing to the task’s difficulty, only 8 submitted results. All technical information about the task, including system submissions, official results, and links to supporting resources and software, is available from the task website.”

pdf bib
System Report for CCL23-Eval Task 9: HUST1037 Explore Proper Prompt Strategy for LLM in MRC Task
Xiao Liu | Junfeng Yu | Yibo He | Lujun Zhang | Kaiyichen Wei | Hongbo Sun | Gang Tu

“Our research delves into the Adversarial Robustness Evaluation for Chinese Gaokao Reading Comprehension (GCRC advRobust). While Chinese reading comprehension tasks have gained significant attention in recent years, previous methods have not proven effective on this challenging dataset. We focus on exploring how prompt engineering can impact a model’s reading comprehension ability. Through our experiments using ChatGLM, GPT3.5, and GPT4, we discovered a correlation between prompts and LLM reading comprehension ability, and found that prompt engineering improves the performance of each model. Our team submitted our system evaluation results, which ranked first on three indexes and in total score.”
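
A sketch of the kind of prompt template such experiments vary: instruction, passage, question, and options assembled into one string for the LLM. The wording is purely illustrative; the abstract does not disclose the best-performing prompt.

    def build_mrc_prompt(passage, question, options):
        letters = "ABCD"
        opts = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(options))
        return (
            "阅读下面的文章,从四个选项中选出唯一正确的答案,只输出选项字母。\n\n"
            f"文章:{passage}\n\n问题:{question}\n\n选项:\n{opts}\n\n答案:"
        )

    print(build_mrc_prompt("……文章内容……", "作者的主要观点是什么?", ["甲", "乙", "丙", "丁"]))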

up

pdf (full)
bib (full)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 4: Tutorial Abstracts)

pdf bib
预训练语言模型中的知识分析、萃取与增强(Knowledge Analysis, Extraction and Enhancement in Pre-trained Language Models)
Chen Yubo (玉博 陈) | Cao Pengfei (鹏飞 曹) | Wang Chenhao (晨皓 王) | Li Jiachun (嘉淳 李) | Liu Kang (康 刘) | Zhao Jun (军 赵)

“In recent years, large-scale pre-trained language models have made remarkable progress on knowledge-intensive natural language processing tasks. This seems to show that pre-trained language models can spontaneously learn large amounts of knowledge from corpora and store it implicitly in their parameters. Yet the mechanism behind this phenomenon remains puzzling: what knowledge do language models actually possess, how can that knowledge be extracted and used, and how can external knowledge compensate for a model’s deficiencies? All of these questions call for further exploration. In this tutorial, we survey recent research progress on knowledge analysis, knowledge extraction, and knowledge enhancement for pre-trained language models.”

pdf bib
Safety and Ethical Concerns of Large Language Models
Xi Zhiheng | Zheng Rui | Gui Tao

“Recent months have witnessed significant progress in the field of large language models (LLMs). Represented by ChatGPT and GPT-4, LLMs perform well on various natural language processing tasks and have been applied to many downstream applications to facilitate people’s lives. However, safety and ethical concerns still exist. Specifically, LLMs suffer from social bias, robustness problems, and poisoning issues, all of which may induce LLMs to spew harmful content. We propose this tutorial as a gentle introduction to the safety and ethical issues of LLMs.”

pdf bib
Studying Language Processing in the Human Brain with Speech and Language Models
Zhang Chao | Thwaites Andrew | Wingfield Cai

“Speech and language computational models have been instrumental in advancing Artificial Intelligence in recent years. However, it remains an open question whether the human brain employs approaches similar to these models. This tutorial aims to provide an accessible introduction to the extensive research on this topic, specifically focusing on studies that seek to establish quantitative correlations between neuroimaging data from human subjects and the output of language models or automatic speech recognition systems. The tutorial covers various aspects of this research, including a brief overview of brain-computer interfaces and neuroscience, common techniques for data processing and pattern analysis, and representative research examples. Finally, the tutorial addresses the main limitations and technical challenges encountered in this field, as well as the relationship between brain mechanism research and brain-inspired artificial intelligence.”

pdf bib
Foundation Models for Robotics: Best Known Practices
Xu Shaocong | Zhao Hao

“Artificial general intelligence (AGI) used to be a sci-fi term, but recently the surprising generalization capability of foundation models has drawn a lot of attention to AGI in both academia and industry. Large language models can now answer questions or chat with human beings, using fluent sentences and clear reasoning. Diffusion models can now draw pictures of unprecedented photo-realism according to human commands and controls. Researchers have also made substantial efforts to explore new possibilities for robotics applications with the help of foundation models. Since this interdisciplinary field is still developing fast, there are no clear methodological conclusions for now. In this tutorial, I will briefly go through best known practices that have shown transformative capabilities in several sub-fields. Specifically, there are five representative paradigms: (1) using foundation models to allow human-friendly human-car interaction; (2) using foundation models to equip robots with the ability to understand vague human needs; (3) using foundation models to break down complex tasks into achievable sub-tasks; (4) using foundation models to composite skill primitives so that reinforcement learning can work with sparse rewards; (5) using foundation models to bridge language commands and low-level control dynamics. I hope these best known practices will inspire NLP researchers.”