Abstract
To address coreference among Uyghur noun phrases, this paper proposes a coreference resolution method that applies a stacked autoencoder deep learning model to semantic features. A study of the referentiality of Uyghur noun phrases yields 13 features that benefit the resolution task. To improve how well the features express text semantics, word embeddings, which carry lexical semantics and contextual positional relationships, are added to the feature set. The deep learning model extracts implicit deep semantic features in an unsupervised manner, and a softmax classifier is then trained to decide whether two markables corefer. On the Uyghur coreference resolution task, the method achieves a precision of 74.5%, a recall of 70.6%, and an F value of 72.4%. The experimental results show that the deep learning model is better suited to this task than the shallower support vector machine, and that introducing word embedding features effectively improves the performance of the coreference resolution model.
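The relationship between the three reported scores can be checked with a short calculation. The sketch below assumes the F value is the balanced harmonic mean of precision and recall (F1), which the abstract does not state explicitly; the function name `f_measure` is hypothetical. With the rounded inputs 74.5 and 70.6 it gives roughly 72.5, which agrees with the reported 72.4% to within the rounding of precision and recall.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F value: harmonic mean of precision and recall.

    Assumption: the abstract's "F value" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 74.5
reported_recall = 70.6

f_value = f_measure(reported_precision, reported_recall)
print(f"F value from rounded inputs: {f_value:.2f}%")
```

Because the harmonic mean is dominated by the smaller of the two scores, the F value sits slightly below their arithmetic mean (72.55).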
Source
Acta Automatica Sinica (自动化学报)
EI
CSCD
Peking University Core Journals (北大核心)
2017, No. 11, pp. 1984-1992 (9 pages)
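Funding
Supported by the National Natural Science Foundation of China (61563051, 61262064, 61662074, 61331011) and the Autonomous Region Science and Technology Talent Training Project (QN2016YX0051).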