Chinese cross-document co-reference resolution based on SVM classification and semantic information (cited by: 5)

Abstract: Cross-document co-reference resolution (CDCR) groups mentions that are distributed across different documents but refer to the same entity into a single co-reference chain. Traditional CDCR work mainly uses clustering to address the name disambiguation problem that arises in information retrieval. This paper recasts the clustering problem as a classification problem and uses a Support Vector Machine (SVM) classifier to handle both name disambiguation and variant consolidation in information extraction. The method integrates word-formation (morphological) and phonetic features of entity names with a range of semantic features gathered both within the texts and from outside them, such as the Internet. Experiments on a Chinese cross-document co-reference corpus show that, compared with clustering methods, the classification approach improves both precision and recall.
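The pairwise formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two feature functions, the weights, the 0.6 threshold, and the sample mentions are all hypothetical, and a fixed linear rule (`same_entity`) stands in for the trained SVM. What the sketch does show is how yes/no decisions on mention pairs can be merged into co-reference chains with a union-find structure, covering both name disambiguation (same name, different entities) and variant consolidation (different names, same entity).

```python
from itertools import combinations


def char_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the character sets of two names (toy word-formation feature)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def context_overlap(a: set, b: set) -> float:
    """Jaccard overlap of two sets of context words (toy semantic feature)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def same_entity(m1: dict, m2: dict) -> bool:
    """Hypothetical stand-in for a trained SVM: a hand-set linear decision rule."""
    score = (0.5 * char_overlap(m1["name"], m2["name"])
             + 0.5 * context_overlap(m1["ctx"], m2["ctx"]))
    return score >= 0.6  # assumed threshold, not taken from the paper


def build_chains(mentions, classify):
    """Merge positive pairwise decisions into chains via union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if classify(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    chains = {}
    for i in range(len(mentions)):
        chains.setdefault(find(i), []).append(i)
    return sorted(chains.values())


# Invented sample data: mentions 0 and 1 share a name but not an entity;
# mentions 0 and 2 differ in name but share a context.
mentions = [
    {"name": "Michael Jordan", "ctx": {"basketball", "bulls", "nba"}},
    {"name": "Michael Jordan", "ctx": {"machine", "learning", "berkeley"}},
    {"name": "M. Jordan", "ctx": {"basketball", "bulls", "nba", "chicago"}},
]

print(build_chains(mentions, same_entity))
```

In a real system the `classify` callable would wrap a trained classifier's prediction on the pair's feature vector. The union-find step takes the transitive closure of positive decisions, which is one simple way to turn pairwise labels into chains; the paper may use a different strategy.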

Authors: ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong

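Affiliations: Natural Language Processing Laboratory, Soochow University; School of Computer Science and Technology, Soochow University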

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 4, pp. 984-987 (4 pages)

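Funding: National Natural Science Foundation of China (60873150, 90920004); Natural Science Foundation of Jiangsu Province (BK2010219); Major Natural Science Research Project of Jiangsu Higher Education Institutions (11KJA520003)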

Keywords: cross-document co-reference resolution; information extraction; Support Vector Machine (SVM) classifier; semantic information; name disambiguation; variant consolidation

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
