Text Semantic Relativity Calculation Based on Online Encyclopedia

(Original Chinese title: 基于在线百科知识库的文本语义相关度计算)

Abstract: Understanding the semantics of text requires a large amount of common-sense and specialized knowledge; human-compiled dictionaries and thesauri alone do not suffice. With the growth of social networks, the online encyclopedia Wikipedia has become a platform for sharing and refining human knowledge. This paper studies the calculation of text semantic relativity on the basis of Chinese Wikipedia. Topic articles downloaded from the Chinese Wikipedia website on December 15, 2014 were processed and used as the knowledge base of semantic concepts. Experimental results on the Words-240 test set indicate that the method outperforms WordNet-based approaches and the LSA method.
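The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to use encyclopedia articles as a concept knowledge base: each text is mapped to a vector of weights over concepts (articles), and relatedness is the cosine of two such vectors. The toy `CONCEPTS` dictionary, the TF-IDF weighting, and all function names are assumptions made for illustration, not the paper's method. Real Chinese text would also need a word segmentation step before this point.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept title -> tokenized article text.
# A real one would be built from processed Wikipedia articles.
CONCEPTS = {
    "Computer": ["computer", "program", "data", "algorithm"],
    "Music": ["music", "song", "instrument", "melody"],
    "Software": ["program", "software", "algorithm", "code"],
}


def build_index(concepts):
    """Build an inverted index: term -> {concept: TF-IDF weight}."""
    n = len(concepts)
    # Document frequency: number of concepts whose article contains the term.
    df = Counter(t for tokens in concepts.values() for t in set(tokens))
    index = {}
    for name, tokens in concepts.items():
        for term, count in Counter(tokens).items():
            idf = math.log(n / df[term]) + 1.0  # +1 keeps shared terms non-zero
            index.setdefault(term, {})[name] = count * idf
    return index


def concept_vector(tokens, index):
    """Sum the concept weights of every known token in the text."""
    vec = Counter()
    for t in tokens:
        for concept, weight in index.get(t, {}).items():
            vec[concept] += weight
    return vec


def relatedness(tokens_a, tokens_b, index):
    """Cosine similarity of the two texts' concept vectors (0.0 if either is empty)."""
    va = concept_vector(tokens_a, index)
    vb = concept_vector(tokens_b, index)
    dot = sum(va[c] * vb[c] for c in va if c in vb)
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


INDEX = build_index(CONCEPTS)
```

In this toy setup, `relatedness(["program"], ["algorithm"], INDEX)` is high because both words occur in the same two articles, while `relatedness(["program"], ["song"], INDEX)` is zero because the words share no concept. Scores of this kind are typically compared against human ratings on a word-pair test set such as the Words-240 set mentioned in the abstract.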

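Author: Liu Haijing (刘海静)

Affiliation: Department of Computer Engineering, Taiyuan Institute of Technology

Source: Journal of Luoyang Normal University (《洛阳师范学院学报》), 2015, No. 5, pp. 80-83 (4 pages)

Keywords: semantic understanding; online encyclopedia; semantic relativity

Classification number: N37 [General natural science]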


