A Similar Words Algorithm Automatically Classifying Different Senses (自动获取不同义项的相似词算法)


Abstract: Word similarity is widely used across many areas of natural language processing. However, word similarity is usually computed between words rather than between word senses. To address this, the paper proposes a classification algorithm for similar words. The algorithm first uses the PMImax tool to compute the similar words of a target word. Then, taking the senses in WordNet as the reference, it applies an improved Lesk algorithm to automatically group these similar words by sense, so that each group of similar words is similar only to its corresponding sense. Experimental results show that the classification accuracy of the algorithm reaches up to 84.27%.
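The two steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The `pmi` function is plain pointwise mutual information, used here as a stand-in for the PMImax tool. `classify_by_sense` uses basic Lesk-style gloss overlap, not the improved Lesk variant, whose details the abstract does not give. The sense labels and glosses are hand-written toy data rather than real WordNet entries, and all function and variable names are hypothetical.

```python
import math

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "for", "by"}


def pmi(count_xy, count_x, count_y, total):
    """Plain PMI: log2(p(x, y) / (p(x) * p(y))), estimated from raw counts."""
    return math.log2((count_xy * total) / (count_x * count_y))


def content_words(gloss):
    """Lowercase, whitespace-tokenize, and drop stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def classify_by_sense(similar_glosses, sense_glosses):
    """Assign each similar word to the target sense whose gloss shares
    the most content words with the similar word's own gloss."""
    groups = {sense: [] for sense in sense_glosses}
    for word, gloss in similar_glosses.items():
        overlaps = {
            sense: len(content_words(gloss) & content_words(sense_gloss))
            for sense, sense_gloss in sense_glosses.items()
        }
        best = max(overlaps, key=overlaps.get)
        if overlaps[best] > 0:  # leave words with no overlap unassigned
            groups[best].append(word)
    return groups


# Toy glosses for two senses of the target word "bank" (hand-written).
sense_glosses = {
    "bank#finance": "an institution that accepts deposits of money and makes loans",
    "bank#river": "sloping land beside a river or lake",
}

# Toy glosses for candidate similar words, assumed to come from step one.
similar_glosses = {
    "lender": "a person or institution that makes loans of money",
    "shore": "the land along the edge of a lake or sea",
    "treasury": "a place where money and funds are kept",
    "levee": "an embankment beside a river built to prevent flooding",
}

groups = classify_by_sense(similar_glosses, sense_glosses)
print(groups)
```

In this toy run each similar word lands in exactly one sense group. A real pipeline would draw the glosses from WordNet and would need a rule for ties and for words whose gloss overlaps no sense.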

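Author: Wang Yongsheng (王永生)

Affiliation: College of Overseas Training, Tongji University (同济大学出国培训学院)

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 3, pp. 258-260, 288 (4 pages in total)

Funding: Youth Project of the Humanities and Social Sciences Research Fund, Ministry of Education (07JC740009)

Keywords: word similarity; pointwise mutual information (PMI); Lesk algorithm; WordNet

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]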


