Research on Semantic Relatedness of Domain-specific Concepts Based on Chinese Wikipedia (基于中文维基百科的领域概念相关性研究)

Cited by: 3
Abstract: To improve the accuracy of relatedness judgments for domain-specific concepts, this paper proposes a method for computing semantic relatedness between concepts that combines the category hierarchy of Chinese Wikipedia with the content of concept definitions. Concepts from the library and information science domain in the Chinese Wikipedia category hierarchy serve as the experimental objects. A weighted algorithm based on both category information and text information is compared with two algorithms based on category information alone (a semantic distance algorithm and an information content algorithm) and with a text overlap algorithm based on text information alone. The experimental results show that the weighted algorithm performs better than the others and can provide important technical support for applications such as domain-oriented information retrieval and domain ontology construction.
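The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of weighting a category-based score against a text-based score. Every name here (`category_relatedness`, `text_relatedness`, `weighted_relatedness`, the weight `alpha`) is hypothetical, and so are the toy category graph and token lists. The inverse shortest-path score and the Jaccard overlap are simple stand-ins, not the semantic distance, information content, or text overlap algorithms evaluated in the paper.

```python
from collections import deque


def path_length(graph, a, b):
    """Shortest number of edges between a and b in an undirected graph (BFS)."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # a and b are not connected


def category_relatedness(graph, a, b):
    """Assumed stand-in: a shorter category path gives a higher score in (0, 1]."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def text_relatedness(tokens_a, tokens_b):
    """Assumed stand-in: Jaccard overlap of the two definitions' token sets."""
    sa, sb = set(tokens_a), set(tokens_b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def weighted_relatedness(cat_score, text_score, alpha=0.5):
    """Linear combination; alpha is a hypothetical weight on the category score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cat_score + (1.0 - alpha) * text_score


# Toy data, invented for illustration only.
graph = {
    "Library and information science": ["Information retrieval", "Bibliometrics"],
    "Information retrieval": ["Library and information science"],
    "Bibliometrics": ["Library and information science"],
}
tokens_ir = ["search", "index", "query", "document"]
tokens_bib = ["citation", "document", "index", "journal"]

cat = category_relatedness(graph, "Information retrieval", "Bibliometrics")
txt = text_relatedness(tokens_ir, tokens_bib)
score = weighted_relatedness(cat, txt, alpha=0.5)
```

In a sketch like this, `alpha` would have to be tuned against human relatedness judgments; the abstract does not say how the paper sets its weights.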
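Source: Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2014, No. 23, pp. 136-142 (7 pages)
Funding: One of the research outputs of the National Social Science Fund of China major project "Research on Knowledge Organization and Navigation Mechanisms for Domain-Specific Web Resources" (No. 12&ZD222) and the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Sentiment Orientation Analysis of Hot Topics on Light Blogs" (No. 12YJC870023).
Keywords: Chinese Wikipedia; domain-specific concept; semantic relatedness; semantic relation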