Abstract
Chinese online encyclopedias contain a large amount of valuable information, and many studies have successfully used them for a variety of knowledge acquisition tasks. For instance, articles on similar subjects can be grouped into a category (concept). A topic hierarchy built for a given category from these encyclopedias benefits applications such as search and browsing, information organization, and information retrieval. However, no prior work has studied how to construct the topic hierarchy of a given category in an online encyclopedia. To address the heterogeneity and roughness of Chinese online encyclopedias, this paper proposes a method for constructing topic hierarchies based on a Bayesian network. The method combines the structured table-of-contents information and the unstructured text descriptions of the articles in a category, and applies a maximum spanning tree (maximum arborescence) algorithm to the category's Bayesian topic network to build the topic hierarchy automatically. Experimental results show that, compared with the existing topical hierarchy of the encyclopedia, the proposed approach expands the content fourfold while maintaining an accuracy of 75%.
Source
《计算机科学》
CSCD
PKU Core Journal
2017, Issue 5, pp. 226-231 (6 pages)
Computer Science
Funding
National Natural Science Foundation of China (61309007)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z409)
Keywords
Chinese online encyclopedia
Topic hierarchy
Structured contents table
Unstructured text description