
Research of Ontology Concept Extraction Based on Chinese UGC Sources (cited by: 4)
Abstract [Objective] To extract ontology concepts from Chinese UGC information sources. [Methods] Based on the characteristics of UGC information sources, this paper proposes an ontology concept extraction method that uses linguistic methods to extract fine-grained words and combine them into candidate concepts, then applies statistical filtering to form the final concepts. The paper builds a concept extraction model for UGC information sources and validates it with a prototype system. [Results] In the UGC concept extraction experiments, the method outperforms the four other concept extraction methods used for comparison, reaching an accuracy rate of 68.42% and a recall rate of 85.35%. [Limitations] The test set is drawn from UGC sources of relatively high information quality, part of it was filtered manually, and the corpus is limited in size. [Conclusions] The concept extraction method and techniques have some significance for ontology concept extraction based on UGC information sources.
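Authors: 唐晓波 (Tang Xiaobo), 胡华 (Hu Hua)
Source: 《现代图书情报技术》 (New Technology of Library and Information Service), CSSCI, Peking University Core Journal, 2014, No. 5, pp. 41-49 (9 pages)
Funding: One of the research outcomes of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194)
Keywords: Concept extraction; Part-of-speech rules; Seed word; Mutual information; Information entropy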
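
The abstract names mutual information and information entropy as the statistical filters but gives no formulas or thresholds. As a rough illustration only, the sketch below assumes one common formulation: pointwise mutual information over adjacent word pairs, plus the entropy of each pair's left and right neighbours. The function names, the boundary markers `<s>` and `</s>`, the threshold values, and the toy sentences are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in bits) of a Counter of neighbouring words."""
    total = sum(counter.values())
    return sum(-(c / total) * math.log2(c / total) for c in counter.values())


def score_candidates(sentences):
    """Map each adjacent word pair to (pmi, left_entropy, right_entropy).

    `sentences` is a list of already segmented token lists (an assumption:
    word segmentation and part-of-speech filtering happen upstream).
    """
    unigrams = Counter()
    bigrams = Counter()
    left_ctx = defaultdict(Counter)
    right_ctx = defaultdict(Counter)
    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigrams[pair] += 1
            # Hypothetical boundary markers stand in for missing neighbours.
            left = tokens[i - 1] if i > 0 else "<s>"
            right = tokens[i + 2] if i + 2 < len(tokens) else "</s>"
            left_ctx[pair][left] += 1
            right_ctx[pair][right] += 1

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[pair] = (pmi, entropy(left_ctx[pair]), entropy(right_ctx[pair]))
    return scores


def select_concepts(scores, min_pmi, min_entropy):
    """Keep pairs that are cohesive (high PMI) and occur in varied contexts."""
    return sorted(
        pair
        for pair, (pmi, left_h, right_h) in scores.items()
        if pmi >= min_pmi and min(left_h, right_h) >= min_entropy
    )


if __name__ == "__main__":
    # Toy, hand-made corpus for illustration only.
    toy = [
        ["this", "digital", "camera", "is", "great"],
        ["my", "digital", "camera", "broke"],
        ["buy", "digital", "camera", "now"],
        ["this", "is", "great"],
    ]
    print(select_concepts(score_candidates(toy), min_pmi=1.0, min_entropy=1.0))
```

High mutual information suggests the two words belong together, while high neighbour entropy suggests the pair appears in varied contexts and so behaves like a free-standing unit. Requiring both is a design choice of this sketch, and the thresholds would need tuning on real data.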
