
Research on auto-extraction of concept terms in process of constructing concept maps
(Chinese title: 概念图构建中概念术语自动提取的研究与实现. Cited by: 2)
Abstract: Building a concept map is a complex undertaking, and the concept-term extraction stage usually requires domain experts to spend a great deal of time working by hand on unstructured text. As concept maps are used ever more widely in information-processing and knowledge-management systems, relying solely on domain experts to extract concept terms manually can no longer meet demand. This paper therefore proposes a method that combines web-crawler technology with latent semantic analysis (LSA) to extract concept terms automatically and generate concept maps. The method reduces the manual effort of building concept maps, constructs them efficiently and accurately, and can considerably widen their range of applications. First, a large volume of domain text is crawled from specified websites. Next, the text is preprocessed and feature terms are extracted from it. Finally, LSA is used to mine the latent semantic structure among feature terms and between feature terms and texts, eliminating noisy and redundant feature terms so that the concept terms can be extracted. Experimental results show that combining web-crawler technology with LSA reduces the manual effort of concept-term extraction, removes redundant concepts, and improves extraction accuracy.
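The LSA filtering step described in the abstract can be pictured roughly as follows. This is a minimal Python/NumPy sketch, not the authors' implementation. It assumes the crawled pages have already been preprocessed into token lists (word segmentation and stop-word removal are not shown), uses raw term counts rather than any particular weighting scheme, and substitutes two simple heuristics for the paper's unspecified noise and redundancy criteria: a term whose vector in the reduced k-dimensional space has a small norm (below the hypothetical `min_norm`) is discarded as noise, and a term whose cosine similarity to an already-kept term exceeds the hypothetical `max_sim` is discarded as redundant. All function names, thresholds, and the toy corpus are illustrative.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term-frequency matrix A (terms x documents) from pre-tokenized docs."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[index[t], j] += 1
    return vocab, A


def lsa_term_vectors(A, k):
    """Truncated SVD A ~ U_k diag(s_k) V_k^T; row i of U_k * s_k is term i in latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]


def select_concept_terms(vocab, A, k=2, min_norm=0.5, max_sim=0.95):
    """Hypothetical filter: drop weak (noise) and near-duplicate (redundant) terms."""
    T = lsa_term_vectors(A, k)
    norms = np.linalg.norm(T, axis=1)
    kept = []
    for i in np.argsort(-norms, kind="stable"):  # strongest terms first
        if norms[i] < min_norm:
            continue  # little weight in the reduced space: treat as noise
        cos = [T[i] @ T[j] / (norms[i] * norms[j]) for j in kept]
        if any(c > max_sim for c in cos):
            continue  # points the same way as a kept term: treat as redundant
        kept.append(i)
    return [vocab[i] for i in kept]


docs = [  # toy corpus; real input would be segmented, stop-word-filtered crawled pages
    ["concept", "map", "node"],
    ["concept", "map", "link"],
    ["crawler", "web", "page"],
]
vocab, A = term_document_matrix(docs)
print(select_concept_terms(vocab, A, k=2))  # one of concept/map, one of crawler/page/web
```

On this toy corpus, truncating to k = 2 leaves one latent direction for the concept-map documents and one for the crawling document, so `link` and `node` collapse onto the same direction as `concept`/`map` and are dropped as redundant, leaving one representative term per topic. With k = 3 (no truncation), `link` and `node` remain distinct and are kept. Choosing k is therefore what trades redundancy removal against loss of distinct terms.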
Source: Computer Engineering and Design (《计算机工程与设计》), 2012, No. 7, pp. 2864-2867 (4 pages). Indexed in CSCD; Peking University core journal.
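Funding: National Education Science Planning Project, National Youth Fund (CCA100176); Research Fund of the Sichuan Provincial Department of Education (09ZC080)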
Keywords: concept map; concept terms; web crawler technology; latent semantic analysis; features