
Multi-domain Web Text Feature Extraction Model for e-Science Environment
Abstract: Traditional domain-specific information extraction methods usually rely on domain dictionaries to discover text features. This makes experiments hard to reproduce and the methods hard to port to and generalize across multi-domain environments, which severely limits their range of application. To address these shortcomings, this paper proposes a multi-domain Web text feature extraction model for the e-Science environment, named e-WTDE. The model introduces dictionary-free Chinese word segmentation into multi-domain text feature discovery, removing the dependence on domain dictionaries. By extracting and classifying the common and individual features of domain topics and of the specific events within them, the model dynamically tracks how domain events occur and evolve, and eventually forms several regional data centers. Cooperative scheduling of domain knowledge across these data centers substantially improves how efficiently domain information is used at the global scope. Experiments separately validate the effectiveness of multi-domain feature extraction, dynamic tracking of topic features, and cooperative scheduling of domain knowledge, and further demonstrate the practical value of the model in the e-Science environment.
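The abstract does not say which dictionary-free segmentation technique e-WTDE uses. Purely as an illustration of the general idea, one widely used dictionary-free approach scores candidate character strings by their frequency and by the entropy of the characters seen immediately to their left and right (branching entropy), on the reasoning that a genuine word tends to recur in varied contexts. The following is a minimal sketch under that assumption; the function `discover_words`, its thresholds, and the `<s>`/`</s>` boundary sentinels are hypothetical choices, not the paper's method.

```python
import math
from collections import Counter, defaultdict


def entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a neighbor-character distribution."""
    total = sum(counter.values())
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def discover_words(text: str, max_len: int = 4, min_freq: int = 2,
                   min_entropy: float = 0.5) -> dict[str, float]:
    """Hypothetical dictionary-free word discovery (not e-WTDE's algorithm).

    A substring of length 2..max_len is kept if it occurs at least
    `min_freq` times and both its left-neighbor and right-neighbor
    distributions have entropy >= `min_entropy`. The score returned is
    the smaller of the two entropies. No dictionary is consulted.
    """
    freq: Counter = Counter()
    left: defaultdict = defaultdict(Counter)
    right: defaultdict = defaultdict(Counter)
    n = len(text)

    # Count every n-gram and record the character on each side of it.
    for length in range(2, max_len + 1):
        for i in range(n - length + 1):
            gram = text[i:i + length]
            freq[gram] += 1
            # Sentinels mark the start/end of the text so edge grams still count.
            left[gram][text[i - 1] if i > 0 else "<s>"] += 1
            right[gram][text[i + length] if i + length < n else "</s>"] += 1

    # Keep frequent grams whose boundaries are diverse on both sides.
    candidates: dict[str, float] = {}
    for gram, f in freq.items():
        if f < min_freq:
            continue
        score = min(entropy(left[gram]), entropy(right[gram]))
        if score >= min_entropy:
            candidates[gram] = score
    return candidates


if __name__ == "__main__":
    print(discover_words("甲数据乙丙数据丁戊数据己"))
```

For instance, in the string 甲数据乙丙数据丁戊数据己, "数据" occurs three times with three different left neighbors and three different right neighbors, so it is returned with a score of ln 3 (about 1.10), while every other substring occurs only once and is discarded. Taking the minimum of the two entropies is a deliberately conservative design choice: a string must look like a word boundary on both sides, not just one.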
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 1, pp. 17-23 (7 pages)
Funding: Supported by the National Key Technology R&D Program of the 11th Five-Year Plan (2006BAK11B03)
Keywords: e-Science environment; feature discovery; multi-domain data model; Web text mining