Method for Extracting Microblog Users' Information Based on Domain Ontology (基于领域本体的微博用户信息抽取方法)
Cited by: 1
Abstract: Traditional ontology-based Web page information extraction takes a single information item as the smallest extraction unit, so the extracted entities are weakly associated semantically and extraction accuracy is unsatisfactory. To address these problems, a two-level matching method for user information extraction is proposed on the basis of a microblog domain ontology. Semantically related user information at different levels of a microblog page is divided into corresponding information blocks, and each information block is then used as the smallest extraction unit from which the user's attribute information is extracted (personal information, information on followed friends, and the text of posted microblogs). Experimental results show that, compared with traditional information extraction methods, the designed extraction-rule algorithm effectively improves the accuracy and recall of information extraction, and that it performs well on Web pages with complex structure and a large volume of information, such as microblog pages.
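The two-level matching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or its extraction rules, which this record does not give. The toy ontology, the block headings, the `Label: value` page layout, and every function name below are hypothetical and chosen only for illustration. Level 1 groups lines into information blocks by matching headings to ontology concepts. Level 2 matches attribute labels only within the concept of the enclosing block.

```python
# Hypothetical toy ontology: concept -> attribute -> labels that may appear on a page.
# None of these names come from the paper; they are assumptions for illustration.
ONTOLOGY = {
    "PersonalInfo": {"nickname": ["Nickname"], "location": ["Location"]},
    "FollowedFriend": {"friend_name": ["Name"], "friend_location": ["Location"]},
    "Tweet": {"text": ["Text"], "posted_at": ["Time"]},
}

# Hypothetical headings that introduce each information block (level 1).
BLOCK_LABELS = {
    "PersonalInfo": ["Profile"],
    "FollowedFriend": ["Following"],
    "Tweet": ["Post"],
}


def match_concept(heading):
    """Return the ontology concept whose block heading equals this line, if any."""
    for concept, labels in BLOCK_LABELS.items():
        if heading.rstrip(":") in labels:
            return concept
    return None


def split_blocks(page_text):
    """Level 1: group lines into information blocks, one per matched heading."""
    blocks = []
    current = None
    for raw in page_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        concept = match_concept(line)
        if concept is not None:
            current = (concept, [])
            blocks.append(current)
        elif current is not None:
            current[1].append(line)
    return blocks


def extract_attributes(concept, lines):
    """Level 2: match 'Label: value' lines against the attributes of one concept."""
    record = {}
    for line in lines:
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Label: value' line
        for attr, labels in ONTOLOGY[concept].items():
            if label.strip() in labels:
                record[attr] = value.strip()
    return record


def extract(page_text):
    """Run both levels and return a list of (concept, attributes) pairs."""
    return [(c, extract_attributes(c, lines)) for c, lines in split_blocks(page_text)]


SAMPLE = """
Profile
Nickname: Alice
Location: Hefei
Following
Name: Bob
Location: Wuhan
Post
Text: Hello microblog
Time: 2015-01-01
"""

for concept, attrs in extract(SAMPLE):
    print(concept, attrs)
```

In this sketch the label `Location` appears in two blocks. Because attributes are matched inside a block rather than item by item, each occurrence stays attached to the right entity (`location` for the user, `friend_location` for the followed friend). That is the kind of semantic association the abstract says single-item extraction loses.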
Source: 《长江大学学报(自科版)(上旬)》 (Journal of Yangtze University (Natural Science Edition), Sci & Eng), CAS, 2015, No. 4, pp. 36-40, 4 (5 pages in total)
Funding: Anhui Provincial Department of Education fund project (KJ2013B020); National Undergraduate Innovation and Entrepreneurship Training Program (201210363066, 201310363097)
Keywords: domain ontology; two-level matching; information extraction; microblog; extraction rules