
Rules-Based Character Attributes Extraction from Baidu Encyclopedia (基于规则的百科人物属性抽取). Cited by: 3
Abstract: Information extraction is an important area of data mining. Text information extraction means extracting specified information from free text and storing it as structured data in a knowledge base, where it can be queried by users or passed on for further processing. Person attribute extraction is an important foundation for building intelligent people search engines, and structured information is a data format that computers can process directly. This paper presents a method for automatically acquiring person attributes from encyclopedia entries. The method uses the part-of-speech information of each attribute value to locate that value in the encyclopedia's free text, discovers extraction rules statistically, and then obtains person attribute information from the text by rule matching. Experiments show that the method is effective at extracting person attribute information from encyclopedia text. The extracted results can be used to build a knowledge base of person attributes.
Source: Journal of Integration Technology (《集成技术》), 2013, No. 3, pp. 1-4 (4 pages)
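Funding: National Natural Science Foundation of China (61152001, 61170111); Open Project of the Key Laboratory of Complex Systems Management and Control, Institute of Automation, Chinese Academy of Sciences (20110102); Fundamental Research Funds for the Central Universities (SWJTU11ZT08)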
Keywords: person (character) attribute extraction; rule acquisition; free text
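
The pipeline summarized in the abstract (locate known attribute values by part of speech, count the contexts they occur in, keep frequent contexts as rules, then match those rules against new text) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the rule form, so the two-word left context used here is an assumption, and the tag names, `ATTRIBUTE_TAG`, `learn_rules`, `extract`, and all toy sentences are hypothetical. The sentences are hand-tagged so that no segmenter or tagger is needed.

```python
from collections import Counter

# Hypothetical mapping from an attribute to the POS tag expected for its value
# ("t" = time expression). Tag names are illustrative only.
ATTRIBUTE_TAG = {"birth_year": "t"}


def learn_rules(tagged_sentences, seeds, attribute, min_support=2):
    """Locate seed values by POS tag and keep their frequent left contexts as rules.

    A "rule" here is simply the pair of words preceding the value; this rule
    form is an assumption made for the sketch.
    """
    tag = ATTRIBUTE_TAG[attribute]
    contexts = Counter()
    for sentence in tagged_sentences:
        for i, (word, pos) in enumerate(sentence):
            if i >= 2 and pos == tag and word in seeds:
                contexts[(sentence[i - 2][0], sentence[i - 1][0])] += 1
    return {ctx for ctx, count in contexts.items() if count >= min_support}


def extract(sentence, rules, attribute):
    """Return tokens with the expected POS tag whose left context matches a rule."""
    tag = ATTRIBUTE_TAG[attribute]
    return [
        word
        for i, (word, pos) in enumerate(sentence)
        if i >= 2 and pos == tag
        and (sentence[i - 2][0], sentence[i - 1][0]) in rules
    ]


# Toy, hand-tagged training sentences (hypothetical data).
TRAIN = [
    [("Li", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("1960", "t")],
    [("Wang", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("1975", "t")],
    [("Zhao", "nr"), ("graduated", "v"), ("in", "p"), ("1982", "t")],
]
# Known birth years, e.g. taken from an already structured source (assumed).
SEEDS = {"1960", "1975"}

rules = learn_rules(TRAIN, SEEDS, "birth_year")

NEW_SENTENCE = [
    ("Chen", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("1988", "t"),
    ("and", "c"), ("graduated", "v"), ("in", "p"), ("2010", "t"),
]
print(extract(NEW_SENTENCE, rules, "birth_year"))  # ['1988']
```

In this toy run the only context seen at least twice is ("born", "in"), so the year after "graduated in" is not extracted. The `min_support` threshold stands in for the statistical rule discovery the abstract mentions; a real system would need a proper tagger and a richer rule representation.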