Weakly-Supervised Extraction of Ontology Concept Instances and Concept Attributes from the Web

Original title: 基于Web弱指导的本体概念实例及属性的同步提取 (cited by: 4)


Abstract: This paper proposes a weakly-supervised method for simultaneously extracting ontology concept instances and attributes from the Web. Starting from a small set of seed instances and attributes, we automatically acquire contextual patterns in which instances and attributes co-occur on the Web, and we score these patterns using the association between seed instances and seed attributes. After extracting candidate concept instances and attributes with these patterns, we evaluate the candidates in two ways. First, we use the association between concept instances and attributes, so that each side assesses the accuracy of the other. Second, we measure how similar a candidate instance (or attribute) is to the seed instances (or attributes) in its distribution over contextual patterns. In experiments on the disease domain, manual verification of the candidate instances shows a precision of 94% for the top 500 results and 93% for the top 1,000 results.
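The second evaluation measure in the abstract (similarity between a candidate and the seeds in their distribution over contextual patterns) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a similarity function, so cosine similarity is an assumption here, and the pattern strings, counts, and function names (`cosine`, `seed_profile`, `rank_candidates`) are hypothetical toy values.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse pattern-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def seed_profile(seed_counts: dict) -> Counter:
    """Sum the pattern counts of all seeds into one aggregate distribution."""
    profile = Counter()
    for counts in seed_counts.values():
        profile.update(counts)
    return profile


def rank_candidates(candidate_counts: dict, seed_counts: dict) -> list:
    """Rank candidates by similarity of their pattern distribution to the seeds."""
    profile = seed_profile(seed_counts)
    scored = [(name, cosine(counts, profile))
              for name, counts in candidate_counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: how often each term was seen in each context pattern.
seeds = {
    "diabetes": Counter({"symptoms of X": 5, "treatment of X": 3}),
    "hepatitis": Counter({"symptoms of X": 4, "treatment of X": 4}),
}
candidates = {
    "asthma": Counter({"symptoms of X": 3, "treatment of X": 2}),
    "Beijing": Counter({"weather in X": 6, "treatment of X": 1}),
}

ranking = rank_candidates(candidates, seeds)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting, a candidate that occurs in the same patterns as the seed diseases ranks above one that mostly occurs in unrelated patterns. The first measure (instance-attribute association) would need co-occurrence statistics from the Web and is not shown.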

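Authors: 康为, 穗志方

Affiliations: Institute of Computational Linguistics, Peking University; Key Laboratory of Computational Linguistics (Ministry of Education), Peking University

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2010, Issue 1, pp. 54-59 (6 pages)

Funding: National Natural Science Foundation of China (60873156); National Social Science Foundation of China (09BYY032)

Keywords: computer application; Chinese information processing; Web; concept instance extraction; attribute extraction; weakly-supervised; contextual pattern

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]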

Citation network

References (11)

[1] M. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora[C]//Proceedings of the 14th International Conference on Computational Linguistics. Nantes, France, 1992: 539-545.
[2] M. Poesio, A. Almuhareb. Identifying Concept Attributes Using a Classifier[C]//Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition. Ann Arbor, 2005: 18-27.
[3] O. Etzioni, M. Cafarella, D. Downey, et al. Unsupervised Named-Entity Extraction from the Web: An Experimental Study[J]. Artificial Intelligence, June 2005, 165: 91-134.
[4] M.J. Cafarella, D. Downey, S. Soderland, O. Etzioni. KnowItNow: Fast, Scalable Information Extraction from the Web[C]//Proceedings of HLT/EMNLP. Vancouver, October 2005: 563-570.
[5] N. Yoshinaga, K. Torisawa. Open-Domain Attribute Value Acquisition from Semi-Structured Texts[C]//Proceedings of the OntoLex 2007. Busan, South Korea, November 11th, 2007.
[6] S. Ravi, M. Pasca. Using Structured Text for Large-Scale Attribute Extraction[C]//Proceedings of the 17th International Conference on Information and Knowledge Management (CIKM-08). Napa Valley, California, USA, October 2008: 1183-1192.
[7] G. Cui, Q. Lu, W. Li, Y. Chen. Automatic Acquisition of Attributes for Ontology Construction[C]//ICCPOL 2009, Springer, 2009: 248-259.
[8] M. Pasca, B.V. Durme. Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs[C]//Proceedings of ACL-08: HLT. Columbus, Ohio, USA, June 2008: 19-27.
[9] F. Keller, M. Lapata, O. Ourioupina. Using the Web to Overcome Data Sparseness[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Philadelphia, July 2002: 230-237.
[10] P. Turney. Mining the Web for Synonyms: PMI-IR Versus LSA on TOEFL[C]//Proceedings of the 12th European Conference on Machine Learning (ECML-2001). Freiburg, Germany, September 2001: 491-502.


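Citing documents (4)

[1] 李文杰, 穗志方. Simultaneous extraction of concept instances and attributes based on coordinate structures[J]. Journal of Chinese Information Processing, 2012, 26(2): 82-87. Cited by: 4
[2] 郭剑毅, 李真, 余正涛, 张志坤. Extraction of domain ontology concept instances, attributes, and attribute values, and relation prediction[J]. Journal of Nanjing University (Natural Sciences), 2012, 48(4): 383-389. Cited by: 32
[3] 贾真, 杨宇飞, 何大可, 刘胜久, 尹红风. Attribute and attribute value extraction for Chinese online encyclopedias[J]. Journal of Peking University (Natural Science Edition), 2014, 50(1): 41-47. Cited by: 12
[4] 王汀, 冀付军, 徐天晟. A knowledge acquisition method for unstructured information in Chinese online encyclopedias[J]. Library and Information Service, 2016, 60(13): 126-133. Cited by: 6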

