Web Medicine Information Extraction Algorithm Based on Semantics

Times cited: 7


Abstract: Existing Web information extraction techniques suffer from low accuracy, low coverage, and heavy reliance on manual intervention. This paper proposes a new algorithm for extracting drug information from Web pages. The algorithm introduces semantic technology to build a three-dimensional semantic dictionary, which masks the heterogeneity of content and structure across different drug-information pages. Exploiting the fact that the attributes of a target drug tend to cluster together on a page, it then applies information-entropy theory to intelligently locate and extract the target information. Experiments show that the algorithm reduces manual intervention while achieving high accuracy and recall. In practice, it can obtain Internet drug information automatically, comprehensively, and accurately in real time, giving drug regulatory authorities a rich basis for supervision; it therefore has significant practical value for regulating the pharmaceutical e-commerce market and ensuring safe medication.
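The abstract names two ingredients without giving their details: a semantic dictionary that maps site-specific labels onto shared drug attributes, and an entropy criterion that exploits the way target attributes cluster on a page. The Python sketch below is one illustrative reading under stated assumptions, not the authors' algorithm. `SEMANTIC_DICT`, the sample page, and every function name are hypothetical; the dictionary is flat rather than three-dimensional; and the scoring rule (choose the DOM subtree whose matched attributes have the highest Shannon entropy, breaking ties toward the smallest subtree) is an assumption. It parses well-formed XHTML with the standard library's `xml.etree.ElementTree`; real pages would first need an HTML-to-XHTML cleanup step.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical semantic dictionary: each drug attribute maps to the different
# labels that different sites use for it. This is a flat stand-in for the
# paper's three-dimensional dictionary, whose structure is not given here.
SEMANTIC_DICT = {
    "name": ["通用名称", "药品名称", "generic name"],
    "maker": ["生产企业", "生产厂家", "manufacturer"],
    "approval": ["批准文号", "approval no"],
    "price": ["售价", "价格", "price"],
}

def label_to_attr(text):
    """Return the dictionary attribute whose label occurs in text, else None."""
    t = text.strip().lower()
    if not t:
        return None
    for attr, labels in SEMANTIC_DICT.items():
        if any(label in t for label in labels):
            return attr
    return None

def attr_counts(node):
    """Count dictionary attributes matched by text fragments in node's subtree."""
    counts = Counter()
    for text in node.itertext():
        attr = label_to_attr(text)
        if attr:
            counts[attr] += 1
    return counts

def entropy(counts):
    """Shannon entropy (in bits) of the attribute distribution in counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def walk(node, path):
    """Yield (XPath-like path, element) pairs in document order."""
    yield path, node
    seen = Counter()
    for child in node:
        seen[child.tag] += 1
        yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

def locate_block(root):
    """Assumed rule: highest attribute entropy wins; ties go to the smallest subtree."""
    best = None
    for path, node in walk(root, "/" + root.tag):
        size = sum(1 for _ in node.iter())
        key = (round(entropy(attr_counts(node)), 9), -size)
        if best is None or key > best[0]:
            best = (key, path, node)
    return best[1], best[2]

def extract(block):
    """Read label/value pairs: a labelled element followed by its value sibling."""
    record = {}
    for parent in block.iter():
        children = list(parent)
        for label, value in zip(children, children[1:]):
            attr = label_to_attr(label.text or "")
            if attr and attr not in record:
                record[attr] = (value.text or "").strip()
    return record

# Hypothetical product page: labels differ in language and wording, and the
# navigation bar contains a stray "price" match outside the attribute block.
PAGE = """<html><body>
<div id="nav"><a>Home</a><a>Price list</a></div>
<div id="main"><table>
<tr><td>通用名称</td><td>Amoxicillin Capsules</td></tr>
<tr><td>Manufacturer</td><td>Example Pharma Co.</td></tr>
<tr><td>批准文号</td><td>H00000000</td></tr>
<tr><td>售价</td><td>12.50</td></tr>
</table></div>
</body></html>"""

if __name__ == "__main__":
    path, block = locate_block(ET.fromstring(PAGE))
    print(path)  # /html/body[1]/div[2]/table[1]
    print(extract(block))
```

On the sample page, the stray "Price list" link skews the attribute distribution of `<body>` (entropy about 1.92 bits), while the table holds each of the four attributes exactly once (2 bits), so the compact `<table>` is chosen. Returning an XPath-like path fits the paper's XPath keyword; whether the authors reuse such paths as per-site extraction rules is not stated in this record.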

Authors: Shen Yuanyi (沈元一), Zheng Xiaoqing (郑骁庆), Gu Yiling (顾轶灵)

Affiliation: School of Software, Fudan University

Source: Computer Systems & Applications (《计算机系统应用》), 2011, no. 1, pp. 41-47 (7 pages)

Funding: National Key Technology R&D Program of China (2006BAH02A05-06); National Natural Science Foundation of China (60903078, 60973025)

Keywords: Web information extraction; semantic dictionary; DOM; information entropy; XPath; medical e-business

CLC classification: TP393.09 [Automation and Computer Technology: Computer Application Technology]


References (10)

1. Wu Xiaoyan. Research on Internet commodity information extraction technology based on structural semantic entropy. Dissertation, Fudan University, 2009.
2. Selkow S. The tree-to-tree editing problem. Information Processing Letters, 1977.
3. Muslea I, Minton S, Knoblock C. A hierarchical approach to wrapper induction. Proceedings of the Third International Conference on Autonomous Agents, 1999.
4. Crescenzi V, Mecca G, Merialdo P. RoadRunner: Towards automatic data extraction from large Web sites. Proceedings of the 26th International Conference on Very Large Data Bases, 2001.
5. Freitag D. Information extraction from HTML: Application of a general machine learning approach. Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
6. Soderland S. Learning information extraction rules for semi-structured and free text. Machine Learning, 1999.
7. Chang C-H, Lui S-C. IEPAD: Information extraction based on pattern discovery. Proceedings of the 10th International Conference on the World Wide Web, 2001.
8. Kushmerick N, Weld DS, Doorenbos RB. Wrapper induction for information extraction. Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), 1997.
9. Arocena G, Mendelzon A. WebOQL: Restructuring documents, databases and Webs. Proceedings of the 14th IEEE International Conference on Data Engineering (ICDE), 1998.
10. Sahuguet A, Azavant F. Building intelligent Web applications using lightweight wrappers. Data Mining and Knowledge Discovery, 2001.


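Cited by (7)

1. Zhang Zhao, Fang Yong, Chen Xinggang. Design and implementation of a rapid penetration testing system. Information Security and Communications Privacy, 2013, 11(5): 95-97.
2. Xu Jianhao. Research on a semantics-based search algorithm. Journal of Nanning College for Vocational Technology, 2013, 18(5): 93-96.
3. Xu Jianhao. Design and implementation of a search engine for discounted goods. Journal of Nanning College for Vocational Technology, 2014, 19(2): 90-93.
4. Zheng Tongtao, Zeng Xiaoyan. Building Chinese interlanguage corpora in the big-data era. Journal of Xiamen University (Arts & Social Sciences), 2016, 66(2): 53-63. (Times cited: 15)
5. Yu Chen, Mao Zhe, Gao Song. A rule-based method for information extraction from maritime free text. Journal of Transport Information and Safety, 2017, 35(2): 40-47. (Times cited: 15)
6. Qiu Qizhi, Zhou Sansan, Liu Changfa, Chen Hui. Emergency event information extraction based on text genre and vocabulary. Journal of Chinese Information Processing, 2018, 32(9): 56-65. (Times cited: 13)
7. Chen Yong. Intelligent fusion processing of Earth-observation user requirements. Radio Engineering, 2019, 49(7): 551-556. (Times cited: 4)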

