
Method of cleaning TCM data with Aho_Corasick algorithm (cited by: 1)
Abstract Over thousands of years of development, Traditional Chinese Medicine (TCM) has accumulated a large volume of data of many types. Many researchers use big data technology to preprocess prescription or drug data and then apply mining algorithms to explore patterns in disease diagnosis and treatment, providing a scientific basis for new drug development, disease diagnosis and treatment, and medical research. However, as the number of prescriptions grows, manual preprocessing becomes inefficient and error-prone. This paper therefore proposes a cleaning method based on the Aho_Corasick algorithm: drug names are used as pattern strings to identify the drug information in prescriptions or medicines, and the drug names found in prescriptions are standardized and unified, providing high-quality data for subsequent mining work. Experimental results show an accuracy above 95%, indicating that the data cleaning is effective.
Authors Guo Chunli (郭春丽); Ji Shufeng (纪树峰); Lin Yuan (林源); Huang Haisong (黄海松); Wang Liliang (王俐良) (Institute of Information Science and Technology, Guangdong Finance & Trade Vocational College, Guangzhou, Guangdong 510445, China; Library, Guangdong Finance & Trade Vocational College)
Source Computer Era (《计算机时代》), 2022, Issue 3, pp. 77-80 (4 pages)
Funding 2020 Guangdong Province Young Innovative Talents Project for Regular Universities (2020KQNCX185).
Keywords TCM; big data technology; Aho_Corasick algorithm; pre-processing; data cleaning
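
The abstract describes using drug names as pattern strings and standardizing the names found in prescriptions. The sketch below illustrates that general idea only and is not the authors' implementation: a small pure-Python Aho-Corasick automaton finds every drug alias in a prescription string, and a leftmost-longest rule (an assumption made here, not stated in the abstract) resolves overlapping hits before each alias is mapped to a standard name. The class `AhoCorasick`, the function `normalize_drugs`, and the alias table `ALIASES` are all hypothetical names and data introduced for illustration.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over a list of pattern strings."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns that end at each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep fail = 0 (the root).
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, child in self.goto[state].items():
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find_all(self, text):
        """Return (start_index, pattern) for every occurrence in text."""
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


def normalize_drugs(text, alias_to_standard):
    """Map each drug alias found in text to its standard name.

    Overlaps are resolved leftmost-longest (an assumption of this sketch).
    """
    automaton = AhoCorasick(list(alias_to_standard))
    hits = sorted(automaton.find_all(text), key=lambda h: (h[0], -len(h[1])))
    result, next_free = [], 0
    for start, alias in hits:
        if start >= next_free:
            result.append(alias_to_standard[alias])
            next_free = start + len(alias)
    return result


# Hypothetical alias table: alias -> standard drug name.
ALIASES = {"生地": "生地黄", "生地黄": "生地黄", "甘草": "甘草", "粉甘草": "甘草"}

print(normalize_drugs("生地黄15g 甘草6g 粉甘草3g", ALIASES))
# ['生地黄', '甘草', '甘草']
```

Building the automaton once and scanning each prescription in a single pass keeps the matching cost roughly linear in the text length plus the number of hits, which is the usual reason to prefer Aho-Corasick over checking each drug name separately when the dictionary is large.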