Abstract
Over thousands of years of development, Traditional Chinese Medicine (TCM) has accumulated a large volume of data of many types. Many researchers apply big data techniques: after preprocessing prescription or medicinal product data, they use mining algorithms to uncover patterns in disease diagnosis and treatment, providing a scientific basis for new drug development, disease diagnosis and treatment, and medical research. However, as the number of prescriptions grows, manual preprocessing becomes inefficient and error-prone. This paper therefore proposes a cleaning method based on the Aho-Corasick algorithm. Drug names serve as pattern strings and are matched against the drug information in prescriptions or medicinal products, so that the drug names used in prescriptions are standardized and unified, providing high-quality data for subsequent mining work. Experimental results show an accuracy above 95%, indicating a clear data-cleaning effect.
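The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the alias table `ALIASES`, the function names, and the leftmost-longest rule for resolving overlapping matches are all assumptions made for illustration. The sketch builds an Aho-Corasick automaton over known drug-name variants, scans a prescription string once, and rewrites each recognized variant to a canonical name.

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton: trie transitions, failure links, outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_matches(text, automaton):
    """Return (start_index, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches

def normalize(text, aliases):
    """Rewrite each recognized variant to its canonical name.

    Overlaps are resolved leftmost-first, then longest-first. This rule is an
    assumption of the sketch, not something stated in the abstract.
    """
    automaton = build_automaton(aliases)
    matches = sorted(find_matches(text, automaton),
                     key=lambda m: (m[0], -len(m[1])))
    pieces, pos = [], 0
    for start, pat in matches:
        if start < pos:
            continue  # overlaps a match that was already accepted
        pieces.append(text[pos:start])
        pieces.append(aliases[pat])
        pos = start + len(pat)
    pieces.append(text[pos:])
    return "".join(pieces)

# Hypothetical alias table (variant -> canonical name), for illustration only.
ALIASES = {
    "甘草": "甘草", "生甘草": "甘草", "国老": "甘草",
    "黄芪": "黄芪", "黄耆": "黄芪", "北芪": "黄芪",
}

print(normalize("黄耆三钱,生甘草一钱", ALIASES))  # 黄芪三钱,甘草一钱
```

Because the automaton is built once from the whole dictionary, the scan visits each character of the prescription once regardless of how many drug names are in the table, which is the property that makes this approach attractive for large prescription collections.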
Authors
郭春丽
纪树峰
林源
黄海松
王俐良
Guo Chunli; Ji Shufeng; Lin Yuan; Huang Haisong; Wang Liliang (Institute of Information Science and Technology, Guangdong Finance & Trade Vocational College, Guangzhou, Guangdong 510445, China; Library, Guangdong Finance & Trade Vocational College)
Source
《计算机时代》
2022, Issue 3, pp. 77-80 (4 pages)
Computer Era
Funding
2020 Guangdong Provincial Young Innovative Talents Project for Regular Higher Education Institutions (2020KQNCX185).