Construction Strategy for Unified View of TCM Entities in Multi-source Environment

Abstract

Objective: In a big data environment, cross-source queries encounter traditional Chinese medicine (TCM) entities that appear in multiple views, and these entities have incomplete, multimodal, and inconsistent attributes across data sources. To address these problems, this study proposes a strategy for constructing a unified view of TCM entities from multi-source data.

Methods: Based on the interrelationships among entity attributes, an overall architecture for multi-view entity fusion was constructed, and key elements such as entities and attributes were given abstract representations. With user requirements as constraints, a word-vector-based correlation calculation method was proposed, in which a Skip-gram model was trained to produce word vectors that represent entity attributes. A correlation algorithm based on Euclidean distance and the Jaccard coefficient was then proposed and used as the basis for entity fusion.

Results: A total of 6116 attribute word vectors were trained, of which 230 were effective. Using 400 pairs of TCM entities from different sources as the test set, entity fusion experiments were run with the AFCDS, FF, and WVCC methods, which achieved fusion accuracies of 92.20%, 88.47%, and 94.24%, respectively.

Conclusion: The word-vector-based entity fusion strategy is effective and feasible. It makes full use of the useful information among attributes, adapts well to different data, and achieves high fusion accuracy, offering a new research approach to the problem of multi-source entity fusion.
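The abstract names the two ingredients of the correlation algorithm (Euclidean distance and the Jaccard coefficient) but not how they are combined. The Python sketch below is only an illustration of one plausible combination, not the authors' method. Everything concrete in it is an assumption: each entity is reduced to a set of attribute terms; an entity's vector is the mean of its terms' word vectors (standing in for Skip-gram output); distance is turned into a similarity with 1 / (1 + d); and the weight `alpha = 0.5`, the fusion threshold `0.6`, the function names, and the toy three-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def euclidean_distance(u: Vector, v: Vector) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_vector(terms: Set[str], vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of terms that have one; terms without a vector are skipped."""
    known = [vectors[t] for t in sorted(terms) if t in vectors]
    if not known:
        raise ValueError("no term has a word vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def correlation(a: Set[str], b: Set[str], vectors: Dict[str, Vector],
                alpha: float = 0.5) -> float:
    """Hypothetical score in [0, 1]: alpha weights Jaccard, 1 - alpha weights vector closeness."""
    d = euclidean_distance(mean_vector(a, vectors), mean_vector(b, vectors))
    vector_sim = 1.0 / (1.0 + d)  # assumption: map distance [0, inf) to similarity (0, 1]
    return alpha * jaccard(a, b) + (1.0 - alpha) * vector_sim


def should_fuse(a: Set[str], b: Set[str], vectors: Dict[str, Vector],
                alpha: float = 0.5, threshold: float = 0.6) -> bool:
    """Fuse two entity records when their combined score reaches the (hypothetical) threshold."""
    return correlation(a, b, vectors, alpha) >= threshold


# Made-up toy vectors standing in for trained Skip-gram attribute vectors.
vectors: Dict[str, Vector] = {
    "bitter": [0.9, 0.1, 0.0],
    "cold": [0.8, 0.2, 0.1],
    "heart": [0.1, 0.9, 0.2],
    "liver": [0.2, 0.8, 0.3],
    "stomach": [0.1, 0.7, 0.4],
    "warm": [-0.8, 0.1, 0.2],
    "spleen": [0.0, 0.6, 0.6],
}

source_a = {"bitter", "cold", "heart", "liver"}    # record of one herb from source A
source_b = {"bitter", "cold", "heart", "stomach"}  # likely the same herb from source B
source_c = {"warm", "spleen"}                      # a clearly different herb

if __name__ == "__main__":
    print(should_fuse(source_a, source_b, vectors))  # True
    print(should_fuse(source_a, source_c, vectors))  # False
    print(round(correlation(source_a, source_b, vectors), 2))  # 0.78
```

Weighting the two terms lets exact attribute overlap (Jaccard) and semantic closeness of differently worded attributes (vector distance) both contribute; in practice `alpha` and the threshold would have to be tuned on labeled entity pairs rather than fixed as here.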

Authors: LIANG Yang; DING Changsong; CAI Xiong

Affiliations: School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, China; TCM Big Data Analysis Laboratory of Hunan, Changsha 410208, China; School of Computer Science and Engineering, Central South University, Changsha 410000, China; Institute of Innovation and Applied Research in Chinese Medicine, Hunan University of Chinese Medicine, Changsha 410208, China

Source: Chinese Journal of Information on Traditional Chinese Medicine (indexed in CAS and CSCD), 2020, No. 9, pp. 108-114 (7 pages)

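Funding: National Key R&D Program of China (2017YFC1703306); Scientific Research Project of the Hunan Provincial Department of Education (19C1391); Key R&D Program of Hunan Province (2017SK2111); Key Project of the Hunan Provincial Department of Education (18A227); Natural Science Foundation of Hunan Province (2018JJ2301); Key Project of the Hunan Provincial TCM Research Program (2020002); Open Fund of the Electronic Science and Technology Discipline, Hunan University of Chinese Medicine (2018DK04).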

Keywords: big data; multi-source data; entity fusion; word vector; correlation

CLC numbers: R28 [Medicine and Health: Chinese Materia Medica]; R2-05 [Medicine and Health: Traditional Chinese Medicine]
