Optimizing Data Integration Technology for Massive Small Medical Files Based on a Time Baseline (基于时间线优化医疗海量小文件数据集成技术)
Cited by: 1
Abstract As more hospitals digitize and regional medical applications expand, unstructured and semi-structured medical data are growing explosively, and traditional technical architectures are increasingly unable to cope with the volume. The Shenzhen regional health information data exchange platform covers 60 public hospitals and more than 600 community health institutions across the city. It connects nearly 50 heterogeneous systems, holds more than 17 million health records (the published English abstract gives 16 million) and over 3 billion diagnosis and treatment records, and generates more than 5 million small files per day on average. To address the low efficiency of exchanging and processing massive numbers of small files in Shenzhen's regional health informatization, this paper proposes a Hadoop-based solution that uses time-baseline archive files and sequence files to improve the storage and retrieval efficiency of small files; validation confirms that the technique meets the data exchange needs of real business applications. The paper describes the implementation in detail: setting the time baseline according to the scale of the business data, customizing data types and data structures to business requirements, merging small files and storing them in blocks, building a mapping from small files to large files, and the related data exchange processing flow. An evaluation on real data shows that, compared with conventional techniques, the approach markedly improves the efficiency of batch processing of small files.
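The abstract does not include the implementation, so the following is only a minimal Python sketch of the general idea it names: grouping small files by a time-baseline window, merging each group into one large file, and keeping a mapping from each small file to its position in the large file. It works in memory rather than on Hadoop archive or sequence files, and the function names (`bucket_start`, `pack_small_files`, `read_small_file`), the one-hour baseline, and the sample file names are all illustrative assumptions, not the authors' design.

```python
import io
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed origin for aligning windows


def bucket_start(ts, baseline):
    """Return the start of the time-baseline window that contains ts."""
    windows = (ts - EPOCH) // baseline  # whole windows elapsed since EPOCH
    return EPOCH + windows * baseline


def pack_small_files(files, baseline):
    """Merge small files into one large blob per time window.

    files: iterable of (name, timestamp, payload_bytes).
    Returns (archives, index):
      archives maps window start -> merged bytes,
      index maps file name -> (window start, offset, length).
    """
    buffers = defaultdict(io.BytesIO)
    index = {}
    for name, ts, payload in files:
        key = bucket_start(ts, baseline)
        buf = buffers[key]
        offset = buf.tell()  # current end of the merged file
        buf.write(payload)
        index[name] = (key, offset, len(payload))
    archives = {key: buf.getvalue() for key, buf in buffers.items()}
    return archives, index


def read_small_file(archives, index, name):
    """Look up one small file through the small-to-large mapping."""
    key, offset, length = index[name]
    return archives[key][offset:offset + length]


# Hypothetical sample data: three small XML records over two hours.
sample = [
    ("visit-001.xml", datetime(2014, 8, 1, 9, 15), b"<visit id='1'/>"),
    ("visit-002.xml", datetime(2014, 8, 1, 9, 40), b"<visit id='2'/>"),
    ("visit-003.xml", datetime(2014, 8, 1, 10, 5), b"<visit id='3'/>"),
]
archives, index = pack_small_files(sample, timedelta(hours=1))
```

With a one-hour baseline, the first two records share one merged file and the third goes into a second one. A wider baseline yields fewer, larger merged files at the cost of coarser time-based retrieval, which is presumably why the abstract ties the baseline to the scale of the business data.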
Source China Digital Medicine (《中国数字医学》), 2014, No. 8, pp. 89-92 (4 pages)
Funding Research on real-time interaction and efficient analytical processing of massive regional-health medical data (No. CXZZ20120828161054317)
Keywords medical data; time baseline; massive small files; data integration technology