A Compression-Based Approximate Query Method for Massive Incomplete Data (cited by: 7)
Abstract: As data volumes grow explosively, incomplete data have become widespread. Traditional data repair methods are too costly for massive data sets and cannot repair the data completely, so approximate querying over massive incomplete data under given requirements has attracted attention from the research community. This paper proposes a compression-based approximate query method for massive incomplete data. The method tags fields whose attribute values are missing, compresses the tagged data according to frequent query conditions, and builds the corresponding indexes. The index files are then compressed again according to attribute partitioning to save further storage space. At query time, an encoding dictionary is used to perform selection and projection on the compressed index files, yielding approximate query results over the incomplete data. Experiments show that the method quickly locates the compressed positions of incomplete data, improves query efficiency, saves storage space, and preserves the completeness of the query results.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 3, pp. 571-581 (11 pages)
Funding: National Natural Science Foundation of China (61472169, 61472072); National Key Technology R&D Program of China (2012BAF13B08); Preliminary Research Special Fund of the National Basic Research Program of China ("973" Program) (2014CB360509); Liaoning Provincial Public Welfare Research Fund for Science (2015003003); Liaoning Provincial Program for Key Industrial Projects and Commercialization of Achievements (2012216007)
Keywords: incomplete data; approximate query; data compression; index; encoding dictionary
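The abstract describes tagging missing fields, indexing the encoded data, and answering selections and projections through an encoding dictionary. The following is only a minimal sketch of that general idea, not the paper's algorithm: the abstract does not specify the compression scheme or index layout, so the reserved `MISSING` code, the per-attribute inverted index, and every function name here are assumptions made for illustration. A selection returns two sets of row ids: rows that certainly match, and rows that might match because the attribute value is missing.

```python
from collections import defaultdict

MISSING = 0  # assumption: code 0 is reserved to tag a missing attribute value


def build_dictionary(rows, attr):
    """Map each distinct non-missing value of `attr` to an integer code >= 1."""
    values = sorted({r[attr] for r in rows if r.get(attr) is not None})
    return {v: i + 1 for i, v in enumerate(values)}


def build_index(rows, attr, codes):
    """Inverted index: code -> list of row ids; MISSING collects the gaps."""
    index = defaultdict(list)
    for rid, row in enumerate(rows):
        value = row.get(attr)
        index[MISSING if value is None else codes[value]].append(rid)
    return dict(index)


def select(index, codes, value):
    """Return (certain, possible) row ids for the predicate attr == value."""
    certain = index.get(codes.get(value, -1), [])  # -1 is never a valid code
    possible = index.get(MISSING, [])  # missing values may or may not match
    return certain, possible


def project(rows, row_ids, attrs):
    """Keep only the requested attributes; missing values stay as None."""
    return [tuple(rows[rid].get(a) for a in attrs) for rid in row_ids]


# Hypothetical sample data; None marks a missing attribute value.
rows = [
    {"city": "Shenyang", "age": 30},
    {"city": None, "age": 25},
    {"city": "Dalian", "age": None},
    {"city": "Shenyang", "age": None},
]
codes = build_dictionary(rows, "city")
index = build_index(rows, "city", codes)
certain, possible = select(index, codes, "Shenyang")
result = project(rows, certain, ["city", "age"])
```

For simplicity this sketch projects from the original rows rather than from a compressed file, and it omits the frequency-driven compression and the second compression pass that the abstract mentions.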