
A Method for Acquiring Statistics on Dirty Data
Abstract: As data volumes grow, dirty data is increasingly common in databases and seriously degrades data quality, which poses new challenges for data management. Many data models for managing dirty data have been proposed. One of them is the entity-based relational data model, in which each tuple represents one real-world entity and dirty data is allowed to exist. Because of this, a query must decide whether a record meets the user's requirements by the degree of similarity between the query and the data. For such similarity-based queries, knowing approximately how many records in the database are similar to a given record can contribute to query optimization. This paper studies how to obtain this statistic and proposes an effective clustering algorithm to solve the problem.
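Authors: 张岩 (Zhang Yan), 唐兴 (Tang Xing)
Source: 《智能计算机与应用》 (Intelligent Computer and Applications), 2014, No. 5, pp. 26-28, 31 (4 pages)
Funding: National Natural Science Foundation of China (61133002); National Key Basic Research Program of China (973 Program) (2012CB316202)
Keywords: dirty data; clustering; statistics; query optimization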
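
The statistic described in the abstract (roughly how many records are similar to a given record) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes records are plain strings, uses Jaccard similarity over whitespace tokens, and substitutes a simple single-pass "leader" clustering. The function names, the sample records, and the 0.5 threshold are all hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the whitespace-token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def leader_cluster(records, threshold):
    """Assign each record to the first cluster whose leader is similar enough.

    Assumption: a single-pass leader clustering stands in for the paper's
    (unspecified) clustering algorithm.
    """
    leaders = []      # index of the representative record of each cluster
    assignment = []   # cluster id for each record, in input order
    for i, rec in enumerate(records):
        for cid, lead in enumerate(leaders):
            if jaccard(rec, records[lead]) >= threshold:
                assignment.append(cid)
                break
        else:
            # No existing leader is similar enough: start a new cluster.
            leaders.append(i)
            assignment.append(len(leaders) - 1)
    return assignment


def estimated_similar_counts(records, threshold=0.5):
    """Estimate, per record, how many other records are similar to it.

    The estimate is the size of the record's cluster minus one.
    """
    assignment = leader_cluster(records, threshold)
    sizes = {}
    for cid in assignment:
        sizes[cid] = sizes.get(cid, 0) + 1
    return [sizes[cid] - 1 for cid in assignment]


# Hypothetical sample data: two real-world entities with dirty variants.
records = [
    "john smith new york",
    "jon smith new york",
    "john smith new york city",
    "alice wong boston",
    "alice wong boston ma",
]
counts = estimated_similar_counts(records, threshold=0.5)
print(counts)
```

A query optimizer could read such per-record counts as rough selectivity estimates for similarity predicates. The cluster-size estimate is only approximate, because a record is compared with cluster leaders rather than with every other record.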
