3 articles found
1. Steering data quality with visual analytics: The complexity challenge (cited by: 5)
Authors: Shixia Liu, Gennady Andrienko, Yingcai Wu, Nan Cao, Liu Jiang, Conglei Shi, Yu-Shuen Wang, Seokhee Hong. Visual Informatics (indexed in EI), 2018, No. 4, pp. 191-197 (7 pages)
Data quality management, especially data cleansing, has been extensively studied for many years in the areas of data management and visual analytics. In this paper, we first review and explore the relevant work from the research areas of data management, visual analytics and human-computer interaction. Then, for different types of data such as multimedia data, textual data, trajectory data, and graph data, we summarize the common methods for improving data quality by leveraging data cleansing techniques at different analysis stages. Based on a thorough analysis, we propose a general visual analytics framework for interactively cleansing data. Finally, the challenges and opportunities are analyzed and discussed in the context of data and humans.
Keywords: data quality management; visual analytics; data cleansing
2. Efficient Currency Determination Algorithms for Dynamic Data (cited by: 2)
Authors: Xiaoou Ding, Hongzhi Wang, Yitong Gao, Jianzhong Li, Hong Gao. Tsinghua Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2017, No. 3, pp. 227-242 (16 pages)
Data quality is an important aspect in data application and management, and currency is one of the major dimensions influencing its quality. In real applications, dataset timestamps are often incomplete or unavailable, or even absent. With the increasing requirements to update real-time data, existing methods can fail to adequately determine the currency of entities. In consideration of the velocity of big data, we propose a series of efficient algorithms for determining the currency of dynamic datasets, which we divide into two steps. In the preprocessing step, to better determine data currency and accelerate dataset updating, we propose the use of a topological graph of the processing order of the entity attributes. Then, we construct an Entity Query B-Tree (EQB-Tree) structure and an Entity Storage Dynamic Linked List (ES-DLL) to improve the querying and updating processes of both the data currency graph and currency scores. In the currency determination step, we propose definitions of the currency score and currency information for tuples referring to the same entity and use examples to discuss methods and algorithms for their computation. Based on our experimental results with both real and synthetic data, we verify that our methods can efficiently update data in the correct order of currency.
Keywords: data quality management; data currency; dynamic determining
3. Content-Related Repairing of Inconsistencies in Distributed Data
Authors: Yue-Feng Du, De-Rong Shen, Tie-Zheng Nie, Yue Kou, Ge Yu. Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2016, No. 4, pp. 741-758 (18 pages)
Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies, but they may miss some potential inconsistencies because they do not consider the content relationship of data. Content-related conditional functional dependencies (CCFDs) are a special type of CFD that combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing are interactive: 1) detection catches inconsistencies, and 2) repairing corrects the caught inconsistencies but may introduce new ones. Besides, data are often fragmented and distributed across multiple sites, so cleaning inconsistencies incurs expensive data shipment. In this paper, our aim is to repair inconsistencies in distributed content-related data. We propose a framework consisting of an inconsistency detection method and an inconsistency repairing method, which work iteratively. The detection method marks the violated CCFDs to compute the inconsistencies that should be repaired preferentially. Based on the repairing-cost model presented in this paper, we prove that minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method heuristically repairs the inconsistencies with minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rule sequences. Distinct values require less data shipment than real data for communication. Rule sequences determine appropriate repairing sequences to avoid some incorrect repairs. Empirical evaluation on two real-life datasets shows that our solution is more effective than CFDs.
Keywords: data quality management; distributed consistency; content relativity; consistency repairing
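
For readers unfamiliar with the CFDs mentioned in the abstract above, the following is a minimal Python sketch of how a single plain CFD can flag inconsistent tuples on one site. It illustrates the standard CFD semantics only, not the CCFD framework or the distributed repairing method of the paper. The function name `cfd_violations`, the `_` wildcard convention, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern matches any value


def matches(value, pattern):
    """A value matches a pattern cell if the cell is a wildcard or an equal constant."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indices of rows violating the CFD (lhs -> rhs, pattern).

    rows    : list of dicts, one per tuple
    lhs     : list of left-hand-side attribute names
    rhs     : single right-hand-side attribute name
    pattern : dict giving a constant or "_" for every attribute in lhs and rhs
    """
    violating = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # The CFD only constrains rows that match the pattern on the LHS.
        if not all(matches(row[a], pattern[a]) for a in lhs):
            continue
        # Single-tuple violation: the RHS pattern is a constant the row does not carry.
        if not matches(row[rhs], pattern[rhs]):
            violating.add(i)
        groups[tuple(row[a] for a in lhs)].append(i)
    # Pairwise violation: rows that agree on the LHS but disagree on the RHS.
    for members in groups.values():
        if len({rows[i][rhs] for i in members}) > 1:
            violating.update(members)
    return sorted(violating)


# Hypothetical data: for country code 44, the zip code should determine the street.
rows = [
    {"cc": "44", "zip": "EH4 1DT", "street": "Mayfield"},
    {"cc": "44", "zip": "EH4 1DT", "street": "Crichton"},
    {"cc": "01", "zip": "07974", "street": "Mtn Ave"},
    {"cc": "01", "zip": "07974", "street": "Main St"},
]
pattern = {"cc": "44", "zip": "_", "street": "_"}
print(cfd_violations(rows, ["cc", "zip"], "street", pattern))  # [0, 1]
```

Rows 2 and 3 also disagree on the street, but the pattern restricts the dependency to `cc = 44`, so they are not reported. This conditioning on attribute values is what distinguishes a CFD from an ordinary functional dependency.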