In this era of big data, data are often collected from multiple sources with different reliabilities, and the information obtained about the same object inevitably conflicts. One important task is to identify the most trustworthy value out of all the conflicting claims, and this is known as truth discovery. Existing truth discovery methods simultaneously identify the most trustworthy information and the source reliability degrees, based on the idea that more reliable sources tend to provide more trustworthy information, and vice versa. However, there are often semantic constraints defined on a relational database that can be violated even by a single data source. Repairing the data so that these constraints are satisfied is known as data cleaning. The two problems above may coexist, and considering them together can provide benefits, but to the authors' knowledge this has not yet been the focus of any research. In this paper, therefore, a schema-decomposing based method is proposed to simultaneously discover the truth and clean the data, with the aim of improving accuracy. Experimental results on real-world data sets of notebooks and mobile phones, as well as on simulated data sets, demonstrate the effectiveness and efficiency of the proposed method.
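The mutual reinforcement between source reliability and claim trust described above can be illustrated with a minimal, generic truth-discovery iteration. This is a sketch of the standard weighted-voting scheme, not the schema-decomposing method proposed in the paper; the claim format and the source names used below are illustrative assumptions.

```python
# Generic truth-discovery sketch: source weights and per-object
# truths are re-estimated alternately until they stabilize.
from collections import Counter

def truth_discovery(claims, iterations=10):
    """claims: dict mapping source -> {object: claimed_value}."""
    weights = {s: 1.0 for s in claims}  # start with equal trust in every source
    truths = {}
    for _ in range(iterations):
        # Step 1: for each object, pick the value backed by the largest
        # total weight of supporting sources (weighted voting).
        votes = {}
        for source, objs in claims.items():
            for obj, value in objs.items():
                votes.setdefault(obj, Counter())[value] += weights[source]
        truths = {obj: c.most_common(1)[0][0] for obj, c in votes.items()}
        # Step 2: re-estimate each source's weight as the fraction of
        # its claims that agree with the current truths.
        for source, objs in claims.items():
            agree = sum(1 for o, v in objs.items() if truths[o] == v)
            weights[source] = agree / len(objs)
    return truths, weights
```

With three sources where two agree on one attribute value and a third dissents, the dissenting source ends up with a lower weight and the majority value is kept as the truth.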
Purpose – Data integration combines data residing at different sources and provides users with a unified interface to these data. An important issue in data integration is the existence of conflicts among the different data sources: sources may conflict with each other at the data level, which is defined as data inconsistency. The purpose of this paper is to address this problem and propose a solution for data inconsistency in data integration.
Design/methodology/approach – A relational data model extended with data-source quality criteria is first defined. Based on the proposed data model, a data inconsistency solution strategy is then provided. To carry out the strategy, a fuzzy multi-attribute decision-making (MADM) approach based on the data-source quality criteria is applied to obtain the results. Finally, user-feedback strategies are proposed to refine the result of the fuzzy MADM approach into the final resolution of the inconsistent data.
Findings – To evaluate the proposed method, data obtained from sensors are extracted, and experiments are designed and performed to demonstrate the effectiveness of the proposed strategy. The results substantiate that the solution outperforms the other methods on correctness, time cost, and stability.
Practical implications – Since inconsistent data collected from sensors are pervasive, the proposed method can mitigate this problem and correct wrong choices to some extent.
Originality/value – In this paper, the authors study for the first time the effect of user feedback on integration results for inconsistent data.
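The core of resolving an inconsistency with quality criteria is ranking the conflicting sources over several attributes and keeping the value reported by the best one. The sketch below uses a plain weighted sum as a stand-in for the paper's fuzzy MADM procedure; the criterion names and weights are illustrative assumptions, not taken from the paper.

```python
# Simplified multi-attribute ranking of conflicting data sources.
def rank_sources(scores, weights):
    """scores:  source -> {criterion: value in [0, 1]}
    weights: criterion -> importance, summing to 1."""
    ranked = {
        src: sum(weights[c] * v for c, v in crits.items())
        for src, crits in scores.items()
    }
    # Keep the value reported by the top-ranked source; user feedback
    # could later adjust the criterion scores and change this ranking.
    best = max(ranked, key=ranked.get)
    return best, ranked
```

A fuzzy MADM method would replace the crisp scores with fuzzy numbers and aggregate them before defuzzifying, but the selection step at the end is the same: the highest-ranked source wins the conflict.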
New challenges, including how to share information across heterogeneous devices, appear in data-intensive pervasive computing environments. Data integration is a practical approach for these applications, and dealing with inconsistencies is one of its important problems. In this paper we motivate the problem of resolving data inconsistency for data integration in pervasive environments. We define data quality criteria and expense quality criteria for data sources to resolve data inconsistency. In our solution, data sources from which obtaining data is too expensive are first discarded using the expense quality criteria and a utility function. Since it is difficult to obtain the actual quality of data sources in a pervasive computing environment, we introduce a fuzzy multi-attribute group decision-making approach to select the appropriate data sources. The experimental results show that our solution is effective.
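The two-stage selection described above, first pruning by access expense and then choosing among the survivors by quality, can be sketched as follows. The budget threshold, field names, and the use of a simple maximum in place of the paper's fuzzy group decision-making step are all illustrative assumptions.

```python
# Two-stage source selection: filter by expense, then pick by quality.
def select_source(sources, max_expense):
    """sources: list of dicts with 'name', 'expense', and 'quality' keys."""
    # Stage 1: discard sources whose access expense exceeds the budget.
    affordable = [s for s in sources if s["expense"] <= max_expense]
    if not affordable:
        return None  # every source is too expensive to query
    # Stage 2: among the affordable sources, keep the highest-quality one.
    return max(affordable, key=lambda s: s["quality"])["name"]
```

In a pervasive setting the expense criterion might reflect battery cost or network latency of querying a device, so a high-quality but costly source can still be rejected at the first stage.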
Funding: partially supported by the Key Research and Development Plan of the National Ministry of Science and Technology (No. 2016YFB1000703), the Key Program of the National Natural Science Foundation of China (Nos. 61190115, 61472099, 61632010, and U1509216), the National Sci-Tech Support Plan (No. 2015BAH10F01), the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (No. LC2016026), and the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.
Funding: supported by the National Natural Science Foundation of China under Grant No. 60970010, the National Basic Research 973 Program of China under Grant No. 2009CB320705, and the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20090073110026.