With the rocketing progress of the Internet, it is easier for people to get information about the objects that they are interested in. However, this information usually has conflicts. In order to resolve conflicts and...With the rocketing progress of the Internet, it is easier for people to get information about the objects that they are interested in. However, this information usually has conflicts. In order to resolve conflicts and get the true information, truth discovery has been proposed and received widespread attention. Many algorithms have been proposed to adapt to different scenarios. This paper aims to investigate these algorithms and summarize them from the perspective of algorithm models and specific concepts. Some classic datasets and evaluation metrics are given in this paper. Some future directions for readers are also provided to better understand the field of truth discovery.展开更多
With the extensive application of software collaborative development technology,the processing of code data generated in programming scenes has become a research hotspot.In the collaborative programming process,differ...With the extensive application of software collaborative development technology,the processing of code data generated in programming scenes has become a research hotspot.In the collaborative programming process,different users can submit code in a distributed way.The consistency of code grammar can be achieved by syntax constraints.However,when different users work on the same code in semantic development programming practices,the development factors of different users will inevitably lead to the problem of data semantic conflict.In this paper,the characteristics of code segment data in a programming scene are considered.The code sequence can be obtained by disassembling the code segment using lexical analysis technology.Combined with a traditional solution of a data conflict problem,the code sequence can be taken as the declared value object in the data conflict resolution problem.Through the similarity analysis of code sequence objects,the concept of the deviation degree between the declared value object and the truth value object is proposed.A multi-truth discovery algorithm,called the multiple truth discovery algorithm based on deviation(MTDD),is proposed.The basic methods,such as Conflict Resolution on Heterogeneous Data,Voting-K,and MTRuths_Greedy,are compared to verify the performance and precision of the proposed MTDD algorithm.展开更多
The Internet now is a large-scale platform with big data. Finding truth from a huge dataset has attracted extensive attention, which can maintain the quality of data collected by users and provide users with accurate ...The Internet now is a large-scale platform with big data. Finding truth from a huge dataset has attracted extensive attention, which can maintain the quality of data collected by users and provide users with accurate and efficient data. However, current truth finder algorithms are unsatisfying, because of their low accuracy and complication. This paper proposes a truth finder algorithm based on entity attributes (TFAEA). Based on the iterative computation of source reliability and fact accuracy, TFAEA considers the interactive degree among facts and the degree of dependence among sources, to simplify the typical truth finder algorithms. In order to improve the accuracy of them, TFAEA combines the one-way text similarity and the factual conflict to calculate the mutual support degree among facts. Furthermore, TFAEA utilizes the symmetric saturation of data sources to calculate the degree of dependence among sources. The experimental results show that TFAEA is not only more stable, but also more accurate than the typical truth finder algorithms.展开更多
基金Fundamental Research Funds for the Central Universities,China (No. 22D111207)。
文摘With the rocketing progress of the Internet, it is easier for people to get information about the objects that they are interested in. However, this information usually has conflicts. In order to resolve conflicts and get the true information, truth discovery has been proposed and received widespread attention. Many algorithms have been proposed to adapt to different scenarios. This paper aims to investigate these algorithms and summarize them from the perspective of algorithm models and specific concepts. Some classic datasets and evaluation metrics are given in this paper. Some future directions for readers are also provided to better understand the field of truth discovery.
基金supported by the National Key R&D Program of China(No.2018YFB1003905)the National Natural Science Foundation of China under Grant(No.61971032)Fundamental Research Funds for the Central Universities(No.FRF-TP-18-008A3).
文摘With the extensive application of software collaborative development technology,the processing of code data generated in programming scenes has become a research hotspot.In the collaborative programming process,different users can submit code in a distributed way.The consistency of code grammar can be achieved by syntax constraints.However,when different users work on the same code in semantic development programming practices,the development factors of different users will inevitably lead to the problem of data semantic conflict.In this paper,the characteristics of code segment data in a programming scene are considered.The code sequence can be obtained by disassembling the code segment using lexical analysis technology.Combined with a traditional solution of a data conflict problem,the code sequence can be taken as the declared value object in the data conflict resolution problem.Through the similarity analysis of code sequence objects,the concept of the deviation degree between the declared value object and the truth value object is proposed.A multi-truth discovery algorithm,called the multiple truth discovery algorithm based on deviation(MTDD),is proposed.The basic methods,such as Conflict Resolution on Heterogeneous Data,Voting-K,and MTRuths_Greedy,are compared to verify the performance and precision of the proposed MTDD algorithm.
基金supported by the National Natural Science Foundation of China(61472192)the Scientific and Technological Support Project(Society)of Jiangsu Province(BE2016776)
文摘The Internet now is a large-scale platform with big data. Finding truth from a huge dataset has attracted extensive attention, which can maintain the quality of data collected by users and provide users with accurate and efficient data. However, current truth finder algorithms are unsatisfying, because of their low accuracy and complication. This paper proposes a truth finder algorithm based on entity attributes (TFAEA). Based on the iterative computation of source reliability and fact accuracy, TFAEA considers the interactive degree among facts and the degree of dependence among sources, to simplify the typical truth finder algorithms. In order to improve the accuracy of them, TFAEA combines the one-way text similarity and the factual conflict to calculate the mutual support degree among facts. Furthermore, TFAEA utilizes the symmetric saturation of data sources to calculate the degree of dependence among sources. The experimental results show that TFAEA is not only more stable, but also more accurate than the typical truth finder algorithms.