Funding: Fundamental Research Funds for the Central Universities, China (No. 22D111207).
Abstract: With the rapid growth of the Internet, it has become easier for people to obtain information about the objects they are interested in. However, this information often conflicts. To resolve these conflicts and obtain the true information, truth discovery has been proposed and has received widespread attention, and many algorithms have been developed for different scenarios. This paper surveys these algorithms and summarizes them from the perspective of algorithm models and specific concepts. Classic datasets and evaluation metrics are also presented, and future research directions are outlined to help readers better understand the field of truth discovery.
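As a concrete picture of what "resolving conflicts to get the true information" means, here is a minimal sketch of iterative truth discovery for categorical claims. It alternates two steps common to many truth discovery methods: source-weighted voting to estimate each object's value, then re-estimating each source's weight from how often it disagrees with those estimates. The log-ratio weight loosely follows the weighting used by Conflict Resolution on Heterogeneous Data (CRH); the +1 smoothing, the function name `truth_discovery`, and the sample sources and objects are assumptions for illustration, not taken from the survey.

```python
import math
from collections import defaultdict

def truth_discovery(claims, iterations=10):
    """Iterative truth discovery for categorical claims (illustrative sketch).

    claims: list of (source, object, value) triples.
    Returns (truths, weights): estimated value per object, weight per source.
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # start by trusting every source equally
    truths = {}
    for _ in range(iterations):
        # Step 1: estimate each object's truth by source-weighted voting.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += weights[s]
        truths = {o: max(vs, key=vs.get) for o, vs in votes.items()}

        # Step 2: re-estimate weights from disagreement with the current truths.
        # Each loss starts at 1 (smoothing) so no source has zero loss.
        loss = {s: 1 for s in sources}
        for s, o, v in claims:
            if v != truths[o]:
                loss[s] += 1
        total = sum(loss.values())
        # Log-ratio weight: a smaller share of the total loss means a larger weight.
        weights = {s: -math.log(loss[s] / total) for s in sources}
    return truths, weights

# Hypothetical data: s1 and s2 are usually right, s3 and s4 usually wrong.
claims = [(s, f"o{i}", v) for i in range(5)
          for s, v in [("s1", "a"), ("s2", "a"), ("s3", "b"), ("s4", "c")]]
# On object "q", the two unreliable sources outvote the reliable one.
claims += [("s1", "q", "x"), ("s3", "q", "y"), ("s4", "q", "y")]

truths, weights = truth_discovery(claims)
print(truths["q"])  # -> x (plain majority voting would pick y)
```

The point of the example is the last object: simple voting trusts the two agreeing sources, while weighting by estimated reliability lets the historically accurate source win.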
Funding: Supported by the National Key R&D Program of China (No. 2018YFB1003905), the National Natural Science Foundation of China (Grant No. 61971032), and the Fundamental Research Funds for the Central Universities (No. FRF-TP-18-008A3).
Abstract: With the extensive application of collaborative software development technology, the processing of code data generated in programming scenarios has become a research hotspot. In collaborative programming, different users can submit code in a distributed way. Syntax constraints can keep the submitted code grammatically consistent. However, when different users work on the same code, differences in their development practices inevitably lead to semantic conflicts in the data. This paper considers the characteristics of code segment data in programming scenarios. A code segment is decomposed into a code sequence using lexical analysis. Following the traditional formulation of the data conflict resolution problem, each code sequence is treated as a declared value object. Based on a similarity analysis of code sequence objects, the concept of the degree of deviation between a declared value object and the truth value object is proposed, and a multi-truth discovery algorithm, the multiple truth discovery algorithm based on deviation (MTDD), is developed. MTDD is compared with baseline methods, including Conflict Resolution on Heterogeneous Data, Voting-K, and MTRuths_Greedy, to verify its performance and precision.
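The pipeline described above (lexical analysis into code sequences, similarity between sequences, deviation from a candidate truth, multiple accepted truths) can be sketched in a few lines. This is not the MTDD algorithm, whose formulas the abstract does not give. Python's standard `tokenize` module stands in for the lexical analyzer, and the similarity measure (`difflib.SequenceMatcher` ratio), the deviation definition (1 minus similarity), the acceptance rule, and the `min_support` threshold are all hypothetical choices made for illustration.

```python
import difflib
import io
import tokenize

# Token types treated as carrying no meaning for comparison (assumption).
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

def code_sequence(src: str) -> tuple[str, ...]:
    """Lexical analysis: turn a Python code segment into a token sequence."""
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    return tuple(tok.string for tok in tokens if tok.type not in _SKIP)

def deviation(a, b) -> float:
    """Hypothetical deviation degree: 1 - similarity (0.0 means identical)."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def multi_truth(claims: dict[str, str], min_support: float = 0.7) -> list[tuple[str, ...]]:
    """Hypothetical multi-truth rule: accept every distinct claimed sequence whose
    average similarity to all submitted claims is at least min_support."""
    seqs = [code_sequence(src) for src in claims.values()]
    candidates = list(dict.fromkeys(seqs))  # distinct sequences, submission order
    truths = []
    for cand in candidates:
        support = sum(1.0 - deviation(s, cand) for s in seqs) / len(seqs)
        if support >= min_support:
            truths.append(cand)
    return truths

# Hypothetical submissions from four users editing the same line of code.
claims = {
    "u1": "x = a + b\n",
    "u2": "x = a + b  # sum\n",   # same tokens as u1 once the comment is dropped
    "u3": "x = a - b\n",
    "u4": "y = compute(a, b)\n",
}
print(multi_truth(claims))
# -> [('x', '=', 'a', '+', 'b'), ('x', '=', 'a', '-', 'b')]
```

Two close variants are both accepted as truths, while the structurally different submission from `u4` has too large an average deviation and is rejected; a real multi-truth method would also weight each user by estimated reliability.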