String similarity join is an essential operation in many applications that need to find all similar string pairs from two given collections. A quantitative way to determine whether two strings are similar is to compute their similarity using a chosen similarity function. String pairs whose similarity exceeds a given threshold are returned as results. Current approaches to the similarity join problem use a single threshold value. There are, however, several scenarios that require multiple thresholds, for instance, when the dataset includes strings of various lengths. In this scenario, longer string pairs typically tolerate many more typos than shorter ones. Therefore, we propose a solution for string similarity joins that supports different similarity thresholds in a single operator. To support different thresholds, we devise two novel indexing techniques: partition-based indexing and similarity-aware indexing. To utilize the new indices and improve join performance, we propose new filtering methods and index probing techniques. To the best of our knowledge, this is the first work that addresses this problem. Experimental results on real-world datasets show that our solution performs efficiently while providing a more flexible threshold specification.
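To make the idea of length-dependent thresholds concrete, here is a minimal sketch of a similarity join in which the allowed edit distance grows with string length. It is a naive nested-loop join with a simple length filter, not the partition-based or similarity-aware indexing described in the abstract. The choice of edit distance as the similarity measure, the `length_threshold` policy (one edit per five characters), and all function names are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def length_threshold(length: int) -> int:
    """Hypothetical policy: allow one edit per five characters, at least one."""
    return max(1, length // 5)


def similarity_join(
    left: List[str],
    right: List[str],
    threshold: Callable[[int], int] = length_threshold,
) -> List[Tuple[str, str, int]]:
    """Naive join: return (r, s, distance) for every pair within its threshold."""
    results = []
    for r in left:
        for s in right:
            # The threshold depends on the longer string of the pair (assumption).
            tau = threshold(max(len(r), len(s)))
            # Length filter: a length gap larger than tau rules the pair out.
            if abs(len(r) - len(s)) > tau:
                continue
            d = edit_distance(r, s)
            if d <= tau:
                results.append((r, s, d))
    return results


left = ["cat", "similarity join"]
right = ["cut", "dog", "similarty joins"]

# Length-dependent thresholds: the long pair may differ by up to 3 edits.
adaptive = similarity_join(left, right)

# A single threshold of 1 for all lengths misses the long pair (distance 2).
uniform = similarity_join(left, right, threshold=lambda n: 1)
```

With these inputs, the adaptive join keeps both the short pair and the long pair, while the uniform threshold of 1 keeps only the short pair. This is the kind of flexibility the abstract motivates for datasets with strings of varying length.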
We conducted and analyzed several internet surveys in order to understand the profile of global research integrity and ethical awareness, taking global population distribution into account. These covered (1) the global distribution of Committee on Publication Ethics (COPE) membership; (2) the global distribution of "Integrity" or "Ethics" journals; (3) the level of academic integrity awareness in European higher education institutions; and (4) awareness of academic integrity in the top universities of Asia and Africa. The results of this survey series highlight a seriously imbalanced awareness of research integrity and publishing ethics across the world, especially in developing areas with the highest population density. We therefore propose a new index, the "Academic Integrity Awareness Index", for future discussions across the linked spheres of publishing and research.
Funding: This work was supported by the China Scholarship Council and the National Natural Science Foundation of China (Grant Nos. 61402329 and 51378350).