Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage...Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach not only is useful for distributive aggregate functions such as SUM but also can be applied to the maintenance of holistic aggregate functions like MEDIAN which will require the storage of a set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both SUM and MEDIAN aggregate functions. For the aggregate function SUM, concepts are borrowed from the principle of Galois Lattice to develop CPU-efficient algorithms to update a quotient cube. For the aggregate function MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube, Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. Performance study shows that the proposed algorithms are efficient and scalable over large databases.展开更多
Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube...Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this fundamental issue through a partitioning method that groups cube cells into equivalent partitions. The effectiveness and efficiency of the quotient cube for cube compression and computation have been proved. However, as changes are made to the data sources, to maintain such a quotient cube is non-trivial since the equivalent classes in it must be split or merged. In this paper, incremental algorithms are designed to update existing quotient cube efficiently based on Galois lattice. Performance study shows that these algorithms are efficient and scalable for large databases.展开更多
Identifying accounts across different online social networks that belong to the same user has attracted extensive attentions.However,existing techniques rely on given user seeds and ignore the dynamic changes of onlin...Identifying accounts across different online social networks that belong to the same user has attracted extensive attentions.However,existing techniques rely on given user seeds and ignore the dynamic changes of online social networks,which fails to generate high quality identification results.In order to solve this problem,we propose an incremental user identification method based on user-guider similarity index(called CURIOUS),which efficiently identifies users and well captures the changes of user features over time.Specifically,we first construct a novel user-guider similarity index(called USI)to speed up the matching between users.Second we propose a two-phase user identification strategy consisting of USI-based bidirectional user matching and seed-based user matching,which is effective even for incomplete networks.Finally,we propose incremental maintenance for both USI and the identification results,which dynamically captures the instant states of social networks.We conduct experimental studies based on three real-world social networks.The experiments demonstrate the effectiveness and the efficiency of our proposed method in comparison with traditional methods.Compared with the traditional methods,our method improves precision,recall and rank score by an average of 0.19,0.16 and 0.09 respectively,and reduces the time cost by an average of 81%.展开更多
基金This paper is supported by the National Natural Science Foundation of China (Grant Nos. 60473069, 60496325, 60273071).
文摘Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach not only is useful for distributive aggregate functions such as SUM but also can be applied to the maintenance of holistic aggregate functions like MEDIAN which will require the storage of a set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both SUM and MEDIAN aggregate functions. For the aggregate function SUM, concepts are borrowed from the principle of Galois Lattice to develop CPU-efficient algorithms to update a quotient cube. For the aggregate function MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube, Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. Performance study shows that the proposed algorithms are efficient and scalable over large databases.
文摘Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this fundamental issue through a partitioning method that groups cube cells into equivalent partitions. The effectiveness and efficiency of the quotient cube for cube compression and computation have been proved. However, as changes are made to the data sources, to maintain such a quotient cube is non-trivial since the equivalent classes in it must be split or merged. In this paper, incremental algorithms are designed to update existing quotient cube efficiently based on Galois lattice. Performance study shows that these algorithms are efficient and scalable for large databases.
基金This work was supported by the National Natural Science Foundation of China under Grant Nos.62072084,62172082 and 62072086the Science Research Foundation of Liaoning Province of China under Grant No.LJKZ0094+2 种基金the Natural Science Foundation of Liaoning Province of China under Grant No.2022-MS-171the Science and Technology Plan Major Project of Liaoning Province of China under Grant No.2022JH1/10400009the Fundamental Research Funds for the Central Universities of China under Grant No.N2116008。
文摘Identifying accounts across different online social networks that belong to the same user has attracted extensive attentions.However,existing techniques rely on given user seeds and ignore the dynamic changes of online social networks,which fails to generate high quality identification results.In order to solve this problem,we propose an incremental user identification method based on user-guider similarity index(called CURIOUS),which efficiently identifies users and well captures the changes of user features over time.Specifically,we first construct a novel user-guider similarity index(called USI)to speed up the matching between users.Second we propose a two-phase user identification strategy consisting of USI-based bidirectional user matching and seed-based user matching,which is effective even for incomplete networks.Finally,we propose incremental maintenance for both USI and the identification results,which dynamically captures the instant states of social networks.We conduct experimental studies based on three real-world social networks.The experiments demonstrate the effectiveness and the efficiency of our proposed method in comparison with traditional methods.Compared with the traditional methods,our method improves precision,recall and rank score by an average of 0.19,0.16 and 0.09 respectively,and reduces the time cost by an average of 81%.