Least square support vector regression(LSSVR)is a method for function approximation,whose solutions are typically non-sparse,which limits its application especially in some occasions of fast prediction.In this paper,a...Least square support vector regression(LSSVR)is a method for function approximation,whose solutions are typically non-sparse,which limits its application especially in some occasions of fast prediction.In this paper,a sparse algorithm for adaptive pruning LSSVR algorithm based on global representative point ranking(GRPR-AP-LSSVR)is proposed.At first,the global representative point ranking(GRPR)algorithm is given,and relevant data analysis experiment is implemented which depicts the importance ranking of data points.Furthermore,the pruning strategy of removing two samples in the decremental learning procedure is designed to accelerate the training speed and ensure the sparsity.The removed data points are utilized to test the temporary learning model which ensures the regression accuracy.Finally,the proposed algorithm is verified on artificial datasets and UCI regression datasets,and experimental results indicate that,compared with several benchmark algorithms,the GRPR-AP-LSSVR algorithm has excellent sparsity and prediction speed without impairing the generalization performance.展开更多
Author name disambiguation(AND)is a central task in academic search,which has received more attention recently accompanied by the increase of authors and academic publications.To tackle the AND problem,existing studie...Author name disambiguation(AND)is a central task in academic search,which has received more attention recently accompanied by the increase of authors and academic publications.To tackle the AND problem,existing studies have proposed various approaches based on different types of information,such as raw document features(e.g.,co-authors,titles,and keywords),the fusion feature(e.g.,a hybrid publication embedding based on multiple raw document features),the local structural information(e.g.,a publication's neighborhood information on a graph),and the global structural information(e.g.,interactive information between a node and others on a graph).However,there has been no work taking all the above-mentioned information into account and taking full advantage of the contributions of each raw document feature for the AND problem so far.To fill the gap,we propose a novel framework named EAND(Towards Effective Author Name Disambiguation by Hybrid Attention).Specifically,we design a novel feature extraction model,which consists of three hybrid attention mechanism layers,to extract key information from the global structural information and the local structural information that are generated from six similarity graphs constructed based on different similarity coefficients,raw document features,and the fusion feature.Each hybrid attention mechanism layer contains three key modules:a local structural perception,a global structural perception,and a feature extractor.Additionally,the mean absolute error function in the joint loss function is used to introduce the structural information loss of the vector space.Experimental results on two real-world datasets demonstrate that EAND achieves superior performance,outperforming state-of-the-art methods by at least+2.74%in terms of the micro-F1 score and+3.31%in terms of the macro-F1 score.展开更多
Rtecently a lot of works have been investigating to find the tenuous groups,i.e.,groups with few social interactions and weak relationships among members,for reviewer selection and psycho-educational group formation.H...Rtecently a lot of works have been investigating to find the tenuous groups,i.e.,groups with few social interactions and weak relationships among members,for reviewer selection and psycho-educational group formation.However,the metrics(e.g.,k-triangle,k-line,and k-tenuity)used to measure the tenuity,require a suitable k value to be specified which is difficult for users without background knowledge.Thus,in this paper we formulate the most tenuous group(MTG)query in terms of the group distance and average group distance of a group measuring the tenuity to eliminate the influence of parameter k on the tenuity of the group.To address the MTG problem,we first propose an exact algorithm,namely MTGVDIS,which takes priority to selecting those vertices whose vertex distance is large,to generate the result group,and also utilizes effective filtering and pruning strategies.Since MTGVDIS is not fast enough,we design an efficient exact algorithm,called MTG-VDGE,which exploits the degree metric to sort the vertexes and proposes a new combination order,namely degree and reverse based branch and bound(DRBB).MTG-VDGE gives priority to those vertices with small degree.For a large p,we further develop an approximation algorithm,namely MTG-VDLT,which discards candidate attendees with high degree to reduce the number of vertices to be considered.The experimental results on real datasets manifest that the proposed algorithms outperform existing approaches on both efficiency and group tenuity.展开更多
In smart phones,vehicles and wearable devices,GPS sensors are ubiquitous and collect a lot of valuable spatial data from the real world.Given a set of weighted points and a rectangle r in the space,a maximizing range ...In smart phones,vehicles and wearable devices,GPS sensors are ubiquitous and collect a lot of valuable spatial data from the real world.Given a set of weighted points and a rectangle r in the space,a maximizing range sum(MaxRS)query is to find the position of r,so as to maximize the total weight of the points covered by r(i.e.,the range sum).It has a wide spectrum of applications in spatial crowdsourcing,facility location and traffic monitoring.Most of the existing research focuses on the Euclidean space;however,in real life,the user’s moving route is constrained by the road network,and the existing MaxRS query algorithms in the road network are inefficient.In this paper,we propose a novel GPU-accelerated algorithm,namely,GAM,to tackle MaxRS queries in road networks in two phases efficiently.In phase 1,we partition the entire road network into many small cells by a grid and theoretically prove the correctness of parallel query results by grid shifting,and then we propose an effective multi-grained pruning technique,by which the majority of cells can be pruned without further checking.In phase 2,we design a GPU-friendly storage structure,cell-based road network(CRN),and a two-level parallel framework to compute the final result in the remaining cells.Finally,we conduct extensive experiments on two real-world road networks,and the experimental results demonstrate that GAM is on average one order faster than state-of-the-art competitors,and the maximum speedup can achieve about 55 times.展开更多
基金supported by the Science and Technology on Space Intelligent Control Laboratory for National Defense(KGJZDSYS-2018-08)。
文摘Least square support vector regression(LSSVR)is a method for function approximation,whose solutions are typically non-sparse,which limits its application especially in some occasions of fast prediction.In this paper,a sparse algorithm for adaptive pruning LSSVR algorithm based on global representative point ranking(GRPR-AP-LSSVR)is proposed.At first,the global representative point ranking(GRPR)algorithm is given,and relevant data analysis experiment is implemented which depicts the importance ranking of data points.Furthermore,the pruning strategy of removing two samples in the decremental learning procedure is designed to accelerate the training speed and ensure the sparsity.The removed data points are utilized to test the temporary learning model which ensures the regression accuracy.Finally,the proposed algorithm is verified on artificial datasets and UCI regression datasets,and experimental results indicate that,compared with several benchmark algorithms,the GRPR-AP-LSSVR algorithm has excellent sparsity and prediction speed without impairing the generalization performance.
基金supported by the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant Nos.19KJA610002 and 19KJB520050the National Natural Science Foundation of China under Grant No.61902270.
文摘Author name disambiguation(AND)is a central task in academic search,which has received more attention recently accompanied by the increase of authors and academic publications.To tackle the AND problem,existing studies have proposed various approaches based on different types of information,such as raw document features(e.g.,co-authors,titles,and keywords),the fusion feature(e.g.,a hybrid publication embedding based on multiple raw document features),the local structural information(e.g.,a publication's neighborhood information on a graph),and the global structural information(e.g.,interactive information between a node and others on a graph).However,there has been no work taking all the above-mentioned information into account and taking full advantage of the contributions of each raw document feature for the AND problem so far.To fill the gap,we propose a novel framework named EAND(Towards Effective Author Name Disambiguation by Hybrid Attention).Specifically,we design a novel feature extraction model,which consists of three hybrid attention mechanism layers,to extract key information from the global structural information and the local structural information that are generated from six similarity graphs constructed based on different similarity coefficients,raw document features,and the fusion feature.Each hybrid attention mechanism layer contains three key modules:a local structural perception,a global structural perception,and a feature extractor.Additionally,the mean absolute error function in the joint loss function is used to introduce the structural information loss of the vector space.Experimental results on two real-world datasets demonstrate that EAND achieves superior performance,outperforming state-of-the-art methods by at least+2.74%in terms of the micro-F1 score and+3.31%in terms of the macro-F1 score.
基金supported by the Key-Area Research and Development Program of Guangdong Province(2020B0101100001)Guangdong Basic and Applied Basic Research Foundation(2019B1515130001)+2 种基金the National Natural Science Foundation of China(Grant Nos.61902438 and 61902439)Natural Science Foundation of Guangdong Province(2019A1515011704 and 2019A1515011159)Jianliang Xu's work is supported by HK-RGC(12201018).
文摘Rtecently a lot of works have been investigating to find the tenuous groups,i.e.,groups with few social interactions and weak relationships among members,for reviewer selection and psycho-educational group formation.However,the metrics(e.g.,k-triangle,k-line,and k-tenuity)used to measure the tenuity,require a suitable k value to be specified which is difficult for users without background knowledge.Thus,in this paper we formulate the most tenuous group(MTG)query in terms of the group distance and average group distance of a group measuring the tenuity to eliminate the influence of parameter k on the tenuity of the group.To address the MTG problem,we first propose an exact algorithm,namely MTGVDIS,which takes priority to selecting those vertices whose vertex distance is large,to generate the result group,and also utilizes effective filtering and pruning strategies.Since MTGVDIS is not fast enough,we design an efficient exact algorithm,called MTG-VDGE,which exploits the degree metric to sort the vertexes and proposes a new combination order,namely degree and reverse based branch and bound(DRBB).MTG-VDGE gives priority to those vertices with small degree.For a large p,we further develop an approximation algorithm,namely MTG-VDLT,which discards candidate attendees with high degree to reduce the number of vertices to be considered.The experimental results on real datasets manifest that the proposed algorithms outperform existing approaches on both efficiency and group tenuity.
基金This work was supported in part by the Key Research and Development Plan of National Ministry of Science and Technology under Grant No.2019YFB2101902the National Natural Science Foundation of China under Grant Nos.U19A2059 and 62102119the CCF-Baidu Open Fund CCF-BAIDUunder Grant No.OF2021011。
文摘In smart phones,vehicles and wearable devices,GPS sensors are ubiquitous and collect a lot of valuable spatial data from the real world.Given a set of weighted points and a rectangle r in the space,a maximizing range sum(MaxRS)query is to find the position of r,so as to maximize the total weight of the points covered by r(i.e.,the range sum).It has a wide spectrum of applications in spatial crowdsourcing,facility location and traffic monitoring.Most of the existing research focuses on the Euclidean space;however,in real life,the user’s moving route is constrained by the road network,and the existing MaxRS query algorithms in the road network are inefficient.In this paper,we propose a novel GPU-accelerated algorithm,namely,GAM,to tackle MaxRS queries in road networks in two phases efficiently.In phase 1,we partition the entire road network into many small cells by a grid and theoretically prove the correctness of parallel query results by grid shifting,and then we propose an effective multi-grained pruning technique,by which the majority of cells can be pruned without further checking.In phase 2,we design a GPU-friendly storage structure,cell-based road network(CRN),and a two-level parallel framework to compute the final result in the remaining cells.Finally,we conduct extensive experiments on two real-world road networks,and the experimental results demonstrate that GAM is on average one order faster than state-of-the-art competitors,and the maximum speedup can achieve about 55 times.