Abstract: A novel multivariate similarity clustering analysis (MSCA) approach was used to estimate a biogeographical division scheme for the global terrestrial fauna and was compared against other widely used clustering algorithms. The faunal dataset included almost all terrestrial and freshwater fauna: a total of 4631 families, 141,814 genera, and 1,334,834 species. Our findings demonstrated that suitable results were obtained only with the MSCA method, which produced distinct hierarchies and reasonable structuring and conformed to biogeographical criteria. A total of seven kingdoms and 20 sub-kingdoms were identified. The clustering results for the higher and lower animals did not differ significantly, leading us to consider the result convincing as the first zoogeographical division scheme for all terrestrial animals worldwide.
Funding: Supported in part by the National High-tech R&D Program of China under Grants No. 2012AA012600, 2011AA010702, 2012AA01A401, and 2012AA01A402; the National Natural Science Foundation of China under Grant No. 60933005; the National Science and Technology Ministry of China under Grant No. 2012BAH38B04; and the National 242 Information Security of China under Grant No. 2011A010.
Abstract: Similarity search is one of the fundamental components of time series data mining tasks such as clustering, classification, and association rule mining. Many methods have been proposed to measure the similarity between time series, including Euclidean distance, Manhattan distance, and dynamic time warping (DTW). Of these, DTW has been suggested to provide a more robust similarity measure and to find the optimal alignment between time series. However, due to its quadratic time and space complexity, DTW is not suitable for large time series datasets. Many improved algorithms have been proposed for DTW search in large databases, such as approximate search and exact indexed search. Unlike these earlier modifications, this paper presents a novel parallel scheme for fast similarity search based on DTW, called MRDTW (MapReduce-based DTW). The experimental results show that the approach not only retains the accuracy of the original DTW but also greatly improves the efficiency of similarity measurement on large time series.
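To make the quadratic cost mentioned in the abstract concrete, the following is a minimal sketch of textbook DTW in Python. It is not the MRDTW scheme, whose details the abstract does not give. The function names `dtw_distance` and `nearest_series`, the absolute-difference point cost, and the toy data are all assumptions made for illustration. The search helper only hints at why the problem parallelizes: each candidate is scored independently (a map step) and the best score is then selected (a reduce step).

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (O(len(a) * len(b)) time and space)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed point-wise cost
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def nearest_series(query, candidates):
    """Score every candidate independently, then keep the closest one."""
    scored = map(lambda c: (dtw_distance(query, c), c), candidates)  # "map" step
    return min(scored, key=lambda pair: pair[0])                      # "reduce" step


# Hypothetical toy data: the second candidate is the query with one value repeated.
query = [1, 2, 3]
candidates = [[3, 2, 1], [1, 2, 2, 3], [0, 0, 0]]
best_distance, best_match = nearest_series(query, candidates)
```

Because the two nested loops fill an (n+1) by (m+1) table, both time and memory grow with the product of the sequence lengths, which is the limitation the abstract refers to.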
Funding: Supported by the Special Fund of the National Natural Science Foundation of China (31272135) and the Modern Agricultural Industry Technology System of Guangdong Province (LNSG2013-04).
Abstract: To better distinguish true hybrids in litchi crossbreeding, the F1 generations of two litchi cross combinations, ‘Xuehuaizi’ × ‘Guiwei’ and ‘Xuehuaizi’ × ‘Jiaohesanyuehong’, were used as materials to identify true hybrids, to construct mapping populations, and to study the genetic diversity of the two populations with EST-SSR markers. The results showed that the F1 generations of both hybrid populations could be identified with an identification rate of 100% by a combination of four primer pairs for each population, and that the 159 individual plants of the two populations were true hybrids. In addition, variation existed in the leaf morphology of the progenies of both populations, and the absence of parental bands was observed in the genotypes. Clustering analysis showed that the 113 F1 plants from ‘Xuehuaizi’ × ‘Guiwei’ were clustered into six categories (similarity coefficient 0.68), and 63.72% of them (72 plants) clustered into one group with the male parent. The genetic distances between 32 plants (28.3%) and their parents were large, which indicated that greater variation or recombination had appeared in these plants. The 46 hybrid progenies of ‘Xuehuaizi’ × ‘Jiaohesanyuehong’ were divided into two categories at a similarity coefficient of 0.642, and most individual plants (60.87%) showed a closer genetic relationship with the female parent and a partial maternal genetic tendency. It is concluded that EST-SSR markers are suitable for identifying true hybrids of litchi. The construction of the two F1 mapping populations has established a basis for further genetic linkage mapping and has accumulated materials for litchi cultivar improvement.
Funding: Partly supported by the National Natural Science Foundation of China (Nos. 61532012, 61370196, and 61672109).
Abstract: In this paper, we target similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because their high complexity makes the computation of similarity extremely complex and inefficient. We propose a feature space representation model based on key points, which extracts the key features from the subsequences of the original data supply chain and simplifies it into a feature vector form. We then formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criterion; thus, the cluster containing the query object is the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.
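The idea of answering a query by returning its cluster can be sketched as follows. This is not the paper's improved hierarchical algorithm or its key-point feature model, neither of which the abstract specifies. It is a stand-in that assumes plain Euclidean distance on already-extracted feature vectors and a single-linkage cut at a fixed threshold (equivalent to grouping points connected by a chain of pairs no farther apart than the threshold). The names `threshold_clusters`, `similar_to`, and the toy vectors are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def threshold_clusters(vectors, threshold):
    """Disjoint groups of indices: single-linkage clustering cut at `threshold`."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if dist(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def similar_to(query_index, vectors, threshold):
    """Similarity search result = the other members of the query's cluster."""
    for group in threshold_clusters(vectors, threshold):
        if query_index in group:
            return [i for i in group if i != query_index]
    return []


# Hypothetical feature vectors for five subsequences.
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.0, 2.0)]
neighbours_of_first = similar_to(0, features, threshold=1.0)
```

Note that index 4 is returned for query 0 even though the two are 2.0 apart, because single linkage chains them through index 1. A different linkage rule or clustering criterion would give different groups.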
Funding: Supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), NSF IIS-0905215, CNS-09-31975, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Abstract: Information networks that can be extracted from many domains have been widely studied recently. Different functions for mining these networks have been proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies address homogeneous networks, where nodes and links are assumed to be of a single type. In reality, however, heterogeneous information networks can better model real-world systems, which are typically semi-structured and typed, following a network schema. In order to mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concept of meta-paths, defined as paths over the graph of the network schema, is proposed to systematically capture numerous semantic relationships across multiple types of objects. Meta-paths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with the user's guidance or feedback, we can select the best meta-path or a weighted combination of meta-paths for a specific mining task.
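A meta-path can be illustrated on a toy bibliographic network. The sketch below counts path instances along the meta-path Author → Paper → Author by chaining the "writes" relation with its inverse. It shows only path counting, not any particular similarity measure from the paper. The node names, the `writes` relation, and the helper names `compose` and `transpose` are hypothetical.

```python
def transpose(relation):
    """Invert a typed relation given as {source: {target: count}}."""
    inverted = {}
    for source, targets in relation.items():
        for target, count in targets.items():
            inverted.setdefault(target, {})[source] = count
    return inverted


def compose(left, right):
    """Chain two relations and count the path instances between end nodes."""
    result = {}
    for x, middles in left.items():
        row = result.setdefault(x, {})
        for y, c1 in middles.items():
            for z, c2 in right.get(y, {}).items():
                row[z] = row.get(z, 0) + c1 * c2
    return result


# Hypothetical schema: Author --writes--> Paper.
writes = {
    "alice": {"p1": 1, "p2": 1},
    "bob": {"p2": 1, "p3": 1},
    "carol": {"p3": 1},
}

# Meta-path Author-Paper-Author: two authors are connected once per shared paper.
author_paper_author = compose(writes, transpose(writes))
```

Here `author_paper_author["alice"]["bob"]` counts the papers that alice and bob share. Choosing a different meta-path (for example, through venues instead of papers) would encode a different notion of relatedness, which is the kind of guidance the abstract describes.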