A new common phrase scoring method is proposed according to term frequency-inverse document frequency (TFIDF) and independence of the phrase. Combining the two properties can help identify more reasonable common phr...A new common phrase scoring method is proposed according to term frequency-inverse document frequency (TFIDF) and independence of the phrase. Combining the two properties can help identify more reasonable common phrases, which improve the accuracy of clustering. Also, the equation to measure the in-dependence of a phrase is proposed in this paper. The new algorithm which improves suffix tree clustering algorithm (STC) is named as improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system is implemented and used to cluster several groups of web search results obtained from Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.展开更多
It is significant to combine multiple tasks into an optimal work package in decision-making of aircraft maintenance to reduce cost,so a cost rate model of combinatorial maintenance is an urgent need.However,the optima...It is significant to combine multiple tasks into an optimal work package in decision-making of aircraft maintenance to reduce cost,so a cost rate model of combinatorial maintenance is an urgent need.However,the optimal combination under various constraints not only involves numerical calculations but also is an NP-hard combinatorial problem.To solve the problem,an adaptive genetic algorithm based on cluster search,which is divided into two phases,is put forward.In the first phase,according to the density,all individuals can be homogeneously scattered over the whole solution space through crossover and mutation and better individuals are collected as candidate cluster centres.In the second phase,the search is confined to the neighbourhood of some selected possible solutions to accurately solve with cluster radius decreasing slowly,meanwhile all clusters continuously move to better regions until all the peaks in the question space is searched.This algorithm can efficiently solve the combination problem.Taking the optimization on decision-making of aircraft maintenance by the algorithm for an example,maintenance which combines multiple parts or tasks can significantly enhance economic benefit when the halt cost is rather high.展开更多
A method of environment mapping using laser-based light detection and ranging (LIDAR) is proposed in this paper. This method not only has a good detection performance in a wide range of detection angles, but also fa...A method of environment mapping using laser-based light detection and ranging (LIDAR) is proposed in this paper. This method not only has a good detection performance in a wide range of detection angles, but also facilitates the detection of dynamic and hollowed-out obstacles. Essentially using this method, an improved clustering algorithm based on fast search and discovery of density peaks (CBFD) is presented to extract various obstacles in the environment map. By comparing with other cluster algorithms, CBFD can obtain a favorable number of clusterings automatically. Furthermore, the experiments show that CBFD is better and more robust in functionality and performance than the K-means and iterative self-organizing data analysis techniques algorithm (ISODATA).展开更多
The tiny searching step length and the satellite distribution density are the major factors to influence the efficiency of the satellite finder,so a scientific and reasonable method to calculate the tiny searching ste...The tiny searching step length and the satellite distribution density are the major factors to influence the efficiency of the satellite finder,so a scientific and reasonable method to calculate the tiny searching step length is proposed to optimize the satellite searching strategy. The pattern clustering and BP neural network are applied to optimize the tiny searching step length. The calculated tiny searching step length is approximately equal to the theoretic value for each satellite. In application,the satellite searching results will be dynamically added to the training samples to re-train the network to improve the generalizability and the precision. Experiments validate that the optimization of the tiny searching step length can avoid the error of locating target satellite and improve the searching efficiency.展开更多
The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in a real environment. However, the large computation complexity limits the practical application of SRP-PHAT. For a ...The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in a real environment. However, the large computation complexity limits the practical application of SRP-PHAT. For a microphone array, each location corresponds to a set of time differences of arrival (TDOAs), and this paper collects them into a TDOA vector. Since the TDOA vectors in the adjacent regions are similar, we present a fast algorithm based on clustering search to reduce the computation complexity of SRP-PHAT. In the training stage, the K-means or Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm is used to find the centroid in each cluster with similar TDOA vectors. In the procedure of sound localization, the optimal cluster is found by comparing the steered response powers (SRPs) of all centroids. The SRPs of all candidate locations in the optimal cluster are compared to localize the sound source. Experiments both in simulation environments and real environments have been performed to compare the localization accuracy and computational load of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method is able to reduce the computational load drastically and maintains almost the same localization accuracy and robustness as those of the conventional SRP-PHAT algorithm. The difference in localization performance brought by different clustering algorithms used in the training stage is trivial.展开更多
Information networks that can be extracted from many domains are widely studied recently. Different functions for mining these networks are proposed and developed, such as ranking, community detection, and link predic...Information networks that can be extracted from many domains are widely studied recently. Different functions for mining these networks are proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies are on homogeneous networks, where nodes and links are assumed from one single type. In reality, however, heterogeneous information networks can better model the real-world systems, which are typically semi-structured and typed, following a network schema. In order to mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concepts of meta-paths are proposed to systematically capture numerous semantic relationships across multiple types of objects, which are defined as a path over the graph of network schema. Meta-paths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with user's guidance or feedback, we can select the best meta-path or their weighted combination for a specific mining task.展开更多
In this paper, we target a similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented da...In this paper, we target a similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because the high complexity of the data supply chain makes the computation of similarity extremely complex and inefficient. In this paper, we propose a feature space representation model based on key points,which can extract the key features from the subsequences of the original data supply chain and simplify it into a feature vector form. Then, we formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for a similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criteria; thus, the cluster containing the query object is the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.展开更多
基金Foundation item: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086)Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow Uni-versity (KJS0714)+1 种基金Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082)National Natural Science Foundation of Jiangsu (BK2006094).
文摘A new common phrase scoring method is proposed according to term frequency-inverse document frequency (TFIDF) and independence of the phrase. Combining the two properties can help identify more reasonable common phrases, which improve the accuracy of clustering. Also, the equation to measure the in-dependence of a phrase is proposed in this paper. The new algorithm which improves suffix tree clustering algorithm (STC) is named as improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system is implemented and used to cluster several groups of web search results obtained from Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
基金supported by the National Natural Science Foundation of China(6107901361079014+4 种基金61403198)the National Natural Science Funds and Civil Aviaiton Mutual Funds(U1533128U1233114)the Programs of Natural Science Foundation of China and China Civil Aviation Joint Fund(60939003)the Natural Science Foundation of Jiangsu Province in China(BK2011737)
文摘It is significant to combine multiple tasks into an optimal work package in decision-making of aircraft maintenance to reduce cost,so a cost rate model of combinatorial maintenance is an urgent need.However,the optimal combination under various constraints not only involves numerical calculations but also is an NP-hard combinatorial problem.To solve the problem,an adaptive genetic algorithm based on cluster search,which is divided into two phases,is put forward.In the first phase,according to the density,all individuals can be homogeneously scattered over the whole solution space through crossover and mutation and better individuals are collected as candidate cluster centres.In the second phase,the search is confined to the neighbourhood of some selected possible solutions to accurately solve with cluster radius decreasing slowly,meanwhile all clusters continuously move to better regions until all the peaks in the question space is searched.This algorithm can efficiently solve the combination problem.Taking the optimization on decision-making of aircraft maintenance by the algorithm for an example,maintenance which combines multiple parts or tasks can significantly enhance economic benefit when the halt cost is rather high.
基金Supported by the National Natural Science Foundation of China(61103157)
文摘A method of environment mapping using laser-based light detection and ranging (LIDAR) is proposed in this paper. This method not only has a good detection performance in a wide range of detection angles, but also facilitates the detection of dynamic and hollowed-out obstacles. Essentially using this method, an improved clustering algorithm based on fast search and discovery of density peaks (CBFD) is presented to extract various obstacles in the environment map. By comparing with other cluster algorithms, CBFD can obtain a favorable number of clusterings automatically. Furthermore, the experiments show that CBFD is better and more robust in functionality and performance than the K-means and iterative self-organizing data analysis techniques algorithm (ISODATA).
基金Supported by Academic Innovation Project of Beijing(201106149)
文摘The tiny searching step length and the satellite distribution density are the major factors to influence the efficiency of the satellite finder,so a scientific and reasonable method to calculate the tiny searching step length is proposed to optimize the satellite searching strategy. The pattern clustering and BP neural network are applied to optimize the tiny searching step length. The calculated tiny searching step length is approximately equal to the theoretic value for each satellite. In application,the satellite searching results will be dynamically added to the training samples to re-train the network to improve the generalizability and the precision. Experiments validate that the optimization of the tiny searching step length can avoid the error of locating target satellite and improve the searching efficiency.
基金supported by the National Natural Science Foundation of China(Grant Nos. 60971098 and 61201345)the Beijing Key Laboratory of Advanced Information Science and Network Technology(Grant No.XDXX1308)
文摘The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in a real environment. However, the large computation complexity limits the practical application of SRP-PHAT. For a microphone array, each location corresponds to a set of time differences of arrival (TDOAs), and this paper collects them into a TDOA vector. Since the TDOA vectors in the adjacent regions are similar, we present a fast algorithm based on clustering search to reduce the computation complexity of SRP-PHAT. In the training stage, the K-means or Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm is used to find the centroid in each cluster with similar TDOA vectors. In the procedure of sound localization, the optimal cluster is found by comparing the steered response powers (SRPs) of all centroids. The SRPs of all candidate locations in the optimal cluster are compared to localize the sound source. Experiments both in simulation environments and real environments have been performed to compare the localization accuracy and computational load of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method is able to reduce the computational load drastically and maintains almost the same localization accuracy and robustness as those of the conventional SRP-PHAT algorithm. The difference in localization performance brought by different clustering algorithms used in the training stage is trivial.
基金supported in part by the U.S.Army Research Laboratory under Cooperative Agreement No.W911NF-09-2-0053(NS-CTA),NSF ⅡS-0905215,CNS-09-31975MIAS,a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC
文摘Information networks that can be extracted from many domains are widely studied recently. Different functions for mining these networks are proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies are on homogeneous networks, where nodes and links are assumed from one single type. In reality, however, heterogeneous information networks can better model the real-world systems, which are typically semi-structured and typed, following a network schema. In order to mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concepts of meta-paths are proposed to systematically capture numerous semantic relationships across multiple types of objects, which are defined as a path over the graph of network schema. Meta-paths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with user's guidance or feedback, we can select the best meta-path or their weighted combination for a specific mining task.
基金partly supported by the National Natural Science Foundation of China(Nos.61532012,61370196,and 61672109)
文摘In this paper, we target a similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because the high complexity of the data supply chain makes the computation of similarity extremely complex and inefficient. In this paper, we propose a feature space representation model based on key points,which can extract the key features from the subsequences of the original data supply chain and simplify it into a feature vector form. Then, we formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for a similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criteria; thus, the cluster containing the query object is the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.