Abstract: Search engines are indispensable tools for finding information among massive collections of web pages and documents. A good search engine needs to retrieve information not only quickly but also in a way that is relevant to the user's query. Most search engines respond to queries quickly; however, they offer little guarantee of precision, even for highly detailed queries. In such cases, clustering documents by subject and content might improve search results. This paper presents a novel document clustering method that uses semantic cliques. First, features were extracted from the documents. Next, associations between frequently co-occurring terms were defined; these associations are called semantic cliques. Each connected component in the semantic clique represented a theme. The documents were then clustered by theme, for which we designed an aggregation algorithm. We evaluated the effectiveness of the aggregation algorithm on four kinds of datasets. The results showed that the semantic-clique-based document clustering algorithm performed significantly better than traditional clustering algorithms such as Principal Direction Divisive Partitioning (PDDP), k-means, Auto-Class, and Hierarchical Clustering (HAC). We found that Semantic Clique Aggregation is a promising model for representing association rules in text and could be very useful for automatic document clustering.
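The pipeline described in this abstract (co-occurring terms, connected components as themes, documents grouped by theme) can be illustrated with a minimal sketch. This is not the authors' algorithm: whitespace tokenization, the `min_support` threshold, and the overlap-based assignment rule are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_themes(docs, min_support=2):
    """Group frequently co-occurring terms into themes (connected components)."""
    # Count in how many documents each pair of terms appears together.
    pair_counts = defaultdict(int)
    for doc in docs:
        terms = sorted(set(doc.lower().split()))  # assumption: whitespace tokens
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1

    # Keep only pairs that co-occur in at least min_support documents.
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component of the term graph is treated as one theme.
    themes, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        themes.append(component)
    return themes


def assign_documents(docs, themes):
    """Label each document with the theme sharing the most terms (-1 if none)."""
    labels = []
    for doc in docs:
        terms = set(doc.lower().split())
        overlaps = [len(terms & theme) for theme in themes]
        best = max(range(len(themes)), key=overlaps.__getitem__, default=-1)
        labels.append(best if best >= 0 and overlaps[best] > 0 else -1)
    return labels
```

On a toy corpus with two topics, the term pairs that recur across documents form two components, and each document is labeled with the component it overlaps most. A real system would need proper feature extraction and a more careful aggregation step than this overlap count.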
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60504027 and 60874080, and the Postdoctoral Science Foundation of China under Grant No. 20060401037.
Abstract: Based on an investigation of statistical data on the bus transport networks of three big cities in China, we propose that each bus route is a clique (maximal complete subgraph) and that a bus transport network (BTN) consists of many cliques, which densely connect and overlap with each other. We study the network properties, including the degree distribution, the distribution of multiple edges' overlapping times, the distribution of the overlap size between any two overlapping cliques, and the distribution of the number of cliques that a node belongs to. Naturally, the cliques also constitute a network, with the overlapping nodes serving as their multiple links. We also study the properties of this clique network, such as its degree distribution, clustering, and average path length. We propose that a BTN has the properties of random clique increment and random clique overlapping; at the same time, a BTN is a small-world network that is highly clique-clustered and highly clique-overlapped. Finally, we introduce a BTN evolution model whose simulation results agree well with the statistical laws that emerge in real BTNs.
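The route-as-clique representation can be made concrete with a small sketch. It assumes routes are given as a dictionary from a route name to its list of stops; the function name `build_btn` and the returned structures are hypothetical and only illustrate three of the quantities the abstract mentions: how many routes share an edge, the overlap size between two cliques, and the number of cliques a node belongs to.

```python
from collections import Counter
from itertools import combinations


def build_btn(routes):
    """Treat every route as a clique over its stops.

    Returns three mappings:
      edge_multiplicity: (stop, stop) -> number of routes containing both stops
      overlap:           (route, route) -> number of stops the two routes share
      membership:        stop -> number of routes (cliques) the stop belongs to
    """
    edge_multiplicity = Counter()
    membership = Counter()
    for stops in routes.values():
        unique = sorted(set(stops))
        membership.update(unique)
        # Every pair of stops on the same route is linked (complete subgraph).
        edge_multiplicity.update(combinations(unique, 2))

    # Two cliques are linked in the clique network if they share any stop.
    overlap = {}
    for (r1, s1), (r2, s2) in combinations(sorted(routes.items()), 2):
        shared = set(s1) & set(s2)
        if shared:
            overlap[(r1, r2)] = len(shared)
    return edge_multiplicity, overlap, membership
```

The distributions studied in the paper would then be histograms over the values of these three mappings.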
Abstract: This paper proposes an energy-efficient geocast algorithm for wireless sensor networks that guarantees delivery of packets from the sink to all nodes located in several geocast regions. Our approach differs from those in the existing literature. We first propose a hybrid clustering scheme: in the first phase, we partition the network into cliques using an existing energy-efficient clustering protocol. Next, the set of clusterheads of the cliques is in turn partitioned using energy-efficient hierarchical clustering. To consume less energy, our approach falls into the category of energy-efficient clustering algorithms in which the clusterhead is located in the central area of the cluster. Since each cluster is a clique, each sensor is one hop from the clusterhead. This reduces the energy used for transmission to and from the clusterhead compared with multi-hop clustering. Moreover, we use a sleep-awake strategy to minimize energy consumption during extra-clique broadcasts.
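The two structural properties this abstract relies on, that a cluster is a clique and that the clusterhead is centrally located, can be checked with a short sketch. It assumes 2-D node coordinates and a uniform radio range (a unit-disk model), neither of which is stated in the abstract; the function names are hypothetical, and this is not the clustering protocol the paper uses.

```python
import math
from itertools import combinations


def is_clique(cluster, positions, radio_range):
    """True if every pair of nodes in the cluster is within radio range."""
    return all(
        math.dist(positions[a], positions[b]) <= radio_range
        for a, b in combinations(cluster, 2)
    )


def central_clusterhead(cluster, positions):
    """Pick the node closest to the cluster centroid as clusterhead."""
    cx = sum(positions[n][0] for n in cluster) / len(cluster)
    cy = sum(positions[n][1] for n in cluster) / len(cluster)
    # Sorting first makes tie-breaking deterministic.
    return min(sorted(cluster), key=lambda n: math.dist(positions[n], (cx, cy)))
```

When `is_clique` holds, every member can reach the chosen clusterhead in a single hop, which is the property that avoids multi-hop transmission inside a cluster.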
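Abstract: High-dimensional data are sparse and susceptible to the curse of dimensionality, which has long made it difficult to guarantee the accuracy and efficiency of clustering such data. Subspace clustering is therefore adopted to reduce the effect of sparsity and the curse of dimensionality on the clustering results. First, random sampling is used to select dimensions suitable for clustering from the high-dimensional data and to generate subspaces, and the Hoeffding bound is used to ensure the validity of the sampling results. Second, the adjacency of grid cells is exploited to generate mixed grids within each subspace, which both preserves the integrity of the data and increases subspace density. Finally, dimensions are pruned according to the similarity and dissimilarity of the subspaces, which further increases subspace density. The algorithm achieves good results on University of California-Irvine (UCI) datasets and performs well in terms of scalability and noise resistance.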
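The role of the Hoeffding bound in the sampling step can be illustrated numerically. The abstract does not say which statistic is bounded, so the sketch below assumes the standard two-sided bound for the mean of independent samples with a bounded range, P(|estimate - true| >= epsilon) <= 2 exp(-2 n epsilon^2 / R^2); the function names and parameters are hypothetical.

```python
import math


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Smallest n such that the sample mean is within epsilon of the true mean
    with probability at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def hoeffding_epsilon(n, delta, value_range=1.0):
    """Error margin guaranteed with probability 1 - delta after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

For example, `hoeffding_sample_size(0.1, 0.05)` gives the number of sampled points needed to estimate a quantity in [0, 1], such as the fraction of points falling in a grid cell, to within 0.1 with 95% confidence. The required sample size does not depend on the size of the full dataset, which is what makes sampling attractive for large, high-dimensional data.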