Effective storage, processing, and analysis of power device condition monitoring data face enormous challenges. A framework based on the Aliyun DTplus platform is proposed that supports both MapReduce and Graph computation for analyzing massive monitoring data. First, condition monitoring data storage based on MaxCompute tables and parallel permutation entropy feature extraction based on MaxCompute MapReduce are designed and implemented on the DTplus platform. Then, a Graph-based k-means algorithm is implemented and used for clustering analysis of massive condition monitoring data. Finally, performance tests compare the execution time of the serial and parallel programs, and performance is analyzed in terms of CPU core consumption, memory utilization, and parallel granularity. Experimental results show that the designed framework and parallel algorithms can efficiently process massive power device condition monitoring data.
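As a rough illustration of the permutation entropy feature named in this abstract, the following is a minimal serial sketch in plain Python. It is not the MaxCompute MapReduce implementation described by the authors; the function name `permutation_entropy` and the `order`, `delay`, and `normalize` parameters are assumptions made for this example.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D sequence (hypothetical helper, serial version).

    Each window of `order` samples spaced `delay` apart is mapped to its
    ordinal pattern; the Shannon entropy of the pattern frequencies is returned.
    """
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for the requested order and delay")

    patterns = Counter()
    for start in range(n_windows):
        window = [series[start + j * delay] for j in range(order)]
        # Ordinal pattern: the indices that sort the window (ties keep position order).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1

    entropy = -sum((c / n_windows) * math.log(c / n_windows)
                   for c in patterns.values())
    if normalize:
        # Divide by the maximum possible entropy, log(order!), to land in [0, 1].
        entropy /= math.log(math.factorial(order))
    return entropy


if __name__ == "__main__":
    # A strictly increasing signal has a single ordinal pattern.
    print(permutation_entropy([1, 2, 3, 4, 5, 6], order=3))
    # An alternating signal uses both order-2 patterns equally often.
    print(permutation_entropy([1, 2, 1, 2, 1], order=2))
```

In a parallel setting, each window (or each monitoring channel) could plausibly be handled by a separate map task with the pattern counts merged in the reduce step, but that partitioning is an assumption and is not taken from the abstract.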
How to find these communities is an important research problem. Community discovery methods are currently categorized mainly into the HITS algorithm, the bipartite cores algorithm, and the maximum flow/minimum cut framework. In this paper, we propose a new method to extract communities. The Markov Cluster Algorithm (MCL), a fast and scalable unsupervised clustering algorithm, is used to extract communities. By placing the mirror deletion procedure after graph clustering, we considerably decrease the comparison cost. After MCL and mirror deletion, we use a community member selection algorithm to produce the sets of community candidates. The experimental results show that the new method works effectively and properly.
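The MCL step mentioned in this abstract can be sketched as alternating expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) on a column-stochastic matrix. The code below is a small dense-matrix sketch in plain Python under stated assumptions: self-loops are added, the inflation exponent defaults to 2.0, and the names `mcl` and `normalize_columns` are hypothetical. It does not include the mirror deletion or community member selection steps of the paper.

```python
def normalize_columns(matrix):
    """Scale each column in place so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a graph given as a square 0/1 adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Assumption: add self-loops before normalizing, a common MCL convention.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)

    for _ in range(max_iter):
        # Expansion: square the matrix (two steps of the random walk).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power, then renormalize the columns.
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break

    # Rows with non-zero mass are attractors; their non-zero columns form a cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Two disconnected triangles: nodes 0-2 and nodes 3-5.
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(graph))
```

A dense list-of-lists matrix keeps the sketch dependency-free; for web-scale graphs a sparse representation with pruning of small entries would be needed.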
Based on the CNKI database, this paper takes journal articles on the theme of “sports tourism” as its research objects and draws knowledge maps using bibliometric methods and CiteSpace software. It maps and analyzes 1,208 journal articles on “sports tourism” indexed in CNKI from 2002 to 2022. The results show that sports tourism research from 2002 to 2022 can be divided into three stages: the first is the accelerated development stage, which focuses on the industrialization of sports tourism; the second is the rapid prosperity stage, which focuses on how to promote economic development and continually deepens and enriches the connotation of the sports tourism industry; the third is the stage of fluctuating growth, which begins to explore new development opportunities for sports tourism from multiple perspectives. Based on keyword co-occurrence analysis and cluster analysis, the research hotspots of sports tourism in China and their changes are identified. Five key research hotspots and three main directions are extracted to provide a reference for future sports tourism research.
The clustering of objects (individuals or variables) is one of the most widely used approaches to exploring multivariate data. The two most common unsupervised clustering strategies are hierarchical ascending clustering (HAC) and k-means partitioning, which identify groups of similar objects in a dataset in order to divide it into homogeneous groups. The proposed topological clustering of variables, called TCV, studies a homogeneous set of variables defined on the same set of individuals, based on the notion of neighborhood graphs; some of these variables are more or less correlated or linked, depending on whether they are quantitative or qualitative. This topological data analysis approach can therefore be useful for dimension reduction and variable selection. It is a topological hierarchical clustering analysis of a set of variables that can be quantitative, qualitative, or a mixture of both. It arranges variables into homogeneous groups according to their correlations or associations, studied in the topological context of principal component analysis (PCA) or multiple correspondence analysis (MCA). The proposed TCV adapts to the type of data considered; its principle is presented and illustrated using simple real datasets with quantitative, qualitative, and mixed variables. The results of these illustrative examples are compared with those of other variable clustering approaches.
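To address the difficulty of dividing the degradation stages of switch machines, a degradation state identification method for turnout switch machines based on multi-dimensional feature fusion is proposed. First, multi-domain features in the time, frequency, and time-frequency domains are extracted from the degradation power data of the S700K switch machine. Second, kernel principal component analysis (KPCA) is used for feature fusion to obtain feature vectors that characterize the operating state of the turnout switch machine and to construct a degradation performance indicator. Third, the K-medoids clustering algorithm is used to divide the performance degradation of the turnout switch machine into stages and identify the different degradation states. Finally, the silhouette coefficient, the classification coefficient, and the average fuzzy entropy are used to evaluate the clustering results comprehensively, and the method is compared with fuzzy C-means (FCM) clustering and Gustafson-Kessel (GK) clustering. The results show that the comprehensive evaluation indices obtained after clustering the fused features are higher than those for single features and better reflect the details of the degradation process. K-medoids clustering performs well, with a model accuracy of 96.3%, and can accurately divide the performance degradation stages of turnout switch machines, providing theoretical support for intelligent operation and maintenance of turnouts at railway sites.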
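Objective: To understand research trends, frontiers, and hotspots in the field of peritoneal dialysis symptom clusters, and to provide a reference for researchers seeking a full picture of the state of research in this field. Methods: Using the Web of Science Core Collection as the data source, literature related to peritoneal dialysis symptom clusters published from 2013 to 2023 was analyzed visually with CiteSpace 6.2.R1. Results: A total of 571 relevant publications were included. The number of publications showed an overall year-on-year increase, and the United States published the most. Research hotspots focused mainly on chronic kidney disease, mortality, metabolic syndrome, cardiovascular disease, and risk. Research frontiers have gradually evolved around metabolic syndrome, restless legs syndrome, and related topics. Conclusion: In the future, China should strengthen cooperation and exchange with foreign scholars in this field, follow research frontiers and trends, and explore research hotspots in depth to raise the level of its research in this field.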
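Based on research literature on smart water conservancy published from 2000 to 2021 in the China National Knowledge Infrastructure (CNKI) and the Web of Science Core Collection (WOS), software such as VOSviewer and CiteSpace was used to construct knowledge maps of the temporal distribution of publication volume, publishing institutions, and the evolution of research hotspots, and current research progress on smart water conservancy was analyzed. The results show that the number of publications on smart water conservancy has increased year by year in both databases, but there is a clear gap between the publication volume in CNKI and that in WOS. Core research institutions have formed in the field and have made important contributions to its frontier development. Research in CNKI focuses on building digital basins and smart water conservancy frameworks with the river basin as the unit, whereas research in WOS focuses on geographic and earth-science perspectives. Both build smart water conservancy platforms on foundations such as the Internet of Things and deep learning.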
This paper describes the program module "GRAPHS", which was developed for data processing in geobotany and ecology. GRAPHS has a simple interface and is integrated into Microsoft Excel, which allows users to use all features of Microsoft Excel to store and prepare data for analysis. The module calculates the most common similarity indexes (Jaccard, Sørensen, Ochiai, etc.) and visualizes them using different graph theory algorithms or hierarchical cluster analysis, which simplifies and accelerates data analysis in ecology and geobotany and makes it clearer. Three ordination methods are also implemented in the module: PCA (principal components analysis), CA (correspondence analysis), and NMS (nonmetric multidimensional scaling). The module can be used for vegetation classification, for identifying diagnostic species, and for finding the environmental factors that most strongly affect vegetation. The data analysis algorithms implemented in GRAPHS are general in nature, so they can be applied in many other fields of science.
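The similarity indexes named in this abstract have simple set-based definitions for presence/absence data. The sketch below is a plain Python illustration, not the Excel-integrated GRAPHS module; the function names and the species lists are hypothetical, and returning 0.0 for empty inputs is a convention chosen for this example.

```python
import math


def jaccard(a, b):
    """Jaccard index: shared species divided by all species in either plot."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # assumption: 0.0 if both empty


def sorensen(a, b):
    """Sørensen index: twice the shared species over the sum of both list sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def ochiai(a, b):
    """Ochiai index: shared species over the geometric mean of both list sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    # Hypothetical species lists for two vegetation plots.
    plot_1 = {"Festuca", "Carex", "Poa"}
    plot_2 = {"Carex", "Poa", "Stipa"}
    for index in (jaccard, sorensen, ochiai):
        print(index.__name__, round(index(plot_1, plot_2), 3))
```

A pairwise matrix of such values over all plots is the usual input to graph-based visualization or hierarchical clustering of the kind the abstract describes.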
Previous studies on community detection in complex networks focused on undirected graphs, such as neural networks, social networks, social interrelations, the contagion of diseases, and bibliographies. However, there are other problems whose modeling yields a weakly connected directed graph, such as student access to university, public transport networks, or trophic chains. Those cases deserve dedicated study, with analysis and resolution adjusted to them. This is also a challenge, since most existing algorithms were originally designed for undirected graphs or for symmetrical and regular graphs. Our proposal is a benchmark generator of weakly connected directed graphs whose properties can be defined by end users according to their needs. The source code of the generators described in this article is available on GitHub under the GNU license.
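To make the notion of a weakly connected directed graph concrete, here is a minimal sketch of one way such a graph could be generated and checked. It is not the authors' benchmark generator. The construction (a randomly oriented random spanning tree plus extra random arcs) and the names `random_weakly_connected_digraph` and `is_weakly_connected` are assumptions made for illustration.

```python
import random


def random_weakly_connected_digraph(n, extra_edges=0, seed=None):
    """Return a sorted list of arcs (u, v) on nodes 0..n-1, with n >= 1.

    Weak connectivity is guaranteed by first building a random spanning tree
    and giving each tree edge a random direction.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)

    edges = set()
    for idx in range(1, n):
        u = order[idx]
        v = order[rng.randrange(idx)]  # attach to a node already in the tree
        edges.add((u, v) if rng.random() < 0.5 else (v, u))

    # Add further random arcs without self-loops, capped at the complete digraph.
    target = min(len(edges) + extra_edges, n * (n - 1))
    while len(edges) < target:
        u, v = rng.sample(range(n), 2)
        edges.add((u, v))
    return sorted(edges)


def is_weakly_connected(n, edges):
    """Check connectivity of the underlying undirected graph by depth-first search."""
    neighbours = {i: set() for i in range(n)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == n


if __name__ == "__main__":
    arcs = random_weakly_connected_digraph(10, extra_edges=5, seed=1)
    print(len(arcs), is_weakly_connected(10, arcs))
```

A benchmark generator of the kind described would additionally plant a known community structure and expose parameters such as degree distribution; those properties are outside this sketch.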
Funding (power device condition monitoring framework): This work was supported by the Central University Research Fund (No. 2016MS116, No. 2016MS117, No. 2018MS074) and the National Natural Science Foundation (51677072).
Funding (MCL-based community extraction): Supported by the 211 Project of the Ministry of Education of China.
Funding (benchmark generator of weakly connected directed graphs): Supported by the project “Complex Networks” from the Instituto Universitario de Matematica Multidisciplinar (IUMM) of the Universitat Politecnica de Valencia (UPV) [under Grant number (266500194) 20170251-Complex-Networks-UPV].