An information system is a type of knowledge representation, and attribute reduction is crucial in big data, machine learning, data mining, and intelligent systems. There are several ways of solving attribute reduction problems, but they all require a common categorization. Selecting features is a challenge in most scientific studies. When working with huge datasets, using all available attributes is not an option because it frequently complicates the study and decreases performance. On the other hand, neglecting some attributes might jeopardize data accuracy. Rough set theory provides a useful approach for identifying superfluous attributes that may be ignored without sacrificing any significant information; nonetheless, investigating all available combinations of attributes will result in some problems. Furthermore, because attribute reduction is primarily a mathematical issue, technical progress in reduction depends on the advancement of mathematical models. Because this study focuses on the mathematical side of attribute reduction, we propose several methods for reducing information systems according to classical rough set theory, the strength of rules, and the similarity matrix. We apply the proposed methods to several examples and calculate the reduction for each case. These methods expand the attribute reduction options available to researchers.
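To make the idea of a superfluous attribute concrete, here is a minimal sketch of classical rough-set reduction by exhaustive search. It is not the method proposed in the abstract above: the toy decision table, the attribute names `a`, `b`, `c`, `d`, and all function names are hypothetical, and the search enumerates every attribute combination, which is exactly the cost the abstract warns about on large tables.

```python
from itertools import combinations


def partition(rows, attrs):
    """Group row indices that agree on every attribute in attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def positive_region(rows, attrs, decision):
    """Rows whose indiscernibility class has a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def reducts(rows, cond_attrs, decision):
    """Minimal attribute subsets that keep the full positive region."""
    full = positive_region(rows, cond_attrs, decision)
    found = []
    for k in range(1, len(cond_attrs) + 1):
        for subset in combinations(cond_attrs, k):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if positive_region(rows, subset, decision) == full:
                found.append(subset)
    return found


# Hypothetical decision table: c duplicates a, and d depends on a and b.
table = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]

print(reducts(table, ["a", "b", "c"], "d"))
```

In this toy table either `a` or `c` can be dropped without losing any ability to determine `d`, so the search reports two reducts.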
A web page clustering algorithm called PageCluster and an improved algorithm, ImPageCluster, which handles overlapping clusters, are proposed. These methods not only take the web structure and page hyperlinks into account, but also consider the importance of each page, which is described by its in-weight and out-weight. Experiments show that, compared with traditional clustering methods, the proposed algorithms run faster and achieve better accuracy.
The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation in clustering results caused by the choice of initial points, yielding better clustering quality. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and that the clustering results are more stable.
For the problem of intuitionistic fuzzy cluster analysis, this paper proposes a new similarity degree formula that incorporates the attribute weight of each index. We conduct a fuzzy cluster analysis based on the new intuitionistic fuzzy similarity matrix, which is constructed via this weighted similarity degree method and can be transformed into a fuzzy similarity matrix. Moreover, an example is given to demonstrate the feasibility and validity of this method.
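The abstract above does not say how clusters are read off the fuzzy similarity matrix. One textbook route, sketched here as an assumption rather than as the authors' procedure, is to take the max-min transitive closure of the matrix and cut it at a threshold. The matrix `R`, the threshold values, and the function names are all hypothetical.

```python
def maxmin_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square the relation under max-min composition until it stops changing."""
    while True:
        squared = maxmin_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(r, lam):
    """Put i and j in the same cluster when closure[i][j] >= lam."""
    closure = transitive_closure(r)
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters


# Hypothetical reflexive, symmetric fuzzy similarity matrix for four objects.
R = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]

print(lambda_cut_clusters(R, 0.5))   # two clusters at this threshold
print(lambda_cut_clusters(R, 0.75))  # a stricter threshold splits objects 2 and 3
```

Raising the threshold can only split clusters, never merge them, so sweeping it from 0 to 1 gives a nested hierarchy of partitions.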
A new method for fuzzy clustering of Web users based on analysis of user interest characteristics is proposed in this article. The method first defines fuzzy page categories according to the links on the index page of the site, then computes the fuzzy degree of cross pages by aggregating Web log data. After that, using the fuzzy comprehensive evaluation method, it constructs user interest vectors according to page viewing times and frequency of hits, and derives the fuzzy similarity matrix for the Web users from the interest vectors. Finally, it obtains the clustering result through the fuzzy clustering method. The experimental results show the effectiveness of the method.
Key words: Web log mining - fuzzy similarity matrix - fuzzy comprehensive evaluation - fuzzy clustering
CLC number: TP18 - TP311 - TP391
Foundation item: Supported by the Natural Science Foundation of Heilongjiang Province of China (F0304)
Biography: ZHAN Li-qiang (1966-), male, Lecturer, Ph.D.; research direction: theory and methods of data mining and theory of databases.
Late fusion multi-view clustering (LFMVC) algorithms aim to integrate the base partition of each single view into a consensus partition. Base partitions can be obtained by performing kernel k-means clustering on all views. This type of method is not only computationally efficient, but also more accurate than multiple kernel k-means, and is thus widely used in the multi-view clustering context. LFMVC improves computational efficiency to the extent that the computational complexity of each iteration is reduced from O(n^3) to O(n) (where n is the number of samples). However, LFMVC also limits the search space of the optimal solution, meaning that the clustering results obtained are not ideal. Accordingly, in order to obtain more information from each base partition and thus improve the clustering performance, we propose a new late fusion multi-view clustering algorithm with a computational complexity of O(n^2). Experiments on several commonly used datasets demonstrate that the proposed algorithm converges quickly. Moreover, compared with other late fusion algorithms with a computational complexity of O(n), the actual time consumption of the proposed algorithm does not significantly increase. At the same time, comparisons with several other state-of-the-art algorithms reveal that the proposed algorithm also obtains the best clustering performance.
This paper summarizes research results on washer and nut taxonomy and knowledge base design using fuzzy methodology. In particular, the theory of fuzzy membership functions, similarity matrices, and the operation of fuzzy inference play important roles. A realistic set of 25 washers and nuts is employed to conduct extensive experiments and simulations. The investigation includes a complete demonstration of engineering design. The results obtained from this feasibility study are very encouraging because they represent the lower bound, with respect to performance (namely the correct recognition rate), of what fuzzy methodology can do. This lower bound shows a high recognition rate even with noisy input patterns, robustness in terms of noise tolerance, and simplicity in hardware implementation. Possible future work is suggested in the conclusion.
With the rapid development of WLAN (Wireless Local Area Network) technology, an important target of indoor positioning systems is to improve the positioning accuracy while reducing the online computation. This paper proposes a novel fingerprint positioning algorithm known as semi-supervised affinity propagation clustering based on distance function constraints. We show that, by employing affinity propagation techniques, it is able to use a fraction of labeled data to adjust the similarity matrix of the signal space and cluster reference points with high accuracy. The semi-supervised APC uses a combination of machine learning, clustering analysis, and fingerprinting algorithms. Based on data collected and tests run in a realistic indoor WLAN environment, the experimental results indicate that the proposed algorithm can improve positioning accuracy while reducing the online localization computation, as compared with the widely used K nearest neighbor and maximum likelihood estimation algorithms.
This paper presents a new method of damage condition assessment that accommodates other types of uncertainty due to ambiguity, vagueness, and fuzziness that cannot be described statistically. In this method, healthy observations are used to construct a fuzzy set representing sound performance characteristics. Additionally, the bounds on the similarities among the structural damage states are prescribed by using the state similarity matrix. Thus, an optimal group of fuzzy sets representing damage states such as little, moderate, and severe damage can be inferred as an inverse problem from healthy observations only. The optimal group of damage fuzzy sets is used to classify a set of observations at any unknown state of damage using the principles of fuzzy pattern recognition based on an approximate principle. This method can be embedded into a Structural Health Monitoring (SHM) system to give advice about structural maintenance and life prediction. Finally, a case study, which comes from Reference [9], for damage pattern recognition is presented and discussed. The compared result illustrates that our method is more effective and general, so it is very practical in engineering.
As an efficient technique for anti-counterfeiting, holographic diffraction labels have been widely applied in various fields. Due to their unique features, traditional image recognition algorithms are not ideal for holographic diffraction label recognition. Since a tensor preserves the spatiotemporal features of an original sample in the process of feature extraction, in this paper we propose a new holographic diffraction label recognition algorithm that combines two tensor features. The HSV (Hue Saturation Value) tensor and the HOG (Histogram of Oriented Gradient) tensor are used to represent the color information and gradient information of the holographic diffraction label, respectively. The tensor decomposition is performed by high order singular value decomposition, and tensor decomposition matrices are obtained. Taking into consideration the different recognition capabilities of the decomposition matrices, we design a decomposition matrix similarity fusion strategy that uses a typical correlation analysis algorithm and projects the similarity vectors of the different decomposition matrices onto the PCA (Principal Component Analysis) sub-space; KNN (K-Nearest Neighbors) classification is then performed in that sub-space. The effectiveness of our fusion strategy is verified by experiments. Our double tensor recognition algorithm combines the complementary recognition capabilities of the different tensors to produce better recognition performance for the holographic diffraction label system.
In this paper, for the regularized Hermitian and skew-Hermitian splitting (RHSS) preconditioner introduced by Bai and Benzi (BIT Numer Math 57:287–311, 2017) for the solution of saddle-point linear systems, we analyze the spectral properties of the preconditioned matrix when the regularization matrix is a special Hermitian positive semidefinite matrix which depends on certain parameters. We accurately describe the numbers of eigenvalues clustered at (0,0) and (2,0) when the iteration parameter is close to 0. An estimate of the condition number of the corresponding eigenvector matrix, which partly determines the convergence rate of the RHSS-preconditioned Krylov subspace method, is also studied in this work.
The distance-based regression model, as a nonparametric multivariate method, has been widely used to detect the association between variations in a distance or dissimilarity matrix for outcomes and predictor variables of interest in genetic association studies, genomic analyses, and many other research areas. Based on it, a pseudo-F statistic, which partitions the variation in distance matrices, is often constructed to achieve this aim. To the best of our knowledge, the statistical properties of the pseudo-F statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-F statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Given that the pseudo-F test statistic has unsatisfactory power when the correlations of the response variables are large, the authors propose a square-root F-type test statistic which replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies are conducted to validate the asymptotic distributions of the tests and demonstrate that the proposed test has more robust power than the pseudo-F test. Both test statistics are exemplified with a gene expression dataset for a prostate cancer pathway.
Today's link prediction methods are based on the network structure and use a single-channel approach for prediction. There is a lack of link prediction algorithms built on a multichannel approach, which makes the features monotonous and noncomplementary. To address this problem, this paper proposes a link prediction algorithm based on multichannel structure modelling (MCLP). First, the network is sampled three times to construct three subgraph structures. Second, the node representation vectors of the network are learned separately for each subgraph on a single channel. Then, the three node representation vectors are combined, and the similarity matrix is calculated for the combined vectors. Finally, the performance of the MCLP algorithm is evaluated by calculating the AUC using the similarity matrix and conducting multiple experiments on three citation network datasets. The experimental results show that the proposed link prediction algorithm has an AUC of 98.92%, which is better than the performance of the 24 link prediction comparison algorithms used in this paper. The experimental results show that the MCLP algorithm can effectively extract the relationships between network nodes, and confirm its effectiveness and feasibility.
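The last two steps of the abstract above (combine the per-channel vectors, build a similarity matrix, score it with AUC) can be sketched as follows. This is an illustration under stated assumptions, not the MCLP implementation: combining by concatenation, using cosine similarity, the toy vectors, and every function name are hypothetical choices. The AUC here is the usual link prediction form, the fraction of (held-out edge, non-edge) pairs in which the held-out edge scores higher, with ties counted as one half.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between node vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    return [
        [
            sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def link_prediction_auc(sim, test_edges, non_edges):
    """Fraction of (test edge, non-edge) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for u, v in test_edges:
        for x, y in non_edges:
            if sim[u][v] > sim[x][y]:
                wins += 1.0
            elif sim[u][v] == sim[x][y]:
                wins += 0.5
    return wins / (len(test_edges) * len(non_edges))


# Hypothetical per-channel embeddings for four nodes (one list per channel).
channel_1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
channel_2 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
channel_3 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Assumed combination rule: concatenate the three vectors of each node.
combined = [a + b + c for a, b, c in zip(channel_1, channel_2, channel_3)]
sim = cosine_similarity_matrix(combined)

auc = link_prediction_auc(sim, test_edges=[(0, 1), (2, 3)], non_edges=[(0, 2), (1, 3)])
print(auc)
```

On real networks the non-edge set is very large, so the comparison is normally estimated from randomly sampled pairs rather than enumerated in full as it is here.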
In this article, a clustering method based on the genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distances between samples on a two-dimensional plane. Finally, the distances are adjusted by GA to gradually approximate the similarities. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.
Secondary earth faults occur frequently in power distribution networks under harsh weather conditions. Owing to its characteristics, a secondary earth fault is typically hidden within the transient of the first fault. Therefore, most researchers tend to focus on a feeder with a single fault while disregarding secondary faults. This paper presents a fault feeder identification method that considers secondary earth faults in a non-effectively grounded distribution network. First, the wavelet singular entropy method is used to detect a secondary fault event. This method can identify the moment at which a secondary fault occurs. The zero-sequence current data can then be categorized into two fault stages, with the first and second stages corresponding to the first and secondary faults, respectively. Subsequently, a similarity matrix containing the time-frequency transient information of the zero-sequence current at the two fault stages is defined to identify the fault feeders. Finally, to confirm the effectiveness and reliability of the proposed method, we conduct simulation experiments and an adaptability analysis based on an electromagnetic transient program.
Approximations based on random Fourier features have recently emerged as an efficient and elegant method for designing large-scale machine learning tasks. Unlike approaches using the Nyström method, which randomly samples the training examples, we make use of random Fourier features, whose basis functions (i.e., cosine and sine) are sampled from a distribution independent of the training sample set, to cluster preference data, which appears extensively in recommender systems. First, we propose a two-stage preference clustering framework. In this framework, we use random Fourier features to map the preference matrix into a feature matrix, and then apply the traditional k-means approach to cluster the preference data in the transformed feature space. Compared with traditional preference clustering, our method solves the problem of insufficient memory and greatly improves the efficiency of the operation. Experiments on movie data sets containing 100000 ratings show that the proposed method is more effective in clustering accuracy than the Nyström method and k-means, while also achieving better performance than these clustering approaches.
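The first stage of the framework above, mapping rows of the preference matrix through random Fourier features, can be sketched as below. This is a minimal illustration, not the paper's code: it assumes a Gaussian (RBF) kernel exp(-gamma * ||x - y||^2), uses the common cosine-with-random-phase variant instead of explicit cosine and sine pairs, and the names `make_rff`, `gamma`, and the toy rating vectors are hypothetical. The second stage would pass the mapped rows to any ordinary k-means routine and is omitted here.

```python
import math
import random


def make_rff(dim, n_features, gamma, rng):
    """Build a random Fourier feature map for the kernel exp(-gamma * ||x - y||^2).

    Frequencies are drawn from N(0, 2 * gamma) per coordinate and phases from
    U(0, 2 * pi); neither depends on the data being mapped.
    """
    sigma = math.sqrt(2.0 * gamma)
    ws = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        return [
            scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(ws, bs)
        ]

    return transform


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
gamma = 0.5
transform = make_rff(dim=3, n_features=2000, gamma=gamma, rng=rng)

# Two hypothetical (rescaled) preference rows.
x = [0.2, 0.4, 0.1]
y = [0.5, 0.1, 0.3]

exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
approx = dot(transform(x), transform(y))
print(exact, approx)
```

The inner product of the mapped rows approximates the kernel value, so Euclidean k-means in the mapped space behaves like kernel k-means without ever storing an n-by-n kernel matrix, which is where the memory saving comes from.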
Consensus clustering is the problem of coordinating clustering information about the same data set coming from different runs of the same algorithm. Consensus clustering is becoming a state-of-the-art approach in an increasing number of applications. However, determining the optimal cluster number is still an open problem. In this paper, we propose a novel consensus clustering algorithm that is based on the Minkowski distance. Combined with the Newman greedy algorithm from complex networks, the proposed clustering algorithm can automatically set the number of clusters. It is less sensitive to noise and can integrate solutions from multiple samples of data or attributes for processing data in the processing industry. A numerical simulation is also given to demonstrate the effectiveness of the proposed algorithm. Finally, this consensus clustering algorithm is applied to a froth flotation process.
Radiology doctors perform text-based image retrieval when they want to retrieve medical images. However, the accuracy and efficiency of such retrieval cannot keep up with the requirements. An innovative algorithm is proposed to retrieve similar medical images. First, we extract the professional terms from the ontology structure and use them to annotate the CT images. Second, the semantic similarity matrix of ontology terms is calculated according to the structure of the ontology. Lastly, the corresponding semantic distance is calculated according to the marked vector, which contains different annotations. We run the algorithm on 120 real liver CT images (divided into six categories) from a top three-hospital. The results show that the retrieval index "Precision" is 80.81%, and the classification index "AUC" (Area Under Curve) under the "ROC curve" (Receiver Operating Characteristic) is 0.945.
Funding (PageCluster web page clustering paper): Sponsored by the Huo Ying-Dong Education Foundation of China (91101).
Funding (late fusion multi-view clustering paper): Hunan Provincial Science and Technology Plan Project, grant number 2018XK2102. Y.P. Zhao, W.X. Liang, J.Z. Lu and X.W. Chen all received this grant.
Funding (WLAN indoor positioning paper): Sponsored by the National Natural Science Foundation of China (Grant Nos. 61101122 and 61071105).
Funding (damage condition assessment paper): Supported by the National High Technology Research and Development Program ("863" Program) of China under Grant No. 2006AA04Z437.
Funding (holographic diffraction label recognition paper): Mainly supported by the Public Welfare Technology and Industry Project of the Zhejiang Provincial Science Technology Department (No. LGG18F020013, No. LGG19F020016, LGF21F020006).
Funding: The work is partially supported by the National Natural Science Foundation of China (No. 11801362).
Abstract: In this paper, for the regularized Hermitian and skew-Hermitian splitting (RHSS) preconditioner introduced by Bai and Benzi (BIT Numer Math 57:287–311, 2017) for the solution of saddle-point linear systems, we analyze the spectral properties of the preconditioned matrix when the regularization matrix is a special Hermitian positive semidefinite matrix which depends on certain parameters. We accurately describe the numbers of eigenvalues clustered at (0,0) and (2,0) if the iteration parameter is close to 0. An estimate of the condition number of the corresponding eigenvector matrix, which partly determines the convergence rate of the RHSS-preconditioned Krylov subspace method, is also studied in this work.
Funding: Partially supported by the Beijing Natural Science Foundation under Grant No. Z180006.
Abstract: The distance-based regression model, as a nonparametric multivariate method, has been widely used to detect the association between variations in a distance or dissimilarity matrix for outcomes and predictor variables of interest in genetic association studies, genomic analyses, and many other research areas. Based on it, a pseudo-F statistic which partitions the variation in distance matrices is often constructed to achieve the aim. To the best of our knowledge, the statistical properties of the pseudo-F statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-F statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Given that the pseudo-F test statistic has unsatisfactory power when the correlations of the response variables are large, the authors propose a square-root F-type test statistic which replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies are conducted to validate the asymptotic distributions of the tests and demonstrate that the proposed test has more robust power than the pseudo-F test. Both test statistics are exemplified with a gene expression dataset for a prostate cancer pathway.
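For readers unfamiliar with how a pseudo-F statistic partitions the variation in a distance matrix, the following is a minimal sketch. It assumes the commonly used form in which the squared distances are double-centered into a Gower matrix G and the statistic is tr(HGH) / tr((I-H)G(I-H)) scaled by degrees of freedom, restricted here to a single predictor plus intercept. The function names `gower_center` and `pseudo_f` are hypothetical, and the abstract does not confirm that the authors use exactly this normalization.

```python
def gower_center(dist):
    """Double-center the matrix -0.5 * d_ij^2 (Gower's centered matrix G)."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]      # equals column means (a is symmetric)
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def pseudo_f(dist, x):
    """Pseudo-F for one predictor x (intercept handled by centering).

    With H = xc xc^T / (xc^T xc), tr(HGH) = xc^T G xc / (xc^T xc) and
    tr((I-H)G(I-H)) = tr(G) - tr(HGH).
    """
    n = len(x)
    mean_x = sum(x) / n
    xc = [v - mean_x for v in x]
    sxx = sum(v * v for v in xc)
    g = gower_center(dist)
    explained = sum(xc[i] * g[i][j] * xc[j]
                    for i in range(n) for j in range(n)) / sxx
    residual = sum(g[i][i] for i in range(n)) - explained
    # Degrees of freedom: 1 for the predictor, n - 2 for the residual.
    return explained / (residual / (n - 2))


# Illustrative data (made up): a univariate outcome with Euclidean distances,
# for which the pseudo-F coincides with the classical regression F statistic.
y = [1.0, 2.0, 2.5, 4.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
dist = [[abs(a - b) for b in y] for a in y]
f_stat = pseudo_f(dist, x)
print(f_stat)
```

The toy data use Euclidean distances on a single outcome because that case reduces to ordinary regression, which makes the partition easy to check by hand; with a general dissimilarity matrix the same code applies but the classical F distribution no longer does, which is the gap the abstract addresses.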
Funding: This article was supported by the National Key Research and Development Program of China (No. 2020YFC1523300) and the Innovation Platform Construction Project of Qinghai Province (2022-ZJ-T02).
Abstract: Today's link prediction methods are based on the network structure and use a single-channel approach for prediction; there is a lack of link prediction algorithms constructed from a multichannel approach, which makes the features monotonous and noncomplementary. To address this problem, this paper proposes a link prediction algorithm based on multichannel structure modelling (MCLP). First, the network is sampled three times to construct its three subgraph structures. Second, the node representation vectors of the network are learned separately for each subgraph on a single channel. Then, the three node representation vectors are combined, and the similarity matrix is calculated for the combined vectors. Finally, the performance of the MCLP algorithm is evaluated by calculating the AUC using the similarity matrix and conducting multiple experiments on three citation network datasets. The experimental results show that the proposed link prediction algorithm has an AUC of 98.92%, which is better than the performance of the 24 link prediction comparison algorithms used in this paper. The experimental results show that the MCLP algorithm can effectively extract the relationships between network nodes and confirm its effectiveness and feasibility.
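The last two steps described in this abstract (combining per-channel node vectors, computing a similarity matrix, and scoring it with AUC) can be illustrated with a short sketch. The abstract does not say how the vectors are combined or which similarity is used, so concatenation and cosine similarity are assumptions here, and the embeddings, edge lists, and function names are all hypothetical toy values rather than the MCLP implementation.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the n x n cosine similarity matrix of equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            denom = norms[i] * norms[j]
            sim[i][j] = dot / denom if denom > 0 else 0.0
    return sim


def link_prediction_auc(sim, positive_edges, negative_edges):
    """AUC: probability that a held-out true edge outscores a non-edge.

    Ties count as half a win, the usual convention in link prediction.
    """
    wins = 0.0
    for u, v in positive_edges:
        for p, q in negative_edges:
            if sim[u][v] > sim[p][q]:
                wins += 1.0
            elif sim[u][v] == sim[p][q]:
                wins += 0.5
    return wins / (len(positive_edges) * len(negative_edges))


# Hypothetical node vectors learned on three channels (4 nodes, 2 dims each).
channel_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
channel_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
channel_c = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Assumption: "combined" is taken to mean concatenation of the three vectors.
combined = [a + b + c for a, b, c in zip(channel_a, channel_b, channel_c)]
sim = cosine_similarity_matrix(combined)

held_out_edges = [(0, 1), (2, 3)]   # toy "true" links
non_edges = [(0, 2), (1, 3)]        # toy absent links
auc = link_prediction_auc(sim, held_out_edges, non_edges)
print(auc)
```

In this toy setup every held-out edge scores above every non-edge, so the AUC is 1.0; on real citation networks the score would be estimated over sampled edge and non-edge pairs.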
Funding: Supported by the Hi-Tech Research and Development Program of China (2006AA01Z229).
Abstract: In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is that it does not depend on the distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.
Funding: This work was supported in part by the National Science Foundation of China (No. 51907097), the National Key R&D Program of China (No. 2020YFF0305800), the Full-time Postdoc Research and Development Fund of Sichuan University in China (No. 2019SCU12003), and the Applied Basic Research of Sichuan Province (No. 2020YJ0012).
Abstract: Secondary earth faults occur frequently in power distribution networks under harsh weather conditions. Owing to its characteristics, a secondary earth fault is typically hidden within the transient of the first fault. Therefore, most researchers tend to focus on a feeder with a single fault while disregarding secondary faults. This paper presents a fault feeder identification method that considers secondary earth faults in a non-effectively grounded distribution network. First, the wavelet singular entropy method is used to detect a secondary fault event. This method can identify the moment at which a secondary fault occurs. The zero-sequence current data can then be categorized into two fault stages, which correspond to the first and secondary faults, respectively. Subsequently, a similarity matrix containing the time-frequency transient information of the zero-sequence current at the two fault stages is defined to identify the fault feeders. Finally, to confirm the effectiveness and reliability of the proposed method, we conduct simulation experiments and an adaptability analysis based on an electromagnetic transient program.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61872260 and 61592419) and the Natural Science Foundation of Shanxi Province (No. 201703D421013).
Abstract: Approximations based on random Fourier features have recently emerged as an efficient and elegant method for designing large-scale machine learning tasks. Unlike approaches using the Nyström method, which randomly samples the training examples, we make use of random Fourier features, whose basis functions (i.e., cosine and sine) are sampled from a distribution independent of the training sample set, to cluster preference data, which appears extensively in recommender systems. First, we propose a two-stage preference clustering framework. In this framework, we use random Fourier features to map the preference matrix into the feature matrix and then apply the traditional k-means approach to cluster the preference data in the transformed feature space. Compared with traditional preference clustering, our method solves the problem of insufficient memory and greatly improves the efficiency of the operation. Experiments on movie data sets containing 100,000 ratings show that the proposed method achieves higher clustering accuracy than the Nyström and k-means approaches, while also achieving better performance than these clustering approaches.
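The two-stage framework described here (map the preference matrix with random Fourier features, then run k-means in the feature space) can be sketched as follows. This is only an illustration under stated assumptions: the features approximate a Gaussian kernel with a hand-picked bandwidth `sigma`, k-means uses a deterministic farthest-point initialization for reproducibility, and the rating rows and all function names are hypothetical rather than taken from the paper.

```python
import math
import random


def random_fourier_features(rows, n_features, sigma, seed=0):
    """Map each row to [cos(w.x), sin(w.x)] / sqrt(n_features).

    The frequencies w are drawn from N(0, 1/sigma^2), independently of the
    data, so inner products of mapped rows approximate a Gaussian kernel.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(n_features)]
    scale = 1.0 / math.sqrt(n_features)
    mapped = []
    for row in rows:
        proj = [sum(w * x for w, x in zip(f, row)) for f in freqs]
        mapped.append([scale * math.cos(p) for p in proj]
                      + [scale * math.sin(p) for p in proj])
    return mapped


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd iterations with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy preference matrix: 6 users x 4 items, ratings on a 1-5 scale (made up).
ratings = [
    [5, 5, 1, 1], [5, 4, 1, 1], [4, 5, 1, 2],   # users who prefer items 0-1
    [1, 1, 5, 5], [1, 2, 5, 4], [2, 1, 4, 5],   # users who prefer items 2-3
]
features = random_fourier_features(ratings, n_features=200, sigma=3.0)
labels = kmeans(features, k=2)
print(labels)
```

Because the frequencies are sampled without looking at the data, the mapping can be applied row by row, which is what makes this kind of approach attractive when the full kernel matrix would not fit in memory.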
Funding: Supported by the National High Technology Research and Development Program (863 Program) (No. 2013AA040301-3), the National Natural Science Foundation of China (Nos. 61473319 and 61104135), the Key Project of National Natural Science Foundation of China (Nos. 61621062 and 61134006), and the Innovation Research Funds of Central South University (No. 2016CX014).
Abstract: Consensus clustering is the problem of coordinating clustering information about the same data set coming from different runs of the same algorithm. Consensus clustering is becoming a state-of-the-art approach in an increasing number of applications. However, determining the optimal cluster number is still an open problem. In this paper, we propose a novel consensus clustering algorithm that is based on the Minkowski distance. Fused with the Newman greedy algorithm from complex networks, the proposed clustering algorithm can automatically set the number of clusters. It is less sensitive to noise and can integrate solutions from multiple samples of data or attributes for processing data in the processing industry. A numerical simulation is also given to demonstrate the effectiveness of the proposed algorithm. Finally, this consensus clustering algorithm is applied to a froth flotation process.
Abstract: Radiologists perform text-based image retrieval when they want to retrieve medical images. However, the accuracy and efficiency of such retrieval cannot keep up with the requirements. An innovative algorithm is proposed to retrieve similar medical images. First, we extract the professional terms from the ontology structure and use them to annotate the CT images. Second, the semantic similarity matrix of ontology terms is calculated according to the structure of the ontology. Lastly, the corresponding semantic distance is calculated according to the marked vector, which contains different annotations. We run the algorithm on 120 real liver CT images (divided into six categories) from a top-tier hospital. The results show that the retrieval index "Precision" is 80.81%, and the classification index "AUC" (Area Under Curve) under the "ROC curve" (Receiver Operating Characteristic) is 0.945.