Due to the widespread use of the Internet, customer information is vulnerable to attacks on computer systems, which creates an urgent need for intrusion detection technology. Network intrusion detection has become one of the most important technologies in network security. Although current network intrusion detection methods already achieve high accuracy, they are inefficient, including the popular self-organizing map (SOM) neural network method. This paper proposes an efficient and fast network intrusion detection method. First, the fundamentals of the two constituent methods, SOM and K-means, are introduced. Then, a self-organizing feature map neural network based on K-means clustering (KSOM) is presented to improve the efficiency of network intrusion detection. Finally, using NSL-KDD as the network intrusion data set, experiments demonstrate that KSOM needs significantly fewer clustering iterations than SOM without substantially affecting the clustering results, and that its accuracy is much higher than that of K-means. The experimental results show that the method moderately improves intrusion detection accuracy and significantly reduces the number of clustering iterations.
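The abstract does not say exactly how K-means and the SOM are coupled in KSOM. As a minimal sketch, assuming the K-means centroids serve as the initial SOM weights (so the map starts near a good solution and needs fewer refinement passes), the Python below trains a small one-dimensional map. The function names, the farthest-point K-means initialisation, and the learning-rate and neighbourhood schedules are illustrative assumptions, not the paper's settings.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, iters=20):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            groups[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def train_ksom(data, n_units, epochs=10, lr0=0.3, sigma0=0.5, seed=0):
    """Hypothetical KSOM sketch: a 1-D SOM whose weights start at K-means centroids."""
    weights = kmeans(data, n_units)
    rng = random.Random(seed)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.1)  # shrinking neighbourhood width
            bmu = min(range(n_units), key=lambda u: sq_dist(x, weights[u]))
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights


def assign(data, weights):
    """Index of the best-matching unit for each sample."""
    return [min(range(len(weights)), key=lambda u: sq_dist(x, weights[u])) for x in data]
```

In this sketch the SOM stage only fine-tunes prototypes that K-means has already placed, which is one way the iteration count could drop relative to a randomly initialised SOM.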
A new clustering algorithm called fuzzy self-organizing feature maps is introduced. It can process not only exact numerical inputs but also inexact or fuzzy non-numerical inputs, such as natural-language inputs. Simulation results show that the new algorithm outperforms the original Kohonen algorithm in clustering performance and learning rate.
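The abstract does not describe how non-numerical inputs are encoded. One common approach, assumed here, represents each linguistic term as a triangular fuzzy number (a, b, c), treats a crisp value x as (x, x, x), and compares fuzzy numbers with the vertex distance sqrt(((a1 - a2)^2 + (b1 - b2)^2 + (c1 - c2)^2) / 3). The term table `TERMS` and the winner-only update are hypothetical simplifications of a Kohonen step, not the paper's algorithm.

```python
import math

# Hypothetical vocabulary: each linguistic term is a triangular fuzzy number (a, b, c)
TERMS = {
    "low": (0.0, 0.0, 0.5),
    "medium": (0.25, 0.5, 0.75),
    "high": (0.5, 1.0, 1.0),
}


def to_fuzzy(value):
    """Map a linguistic term or a crisp number to a triangular fuzzy number."""
    if isinstance(value, str):
        return TERMS[value]
    return (value, value, value)  # a crisp value is a degenerate triangle


def vertex_distance(p, q):
    """Vertex distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)) / 3)


def input_distance(inputs, prototype):
    """Distance between a mixed crisp/linguistic input and a fuzzy prototype."""
    return math.sqrt(sum(vertex_distance(to_fuzzy(v), w) ** 2
                         for v, w in zip(inputs, prototype)))


def update_winner(inputs, prototypes, lr=0.2):
    """Move the best-matching unit's fuzzy weights toward the input (no neighbourhood)."""
    bmu = min(range(len(prototypes)),
              key=lambda u: input_distance(inputs, prototypes[u]))
    prototypes[bmu] = [
        tuple(w + lr * (x - w) for w, x in zip(feature, to_fuzzy(v)))
        for feature, v in zip(prototypes[bmu], inputs)
    ]
    return bmu
```

Because each vertex moves by a convex combination toward an ordered triple, the updated weights remain valid triangles (a <= b <= c).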
Clustering analysis is one of the main concerns in data mining. A common approach is to group points that are close to each other and separate points that are far apart, so measuring the distance between sample points is crucial to the effectiveness of clustering. A common supervised way to reconstruct the distance metric is to select features using label information and then measure distances between samples on those features. However, in many applications it is very expensive to obtain a large number of labeled samples. To address clustering with few supervised samples and high data dimensionality, this paper proposes a novel semi-supervised clustering algorithm built on an improved prototype network, which reconstructs the distance metric of the sample space from a small amount of pairwise supervision (Must-Link and Cannot-Link constraints) and then clusters the data in the new metric space. The core idea is an embedding mapping that draws similar samples closer together and pushes dissimilar samples farther apart. Extensive experiments on real-world and synthetic datasets show the effectiveness of the algorithm: averaged over the datasets, the clustering metrics improve by 8% over the comparison algorithm.
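The abstract states the goal (pull Must-Link pairs together and push Cannot-Link pairs apart in the embedding) but not the loss. A standard contrastive formulation, assumed here rather than taken from the paper, penalises the squared distance of Must-Link pairs and the squared shortfall below a margin for Cannot-Link pairs. The function name and the `margin` parameter are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_constraint_loss(embeddings, must_link, cannot_link, margin=1.0):
    """Contrastive loss over pairwise constraints (assumed formulation).

    embeddings: list of embedded vectors indexed by sample id
    must_link / cannot_link: lists of (i, j) index pairs
    Must-Link pairs cost their squared distance; Cannot-Link pairs cost
    (margin - distance)^2 only when they are closer than `margin`.
    """
    ml_terms = [euclidean(embeddings[i], embeddings[j]) ** 2 for i, j in must_link]
    cl_terms = [max(0.0, margin - euclidean(embeddings[i], embeddings[j])) ** 2
                for i, j in cannot_link]
    ml = sum(ml_terms) / len(ml_terms) if ml_terms else 0.0
    cl = sum(cl_terms) / len(cl_terms) if cl_terms else 0.0
    return ml + cl
```

Minimising this loss with respect to the embedding parameters is what makes constrained pairs "closer" or "further away"; clustering would then run on the learned embeddings.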
This paper presents a new testing system based on factor models and self-organizing feature maps, together with a method for filtering out undesirable environmental influence. The testing process is described by a factor model with simplex structure, which represents the influence of genetic and environmental factors on the observed parameters: in one case the subjects' answers to the test questions, and in the other the time spent responding to each question. The Monte Carlo method is used to generate sufficient samples for training the self-organizing feature maps, which estimate model goodness-of-fit measures and, consequently, the ability level. A prototype of the system is implemented using Raven's Progressive Matrices (Advanced Progressive Matrices), an intelligence test of abstract reasoning. Environmental influence is eliminated by comparing the observed and predicted answers to the test tasks with a Kalman filter adapted to the problem. The testing procedure is optimized by reducing the number of tasks: after each task, the distribution of measures of belonging to the different ability levels is examined, and testing stops once the required level of conclusion reliability is reached.
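The abstract says a Kalman filter is adapted to compare observed and predicted answers but gives no state model. The sketch below is a generic scalar Kalman filter with a random-walk state; the noise variances `q` and `r`, and the idea of feeding it one score per task, are assumptions for illustration. The innovations it returns (observed minus predicted values) are the quantity such a comparison would inspect.

```python
def kalman_1d(measurements, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model (illustrative).

    q: process-noise variance (how fast the latent level may drift)
    r: measurement-noise variance
    Returns the filtered estimate after each measurement and the innovations
    (observed minus predicted values).
    """
    x, p = x0, p0
    estimates, innovations = [], []
    for z in measurements:
        # Predict: the state stays the same apart from process noise
        p = p + q
        # Update: blend prediction and observation using the Kalman gain
        innovation = z - x
        k = p / (p + r)
        x = x + k * innovation
        p = (1 - k) * p
        estimates.append(x)
        innovations.append(innovation)
    return estimates, innovations
```

Large innovations flag answers that deviate from what the model predicts; how the paper converts them into an "environment influence" correction is not stated.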
COVID-19 has become a worldwide pandemic, with confirmed cases and deaths increasing across the globe. By July 2022, the cumulative number of confirmed cases reported to the World Health Organization (WHO) had risen to 550 million, with more than 6 million deaths in total. Analysis of its epidemic risk has long been a focus of attention worldwide. The self-organizing feature map (SOM), a vector quantization method, offers a data-mapping approach to tracking the response of time-series data on a well-trained map. This study tracks the trajectory of COVID-19 epidemic risk in 237 countries, measured by the numbers of new confirmed cases and deaths per day over a period of more than one year. A hybrid clustering method uses SOM and K-means to generate a risk map and then displays the trajectory of daily risk on the map. The experimental results demonstrate the promising functionality of SOM for trajectory tracking and give experts insight into the dynamic changes of COVID-19 risk.
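To make the trajectory idea concrete, the sketch below maps each day's feature vector to its best-matching unit on an already trained map and records the path of visited nodes. It assumes the map is given as a dictionary from hypothetical (row, col) grid positions to prototype vectors and that daily features (for example, normalised new cases and deaths) are prepared beforehand; the K-means step that groups map nodes into risk levels is not shown.

```python
import math


def bmu(x, prototypes):
    """Grid position of the best-matching unit for vector x.

    prototypes: dict mapping (row, col) grid positions to weight vectors
    """
    return min(prototypes, key=lambda pos: math.dist(x, prototypes[pos]))


def risk_trajectory(daily_vectors, prototypes):
    """Map each day's vector to a node; consecutive repeats are merged."""
    path = []
    for x in daily_vectors:
        node = bmu(x, prototypes)
        if not path or path[-1] != node:
            path.append(node)
    return path
```

Merging consecutive repeats keeps only the transitions between nodes, which is what a trajectory drawn on the map would show.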
The feature space extracted from vibration signals with various faults is often nonlinear and high-dimensional. Nonlinear dimensionality reduction methods, such as manifold learning, are available for extracting low-dimensional embeddings. However, these methods all rely on manual intervention, which leads to shortcomings in stability and in suppressing disturbance noise. To extract features automatically, a manifold learning method with self-organizing mapping is introduced for the first time. For the non-uniform sample distribution reconstructed in phase space, the expectation-maximization (EM) iterative algorithm divides the local neighborhoods adaptively without manual intervention. The local tangent space alignment (LTSA) algorithm then compresses the high-dimensional phase space into a more faithful low-dimensional representation. Finally, the signal is reconstructed by kernel regression. Several typical cases are analyzed, including the Lorenz system, an engine fault with a piston-pin defect, and a bearing fault with an outer-race defect. Compared with LTSA and the continuous wavelet transform, the results show that the background noise is fully suppressed and the periodic repetition of impact components is well separated and identified. The method provides a new way to extract impulsive components from mechanical signals automatically and precisely.
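The abstract reconstructs samples in phase space before the EM neighbourhood partition and LTSA. Phase-space reconstruction is commonly done with time-delay embedding; the sketch below assumes that technique, with the embedding dimension `dim` and delay `tau` as user-chosen parameters. EM, LTSA and the kernel-regression reconstruction are not shown.

```python
def delay_embed(signal, dim, tau):
    """Time-delay embedding of a scalar signal.

    Returns the vectors [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)].
    """
    n = len(signal) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim and tau")
    return [[signal[t + k * tau] for k in range(dim)] for t in range(n)]
```

Each row is one point in the reconstructed phase space; these points are the samples whose local neighbourhoods a method like the one above would partition.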
Cytogenetic maps of four clusters of disease resistance genes were generated by in situ hybridization (ISH) onto maize chromosomes of the two RFLP markers tightly linked to and flanking each maize resistance gene, and of cloned resistance genes from other plant species, combined with previously published data. These genes include the Helminthosporium turcicum Pass. resistance genes Ht1, Htn1 and Ht2; the Helminthosporium maydis Nisik. resistance genes Rhm1 and Rhm2; the maize dwarf mosaic virus resistance gene Mdm1; the wheat streak mosaic virus resistance gene Wsm1; the Helminthosporium carbonum Ullstrup resistance gene Hm1; the cloned Xanthomonas oryzae pv. oryzae resistance gene Xa21 of rice; the Cladosporium fulvum resistance genes Cf-9 and Cf-2.1 of tomato; and the Pseudomonas syringae resistance gene RPS2 of Arabidopsis. Most of the tested disease resistance genes were located on four chromosomes (1, 3, 6 and 8) and were clustered in the interstitial regions of the long arms of these chromosomes, at percentage distances ranging from 31.44 (±3.72) to 72.40 (±3.25), except for Rhm1, Rhm2, Mdm1 and Wsm1, which mapped to the satellite of the short arm of chromosome 6. The results showed that the tested RFLP markers and genes were duplicated or triplicated in the maize genome. The homology and conservation of disease resistance genes among species, and the relationship between the distribution and function of the genes, are discussed. The results provide an important scientific basis for a deeper understanding of the structure and function of disease resistance genes and for maize breeding.
Traditional Chinese-English translation models tend to translate some source words repeatedly while mistakenly ignoring others. We therefore propose a novel English-Chinese neural machine translation model based on a self-organizing mapping neural network and deep feature matching. In this model, word vectors, a bidirectional LSTM, a 2D neural network and other deep learning models are used to extract the semantic matching features of question-answer pairs. Self-organizing mapping (SOM) is used to classify and identify sentence features. The attention-based neural machine translation model is taken as the baseline system. Experimental results show that this framework significantly improves the adequacy of English-Chinese machine translation and achieves better results than the traditional attention-based English-Chinese machine translation model.
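The baseline in this abstract is an attention-based NMT model, but its scoring function is not stated. The sketch below assumes simple dot-product scoring: the decoder query is compared with each encoder state, a softmax turns the scores into weights, and the weighted sum of encoder states gives the context vector. Vectors are plain Python lists and the function names are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, encoder_states):
    """Dot-product attention: weights over source positions and the context vector."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context
```

The attention weights show which source words the decoder is focusing on at each step, which is where repeated or skipped source words become visible.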
Because emergency medicine distribution is vital to responding quickly to urgent demand when an epidemic occurs, this study explores an optimal vaccine distribution approach based on the epidemic diffusion rule and the different degrees of urgency of the affected areas, against the background of an epidemic outbreak in a given region. First, the SIQR (susceptible, infected, quarantined, recovered) epidemic model with pulse vaccination is introduced to describe the epidemic diffusion rule and obtain the vaccine demand in each pulse. Based on the SIQR model, the affected areas are clustered using a self-organizing map (SOM) neural network to classify them by urgency. Then, a dynamic vaccine distribution model is formulated that incorporates the clustering results, with the goals of both reducing transportation cost and decreasing unsatisfied demand in the emergency logistics network. A numerical study with twenty affected areas and four distribution centers is carried out. The results indicate that the proposed approach contributes substantially to controlling affected areas with a relatively high degree of urgency, and comparisons show that the clustering method outperforms the non-clustering method in controlling epidemic diffusion.
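The abstract gives neither the SIQR equations nor parameter values. The sketch below assumes a closed population with transitions S to I (rate beta*S*I/N), I to Q (rate delta), I to R (rate gamma) and Q to R (rate eps), integrated with forward Euler; every `pulse_every` days a fraction `pulse_frac` of susceptibles is vaccinated and moved to R, and that count is taken as the pulse's vaccine demand. All parameter names and defaults are hypothetical.

```python
def simulate_siqr(s0, i0, days, beta=0.3, delta=0.1, gamma=0.05, eps=0.1,
                  pulse_every=30, pulse_frac=0.2, dt=0.1):
    """Euler simulation of an assumed SIQR model with pulse vaccination.

    Returns the final (S, I, Q, R) and the vaccine demand of each pulse.
    """
    s, i, q, r = float(s0), float(i0), 0.0, 0.0
    n = s + i  # closed population: no births or deaths
    steps_per_day = round(1 / dt)
    demands = []
    for step in range(days * steps_per_day):
        new_inf = beta * s * i / n
        ds = -new_inf
        di = new_inf - (delta + gamma) * i
        dq = delta * i - eps * q
        dr = gamma * i + eps * q
        s, i, q, r = s + ds * dt, i + di * dt, q + dq * dt, r + dr * dt
        # At the end of every pulse_every-th day, vaccinate a fraction of S
        if (step + 1) % steps_per_day == 0 and ((step + 1) // steps_per_day) % pulse_every == 0:
            vaccinated = pulse_frac * s
            s -= vaccinated
            r += vaccinated
            demands.append(vaccinated)
    return (s, i, q, r), demands
```

Running the model per affected area would give each area's pulse demands and infection curve, which could then serve as clustering features for grading urgency.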
Clustering is the main method for deinterleaving radar pulses using multiple parameters. However, the difficulty in clustering radar pulses lies in finding the right number of clusters. To solve this problem, a method based on self-organizing feature maps (SOFM) and Composed Density between and within clusters (CDbw) is proposed. The method first extracts features of the direction-of-arrival (DOA) data with SOFM, exploiting the characteristics of the DOA parameter, and then clusters the SOFM output. The right number of clusters is found by computing the cluster validity index CDbw. Simulation results show that the method is effective in sorting DOA data.
Because catchments in mountainous regions, even adjacent ones, respond differently hydrologically, there are many motivations for classifying them into homogeneous clusters. These include prediction in ungauged basins (PUB), model parameterization, understanding the potential impact of environmental changes, and transferring information from gauged to ungauged catchments. This study investigated catchment similarity using pure hydro-climatological time series for a 14-year period from 2001 to 2015. The data set comprises more than 13,000 monthly station records of streamflow, rainfall and temperature from 27 catchments in Utah, one of the eight mountain states of the USA. Homogeneous catchments were identified, analyzed and interpreted using four clustering approaches: K-means, Ward, SOM (self-organizing map), and a newly proposed wavelet-entropy-based SOM (WE-SOM) method. Based on two clustering evaluation criteria, 3, 5 and 6 clusters were determined to be the best numbers of clusters, depending on the method employed, with each cluster representing different hydro-climatological behavior. Although the input data matrix contained no geographic characteristics, the results indicated a regionalization consistent with topographic characteristics. Given that the hydrological behavior of catchments depends on physiographic characteristics, WE-SOM performed better than the three conventional clustering methods by providing more clusters. WE-SOM appears to be a promising approach to catchment clustering. It preserves the topological structure of the data, as shown by its division of the data into a larger number of distinct clusters, with catchments of similar altitude in each cluster. The results showed the ability of wavelets to quantify the temporal variability of temperature, rainfall and streamflow, thereby contributing to the regionalization of diverse catchments.
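The abstract does not spell out how WE-SOM builds its features. A widely used definition of wavelet entropy, assumed here, is the Shannon entropy of the relative energy in each level of a discrete wavelet decomposition; a low value means the variability is concentrated at one time scale. The Haar wavelet and the default of three levels are illustrative choices; in a WE-SOM-style pipeline, such values computed for each catchment's streamflow, rainfall and temperature series would form the SOM input.

```python
import math


def haar_levels(signal, levels):
    """Energies of the Haar detail coefficients per level plus the final approximation."""
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = range(0, len(approx) - 1, 2)  # an odd trailing sample is dropped
        detail = [(approx[k] - approx[k + 1]) / math.sqrt(2) for k in pairs]
        approx = [(approx[k] + approx[k + 1]) / math.sqrt(2) for k in pairs]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def wavelet_entropy(signal, levels=3):
    """Shannon entropy of the relative wavelet energy distribution."""
    energies = haar_levels(signal, levels)
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

With `levels` decomposition levels the entropy is bounded by log(levels + 1), reached when energy is spread evenly across all scales.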
With the growth of web-based documents, the need for automatic document clustering and text summarization has increased. Document summarization, which extracts the essential content with appropriate information, removes unnecessary data and presents the result cohesively and coherently, is among the most challenging tasks. In this research, a novel intelligent model for document clustering is designed using a graph model and fuzzy-based association rule generation (gFAR). First, the graph model maps the relationships among the (multi-source) data; document clustering is then established by generating association rules using fuzzy concepts. The method eliminates redundancy by mapping relevant documents with the graph model, and it reduces time consumption and improves accuracy through fuzzy association rule generation. The framework provides interpretable document clustering. It iteratively reduces the error rate during relationship mapping among the data (clusters) with the assistance of weighted document content. The model also represents the significance of data features with class discrimination, which helps measure feature significance during clustering. Simulations were run in MATLAB 2016b and evaluated with empirical measures, namely Relative Risk Patterns (RRP), the ROUGE score and the Discrimination Information Measure (DMI). The DailyMail and DUC 2004 datasets are used to obtain the empirical results. The proposed gFAR model gives a better trade-off than various existing approaches.
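The abstract does not define how gFAR scores its fuzzy rules. A common formulation, assumed here, takes the fuzzy support of an itemset as the mean over documents of the minimum membership degree of its items (the min t-norm), and a rule's confidence as support(antecedent plus consequent) divided by support(antecedent). The item names and membership values in the example are hypothetical.

```python
def fuzzy_support(transactions, itemset):
    """Fuzzy support: mean over transactions of the minimum membership (min t-norm).

    transactions: list of dicts mapping a fuzzy item (e.g. a term paired with
    a linguistic level) to its membership degree in that document.
    """
    if not transactions:
        return 0.0
    total = 0.0
    for t in transactions:
        total += min(t.get(item, 0.0) for item in itemset)
    return total / len(transactions)


def fuzzy_confidence(transactions, antecedent, consequent):
    """Confidence of the fuzzy rule antecedent -> consequent."""
    base = fuzzy_support(transactions, antecedent)
    if base == 0:
        return 0.0
    return fuzzy_support(transactions, list(antecedent) + list(consequent)) / base
```

Rules whose support and confidence exceed chosen thresholds would then link documents into clusters; the thresholds themselves are not given in the abstract.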
In this paper, the authors present three algorithms for data clustering: Self-Organizing Map (SOM), Neural Gas (NG) and Fuzzy C-Means (FCM). SOM and NG are based on competitive learning. An important property of these algorithms is that they preserve the topological structure of the data: data that are close together in the input distribution are mapped to nearby locations in the network. FCM is based on soft clustering, which means that the clusters are not necessarily distinct but may overlap. This can be very useful in many biological problems, for instance in genetics, where a gene may belong to several clusters. The algorithms are compared in terms of how they visualize the clustering of proteomic data.
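The soft-clustering property of FCM comes from its membership update. The sketch below implements the standard FCM membership and centre updates with fuzzifier `m` (2.0 is a conventional default); a full FCM run alternates the two functions until the memberships stop changing. Points lying exactly on a centre receive crisp membership to avoid division by zero.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update.

    u[j][i] = 1 / sum_k (d(x_j, c_i) / d(x_j, c_k)) ** (2 / (m - 1))
    Each row sums to 1, so a point can belong partly to several clusters.
    """
    exp = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:  # point sits exactly on a centre: crisp membership
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            memberships.append([v / total for v in row])
            continue
        memberships.append([1.0 / sum((di / dk) ** exp for dk in dists) for di in dists])
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Centre update: mean of the points weighted by membership ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        w = [row[i] ** m for row in memberships]
        total = sum(w)
        centers.append([sum(wj * x[d] for wj, x in zip(w, points)) / total
                        for d in range(dim)])
    return centers
```

A point halfway between two centres gets membership 0.5 in each, which is exactly the overlap that makes FCM attractive for genes belonging to more than one cluster.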
This paper proposes a method for clustering non-segmented documents using a self-organizing map (SOM) and the frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely and successfully used for document clustering in many applications. However, when it is applied to non-segmented documents, the challenge is to identify interesting patterns efficiently. The proposed method has two main phases: preprocessing and clustering. In the preprocessing phase, the frequent max substring technique is first applied to discover patterns of interest, called frequent max substrings, which are long and frequent substrings rather than individual words, from the non-segmented texts. These patterns are then used as indexing terms, and the indexing terms together with their numbers of occurrences form a document vector. In the clustering phase, SOM generates the document cluster map from the frequent-max-substring feature vectors. To demonstrate the technique, experimental studies and comparisons on clustering Thai text documents, which consist of non-segmented text, are presented. The results show that the technique can be used for Thai text, and that the generated document cluster map can be used to find relevant documents more efficiently.
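The exact frequent max substring algorithm is not given in the abstract. The brute-force sketch below captures the idea as described: count substrings across unsegmented documents, keep those that occur in at least `min_freq` documents, and discard any frequent substring contained in a longer frequent one. The length bounds and the document-frequency criterion are assumptions; a practical implementation would use a suffix-based index instead of enumerating every substring.

```python
from collections import Counter


def frequent_max_substrings(docs, min_freq=2, min_len=2, max_len=10):
    """Brute-force sketch: frequent substrings not contained in a longer frequent one.

    A substring counts as frequent if it occurs in at least `min_freq` documents.
    Cost grows quadratically with document length, so this suits short texts only.
    """
    doc_freq = Counter()
    for doc in docs:
        seen = set()
        for i in range(len(doc)):
            for j in range(i + min_len, min(len(doc), i + max_len) + 1):
                seen.add(doc[i:j])
        doc_freq.update(seen)  # count each substring once per document
    frequent = {s for s, c in doc_freq.items() if c >= min_freq}
    # Keep only maximal strings: drop any frequent string inside a longer one
    return sorted(s for s in frequent
                  if not any(s != t and s in t for t in frequent))
```

Because Python strings are Unicode, the same function applies unchanged to Thai text; the returned substrings would serve as the indexing terms of the document vectors.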
Introduction: Data clustering is an important field of machine learning with applications in many areas, such as business analysis, manufacturing, energy, healthcare, travel and logistics. A variety of clustering applications have already been developed. Data clustering approaches based on the self-organizing map (SOM) generally use map (grid) dimensions ranging from 2 × 2 to 8 × 8 (4–64 neurons, or microclusters) without any explicit reason for choosing a particular dimension, so optimized results are not obtained. These algorithms use a secondary approach to map the microclusters to a lower dimension (the actual number of clusters, e.g., 2, 3 or 4), based on the optimum number of clusters in the specific data set. In most works, this secondary approach is not SOM but another algorithm, such as cut tree.
Methods: The proposed approach shows how to select the optimal higher dimension of the SOM for a given data set; this dimension is then clustered again into the lower, actual dimension. Both the primary and secondary stages use SOM to cluster the data, and the weight matrix of the SOM is found to be very meaningful. The optimal two-dimensional SOM configuration differs from one data set to another, and this work also attempts to discover it.
Results: The adjusted Rand index obtained on Iris, Wine, Wisconsin Diagnostic Breast Cancer, New Thyroid, Seeds, A1, Imbalance, Dermatology, Ecoli and Ionosphere is 0.7173, 0.9134, 0.7543, 0.8041, 0.7781, 0.8907, 0.8755, 0.7543, 0.5013 and 0.1728, respectively, which outperforms all other results available on the web, with no attribute reduction performed in this work.
Conclusions: SOM is found to be superior to or on par with other clustering approaches, such as k-means, and could be used successfully to cluster all types of data sets. Ten benchmark data sets from diverse domains, such as medical, biological and chemical, including synthetic data sets, are tested in this work.
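The results above are reported as adjusted Rand index (ARI) values. For reference, the sketch below computes the standard ARI of Hubert and Arabie from the contingency table of two labelings; it equals 1.0 for identical partitions (up to renaming of labels) and is about 0 in expectation for random ones.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same samples."""
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples")
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7 (about 0.571): splitting one true cluster is penalised, but less than a random assignment would be.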
Most methods for classifying remote sensing data are based on statistical parameter estimation and assume that the samples follow a normal distribution. However, more accurate classification results can be obtained with neural network methods, which acquire knowledge from the environment and adjust their parameters (weights) step by step according to a specific measure. This paper focuses on the double-layer Kohonen self-organizing feature map (SOFM), in which all neurons in the two layers are interconnected and the neurons of the competition layer are also laterally connected. The effective competition and suppression in this structure improve its self-adaptive learning ability. SOFM has become a hot topic in remote sensing classification research. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) is a new satellite-borne remote sensing instrument with three 15-m resolution bands and three 30-m resolution bands in the near infrared. ASTER data of Dagang District, Tianjin Municipality, are used as the test data in this study. First, wavelet fusion is applied to make the spatial resolutions of the ASTER bands identical; then SOFM is applied to classify the land cover types. The classification results are compared with those of the maximum likelihood method (MLH). Overall, SOFM increases classification accuracy by about 7%, and for the town class in particular its accuracy is almost twice that of MLH.
This paper reports the classification of 90 sample pavilions at the Shanghai World Expo. An artificial-intelligence-based nonlinear clustering method, the self-organizing map (SOM), is used to classify the pavilions. SOM is an efficient tool for visualizing multidimensional data. Four characteristics are used for the classification: the Hurst exponent of queue length, the Hurst exponent of waiting time, mean queue length and mean waiting time. The results show that the Shanghai World Expo pavilions can be optimally classified into four classes. This result will inform further studies on how to manage queues at World Expo pavilions in the future.
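The abstract uses Hurst exponents of queue length and waiting time as features but does not say how they were estimated. The sketch below assumes the classical rescaled-range (R/S) method: split the series into windows of doubling size, average R/S for each size, and take the least-squares slope of log(R/S) against log(window size). Values near 0.5 suggest no long-range dependence, and values above 0.5 suggest persistence. `min_window` is an illustrative default, and R/S estimates are known to be affected by small-sample bias.

```python
import math
import random


def rescaled_range(window):
    """R/S statistic of one window: range of cumulative deviations over std."""
    mean = sum(window) / len(window)
    cum = lo = hi = 0.0
    for x in window:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    return (hi - lo) / std if std > 0 else 0.0


def hurst_rs(series, min_window=8):
    """Hurst exponent as the log-log slope of mean R/S against window size."""
    xs, ys = [], []
    w = min_window
    while w <= len(series) // 2:
        chunks = [series[k:k + w] for k in range(0, len(series) - w + 1, w)]
        rs = [v for v in (rescaled_range(c) for c in chunks) if v > 0]
        if rs:
            xs.append(math.log(w))
            ys.append(math.log(sum(rs) / len(rs)))
        w *= 2
    if len(xs) < 2:
        raise ValueError("series too short for the chosen min_window")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # White noise has no long memory, so the estimate should land roughly near 0.5
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
    print(round(hurst_rs(noise), 2))
```

Applied to each pavilion's queue-length and waiting-time series (hypothetical inputs here), the two estimates would become two of the four SOM features.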
Amino acids are the dominant organic components of processed animal proteins; however, there has been limited investigation of differences in their composition between protein sources. Information on these differences will not only help their further utilization but also provide fundamental information for developing species-specific identification methods. In this study, self-organizing feature maps (SOFM) were used to visualize the amino acid composition of fish meal and of meat and bone meal (MBM) produced from poultry, ruminants and swine. SOFM displays the similarities and differences in amino acid composition between protein sources and effectively improves data transparency. Amino acid composition proved useful for distinguishing fish meal from MBM because of large differences in glycine, lysine and proline concentrations. However, the amino acid compositions of the three MBMs were quite similar. The SOFM results were consistent with those of analysis of variance and principal component analysis but more straightforward. SOFM showed a robust sample-linkage capacity and can serve as a powerful means of linking different samples for further data mining.
Dual clustering groups objects in both the spatial and non-spatial domains, which traditional clustering methods cannot handle well. However, recent dual clustering research has often ignored spatial outliers, determined the weights of hybrid distance measures subjectively, and produced divergent clustering results. In this study, we first redefined the dual clustering problem and related concepts to highlight the clustering criteria. We then presented a self-organizing dual clustering algorithm (SDC) based on the self-organizing feature map and spatial analysis operations, including the Voronoi diagram and polygon aggregation and amalgamation. The algorithm employs a hybrid distance measure that combines geometric distance and non-spatial similarity, and clustering spectrum analysis helps determine the weight of non-spatial similarity in the measure. A case study was conducted on a spatial database of urban land price samples in Wuhan, China. SDC detected spatial outliers and clustered the points into spatially connected and attribute-homogeneous subgroups. In particular, SDC revealed zonal areas that describe the actual distribution of land prices but were not revealed by other methods. SDC reduced the subjectivity of dual clustering.
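The abstract describes a hybrid distance that combines geometric distance and non-spatial similarity, with the weight of the non-spatial part chosen by clustering spectrum analysis. The sketch below assumes a simple linear combination of the two distances, each divided by a normalising scale; the tuple layout, `geo_scale`, `attr_scale` and the linear form are hypothetical, and the weight selection itself is not shown.

```python
import math


def hybrid_distance(p, q, w, geo_scale, attr_scale):
    """Assumed linear mix of spatial and non-spatial dissimilarity.

    p, q: (x, y, attributes) tuples, where attributes is a tuple of numbers
    w: weight of the non-spatial part, between 0 and 1
    Each part is divided by a scale so the two are comparable.
    """
    geo = math.dist(p[:2], q[:2]) / geo_scale
    attr = math.dist(p[2], q[2]) / attr_scale
    return (1 - w) * geo + w * attr
```

With w = 0 this reduces to purely spatial clustering and with w = 1 to purely attribute-based clustering; sweeping w between the two is the kind of spectrum over which a weight could be chosen.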
Funding: National Office of Philosophy and Social Sciences (19AZD019); National Ethnic Affairs Commission (2020-GMB-015).
Abstract: The ongoing COVID-19 outbreak has become a worldwide pandemic, with increasing confirmed cases and deaths across the globe. By July 2022, the cumulative number of confirmed cases reported to the World Health Organization (WHO) had risen to 550 million, with more than 6 million deaths in total. Analysis of its epidemic risk has long been a focus of attention all over the world. The self-organizing feature map (SOM), a vector quantization method, offers a data mapping approach for tracking the response of time series data on a well-trained map. This study tracks the trajectory of COVID-19 epidemic risk in 237 countries, measured by the number of new confirmed cases and deaths per day over more than one year. A hybrid clustering method uses SOM and K-means to generate a risk map and then displays the trajectory of daily risk on the map. The experimental results demonstrate the promising functionality of SOM for trajectory tracking and give experts insights into the dynamic changes of COVID-19 risk.
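Here is a minimal sketch of the hybrid pipeline described above, under stated assumptions. A small rectangular SOM with a Gaussian neighbourhood and a linearly decaying learning rate is trained on daily feature vectors. Plain K-means then groups the trained node weights into risk levels. A country's trajectory is the sequence of best-matching units for its successive days. The grid size, decay schedule, `k` and the toy data are illustrative choices, not the study's settings.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda n: sum((w - v) ** 2 for w, v in zip(nodes[n], x)))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighbourhood radius
            bmu = best_matching_unit(nodes, x)
            for node, w in nodes.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return nodes


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's K-means; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def risk_trajectory(daily, k=2, **som_kwargs):
    """Train the SOM, cluster its nodes with K-means, map each day to (node, risk label)."""
    nodes = train_som(daily, **som_kwargs)
    keys = sorted(nodes)
    node_label = dict(zip(keys, kmeans([nodes[n] for n in keys], k)))
    path = []
    for x in daily:
        bmu = best_matching_unit(nodes, x)
        path.append((bmu, node_label[bmu]))
    return path


# Toy series for one country: ten low-risk days, then ten high-risk days,
# each day being (normalised new cases, normalised new deaths).
days = [(0.1 + 0.01 * i, 0.1) for i in range(10)] + [(0.9, 0.9 - 0.01 * i) for i in range(10)]
path = risk_trajectory(days)
print(path[0], path[-1])
```

Clustering the map nodes rather than the raw days keeps the risk levels fixed on the map. New days can then be placed on it by a single best-matching-unit lookup.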
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51075323).
Abstract: The feature space extracted from vibration signals with various faults is often nonlinear and high-dimensional. Nonlinear dimensionality reduction methods, such as manifold learning, are currently available for extracting low-dimensional embeddings. However, these methods all rely on manual intervention, which limits their stability and their ability to suppress disturbance noise. To extract features automatically, a manifold learning method with self-organization mapping is introduced for the first time. Under the non-uniform sample distribution reconstructed in phase space, the expectation maximization (EM) iteration algorithm is used to divide the local neighborhoods adaptively without manual intervention. After that, the local tangent space alignment (LTSA) algorithm is adopted to compress the high-dimensional phase space into a more faithful low-dimensional representation. Finally, the signal is reconstructed by kernel regression. Several typical cases are analyzed, including the Lorenz system, an engine fault with a piston pin defect, and a bearing fault with an outer-race defect. Compared with LTSA and the continuous wavelet transform, the results show that the background noise can be fully suppressed and the entire periodic repetition of impact components is well separated and identified. A new way to automatically and precisely extract the impulsive components from mechanical signals is thus proposed.
Abstract: Cytogenetic maps of four clusters of disease resistance genes were generated by in situ hybridization (ISH) onto maize chromosomes. The probes were the two RFLP markers tightly linked to and flanking each maize resistance gene, together with cloned resistance genes from other plant species, and the maps were combined with previously published data. These genes include:
- the Helminthosporium turcicum Pass. resistance genes Ht1, Htn1 and Ht2;
- the Helminthosporium maydis Nisik. resistance genes Rhm1 and Rhm2;
- the maize dwarf mosaic virus resistance gene Mdm1;
- the wheat streak mosaic virus resistance gene Wsm1;
- the Helminthosporium carbonum Ullstrup resistance gene Hm1;
- the cloned Xanthomonas oryzae pv. oryzae resistance gene Xa21 of rice;
- the Cladosporium fulvum resistance genes Cf-9 and Cf-2.1 of tomato;
- and the Pseudomonas syringae resistance gene RPS2 of Arabidopsis.

Most of the tested disease resistance genes were located on four chromosomes, i.e., chromosomes 1, 3, 6 and 8. They were closely distributed in the interstitial regions of the long arms of these chromosomes, at percentage distances ranging from 31.44 (±3.72) to 72.40 (±3.25). The exceptions were genes Rhm1, Rhm2, Mdm1 and Wsm1, which mapped to the satellite of the short arm of chromosome 6. The results showed that the tested RFLP markers and genes were duplicated or triplicated in the maize genome. The homology and conservation of disease resistance genes among species, and the relationship between the distribution features and functions of the genes, are discussed. The results provide an important scientific basis for a deeper understanding of the structure and function of disease resistance genes and for maize breeding.
Abstract: Traditional Chinese-English translation models tend to translate some source words repeatedly while mistakenly ignoring others. Therefore, we propose a novel English-Chinese neural machine translation model based on a self-organizing mapping neural network and deep feature matching. In this model, word vectors, a bidirectional LSTM, a 2D neural network and other deep learning models are used to extract the semantic matching features of question-answer pairs. Self-organizing mapping (SOM) is used to classify and identify the sentence features. An attention-based neural machine translation model is taken as the baseline system. The experimental results show that this framework significantly improves the adequacy of English-Chinese machine translation. It also achieves better results than the traditional attention-based English-Chinese machine translation model.
Funding: The National Natural Science Foundation of China (No. 70671021).
Abstract: Emergency medicine distribution is vital for responding quickly to urgent demand when an epidemic occurs. Against the background of an epidemic outbreak in a given region, this study therefore explores an optimal vaccine distribution approach based on the epidemic diffusion rule and the differing urgency of the affected areas. First, the SIQR (susceptible, infected, quarantined, recovered) epidemic model with pulse vaccination is introduced to describe the epidemic diffusion rule and obtain the vaccine demand at each pulse. Based on the SIQR model, the affected areas are clustered using a self-organizing map (SOM) neural network to qualify the results. Then, a dynamic vaccine distribution model is formulated that incorporates the clustering results for the affected areas. Its goals are both to reduce transportation cost and to decrease unsatisfied demand in the emergency logistics network. A numerical study with twenty affected areas and four distribution centers is carried out. The numerical results indicate that the proposed approach contributes substantially to controlling the affected areas with a relatively high degree of urgency. The comparison also shows that the clustering method outperforms the non-clustering method in controlling epidemic diffusion.
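The abstract does not state the SIQR equations or parameters. The sketch below assumes a common closed-population form (no births or deaths) and integrates it with forward Euler. Pulse vaccination is applied by moving a fraction `p` of susceptibles to the recovered class every `pulse_every` days. The dose count at each pulse stands in for the per-pulse vaccine demand. All parameter values are illustrative.

```python
def simulate_siqr(n=10_000, i0=10, beta=0.4, gamma=0.1, delta=0.05, eps=0.1,
                  pulse_every=10.0, p=0.2, days=60.0, dt=0.1):
    """Forward-Euler SIQR model with pulse vaccination (assumed form).

    dS/dt = -beta*S*I/N
    dI/dt =  beta*S*I/N - (gamma + delta)*I   (gamma: recovery, delta: quarantine)
    dQ/dt =  delta*I - eps*Q                  (eps: release/recovery from quarantine)
    dR/dt =  gamma*I + eps*Q
    Every `pulse_every` days a fraction p of S is vaccinated (moved to R).
    Returns the final (S, I, Q, R) and the doses used at each pulse.
    """
    s, i, q, r = float(n - i0), float(i0), 0.0, 0.0
    steps_per_pulse = round(pulse_every / dt)
    doses_per_pulse = []
    for step in range(1, round(days / dt) + 1):
        new_inf = beta * s * i / n * dt
        ds = -new_inf
        di = new_inf - (gamma + delta) * i * dt
        dq = (delta * i - eps * q) * dt
        dr = (gamma * i + eps * q) * dt
        s, i, q, r = s + ds, i + di, q + dq, r + dr
        if step % steps_per_pulse == 0:
            doses = p * s              # vaccine demand at this pulse
            s -= doses
            r += doses
            doses_per_pulse.append(doses)
    return (s, i, q, r), doses_per_pulse


state, doses = simulate_siqr()
print("final S, I, Q, R:", [round(v) for v in state])
print("doses per pulse:", [round(d) for d in doses])
```

Because the flows between compartments cancel, the total population is conserved. Per-area demand curves like `doses` are the kind of input a clustering and distribution model could then consume.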
Abstract: Clustering is the main method for deinterleaving radar pulses using multiple parameters. However, the difficulty in clustering radar pulses lies in finding the right number of clusters. To solve this problem, a method based on Self-Organizing Feature Maps (SOFM) and Composed Density between and within clusters (CDbw) is proposed. The method first uses SOFM to extract features from the Direction of Arrival (DOA) data, exploiting the characteristics of the DOA parameter, and then clusters the SOFM output. The right number of clusters is found by computing the cluster validity index CDbw. Simulation results show that the method is effective in sorting DOA data.
Abstract: Catchments in mountainous regions, even adjacent ones, show different hydrological responses, so there are many motivations for classifying them into homogeneous clusters. These motivations include prediction in ungauged basins (PUB), model parameterization, understanding the potential impact of environmental changes, and transferring information from gauged catchments to ungauged ones. The present study investigated the similarity of catchments using only the hydro-climatological time series of a 14-year period from 2001 to 2015. The data sets comprise more than 13,000 station-months of streamflow, rainfall, and temperature data obtained from 27 catchments in the state of Utah, one of the eight mountainous states of the USA. Homogeneous catchments were identified, analyzed, and interpreted by applying four clustering approaches: K-means, Ward, SOM (Self-Organized Map), and a newly proposed Wavelet-Entropy-based SOM (WE-SOM) clustering method. Two clustering evaluation criteria determined 3, 5, and 6 clusters as the best numbers of clusters, depending on the method employed, with each cluster representing different hydro-climatological behavior. Although the input data matrix contained no geographic characteristics, the results indicated a regionalization in agreement with topographic characteristics. The hydrological behavior of catchments depends on their physiographic aspects and characteristics. Considering this, the WE-SOM method demonstrated more acceptable performance than the three conventional clustering methods by providing more clusters. WE-SOM appears to be a promising approach to catchment clustering. It preserves the topological structure of the data. This is reflected in its division of the data into a greater number of distinct clusters, with catchments of similar altitude in each cluster.
The results showed the ability of wavelets to quantify the time-based variability of temperature, rainfall, and streamflow, thereby contributing to the regionalization of diverse catchments.
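The abstract does not spell out how the wavelet-entropy features are computed. One widely used definition is the Shannon entropy of the relative wavelet energies across decomposition levels. The sketch below assumes that definition, an orthonormal Haar wavelet, three levels, and padding of odd-length inputs. It is not the authors' WE-SOM feature extractor.

```python
import math


def haar_bands(signal, levels):
    """Orthonormal multilevel Haar DWT: returns [d1, ..., dL, aL] (details, then final approximation)."""
    approx = [float(v) for v in signal]
    bands = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:            # assumption: pad odd lengths by repeating the last value
            approx.append(approx[-1])
        pairs = list(zip(approx[0::2], approx[1::2]))
        bands.append([(a - b) / math.sqrt(2.0) for a, b in pairs])  # detail coefficients
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]       # approximation coefficients
    bands.append(approx)
    return bands


def wavelet_entropy(signal, levels=3):
    """Shannon entropy of the relative energy held in each wavelet band."""
    energies = [sum(c * c for c in band) for band in haar_bands(signal, levels)]
    total = sum(energies)
    if total == 0.0:
        return 0.0
    probs = [e / total for e in energies if e > 0.0]
    return -sum(p * math.log(p) for p in probs)


monthly_flow = [0, 1, 3, 2, 5, 4, 2, 1]   # toy series, not study data
print(round(wavelet_entropy(monthly_flow), 3))
```

A series whose energy sits in a single band scores 0. A series whose energy is spread evenly across bands approaches the log of the number of bands. Each catchment's temperature, rainfall and streamflow series would yield one such feature per variable.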
Abstract: With the growth of web-based documents, the need for automatic document clustering and text summarization has increased. Document summarization is considered one of the most challenging tasks. It involves extracting the essential content with appropriate information, removing unnecessary data, and presenting the data in a cohesive and coherent manner. In this research, a novel intelligent model for document clustering is designed using a graph model and fuzzy-based association rule generation (gFAR). First, the graph model is used to map the relationships among the (multi-source) data. Document clustering is then established by generating association rules using the fuzzy concept. The graph model helps eliminate redundancy by mapping the relevant documents. The fuzzy association rule generation reduces time consumption and improves accuracy. The framework performs document clustering in an interpretable way. It iteratively reduces the error rate during relationship mapping among the data (clusters) with the assistance of weighted document content. The model also represents the significance of data features with class discrimination, which helps measure the significance of features during the clustering process. The simulation is performed in the MATLAB 2016b environment and evaluated with empirical standards such as Relative Risk Patterns (RRP), the ROUGE score, and the Discrimination Information Measure (DMI). The DailyMail and DUC 2004 datasets are used to obtain the empirical results. The proposed gFAR model gives a better trade-off compared with various prevailing approaches.
Abstract: In this paper, the authors present three different algorithms for data clustering: the Self-Organizing Map (SOM), Neural Gas (NG) and Fuzzy C-Means (FCM) algorithms. The SOM and NG algorithms are based on competitive learning. An important property of these algorithms is that they preserve the topological structure of the data. This means that data that are close in the input distribution are mapped to nearby locations in the network. The FCM algorithm is based on soft clustering, which means that the clusters are not necessarily distinct but may overlap. This clustering method may be very useful in many biological problems, for instance in genetics, where a gene may belong to several clusters. The algorithms are compared in terms of how they visualize the clustering of proteomic data.
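The soft-clustering behaviour of FCM shows up clearly in a minimal implementation of the standard update rules. The centres are membership-weighted means, and the memberships come from relative distances with fuzzifier `m`. This is a sketch with illustrative defaults for `m`, the iteration count, the tolerance and the random initialisation, not the code used in the paper. The toy data place one point midway between two groups to show overlapping membership.

```python
import math
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM. Returns (centres, memberships) with memberships[i][j] = u_ij."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):                       # random memberships, each row sums to 1
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / tw for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for x in points:
            dists = [math.dist(x, ctr) for ctr in centres]
            if min(dists) == 0.0:            # point coincides with a centre
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dij / dik) ** (2.0 / (m - 1.0)) for dik in dists)
                              for dij in dists])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centres, u


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (5, 5)]
centres, u = fuzzy_c_means(points)
print([round(v, 2) for v in u[-1]])  # the midpoint's memberships in the two clusters
```

Unlike a hard assignment, the midpoint keeps a substantial membership in both clusters. This is the overlap that makes FCM attractive when, for example, a gene may belong to more than one group.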
Abstract: This paper proposes a method for clustering non-segmented documents using a self-organizing map (SOM) and the frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely used for document clustering and has been successful in many applications. However, when it is applied to non-segmented documents, the challenge is to identify interesting patterns efficiently. The proposed method has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, the frequent max substring technique is first applied to the non-segmented texts to discover patterns of interest called Frequent Max substrings. These are long and frequent substrings rather than individual words. The discovered patterns are then used as indexing terms. The indexing terms, together with their numbers of occurrences, form a document vector. In the clustering phase, SOM generates the document cluster map from the feature vectors of Frequent Max substrings. To demonstrate the proposed technique, experimental studies and comparison results on clustering Thai text documents, which consist of non-segmented texts, are presented. The results show that the proposed technique can be used for Thai texts. The document cluster map generated with the method can be used to find relevant documents more efficiently.
Abstract: Introduction: Data clustering is an important field of machine learning that is applicable in wide areas, such as business analysis, manufacturing, energy, healthcare, traveling, and logistics. A variety of clustering applications have already been developed. Data clustering approaches based on the self-organizing map (SOM) generally use map (grid) dimensions ranging from 2 × 2 to 8 × 8 (4–64 neurons [microclusters]). They do so without any explicit reason for choosing a particular dimension, and therefore optimized results are not obtained. These algorithms use secondary approaches to map the microclusters into the lower dimension (the actual number of clusters), such as 2, 3, or 4, based on the optimum number of clusters in the specific data set. In most of the works observed, the secondary approach is not SOM but another algorithm, such as cut tree. Methods: The proposed approach shows how to select the optimal higher dimension of SOM for a given data set, and this dimension is then clustered again into the actual lower dimension. Both the primary and the secondary stage use SOM to cluster the data, and the weight matrix of the SOM is found to be very meaningful. The optimized two-dimensional configuration of SOM is not the same for every data set, and this work also tries to discover this configuration. Results: The following adjusted Rand index values were obtained, without any reduction of attributes. These values outperform all other results available on the web.

| Data set | Adjusted Rand index |
|---|---|
| Iris | 0.7173 |
| Wine | 0.9134 |
| Wisconsin diagnostic breast cancer | 0.7543 |
| New Thyroid | 0.8041 |
| Seeds | 0.7781 |
| A1 | 0.8907 |
| Imbalance | 0.8755 |
| Dermatology | 0.7543 |
| Ecoli | 0.5013 |
| Ionosphere | 0.1728 |

Conclusions: SOM is found to be superior to or on par with other clustering approaches, such as k-means, and could be used successfully to cluster all types of data sets.
Ten benchmark data sets from diverse domains, such as medical, biological, and chemical, including synthetic data sets, are tested in this work.
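The adjusted Rand index (ARI) reported above compares two labelings through pair counts. It corrects the raw Rand index for chance, so that random labelings score about 0 on average and identical partitions (up to relabelling) score 1. A small self-contained sketch of the standard pair-counting formula follows. It is not the authors' evaluation code. Returning 1.0 in the degenerate cases is a common convention, not something stated in the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs placed together in both labelings (contingency-table cells).
    same_both = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(v, 2) for v in Counter(labels_true).values())
    same_pred = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = same_true * same_pred / total_pairs
    max_index = (same_true + same_pred) / 2
    if max_index == expected:    # e.g. both labelings put every item in one cluster
        return 1.0
    return (same_both - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled → 1.0
```

Because ARI depends only on which items share a cluster, the label values produced by a two-level SOM can be compared directly with the ground-truth classes of a benchmark data set.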
Abstract: Most methods for classifying remote sensing data are based on statistical parameter estimation, under the assumption that the samples follow a normal distribution. However, more accurate classification results can be obtained with the neural network method, which acquires knowledge from the environment and adjusts its parameters (or weights) step by step according to a specific measure. This paper focuses on the double-layer Kohonen self-organizing feature map (SOFM). In this network, all neurons of the two layers are linked to one another, and the neurons of the competition layer are also laterally linked. The self-adaptive learning ability is therefore improved by the effective competition and suppression in this method. The SOFM has become a hot topic in remote sensing data classification research. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) is a new satellite-borne remote sensing instrument with three 15-m resolution bands and three 30-m resolution bands in the near infrared. ASTER data of Dagang District, Tianjin Municipality, are used as the test data in this study. First, wavelet fusion is carried out to make the spatial resolutions of the ASTER data identical. Then, the SOFM method is applied to classify the land cover types. The classification results are compared with those of the maximum likelihood method (MLH). Overall, the classification accuracy of SOFM is about 7% higher, and for the town in particular it is almost twice that of the MLH method.
Funding: Supported by the 973 Research Program under Grant No. 2010CB731500; the National Natural Science Foundation of China under Grant Nos. 71403262, 91024010 and 91324009; the Innovative Team Program under Grant No. GH13041; and the Major Program of the Institute of Policy and Management, Chinese Academy of Sciences, under Grant No. Y201201Z06.
Abstract: This paper reports the classification of 90 sample pavilions at the Shanghai World Expo. An artificial-intelligence-based nonlinear clustering method known as the Self-Organizing Map (SOM) is used to classify the expo pavilions. SOM is an efficient tool for visualizing multidimensional data. Four characteristics are used for the classification: the Hurst exponent of queue length, the Hurst exponent of waiting time, the mean queue length and the mean waiting time. The classification results show that the Shanghai World Expo pavilions can be optimally classified into four classes. This result will shed light on further studies of how to manage queues at World Expo pavilions in the future.
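The abstract uses Hurst exponents of queue length and waiting time as clustering features, but it does not say how they were estimated. Classical rescaled-range (R/S) analysis is one common estimator, and the sketch below assumes it. The doubling window sizes, `min_window` and the synthetic series are illustrative. Values near 0.5 indicate no long-range dependence. Values above 0.5 indicate persistent (trending) behaviour, and values below 0.5 indicate anti-persistence.

```python
import math
import random


def rescaled_range(chunk):
    """R/S statistic of one window: range of cumulative deviations over the standard deviation."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum, path = 0.0, []
    for v in chunk:
        cum += v - mean
        path.append(cum)
    spread = max(path) - min(path)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    return spread / std if std > 0 else 0.0


def hurst_rs(series, min_window=8):
    """Estimate H as the least-squares slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    size = min_window
    while size <= len(series) // 2:
        chunks = [series[i:i + size] for i in range(0, len(series) - size + 1, size)]
        rs = [v for v in (rescaled_range(c) for c in chunks) if v > 0]
        if rs:
            xs.append(math.log(size))
            ys.append(math.log(sum(rs) / len(rs)))
        size *= 2
    if len(xs) < 2:
        raise ValueError("series too short or constant for R/S analysis")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]  # uncorrelated (small windows bias R/S upward)
walk, level = [], 0.0
for step in noise:                                   # integrated noise: strongly persistent
    level += step
    walk.append(level)
print(round(hurst_rs(noise), 2), round(hurst_rs(walk), 2))
```

For a pavilion, the same function would be applied to the queue-length and waiting-time series. The two exponents, together with the two means, would form its four-dimensional SOM input.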
Funding: Supported by the International Science and Technology Cooperation Project, Ministry of Science and Technology, China (2015DFG32170).
Abstract: Amino acids are the dominant organic components of processed animal proteins. However, there has been limited investigation of differences in amino acid composition between protein sources. Information on these differences will not only help in their further utilization but also provide fundamental information for developing species-specific identification methods. In this study, self-organizing feature maps (SOFM) were used to visualize the amino acid composition of fish meal and of meat and bone meal (MBM) produced from poultry, ruminants and swine. SOFM displays the similarities and differences in amino acid composition between protein sources and effectively improves data transparency. Amino acid composition was shown to be useful for distinguishing fish meal from MBM because of large differences in the concentrations of glycine, lysine and proline. However, the amino acid compositions of the three MBMs were quite similar. The SOFM results were consistent with those obtained by analysis of variance and principal component analysis, but more straightforward. SOFM was shown to have a robust sample linkage capacity and to act as a powerful means of linking different samples for further data mining.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 40901188); the Key Laboratory of Geo-informatics of the State Bureau of Surveying and Mapping (Grant No. 200906); and the Fundamental Research Funds for the Central Universities (Grant No. 4082002).
Abstract: Dual clustering performs object clustering in both the spatial and the non-spatial domains, which traditional clustering methods cannot handle well. However, recent dual clustering research has often omitted spatial outliers, subjectively determined the weights of hybrid distance measures, and produced diverse clustering results. In this study, we first redefined the dual clustering problem and related concepts to highlight the clustering criteria. We then presented a self-organizing dual clustering algorithm (SDC) based on the self-organizing feature map and certain spatial analysis operations, including the Voronoi diagram and polygon aggregation and amalgamation. The algorithm employs a hybrid distance measure that combines geometric distance and non-spatial similarity. Clustering spectrum analysis helps determine the weight of non-spatial similarity in the measure. A case study was conducted on a spatial database of urban land price samples in Wuhan, China. SDC detected spatial outliers and clustered the points into spatially connected and attributively homogeneous subgroups. In particular, SDC revealed zonal areas that describe the actual distribution of land prices but were not revealed by other methods. SDC reduced the subjectivity in dual clustering.
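The hybrid distance and the role of its weight can be illustrated with a small sketch. Everything here is a hypothetical stand-in rather than SDC itself. The linear blend with weight `alpha`, the normalising scales, the single-linkage threshold and the four toy samples are all assumptions. Sweeping `alpha` and counting the resulting groups gives only a crude analogue of a clustering spectrum. Purely spatial and purely attribute-based weights merge different pairs, while an intermediate weight in this example separates every sample.

```python
import math


def hybrid_distance(p, q, alpha, geo_scale, attr_scale):
    """Hypothetical hybrid distance: blend of normalised geometric and attribute distances.

    alpha is the weight of non-spatial similarity (0 = purely spatial, 1 = purely attribute).
    """
    d_geo = math.dist(p["xy"], q["xy"]) / geo_scale
    d_attr = abs(p["price"] - q["price"]) / attr_scale
    return (1.0 - alpha) * d_geo + alpha * d_attr


def count_clusters(samples, alpha, threshold, geo_scale, attr_scale):
    """Single-linkage grouping: join samples whose hybrid distance is below `threshold`."""
    parent = list(range(len(samples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if hybrid_distance(samples[i], samples[j], alpha, geo_scale, attr_scale) < threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(samples))})


# Toy land-price samples: A/B and C/D are spatial neighbours; A/C and B/D have similar prices.
samples = [
    {"xy": (0.0, 0.0), "price": 100.0},   # A
    {"xy": (1.0, 0.0), "price": 300.0},   # B
    {"xy": (10.0, 0.0), "price": 105.0},  # C
    {"xy": (11.0, 0.0), "price": 310.0},  # D
]
spectrum = {a: count_clusters(samples, a, 0.2, 11.0, 210.0) for a in (0.0, 0.5, 1.0)}
print(spectrum)  # → {0.0: 2, 0.5: 4, 1.0: 2}
```

The result depends strongly on `alpha`, which is why a principled way of choosing the weight, rather than a subjective one, matters for dual clustering.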