In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. Various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors. However, most of them can only capture a single structural relationship, and their identification accuracy is low on datasets containing multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into parts and handle each part in turn. Likewise, partitioning a dataset into more sub-graphs is a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels at identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing clusters of arbitrary shapes.
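The abstract does not spell out how SNaN is defined. As a rough illustration only, the sketch below implements the commonly used natural-neighbor search in plain Python: grow k until the number of points with no reverse neighbor stops changing, then keep the mutual k-nearest neighbors. It then scores a pair of points by the overlap of their natural-neighbor sets. That overlap score and all function names are assumptions made for illustration. The paper's actual SNaN definition and HC-SNaN merging rules may differ.

```python
import math

def knn_order(points):
    """For each point, the other indices sorted by Euclidean distance."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
            for i in range(n)]

def natural_neighbors(points):
    """Natural neighbor search: increase r until the number of points that are
    nobody's neighbor ("orphans") is zero or stops changing.
    Returns (natural eigenvalue lambda, list of natural-neighbor sets)."""
    n = len(points)
    order = knn_order(points)
    reverse_count = [0] * n
    prev_orphans, r = None, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1   # i names its r-th neighbor
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    lam = r
    knn = [set(order[i][:lam]) for i in range(n)]
    # natural neighbors = mutual lambda-nearest neighbors
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, nan

def shared_natural_neighbors(nan, i, j):
    """Hypothetical SNaN score: how many natural neighbors i and j share."""
    return len(nan[i] & nan[j])

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
lam, nan = natural_neighbors(pts)
print(lam, nan[0], shared_natural_neighbors(nan, 0, 1))
```

In this toy input the two well-separated triangles share no natural neighbors across groups. A shared-neighbor count of this kind is what a hierarchical merge step could threshold on.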
Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare these methods. We offer the correct syntax to deactivate the similarity algorithm for clustering analysis within the hierarchical clustering module of SPSS. Findings: When co-occurrence matrices are entered into the data editor of the SPSS hierarchical clustering module without deactivating the embedded similarity algorithm, the program calculates similarity twice, which distorts and overestimates the degree of similarity. Practical implications: We offer the correct syntax to block the similarity algorithm for clustering analysis in the SPSS hierarchical clustering module when co-occurrence matrices are used. This syntax enables researchers to avoid obtaining incorrect results. Originality/value: This paper presents a method of editing syntax to prevent the SPSS hierarchical clustering module from applying a similarity algorithm by default. This will help researchers, especially those in China, to use co-occurrence matrices properly when performing hierarchical cluster analysis in SPSS, and thus to obtain more scientific and rational results.
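The "similarity calculated twice" problem can be illustrated outside SPSS. The sketch below does not reproduce the paper's SPSS syntax. It uses NumPy on a made-up keyword co-occurrence matrix and contrasts two treatments. In the first, the counts are used directly as similarities. In the second, the matrix is treated as raw variables and a second similarity (Pearson's r between rows, an assumed stand-in for whatever measure SPSS applies) is computed on top of it.

```python
import numpy as np

# Hypothetical keyword co-occurrence counts (symmetric; diagonal left at 0).
C = np.array([[0, 8, 1, 0],
              [8, 0, 2, 1],
              [1, 2, 0, 6],
              [0, 1, 6, 0]], dtype=float)

def counts_as_similarity(C):
    """Intended use: the co-occurrence counts already are the similarities
    (rescaled to [0, 1], with self-similarity set to 1)."""
    S = C / C.max()
    np.fill_diagonal(S, 1.0)
    return S

def similarity_computed_again(C):
    """What happens when the matrix is treated as raw data: a second
    similarity (Pearson's r between rows) is computed from the counts."""
    return np.corrcoef(C)

direct = counts_as_similarity(C)
twice = similarity_computed_again(C)
print(direct[0, 1], twice[0, 1])
```

Keywords 0 and 1 are the most frequently co-occurring pair. They get the maximal direct similarity, yet after the second similarity step their rows turn out negatively correlated. This is the kind of distortion the abstract warns about.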
[Objectives] To explore the compatibility rules of neonatal parenteral nutrition (PN) prescriptions based on association rules and hierarchical cluster analysis, thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy. [Methods] Data on neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services (PIVAS) of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected. The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019. Boxplots of drug dosing were drawn using GraphPad 8.0, and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform the association rule and hierarchical cluster analyses. [Results] A total of 11488 PN prescriptions were collected from 1421 newborns, involving 18 kinds of drugs, which were divided into 11 types of nutrients. Association rule analysis yielded 84 nutrient combinations. The combination fat emulsion-water-soluble vitamins-fat-soluble vitamins-glucose-amino acids had the highest confidence (99.95%). Hierarchical cluster analysis divided the nutrients into 5 types. [Conclusions] Neonatal PN prescriptions were composed of five types of nutrients: amino acids, fat emulsion, glucose, water-soluble vitamins, and fat-soluble vitamins. Where electrolytes and trace elements are lacking, appropriate drugs can be chosen to meet nutritional demands. This study provides a reference for the rational selection of drugs in neonatal PN prescriptions and for further standardization of PN supportive therapy in newborns.
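The confidence figure quoted above is a standard association-rule measure: conf(A → B) = support(A ∪ B) / support(A), where support is the fraction of prescriptions containing an itemset. The minimal sketch below computes it in plain Python. The toy prescriptions are invented and do not come from the study's data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A | B) / support(A); 0 if A never occurs."""
    a = support(transactions, antecedent)
    if a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / a

# Hypothetical PN prescriptions, each a set of nutrient types.
rx = [
    {"amino acids", "glucose", "fat emulsion"},
    {"amino acids", "glucose"},
    {"amino acids", "glucose", "fat emulsion", "water-soluble vitamins"},
    {"glucose"},
]
print(confidence(rx, {"fat emulsion"}, {"amino acids", "glucose"}))
print(confidence(rx, {"glucose"}, {"amino acids"}))
```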
Social networking sites in the modern world are flooded with large volumes of data. Extracting the sentiment polarity of important aspects is necessary, as it helps determine people's opinions from what they write. The coronavirus pandemic has spread across the world and is discussed on social media on a large scale. Within a very short period, tweets reflected the unexpected surge of coronavirus; they express people's opinions and thoughts about coronavirus and its impact on society. The research community has been interested in discovering hidden relationships in short texts such as Twitter and Weibo posts, which is difficult because of their shortness and sparsity. In this paper, a hierarchical Twitter sentiment model (HTSM) is proposed to show people's opinions in short texts. The proposed HTSM has two main features. First, it constructs a hierarchical tree of important aspects from short texts without a predefined hierarchy depth or width. Second, it analyzes the extracted opinions to discover the sentiment polarity of those aspects using the Valence Aware Dictionary and sEntiment Reasoner (VADER). The tweets for each extracted aspect can be categorized as strongly positive, positive, neutral, negative, or strongly negative. The quality of the proposed model is validated by applying it to a popular product and a widespread topic. The results show that the proposed model effectively outperforms state-of-the-art methods for analyzing people's opinions in short texts.
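VADER produces a compound score in [-1, 1]. Its usual three-way cut-offs are compound ≥ 0.05 for positive and ≤ -0.05 for negative. The abstract uses five classes but does not give their thresholds. The sketch below therefore assumes an extra cut-off of ±0.5 for the "strongly" classes and tallies labels per aspect. It takes precomputed compound scores, so it does not depend on the VADER package itself. The aspect names are hypothetical.

```python
from collections import Counter, defaultdict

def five_way_label(compound, weak=0.05, strong=0.5):
    """Map a VADER-style compound score to five classes.
    weak=0.05 follows VADER's usual cut-off; strong=0.5 is an assumed value."""
    if compound >= strong:
        return "strongly positive"
    if compound >= weak:
        return "positive"
    if compound <= -strong:
        return "strongly negative"
    if compound <= -weak:
        return "negative"
    return "neutral"

def tally_by_aspect(scored_tweets):
    """scored_tweets: iterable of (aspect, compound) pairs -> {aspect: Counter}."""
    table = defaultdict(Counter)
    for aspect, compound in scored_tweets:
        table[aspect][five_way_label(compound)] += 1
    return dict(table)

tally = tally_by_aspect([("vaccine", 0.7), ("vaccine", -0.1), ("lockdown", -0.8)])
print(tally)
```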
[Objective] This research aimed to study the FTIR spectra of corn germs and endosperms so as to provide a scientific method for identifying different types of corn. [Method] The corn germs and endosperms of three types were studied using Fourier transform infrared spectroscopy (FTIR) combined with cluster analysis. [Result] The overall characteristics of the original FTIR spectra were basically similar within the range of 700-1800 cm^-1. The FTIR spectra were mainly composed of the absorption peaks of polysaccharides, proteins and lipids. Within the wavenumber range of 700-1800 cm^-1, there were only tiny differences in the original FTIR spectra among the corn germs and endosperms of the three types. The spectra were then processed by taking first and second derivatives, and the second derivative spectra were used for hierarchical cluster analysis (HCA). The results showed that, within the wavenumber range of 700-1800 cm^-1, the second derivative spectra of the 52 samples clustered better according to the three types and according to germ versus endosperm. The correct clustering rate reached 96.1%. [Conclusion] FTIR combined with cluster analysis can be used to identify different types of corn germs and endosperms, and the approach is convenient and rapid.
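The derivative-then-cluster workflow can be sketched with SciPy. The abstract does not say how the derivatives were computed. This sketch assumes a Savitzky-Golay second derivative (`scipy.signal.savgol_filter` with `deriv=2`) and Ward linkage, and it uses synthetic Gaussian bands instead of real corn spectra. The window length and polynomial order are illustrative choices.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.cluster.hierarchy import linkage, fcluster

# Toy wavenumber axis covering the 700-1800 cm^-1 window used in the study.
wavenumbers = np.linspace(700, 1800, 551)

def band(center, width=20.0):
    """Gaussian absorption band (a stand-in for a real FTIR peak)."""
    return np.exp(-((wavenumbers - center) / width) ** 2)

def second_derivative(spectra, window=11, polyorder=3):
    """Savitzky-Golay second derivative along the wavenumber axis (one spectrum per row)."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder,
                         deriv=2, axis=1)

def hca_labels(features, n_clusters):
    """Ward-linkage HCA cut into a fixed number of clusters."""
    return fcluster(linkage(features, method="ward"), t=n_clusters,
                    criterion="maxclust")

# Synthetic data: two groups whose main band is shifted by 80 cm^-1.
rng = np.random.default_rng(0)
group_a = [band(1000) + 0.5 * band(1450) + rng.normal(0, 1e-3, wavenumbers.size)
           for _ in range(3)]
group_b = [band(1080) + 0.5 * band(1450) + rng.normal(0, 1e-3, wavenumbers.size)
           for _ in range(3)]
spectra = np.vstack(group_a + group_b)

labels = hca_labels(second_derivative(spectra), n_clusters=2)
print(labels)
```

Second derivatives suppress broad baseline offsets and sharpen overlapping bands. This is a plausible reason why the derivative spectra separated the groups when the raw spectra looked nearly identical.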
Although the fruits of the leguminous plant Cercis chinensis Bunge have been reported to be antioxidative, they are still overlooked because of the limited information on their phytochemicals. A simple, rapid and sensitive HPLC-MS/MS method was developed for the identification and quantitation of the major bioactive components in C. chinensis fruits. Eighteen polyphenols were identified, all of which are reported in C. chinensis fruits for the first time. Moreover, ten components were quantified simultaneously. The validated quantitative method proved to be sensitive, reproducible and accurate. It was then applied to analyze batches of C. chinensis fruits from different phytomorphs and areas. Principal component analysis (PCA) enabled visualization and dimensionality reduction of the data set. Hierarchical cluster analysis (HCA) indicated that the content of phenolic acids, or of all ten components, might be used to differentiate C. chinensis fruits of different phytomorphs.
The paper deals with cluster analysis and the comparison of clustering methods. Cluster analysis is a multivariate statistical method. It is defined as a general logical technique, or procedure, that groups variable objects into clusters on the basis of similarity or dissimilarity. Cluster analysis involves computational procedures whose purpose is to reduce a data set to several relatively homogeneous groups (clusters). The condition for this reduction is that similarity within clusters is maximal while similarity between clusters is minimal. The similarity of objects is measured by the degree of similarity (correlation coefficient and association coefficient) or by the degree of dissimilarity, i.e. the degree of distance (distance coefficient). According to how they form clusters, cluster analysis methods are classified as hierarchical or non-hierarchical.
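The three families of measures named above can be illustrated with one representative each. The sketch below uses the Euclidean distance as a distance coefficient and Pearson's r as a correlation coefficient. For binary presence/absence data it uses the Jaccard coefficient as an association coefficient, which is one of several possible choices, not necessarily the one the paper uses.

```python
import math

def distance_coefficient(x, y):
    """Euclidean distance: larger values mean less similar objects."""
    return math.dist(x, y)

def correlation_coefficient(x, y):
    """Pearson's r between two numeric profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association_coefficient(x, y):
    """Jaccard coefficient for binary attributes: shared presences / any presence."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 1.0

print(distance_coefficient((0, 0), (3, 4)))
print(correlation_coefficient([1, 2, 3], [2, 4, 6]))
print(association_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```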
The problem of taking a set of data and separating it into subgroups, where the elements of each subgroup are more similar to each other than to elements outside the subgroup, has been studied extensively through the statistical method of cluster analysis. In this paper we discuss the application of this method to the field of education. In particular, we present the use of cluster analysis to separate students into groups that can be recognized and characterized by common traits in their answers to a questionnaire, without any prior knowledge of what form those groups would take (unsupervised classification). We start from a detailed study of the data processing that cluster analysis requires. Two methods commonly used in cluster analysis are then described, first from a theoretical point of view and then, in Section 4, through an example application to data from an open-ended questionnaire administered to a sample of university students. In particular, we describe and critique the variables and parameters used to present the results of the cluster analysis methods.
A genetic algorithm-based joint inversion method is presented for evaluating hydrocarbon-bearing geological formations. Conventional inversion procedures routinely used in the oil industry invert borehole geophysical data locally. Because there are barely more types of data than unknowns at each depth, a set of marginally over-determined inverse problems has to be solved along a borehole, which is a rather noise-sensitive procedure. To reduce the effect of noise, the degree of overdetermination must be increased. To fulfill this requirement, we suggest using our interval inversion method. It inverts all data from a greater depth interval simultaneously to estimate the petrophysical parameters of the reservoirs over that same interval. A discretization scheme based on series expansion provides far more data than unknowns, which significantly reduces the estimation error of the model parameters. Knowledge of reservoir boundaries is also required for reserve calculation. Well logs contain information about layer thicknesses, but this information cannot be extracted by the local inversion approach. We showed earlier that the depth coordinates of layer boundaries can be determined within the interval inversion procedure. The weakness of the method is that the inversion output is strongly influenced by arbitrary assumptions about layer thicknesses made when creating the starting model (i.e. the number of layers and the search domain of the thicknesses). In this study, we apply an automated procedure for determining rock interfaces. Before inversion, we perform multidimensional hierarchical cluster analysis on the well-logging data, which separates the measuring points of different layers on a lithological basis.
As a result, the vertical distribution of clusters furnishes the coordinates of the layer boundaries, which are then used as initial model parameters for the interval inversion procedure. The improved inversion method gives a fast, automatic and objective estimate of layer boundaries and petrophysical parameters, as demonstrated on a hydrocarbon field example.
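The idea of reading layer boundaries off the vertical distribution of clusters can be sketched in a few lines. The sketch standardizes each log, runs Ward-linkage hierarchical clustering (an assumed linkage; the abstract does not name one), and places a boundary midway between consecutive depth samples whose cluster labels differ. The two logs and the three-layer section are synthetic. The genetic-algorithm interval inversion itself is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def layer_boundaries(depth, logs, n_clusters):
    """Cluster log readings (one row per depth sample) and return the labels plus
    boundaries midway between consecutive samples whose labels differ."""
    z = (logs - logs.mean(axis=0)) / logs.std(axis=0)   # standardise each log
    labels = fcluster(linkage(z, method="ward"), t=n_clusters, criterion="maxclust")
    change = np.nonzero(labels[1:] != labels[:-1])[0]
    return labels, (depth[change] + depth[change + 1]) / 2.0

# Synthetic three-layer section: gamma ray (API) and bulk density (g/cm^3).
rng = np.random.default_rng(1)
depth = np.arange(100.0, 130.0)          # 30 samples, 1 m apart
gr = np.repeat([20.0, 80.0, 40.0], 10) + rng.normal(0, 0.5, 30)
rhob = np.repeat([2.65, 2.40, 2.20], 10) + rng.normal(0, 0.005, 30)

labels, bounds = layer_boundaries(depth, np.column_stack([gr, rhob]), n_clusters=3)
print(bounds)
```

In real logs, thin beds or noisy readings can create spurious one-sample label changes. A practical version would likely need a minimum layer thickness before a boundary is accepted.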
A fuzzy clustering analysis model based on quotient space is proposed. First, the conversion from coarse to fine granularity and the hierarchical structure are used to reduce the multidimensional samples. Second, the model's fuzzy compatibility relation matrix is converted into a fuzzy equivalence relation matrix. Finally, a clustering genealogy diagram is generated from the fuzzy equivalence relation matrix. Different thresholds can then be selected dynamically, which effectively solves the problem of cluster analysis for samples with multidimensional attributes.
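A common way to turn a reflexive, symmetric fuzzy compatibility matrix into a fuzzy equivalence matrix is the max-min transitive closure: square R (R∘R) repeatedly until it stops changing. Cutting the result at a threshold λ then yields crisp clusters, and sweeping λ produces the genealogy (dendrogram). The sketch below assumes this standard construction, since the abstract does not detail the conversion, and uses a made-up 4×4 matrix. The quotient-space granulation step is not shown.

```python
def maxmin_compose(R, S):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(R):
    """Square R until it stops changing; for a reflexive, symmetric R the
    result is a fuzzy equivalence relation."""
    while True:
        R2 = maxmin_compose(R, R)
        if R2 == R:
            return R
        R = R2

def lambda_cut_clusters(E, lam):
    """Crisp clusters of a fuzzy equivalence matrix at threshold lam."""
    clusters, seen = [], set()
    for i in range(len(E)):
        if i in seen:
            continue
        members = [j for j in range(len(E)) if E[i][j] >= lam]
        seen.update(members)
        clusters.append(members)
    return clusters

# Hypothetical compatibility matrix (reflexive and symmetric, not yet transitive).
R = [[1.0, 0.8, 0.2, 0.1],
     [0.8, 1.0, 0.4, 0.3],
     [0.2, 0.4, 1.0, 0.6],
     [0.1, 0.3, 0.6, 1.0]]
E = transitive_closure(R)
for lam in (0.3, 0.5, 0.7):
    print(lam, lambda_cut_clusters(E, lam))
```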
To distinguish 8 kinds of rhizome crops, 40 samples were studied by Fourier transform infrared spectroscopy (FTIR) combined with wavelet transform (WT), principal component analysis (PCA) and hierarchical cluster analysis (HCA). The results showed that the infrared spectra were similar overall, but differed in peak position, peak shape and peak absorption intensity in the range of 1800-700 cm^-1. The spectra in this range were selected for continuous wavelet transform (CWT) and discrete wavelet transform (DWT). The 15th-level decomposition coefficients of CWT and the 5th-level detail coefficients of DWT were classified by PCA and HCA. The cumulative contribution rates of the first three principal components were 93.12% for CWT and 89.78% for DWT. The recognition accuracy of both PCA and HCA was 100%. This shows that FTIR combined with WT can be used to distinguish different kinds of rhizome crops.
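The pipeline above takes detail coefficients at a chosen DWT level, runs PCA on them, and reports cumulative contribution rates. The sketch below shows those two steps with NumPy. It uses a plain Haar wavelet as an assumption, since the abstract does not name the wavelet used, and its sign convention may differ from wavelet libraries. PCA is done via SVD, and each component's contribution rate is its share of the total variance.

```python
import numpy as np

def haar_detail(x, level):
    """Detail coefficients of a plain Haar DWT at the given level (level >= 1).
    Odd-length tails are truncated at each step; sign conventions vary between libraries."""
    if level < 1:
        raise ValueError("level must be >= 1")
    approx = np.asarray(x, dtype=float)
    for _ in range(level):
        approx = approx[: approx.size // 2 * 2]
        detail = (approx[0::2] - approx[1::2]) / np.sqrt(2.0)
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    return detail

def pca_contributions(X, n_components=3):
    """PCA via SVD of the column-centred matrix.
    Returns scores, per-component contribution rates and cumulative rates."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, ratio[:n_components], np.cumsum(ratio)[:n_components]

# Usage idea: one row of detail coefficients per spectrum, then PCA on the rows.
# features = np.vstack([haar_detail(s, level=5) for s in spectra])
# scores, rates, cumulative = pca_contributions(features, n_components=3)
```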
Hierarchical clustering analysis and principal component analysis (PCA) were used to assess the similarities and dissimilarities of the entire excitation-emission matrix spectroscopy (EEMs) data sets of samples collected from Jiaozhou Bay, China. The results demonstrate that multivariate analysis simplifies complex data treatment and spectral sorting. It also increases the likelihood of revealing otherwise hidden information about the chemical characteristics of the dissolved organic matter (DOM). The distribution of the different water samples revealed by the multivariate results was used to track the movement of DOM in the study area. This interpretation is supported by the results of a numerical simulation model based on a substance-tracing technique, which show that the substance discharged by the Haibo River can be distributed in Jiaozhou Bay.
Lipopeptides are currently re-emerging as an interesting subgroup in the peptide research field. They have historical applications as antibacterial and antifungal agents and new potential applications as antiviral, antitumor, immune-modulating and cell-penetrating compounds. However, because of their specific structure, their chromatographic analysis often requires special buffer systems or the use of trifluoroacetic acid, which limits mass spectrometry detection. Therefore, we used a traditional aqueous/acetonitrile gradient system containing 0.1% (m/v) formic acid to separate four pharmaceutically relevant lipopeptides (polymyxin B1, caspofungin, daptomycin and gramicidin A1), which were selected based on hierarchical cluster analysis (HCA) and principal component analysis (PCA). In total, the performance of four different C18 columns, including one UPLC column, was evaluated using two parallel approaches. First, a Derringer desirability function was used, whereby six single and multiple chromatographic response values were rescaled into one overall D-value per column. With this approach, the YMC Pack Pro C18 column ranked best for general MS-compatible lipopeptide separation. Second, the kinetic plot approach was used to compare the columns over different flow rate ranges. Because optimal kinetic column performance is obtained at the column's maximal pressure, the length elongation factor λ = Pmax/Pexp was used to transform the experimental data (retention times and peak capacities) and construct kinetic performance limit (KPL) curves. These curves allow a direct visual and unbiased comparison of the selected columns; on this basis, the YMC Triart C18 UPLC and ACE C18 columns performed best.
Finally, differences in column performance and the (dis)advantages of both approaches are discussed.
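A Derringer-style overall desirability rescales each response to a value d in [0, 1]. Each response is either "larger is better" or "smaller is better", with lower and upper bounds. The overall D is the geometric mean of the d values, so any single unacceptable response (d = 0) forces D = 0. The sketch below assumes this standard form with linear scaling (exponent s = 1). The six responses and their bounds used in the study are not given in the abstract, so any bounds passed in are hypothetical.

```python
import math

def d_larger(y, low, high, s=1.0):
    """Desirability for a response where larger is better (e.g. resolution)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def d_smaller(y, low, high, s=1.0):
    """Desirability for a response where smaller is better (e.g. analysis time)."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** s

def overall_desirability(ds):
    """Geometric mean of individual desirabilities; 0 if any response is unacceptable."""
    if any(d == 0 for d in ds):
        return 0.0
    return math.prod(ds) ** (1.0 / len(ds))

# Hypothetical column: resolution 75 on a 50-100 scale, run time 60 min on a 50-100 scale.
print(overall_desirability([d_larger(75, 50, 100), d_smaller(60, 50, 100)]))
```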
This study focuses on the geochemical and bacteriological investigation of surface water and groundwater on the Bamoun plateau (Western Cameroon). From September 2013 to August 2014, 71 samples were collected from two springs, one borehole, four wells and the Nchi stream for analysis of major elements. Seven of these samples were selected to characterize the bacterial species present. The analytical approach combines conventional hydrochemical techniques and multivariate statistical analysis with hydrogeochemical modelling. The results revealed that the waters of the study area range from acidic to basic and are very weakly to weakly mineralized. Four water types were identified: 1) CaMg-HCO₃; 2) CaMg-Cl-SO₄; 3) NaCl-SO₄; and 4) NaK-HCO₃. All major elements complied with the World Health Organization guidelines for drinking-water quality except nitrate, whose concentration exceeded 50 mg/L NO₃⁻ in borehole F401. Bacteriologically, all samples contained all the bacterial species investigated, except those from spring S301 and well P401.
Hydrogeochemical modelling, the Gibbs model and multivariate statistical tests show that the quality of surface water and groundwater in the Foumban locality is influenced by two important factors. The first is natural factors, characterized by water-rock interaction and evapotranspiration/crystallization. The second is anthropogenic factors, such as the uncontrolled discharge of all kinds of liquid and solid effluents into the ground without any prior treatment, and strong urbanization accompanied by a lack of sanitation and insufficient care.
For any city, it is important to analyze its advantages, disadvantages and level of economic development within the country, especially for Chinese cities, which are developing rapidly. Existing studies of Chinese cities have not considered economic and industrial indicators in detail. In this paper, the urban hierarchical structure of 285 cities at or above the prefecture level in China is investigated based on multiple economic and industrial indicators. Indicators of economy, industry, infrastructure, medical care, population, education, culture and employment are selected to establish a new indicator system for analyzing urban hierarchical structure. Factor analysis is used to investigate the relationships between the selected indicator variables. It yields the score on each common factor, as well as comprehensive scores and rankings for the 285 cities. Based on the comprehensive scores, the 285 cities are clustered into 15 levels using the K-means clustering algorithm. This gives the hierarchical structure of cities at or above the prefecture level in China, from which corresponding policy implications are proposed. The results and implications can be applied not only to urban planning and development in China but also serve as a reference for other developing countries. The methodology can also be applied to study urban hierarchical structure in other countries.
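Clustering cities into levels from a single comprehensive score is a one-dimensional K-means problem. The sketch below runs Lloyd's algorithm in plain Python from a deterministic start (evenly spaced order statistics, an assumed initialization) and relabels clusters so that level 1 is the highest-scoring group. The factor-analysis step that produces the comprehensive scores is not shown, and the scores used here are hypothetical.

```python
def kmeans_levels(scores, k, max_iter=100):
    """1-D K-means (Lloyd's algorithm) on comprehensive scores.
    Returns a level per city (1 = highest-scoring cluster) and the centres, highest first."""
    xs = sorted(scores)
    if k == 1:
        centres = [sum(xs) / len(xs)]
    else:
        # deterministic start: evenly spaced order statistics
        centres = [xs[int(i * (len(xs) - 1) / (k - 1))] for i in range(k)]

    def nearest(x):
        return min(range(k), key=lambda c: abs(x - centres[c]))

    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in scores:
            groups[nearest(x)].append(x)
        new = [sum(g) / len(g) if g else centres[c] for c, g in enumerate(groups)]
        if new == centres:
            break
        centres = new

    # rank clusters by centre so that level 1 holds the highest scores
    order = sorted(range(k), key=lambda c: -centres[c])
    level_of = {c: rank + 1 for rank, c in enumerate(order)}
    return [level_of[nearest(x)] for x in scores], sorted(centres, reverse=True)

levels, centres = kmeans_levels([9.0, 8.8, 5.1, 5.0, 1.2, 1.0], k=3)
print(levels, centres)
```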
To quantitatively analyze air traffic operation complexity, multidimensional metrics were selected based on the operational characteristics of traffic flow. Kernel principal component analysis was used to reduce the dimensionality of the metrics and thereby extract the crucial information they contain. Hierarchical clustering was then used to analyze the complexity of different airspaces. Fourteen sectors of the Guangzhou Area Control Center were taken as samples, and the operation complexity of the traffic situation in each sector was calculated from real flight radar data. The clustering analysis verified the feasibility and rationality of the method and provides a reference for airspace operation and management.
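Kernel PCA followed by hierarchical clustering can be sketched with NumPy and SciPy. The sketch below assumes an RBF kernel with an illustrative `gamma` and average linkage, since the abstract names neither the kernel nor the linkage. It centres the kernel matrix in feature space and projects each sample onto the leading components as √λₖ·vₖ. The six-sector metric matrix is invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def rbf_kernel_pca(X, n_components=2, gamma=0.5):
    """Project samples onto the leading kernel principal components (RBF kernel)."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one       # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Hypothetical complexity metrics: six sectors (rows) x three metrics (columns).
metrics = np.array([[1.0, 2.0, 1.5], [1.1, 2.1, 1.4], [0.9, 1.9, 1.6],
                    [5.0, 6.0, 5.5], [5.1, 6.2, 5.4], [4.9, 5.8, 5.6]])
Xs = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)   # z-score each metric
scores = rbf_kernel_pca(Xs, n_components=2)
labels = fcluster(linkage(scores, method="average"), t=2, criterion="maxclust")
print(labels)
```

Standardizing the metrics first keeps a single large-valued metric from dominating the kernel distances. In practice `gamma` would need tuning to the spread of the real data.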
Accurate extraction and classification of leather defects is an important guarantee for automation and quality evaluation in the leather industry. To address the problem of classifying leather defect data, a hierarchical classification of defects is proposed. First, samples are collected using the minimum-rectangle method, and defects are extracted by image processing. Based on their geometric features, defects are roughly classified as dots, lines and surfaces. By analyzing the geometric, gray-level and texture features extracted from the defects, the dominant characteristics can be identified. For each type of defect, different representative characteristics are chosen, which reduces the dimension of the data. Clustering on these characteristics converges effectively, so that defects are extracted accurately and their characteristics are digitized, and finally a database is established. The results show that this method achieves more than 90% accuracy and greatly improves classification accuracy.
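The rough dot / line / surface split can be expressed as a small rule on the defect's bounding-rectangle dimensions. The sketch below is hypothetical. The area threshold (`dot_area`) and elongation threshold (`line_ratio`) are placeholder values, not figures from the paper, and the finer clustering on gray-level and texture features is not shown.

```python
def rough_defect_class(width_px, height_px, dot_area=25, line_ratio=4.0):
    """Rough dot / line / surface split from the defect's bounding rectangle.
    dot_area and line_ratio are hypothetical placeholder thresholds."""
    long_side = max(width_px, height_px)
    short_side = max(min(width_px, height_px), 1)   # avoid division by zero
    if long_side * short_side <= dot_area:
        return "dot"
    if long_side / short_side >= line_ratio:
        return "line"
    return "surface"

for w, h in [(3, 4), (40, 3), (20, 15)]:
    print((w, h), rough_defect_class(w, h))
```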
The quality of K-12 education has been a major concern for years. Previous methods studied only one or two factors affecting school performance, such as school choice or teacher quality, so the results they provide can be limited. We propose a multi-agent approach that integrates multiple actors in a school system, including teachers, students, support staff and administrators. The interactions among these actors form a hierarchical school social network. We first detect the hierarchical community structure of this school network using an agglomerative hierarchical algorithm. Existing agglomerative hierarchical algorithms usually calculate the similarity or dissimilarity between two clusters using some measure of distance between pairs of observations. Instead, we develop a method that calculates similarity from the social interactions between agents, since such interactions are essential in multi-agent systems. Our algorithm is applied to 15 school districts in Bexar County, Texas, and it gives satisfactory results in generating the hierarchical structure of all the school districts. We then use the detected structure of the social network to evaluate the school system's organizational performance. We design and implement a funding evaluation model that decomposes the funding policy task into subtasks. It evaluates these subtasks using funding distribution policies from past years and looks for possible relationships between student performance and funding policies. Experiments in the 15 school districts in Bexar County show no significant correlation between student performance and the amount of funding a school district received.
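Agglomerative clustering driven by interactions rather than geometric distance can be sketched in plain Python. The sketch below assumes average linkage over a pairwise interaction-strength matrix: at each step it merges the two clusters with the highest mean interaction. It records the merge history, which is the hierarchy. The matrix and the agent roles in the comment are hypothetical, and the paper's exact similarity definition may differ.

```python
def agglomerate(S):
    """Average-linkage agglomeration on pairwise interaction strengths S[i][j].
    Returns the merge history as (members_a, members_b, average interaction)."""
    clusters = [frozenset([i]) for i in range(len(S))]
    history = []

    def avg_sim(a, b):
        return sum(S[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = max(((x, y) for idx, x in enumerate(clusters) for y in clusters[idx + 1:]),
                   key=lambda pair: avg_sim(*pair))
        history.append((sorted(a), sorted(b), avg_sim(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history

# Hypothetical interaction counts: agents 0-1 (teacher-student) and 2-3 interact most.
S = [[0, 5, 1, 0],
     [5, 0, 0, 1],
     [1, 0, 0, 4],
     [0, 1, 4, 0]]
for step in agglomerate(S):
    print(step)
```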
Funding for the HC-SNaN (Shared Natural Neighbors) study: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
Funding for the neonatal parenteral nutrition prescription study: Supported by the Science and Technology Research and Development Project of Chengde City, Hebei Province (201706A043) and the Young Scholar Program of the Hebei Pharmaceutical Association Hospital Pharmaceutical Research Project (2020—Hbsyxhqn0029).
Funding for the hierarchical Twitter sentiment model (HTSM) study: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and by the Soonchunhyang University Research Fund.
Funding: Supported by the National Natural Science Foundation of China (30960179) and the Natural Science Foundation of Yunnan Province (2007A048M).
Abstract: [Objective] This research aimed to study the FTIR spectra of corn germs and endosperms in order to provide a scientific method for identifying different types of corn. [Method] The germs and endosperms of three types of corn were studied using Fourier transform infrared spectroscopy (FTIR) combined with cluster analysis. [Result] The overall characteristics of the original FTIR spectra were basically similar within the range of 700-1800 cm⁻¹. The FTIR spectra were mainly composed of the absorption peaks of polysaccharides, proteins and lipids. Within the wavenumber range of 700-1800 cm⁻¹, there were only tiny differences in the original FTIR spectra among the germs and endosperms of the three types. The spectra were then processed by first and second derivatives, and the second derivative spectra were used for hierarchical cluster analysis (HCA). The results showed that, within 700-1800 cm⁻¹, the second derivative spectra of the 52 samples could be clustered well according to the three corn types and into germ and endosperm. The correct clustering rate reached 96.1%. [Conclusion] FTIR combined with cluster analysis can be used to identify the germs and endosperms of different types of corn, and the approach is convenient and rapid.
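The abstract does not state how the derivative spectra were computed; spectroscopy software commonly applies Savitzky-Golay filtering for this. As a simpler, hedged illustration, the sketch below estimates the second derivative by central differences on an evenly spaced spectrum. The function name and the toy spectrum are hypothetical:

```python
def second_derivative(y, step=1.0):
    """Central-difference estimate of d2y/dx2 for evenly spaced samples.

    y    -- absorbance values at evenly spaced wavenumbers
    step -- spacing between wavenumbers (e.g. in cm^-1)
    The two endpoints are dropped, so the result has len(y) - 2 values.
    """
    if len(y) < 3:
        raise ValueError("need at least three points")
    return [(y[i - 1] - 2 * y[i] + y[i + 1]) / step ** 2 for i in range(1, len(y) - 1)]


# Hypothetical toy spectrum with a single absorption peak at index 3.
spectrum = [0.10, 0.12, 0.30, 0.60, 0.30, 0.12, 0.10]
d2 = second_derivative(spectrum, step=2.0)
```

In a second derivative spectrum an absorption maximum becomes a sharp minimum (here at the peak position), which helps separate overlapping bands before the spectra are passed to HCA. Plain finite differences amplify noise, which is why smoothing derivatives are usually preferred on real data.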
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 82073808, 81872828, and 81573384).
Abstract: The fruits of the leguminous plant Cercis chinensis Bunge are still overlooked, although they have been reported to be antioxidative, because information on the phytochemicals of C. chinensis fruits is limited. A simple, rapid and sensitive HPLC-MS/MS method was developed for the identification and quantitation of the major bioactive components in C. chinensis fruits. Eighteen polyphenols were identified, all of which are reported in C. chinensis fruits for the first time. Moreover, ten components were quantified simultaneously. The validated quantitative method proved to be sensitive, reproducible and accurate. It was then applied to analyze batches of C. chinensis fruits of different phytomorphs and from different areas. Principal component analysis (PCA) enabled visualization and dimensionality reduction of the data set, while hierarchical cluster analysis (HCA) indicated that the content of the phenolic acids, or of all ten components, might be used to differentiate C. chinensis fruits of different phytomorphs.
Abstract: The paper deals with cluster analysis and a comparison of clustering methods. Cluster analysis belongs to the multivariate statistical methods. It is defined as a general logical technique, or procedure, that groups variable objects into clusters on the basis of similarity or dissimilarity. Cluster analysis involves computational procedures whose purpose is to reduce a data set to several relatively homogeneous groups (clusters), subject to the condition of maximal similarity within clusters and, simultaneously, minimal similarity between clusters. Similarity of objects is assessed through a degree of similarity (correlation coefficient and association coefficient) or a degree of dissimilarity, i.e. a degree of distance (distance coefficient). Depending on how the clustering proceeds, cluster analysis methods are classified as hierarchical or non-hierarchical.
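The three kinds of coefficient named above can be sketched in a few lines of Python. The Jaccard coefficient is used here as one example of an association coefficient for binary attributes; the paper may use others, and all function names are illustrative:

```python
import math


def euclidean_distance(x, y):
    """Distance coefficient: larger values mean less similar objects."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Correlation coefficient: +1 when y is a positive linear rescaling of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def jaccard_coefficient(x, y):
    """Association coefficient for binary (presence/absence) attributes."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 1.0
```

Note the difference in behavior: the distance coefficient is sensitive to magnitude, whereas the correlation coefficient compares only the shape of the two profiles, so the choice of measure can change which objects end up in the same cluster.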
Abstract: The problem of taking a set of data and separating it into subgroups, where the elements of each subgroup are more similar to each other than to elements outside the subgroup, has been extensively studied through the statistical method of cluster analysis. In this paper we discuss the application of this method to the field of education: in particular, we present the use of cluster analysis to separate students into groups that can be recognized and characterized by common traits in their answers to a questionnaire, without any prior knowledge of what form those groups would take (unsupervised classification). We start from a detailed study of the data processing needed by cluster analysis. Two methods commonly used in cluster analysis are then described, first from a purely theoretical point of view and then, in Section 4, through an example of application to data from an open-ended questionnaire administered to a sample of university students. In particular, we describe and critically discuss the variables and parameters used to show the results of the cluster analysis methods.
Abstract: A genetic algorithm-based joint inversion method is presented for evaluating hydrocarbon-bearing geological formations. Conventional inversion procedures routinely used in the oil industry process borehole geophysical data locally. Because there are barely more types of data than unknowns at a given depth, a set of marginally over-determined inverse problems has to be solved along the borehole, which is a rather noise-sensitive procedure. To reduce the effect of noise, the degree of overdetermination must be increased. To fulfill this requirement, we suggest using our interval inversion method, which simultaneously inverts all data from a larger depth interval to estimate the petrophysical parameters of reservoirs over the same interval. A discretization scheme based on series expansion provides many more data than unknowns, which significantly reduces the estimation error of the model parameters. Knowledge of reservoir boundaries is also required for reserve calculation. Well logs contain information about layer thicknesses, but it cannot be extracted by the local inversion approach. We showed earlier that the depth coordinates of layer boundaries can be determined within the interval inversion procedure. The weakness of that method is that the output of the inversion is strongly influenced by arbitrary assumptions about layer thicknesses made when creating the starting model (i.e. the number of layers and the search domain of thicknesses). In this study, we apply an automated procedure for determining rock interfaces. Before inversion, we perform multidimensional hierarchical cluster analysis on the well-logging data, which separates the measuring points of different layers on a lithological basis. The vertical distribution of the clusters then furnishes the coordinates of the layer boundaries, which are used as initial model parameters for the interval inversion procedure.
The improved inversion method gives a fast, automatic and objective estimate of layer boundaries and petrophysical parameters, as demonstrated by a hydrocarbon field example.
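Once each measuring point carries a cluster label (for example, from hierarchical cluster analysis of the logs), layer boundaries can be read off wherever the label changes down the borehole. A minimal sketch, assuming depths sorted in increasing order and placing each boundary midway between the two samples that straddle a label change; `layer_boundaries` is a hypothetical helper, not the authors' code:

```python
def layer_boundaries(depths, labels):
    """Depths at which the cluster label changes between consecutive samples.

    depths -- measuring-point depths in increasing order (e.g. metres)
    labels -- cluster ID of each measuring point, same length as depths
    Each boundary is placed midway between the two samples on either side.
    """
    if len(depths) != len(labels):
        raise ValueError("depths and labels must have the same length")
    return [
        (depths[i - 1] + depths[i]) / 2
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]


# Hypothetical cluster IDs from a clustering of well-log data.
depths = [100.0, 100.5, 101.0, 101.5, 102.0]
labels = [0, 0, 1, 1, 0]
print(layer_boundaries(depths, labels))  # [100.75, 101.75]
```

This sketch does no smoothing, so an isolated mislabeled sample would create two spurious boundaries; in practice such thin artifacts may need to be merged before the boundaries are used as a starting model.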
Abstract: A fuzzy clustering analysis model based on the quotient space is proposed. First, the conversion from coarse to fine granularity and the hierarchical structure are used to reduce the multidimensional samples. Second, the fuzzy compatibility relation matrix of the model is converted into a fuzzy equivalence relation matrix. Finally, a clustering genealogy diagram is generated from the fuzzy equivalence relation matrix, which allows different thresholds to be selected dynamically and effectively solves the problem of cluster analysis for samples with multidimensional attributes.
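A common way to turn a fuzzy compatibility (reflexive, symmetric) relation into a fuzzy equivalence relation is the max-min transitive closure: square the matrix under max-min composition until it stops changing. Cutting the result at a threshold λ then yields crisp clusters, and sweeping λ traces out the clustering genealogy. The sketch below assumes that standard construction and omits the quotient-space granularity step; the matrix is a hypothetical toy example:

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square R repeatedly until it is stable; for a reflexive, symmetric R
    the fixed point is a fuzzy equivalence relation."""
    current = r
    while True:
        nxt = max_min_compose(current, current)
        if nxt == current:
            return current
        current = nxt


def lambda_cut_clusters(eq, lam):
    """Group samples i and j together when eq[i][j] >= lam."""
    clusters, assigned = [], set()
    for i in range(len(eq)):
        if i in assigned:
            continue
        members = [j for j in range(len(eq)) if eq[i][j] >= lam]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Hypothetical fuzzy compatibility matrix for three samples.
R = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.6],
     [0.1, 0.6, 1.0]]
eq = transitive_closure(R)
for lam in (0.9, 0.7, 0.5):
    print(lam, lambda_cut_clusters(eq, lam))
```

Grouping by rows is valid only because each λ-cut of a fuzzy equivalence relation is a crisp equivalence relation; applied directly to the compatibility matrix R, the same grouping could be inconsistent.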
Funding: Supported by the National Natural Science Foundation of China (30960179) and the Program for Innovative Research Team in Science and Technology in University of Yunnan Province.
Abstract: To distinguish 8 kinds of rhizome crops, 40 samples were studied by Fourier transform infrared spectroscopy (FTIR) combined with wavelet transform (WT), principal component analysis (PCA) and hierarchical cluster analysis (HCA). The results showed that the infrared spectra were similar on the whole, but there were differences in peak position, peak shape and peak absorption intensity in the range of 1800-700 cm⁻¹. The infrared spectra in the range of 1800-700 cm⁻¹ were selected to perform continuous wavelet transform (CWT) and discrete wavelet transform (DWT). The 15th-level decomposition coefficients of CWT and the 5th-level detail coefficients of DWT were classified by PCA and HCA. The cumulative contribution rates of the first three principal components of CWT and DWT were 93.12% and 89.78%, respectively. The recognition accuracy of both PCA and HCA was 100%. This shows that FTIR combined with WT can be used to distinguish different kinds of rhizome crops.
Funding: supported by the National High-tech Research Project ("863" Project) of China under contract Nos. 2003AA635180 and 2006AA09Z167, the Public Welfare Project of Marine Science Research under contract No. 200705011, and the open project of the Key Laboratory of Integrated Marine Monitoring and Applied Technologies for Harmful Algal Blooms, SOA, China under contract No. 200811.
Abstract: Hierarchical clustering analysis and principal component analysis (PCA) were used to assess the similarities and dissimilarities of the entire excitation-emission matrix spectroscopy (EEMs) data sets of samples collected from Jiaozhou Bay, China. The results demonstrate that multivariate analysis facilitates complex data treatment and spectral sorting, and also increases the likelihood of revealing otherwise hidden information about the chemical characteristics of the dissolved organic matter (DOM). The distribution of the different water samples revealed by the multivariate results was used to track the movement of DOM in the study area. This interpretation is supported by the results of a numerical substance-tracing simulation model, which show that substances discharged by the Haibo River can be distributed within Jiaozhou Bay.
Funding: funded by PhD grants of the 'Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen)' (Nos. 101529 (MD) and 121512 (BG)) and the Special Research Fund (BOF) of Ghent University (01J22510 (EW) and 01D38811 (SS)).
Abstract: Lipopeptides are currently re-emerging as an interesting subgroup in the peptide research field, with historical applications as antibacterial and antifungal agents and new potential applications as antiviral, antitumor, immune-modulating and cell-penetrating compounds. However, because of their specific structure, chromatographic analysis often requires special buffer systems or the use of trifluoroacetic acid, which limits mass spectrometric detection. Therefore, we used a traditional aqueous/acetonitrile-based gradient system containing 0.1% (m/v) formic acid to separate four pharmaceutically relevant lipopeptides (polymyxin B1, caspofungin, daptomycin and gramicidin A1), which were selected based on hierarchical cluster analysis (HCA) and principal component analysis (PCA). In total, the performance of four different C18 columns, including one UPLC column, was evaluated using two parallel approaches. First, a Derringer desirability function was used, whereby six single and multiple chromatographic response values were rescaled into one overall D-value per column. Using this approach, the YMC Pack Pro C18 column was ranked as the best column for general MS-compatible lipopeptide separation. Second, the kinetic plot approach was used to compare the different columns over different flow rate ranges. Because optimal kinetic column performance is obtained at the column's maximal pressure, the length elongation factor λ = Pmax/Pexp was used to transform the experimental data (retention times and peak capacities) and construct kinetic performance limit (KPL) curves, allowing a direct, visual and unbiased comparison of the selected columns; with this approach, the YMC Triart C18 UPLC and ACE C18 columns performed best. Finally, differences in column performance and the (dis)advantages of both approaches are discussed.
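A Derringer-type desirability function maps each response onto [0, 1] and combines the individual values into one overall D as their geometric mean. The sketch below uses the common larger-is-better and smaller-is-better forms with an optional weight exponent; the response names, acceptance limits and weights are hypothetical, since the abstract does not list the six responses or their settings:

```python
import math


def desirability_max(y, low, high, weight=1.0):
    """Larger-is-better response: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def desirability_min(y, low, high, weight=1.0):
    """Smaller-is-better response: 1 at or below low, 0 at or above high."""
    if y >= high:
        return 0.0
    if y <= low:
        return 1.0
    return ((high - y) / (high - low)) ** weight


def overall_desirability(ds):
    """Overall D: geometric mean of the individual desirabilities."""
    return math.prod(ds) ** (1 / len(ds))


# Hypothetical responses for one column: a peak capacity to maximize and
# an analysis time (min) to minimize, with illustrative acceptance limits.
d_peak = desirability_max(75, low=50, high=100)  # 0.5
d_time = desirability_min(20, low=0, high=100)   # 0.8
D = overall_desirability([d_peak, d_time])
```

The geometric mean is the key design choice: if any single response is unacceptable (d = 0), the overall D for that column drops to 0, regardless of how well the other responses score.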
Abstract: This study focuses on the geochemical and bacteriological investigation of surface water and groundwater on the Bamoun plateau (Western Cameroon). From September 2013 to August 2014, 71 samples were collected from two springs, one borehole, four wells and the Nchi stream for analysis of major elements. To characterize the various species of bacteria, 7 samples were selected. The analytical approach adopted for this study combines conventional hydrochemical techniques and multivariate statistical analysis with hydrogeochemical modelling. The results revealed that water from the study area is acidic to basic and very weakly to weakly mineralized. Four water types were identified: 1) CaMg-HCO₃; 2) CaMg-Cl-SO₄; 3) NaCl-SO₄; and 4) NaK-HCO₃. All major elements complied with the World Health Organization guidelines for drinking water quality, except nitrate, which was found at a concentration > 50 mg/L NO₃⁻ in borehole F401. As for the hydrobiological aspect, all samples contained all the bacteriological species except those from spring S301 and well P401. According to the hydrogeochemical modelling, the Gibbs model and multivariate statistical tests, the quality of surface water and groundwater in the Foumban locality is influenced by two important factors: 1) natural factors, characterized by water-rock interaction and evapotranspiration/crystallization; and 2) anthropogenic factors, such as uncontrolled discharge of all kinds of liquid and solid effluents into the ground without any prior treatment, and strong urbanization accompanied by a lack of sanitation and insufficient care.
Funding: supported by the National Key Research and Development Program of China (Grant No. 2018YFC0704903).
Abstract: For a city, analyzing its advantages, disadvantages and level of economic development within its country is important, especially for Chinese cities, which are developing rapidly. Existing studies of Chinese cities have not considered economic and industrial indicators in detail. In this paper, the urban hierarchical structure of 285 cities above the prefecture level in China is investigated on the basis of multiple economic and industrial indicators. Indicators of economy, industry, infrastructure, medical care, population, education, culture, and employment are selected to establish a new indicator system for analyzing urban hierarchical structure. Factor analysis is used to investigate the relationships between the selected indicator variables and to obtain the score on each common factor, as well as comprehensive scores and rankings for the 285 cities above the prefecture level. According to the comprehensive scores, the 285 cities are clustered into 15 levels using the K-means clustering algorithm. The hierarchical structure of the cities above the prefecture level in China is thus obtained, and corresponding policy implications are proposed. The results and implications can be applied not only to urban planning and development in China but also as a reference for other developing countries. The methodology used in this paper can also be applied to study urban hierarchical structure in other countries.
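To illustrate the grouping step, here is a plain Lloyd's K-means sketch on one-dimensional comprehensive scores. The quantile-based initialization, the iteration cap and the toy scores are assumptions; apart from k = 15, the paper's K-means settings are not given:

```python
def kmeans_1d(scores, k, iterations=100):
    """Lloyd's K-means on one-dimensional scores.

    Centroids start at evenly spaced quantiles of the sorted scores
    (an assumption chosen for determinism). Returns (labels, centroids).
    """
    ordered = sorted(scores)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(iterations):
        # Assignment step: each score goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(s - centroids[c])) for s in scores]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical comprehensive scores for seven cities, grouped into 3 levels.
labels, centroids = kmeans_1d([0.1, 0.2, 0.15, 5.0, 5.2, 9.8, 10.0], k=3)
```

Because the scores are one-dimensional and the initial centroids are sorted, the cluster indices here come out ordered by centroid, so they can be read directly as levels; with other initializations the clusters would need to be sorted by centroid first.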
Funding: co-supported by the National Natural Science Foundation of China (No. 61304190), the Fundamental Research Funds for the Central Universities of China (No. NJ20150030) and the Youth Science and Technology Innovation Fund (No. NS2014067).
Abstract: To quantitatively analyze air traffic operation complexity, multidimensional metrics were selected based on the operational characteristics of traffic flow. Kernel principal component analysis was used to reduce the dimensionality of the metrics and thereby extract the crucial information they contain. Hierarchical clustering was used to analyze the complexity of different airspaces. Fourteen sectors of the Guangzhou Area Control Center were taken as samples, and the operational complexity of the traffic situation in each sector was calculated from real flight radar data. The clustering analysis verified the feasibility and rationality of the method and provides a reference for airspace operation and management.
Abstract: Accurate extraction and classification of leather defects is an important prerequisite for automation and quality evaluation in the leather industry. To address the problem of classifying leather defect data, a hierarchical classification method for defects is proposed. First, samples are collected using the minimum rectangle method, and defects are extracted by image processing. According to the geometric features of their representation, the defects are roughly classified into dots, lines and surfaces. By analyzing the geometric, gray-level and texture features extracted from the defects, the dominant characteristics can be obtained. For each type of defect, different representative characteristics are chosen to reduce the dimension of the data; clustering on these characteristics then converges effectively, so that defects are extracted accurately and their characteristics digitized, and finally a database is established. The results show that this method achieves more than 90% accuracy and greatly improves classification accuracy.
Abstract: The quality of K-12 education has been a major concern for years. Previous methods studied the effect of only one or two factors, such as school choice or teacher quality, on school performance, so the results they provide can be limited. We propose a multi-agent approach that integrates multiple actors in a school system, including teachers, students, support staff and administrators. The interactions among these actors compose a hierarchical school social network. We first detect the hierarchical community structure in this school network using an agglomerative hierarchical algorithm. Existing agglomerative hierarchical algorithms usually calculate the similarity or dissimilarity between two clusters using some measure of distance between pairs of observations. We, however, develop a method that calculates similarity based on social interactions between actors, since interaction is essential in multi-agent systems. Our algorithm is applied to 15 school districts in Bexar County, Texas, and it provides satisfactory results in generating the hierarchical structure of all the school districts. We then use the detected structure of the social network to evaluate the organizational performance of the school system. We design and implement a funding evaluation model that decomposes the funding policy task into subtasks and evaluates these subtasks using funding distribution policies from past years, looking for possible relationships between student performance and funding policies. Experiments in the 15 school districts in Bexar County show no significant correlation between student performance and the amount of funding a school district received.
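The abstract contrasts distance-based linkage with a similarity derived from social interactions. The following is a minimal average-linkage agglomerative sketch that assumes a hypothetical symmetric matrix of interaction counts between actors serves as the similarity; the authors' exact measure is not given, and `agglomerate` is an illustrative name:

```python
def agglomerate(similarity):
    """Average-linkage agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity and returns the merge history as
    (members_a, members_b, linkage_similarity) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(similarity))]
    merges = []

    def linkage(a, b):
        # Mean interaction strength between members of the two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = max(
            ((x, y) for idx, x in enumerate(clusters) for y in clusters[idx + 1:]),
            key=lambda pair: linkage(*pair),
        )
        merges.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical interaction counts among four actors (e.g. two teachers, two students).
S = [[0, 5, 1, 0],
     [5, 0, 0, 1],
     [1, 0, 0, 4],
     [0, 1, 4, 0]]
merges = agglomerate(S)
```

Reading the merge history from top to bottom gives the community hierarchy: actors 0 and 1 merge first, then 2 and 3, and finally the two groups, joined at a much weaker linkage. This exhaustive pairwise search is quadratic per step and is meant only to show the idea, not to scale to a full school district.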