To reveal the genetic differences and agronomic traits of Fagopyrum tataricum varieties (lines) intuitively, explore good resources and avoid blind parent selection during breeding, six primary agronomic traits of 45 F. tataricum varieties (lines) from eleven buckwheat breeding departments across the country were analyzed with principal component analysis and cluster analysis. Principal component analysis showed that the six agronomic traits could be reduced to three principal components with a cumulative contribution rate of 83%. Cluster analysis classified the 45 F. tataricum varieties (lines) into four groups: (1) high stalk, medium yield and small grain; (2) medium stalk, high yield and large grain; (3) medium stalk, low yield and small grain; and (4) high stalk, medium yield and medium grain. The comprehensive trait performance of the second type was better than that of the other types, so the varieties (lines) classified into it could be considered good varieties (lines) or breeding materials. The genetic differences among F. tataricum varieties (lines) had no necessary correlation with origin or geographical distance. In addition to complementary traits and geographical distance, genetic distance (different populations) should be taken into consideration during parent selection in cross breeding.
A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure is proposed that is based on the cosine vector measure and keyword frequency in documents and that also takes an ontology as input. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology. By means of this semantic knowledge, the ontology can improve the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
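The abstract does not give the formula of the proposed measure, so the following is only a minimal sketch of the general idea: a cosine similarity over term-frequency vectors in which keywords listed in an ontology count more heavily. The function name `weighted_cosine`, the `keyword_weights` mapping and the rule of multiplying a term's frequency by its weight are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def weighted_cosine(doc_a, doc_b, keyword_weights=None):
    """Cosine similarity of two whitespace-tokenized documents.

    keyword_weights is a hypothetical stand-in for the ontology: it maps a
    keyword to an importance factor (default 1.0 for unlisted terms).
    """
    keyword_weights = keyword_weights or {}

    def vectorize(text):
        counts = Counter(text.lower().split())
        # Assumption: importance scales the raw term frequency.
        return {t: c * keyword_weights.get(t, 1.0) for t, c in counts.items()}

    vec_a, vec_b = vectorize(doc_a), vectorize(doc_b)
    dot = sum(w * vec_b[t] for t, w in vec_a.items() if t in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


a = "fuzzy clustering of documents"
b = "fuzzy retrieval of images"
plain = weighted_cosine(a, b)                     # standard cosine measure
boosted = weighted_cosine(a, b, {"fuzzy": 3.0})   # "fuzzy" marked as important
print(plain, boosted)
```

In this toy case the shared keyword raises the similarity above the plain cosine value, which is the kind of effect the abstract attributes to the ontology.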
The problem of pattern-based subspace clustering, a special type of subspace clustering that uses pattern similarity as the measure of similarity, is studied. Unlike most traditional clustering algorithms, which group objects whose values are close in all dimensions or in a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. A novel approach named EMaPle is designed to mine the maximal pattern-based subspace clusters. EMaPle searches for clusters only in the attribute enumeration spaces, which are relatively few compared with the large number of row combinations in typical datasets, and it exploits novel pruning techniques. EMaPle can find clusters satisfying the coherence, size and sign constraints that are neglected in MaPle. Both synthetic and real data sets are used to evaluate EMaPle and demonstrate that it is more effective and scalable than MaPle.
Abstract: RAPD (random amplified polymorphic DNA) markers were generated from filaments of 15 Porphyra lines representing four important groups: P. yezoensis, P. haitanensis, P. katadai var. hemiphylla and P. oligospermatangia. Of the 69 fragments generated by 6 primers selected from 50, 67 were polymorphic (97.1%). Cluster analysis based on the RAPD results divided the 15 Porphyra lines into 3 groups, a result consistent with the taxonomic analysis. A DNA fingerprint based on 8 bands amplified with OPN-02 and OPJ-18 was constructed and might be used in Porphyra variety identification. Five specific RAPD fragments from 5 Porphyra lines were isolated and cloned into the pGEM-T Easy vector. These five RAPD fragments may be useful in germplasm identification and property protection of Porphyra.
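The polymorphism percentage reported above is the share of band positions that are not identical across all lines. A minimal sketch of that calculation follows; the 0/1 band matrix is invented toy data, and only the counts 67 and 69 come from the abstract.

```python
def percent_polymorphic(band_matrix):
    """Percentage of polymorphic bands in a presence/absence matrix.

    band_matrix: one row per line (sample), one column per band position,
    1 = band present, 0 = band absent.
    """
    n_bands = len(band_matrix[0])
    polymorphic = 0
    for j in range(n_bands):
        states = {row[j] for row in band_matrix}
        if len(states) > 1:  # present in some lines, absent in others
            polymorphic += 1
    return 100.0 * polymorphic / n_bands


# Toy data (hypothetical): 3 lines scored at 4 band positions.
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
print(percent_polymorphic(toy))   # band 0 is shared by all lines -> 75.0

# Counts reported in the abstract: 67 polymorphic fragments out of 69.
print(round(100 * 67 / 69, 1))    # 97.1
```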
Funding: The National Natural Science Foundation of China (No. 60425206, 60633010); the High Technology Research and Development Program of Jiangsu Province (No. BG2005032).
Abstract: This paper suggests that a single class, rather than individual methods, should be used as the slice scope when computing class cohesion. First, for a given attribute, the statements in all methods that last define the attribute are computed. Then, the forward and backward data slices for this attribute are generated using the class as the slice scope and are combined into the corresponding class data slice. Finally, class cohesion is computed from the class data slices of all attributes. Compared with traditional cohesion metrics that use methods as the slice scope, the proposed metrics take into account the possible interactions between methods. The experimental results show that class cohesion can be measured more accurately when the class is used as the slice scope.
Abstract: A novel fuzzy clustering model, the allied fuzzy c-means (AFCM) model, is proposed by combining the advantages of fuzzy c-means (FCM) and possibilistic c-means (PCM) clustering. PCM is sensitive to initialization and often generates coincident clusters. AFCM, an extension of PCM, overcomes this shortcoming. Membership and typicality values are produced simultaneously in AFCM. Experimental results show that noisy data are handled well, coincident clusters are avoided and clustering accuracy is improved.
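The AFCM update rules are not given in the abstract, so they are not reproduced here. For orientation only, the sketch below shows the standard FCM iteration that AFCM builds on: memberships u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) and centers computed as means weighted by u_ik^m. It works on one-dimensional data in plain Python; the function name `fcm_1d`, the fuzzifier default `m=2.0` and the sample points are illustrative assumptions.

```python
def fcm_1d(points, centers, m=2.0, iterations=50, eps=1e-12):
    """Standard fuzzy c-means on 1-D data (not the AFCM model itself).

    Returns the final centers and the membership rows computed in the
    last iteration (one row per point, one column per cluster).
    """
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        # Membership step: each row sums to 1 across clusters.
        memberships = []
        for x in points:
            dists = [abs(x - c) + eps for c in centers]  # eps avoids 0-division
            row = []
            for d_i in dists:
                denom = sum((d_i / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
                row.append(1.0 / denom)
            memberships.append(row)
        # Center step: mean of the points weighted by membership ** m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships


data = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
final_centers, u = fcm_1d(data, centers=[0.0, 5.0])
print(sorted(final_centers))
```

Because every membership row is forced to sum to 1, an outlier far from all centers still receives substantial membership in FCM; PCM's typicality values relax that constraint, which is the trade-off the abstract refers to.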
Funding: Supported by the Key Project of New Product Development in Yunnan Province (2009BB006).
Abstract: Inter-simple sequence repeat (ISSR) molecular markers were applied to analyze the genetic diversity and clustering of 48 introduced and bred cultivars of Olea europaea L. In total, 106 DNA bands were amplified by 11 screened primers, of which 99 were polymorphic; the percentage of polymorphic loci was 93.40%, indicating rich genetic diversity in Olea europaea L. germplasm resources. Based on Nei's genetic distances between cultivars, a dendrogram of the 48 cultivars was constructed using the unweighted pair-group method with arithmetic averages (UPGMA). The 48 cultivars were clustered into four main categories; 84.6% of native cultivars fell into two categories; most introduced cultivars were clustered by their sources and main uses rather than by their geographic origins. This study provides a reference for the utilization and further genetic improvement of Olea europaea L. germplasm resources.
Funding: Supported by the High-tech Research Project of Jiangsu Province (BG2004314).
Abstract: [Objective] To study the variation of leaf characters among provenances of Polygonum multiflorum Thunb. and to carry out cluster analysis on the provenances, providing a basis for the classification, identification, breeding and improved variety selection of P. multiflorum. [Method] Leaf shape characters of 31 germplasm accessions from the major distribution regions of China were measured, and the genetic variation of P. multiflorum leaves from different producing areas was analyzed. [Result] Leaf characters of individual plants within the same provenance were relatively stable, with variation mainly in single leaf area, 1/2 leaf width, leaf width and other indicators; variation of each leaf character among provenances was obvious, mainly in single leaf weight, leaf area, 1/2 leaf width, leaf length and other indicators. Correlation analysis suggested that single leaf area and single leaf weight showed extremely significant positive correlations with leaf length, 1/2 leaf width, leaf width, leaf thickness and leaf stem length, and significant negative correlations with WWR (leaf width/1/2 leaf width) and LWR (leaf length/1/2 leaf length); in addition, several macroscopic leaf characters such as leaf length, 1/2 leaf width, leaf width and leaf stem length were extremely significantly positively correlated. Principal component analysis showed that the cumulative variance contribution rate of the first three principal components reached 97.4%, which reflects well the comprehensive performance of the leaf characters of the provenances. Cluster analysis divided the 31 provenances into three classes: the first was distributed in central and western Guizhou, northwestern Guangxi and western areas at higher altitude; the second in Hunan, Hubei, Sichuan, Guangdong and most of Guangxi; the third in Anhui, Jiangsu, Henan and Shandong. [Conclusion] Cluster analysis of leaf characters indicated that provenances that are geographically closer tend to cluster together. The study provides a basis for the classification of P. multiflorum.
Funding: Supported by the Natural Science Foundation of Hubei Province (2005ABA084); Major Projects of Hubei Provincial Department of Education (04Z002).
Abstract: [Objective] To study the genetic diversity of Xiaogan water chestnut and wild water chestnut with randomly amplified polymorphic DNA (RAPD) technology. [Method] The genetic diversity of locally cultivated water chestnut, wild water chestnut, Lepironia articulata and Scirpus planiculmis Fr. Schmidt was analyzed by RAPD. [Result] Among the screened random primers 841, 842, 807 and 840, primer 841 gave evidently polymorphic amplification products, and the bands obtained in electrophoresis were clear and highly repeatable. Cluster analysis showed that cultivated and wild water chestnut were more closely related to each other than Lepironia articulata and Scirpus planiculmis were. [Conclusion] The research provides a theoretical basis for cultivating high-quality new varieties of water chestnut.
Abstract: An approach to identifying fuzzy models that considers both interpretability and precision was proposed. Firstly, interpretability issues of fuzzy models were analyzed. Then, a heuristic strategy was used to select input variables by increasing the number of input variables, and the Gustafson-Kessel fuzzy clustering algorithm, combined with the least squares method, was used to identify the fuzzy model. Subsequently, an interpretability measure was described by the product of the number of input variables and the number of rules, while precision was weighted by root mean square error, and a selection objective function concerning interpretability and precision was defined. Given the maximum and minimum numbers of input variables and rules, a set of fuzzy models was constructed. Finally, the optimal fuzzy model was selected by the objective function and optimized by a genetic algorithm to achieve a good tradeoff between interpretability and precision. The performance of the proposed method was illustrated on the well-known Box-Jenkins gas furnace benchmark; the results demonstrate its validity.
Funding: Supported by the National Nonprofit Institute Research Grant of CATAS-TCGRI (1630032014019, 1630032015003); the Key Research & Development Project of Hainan Province (ZDYF2016225); the Key Technology Research and Demonstration Project of Farmland Improvement of Hainan Province (HNGDpz2015).
Abstract: [Objective] This study aimed to screen out hot pepper germplasms highly resistant to Meloidogyne incognita, thereby providing resistant resources for hot pepper breeding. [Method] A comprehensive analysis combining cluster analysis and the subordinate function method was conducted by determining resistance-related indexes of 67 hot pepper germplasms 50 days after inoculation with M. incognita. [Result] The effects of M. incognita on the resistance-related indexes differed significantly among the hot pepper germplasms. Egg index and gall index showed abundant genetic variation, with variation coefficients of 143.16% and 118.95%, respectively. Cluster analysis based on the gall indexes divided the 67 hot pepper germplasms into 4 groups (resistant, moderately resistant, susceptible and highly susceptible). The resistance of the germplasms was ranked according to the sum of the subordinate function values of the resistance indexes. The total function values of Rela 2 and L506M were the largest (2.00), indicating that these two germplasms were immune to M. incognita. The total function values of L287-2, L522-1M, L504M, L515-2, 13SM100-1, L512M, L292-1, L319, L316, L317, 13SM87-1 and Rela 5 were larger than 1.95, indicating that these germplasms were highly resistant to M. incognita. [Conclusion] This study could provide resistant resources for breeding hot pepper for resistance to M. incognita.
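The abstract does not state how the subordinate function values were computed or how many indexes were summed. A common form of the subordinate (membership) function method is min-max scaling, U = (x - min) / (max - min), inverted to 1 - U for indexes where lower values indicate better performance. The sketch below assumes that form and uses invented gall and egg index values for three hypothetical germplasms named A, B and C.

```python
def subordinate_values(values, higher_is_better=True):
    """Min-max scaled scores in [0, 1]; inverted when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case (no variation): treat all entries as equal-best.
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


# Hypothetical data: lower gall and egg indexes mean stronger resistance.
names = ["A", "B", "C"]
gall_index = [0.0, 20.0, 80.0]
egg_index = [0.0, 10.0, 100.0]

u_gall = subordinate_values(gall_index, higher_is_better=False)
u_egg = subordinate_values(egg_index, higher_is_better=False)
totals = [g + e for g, e in zip(u_gall, u_egg)]

# Rank germplasms from most to least resistant by total score.
for name, total in sorted(zip(names, totals), key=lambda p: p[1], reverse=True):
    print(name, round(total, 2))
```

With two indexes the maximum possible total under this assumed scaling is 2.0, reached only by a germplasm that is best on both.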
Funding: Supported by the National Program for Space Breeding Special Fund (2006HT100113); China Agriculture Research System (CARS-26).
Abstract: [Objective] This study aimed to analyze the morphological diversity of red-seed watermelon (Citrullus lanatus ssp. vulgaris var. megalaspermus Lin et Chao) germplasm resources. [Method] Cluster analysis and principal component analysis were carried out on the morphological traits of 51 red-seed watermelon germplasm resources. [Result] The coefficients of variation (CVs) of 39 morphological traits in the 51 red-seed watermelon germplasm resources ranged from 5.37% to 66.95%, with an average of 22.87%. The average Shannon diversity index was 1.55; the index was highest for seed length (2.16) and lowest for seed shell pattern (0.32). In addition, the diversity indices of quantitative characters were higher than those of qualitative characters. Principal component analysis revealed that the variance contribution rates of the first, second and third principal components were 19.49%, 15.32% and 9.55%, respectively. Cluster analysis divided the 51 materials into three broad branches based on the morphological traits. There was only one material in the first branch and two in the second, and all three were wild. The other 48 materials fell into the third branch, and all of them were cultivars. [Conclusion] This study provides a theoretical basis for the protection and utilization of red-seed watermelon resources.
Funding: The National Natural Science Foundation of China (No. 50674086); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).
Abstract: To mine production and security information from security supervising data and to ensure the security and safety of production and decision-making, a clustering analysis algorithm for security supervising data in coal mines based on a semantic description is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for the semantically described security supervising data is presented. Third, taking the hybrid similarity measurement as the distance criterion and borrowing from grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.
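The abstract names a weight-based hybrid of semantic and numerical similarity but does not define either component. The sketch below is one plausible reading under stated assumptions: numerical similarity is one minus the range-normalized difference, semantic similarity comes from a lookup table of label pairs, and the two are blended with a weight `w_sem`. The record fields `state` and `value`, the labels and the table entries are all hypothetical.

```python
def numeric_similarity(a, b, value_range):
    """1.0 for equal values, falling linearly to 0.0 at value_range apart."""
    return 1.0 - min(abs(a - b) / value_range, 1.0)


def semantic_similarity(label_a, label_b, table):
    """Identical labels score 1.0; other pairs use a hypothetical lookup table."""
    if label_a == label_b:
        return 1.0
    return table.get(frozenset((label_a, label_b)), 0.0)


def hybrid_similarity(rec_a, rec_b, table, value_range, w_sem=0.5):
    """Weighted blend of the semantic and numerical similarities."""
    sem = semantic_similarity(rec_a["state"], rec_b["state"], table)
    num = numeric_similarity(rec_a["value"], rec_b["value"], value_range)
    return w_sem * sem + (1.0 - w_sem) * num


# Hypothetical records: a semantic state label plus a numeric sensor reading.
table = {frozenset(("warning", "alarm")): 0.6}
r1 = {"state": "warning", "value": 0.8}
r2 = {"state": "alarm", "value": 1.2}
print(round(hybrid_similarity(r1, r2, table, value_range=2.0), 3))
```

A clustering algorithm such as CURE needs a distance rather than a similarity; under these assumptions `1.0 - hybrid_similarity(...)` would serve, since the blend stays within [0, 1].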
Funding: The National Science Foundation by Changjiang Scholarship of Ministry of Education of China (No. BCS-0527508); the Joint Research Fund for Overseas Natural Science of China (No. 51250110075); the Natural Science Foundation of Jiangsu Province (No. BK200910046); the Postdoctoral Science Foundation of Jiangsu Province (No. 0901005C)
Abstract: In order to analyze the heterogeneity in vehicular traffic speed, a new method that integrates cluster analysis and probability distribution function fitting is presented. First, to identify the optimal number of clusters, the two-step cluster method is applied to actual speed data; the results suggest that dividing the speed data into two clusters best reflects the intrinsic patterns of traffic flows. This information then guides the probability distribution function fitting. The normal, skew-normal and skew-t distribution functions are each used to fit the probability distribution of each cluster. The skew-t distribution gives the highest fitting accuracy, followed by the skew-normal distribution, with the normal distribution the worst. Model analysis results demonstrate that the proposed mixture model has better fitting and generalization capability than the conventional single model. In addition, the new method is more flexible in terms of data fitting and can provide a more accurate model of speed distribution.
Funding: Supported by Special Fund for Agro-scientific Research in the Public Interest from the Ministry of Agriculture of China (201303112); the 12th National Five-year Plan for Science and Technology Program of Rural Areas (2012BAD02B03-17)
Abstract: In order to scientifically evaluate the value of Cucurbita moschata cultivars, the main botanical characters of 41 cultivars were measured for diversity, correlation and cluster analysis. These characters included the initial flowering date, the first fruiting node, fruit length, fruit stem length, stem diameter, internode length, the transverse and longitudinal diameters of the largest leaf, single fruit weight, flesh thickness and soluble solid content. The results revealed that the pumpkin cultivars showed large variations in fruit stem length, single fruit weight, fruit length and flesh thickness, but small variations in initial flowering date. Significant, and in some cases highly significant, correlations were found among the tested traits. Cluster analysis divided the 41 old Cucurbita moschata cultivars into three groups, of which Group 1 was better than the other two groups in multiple traits. High similarities existed among the three groups and among the cultivars within each group. This research provides a basis for selecting excellent traits and parents for hybrid breeding.
Funding: Supported by National Oat and Buckwheat Industrial Technology System (CARS-08-A-1-3); Breeding Project of Shanxi Academy of Agricultural Sciences during the Thirteenth Five-Year Plan Period (16yzgc035)
Abstract: In order to reveal the genetic differences and agronomic traits of Fagopyrum tataricum varieties (lines) intuitively, explore good resources and avoid blind parent selection during breeding, six primary agronomic traits of 45 F. tataricum varieties (lines) from eleven buckwheat breeding departments across the country were analyzed with principal component analysis and cluster analysis. The principal component analysis showed that the six agronomic traits could be reduced to three principal components, with a cumulative contribution rate of 83%. The cluster analysis classified the 45 F. tataricum varieties (lines) into four groups: (1) high stalk, medium yield and small grain; (2) medium stalk, high yield and large grain; (3) medium stalk, low yield and small grain; and (4) high stalk, medium yield and medium grain. Among them, the comprehensive trait performance of the second type was better than that of the other types. Thus, the F. tataricum varieties (lines) classified into the second type could be considered good varieties (lines) or breeding materials. The genetic differences among F. tataricum varieties (lines) had no necessary correlation with origin or geographical distance. In addition to complementary traits and geographical distance, genetic distances (different populations) should be taken into consideration during parent selection in cross breeding.
Funding: The Young Teachers Scientific Research Foundation (YTSRF) of Nanjing University of Science and Technology in the Year of 2005-2006.
Abstract: A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure based on the cosine vector and keyword frequency in documents is proposed, together with an input ontology. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology. By means of semantic knowledge, the ontology can improve the effectiveness of the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
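One way to picture a cosine measure that takes ontology keyword importance into account is sketched below. It is only an illustration under stated assumptions: the abstract does not give the weighting formula, so term frequencies are simply multiplied by a hypothetical per-keyword weight (`keyword_weights`), and terms absent from the ontology keep a weight of 1.0.

```python
import math
from collections import Counter


def weighted_vector(text, keyword_weights):
    """Term-frequency vector with each count scaled by its ontology weight."""
    counts = Counter(text.lower().split())
    return {term: n * keyword_weights.get(term, 1.0) for term, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(doc_a, doc_b, keyword_weights=None):
    """Cosine similarity of two documents; with no weights this is the standard measure."""
    weights = keyword_weights or {}
    return cosine(weighted_vector(doc_a, weights), weighted_vector(doc_b, weights))


# Hypothetical ontology: "cluster" matters three times as much as an ordinary term.
ontology = {"cluster": 3.0}
print(document_similarity("cluster data", "cluster text"))
print(document_similarity("cluster data", "cluster text", ontology))
```

Raising the weight of a shared domain keyword pulls the two documents closer together, which is the effect the ontology is meant to have on clustering.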
Funding: The National Natural Science Foundation of China (No. 60273075)
Abstract: The problem of pattern-based subspace clustering, a special type of subspace clustering that uses pattern similarity as the measure of similarity, is studied. Unlike most traditional clustering algorithms, which group objects with close values in all dimensions or in a set of dimensions, clustering by pattern similarity groups objects that exhibit a coherent pattern of rise and fall in subspaces. A novel approach named EMaPle is designed to mine the maximal pattern-based subspace clusters. EMaPle searches for clusters only in the attribute enumeration spaces, which are relatively few compared to the large number of row combinations in typical datasets, and it exploits novel pruning techniques. EMaPle can find clusters satisfying coherence constraints, size constraints and sign constraints, which are neglected in MaPle. Both synthetic and real data sets are used to evaluate EMaPle and demonstrate that it is more effective and scalable than MaPle.
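The idea of a "coherent pattern of rise and fall" can be made concrete with the pScore test from the pCluster model, on which pattern-based subspace clustering is commonly defined. The abstract does not spell out its coherence criterion, so treating it as the pScore threshold below is an assumption, and this brute-force check is not the EMaPle search itself. The function names and the data are hypothetical.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y and attributes a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_delta_pcluster(rows, attrs, delta):
    """True if every pair of objects and every pair of attributes has pScore <= delta."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(rows, 2)
        for a, b in combinations(attrs, 2)
    )


# Hypothetical objects: the first three rise and fall together, shifted by a constant.
coherent = [[1, 5, 3], [11, 15, 13], [4, 8, 6]]
print(is_delta_pcluster(coherent, attrs=[0, 1, 2], delta=0))
print(is_delta_pcluster(coherent + [[1, 2, 9]], attrs=[0, 1, 2], delta=1))
```

The three coherent rows are far apart in value, so a distance-based method would not group them, yet they pass the pattern test; checking all pairs this way is quadratic in both rows and attributes, which is why dedicated mining algorithms prune the search space.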