Abstract: Aims: Sexual dimorphism is a common trait in plants with sex separation, which could influence female and male functions differently. In a subdioecious population of Dasiphora glabra on the Qinghai-Tibet Plateau, we investigated sexual dimorphism of floral traits and their effects on pollinator visitation, pollen flow and seed production. We also examined differences in genome size between hermaphroditic and dioecious plants. Methods: We examined sexual dimorphism in flower number, flower size, and pollen and ovule production in a subdioecious population of D. glabra. We compared pollinator visitation, pollen dispersal and seed production between sexes. We also examined the genome size of the three sex morphs using flow cytometry. Important Findings: Hermaphroditic plants were significantly more numerous than male and female plants, and dioecious plants accounted for ca. 40% of the study population. Hermaphroditic plants produced significantly more flowers than male and female plants. Male flowers were significantly larger than female and hermaphroditic flowers. Male flowers did not produce more pollen grains than hermaphroditic flowers, but female flowers produced more ovules than hermaphroditic flowers. Flies were the most frequent flower visitors and preferred large flowers, but their movements between flowers did not show any preference for large flowers. Simulated pollen flows suggested that effective pollen transfer was generally low for both hermaphroditic and male flowers, corresponding to the low seed set of naturally pollinated flowers. DNA contents of male and female plants were ca. four times those of hermaphroditic plants. These results suggest that male and female individuals have undergone polyploidy events and thus are not compatible with hermaphroditic individuals. Sexual dimorphism in floral traits in relation to pollination of dioecious plants might confer an advantage in female and male functions, but this advantage is largely masked by the low effectiveness of pollen transfer.
Funding: Outstanding Youth Foundation of Hunan Provincial Department of Education (Grant No. 22B0911).
Abstract: In this paper, we introduce the censored composite conditional quantile coefficient (cCCQC) to rank the relative importance of each predictor in high-dimensional censored regression. The cCCQC takes advantage of all useful information across quantiles and can effectively detect nonlinear effects, including interactions and heterogeneity. Furthermore, the proposed screening method based on the cCCQC is robust to outliers and enjoys the sure screening property. Simulation results demonstrate that the proposed method performs competitively on survival datasets with high-dimensional predictors, particularly when the variables are highly correlated.
Funding: Supported by the Chinese National Natural Science Foundation (30960008); the Educational Commission of Guangxi Province (200810LX393); the Starting Project of Yulin Normal College, Guangxi Province; the Specialized Research Project of Yulin Normal College, Guangxi Province (2011YJZX01); and the Science Foundation for Young Scientists of Guangxi Yulin Normal College (2010YJQN24).
Abstract: [Objective] The aim was to isolate chromium-resistant microorganisms from chromium-contaminated soil. [Method] An isolation and purification technique was used as follows: different concentrations of Cr^6+ were added to the medium, and chromium-resistant fungi were screened after isolation and domestication. The selected fungi were preliminarily identified according to their morphological and colony characteristics. Related biological characteristics were then studied, including the growth curve and the effects of temperature, pH value and osmotic pressure on growth. [Result] A strain resistant to Cr(VI) at a concentration of 1 000 mg/L was isolated and selected from soils collected at ten different sites seriously contaminated by heavy metals in the region adjacent to Yulin City. Based on its morphological and colony characteristics, it was preliminarily identified as saccharomycetes. It grew well at 15-37 ℃, with an optimum temperature of 28 ℃. The strain grew well at pH 4-10, with an optimum pH of 7.2; in addition, it grew well at NaCl concentrations of 0.5%-5.0%. The experiment showed that the strain was resistant not only to chromium but also to heavy-metal combinations such as Pb+Cu, Cu+Fe, Pb+Fe, and Pb+Cu+Fe. [Conclusion] The fungi selected in the experiment adapted well to the natural environment and were also resistant to other heavy metals.
Funding: Supported partly by grants from the Ministry of Science and Technology of China (2005DKA21502) and the Joint Foundation of Science and Technology Bureau of Yunnan Province and Kunming Medical University (2007C0024R).
Abstract: Lung cancer is a leading cause of cancer death worldwide. In some patients, lung cancer is associated with exposure to radon gas in addition to smoking. To search for common chromosomal aberrations in lung cancer cell lines established from patients whose cancer was induced by different factors, a combined approach of chromosome sorting and forward and reverse chromosome painting was used to characterize the karyotypes of two lung adenocarcinoma cell lines, A549 and GLC-82, the latter derived from a patient who had suffered long-term exposure to environmental radon gas pollution. The chromosome painting results revealed that complex chromosomal rearrangements occurred in both lung adenocarcinoma cell lines. Thirteen and twenty-four abnormal chromosomes were identified in the A549 and GLC-82 cell lines, respectively. Almost half of the abnormal chromosomes in these two cell lines were formed by non-reciprocal translocations; the others were derived from deletions and duplications and/or amplifications in some chromosomal regions. Furthermore, two apparently common breakpoints, HSA8q24 and 12q14, were found in these two lung cancer cell lines.
Funding: Supported partially by the National Science Foundation of China (Nos. 10571109, 60573018) and the Science & Technology Research Project of Shanxi Higher-Education (No. 20051277).
Abstract: A model of the optimization problem for the Support Vector Machine (SVM) is provided, based on the definitions of the dual norm and the distance between a point and its projection onto a given plane. A model of an improved Support Vector Machine based on the 1-norm (1-SVM) is derived from this optimization problem, yet it is a discrete program. With a smoothing technique and optimality knowledge, the discrete program is converted into a continuous program. Experimental results show that the algorithm is easy to implement and that the method can select and suppress the problem features more efficiently. Illustrative examples show that the 1-SVM handles both linear and nonlinear classification well.
Abstract: Objective: To develop a new bioinformatic tool based on a data-mining approach for extracting the most informative proteins that could be used to find potential biomarkers for the detection of cancer. Methods: Two independent datasets from serum samples of 253 ovarian cancer and 167 breast cancer patients were used. The samples were examined by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The datasets were used to extract the informative proteins using a data-mining method in the discrete stationary wavelet transform domain. As a dimensionality reduction procedure, the hard thresholding method was applied to reduce the number of wavelet coefficients. In addition, a distance measure was used to select the most discriminative coefficients. To find the potential biomarkers from the selected wavelet coefficients, we applied the inverse discrete stationary wavelet transform combined with a two-sided t-test. Results: From the ovarian cancer dataset, a set of five proteins was detected as potential biomarkers that could distinguish the cancer patients from the healthy cases with accuracy, sensitivity, and specificity of 100%. From the breast cancer dataset, a set of eight proteins was found as potential biomarkers that could separate the healthy cases from the cancer patients with an accuracy of 98.26%, a sensitivity of 100%, and a specificity of 95.6%. Conclusion: The results show that the new bioinformatic tool can be used in combination with high-throughput proteomic data such as SELDI-TOF MS to find potential biomarkers with high discriminative power.
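The thresholding and testing steps named in the abstract above can be illustrated with a minimal Python sketch. It assumes the wavelet coefficients have already been computed (the stationary wavelet transform itself is omitted), and it uses Welch's unequal-variance form of the t statistic, which is an assumption because the abstract does not say which variant was applied. The function names `hard_threshold` and `welch_t` and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def hard_threshold(coeffs, thr):
    """Keep coefficients whose magnitude exceeds thr; set the rest to zero."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]


def welch_t(group_a, group_b):
    """Two-sample t statistic with unequal variances (Welch form, assumed here).

    Only the statistic is returned; turning it into a two-sided p-value
    needs the t distribution, which the standard library does not provide.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical coefficients for one spectrum: small values are zeroed out.
sparse = hard_threshold([0.1, -2.0, 0.5, 3.0], thr=1.0)

# Hypothetical intensities of one candidate peak in two groups of samples.
t_stat = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(sparse, t_stat)
```

A large absolute value of the statistic suggests that the peak separates the two groups; in practice a library routine would be used to obtain the p-value.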
Abstract: Flotation is a complex, multifaceted process that is widely used for the separation of finely ground minerals. The theory of froth flotation is complex and not completely understood, which has created many monitoring challenges in coal processing plants. To address those challenges, it is important to understand the effect of different parameters on fine particle separation and to control flotation performance for a particular system. This study examines the effect of various parameters (particle characteristics and hydrodynamic conditions) on coal flotation responses (flotation rate constant and recovery) using different modeling techniques. A comprehensive coal flotation database was prepared for the statistical and soft computing methods. Statistical factors were used for variable selection. Results were in good agreement with recent theoretical flotation investigations. The computational models can accurately estimate the flotation rate constant and coal recovery (correlation coefficients of 0.85 and 0.99, respectively). According to the results, soft computing models can overcome the complexity of the process and be used as an expert system to control and optimize the parameters of the coal flotation process.
Abstract: Depth estimation of subsurface faults is one of the problems in gravity interpretation. We applied the support vector classifier (SVC) method to this estimation. Using forward and nonlinear inverse techniques, it is possible to detect the depth of subsurface faults with an associated error, but an initial guess for the depth is required, and this initial guess usually comes from non-gravity data. In this paper, we introduce SVC as one of the tools for estimating the depth of subsurface faults from gravity data. We treat each subsurface fault depth as a class, with SVC as the classification algorithm. To make better use of the SVC algorithm, we select suitable depth estimation features using a feature selection (FS) algorithm. We produce a training set consisting of synthetic gravity profiles created by subsurface faults at different depths and use it to train the SVC code to estimate the depth of real subsurface faults. We then test the trained SVC code with a testing set consisting of other synthetic gravity profiles created by subsurface faults at different depths. We also tested the trained SVC code on real data.
Abstract: Feature subset selection is a fundamental problem in data mining. The mutual information of a feature subset measures how much information about the class feature the subset contains. A hashing mechanism is proposed to calculate the mutual information of a feature subset. Feature relevancy is defined by mutual information. The redundancy-synergy coefficient, a novel measure of redundancy and synergy among features in describing the class feature, is defined. Following the information maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient is presented. Experiments show the good performance of the new method.
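The idea of computing the mutual information of a whole feature subset with a hash table can be sketched as follows. The abstract does not describe the hashing mechanism itself, so this sketch simply assumes that each sample's values on the subset are combined into a tuple and counted in Python's hash-based `Counter`; the function name `subset_mutual_information` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def subset_mutual_information(rows, labels, subset):
    """Empirical mutual information I(F_S; C) in bits.

    rows   : list of discrete feature tuples, one per sample
    labels : list of class values, one per sample
    subset : indices of the features that form the subset S
    """
    n = len(rows)
    # The tuple of subset values is the hash key for the joint feature state.
    keys = [tuple(row[j] for j in subset) for row in rows]
    joint = Counter(zip(keys, labels))
    key_counts = Counter(keys)
    label_counts = Counter(labels)
    mi = 0.0
    for (key, label), count in joint.items():
        # p(x, c) * log2( p(x, c) / (p(x) * p(c)) ), written with raw counts
        mi += (count / n) * log2(count * n / (key_counts[key] * label_counts[label]))
    return mi


# Toy data: the class is the XOR of two binary features.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
print(subset_mutual_information(rows, labels, [0]))
print(subset_mutual_information(rows, labels, [0, 1]))
```

In the XOR example each feature alone carries no information about the class while the pair determines it, which is the kind of synergy that a subset-level measure can capture and a per-feature score cannot.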
Abstract: To solve the problems of the AMR-WB+ (Extended Adaptive Multi-Rate-WideBand) semi-open-loop coding mode selection algorithm, features for ACELP (Algebraic Code Excited Linear Prediction) and TCX (Transform Coded eXcitation) classification are investigated. Eleven classification features in the AMR-WB+ codec are selected, and two novel classification features, EFM (Energy Flatness Measurement) and stdEFM (standard deviation of EFM), are proposed. Based on these, a novel semi-open-loop mode selection algorithm using EFM and the selected AMR-WB+ features is proposed. The results of a classification test and a listening test show that the novel algorithm performs much better than the AMR-WB+ semi-open-loop coding mode selection algorithm.
Funding: Supported in part by the Humanity & Social Science general project of the Ministry of Education under Grant No. 14YJAZH046, the National Science Foundation of China under Grant No. 61402304, and the Beijing Educational Committee Science and Technology Development Planned under Grant No. KM201610028015.
Abstract: Collaborative filtering, one of the most popular techniques, plays an important role in recommendation systems. However, when the user-item rating matrix is sparse, its performance degrades. Recently, domain-specific recommendation approaches have been developed to address this problem. The basic idea is to partition the users and items into overlapping domains and then perform recommendation in each domain independently. Here, a domain means a group of users with similar preferences for a group of products. However, these domain-specific methods consist of two sequential steps and ignore the mutual benefit of domain segmentation and recommendation. Hence, this paper presents a unified framework that simultaneously performs recommendation and makes use of the domain information underlying the rating matrix. Based on matrix factorization, the proposed model learns both user preferences across multiple domains and preference selection vectors that select relevant features for each group of products. In addition, local context information from the user-item rating matrix is used to enhance the new framework. Experimental results on two widely used datasets, Ciao and Epinions, demonstrate the effectiveness of the proposed model.
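For readers unfamiliar with the matrix factorization that the framework above builds on, the following Python sketch shows the plain baseline only: regularized factorization of observed ratings trained by stochastic gradient descent. It is not the unified domain-aware model of the paper; the function names, hyperparameters, and the tiny rating list are all hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


# Hypothetical sparse ratings as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(rmse(ratings, P, Q))
```

Only observed entries contribute to the updates, which is why this family of models copes with sparse rating matrices better than methods that need dense rows or columns.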
Funding: Supported by the project of the National Technology Extension Fund of Forestry, ‘Forest Vegetation Carbon Storage Monitoring Technology Based on Watershed Algorithm’ ([2019]06), and the National Natural Science Foundation of China, ‘Study on Crown Models for Larix olgensis Based on Tree Growth’ (31870620).
Abstract: Quantifying forest stand parameters is crucial in forestry research and environmental monitoring because it provides important factors for analyzing forest structure and understanding forest resources. The estimation of crown density and volume has always been a prominent topic in forestry remote sensing. Based on GF-2 remote sensing data, sample plot survey data and forest resource survey data, this study used Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) and Pinus massoniana Lamb. as research objects to tackle key challenges in the use of remote sensing technology. The Boruta feature selection technique, together with multiple stepwise and Cubist regression models, was used to estimate crown density and volume in portions of the research area's stands, introducing novel technological methods for estimating stand parameters. The results show that: (i) the Boruta algorithm is effective at selecting the feature set with the strongest correlation with the dependent variable, which solves the problem of data and the loss of original feature data after dimensionality reduction; (ii) building the model with the Cubist method yields better results than using multiple stepwise regression. The coefficient of determination (R^2) of the Cubist regression model is greater than 0.67 in all cases for the Chinese fir plots and greater than 0.63 for the P. massoniana plots. As a result, combining the two methods can increase the estimation accuracy of stand parameters, providing a theoretical foundation and technical support for future studies.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 71271202 and 70801058.
Abstract: Rough set theory is an effective approach to feature selection and has recently attracted many researchers. The essence of the rough set approach to feature selection is to find a subset of the original features. Finding a minimal subset of the features is, however, an NP-hard problem, so effective and efficient heuristic algorithms are needed. This paper presents a novel rough set approach to feature selection based on the scatter search metaheuristic. The proposed method, called scatter search rough set attribute reduction (SSAR), is illustrated on 13 well-known datasets from the UCI machine learning repository. The proposed heuristic strategy is compared with typical attribute reduction methods including genetic algorithm, ant colony, simulated annealing, and tabu search. Computational results demonstrate that our algorithm can efficiently find a minimal subset of the features and shows promising and competitive performance on the considered datasets.
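To make the search objective concrete, here is a minimal sketch of the standard rough set dependency degree, the quantity that attribute reduction heuristics typically evaluate when scoring a candidate feature subset. It does not reproduce SSAR or its scatter search procedure, which the abstract does not detail. The function names and the toy decision table are assumptions for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, labels, attrs):
    """Share of objects lying in the positive region of the decision w.r.t. attrs.

    An object is in the positive region when every object that matches it on
    all attributes in attrs carries the same decision label.
    """
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # equivalence class under attrs
        classes[key].append(label)
    consistent = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return consistent / len(rows)


def preserves_dependency(rows, labels, attrs, all_attrs):
    """True if the subset attrs discerns the decision as well as all attributes do."""
    return dependency_degree(rows, labels, attrs) == dependency_degree(rows, labels, all_attrs)


# Toy decision table (made-up): label = attr0 XOR attr1, attr2 = NOT attr0.
toy_rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
toy_labels = [0, 1, 1, 0]

for subset in ([0], [0, 1], [1, 2], [0, 1, 2]):
    print(subset, dependency_degree(toy_rows, toy_labels, subset))
```

A reduct is a subset that preserves the full dependency degree and from which no attribute can be dropped without losing it. The number of candidate subsets grows exponentially with the number of features, which is why the abstract turns to metaheuristics.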
Funding: Project supported by the Joint Funds of the National Natural Science Foundation of China (No. U1509214), the Beijing Natural Science Foundation, China (No. 4174105), the Key Projects of National Bureau of Statistics of China (No. 2017LZ05), and the Discipline Construction Foundation of the Central University of Finance and Economics, China (No. 2016XX02).
Abstract: Feature selection is an important approach to dimensionality reduction in the field of text classification. Because the selected features always contain redundant information that is difficult to handle, we propose a new, simple feature selection method that can effectively filter the redundant features. First, to calculate the relationship between two words, the definitions of word frequency based relevance and correlative redundancy are introduced. Furthermore, an optimal feature selection (OFS) method is chosen to obtain a feature subset FS1. Finally, to improve the execution speed, the redundant features in FS1 are filtered using a predetermined threshold, and the filtered features are stored in linked lists. Experiments are carried out on three datasets (WebKB, 20-Newsgroups, and Reuters-21578), in which support vector machines and naïve Bayes are used. The results show that the classification accuracy of the proposed method is generally higher than that of typical traditional methods (information gain, improved Gini index, and improved comprehensively measured feature selection) and the OFS methods. Moreover, the proposed method runs faster than typical mutual information-based methods (improved and normalized mutual information-based feature selections, and multilabel feature selection based on maximum dependency and minimum redundancy) while simultaneously ensuring classification accuracy. Statistical results validate the effectiveness of the proposed method in handling redundant information in text classification.
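The threshold-based filtering step can be illustrated with a minimal sketch. The abstract does not define the paper's word frequency based relevance or correlative redundancy measures, so this example substitutes the absolute Pearson correlation between term-frequency columns as a stand-in redundancy score. The function names, the threshold value, and the toy counts are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def filter_redundant(ranked, columns, threshold):
    """Walk features in relevance order; keep one only if it is not too
    correlated (stand-in redundancy score) with any feature already kept."""
    kept = []
    for feat in ranked:
        if all(abs(pearson(columns[feat], columns[k])) < threshold for k in kept):
            kept.append(feat)
    return kept


# Toy term-frequency columns over four documents (made-up data).
toy_columns = {
    "price": [1, 2, 3, 4],
    "cost": [2, 4, 6, 8],   # moves in lockstep with "price"
    "sport": [4, 1, 3, 2],
}
toy_ranked = ["price", "cost", "sport"]  # assumed output of a prior relevance ranking

print(filter_redundant(toy_ranked, toy_columns, threshold=0.9))
```

Here `toy_ranked` plays the role of the subset FS1 produced by an earlier selection step, and the filter removes features that duplicate information already covered by a higher-ranked one.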
Funding: Supported by the Natural Science Foundation of China (31570385).
Abstract: Aims: Sexual dimorphism is a common trait in plants with sex separation, which could influence female and male functions differently. In a subdioecious population of Dasiphora glabra on the Qinghai-Tibet Plateau, we investigated sexual dimorphism of floral traits and their effects on pollinator visitation, pollen flow and seed production. We also examined differences in genome size between hermaphroditic and dioecious plants. Methods: We examined sexual dimorphism in flower number, flower size, and pollen and ovule production in a subdioecious population of D. glabra. We compared pollinator visitation, pollen dispersal and seed production between sexes. We also measured the genome size of the three sex morphs using flow cytometry. Important Findings: Hermaphroditic plants were significantly more numerous than male and female plants, and dioecious plants accounted for ca. 40% of the study population. Hermaphroditic plants produced significantly more flowers than male and female plants. Male flowers were significantly larger than female and hermaphroditic flowers. Male flowers did not produce more pollen grains than hermaphroditic flowers, but female flowers produced more ovules than hermaphroditic flowers. Flies were the most frequent flower visitors and preferred large flowers, but their movements between flowers did not show any preference for large flowers. Simulated pollen flows suggested that effective pollen transfer was generally low for both hermaphroditic and male flowers, corresponding to the low seed set of naturally pollinated flowers. DNA contents of male and female plants were ca. four times those of hermaphroditic plants. These results suggest that male and female individuals have undergone polyploidy events and thus are not compatible with hermaphroditic individuals. Sexual dimorphism in floral traits in relation to pollination of dioecious plants might confer an advantage in female and male functions, but this advantage is largely masked by the low effectiveness of pollen transfer.