Abstract: Feature subset selection is a fundamental problem in data mining. The mutual information of a feature subset measures how much information about the class feature the subset contains. A hashing mechanism is proposed to calculate the mutual information of a feature subset. Feature relevancy is defined by mutual information. The redundancy-synergy coefficient, a novel measure of redundancy and synergy among features in describing the class feature, is defined. Following the information maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient is presented. Experiments show that the new method performs well.
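The abstract above does not spell out its hashing mechanism, so the following is only a minimal sketch of one plausible reading: each sample's values on the chosen features are projected to a tuple, and that tuple is used as a hash-table key to count joint occurrences with the class label. The function name `subset_mutual_information` and its arguments are hypothetical, and the estimate assumes discrete features.

```python
from collections import Counter
from math import log2


def subset_mutual_information(rows, subset, labels):
    """Estimate I(S; C) in bits for discrete features (illustrative sketch).

    rows   : list of tuples, one tuple of discrete feature values per sample
    subset : indices of the features that form the subset S
    labels : class label of each sample
    """
    n = len(rows)
    # Project each sample onto the subset; the resulting tuple is hashable
    # and serves directly as a dictionary key.
    keys = [tuple(row[i] for i in subset) for row in rows]
    joint = Counter(zip(keys, labels))  # counts of (s, c) pairs
    count_s = Counter(keys)             # counts of subset values s
    count_c = Counter(labels)           # counts of class values c
    mi = 0.0
    for (s, c), n_sc in joint.items():
        # p(s, c) * log2(p(s, c) / (p(s) * p(c))), written with raw counts
        mi += (n_sc / n) * log2(n_sc * n / (count_s[s] * count_c[c]))
    return mi


# XOR-style toy data: neither feature alone informs the class, both together do.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
print(subset_mutual_information(rows, [0], labels))     # 0.0
print(subset_mutual_information(rows, [0, 1], labels))  # 1.0
```

The toy data is a case of synergy: the pair of features carries one full bit about the class although each feature alone carries none, which is the kind of interaction a per-feature relevancy score would miss.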
Funding: Projects (JCYJ20120615145601342, JCYJ20130325151523015) supported by the Shenzhen Science and Technology Development Funding-Fundamental Research Plan, China; Project (2013U-6) supported by the Key Laboratory of Eco Planning & Green Building, Ministry of Education (Tsinghua University), China.
Abstract: In recent years, there have been important developments in the joint analysis of travel behavior based on discrete choice models, as well as in the formulation of increasingly flexible closed-form models belonging to the generalized extreme value class. The objective of this work is to describe the simultaneous choice of shopping destination and travel-to-shop mode in downtown areas using the cross-nested logit (CNL) structure, which allows for potential spatial correlation. The analysis uses data collected for shopping trips in the downtown areas of the Maryland-Washington, D.C. region, considering household, individual, land use, and travel-related characteristics. The estimation results show that the dissimilarity parameter in the CNL model is 0.37 and significant at the 95% level, indicating that the alternatives have high spatial correlation for short shopping distances. The results also reveal detailed, significant influences on the joint choice of shopping destination and travel mode. Moreover, a Monte Carlo simulation of a group of scenarios arising from transportation policies and parking fees in the downtown area was undertaken to examine the impact of a change in car travel cost on shopping destination and travel mode switching. These findings have important implications for transportation demand management and urban planning.
Funding: Foundation of China (Grant Nos. 60175020 and 60673037) and the National High Technology Research and Development Program of China (Grant No. 2002AA117010-09).
Abstract: An SVM (Support Vector Machine) based method to identify Chinese place names is presented. In this approach, place name candidates are located according to a rational forming assumption, and an SVM-based identification strategy then decides whether each candidate is a true place name. Drawing on linguistic knowledge, the basic semanteme of a contextual word and the frequency information of the words inside a place name candidate are selected as features. As a result, the dimension of the feature space is reduced dramatically and processing is more efficient. In open testing on unregistered place names, the method achieves an F-measure of 83.25 on 8.17 million words of news text.
Abstract: Performance characteristics of solar photovoltaic (PV) cells and modules are conventionally obtained under standard testing conditions. In the present work, the performance of PV modules under the extreme temperatures and insolation experienced in the State of Qatar was used to develop a simplified characterization approach for the special case of arid environmental conditions. The chosen model was the well-known single diode model with both series and parallel resistors for greater accuracy. The modeling technique was validated by comparing the numerically calculated electrical characteristics with experimentally obtained data using two approaches: a single indoor fixed monocrystalline PV module inside a solar simulation chamber, which physically simulated different weather scenarios by changing irradiation intensity and temperature, and a set of outdoor fixed polycrystalline PV modules. The results of the indoor experiment were presented as performance curves, and those of the outdoor experiment as a monthly accumulated power production chart. Both showed acceptable tolerance.
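For reference, the single diode model with series and parallel resistors named in the abstract above is usually written in the following textbook form. The abstract itself gives no equation, so the symbols here are assumptions: $I_{ph}$ is the photocurrent, $I_0$ the diode saturation current, $R_s$ and $R_{sh}$ the series and parallel (shunt) resistances, $n$ the diode ideality factor, $N_s$ the number of cells in series, and $V_t$ the thermal voltage.

```latex
I = I_{ph}
    - I_0 \left[ \exp\!\left( \frac{V + I R_s}{n N_s V_t} \right) - 1 \right]
    - \frac{V + I R_s}{R_{sh}},
\qquad
V_t = \frac{k T}{q}
```

Because $I$ appears on both sides, the current at a given voltage is normally found numerically, which may be what the abstract means by numerically calculated electrical characteristics. Temperature enters through $V_t$ and $I_0$, and irradiance mainly through $I_{ph}$.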
Funding: Fan’s research was supported by the National Natural Science Foundation of China under Grant No. 71671149, the Fundamental Research Funds for the Central Universities under Grant No. 20720171042, and the Natural Science Foundation of Fujian Province of China under Grant No. 2016J01340. Zhong’s research was supported by the National Natural Science Foundation of China under Grant Nos. 11671334, 11301435, and 11401497.
Abstract: This paper studies the variable selection problem in the structural equation of a two-stage least squares (2SLS) model in the presence of endogeneity, which is commonly encountered in empirical economic studies. Model uncertainty and variable selection in the structural equation are important issues, as described in Andrews and Lu (2001) and Caner (2009). The authors propose an adaptive Lasso 2SLS estimator for the linear structural equation with endogeneity and show that it enjoys the oracle properties, i.e., consistency in both estimation and model selection. In Monte Carlo simulations, the authors demonstrate that the proposed estimator has smaller bias and MSE than the bridge-type GMM estimator (Caner, 2009). In a case study, the authors revisit the classic returns-to-education problem (Angrist and Krueger, 1991) using China population census data. The authors find that education level not only has strong effects on income but also shows heterogeneity across age cohorts.
Funding: Supported by the project of the National Technology Extension Fund of Forestry, ‘Forest Vegetation Carbon Storage Monitoring Technology Based on Watershed Algorithm’ ([2019]06), and the National Natural Science Foundation of China, ‘Study on Crown Models for Larix olgensis Based on Tree Growth’ (31870620).
Abstract: Quantifying forest stand parameters is crucial in forestry research and environmental monitoring because it provides important factors for analyzing forest structure and understanding forest resources. The estimation of crown density and volume has long been a prominent topic in forestry remote sensing. Based on GF-2 remote sensing data, sample plot survey data, and forest resource survey data, this study used Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) and Pinus massoniana Lamb. as research objects to tackle key challenges in the use of remote sensing technology. The Boruta feature selection technique, together with multiple stepwise and Cubist regression models, was used to estimate crown density and volume in portions of the research area’s stands, introducing novel technological methods for estimating stand parameters. The results show that: (i) the Boruta algorithm is effective at selecting the feature set with the strongest correlation with the dependent variable, which solves the problem of data and the loss of original feature data after dimensionality reduction; (ii) building the model with the Cubist method yields better results than multiple stepwise regression. The Cubist regression model’s coefficient of determination (R^2) is above 0.67 in every case for the Chinese fir plots and above 0.63 for the P. massoniana plots. As a result, combining the two methods can increase the estimation accuracy of stand parameters, providing a theoretical foundation and technical support for future studies.