A review of the literature indicates that sample size, attribute variance, and the within-sample choice distribution of alternatives are important considerations in the estimation of multinomial logit (MNL) models, but their impacts on estimation accuracy have not been systematically studied. The objective of this paper is therefore to examine these issues empirically through a set of simulated discrete choice preference and rank-ordered preference datasets. The utility coefficients, alternative-specific constants (ASCs), and the means and standard deviations of four attributes for a set of seven hypothetical alternatives are specified a priori. Synthetic datasets with varying sample size, attribute variance, and within-sample choice distribution are then simulated. Based on these datasets, the utility coefficients and ASCs of the specified MNL models are re-estimated and compared with the original a priori values. It is found that (1) the estimation accuracy of the utility parameters increases as the sample size increases; (2) the utility coefficients can be re-estimated with reasonable accuracy, but the estimates of the ASCs are subject to much larger errors; (3) as the variances of the alternative attributes increase, the estimation accuracy improves significantly; and (4) as the distribution of chosen alternatives becomes more balanced across alternatives within the sample datasets, the hit ratio decreases.
The results indicate that (a) under settings similar to those presented in this paper, a large sample of a few thousand observations (3000 to 4000) may be needed to provide reasonable estimates of the utility coefficients, and particularly of the ASCs; (b) a larger but realistic attribute space is preferred in stated preference survey design; and (c) choice datasets with an unbalanced frequency distribution of "chosen" alternatives are preferred, in order to better capture the elasticity between the "perceived utilities" associated with the alternatives' attributes.
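The simulate-and-re-estimate procedure summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`simulate`, `estimate`, `hit_ratio`, and the helpers) are hypothetical, attributes are assumed to be drawn as independent normal variates, choices are sampled directly from the logit probabilities (equivalent to adding i.i.d. Gumbel errors to the utilities), and the estimator is assumed to be maximum likelihood via plain Newton-Raphson steps without a line search. The ASC of the first alternative is assumed to be normalized to zero.

```python
import math
import random


def design_row(x, j, n_alt):
    """Attributes followed by ASC dummies; alternative 0 is the reference."""
    return list(x) + [1.0 if j == a else 0.0 for a in range(1, n_alt)]


def probabilities(Z, theta):
    """MNL choice probabilities (softmax of the systematic utilities)."""
    v = [sum(t * z for t, z in zip(theta, row)) for row in Z]
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(u - m) for u in v]
    s = sum(e)
    return [u / s for u in e]


def simulate(n, beta, asc, means, sds, rng):
    """Draw n observations; asc[0] is assumed to be 0 (reference alternative)."""
    n_alt = len(asc)
    theta = list(beta) + list(asc[1:])
    data = []
    for _ in range(n):
        Z = [design_row([rng.gauss(m, s) for m, s in zip(means[j], sds[j])],
                        j, n_alt) for j in range(n_alt)]
        p = probabilities(Z, theta)
        chosen = rng.choices(range(n_alt), weights=p)[0]
        data.append((Z, chosen))
    return data


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def estimate(data, n_params, iters=25, tol=1e-8):
    """Maximum-likelihood estimate via Newton-Raphson, starting from zero."""
    theta = [0.0] * n_params
    for _ in range(iters):
        g = [0.0] * n_params                             # gradient of log-likelihood
        H = [[0.0] * n_params for _ in range(n_params)]  # negative Hessian
        for Z, chosen in data:
            p = probabilities(Z, theta)
            zbar = [sum(pj * row[k] for pj, row in zip(p, Z))
                    for k in range(n_params)]
            for k in range(n_params):
                g[k] += Z[chosen][k] - zbar[k]
            for pj, row in zip(p, Z):
                d = [row[k] - zbar[k] for k in range(n_params)]
                for a in range(n_params):
                    wa = pj * d[a]
                    for b in range(n_params):
                        H[a][b] += wa * d[b]
        step = solve(H, g)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


def hit_ratio(data, theta):
    """Share of observations whose most probable alternative was chosen."""
    hits = 0
    for Z, chosen in data:
        p = probabilities(Z, theta)
        hits += p.index(max(p)) == chosen
    return hits / len(data)
```

To mirror the setting described above, one would call `simulate` with seven alternatives and four attributes, vary `n` and the entries of `sds`, and compare the output of `estimate` with the a priori values. The pure-Python loops are slow for samples of several thousand observations, so a numerical library would be a natural substitute in practice.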