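Abstract: To address the classification of imbalanced datasets, and the fact that general cost-sensitive learning algorithms cannot be extended to multi-class problems, an ensemble method for cost-sensitive algorithms based on the average distance of K-nearest-neighbor (KNN) samples is proposed. First, following the idea of maximizing the minimum margin, a resampling method that reduces the sample density near the decision boundary is proposed. Next, the average distance of the samples in each class is used as the basis for the classification decision, and a learning algorithm consistent with Bayesian decision theory is proposed, which makes the improved algorithm cost-sensitive. Finally, the improved cost-sensitive algorithm is ensembled over values of K: the weights of the base learners are adjusted on the principle of minimum cost, yielding a cost-sensitive AdaBoost algorithm whose objective is the lowest overall misclassification cost. Experimental results show that, compared with the traditional KNN algorithm, the improved algorithm reduces the average misclassification cost by 31.4 percentage points and has better cost-sensitive performance.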
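The decision step described in this abstract (per-class average KNN distance combined with a minimum-cost Bayes rule) can be illustrated with a minimal sketch. It is not the authors' algorithm: turning the inverse mean distance into a pseudo-posterior is an assumption made here for illustration, and the names `class_mean_knn_distance` and `cost_sensitive_predict` are hypothetical. The resampling and AdaBoost stages are not shown.

```python
import math


def class_mean_knn_distance(x, samples, k):
    """Mean Euclidean distance from x to its k nearest samples of one class."""
    dists = sorted(math.dist(x, s) for s in samples)
    nearest = dists[:k]  # fewer than k samples: use all of them
    return sum(nearest) / len(nearest)


def cost_sensitive_predict(x, train, cost, k=3):
    """Return (label, risks) for the label with the lowest expected cost.

    train: dict mapping class label -> list of feature tuples
    cost[true][pred]: cost of predicting `pred` when the truth is `true`
    """
    eps = 1e-12  # avoids division by zero when x coincides with a sample
    # Assumption (not from the abstract): closeness, the inverse of the mean
    # distance, is normalized and used in place of the class posterior.
    closeness = {c: 1.0 / (class_mean_knn_distance(x, pts, k) + eps)
                 for c, pts in train.items()}
    total = sum(closeness.values())
    posterior = {c: v / total for c, v in closeness.items()}
    # Bayes minimum-risk rule: expected cost of each candidate prediction.
    risk = {pred: sum(posterior[true] * cost[true][pred] for true in train)
            for pred in train}
    return min(risk, key=risk.get), risk


# Toy imbalanced data: three "neg" samples, one "pos" sample.
train = {"neg": [(0, 0), (0, 1), (1, 0)], "pos": [(3, 3)]}
uniform = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 1, "pos": 0}}
skewed = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 5, "pos": 0}}
label_uniform, _ = cost_sensitive_predict((1.2, 1.2), train, uniform)
label_skewed, _ = cost_sensitive_predict((1.2, 1.2), train, skewed)
```

With equal costs the query point is assigned to the nearer majority class; raising the cost of missing the minority class flips the decision, which is the cost-sensitive behavior the abstract refers to.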
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2008AA01Z113); the National Natural Science Foundation of China (No. 60773105, 60973149)
Abstract: To address the high cost of mutation testing, which has hampered its wide use, a technique that introduces test case selection into the mutation testing process is proposed. For each mutant, a fixed number of test cases is selected to cap the maximum allowable executions and so reduce useless work. Test case selection depends largely on the degree of mutation. The mutation distance is an index describing the semantic difference between the original program and the mutated program. It represents the percentage of effective test cases in a test set, so it can be used to guide the selection of test cases. The larger the mutation distance, the more easily the mutant is killed, and the greater the number of effective test cases for that mutant. Experimental results suggest that the technique can remarkably reduce execution costs without a significant loss of test effectiveness.
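One way to see how a mutation distance could guide the number of selected test cases is the sketch below. It is an assumption made for illustration, not the selection rule from the paper: if a fraction `mutation_distance` of the test set is effective against a mutant and test cases are drawn independently at random, the chance that `n` draws all miss is `(1 - mutation_distance) ** n`. The function name `selection_budget` and the parameters `miss_probability` and `cap` are hypothetical.

```python
import math


def selection_budget(mutation_distance, miss_probability=0.05, cap=50):
    """Smallest n with (1 - mutation_distance) ** n <= miss_probability.

    mutation_distance: assumed fraction of effective (killing) test cases.
    cap: upper bound on executions per mutant (the maximum allowable runs).
    """
    if not 0.0 < mutation_distance <= 1.0:
        raise ValueError("mutation_distance must be in (0, 1]")
    if mutation_distance == 1.0:
        return 1  # every test case kills the mutant
    n = math.ceil(math.log(miss_probability) / math.log(1.0 - mutation_distance))
    return max(1, min(cap, n))


budgets = {d: selection_budget(d) for d in (0.5, 0.1, 0.01)}
```

Under these assumptions, mutants with a large mutation distance need only a handful of executions, while mutants that are hard to kill hit the cap, which bounds the total cost.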
Funding: Project (51208451) supported by the National Natural Science Foundation of China; Project (10KJB580004) supported by the Natural Science Foundation for Colleges and Universities of Jiangsu Province, China; Project supported by the New Century Talents Project of Yangzhou University, China
Abstract: The primary objective of this work is to explore how drivers react to flashing green at signalized intersections. Using videotaping and photogrammetry-based data processing, the operating speeds of vehicles before and after the onset of flashing green were compared using a paired-samples t-test. The critical distances between go and stop decisions were defined through the cumulative percentage curve. The boundary of the dilemma zone was determined by comparing stop distance and travel distance. Amber-running violations were analyzed on the basis of the travel time to the stop line. Finally, a logistic model for stop and go decisions was constructed. The results show that the stopping ratios of the first vehicles on the west-bound and east-bound approaches are 41.3% and 39.8%, respectively; the amber-light running violation ratios of the two approaches are 31.6% and 25.4%, respectively; the operating speed growth ratios of first vehicles that chose to cross the intersection after the onset of flashing green are 26.7% and 17.7%, respectively; and the critical distances are 48 m and 46 m, respectively, which are close to 44 m, the boundary of the dilemma zone. The developed decision models demonstrate that the probability of a go decision is higher when the distance from the stop line is shorter or the operating speed is higher. This indicates that flashing green is an effective way to enhance intersection safety, but it should be combined with strict enforcement. In addition, traffic signs near the critical distance and reasonable speed limits are also beneficial to intersection safety.
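The comparison of stop distance and travel distance mentioned in this abstract can be sketched with textbook kinematics. This is an illustration only: the formulas ignore intersection width and vehicle length, and the reaction time, deceleration, amber duration, and speed used below are hypothetical values, not figures from the study.

```python
def stop_distance(speed, reaction_time, deceleration):
    """Distance (m) needed to stop: reaction travel plus braking distance."""
    return speed * reaction_time + speed ** 2 / (2.0 * deceleration)


def travel_distance(speed, amber_time):
    """Distance (m) covered at constant speed during the amber interval."""
    return speed * amber_time


def in_dilemma_zone(distance, speed, reaction_time, deceleration, amber_time):
    """True if the vehicle can neither stop in time nor reach the stop line."""
    too_close_to_stop = distance < stop_distance(speed, reaction_time, deceleration)
    too_far_to_clear = distance > travel_distance(speed, amber_time)
    return too_close_to_stop and too_far_to_clear


# Hypothetical inputs: 15 m/s approach speed, 1.0 s reaction time,
# 3.0 m/s^2 deceleration, 3.0 s amber.
params = dict(speed=15.0, reaction_time=1.0, deceleration=3.0, amber_time=3.0)
zone_flags = [in_dilemma_zone(d, **params) for d in (40.0, 50.0, 60.0)]
```

With these inputs the stop distance is 52.5 m and the amber travel distance is 45 m, so only the vehicle at 50 m is caught between the two.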
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61070236
Abstract: This paper considers the problem of estimating the finite population total in two-phase sampling when some information on an auxiliary variable is available. The authors employ an information-theoretic approach which makes use of an effective distance between the estimated probabilities and the empirical frequencies. It is shown that the proposed cross-entropy minimization estimator is more efficient than the usual estimator and has some desirable large-sample properties. With some necessary modifications, the method can be applied to two-phase sampling for stratification and non-response. A simulation study is presented to assess the finite-sample performance of the proposed estimator.
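The idea of minimizing cross-entropy between estimated probabilities and empirical frequencies can be illustrated in a simplified single-phase setting. This is a sketch under stated assumptions, not the estimator from the paper: the sample is reweighted so that the weighted mean of the auxiliary variable matches a known population mean, and minimizing the Kullback-Leibler divergence from uniform weights under that constraint gives weights proportional to `exp(lam * x_i)`. The names `tilted_weights` and `min_cross_entropy_weights` and the toy data are hypothetical.

```python
import math


def tilted_weights(x, lam):
    """Weights proportional to exp(lam * x_i), normalized to sum to 1."""
    shift = max(lam * xi for xi in x)  # subtract the max to avoid overflow
    w = [math.exp(lam * xi - shift) for xi in x]
    total = sum(w)
    return [wi / total for wi in w]


def min_cross_entropy_weights(x, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Find weights closest (in KL divergence) to uniform with mean target_mean.

    The tilted mean increases with lam, so bisection on lam suffices.
    target_mean must lie strictly between min(x) and max(x).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        p = tilted_weights(x, mid)
        mean = sum(pi * xi for pi, xi in zip(p, x))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted_weights(x, (lo + hi) / 2.0)


# Toy sample: auxiliary values x, study values y, known population mean of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
population_size = 100
p = min_cross_entropy_weights(x, target_mean=3.0)
estimated_total = population_size * sum(pi * yi for pi, yi in zip(p, y))
```

Because `y` is exactly `2 * x` in this toy sample, matching the auxiliary mean of 3.0 yields an estimated total close to 600, which shows how auxiliary information sharpens the estimate.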