Abstract
When applied to grapheme-to-phoneme conversion, the ID3 algorithm computes slowly and is sensitive to data sparsity. To address these problems, a new decision tree algorithm for grapheme-to-phoneme conversion is proposed: the conditional mix incrementing algorithm (CMI). CMI builds on ID3 and selects decision (test) attributes by information gain, that is, mutual information, guided by prior pronunciation knowledge. It also introduces two parameters, minimum confidence (Minconf) and maximum support (maxsup), to control the creation of leaf nodes. Experimental results show that CMI simplifies the computation and reduces the impact of sparse data on the prediction performance of the resulting decision tree. Under the same experimental conditions, CMI is 3.3 times faster than ID3 and improves the prediction accuracy of the decision tree by 11.6%.
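The abstract does not spell out CMI's exact attribute-selection or leaf-control rules. The Python sketch below therefore only illustrates the general idea of an ID3-style tree for grapheme-to-phoneme conversion, under stated assumptions that do not come from the paper. The assumptions are as follows. Each letter is aligned one-to-one with a phoneme, and one tree is grown per letter. The decision attributes are context positions around the target letter. "Prior pronunciation knowledge" is modelled hypothetically as trying nearer positions first (`PRIOR_ORDER`, `lookahead`). Minconf and maxsup are read as "make a leaf once the majority phoneme reaches confidence `min_conf` or the node holds no more than `max_sup` samples". All names, default values, and the toy data are illustrative, not the authors' method.

```python
from collections import Counter
from math import log2

# Hypothetical prior knowledge: context positions (offsets from the target
# letter) in the order they may enter the tree, nearest first.
PRIOR_ORDER = [0, 1, -1, 2, -2]


def context(word, i):
    """Letters at each offset in PRIOR_ORDER around word[i]; '#' marks positions outside the word."""
    padded = "##" + word + "##"
    return {o: padded[i + 2 + o] for o in PRIOR_ORDER}


def entropy(labels):
    """Shannon entropy, in bits, of a list of phoneme labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(samples, attr):
    """ID3 information gain (mutual information) between context position `attr` and the phoneme."""
    groups = {}
    for ctx, ph in samples:
        groups.setdefault(ctx[attr], []).append(ph)
    remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy([ph for _, ph in samples]) - remainder


def build(samples, used=(), min_conf=0.95, max_sup=2, lookahead=2):
    """Grow a tree: a leaf is a phoneme, an internal node is (attr, branches, default)."""
    majority, count = Counter(ph for _, ph in samples).most_common(1)[0]
    # Assumed reading of Minconf / maxsup: stop when the majority phoneme is
    # confident enough, or when too few samples remain to split reliably.
    if count / len(samples) >= min_conf or len(samples) <= max_sup:
        return majority
    # Only the `lookahead` nearest unused positions are candidates, so the
    # context window widens step by step instead of testing every position.
    candidates = [a for a in PRIOR_ORDER if a not in used][:lookahead]
    if not candidates:
        return majority
    attr = max(candidates, key=lambda a: info_gain(samples, a))
    children = {}
    for ctx, ph in samples:
        children.setdefault(ctx[attr], []).append((ctx, ph))
    branches = {v: build(sub, used + (attr,), min_conf, max_sup, lookahead)
                for v, sub in children.items()}
    return (attr, branches, majority)


def predict(tree, ctx):
    """Follow branches; fall back to the node's majority phoneme on an unseen letter."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if ctx[attr] not in branches:
            return default
        tree = branches[ctx[attr]]
    return tree


# Toy data for the letter "c": /k/ before a, o and /s/ before e, i.
samples = [(context(w, 0), ph) for w, ph in
           [("cat", "k"), ("cot", "k"), ("cent", "s"), ("city", "s")]]
tree = build(samples)
print(predict(tree, context("cap", 0)))   # → k
print(predict(tree, context("cell", 0)))  # → s
```

On this toy data, the letter itself (offset 0) carries no information, so the sketch splits on the following letter (offset 1). Raising `min_conf` or lowering `max_sup` yields deeper trees. The abstract does not say whether restricting candidates in this way is how CMI obtains its reported speed-up.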
Source
Journal of Tsinghua University (Science and Technology), 2008, No. 10, pp. 1629-1631 (3 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
Beijing Municipal Science and Technology Plan Project (Y0105008040111)
Keywords
decision tree
grapheme-to-phoneme conversion
ID3 algorithm
CMI (conditional mix incrementing algorithm)