Abstract
To address the limited variety of model designs used for Chinese dialects and to improve the efficiency of dialect identification, this paper proposes a Chinese dialect identification method based on combined diverse density. Diverse density is a classical multi-instance learning algorithm, and combined diverse density is an improved application of it. The method first pre-classifies each dialect into several subclasses, then generates multi-instance bags for each subclass and applies the expectation-maximization diverse density (EM-DD) algorithm for multi-instance learning; the resulting set of diverse density points serves as the multi-instance model of the dialect. Finally, an average nearest distance algorithm is proposed for pattern classification. Training in this way yields a more comprehensive and complete dialect model, and classification takes into account every instance in an unseen bag, which improves the efficiency of the identification system.
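The abstract does not define the final classification step, so the following is only a minimal sketch of one plausible reading: for each instance in an unseen bag, take the distance to the closest diverse density point of a dialect model, average those distances over the bag, and pick the dialect with the smallest average. The function names, the Euclidean metric, and the toy 2-D data are assumptions made for illustration; they are not taken from the paper.

```python
import math

def average_nearest_distance(bag, model_points):
    """Mean, over all instances in `bag`, of the Euclidean distance
    to the closest point in `model_points` (assumed metric)."""
    nearest = [min(math.dist(x, p) for p in model_points) for x in bag]
    return sum(nearest) / len(nearest)

def classify_bag(bag, models):
    """Return the dialect whose model points lie closest to the bag on average.

    `models` maps a dialect name to its list of learned points
    (hypothetical stand-ins for diverse density points)."""
    scores = {name: average_nearest_distance(bag, pts)
              for name, pts in models.items()}
    return min(scores, key=scores.get)

# Toy data: two hypothetical dialect models, each with two points.
models = {
    "dialect_a": [(0.0, 0.0), (1.0, 0.0)],
    "dialect_b": [(5.0, 5.0), (6.0, 5.0)],
}
# An unseen bag with three instances; every instance contributes to the score.
unknown_bag = [(0.2, 0.1), (0.9, -0.1), (4.8, 5.1)]
label = classify_bag(unknown_bag, models)
```

Because each instance contributes to the average, a single outlying instance (the third one above) shifts the score but does not decide the label on its own.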
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2016, No. 10, pp. 161-166 (6 pages)
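Funding
National Natural Science Foundation of China (No. 61040053)
Graduate Research and Innovation Program of Jiangsu Province Universities (No. CXZZ12_0977)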
Keywords
Chinese dialect identification
multi-instance learning
diverse density
K-nearest neighbor (K近邻; the English keyword list gives "k-means")
average nearest distance