Abstract
To improve the accuracy of automatic classification of classical poetry, a classification method based on feature-term clustering is proposed. First, feature terms are extracted from the poem texts and represented as vectors, where each component of a vector gives the feature term's weight in one category. Then a clustering algorithm groups similar feature terms together, yielding a poetry model built on clustered feature terms. Finally, a classifier assigns each poem to a category. Experimental results on the Quan Tangshi (《全唐诗》, Complete Tang Poems) corpus show that the model and algorithm markedly improve classification accuracy.
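The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the clustering algorithm nor the classifier, so greedy leader clustering with a cosine threshold and a nearest-centroid classifier are used here as stand-ins. The function names, the threshold of 0.9, the category labels `frontier` and `pastoral`, and the toy poems are all invented for illustration.

```python
from collections import Counter, defaultdict
import math

def term_profiles(docs, labels):
    """Map each term to a vector of its share of occurrences in each class."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)              # term -> class -> occurrences
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1
    profiles = {}
    for term, per_class in counts.items():
        total = sum(per_class.values())
        profiles[term] = [per_class[c] / total for c in classes]
    return classes, profiles

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_terms(profiles, threshold=0.9):
    """Leader clustering (assumed stand-in): a term joins the first cluster
    whose leader profile is similar enough, otherwise it starts a new one."""
    leaders, assignment = [], {}
    for term in sorted(profiles):              # sorted for determinism
        vec = profiles[term]
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                assignment[term] = idx
                break
        else:
            leaders.append(vec)
            assignment[term] = len(leaders) - 1
    return assignment, len(leaders)

def doc_vector(tokens, assignment, n_clusters):
    """Represent a poem by the normalized frequency of each term cluster."""
    vec = [0.0] * n_clusters
    for tok in tokens:
        if tok in assignment:                  # unseen terms are ignored
            vec[assignment[tok]] += 1.0
    total = sum(vec)
    return [x / total for x in vec] if total else vec

def train_centroids(docs, labels, assignment, n_clusters):
    """Average the cluster-level vectors of the poems in each class."""
    sums = defaultdict(lambda: [0.0] * n_clusters)
    sizes = Counter(labels)
    for tokens, label in zip(docs, labels):
        for i, x in enumerate(doc_vector(tokens, assignment, n_clusters)):
            sums[label][i] += x
    return {c: [x / sizes[c] for x in s] for c, s in sums.items()}

def classify(tokens, centroids, assignment, n_clusters):
    """Nearest-centroid classifier (assumed stand-in) using cosine similarity."""
    vec = doc_vector(tokens, assignment, n_clusters)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))

# Toy data, invented for illustration: each poem is a list of single-character terms.
train_docs = [["戈", "马", "沙"], ["马", "关", "月"],
              ["田", "桑", "月"], ["桑", "菊", "田"]]
train_labels = ["frontier", "frontier", "pastoral", "pastoral"]

classes, profiles = term_profiles(train_docs, train_labels)
assignment, n_clusters = cluster_terms(profiles)
centroids = train_centroids(train_docs, train_labels, assignment, n_clusters)
print(classify(["沙", "关", "马"], centroids, assignment, n_clusters))
```

Clustering terms by their class-proportion vectors shrinks the document representation from one dimension per term to one dimension per term group, which is the kind of aggregation the abstract describes. Any other clustering algorithm or classifier could be substituted in the same places.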
Source
Journal of Donghua University (Natural Science) (《东华大学学报(自然科学版)》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2014, No. 5, pp. 599-604 (6 pages)
Keywords
vector space model
feature term selection
clustering algorithm
text classification