Abstract
Traditional classification algorithms cannot handle the multi-level classification of engineering information. To address this, we propose a multi-level engineering information classification method based on term frequency-inverse document frequency (TFIDF) and classification trees. The method generates a multi-level classification tree for each piece of engineering information and builds TFIDF matrices at the different levels, which reduces redundant computation. The classification result is then decided from the similarities stored in the tree nodes. Compared with traditional single-level classification algorithms, the tree-based decision method supports multi-level division of classes and assignment to multiple categories. Its computation time is only 59% of that of single-level classification, and it achieves a recall of 95.1% and an accuracy of 97.4%, showing good flexibility and robustness. Experimental results confirm the effectiveness of the algorithm.
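The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a category tree stored as nested dictionaries, pre-tokenized text, a smoothed IDF variant, cosine similarity, and a fixed similarity threshold. All names (`classify`, `subtree_tokens`, `threshold`, the toy categories) are hypothetical. At each level, TFIDF weights are built only over the children of the current node, the similarity to the query is stored in each child node, and the descent continues into the best-matching child.

```python
import math
from collections import Counter


def subtree_tokens(node):
    """Collect the tokens of a node and of all its descendants."""
    tokens = list(node.get("tokens", []))
    for child in node.get("children", []):
        tokens.extend(subtree_tokens(child))
    return tokens


def idf_table(docs):
    """Smoothed IDF over a list of token lists (one common variant, assumed here)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    """TFIDF vector as a dict; terms unknown at this level are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(node, query_tokens, threshold=0.1):
    """Descend the tree level by level; return [(category, similarity), ...]."""
    path = []
    while node.get("children"):
        # Build TFIDF only over the sibling categories at this level.
        docs = [subtree_tokens(c) for c in node["children"]]
        idf = idf_table(docs)
        q = tfidf(query_tokens, idf)
        for child, doc in zip(node["children"], docs):
            child["similarity"] = cosine(q, tfidf(doc, idf))  # stored in the node
        best = max(node["children"], key=lambda c: c["similarity"])
        if best["similarity"] < threshold:
            break  # no child is similar enough; stop at the current level
        path.append((best["name"], best["similarity"]))
        node = best
    return path


# Toy category tree (hypothetical data, for illustration only).
tree = {"name": "root", "children": [
    {"name": "civil", "children": [
        {"name": "bridge", "tokens": ["bridge", "span", "pier", "girder", "cable"]},
        {"name": "road", "tokens": ["road", "asphalt", "pavement", "lane"]},
    ]},
    {"name": "electrical", "children": [
        {"name": "power", "tokens": ["transformer", "substation", "cable", "voltage"]},
        {"name": "lighting", "tokens": ["lamp", "lighting", "luminaire", "voltage"]},
    ]},
]}

path = classify(tree, ["bridge", "pier", "girder", "inspection"])
print([name for name, _ in path])  # category names from top level to leaf
```

In this sketch the query is compared only with the children of the node reached so far, not with every leaf category, which is one plausible reading of how a level-wise TFIDF matrix avoids redundant computation. Keeping every child above `threshold` instead of only the best one would be one way to allow assignment to multiple categories.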
Source
《计算机应用与软件》
CSCD
Peking University Core Journal (北大核心)
2014, Issue 6, pp. 174-176, 191 (4 pages)
Computer Applications and Software
Keywords
Information classification
Term frequency-inverse document frequency (TFIDF)
Classification tree