Abstract
This paper presents a new method for accurately computing the similarity (affinity) between documents and, building on it, proposes a document clustering algorithm based on artificial immune networks. Simulation results show that, compared with traditional document clustering algorithms, the new algorithm not only discovers new clusters automatically but also achieves higher clustering accuracy and a higher data compression ratio, is independent of the initial input configuration, and supports incremental processing.
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
2007, No. 10, pp. 17-19, 49 (4 pages in total)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60575006)
Keywords
affinity computing
artificial immune network
document clustering