Abstract
This paper proposes a method for automatically extracting information regions from forum pages whose structure is not fixed. Building on the K-medoids clustering algorithm, the method removes the algorithm's requirement of a fixed number of clusters and introduces an improved Smith-Waterman algorithm into the computation of cluster-center distances, which improves clustering accuracy. Information-recognition experiments on a large number of forum pages show that the method is feasible and achieves high accuracy.
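The abstract does not spell out the distance computation, so the following is only a minimal sketch of the general idea: a Smith-Waterman local alignment score turned into a distance between two tag sequences, and a medoid chosen as the cluster member with the smallest total distance to the others. It uses the standard Smith-Waterman recurrence, not the paper's improved variant, and it does not attempt the variable-cluster-count mechanism. The scoring values, the normalization, and the names `smith_waterman_score`, `sw_distance`, and `medoid` are all assumptions made for illustration.

```python
from typing import List, Sequence


def smith_waterman_score(a: Sequence[str], b: Sequence[str],
                         match: int = 2, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score (standard Smith-Waterman, linear gap)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def sw_distance(a: Sequence[str], b: Sequence[str], match: int = 2) -> float:
    """Assumed normalization: 0.0 for identical sequences, 1.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - smith_waterman_score(a, b, match=match) / (match * longest)


def medoid(cluster: List[Sequence[str]]) -> Sequence[str]:
    """Member with the smallest summed distance to all members of the cluster."""
    return min(cluster, key=lambda c: sum(sw_distance(c, o) for o in cluster))


# Hypothetical tag sequences standing in for candidate page regions.
regions = [("tr", "td", "a"), ("tr", "td", "a", "img"), ("tr", "td", "a")]
center = medoid(regions)
```

Here each region is represented as a sequence of HTML tag names, which is itself an assumption about the features being clustered; any sequence of comparable tokens would work the same way.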
Source
《计算机工程与设计》
CSCD
Peking University Core Journals (北大核心)
2009, No. 1, pp. 210-212 (3 pages)
Computer Engineering and Design