Abstract
Data mining often produces anomalous results because of how its parameters are set, and the idea of parameter-free data mining emerged to address this problem. This paper studies Kolmogorov complexity theory, combines it with the parameter-free data mining idea, and proposes a compression-based dissimilarity measure, SCDM. Because compression algorithms are typically space- and time-efficient, a dissimilarity measure built on them also performs well. Experiments show that applying this measure to hierarchical clustering yields relatively high clustering accuracy.
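The abstract does not give the definition of SCDM, so the following is only a minimal sketch of the general idea it describes: approximate Kolmogorov complexity by compressed size, derive a dissimilarity from it, and feed that into hierarchical clustering. It assumes the widely cited generic formulation C(xy) / (C(x) + C(y)), which may differ from the paper's SCDM. The compressor (`zlib`), the single-linkage strategy, and the names `compressed_size`, `cdm`, and `single_linkage` are all illustrative choices, not taken from the paper.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """Generic compression-based dissimilarity C(xy) / (C(x) + C(y)).

    Assumption: this is a common formulation, not necessarily the paper's SCDM.
    Smaller values mean the two inputs share more structure.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def single_linkage(items: list[bytes], k: int) -> list[list[int]]:
    """Agglomerative clustering of item indices into k clusters (single linkage)."""
    # Pairwise dissimilarities, keyed by (smaller index, larger index).
    dist = {(i, j): cdm(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > max(k, 1):
        # Pick the two clusters whose closest members are least dissimilar.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(dist[min(i, j), max(i, j)]
                              for i in clusters[p[0]]
                              for j in clusters[p[1]]),
        )
        clusters[a] += clusters.pop(b)  # a < b, so index a stays valid
    return clusters


if __name__ == "__main__":
    docs = [
        b"the quick brown fox jumps over the lazy dog " * 10,
        b"the quick brown fox jumps over the lazy cat " * 10,
        b"3141592653589793238462643383279502884197 " * 10,
        b"3141592653589793238462643383279502884198 " * 10,
    ]
    print(single_linkage(docs, 2))
```

No thresholds or feature parameters are tuned here, which is the point of the parameter-free approach; the only choices are the compressor and the linkage rule.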
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2006, Issue 12, pp. 2982-2984 (3 pages)
Journal of Computer Applications
Funding
Supported by the Natural Science Foundation of Henan Province (0211050110)