Abstract
A quantitative, distance-based definition of outlier data is given, and a distance-based multi-criterion algorithm for mining outliers is proposed. The algorithm is suited to data analysis in general large databases. Applied to student examination scores, it can mine outliers dynamically. As a special case, the single-criterion version of the algorithm is applied to the Web server log files of a campus network. A frequency analysis chart of network users, with the outliers marked, is presented.
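The abstract does not state the paper's quantitative definition. The sketch below therefore assumes a variant of the DB(p, D) definition of Knorr and Ng, which may differ from the authors' own. Under this definition, a record is an outlier when at least a fraction `p` of the other records lie farther than distance `d` from it.

In the multi-criterion case, each record is a vector with one coordinate per criterion, such as one score per course. The single-criterion case used for the Web log is the one-dimensional special case.

The function names `db_outliers` and `euclidean`, the parameters `p` and `d`, and the sample data are hypothetical illustrations. They are not taken from the paper.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two records (one coordinate per criterion)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def db_outliers(records: Sequence[Sequence[float]], p: float, d: float) -> List[int]:
    """Return indices of records flagged as distance-based outliers.

    Assumed definition (a DB(p, D)-style variant, not the paper's own):
    a record is an outlier if at least a fraction p of the *other*
    records lie at a distance greater than d from it.
    This naive version is O(n^2) in the number of records.
    """
    n = len(records)
    outliers: List[int] = []
    for i, r in enumerate(records):
        # Count how many other records are "far" from record i.
        far = sum(
            1 for j, s in enumerate(records)
            if j != i and euclidean(r, s) > d
        )
        if n > 1 and far >= p * (n - 1):
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    # Hypothetical multi-criterion data: one row per student, one column per course.
    scores = [[85, 90, 88], [80, 85, 82], [88, 92, 90], [83, 87, 85], [30, 95, 40]]
    print(db_outliers(scores, p=0.9, d=20))  # -> [4]

    # Hypothetical single-criterion data: page-request counts per user from a log.
    counts = [12, 15, 11, 14, 13, 95]
    print(db_outliers([[c] for c in counts], p=0.9, d=30))  # -> [5]
```

When criteria use different scales, the records would normally be normalized first so that no single criterion dominates the distance. The quadratic scan is kept here for clarity. A large database would call for an index-based or cell-based method instead.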
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2005, No. 9, pp. 105-107 (3 pages)