Confidence measure method of classification results based on outlier detection (cited by: 4)
Abstract A confidence measure for classification results, based on outlier detection, is proposed. Its purpose is to assess the predictions of a webpage classification model applied to web logs, so that predictions judged credible can be added to the URL classification set and links visited in the web logs can be classified more efficiently. Multiple weak classifiers built with Bagging predict the data to be classified, and a vector of per-class probabilities is constructed for each prediction result. Outlier detection is then used to decide whether the model's prediction is credible. Comparative experiments were run on UCI public data sets with two mainstream measurement methods, one based on k-means and one based on local density. The results show that, with the outlier-detection-based confidence applied, both the k-means-based and the local-density-based methods significantly improve accuracy. The same effect is observed when classifying web pages crawled in an engineering project.
Authors Yan Yunyang (严云洋); Qu Xuexin (瞿学新); Zhu Quanyin (朱全银); Li Xiang (李翔); Zhao Yang (赵阳) (Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an, 223003, China; School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, China)
Source Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2019, No. 1, pp. 102-109 (8 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
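Funding Jiangsu Province "Six Talent Peaks" Project (2013DZXX-023); Jiangsu Province "Qing Lan Project"; Jiangsu Province Key Research and Development Program (BE2015127)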
Keywords outliers; webpage classification; k-means; LOF algorithm
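
The pipeline outlined in the abstract (bagged weak classifiers, a per-class probability vector for each prediction, then outlier detection on those vectors) can be sketched roughly as follows in standard-library Python. The abstract does not give the construction details, so this is only one possible reading, and several choices are assumptions made for illustration: the weak learner is a nearest-centroid classifier, the probability vector is the fraction of bagged votes per class, and a prediction counts as credible when the LOF score of its vector, computed against the vectors of reference samples, does not exceed a hypothetical threshold of 1.5. The function names are hypothetical and do not come from the paper.

```python
import math
import random
from collections import Counter

def fit_centroids(X, y):
    """Assumed weak learner: one mean point (centroid) per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda c: math.dist(model[c], x))

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each weak learner on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def vote_vector(models, x, classes):
    """Assumed probability vector: share of ensemble votes per class."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return [votes[c] / len(models) for c in classes]

def lof_scores(points, k=3, eps=1e-9):
    """Local outlier factor of every point; values near 1 mean inlier."""
    n = len(points)
    knn = []  # k nearest (distance, index) pairs per point, excluding itself
    for i in range(n):
        d = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        knn.append(d[:k])
    kdist = [nb[-1][0] for nb in knn]
    # Local reachability density; eps guards against duplicate vectors.
    lrd = []
    for i in range(n):
        reach = sum(max(kdist[j], d) for d, j in knn[i]) / k
        lrd.append(1.0 / (reach + eps))
    return [sum(lrd[j] for _, j in knn[i]) / (k * lrd[i]) for i in range(n)]

def is_credible(models, classes, reference_X, x, k=3, threshold=1.5):
    """Assumed rule: credible if x's vote vector is not an LOF outlier
    among the vote vectors of the reference samples."""
    vecs = [vote_vector(models, r, classes) for r in reference_X]
    vecs.append(vote_vector(models, x, classes))
    return lof_scores(vecs, k)[-1] <= threshold

if __name__ == "__main__":
    # Toy data (not from the paper): two well-separated clusters.
    X = [[0.1 * i, 0.1 * (i % 3)] for i in range(10)]
    X += [[5 + 0.1 * i, 5 + 0.1 * (i % 3)] for i in range(10)]
    y = [0] * 10 + [1] * 10
    models = bagging_fit(X, y)
    for point in ([0.2, 0.1], [2.9, 2.6]):
        print(point, vote_vector(models, point, [0, 1]),
              is_credible(models, [0, 1], X, point))
```

In this sketch the threshold and the neighbourhood size `k` would need tuning on held-out data. The k-means-based variant mentioned in the abstract could in principle replace `lof_scores` with a distance-to-cluster-centre score, but that is not shown here.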