
Impacts of Dirty Data on Classification and Clustering Models: An Experimental Evaluation

Abstract: Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results can guide both the selection of an appropriate model for a given level of data quality and the determination of the share of data to clean. However, little research has explored this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent, and conflicting data on classification and clustering models. From the experimental results, we observe that dirty-data impacts are related to the error type, the error rate, and the data size. Based on the findings, we suggest that users leverage our proposed metrics, sensibility and data quality inflection point, for model selection and data cleaning.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), SCIE / EI / CSCD, 2021, Issue 4, pp. 806-821 (16 pages)
Funding: the National Natural Science Foundation of China under Grant Nos. U1866602 and 71773025, the CCF-Huawei Database System Innovation Research Plan under Grant No. CCF-HuaweiDBIR2020007B, and the National Key Research and Development Program of China under Grant No. 2020YFB1006104.