COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base
1
Authors: Hong-Zhi Wang, Zhi-Xin Qi, Ruo-Xi Shi, Jian-Zhong Li, Hong Gao. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, No. 5, pp. 845-857 (13 pages)
Missing value imputation with crowdsourcing is a novel data cleaning method for capturing missing values that can hardly be filled by automatic approaches. However, the time cost and overhead of crowdsourcing are high. Therefore, we have to reduce the cost while guaranteeing the accuracy of crowdsourced imputation. To achieve this optimization goal, we present COSSET+, a crowdsourced framework optimized by a knowledge base. We combine the advantages of a knowledge-based filter and a crowdsourcing platform to capture missing values. Since the number of crowdsourced values affects the cost of COSSET+, we aim to select only a subset of the missing values to be crowdsourced. We prove that the crowd value selection problem is NP-hard and develop an approximation algorithm for it. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
Keywords: crowdsourcing, missing value, imputation, knowledge base, optimization
Impacts of Dirty Data on Classification and Clustering Models: An Experimental Evaluation
2
Authors: Zhi-Xin Qi, Hong-Zhi Wang, An-Jie Wang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, No. 4, pp. 806-821 (16 pages)
Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results can guide both the selection of an appropriate model given the data quality and the determination of the share of data to clean. However, little research has focused on exploring this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent, and conflicting data on classification and clustering models. From the experimental results, we observe that dirty-data impacts are related to the error type, the error rate, and the data size. Based on these findings, we suggest that users leverage our proposed metrics, sensibility and data quality inflection point, for model selection and data cleaning.
Keywords: data quality, classification, clustering, model selection, data cleaning