Abstract
This article presents a simple and efficient data preprocessing method. It builds a complete preprocessing process from four steps: data cleaning, user identification, session identification, and path completion. Each step applies its own rules and algorithms to improve processing accuracy. Experiments show that the method effectively reduces data size, improves data quality, and performs reliably.
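The abstract names the four preprocessing steps but not the specific rules used in each. The Python sketch below shows one common way such a pipeline can be built, assuming the input is a web server access log. The rules are assumptions for illustration and are not necessarily the article's own: cleaning drops static resources and non-2xx responses, a user is an (IP address, User-Agent) pair, a session ends after a 30-minute gap, and path completion uses the referrer to reinsert pages revisited with the Back button. The names `LogEntry`, `clean`, `identify_users`, `identify_sessions`, `complete_path`, and `preprocess` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record for one line of a web access log (assumed fields).
@dataclass
class LogEntry:
    ip: str
    agent: str      # User-Agent string
    time: datetime
    url: str
    referrer: str   # "-" when the log records no referrer
    status: int     # HTTP status code


# Assumed cleaning rule: static resources carry no navigation information.
IGNORED_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png", ".css", ".js", ".ico")
# Assumed session rule: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(entries):
    """Data cleaning: keep successful (2xx) requests for non-static pages."""
    return [e for e in entries
            if 200 <= e.status < 300
            and not e.url.lower().endswith(IGNORED_SUFFIXES)]


def identify_users(entries):
    """User identification: treat each (IP, User-Agent) pair as one user."""
    users = {}
    for e in sorted(entries, key=lambda entry: entry.time):
        users.setdefault((e.ip, e.agent), []).append(e)
    return users


def identify_sessions(user_entries):
    """Session identification: split a user's time-ordered requests on long gaps."""
    sessions = []
    for e in user_entries:
        if sessions and e.time - sessions[-1][-1].time <= SESSION_TIMEOUT:
            sessions[-1].append(e)
        else:
            sessions.append([e])
    return sessions


def complete_path(session):
    """Path completion: reinsert pages revisited via the Back button.

    If a request's referrer is an earlier page rather than the previous one,
    the user is assumed to have backtracked through the pages in between.
    """
    path = []
    for e in session:
        if path and e.referrer != path[-1] and e.referrer in path:
            # Index of the most recent visit to the referrer page.
            idx = len(path) - 1 - path[::-1].index(e.referrer)
            # Walk back from the second-to-last page down to the referrer.
            path.extend(reversed(path[idx:-1]))
        path.append(e.url)
    return path


def preprocess(entries):
    """Run the four steps; map each user to a list of completed session paths."""
    result = {}
    for user, user_entries in identify_users(clean(entries)).items():
        result[user] = [complete_path(s) for s in identify_sessions(user_entries)]
    return result


if __name__ == "__main__":
    t0 = datetime(2018, 1, 1, 8, 0)
    ua = "Mozilla/5.0"
    log = [
        LogEntry("10.0.0.1", ua, t0, "/a.html", "-", 200),
        LogEntry("10.0.0.1", ua, t0 + timedelta(seconds=5), "/logo.png", "/a.html", 200),
        LogEntry("10.0.0.1", ua, t0 + timedelta(minutes=1), "/b.html", "/a.html", 200),
        LogEntry("10.0.0.1", ua, t0 + timedelta(minutes=2), "/c.html", "/b.html", 200),
        LogEntry("10.0.0.1", ua, t0 + timedelta(minutes=3), "/d.html", "/b.html", 200),
        LogEntry("10.0.0.1", ua, t0 + timedelta(hours=2), "/a.html", "-", 200),
        LogEntry("10.0.0.2", ua, t0, "/x.html", "-", 404),
    ]
    print(preprocess(log))
```

In this demo, cleaning removes the image request and the 404 response. The two-hour gap splits the remaining requests into two sessions. Path completion turns the first session into `/a.html, /b.html, /c.html, /b.html, /d.html` because `/d.html` was reached from `/b.html` after `/c.html`. Each step is a separate function, so any of the assumed rules can be replaced without touching the others.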
Source
Journal of Anhui Vocational & Technical College (《安徽职业技术学院学报》)
2018, No. 4, pp. 5-7, 11 (4 pages)
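Funding
Interim result of the 2018 Key Natural Science Research Project of Anhui Province, "Research on Cleaning Similar Duplicate Data in a Big Data Environment" (KJ2018A0710)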