Managing large amounts of data is becoming part of everyday life in most organizations. Handling, analyzing, searching, and making predictions from big data is becoming the norm for organizations across many fields. Big data provides a foundation from which greater benefit and value can be extracted. Along with its many benefits, however, big data brings many challenges that stand in the way of fulfilling those expectations. One class of problems that plagues large data stores is termed dirty data. This paper focuses on dirty data encountered while working on an organization's live information system. The author was responsible for studying and analyzing a faltering information system and for planning and carrying out the required solutions and fixes. The importance of the work lies in the high level of dirty data observed in the system. The paper therefore concentrates on the dirty-data aspect of that work: how the team was affected by dirty data and how it was dealt with.