Big data technologies have seen tremendous growth in recent years. They are widely used in both industry and academia. In spite of such exponential growth, these technologies lack adequate measures to protect data fro...Big data technologies have seen tremendous growth in recent years. They are widely used in both industry and academia. In spite of such exponential growth, these technologies lack adequate measures to protect data from misnse/abuse. Corporations that collect data from multiple sources are at risk of liabilities due to the exposure of sensitive information. In the current implementation of Hadoop, only file-level access control is feasible. Providing users with the ability to access data based on the attlibutes in a dataset or the user's role is complicated because of the sheer volume and multiple formats (structured, unstructured and semi-structured) of data. In this paper, we propose an access control framework, which enforces access control policies dynamically based on the sensitivity of the data. This framework enforces access control policies by harnessing the data context, usage patterns and informat/on sensitivity. Information sensitivity changes over time with the addition and removal of datasets, which can lead to modifications in access control decisions. The proposed framework accommodates these changes. The proposed framework is automated to a large extent as the data itself determines the sensitivity with minimal user intervention. Our experimental results show that the proposed framework is capable of enforcing access control policies on non-multimedia datasets with minimal overhead.展开更多
文摘Big data technologies have seen tremendous growth in recent years. They are widely used in both industry and academia. In spite of such exponential growth, these technologies lack adequate measures to protect data from misnse/abuse. Corporations that collect data from multiple sources are at risk of liabilities due to the exposure of sensitive information. In the current implementation of Hadoop, only file-level access control is feasible. Providing users with the ability to access data based on the attlibutes in a dataset or the user's role is complicated because of the sheer volume and multiple formats (structured, unstructured and semi-structured) of data. In this paper, we propose an access control framework, which enforces access control policies dynamically based on the sensitivity of the data. This framework enforces access control policies by harnessing the data context, usage patterns and informat/on sensitivity. Information sensitivity changes over time with the addition and removal of datasets, which can lead to modifications in access control decisions. The proposed framework accommodates these changes. The proposed framework is automated to a large extent as the data itself determines the sensitivity with minimal user intervention. Our experimental results show that the proposed framework is capable of enforcing access control policies on non-multimedia datasets with minimal overhead.