Abstract
To address privacy protection in unstructured big data publishing, this paper proposes a privacy protection method for big data publishing based on improved scalable l-diversity (ImSLD). The method first uses a named entity recognition (NER) approach based on two-stage conditional random fields to represent unstructured data in structured form. It then applies an improved scalable l-diversity algorithm to anonymize the resulting well-represented data, thereby protecting privacy when unstructured big data is published. ImSLD is implemented in Apache Pig to make it scalable. Experiments show that, compared with the MRA and SKA algorithms, ImSLD incurs less information loss on different data sets while providing the same level of privacy.
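As a point of reference for the anonymization step, the following is a minimal sketch of the standard distinct l-diversity condition: every group of records sharing the same quasi-identifier values must contain at least l distinct sensitive values. It is not the ImSLD algorithm, which the abstract does not specify in detail. The function name `is_distinct_l_diverse`, the field names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Check the standard distinct l-diversity condition.

    Records are grouped into equivalence classes by their quasi-identifier
    values; the table passes if every class holds at least `l` distinct
    values of the sensitive attribute.
    """
    groups = defaultdict(set)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already-generalized records (illustration only).
records = [
    {"age": "30-39", "zip": "4021**", "disease": "flu"},
    {"age": "30-39", "zip": "4021**", "disease": "asthma"},
    {"age": "40-49", "zip": "4000**", "disease": "flu"},
    {"age": "40-49", "zip": "4000**", "disease": "diabetes"},
]

print(is_distinct_l_diverse(records, ["age", "zip"], "disease", l=2))  # True
print(is_distinct_l_diverse(records, ["age", "zip"], "disease", l=3))  # False
```

In a scalable setting, the same grouping step would typically be expressed as a group-by over the quasi-identifier columns, which is the kind of operation a dataflow system such as Apache Pig distributes across a cluster.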
Authors
Zou Jinsong (邹劲松)
Li Fang (李芳)
Affiliations: Putian Big Data Industry School, Chongqing College of Water Resources & Electric Engineering, Chongqing 402160, China; School of Computer Science, Chongqing University, Chongqing 400044, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2021, No. 2, pp. 564-566, 571 (4 pages in total)
Funding
Chongqing Education Science 13th Five-Year Plan, 2020 Key Unfunded Project (2020-GX-169)
Chongqing Vocational Education Society, 2020-2021 Approved Project (2020ZJXH282086)