Abstract
A privacy policy declares the private information that an application needs to obtain, but it does not guarantee that the types of private information the application actually obtains are disclosed clearly and completely. Research on the consistency between the actual sensitive behaviors of applications and their privacy policies remains insufficient. To address this problem, a method for analyzing the consistency between the sensitive behaviors and privacy policies of Android applications is proposed. In the privacy policy analysis stage, a Bi-GRU-CRF (Bi-directional Gated Recurrent Unit Conditional Random Field) neural network is incrementally trained with a custom annotation library to extract key information from privacy policy statements. In the sensitive behavior analysis stage, the IFDS (Interprocedural, Finite, Distributive, Subset) algorithm is optimized in three ways: sensitive API (Application Programming Interface) calls are classified, sensitive API calls that have already been analyzed are deleted from the input list of sensitive sources, and sensitive paths that have already been extracted are marked. These optimizations match the sensitive behavior analysis results to the language granularity of the privacy policy descriptions, reduce redundancy in the results, and improve analysis efficiency. In the consistency analysis stage, the semantic relationships between ontologies are divided into equivalence, subordination, and approximation relationships, and a formal model of consistency between sensitive behaviors and privacy policies is defined on this basis. Consistent cases are divided into clear expressions and vague expressions; inconsistent cases are divided into omitted expressions, incorrect expressions, and ambiguous expressions. Finally, the proposed consistency analysis algorithm based on semantic similarity is applied to the sensitive behaviors and the privacy policies. Experimental results on 928 applications show that, with a privacy policy analysis accuracy of 97.34%, 51.4% of the Android applications exhibit inconsistencies between their actual sensitive behaviors and their privacy policy declarations.
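The five-way classification described in the abstract can be illustrated with a small sketch. The abstract does not give the formal model, so everything below is an assumption made for illustration: the toy ontology (`PARENT`, `SYNONYMS`, `APPROXIMATE_PAIRS`), the names `relate` and `classify`, and in particular the mapping from semantic relationships to verdicts (equivalence yields a clear expression, subordination or approximation yields a vague one, a matching denial yields an incorrect one, conflicting statements yield an ambiguous one, and no match yields an omitted one). A lookup table stands in for the semantic similarity computation.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    EQUIVALENT = "equivalent"
    SUBORDINATE = "subordinate"
    APPROXIMATE = "approximate"
    UNRELATED = "unrelated"


class Verdict(Enum):
    CLEAR = "clear"          # consistent
    VAGUE = "vague"          # consistent, but coarser than the behavior
    OMITTED = "omitted"      # inconsistent
    INCORRECT = "incorrect"  # inconsistent
    AMBIGUOUS = "ambiguous"  # inconsistent


@dataclass(frozen=True)
class Statement:
    data_type: str   # term extracted from the privacy policy
    collects: bool   # False for a denial such as "we do not collect X"


# Hypothetical toy ontology; a real one would be far larger.
PARENT = {"imei": "device identifier", "device identifier": "device information"}
SYNONYMS = {frozenset({"gps location", "precise location"})}
APPROXIMATE_PAIRS = {frozenset({"imei", "android id"})}


def relate(policy_term: str, behavior_term: str) -> Relation:
    """Return the assumed semantic relation of a policy term to a behavior term."""
    pair = frozenset({policy_term, behavior_term})
    if policy_term == behavior_term or pair in SYNONYMS:
        return Relation.EQUIVALENT
    node = PARENT.get(behavior_term)
    while node is not None:  # walk up the hypernym chain of the behavior term
        if node == policy_term:
            return Relation.SUBORDINATE
        node = PARENT.get(node)
    if pair in APPROXIMATE_PAIRS:
        return Relation.APPROXIMATE
    return Relation.UNRELATED


def classify(behavior_term: str, statements: list[Statement]) -> Verdict:
    """Classify one observed sensitive behavior against the policy statements."""
    related = [(relate(s.data_type, behavior_term), s) for s in statements]
    related = [(r, s) for r, s in related if r is not Relation.UNRELATED]
    if not related:
        return Verdict.OMITTED
    positive = [r for r, s in related if s.collects]
    negative = [r for r, s in related if not s.collects]
    if positive and negative:
        return Verdict.AMBIGUOUS
    if negative:
        return Verdict.INCORRECT
    if Relation.EQUIVALENT in positive:
        return Verdict.CLEAR
    return Verdict.VAGUE
```

Under these assumptions, an application that reads the IMEI while its policy only mentions "device information" would be classified as vague, and one whose policy never mentions any related term would be classified as omitted.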
Authors
杨保山
杨智
陈性元
韩冰
杜学绘
YANG Baoshan; YANG Zhi; CHEN Xingyuan; HAN Bing; DU Xuehui (Information Engineering University, Zhengzhou, Henan 450001, China; State Key Laboratory of Cryptography Science and Technology (State Cryptography Administration), Beijing 100094, China)
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2024, Issue 3, pp. 788-796 (9 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (62176265).
Keywords
Android
IFDS (Interprocedural, Finite, Distributive, Subset)
sensitive behavior
privacy policy
Natural Language Processing (NLP)