Abstract
To improve the readability of privacy policies and evaluate their quality, this paper proposes a machine-learning-based method for automatically classifying the clauses of Chinese privacy policies. First, a clause classification index system is established, and features are extracted from clauses of the different categories. Second, hierarchical multi-label classification models based on machine learning algorithms are built and trained, and the performance of the algorithms is compared experimentally on a test set. Finally, based on the classification results, each privacy policy is checked for falseness (whether it is a genuine privacy policy at all) and for completeness, and an evaluation method is designed to score it. The experimental results show that the support vector machine model outperforms the other models in classification, reaching an accuracy of 86%, which verifies the feasibility of the method for automatically classifying privacy policy clauses. In addition, an examination of 1,500 privacy policies in the Huawei App Market found that 38.5% were not actually privacy policies, that 92.5% of the remaining policies were incomplete, and that most received low scores.
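The final step of the abstract (falseness check, completeness check, scoring) can be illustrated with a minimal sketch. It assumes the classifier has already produced a set of category labels for each clause. The category names, the `min_labeled_ratio` threshold, and the scoring rule (percentage of required categories covered) are all hypothetical placeholders, since the abstract does not give the paper's actual index system or scoring formula.

```python
from dataclasses import dataclass

# Hypothetical top-level clause categories (assumption: the paper's real
# classification index system is not listed in the abstract).
REQUIRED_CATEGORIES = frozenset(
    {"collection", "use", "sharing", "storage", "user_rights"}
)


@dataclass
class PolicyReport:
    is_privacy_policy: bool  # False means the document is flagged as "false"
    missing: frozenset       # required categories that no clause covers
    score: float             # illustrative 0-100 score


def assess_policy(clause_labels, min_labeled_ratio=0.3):
    """Assess one document from per-clause multi-label predictions.

    clause_labels: list of sets, one set of predicted categories per clause.
    min_labeled_ratio: hypothetical threshold; if fewer clauses than this
    fraction carry any required category, the document is treated as not
    being a privacy policy.
    """
    if not clause_labels:
        return PolicyReport(False, REQUIRED_CATEGORIES, 0.0)

    # Falseness check: how many clauses look like privacy-policy clauses?
    labeled = sum(1 for labels in clause_labels if labels & REQUIRED_CATEGORIES)
    is_policy = labeled / len(clause_labels) >= min_labeled_ratio

    # Completeness check: which required categories never appear?
    covered = set().union(*clause_labels) & REQUIRED_CATEGORIES
    missing = frozenset(REQUIRED_CATEGORIES - covered)

    # Illustrative score: share of required categories covered.
    score = 100.0 * len(covered) / len(REQUIRED_CATEGORIES) if is_policy else 0.0
    return PolicyReport(is_policy, missing, score)
```

For example, `assess_policy([{"collection"}, {"use", "sharing"}, set()])` reports a genuine but incomplete policy with `storage` and `user_rights` missing. A real system would replace the simple coverage score with the paper's own evaluation method.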
Authors
朱璋颖
陆亦恬
唐祝寿
张燕
ZHU Zhang-ying; LU Yi-tian; TANG Zhu-shou; ZHANG Yan (Pwnzen Information Technology Co. Ltd., Shanghai 201100, China)
Source
《通信技术》
2020, No. 11, pp. 2749-2757 (9 pages)
Communications Technology