Abstract
Data mining is an active research area, and sequence classification, an atypical data mining task, has broad application prospects in fields such as finance and biology. Because features are hard to define for sequence data, traditional feature-based classification methods are difficult to apply. A common alternative is to classify sequences with probabilistic models such as Markov models. In the big data era, multiple independent institutions or individuals often need to share data to carry out data mining tasks, yet much data is unsuitable for direct sharing because of ethical and legal concerns. This paper uses cryptographic techniques to propose a scheme for training Markov models while preserving the privacy of every participating party, covering both the data itself and its statistical features. For first-order and second-order Markov models, the scheme introduces no error and has low time overhead, and it performs especially well as the number of training rounds increases.
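For readers unfamiliar with Markov-model sequence classification, the following is a minimal, non-private sketch of the idea: estimate one first-order transition table per class, then assign a sequence to the class whose model gives it the highest log-likelihood. The function names, the add-alpha smoothing, the omission of initial-state probabilities, and the toy data are all assumptions made for illustration; the paper's privacy-preserving protocol is not described in the abstract and is not reproduced here.

```python
import math


def train_markov(sequences, alphabet, alpha=1.0):
    """Estimate first-order transition log-probabilities.

    Add-alpha smoothing is an illustrative choice, not taken from the paper.
    """
    # counts[a][b] = number of times symbol b follows symbol a
    counts = {a: {b: 0 for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1

    model = {}
    for a in alphabet:
        total = sum(counts[a].values()) + alpha * len(alphabet)
        model[a] = {
            b: math.log((counts[a][b] + alpha) / total) for b in alphabet
        }
    return model


def log_likelihood(model, seq):
    """Sum transition log-probabilities (initial-state term omitted)."""
    return sum(model[prev][cur] for prev, cur in zip(seq, seq[1:]))


def classify(models, seq):
    """Return the label whose model scores the sequence highest."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))


# Hypothetical toy data: two classes over the alphabet {A, B}.
alphabet = "AB"
models = {
    "alternating": train_markov(["ABABAB", "BABABA"], alphabet),
    "repeating": train_markov(["AAAABBBB", "BBBAAA"], alphabet),
}

print(classify(models, "ABAB"))
print(classify(models, "AAABBB"))
```

In the multi-party setting the abstract describes, the transition counts would presumably be distributed across participants, which is why the statistics themselves, and not only the raw sequences, need protection.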
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2018, Issue 2, pp. 197-201 (5 pages)
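Funding
Supported by the National Natural Science Foundation of China (61572456)
Supported by the Natural Science Foundation of Jiangsu Province (BK20151241)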
Keywords
data mining
privacy preserving
sequence classification
Markov model