Abstract
Although the Parzen window method is a classical probability density function (PDF) estimation method that is widely applied in machine learning and pattern recognition, it is unsuitable for PDF estimation on large-scale data because of its high computational complexity and its sensitivity to the bandwidth. To handle PDF estimation for large-scale datasets, this paper proposes a sampling without replacement-based Parzen window ensemble (SR-PWE) method, which estimates the PDF from only part of the data and obtains satisfactory estimation performance at low computational complexity. First, a number of sub-datasets are generated from the original dataset by sampling without replacement. Second, base PDFs are estimated by applying the Parzen window method to these sub-datasets. Then, the PDF of the original dataset is obtained by fusing the base PDFs. Finally, experiments comparing the Parzen window method and SR-PWE on Cauchy and normal distributions demonstrate that SR-PWE is feasible and effective.
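The three steps in the abstract (sample sub-datasets without replacement, estimate a base PDF on each, fuse the base PDFs) can be illustrated with a minimal Python sketch. The abstract does not state the kernel, the bandwidth rule, or the fusion rule, so the Gaussian kernel, Silverman's rule of thumb, and the plain average used here are assumptions made for illustration only, as are the function names and the default values of `n_subsets` and `subset_size`. Each sub-dataset is drawn without replacement independently of the others, which is also an assumption.

```python
import numpy as np


def parzen_window_pdf(samples, query, bandwidth):
    """Gaussian-kernel Parzen window estimate of the PDF at each query point."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Scaled pairwise differences, shape (len(query), len(samples)).
    u = (query[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def silverman_bandwidth(samples):
    """Silverman's rule of thumb (an assumed choice, not taken from the paper)."""
    samples = np.asarray(samples, dtype=float)
    q75, q25 = np.percentile(samples, [75, 25])
    spread = min(samples.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * samples.size ** (-0.2)


def sr_pwe_pdf(data, query, n_subsets=10, subset_size=500, rng=None):
    """Average Parzen window estimates built on sub-datasets of `data`.

    Hypothetical helper: averaging is an assumed fusion rule.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    base_pdfs = []
    for _ in range(n_subsets):
        # Within one sub-dataset, no observation is drawn twice.
        subset = rng.choice(data, size=subset_size, replace=False)
        h = silverman_bandwidth(subset)
        base_pdfs.append(parzen_window_pdf(subset, query, h))
    # Fuse the base PDFs by simple averaging.
    return np.mean(base_pdfs, axis=0)
```

With these defaults, each query point costs `n_subsets * subset_size` kernel evaluations instead of one per observation in the full dataset, which is where the saving over a plain Parzen window estimate would come from when the dataset is much larger than that product.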
Authors
何武超
王晓兰
何玉林
熊睿杰
HE Wuchao; WANG Xiaolan; HE Yulin; XIONG Ruijie (Department of Information Engineering, Cangzhou Technical College, Cangzhou 061001, Hebei Province, P. R. China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P. R. China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, Guangdong Province, P. R. China)
Source
《深圳大学学报(理工版)》
EI
CAS
CSCD
Peking University Core Journals (PKU Core)
2018, No. 6, pp. 617-621 (5 pages)
Journal of Shenzhen University (Science and Engineering)
Funding
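National Natural Science Foundation of China (61503252)
China Postdoctoral Science Foundation (2016T90799)
Shenzhen University Research Start-up Fund for Newly Recruited Faculty (2018060)
National Key Research and Development Program of China (2017YFC0822604-2)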
Keywords
probability distribution
probability density function estimation
Parzen window
kernel density estimation method
bandwidth
sampling without replacement
ensemble method
large-scale dataset