Abstract
To improve the dynamic capture of highly sensitive data on the Internet, a Python-based dynamic capture method for highly sensitive data is proposed. A distribution model of highly sensitive Internet data is constructed, and third-order autocorrelation information matching is used for distributed detection of sensitive Internet data. K-nearest-neighbor under-sampling is then applied for deep learning and feature decomposition of the sensitive data, yielding its feature expression patterns. The semantic similarity of the highly sensitive data is analyzed to obtain its linear envelope fusion vector. Based on the information fusion and big data clustering results, ambiguity detection and dynamic recognition techniques are used to achieve dynamic capture, recognition, and optimized output control of highly sensitive Internet data. Simulation results show that the method captures highly sensitive Internet data with high accuracy and strong dynamic capture capability, improving the detection and recognition of such data.
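The abstract names K-nearest-neighbor under-sampling but gives no details of how it is applied. As a rough illustration only, the sketch below shows one common variant (an edited-nearest-neighbor style rule): a majority-class sample is dropped when most of its k nearest neighbors belong to another class. The function name `knn_undersample`, the labels (0 for ordinary, 1 for sensitive), and the toy data are all hypothetical and are not taken from the paper.

```python
import math

# Toy 2-D feature vectors (hypothetical): label 0 = ordinary, 1 = sensitive.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
LABELS = [0, 0, 0, 0, 0, 1, 1, 1]


def knn_undersample(points, labels, majority_label, k=3):
    """Drop majority-class samples whose k nearest neighbors mostly
    carry a different label; minority samples are always kept."""
    kept_points, kept_labels = [], []
    for i, (point, label) in enumerate(zip(points, labels)):
        if label != majority_label:
            kept_points.append(point)
            kept_labels.append(label)
            continue
        # Indices of the k nearest other samples by Euclidean distance.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(point, points[j]),
        )[:k]
        same_class = sum(1 for j in neighbors if labels[j] == majority_label)
        # Keep the sample only if a strict majority of neighbors agree with it.
        if same_class * 2 > len(neighbors):
            kept_points.append(point)
            kept_labels.append(label)
    return kept_points, kept_labels


if __name__ == "__main__":
    pts, lbls = knn_undersample(POINTS, LABELS, majority_label=0, k=3)
    print(pts, lbls)
```

With this toy data, the ordinary sample at (5, 5) sits among sensitive samples and is removed, while the other ordinary samples are kept. A real pipeline would more likely use a library implementation and features extracted from the crawled pages.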
Authors
彭文良
吴红虹
PENG Wen-liang; WU Hong-hong (Department of Electronic Information and Media, Chizhou Vocational and Technical College, Chizhou 247000, Anhui; Department of Economics and Management, Chizhou Vocational and Technical College, Chizhou 247000, Anhui)
Source
《蚌埠学院学报》
2021, No. 5, pp. 61-65 (5 pages)
Journal of Bengbu University
Funding
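Anhui Provincial Quality Engineering Project, Higher Education Continuing Education Teaching Reform Project (2019jxjj66)
Anhui Provincial Quality Engineering Major Online Teaching Project (2020zdxsjg242)
Domestic Visiting Study and Training Program for Outstanding Young Backbone Talents in Colleges and Universities (gxgnfx2018120)
Chizhou Vocational and Technical College Institute-Level Key Natural Science Project (2020yjzrzd03).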