Abstract
Online social media contain large amounts of noisy and redundant information, which makes filtering and screening messages a challenge. To obtain high-quality messages, this paper proposes a high-quality microblog extraction framework based on Kernel Principal Component Analysis and Wavelet Transformation (KPCA-WT). An extraction algorithm based on multi-feature fusion is designed, which transforms the message features into the wavelet domain to better capture fine-grained differences between the feature signals. The weight of each feature is estimated with the Expectation Maximization (EM) algorithm, and the weighted features are then fused into a comprehensive value for each message. To reduce the influence of noisy features on quality extraction and to speed up computation, the features are transformed with KPCA. Experimental results show that the proposed framework extracts higher-quality microblogs and greatly reduces running time.
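As a rough illustration of what moving a feature signal into the wavelet domain means, the sketch below applies one level of the orthonormal Haar transform in plain Python. The abstract does not say which wavelet family the framework uses, so the Haar choice is an assumption, and `haar_dwt`, `haar_idwt`, and `feature_signal` are hypothetical names with made-up values. It is not the authors' implementation.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approx, detail): approx holds scaled pairwise sums (the coarse
    trend), detail holds scaled pairwise differences (the local variation).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, rebuilding the original signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical feature signal: one normalized feature value per message.
feature_signal = [0.2, 0.2, 0.9, 0.1, 0.5, 0.5, 0.3, 0.7]
approx, detail = haar_dwt(feature_signal)
reconstructed = haar_idwt(approx, detail)
```

Pairs of equal neighbouring values give a zero detail coefficient, while pairs that differ give a non-zero one, which is the sense in which the wavelet domain exposes fine differences between signals. The transform is invertible, so no information is lost at this step.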
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal
2016, Issue 1, pp. 180-186 (7 pages)
Funding
National Natural Science Foundation of China (61472291, 61303115)
2013 Shenzhen Knowledge Innovation Program, Basic Research Fund
Keywords
information extraction
feature fusion
wavelet transformation
Expectation Maximization (EM) algorithm
Kernel Principal Component Analysis (KPCA)