Abstract
[Purpose/significance] Mainstream text feature extraction methods have limited extraction efficiency. To address this, the paper builds on Fisher Discriminant Analysis (FDA), borrows ideas from manifold learning, and proposes a text feature extraction method that fuses global and local features. [Method/process] The paper first defines a Manifold-based Between-Class Scatter (MBCS) and a Manifold-based Within-Class Scatter (MWCS). It then extracts features by maximizing the ratio of MBCS to MWCS under the Fisher criterion, so that samples from different classes lie as far apart as possible while samples from the same class stay as close together as possible. [Result/conclusion] Comparative experiments indicate that the method is effective.
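The abstract does not give the exact definitions of MBCS and MWCS, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes (hypothetically) that MWCS sums the outer products of differences between each sample and its k nearest same-class neighbours, that MBCS does the same for the k nearest neighbours from other classes, and that the ratio is maximized through a generalized eigenproblem. The function names, the parameter `k`, and the regularizer `reg` are all illustrative assumptions.

```python
import numpy as np

def manifold_scatters(X, y, k=3):
    """Assumed neighbourhood-based scatter matrices (not the paper's definitions).

    X: (n, d) feature matrix, y: (n,) class labels.
    Returns (S_b, S_w), stand-ins for MBCS and MWCS.
    """
    n, d = X.shape
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        other = np.flatnonzero(y != y[i])
        for idx, S in ((same, S_w), (other, S_b)):
            # k nearest candidates within this group of samples
            nearest = idx[np.argsort(dist2[i, idx])[:k]]
            diffs = X[nearest] - X[i]
            S += diffs.T @ diffs  # sum of outer products, accumulated in place
    return S_b, S_w

def fisher_projection(S_b, S_w, n_components, reg=1e-6):
    """Directions maximizing w^T S_b w / w^T S_w w (Fisher-type criterion).

    Solves S_b w = lambda * (S_w + reg * I) w via a Cholesky reduction.
    """
    d = S_w.shape[0]
    L = np.linalg.cholesky(S_w + reg * np.eye(d))  # S_w + reg*I = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2.0)   # symmetric eigenproblem
    order = np.argsort(vals)[::-1][:n_components]  # largest ratios first
    W = L_inv.T @ vecs[:, order]                   # map back: w = L^{-T} u
    return W, vals[order]
```

Projected features would then be obtained as `X @ W`. The small ridge term `reg` is there only to keep the within-class matrix invertible when features outnumber samples, which is common for text data.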
Source
Information Research (《情报探索》)
2016, No. 1, pp. 1-3 (3 pages)
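Funding
Open Project of the Jiangsu Key Laboratory of Data Engineering and Knowledge Service, "Research on Interest-Graph-Based Personalized Learning Resource Recommendation Methods in the Cloud Environment" (project No. DEKS2014KT005)
Open Fund of the Information Engineering Laboratory, Institute of Scientific and Technical Information of China, "Research on Interest-Graph-Based Personalized Scientific and Technical Information Recommendation Methods in the Cloud Environment" (project No. ISTIC-IEL201501)
A result of the 2014 project of the Shanxi Province Philosophy and Social Science "Twelfth Five-Year" Plan, "Research on Digital Protection Methods for the Intangible Cultural Heritage of Shanxi Province"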
Keywords
text feature extraction
global features
local features
Fisher Discriminant Analysis (FDA)
manifold learning