Abstract
A stock research report is an analysis and evaluation of stock-related news written by a professional financial analyst. It assesses whether the news will affect a stock's future trend and offers investment advice, and it is generally more authoritative than forum commentary. However, the numbers of reports in the different classes are severely imbalanced, so a conventional SVM classifies them poorly. To improve classification, this paper proposes a new method for imbalanced data. For text feature selection, it adopts a composite-feature approach to select feature phrases that carry more semantic information, and it modifies the CHI statistic to better select feature terms from minority-class samples. It then designs a boundary-adaptive hierarchical under-sampling algorithm based on SVM clustering, which under-samples the majority class level by level to remove majority samples that interfere with the minority ones. Experimental results show that the method noticeably improves minority-class classification without degrading majority-class classification.
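For orientation, the CHI statistic mentioned in the abstract scores how strongly a term is associated with a class. The sketch below computes only the standard, textbook form from a 2x2 contingency table; the abstract does not give the paper's modified formula, so none is attempted here. The function names `contingency` and `chi_square` and the toy documents are hypothetical, chosen for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def contingency(docs: Sequence[Set[str]], labels: Iterable[str],
                term: str, cls: str) -> Tuple[int, int, int, int]:
    """Count the 2x2 table for one term and one class (hypothetical helper).

    a: documents in cls that contain term
    b: documents outside cls that contain term
    c: documents in cls that lack term
    d: documents outside cls that lack term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if label == cls:
            if has_term:
                a += 1
            else:
                c += 1
        else:
            if has_term:
                b += 1
            else:
                d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard CHI statistic: N * (ad - bc)^2 / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero row or column total means the term or class never varies.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy, made-up corpus: one minority-class report among four.
docs = [{"downgrade", "risk"}, {"buy", "growth"}, {"buy", "hold"}, {"growth"}]
labels = ["sell", "buy", "buy", "buy"]
score = chi_square(*contingency(docs, labels, "downgrade", "sell"))
```

With few minority-class documents, the counts `a` and `c` are small, which is one reason a selection step may under-represent minority-class terms and why the abstract describes adjusting the statistic.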
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2017, No. 3, pp. 769-772, 780 (5 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China, Young Scientists Fund (164659)