Abstract
When text data are high-dimensional, the regularized extreme learning machine (ELM), which has a single hidden layer, lacks sufficient feature-representation ability for text classification. To address this problem, this paper proposes a text classification method based on the multi-layer extreme learning machine (ML-ELM). First, the compressed representation learned by an extreme learning machine-based auto-encoder (ELM-AE) is used to reduce the dimensionality of the text data. Then, the multiple hidden layers of the ML-ELM extract high-level text features, and the text data are classified by the least-squares method. Experimental comparisons with several other algorithms show that the proposed method achieves good classification performance on three datasets: 20newsgroup, Reuters, and the Fudan University Chinese Corpus.
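The pipeline in the abstract (ELM-AE compression, stacked hidden layers, least-squares output) can be illustrated with a minimal NumPy sketch of the standard ELM-AE/ML-ELM formulation. It is not the authors' implementation: the sigmoid activation, the regularization constant `C`, the layer sizes, and the names `elm_ae_weights` and `MultiLayerELM` are all assumptions chosen for illustration. Each ELM-AE uses random orthonormal input weights, solves for output weights beta = (I/C + HᵀH)⁻¹HᵀX that reconstruct its input, and passes the input through βᵀ to the next layer; the final layer solves a regularized least-squares problem against one-hot class targets.

```python
import numpy as np


def _sigmoid(z):
    # Clip to avoid overflow warnings in np.exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))


def elm_ae_weights(X, n_hidden, C, rng):
    """Train one ELM auto-encoder on X (n_samples x n_features).

    Returns beta (n_hidden x n_features); X @ beta.T is the compressed
    representation passed to the next layer. Assumes n_hidden <= n_features.
    """
    n_features = X.shape[1]
    # Random orthonormal input weights and a unit-norm random bias.
    A, _ = np.linalg.qr(rng.standard_normal((n_features, n_hidden)))
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)
    H = _sigmoid(X @ A + b)
    # Regularized least squares that reconstructs X from H:
    # beta = (I/C + H^T H)^{-1} H^T X
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)


class MultiLayerELM:
    """Hypothetical sketch: stacked ELM-AE layers + least-squares output layer."""

    def __init__(self, layer_sizes, C=1.0, seed=0):
        self.layer_sizes = layer_sizes  # hidden sizes, e.g. [20, 10] (illustrative)
        self.C = C                      # regularization constant (assumed value)
        self.seed = seed

    def _transform(self, X):
        H = X
        for beta in self.betas:
            H = _sigmoid(H @ beta.T)
        return H

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.betas = []
        H = X
        for n_hidden in self.layer_sizes:
            beta = elm_ae_weights(H, n_hidden, self.C, rng)
            self.betas.append(beta)
            H = _sigmoid(H @ beta.T)  # compressed, higher-level features
        # One-hot targets T, then beta_out = (I/C + H^T H)^{-1} H^T T.
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta_out = np.linalg.solve(
            np.eye(H.shape[1]) / self.C + H.T @ H, H.T @ T
        )
        return self

    def predict(self, X):
        scores = self._transform(X) @ self.beta_out
        return self.classes_[np.argmax(scores, axis=1)]
```

For text, `X` would typically be a dense TF-IDF or term-frequency matrix; solving with `np.linalg.solve` instead of forming an explicit inverse is a numerical-stability choice, not something stated in the paper.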
Authors
JI Junzhong (冀俊忠)
PANG Haoming (庞皓明)
YANG Cuicui (杨翠翠)
LIU Jinduo (刘金铎)
Multimedia and Intelligent Software Technology Beijing Key Laboratory, Beijing University of Technology, Beijing 100124, China
Source
Journal of Beijing University of Technology (北京工业大学学报)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2019, No. 6, pp. 534-545 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61672065)
Keywords
text classification
high-dimensional text
multi-layer extreme learning machine (ML-ELM)
extreme learning machine-based auto-encoder (ELM-AE)
feature mapping
neural network