Abstract
Websites have become the main platform on which universities publish notices of academic activities. Whether the relevant information can be accurately extracted from these notices and classified directly affects the efficiency of office automation. To perform text classification, texts are represented with the vector space model (VSM), and an improved TF-IDF algorithm is used to set the feature weights of the text vectors. An optimized mutual information algorithm performs feature extraction and dimensionality reduction, which improves the vector input to the classification model. Finally, the BP neural network text classifier is improved through the choice of activation function, the setting of initial parameter values, and the introduction of a momentum factor. Experimental results show that the improved method achieves a clear gain in classification accuracy and can classify academic activity texts well.
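As a rough illustration of two building blocks the abstract names, the sketch below computes textbook TF-IDF weights for a vector space model and applies one gradient-descent update with a momentum term. It is not the paper's improved TF-IDF or its BP network: the abstract does not specify those modifications, so the weighting formula (tf × log(N/df)), the update rule, the function names `tfidf_vectors` and `momentum_step`, the toy documents, and the hyperparameter values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using the textbook tf * log(N / df) weighting.

    Assumes every document is a non-empty list of tokens.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def momentum_step(weights, grads, velocity, lr=0.1, mu=0.9):
    """One update with momentum: v <- mu * v - lr * g, then w <- w + v."""
    new_velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Hypothetical tokenized notices, used only to exercise the functions.
docs = [
    ["lecture", "on", "machine", "learning"],
    ["seminar", "on", "text", "classification"],
    ["lecture", "on", "text", "mining"],
]
vocab, vectors = tfidf_vectors(docs)
weights, velocity = momentum_step([1.0], [2.0], [0.0])
```

In this toy corpus the term "on" occurs in every document, so its weight is zero, which is the behavior that makes TF-IDF useful for down-weighting uninformative terms before feature selection.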
Authors
田欢
李红莲
吕学强
周建设
夏红科
TIAN Huan; LI Honglian; LU Xueqiang; ZHOU Jianshe; XIA Hongke
School of Information and Communication Engineering, Beijing Information Science & Technology University, Beijing 100101, China
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing 100048, China
Computer School, Beijing Information Science & Technology University, Beijing 100101, China
Source
《北京信息科技大学学报(自然科学版)》
2018, No. 5, pp. 38-44 (7 pages)
Journal of Beijing Information Science and Technology University
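Funding
National Natural Science Foundation of China (61671070)
Beijing Advanced Innovation Center for Imaging Technology project (BAICIT-2016003)
Major Program of the National Social Science Fund of China (14@ZH0363)
Key Project of the State Language Commission (ZDI135-53)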