Abstract
The LDA (Latent Dirichlet Allocation) topic mining algorithm is poorly suited to short texts, and it is difficult to condense and express each resulting topic in a uniform way. To address these problems, an improved LDA topic mining algorithm that incorporates feature dimension analysis is proposed. First, the comments and corresponding replies for 25 online courses in the "education and teaching" category on the MOOC (massive open online course) platform were crawled with Python and Selenium as source data and preprocessed with Chinese natural language processing techniques. Then, topics were mined with the improved LDA algorithm, and the evolution of hot topics was visualized. Finally, co-occurrence network graphs were combined with the feature dimensions to analyze the distribution characteristics of the topics. By introducing natural language processing techniques and the improved LDA topic mining algorithm, the study constructs a topic co-occurrence analysis framework for the comment and reply texts of online courses, providing theoretical support and new ideas for research on unstructured data in online learning.
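The abstract does not specify how the co-occurrence network is built, so the following is only a minimal sketch of one common approach: counting how often two keywords appear in the same comment and keeping pairs above a threshold as weighted edges. The function name `cooccurrence_edges`, the `min_count` parameter, and the sample tokens are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, min_count=1):
    """Count how often two distinct tokens appear in the same document.

    Assumption: `docs` is a list of already-segmented, stop-word-filtered
    token lists; the paper's own preprocessing is not described here.
    """
    counts = Counter()
    for tokens in docs:
        # Deduplicate and sort so each unordered pair maps to a single key.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    # Keep only pairs that co-occur at least `min_count` times.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical tokenized review snippets, for illustration only.
docs = [
    ["teacher", "lecture", "clear"],
    ["lecture", "clear", "homework"],
    ["teacher", "homework"],
]
edges = cooccurrence_edges(docs, min_count=2)
print(edges)
```

Each surviving pair can be treated as a weighted edge of the network. In practice the token lists would come from segmented course comments and replies, and the threshold would be tuned to keep the graph readable.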
Authors
CHEN Xiuming (陈秀明)
ZHANG Chenchen (张晨晨)
WANG Feng (王峰)
WANG Xianchuan (王先传)
School of Computer and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236037, China
Source
Journal of Fuyang Normal University (Natural Science) (《阜阳师范大学学报(自然科学版)》)
2021, No. 4, pp. 73-81 (9 pages)
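Funding
China University Industry-University-Research Innovation Fund, New Generation Information Technology Innovation Project (2019ITA01037)
Key Natural Science Research Projects of Anhui Provincial Universities (KJ2019A0541, KJ2019A0533)
Anhui Provincial Higher Education Quality Engineering Special Project Supporting University Online Teaching during Epidemic Prevention and Control, Major Online Teaching Reform Research Project (2020zdxsjg256)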