Abstract
In recent years, as large open online course (MOOC) platforms have proliferated, learners have had to spend a great deal of time searching across different platforms for courses that suit them. To improve the utilization of MOOC educational resources, this paper designs and implements a vertical search engine for the MOOC domain. It proposes an optimization scheme in which crawling and indexing are tightly coupled and run in parallel across multiple threads; it downloads course information according to the three methods by which course lists are loaded; it customizes information extraction rules based on an analysis of the features of the course web pages being extracted; and it proposes an optimized similarity scoring method for ranking search results. Experimental results show that the vertical search engine achieves some improvement in average crawling and indexing time, ranking quality, and mean average precision (MAP). The system integrates, stores, and retrieves MOOC educational resources, meeting the requirements of the development of educational informatization.
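The abstract does not describe how the tightly coupled crawling and indexing scheme is implemented. As a minimal sketch only, assuming a simple in-memory inverted index, the example below shows the general idea: several crawler threads share one URL frontier, and each thread indexes a page as soon as it fetches it, rather than writing pages out for a separate batch indexing stage. `FAKE_SITE`, `CrawlIndexer`, and `tokenize` are hypothetical names; the fake site stands in for real MOOC pages, and no HTTP fetching, extraction rules, or similarity-based ranking is shown.

```python
import queue
import re
import threading
from collections import defaultdict

# Hypothetical stand-in for MOOC platform pages: URL -> (course title, outgoing links).
# A real crawler would fetch these over HTTP and apply per-site extraction rules.
FAKE_SITE = {
    "/list": ("Course list", ["/c/1", "/c/2", "/c/3"]),
    "/c/1": ("Introduction to Python Programming", ["/c/2"]),
    "/c/2": ("Data Structures in Python", []),
    "/c/3": ("Computer Networks", ["/list"]),
}


def tokenize(text):
    """Lowercase word tokens; a real system would need a Chinese-aware analyzer."""
    return re.findall(r"\w+", text.lower())


class CrawlIndexer:
    """Crawler threads index each page immediately after fetching it (tight coupling)."""

    def __init__(self, num_threads=4):
        self.num_threads = num_threads
        self.frontier = queue.Queue()
        self.seen = set()
        self.index = defaultdict(set)  # inverted index: term -> set of URLs
        self.titles = {}
        self.lock = threading.Lock()   # guards seen, index, and titles

    def _enqueue(self, url):
        with self.lock:
            if url in self.seen:
                return
            self.seen.add(url)
        self.frontier.put(url)

    def _worker(self):
        while True:
            url = self.frontier.get()
            if url is None:  # sentinel: stop this worker
                self.frontier.task_done()
                return
            title, links = FAKE_SITE.get(url, ("", []))
            # Index right away instead of storing the page for a later batch job.
            with self.lock:
                self.titles[url] = title
                for term in tokenize(title):
                    self.index[term].add(url)
            # Enqueue new links before task_done so join() cannot return early.
            for link in links:
                self._enqueue(link)
            self.frontier.task_done()

    def run(self, seed):
        self._enqueue(seed)
        threads = [threading.Thread(target=self._worker) for _ in range(self.num_threads)]
        for t in threads:
            t.start()
        self.frontier.join()  # wait until every discovered URL has been processed
        for _ in threads:
            self.frontier.put(None)
        for t in threads:
            t.join()

    def search(self, query):
        """Return URLs containing all query terms (boolean AND, no ranking)."""
        sets = [self.index.get(t, set()) for t in tokenize(query)]
        return sorted(set.intersection(*sets)) if sets else []
```

For example, after `ci = CrawlIndexer(num_threads=3)` and `ci.run("/list")`, calling `ci.search("python data")` returns `["/c/2"]`. Coupling the two stages this way avoids a second pass over stored pages, at the cost of contention on the shared index lock. A production design would more likely shard the index or use a dedicated indexing library.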
Source
Computer and Modernization (《计算机与现代化》)
2017, No. 4, pp. 32-37 (6 pages)
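Funding
Hubei Province Education Science "12th Five-Year Plan" Project (2011B130)
Hubei Province Program for Outstanding Young and Middle-aged Science and Technology Innovation Teams in Higher Education Institutions (T201515)
Hubei Province Teaching Research Project (2015382)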