Abstract
An online course organization and management system is an integrated platform of educational resources that makes it easier for teachers and students to use online e-learning courses. A key component of the Online Course Organization System is its metadata extraction module, which must extract metadata accurately from semi-structured web pages, including loosely structured ones. This paper designs and implements a rule-driven metadata extraction method. The method matches web-page metadata against multiple rules ordered by priority and then applies a second, information-refinement step to improve final accuracy. When the problem domain changes, users only need to supply domain-specific rules; the program itself needs no modification. Experiments show that the method handles semi-structured web pages well, with an F-measure above 85%, which indicates good practical value.
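The abstract does not give the authors' rule syntax, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each rule is a hypothetical `Rule` record with a priority, a coarse "locate" regular expression (step one) and an optional "refine" regular expression applied to the tag-stripped region (step two). Rules are tried in priority order, and the first rule that succeeds fills its field. The names `Rule`, `extract`, `strip_tags`, `f_measure`, `RULES` and `SAMPLE_PAGE` are illustrative, not taken from the paper, and `f_measure` assumes the balanced F1 score.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical extraction rule for one metadata field."""
    name: str                     # metadata field, e.g. "title"
    priority: int                 # lower number = tried first
    locate: str                   # step 1: regex whose group 1 is a coarse region
    refine: Optional[str] = None  # step 2: optional regex applied to that region


def strip_tags(text: str) -> str:
    """Crude cleanup: drop HTML tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def extract(page: str, rules: List[Rule]) -> Dict[str, str]:
    """Apply rules in priority order; the first successful rule wins a field."""
    result: Dict[str, str] = {}
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.name in result:
            continue  # a higher-priority rule already filled this field
        m = re.search(rule.locate, page, re.IGNORECASE | re.DOTALL)
        if not m:
            continue
        region = strip_tags(m.group(1))
        if rule.refine:
            m2 = re.search(rule.refine, region)
            if not m2:
                continue  # refinement failed; let lower-priority rules try
            region = m2.group(1).strip()
        if region:
            result[rule.name] = region
    return result


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1), the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative rule set for a hypothetical course-page domain.
RULES = [
    Rule("title", 1, r"<h1[^>]*>(.*?)</h1>"),
    Rule("title", 2, r"<title>(.*?)</title>"),  # fallback if no <h1>
    Rule("instructor", 1, r"(Instructor\s*:.*?)</p>", r"Instructor\s*:\s*(.+)"),
    Rule("credits", 1, r"(Credits?\s*:.*?)</p>", r"(\d+)"),
]

SAMPLE_PAGE = """<html><head><title>CS101 | Dept</title></head>
<body><p>Instructor: <b>Dr. Li</b></p><p>Credits: 3 units</p></body></html>"""

if __name__ == "__main__":
    print(extract(SAMPLE_PAGE, RULES))
    print(f_measure(0.9, 0.8))
```

Because the rules are plain data in this sketch, adapting it to a new domain means editing `RULES` rather than `extract`, in the spirit of the abstract's claim that no program changes are needed. On the sample page the `<h1>` title rule fails, so the lower-priority `<title>` rule fills the field instead.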
Source
Computer Science (《计算机科学》)
CSCD (Chinese Science Citation Database)
Peking University Core Journal (北大核心)
2008, Issue 3, pp. 94-96 (3 pages)
Funding
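National Natural Science Foundation of China, "Comprehensive Experimental Platform for Network Computing Environments" (Grant No. 90412010)
HP University Collaboration Fund, project "Organization and Management of Online Courses"
National Natural Science Foundation of China (Grant No. 60573166)
Guangdong Provincial Key Laboratory of Networking Fund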
Keywords
Metadata extraction, Regular expression, Information refinement