Abstract
To make full use of domain features in the structured analysis of Web texts, this paper proposes a domain-oriented structured analysis method. Taking domain features as its basis, the method first constructs an HTML tree from the structural characteristics of semi-structured text and the hierarchical nature of HTML documents. It then builds a domain ontology using ideas and methods from ontology and extracts valuable information from the HTML tree. Finally, it combines a general lexicon with a domain lexicon to carry out the structured analysis. Experimental results show that the method performs the structured analysis of Web texts well.
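The first two steps named in the abstract (building an HTML tree from tag nesting, then picking out nodes that carry domain-relevant text) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, `build_tree`, `find_domain_nodes`, and the plain set of domain terms standing in for a domain lexicon are all hypothetical names introduced here, and the paper's ontology-based extraction is replaced by simple substring matching. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not become the current node.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of the HTML tree (hypothetical structure, not from the paper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the nesting of start and end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then step out of it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(html):
    """Parse an HTML string and return the root of the resulting tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_domain_nodes(node, domain_terms):
    """Return (tag, text) for every node whose text contains a domain term.

    Assumption: substring matching against a set of terms stands in for the
    ontology- and lexicon-based analysis described in the abstract.
    """
    hits = []
    if any(term in node.text for term in domain_terms):
        hits.append((node.tag, node.text))
    for child in node.children:
        hits.extend(find_domain_nodes(child, domain_terms))
    return hits
```

For example, calling `find_domain_nodes(build_tree(page), {"Price", "Laptop"})` on a product page would return the tag and text of each element mentioning either term, in document order.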
Source
《合肥工业大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2013, No. 3, pp. 309-314 (6 pages)
Journal of Hefei University of Technology: Natural Science
Funding
Supported by the National Natural Science Foundation of China (60975033, 60575035, 60275022)