
Domain-oriented structured analysis of Web texts

Cited by: 2
Abstract: To make full use of domain features in the structured analysis of Web texts, this paper proposes a domain-oriented method for the structured analysis of Web texts. Taking domain features as its basis, the method first constructs an HTML tree from the structural characteristics of semi-structured text and the hierarchical nature of HTML documents. It then builds a domain ontology using ideas and methods from ontology, and uses it to extract valuable information from the HTML tree. Finally, it combines a general lexicon with a domain lexicon to carry out the structured analysis. Experimental results show that the method performs the structured analysis of Web texts well.
Source: Journal of Hefei University of Technology (Natural Science), 2013, No. 3, pp. 309-314 (6 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
Funding: National Natural Science Foundation of China (grants 60975033, 60575035, 60275022)
Keywords: domain feature; Web text; structured analysis; semi-structured text; domain ontology
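
The abstract names three stages: building an HTML tree, extracting information with a domain ontology, and analysing the result with a general and a domain lexicon. The record gives no implementation details, so the following is only a minimal sketch of how such a pipeline might be wired together. Everything in it is an assumption made for illustration: the `Node`, `TreeBuilder`, `extract`, and `segment` names, the toy ontology and lexicons, the label/value table layout, and the use of greedy longest-match phrase lookup in place of whatever lexicon-based analysis the authors actually use.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class Node:
    """One element of the HTML tree (hypothetical helper type)."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree that mirrors the nesting of the HTML tags."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.text += text


def walk(node):
    """Yield every node of the tree in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract(root, ontology):
    """Map two-cell label/value table rows onto ontology concepts."""
    record = {}
    for node in walk(root):
        cells = [c for c in node.children if c.tag == "td"]
        if node.tag == "tr" and len(cells) == 2:
            concept = ontology.get(cells[0].text.lower())
            if concept:
                record[concept] = cells[1].text
    return record


def segment(text, lexicon, max_len=3):
    """Greedy longest-match over word n-grams; unknown words pass through."""
    words, out, i = text.lower().split(), [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if n == 1 or phrase in lexicon:
                out.append(phrase)
                i += n
                break
    return out


# Toy data, invented for this sketch only.
ONTOLOGY = {"brand": "Brand", "storage": "Storage"}  # page label -> concept
GENERAL_WORDS = {"with", "a", "fast"}                # stand-in general lexicon
DOMAIN_TERMS = {"solid state drive"}                 # stand-in domain lexicon

PAGE = """<html><body><table>
<tr><td>Brand</td><td>Acme</td></tr>
<tr><td>Storage</td><td>fast solid state drive</td></tr>
</table></body></html>"""

builder = TreeBuilder()
builder.feed(PAGE)
record = extract(builder.root, ONTOLOGY)
terms = segment(record["Storage"], GENERAL_WORDS | DOMAIN_TERMS)
print(record)
print(terms)
```

In this sketch the domain lexicon is what keeps "solid state drive" together as one term; with only the general lexicon the same text would fall apart into single words.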
