A Discussion of Topic-Focused Crawling of Web Documents Based on Ontology Concept Graphs
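Abstract
This paper proposes topic-focused crawling of web documents based on an ontology concept graph. The ontology concept graph is used to construct a topic hierarchy graph, which gives each URL object awaiting crawling hierarchical semantic information. URL objects are then selected for crawling according to their semantic relevance and importance, so that the crawler searches the subset of the WWW made up of important web documents that belong to a specific, semantically related topic.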
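The abstract gives no formulas or data structures, so the following is only a minimal sketch of the general idea (derive topic levels from a concept graph, then order candidate URLs by relevance and importance), not the authors' method. The toy graph `CONCEPT_GRAPH`, the `1 / (1 + depth)` relevance rule, the capped in-link count used as importance, and the weight `alpha` are all assumptions introduced for illustration.

```python
import heapq
from collections import deque

# Hypothetical toy concept graph: concept -> narrower concepts.
CONCEPT_GRAPH = {
    "crawler": ["focused crawler", "url ordering"],
    "focused crawler": ["ontology"],
    "url ordering": [],
    "ontology": [],
}


def topic_levels(graph, root):
    """Breadth-first walk assigning each concept its depth below the root topic."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        for child in graph.get(concept, []):
            if child not in levels:
                levels[child] = levels[concept] + 1
                queue.append(child)
    return levels


def relevance(text, levels):
    """Assumed rule: best 1 / (1 + depth) over the concepts found in the text."""
    text = text.lower()
    return max((1.0 / (1 + d) for c, d in levels.items() if c in text), default=0.0)


def priority(text, inlinks, levels, alpha=0.7):
    """Assumed blend of semantic relevance and a capped in-link importance."""
    importance = min(inlinks, 10) / 10.0
    return alpha * relevance(text, levels) + (1 - alpha) * importance


def order_frontier(candidates, levels):
    """Return URLs in crawl order, highest priority first.

    Each candidate is a (url, anchor_text, inlink_count) tuple.
    """
    heap = [(-priority(text, inlinks, levels), url) for url, text, inlinks in candidates]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Calling `order_frontier` with a list of `(url, anchor_text, inlink_count)` tuples and the output of `topic_levels(CONCEPT_GRAPH, "crawler")` returns the URLs in the order a crawler would visit them. A heap is used here because a crawl frontier is typically a priority queue; a real crawler would keep pushing newly discovered links onto it rather than sorting a fixed list.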
Source
Science and Technology Innovation Herald (《科技创新导报》)
2010, No. 8, pp. 24-25 (2 pages)