Design of a Web Collection Algorithm Based on the LDA Model
Abstract: Popular algorithms for collecting dynamic web content rely on sparse user-supplied labels. This paper proposes a specific collection method driven by the dynamic content of web pages: a content-based algorithm for collecting dynamic web content, built on an association-based Latent Dirichlet Allocation (LDA) algorithm. The proposed algorithm automatically annotates the dynamic content in web pages and exploits the semantic relationship between dynamic content and textual content. The association-based LDA provides concept-level matching to establish the correspondence between the text and the dynamic content of a web page, with the aim of higher retrieval precision. Experimental results show that, compared with an SVM-based method, the proposed algorithm achieves higher precision and recall.
Author: HU Liu-Si (胡六四) (College of Software, Anhui Vocational College of Electronics & Information Technology, Bengbu 233000, China)
Source: Journal of Daqing Normal University (《大庆师范学院学报》), 2018, No. 6, pp. 55-58 (4 pages)
Keywords: LDA; web collection; dynamic content
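
The concept-level matching mentioned in the abstract can be pictured as comparing topic distributions rather than raw words. The sketch below is only an illustration under stated assumptions: it does not reproduce the paper's association-based LDA, it assumes that topic distributions for the page text and for each dynamic-content item have already been inferred by some LDA model, and it uses cosine similarity as a stand-in matching score. The function names, item identifiers, and topic vectors are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-probability vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_p * norm_q)


def rank_dynamic_items(text_topics, item_topics):
    """Rank dynamic-content items by topic similarity to the page text.

    text_topics: topic distribution of the page text (list of floats).
    item_topics: dict mapping item id -> topic distribution.
    Returns (item_id, score) pairs sorted from best to worst match.
    """
    scored = [
        (item_id, cosine_similarity(text_topics, topics))
        for item_id, topics in item_topics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical distributions over three topics (assumed LDA output).
text_topics = [0.7, 0.2, 0.1]
item_topics = {
    "video_a": [0.6, 0.3, 0.1],
    "banner_b": [0.1, 0.1, 0.8],
    "widget_c": [0.2, 0.7, 0.1],
}

ranking = rank_dynamic_items(text_topics, item_topics)
for item_id, score in ranking:
    print(f"{item_id}: {score:.3f}")
```

Matching in topic space is what lets two items correspond even when they share few literal words, which may be the intuition behind the abstract's claim of improved retrieval precision.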