Design of web collection algorithm based on LDA model

Abstract: Popular dynamic web content collection algorithms rely on sparse user labels. This paper proposes a collection method aimed specifically at the dynamic content of web pages: a content-based dynamic web content collection algorithm built on association-based LDA (Latent Dirichlet Allocation). The algorithm automatically annotates the dynamic content in web pages and exploits the semantic relationship between dynamic content and textual content. Association-based LDA provides concept-level matching that establishes the correspondence between a page's text and its dynamic content, with the aim of achieving higher retrieval precision. Experimental results show that, compared with an SVM-based method, the proposed algorithm achieves higher precision and recall.
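The abstract does not give the model equations or the features used for dynamic content, so the following is only a minimal sketch under stated assumptions. It uses plain LDA fitted by collapsed Gibbs sampling (not the paper's association-based variant), trains one shared topic space over text blocks and dynamic-content items, and assumes each dynamic item is described by hypothetical tag tokens (for example alt text or file names). "Concept-level matching" is then illustrated by linking each dynamic item to the text block with the most similar topic mixture, using Hellinger similarity as an assumed measure. Precision and recall, the metrics named in the abstract, are computed set-wise. All function names and the toy data are hypothetical.

```python
import math
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Plain LDA fitted by collapsed Gibbs sampling.

    docs: list of token lists. Returns (theta, phi, index), where
    theta[d][k] ~ p(topic k | doc d), phi[k][v] ~ p(word v | topic k),
    and index maps each vocabulary word to its column in phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # tokens of word v assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):     # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    topics = list(range(K))
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in topics]
    return theta, phi, index


def hellinger_similarity(p, q):
    """1 - Hellinger distance between two distributions (1.0 means identical)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def match_dynamic_to_text(theta, text_ids, dynamic_ids):
    """Link each dynamic item to the text block whose topic mixture is most similar."""
    return {d: max(text_ids, key=lambda t: hellinger_similarity(theta[d], theta[t]))
            for d in dynamic_ids}


def precision_recall(predicted, relevant):
    """Set-based precision and recall of retrieved items against ground truth."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    docs = [
        "football match goal team score".split(),     # 0: text block
        "stock market price shares trading".split(),  # 1: text block
        "team goal highlight video".split(),          # 2: dynamic item, hypothetical tags
        "live price chart shares".split(),            # 3: dynamic item, hypothetical tags
    ]
    theta, _, _ = train_lda(docs, num_topics=2)
    print(match_dynamic_to_text(theta, text_ids=[0, 1], dynamic_ids=[2, 3]))
```

Because Gibbs sampling is stochastic, the matches on a corpus this small can change with the seed and iteration count. Hellinger similarity is used here only because it is a standard distance between probability distributions; the paper does not say which measure it uses.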

Author: HU Liu-Si (胡六四), College of Software, Anhui Vocational College of Electronics & Information Technology, Bengbu 233000, China

Institution: College of Software, Anhui Vocational College of Electronics & Information Technology

Source: Journal of Daqing Normal University (《大庆师范学院学报》), 2018, No. 6, pp. 55-58 (4 pages)

Keywords: LDA; web collection; dynamic content

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
