Abstract
This paper proposes a method for labelling the content attributes of Web logs that combines the latent Dirichlet allocation (LDA) model with a theme knowledge base. To build the IP knowledge base, the collected Web logs are first preprocessed. The time attributes of the logs are then labelled using statistical principles, and the region attributes are extracted by mapping against an IP address base. Finally, the content attributes are mined with a labelling method that combines the LDA model with the theme knowledge base. The experimental results indicate that the method is sound in principle and performs well in mining Web log attributes, which in turn supports building the IP knowledge base.
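The three labelling steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything here is an assumption: the `IP_BASE` ranges, the `THEME_KB` keywords, and the helper names `region_of`, `peak_hour`, and `label_topic` are hypothetical, and the topic step assumes the top words of each LDA topic have already been produced by a separate LDA run.

```python
import bisect
import ipaddress
from collections import Counter
from datetime import datetime

# Hypothetical IP address base: (range start, range end, region), sorted by start.
IP_BASE = sorted(
    (int(ipaddress.ip_address(lo)), int(ipaddress.ip_address(hi)), region)
    for lo, hi, region in [
        ("10.0.0.0", "10.0.255.255", "Region-A"),
        ("10.1.0.0", "10.1.255.255", "Region-B"),
    ]
)
STARTS = [row[0] for row in IP_BASE]

# Hypothetical theme knowledge base: label -> characteristic keywords.
THEME_KB = {
    "sports": {"match", "team", "score"},
    "finance": {"stock", "fund", "bank"},
}


def region_of(ip):
    """Region attribute: binary search for the range containing the IP."""
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and IP_BASE[i][0] <= value <= IP_BASE[i][1]:
        return IP_BASE[i][2]
    return None  # address not covered by the base


def peak_hour(timestamps):
    """Time attribute: the hour of day with the most log entries."""
    if not timestamps:
        return None
    hours = Counter(datetime.fromisoformat(t).hour for t in timestamps)
    return hours.most_common(1)[0][0]


def label_topic(top_words):
    """Content attribute: knowledge-base label sharing the most keywords
    with the top words of one LDA topic (LDA itself is not run here)."""
    words = set(top_words)
    scores = {label: len(kw & words) for label, kw in THEME_KB.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Simple keyword overlap stands in for whatever matching rule the paper actually uses between LDA topics and the theme knowledge base; a weighted score over the topic-word probabilities would be a natural refinement.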
Source
《计算机应用研究》
CSCD
PKU Core (Peking University core journal list)
2017, No. 5, pp. 1410-1414 (5 pages)
Application Research of Computers
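Funding
National Natural Science Foundation of China (61370139)
Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program (IDHT20130519)
Beijing Municipal Education Commission Special Fund (PXM2016_014224_000067)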