Abstract
By analyzing the BBS topic model, topic similarity, evaluation criteria for topic detection, and topic trends, this paper proposes a content-analysis-based topic detection algorithm for Chinese BBS. The algorithm obtains BBS information with a web crawler; processes it with webpage templates based on URL and XPath; segments the text into words with ICTLAS; clusters BBS topics with Carrot2; analyzes hot topics based on the power spectrum; and predicts topics based on time series. Finally, a Chinese BBS topic detection system is implemented with the J2EE development kit and the Eclipse IDE, combined with technologies such as Hibernate and GWT. The system was tested on multiple BBS forums and achieved good results.
Source
《计算机应用与软件》
CSCD
2011, No. 6, pp. 242-246 (5 pages)
Computer Applications and Software
Funding
Funded project of the Shenzhen Science and Technology Program (07KJce140)