Abstract
This paper describes a micro-blog hot topic detection system that lets administrators quickly identify micro-blog hot events that are taking place or have already occurred. The system acquires web data by combining calls to the micro-blog API with an improved crawler program. Because the volume of network data is huge, it also applies web page cleaning to improve efficiency. The paper focuses on the topic activity model, which lets the system locate hot topics quickly along the time axis, improving the efficiency of hot topic detection and greatly reducing its time complexity.
Source
Video Engineering (《电视技术》)
Peking University Core Journal
2013, No. 3, pp. 205-208 (4 pages)
Funding
National High Technology Research and Development Program of China ("863" Program) project (2012BAH38B05)