Abstract
This article presents a formal definition and description of popular topics on the Internet, analyzes the relationship between popular words and topics, and introduces a method for extracting popular topics that takes as input the statistics and correlations of popular words in traffic content together with network flow characteristics. On this basis, the article adapts a clustering algorithm to extract popular topics and gives formalized results. The test results show that the method has an accuracy of 16.7% in extracting popular topics on the Internet. Compared with web mining and topic detection and tracking (TDT), it can provide a more suitable data source for the effective recovery of Internet public opinion.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 60574087),
the Hi-Tech Research and Development Program of China (Grant Nos. 2007AA01Z475, 2007AA01Z480, 2007AA01Z464),
and the 111 International Collaboration Program of China.