
Research of micro-blog's hot topic detection technology based on semantic analysis (基于语义分析的微博热点话题发现技术研究)
Cited by: 3
Abstract: In recent years, hot topic detection on microblogs has become a focus of research on online public opinion analysis. Microblog posts are short texts that are fragmented and colloquial, and the Vector Space Model (VSM) text representation suffers from high dimensionality, sparsity, and synonymy and polysemy. To address these problems, this paper models microblog posts with latent semantic analysis (LSA) and then detects topics with a Bayesian classification algorithm. A microblog hot topic detection system was implemented with the J2EE development kit and the Eclipse integrated development environment, together with technologies such as Hibernate and Lucene. Experiments indicate that the method is effective.
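The abstract names the two stages (LSA modeling, then Bayesian classification) but gives no formulas. As a rough sketch only, the standard textbook formulation of LSA and of the Bayes decision rule is shown below; it is not confirmed as the authors' exact method. All symbols ($X$, $m$, $n$, $k$, $U_k$, $\Sigma_k$, $V_k$, $d$, $c$) are assumptions introduced here for illustration.

```latex
\begin{align}
% X: m-by-n term-document matrix from the VSM (m terms, n microblog posts)
X &= U \Sigma V^{\top} \\
% Keep only the k largest singular values, with k much smaller than min(m, n),
% to obtain a dense low-rank approximation that merges co-occurring terms
X &\approx X_k = U_k \Sigma_k V_k^{\top} \\
% Fold a post's term vector d (length m) into the k-dimensional latent space
\hat{d} &= \Sigma_k^{-1} U_k^{\top} d \\
% Assign the topic c with the highest posterior probability (Bayes rule)
\hat{c} &= \arg\max_{c} P(c \mid \hat{d}) = \arg\max_{c} P(c)\, p(\hat{d} \mid c)
\end{align}
```

Under these assumptions, the truncation to rank $k$ is what addresses the high dimensionality and sparsity attributed to the VSM, since each post is represented by $k$ dense coordinates instead of $m$ mostly zero term weights.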
Authors: 柏建普 (Bai Jianpu), 田芳 (Tian Fang)
Source: Journal of Inner Mongolia University of Science and Technology (《内蒙古科技大学学报》), CAS, 2013, No. 3, pp. 283-286 (4 pages)
Keywords: semantic analysis; microblog; hot topics; topic detection















