
Research on a Method for Extracting Popular Internet Words Based on Omni-Segmentation (cited by: 2)

Study on popular words and phrases extraction of network based on omni-segmentation
Abstract: This paper extracts popular Internet words and phrases from the dynamic information stream of the Internet using an algorithm that combines statistics and rules. A candidate word set is first obtained with an omni-segmentation algorithm and is then filtered three times in sequence: first by weight, based on the vector space model; then with a language model; and finally with junk-string filtering rules, which yields the candidate set of popular Internet words. The proposed popular-word scoring model is then applied to this candidate set to select the final popular Internet words and phrases. Experiments indicate that the method markedly increases the speed of automatic extraction without reducing the accuracy of the extracted popular words.
Source: 《计算机应用研究》 (Application Research of Computers), CSCD, Peking University Core Journal, 2009, No. 4, pp. 1260-1262, 1285 (4 pages in total)
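Funding: National Natural Science Foundation of China (60673040); National Social Science Foundation of China (06BYY029); Key Science and Technology Research Project of the Ministry of Education of China (105117); Natural Science Foundation of Hubei Province (2006ABC011); National Basic Research Program of China ("973" Program) (2007CB310804)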
Keywords: popular words and phrases of network; Chinese information processing; omni-segmentation
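
The pipeline outlined in the abstract (candidate generation, successive filters, then scoring) might be sketched roughly as follows. This is an illustration only, not the authors' implementation: it assumes the candidate set is every character n-gram up to a maximum length, uses a plain frequency threshold as a stand-in for the vector-space-model weight filter, omits the language-model filter entirely, and uses an invented junk-character list and an invented scoring function. All function names, parameters, and thresholds are hypothetical.

```python
from collections import Counter


def candidate_ngrams(text, min_len=2, max_len=4):
    """Count every character n-gram with min_len <= n <= max_len.

    Assumption: this exhaustive enumeration stands in for the
    omni-segmentation step that produces the candidate word set.
    """
    counts = Counter()
    for start in range(len(text)):
        for n in range(min_len, max_len + 1):
            if start + n > len(text):
                break  # longer n-grams would also run past the end
            counts[text[start:start + n]] += 1
    return counts


def weight_filter(counts, min_count=2):
    """Stand-in for the weight filter: keep candidates seen at least min_count times."""
    return {w: c for w, c in counts.items() if c >= min_count}


# Hypothetical junk characters; the paper's actual rules are not given in the abstract.
JUNK_CHARS = set("的了是,。!? ")


def junk_filter(counts):
    """Stand-in for junk-string rules: drop candidates that start or end with a junk character."""
    return {
        w: c for w, c in counts.items()
        if w[0] not in JUNK_CHARS and w[-1] not in JUNK_CHARS
    }


def score(word, count):
    """Hypothetical score: frequency times length."""
    return count * len(word)


def extract(text, top_k=3):
    """Run the filters in sequence, then rank the survivors by score."""
    counts = junk_filter(weight_filter(candidate_ngrams(text)))
    ranked = sorted(counts, key=lambda w: (-score(w, counts[w]), w))
    return ranked[:top_k]
```

Each stage takes and returns a mapping from candidate string to count, so a language-model filter or a different scoring model could be slotted in without changing the other stages.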