
The Research and Analyses of Internet-Based Multilingual Distribution

Cited by: 2
Abstract: This paper proposes a method for identifying languages on the Internet and for estimating how web content is distributed across languages. Because the characters and words of each language are used with different frequencies, the method defines a set of high-frequency words for each language and uses them as keywords both to query for web pages and to identify the language of a page. It then applies the generalized addition formula from probability theory to estimate the distribution of web pages across languages, and combines this with the usage frequency of the high-frequency words to further estimate the distribution of text in each language. The experimental method and results offer a reference for computing practitioners who want a fuller picture of the characteristics of the Internet.
Authors: Zhang Fang (张芳), Li Fang (李芳)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal (北大核心), 2007, No. 9, pp. 137-140 (4 pages)
Keywords: High-frequency words; Search engine; Web page amounts; Web page quotient
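
The counting step mentioned in the abstract (the generalized addition formula, also known as inclusion-exclusion) can be illustrated with a small sketch. This record does not give the paper's actual keyword sets or query procedure, so everything below is an assumption made for illustration: `union_count`, `toy_hit_count`, the `PAGES` data, and the sample keyword lists are hypothetical, and the toy in-memory index merely stands in for a search engine that reports how many pages contain all of the given terms.

```python
from itertools import combinations


def union_count(keywords, hit_count):
    """Estimate how many pages contain at least one keyword.

    Uses inclusion-exclusion: add the counts for single keywords,
    subtract the counts for pairs, add the counts for triples, and so on.
    `hit_count(terms)` is a hypothetical callable that returns the number
    of pages containing ALL terms in the tuple (an AND query).
    """
    total = 0
    for k in range(1, len(keywords) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(keywords, k):
            total += sign * hit_count(subset)
    return total


# Toy in-memory "index" standing in for a search engine (illustrative data only).
PAGES = [
    {"the", "of", "web"},
    {"the", "and"},
    {"of", "and", "the"},
    {"le", "de"},
    {"de", "et", "the"},
]


def toy_hit_count(terms):
    # Count the pages that contain every term in `terms`.
    return sum(1 for page in PAGES if all(t in page for t in terms))


# Hypothetical high-frequency word lists for two languages.
english_pages = union_count(["the", "of", "and"], toy_hit_count)
french_pages = union_count(["le", "de", "et"], toy_hit_count)
print(english_pages, french_pages)
```

In this toy data the last page matches keywords from both lists, so it is counted for both languages. A real estimate would need some way to handle such mixed-language pages, and this record does not say how the paper treats them.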

References (6)

  • 1. Xu J L. Multilingual Search on the World Wide Web. In Proceedings of the Hawaii International Conference on System Sciences (HICSS-33), Maui, Hawaii, January 2000.
  • 2. Gregory Grefenstette, Julien Nioche. Estimation of English and non-English Language Use on the WWW. Xerox Research Centre Europe, 2000.
  • 3. Language Distribution in AlltheWeb. Available from all-the-Web.com for June 2002, prepared by Takagi and Fredric Gey, June 2002.
  • 4. Language Distribution in AlltheWeb. Available from all-the-Web.com for July 2003, prepared by Takagi and Fredric Gey, July 2003.
  • 5. Adam Kilgarriff, Gregory Grefenstette. Introduction to the Special Issue on the Web as Corpus. Lexicography MasterClass Ltd. and ITRI University of Brighton & Clairvoyance Corporation, 2003.
  • 6. Fredric Gey. Multi-Lingual Information Access. Tutorial at ECDL-2004, September 12, 2004.
