Clustering and Multidimensional Visualization of Co-occurrence Query Keywords in e-Government Platform (基于电子政务平台查询关键词共现多维可视化聚类分析研究). Cited by: 6
Abstract: Working from search engine query information extracted from the server logs of a government website over a given period, we propose a set of rules for selecting representative core query terms, and we carry out co-occurrence and visual clustering analysis for each core query term separately. We build a similarity matrix based on co-occurrence frequency, derive three-dimensional cluster visualizations with a non-metric MDS algorithm, and use hierarchical clustering based on Ward's method to verify the correctness, validity, and advantages of the three-dimensional MDS clustering results. We also developed a set of analysis tools suited to the characteristics of the logs. These tools help us mine, extract, select, and process log information from similar websites whose logs have different structures, and the processed results are then passed to statistical analysis tools for visual clustering analysis and comparison. The experimental results show that the method combines the strengths of MDS analysis and vector-space clustering, makes the clustering pattern, shape, and distance between objects easier to observe, and provides theoretical methods and empirical evidence for research on optimizing e-Government platforms based on topic maps.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2012, Issue 4, pp. 352-361 (10 pages)
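Funding: This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Knowledge Organization and Integration Methods for e-Government Portals Based on Topic Maps" (grant no. 70873050) and of the Ministry of Education's Program for New Century Excellent Talents in University (NCET) (grant no. NCET-08-0788).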
Keywords: e-Government portal; search engine; query keywords; clustering; MDS; similarity matrix
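The first step named in the abstract, building a similarity matrix from co-occurrence frequencies for one core query term, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the abstract does not give the selection rules or the normalization, so the sketch assumes whitespace-tokenized queries and an Ochiai (cosine-style) normalization, and every function name and sample query here is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(queries, core_term):
    """Count terms and term pairs in queries that contain core_term.

    Assumption: queries are already tokenized by whitespace. Real Chinese
    query logs would need a word-segmentation step first.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for query in queries:
        terms = set(query.split())
        if core_term not in terms:
            continue  # only queries that mention the core term are analysed
        terms.discard(core_term)
        term_freq.update(terms)
        pair_freq.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_freq, pair_freq


def similarity_matrix(term_freq, pair_freq):
    """Build a symmetric similarity matrix from co-occurrence counts.

    Assumption: Ochiai normalization, c(a, b) / sqrt(f(a) * f(b)).
    """
    terms = sorted(term_freq)
    size = len(terms)
    sim = [[0.0] * size for _ in range(size)]
    for i, a in enumerate(terms):
        for j, b in enumerate(terms):
            if i == j:
                sim[i][j] = 1.0
            else:
                count = pair_freq[frozenset((a, b))]
                sim[i][j] = count / math.sqrt(term_freq[a] * term_freq[b])
    return terms, sim


def dissimilarity_matrix(sim):
    """Convert similarities in [0, 1] to dissimilarities for MDS or clustering."""
    return [[1.0 - value for value in row] for row in sim]


# Hypothetical sample queries; "tax" plays the role of a core query term.
queries = [
    "tax policy form",
    "tax form download",
    "tax policy",
    "housing subsidy",
    "tax download",
]
term_freq, pair_freq = cooccurrence_counts(queries, "tax")
terms, sim = similarity_matrix(term_freq, pair_freq)
dis = dissimilarity_matrix(sim)
```

The resulting dissimilarity matrix is the kind of input that a non-metric MDS routine (for a three-dimensional layout) and a Ward's-method hierarchical clustering routine in a statistics package would take. Which package the authors used is not stated in the abstract.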