Abstract: As data grows in size, search engines face new challenges in extracting more relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. This paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, which is one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The Weighted PageRank (WPR) model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings showed that using multiple criteria to rank final pages is better than using only one, and that some criteria had a greater impact on ranking results than others.
Funding: This research was funded by the National Natural Science Foundation of China (No. 51775185); the Scientific Research Fund of Hunan Province Education Department (18C0003); the Research project on teaching reform in colleges and universities of Hunan Province Education Department (20190147); the Innovation and Entrepreneurship Training Program for College Students in Hunan Province (2021-1980); and Hunan Normal University University-Industry Cooperation. This work is implemented at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, Open project, Grant Number 20181901CRP04.
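The weighted ranking idea in the abstract above can be illustrated with a small sketch. The abstract does not give the WPR formula or the preference criteria, so everything here is an assumption: the function name `weighted_pagerank`, the edge-weight input format, and the choice to split a page's rank among its out-links in proportion to a single combined weight per link. It is one common way to weight PageRank, not necessarily the authors' model.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power iteration for PageRank with weighted links.

    edges: dict mapping (source, target) -> positive weight. The weight is
    assumed to combine whatever ranking criteria are in use (hypothetical).
    """
    nodes = sorted({node for edge in edges for node in edge})
    # Total outgoing weight per page, used to normalise each link's share.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        # Each page passes rank along its links in proportion to link weight.
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    links = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "a"): 1.0, ("c", "a"): 1.0}
    for page, score in sorted(weighted_pagerank(links).items()):
        print(page, round(score, 4))
```

With all weights equal, this reduces to ordinary PageRank; raising the weight of one link shifts rank toward its target, which is how a criterion can influence the final ordering.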
Abstract: Many countries are paying increasing attention to the protection of water resources, and how to protect them has received extensive attention from society. Water quality monitoring is key to water resources protection, and efficiently collecting and analyzing water quality monitoring data is an important part of it. In this paper, Python programming tools and regular expressions were used to design a web crawler that acquires water quality monitoring data from Global Freshwater Quality Database (GEMStat) sites, and multi-thread parallelism was added to improve the efficiency of downloading and parsing. To analyze and process the crawled data, Pandas and Pyecharts were used to visualize the water quality data and show its intrinsic correlations and spatiotemporal relationships.
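The crawler design described in the abstract above (regular-expression parsing with multi-threaded downloading) might look roughly like the following sketch. The abstract gives no page layout or URLs, so the HTML row pattern, the function names `fetch`, `parse`, and `crawl`, and the worker count are all hypothetical. They do not describe the actual GEMStat site or the authors' code.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical row layout: <td>station</td><td>parameter</td><td>value</td>
ROW_PATTERN = re.compile(
    r"<td>([^<]+)</td>\s*<td>([^<]+)</td>\s*<td>([-+]?\d+(?:\.\d+)?)</td>"
)


def fetch(url, timeout=10):
    """Download one page and return its text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse(html):
    """Extract (station, parameter, value) tuples with a regular expression."""
    return [
        (station.strip(), parameter.strip(), float(value))
        for station, parameter, value in ROW_PATTERN.findall(html)
    ]


def crawl(urls, fetcher=fetch, workers=8):
    """Download and parse pages in parallel threads, preserving URL order."""

    def job(url):
        return parse(fetcher(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(job, urls))
    # Flatten the per-page lists into one list of records.
    return [record for page in pages for record in page]
```

Threads suit this workload because downloading is I/O-bound. The resulting list of tuples could then be loaded into a Pandas DataFrame for the analysis and visualization step the abstract mentions.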
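Abstract: Data collection and web crawling has a wide range of applications and strong practical value, and students generally show high interest in learning it. However, the traditional teaching model emphasizes knowledge transmission and struggles to meet society's demand for big data talent. The Outcome Based Education (OBE) philosophy emphasizes learning outcomes, which matches the competency-oriented demand for talent. The Plan-Do-Check-Act (PDCA) cycle is the scientific procedure followed in total quality management, and it constitutes the basic method and framework for continuous improvement. This paper integrates the OBE philosophy with the PDCA cycle. Taking students as the core, outcomes as the orientation, and problems as the starting point, it presents an innovative design of four processes in the Data Collection and Web Crawler course: teaching design, teaching implementation, teaching evaluation, and teaching reflection. The aim is to achieve continuous improvement in course quality and to cultivate students' programming ability, independent learning ability, and ability to analyze and solve problems.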