Abstract: As data grows in size, search engines face new challenges in extracting more relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. The paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, one of the most widely used page ranking algorithms. We propose Weighted PageRank (WPR) algorithms to test the relationship between these various measures. The Weighted PageRank (WPR) model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings of using the Weighted PageRank model showed that ranking final pages on multiple criteria is better than using only one, and that some criteria had a greater impact on ranking results than others.
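The abstract does not spell out the weighting scheme used in the trials. As a rough sketch of the weighted-PageRank idea the paper builds on, the following Python fragment assumes the common formulation in which each link is weighted by the in-link and out-link popularity of its target page; the toy graph, damping factor, and iteration count are illustrative only.

```python
# Minimal weighted-PageRank sketch. This assumes the widely used formulation
# that weights each link by in-link and out-link popularity; the paper's own
# multi-criteria weights are not given in the abstract.
from collections import defaultdict

def weighted_pagerank(graph, d=0.85, iterations=50):
    """graph: dict mapping page -> list of pages it links to."""
    pages = set(graph) | {v for links in graph.values() for v in links}
    in_links = defaultdict(set)                    # page -> pages linking to it
    for u, links in graph.items():
        for v in links:
            in_links[v].add(u)

    def w_in(v, u):
        # Share of in-links that u has among all pages v links to.
        total = sum(len(in_links[p]) for p in graph.get(v, [])) or 1
        return len(in_links[u]) / total

    def w_out(v, u):
        # Share of out-links that u has among all pages v links to.
        total = sum(len(graph.get(p, [])) for p in graph.get(v, [])) or 1
        return (len(graph.get(u, [])) or 1) / total

    wpr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        wpr = {u: (1 - d) + d * sum(wpr[v] * w_in(v, u) * w_out(v, u)
                                    for v in in_links[u])
               for u in pages}
    return wpr

if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(weighted_pagerank(toy_web).items(),
                              key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```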
Abstract: In light of the defects of existing web vulnerability detection systems, and drawing on the efficiency and resource-sharing characteristics of the cloud environment, a cloud-based design proposal is presented. The paper analyses the key technologies involved: URL acquisition, task allocation and scheduling, and the design of attack detection. Experiments show the feasibility and effectiveness of the design.
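The abstract names the key components but not their design. The sketch below is only an assumed illustration of queue-based task allocation, in which collected URLs are handed to worker threads that run a placeholder detection step; it is not the paper's implementation.

```python
# Illustrative task-allocation sketch: URLs are placed on a shared queue and
# pulled by worker threads that run a placeholder vulnerability check.
# This is an assumption about the design, not the paper's actual system.
import queue
import threading

def check_url(url):
    """Placeholder for the attack/vulnerability detection step."""
    print(f"scanning {url}")

def worker(task_queue):
    while True:
        url = task_queue.get()
        if url is None:              # sentinel: no more work
            task_queue.task_done()
            break
        check_url(url)
        task_queue.task_done()

def run_scan(urls, workers=4):
    task_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(task_queue,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for url in urls:
        task_queue.put(url)
    for _ in threads:
        task_queue.put(None)         # one sentinel per worker
    task_queue.join()
    for t in threads:
        t.join()

if __name__ == "__main__":
    run_scan(["http://example.com/a", "http://example.com/b"])
```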
Funding: This research was funded by the National Natural Science Foundation of China (No. 51775185), the Scientific Research Fund of Hunan Province Education Department (18C0003), the Research Project on Teaching Reform in Colleges and Universities of Hunan Province Education Department (20190147), the Innovation and Entrepreneurship Training Program for College Students in Hunan Province (2021-1980), and Hunan Normal University University-Industry Cooperation. This work is implemented at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province (open project, Grant Number 20181901CRP04).
Abstract: Many countries are paying increasing attention to the protection of water resources, and how to protect them has received extensive attention from society. Water quality monitoring is the key work in water resources protection, and efficiently collecting and analyzing water quality monitoring data is an important aspect of it. In this paper, Python programming tools and regular expressions were used to design a web crawler for acquiring water quality monitoring data from Global Freshwater Quality Database (GEMStat) sites, and multi-thread parallelism was added to improve the efficiency of downloading and parsing. To analyze and process the crawled water quality data, Pandas and Pyecharts were used to visualize the data and show its intrinsic correlations and spatiotemporal relationships.
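As an illustration of the crawling approach described above (regular expressions for parsing plus multi-thread parallelism), the following sketch uses a thread pool to download and parse station pages; the URL template and the regex are hypothetical placeholders, not the actual GEMStat page structure.

```python
# Illustrative multi-threaded crawling sketch: a thread pool downloads pages
# in parallel and a regular expression extracts numeric readings.
# The URL template and regex below are hypothetical placeholders.
import re
from concurrent.futures import ThreadPoolExecutor

import requests

STATION_URL = "https://example.org/gemstat/station/{station_id}"   # hypothetical
VALUE_PATTERN = re.compile(r'<td class="value">([\d.]+)</td>')      # hypothetical

def fetch_station(station_id):
    """Download one station page and extract numeric readings with a regex."""
    response = requests.get(STATION_URL.format(station_id=station_id), timeout=10)
    response.raise_for_status()
    return station_id, [float(v) for v in VALUE_PATTERN.findall(response.text)]

def crawl(station_ids, workers=8):
    """Download and parse many station pages in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_station, station_ids))

if __name__ == "__main__":
    # Replace the placeholder URL/regex above with the real page structure
    # before running; this call will fail against the hypothetical endpoint.
    print(crawl(["1001", "1002", "1003"]))
```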
Funding: Supported by the National Natural Science Foundation of China (No. 51268054 and No. 51468061) and the Natural Science Foundation of Tianjin, China (No. 13JCQNJC07300).
Abstract: Most real estate agents develop new objects by visiting unfamiliar clients, distributing leaflets, or browsing other real estate trading website platforms, whereas consumers often rely on websites to search and compare prices when purchasing real property. In addition to being time consuming, this search process renders it difficult for agents and consumers to understand the status changes of objects. In this study, Python is used to write web crawler and image recognition programs to capture object information from the web pages of real estate agents; perform data screening, arranging, and cleaning; compare the text of real estate object information; and integrate the convolutional neural network of a deep learning algorithm to implement image recognition. Data are acquired from two business-to-consumer real estate agency networks, the Sinyi real estate agent and the Yungching real estate agent, and one consumer-to-consumer real estate agency platform, the FiveNineOne real estate agent. The results indicate that text mining can reveal the similarities and differences between the objects, list the number of days an object has been available for sale on the website, and provide the price fluctuations and fluctuation times during the sales period. In addition, 213,325 object amplification images are used as a database for training with deep learning algorithms, and the maximum image recognition accuracy achieved is 95%. The dynamic recommendation system for real estate objects constructed by combining text mining and image recognition enables developers in the real estate industry to understand the differences between their commodities and those of other businesses in approximately 2 min, as well as rapidly determine developable objects via the comparison results provided by the system. Meanwhile, consumers require less time for searching and comparing prices once they have understood the commodity's dynamic information, thereby allowing them to use the most efficient approach to purchase real estate objects of interest.
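The abstract reports a maximum image recognition accuracy of 95% but does not describe the network architecture. The following minimal Keras sketch shows the kind of convolutional classifier such a pipeline might use; the layer sizes, input shape, and class count are assumptions, not the paper's model.

```python
# Minimal CNN classifier sketch. Architecture details (layer sizes, input
# shape, number of classes) are illustrative assumptions only.
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(num_classes=2, input_shape=(128, 128, 3)):
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    build_cnn().summary()
```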
Abstract: The volume of publicly available geospatial data on the web is rapidly increasing due to advances in server-based technologies and the ease with which data can now be created. However, challenges remain with connecting individuals searching for geospatial data with the servers and websites where such data exist. The objective of this paper is to present a publicly available Geospatial Search Engine (GSE) that utilizes a web crawler built on top of the Google search engine to search the web for geospatial data. The crawler seeding mechanism combines search terms entered by users with predefined keywords that identify geospatial data services. A procedure runs daily to update map server layers and metadata, and to eliminate servers that go offline. The GSE supports Web Map Services, ArcGIS services, and websites that have geospatial data for download. We applied the GSE to search for all available geospatial services under these formats and provide search results, including the spatial distribution of all obtained services. While enhancements to our GSE and to web crawler technology in general lie ahead, our work represents an important step toward realizing the potential of a publicly accessible tool for discovering the global availability of geospatial data.
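As an illustration of the seeding mechanism described above, the sketch below combines user search terms with predefined keywords that flag geospatial data services; the keyword list and query format are assumptions, not the GSE's actual configuration.

```python
# Sketch of the crawler-seeding idea: user terms are combined with predefined
# keywords that identify geospatial data services. The keyword strings below
# are illustrative assumptions, not the GSE's actual keyword set.
GEOSPATIAL_KEYWORDS = [
    '"wms" "GetCapabilities"',      # Web Map Services
    '"arcgis/rest/services"',       # ArcGIS REST services
    'shapefile download',           # downloadable geospatial data
]

def seed_queries(user_terms):
    """Combine each user term with each service keyword to form crawl seeds."""
    return [f"{term} {keyword}" for term in user_terms
            for keyword in GEOSPATIAL_KEYWORDS]

if __name__ == "__main__":
    for query in seed_queries(["land cover", "river basins"]):
        print(query)
```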
Funding: The National Key Research and Development Program of China (2016YFC0503701, 2016YFB0501502), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19040301, XDA20010202, XDA23100201), and the Key Project of the High Resolution Earth Observation System in China (00-Y30B14-9001-14/16).
Abstract: Desertification research plays a key role in the survival and development of all mankind. The Normalized Comprehensive Hotspots Index (NCH) is a comprehensive index that reveals the spatial distribution of research hotspots in a given research field based on the number of relevant scientific papers. This study uses web crawler technology to retrieve the full text of all Chinese journal articles spanning the 1980s to 2018 in the Chinese Academic Journal full-text database (CAJ) from CNKI. Based on the 253,055 articles on desertification that were retrieved, we constructed a research hotspot extraction model for desertification in China by means of the NCH Index. This model can reveal the spatial distribution and dynamic changes of research hotspots for desertification in China. The analysis shows the following: 1) The spatial distribution of research hotspots on desertification in China can be effectively described by the NCH Index, although its application in other fields still needs to be verified and optimized. 2) According to the NCH Index, the research hotspots for desertification are mainly distributed in the Agro-Pastoral Ecotone and grassland in Inner Mongolia, the desertification areas of the Qaidam Basin in the Western Alpine Zone, and the Oasis-Desert Ecotone in Xinjiang (including the extension of the central Tarim Basin to the foothills of the Kunlun Mountains, the sporadic areas around the Tianshan Mountains, and the former hilly belt of the southern foothills of the Altai Mountains). Among these three, the Agro-Pastoral Ecotone in the middle and eastern part of Inner Mongolia includes the most prominent hotspots in the study of desertification. 3) Since the 1980s, the research hotspots for desertification in China have shown a general downward trend, with a significant decline in 219 counties (10.37% of the study area). This trend is dominated by the projects carried out since 2002. The governance of desertification in the eastern part of the Inner Mongolia-Greater Khingan Range still needs to be strengthened. The distribution of desertification climate types reflects the distribution of desertification in a given region to some extent. The Normalized Comprehensive Hotspots Index provides a new approach for researchers in different fields to analyze research progress.
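The abstract does not give the NCH Index formula itself. Purely as an illustration of turning per-county article counts into a normalized hotspot score, the sketch below applies a min-max normalization; this is an assumed stand-in, not the paper's index.

```python
# Illustrative only: min-max normalization of per-county article counts into a
# [0, 1] hotspot score. The actual NCH Index formula is not given in the
# abstract, so this is an assumed stand-in, not the paper's method.
def normalized_hotspot_scores(county_counts):
    """county_counts: dict mapping county name -> number of relevant articles."""
    lo, hi = min(county_counts.values()), max(county_counts.values())
    span = (hi - lo) or 1
    return {county: (count - lo) / span
            for county, count in county_counts.items()}

if __name__ == "__main__":
    counts = {"County A": 1200, "County B": 300, "County C": 75}
    for county, score in normalized_hotspot_scores(counts).items():
        print(county, round(score, 3))
```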