As the tsunami of data has emerged, search engines have become the most powerful tool for obtaining scattered information on the internet. Traditional search engines return organized results by using ranking algorithms such as term frequency and link analysis (the PageRank and HITS algorithms). However, these algorithms must incorporate keyword frequency to determine the relevance between the user's query and the data in the computer system or on the internet. Moreover, we expect search engines to understand users' searches by their meaning rather than as literal strings. The Semantic Web is an intelligent network that can understand human language more semantically and make communication between humans and computers easier. However, current semantic search technology is hard to apply, because metadata must be annotated on each web page before the search engine can understand the user's intent, and annotating every web page is very time-consuming and inefficient. Therefore, this study designed an ontology-based approach to improve traditional keyword-based search and emulate the effects of semantic search, so that the search engine can understand users more semantically once it acquires the knowledge.
As data grows in size, search engines face new challenges in extracting more relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. This paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, which is one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The WPR model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings showed that using multiple criteria to rank final pages is better than using only one, and that some criteria had a greater impact on ranking results than others.
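To make the idea of a weighted PageRank concrete, the following is a minimal sketch only. It assumes that each link carries a single non-negative weight (for instance, a combined score over several user-preference criteria) and that a page's rank is split among its out-links in proportion to those weights. The function name `weighted_pagerank`, the graph layout, and the damping value are illustrative assumptions; the abstract does not specify the paper's actual WPR formulation.

```python
def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank where rank flows along links in
    proportion to link weights.

    graph: dict mapping a page to a dict {target_page: weight}.
    The weights are hypothetical criterion scores, not the paper's.
    """
    # Collect every page that appears as a source or as a target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            out_edges = graph.get(source, {})
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[source] / n
                for node in nodes:
                    new_rank[node] += share
            else:
                for target, weight in out_edges.items():
                    new_rank[target] += (
                        damping * rank[source] * weight / total_weight
                    )
        rank = new_rank
    return rank
```

With all weights equal, this reduces to ordinary PageRank; unequal weights let some criteria pull more rank toward the pages they favor.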
When conducting a literature review, medical authors typically search for relevant keywords in bibliographic databases or on search engines like Google. After selecting the most pertinent article based on the title's relevance and the abstract's content, they download or purchase the article and cite it in their manuscript. Three major elements influence whether an article will be cited in future manuscripts: the keywords, the title, and the abstract. These elements are therefore the “key dissemination tools” for research papers. If authors do not choose these three elements judiciously, the manuscript's retrievability, readability, and citation index may suffer, which can negatively impact both the author and the journal. In this article, we share our informed perspective on writing strategies to enhance the searchability and citation of medical articles. These strategies are adopted from the principles of search engine optimization, but they do not aim to cheat or manipulate the search engine. Instead, they adopt a reader-centric content writing methodology that targets well-researched keywords to the readers who are searching for them. Reputable journals, such as Nature and the British Medical Journal, emphasize “online searchability” in their author guidelines. We hope that this article will encourage medical authors to approach manuscript drafting from the perspective of “looking inside-out.” In other words, they should not only draft manuscripts around what they want to convey to fellow researchers but also integrate what the readers want to discover. It is a call to action to better understand and engage search engine algorithms, so that they yield information in a desired and self-learning manner, because the “Cloud” is the new stakeholder.
Surveillance is essential to the prevention and control of infectious diseases. When the pandemic occurred, the inadequacy of traditional surveillance was exposed, but it also provided a valuable opportunity to explore new surveillance methods. This study aimed to estimate the transmission dynamics and epidemic curve of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron BF.7 in Beijing under the emergent situation using the Baidu index and influenza-like illness (ILI) surveillance. A novel hybrid model, multiattention bidirectional gated recurrent unit (MABG)-susceptible-exposed-infected-removed (SEIR), was developed, which leveraged a deep learning algorithm (MABG) to analyze past records of ILI occurrences and the Baidu index of diverse symptoms such as fever, pyrexia, cough, sore throat, anti-fever medicine, and runny nose. By considering the current Baidu index and the correlation between ILI cases and coronavirus disease 2019 (COVID-19) cases, a transmission dynamics model (SEIR) was formulated to estimate the transmission dynamics and epidemic curve of SARS-CoV-2. During the COVID-19 pandemic, when conventional surveillance measures were temporarily suspended, ILI cases can serve as a useful indicator for estimating the epidemiological trends of COVID-19. In the specific case of Beijing, it has been ascertained that the cumulative infection attack rate surpassed 80.25% (95% confidence interval (95% CI): 77.51%-82.99%) since December 17, 2022, with the apex of the outbreak projected to occur on December 12. The number of existing patients is expected to peak three days after this apex. The effective reproduction number (Rt), which represents the average number of secondary infections generated by a single infected individual at a specific point in time during an epidemic, has remained below 1 since December 17, 2022. Traditional disease surveillance systems should be complemented with information from modern surveillance data, such as online data sources with advanced technical support. Modern surveillance channels should be used primarily in emerging infectious disease outbreaks. Syndromic surveillance of COVID-19 should be established to follow the epidemic, clinical severity, and medical resource demand.
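For readers unfamiliar with the SEIR component, the sketch below integrates the standard SEIR equations with a simple Euler scheme and reports Rt as (beta / gamma) × S / N, the usual expression for a basic SEIR model. It is only an illustration under stated assumptions: the function name `simulate_seir` and all parameter values in the usage note are hypothetical, and the sketch does not include the MABG deep learning component or the Baidu index and ILI inputs described in the abstract.

```python
def simulate_seir(population, initial_infected, beta, sigma, gamma,
                  days, dt=0.1):
    """Euler integration of a basic SEIR model.

    beta:  transmission rate per day (hypothetical value supplied by caller)
    sigma: rate of leaving the exposed state (1 / latent period)
    gamma: removal rate (1 / infectious period)
    Returns one record per day with S, E, I, R and Rt.
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    steps_per_day = max(1, round(1.0 / dt))
    step = 1.0 / steps_per_day

    history = []
    for day in range(days + 1):
        # Effective reproduction number for a basic SEIR model.
        rt = (beta / gamma) * s / population
        history.append({"day": day, "S": s, "E": e, "I": i, "R": r, "Rt": rt})
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * step
            new_infectious = sigma * e * step
            new_removed = gamma * i * step
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
    return history
```

For example, `simulate_seir(1_000_000, 10, beta=0.6, sigma=1/3, gamma=0.2, days=120)` produces an epidemic curve in which Rt starts near 3 and falls below 1 once enough of the population has been infected; these numbers are illustrative and are not estimates from the study.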
Web usage mining, content mining, and structure mining comprise the web mining process. Web-Page Recommendation (WPR) development by incorporating Data Mining Techniques (DMT) did not include end-users with improved performance in the obtained filtering results. The cluster user profile-based clustering process is delayed when it has a low precision rate. Markov Chain Monte Carlo-Dynamic Clustering (MC2-DC), based on the User Behavior Profile (UBP) model, groups similar user behavior on a dynamic update of the UBP. The Reversible-Jump Concept (RJC) reviews the history with the updated UBP and moves it to appropriate clusters. Hamilton's Filtering Framework (HFF) is designed to filter user data based on personalised information on the automatically updated UBP through the Search Engine (SE). The Hamilton Filtered Regime Switching User Query Probability (HFRSUQP) carries forward the updated UBP for easy and accurate filtering of users' interests and improves WPR. A Probabilistic User Result Feature Ranking based on Gaussian Distribution (PURFR-GD) has been developed to rank user results in a web mining process. PURFR-GD decreases the delay time in the end-to-end workflow for SE personalization in various methods by using the Gaussian Distribution Function (GDF). According to the theoretical analysis and experimental results, the proposed MC2-DC method automatically increases the updated UBP accuracy by 18.78%. HFRSUQP enabled extensive Maximize Log-Likelihood (ML-L) increases to 15.28% of the User Personalized Information Search Retrieval Rate (UPISRT). For feature ranking, the PURFR-GD model achieves higher Classification Accuracy (CA) and Precision Ratio (PR) while utilising minimum Execution Time (ET). Furthermore, UPISRT's ranking performance has improved by 20%.
This study compares websites using search engine optimization (SEO) while taking live data into account. Search engine optimization is a series of steps that can help a website rank highly in search engine results. Websites are of two types: static and dynamic. Static websites require programming expertise compatible with SEO, whereas dynamic websites can use readily available plugins or modules. The fundamental issue for all website holders is the low page rank, congestion, utilization, and exposure of the website on the search engine. Here, the authors studied the live data of four websites, as real-time data indicate how an SEO strategy may be applied to website page rank, page difficulty removal, brand query, etc. It is also necessary to choose relevant keywords on any website. The right keyword can help increase the brand query while also lowering the page difficulty both on and off the page. To calculate Off-page SEO, On-page SEO, and SEO Difficulty, the authors examined live data from four well-known Indian university and institute websites: www.caluniv.ac.in, www.jnu.ac.in, www.iima.ac.in, and www.iitb.ac.in. The Off-page SEO of www.caluniv.ac.in was shown to be lower than that of www.jnu.ac.in, www.iima.ac.in, and www.iitb.ac.in by 9%, 7%, and 7%, respectively, while its On-page SEO is 4%, 1%, and 1% higher. Every university has maintained its own brand query. Additionally, www.caluniv.ac.in has slightly lower SEO Difficulty than the other websites. The final computed results are displayed and compared.
To integrate reasoning and text retrieval, the architecture of a semantic search engine that supports several kinds of queries is proposed, and the semantic search engine Smartch is designed and implemented. Based on a logical reasoning process and a graphic user-defined process, Smartch provides four kinds of search services: basic search, concept search, graphic user-defined query, and association relationship search. The experimental results show that, compared with a traditional search engine, Smartch improves recall and precision. Graphic user-defined queries can accurately locate the information users need. Association relationship search can find complicated relationships between concepts. Smartch can perform some intelligent functions based on ontology inference.
A rough set based corner classification neural network, the Rough-CC4, is presented to solve document classification problems such as representing documents of different sizes, document feature selection, and document feature encoding. In the Rough-CC4, documents are described by the equivalence classes of approximate words. This method reduces the dimensions representing the documents, which solves the precision problems caused by differing document sizes and also blurs the differences caused by approximate words. The Rough-CC4 also introduces a binary encoding method that encodes the importance of documents relative to each equivalence class. This encoding greatly improves the precision of the Rough-CC4 and reduces its space complexity. The Rough-CC4 can be used for the automatic classification of documents.
A new approach to automated ontology mapping using web search engines (such as Google) is presented. Based on lexico-syntactic patterns, the hyponymy relationships between ontology concepts are obtained from the web through search engines, and an initial candidate mapping set consisting of ontology concept pairs is generated. According to the concept hierarchies of the ontologies, a set of production rules is proposed to delete concept pairs that are inconsistent with the ontology semantics from the initial candidate mapping set and to add concept pairs that are consistent with them. Finally, ontology mappings are chosen from the candidate mapping set automatically with a mapping selection rule based on mutual information. Experimental results show that the F-measure reaches 75% to 100% and that the approach can effectively accomplish the mapping between ontologies.
Internet-based technologies, such as mobile payments, social networks, search engines, and cloud computation, will lead to a paradigm shift in the financial sector. Besides indirect financing via commercial banks and direct financing through security markets, a third way to conduct financial activities will emerge, which we call "internet finance". This paper presents a detailed analysis of payment, information processing, and resource allocation under internet finance.
Meta search engines serve users by dispatching their requests to existing search engines. The existing search engines selected by a meta search engine determine the search quality. Because the performance of the existing search engines and the users' requests change dynamically, a fixed set of search engines cannot optimize the overall performance of the meta search engine. This paper applies the genetic algorithm (GA), which simulates the evolutionary process of living things vividly and efficiently, to realize the scheduling strategy of the agent manager in our meta search engine, GSE (general search engine). By using the GA, the combination of search engines can be optimized, and hence the overall performance of GSE can be improved dramatically.
Associating the agricultural market names found on web sites with their locations is essential for the geographical analysis of agricultural products. In this paper, an algorithm that employs an administrative ontology and statistics from search results is proposed. Experiments were conducted with 100 market names collected from web sites. The experimental results demonstrate that the proposed algorithm performs satisfactorily in resolving this problem, which verifies the effectiveness of the method.
At present, how to enable a search engine to construct an initial model of a user's personal interests, keep the user's personalized information up to date, and provide accurate personalized services has become a hotspot in search engine research. Addressing the problems of user model construction, and combining manual customization modeling with automatic analytical modeling, this paper proposes a User Interest Model (UIM). On this basis, the corresponding algorithms for establishing and updating the User Interest Profile (UIP) are presented. Simulation tests showed that the proposed UIM and the corresponding algorithms can effectively enhance retrieval precision and have superior adaptability.
Focused crawling is a new research approach for search engines. It restricts information retrieval to a specific topic area and provides search service within that area. The focused crawling search algorithm is a key technique of a focused crawler and directly affects search quality. This paper first introduces several traditional topic-specific crawling algorithms and then puts forward an inverse link based topic-specific crawling algorithm. A comparison experiment shows that this algorithm performs well in recall, clearly better than the traditional Breadth-First and Shark-Search algorithms. The experiment also shows that the algorithm has good precision.
This paper starts with a description of the present status of the Digital Library of India Initiative. As part of this initiative, a large corpus of scanned text is available in many Indian languages, and it has stimulated a vast amount of research in Indian language technology, which is briefly described in this paper. Besides the Digital Library of India Initiative, which is part of the Million Books to the Web Project initiated by Prof Raj Reddy of Carnegie Mellon University, there are a few more initiatives in India towards taking the heritage of the country to the Web. This paper presents the future directions for the Digital Library of India Initiative, both in terms of growing the collection and in terms of the technical challenges that managing such a large collection poses.
Because the web is huge and web pages are updated frequently, a search engine has to refresh the web pages in its index periodically. This is extremely resource-consuming because the search engine needs to crawl the web and download web pages to refresh its index. Building on present web refreshing technologies, we present a cooperative schema between web server and search engine for maintaining the freshness of the web repository. The web server provides metadata, defined through the XML standard, to describe web sites. Before updating a web page, the crawler visits the metadata files. If the metadata indicates that the page has not been modified, the crawler does not update it. This schema can therefore save bandwidth. A primitive model based on the schema is implemented, and the cost and efficiency of the schema are analyzed.
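The crawler-side check described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the XML layout (a `site` element containing `page` elements with `url` and `lastModified` attributes), the function name `pages_to_refresh`, and the use of ISO 8601 timestamps are all assumptions made for the example, since the abstract does not give the actual metadata format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def pages_to_refresh(metadata_xml, last_crawled):
    """Return the URLs that the crawler should download again.

    metadata_xml: XML text published by the web server
                  (hypothetical format, see the lead-in).
    last_crawled: dict mapping URL -> datetime of the crawler's last fetch.
    """
    root = ET.fromstring(metadata_xml)
    stale = []
    for page in root.findall("page"):
        url = page.get("url")
        modified = datetime.fromisoformat(page.get("lastModified"))
        crawled = last_crawled.get(url)
        # Fetch pages never seen before, or changed since the last crawl.
        if crawled is None or modified > crawled:
            stale.append(url)
    return stale
```

Only the small metadata file is downloaded on every visit; full pages are fetched only when the metadata reports a change, which is where the bandwidth saving comes from.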
Search engines have greatly helped us find desired information on the Internet. Most search engines use keyword matching. This paper discusses a Dynamic Knowledge Base based Search Engine (DKBSE), which can expand the user's query using the concepts or meanings of its keywords. To do this, the DKBSE constructs and maintains its knowledge base dynamically from the system's search results and the user's feedback. The DKBSE expands the user's initial query using the knowledge base and returns the information retrieved with the expanded query.
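A toy version of this expand-and-learn loop is sketched below. It assumes the knowledge base is simply a table of weighted associations between keywords and related terms that grows with user feedback; the class name `DynamicKnowledgeBase` and its methods are hypothetical, and the abstract does not describe how the DKBSE actually stores or updates its knowledge.

```python
from collections import defaultdict


class DynamicKnowledgeBase:
    """Keyword -> related-term associations learned from feedback."""

    def __init__(self):
        # keyword -> {related term: accumulated weight}
        self._related = defaultdict(lambda: defaultdict(float))

    def record_feedback(self, keyword, related_term, weight=1.0):
        """Strengthen an association, e.g. after the user clicks a result."""
        self._related[keyword.lower()][related_term.lower()] += weight

    def expand(self, query, max_terms_per_keyword=2):
        """Return the query keywords plus their strongest related terms."""
        expanded = []
        for keyword in query.lower().split():
            if keyword not in expanded:
                expanded.append(keyword)
            candidates = self._related.get(keyword, {})
            # Strongest associations first; ties broken alphabetically.
            best = sorted(candidates, key=lambda t: (-candidates[t], t))
            for term in best[:max_terms_per_keyword]:
                if term not in expanded:
                    expanded.append(term)
        return expanded
```

The expanded term list would then be submitted to the underlying keyword search, and further feedback on those results would update the associations again.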
Chemical structure searching based on databases and machine learning has recently attracted great attention for fast screening of materials with target functionalities. To this end, we established a high-performance chemical structure database based on MySQL engines, named MYDB. More than 160000 metal-organic frameworks (MOFs) have been collected and stored using new retrieval algorithms for efficient searching and recommendation. The evaluation results show that MYDB can perform fast and efficient keyword searching against millions of records and provide real-time recommendations for similar structures. Combining machine learning methods with the materials database, we developed an adsorption model to determine the adsorption capacity of metal-organic frameworks toward argon and hydrogen under certain conditions. We expect that MYDB, together with the developed machine learning techniques, can support large-scale, low-cost, and highly convenient structural research towards accelerating the discovery of materials with target functionalities in the field of computational materials research.
Web search engines are very useful information service tools on the Internet. Current web search engines produce search results related to the search terms and the actual information they have collected. Since the selection of search results cannot affect future results, they may not cover most people's interests. In this paper, feedback information produced by the user's access lists is represented by a rough set and is used to reconstruct the query string and influence the search results. The search engine can thus provide self-adaptability. Key words: WWW; search engine; query reconstruction; feedback. CLC number: TP 311.135.4. Foundation item: Supported by the National Natural Science Foundation of China (60373066), the National Grand Fundamental Research 973 Program of China (2002CB31200), the Opening Foundation of the State Key Laboratory of Software Engineering in Wuhan University, and the Opening Foundation of the Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University. Biography: ZHANG Wei-feng (1975-), male, Ph.D.; research direction: artificial intelligence, search engine, data mining, network language.
The Borda sorting algorithm is an improvement on the weighted position sorting algorithm. It is mainly suitable for search results with high duplication; for independent search results, its effect is not very good. In addition, the relative score in the Borda sorting algorithm is computed according to a linear regressive rule, but positional relationships cannot fully represent changes in correlation. To address this drawback, this paper proposes a new sorting algorithm named the PMS-Sorting algorithm. First, the position scores of the returned results are standardized; then the similarity between the query word string and the query results is incorporated into the algorithm, and the similarity calculation method is also improved. Experiments show that the improved algorithm is superior to the traditional sorting algorithm.
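As background for the baseline being improved, the classic Borda count for merging result lists from several search engines can be sketched as below. This shows only the standard position-based Borda scoring, in which a result at 0-based position p in a list of length n earns n - p points; it is not the PMS-Sorting algorithm, whose score standardization and similarity terms are not specified in the abstract. The function name `borda_merge` is an assumption for the example.

```python
def borda_merge(result_lists):
    """Merge ranked result lists with the classic Borda count.

    result_lists: list of ranked lists of document identifiers,
                  best result first in each list.
    Returns all documents ordered by total Borda score (highest first),
    with ties broken by identifier so the output is deterministic.
    """
    scores = {}
    for results in result_lists:
        n = len(results)
        for position, doc in enumerate(results):
            # Top position earns n points, the last position earns 1.
            scores[doc] = scores.get(doc, 0) + (n - position)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

Because a document gains points from every list it appears in, results duplicated across engines rise to the top, which matches the abstract's remark that Borda works best when the result lists overlap heavily.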
Funding (SARS-CoV-2 surveillance study above): Supported by grants from the Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (2021I2M-1-044).
Funding: This study was supported by the Taif University Researchers Supporting Project (number TURSP-2020/115), Taif University, Taif, Saudi Arabia.
Abstract: Web usage mining, content mining, and structure mining comprise the web mining process. Web-Page Recommendation (WPR) development that incorporates Data Mining Techniques (DMT) has not given end-users improved performance in the obtained filtering results. The cluster user profile-based clustering process is delayed when it has a low precision rate. Markov Chain Monte Carlo-Dynamic Clustering (MC2-DC) is based on the User Behavior Profile (UBP) model and groups similar user behavior on a dynamic update of the UBP. The Reversible-Jump Concept (RJC) reviews the history with the updated UBP and moves users to appropriate clusters. Hamilton's Filtering Framework (HFF) is designed to filter user data based on personalised information on the automatically updated UBP through the Search Engine (SE). The Hamilton Filtered Regime Switching User Query Probability (HFRSUQP) carries the updated UBP forward for easy and accurate filtering of users' interests and improves WPR. A Probabilistic User Result Feature Ranking based on Gaussian Distribution (PURFR-GD) has been developed to rank user results in a web mining process. PURFR-GD decreases the delay time in the end-to-end workflow for SE personalization in various methods by using the Gaussian Distribution Function (GDF). The theoretical analysis and experimental results show that the proposed MC2-DC method increases the accuracy of the automatically updated UBP by 18.78%. HFRSUQP enabled extensive Maximize Log-Likelihood (ML-L) increases to 15.28% of User Personalized Information Search Retrieval Rate (UPISRT). For feature ranking, the PURFR-GD model achieves higher Classification Accuracy (CA) and Precision Ratio (PR) while using minimum Execution Time (ET). Furthermore, UPISRT's ranking performance has improved by 20%.
Abstract: This study compares websites that take live data into account using search engine optimization (SEO). Search engine optimization is a series of steps that can help a website rank highly in search engine results. Websites are of two types: static and dynamic. Static websites require programming expertise compatible with SEO, whereas dynamic websites can use readily available plugins or modules. The fundamental issue for all website holders is the low page rank, congestion, utilization, and exposure of the website on the search engine. Here, the authors have studied the live data of four websites, as real-time data indicate how an SEO strategy may be applied to website page rank, page difficulty removal, brand query, and so on. It is also necessary to choose relevant keywords on any website. The right keyword can help increase the brand query while also lowering the page difficulty both on and off the page. To calculate Off-page SEO, On-page SEO, and SEO Difficulty, the authors examined live data from four well-known Indian university and institute websites: www.caluniv.ac.in, www.jnu.ac.in, www.iima.ac.in, and www.iitb.ac.in. It has been shown that the Off-page SEO of www.caluniv.ac.in is lower than that of www.jnu.ac.in, www.iima.ac.in, and www.iitb.ac.in by 9%, 7%, and 7%, respectively. Its On-page SEO is, in comparison, 4%, 1%, and 1% higher. Every university has continued to keep up its own brand query. Additionally, www.caluniv.ac.in has slightly less SEO Difficulty than the other websites. The final computed results have been displayed and compared.
Funding: The National Natural Science Foundation of China (No. 60403027).
Abstract: To integrate reasoning and text retrieval, the architecture of a semantic search engine which includes several kinds of queries is proposed, and the semantic search engine Smartch is designed and implemented. Based on a logical reasoning process and a graphic user-defined process, Smartch provides four kinds of search services: basic search, concept search, graphic user-defined query, and association relationship search. The experimental results show that, compared with a traditional search engine, Smartch improves recall and precision. Graphic user-defined queries can accurately locate the information users need. Association relationship search can find complicated relationships between concepts. Smartch can perform some intelligent functions based on ontology inference.
Funding: The National Natural Science Foundation of China (No. 60503020, 60373066, 60403016, 60425206), the Natural Science Foundation of Jiangsu Higher Education Institutions (No. 04KJB520096), the Doctoral Foundation of Nanjing University of Posts and Telecommunication (No. 0302).
Abstract: A rough set based corner classification neural network, the Rough-CC4, is presented to solve document classification problems such as document representation of different document sizes, document feature selection, and document feature encoding. In the Rough-CC4, the documents are described by the equivalence classes of the approximate words. This reduces the number of dimensions representing the documents, which solves the precision problems caused by the different document sizes and also blurs the differences caused by the approximate words. The Rough-CC4 also introduces a binary encoding method, through which the importance of documents relative to each equivalence class is encoded. This encoding method greatly improves the precision of the Rough-CC4 and reduces its space complexity. The Rough-CC4 can be used in automatic classification of documents.
Funding: The National Natural Science Foundation of China (No. 60425206, 90412003), the Foundation of Excellent Doctoral Dissertation of Southeast University (No. YBJJ0502).
Abstract: A new approach for automated ontology mapping using web search engines (such as Google) is presented. Based on lexico-syntactic patterns, the hyponymy relationships between ontology concepts can be obtained from the web by search engines, and an initial candidate mapping set consisting of ontology concept pairs is generated. According to the concept hierarchies of the ontologies, a set of production rules is proposed to delete the concept pairs inconsistent with the ontology semantics from the initial candidate mapping set and add the concept pairs consistent with the ontology semantics to it. Finally, ontology mappings are chosen automatically from the candidate mapping set with a mapping selection rule based on mutual information. Experimental results show that the F-measure can reach 75% to 100% and that the approach can effectively accomplish the mapping between ontologies.
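A selection rule based on mutual information, as mentioned in the abstract above, might look roughly like the sketch below. The abstract does not give the authors' formula, so this uses pointwise mutual information computed from search-engine hit counts as a stand-in. The function names, the input layout, and the threshold are assumptions.

```python
import math


def pmi(hits_a, hits_b, hits_ab, total_pages):
    """Pointwise mutual information (base 2) estimated from page hit counts."""
    if min(hits_a, hits_b, hits_ab) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2((hits_ab * total_pages) / (hits_a * hits_b))


def select_mappings(candidates, total_pages, threshold=0.0):
    """Keep, for each source concept, the best-scoring target above the threshold.

    candidates maps (source_concept, target_concept) to a tuple
    (hits_source, hits_target, hits_both); this layout is hypothetical.
    """
    selected = {}
    for (source, target), (h_s, h_t, h_both) in candidates.items():
        score = pmi(h_s, h_t, h_both, total_pages)
        if score > threshold and (source not in selected or score > selected[source][1]):
            selected[source] = (target, score)
    return selected
```

A pair whose concepts co-occur more often than independence would predict gets a positive score, so a positive threshold discards pairs that appear together only by chance.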
Abstract: Internet-based technologies, such as mobile payments, social networks, search engines, and cloud computation, will lead to a paradigm shift in the financial sector. Besides indirect financing via commercial banks and direct financing through security markets, a third way to conduct financial activities will emerge, which we call "internet finance". This paper presents a detailed analysis of payment, information processing, and resource allocation under internet finance.
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (60073012).
Abstract: Meta search engines serve users by dispatching their requests to existing search engines. The existing search engines selected by the meta search engine determine the search quality. Because the performance of the existing search engines and the users' requests change dynamically, a fixed set of search engines cannot optimize the overall performance of the meta search engine. This paper applies the genetic algorithm (GA), which simulates the evolution process of living things, to realize the scheduling strategy of the agent manager in our meta search engine, GSE (general search engine). By using GA, the combination of search engines can be optimized, and hence the overall performance of GSE can be improved dramatically.
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences.
Abstract: Associating the agricultural market names on web sites with their locations is essential for geographical analysis of agricultural products. In this paper, an algorithm which employs the administrative ontology and the statistics from search results is proposed. Experiments were conducted with 100 market names collected from web sites. The experimental results demonstrate that the proposed algorithm performs satisfactorily on this problem, which verifies the effectiveness of the method.
Funding: Supported by the National Natural Science Foundation of China (50674086), the Doctoral Foundation of Ministry of Education of China (20060290508), the Youth Scientific Research Foundation of CUMT (0D060125).
Abstract: At present, how to enable a search engine to construct an initial model of a user's personal interests, track the user's personalized information in a timely manner, and provide personalized services accurately has become a research hotspot in the search engine area. To address the problems of user model construction, a User Interest Model (UIM) that combines manual customization modeling with automatic analytical modeling is proposed in this paper. On this basis, the corresponding establishment and update algorithms for the User Interest Profile (UIP) are presented. Simulation tests showed that the proposed UIM and the corresponding algorithms can effectively enhance retrieval precision and have superior adaptability.
Abstract: Focused crawling is a new research approach for search engines. It restricts information retrieval to, and provides search service in, a specific topic area. The focused crawling search algorithm is a key technique of a focused crawler and directly affects the search quality. This paper first introduces several traditional topic-specific crawling algorithms, and then puts forward an inverse link based topic-specific crawling algorithm. A comparison experiment shows that this algorithm performs well in recall, clearly better than the traditional Breadth-First and Shark-Search algorithms. The experiment also shows that this algorithm has good precision.
Abstract: This paper starts with a description of the present status of the Digital Library of India Initiative. As part of this initiative, a large corpus of scanned text is available in many Indian languages and has stimulated a vast amount of research in Indian language technology, which is briefly described in this paper. Other than the Digital Library of India Initiative, which is part of the Million Books to the Web Project initiated by Prof. Raj Reddy of Carnegie Mellon University, there are a few more initiatives in India towards taking the heritage of the country to the Web. This paper presents the future directions for the Digital Library of India Initiative, both in terms of the growing collection and the technical challenges that managing such a large collection poses.
Funding: Supported by the National Natural Science Foundation of China (60403027).
Abstract: Because the web is huge and web pages are updated frequently, a search engine has to refresh the web pages in its index periodically. This is extremely resource consuming because the search engine needs to crawl the web and download web pages to refresh its index. Based on present technologies of web refreshing, we present a cooperative schema between web server and search engine for maintaining the freshness of the web repository. The web server provides metadata, defined through an XML standard, to describe web sites. Before updating a web page, the crawler visits the metadata files. If the metadata indicates that the page has not been modified, the crawler does not update it, so this schema saves bandwidth. A primitive model based on the schema is implemented. The cost and efficiency of the schema are analyzed.
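The metadata check described in the abstract above can be sketched as follows. The abstract does not specify the XML format, so the `<site>`/`<page>` layout with `url` and `lastmod` attributes, and the function name `pages_to_refresh`, are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def pages_to_refresh(metadata_xml, indexed_at):
    """Return the URLs the crawler should download again.

    metadata_xml: XML text published by the web server (hypothetical format:
                  <site><page url="..." lastmod="ISO-8601 timestamp"/></site>)
    indexed_at:   dict mapping URL -> datetime of the copy held in the index
    """
    root = ET.fromstring(metadata_xml)
    stale = []
    for page in root.iter("page"):
        url = page.get("url")
        last_modified = datetime.fromisoformat(page.get("lastmod"))
        seen = indexed_at.get(url)
        # Fetch pages never indexed, or modified after the indexed copy.
        if seen is None or last_modified > seen:
            stale.append(url)
    return stale
```

Only the small metadata file is downloaded on every visit. Unmodified pages are skipped, which is where the bandwidth saving claimed in the abstract would come from.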
Abstract: Search engines have greatly helped us find the desired information on the Internet. Most search engines use keyword matching techniques. This paper discusses a Dynamic Knowledge Base based Search Engine (DKBSE), which can expand the user's query using the concept or meaning of the keywords. To do this, the DKBSE needs to construct and maintain the knowledge base dynamically from the system's search results and the user's feedback. The DKBSE expands the user's initial query using the knowledge base and returns the information found with the expanded query.
Funding: This work was supported by the National Natural Science Foundation of China (No. 21573204 and No. 21421063), Fundamental Research Funds for the Central Universities, National Program for Support of Top-notch Young Professional, CAS Interdisciplinary Innovation Team, and Super Computer Center of USTCSCC and SCCAS.
Abstract: Chemical structure searching based on databases and machine learning has attracted great attention recently for fast screening of materials with target functionalities. To this end, we established a high-performance chemical structure database based on MYSQL engines, named MYDB. More than 160000 metal-organic frameworks (MOFs) have been collected and stored, using new retrieval algorithms for efficient searching and recommendation. The evaluation results show that MYDB can realize fast and efficient keyword searching against millions of records and provide real-time recommendations for similar structures. Combining machine learning methods and the materials database, we developed an adsorption model to determine the adsorption capacity of metal-organic frameworks toward argon and hydrogen under certain conditions. We expect that MYDB, together with the developed machine learning techniques, could support large-scale, low-cost, and highly convenient structural research towards accelerating the discovery of materials with target functionalities in the field of computational materials research.
Abstract: Web search engines are very useful information service tools on the Internet. Current web search engines produce search results based on the search terms and the actual information they have collected. Since the selections made among the search results cannot affect future results, the results may not cover most people's interests. In this paper, feedback information produced from the user's access lists is represented by rough sets and is used to reconstruct the query string and influence the search results. The search engines can thus provide self-adaptability. Key words: WWW; search engine; query reconstruction; feedback. CLC number: TP 311.135.4. Foundation item: Supported by the National Natural Science Foundation of China (60373066), the National Grand Fundamental Research 973 Program of China (2002CB31200), the Opening Foundation of State Key Laboratory of Software Engineering in Wuhan University, and the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University. Biography: ZHANG Wei-feng (1975-), male, Ph.D.; research direction: artificial intelligence, search engine, data mining, network language.
Funding: This work was funded by the National Natural Science Foundation of China under Grant (No. 61772152 and No. 61502037), the Basic Research Project (Nos. JCKY2016206B001, JCKY2014206C002 and JCKY2017604C010), and the Technical Foundation Project (No. JSQB2017206C002).
Abstract: The Borda sorting algorithm is an improvement on the weighted position sorting algorithm. It is mainly suitable for search results with high duplication; for independent search results, its effect is not very good. In addition, the relative score in the Borda sorting algorithm is computed according to a linear rule, but position alone cannot fully represent the changes in relevance. To address this drawback, this paper proposes a new sorting algorithm, named the PMS-Sorting algorithm. First, the position scores of the returned results are standardized; then the similarity between the query string and the returned results is incorporated into the algorithm, and the similarity calculation method is also improved. Experiments show that the improved algorithm is superior to the traditional sorting algorithm.
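The idea of combining a standardized position score with query similarity can be sketched as below. This is not the PMS-Sorting formula, which the abstract does not give. The sketch uses a length-normalized Borda score, Jaccard word overlap as a stand-in similarity measure, and a hypothetical mixing weight `alpha`; all function names are assumptions.

```python
def borda_scores(rankings):
    """Sum length-normalized Borda points for each document across result lists."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, doc in enumerate(ranking):
            # First place earns 1.0, last place earns 1/n (a linear rule).
            scores[doc] = scores.get(doc, 0.0) + (n - position) / n
    return scores


def jaccard(query, text):
    """Word-overlap similarity between the query and a result's text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def fuse(rankings, titles, query, alpha=0.5):
    """Rank documents by alpha * position score + (1 - alpha) * similarity."""
    position = borda_scores(rankings)
    if not position:
        return []
    top = max(position.values())  # rescale position scores into [0, 1]
    combined = {
        doc: alpha * score / top + (1 - alpha) * jaccard(query, titles.get(doc, ""))
        for doc, score in position.items()
    }
    return sorted(combined, key=lambda d: (-combined[d], d))
```

With `alpha=1.0` the ranking reduces to plain Borda fusion. Lowering `alpha` lets textual similarity separate results whose positions alone tie, which is the weakness of position-only scoring that the abstract points out.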