Abstract: As data grows in size, search engines face new challenges in extracting more relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. This paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, which is one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The WPR model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings showed that using multiple criteria to rank final pages is better than using only one, and that some criteria had a greater impact on ranking results than others.
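The abstract does not give the WPR formulation, so the following is only a minimal sketch of one common way to weight PageRank: each page passes its rank to its out-links in proportion to a per-link weight. The function name `weighted_pagerank`, the `links` input format, and the default damping factor are assumptions made for illustration, not the authors' method.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank with weighted out-links.

    links maps each page to a dict {target_page: non-negative weight}.
    A weight could encode a user-preference criterion (an assumption here).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the teleportation share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            out = links.get(src, {})
            total = sum(out.values())
            if total == 0:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[src] / n
                for p in pages:
                    new_rank[p] += share
            else:
                # Split the rank in proportion to the link weights.
                for dst, weight in out.items():
                    new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    example = {"a": {"b": 3.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
    for page, score in sorted(weighted_pagerank(example).items()):
        print(page, round(score, 4))
```

With all weights equal, this reduces to ordinary PageRank; combining several criteria would amount to choosing how each link weight is computed, which the abstract leaves unspecified.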
Funding: The National Natural Science Foundation of China (No. 60425206, 90412003); the Foundation of Excellent Doctoral Dissertation of Southeast University (No. YBJJ0502).
Abstract: A new approach to automated ontology mapping using web search engines (such as Google) is presented. Based on lexico-syntactic patterns, the hyponymy relationships between ontology concepts are obtained from the web through search engines, and an initial candidate mapping set consisting of ontology concept pairs is generated. According to the concept hierarchies of the ontologies, a set of production rules is proposed to delete from the initial candidate mapping set the concept pairs that are inconsistent with the ontology semantics and to add the concept pairs that are consistent with them. Finally, ontology mappings are chosen from the candidate mapping set automatically with a mapping selection rule based on mutual information. Experimental results show that the F-measure reaches 75% to 100% and that the approach can effectively accomplish the mapping between ontologies.
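The abstract does not state the selection rule itself. As a hedged illustration of a mutual-information criterion, the sketch below scores each candidate concept pair by pointwise mutual information (PMI) computed from search-engine hit counts and keeps the best target for each source concept. The hit counts are passed in as plain dictionaries; the function names, the input format, and the use of PMI specifically are assumptions, and no real search API is called.

```python
import math


def pmi(hits_a, hits_b, hits_ab, total_pages):
    """Pointwise mutual information estimated from page hit counts."""
    if min(hits_a, hits_b, hits_ab) <= 0:
        return float("-inf")  # no evidence that the two terms co-occur
    return math.log((hits_ab * total_pages) / (hits_a * hits_b))


def select_mappings(candidates, hits, pair_hits, total_pages):
    """Keep, for each source concept, the candidate target with highest PMI.

    candidates: iterable of (source_concept, target_concept) pairs
    hits: dict concept -> hit count for the concept alone
    pair_hits: dict (source, target) -> hit count for both together
    """
    best = {}
    for src, dst in candidates:
        score = pmi(hits.get(src, 0), hits.get(dst, 0),
                    pair_hits.get((src, dst), 0), total_pages)
        if src not in best or score > best[src][1]:
            best[src] = (dst, score)
    # Drop sources whose best candidate has no co-occurrence evidence.
    return {src: dst for src, (dst, score) in best.items()
            if score > float("-inf")}


if __name__ == "__main__":
    # Hypothetical hit counts, for illustration only.
    hits = {"car": 1000, "automobile": 800, "vehicle": 5000}
    pair_hits = {("car", "automobile"): 400, ("car", "vehicle"): 500}
    pairs = [("car", "automobile"), ("car", "vehicle")]
    print(select_mappings(pairs, hits, pair_hits, total_pages=1_000_000))
```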
Abstract: Web search engines are important tools for lexicography. This paper takes the translation of the business terms "e-commerce" and "e-business" as an example to illustrate the application of web search engines in English-Chinese dictionary translation, including methods of (1) finding the potential Chinese equivalents of the English business terms, and (2) selecting typical and proper Chinese equivalents in accordance with the frequencies and the meanings of the English business terms, respectively.
Abstract: The concept of Webpage visibility is usually linked to search engine optimization (SEO), and it is based on a global in-link metric [1]. SEO is the process of designing Webpages to optimize their potential to rank high on search engines, preferably on the first page of the results. The purpose of this research study is to analyze the influence of the local geographical area, in terms of cultural values, and the effect of local society keywords on increasing Website visibility. Websites were analyzed by accessing the source code of their homepages through the Google Chrome browser. Statistical analysis methods were selected to assess and analyze the results of the SEO and search engine visibility (SEV). The results obtained suggest that the development of Web indicators should consider a local idea of visibility and a specific geographical context. The geographical region considered in this research is the Hashemite Kingdom of Jordan (HKJ). The results also suggest that the use of social culture keywords increases Website visibility in search engines and localizes the search area, for example through google.jo, which localizes the search for HKJ.
Abstract: As a new means of knowledge mining, Web mining provides a new solution for the utilization of Web information resources. This article describes the application of Web mining technologies in search engines, and discusses how to mine the latest technologies for search engines so as to improve their retrieving
Abstract: The volume of publicly available geospatial data on the web is rapidly increasing due to advances in server-based technologies and the ease with which data can now be created. However, challenges remain in connecting individuals searching for geospatial data with the servers and websites where such data exist. The objective of this paper is to present a publicly available Geospatial Search Engine (GSE) that utilizes a web crawler built on top of the Google search engine in order to search the web for geospatial data. The crawler seeding mechanism combines search terms entered by users with predefined keywords that identify geospatial data services. A procedure runs daily to update map server layers and metadata, and to eliminate servers that go offline. The GSE supports Web Map Services, ArcGIS services, and websites that have geospatial data for download. We applied the GSE to search for all available geospatial services under these formats and provide search results including the spatial distribution of all obtained services. While enhancements to our GSE and to web crawler technology in general lie ahead, our work represents an important step toward realizing the potential of a publicly accessible tool for discovering the global availability of geospatial data.
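The seeding step described above can be sketched as pairing each user search term with each predefined service keyword. This is a minimal illustration only: the keyword list, the function name `build_seed_queries`, and the query format are hypothetical, since the abstract does not list the actual keywords or how the queries are submitted.

```python
from itertools import product

# Hypothetical keywords that tend to identify geospatial data services.
SERVICE_KEYWORDS = (
    "WMS GetCapabilities",
    "ArcGIS MapServer",
    "shapefile download",
)


def build_seed_queries(user_terms, service_keywords=SERVICE_KEYWORDS):
    """Combine user terms with service keywords into crawler seed queries."""
    # Ignore empty or whitespace-only terms.
    terms = [t.strip() for t in user_terms if t.strip()]
    seen = set()
    queries = []
    for term, keyword in product(terms, service_keywords):
        query = f"{term} {keyword}"
        key = query.lower()
        if key not in seen:  # skip case-insensitive duplicates
            seen.add(key)
            queries.append(query)
    return queries


if __name__ == "__main__":
    for q in build_seed_queries(["soil moisture", "land cover"]):
        print(q)
```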
Funding: This work was supported by the National Grand Fundamental Research of China (Grant No. G1999032706).
Abstract: This paper first studies the distribution characteristics of user behaviors based on log data from a massive web search engine. The analysis shows that the stochastic distribution of user queries follows a power-law function and exhibits strong similarity, and that users' queries and clicked URLs present dramatic locality, which implies that a query cache and a "hot click" cache can be employed to improve system performance. Three typical cache replacement policies are then compared: LRU, FIFO, and LFU with attenuation. In addition, the distribution characteristics of web information are analyzed, which demonstrates that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, the variance between link popularity and user popularity, and the variance between replica popularity and user popularity, are analyzed, which gives some important insight that helps improve the ranking algorithms in a search engine.
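To make the compared policies concrete, here is a minimal sketch of an LRU query cache and an LFU cache whose counts decay over time. The abstract does not define "attenuation", so the periodic multiplicative decay, the class names, and the parameters `decay` and `decay_every` are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert the key."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop least recently used
        self.items[key] = True
        return False


class DecayingLFUCache:
    """Evicts the key with the lowest decayed access count when full."""

    def __init__(self, capacity, decay=0.5, decay_every=100):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.decay = decay
        self.decay_every = decay_every
        self.counts = {}
        self.accesses = 0

    def access(self, key):
        self.accesses += 1
        if self.accesses % self.decay_every == 0:
            # Attenuate old popularity so stale hot queries can age out.
            for k in self.counts:
                self.counts[k] *= self.decay
        if key in self.counts:
            self.counts[key] += 1.0
            return True
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[key] = 1.0
        return False


def hit_rate(cache, queries):
    """Fraction of queries served from the cache."""
    hits = sum(cache.access(q) for q in queries)
    return hits / len(queries) if queries else 0.0


if __name__ == "__main__":
    stream = ["a", "a", "b", "c", "a"]
    print("LRU:", hit_rate(LRUCache(2), stream))
    print("LFU with decay:", hit_rate(DecayingLFUCache(2), stream))
```

On a stream where one query dominates, the frequency-based policy keeps the popular key while LRU may evict it after a burst of one-off queries, which is consistent with the locality the abstract reports.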
Funding: The National Natural Science Foundation of China (No. 60403027).
Abstract: To integrate reasoning and text retrieval, the architecture of a semantic search engine that supports several kinds of queries is proposed, and the semantic search engine Smartch is designed and implemented. Based on a logical reasoning process and a graphic user-defined process, Smartch provides four kinds of search services: basic search, concept search, graphic user-defined query, and association relationship search. The experimental results show that, compared with a traditional search engine, the recall and precision of Smartch are improved. Graphic user-defined queries can accurately locate the information users need. Association relationship search can find complicated relationships between concepts. Smartch can perform some intelligent functions based on ontology inference.
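Abstract: The Web today contains large amounts of missing, inconsistent, and imprecise data, while traditional search engines can only return document snippets that match keywords and cannot directly retrieve target entities. A new graph-matching-based entity extraction algorithm, GMEE (Graph Matching Based Entity Extraction), is proposed. It first segments the snippets into words and performs a preliminary screening of entities; it then builds a "weighted semantic entity association graph" from the structural and semantic relationships among the entities; finally, it extracts the target entities using a "maximum common subgraph matching" strategy. Experimental results show that, without requiring extensive parameter training and passing, the proposed algorithm can effectively prune the extracted entity set, maintaining recall and precision while improving the interpretability of the extraction process.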
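Abstract: Starting from the requirements of apparel supply chain management in an e-commerce environment, the problems of current apparel search engines are analyzed, a system model for a distributed apparel product search engine based on semantic Web services is proposed, and its architecture is discussed. An apparel ontology design model based on the Web Ontology Language (OWL) and its semantic description method are introduced. The basic functions of the apparel search engine and Web Services (WS) composition in a distributed environment are analyzed. Theoretical analysis and a prototype example show that multi-keyword search in a search engine based on the apparel semantic tree is markedly more efficient than in a full-text search engine.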