To further enhance the efficiencies of search engines,achieving capabilities of searching,indexing and locating the information in the deep web,latent semantic analysis is a simple and effective way.Through the latent...To further enhance the efficiencies of search engines,achieving capabilities of searching,indexing and locating the information in the deep web,latent semantic analysis is a simple and effective way.Through the latent semantic analysis of the attributes in the query interfaces and the unique entrances of the deep web sites,the hidden semantic structure information can be retrieved and dimension reduction can be achieved to a certain extent.Using this semantic structure information,the contents in the site can be inferred and the similarity measures among sites in deep web can be revised.Experimental results show that latent semantic analysis revises and improves the semantic understanding of the query form in the deep web,which overcomes the shortcomings of the keyword-based methods.This approach can be used to effectively search the most similar site for any given site and to obtain a site list which conforms to the restrictions one specifies.展开更多
Data aggregation from various web sources is very significant for web data analysis domain. In ad- dition, the recognition of coherence micro cluster is one of the most interesting issues in the field of data aggregat...Data aggregation from various web sources is very significant for web data analysis domain. In ad- dition, the recognition of coherence micro cluster is one of the most interesting issues in the field of data aggregation. Until now, many algorithms have been proposed to work on this issue. However, the deficiency of these solutions is that they cannot recognize the micro-cluster data stream accurately. A semantic-based coherent micro-cluster recognition algorithm for hybrid web data stream is nronosed.Firstly, an objective function is proposed to recognize the coherence micro-cluster and then the coher- ence micro-cluster recognition algorithm for hybrid web data stream based on semantic is raised. Fi-展开更多
文摘To further enhance the efficiencies of search engines,achieving capabilities of searching,indexing and locating the information in the deep web,latent semantic analysis is a simple and effective way.Through the latent semantic analysis of the attributes in the query interfaces and the unique entrances of the deep web sites,the hidden semantic structure information can be retrieved and dimension reduction can be achieved to a certain extent.Using this semantic structure information,the contents in the site can be inferred and the similarity measures among sites in deep web can be revised.Experimental results show that latent semantic analysis revises and improves the semantic understanding of the query form in the deep web,which overcomes the shortcomings of the keyword-based methods.This approach can be used to effectively search the most similar site for any given site and to obtain a site list which conforms to the restrictions one specifies.
基金Supported by the National High Technology Research and Development Programme of China(No.2011AA120300,2011AA120302)the National Key Technology Support Program of China(No.2013BAH66F02)
文摘Data aggregation from various web sources is very significant for web data analysis domain. In ad- dition, the recognition of coherence micro cluster is one of the most interesting issues in the field of data aggregation. Until now, many algorithms have been proposed to work on this issue. However, the deficiency of these solutions is that they cannot recognize the micro-cluster data stream accurately. A semantic-based coherent micro-cluster recognition algorithm for hybrid web data stream is nronosed.Firstly, an objective function is proposed to recognize the coherence micro-cluster and then the coher- ence micro-cluster recognition algorithm for hybrid web data stream based on semantic is raised. Fi-