Funding: Supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: With the development of Internet technology, the explosive growth of information on the Internet has made it difficult to filter out useful information. Finding a highly accurate text classification model has become a critical problem for text filtering, especially for Chinese texts. This paper uses manually calibrated comment data from the Douban movie website. First, a text filtering model based on the BP neural network is built. Second, the text word frequency vector and the text semantic vector are obtained from the Term Frequency-Inverse Document Frequency (TF-IDF) vector space model and the doc2vec method, respectively, and the dimensionality of the text word frequency vector is linearly reduced by Principal Component Analysis (PCA). Third, the reduced text word frequency vector and the text semantic vector are combined, the text value degree is added, and the text synthesis vector is constructed. Experiments show that the model combining the reduced text word frequency vector, the text semantic vector, and the text value degree reaches the highest accuracy, 84.67%.
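The feature construction described in this abstract can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the authors' implementation. The function names (`tfidf_vectors`, `pca_reduce`, `combine`), the smoothed IDF formula, the power-iteration PCA, and all sample values are assumptions. The doc2vec step is not reproduced: the semantic vectors are hypothetical placeholders, and the documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, rows) with one TF-IDF row per pre-tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: an assumed variant, the abstract does not give a formula.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows


def pca_reduce(rows, k, iters=200):
    """Project rows onto k principal directions found by power iteration."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the centered data (needs n >= 2).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    components = []
    for _comp in range(k):
        v = [1.0 / (j + 1) for j in range(m)]  # deterministic start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(m) for b in range(m))
        components.append(v)
        # Deflation: remove the direction just found before the next pass.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)]
               for a in range(m)]
    return [[sum(c[j] * comp[j] for j in range(m)) for comp in components]
            for c in centered]


def combine(reduced, semantic, value_degrees):
    """Concatenate reduced TF-IDF, semantic vector, and value degree per text."""
    return [r + s + [v] for r, s, v in zip(reduced, semantic, value_degrees)]


# Hypothetical sample data: tokens, semantic vectors, and value degrees.
docs = [["plot", "great", "acting"], ["boring", "plot"],
        ["great", "film", "great", "cast"], ["boring", "cast", "film"]]
semantic = [[0.1, 0.3], [-0.2, 0.0], [0.4, 0.1], [-0.1, -0.3]]
value_degrees = [0.9, 0.2, 0.8, 0.3]

vocab, rows = tfidf_vectors(docs)
synthesis = combine(pca_reduce(rows, k=2), semantic, value_degrees)
```

Each row of `synthesis` stands in for the text synthesis vector that would then be passed to the BP neural network classifier. In practice the placeholder semantic vectors would come from a trained doc2vec model, and a library PCA would replace the power-iteration version used here to keep the sketch dependency-free.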
Funding: Supported by the National Basic Research Program of China (Grant No. 2002CB312002) and the Science and Technology Commission of Shanghai Municipality Project (Grant Nos. 03dz15027 and 03dz15028).
Abstract: To avoid the scalability problems of existing systems that employ centralized indexing, index flooding, or query flooding, we propose SPIRS (Semantic P2P-based Information Retrieval System), an efficient peer-to-peer information retrieval system that supports state-of-the-art content and semantic searches. SPIRS distributes document indices hierarchically through the P2P network using Latent Semantic Indexing (LSI) and organizes nodes into a hierarchical overlay through CAN and TRIE. Compared with other P2P search techniques, such as those based on simple keyword matching, SPIRS achieves better accuracy because it considers advanced relevance among documents. Given a query, SPIRS needs only a small number of nodes to identify the matching documents. Furthermore, both theoretical analysis and experimental results show that SPIRS achieves higher accuracy and fewer logical hops.