Abstract: Most academic information has a creator, that is, a subject who created the information. Depending on the nature of the information, the subject can be an individual, a group, an institution, or even a nation. Most web data consist of a title, an author, and contents. A paper, as a type of academic information, has metadata such as title, author, keywords, abstract, publication data, place of publication, and ISSN. A patent has metadata such as title, applicant, inventor, attorney, IPC (International Patent Classification) code, application number, and claims. Most web-based academic information services let users search this information by processing its metadata. An important capability is searching by the author field, which holds personal names. This study proposes an efficient indexing method together with a ranking algorithm for adjacency-operation results to which phrase-search-based boosting factors are applied, thereby improving the accuracy of author-name search results. The method can be applied effectively to provide accurate search results in academic information services.
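The abstract does not define the index layout or the boosting weights. As a minimal sketch, assuming a positional inverted index over author-name tokens and a hypothetical constant `PHRASE_BOOST` that multiplies a record's score when the query tokens occur adjacently and in order (an adjacency, or phrase, match), boosted author-name ranking could look like this:

```python
from collections import defaultdict

# Hypothetical boost for an adjacent, in-order (phrase) match; the paper's
# actual boosting elements are not given in the abstract.
PHRASE_BOOST = 2.0


def tokenize(name):
    """Lowercase an author name and split it on whitespace and commas."""
    return name.lower().replace(",", " ").split()


def build_positional_index(records):
    """Map token -> {record_id: [positions]} for the author field."""
    index = defaultdict(lambda: defaultdict(list))
    for rec_id, author in records.items():
        for pos, tok in enumerate(tokenize(author)):
            index[tok][rec_id].append(pos)
    return index


def search_author(index, query):
    """Score records containing all query tokens; boost adjacent in-order hits."""
    terms = tokenize(query)
    if not terms or any(t not in index for t in terms):
        return []
    # Boolean AND: records that contain every query token.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    results = []
    for rec_id in candidates:
        score = float(len(terms))  # base score: one point per matched token
        # Adjacency check: some start p with terms[i] found at position p + i.
        for p in index[terms[0]][rec_id]:
            if all(p + i in index[t][rec_id] for i, t in enumerate(terms)):
                score *= PHRASE_BOOST
                break
        results.append((rec_id, score))
    # Highest score first; ties broken by record id for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

With records `{1: "Kim Min Soo", 2: "Min Soo Kim"}`, the query `"Kim Min"` matches both, but only record 1 holds the two tokens adjacently and in order, so it is ranked above record 2. A single-token query is trivially a phrase, so every hit receives the same boost.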
Abstract: This paper proposes a novel text representation and matching scheme for Chinese text retrieval. At present, the indexing methods of Chinese retrieval systems are either character-based or word-based. Character-based methods, such as bi-gram or tri-gram indexing, suffer from a high false-drop rate caused by mismatches between queries and documents. Word-based systems, on the other hand, have difficulty efficiently identifying all proper nouns, domain-specific terminology, and phrases. The new indexing method uses both the proximity and the mutual information of word pairs to represent text content, overcoming the high false-drop rate and the new-word and phrase problems that exist in character-based and word-based systems. Evaluation results indicate that the average query precision of proximity-based indexing is 5.2% higher than the best results of TREC-5.
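The abstract does not give exact definitions. A minimal sketch, assuming single Chinese characters as the units, ordered pairs that co-occur within a hypothetical window of `WINDOW = 2` positions as the proximity criterion, and pointwise mutual information (PMI) estimated from corpus counts as the association measure, shows how a document could be indexed by strongly associated nearby pairs rather than by every bigram:

```python
import math
from collections import Counter

WINDOW = 2  # hypothetical: maximum distance between the two units of a pair


def pair_counts(texts, window=WINDOW):
    """Count single units and ordered unit pairs occurring within `window`."""
    unit_counts = Counter()
    pairs = Counter()
    for text in texts:
        unit_counts.update(text)
        for i, a in enumerate(text):
            for d in range(1, window + 1):
                if i + d < len(text):
                    pairs[(a, text[i + d])] += 1
    return unit_counts, pairs


def pmi(pair, unit_counts, pairs):
    """Pointwise mutual information log2(P(a, b) / (P(a) * P(b)))."""
    a, b = pair
    total_units = sum(unit_counts.values())
    total_pairs = sum(pairs.values())
    p_ab = pairs[pair] / total_pairs
    p_a = unit_counts[a] / total_units
    p_b = unit_counts[b] / total_units
    return math.log2(p_ab / (p_a * p_b))


def index_terms(text, unit_counts, pairs, threshold=0.0, window=WINDOW):
    """Keep proximate pairs of `text` whose corpus PMI exceeds `threshold`."""
    terms = set()
    for i, a in enumerate(text):
        for d in range(1, window + 1):
            if i + d < len(text):
                pair = (a, text[i + d])
                # Counter returns 0 for unseen pairs without storing them.
                if pairs[pair] and pmi(pair, unit_counts, pairs) > threshold:
                    terms.add(pair)
    return terms
```

Pairs that never co-occur in the corpus, or whose PMI falls below `threshold`, are left out of the index, which is one way to reduce the false drops caused by indexing every adjacent character pair.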
Funding: The National Basic Research Program of China (Grant No. 2002CB312002) and the Science and Technology Commission of Shanghai Municipality Project (Grant Nos. 03dz15027 and 03dz15028).
Abstract: To overcome the scalability limitations of existing systems that employ centralized indexing, index flooding, or query flooding, we propose SPIRS (Semantic P2P-based Information Retrieval System), an efficient peer-to-peer information retrieval system that supports state-of-the-art content and semantic searches. SPIRS distributes document indices hierarchically through the P2P network using Latent Semantic Indexing (LSI) and organizes nodes into a hierarchical overlay using CAN and TRIE. Compared with other P2P search techniques, which are based on simple keyword matching, SPIRS achieves better accuracy because it takes more advanced relevance relationships among documents into account. Given a query, SPIRS needs to contact only a small number of nodes to identify the matching documents. Furthermore, both theoretical analysis and experimental results show that SPIRS achieves higher accuracy with fewer logical hops.
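The abstract does not describe how LSI vectors are placed on CAN or how the TRIE layer works. The toy sketch below assumes that documents and queries have already been reduced to low-dimensional LSI vectors rescaled into [0, 1), and replaces CAN's dynamic zone splitting with a fixed grid (hypothetical `GRID = 4` slices per dimension; `SemanticCAN` is a hypothetical name). It only illustrates why a semantic query needs to reach few nodes: the query is routed greedily to the zone containing its own coordinates, and only documents stored in that zone are ranked. The TRIE layer is not modeled.

```python
import math

GRID = 4  # hypothetical: every dimension is split into GRID equal slices


def zone_of(vec, grid=GRID):
    """Map a vector with coordinates in [0, 1) to the grid zone containing it."""
    return tuple(min(int(x * grid), grid - 1) for x in vec)


def route(start, target):
    """Greedy CAN-style routing: move one zone at a time along each axis."""
    path = [tuple(start)]
    cur = list(start)
    for axis in range(len(cur)):
        while cur[axis] != target[axis]:
            cur[axis] += 1 if target[axis] > cur[axis] else -1
            path.append(tuple(cur))
    return path


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SemanticCAN:
    """Toy overlay: each zone stores the LSI vectors of documents mapped into it."""

    def __init__(self, grid=GRID):
        self.grid = grid
        self.zones = {}

    def publish(self, doc_id, vec):
        """Store a document at the zone its semantic coordinates fall into."""
        self.zones.setdefault(zone_of(vec, self.grid), []).append((doc_id, vec))

    def query(self, vec, entry_zone):
        """Route to the query's zone; rank only the documents stored there."""
        target = zone_of(vec, self.grid)
        hops = len(route(entry_zone, target)) - 1
        docs = self.zones.get(target, [])
        ranked = sorted(docs, key=lambda d: -cosine(vec, d[1]))
        return [doc_id for doc_id, _ in ranked], hops
```

For example, after publishing `d1` at `(0.10, 0.12)`, `d2` at `(0.15, 0.05)` and `d3` at `(0.90, 0.80)`, a query at `(0.11, 0.11)` entering at zone `(3, 3)` travels 6 hops to zone `(0, 0)` and ranks `d1` before `d2`; `d3` lives in another zone and is never examined.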
Abstract: Developments in multimedia technologies have paved the way for storing huge collections of video documents on computer systems. Tools for content-based access to these documents are essential for exploiting such collections efficiently. Compared with traditional video analysis techniques, content-based analysis provides a flexible and powerful way to access video data. Content-based video indexing and retrieval (CBVIR), which focuses on automating the indexing, retrieval, and management of video, has attracted extensive research over the last decade and remains an active area with sustained interest from several domains. Herein, we present an extensive review and critical assessment of significant contemporary research on the content-based indexing and retrieval of visual information, together with a concise description of content-based video analysis and of the techniques associated with content-based video indexing and retrieval.
Funding: This work was supported by the Key Technologies R&D Program of Shanghai under Grant No. 03DZ19320.
Abstract: This paper presents a novel latent semantic indexing (LSI) approach for content-based image retrieval. First, an extension of non-negative matrix factorization (NMF) with supervised initialization is discussed. Then, supervised NMF is used within LSI to find the relationships between low-level features and high-level semantics. The retrieval results are compared with those of other approaches and show good performance.
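The abstract does not state how the supervised initialization is done. A minimal sketch, assuming V is a features-by-images matrix, that each column of W is initialized to the mean feature vector of the labeled images of one semantic class (a hypothetical scheme), and that W and H are then refined with the standard Lee-Seung multiplicative updates for the squared Frobenius loss, could look like this in pure Python:

```python
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def supervised_init(v, labels, k, seed=0):
    """Hypothetical supervised init: column j of W is the mean of images labeled j.

    `v` is a features x images matrix; `labels[i]` is the class of image i or None.
    Classes without any labeled image fall back to random positive values.
    """
    rng = random.Random(seed)
    n = len(v)
    sums = [[0.0] * k for _ in range(n)]
    counts = [0] * k
    for i, lab in enumerate(labels):
        if lab is not None:
            counts[lab] += 1
            for f in range(n):
                sums[f][lab] += v[f][i]
    return [[sums[f][j] / counts[j] if counts[j] else rng.random() for j in range(k)]
            for f in range(n)]


def nmf(v, w, iters=200, seed=0):
    """Lee-Seung multiplicative updates reducing ||V - WH||_F^2, starting from W."""
    rng = random.Random(seed)
    k, m = len(w[0]), len(v[0])
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][b] * num[a][b] / (den[a][b] + EPS) for b in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[a][b] * num[a][b] / (den[a][b] + EPS) for b in range(k)]
             for a in range(len(w))]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))
```

The columns of H then act as low-dimensional, non-negative semantic coordinates of the images, so retrieval could compare a query's coordinates with them, for example by cosine similarity. Because the updates are multiplicative, entries of W that start at zero stay zero, which helps keep each basis vector tied to the class it was initialized from.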
Abstract: The volume of information being created, generated, and stored is huge. Without adequate knowledge of Information Retrieval (IR) methods, retrieving information would be cumbersome and frustrating. Studies have further shown that IR methods are essential for storing and retrieving information in information centres, such as digital library environments. Moreover, with more than one billion people accessing the Internet and millions of queries issued daily, modern Web search engines face a problem of daunting scale. The main problem with existing search engines is how to avoid retrieving irrelevant information while still retrieving the relevant information. In this study, existing library retrieval systems were studied and their problems analyzed in order to address this problem. Existing information retrieval models were examined, and the knowledge gained was used to design a digital library information retrieval system, which was successfully implemented using real-life data. Continuous evaluation of IR methods is recommended to achieve an effective and efficient full-text retrieval system.
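The abstract does not name the retrieval model the final system uses. As one illustrative possibility, the sketch below implements the classic vector space model with TF-IDF weighting and cosine ranking, one of the standard IR models; the smoothed IDF formula is an assumption, and the example documents are made-up catalogue titles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class VectorSpaceIndex:
    """TF-IDF vector space model with cosine-similarity ranking."""

    def __init__(self, docs):
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        df = Counter(term for tf in self.tfs.values() for term in tf)
        n = len(docs)
        # Assumed smoothed IDF: terms present in every document keep a small weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = {doc_id: self._weigh(tf) for doc_id, tf in self.tfs.items()}

    def _weigh(self, tf):
        """Turn raw term frequencies into TF-IDF weights (unknown terms get 0)."""
        return {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}

    @staticmethod
    def _cosine(u, v):
        """Cosine similarity of two sparse vectors stored as dicts."""
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def search(self, query, top_k=10):
        """Return (doc_id, score) pairs with a positive score, best first."""
        q = self._weigh(Counter(tokenize(query)))
        scored = [(d, self._cosine(q, v)) for d, v in self.vectors.items()]
        scored = [(d, s) for d, s in scored if s > 0]
        return sorted(scored, key=lambda x: (-x[1], x[0]))[:top_k]
```

For instance, indexing the titles `"Introduction to information retrieval"`, `"Digital library systems and services"` and `"Retrieval models for digital library collections"` and searching for `"digital library retrieval"` ranks the third title first, because it is the only one containing all three query terms.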
Abstract: Traditional information retrieval systems respond to user queries with ranked lists of relevant documents. Because XML (Extensible Markup Language) documents separate content from structure, XML-IR (information retrieval) systems can retrieve only the relevant portions of documents. Users of an XML-IR system can therefore potentially receive highly relevant and precise material. We have developed an XML information retrieval system, called MEXIR, using MySQL and Sphinx. In our system, XML documents are stored in a single table with a fixed relational schema that is independent of the logical structure of the XML documents. Each node in an XML document is represented by a label that expresses its position in the XML tree, following the ADXPI scheme. Performance experiments on INEX collections show that our system is, on average, up to four seconds faster than GPX. In addition, it reduces the data size by 82.29% compared with the GPX system.
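The abstract does not detail the ADXPI labels or the MySQL schema. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, a plain `LIKE` match instead of Sphinx's full-text index, and Dewey-style position labels as a hypothetical substitute for ADXPI (the `node` table and its columns are also hypothetical). It shows the two ideas the abstract describes: every document is shredded into one fixed-schema table regardless of its structure, and a search returns individual elements rather than whole documents.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One fixed table holds every document; the schema ignores the XML structure.
SCHEMA = """
CREATE TABLE node (
    doc_id  TEXT,
    label   TEXT,  -- Dewey-style position label such as 1.2.1 (stand-in for ADXPI)
    tag     TEXT,
    path    TEXT,  -- tag path from the root such as /article/sec/p
    content TEXT,
    PRIMARY KEY (doc_id, label)
)
"""


def shred(conn, doc_id, xml_text):
    """Store every element of an XML document as one row of the node table."""
    root = ET.fromstring(xml_text)

    def walk(elem, label, parent_path):
        path = f"{parent_path}/{elem.tag}"
        content = (elem.text or "").strip()
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)",
                     (doc_id, label, elem.tag, path, content))
        # Child i of an element labeled L gets the label L.i (1-based).
        for i, child in enumerate(elem, start=1):
            walk(child, f"{label}.{i}", path)

    walk(root, "1", "")
    conn.commit()


def search(conn, keyword):
    """Return element-level hits (doc, label, path) whose own text contains keyword."""
    rows = conn.execute(
        "SELECT doc_id, label, path FROM node WHERE content LIKE ? "
        "ORDER BY doc_id, label",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


def is_ancestor(a, b):
    """With path-prefix labels, ancestry reduces to a string-prefix test."""
    return b.startswith(a + ".")
```

Shredding `<article><title>XML retrieval</title><sec><p>Sphinx full-text search</p><p>MySQL storage</p></sec></article>` as `d1` and searching for `Sphinx` returns `[("d1", "1.2.1", "/article/sec/p")]`, that is, only the matching paragraph. Because a label encodes the path from the root, ancestor tests reduce to string-prefix checks.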
Funding: This work is supported, in part, by the National Natural Science Foundation of China under grant numbers U1536206, U1405254, 61772283, 61602253, 61672294, and 61502242; in part by the Jiangsu Basic Research Programs-Natural Science Foundation under grant numbers BK20150925 and BK20151530; in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund; and in part by the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.
Abstract: Traditional information hiding methods embed secret information by modifying the carrier, which inevitably leaves traces of modification on the carrier. As a result, such methods have difficulty resisting detection by steganalysis algorithms. To address this problem, the concept of coverless information hiding was proposed. Coverless information hiding can effectively resist steganalysis algorithms because it uses unmodified natural stego-carriers to represent and convey confidential information. However, the state-of-the-art method has a low hiding capacity, which makes it less appealing. Because the pixel values of different regions of molecular structure images of materials (MSIM) usually differ, this paper proposes a novel coverless information hiding method based on MSIM, which uses the average pixel value of each sub-image to represent secret information according to a mapping between pixel-value intervals and secret information. In addition, we employ a pseudo-random label sequence to determine the positions of the sub-images, improving the security of the method. The histogram of the bag-of-words (BOW) model is used to determine the number of sub-images in an image that convey secret information. Moreover, to improve retrieval efficiency, we build a multi-level inverted index structure. The proposed method can also be applied to other natural images. Compared with state-of-the-art methods, experimental results and analysis show that our method performs better in anti-steganalysis, security, and capacity.
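The abstract gives the idea but not the parameters. A minimal sketch, assuming grayscale images stored as lists of rows, a hypothetical 4 x 4 grid of sub-images, four equal pixel-value intervals that each map to 2 bits, and Python's seeded `random.Random` standing in for the shared pseudo-random label sequence: the sender never modifies an image, it only selects existing images whose selected sub-image means already spell out each chunk of the secret. The BOW histogram step and the multi-level inverted index are replaced here by a single-level dictionary.

```python
import random

BLOCKS = 4          # hypothetical: image split into BLOCKS x BLOCKS sub-images
BITS_PER_BLOCK = 2  # hypothetical: 2**2 = 4 equal pixel-value intervals
LEVELS = 2 ** BITS_PER_BLOCK


def sub_image_means(img, blocks=BLOCKS):
    """Average pixel value of each sub-image, in row-major block order."""
    bh, bw = len(img) // blocks, len(img[0]) // blocks
    means = []
    for by in range(blocks):
        for bx in range(blocks):
            px = [img[y][x] for y in range(by * bh, (by + 1) * bh)
                  for x in range(bx * bw, (bx + 1) * bw)]
            means.append(sum(px) / len(px))
    return means


def mean_to_bits(mean):
    """Map an 8-bit mean to its interval's bit string, e.g. 0..63 -> '00'."""
    level = min(int(mean * LEVELS / 256), LEVELS - 1)
    return format(level, f"0{BITS_PER_BLOCK}b")


def positions(key, count, blocks=BLOCKS):
    """Key-seeded pseudo-random choice of which sub-images carry bits."""
    return random.Random(key).sample(range(blocks * blocks), count)


def extract(img, key, count):
    """Receiver side: read the bits an unmodified image represents."""
    means = sub_image_means(img)
    return "".join(mean_to_bits(means[p]) for p in positions(key, count))


def build_index(images, key, count):
    """Single-level inverted index: represented bit string -> image ids."""
    index = {}
    for img_id, img in images.items():
        index.setdefault(extract(img, key, count), []).append(img_id)
    return index


def hide(secret_bits, index):
    """Sender side: pick existing images that already encode each chunk.

    Assumes len(secret_bits) is a multiple of the chunk size.
    """
    chunk = len(next(iter(index)))
    chosen = []
    for i in range(0, len(secret_bits), chunk):
        part = secret_bits[i:i + chunk]
        if part not in index:
            raise LookupError(f"no image in the database represents {part}")
        chosen.append(index[part][0])
    return chosen
```

The receiver, holding the same key, calls `extract` on each received image to recover the bits. `hide` assumes that the secret length is a multiple of the chunk size (`count * BITS_PER_BLOCK` bits) and that the image database covers every chunk value it needs.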