Abstract: The idea of a positional inverted index is applied to indexing graph databases. The main idea is to use hash tables to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and store the database graphs, frequent subgraphs, and the neighborhoods of nodes. For exact checking of the remaining graphs, a vertex invariant is used in the isomorphism test, which can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
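The filter-then-verify idea in this abstract can be illustrated with a minimal sketch. It assumes a much simpler feature than the paper's tables: each graph is reduced to the set of unordered label pairs on its edges, and an inverted index maps each pair to the graphs containing it. The names `edge_features`, `build_index`, and `candidates` are hypothetical, and the exact isomorphism check on the surviving graphs is omitted.

```python
from collections import defaultdict

def edge_features(edges, labels):
    """Return the set of unordered (label, label) pairs on a graph's edges."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def build_index(graphs):
    """Map each edge feature to the ids of the graphs that contain it."""
    index = defaultdict(set)
    for gid, (edges, labels) in graphs.items():
        for feat in edge_features(edges, labels):
            index[feat].add(gid)
    return index

def candidates(index, all_ids, query):
    """Keep only graphs that contain every edge feature of the query."""
    edges, labels = query
    result = set(all_ids)
    for feat in edge_features(edges, labels):
        result &= index.get(feat, set())
    return result

# Toy database: each graph is (edge list, node -> label). Illustrative data only.
graphs = {
    "g1": ([(0, 1), (1, 2)], {0: "C", 1: "O", 2: "N"}),
    "g2": ([(0, 1)], {0: "C", 1: "C"}),
    "g3": ([(0, 1), (1, 2)], {0: "C", 1: "O", 2: "C"}),
}
index = build_index(graphs)
query = ([(0, 1)], {0: "O", 1: "C"})
survivors = candidates(index, graphs.keys(), query)
```

Here `g2` is pruned without being inspected, and only the survivors would be passed on to an exact subgraph isomorphism test.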
Abstract: In computational proteomics, a sequence tag index can speed up open-search-based identification of modified peptides and in-depth analysis of mass spectrometry data. Sequence tag indexes have played a prominent role in protein-identification search engines over the past ten years because of their fast search speed. However, in pursuit of lower index space consumption, some protein search engines use excessively concise index schemes, which lead to a higher computational burden. We propose a new tag index scheme named TIIP with a better balance between space and time complexity. TIIP has a two-level hierarchical index structure that allows rapid retrieval of all peptide sequences and their corresponding masses. Theoretically, the index space consumption of TIIP is not much higher than that of typical tag index schemes, but the time complexity of sequence retrieval can be reduced to O(1); in practice, TIIP searches about one million times faster than a brute-force approach.
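As a rough illustration of a two-level tag index, the sketch below keeps a first-level dictionary from each tag to (peptide row, offset) entries and a second-level table of peptide sequences with their masses. This is not TIIP's actual layout, which the abstract does not specify; the tag length of 3, the function names, and the toy peptides are assumptions. The residue masses are approximate monoisotopic values.

```python
from collections import defaultdict

# Approximate monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
                "L": 113.08406, "K": 128.09496, "E": 129.04259}
WATER = 18.01056
TAG_LEN = 3  # assumed tag length, chosen for illustration

def peptide_mass(seq):
    """Sum of residue masses plus one water molecule."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def build_tag_index(peptides):
    # Second level: one row per peptide holding (sequence, mass).
    table = [(seq, peptide_mass(seq)) for seq in peptides]
    # First level: tag -> list of (row in table, offset of the tag in the peptide).
    tags = defaultdict(list)
    for row, (seq, _) in enumerate(table):
        for off in range(len(seq) - TAG_LEN + 1):
            tags[seq[off:off + TAG_LEN]].append((row, off))
    return tags, table

def lookup(tags, table, tag):
    """Return (sequence, offset, mass) for every peptide containing the tag."""
    return [(table[row][0], off, table[row][1]) for row, off in tags.get(tag, [])]

tags, table = build_tag_index(["GASVK", "LEGAS", "KVLEK"])
hits = lookup(tags, table, "GAS")
```

The dictionary probe takes expected constant time per tag, whereas a brute-force search would scan every peptide for the tag.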
Funding: This work is supported in part by the National Natural Science Foundation of China under grant numbers U1536206, U1405254, 61772283, 61602253, 61672294, and 61502242; in part by the Jiangsu Basic Research Programs-Natural Science Foundation under grant numbers BK20150925 and BK20151530; in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund; and in part by the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.
Abstract: Traditional information hiding methods embed secret information by modifying the carrier, which inevitably leaves traces of modification on the carrier. As a result, it is hard to resist detection by steganalysis algorithms. To address this problem, the concept of coverless information hiding was proposed. Coverless information hiding can effectively resist steganalysis because it uses unmodified natural stego-carriers to represent and convey confidential information. However, the state-of-the-art method has a low hiding capacity, which makes it less appealing. Because the pixel values of different regions of molecular structure images of material (MSIM) are usually different, this paper proposes a novel coverless information hiding method based on MSIM, which uses the average value of a sub-image’s pixels to represent the secret information according to a mapping between pixel value intervals and secret information. In addition, we employ a pseudo-random label sequence to determine the positions of the sub-images, which improves the security of the method. The histogram of the bag-of-words (BOW) model is used to determine the number of sub-images in the image that convey secret information. Moreover, to improve retrieval efficiency, we build a multi-level inverted index structure. The proposed method can also be used for other natural images. Experimental results and analysis show that, compared with the state of the art, our method performs better in anti-steganalysis, security, and capacity.
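The mapping between pixel value intervals and secret information can be sketched as follows. The sketch assumes 8-bit grayscale images, equal-width intervals, and fixed non-overlapping blocks read in row-major order. The pseudo-random label sequence, the BOW histogram, and the inverted index from the abstract are not modeled, and the function names are hypothetical.

```python
def block_means(image, block):
    """Average pixel value of each non-overlapping block x block sub-image, row-major."""
    h, w = len(image), len(image[0])
    means = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            pixels = [image[r + i][c + j] for i in range(block) for j in range(block)]
            means.append(sum(pixels) / len(pixels))
    return means

def mean_to_bits(mean, bits_per_block):
    """Map a mean in [0, 256) to the index of its interval, written as a bit string."""
    width = 256 / (2 ** bits_per_block)
    idx = min(int(mean // width), 2 ** bits_per_block - 1)
    return format(idx, f"0{bits_per_block}b")

# Toy 4x4 grayscale image (illustrative values only).
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [100, 100, 70, 70],
    [100, 100, 70, 70],
]
# With 2 bits per block, each interval is 64 gray levels wide.
message = "".join(mean_to_bits(m, 2) for m in block_means(image, 2))
```

The image itself is never modified: a sender would search a collection for an image whose blocks already map to the desired bits, which is why an efficient index over such mappings matters.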
Funding: Supported by the National Natural Science Foundation of China (Nos. 61173114, 61202300, and 61272202) and the Guangdong Provincial Research Project (No. 2011B090400251).
Abstract: The boom in Internet and multimedia technology has led to an explosion of multimedia information, especially images, creating an urgent need to quickly retrieve similar and relevant images from huge image collections. Content-based high-dimensional indexing holds the key to achieving this goal by efficiently organizing the content of images and storing them in computer memory. Over the past decades, many important developments in high-dimensional image indexing have emerged to cope with the 'curse of dimensionality'. High-dimensional indexing mechanisms can be divided into three main categories: tree-based indexes, hashing-based indexes, and visual-word-based inverted indexes. In this paper we review the technologies in these three categories and make several recommendations for future research.
Abstract: Time intervals are often associated with tuples to represent their valid time in temporal relations, where the overlap join is crucial for various kinds of queries. Many existing overlap join algorithms use indexes based on tree structures such as the quad-tree, B+-tree, and interval tree. These algorithms usually have a high CPU cost because deep path traversals are unavoidable, which makes them less competitive than data-partition or plane-sweep based algorithms. This paper proposes an efficient overlap join algorithm based on a new two-layer flat index named the Overlap Interval Inverted Index (O2i Index). The first layer uses an array to record the end points of intervals and approximates the nesting structures of intervals via two functions; the second layer uses inverted lists to trace all intervals satisfying the approximated nesting structures. With the help of the new index, the join algorithm visits only the must-be-scanned lists and skips all others. Analyses and experiments on both real and synthetic datasets show that the proposed algorithm is as competitive as the state-of-the-art algorithms.
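For reference, the overlap join itself can be sketched with a simple sorted-array baseline. This is not the O2i Index, whose two approximation functions and inverted lists are not described in enough detail here. It assumes closed intervals given as (start, end) tuples, where two intervals overlap when each starts no later than the other ends.

```python
from bisect import bisect_right

def overlap_join(r, s):
    """Return pairs (a, b), a in r and b in s, whose closed intervals overlap."""
    s_sorted = sorted(s)                       # sort s by start point
    starts = [start for start, _ in s_sorted]
    out = []
    for a_start, a_end in r:
        # Intervals in s that start after a_end cannot overlap a, so skip them.
        hi = bisect_right(starts, a_end)
        for b_start, b_end in s_sorted[:hi]:
            if b_end >= a_start:               # b must not end before a starts
                out.append(((a_start, a_end), (b_start, b_end)))
    return out

# Toy relations (illustrative values only).
r = [(1, 5), (10, 12)]
s = [(4, 8), (6, 9), (11, 11), (0, 0)]
pairs = overlap_join(r, s)
```

The binary search prunes only on start points, so the inner loop still scans intervals that end too early. Skipping such lists entirely is the kind of saving a flat index like the one in the abstract aims for.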
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant No. 61370149), in part by the Fundamental Research Funds for the Central Universities (ZYGX2013J083), and in part by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.
Abstract: Color descriptors of an image are the most widely used visual features in content-based image retrieval systems. In this study, we present a novel color-based image retrieval framework that integrates color space quantization and feature coding. Although color features have advantages such as robustness and simple extraction, directly processing the large amount of color information in an RGB image is challenging. To overcome this problem, a color space clustering quantization algorithm is proposed to obtain the clustering color space (CCS) by clustering the CIE 1976 L*a*b* space into 256 distinct colors, which adequately accommodate human visual perception. In addition, a new feature coding method called feature-to-character coding (FCC) is proposed to encode the block-based main color features into character codes. In this method, images are represented by character codes, which makes it possible to build an inverted index efficiently from color features and to use text-based search engines. Owing to its computational efficiency, the proposed framework can also be applied to large-scale web image retrieval. The experimental results demonstrate that the proposed system significantly improves performance compared with block-based main color image retrieval systems that use the traditional HSV (Hue, Saturation, Value) quantization method.
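The step from block-based main colors to character codes and an inverted index can be sketched as below. The sketch assumes pixels are already quantized to palette indices 0-255, so the L*a*b* clustering is not performed. The token format (block position plus two hex digits) is a hypothetical encoding, not the FCC scheme itself.

```python
from collections import Counter, defaultdict

def main_color(block):
    """Most frequent palette index (0-255) among a block's quantized pixels."""
    return Counter(block).most_common(1)[0][0]

def encode(blocks):
    """One text token per block: block position plus main color as two hex digits."""
    return [f"b{pos}c{main_color(block):02x}" for pos, block in enumerate(blocks)]

def build_index(images):
    """Inverted index: token -> ids of the images whose code contains it."""
    index = defaultdict(set)
    for image_id, blocks in images.items():
        for token in encode(blocks):
            index[token].add(image_id)
    return index

def search(index, query_blocks):
    """Rank images by the number of tokens shared with the query."""
    scores = Counter()
    for token in encode(query_blocks):
        for image_id in index.get(token, ()):
            scores[image_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy images: two blocks each, pixels given as palette indices (illustrative only).
images = {
    "img1": [[3, 3, 7], [200, 200, 200]],
    "img2": [[3, 7, 7], [200, 200, 1]],
    "img3": [[3, 3, 3], [15, 15, 200]],
}
index = build_index(images)
ranked = search(index, [[3, 3, 1], [200, 200, 9]])
```

Because each image becomes a short list of text tokens, the same structure could in principle be handed to an ordinary text search engine, which is the benefit the abstract points to.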