Abstract: Query expansion with a thesaurus is a useful technique in modern information retrieval (IR). This paper proposes and implements a query expansion method for Chinese IR based on a decaying co-occurrence model. The model extends the traditional co-occurrence model with a decaying factor that reduces the mutual information as the distance between terms increases. Experimental results on the TREC-9 collections show that this query expansion method yields significant improvements over IR without query expansion.
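As a rough illustration of the decaying co-occurrence idea described above, the following Python sketch weights each co-occurrence of two terms by a factor that shrinks with their distance, then derives a mutual-information-style score from the weighted counts and uses it to pick expansion terms. The exponential decay form, the `alpha` rate, the `window` size, and all function names are assumptions made for illustration; the abstract does not give the model's exact formula.

```python
import math
from collections import Counter

def decayed_cooccurrence(tokens, window=5, alpha=0.5):
    """Accumulate distance-decayed co-occurrence weights for term pairs.

    Assumption: a pair at distance d contributes exp(-alpha * (d - 1)),
    so adjacent terms count fully and distant terms count less.
    """
    pair_weight = Counter()
    for i, left in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            pair = tuple(sorted((left, tokens[j])))
            pair_weight[pair] += math.exp(-alpha * (d - 1))
    return pair_weight

def decayed_mi(tokens, window=5, alpha=0.5):
    """Pointwise-mutual-information-style score from decayed pair weights."""
    term_count = Counter(tokens)
    pair_weight = decayed_cooccurrence(tokens, window, alpha)
    total_terms = len(tokens)
    total_pairs = sum(pair_weight.values())
    scores = {}
    for (a, b), w in pair_weight.items():
        p_ab = w / total_pairs
        p_a = term_count[a] / total_terms
        p_b = term_count[b] / total_terms
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores

def expand_query(query_terms, scores, top_k=2):
    """Return the top_k non-query terms with the highest summed score."""
    query = set(query_terms)
    candidate = Counter()
    for (a, b), s in scores.items():
        if a in query and b not in query:
            candidate[b] += s
        elif b in query and a not in query:
            candidate[a] += s
    return [term for term, _ in candidate.most_common(top_k)]
```

Setting `alpha=0` removes the decay and reduces the weights to plain windowed co-occurrence counts, which makes it easy to compare the two variants on the same corpus.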
Abstract: English cell-phone short messages have distinctive styles: some are literary and others non-literary. This paper illustrates the stylistic features of English short messages by applying stylistic theory and the "Model of Analyzing Textual Function". The features of the two styles are described and analyzed in terms of pronunciation, graphology, vocabulary, grammar, discourse, rhetoric, etc., and these stylistic features serve as the basis for discussing the principles and methods of translating English short messages.
Funding: This work was supported by the National Natural Science Foundation of China (No. 20173023 and No. 90203012) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20020730006).
Abstract: On the basis of information theory and statistical methods, we use mutual information, n-tuple entropy and conditional entropy, combined with biological characteristics, to analyze the long-range and short-range correlations in human Y chromosome palindromes. The magnitude of the long-range correlation, as reflected by the mutual information, follows the order P5 > P5a > P5b (P5a and P5b are the sequences obtained from human Y chromosome palindrome 5 by replacing only the Alu repeats and all interspersed repeats, respectively, with random uncorrelated sequences). The magnitude of the short-range correlation, as reflected by the n-tuple entropy and the conditional entropy, follows the order P5 > P5a > P5b > random uncorrelated sequence. In other words, when the Alu repeats and then all interspersed repeats are replaced with random uncorrelated sequences, the long-range and short-range correlations decrease gradually, whereas the random uncorrelated sequence itself has no correlation. This research indicates that more repeat sequences result in stronger correlation between bases in the human Y chromosome. The analyses may help to understand the special structures of human Y chromosome palindromes more deeply.
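The three quantities named in this abstract can be sketched with their standard textbook definitions, estimated from empirical frequencies. The function names and the simple plug-in estimators below are assumptions for illustration and are not necessarily the estimators used in the study.

```python
import math
from collections import Counter

def ntuple_entropy(seq, n):
    """Shannon entropy (in bits) of the overlapping n-tuples in seq."""
    tuples = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(tuples)
    total = len(tuples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(seq, n):
    """Entropy of a base given the previous n-1 bases: H_n - H_(n-1), n >= 2."""
    return ntuple_entropy(seq, n) - ntuple_entropy(seq, n - 1)

def mutual_information(seq, k):
    """Mutual information (in bits) between bases separated by distance k."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi
```

Evaluating `mutual_information` over a range of large `k` probes long-range correlation, while `ntuple_entropy` and `conditional_entropy` for small `n` probe short-range structure; lower entropy and higher mutual information both indicate stronger correlation between bases.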
Funding: The Young Teachers Scientific Research Foundation (YTSRF) of Nanjing University of Science and Technology, 2005-2006.
Abstract: A method that combines category-based and keyword-based concepts to build a better information retrieval system is introduced. To improve document clustering, a document similarity measure is proposed that is based on the cosine vector measure and keyword frequency in documents and that also takes an ontology as input. The ontology is domain specific and includes a list of keywords organized by their degree of importance to the ontology's categories. By means of this semantic knowledge, the ontology can improve the effectiveness of the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
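One simple way to picture an ontology-aware cosine measure of this kind is to scale each keyword's frequency by an importance weight taken from the ontology before computing the cosine. The sketch below assumes the ontology is available as a flat `ontology_weights` dictionary and that unlisted terms get a default weight of 1.0; both choices, and the function names, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def weighted_vector(tokens, ontology_weights, default_weight=1.0):
    """Term-frequency vector with each count scaled by its ontology weight."""
    counts = Counter(tokens)
    return {t: c * ontology_weights.get(t, default_weight)
            for t, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def ontology_similarity(doc_a, doc_b, ontology_weights):
    """Cosine similarity of two token lists after ontology weighting."""
    return cosine(weighted_vector(doc_a, ontology_weights),
                  weighted_vector(doc_b, ontology_weights))
```

Passing an empty dictionary as `ontology_weights` recovers the standard cosine measure over raw term frequencies, which gives a direct baseline for comparison.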
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61173100 and No. 61173101, and the Fundamental Research Funds for the Central Universities under Grant No. DUT10RW202.
Abstract: A new joint decoding strategy that combines character-based and word-based conditional random field models is proposed. In this segmentation framework, fragments are used to generate candidate out-of-vocabulary (OOV) words. After the initial segmentation, the segmentation fragments are divided into two classes: "combination" (combining several fragments into an unknown word) and "segregation" (splitting a fragment into several words), so that more OOV words can be recalled. Moreover, to suit the characteristics of cross-domain segmentation, context information is used to guide Chinese Word Segmentation (CWS). Experiments on test data from the SIGHAN Bakeoff 2007 and Bakeoff 2010 show that the method is effective: OOV recall improves and the overall segmentation performance is good.
Funding: Project (51678075) supported by the National Natural Science Foundation of China; Project (2017GK2271) supported by the Hunan Provincial Science and Technology Department, China.
Abstract: Human-object interaction (HOI) detection is a new branch of visual relationship detection and plays an important role in image understanding. Because of the complexity and diversity of image content, HOI detection remains a difficult challenge. Unlike most current work on HOI detection, which relies only on the pairwise information of a human and an object, we propose a graph-based HOI detection method that models context and global structure information. First, to better exploit the relations between humans and objects, the detected humans and objects are treated as nodes of a fully connected undirected graph, and the graph is pruned to obtain an HOI graph that preserves only the edges connecting human and object nodes. Then, to obtain more robust features for the human and object nodes, two different attention-based feature extraction networks are proposed, which model global and local contexts respectively. Finally, a graph attention network is introduced to pass messages iteratively between nodes in the HOI graph and to detect the potential HOIs. Experiments on the V-COCO and HICO-DET datasets verify the effectiveness of the proposed method and show that it is superior to many existing methods.
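The graph construction step described above (build a fully connected undirected graph over detections, then keep only human-object edges) can be sketched as follows. The label string `"person"` for human nodes, the input format, and the function name are assumptions for illustration; the feature extraction and graph attention stages are not covered here.

```python
from itertools import combinations

def build_hoi_graph(labels, human_label="person"):
    """Build the full graph over detections and prune it to human-object edges.

    labels: list of class labels, one per detected instance (node index = position).
    Returns (full_edges, hoi_edges) as lists of undirected (i, j) pairs with i < j.
    """
    # Fully connected undirected graph: every unordered pair of nodes.
    full_edges = list(combinations(range(len(labels)), 2))
    # Keep an edge only when exactly one endpoint is a human node.
    hoi_edges = [
        (i, j) for i, j in full_edges
        if (labels[i] == human_label) != (labels[j] == human_label)
    ]
    return full_edges, hoi_edges
```

For a scene with detections `["person", "bicycle", "person", "dog"]`, the full graph has six edges and pruning leaves the four edges that link a person to the bicycle or the dog.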
Funding: Project (2007CB714407) supported by the Major State Basic Research and Development Program of China; Project (2004DFA06300) supported by the Key International Collaboration Project in Science and Technology; Projects (40571107, 40701102) supported by the National Natural Science Foundation of China.
Abstract: To improve the accuracy of biophysical parameters retrieved from remotely sensed data, a new algorithm is presented that uses spatial contextual information to estimate canopy variables from high-resolution remote sensing images. The algorithm was applied to the inversion of leaf area index (LAI) from Enhanced Thematic Mapper Plus (ETM+) data, in combination with an optimization method that minimizes cost functions. The results show that the distribution of LAI is spatially consistent with the false-color composite imagery from ETM+, and that the accuracy of LAI is significantly improved over the results of conventional pixelwise retrieval methods. This demonstrates that the method can be reliably used to integrate spatial contextual information when inverting LAI from high-resolution remote sensing images.