Query expansion with a thesaurus is a useful technique in modern information retrieval (IR). This paper proposes and implements a query expansion method for Chinese IR based on a decaying co-occurrence model. The model extends the traditional co-occurrence model with a decaying factor that reduces the mutual information between two terms as the distance between them increases. Experimental results on the TREC-9 collections show that this query expansion method yields significant improvements over IR without query expansion.
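As a rough illustration of the decaying co-occurrence idea, the following Python sketch weights each co-occurrence by a factor that shrinks with token distance and then uses a mutual-information-style score to pick expansion terms. The abstract does not give the actual formula, so the exponential decay, the parameters `alpha` and `window`, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def decayed_cooccurrence(docs, window=5, alpha=0.5):
    """Collect distance-decayed co-occurrence weights from tokenized docs.

    A pair of terms seen d tokens apart contributes exp(-alpha * (d - 1))
    instead of 1. The exponential form, `alpha`, and `window` are
    assumptions of this sketch, not values taken from the paper.
    """
    pair_weight = defaultdict(float)
    term_count = Counter()
    total = 0
    for tokens in docs:
        term_count.update(tokens)
        total += len(tokens)
        for i, left in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                right = tokens[j]
                if left == right:
                    continue
                key = tuple(sorted((left, right)))
                pair_weight[key] += math.exp(-alpha * (d - 1))
    return pair_weight, term_count, total


def decayed_mi(a, b, pair_weight, term_count, total):
    """Mutual-information-style score built on the decayed pair weight."""
    weight = pair_weight.get(tuple(sorted((a, b))), 0.0)
    if weight == 0.0:
        return float("-inf")
    return math.log(weight * total / (term_count[a] * term_count[b]))


def expand_query(query_terms, pair_weight, term_count, total, top_k=3):
    """Append the top_k candidate terms with the highest summed score."""
    query = set(query_terms)
    scores = defaultdict(float)
    for a, b in pair_weight:
        for q, cand in ((a, b), (b, a)):
            if q in query and cand not in query:
                scores[cand] += decayed_mi(q, cand, pair_weight, term_count, total)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:top_k]
```

In this sketch, setting `alpha=0` makes every co-occurrence inside the window count as 1, which corresponds to a plain windowed co-occurrence count without decay.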
Mazu is the most famous goddess of canal transport in China, and belief in Mazu is one of the three folk beliefs in China. Japan is China's neighbor across the sea, and as early as 1000 years ago it was influenced by the Mazu ceremonial culture. Using big data analysis, this study counted, screened, and analyzed records of Mazu culture in Diaolong, a full-text database of Chinese and Japanese ancient books. It also explored the hot topics of concern and the emotional attitudes expressed, and then analyzed the important role of Mazu culture in cultural exchange and mutual learning between China and Japan in the new era, with a view to fulfilling the contemporary task of building a "people-to-people bond" and achieving common development.
As historical Chinese calligraphy works are being digitized, retrieval becomes a new challenge. Currently, no OCR technique can convert calligraphy character images into text, nor does the existing handwriting character recognition approach work for them. This paper proposes a novel approach to efficiently retrieving Chinese calligraphy characters on the basis of similarity: a calligraphy character image is represented by a collection of discriminative features, and high retrieval speed with reasonable effectiveness is achieved. First, calligraphy characters that cannot be similar to the query are filtered out step by step by comparing character complexity, stroke density, and stroke protrusion. Then, similar calligraphy characters are retrieved and ranked according to the matching cost produced by approximate shape matching. To speed up retrieval, a high-dimensional data structure, the PK-tree, is employed. Finally, the efficiency of the algorithm is demonstrated by a preliminary experiment with 3012 calligraphy character images.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60533090, 60525108), the National Grand Fundamental Research 973 Program of China (Grant No. 2002CB312101), the Science and Technology Project of Zhejiang Province (2005C13032, 2005C11001-05), and the China-US Million Book Digital Library Project (www.cadal.zju.edu.cn).
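The filter-then-rank retrieval described in the calligraphy abstract above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the three features are taken as precomputed numbers, the tolerance values are hypothetical, and a simple symmetric nearest-point (chamfer-style) distance stands in for the paper's approximate shape matching, whose details the abstract does not give. The PK-tree index is not modeled here; candidates are scanned linearly.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    """A character image reduced to features (all fields are hypothetical)."""
    name: str
    complexity: float
    stroke_density: float
    protrusion: float
    points: list  # non-empty list of (x, y) contour sample points


def chamfer_cost(p, q):
    """Symmetric mean nearest-point distance between two point sets.

    A stand-in for approximate shape matching, not the paper's method.
    """
    def one_way(src, dst):
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (one_way(p, q) + one_way(q, p))


def retrieve(query, candidates, tol=(0.2, 0.2, 0.2)):
    """Filter candidates feature by feature, then rank survivors by cost."""
    survivors = list(candidates)
    features = ("complexity", "stroke_density", "protrusion")
    for attr, limit in zip(features, tol):
        target = getattr(query, attr)
        # Each cheap scalar test shrinks the set before costly matching.
        survivors = [g for g in survivors
                     if abs(getattr(g, attr) - target) <= limit]
    scored = [(chamfer_cost(query.points, g.points), g.name) for g in survivors]
    scored.sort()
    return [(name, cost) for cost, name in scored]
```

The point of the ordering is that the inexpensive scalar comparisons run on every candidate, while the more expensive point-set matching runs only on the characters that survive all three filters.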