This paper proposes STC-I, an improved algorithm for clustering Chinese Web pages. Building on characteristics of the Chinese language, it adopts a new unit choice principle and a novel suffix tree construction policy. Experimental results show that STC-I retains the advantages of STC and outperforms it in both precision and speed when clustering Chinese Web pages. Key words: clustering; suffix tree; Web mining. CLC number: TP 311. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: YANG Jian-wu (1973-), male, Ph.D.; research direction: information retrieval and text mining.
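As a rough illustration of the base-cluster step that STC-style clustering relies on, the sketch below groups documents by the phrases they share. It is not the STC-I algorithm: a plain dictionary of n-grams stands in for the suffix tree, the units are assumed to be pre-segmented words, and the names `base_clusters`, `max_len` and `min_docs` are hypothetical. The abstract does not specify the unit choice principle, so none is modeled here.

```python
from collections import defaultdict


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each shared phrase (a tuple of units) to the ids of documents containing it.

    docs: list of documents, each a list of units (assumed here to be words).
    max_len: longest phrase considered (illustrative parameter).
    min_docs: a phrase must occur in at least this many documents to be kept.
    """
    phrase_docs = defaultdict(set)
    for doc_id, units in enumerate(docs):
        for i in range(len(units)):
            # Enumerate every phrase of up to max_len units starting at position i.
            for j in range(i + 1, min(i + max_len, len(units)) + 1):
                phrase_docs[tuple(units[i:j])].add(doc_id)
    # Keep only phrases shared by enough documents; each one defines a base cluster.
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}


docs = [
    ["suffix", "tree", "clustering"],
    ["suffix", "tree", "matching"],
    ["web", "mining"],
]
clusters = base_clusters(docs)
```

A suffix tree finds the same shared phrases without enumerating every n-gram explicitly, which is why the choice of unit and the tree construction policy affect speed.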
Suffix trees are a key data structure for text string matching and are widely used in areas such as bioinformatics and data compression. This paper studies Ukkonen's algorithm in depth and proposes a new algorithm that reduces the number of memory operations during construction and keeps the resulting tree sequential. Experimental results show that both construction and matching are more efficient than with Ukkonen's algorithm.
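To make the matching use case concrete, here is a minimal sketch of substring search over the suffixes of a text. It assumes an uncompressed suffix trie built naively in quadratic time; it implements neither Ukkonen's algorithm nor the improved algorithm the abstract describes, and the names `Node`, `build_suffix_trie` and `find_all` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    # Start positions of all suffixes whose path passes through this node.
    starts: list = field(default_factory=list)


def build_suffix_trie(text):
    """Insert every suffix of text character by character (O(n^2), illustration only)."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(start)
    return root


def find_all(root, pattern):
    """Return the sorted start positions of pattern, or [] if it does not occur."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return sorted(node.starts)


trie = build_suffix_trie("banana")
positions = find_all(trie, "ana")
```

Matching walks one edge per pattern character, so its cost depends on the pattern length rather than the text length; a real suffix tree compresses unary paths to bring the space down to linear.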
Classical algorithms and data structures assume that the underlying memory is reliable and that data remain safe during and after processing. This assumption is perilous: several studies have shown that large, inexpensive memories are vulnerable to bit flips, so a few memory faults can threaten the correctness of a classical algorithm's output. Fault-tolerant data structures and resilient algorithms are designed to tolerate a limited number of faults and to produce a correct output based on the uncorrupted part of the data. The suffix tree is an important data structure with widespread applications, including substring search, the superstring problem and data compression. The fault-tolerant suffix tree presented in the literature uses complex techniques: encodable and decodable error-correcting codes, blocked data structures and fault-resistant tries. In this work, we use the natural approach of data replication to develop a fault-tolerant suffix tree in the faulty-memory random access machine model. The proposed data structure stores copies of the indices to withstand memory faults injected by an adversary. We develop a resilient version of Ukkonen's algorithm for constructing the fault-tolerant suffix tree and derive an upper bound on the number of corrupt suffixes.
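The replication idea can be sketched as follows. This is a generic resilient variable, not the paper's data structure: it assumes at most `delta` copies can be corrupted, stores `2 * delta + 1` copies so that the correct value still holds a majority, and uses the hypothetical class name `ReplicatedValue`. The abstract does not state the replication factor the authors use.

```python
from collections import Counter


class ReplicatedValue:
    """Store 2*delta + 1 copies of a value and read it back by majority vote."""

    def __init__(self, value, delta):
        # Assumption: an adversary can corrupt at most delta memory cells.
        self.copies = [value] * (2 * delta + 1)

    def read(self):
        # With at most delta corrupted copies, the true value is still the majority.
        value, _count = Counter(self.copies).most_common(1)[0]
        return value

    def write(self, value):
        for i in range(len(self.copies)):
            self.copies[i] = value


index = ReplicatedValue(7, delta=2)
index.copies[0] = 99   # simulate two injected faults
index.copies[3] = -1
recovered = index.read()
```

Replication trades extra space and read time for simplicity, in contrast to the error-correcting-code approach the abstract mentions.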
Hu Shuhe obtained a sufficient condition for the law of the iterated logarithm for sums of φ-mixing sequences with duple suffixes. This paper substantially improves on that condition.
In research on the Chinese temporal system, Chen (1988) proposed a ternary structure. Building on this structure, many studies have focused on the Chinese aspect system. Compared with research on aspect, there are fewer studies on Chinese verbal situations, such as Ma (1981), Deng (1985) and Dai (1997), all of which are based on Vendler's (1967) four categories of verbal situations. There are fewer studies still on phase, and most researchers treat phase and verbal situation as the same concept. This article argues that, in studying the Chinese temporal system, phase and verbal situation should first be distinguished and then compared with aspect. Based on this distinction, the article examines how situations combine with the verbal suffix "LE", which is also an aspect marker, and attempts to summarize the relationship between situation and "LE".
In higher vocational colleges, most students have great difficulty with English suffixes. This paper analyses the differences between English and Chinese morphology, especially suffixes that denote tense, number and comparison, and offers recommendations for the English as a second language (ESL) classroom.
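This paper proposes a generalized suffix tree based concept generation algorithm (GSTCG). First, the attribute sequences of all objects in the context, together with their suffixes, are built into a generalized suffix tree, and candidate concepts are generated from the tree. Next, candidate concepts with the same object set are merged, and the candidates are then extended according to rules. Finally, redundant candidate concepts are deleted to obtain all formal concepts. Experiments on two classes of artificial datasets with different parameters show that GSTCG and NextClosure yield the same number of concepts on all contexts, and that GSTCG has better time performance.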
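For readers unfamiliar with formal concepts, the sketch below enumerates them by brute force on a tiny context. It is neither GSTCG nor NextClosure: it simply closes every subset of objects, which is exponential and only suitable for illustration. The function name `formal_concepts` and the example context are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all formal concepts of a context as (extent, intent) frozenset pairs.

    context: dict mapping each object to the set of attributes it has.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # Intent: the attributes shared by every object in the subset.
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = {o for o in objects if intent <= context[o]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


context = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"b"}}
concepts = formal_concepts(context)
```

Each concept pairs a maximal set of objects with the maximal set of attributes they share, so two algorithms that are both correct must report the same number of concepts on the same context, which is the consistency check the abstract reports.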
Funding: supported by the National Natural Science Foundation of China (60502032, 60672068).