This paper proposes STC-I, an improved algorithm for clustering Chinese Web pages that builds on characteristics of the Chinese language; it adopts a new unit-choice principle and a novel suffix tree construction policy. Experimental results show that the new algorithm retains the advantages of STC and outperforms it in precision and speed when clustering Chinese Web pages. Key words: clustering; suffix tree; Web mining. CLC number: TP 311. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: YANG Jian-wu (1973-), male, Ph.D., research direction: information retrieval and text mining.
Suffix trees are a key data structure for text string matching and are used in a wide range of application areas, such as bioinformatics and data compression. Ukkonen's algorithm is examined in depth, and a new algorithm is proposed that reduces the number of memory operations during construction and keeps the resulting tree sequential. Experimental results show that both the construction and the matching procedures are more efficient than those of Ukkonen's algorithm.
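To make the string-matching use case concrete, here is a minimal sketch of substring lookup in a suffix structure. It is an assumption-laden illustration: it builds a naive, uncompressed suffix trie in quadratic time, not Ukkonen's linear-time construction and not the new algorithm from the abstract, and the class name `SuffixTrie` and its `contains` method are hypothetical.

```python
class SuffixTrie:
    """Naive suffix trie: O(n^2) construction, NOT Ukkonen's linear-time algorithm."""

    def __init__(self, text):
        self.root = {}
        # Insert every suffix of text, one character per edge.
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.setdefault(ch, {})

    def contains(self, pattern):
        """Return True if pattern occurs anywhere in the indexed text."""
        node = self.root
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True


trie = SuffixTrie("banana")
print(trie.contains("nan"))  # True
print(trie.contains("nab"))  # False
```

Because every substring is a prefix of some suffix, a lookup only walks one root-to-node path, so matching costs time proportional to the pattern length regardless of the text length.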
Classical algorithms and data structures assume that the underlying memory is reliable and that the data remain safe during and after processing. This assumption is perilous, however, as several studies have shown that large, inexpensive memories are vulnerable to bit flips. Thus, the correctness of a classical algorithm's output can be threatened by a few memory faults. Fault-tolerant data structures and resilient algorithms are developed to tolerate a limited number of faults and provide a correct output based on the uncorrupted part of the data. The suffix tree is an important data structure with widespread applications, including substring search, the superstring problem, and data compression. The fault-tolerant version of the suffix tree presented in the literature uses complex techniques: encodable and decodable error-correcting codes, blocked data structures, and fault-resistant tries. In this work, we use the natural approach of data replication to develop a fault-tolerant suffix tree based on the faulty-memory random access machine model. The proposed data structure stores copies of the indices to sustain memory faults injected by an adversary. We develop a resilient version of Ukkonen's algorithm for constructing the fault-tolerant suffix tree and derive an upper bound on the number of corrupt suffixes.
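The replication idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's data structure: it assumes an adversary can corrupt at most `delta` memory cells, stores `2 * delta + 1` copies of one index, and reads by majority vote. The class `ReplicatedIndex` and its methods are hypothetical names introduced for this example.

```python
from collections import Counter


class ReplicatedIndex:
    """One index stored as 2*delta + 1 copies (assumed replication factor)."""

    def __init__(self, value, delta):
        self.copies = [value] * (2 * delta + 1)

    def corrupt(self, position, bad_value):
        # Simulates an adversarial memory fault on a single copy.
        self.copies[position] = bad_value

    def read(self):
        # Majority vote: correct as long as at most delta copies are corrupted.
        value, _count = Counter(self.copies).most_common(1)[0]
        return value


idx = ReplicatedIndex(value=7, delta=2)
idx.corrupt(0, 99)
idx.corrupt(3, -1)
print(idx.read())  # 7
```

With at most `delta` corrupted copies, at least `delta + 1` of the `2 * delta + 1` copies still hold the original value, so the majority read recovers it.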
Hu Shuhe obtained a sufficient condition for the law of the iterated logarithm for sums of φ-mixing sequences with duple suffixes. This paper substantially improves on his condition.
In research on the Chinese temporal system, Chen (1988) proposed a ternary structure. Building on this structure, many studies have focused on the Chinese aspect system. By comparison, there are fewer studies of Chinese verbal situations, such as Ma (1981), Deng (1985), and Dai (1997), all of which are based on Vendler's (1967) four categories of verbal situations. There are fewer studies still of phase, and most researchers treat phase and verbal situation as the same concept. This article argues that, in studying the Chinese temporal system, one should first distinguish phase from verbal situation and then compare both with aspect. Based on this distinction, the article examines situations in combination with the verbal suffix "LE", which is also an aspect marker, and attempts to summarize the relationship between situation and "LE".
In higher vocational colleges, most students have great difficulty with English suffixes. This paper analyses the differences between English and Chinese morphology, especially suffixes that denote tense, number, and comparison, and offers recommendations for the English as a second language (ESL) classroom.
Sponsored by Shanghai Textile Holding (Group) Corporation, China Council for the Promotion of International Trade Shanghai Sub-council, China Chamber of International Commerce Shanghai Chamber of Commerce, and
Funding: Supported by the National Natural Science Foundation of China (60502032, 60672068).