Abstract: An improved algorithm, named STC-I, is proposed for clustering Chinese Web pages based on characteristics of the Chinese language. It adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and is better than STC in precision and speed when clustering Chinese Web pages. Key words: clustering; suffix tree; Web mining. CLC number: TP 311. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: YANG Jian-wu (1973-), male, Ph.D., research direction: information retrieval and text mining.
Funding: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086); the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714); the Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082); and the National Natural Science Foundation of Jiangsu (BK2006094).
Abstract: A new common phrase scoring method is proposed based on the term frequency-inverse document frequency (TFIDF) and the independence of a phrase. Combining the two properties helps identify more reasonable common phrases, which improves the accuracy of clustering. An equation for measuring the independence of a phrase is also proposed. The new algorithm, which improves the suffix tree clustering algorithm (STC), is named improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system was implemented and used to cluster several groups of web search results obtained from the Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
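The TFIDF half of the scoring idea above can be illustrated with a minimal sketch. It uses one common TF-IDF variant (raw term count times the log of inverse document frequency); the function name `tfidf_scores` and the sample documents are hypothetical, and the independence measure from the paper is not reproduced here because the abstract does not give its equation.

```python
import math


def tfidf_scores(docs):
    """docs: list of token lists. Returns {term: sum over docs of tf * idf}.

    Assumption: tf is the raw count and idf = log(N / df); the paper may
    use a different weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    # Each occurrence contributes one idf, so the total is tf * idf.
    scores = {}
    for doc in docs:
        for term in doc:
            scores[term] = scores.get(term, 0.0) + math.log(n_docs / df[term])
    return scores


docs = [["suffix", "tree"], ["suffix", "array"], ["suffix", "tree", "tree"]]
scores = tfidf_scores(docs)
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, round(scores[term], 3))
```

A term that appears in every document (here `suffix`) gets a score of zero, which is why TFIDF alone tends to favor phrases that discriminate between groups of documents.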
Funding: Project (No. 20030487032) supported by the Specialized Research Fund for the Doctoral Program of Higher Education, China.
Abstract: The original algorithm for constructing Dwarf has an inherent difficulty that prevents it from constructing true Dwarfs. We explain when and why it introduces suffix redundancies into the Dwarf structure. To solve this problem, we propose a completely new algorithm called PID. It computes partitions of a fact table bottom-up and inserts them into the Dwarf structure. If a partition is an MSV partition, PID coalesces its sub-Dwarf; otherwise it creates the necessary nodes and cells. Our performance study shows that PID is efficient. To condense Dwarf further, we propose Condensed Dwarf, a more compressed structure that combines the strengths of Dwarf and Condensed Cube. By eliminating unnecessary storage of "ALL" cells from the Dwarf structure, Condensed Dwarf can effectively reduce the size of a Dwarf, especially for real-world Dwarfs, as illustrated by our experiments. Its query processing remains simple, and only two minor modifications to PID are required to construct a Condensed Dwarf.
Abstract: [Objective] Taking knowledge of the tea-science field as the research object, the paper proposes a method for extracting taxonomic relations between ontology concepts. [Method] By improving the rule based on language mode, a generalized suffix tree was constructed for the concept set of the tea-science field, forming the hierarchical structure and taxonomic relations among concepts. [Result and Conclusion] A corresponding prototype system was developed based on the above method, and the test results indicate that the method is effective.
Funding: Supported by the National Natural Science Foundation of China (60502032, 60672068).
Abstract: Suffix trees are a key data structure for text string matching and are used in a wide range of application areas such as bioinformatics and data compression. Ukkonen's algorithm is investigated in depth, and a new algorithm is proposed that decreases the number of memory operations during construction and keeps the resulting tree sequential. The experimental results show that both construction and matching are more efficient than with Ukkonen's algorithm.
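To make the string-matching role of suffix structures concrete, here is a minimal sketch of a naive suffix trie built from nested dictionaries. It is not Ukkonen's algorithm and not the algorithm proposed in the paper: construction here takes quadratic time and space, whereas Ukkonen's algorithm builds a compressed suffix tree in linear time. The function names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts (O(n^2))."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing child for ch, or create it.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

Once the structure is built, a lookup costs time proportional to the pattern length, independent of the text length, which is the property that makes suffix trees attractive for matching.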
Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Entity relations are an essential component of well-known knowledge bases such as Freebase, Yago and Knowledge Graph. Hyponymy plays an important role among entity relations: it captures the relationship between more general terms (hypernyms) and more specific instances of those terms (hyponyms). In this paper, we present a comprehensive scheme for constructing an open-domain Chinese entity hypernym hierarchy. Some of the most important unsupervised and heuristic approaches for building the hierarchical structure are covered in detail, along with analyses. We experimentally evaluate the proposed methods and compare them with other baselines. The results show that our method achieves high precision, and the proposed scheme will be further improved with larger-scale corpora.
Abstract: Classical algorithms and data structures assume that the underlying memory is reliable and that the data remain safe during and after processing. However, this assumption is perilous, as several studies have shown that large and inexpensive memories are vulnerable to bit flips. Thus, the correctness of the output of a classical algorithm can be threatened by a few memory faults. Fault-tolerant data structures and resilient algorithms are developed to tolerate a limited number of faults and provide a correct output based on the uncorrupted part of the data. The suffix tree is an important data structure with widespread applications, including substring search, the superstring problem and data compression. The fault-tolerant version of the suffix tree presented in the literature uses complex techniques: encodable and decodable error-correcting codes, blocked data structures and fault-resistant tries. In this work, we use the natural approach of data replication to develop a fault-tolerant suffix tree based on the faulty-memory random access machine model. The proposed data structure stores copies of the indices to sustain memory faults injected by an adversary. We develop a resilient version of Ukkonen’s algorithm for constructing the fault-tolerant suffix tree and derive an upper bound on the number of corrupt suffixes.
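The replication idea can be sketched as follows. This is an illustration of a standard technique in the faulty-memory model (store 2δ + 1 copies of a value and read by majority vote, so that up to δ corrupted copies are outvoted), not the paper's exact scheme; the class name `ReplicatedCell` and the parameter `delta` are hypothetical.

```python
from collections import Counter


class ReplicatedCell:
    """A value stored as 2 * delta + 1 copies and read by majority vote.

    Assumption: at most delta copies are corrupted between a write and
    the next read, so the original value still holds a strict majority.
    """

    def __init__(self, value, delta):
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        # Overwrite every copy so later faults start from a clean state.
        self.copies = [value] * len(self.copies)

    def read(self):
        # The most frequent copy is the majority value under the assumption.
        value, _count = Counter(self.copies).most_common(1)[0]
        return value


cell = ReplicatedCell(7, delta=2)  # e.g. a suffix index stored 5 times
cell.copies[0] = 99                # simulate two adversarial corruptions
cell.copies[3] = 42
print(cell.read())                 # 7
```

The cost of this simplicity is a multiplicative space and time overhead of 2δ + 1 per protected value, which is the trade-off against the coding-based techniques mentioned in the abstract.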
Abstract: Hu Shuhe obtained a sufficient condition for the law of the iterated logarithm for sums of φ-mixing sequences with double suffixes. This paper greatly improves his condition.
Abstract: There are settings where encryption must be performed by a sender under a time constraint. This paper describes an encryption/decryption algorithm based on modular arithmetic of complex integers called Gaussians. It is shown how cubic extractors operate and how to find all cubic roots of a Gaussian. All validations (proofs) are provided in the Appendix. Detailed numeric illustrations explain how to use the method of digital isotopes to avoid ambiguity in the recovery of the original plaintext by the receiver.
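As a small illustration of modular arithmetic on Gaussian integers, the sketch below represents a + bi as the pair (a, b), multiplies and cubes modulo a prime p, and finds cube roots by exhaustive search. The exhaustive search is a stand-in for illustration only; it is not the paper's cubic extractor or its digital-isotope method, and the function names and the choice p = 7 are hypothetical.

```python
def gmul(u, v, p):
    """Multiply Gaussian integers u = a + bi and v = c + di modulo p."""
    a, b = u
    c, d = v
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return ((a * c - b * d) % p, (a * d + b * c) % p)


def gcube(u, p):
    """Return u^3 modulo p."""
    return gmul(gmul(u, u, p), u, p)


def cube_roots(target, p):
    """Brute-force search for all (x, y) with (x + yi)^3 = target mod p."""
    return [(x, y) for x in range(p) for y in range(p)
            if gcube((x, y), p) == target]


p = 7
z = (2, 3)              # the Gaussian integer 2 + 3i
c = gcube(z, p)
print(c)                # (3, 2)
print(cube_roots(c, p)) # every candidate whose cube equals c
```

The search can return more than one candidate for the same cube, which illustrates the kind of ambiguity in plaintext recovery that the abstract says the method resolves.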
Abstract: Mesqan is a South Ethio-Semitic language that is mainly used in day-to-day communication by a population of about 179,737 people in the Gurage Zone, Ethiopia, and whose linguistic features have not been well described. The main aim of this paper is to offer a complete account of the noun phrase structures of the Mesqan language. The paper is descriptive in character, as the study is mostly concerned with describing what actually exists in the language, and it relies mostly on primary linguistic data. The linguistic data, i.e. the elicited grammatical data regarding noun phrases, were collected from native speakers of the language during 12 months of fieldwork conducted between 2011 and 2012 in four Mesqan villages and in Butajira, the administrative center of the Mesqan Woreda. The head of an NP can be a pronoun, a noun or an adjective. The head alone can constitute a full noun phrase. Adjectives, nouns in the genitive, or relative clauses function as modifiers of head nouns. Quantifiers are numerals and unspecific quantifiers; determiners include the definite marker, demonstrative pronouns, and possessive suffixes, and occur in two positions relative to the head noun. Only the demonstrative pronouns and the number 'one', when used as an indefinite marker, occur in phrase-initial position, while all other determiners follow the head.
Abstract: In research on the Chinese temporal system, Chen (1988) proposed a ternary structure. Based on this ternary structure, many studies have focused on the Chinese aspect system. Compared with research on aspect, there are fewer studies on Chinese verbal situations, such as Ma (1981), Deng (1985) and Dai (1997), all of which are based on Vendler's (1967) four categories of verbal situations. There are fewer studies still on phase. Most researchers believe that phase and verbal situation are the same concept. This article argues, however, that in studying the Chinese temporal system we should first distinguish between phase and verbal situations and then compare them with aspect. Based on this distinction, the article examines how situations combine with the verbal suffix "LE", which is also an aspect marker, and tries to summarize the relationship between situations and "LE".
Abstract: 1. Applied Mathematics A Journal of Chinese Universities, abbreviated to Appl. Math. -- JCU, is a nationwide journal sponsored by Zhejiang University. The Journal aims to publish academic works in Applied Mathematics: original theoretical and (or) methodological research results, and innovative applications in practical fields. From 1994 on, one volume will be published per annum, consisting of four issues in Chinese, appearing quarterly as Ser. A, and four issues in English, appearing quarterly as Ser. B. Contents in the two series will not overlap. The Journal is distributed domestically and abroad. 2. Instructions to author(s): Appl. Math. -- JCU Ser. B publishes full-length papers. In view of the high cost of printing, authors should keep their papers as short as is consistent with clarity. Unnecessary introductory material should be avoided. Graphical presentation of information should be confined to as few separate diagrams as are practicable. The rules of grammar should be observed. The submission of an article will be taken to indicate that it has not been and will not be submitted for publication elsewhere. Script Requirements for All Articles. Manuscript: The manuscript must be typed in English, double-spaced, on one side of A4 good-quality white paper. The maximum length of an article is 15 pages, including diagrams and tables. Two copies of an article are required for submission (not to be returned). Abstract: A short Abstract not exceeding 200 words should appear at the beginning of the paper after the title, name(s) of author(s), affiliation(s) and address(es). It should contain no references, and mathematical symbols should be kept to a minimum.
Abstract: English vocabulary study is of great importance because it is related to comprehensive skills in English learning; that is, it is a key factor in learning English well. Therefore we must study and know how to develop vocabulary. This article mainly explores how to build up our English vocabulary.
Abstract: Words are the foundations of language. They play an important role in translation and communication. To better understand English and Chinese and to use them accurately, their differences in word formation are presented, especially in derivation, including prefixes and suffixes.
Abstract: In order to enlarge our English vocabulary, we need some methods. I would like to share with beginners my experience of how I enlarged my English vocabulary while learning English. It is a long process and needs hard work and patience.