Funding: Project (No. 60082003) supported by the National Natural Science Foundation of China
Abstract: This paper presents a new, improved term frequency/inverse document frequency (TF-IDF) approach that uses confidence, support and characteristic words to enhance the recall and precision of text classification. Synonyms defined by a lexicon are also handled in the improved TF-IDF approach. We discuss and analyze in detail the relationship among confidence, recall and precision. Experiments on science and technology texts gave promising results, showing that the new TF-IDF approach improves both the precision and the recall of text classification compared with the conventional TF-IDF approach.
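The abstract does not give the paper's exact weighting formula, so the following is only a minimal sketch of the ingredients it names, under stated assumptions. Synonyms are merged into one canonical term through a tiny hypothetical lexicon (`SYNONYMS`) before weighting. Support and confidence are computed association-rule style for a rule "term -> class" (support = share of all training documents that are in the class and contain the term; confidence = share of documents containing the term that are in the class). A term is treated as a "characteristic word" if both values meet hypothetical thresholds `min_sup` and `min_conf`, and its conventional TF-IDF weight is then multiplied by `1 + confidence`. That boost rule and all function names here are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical synonym lexicon: maps each synonym to a canonical term.
SYNONYMS = {"automobile": "car", "auto": "car"}


def normalize(tokens, lexicon=SYNONYMS):
    """Replace each token by its canonical synonym so synonyms share one weight."""
    return [lexicon.get(tok, tok) for tok in tokens]


def tf_idf(docs):
    """Conventional TF-IDF: tf(t, d) * log(N / df(t)), one dict per document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def support_confidence(docs, labels, term, cls):
    """Association-rule measures for the rule term -> cls (an assumed definition).

    support    = P(term and cls) over all training documents
    confidence = P(cls | term)
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    both = sum(1 for lab in with_term if lab == cls)
    support = both / len(docs)
    confidence = both / len(with_term) if with_term else 0.0
    return support, confidence


def improved_weights(docs, labels, cls, min_sup=0.2, min_conf=0.8):
    """Hypothetical variant: boost the TF-IDF weight of characteristic words of cls.

    A term is characteristic here if term -> cls meets both thresholds; its weight
    is multiplied by (1 + confidence). Thresholds and boost are assumptions.
    """
    base = tf_idf(docs)
    vocab = {t for doc in docs for t in doc}
    boost = {}
    for t in vocab:
        sup, conf = support_confidence(docs, labels, t, cls)
        if sup >= min_sup and conf >= min_conf:
            boost[t] = 1.0 + conf
    return [{t: w * boost.get(t, 1.0) for t, w in doc_w.items()} for doc_w in base]


def precision_recall(predicted, actual, cls):
    """Per-class precision and recall of a classifier's predictions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == cls and a == cls)
    fp = sum(1 for p, a in zip(predicted, actual) if p == cls and a != cls)
    fn = sum(1 for p, a in zip(predicted, actual) if p != cls and a == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    raw = [["auto", "engine", "road"], ["car", "engine"], ["gene", "cell"], ["cell", "protein"]]
    labels = ["tech", "tech", "bio", "bio"]
    docs = [normalize(d) for d in raw]  # "auto" is merged into "car"
    print(support_confidence(docs, labels, "car", "tech"))  # (0.5, 1.0)
    print(improved_weights(docs, labels, "tech")[0])
```

In this sketch, raising `min_conf` keeps fewer, more class-specific characteristic words. How that choice trades precision against recall is the relationship the paper analyzes and is not reproduced here.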
Abstract: The concept of word classes (parts of speech) has always generated controversy among linguists. The earlier Prescriptive and Descriptive Schools may have set the pace for this controversy, but the present dilemma is much deeper. Learners and even teachers are sometimes in a quandary as to how to prove that a particular word belongs to a particular class, because a word may belong to several classes depending on context, as with "watch", which can be used as a noun or as a verb. This paper therefore attempts to resolve the problem of word-class classification by using morphological and syntactic evidence to show that English words take a particular range of inflections, belong to strictly ordered categories, and do not change their class arbitrarily. This is in line with the natural perfect order of homogeneity in creation, which precludes one species from merging effectively with another without undergoing some fundamental change. Other variables were also examined, and it was concluded that teachers and learners alike can rely on this sub-categorization approach as a reliable paradigm for their assumptions about word classes.