This paper designs and implements an automatic classification and retrieval system for English documents, with a focus on improving the results of the classification algorithm. In an automatic text classification system, documents are represented as feature vectors, so overall performance depends on the classification algorithm and on the accuracy of feature selection. Conventional word-form-based classification systems ignore the semantic information carried by words, so the diversity and ambiguity of word forms degrade classification results. This paper proposes a new feature extraction algorithm based on the Vector Space Model that uses concepts rather than word forms as features and also takes into account the relations between concepts, especially hypernymy. The algorithm can extract more abstract concepts from a text and thereby improves the classification results.
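A minimal sketch of concept-based features, for illustration only and not the authors' algorithm. The sketch maps word forms to concepts, propagates a decayed weight up hypernym links, and compares documents with cosine similarity in the Vector Space Model. The toy lexicon `WORD_TO_CONCEPT`, the hypernym table `HYPERNYM`, and the parameters `decay` and `max_depth` are all hypothetical assumptions. A real system would draw these relations from a lexical resource.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: word form -> concept id (synonyms share one concept).
WORD_TO_CONCEPT = {
    "car": "automobile", "automobile": "automobile", "auto": "automobile",
    "truck": "truck", "lorry": "truck",
    "apple": "apple", "banana": "banana",
}

# Hypothetical toy hypernym links: concept -> more general concept.
HYPERNYM = {
    "automobile": "motor_vehicle", "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "apple": "fruit", "banana": "fruit",
}


def concept_vector(tokens, decay=0.5, max_depth=2):
    """Build a concept-based feature vector (assumed weighting scheme).

    Each known token adds 1.0 to its concept; each hypernym up to
    max_depth levels above it receives that weight times decay per level.
    """
    vec = Counter()
    for tok in tokens:
        concept = WORD_TO_CONCEPT.get(tok.lower())
        if concept is None:
            continue  # word forms missing from the toy lexicon are ignored
        weight = 1.0
        vec[concept] += weight
        parent, depth = HYPERNYM.get(concept), 0
        while parent is not None and depth < max_depth:
            weight *= decay
            vec[parent] += weight
            parent, depth = HYPERNYM.get(parent), depth + 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


doc_cars = concept_vector("the car and the auto".split())
doc_trucks = concept_vector("a lorry".split())
doc_fruit = concept_vector("an apple".split())
print(cosine(doc_cars, doc_trucks), cosine(doc_cars, doc_fruit))
```

"car" and "auto" share no word form with "lorry", yet the first two documents still receive a nonzero similarity through the shared hypernyms `motor_vehicle` and `vehicle`. The fruit document shares no concept with the car document and scores 0. This illustrates, under the stated toy assumptions, how abstract concepts can address the word-form diversity problem the abstract describes.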
Computer Engineering and Applications