Named Entity Recognition aims to identify and to classify rigid designators in text such as proper names, biological species, and temporal expressions into some predefined categories. There has been growing interest i...Named Entity Recognition aims to identify and to classify rigid designators in text such as proper names, biological species, and temporal expressions into some predefined categories. There has been growing interest in this field of research since the early 1990s. Named Entity Recognition has a vital role in different fields of natural language processing such as Machine Translation, Information Extraction, Question Answering System and various other fields. In this paper, Named Entity Recognition for Nepali text, based on the Support Vector Machine (SVM) is presented which is one of machine learning approaches for the classification task. A set of features are extracted from training data set. Accuracy and efficiency of SVM classifier are analyzed in three different sizes of training data set. Recognition systems are tested with ten datasets for Nepali text. The strength of this work is the efficient feature extraction and the comprehensive recognition techniques. The Support Vector Machine based Named Entity Recognition is limited to use a certain set of features and it uses a small dictionary which affects its performance. The learning performance of recognition system is observed. It is found that system can learn well from the small set of training data and increase the rate of learning on the increment of training size.展开更多
Spam is a universal problem with which everyone is familiar. A number of approaches are used for Spam filtering. The most common filtering technique is content-based filtering which uses the actual text of message to ...Spam is a universal problem with which everyone is familiar. A number of approaches are used for Spam filtering. The most common filtering technique is content-based filtering which uses the actual text of message to determine whether it is Spam or not. The content is very dynamic and it is very challenging to represent all information in a mathematical model of classification. For instance, in content-based Spam filtering, the characteristics used by the filter to identify Spam message are constantly changing over time. Na?ve Bayes method represents the changing nature of message using probability theory and support vector machine (SVM) represents those using different features. These two methods of classification are efficient in different domains and the case of Nepali SMS or Text classification has not yet been in consideration;these two methods do not consider the issue and it is interesting to find out the performance of both the methods in the problem of Nepali Text classification. In this paper, the Na?ve Bayes and SVM-based classification techniques are implemented to classify the Nepali SMS as Spam and non-Spam. An empirical analysis for various text cases has been done to evaluate accuracy measure of the classification methodologies used in this study. And, it is found to be 87.15% accurate in SVM and 92.74% accurate in the case of Na?ve Bayes.展开更多
文摘Named Entity Recognition aims to identify and to classify rigid designators in text such as proper names, biological species, and temporal expressions into some predefined categories. There has been growing interest in this field of research since the early 1990s. Named Entity Recognition has a vital role in different fields of natural language processing such as Machine Translation, Information Extraction, Question Answering System and various other fields. In this paper, Named Entity Recognition for Nepali text, based on the Support Vector Machine (SVM) is presented which is one of machine learning approaches for the classification task. A set of features are extracted from training data set. Accuracy and efficiency of SVM classifier are analyzed in three different sizes of training data set. Recognition systems are tested with ten datasets for Nepali text. The strength of this work is the efficient feature extraction and the comprehensive recognition techniques. The Support Vector Machine based Named Entity Recognition is limited to use a certain set of features and it uses a small dictionary which affects its performance. The learning performance of recognition system is observed. It is found that system can learn well from the small set of training data and increase the rate of learning on the increment of training size.
文摘Spam is a universal problem with which everyone is familiar. A number of approaches are used for Spam filtering. The most common filtering technique is content-based filtering which uses the actual text of message to determine whether it is Spam or not. The content is very dynamic and it is very challenging to represent all information in a mathematical model of classification. For instance, in content-based Spam filtering, the characteristics used by the filter to identify Spam message are constantly changing over time. Na?ve Bayes method represents the changing nature of message using probability theory and support vector machine (SVM) represents those using different features. These two methods of classification are efficient in different domains and the case of Nepali SMS or Text classification has not yet been in consideration;these two methods do not consider the issue and it is interesting to find out the performance of both the methods in the problem of Nepali Text classification. In this paper, the Na?ve Bayes and SVM-based classification techniques are implemented to classify the Nepali SMS as Spam and non-Spam. An empirical analysis for various text cases has been done to evaluate accuracy measure of the classification methodologies used in this study. And, it is found to be 87.15% accurate in SVM and 92.74% accurate in the case of Na?ve Bayes.