Abstract: Classification is a data mining process that predicts predetermined target classes accurately from learned data. This study develops a data classification algorithm based on the fuzzy soft set method. The fuzzy soft set is calculated from the normalized Hamming distance. A fuzzy approximation function maps each parameter to a fuzzy subset of the universe, that is, to an element of the fuzzy power set. In the classification step, a generalized normalized Euclidean distance determines the similarity between two fuzzy soft sets. The experiments used University of California, Irvine (UCI) Machine Learning Repository data to assess the accuracy of the proposed method. The samples were divided into a training set (75% of samples) and a test set (25% of samples). Experiments were performed in MATLAB R2010a. The experiments showed that: (1) the fastest sequence is matching function, distance measure, similarity, and normalized Euclidean distance; (2) the proposed approach improves accuracy and recall by up to 10.3436% and 6.9723%, respectively, compared with baseline techniques. Hence, the fuzzy soft set method is appropriate for classifying data.
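The distance-based classification step described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes features are already scaled to membership values in [0, 1], it summarizes each class by the mean membership vector of its training samples, and it uses the plain (not generalized) normalized Euclidean distance. The function names `fit_class_prototypes` and `predict` are hypothetical.

```python
import numpy as np

def normalized_hamming(a, b):
    """Normalized Hamming distance: mean absolute membership difference."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean(np.abs(a - b)))

def normalized_euclidean(a, b):
    """Normalized Euclidean distance: root of the mean squared difference."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def fit_class_prototypes(X, y):
    """Assumption: each class is summarized by its mean membership vector."""
    X, y = np.asarray(X, float), np.asarray(y)
    return {label: X[y == label].mean(axis=0) for label in np.unique(y).tolist()}

def predict(prototypes, x, distance=normalized_euclidean):
    """Return the class whose prototype is most similar (similarity = 1 - distance)."""
    return max(prototypes, key=lambda label: 1.0 - distance(prototypes[label], x))

# Toy membership data, invented for illustration only.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y_train = ["high", "high", "low", "low"]
prototypes = fit_class_prototypes(X_train, y_train)
print(predict(prototypes, [0.7, 0.6]))
```

Passing `distance=normalized_hamming` to `predict` swaps the similarity measure without changing the rest of the sketch.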
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60572084 and 60621062.
Abstract: In a question answering (QA) system, the fundamental problem is how to measure the distance between a question and an answer, and hence how to rank different answers. We demonstrate that such a distance can be defined precisely and mathematically. Not only is such a definition possible, it is provably better than any other feasible definition, and it can be applied conveniently and fruitfully to construct a QA system. We have built such a system, QUANTA. Extensive experiments are conducted to justify the new theory.
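Ranking answers by their distance to a question can be sketched in Python as follows. The abstract does not state which distance QUANTA uses, so this sketch swaps in the normalized compression distance computed with `zlib` purely as a stand-in; the function names `ncd` and `rank_answers` are hypothetical, and compression-based distances are crude on strings this short.

```python
import zlib

def compressed_size(text):
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8")))

def ncd(x, y):
    """Normalized compression distance, a stand-in for the paper's distance."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def rank_answers(question, answers, distance=ncd):
    """Sort candidate answers from closest to farthest from the question."""
    return sorted(answers, key=lambda answer: distance(question, answer))

# Toy candidates, invented for illustration only.
question = "What is the capital of France?"
answers = [
    "Bananas are yellow.",
    "The capital of France is Paris.",
    "Paris is a city in Europe.",
]
for answer in rank_answers(question, answers):
    print(round(ncd(question, answer), 3), answer)
```

Because `rank_answers` takes the distance as a parameter, any other question-answer distance can replace `ncd` without touching the ranking logic.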
Funding: Supported by the National Natural Science Foundation of China (61133012, 61202193, 61373108), the Major Projects of the National Social Science Foundation of China (11&ZD189), the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722), and the Open Foundation of Shandong Key Laboratory of Language Resource Development and Application.
Abstract: In this paper we propose a multiple feature approach for the normalization task, which maps each disorder mention in the text to a unique Unified Medical Language System (UMLS) concept unique identifier (CUI). We develop a two-step method: first, acquire a list of candidate CUIs and their associated preferred names using the UMLS API; second, choose the closest CUI by calculating the similarity between the input disorder mention and each candidate. The similarity calculation step is formulated as a classification problem, and multiple features (string features, ranking features, similarity features, and contextual features) are used to normalize the disorder mentions. The results show that the multiple feature approach improves the accuracy of the normalization task from 32.99% to 67.08% compared with the MetaMap baseline.
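The second step, choosing the closest candidate for a disorder mention, can be illustrated with a small Python sketch. Several things here are assumptions rather than the paper's method: the candidate list is hard-coded instead of being fetched from the UMLS API, the identifiers are placeholders and not real CUIs, only three string features are computed, and an unweighted average of those features stands in for the trained classifier the abstract describes. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def string_features(mention, candidate_name):
    """Three simple string features between a mention and a preferred name."""
    m, c = mention.lower(), candidate_name.lower()
    m_tokens, c_tokens = set(m.split()), set(c.split())
    union = m_tokens | c_tokens
    return {
        "exact_match": float(m == c),
        "char_ratio": SequenceMatcher(None, m, c).ratio(),
        "token_jaccard": len(m_tokens & c_tokens) / len(union) if union else 0.0,
    }

def pick_closest(mention, candidates):
    """Return the (identifier, preferred_name) pair with the highest mean feature value.

    Assumption: an unweighted mean replaces the paper's learned classifier.
    """
    def score(candidate):
        features = string_features(mention, candidate[1])
        return sum(features.values()) / len(features)
    return max(candidates, key=score)

# Placeholder identifiers, not real CUIs; names chosen for illustration only.
candidates = [
    ("CUI_A", "acute myocardial infarction"),
    ("CUI_B", "cerebral infarction"),
]
print(pick_closest("Myocardial infarction acute", candidates))
```

In a fuller version, the feature dictionary would be extended with ranking, similarity, and contextual features and passed to a trained classifier instead of being averaged.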