Funding: Supported in part by the Beijing Natural Science Foundation under grants M21032 and 19L2029, in part by the National Natural Science Foundation of China under grants U1836106 and 81961138010, and in part by the Scientific and Technological Innovation Foundation of Foshan under grants BK21BF001 and BK20BF010.
Abstract: Short texts are now common in social data associated with the 5G-enabled Internet of Things (IoT). Short text classification is challenging because such texts are sparse and lack context. Previous studies mainly address these problems by enhancing either the semantic information or the statistical information alone. However, the improvement achievable from a single type of information is limited, whereas fusing several types of information may improve classification accuracy more effectively. This article therefore proposes a feature fusion method that integrates the statistical feature with the comprehensive semantic feature by using a weighting mechanism and deep learning models. In the proposed method, Bidirectional Encoder Representations from Transformers (BERT) automatically generates word vectors at the sentence level; the statistical feature, the local semantic feature, and the overall semantic feature are then obtained using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN), and a Bidirectional Gated Recurrent Unit (BiGRU), respectively. These features are then fused into a single feature for classification. Experiments on five popular short text classification datasets and a 5G-enabled IoT social dataset show that the proposed method improves classification performance.
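As a rough illustration of the TF-IDF weighting and fusion idea described in the abstract, the following minimal sketch weights per-token vectors by TF-IDF and concatenates the result with a semantic feature. It is not the authors' implementation. The smoothed IDF formula, the 2-dimensional placeholder vectors standing in for BERT outputs, the placeholder `semantic` vector standing in for CNN/BiGRU outputs, fusion by concatenation, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Smoothed IDF (an assumption; the abstract does not give the formula).
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def weighted_average(weights, vectors):
    """TF-IDF-weighted mean of the token vectors of one document."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    norm = sum(weights.values())
    for tok, w in weights.items():
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    return [t / norm for t in total]


def fuse(*features):
    """Fuse feature vectors by concatenation (an assumed fusion rule)."""
    return [x for feat in features for x in feat]


# Toy corpus; the vectors are hypothetical stand-ins for BERT word vectors.
docs = [["cheap", "phone", "deal"], ["phone", "battery", "life"]]
vectors = {
    "cheap": [1.0, 0.0],
    "phone": [0.0, 1.0],
    "deal": [1.0, 1.0],
    "battery": [0.5, 0.5],
    "life": [0.0, 0.5],
}

weights = tfidf_weights(docs)
statistical = weighted_average(weights[0], vectors)  # statistical feature
semantic = [0.1, 0.2, 0.3]  # placeholder for a CNN/BiGRU semantic feature
fused = fuse(statistical, semantic)
```

In this toy corpus, "phone" occurs in both documents and therefore receives a lower weight than "cheap" or "deal", so the statistical feature leans toward the more distinctive tokens. A real pipeline would replace the placeholder vectors with learned representations and pass the fused vector to a classifier.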