Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Prev...Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature together by using the weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, Convolutional Neural Network (CNN) and Bidirectional Gate Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset and the results show that our proposed method effectively improves the classification performance.展开更多
Offensive messages on social media,have recently been frequently used to harass and criticize people.In recent studies,many promising algorithms have been developed to identify offensive texts.Most algorithms analyze ...Offensive messages on social media,have recently been frequently used to harass and criticize people.In recent studies,many promising algorithms have been developed to identify offensive texts.Most algorithms analyze text in a unidirectional manner,where a bidirectional method can maximize performance results and capture semantic and contextual information in sentences.In addition,there are many separate models for identifying offensive texts based on monolin-gual and multilingual,but there are a few models that can detect both monolingual and multilingual-based offensive texts.In this study,a detection system has been developed for both monolingual and multilingual offensive texts by combining deep convolutional neural network and bidirectional encoder representations from transformers(Deep-BERT)to identify offensive posts on social media that are used to harass others.This paper explores a variety of ways to deal with multilin-gualism,including collaborative multilingual and translation-based approaches.Then,the Deep-BERT is tested on the Bengali and English datasets,including the different bidirectional encoder representations from transformers(BERT)pre-trained word-embedding techniques,and found that the proposed Deep-BERT’s efficacy outperformed all existing offensive text classification algorithms reaching an accuracy of 91.83%.The proposed model is a state-of-the-art model that can classify both monolingual-based and multilingual-based offensive texts.展开更多
Huffman[Huffman(1952)]encoding is one of the most known compression algorithms.In its basic use,only one encoding is given for the same letter in text to compress.In this paper,a text compression algorithm that is bas...Huffman[Huffman(1952)]encoding is one of the most known compression algorithms.In its basic use,only one encoding is given for the same letter in text to compress.In this paper,a text compression algorithm that is based on Huffman encoding is proposed.Huffman encoding is used to give different encodings for the same letter depending on the prefix preceding it in the word.A deterministic finite automaton(DFA)that recognizes the words of the text is constructed.This DFA records the frequencies for letters that label the transitions.Every state will correspond to one of the prefixes of the words of the text.For every state,a different Huffman encoding is defined for the letters that label the transitions leaving that state.These Huffman encodings are then used to encode the letters of the words in the text.This algorithm was implemented and experimental study showed significant reduction in compression ratio over the basic Huffman encoding.However,more time is needed to construct these codes.展开更多
Emotion Recognition in Conversations(ERC)is fundamental in creating emotionally intelligentmachines.Graph-BasedNetwork(GBN)models have gained popularity in detecting conversational contexts for ERC tasks.However,their...Emotion Recognition in Conversations(ERC)is fundamental in creating emotionally intelligentmachines.Graph-BasedNetwork(GBN)models have gained popularity in detecting conversational contexts for ERC tasks.However,their limited ability to collect and acquire contextual information hinders their effectiveness.We propose a Text Augmentation-based computational model for recognizing emotions using transformers(TA-MERT)to address this.The proposed model uses the Multimodal Emotion Lines Dataset(MELD),which ensures a balanced representation for recognizing human emotions.Themodel used text augmentation techniques to producemore training data,improving the proposed model’s accuracy.Transformer encoders train the deep neural network(DNN)model,especially Bidirectional Encoder(BE)representations that capture both forward and backward contextual information.This integration improves the accuracy and robustness of the proposed model.Furthermore,we present a method for balancing the training dataset by creating enhanced samples from the original dataset.By balancing the dataset across all emotion categories,we can lessen the adverse effects of data imbalance on the accuracy of the proposed model.Experimental results on the MELD dataset show that TA-MERT outperforms earlier methods,achieving a weighted F1 score of 62.60%and an accuracy of 64.36%.Overall,the proposed TA-MERT model solves the GBN models’weaknesses in obtaining contextual data for ERC.TA-MERT model recognizes human emotions more accurately by employing text augmentation and transformer-based encoding.The balanced dataset and the additional training samples also enhance its resilience.These findings highlight the significance of transformer-based approaches for special emotion recognition in conversations.展开更多
While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning me...While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.展开更多
基金supported in part by the Beijing Natural Science Foundation under grants M21032 and 19L2029in part by the National Natural Science Foundation of China under grants U1836106 and 81961138010in part by the Scientific and Technological Innovation Foundation of Foshan under grants BK21BF001 and BK20BF010.
文摘Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature together by using the weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, Convolutional Neural Network (CNN) and Bidirectional Gate Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset and the results show that our proposed method effectively improves the classification performance.
文摘Offensive messages on social media,have recently been frequently used to harass and criticize people.In recent studies,many promising algorithms have been developed to identify offensive texts.Most algorithms analyze text in a unidirectional manner,where a bidirectional method can maximize performance results and capture semantic and contextual information in sentences.In addition,there are many separate models for identifying offensive texts based on monolin-gual and multilingual,but there are a few models that can detect both monolingual and multilingual-based offensive texts.In this study,a detection system has been developed for both monolingual and multilingual offensive texts by combining deep convolutional neural network and bidirectional encoder representations from transformers(Deep-BERT)to identify offensive posts on social media that are used to harass others.This paper explores a variety of ways to deal with multilin-gualism,including collaborative multilingual and translation-based approaches.Then,the Deep-BERT is tested on the Bengali and English datasets,including the different bidirectional encoder representations from transformers(BERT)pre-trained word-embedding techniques,and found that the proposed Deep-BERT’s efficacy outperformed all existing offensive text classification algorithms reaching an accuracy of 91.83%.The proposed model is a state-of-the-art model that can classify both monolingual-based and multilingual-based offensive texts.
文摘Huffman[Huffman(1952)]encoding is one of the most known compression algorithms.In its basic use,only one encoding is given for the same letter in text to compress.In this paper,a text compression algorithm that is based on Huffman encoding is proposed.Huffman encoding is used to give different encodings for the same letter depending on the prefix preceding it in the word.A deterministic finite automaton(DFA)that recognizes the words of the text is constructed.This DFA records the frequencies for letters that label the transitions.Every state will correspond to one of the prefixes of the words of the text.For every state,a different Huffman encoding is defined for the letters that label the transitions leaving that state.These Huffman encodings are then used to encode the letters of the words in the text.This algorithm was implemented and experimental study showed significant reduction in compression ratio over the basic Huffman encoding.However,more time is needed to construct these codes.
文摘Emotion Recognition in Conversations(ERC)is fundamental in creating emotionally intelligentmachines.Graph-BasedNetwork(GBN)models have gained popularity in detecting conversational contexts for ERC tasks.However,their limited ability to collect and acquire contextual information hinders their effectiveness.We propose a Text Augmentation-based computational model for recognizing emotions using transformers(TA-MERT)to address this.The proposed model uses the Multimodal Emotion Lines Dataset(MELD),which ensures a balanced representation for recognizing human emotions.Themodel used text augmentation techniques to producemore training data,improving the proposed model’s accuracy.Transformer encoders train the deep neural network(DNN)model,especially Bidirectional Encoder(BE)representations that capture both forward and backward contextual information.This integration improves the accuracy and robustness of the proposed model.Furthermore,we present a method for balancing the training dataset by creating enhanced samples from the original dataset.By balancing the dataset across all emotion categories,we can lessen the adverse effects of data imbalance on the accuracy of the proposed model.Experimental results on the MELD dataset show that TA-MERT outperforms earlier methods,achieving a weighted F1 score of 62.60%and an accuracy of 64.36%.Overall,the proposed TA-MERT model solves the GBN models’weaknesses in obtaining contextual data for ERC.TA-MERT model recognizes human emotions more accurately by employing text augmentation and transformer-based encoding.The balanced dataset and the additional training samples also enhance its resilience.These findings highlight the significance of transformer-based approaches for special emotion recognition in conversations.
基金This research was funded by National Natural Science Foundation of China under Grant No.61806171Sichuan University of Science&Engineering Talent Project under Grant No.2021RC15+2 种基金Open Fund Project of Key Laboratory for Non-Destructive Testing and Engineering Computer of Sichuan Province Universities on Bridge Inspection and Engineering under Grant No.2022QYJ06Sichuan University of Science&Engineering Graduate Student Innovation Fund under Grant No.Y2023115The Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No.SUSE652A006.
文摘While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.