10 articles found
A Comparative Study on Two Techniques of Reducing the Dimension of Text Feature Space
1
Authors: Yin Zhonghang, Wang Yongcheng, Cai Wei & Diao Qian (School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, P.R. China). Journal of Systems Engineering and Electronics (SCIE, EI, CSCD). 2002, Issue 1, pp. 87-92 (6 pages).
With the development of large-scale text processing, the dimension of the text feature space has become larger and larger, which has added many difficulties to natural language processing. How to reduce the dimension has become a practical problem in the field. Here we present two clustering methods, i.e. concept association and concept abstract, to achieve the goal. The first refers to keyword clustering based on co-occurrence in the same text, and the second refers to that in the same category. Then we compare the differences between them. Our experimental results show that both are effective at reducing the dimension of the text feature space.
Keywords: text data mining
Text Extraction with Optimal Bi-LSTM
2
Authors: Bahera H. Nayef, Siti Norul Huda Sheikh Abdullah, Rossilawati Sulaiman, Ashwaq Mukred Saeed. Computers, Materials & Continua (SCIE, EI). 2023, Issue 9, pp. 3549-3567 (19 pages).
Text extraction from images using traditional techniques of image collection and pattern recognition with machine learning is time-consuming because of the number of features extracted from the images. Deep neural networks offer effective solutions for extracting text features from images using a few techniques, with the ability to train on large image datasets with significant results. This study proposes using dual max-pooling and concatenating Convolutional Neural Network (CNN) layers with the ReLU and Optimized Leaky ReLU (OLRelu) activation functions. The proposed method works by dividing the word image into slices that contain characters, then passing them to deep learning layers to extract feature maps and re-form the predicted words. Bidirectional Long Short-Term Memory (BiLSTM) layers extract more compelling features and link the time sequence in the forward and backward directions during the training phase. The Connectionist Temporal Classification (CTC) function calculates the training and validation loss rates, and also decodes the extracted features to re-form characters and link them according to their time sequence. The proposed model's performance is evaluated using training and validation loss errors on the Mjsynth and Integrated Argument Mining Tasks (IAM) datasets. On IAM, the average loss error was 2.09% with the proposed dual max-pooling and OLRelu. On the Mjsynth dataset, the best validation loss rate shrank to 2.2% by applying concatenating CNN layers and ReLU.
Keywords: deep neural network; text features; dual max-pooling; concatenating convolution neural networks; bidirectional long short memory; text connector characteristics
Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework
3
Authors: Simona-Vasilica Oprea, Adela Bara. Computers, Materials & Continua (SCIE, EI). 2024, Issue 6, pp. 3827-3853 (27 pages).
The potential of text analytics is revealed by Machine Learning (ML) and Natural Language Processing (NLP) techniques. In this paper, we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators (URLs). The proposed framework includes three categories of features, both ML and Deep Learning (DL) algorithms, and a ranking schema. We apply frequency- and prediction-based embeddings, such as the hash vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF) and predictors, word to vector (word2vec: continuous bag of words, skip-gram) from Google, to extract features from text. Further, we apply more state-of-the-art methods to create vectorized features, such as GloVe. Additionally, feature engineering that is specific to URL structure is deployed to detect scams and other threats. For framework assessment, four ranking indicators are weighted: computational time, and performance as accuracy, F1 score and type II error. For the computational time, we propose a new metric, Feature Building Time (FBT), as the cutting-edge feature builders (like doc2vec or GloVe) require more time. By applying the proposed assessment step, the skip-gram algorithm of word2vec surpasses other feature builders in performance. Additionally, eXtreme Gradient Boost (XGB) outperforms other classifiers. With this setup, we attain an accuracy of 99.5% and an F1 score of 0.99.
Keywords: detecting malicious URL; classifiers; text to feature; deep learning; ranking algorithms; feature building time
Entropy-Based Watermarking Approach for Sensitive Tamper Detection of Arabic Text (cited by: 4)
4
Author: Fahd N. Al-Wesabi. Computers, Materials & Continua (SCIE, EI). 2021, Issue 6, pp. 3635-3648 (14 pages).
Digital text is the most common medium transferred via the internet for various purposes, and it is very sensitive to online transfer because it can be illegally altered by tampering attacks. Therefore, improving the security and authenticity of text transferred via the internet has become one of the most difficult challenges that researchers face today. Arabic text is more sensitive than other languages due to the existence of Harakat, Arabic diacritics such as Kasra and Damma, in which basic changes such as modifying diacritic arrangements can change the meaning of the text. In this paper, an intelligent hybrid solution is proposed with highly sensitive detection of any tampering with Arabic text exchanged via the internet. Natural language processing, entropy, and watermarking techniques are integrated into this method to improve the security and reliability of Arabic text without limitations on text nature or size, or on the type or volume of tampering attack. The proposed scheme is implemented, simulated, and validated using four standard Arabic datasets of varying lengths under multiple random locations of insertion, reorder, and deletion attacks. The experimental and simulation results demonstrate the accuracy of the proposed scheme's tampering detection against all kinds of tampering attacks. Comparison results show that the proposed approach outperforms all of the other baseline approaches in terms of tampering detection accuracy.
Keywords: entropy; text features; tamper detection; Arabic text; NLP
A learning-based method to detect and segment text from scene images (cited by: 3)
5
Authors: JIANG Ren-jie, QI Fei-hu, XU Li, WU Guo-rong, ZHU Kai-hua. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD). 2007, Issue 4, pp. 568-574 (7 pages).
This paper proposes a learning-based method for text detection and text segmentation in natural scene images. First, the input image is decomposed into multiple connected components (CCs) by the Niblack clustering algorithm. Then all the CCs, including text CCs and non-text CCs, are verified on their text features by a two-stage classification module, in which most non-text CCs are discarded by an attentional cascade classifier and the remaining CCs are further verified by an SVM. All the accepted CCs are output to produce a text-only binary image. Experiments with many images of different scenes showed satisfactory performance of the proposed method.
Keywords: text detection; text segmentation; text feature; attentional cascade
A Method of Text Extremum Region Extraction Based on Joint-Channels (cited by: 1)
6
Authors: Xueming Qiao, Weiyi Zhu, Dongjie Zhu, Liang Kong, Yingxue Xia, Chunxu Lin, Zhenhao Guo, Yiheng Sun. Journal on Artificial Intelligence. 2020, Issue 1, pp. 29-37 (9 pages).
Natural scene recognition has important significance and value in the fields of image retrieval, autonomous navigation, human-computer interaction and industrial automation. Firstly, non-text content takes up a relatively high proportion of a natural scene image; secondly, natural scene images have cluttered backgrounds and complex lighting conditions, angles, fonts and colors. Therefore, extracting text extreme regions efficiently from complex and varied natural scene images plays an important role in natural scene image text recognition. In this paper, a Text extremum region Extraction algorithm based on Joint-Channels (TEJC) is proposed. On the one hand, it addresses the problem that the maximally stable extremal regions (MSER) algorithm is only suitable for gray images and has difficulty processing color images. On the other hand, it addresses the MSER algorithm's high complexity and low accuracy when extracting the most stable extreme region. The proposed algorithm is tested and evaluated on the ICDAR data set. The experimental results indicate that the proposed method is superior.
Keywords: feature extraction; scene text detection; scene text feature extraction; extreme region
RC-Net: Row and Column Network with Text Feature for Parsing Floor Plan Images
7
Authors: 王腾, 孟维亮, 卢政达, 郭建伟, 肖俊, 张晓鹏. Journal of Computer Science & Technology (SCIE, EI, CSCD). 2023, Issue 3, pp. 526-539 (14 pages).
The popularity of online home design and floor plan customization has been steadily increasing. However, the manual conversion of floor plan images from books or paper materials into electronic resources can be a challenging task due to the vast amount of historical data available. By leveraging neural networks to identify and parse floor plans, the process of converting these images into electronic materials can be significantly streamlined. In this paper, we present a novel learning framework for automatically parsing floor plan images. Our key insight is that the room type text is very common and crucial in floor plan images, as it identifies the important semantic information of the corresponding room. However, this clue is rarely considered in previous learning-based methods. In contrast, we propose the Row and Column network (RC-Net) for recognizing floor plan elements by integrating the text feature. Specifically, we add a text feature branch to the network to extract text features corresponding to the room type, which guide the room type predictions. More importantly, we formulate the Row and Column constraint module (RC constraint module) to share and constrain features across the entire row and column of the feature maps, so that, as far as possible, only one type is predicted in each room, making the segmentation boundaries between different rooms more regular and cleaner. Extensive experiments on three benchmark datasets validate that our framework substantially outperforms other state-of-the-art approaches in terms of the FWIoU, mACC and mIoU metrics.
Keywords: floor plan understanding; text feature; Row and Column (RC) constraint module; Row and Column network (RC-Net)
Improved Blending Attention Mechanism in Visual Question Answering
8
Authors: Siyu Lu, Yueming Ding, Zhengtong Yin, Mingzhe Liu, Xuan Liu, Wenfeng Zheng, Lirong Yin. Computer Systems Science & Engineering (SCIE, EI). 2023, Issue 10, pp. 1149-1161 (13 pages).
Visual question answering (VQA) has attracted more and more attention in computer vision and natural language processing. Scholars are committed to studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden. The attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may result in incomplete coverage of the features. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause loss of the original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
Keywords: visual question answering; spatial attention mechanism; channel attention mechanism; image feature processing; text feature extraction
Research on text fault recognition for on-board equipment of a C3 train control system based on an integrated XGBoost algorithm
9
Authors: Lili Yue, Luyue Liu, Maoqing Li, Baodi Xiao, Xiaochun Wu. Transportation Safety and Environment (EI). 2023, Issue 4, pp. 36-44 (9 pages).
The robust guarantee of train control on-board equipment is inextricably linked to the safe functioning of a high-speed train. A fault diagnosis model of on-board equipment is built using the ensemble learning XGBoost (eXtreme Gradient Boosting) algorithm to help technicians assess the malfunction category of high-speed train control on-board equipment accurately and rapidly. The XGBoost algorithm iterates multiple decision tree models to improve the accuracy of fault diagnosis by boosting on the prediction residuals and adding regularization terms. First, the text features were extracted using the improved TF-IDF (Term Frequency-Inverse Document Frequency) approach, and 24 fault feature words were chosen and converted into weighted word vectors. Second, considering the imbalanced fault categories in the data set, the ADASYN (Adaptive Synthetic sampling) oversampling technique was used to synthesize samples of the minority fault categories. Finally, the data samples were split into training and test sets based on the fault text data of CTCS-3 train control on-board equipment recorded by Guangzhou Railway Group maintenance personnel. The XGBoost model was used to automatically locate faults in the test set after parameter tuning through grid search. Compared with other methods, the evaluation indices of the XGBoost model were significantly improved. The diagnostic accuracy reached 95.43%, which verifies the effectiveness of the method in text fault diagnosis.
Keywords: vehicle on-board equipment; unbalanced data sets; text feature extraction; XGBoost model; fault diagnosis
BHLM: Bayesian theory-based hybrid learning model for multi-document summarization
10
Authors: S. Suneetha, A. Venugopal Reddy. International Journal of Modeling, Simulation, and Scientific Computing (EI). 2018, Issue 2, pp. 229-250 (22 pages).
In order to understand and organize documents efficiently, multi-document summarization has become a prominent technique in the Internet world. As a large amount of information is available, it is necessary to summarize documents to obtain condensed information. To perform multi-document summarization, a new Bayesian theory-based Hybrid Learning Model (BHLM) is proposed in this paper. Initially, the input documents are preprocessed, and the stop words are removed. Then, the features of each sentence are extracted to determine the sentence score for summarizing the document. The extracted features are then fed into the hybrid learning model for learning. Subsequently, the learned features, training error and correlation coefficient are integrated with the Bayesian model to develop BHLM. The proposed method also assigns the class label with the help of the mean, variance and probability measures. Finally, based on the class label, the sentences are sorted to generate the final multi-document summary. The experiments are carried out in MATLAB, and the performance is analyzed using the metrics precision, recall, F-measure and ROUGE-1. The proposed model attains 99.6% precision and a 75% ROUGE-1 measure, which shows that the model can generate the final summary effectively.
Keywords: multi-document; text feature; sentence score; hybrid learning model; Bayesian theory