Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous driving. Text information in car-mounted videos can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We propose a robust Car-mounted Video Text Detector (CVTD), a lightweight text detection model that uses ResNet18 for feature extraction and can detect text of arbitrary shape. The model efficiently extracts global text positions through Coordinate Attention Threshold Activation (CATA) and stacks two Feature Pyramid Enhancement Fusion Modules (FPEFM), which strengthen feature representation by integrating local text features with global position information. The enhanced feature maps, when acted upon by Text Activation Maps (TAM), effectively distinguish text foreground from non-text regions. Additionally, we collected and annotated a dataset of 2200 Car-mounted Video Text (CVT) images under various road conditions for training and evaluating the model. We further tested the model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection speed. The model holds potential for practical applications in real-world scenarios.
Purpose: A text generation based multidisciplinary problem identification method is proposed, which does not rely on a large amount of data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique; second, it generates abstractive titles for each paper based on the abstract and research objective types using a generative pre-trained language model; third, it extracts problem phrases from the generated titles according to regular expression rules; fourth, it creates problem relation networks and identifies identical problems by exploiting a weighted community detection algorithm; finally, it identifies multidisciplinary problems based on the disciplinary labels of papers. Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: The proposed method needs to be applied in other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines. Originality/value: This paper proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
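The third step of the method above (extracting problem phrases from generated titles with regular expression rules) can be illustrated with a minimal sketch. The abstract does not give the actual rules, so the pattern below is a hypothetical one: it assumes the problem phrase follows a cue word such as "for" or "of" and ends before a method cue such as "based on" or "using". The names `PROBLEM_PATTERN` and `extract_problem_phrase` are illustrative, not from the paper.

```python
import re

# Hypothetical rule (an assumption, not the paper's rule set): the problem
# phrase follows "for"/"of" and stops before a method cue or the end of title.
PROBLEM_PATTERN = re.compile(
    r"\b(?:for|of)\s+(?P<problem>.+?)(?=\s+(?:based on|using|via|with)\b|$)",
    re.IGNORECASE,
)


def extract_problem_phrase(title):
    """Return the lower-cased problem phrase in a title, or None if no rule fires."""
    match = PROBLEM_PATTERN.search(title)
    return match.group("problem").strip().lower() if match else None


if __name__ == "__main__":
    titles = [
        "A novel method for carbon emission prediction based on deep learning",
        "Optimization of battery recycling using reinforcement learning",
        "Deep learning survey",
    ]
    for title in titles:
        print(title, "->", extract_problem_phrase(title))
```

Phrases extracted this way would then serve as nodes in the problem relation network described in the fourth step; a real rule set would likely need many more patterns than this single one.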
To address the absence of a labeled dataset for Yi characters and the complexity of Yi character detection and recognition, we present a deep learning-based approach for Yi character detection and recognition. In the detection stage, an improved Differentiable Binarization Network (DBNet) framework is introduced to detect Yi characters, in which Omni-dimensional Dynamic Convolution (ODConv) is combined with the ResNet-18 feature extraction module to obtain multi-dimensional complementary features, thereby improving the accuracy of Yi character detection. Then, the feature pyramid network fusion module is used to further extract Yi character image features, improving target recognition at different scales. Next, the resulting feature map is passed through a head network to produce two maps of the same size as the original image: a probability map and an adaptive threshold map. These maps are then subjected to a differentiable binarization process, resulting in an approximate binarization map that helps to identify the boundaries of the text boxes. Finally, the text detection box is generated in the post-processing stage. In the recognition stage, an improved lightweight MobileNetV3 framework is used to recognize the detected character regions, where the original Squeeze-and-Excitation (SE) block is replaced by the efficient Shuffle Attention (SA), which integrates spatial and channel attention, improving the accuracy of Yi character recognition. Meanwhile, the use of depthwise separable convolution and the inverted residual structure reduces the number of parameters and the computation of the model, so that the model can better understand contextual information and improve the accuracy of text recognition. The experimental results illustrate that the proposed method achieves good results in detecting and recognizing Yi characters, with detection and recognition accuracy rates of 97.5% and 96.8%, respectively. We also compared the proposed detection and recognition algorithms with other typical algorithms. In these comparisons, the proposed model achieves better detection and recognition results with a certain reliability.
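The differentiable binarization step mentioned in the abstract above can be sketched as follows. The formula is the one from the original DBNet paper, B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T the adaptive threshold map, and k an amplifying factor (50 in the original DBNet). Whether the improved model in this abstract keeps the same k is not stated, so treat the default as an assumption; the function name is illustrative.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binarization B = 1 / (1 + exp(-k * (P - T))), element-wise.

    prob_map and thresh_map are equally sized 2D lists with values in [0, 1].
    k=50 follows the original DBNet paper; it is an assumption for the
    improved model described in the abstract.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    prob = [[0.9, 0.1], [0.3, 0.6]]
    thresh = [[0.3, 0.3], [0.3, 0.4]]
    for row in differentiable_binarization(prob, thresh):
        print([round(value, 4) for value in row])
```

Because the steep sigmoid stays differentiable, the threshold map can be learned jointly with the probability map instead of being fixed by hand, which is the point of the technique.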
Research on fires at the wildland-urban interface (WUI) has generated significant insights and advancements across various fields of study. Environmental, agricultural, and social sciences have played prominent roles in understanding the impacts of fires on the environment, in protecting communities, and in addressing management challenges. This study aimed to create a database, using a text mining technique, for global researchers interested in WUI projects and to highlight the interest of countries in this field. Author keyword analysis emphasized the dominance of fire science-related terms, especially those related to the WUI, and identified keyword clusters related to the WUI fire risk assessment system (“exposure”, “danger”, and “vulnerability”) within wildfire research. Trends over the past decade show shifting research interests with a growing focus on WUI fires, while regional variations highlight that the “exposure” keyword cluster received greater attention in southern Europe and South America. However, vulnerability keywords have a relatively low representation across all regions. The analysis underscores the interdisciplinary nature of WUI research and emphasizes the need for targeted approaches to address the unique challenges of the wildland-urban interface. Overall, this study provides valuable insights for researchers and serves as a foundation for further collaboration in this field through an understanding of trends over recent years and across regions.
Title: Radiological imaging methods: a comprehensive overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnostic capabilities they offer, as well as recent advances in the field. Materials and Methods: This paper provides an overview of conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Additionally, recent advances in radiological imaging are discussed, such as imaging diagnosis and modern computer-aided diagnosis systems. Results: This paper details the differences between the imaging techniques, the benefits of each, and the current advances in the field that aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool in modern medicine for assisting medical diagnosis. This work provides an overview of the types of imaging techniques used, the recent advances made, and their potential applications.
Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Text classification, by automatically categorizing texts, is one of the foundational elements of natural language processing applications. This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata (Wikipedia database) knowledge base and BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in the field of natural language processing (NLP), the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The adopted methodology encompasses a comprehensive approach that includes text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that the integration of entity-relation information can significantly enhance algorithm performance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the utilization of distantly supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distantly supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it contributes to the development of multilingual text classification systems by adding deeper meaning to text content, thereby providing a valuable addition to current NLP studies and setting a reference point for future research.
Optical Character Recognition (OCR) is a technology that converts text images from paper documents into a digital format using electronic devices such as scanners and digital cameras. This process transforms the captured text images into editable and searchable versions using text recognition technology. With advancements in deep learning, AI models have become increasingly pivotal in applications that must run on mobile devices without network connectivity, including small underwater devices, high-altitude environments, and license plate recognition systems in front-end cameras. Despite the maturity of general OCR models, there is a notable scarcity of OCR algorithms that are compatible with embedded single-chip microcomputers. Such models, capable of functioning autonomously at the front end without network support, are particularly crucial for remote applications. However, virtually no models for single-chip systems currently support recognition of the Mongolian language. This study focuses on the development of an OCR system designed for single-chip microcomputers operating without network connectivity. The system is engineered to perform character recognition for Mongolian, English, and Chinese scripts, thereby expanding the utility of front-end single-chip devices. Specifically, the research introduces a novel approach to the recognition of modern Mongolian characters, broadening the scope of OCR systems in linguistically diverse contexts.
Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices like drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating the Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work strikes a balance between speed and accuracy in scene text detection, making it well suited for performance-constrained environments.
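To see why replacing standard convolution with depth-wise convolution lightens a backbone, as the abstract above describes, a short parameter count helps. The sketch assumes the common depthwise separable form (a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution) and ignores bias terms; the abstract does not specify the exact layer layout in YOLOv5ST, and the function names are illustrative.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)          # 73728
    sep = depthwise_separable_params(c_in, c_out, k)    # 8768
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable layer needs roughly an eighth to a ninth of the weights, which is consistent with trading a small amount of accuracy for speed.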
Hierarchical Text Classification (HTC) aims to match text to hierarchical labels. Existing methods overlook two critical issues. First, some texts cannot be fully matched to leaf node labels and need to be classified to the correct parent node instead of treating leaf nodes as the final classification target. Second, error propagation occurs when a misclassification at a parent node propagates down the hierarchy, ultimately leading to inaccurate predictions at the leaf nodes. To address these limitations, we propose an uncertainty-guided, depth-aware HTC model called DepthMatch. Specifically, we design an early stopping strategy with uncertainty to identify texts that only partially match the labels and classify them into the corresponding parent node labels. This approach allows us to dynamically determine the classification depth by leveraging evidence to quantify and accumulate uncertainty. Experimental results show that the proposed DepthMatch outperforms recent strong baselines on four commonly used public datasets: WOS (Web of Science), RCV1-V2 (Reuters Corpus Volume I), AAPD (Arxiv Academic Paper Dataset), and BGC. Notably, on the BGC dataset, it improves Micro-F1 and Macro-F1 scores by at least 1.09% and 1.74%, respectively.
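The idea of stopping early at a parent node when accumulated uncertainty grows too large can be sketched as below. This is not the DepthMatch implementation: it assumes the standard evidential formulation in which non-negative evidence e_k gives Dirichlet parameters alpha_k = e_k + 1 and uncertainty u = K / sum(alpha), and it assumes uncertainty is accumulated by simple summation against a hypothetical threshold `max_uncertainty`. For brevity the label set at each level is passed in directly rather than derived from the chosen parent.

```python
def subjective_uncertainty(evidence):
    """Uncertainty u = K / sum(e_k + 1) for K classes with non-negative evidence."""
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    return num_classes / strength


def predict_path(level_evidence, level_labels, max_uncertainty=1.0):
    """Descend the hierarchy level by level; stop once accumulated uncertainty
    exceeds max_uncertainty (a hypothetical stopping rule), so the prediction
    ends at the deepest sufficiently certain node."""
    path = []
    accumulated = 0.0
    for evidence, labels in zip(level_evidence, level_labels):
        accumulated += subjective_uncertainty(evidence)
        if accumulated > max_uncertainty:
            break  # too uncertain: keep the parent as the final label
        best = max(range(len(evidence)), key=evidence.__getitem__)
        path.append(labels[best])
    return path


if __name__ == "__main__":
    evidence = [[18.0, 0.0], [1.0, 1.0, 0.0]]
    labels = [["science", "sports"], ["physics", "chemistry", "biology"]]
    print(predict_path(evidence, labels, max_uncertainty=0.5))
```

In this example the first level is confident (u = 0.1) while the second is not (u = 0.6), so with a threshold of 0.5 the text is assigned to the parent label only.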
The developed system for eye and face detection using Convolutional Neural Network (CNN) models, followed by eye classification and voice-based assistance, has shown promising potential in enhancing accessibility for individuals with visual impairments. The modular approach implemented in this research allows for a seamless flow of information and assistance between the different components of the system. This research contributes to the field of accessibility technology by integrating computer vision, natural language processing, and voice technologies. By leveraging these advancements, the developed system offers a practical and efficient solution for assisting blind individuals. The modular design ensures flexibility, scalability, and ease of integration with existing assistive technologies. However, further research and improvements are necessary to enhance the system’s accuracy and usability. Fine-tuning the CNN models and expanding the training dataset can improve eye and face detection as well as eye classification capabilities. Additionally, incorporating real-time responses through sophisticated natural language understanding techniques and expanding the knowledge base of ChatGPT can enhance the system’s ability to provide comprehensive and accurate responses. Overall, this research paves the way for the development of more advanced and robust systems for assisting visually impaired individuals. By leveraging cutting-edge technologies and integrating them into a modular framework, this research contributes to creating a more inclusive and accessible society for individuals with visual impairments. Future work can focus on refining the system, addressing its limitations, and conducting user studies to evaluate its effectiveness and impact in real-world scenarios.
Video description generates natural language sentences that describe the subject, verb, and objects of the targeted video. Video description has been used to help visually impaired people understand video content. It also plays an essential role in developing human-robot interaction. Dense video description is more difficult than simple video captioning because of object interactions and overlapping events. Deep learning is changing the shape of computer vision (CV) and natural language processing (NLP) technologies. There are hundreds of deep learning models, datasets, and evaluations that can address the gaps in current research. This article fills this gap by evaluating some state-of-the-art approaches, focusing especially on deep learning and machine learning for video captioning in a dense environment. In this article, some classic techniques based on existing machine learning are reviewed. It also presents deep learning models and details of benchmark datasets with their respective domains. The paper reviews various evaluation metrics, including Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Word Mover’s Distance (WMD), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), with their pros and cons. Finally, this article lists some future directions and proposes work on context enhancement using key scene extraction with object detection in a particular frame, in particular how to improve the context of video description by detecting key frames through morphological image analysis. Additionally, the paper discusses a novel approach involving sentence reconstruction and context improvement through key frame object detection, which incorporates the fusion of large language models for refining results. The final results arise from enhancing the generated text of the proposed model by improving the predicted text and isolating objects using various keyframes. These keyframes identify dense events occurring in the video sequence.
The exponential growth of literature is constraining researchers’ access to comprehensive information in related fields. While natural language processing (NLP) may offer an effective solution to literature classification, it remains hindered by the lack of labelled datasets. In this article, we introduce a novel method for generating literature classification models through semi-supervised learning, which can generate labelled datasets iteratively with limited human input. We apply this method to train NLP models for classifying literature related to several research directions, i.e., battery, superconductor, topological material, and artificial intelligence (AI) in materials science. The trained NLP ‘battery’ model, applied to a larger dataset different from the training and testing datasets, achieves an F1 score of 0.738, which indicates the accuracy and reliability of this scheme. Furthermore, our approach demonstrates that even with insufficient data, the not-well-trained model in the first few cycles can identify the relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions.
The potential of text analytics is revealed by Machine Learning (ML) and Natural Language Processing (NLP) techniques. In this paper, we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators (URLs). The proposed framework includes three categories of features, both ML and Deep Learning (DL) algorithms, and a ranking schema. To extract features from text, we apply frequency-based embeddings, such as the hash vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF), and prediction-based embeddings, such as word2vec (continuous bag of words, skip-gram) from Google. Further, we apply more state-of-the-art methods to create vectorized features, such as GloVe. Additionally, feature engineering that is specific to URL structure is deployed to detect scams and other threats. For framework assessment, four ranking indicators are weighted: computational time, and performance measured as accuracy, F1 score, and type II error. For the computational time, we propose a new metric, Feature Building Time (FBT), because cutting-edge feature builders (like doc2vec or GloVe) require more time. By applying the proposed assessment step, the skip-gram algorithm of word2vec surpasses other feature builders in performance. Additionally, eXtreme Gradient Boost (XGB) outperforms other classifiers. With this setup, we attain an accuracy of 99.5% and an F1 score of 0.99.
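The URL-structure feature engineering mentioned in the abstract above might look like the following minimal sketch. The abstract does not list the actual features, so every feature here (length, dots in the host, digit count, presence of "@", scheme, path depth, number of query parameters) is an assumed, commonly used example, and `url_structure_features` is a hypothetical helper name. Only the Python standard library is used.

```python
from urllib.parse import urlparse


def url_structure_features(url):
    """Return a dict of simple structural features for one URL.

    The feature set is illustrative (an assumption), not the paper's.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "num_query_params": len(parsed.query.split("&")) if parsed.query else 0,
    }


if __name__ == "__main__":
    example = "http://login.example.com/secure/update?id=42&next=home"
    for name, value in url_structure_features(example).items():
        print(f"{name}: {value}")
```

Features like these would be concatenated with the embedding-based features before being passed to a classifier such as the gradient-boosted model named in the abstract.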
Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness. The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network. By training this network, the model learns to generate text based on input facts. Building upon this foundation, the input text is passed to the discriminator along with the generated text. Through adversarial training, the model is encouraged to generate text that the discriminator cannot distinguish from the input text, thereby enhancing the quality of the generated text. To further improve factual correctness, inspired by natural language inference systems, an entailment recognition task is trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is added to the loss of our model, forcing the generator to generate text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and factual correctness of the text while sacrificing only a small amount of diversity. Furthermore, in a comprehensive evaluation of diversity and quality metrics, our model also demonstrates the best performance.
To remove handwritten text from an image of a document taken by a smartphone, an intelligent removal method is proposed that combines dewarping with a Fully Convolutional Network with Atrous Convolution and Atrous Spatial Pyramid Pooling (FCN-AC-ASPP). For a picture taken by a smartphone, the image is first transformed into a regular image by the dewarping algorithm. Second, the FCN-AC-ASPP is used to classify printed and handwritten text. Last, the handwritten text is removed by a simple algorithm. Experiments show that the classification accuracy of the FCN-AC-ASPP is better than that of FCN, DeeplabV3+, and FCN-AC. For handwritten text removal, the method combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASPP alone.
This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of Vector Space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle with accurately capturing document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS’s impact on model accuracy using Information Retrieval metrics like Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS’s effectiveness in reducing the inverted index size, which is crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. Results reveal OWS’s superiority, achieving a 54% Recall and 81% MAP, and a notable 38% reduction in the inverted index size. This highlights OWS’s potential in optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS’s capabilities in information retrieval methodologies.
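For context, the conventional tf-idf weighting that the abstract above names as a baseline can be sketched in a few lines. This is not the Orbit Weighting Scheme, whose formula the abstract does not give; it is one common tf-idf variant (relative term frequency times log(N / df)), chosen here as an assumption, with an illustrative function name.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    tf is the relative term frequency in the document and idf is
    log(N / df); other tf-idf variants smooth or scale these differently.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [["orbit", "weight", "index"], ["orbit", "query"]]
    for doc_weights in tf_idf(corpus):
        print({term: round(w, 3) for term, w in doc_weights.items()})
```

A term that appears in every document, such as "orbit" here, gets weight zero; dropping such low-weight entries is one way a weighting scheme can shrink an inverted index, which is the efficiency dimension the abstract evaluates.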
Transmitting photos via the Internet has become a routine and significant activity. Strengthening the security measures that safeguard these images from counterfeiting and modification is a critical domain with room for further improvement. This study presents a system that employs a range of approaches and algorithms to ensure the security of transmitted venous images. The main goal of this work is to create a very effective system for compressing individual biometrics in order to improve the overall accuracy and security of digital photographs by means of image compression. This paper introduces a content-based image authentication mechanism that is suitable for use across an untrusted network and resistant to data loss during transmission. By employing scale attributes and a key-dependent parametric Long Short-Term Memory (LSTM), it is feasible to improve the resilience of digital signatures against image deterioration and strengthen their security against malicious actions. Furthermore, biometric data has been successfully transmitted in a compressed format over a wireless network. For applications involving the transmission and sharing of images across a network, the suggested technique utilizes the scalability of a structural digital signature to attain a satisfactory balance between security and picture transfer. An effective adaptive compression strategy was created to lengthen the overall lifetime of the network by sharing processing responsibilities. This scheme ensures a large reduction in computational and energy requirements while minimizing image quality loss. The approach employs multi-scale characteristics to improve the resistance of signatures against image deterioration. The proposed system attained a Gaussian noise value of 98% and a rotation accuracy surpassing 99%.
Funding: This work is supported in part by the National Natural Science Foundation of China (Grant Number 61971078), which provided domain expertise and computational power that greatly assisted the activity. This work was also financially supported by the Chongqing Municipal Education Commission Grants for Major Science and Technology Project (KJZD-M202301901) and the Science and Technology Research Project of Jiangxi Department of Education (GJJ2201049).
Funding: Supported by the General Projects of ISTIC Innovation Foundation, “Problem innovation solution mining based on text generation model” (MS2024-03).
Abstract: Purpose: A text generation based multidisciplinary problem identification method is proposed, which does not rely on a large amount of data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique; second, it generates abstractive titles for each paper based on the abstract and research objective types using a generative pre-trained language model; third, it extracts problem phrases from the generated titles according to regular expression rules; fourth, it creates problem relation networks and identifies identical problems using a weighted community detection algorithm; finally, it identifies multidisciplinary problems based on the disciplinary labels of papers. Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: The proposed method needs to be applied in other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines. Originality/value: This work proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
Funding: The work was supported by the National Natural Science Foundation of China (61972062, 62306060), the Basic Research Project of Liaoning Province (2023JH2/101300191), the Liaoning Doctoral Research Start-Up Fund Project (2023-BS-078), and the Dalian Academy of Social Sciences (2023dlsky028).
Abstract: To address the absence of a labeled dataset for Yi characters and the complexity of Yi character detection and recognition, we present a deep learning-based approach for Yi character detection and recognition. In the detection stage, an improved Differentiable Binarization Network (DBNet) framework is introduced to detect Yi characters, in which Omni-dimensional Dynamic Convolution (ODConv) is combined with the ResNet-18 feature extraction module to obtain multi-dimensional complementary features, thereby improving the accuracy of Yi character detection. Then, the feature pyramid network fusion module is used to further extract Yi character image features, improving target recognition at different scales. Next, the resulting feature map is passed through a head network to produce two maps, a probability map and an adaptive threshold map, each of the same size as the original map. These maps are then subjected to a differentiable binarization process, resulting in an approximate binarization map, which helps to identify the boundaries of the text boxes. Finally, the text detection box is generated after the post-processing stage. In the recognition stage, an improved lightweight MobileNetV3 framework is used to recognize the detected character regions, where the original Squeeze-and-Excitation (SE) block is replaced by the efficient Shuffle Attention (SA), which integrates spatial and channel attention, improving the accuracy of Yi character recognition. Meanwhile, the use of depthwise separable convolution and the inverted residual structure reduces the number of parameters and the computation of the model, so that the model can better understand contextual information and improve the accuracy of text recognition. The experimental results show that the proposed method achieves good results in detecting and recognizing Yi characters, with detection and recognition accuracy rates of 97.5% and 96.8%, respectively. We also compared the detection and recognition algorithms proposed in this paper with other typical algorithms. In these comparisons, the proposed model achieves better detection and recognition results with a certain reliability.
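The differentiable binarization step described in the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited DBNet form, B = 1 / (1 + exp(-k (P - T))), with an amplification factor k = 50; the abstract does not state the authors' settings, and the function names below are hypothetical.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binarization of one pixel.

    prob   -- value from the probability map, in [0, 1]
    thresh -- value from the adaptive threshold map, in [0, 1]
    k      -- amplification factor (assumed value, not from the abstract)
    """
    # A steep sigmoid around the per-pixel threshold: differentiable,
    # unlike a hard step function.
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the approximation element-wise to two equally sized 2-D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]
```

Pixels whose probability is well above the local threshold map to values near 1 and those well below map to values near 0, which is why the output can stand in for a hard binarization while remaining trainable.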
Funding: The funding of this research was provided by the Portuguese Foundation for Science and Technology (FCT) in the framework of the House Refuge Project (PCIF/AGT/0109/2018).
Abstract: Research on fires at the wildland-urban interface (WUI) has generated significant insights and advancements across various fields of study. Environmental, agriculture, and social sciences have played prominent roles in understanding the impacts of fires on the environment, in protecting communities, and in addressing management challenges. This study aimed to create a database using a text mining technique for global researchers interested in WUI projects, and to highlight the interest of countries in this field. Author keyword analysis emphasized the dominance of fire science-related terms, especially those related to the WUI, and identified keyword clusters related to the WUI fire risk assessment system (“exposure”, “danger”, and “vulnerability”) within wildfire research. Trends over the past decade show shifting research interests with a growing focus on WUI fires, while regional variations highlight that the “exposure” keyword cluster received greater attention in southern Europe and South America. However, vulnerability keywords have a relatively low representation across all regions. The analysis underscores the interdisciplinary nature of WUI research and emphasizes the need for targeted approaches to address the unique challenges of the wildland-urban interface. Overall, this study provides valuable insights for researchers and serves as a foundation for further collaboration in this field through an understanding of the trends over recent years and in different regions.
Abstract: Class Title: Radiological imaging method: a comprehensive overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnosis capabilities they offer, as well as recent advances in the field. Materials and Methods: This paper provides an overview of conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Additionally, recent advances in radiological imaging are discussed, such as imaging diagnosis and modern computer-aided diagnosis systems. Results: This paper details the differences between the imaging techniques, the benefits of each, and the current advances in the field to aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool in modern medicine to assist in medical diagnosis. This work provides an overview of the types of imaging techniques used, the recent advances made, and their potential applications.
Abstract: Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Abstract: Text classification, the automatic categorization of texts, is one of the foundational elements of natural language processing applications. This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from Wikidata (the Wikipedia database) and BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in the field of natural language processing (NLP), the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The adopted methodology encompasses a comprehensive approach that includes text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that the integration of entity-relation information can significantly enhance algorithm performance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the utilization of distantly supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distantly supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it contributes to the development of multilingual text classification systems by adding deeper meaning to text content, thereby providing a valuable addition to current NLP studies and a reference point for future research.
Funding: Supported by the School of Information Technology of the Mongolian University of Science and Technology, as well as the central guidance and local science and technology development fund projects (transfer and transformation projects of scientific and technological achievements), project No. 226Z1707G, “Research and development project of 3D hub size measuring machine”.
Abstract: Optical Character Recognition (OCR) is a technology that converts text images from paper documents into a digital format using electronic devices such as scanners and digital cameras. This process transforms the captured text images into editable and searchable versions using text recognition technology. With advancements in deep learning, AI models have increasingly become pivotal in applications requiring operation on mobile devices without network connectivity, including small underwater devices, high-altitude environments, and license plate recognition systems in front-end cameras. Despite the maturity of general OCR models, there is a notable scarcity of OCR algorithms that are compatible with embedded single-chip microcomputers. These models, capable of functioning autonomously at the front end without network support, are particularly crucial for remote applications. However, virtually no models for single-chip systems currently support the recognition of the Mongolian language. This study focuses on the development of an OCR system designed for single-chip microcomputers operating without network connectivity. The system is engineered to perform character recognition for Mongolian, English, and Chinese scripts, thereby expanding the utility of front-end single-chip devices. Specifically, the research introduces a novel approach to the recognition of modern Mongolian characters, broadening the scope of OCR systems in linguistically diverse contexts.
Funding: The National Natural Science Foundation of P.R. China (42075130) and Nari Technology Co., Ltd. (4561655965).
Abstract: Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices like drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating a Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work represents a significant advancement in scene text detection, striking a balance between speed and accuracy, making it well-suited for performance-constrained environments.
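The IoU thresholds quoted in the abstract above rest on a simple overlap ratio between a predicted and a ground-truth box. The following is a minimal sketch for axis-aligned boxes given as `(x1, y1, x2, y2)`; the function names are hypothetical, and the 0.5:0.95 range is assumed to follow the usual convention of ten thresholds in steps of 0.05.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Assumed convention for "0.5:0.95": thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

Averaging precision over the stricter thresholds penalizes loosely localized boxes, which is consistent with the larger drop the abstract reports for mAP at 0.5:0.95 than at 0.5.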
Funding: Sponsored by the National Key Research and Development Program of China (No. 2021YFF0704100), the National Natural Science Foundation of China (No. 62136002), the Chongqing Natural Science Foundation (No. cstc2022ycjh-bgzxm0004), and the Science and Technology Commission of Chongqing Municipality (CSTB2023NSCQ-LZX0006).
Abstract: Hierarchical Text Classification (HTC) aims to match text to hierarchical labels. Existing methods overlook two critical issues. First, some texts cannot be fully matched to leaf node labels and need to be classified to the correct parent node instead of treating leaf nodes as the final classification target. Second, error propagation occurs when a misclassification at a parent node propagates down the hierarchy, ultimately leading to inaccurate predictions at the leaf nodes. To address these limitations, we propose an uncertainty-guided, depth-aware HTC model called DepthMatch. Specifically, we design an early stopping strategy with uncertainty to identify incomplete matching between text and labels, classifying such texts into the corresponding parent node labels. This approach allows us to dynamically determine the classification depth by leveraging evidence to quantify and accumulate uncertainty. Experimental results show that the proposed DepthMatch outperforms recent strong baselines on four commonly used public datasets: WOS (Web of Science), RCV1-V2 (Reuters Corpus Volume I), AAPD (Arxiv Academic Paper Dataset), and BGC. Notably, on the BGC dataset, it improves Micro-F1 and Macro-F1 scores by at least 1.09% and 1.74%, respectively.
Abstract: The developed system for eye and face detection using Convolutional Neural Network (CNN) models, followed by eye classification and voice-based assistance, has shown promising potential in enhancing accessibility for individuals with visual impairments. The modular approach implemented in this research allows for a seamless flow of information and assistance between the different components of the system. This research contributes to the field of accessibility technology by integrating computer vision, natural language processing, and voice technologies. By leveraging these advancements, the developed system offers a practical and efficient solution for assisting blind individuals. The modular design ensures flexibility, scalability, and ease of integration with existing assistive technologies. However, it is important to acknowledge that further research and improvements are necessary to enhance the system’s accuracy and usability. Fine-tuning the CNN models and expanding the training dataset can improve eye and face detection as well as eye classification capabilities. Additionally, incorporating real-time responses through sophisticated natural language understanding techniques and expanding the knowledge base of ChatGPT can enhance the system’s ability to provide comprehensive and accurate responses. Overall, this research paves the way for the development of more advanced and robust systems for assisting visually impaired individuals. By leveraging cutting-edge technologies and integrating them into a modular framework, this research contributes to creating a more inclusive and accessible society for individuals with visual impairments. Future work can focus on refining the system, addressing its limitations, and conducting user studies to evaluate its effectiveness and impact in real-world scenarios.
Abstract: Video description generates natural language sentences that describe the subject, verb, and objects of the targeted video. Video description has been used to help visually impaired people understand video content, and it also plays an essential role in developing human-robot interaction. Dense video description is more difficult than simple video captioning because of object interactions and overlapping events. Deep learning is changing the shape of computer vision (CV) technologies and natural language processing (NLP). There are hundreds of deep learning models, datasets, and evaluations that can address the gaps in current research. This article addresses this gap by evaluating some state-of-the-art approaches, focusing especially on deep learning and machine learning for video captioning in a dense environment. Some classic techniques based on existing machine learning are reviewed, and the article presents deep learning models and details of benchmark datasets with their respective domains. It also reviews various evaluation metrics, including Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Word Mover’s Distance (WMD), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), with their pros and cons. Finally, this article lists some future directions and proposes work on context enhancement using key scene extraction with object detection in a particular frame, in particular how to improve the context of video description by detecting key frames through morphological image analysis. Additionally, the paper discusses a novel approach involving sentence reconstruction and context improvement through key frame object detection, which incorporates the fusion of large language models for refining results. The final results arise from enhancing the generated text of the proposed model by improving the predicted text and isolating objects using various keyframes. These keyframes identify dense events occurring in the video sequence.
Funding: Funded by the Informatization Plan of Chinese Academy of Sciences (Grant No. CASWX2021SF-0102), the National Key R&D Program of China (Grant Nos. 2022YFA1603903, 2022YFA1403800, and 2021YFA0718700), the National Natural Science Foundation of China (Grant Nos. 11925408, 11921004, and 12188101), and the Chinese Academy of Sciences (Grant No. XDB33000000).
Abstract: The exponential growth of literature is constraining researchers’ access to comprehensive information in related fields. While natural language processing (NLP) may offer an effective solution to literature classification, it remains hindered by the lack of labelled datasets. In this article, we introduce a novel method for generating literature classification models through semi-supervised learning, which can generate labelled datasets iteratively with limited human input. We apply this method to train NLP models for classifying literature related to several research directions, i.e., battery, superconductor, topological material, and artificial intelligence (AI) in materials science. The trained NLP ‘battery’ model, applied to a larger dataset different from the training and testing datasets, achieves an F1 score of 0.738, which indicates the accuracy and reliability of this scheme. Furthermore, our approach demonstrates that even with insufficient data, the not-well-trained model in the first few cycles can identify the relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions.
Funding: Supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS-UEFISCDI, Project Number PN-III-P4-PCE-2021-0334, within PNCDI III.
Abstract: The potential of text analytics is revealed by Machine Learning (ML) and Natural Language Processing (NLP) techniques. In this paper, we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators (URLs). The proposed framework includes three categories of features, both ML and Deep Learning (DL) algorithms, and a ranking schema. We apply frequency-based and prediction-based embeddings, such as the hash vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF), and word2vec (word to vector; continuous bag of words and skip-gram) from Google, to extract features from text. Further, we apply more state-of-the-art methods to create vectorized features, such as GloVe. Additionally, feature engineering that is specific to URL structure is deployed to detect scams and other threats. For framework assessment, four ranking indicators are weighted: computational time, and performance measured as accuracy, F1 score, and type II error. For the computational time, we propose a new metric, Feature Building Time (FBT), because cutting-edge feature builders (like doc2vec or GloVe) require more time. By applying the proposed assessment step, the skip-gram algorithm of word2vec surpasses other feature builders in performance. Additionally, eXtreme Gradient Boost (XGB) outperforms other classifiers. With this setup, we attain an accuracy of 99.5% and an F1 score of 0.99.
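The URL-specific feature engineering mentioned in the abstract above can be illustrated with a minimal sketch using only the Python standard library. The feature names and the choice of features are hypothetical; the abstract does not list the features the authors actually use.

```python
from urllib.parse import urlparse


def url_features(url):
    """Extract a few simple lexical features from a URL string.

    All feature names here are illustrative assumptions, not the
    feature set of the framework described in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. /a/b -> 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }
```

A dictionary like this could be concatenated with text embeddings (for example TF-IDF or word2vec vectors) before being passed to a classifier.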
Funding: Supported by the Science and Technology Department of Sichuan Province (No. 2021YFG0156).
Abstract: Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness. The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network. Training this network enables the model to generate text based on input facts. Building upon this foundation, the input text is passed to the discriminator along with the generated text. Through adversarial training, the model is encouraged to generate text that the discriminator cannot distinguish from the input, thereby enhancing the quality of the generated text. To further improve factual correctness, inspired by natural language inference systems, an entailment recognition task is trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is added to the loss of our model, forcing the generator to generate text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and factual correctness of the text while sacrificing only a small amount of diversity. Furthermore, in a comprehensive evaluation of diversity and quality metrics, our model also demonstrates the best performance.
Funding: Sponsored by the Scientific Research Project of Zhejiang Provincial Department of Education (Grant No. KYY-ZX-20210329).
Abstract: To remove handwritten text from an image of a document taken by a smartphone, an intelligent removal method is proposed that combines dewarping with a Fully Convolutional Network with Atrous Convolution and Atrous Spatial Pyramid Pooling (FCN-AC-ASPP). For a picture taken by a smartphone, the image is first transformed into a regular image by the dewarping algorithm. Second, the FCN-AC-ASPP is used to classify printed and handwritten text. Lastly, the handwritten text is removed by a simple algorithm. Experiments show that the classification accuracy of the FCN-AC-ASPP is better than that of FCN, DeeplabV3+, and FCN-AC. In terms of handwritten text removal, the method combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASPP alone.
Abstract: This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of Vector Space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle with accurately capturing document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS’s impact on model accuracy using Information Retrieval metrics like Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS’s effectiveness in reducing the inverted index size, which is crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. Results reveal OWS’s superiority, achieving a 54% Recall and 81% MAP, and a notable 38% reduction in the inverted index size. This highlights OWS’s potential in optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS’s capabilities in information retrieval methodologies.
Abstract: Transmitting photos via the Internet has become a routine and significant activity. Enhancing the security measures that safeguard these images from counterfeiting and modification is a critical domain that can still be improved. This study presents a system that employs a range of approaches and algorithms to ensure the security of transmitted venous images. The main goal of this work is to create a very effective system for compressing individual biometrics in order to improve the overall accuracy and security of digital photographs by means of image compression. This paper introduces a content-based image authentication mechanism that is suitable for use across an untrusted network and resistant to data loss during transmission. By employing scale attributes and a key-dependent parametric Long Short-Term Memory (LSTM), it is feasible to improve the resilience of digital signatures against image deterioration and strengthen their security against malicious actions. Furthermore, biometric data has been successfully transmitted in a compressed format over a wireless network. For applications involving the transmission and sharing of images across a network, the suggested technique utilizes the scalability of a structural digital signature to attain a satisfactory equilibrium between security and picture transfer. An effective adaptive compression strategy was created to lengthen the overall lifetime of the network by sharing processing responsibilities. This scheme ensures a large reduction in computational and energy requirements while minimizing image quality loss. This approach employs multi-scale characteristics to improve the resistance of signatures against image deterioration. The proposed system attained a Gaussian noise value of 98% and a rotation accuracy surpassing 99%.