Abstract: Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Abstract: Online booking of homestays through e-travel portals is based on the virtual brand and perception, which are largely affected by user-generated electronic word-of-mouth (eWOM). With the objective of mining actionable insights from eWOM, this study conducted opinion mining for homestays located in four thematic areas of Kerala. Accordingly, various techniques were deployed, such as sentiment and emotional analyses, topic modeling, and clustering methods. Key themes revealed by topic modeling were breakfast, facilities provided, ambience, bathroom, cleanliness, hospitality exhibited, and satisfaction with the host. A lasso logistic regression-based predictive binary text classification model (with 97.6% accuracy) for homestay recommendations was developed. Our findings and predictive model have implications for managers and homestay owners in devising appropriate marketing strategies and improving their overall guest experience.
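For reference, the lasso (L1-penalized) logistic regression named in this abstract is commonly written as the objective below. The abstract gives no formula, so the notation is an assumption for illustration: x_i is the term-feature vector of review i, y_i in {0, 1} is the binary recommendation label, n is the number of reviews, and lambda >= 0 is the penalty weight.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i \log p_i + (1-y_i)\log(1-p_i) \Bigr]
  + \lambda \lVert \beta \rVert_1,
\qquad
p_i = \frac{1}{1+\exp\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}
```

The L1 penalty can shrink some coefficients exactly to zero, so a fitted model of this kind also acts as a term selector, which is one reason it suits sparse, high-dimensional text features.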
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61872134, in part by the Natural Science Foundation of Hunan Province under Grant 2018JJ2062, in part by the Science and Technology Development Center of the Ministry of Education under Grant 2019J01020, and in part by the 2011 Collaborative Innovative Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province.
Abstract: Text classification has become an increasingly crucial topic in natural language processing. Traditional text classification methods based on machine learning have many disadvantages, such as dimension explosion, data sparsity, and limited generalization ability. Focusing on deep learning-based text classification, this paper presents an extensive study of text classification models, including convolutional neural network-based (CNN-based), recurrent neural network-based (RNN-based), and attention mechanism-based models, among others. Many studies have shown that deep learning-based text classification methods outperform traditional methods when processing large-scale and complex datasets. The main reasons are that deep learning-based methods avoid the cumbersome feature extraction process and achieve higher prediction accuracy on large sets of unstructured data. In this paper, we also summarize the shortcomings of traditional text classification methods and introduce the deep learning-based text classification process, including text preprocessing, distributed representation of text, construction of the deep learning-based classification model, and performance evaluation.
Abstract: Text-mining technologies have substantially affected financial industries. As the data in every sector of finance have grown immensely, text mining has emerged as an important field of research in the domain of finance. Therefore, reviewing the recent literature on text-mining applications in finance can be useful for identifying areas for further research. This paper focuses on the text-mining literature related to financial forecasting, banking, and corporate finance. It also analyses the existing literature on text mining in financial applications and provides a summary of some recent studies. Finally, the paper briefly discusses various text-mining methods being applied in the financial domain, the challenges faced in these applications, and the future scope of text mining in finance.
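Abstract: Traditional cataloging classification and rule-matching methods suffer from low work efficiency, excessive dependence on expert knowledge, a lack of deep mining of the semantics of ancient texts themselves, blurred boundaries between catalog topics, and difficulty in accurately recommending domain topics for ancient texts. To address this, this paper draws on the characteristics of ancient-book corpora to explore how to accurately recommend text topic content that meets researchers' needs, in order to further advance digital humanities research. First, ancient-book corpus data previously annotated by our research group were selected for topic category labeling and view classification. Second, a semantic mining model was built that fuses the BERT (bidirectional encoder representation from transformers) pre-trained model, an improved convolutional neural network, a recurrent neural network, and a multi-head attention mechanism. Finally, a "subject-relation-object" multi-view semantic enhancement model was incorporated to build the DJ-TextRCNN (DianJi-recurrent convolutional neural networks for text classification) model, enabling finer-grained, deeper, and more multidimensional semantic mining of classical texts. The results show that the DJ-TextRCNN model achieved the best accuracy on the ancient-book topic recommendation task under every view. Under the "subject-relation-object" view, precision reached 88.54%, preliminarily achieving accurate topic recommendation for ancient texts and offering some guidance for deep, fine-grained semantic mining of Chinese culture.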
Abstract: Text visualization is concerned with the representation of text in a graphical form to facilitate comprehension of large textual data. Its aim is to improve the ability to understand and utilize the wealth of text-based information available. An essential task in any scientific research is the study and review of previous works in the specified domain, a process that is referred to as the literature survey process. This process involves identifying prior work and evaluating its relevance to the research question. With the enormous number of published studies available online in digital form, this becomes a cumbersome task for the researcher. This paper presents the design and implementation of a tool that aims to facilitate this process by identifying relevant work and suggesting clusters of articles by conceptual modeling, thus providing different options that enable the researcher to visualize a large number of articles in a graphical, easy-to-analyze form. The tool helps the researcher in analyzing and synthesizing the literature and building a conceptual understanding of the designated research area. The evaluation of the tool shows that researchers found it useful and that it supported the process of relevant work analysis given a specific research question; 70% of the evaluators found it very useful.
Abstract: Aim: To explore and analyze the feasibility of establishing a program of complex intervention in Traditional Chinese Medicine (TCM) based on the Text Mining and Interviewing method. Methods: According to MRC, constructing the program of complex intervention in TCM by the Text Mining and Interviewing method should include 4 steps: 1) establishment of an interview framework through standardized extraction from ancient documents and effective collection of modern periodical literature; 2) materialization of the interview outline based on Focus Group Interview; 3) rudimentary construction of the complex intervention program based on Semi-structured Interview; 4) evaluation of the curative effect of the complex intervention. Conclusions: It is feasible and meaningful to establish a program of complex intervention in TCM based on the Text Mining and Interviewing method.
Abstract: With the increasing interest in e-commerce shopping, customer reviews have become one of the most important elements that determine customer satisfaction regarding products. This demonstrates the importance of working with Text Mining. This study is based on the Women’s Clothing E-Commerce Reviews database, which consists of reviews written by real customers. The aim of this paper is to apply a Text Mining approach to a set of customer reviews. Each review was classified as either positive or negative by employing a classification method. Four tree-based methods were applied to solve the classification problem, namely Classification Tree, Random Forest, Gradient Boosting and XGBoost. The dataset was split into training and test sets. The results indicate that the Random Forest method overfits, XGBoost overfits if the number of trees is too high, Classification Tree is good at detecting negative reviews and bad at detecting positive reviews, and Gradient Boosting shows stable values and quality measures above 77% for the test dataset. A consensus between the applied methods is noted for important classification terms.
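The observation that a classifier can be good at detecting negative reviews yet bad at detecting positive ones is the kind of pattern that per-class recall makes visible, where overall accuracy hides it. The sketch below is a minimal illustration using only the Python standard library; the function names and the toy labels are hypothetical and are not data or code from the study.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    # Count correct predictions, keyed by the true label.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (assumption): a model biased toward "neg".
y_true = ["neg", "neg", "neg", "pos", "pos", "pos", "pos", "pos"]
y_pred = ["neg", "neg", "neg", "neg", "neg", "pos", "pos", "neg"]

recalls = per_class_recall(y_true, y_pred)
print(recalls)
print(accuracy(y_true, y_pred))
```

Computing the same measures on both the training and the test split, and comparing them, is one simple way to check for the overfitting the abstract reports.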
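Abstract: Topic classification methods for ancient texts that rely mainly on cataloging classification and rule matching suffer from low work efficiency, strong dependence on expert knowledge, reliance on a single classification criterion, and difficulty in classifying topics automatically. To address this, this paper combines the content and character features of ancient texts and attempts to derive topics that meet researchers' needs from content-based classification, promoting a shift in the digital humanities research paradigm. First, following the way the Eastern Han text Shuowen Jiezi (《说文解字》) analyzes characters, and building on a previously annotated ancient-book corpus dataset, a new four-dimensional feature dataset is constructed: "pronunciation (说) - original text (文) - structure (解) - character form (字)". Second, a four-dimensional feature vector extraction model (speaking, word, pattern, and font to vector, SWPF2vec) is designed and combined with a pre-trained model to produce fine-grained feature representations of ancient texts. Third, a topic classification model for ancient texts is built that fuses a convolutional neural network, a recurrent neural network, and a multi-head attention mechanism (dianji-recurrent convolutional neural networks for text classification, DJ-TextRCNN). Finally, the four-dimensional semantic features are incorporated to achieve multidimensional, deep, fine-grained semantic mining of ancient texts. On the ancient-text topic classification task, the DJ-TextRCNN model achieves the best accuracy under every feature dimension, reaching 76.23% accuracy with the four-dimensional "Shuowen Jiezi" features and preliminarily achieving accurate topic classification of ancient texts.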
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (Nos. 2014GB118000, 2014GB106001, 2015GB111001, 2015GB111002 and 2015GB120003) and the National Natural Science Foundation of China (Nos. 11505069, 11275079 and 11405068).
Abstract: The J-TEXT tokamak has been in operation for ten years since its first plasma was obtained at the end of 2007. The diagnostics development and main modulation systems, i.e. resonant magnetic perturbation (RMP) systems and massive gas injection (MGI) systems, are introduced in this paper. Supported by these efforts, J-TEXT has contributed to research on several topics, especially RMP physics and disruption mitigation. Both experimental and theoretical research show that RMP can lock, suppress or excite the tearing modes, depending on the RMP amplitude, the frequency difference between the RMP and the rational surface rotation, and the initial stabilities. The plasma rotation, particle transport and operation region are influenced by the RMP. Utilizing the MGI valves, disruptions have been mitigated with pure He, pure Ne, and a mixture of He and Ar (9:1). A significant runaway current plateau could be generated with moderate amounts of Ar injection. The RMP has been shown to suppress the generation of runaway current during disruptions.
Abstract: In a text, many ties are used to link the language together. Cohesion and coherence are the two important factors. In order to create coherence in discourse, writers and speakers have to use many cohesive devices within clauses and between clause complexes, including reference, ellipsis, substitution and lexical cohesion. The analysis of cohesion facilitates the use of language.
Funding: Supported by the Open Project of China Grand Canal Research Institute, Yangzhou University (DYH202211) and the Jiangsu Provincial Social Science Applied Research Excellent Project (22SYB-053).
Abstract: The historical and cultural districts of a city serve as important cultural heritage and tourism resources. This paper focused on four such districts in Yangzhou and performed semantic analysis on online public comments using ROST CM6 software. Based on the high-frequency words, attention preferences for district site elements, activities and feelings in Yangzhou's historical and cultural districts were analyzed. Through the analysis of the semantic network and public emotional tendency, the relationship between the protection and utilization of Yangzhou's historical and cultural districts and the perception and demand of users was discussed, and some suggestions for the protection, utilization and renewal of historical and cultural districts were put forward.
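The high-frequency word step can be illustrated with a short sketch. The study itself used ROST CM6; the code below is a plain standard-library stand-in, not that software's method. It assumes English comments that can be split on letters (Chinese comments would first need word segmentation), and the stopword list, function name and sample comments are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption), kept tiny for illustration.
STOPWORDS = {"the", "a", "is", "and", "of", "to", "in", "was", "very"}


def high_frequency_words(comments, top_n=5):
    """Return the top_n (word, count) pairs across all comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase and keep alphabetic runs only.
        tokens = re.findall(r"[a-z]+", comment.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical sample comments (assumption), not data from the study.
comments = [
    "The old street is beautiful and the garden is quiet",
    "Beautiful garden, crowded street",
    "The street food was very good",
]

top = high_frequency_words(comments, top_n=3)
print(top)
```

Grouping the resulting words by hand into site elements, activities and feelings would mirror the three preference categories the abstract describes.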
Funding: Supported by the General Projects of ISTIC Innovation Foundation “Problem innovation solution mining based on text generation model” (MS2024-03).
Abstract: Purpose: A text generation-based multidisciplinary problem identification method is proposed, which does not rely on a large amount of data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique; second, it generates abstractive titles for each paper based on the abstract and research objective types using a generative pre-trained language model; third, it extracts problem phrases from the generated titles according to regular expression rules; fourth, it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm; finally, it identifies multidisciplinary problems based on the disciplinary labels of papers. Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: The proposed method needs to be applied in other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines. Originality/value: This work proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
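The third and final steps of the pipeline above (regular-expression extraction of problem phrases, then identification of problems spanning several disciplines) can be sketched as follows. The abstract does not list its actual rules, so the two patterns here are hypothetical. The sketch also substitutes exact string matching for the weighted community detection the method uses to merge equivalent problems. Titles and discipline labels are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical extraction rules (assumption); the source does not give its own.
RULES = [
    # "<method> for <problem> [based on|using ...]"
    re.compile(r"\bfor (?P<problem>.+?)(?: based on | using |$)", re.IGNORECASE),
    # "<problem> based on|using <method>"
    re.compile(r"^(?P<problem>.+?) (?:based on|using) ", re.IGNORECASE),
]


def extract_problem(title):
    """Return the first problem phrase matched by a rule, or None."""
    for rule in RULES:
        match = rule.search(title)
        if match:
            return match.group("problem").strip().lower()
    return None


def multidisciplinary_problems(papers):
    """papers: iterable of (generated_title, discipline_label) pairs.

    Problems are merged by exact string equality here, a simplification
    of the community-detection step described in the abstract.
    """
    disciplines = defaultdict(set)
    for title, label in papers:
        problem = extract_problem(title)
        if problem is not None:
            disciplines[problem].add(label)
    # Keep only problems that appear under at least two disciplines.
    return {p: sorted(d) for p, d in disciplines.items() if len(d) >= 2}


# Hypothetical generated titles and labels (assumption).
papers = [
    ("A deep learning method for carbon emission forecasting based on satellite data",
     "computer science"),
    ("Econometric models for carbon emission forecasting", "economics"),
    ("Carbon price prediction using graph neural networks", "computer science"),
]

result = multidisciplinary_problems(papers)
print(result)
```

Exact matching misses paraphrased problems such as "forecasting carbon emissions", which is the gap a similarity-weighted problem network with community detection is meant to close.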