AIM:To assess the possibility of using different large language models(LLMs)in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surfa...AIM:To assess the possibility of using different large language models(LLMs)in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surface diseases:ChatGPT-4,ChatGPT-3.5,Claude 2,PaLM2,and SenseNova.METHODS:A group of experienced ophthalmology professors were asked to develop a 100-question singlechoice question on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions.The exam includes questions on the following topics:keratitis disease(20 questions),keratoconus,keratomalaciac,corneal dystrophy,corneal degeneration,erosive corneal ulcers,and corneal lesions associated with systemic diseases(20 questions),conjunctivitis disease(20 questions),trachoma,pterygoid and conjunctival tumor diseases(20 questions),and dry eye disease(20 questions).Then the total score of each LLMs and compared their mean score,mean correlation,variance,and confidence were calculated.RESULTS:GPT-4 exhibited the highest performance in terms of LLMs.Comparing the average scores of the LLMs group with the four human groups,chief physician,attending physician,regular trainee,and graduate student,it was found that except for ChatGPT-4,the total score of the rest of the LLMs is lower than that of the graduate student group,which had the lowest score in the human group.Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers,giving very little chance of an incorrect answer.ChatGPT-4 showed higher credibility when answering questions,with a success rate of 59%,but gave the wrong answer to the question 28% of the time.CONCLUSION:GPT-4 model exhibits excellent performance in both answer relevance and confidence.PaLM2 shows a positive correlation(up to 0.8)in terms of answer accuracy during the exam.In terms of answer confidence,PaLM2 is second only to GPT4 and surpasses Claude 2,SenseNova,and GPT-3.5.Despite the fact that ocular surface disease is a highly specialized discipline,GPT-4 still exhibits superior performance,suggesting that its potential and ability to be applied in this field is enormous,perhaps with the potential to be a valuable resource for medical students and clinicians in the future.展开更多
Since the 1950s,when the Turing Test was introduced,there has been notable progress in machine language intelligence.Language modeling,crucial for AI development,has evolved from statistical to neural models over the ...Since the 1950s,when the Turing Test was introduced,there has been notable progress in machine language intelligence.Language modeling,crucial for AI development,has evolved from statistical to neural models over the last two decades.Recently,transformer-based Pre-trained Language Models(PLM)have excelled in Natural Language Processing(NLP)tasks by leveraging large-scale training corpora.Increasing the scale of these models enhances performance significantly,introducing abilities like context learning that smaller models lack.The advancement in Large Language Models,exemplified by the development of ChatGPT,has made significant impacts both academically and industrially,capturing widespread societal interest.This survey provides an overview of the development and prospects from Large Language Models(LLM)to Large Multimodal Models(LMM).It first discusses the contributions and technological advancements of LLMs in the field of natural language processing,especially in text generation and language understanding.Then,it turns to the discussion of LMMs,which integrates various data modalities such as text,images,and sound,demonstrating advanced capabilities in understanding and generating cross-modal content,paving new pathways for the adaptability and flexibility of AI systems.Finally,the survey highlights the prospects of LMMs in terms of technological development and application potential,while also pointing out challenges in data integration,cross-modal understanding accuracy,providing a comprehensive perspective on the latest developments in this field.展开更多
The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Infor...The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during a fine-tuning or customization process. We describe different black-box attacks from potential adversaries and study their impact on the amount and type of information that may be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of attacks employed. The study utilizes two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of different attacks.展开更多
A large language model(LLM)is constructed to address the sophisticated demands of data retrieval and analysis,detailed well profiling,computation of key technical indicators,and the solutions to complex problems in re...A large language model(LLM)is constructed to address the sophisticated demands of data retrieval and analysis,detailed well profiling,computation of key technical indicators,and the solutions to complex problems in reservoir performance analysis(RPA).The LLM is constructed for RPA scenarios with incremental pre-training,fine-tuning,and functional subsystems coupling.Functional subsystem and efficient coupling methods are proposed based on named entity recognition(NER),tool invocation,and Text-to-SQL construction,all aimed at resolving pivotal challenges in developing the specific application of LLMs for RDA.This study conducted a detailed accuracy test on feature extraction models,tool classification models,data retrieval models and analysis recommendation models.The results indicate that these models have demonstrated good performance in various key aspects of reservoir dynamic analysis.The research takes some injection and production well groups in the PK3 Block of the Daqing Oilfield as an example for testing.Testing results show that our model has significant potential and practical value in assisting reservoir engineers with RDA.The research results provide a powerful support to the application of LLM in reservoir performance analysis.展开更多
Modern technological advancements have made social media an essential component of daily life.Social media allow individuals to share thoughts,emotions,and ideas.Sentiment analysis plays the function of evaluating whe...Modern technological advancements have made social media an essential component of daily life.Social media allow individuals to share thoughts,emotions,and ideas.Sentiment analysis plays the function of evaluating whether the sentiment of the text is positive,negative,neutral,or any other personal emotion to understand the sentiment context of the text.Sentiment analysis is essential in business and society because it impacts strategic decision-making.Sentiment analysis involves challenges due to lexical variation,an unlabeled dataset,and text distance correlations.The execution time increases due to the sequential processing of the sequence models.However,the calculation times for the Transformer models are reduced because of the parallel processing.This study uses a hybrid deep learning strategy to combine the strengths of the Transformer and Sequence models while ignoring their limitations.In particular,the proposed model integrates the Decoding-enhanced with Bidirectional Encoder Representations from Transformers(BERT)attention(DeBERTa)and the Gated Recurrent Unit(GRU)for sentiment analysis.Using the Decoding-enhanced BERT technique,the words are mapped into a compact,semantic word embedding space,and the Gated Recurrent Unit model can capture the distance contextual semantics correctly.The proposed hybrid model achieves F1-scores of 97%on the Twitter Large Language Model(LLM)dataset,which is much higher than the performance of new techniques.展开更多
In the process of constructing domain-specific knowledge graphs,the task of relational triple extraction plays a critical role in transforming unstructured text into structured information.Existing relational triple e...In the process of constructing domain-specific knowledge graphs,the task of relational triple extraction plays a critical role in transforming unstructured text into structured information.Existing relational triple extraction models facemultiple challenges when processing domain-specific data,including insufficient utilization of semantic interaction information between entities and relations,difficulties in handling challenging samples,and the scarcity of domain-specific datasets.To address these issues,our study introduces three innovative components:Relation semantic enhancement,data augmentation,and a voting strategy,all designed to significantly improve the model’s performance in tackling domain-specific relational triple extraction tasks.We first propose an innovative attention interaction module.This method significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information fromrelation labels.Second,we propose a voting strategy that effectively combines the strengths of large languagemodels(LLMs)and fine-tuned small pre-trained language models(SLMs)to reevaluate challenging samples,thereby improving the model’s adaptability in specific domains.Additionally,we explore the use of LLMs for data augmentation,aiming to generate domain-specific datasets to alleviate the scarcity of domain data.Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects,with F1 scores exceeding the State of the Art models by 2%,1.6%,and 0.6%,respectively,validating the effectiveness and generalizability of our approach.展开更多
This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients.While promising,it highlights the need for advanced techniques like r...This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients.While promising,it highlights the need for advanced techniques like reasoning+action and retrieval-augmented generation to improve accuracy and reliability.Emphasizing that simple question and answer testing is insufficient,it calls for more nuanced evaluation methods to truly gauge large language models’capabilities in clinical applications.展开更多
High-angle annular dark field(HAADF)imaging in scanning transmission electron microscopy(STEM)has become an indispensable tool in materials science due to its ability to offer sub-°A resolution and provide chemic...High-angle annular dark field(HAADF)imaging in scanning transmission electron microscopy(STEM)has become an indispensable tool in materials science due to its ability to offer sub-°A resolution and provide chemical information through Z-contrast.This study leverages large language models(LLMs)to conduct a comprehensive bibliometric analysis of a large amount of HAADF-related literature(more than 41000 papers).By using LLMs,specifically ChatGPT,we were able to extract detailed information on applications,sample preparation methods,instruments used,and study conclusions.The findings highlight the capability of LLMs to provide a new perspective into HAADF imaging,underscoring its increasingly important role in materials science.Moreover,the rich information extracted from these publications can be harnessed to develop AI models that enhance the automation and intelligence of electron microscopes.展开更多
This opinion paper explores the transformative potential of large language models(LLMs)in laparoscopic surgery and argues for their integration to enhance surgical education,decision support,reporting,and patient care...This opinion paper explores the transformative potential of large language models(LLMs)in laparoscopic surgery and argues for their integration to enhance surgical education,decision support,reporting,and patient care.LLMs can revolutionize surgical education by providing personalized learning experiences and accelerating skill acquisition.Intelligent decision support systems powered by LLMs can assist surgeons in making complex decisions,optimizing surgical workflows,and improving patient outcomes.Moreover,LLMs can automate surgical reporting and generate personalized patient education materials,streamlining documentation and improving patient engagement.However,challenges such as data scarcity,surgical semantic capture,real-time inference,and integration with existing systems need to be addressed for successful LLM integration.The future of laparoscopic surgery lies in the seamless integration of LLMs,enabling autonomous robotic surgery,predictive surgical planning,intraoperative decision support,virtual surgical assistants,and continuous learning.By harnessing the power of LLMs,laparoscopic surgery can be transformed,empowering surgeons and ultimately benefiting patients.展开更多
Accurately recommending candidate news to users is a basic challenge of personalized news recommendation systems.Traditional methods are usually difficult to learn and acquire complex semantic information in news text...Accurately recommending candidate news to users is a basic challenge of personalized news recommendation systems.Traditional methods are usually difficult to learn and acquire complex semantic information in news texts,resulting in unsatisfactory recommendation results.Besides,these traditional methods are more friendly to active users with rich historical behaviors.However,they can not effectively solve the long tail problem of inactive users.To address these issues,this research presents a novel general framework that combines Large Language Models(LLM)and Knowledge Graphs(KG)into traditional methods.To learn the contextual information of news text,we use LLMs’powerful text understanding ability to generate news representations with rich semantic information,and then,the generated news representations are used to enhance the news encoding in traditional methods.In addition,multi-hops relationship of news entities is mined and the structural information of news is encoded using KG,thus alleviating the challenge of long-tail distribution.Experimental results demonstrate that compared with various traditional models,on evaluation indicators such as AUC,MRR,nDCG@5 and nDCG@10,the framework significantly improves the recommendation performance.The successful integration of LLM and KG in our framework has established a feasible way for achieving more accurate personalized news recommendation.Our code is available at https://github.com/Xuan-ZW/LKPNR.展开更多
The problematic use of social media has numerous negative impacts on individuals'daily lives,interpersonal relationships,physical and mental health,and more.Currently,there are few methods and tools to alleviate p...The problematic use of social media has numerous negative impacts on individuals'daily lives,interpersonal relationships,physical and mental health,and more.Currently,there are few methods and tools to alleviate problematic social media,and their potential is yet to be fully realized.Emerging large language models(LLMs)are becoming increasingly popular for providing information and assistance to people and are being applied in many aspects of life.In mitigating problematic social media use,LLMs such as ChatGPT can play a positive role by serving as conversational partners and outlets for users,providing personalized information and resources,monitoring and intervening in problematic social media use,and more.In this process,we should recognize both the enormous potential and endless possibilities of LLMs such as ChatGPT,leveraging their advantages to better address problematic social media use,while also acknowledging the limitations and potential pitfalls of ChatGPT technology,such as errors,limitations in issue resolution,privacy and security concerns,and potential overreliance.When we leverage the advantages of LLMs to address issues in social media usage,we must adopt a cautious and ethical approach,being vigilant of the potential adverse effects that LLMs may have in addressing problematic social media use to better harness technology to serve individuals and society.展开更多
Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, a...Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, and more. However, their widespread usage emphasizes the critical need to enhance their security posture to ensure the integrity and reliability of their outputs and minimize harmful effects. Prompt injections and training data poisoning attacks are two of the most prominent vulnerabilities in LLMs, which could potentially lead to unpredictable and undesirable behaviors, such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities such that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures to defend against potential risks.展开更多
In recent years,large language models(LLMs)have made significant progress in natural language processing(NLP).These models not only perform well in a variety of language tasks but also show great potential in the medi...In recent years,large language models(LLMs)have made significant progress in natural language processing(NLP).These models not only perform well in a variety of language tasks but also show great potential in the medical field.This paper aims to explore the application of LLMs in clinical dialogues,analyzing their role in improving the efficiency of doctor-patient communication,aiding in diagnosis and treatment,and providing emotional support.The paper also discusses the challenges and limitations of the model in terms of privacy protection,ethical issues,and practical applications.Through comprehensive analysis,we conclude that applying LLMs in clinical dialogues is promising.However,it requires careful consideration and caution by practitioners in practice.展开更多
This article proposes a document-level prompt learning approach using LLMs to extract the timeline-based storyline. Through verification tests on datasets such as ESCv1.2 and Timeline17, the results show that the prom...This article proposes a document-level prompt learning approach using LLMs to extract the timeline-based storyline. Through verification tests on datasets such as ESCv1.2 and Timeline17, the results show that the prompt + one-shot learning proposed in this article works well. Meanwhile, our research findings indicate that although timeline-based storyline extraction has shown promising prospects in the practical applications of LLMs, it is still a complex natural language processing task that requires further research.展开更多
With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enha...With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages LLM’s deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed, the text is normalized, and the ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. This is followed by query generation, which converts the parsed information into a structured query format and tailors it to the target database schema. Finally, there is query execution and feedback, where the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. By using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.展开更多
Recently,the emergence of ChatGPT,an artificial intelligence chatbot developed by OpenAI,has attracted significant attention due to its exceptional language comprehension and content generation capabilities,highlighti...Recently,the emergence of ChatGPT,an artificial intelligence chatbot developed by OpenAI,has attracted significant attention due to its exceptional language comprehension and content generation capabilities,highlighting the immense potential of large language models(LLMs).LLMs have become a burgeoning hotspot across many fields,including health care.Within health care,LLMs may be classified into LLMs for the biomedical domain and LLMs for the clinical domain based on the corpora used for pre-training.In the last 3 years,these domain-specific LLMs have demonstrated exceptional perform-ance on multiple natural language processing tasks,surpassing the perform-ance of general LLMs as well.This not only emphasizes the significance of developing dedicated LLMs for the specific domains,but also raises expectations for their applications in health care.We believe that LLMs may be used widely in preconsultation,diagnosis,and management,with appropriate development and supervision.Additionally,LLMs hold tremen-dous promise in assisting with medical education,medical writing and other related applications.Likewise,health care systems must recognize and address the challenges posed by LLMs.展开更多
BACKGROUND Inflammatory bowel disease(IBD)is a global health burden that affects millions of individuals worldwide,necessitating extensive patient education.Large language models(LLMs)hold promise for addressing patie...BACKGROUND Inflammatory bowel disease(IBD)is a global health burden that affects millions of individuals worldwide,necessitating extensive patient education.Large language models(LLMs)hold promise for addressing patient information needs.However,LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated.AIM To assess the utility of three LLMs(ChatGPT-4.0,Claude-3-Opus,and Gemini-1.5-Pro)as a reference point for patients with IBD.METHODS In this comparative study,two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns.These questions were used to evaluate the performance of the three LLMs.The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy,comprehensibility,and correlation.Simultaneously,three patients were invited to evaluate the comprehensibility of their answers.Finally,a readability assessment was performed.RESULTS Overall,each of the LLMs achieved satisfactory levels of accuracy,comprehensibility,and completeness when answering IBD-related questions,although their performance varies.All of the investigated models demonstrated strengths in providing basic disease information such as IBD definition as well as its common symptoms and diagnostic methods.Nevertheless,when dealing with more complex medical advice,such as medication side effects,dietary adjustments,and complication risks,the quality of answers was inconsistent between the LLMs.Notably,Claude-3-Opus generated answers with better readability than the other two models.CONCLUSION LLMs have the potential as educational tools for patients with IBD;however,there are discrepancies between the models.Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.展开更多
Autonomous agents have long been a research focus in academic and industry communities.Previous research often focuses on training agents with limited knowledge within isolated environments,which diverges significantl...Autonomous agents have long been a research focus in academic and industry communities.Previous research often focuses on training agents with limited knowledge within isolated environments,which diverges significantly from human learning processes,and makes the agents hard to achieve human-like decisions.Recently,through the acquisition of vast amounts of Web knowledge,large language models(LLMs)have shown potential in human-level intelligence,leading to a surge in research on LLM-based autonomous agents.In this paper,we present a comprehensive survey of these studies,delivering a systematic review of LLM-based autonomous agents from a holistic perspective.We first discuss the construction of LLM-based autonomous agents,proposing a unified framework that encompasses much of previous work.Then,we present a overview of the diverse applications of LLM-based autonomous agents in social science,natural science,and engineering.Finally,we delve into the evaluation strategies commonly used for LLM-based autonomous agents.Based on the previous studies,we also present several challenges and future directions in this field.展开更多
Large Language Models(LLMs),such as ChatGPT and Bard,have revolutionized natural language understanding and generation.They possess deep language comprehension,human-like text generation capabilities,contextual awaren...Large Language Models(LLMs),such as ChatGPT and Bard,have revolutionized natural language understanding and generation.They possess deep language comprehension,human-like text generation capabilities,contextual awareness,and robust problem-solving skills,making them invaluable in various domains(e.g.,search engines,customer support,translation).In the meantime,LLMs have also gained traction in the security community,revealing security vulnerabilities and showcasing their potential in security-related tasks.This paper explores the intersection of LLMs with security and privacy.Specifically,we investigate how LLMs positively impact security and privacy,potential risks and threats associated with their use,and inherent vulnerabilities within LLMs.Through a comprehensive literature review,the paper categorizes the papers into‘‘The Good’’(beneficial LLM applications),‘‘The Bad’’(offensive applications),and‘‘The Ugly’’(vulnerabilities of LLMs and their defenses).We have some interesting findings.For example,LLMs have proven to enhance code security(code vulnerability detection)and data privacy(data confidentiality protection),outperforming traditional methods.However,they can also be harnessed for various attacks(particularly user-level attacks)due to their human-like reasoning abilities.We have identified areas that require further research efforts.For example,Research on model and parameter extraction attacks is limited and often theoretical,hindered by LLM parameter scale and confidentiality.Safe instruction tuning,a recent development,requires more exploration.We hope that our work can shed light on the LLMs’potential to both bolster and jeopardize cybersecurity.展开更多
Understanding complex biological pathways,including gene–gene interactions and gene regulatory networks,is critical for exploring disease mechanisms and drug development.Manual literature curation of biological pathw...Understanding complex biological pathways,including gene–gene interactions and gene regulatory networks,is critical for exploring disease mechanisms and drug development.Manual literature curation of biological pathways cannot keep up with the exponential growth of new discoveries in the literature.Large-scale language models(LLMs)trained on extensive text corpora contain rich biological information,and they can be mined as a biological knowledge graph.This study assesses 21 LLMs,including both application programming interface(API)-based models and open-source models in their capacities of retrieving biological knowledge.The evaluation focuses on predicting gene regulatory relations(activation,inhibition,and phosphorylation)and the Kyoto Encyclopedia of Genes and Genomes(KEGG)pathway components.Results indicated a significant disparity in model performance.API-based models GPT-4 and Claude-Pro showed superior performance,with an F1 score of 0.4448 and 0.4386 for the gene regulatory relation prediction,and a Jaccard similarity index of 0.2778 and 0.2657 for the KEGG pathway prediction,respectively.Open-source models lagged behind their API-based counterparts,whereas Falcon-180b and llama2-7b had the highest F1 scores of 0.2787 and 0.1923 in gene regulatory relations,respectively.The KEGG pathway recognition had a Jaccard similarity index of 0.2237 for Falcon-180b and 0.2207 for llama2-7b.Our study suggests that LLMs are informative in gene network analysis and pathway mapping,but their effectiveness varies,necessitating careful model selection.This work also provides a case study and insight into using LLMs das knowledge graphs.Our code is publicly available at the website of GitHub(Muh-aza).展开更多
基金Supported by National Natural Science Foundation of China(No.82160195,No.82460203)Degree and Postgraduate Education Teaching Reform Project of Jiangxi Province(No.JXYJG-2020-026).
文摘AIM:To assess the possibility of using different large language models(LLMs)in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surface diseases:ChatGPT-4,ChatGPT-3.5,Claude 2,PaLM2,and SenseNova.METHODS:A group of experienced ophthalmology professors were asked to develop a 100-question singlechoice question on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions.The exam includes questions on the following topics:keratitis disease(20 questions),keratoconus,keratomalaciac,corneal dystrophy,corneal degeneration,erosive corneal ulcers,and corneal lesions associated with systemic diseases(20 questions),conjunctivitis disease(20 questions),trachoma,pterygoid and conjunctival tumor diseases(20 questions),and dry eye disease(20 questions).Then the total score of each LLMs and compared their mean score,mean correlation,variance,and confidence were calculated.RESULTS:GPT-4 exhibited the highest performance in terms of LLMs.Comparing the average scores of the LLMs group with the four human groups,chief physician,attending physician,regular trainee,and graduate student,it was found that except for ChatGPT-4,the total score of the rest of the LLMs is lower than that of the graduate student group,which had the lowest score in the human group.Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers,giving very little chance of an incorrect answer.ChatGPT-4 showed higher credibility when answering questions,with a success rate of 59%,but gave the wrong answer to the question 28% of the time.CONCLUSION:GPT-4 model exhibits excellent performance in both answer relevance and confidence.PaLM2 shows a positive correlation(up to 0.8)in terms of answer accuracy during the exam.In terms of answer confidence,PaLM2 is second only to GPT4 and surpasses Claude 2,SenseNova,and GPT-3.5.Despite the fact that ocular surface disease is a highly specialized discipline,GPT-4 still exhibits superior performance,suggesting that its potential and ability to be applied in this field is enormous,perhaps with the potential to be a valuable resource for medical students and clinicians in the future.
基金We acknowledge funding from NSFC Grant 62306283.
文摘Since the 1950s,when the Turing Test was introduced,there has been notable progress in machine language intelligence.Language modeling,crucial for AI development,has evolved from statistical to neural models over the last two decades.Recently,transformer-based Pre-trained Language Models(PLM)have excelled in Natural Language Processing(NLP)tasks by leveraging large-scale training corpora.Increasing the scale of these models enhances performance significantly,introducing abilities like context learning that smaller models lack.The advancement in Large Language Models,exemplified by the development of ChatGPT,has made significant impacts both academically and industrially,capturing widespread societal interest.This survey provides an overview of the development and prospects from Large Language Models(LLM)to Large Multimodal Models(LMM).It first discusses the contributions and technological advancements of LLMs in the field of natural language processing,especially in text generation and language understanding.Then,it turns to the discussion of LMMs,which integrates various data modalities such as text,images,and sound,demonstrating advanced capabilities in understanding and generating cross-modal content,paving new pathways for the adaptability and flexibility of AI systems.Finally,the survey highlights the prospects of LMMs in terms of technological development and application potential,while also pointing out challenges in data integration,cross-modal understanding accuracy,providing a comprehensive perspective on the latest developments in this field.
文摘The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during a fine-tuning or customization process. We describe different black-box attacks from potential adversaries and study their impact on the amount and type of information that may be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of attacks employed. The study utilizes two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of different attacks.
基金Supported by the National Talent Fund of the Ministry of Science and Technology of China(20230240011)China University of Geosciences(Wuhan)Research Fund(162301192687)。
文摘A large language model(LLM)is constructed to address the sophisticated demands of data retrieval and analysis,detailed well profiling,computation of key technical indicators,and the solutions to complex problems in reservoir performance analysis(RPA).The LLM is constructed for RPA scenarios with incremental pre-training,fine-tuning,and functional subsystems coupling.Functional subsystem and efficient coupling methods are proposed based on named entity recognition(NER),tool invocation,and Text-to-SQL construction,all aimed at resolving pivotal challenges in developing the specific application of LLMs for RDA.This study conducted a detailed accuracy test on feature extraction models,tool classification models,data retrieval models and analysis recommendation models.The results indicate that these models have demonstrated good performance in various key aspects of reservoir dynamic analysis.The research takes some injection and production well groups in the PK3 Block of the Daqing Oilfield as an example for testing.Testing results show that our model has significant potential and practical value in assisting reservoir engineers with RDA.The research results provide a powerful support to the application of LLM in reservoir performance analysis.
文摘Modern technological advancements have made social media an essential component of daily life.Social media allow individuals to share thoughts,emotions,and ideas.Sentiment analysis plays the function of evaluating whether the sentiment of the text is positive,negative,neutral,or any other personal emotion to understand the sentiment context of the text.Sentiment analysis is essential in business and society because it impacts strategic decision-making.Sentiment analysis involves challenges due to lexical variation,an unlabeled dataset,and text distance correlations.The execution time increases due to the sequential processing of the sequence models.However,the calculation times for the Transformer models are reduced because of the parallel processing.This study uses a hybrid deep learning strategy to combine the strengths of the Transformer and Sequence models while ignoring their limitations.In particular,the proposed model integrates the Decoding-enhanced with Bidirectional Encoder Representations from Transformers(BERT)attention(DeBERTa)and the Gated Recurrent Unit(GRU)for sentiment analysis.Using the Decoding-enhanced BERT technique,the words are mapped into a compact,semantic word embedding space,and the Gated Recurrent Unit model can capture the distance contextual semantics correctly.The proposed hybrid model achieves F1-scores of 97%on the Twitter Large Language Model(LLM)dataset,which is much higher than the performance of new techniques.
基金Science and Technology Innovation 2030-Major Project of“New Generation Artificial Intelligence”granted by Ministry of Science and Technology,Grant Number 2020AAA0109300.
文摘In the process of constructing domain-specific knowledge graphs,the task of relational triple extraction plays a critical role in transforming unstructured text into structured information.Existing relational triple extraction models facemultiple challenges when processing domain-specific data,including insufficient utilization of semantic interaction information between entities and relations,difficulties in handling challenging samples,and the scarcity of domain-specific datasets.To address these issues,our study introduces three innovative components:Relation semantic enhancement,data augmentation,and a voting strategy,all designed to significantly improve the model’s performance in tackling domain-specific relational triple extraction tasks.We first propose an innovative attention interaction module.This method significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information fromrelation labels.Second,we propose a voting strategy that effectively combines the strengths of large languagemodels(LLMs)and fine-tuned small pre-trained language models(SLMs)to reevaluate challenging samples,thereby improving the model’s adaptability in specific domains.Additionally,we explore the use of LLMs for data augmentation,aiming to generate domain-specific datasets to alleviate the scarcity of domain data.Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects,with F1 scores exceeding the State of the Art models by 2%,1.6%,and 0.6%,respectively,validating the effectiveness and generalizability of our approach.
文摘This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients.While promising,it highlights the need for advanced techniques like reasoning+action and retrieval-augmented generation to improve accuracy and reliability.Emphasizing that simple question and answer testing is insufficient,it calls for more nuanced evaluation methods to truly gauge large language models’capabilities in clinical applications.
基金National Research Foundation(NRF)Singapore,under its NRF Fellowship(Grant No.NRFNRFF11-2019-0002).
文摘High-angle annular dark field(HAADF)imaging in scanning transmission electron microscopy(STEM)has become an indispensable tool in materials science due to its ability to offer sub-°A resolution and provide chemical information through Z-contrast.This study leverages large language models(LLMs)to conduct a comprehensive bibliometric analysis of a large amount of HAADF-related literature(more than 41000 papers).By using LLMs,specifically ChatGPT,we were able to extract detailed information on applications,sample preparation methods,instruments used,and study conclusions.The findings highlight the capability of LLMs to provide a new perspective into HAADF imaging,underscoring its increasingly important role in materials science.Moreover,the rich information extracted from these publications can be harnessed to develop AI models that enhance the automation and intelligence of electron microscopes.
文摘This opinion paper explores the transformative potential of large language models(LLMs)in laparoscopic surgery and argues for their integration to enhance surgical education,decision support,reporting,and patient care.LLMs can revolutionize surgical education by providing personalized learning experiences and accelerating skill acquisition.Intelligent decision support systems powered by LLMs can assist surgeons in making complex decisions,optimizing surgical workflows,and improving patient outcomes.Moreover,LLMs can automate surgical reporting and generate personalized patient education materials,streamlining documentation and improving patient engagement.However,challenges such as data scarcity,surgical semantic capture,real-time inference,and integration with existing systems need to be addressed for successful LLM integration.The future of laparoscopic surgery lies in the seamless integration of LLMs,enabling autonomous robotic surgery,predictive surgical planning,intraoperative decision support,virtual surgical assistants,and continuous learning.By harnessing the power of LLMs,laparoscopic surgery can be transformed,empowering surgeons and ultimately benefiting patients.
基金supported by National Key R&D Program of China(2022QY2000-02).
文摘Accurately recommending candidate news to users is a basic challenge of personalized news recommendation systems.Traditional methods are usually difficult to learn and acquire complex semantic information in news texts,resulting in unsatisfactory recommendation results.Besides,these traditional methods are more friendly to active users with rich historical behaviors.However,they can not effectively solve the long tail problem of inactive users.To address these issues,this research presents a novel general framework that combines Large Language Models(LLM)and Knowledge Graphs(KG)into traditional methods.To learn the contextual information of news text,we use LLMs’powerful text understanding ability to generate news representations with rich semantic information,and then,the generated news representations are used to enhance the news encoding in traditional methods.In addition,multi-hops relationship of news entities is mined and the structural information of news is encoded using KG,thus alleviating the challenge of long-tail distribution.Experimental results demonstrate that compared with various traditional models,on evaluation indicators such as AUC,MRR,nDCG@5 and nDCG@10,the framework significantly improves the recommendation performance.The successful integration of LLM and KG in our framework has established a feasible way for achieving more accurate personalized news recommendation.Our code is available at https://github.com/Xuan-ZW/LKPNR.
文摘The problematic use of social media has numerous negative impacts on individuals'daily lives,interpersonal relationships,physical and mental health,and more.Currently,there are few methods and tools to alleviate problematic social media,and their potential is yet to be fully realized.Emerging large language models(LLMs)are becoming increasingly popular for providing information and assistance to people and are being applied in many aspects of life.In mitigating problematic social media use,LLMs such as ChatGPT can play a positive role by serving as conversational partners and outlets for users,providing personalized information and resources,monitoring and intervening in problematic social media use,and more.In this process,we should recognize both the enormous potential and endless possibilities of LLMs such as ChatGPT,leveraging their advantages to better address problematic social media use,while also acknowledging the limitations and potential pitfalls of ChatGPT technology,such as errors,limitations in issue resolution,privacy and security concerns,and potential overreliance.When we leverage the advantages of LLMs to address issues in social media usage,we must adopt a cautious and ethical approach,being vigilant of the potential adverse effects that LLMs may have in addressing problematic social media use to better harness technology to serve individuals and society.
文摘Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, and more. However, their widespread usage emphasizes the critical need to enhance their security posture to ensure the integrity and reliability of their outputs and minimize harmful effects. Prompt injections and training data poisoning attacks are two of the most prominent vulnerabilities in LLMs, which could potentially lead to unpredictable and undesirable behaviors, such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities such that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures to defend against potential risks.
文摘In recent years,large language models(LLMs)have made significant progress in natural language processing(NLP).These models not only perform well in a variety of language tasks but also show great potential in the medical field.This paper aims to explore the application of LLMs in clinical dialogues,analyzing their role in improving the efficiency of doctor-patient communication,aiding in diagnosis and treatment,and providing emotional support.The paper also discusses the challenges and limitations of the model in terms of privacy protection,ethical issues,and practical applications.Through comprehensive analysis,we conclude that applying LLMs in clinical dialogues is promising.However,it requires careful consideration and caution by practitioners in practice.
文摘This article proposes a document-level prompt learning approach using LLMs to extract the timeline-based storyline. Through verification tests on datasets such as ESCv1.2 and Timeline17, the results show that the prompt + one-shot learning proposed in this article works well. Meanwhile, our research findings indicate that although timeline-based storyline extraction has shown promising prospects in the practical applications of LLMs, it is still a complex natural language processing task that requires further research.
文摘With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages LLM’s deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed, the text is normalized, and the ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. This is followed by query generation, which converts the parsed information into a structured query format and tailors it to the target database schema. Finally, there is query execution and feedback, where the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. By using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
文摘Recently,the emergence of ChatGPT,an artificial intelligence chatbot developed by OpenAI,has attracted significant attention due to its exceptional language comprehension and content generation capabilities,highlighting the immense potential of large language models(LLMs).LLMs have become a burgeoning hotspot across many fields,including health care.Within health care,LLMs may be classified into LLMs for the biomedical domain and LLMs for the clinical domain based on the corpora used for pre-training.In the last 3 years,these domain-specific LLMs have demonstrated exceptional perform-ance on multiple natural language processing tasks,surpassing the perform-ance of general LLMs as well.This not only emphasizes the significance of developing dedicated LLMs for the specific domains,but also raises expectations for their applications in health care.We believe that LLMs may be used widely in preconsultation,diagnosis,and management,with appropriate development and supervision.Additionally,LLMs hold tremen-dous promise in assisting with medical education,medical writing and other related applications.Likewise,health care systems must recognize and address the challenges posed by LLMs.
基金Supported by the China Health Promotion Foundation Young Doctors'Research Foundation for Inflammatory Bowel Disease,the Taishan Scholars Program of Shandong Province,China,No.tsqn202306343National Natural Science Foundation of China,No.82270578.
文摘BACKGROUND Inflammatory bowel disease(IBD)is a global health burden that affects millions of individuals worldwide,necessitating extensive patient education.Large language models(LLMs)hold promise for addressing patient information needs.However,LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated.AIM To assess the utility of three LLMs(ChatGPT-4.0,Claude-3-Opus,and Gemini-1.5-Pro)as a reference point for patients with IBD.METHODS In this comparative study,two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns.These questions were used to evaluate the performance of the three LLMs.The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy,comprehensibility,and correlation.Simultaneously,three patients were invited to evaluate the comprehensibility of their answers.Finally,a readability assessment was performed.RESULTS Overall,each of the LLMs achieved satisfactory levels of accuracy,comprehensibility,and completeness when answering IBD-related questions,although their performance varies.All of the investigated models demonstrated strengths in providing basic disease information such as IBD definition as well as its common symptoms and diagnostic methods.Nevertheless,when dealing with more complex medical advice,such as medication side effects,dietary adjustments,and complication risks,the quality of answers was inconsistent between the LLMs.Notably,Claude-3-Opus generated answers with better readability than the other two models.CONCLUSION LLMs have the potential as educational tools for patients with IBD;however,there are discrepancies between the models.Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.
基金the National Natural Science Foundation of China(Grant No.62102420)the Beijing Outstanding Young Scientist Program(No.BJJWZYJH012019100020098)。
文摘Autonomous agents have long been a research focus in academic and industry communities.Previous research often focuses on training agents with limited knowledge within isolated environments,which diverges significantly from human learning processes,and makes the agents hard to achieve human-like decisions.Recently,through the acquisition of vast amounts of Web knowledge,large language models(LLMs)have shown potential in human-level intelligence,leading to a surge in research on LLM-based autonomous agents.In this paper,we present a comprehensive survey of these studies,delivering a systematic review of LLM-based autonomous agents from a holistic perspective.We first discuss the construction of LLM-based autonomous agents,proposing a unified framework that encompasses much of previous work.Then,we present a overview of the diverse applications of LLM-based autonomous agents in social science,natural science,and engineering.Finally,we delve into the evaluation strategies commonly used for LLM-based autonomous agents.Based on the previous studies,we also present several challenges and future directions in this field.
基金supported partly by the National Science Foundation award FMitF-2319242.
文摘Large Language Models(LLMs),such as ChatGPT and Bard,have revolutionized natural language understanding and generation.They possess deep language comprehension,human-like text generation capabilities,contextual awareness,and robust problem-solving skills,making them invaluable in various domains(e.g.,search engines,customer support,translation).In the meantime,LLMs have also gained traction in the security community,revealing security vulnerabilities and showcasing their potential in security-related tasks.This paper explores the intersection of LLMs with security and privacy.Specifically,we investigate how LLMs positively impact security and privacy,potential risks and threats associated with their use,and inherent vulnerabilities within LLMs.Through a comprehensive literature review,the paper categorizes the papers into‘‘The Good’’(beneficial LLM applications),‘‘The Bad’’(offensive applications),and‘‘The Ugly’’(vulnerabilities of LLMs and their defenses).We have some interesting findings.For example,LLMs have proven to enhance code security(code vulnerability detection)and data privacy(data confidentiality protection),outperforming traditional methods.However,they can also be harnessed for various attacks(particularly user-level attacks)due to their human-like reasoning abilities.We have identified areas that require further research efforts.For example,Research on model and parameter extraction attacks is limited and often theoretical,hindered by LLM parameter scale and confidentiality.Safe instruction tuning,a recent development,requires more exploration.We hope that our work can shed light on the LLMs’potential to both bolster and jeopardize cybersecurity.
基金National Institute of General Medical Sciences,Grant/Award Number:R35-GM126985National Institute of Diabetes and Digestive and Kidney Diseases,Grant/Award Number:P30DK092950U.S.National Library of Medicine,Grant/Award Number:LM013392。
文摘Understanding complex biological pathways,including gene–gene interactions and gene regulatory networks,is critical for exploring disease mechanisms and drug development.Manual literature curation of biological pathways cannot keep up with the exponential growth of new discoveries in the literature.Large-scale language models(LLMs)trained on extensive text corpora contain rich biological information,and they can be mined as a biological knowledge graph.This study assesses 21 LLMs,including both application programming interface(API)-based models and open-source models in their capacities of retrieving biological knowledge.The evaluation focuses on predicting gene regulatory relations(activation,inhibition,and phosphorylation)and the Kyoto Encyclopedia of Genes and Genomes(KEGG)pathway components.Results indicated a significant disparity in model performance.API-based models GPT-4 and Claude-Pro showed superior performance,with an F1 score of 0.4448 and 0.4386 for the gene regulatory relation prediction,and a Jaccard similarity index of 0.2778 and 0.2657 for the KEGG pathway prediction,respectively.Open-source models lagged behind their API-based counterparts,whereas Falcon-180b and llama2-7b had the highest F1 scores of 0.2787 and 0.1923 in gene regulatory relations,respectively.The KEGG pathway recognition had a Jaccard similarity index of 0.2237 for Falcon-180b and 0.2207 for llama2-7b.Our study suggests that LLMs are informative in gene network analysis and pathway mapping,but their effectiveness varies,necessitating careful model selection.This work also provides a case study and insight into using LLMs das knowledge graphs.Our code is publicly available at the website of GitHub(Muh-aza).