Funding: Supported in part by grants from the National Natural Science Foundation of China (Nos. 62222213, 62072423); the Research Impact Fund (No. R1015-23); APRC-CityU New Research Initiatives (No. 9610565, Start-up Grant for New Faculty of CityU); the CityU-HKIDS Early Career Research Grant (No. 9360163); the Hong Kong ITC Innovation and Technology Fund Midstream Research Programme for Universities Project (No. ITS/034/22MS); the Hong Kong Environmental and Conservation Fund (No. 88/2022); the SIRG-CityU Strategic Interdisciplinary Research Grant (No. 7020046); Huawei (Huawei Innovation Research Program); Tencent (CCF-Tencent Open Fund, Tencent Rhino-Bird Focused Research Program); Ant Group (CCF-Ant Research Fund, Ant Group Research Fund); Alibaba (CCF-Alimama Tech Kangaroo Fund, No. 2024002); the CCF-BaiChuan-Ebtech Foundation Model Fund; and Kuaishou.
Abstract: Information Extraction (IE) aims to extract structural knowledge from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. As a result, numerous works have been proposed to integrate LLMs for IE tasks based on a generative paradigm. To conduct a comprehensive, systematic review and exploration of LLM efforts for IE tasks, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and techniques; we then empirically analyze the most advanced methods and identify emerging trends in IE tasks with LLMs. Based on this review, we identify several technical insights and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related works and resources on GitHub (LLM4IE repository).
Abstract: In recent years, large numbers of non-coding RNAs (ncRNAs) have been identified in C. elegans, but their functions are still not well studied. In C. elegans, CEP-1 is the sole homolog of the p53 family of genes. To obtain transcription profiles of ncRNAs regulated by CEP-1 under normal and UV-stressed conditions, we applied the 'not-so-random' (NSR) hexamer priming strategy to RNA sequencing in C. elegans. This NSR-seq strategy efficiently depleted rRNA transcripts from the samples and showed high technical replicability. We identified more than 1,000 ncRNAs whose apparent expression was repressed by CEP-1, while around 200 were activated. Around 40% of the promoters of CEP-1-activated ncRNAs contain a putative CEP-1-binding site. CEP-1-regulated ncRNAs were frequently clustered and concentrated on the X chromosome. These results indicate that numerous ncRNAs are involved in the CEP-1 transcriptional network and that these are especially enriched on the X chromosome in C. elegans.
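The idea behind 'not-so-random' hexamer priming can be sketched in a few lines: start from all 4,096 possible hexamers and drop every primer that has a perfect annealing site in an rRNA sequence. This is a minimal illustration, not the study's actual primer design; the function names and the toy rRNA fragment are hypothetical, and the real design may apply further filters that the abstract does not describe.

```python
from itertools import product

# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 6) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nsr_hexamers(rrna_seqs: list[str], k: int = 6) -> list[str]:
    """Hypothetical helper: primers of length k with no perfect site in rRNA.

    A primer anneals to a site that is its reverse complement, so a primer
    is excluded when its reverse complement occurs in any rRNA sequence.
    """
    all_primers = {"".join(p) for p in product("ACGT", repeat=k)}
    blocked: set[str] = set()
    for seq in rrna_seqs:
        dna = seq.upper().replace("U", "T")  # accept RNA or DNA notation
        blocked |= {reverse_complement(site) for site in kmers(dna, k)}
    return sorted(all_primers - blocked)


if __name__ == "__main__":
    toy_rrna = ["ACGUACGUAA"]  # made-up fragment, for illustration only
    primers = nsr_hexamers(toy_rrna)
    print(len(primers), "primers retained out of", 4 ** 6)
```

With real rRNA sequences as input, the retained set is much smaller than the full random pool, which is why first-strand synthesis primed this way tends to under-represent rRNA.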
Funding: This study was supported by grants from the Natural Science Foundation of China (NSFC, Nos. 81673009, 81803064), the Jiangxi Provincial Natural Science Foundation (20161BBG70061), and the Canadian Institutes of Health Research (CIHR).
Abstract: TdT-interacting factor 1 (TdIF1) is a ubiquitously expressed DNA- and protein-binding protein that directly binds to terminal deoxynucleotidyl transferase (TdT) polymerase. Little is known about the functional role of TdIF1 in cancer cellular signaling, nor has it previously been identified as aberrant in any type of cancer. We report here for the first time that TdIF1 is abundantly expressed in clinical lung cancer patients and that high expression of TdIF1 is associated with poor patient prognosis. We further established that TdIF1 is highly expressed in human non-small cell lung cancer (NSCLC) cell lines compared to a normal lung cell line. shRNA-mediated gene silencing of TdIF1 resulted in the suppression of proliferation and anchorage-independent colony formation of the A549 adenocarcinoma cell line. Moreover, when these TdIF1-silenced cells were used to establish a mouse xenograft model of human NSCLC, tumor size was greatly reduced. These data suggest that TdIF1 is a potent regulator of lung tumor development. Several cell cycle-related and tumor growth signaling pathways, including the p53 and HDAC1/2 pathways, were identified by in silico analysis as participating in the TdIF1 signaling network. Microarray, transcriptome, and protein-level analyses validated p53 and HDAC1/2 modulation upon TdIF1 downregulation in an NSCLC cellular model. Moreover, several other cell cycle regulators were affected at the transcript level by TdIF1 silencing, including an increase in CDKN1A/p21 transcripts. Taken together, these results indicate that TdIF1 is a bona fide tumor-promoting factor in NSCLC and a potential target for therapy.