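SWEETs (Sugars Will Eventually be Exported Transporters) are a group of sugar transporters identified in plants in recent years. They play important roles in many physiological processes, including plant growth, development, and responses to abiotic and biotic stress. In this study, the AcSWEET gene family of kiwifruit (Actinidia chinensis Planch.) was identified using bioinformatics methods, yielding 29 AcSWEET genes. Their amino acid number, relative molecular weight, isoelectric point, instability index, subcellular localization, and hydropathicity index were analyzed. The results showed that the proteins encoded by the 29 genes contain 680 to 906 amino acids, with molecular weights of 7.531 to 101.266 kDa and isoelectric points of 6.95 to 9.90. Most are hydrophobic proteins localized to the plasma membrane and contain one or two MtN3 or PQ-loop domains. In addition, AcSWEET genes contain 4 to 6 exons. Phylogenetic analysis divided the kiwifruit AcSWEET gene family into four subfamilies, and genes within the same subfamily share similar intron-exon structures and conserved motifs. Expression pattern analysis showed that these genes are expressed specifically at different stages of fruit development. AcSWEET26, AcSWEET7, AcSWEET15, and AcSWEET13 are speculated to be involved in sucrose transport and accumulation in kiwifruit.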
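The abstract reports a hydropathicity index and describes most AcSWEET proteins as hydrophobic. The following is a minimal sketch of how such an index is commonly computed, assuming the ProtParam-style grand average of hydropathicity (GRAVY) on the Kyte-Doolittle scale; the abstract does not name the tool or scale actually used. GRAVY is the mean hydropathy value over all residues, and a positive value is conventionally read as hydrophobic. The peptides in the usage lines are hypothetical toy sequences, not AcSWEET proteins.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    Characters outside the 20 standard amino acids (e.g. a trailing '*'
    stop symbol) are skipped in this sketch.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def is_hydrophobic(sequence: str) -> bool:
    """Conventional reading: a positive GRAVY indicates a hydrophobic protein."""
    return gravy(sequence) > 0


# Hypothetical toy peptides, for illustration only.
print(round(gravy("MILV"), 2))  # 3.6
print(round(gravy("DEKR"), 2))  # -3.85
```

In practice the same quantity is usually taken from a tool such as ExPASy ProtParam rather than recomputed; the sketch only makes the definition explicit.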
As an important form of geological data, a geological report contains rich expert and geological knowledge, but a challenge facing current research on geological knowledge extraction and mining is how to achieve an accurate, domain-knowledge-guided understanding of geological reports. Although generic named entity recognition models and tools can be applied to geoscience reports and documents, their effectiveness is limited by a lack of domain-specific knowledge, which leads to a pronounced decline in recognition accuracy. With reference to the ontological system of the geological domain, this study summarizes six types of typical geological entities and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT, adversarial training, Bi-directional Long Short-Term Memory, Global Pointer) is proposed to address the ambiguity, diversity, and nesting of geological entities. The model first uses the fine-tuned, word-granularity-based pre-trained model GeoWoBERT (Geological Word-base BERT) and combines it with text features extracted by a BiLSTM (Bi-directional Long Short-Term Memory) network; an adversarial training algorithm is then applied to improve the model's robustness and resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining rich geological information.
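The abstract attributes the decoding step, and with it the handling of nested entities, to a global pointer algorithm. Below is a minimal sketch of GlobalPointer-style span scoring, assuming the common formulation in which each entity type projects every token to a query and a key vector and scores span (i, j), with i <= j, as the dot product of query i and key j. It is not the authors' implementation: the rotary position embedding, score scaling, and training loss of the original GlobalPointer are omitted, the BERT/BiLSTM encoder is replaced by hand-written toy hidden states, and the labels OUTER and INNER and all weights are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = Sequence[float]


def matvec(matrix: Matrix, vector: Vector) -> List[float]:
    """Multiply a (rows x dim) matrix by a dim-length vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def global_pointer_decode(
    hidden: List[Vector],
    heads: Dict[str, Tuple[Matrix, Matrix]],
    threshold: float = 0.0,
) -> List[Tuple[str, int, int]]:
    """Return every (label, start, end) span whose score exceeds `threshold`.

    For each entity type, tokens are projected to queries q_i = W_q h_i and
    keys k_j = W_k h_j, and span (i, j) is scored as s(i, j) = q_i . k_j.
    Every span is scored independently, so nested spans can all be emitted.
    """
    spans = []
    n = len(hidden)
    for label, (w_q, w_k) in heads.items():
        queries = [matvec(w_q, h) for h in hidden]
        keys = [matvec(w_k, h) for h in hidden]
        for i in range(n):
            for j in range(i, n):  # upper triangle only: start <= end
                if dot(queries[i], keys[j]) > threshold:
                    spans.append((label, i, j))
    return spans


# Toy example: one-hot "hidden states" for a 3-token sentence. With W_q = I,
# s(i, j) equals W_k[i][j], so the hypothetical weights below make only the
# outer span (0, 2) and the nested span (1, 1) score above zero.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
outer_keys = [[-1, -1, 1], [-1, -1, -1], [-1, -1, -1]]
inner_keys = [[-1, -1, -1], [-1, 1, -1], [-1, -1, -1]]
heads = {"OUTER": (identity, outer_keys), "INNER": (identity, inner_keys)}
print(global_pointer_decode(identity, heads))
# [('OUTER', 0, 2), ('INNER', 1, 1)]
```

Because a span is accepted whenever its own score clears the threshold, with no left-to-right tag sequence to respect as in BIO tagging, an entity can sit entirely inside another one; this is the property the abstract relies on for nested geological entities.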
Funding: Financially supported by the Natural Science Foundation of China (Grant No. 42301492); the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200); the Major Special Project of Xinjiang (Grant No. 2022A03009-3); the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014); the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.