The utilization of sequence stratigraphic concepts in identifying sands and their spatial continuity in distinct gross depositional settings is key, especially in frontier settings where data paucity is a common challenge. In the Baka field, onshore Niger Delta, detailed reservoir correlation guided by sequence stratigraphic framework analysis showed the distribution of sand and shale units constituting reservoir-seal pairs (RSPs) correlatable across the field. Within the 3rd-order packages, the lowstand systems tract (LST) and highstand systems tract (HST) contain more RSPs and thicker 4th- and 5th-order sands than the transgressive systems tract (TST). In terms of bathymetry, irrespective of systems tract, the RSP Index (RI) decreases from the proximal shallow/inner shelf settings to the more distal outer shelf areas. Among the three systems tracts, intervals interpreted as lowstand prograding complexes contain the best-developed sands and the highest RSP. Sand development within the LSTs has been controlled by a pronounced growth fault regime accompanied by high subsidence and sedimentation rates. This is linked to the basinward migration of the sands during prolonged sea-level fall, creating significant accommodation space for sand deposition. On the other hand, the TSTs, known to mark periods of progressive sea-level rise and landward migration of sandy facies, show thinner sands enclosed in much thicker, laterally extensive, and better-preserved deeper marine shales. Interpreted seismic sections indicate intense growth faulting and channelization that influenced the syn- and post-depositional development of the sand packages across the field. The initial timing of deformation of subregional faults in this area coincides with periods of abrupt falls in sea level. This approach could be useful for predicting sand-prone areas in frontier fields as well as possible reservoir-seal parameters required for some aspects of petroleum system analysis and quick-look volume estimation.
Computer-aided design (CAD) software continues to be a crucial tool in digital twin applications and manufacturing, facilitating the design of various products. We present a novel CAD generation method, an agent that constructs CAD sequences containing sketch-and-extrude modelling operations efficiently and with high quality. Starting from the sketch and extrusion operation sequences, we utilise a transformer encoder to encode them into different disentangled codebooks that represent their distribution properties while considering their correlations. Then, a combination of auto-regressive and non-autoregressive samplers is trained to sample the codes for CAD sequence construction. Extensive experiments demonstrate that our model generates diverse and high-quality CAD models. We also show some cases of real digital twin applications and indicate that our generated models can be used as a data source for the digital twin platform, exhibiting designers' potential.
Objective: To provide a basis for molecular marker-assisted selection and resistance breeding of Langya chicken. Method: The genetic polymorphism of the Hae III site in the 3' sequence of the Mx gene in Langya chicken was analyzed by PCR-RFLP. Result: The Hae III site, controlled by alleles A and B, was polymorphic in Langya chicken, and the allele frequencies of A and B were 0.562 and 0.438, respectively. The genotype distribution at the Hae III site deviated significantly from Hardy-Weinberg equilibrium (P < 0.01). The polymorphic fragments were cloned and sequenced, and the results revealed that the fragment size was 357 bp and that a 31 bp deletion occurred in the variant sequences. Conclusion: A Hae III-RFLP exists in the 3' sequence of the Mx gene in Langya chicken of Shandong Province.
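The Hardy-Weinberg test mentioned in this abstract can be illustrated with a short sketch. It is not the authors' procedure: the abstract reports only the allele frequencies (0.562 and 0.438), so the genotype counts below are hypothetical values chosen for illustration, and the function names are assumptions.

```python
# Minimal sketch of a Hardy-Weinberg chi-square check for a biallelic site.
# The observed genotype counts are HYPOTHETICAL; the abstract gives only
# the allele frequencies (A = 0.562, B = 0.438), not the genotype counts.

def allele_frequency_a(observed):
    """Frequency of allele A from genotype counts {'AA', 'AB', 'BB'}."""
    n = sum(observed.values())
    return (2 * observed["AA"] + observed["AB"]) / (2 * n)

def hardy_weinberg_expected(p_a, n):
    """Expected genotype counts under Hardy-Weinberg for n individuals."""
    p_b = 1.0 - p_a
    return {"AA": p_a ** 2 * n, "AB": 2 * p_a * p_b * n, "BB": p_b ** 2 * n}

def chi_square(observed, expected):
    """Pearson chi-square statistic over the three genotype classes."""
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in expected)

if __name__ == "__main__":
    observed = {"AA": 40, "AB": 32, "BB": 28}  # hypothetical counts
    p_a = allele_frequency_a(observed)
    expected = hardy_weinberg_expected(p_a, sum(observed.values()))
    stat = chi_square(observed, expected)
    # With two alleles the test has 1 degree of freedom; the critical
    # value at the 0.01 level is 6.635.
    print(f"p(A) = {p_a:.3f}, chi-square = {stat:.2f}, reject HWE: {stat > 6.635}")
```

With real data, the observed counts would come from the PCR-RFLP genotyping; a heterozygote deficit or excess relative to 2pq is what drives the statistic above the critical value.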
Iron-sulfur clusters (ISC) are essential cofactors for proteins involved in various biological processes, such as electron transport, biosynthetic reactions, DNA repair, and gene expression regulation. ISC assembly protein IscA1 (or MagR) is found within the mitochondria of most eukaryotes. Magnetoreceptor (MagR) is a highly conserved A-type iron and iron-sulfur cluster-binding protein, characterized by two distinct types of iron-sulfur clusters, [2Fe-2S] and [3Fe-4S], each conferring unique magnetic properties. MagR forms a rod-like polymer structure in complex with photoreceptive cryptochrome (Cry) and serves as a putative magnetoreceptor for retrieving geomagnetic information in animal navigation. Although the N-terminal sequences of MagR vary among species, their specific function remains unknown. In the present study, we found that the N-terminal sequences of pigeon MagR, previously thought to serve as a mitochondrial targeting signal (MTS), were not cleaved following mitochondrial entry but instead modulated the efficiency with which iron-sulfur clusters and irons are bound. Moreover, the N-terminal region of MagR was required for the formation of a stable MagR/Cry complex. Thus, the N-terminal sequences in pigeon MagR fulfil more important functional roles than just mitochondrial targeting. These results further extend our understanding of the function of MagR and provide new insights into the origin of magnetoreception from an evolutionary perspective.
This paper gives an account of the authors' research on the cyclic sequences, events and evolutionary history of the Sino-Korean plate from the Proterozoic to the Meso-Cenozoic, based on the principle of the Cosmos-Earth System. The authors divided the history of this plate into 20 super-cyclic or super-mega-cyclic periods and more than 100 Oort periods. The research focused on important sea flooding events, uplift interruption events, tilting movement events, molar-tooth carbonate events, thermal events, polarity reversal events, karst events, volcanic explosion events and storm events, as well as types of resource areas and paleotectonic evolution. By means of the isochronous theory of Cosmos-Earth System periodicity, and based on long-eccentricity cycles and periodicity, the authors studied in detail the paleogeographic evolution of the aulacogen of the Sino-Korean plate, the formation of the oolitic beach platform, and the development of the foreland basin and continental rift valley basin, and reconstructed the evolution
Based on a study of Neoproterozoic carbonates in the Jilin-Liaoning-Xuzhou-Huaiyang area, especially their cyclic sequence stratigraphy and Sr isotopes, two maximum sea flooding events (at 820 Ma and 835 Ma) have been identified. The resulting isochronous stratigraphic correlation proves that these Precambrian strata were connected between the Qingbaikou and the Nanhuan systems, with a time range from 750 Ma to 850 Ma. The disappearance of microsparite carbonate and the onset of a glacial stage offer important evidence for worldwide stratigraphic correlation and open a window for further correlation of the stratigraphic successions across the Sino-Korean and Yangtze Plates. A new correlation scheme is therefore provided based on our work.
Different genetic types of meter-scale cyclic sequences in stratigraphic records result from episodic accumulation of strata related to Milankovitch cycles. The distinctive fabric natures of facies successions result from sedimentation governed by different sediment sources and sedimentary dynamic conditions in different paleogeographical backgrounds, corresponding to high-frequency sea-level changes. Naturally, this is the fundamental criterion for the classification of genetic types of meter-scale cyclic sequences. Their widespread development in stratigraphic records, their regular vertical stacking patterns in long-term sequences, the evolutionary characters of Earth history, and the genetic types reflected by specific fabric natures of facies successions in different paleogeographical settings all show that meter-scale cyclic sequences are not only the elementary working units in stratigraphy and sedimentology but also a supplement to and extension of the parasequence concept of sequence stratigraphy. Two genetic kinds of facies succession occur in meter-scale cyclic sequences in neritic-facies strata of carbonate and clastic rocks: normal grading successions, mainly formed by tidal sedimentation, and inverse grading successions, chiefly formed by wave sedimentation. Both generally constitute shallowing-upward successions, the thickness of which ranges from several tens of centimeters to several meters. Genetic types of meter-scale cyclic sequences can be classified in terms of the fabric natures of the facies succession, and carbonate meter-scale cyclic sequences can be divided into four types: L-M type, deep-water asymmetrical type, subtidal type and peritidal type. Clastic meter-scale cyclic sequences can be grouped into two types: tidal-dynamic type and wave-dynamic type. The boundaries of meter-scale cyclic sequences are marked by instantaneous punctuated surfaces formed by non-deposition resulting from high-frequency sea-level changes, which include instantaneous exposed punctuated surfaces, drowned punctuated surfaces, and their correlative surfaces. The development of instantaneous punctuated surfaces as the boundaries of meter-scale cyclic sequences exposes the limitations of Walther's Law in explaining facies distribution in time and space, and reaffirms the importance of Sander's Rule in the analysis of stratigraphic records. These non-continuous surfaces can be traced over long distances, and some can be correlated within the same basin. The study of meter-scale cyclic sequences and their regular vertical stacking patterns in long-term sequences indicates that research into the cyclicity of stratigraphic records is a useful way to extract more regularity from stratigraphic records that are frequently complex and incomplete.
Both the macroscopic features and the sequence-stratigraphic position of the molar-tooth structure developed in the third member of the Gaoyuzhuang (高于庄) Formation at the Jixian (蓟县) Section in Tianjin (天津) can provide useful information about its origin and can reveal some problems for future research. The Mesoproterozoic Gaoyuzhuang Formation is a set of 1,600 m thick carbonate strata. This formation can be divided into four members. The first member is mainly made up of stromatolitic dolomites; the second is marked by a set of manganese dolomites; the third is mainly composed of lamina limestones with the development of molar-tooth structures; the fourth is a set of stromatolitic-lithoherm dolomites. According to lithofacies and their succession, several types of meter-scale cycles can be discerned in the Gaoyuzhuang Formation: the L-M type, the subtidal type and the peritidal type. There is a regular vertical stacking pattern for meter-scale cycles in the third-order sequence. Therefore, the Mesoproterozoic Gaoyuzhuang Formation can be divided into 13 third-order sequences (SQ1 to SQ13), which can further be grouped into 4 second-order sequences. The third member is marked by lamina limestones and can be grouped into three third-order sequences (SQ9 to SQ11). The molar-tooth structure is developed in the middle part of the third sequence, i.e. SQ11, in the third member. Several features of this kind of molar-tooth structure reflect features of carbonate sedimentation in the Precambrian, such as the particular configuration, abundant organic matter, and easy silicification. Stromatolites are chiefly formed in a shallow tidal-flat environment; laminae are mainly formed on the shallow ramp; and molar-tooth structures are mainly generated in a relatively deeper-water environment, from the middle to the deep ramp. Therefore, similar to stromatolites and laminae, the molar-tooth structure might also be a kind of bio-sedimentary structure. This suggestion is based on macroscopic observation and sedimentary-facies analysis of the molar-tooth structures in terms of their sequence-stratigraphic position. These features of Precambrian sedimentation also reveal open problems in Precambrian carbonate sedimentation. With more detailed study, a more practical solution to these problems may be obtained in the future.
The effects of the cobalt promoter and of the sequence in which cobalt and molybdenum precursors are added on the performance of sulfur-resistant methanation were investigated. All samples were prepared by the impregnation method and characterized by N2 adsorption, X-ray diffraction (XRD), temperature-programmed reduction (TPR) and laser Raman spectroscopy (LRS). The CO conversions for the Mo-Co/Al, Co-Mo/Al and CoMo/Al catalysts were 59.7%, 54.3% and 53.9%, respectively. Among these catalysts, the Mo-Co/Al catalyst, prepared stepwise by impregnating the Mo precursor first, showed the best catalytic performance. Meanwhile, the CO conversions were 48.9% for the Mo/Al catalyst and 10.5% for the Co/Al catalyst. The addition of cobalt species could improve the catalytic activity of the Mo/Al catalyst. The N2 adsorption results showed that the Co-Mo/Al catalyst had the smallest specific surface area among these catalysts. CoMoO4 species in the CoMo/Al catalyst were detected with XRD, TPR and LRS. Moreover, crystalline MoS2, which was reported to be less active than amorphous MoS2, was found in both the Co-Mo/Al and CoMo/Al catalysts. The Mo-Co/Al catalyst showed the best catalytic performance as it had an appropriate surface structure, i.e., no crystalline MoS2 and very little CoMoO4 species.
We compared the genetic diversity of 24 Chinese weak-winter, Swedish winter and Swedish spring B. napus accessions using inter-simple sequence repeats (ISSRs). By cluster analysis (UPGMA) based on 125 polymorphic bands amplified with 20 primers, the 24 accessions were divided into three groups. Six Swedish winter lines and eight Chinese weak-winter lines were in group I, and group II comprised two Chinese weak-winter lines, Xiangyou15 and Bao81. The third group contained eight Swedish spring lines. Principal co-ordinates analysis (PCO) showed groupings similar to those of the cluster analysis. Results from the cluster and PCO analyses showed very clearly that Chinese weak-winter, Swedish spring and Swedish winter accessions were distinguished from each other, and that the Chinese weak-winter accessions in this study were genetically closer to Swedish winter accessions than to Swedish spring accessions. The Chinese weak-winter accessions had greater diversity than the Swedish spring or winter accessions. This study indicated that ISSR is a suitable and effective tool for evaluating genetic diversity among rapeseed germplasm.
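As a rough illustration of the UPGMA clustering named in this abstract, the sketch below groups accessions from binary band-presence profiles. It is an assumption-laden toy, not the authors' pipeline: the abstract does not state which similarity coefficient was used, so Jaccard distance is swapped in here, and the accession names and band profiles are hypothetical.

```python
# Minimal UPGMA sketch on binary band profiles (1 = band present).
# ASSUMPTIONS: Jaccard distance is an illustrative choice (the abstract
# does not name the coefficient); names and profiles are hypothetical.
from itertools import combinations

def jaccard_distance(a, b):
    """1 - (bands shared) / (bands present in either profile)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - shared / either if either else 0.0

def upgma(profiles):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {name: [name] for name in profiles}  # label -> member names
    trees = {name: name for name in profiles}       # label -> subtree

    def avg_dist(c1, c2):
        # UPGMA distance: arithmetic mean over all cross-cluster pairs.
        pairs = [(m1, m2) for m1 in clusters[c1] for m2 in clusters[c2]]
        total = sum(jaccard_distance(profiles[m1], profiles[m2]) for m1, m2 in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: avg_dist(*p))
        merged = c1 + "+" + c2
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
        trees[merged] = (trees.pop(c1), trees.pop(c2))
    return next(iter(trees.values()))

if __name__ == "__main__":
    profiles = {  # hypothetical 4-band profiles
        "winter1": [1, 1, 0, 0],
        "winter2": [1, 1, 0, 1],
        "spring1": [0, 0, 1, 1],
    }
    print(upgma(profiles))
```

In practice a library routine such as average-linkage hierarchical clustering would be used on the full 24 x 125 band matrix; the pure-Python version only makes the averaging step explicit.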
Coding sequences (CDS) are commonly used for transient gene expression, in yeast two-hybrid screening, to verify protein interactions and in prokaryotic gene expression studies. CDS are most commonly obtained using complementary DNA (cDNA) derived from messenger RNA (mRNA) extracted from plant tissues and generated by reverse transcription. However, some CDS are difficult to acquire through this process as they are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges require the development of alternative CDS cloning technologies. In this study, we found that genomic intron-containing gene coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. In transient expression experiments, the target DNA sequence is driven by a constitutive promoter. Theoretically, a sufficient amount of mRNA can be extracted from the N. benthamiana leaves, making it conducive to the cloning of CDS target genes. Our data demonstrate that N. benthamiana can be used as an effective host for cloning the CDS of plant genes.
Objective: To study the alternative expression and sequence of human elongation factor-1δ (human EF-1δ p31) during malignant transformation of human bronchial epithelial cells induced by cadmium chloride (CdCl2) and its possible mechanism. Methods: Total RNA was isolated at different stages of transformation of human bronchial epithelial cells (16HBE) induced by CdCl2 at a concentration of 5.0 μM. Specific primers and a probe for human EF-1δ p31 were designed, and expression of human EF-1δ mRNA in the different cell lines was detected by fluorescent quantitative PCR. EF-1δ cDNA from the different cell lines was purified and cloned into the pMD 18-T vector, followed by confirmation and sequencing analysis. Results: The expression of human EF-1δ p31 at different stages of 16HBE cells transformed by CdCl2 was elevated (P < 0.01 or P < 0.05). Compared with the corresponding non-transformed cells, the expression level of EF-1δ p31 increased on average 2.9-fold in Cd-pretransformed cells, 4.3-fold in Cd-transformed cells and 7.2-fold in Cd-tumorigenic cells. No change was found in the sequence of overexpressed EF-1δ p31 at different stages of 16HBE cells transformed by CdCl2. Conclusion: Overexpression of human EF-1δ p31 is positively correlated with malignant transformation of 16HBE cells induced by CdCl2, but is not correlated with DNA mutations.
Since the inception of the optimal sequence estimation (OSE) method, various research teams have substantiated its efficacy as the optimal stacking technique for handling array data, leading to its successful application in numerous geoscience studies. Nevertheless, concerns persist regarding the potential impact of aliasing resulting from the choice of distinct station distributions on the outcomes derived from OSE. In this investigation, I employ theoretical deduction and experimental analysis to elucidate the reasons behind the immunity of the Y_(l'm')-related common signal obtained through OSE to variations in station distribution selection. The primary objective of OSE is also underscored, i.e., to restore/strip a Y_(l'm')-related common periodic signal from various stations. Furthermore, I provide additional clarification that the ‘Y_(l'm')-related common signal’ and the ‘Y_(l'm')-related equivalent excitation sequence’ are distinct concepts. These analyses will facilitate the utilization of the OSE technique by other researchers in investigating intriguing geophysical phenomena and attaining sound explanations.
Background: The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. Results: In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in the frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with heat tolerance in Southern Chinese pigs. Conclusions: Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Objective: To address the phylogenetic and phylogeographic relationships between different lineages of the Anopheles (An.) subpictus species complex in most parts of the Asian continent by maximum utilization of Internal Transcribed Spacer 2 (ITS2) and cytochrome C oxidase I (COI) sequences deposited in GenBank. Methods: Seventy-five ITS2, 210 COI and 26 concatenated sequences available in the NCBI database were used. Phylogenetic analysis was performed using Bayesian likelihood trees, whereas median-joining haplotype networks and time-scale divergence trees were generated for phylogeographic analysis. Genetic diversity indices and genetic differentiation were also calculated. Results: Two genetically divergent molecular forms of the An. subpictus species complex, corresponding to sibling species A and B, are established. Species A evolved around 37-82 million years ago in Sri Lanka, India, and the Netherlands, and species B evolved around 22-79 million years ago in Sri Lanka, India, and Myanmar. Vietnam, Thailand, and Cambodia have two molecular forms: one is phylogenetically similar to species B. Other forms differ from species A and B and evolved recently in the above-mentioned countries, Indonesia and the Philippines. Genetic subdivision among Sri Lanka, India, and the Netherlands is almost absent. Substantial genetic differentiation was obtained for some populations due to isolation by large geographical distances. Genetic diversity indices reveal the presence of a long-established stable mosquito population, at mutation-drift equilibrium, regardless of population fluctuations. Conclusions: The An. subpictus species complex consists of more than two genetically divergent molecular forms. Species A is highly divergent from the rest. Sri Lanka and India contain only species A and B.
In the present paper, we mostly focus on P_(p)^(2)-statistical convergence. We will look into uniform integrability via the power series method and its characterizations for double sequences. Also, the notions of P_(p)^(2)-statistically Cauchy sequence, P_(p)^(2)-statistical boundedness and core for double sequences will be described in addition to these findings.
To solve the problem of target damage assessment when fragments attack a target under uncertain projectile-target intersection in an air-defense intercept, this paper proposes a method for calculating target damage probability that leverages a spatio-temporal finite multilayer fragment distribution, together with a target damage assessment algorithm based on cloud model theory. Drawing on the spatial dispersion characteristics of fragments from a projectile proximity explosion, we divide the fragment field into a finite number of fragment distribution planes based on the time series in space, and set up a fragment layer dispersion model grounded in the time series and an intersection criterion for determining the effective penetration of each layer of fragments into the target. On the precondition that the multilayer fragments of the time series effectively strike the target, we also establish the damage criteria for perforation and penetration damage and deduce the damage probability calculation model. Taking the damage probability of each fragment layer in the spatio-temporal sequence to the target as the input state variable, we introduce cloud model theory to study the target damage assessment method. Through an equivalent simulation experiment, the scientific soundness and rationality of the proposed method were validated by quantitative calculations and comparative analysis.
The present study is devoted to understanding the evolution of the Upper Jurassic Sab'atayn Formation in the Marib-Shabwa Basin, Yemen, through a sequence stratigraphic analysis that integrates sedimentological, seismic, and well-log datasets. The Sab'atayn Formation (Tithonian age) is represented by a series of clastics and evaporites that were deposited in fluvio-deltaic to prodeltaic settings. It is divided into four members: Yah at the base, followed upwards by Seen and Alif, with Safir at the top. Two third-order depositional sequences, bounded by three sequence boundaries, were determined for the Tithonian succession. These sequences were classified into their systems tracts, which show several sedimentation patterns of progradational, aggradational, and retrogradational parasequence sets. The first depositional sequence corresponds to the early-middle Tithonian Yah and Seen units, which can be classified into lowstand, transgressive, and highstand systems tracts. The second sequence comprises the late Tithonian Alif unit, which can be subdivided into transgressive and highstand systems tracts. The sandy deposits of the Alif Member (highstand deposits) represent the most productive hydrocarbon reservoir in the basin. The Upper Jurassic sediments in the study area resulted from a combination of eustatic and tectonic effects.
Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and a pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task, to avoid the segmentation errors of Chinese tokenizers, and initialize our model with the pretrained language model BERT, which was released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods, including term frequency (TF), TF-IDF and TextRank, and supervised machine learning methods, including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory Network (BiLSTM), and BiLSTM-CRF, as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, an absolute improvement of 9.64 percentage points. Research limitations: We consider only the automatic keyphrase extraction task rather than the keyphrase generation task, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community, at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset also provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
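The character-level IOB formulation described in this abstract can be sketched as follows. This is an illustrative guess at how such labels might be produced, not the authors' preprocessing code: the function name `char_level_iob`, the longest-match-first rule, and the example sentence are all assumptions. Skipping overlapping matches mirrors the stated limitation that nested keyphrases are not handled.

```python
# Minimal sketch: convert a text plus its gold keyphrases into
# character-level IOB tags (B = first character of a keyphrase,
# I = inside a keyphrase, O = outside).
# ASSUMPTIONS: function name, longest-first matching, and the sample
# sentence are hypothetical, not taken from the CAKE dataset.

def char_level_iob(text, keyphrases):
    """Return one tag per character of `text`."""
    tags = ["O"] * len(text)
    # Longer phrases first, so a shorter phrase never splits a longer one.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only label spans that are still untagged (no nesting/overlap).
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(phrase, start + 1)
    return tags

if __name__ == "__main__":
    text = "肺癌的早期诊断"      # hypothetical sentence
    keyphrases = ["肺癌", "早期诊断"]  # hypothetical gold keyphrases
    for char, tag in zip(text, char_level_iob(text, keyphrases)):
        print(char, tag)
```

Because each tag is attached to a single character, no word segmenter is involved, which is the property the abstract relies on to avoid tokenizer errors.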
Genetic manipulation (either restraint or enhancement) of the biosynthesis pathway of α-linolenic acid (ALA) in seed oil is an important goal in Brassica napus breeding. B. napus is a tetraploid plant whose genome often harbors four and six homologous copies, respectively, of the two fatty acid desaturases FAD2 and FAD3, which control the last two steps of ALA biosynthesis during seed oil accumulation. In this study, we compared their promoters, coding sequences, and expression levels in three high-ALA inbred lines (2006L, R8Q10, and YH25005), a low-ALA line (A28), a low-ALA/high-oleic-acid accession (SW), and the wild type ZS11. The expression levels of most FAD2 and FAD3 homologs in the three high-ALA accessions were higher than those in ZS11 and much higher than those in A28 and SW. The three high-ALA accessions shared similar sequences in the promoters and CDSs of BnFAD3.C4 and BnFAD3.A3. In A28 and SW, substitution of three amino acid residues in BnFAD2.A5 and BnFAD2.C5, an absence of the BnFAD2.C1 locus, and a 549 bp deletion in the BnFAD3.A3 promoter were detected. The profile of BnFAD2 mutations in the two low-ALA accessions A28 and SW is different from that reported in previous studies. The mutations in BnFAD3 in the high-ALA accessions are reported for the first time. In identifying the sites of these mutations, we provide detailed information to aid the design of molecular markers for accelerated breeding schemes.
Funding: Sponsored by the Shell Petroleum Development Company of Nigeria Limited (SPDC).
Abstract: The use of sequence stratigraphic concepts to identify sands and their spatial continuity in distinct gross depositional settings is key, especially in frontier settings where data paucity is a common challenge. In the Baka field, onshore Niger Delta, detailed reservoir correlation guided by sequence stratigraphic framework analysis showed the distribution of sand and shale units constituting reservoir-seal pairs (RSP) correlatable across the field. Within the 3rd-order packages, the lowstand systems tract (LST) and highstand systems tract (HST) contain more RSPs and thicker 4th- and 5th-order sands than the transgressive systems tract (TST). In terms of bathymetry, irrespective of systems tract, the RSP Index (RI) decreases from the proximal shallow/inner shelf settings to the more distal outer shelf areas. Among all three systems tracts, intervals interpreted as lowstand prograding complexes contain the best-developed sands and the highest RSP. Sand development within the LSTs has been controlled by a pronounced growth fault regime accompanied by high subsidence and sedimentation rates. This is linked to the basinward migration of the sands during prolonged sea-level fall, creating significant accommodation space for sand deposition. On the other hand, the TSTs, known to mark periods of progressive sea-level rise and landward migration of sandy facies, show thinner sands enclosed in much thicker, laterally extensive, and better-preserved deeper marine shales. Interpreted seismic sections indicate intense growth faulting and channelization that influenced the syn- and post-depositional development of the sand packages across the field. The initial timing of deformation of subregional faults in this area coincides with periods of abrupt falls in sea level. This approach could be useful for predicting sand-prone areas in frontier fields as well as possible reservoir-seal parameters required for some aspects of petroleum system analysis and quick-look volume estimation.
Funding: National Key Research and Development Program of China, Grant/Award Number: 2022YFF0904303; Beijing Science and Technology Planning Project, Grant/Award Number: Z221100006322003; National Natural Science Foundation of China, Grant/Award Number: 61932003.
Abstract: Computer-aided design (CAD) software continues to be a crucial tool in digital twin application and manufacturing, facilitating the design of various products. We present a novel CAD generation method, an agent that constructs CAD sequences containing the sketch-and-extrude modelling operations efficiently and with high quality. Starting from the sketch and extrusion operation sequences, we utilise a transformer encoder to encode them into different disentangled codebooks that represent their distribution properties while considering their correlations. Then, a combination of auto-regressive and non-autoregressive samplers is trained to sample the code for CAD sequence construction. Extensive experiments demonstrate that our model generates diverse and high-quality CAD models. We also show some cases of real digital twin applications and indicate that our generated model can be used as the data source for the digital twin platform, exhibiting designers' potential.
Funding: Supported by the Shandong Project of Agricultural Improved Variety Engineering.
Abstract: Objective: The aim was to provide a basis for molecular marker-assisted selection and resistance breeding of Langya chicken. Method: The genetic polymorphism of the Hae III site of the Mx gene 3' sequence in Langya chicken was analyzed by PCR-RFLP. Result: The Hae III site, controlled by alleles A and B, was polymorphic in Langya chicken breeds, and the allele frequencies of A and B were 0.562 and 0.438, respectively. The genotype distribution of the Hae III site deviated significantly from Hardy-Weinberg equilibrium (P < 0.01). The polymorphic fragments were cloned and sequenced, and the results revealed that the fragment size was 357 bp and that a deletion of 31 bp occurred in the variant sequences. Conclusion: Hae III-RFLP exists in the Mx gene 3' sequence in Langya chicken breeds of Shandong Province.
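As a worked illustration of the Hardy-Weinberg comparison mentioned in the abstract above, the following minimal sketch computes the genotype frequencies that would be expected at equilibrium from the reported allele frequencies (A = 0.562, B = 0.438). The function name is hypothetical, and because the abstract does not report observed genotype counts, no test statistic is computed here.

```python
def hardy_weinberg_expected(p_a: float, p_b: float) -> dict[str, float]:
    """Expected genotype frequencies for a biallelic locus at Hardy-Weinberg equilibrium."""
    # The two allele frequencies of a biallelic locus must sum to 1.
    if abs(p_a + p_b - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return {
        "AA": p_a ** 2,        # homozygous A: p^2
        "AB": 2 * p_a * p_b,   # heterozygous: 2pq
        "BB": p_b ** 2,        # homozygous B: q^2
    }


# Allele frequencies as reported in the abstract.
expected = hardy_weinberg_expected(0.562, 0.438)
for genotype, freq in expected.items():
    print(f"{genotype}: {freq:.3f}")
```

A significant departure from equilibrium, as the abstract reports, means the observed genotype proportions differ from these expected values by more than chance would explain.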
Funding: Supported by the National Natural Science Foundation of China (31640001 and T2350005 to C.X., U21A20148 to X.Z. and C.X.), the Ministry of Science and Technology of China (2021ZD0140300 to C.X.), the Natural Science Foundation of Hainan Province (No. 822RC703 for J.L.), the Foundation of Hainan Educational Committee (No. Hnky2022-27 for J.L.), and the Presidential Foundation of Hefei Institutes of Physical Science, Chinese Academy of Sciences (Y96XC11131, E26CCG27, and E26CCD15 to C.X.; E36CWGBR24B and E36CZG14132 to T.C.).
Abstract: Iron-sulfur clusters (ISC) are essential cofactors for proteins involved in various biological processes, such as electron transport, biosynthetic reactions, DNA repair, and gene expression regulation. ISC assembly protein IscA1 (or MagR) is found within the mitochondria of most eukaryotes. Magnetoreceptor (MagR) is a highly conserved A-type iron and iron-sulfur cluster-binding protein, characterized by two distinct types of iron-sulfur clusters, [2Fe-2S] and [3Fe-4S], each conferring unique magnetic properties. MagR forms a rod-like polymer structure in complex with photoreceptive cryptochrome (Cry) and serves as a putative magnetoreceptor for retrieving geomagnetic information in animal navigation. Although the N-terminal sequences of MagR vary among species, their specific function remains unknown. In the present study, we found that the N-terminal sequences of pigeon MagR, previously thought to serve as a mitochondrial targeting signal (MTS), were not cleaved following mitochondrial entry but instead modulated the efficiency with which iron-sulfur clusters and irons are bound. Moreover, the N-terminal region of MagR was required for the formation of a stable MagR/Cry complex. Thus, the N-terminal sequences in pigeon MagR fulfil more important functional roles than just mitochondrial targeting. These results further extend our understanding of the function of MagR and provide new insights into the origin of magnetoreception from an evolutionary perspective.
Abstract: This paper gives an account of the authors' research on the cyclic sequences, events and evolutionary history of the Sino-Korean plate from the Proterozoic to the Meso-Cenozoic, based on the principle of the Cosmos-Earth System. The authors divided the record of this plate into 20 super-cyclic or super-mega-cyclic periods and more than 100 Oort periods. The research focused on important sea flooding events, uplift interruption events, tilting movement events, molar-tooth carbonate events, thermal events, polarity reversal events, karst events, volcanic explosion events and storm events, as well as types of resource areas and paleotectonic evolution. By means of the isochronous theory of Cosmos-Earth System periodicity and based on long eccentricity and periodicity, the authors studied in detail the paleogeographic evolution of the aulacogen of the Sino-Korean plate, the formation of the oolitic beach platform, and the development of the foreland basin and continental rift valley basin, and reconstructed the evolution
Abstract: Based on a study of Neoproterozoic carbonates in the Jilin-Liaoning-Xuzhou-Huaiyang area, especially their cyclic sequence stratigraphy and Sr isotopes, two maximum sea flooding events (at 820 Ma and 835 Ma) have been identified. The resulting isochronous stratigraphic correlation proves that these Precambrian strata were connected between the Qingbaikou and the Nanhuan systems, with a time range from 750 Ma to 850 Ma. The disappearance of microsparite carbonate and the onset of a glacial stage offer important evidence for worldwide stratigraphic correlation and open a window for further correlation of the stratigraphic successions across the Sino-Korean and Yangtze Plates. A new correlation scheme is therefore provided based on our work.
Funding: The study is jointly supported by the National Natural Science Foundation of China (No. 49802012) and the Ministry of Sciences and Technology (SSER
Abstract: Different genetic types of meter-scale cyclic sequences in stratigraphic records result from episodic accumulation of strata related to Milankovitch cycles. The distinctive fabric of each facies succession results from sedimentation governed by different sediment sources and sedimentary dynamic conditions in different paleogeographical backgrounds, corresponding to high-frequency sea-level changes. This is the fundamental criterion for classifying the genetic types of meter-scale cyclic sequences. Their widespread development in stratigraphic records, their regular vertical stacking patterns in long-term sequences, the evolutionary characteristics of Earth history, and the genetic types reflected by the specific fabric of facies successions in different paleogeographical settings all show that meter-scale cyclic sequences are not only the elementary working units of stratigraphy and sedimentology, but also a supplement to and extension of the parasequence of sequence stratigraphy. Two genetic kinds of facies succession occur in meter-scale cyclic sequences in neritic carbonate and clastic strata: normal grading successions formed mainly by tidal sedimentation and inverse grading successions formed chiefly by wave sedimentation. Both generally constitute shallowing-upward successions, the thickness of which ranges from several tens of centimeters to several meters. Genetic types of meter-scale cyclic sequences can be classified in terms of the fabric of the facies succession. Carbonate meter-scale cyclic sequences can be divided into four types: L-M type, deep-water asymmetrical type, subtidal type and peritidal type. Clastic meter-scale cyclic sequences can be grouped into two types: tidal-dynamic type and wave-dynamic type.
The boundaries of meter-scale cyclic sequences are marked by instantaneous punctuated surfaces formed by non-deposition resulting from high-frequency sea-level changes; these include instantaneous exposed punctuated surfaces, drowned punctuated surfaces, and their correlative surfaces. The development of instantaneous punctuated surfaces as the boundaries of meter-scale cyclic sequences exposes the limitations of Walther's Law in explaining facies distribution in time and space, and reaffirms the importance of Sander's Rule in the analysis of stratigraphic records. These non-continuous surfaces can be traced over long distances, and some can be correlated within the same basin. The study of meter-scale cyclic sequences and their regular vertical stacking patterns in long-term sequences indicates that research into the cyclicity of stratigraphic records is a useful way to find more regularity in stratigraphic records that are frequently complex and incomplete.
Funding: This paper is financially supported by the National Natural Science Foundation of China (Nos. 49802012, 40472065).
Abstract: Both the macroscopic features and the sequence-stratigraphic position of the molar-tooth structure developed in the third member of the Gaoyuzhuang (高于庄) Formation at the Jixian (蓟县) Section in Tianjin (天津) provide useful information about its origin and reveal some problems for future research. The Mesoproterozoic Gaoyuzhuang Formation is a set of carbonate strata 1,600 m thick. This formation can be divided into four members. The first member is mainly made up of stromatolitic dolomites; the second is marked by a set of manganese dolomites; the third is mainly composed of lamina limestones with the development of molar-tooth structures; the fourth is a set of stromatolitic-lithoherm dolomites. According to lithofacies and its succession, several types of meter-scale cycles can be discerned in the Gaoyuzhuang Formation: the L-M type, the subtidal type and the peritidal type. Meter-scale cycles show a regular vertical stacking pattern within the third-order sequence. On this basis, the Mesoproterozoic Gaoyuzhuang Formation can be divided into 13 third-order sequences (SQ1 to SQ13) and further grouped into 4 second-order sequences. The third member is marked by lamina limestones and can be grouped into three third-order sequences (SQ9 to SQ11). The molar-tooth structure is developed in the middle part of the third sequence, i.e. SQ11, in the third member. Several features of this kind of molar-tooth structure reflect features of carbonate sedimentation in the Precambrian, such as its particular configuration, abundant organic matter, and easy silicification. Stromatolites are chiefly formed in a shallow tidal-flat environment; lamina are mainly formed in the shallow ramp, and molar-tooth structures are mainly generated in a relatively deeper-water environment from the middle to the deep ramp. Therefore, similar to stromatolite and lamina, the molar-tooth structure might also be a kind of bio-sedimentation structure. This suggestion is based on macroscopic observation and on sedimentary-facies analysis of the molar-tooth structures from their sequence-stratigraphic position. These features of Precambrian sedimentation also reveal the problem of Precambrian carbonate sedimentation. With more detailed study, a more practical solution for these problems may be obtained in the future.
Abstract: The effect of the promoter cobalt and of the sequence in which the cobalt and molybdenum precursors were added on the performance of sulfur-resistant methanation was investigated. All samples were prepared by the impregnation method and characterized by N2 adsorption, X-ray diffraction (XRD), temperature-programmed reduction (TPR) and laser Raman spectroscopy (LRS). The conversions of CO for the Mo-Co/Al, Co-Mo/Al and CoMo/Al catalysts were 59.7%, 54.3% and 53.9%, respectively. Among these catalysts, the Mo-Co/Al catalyst, prepared stepwise by impregnating the Mo precursor first, showed the best catalytic performance. Meanwhile, the conversions of CO were 48.9% for the Mo/Al catalyst and 10.5% for the Co/Al catalyst. The addition of cobalt species could improve the catalytic activity of the Mo/Al catalyst. The N2 adsorption results showed that the Co-Mo/Al catalyst had the smallest specific surface area among these catalysts. CoMoO4 species in the CoMo/Al catalyst were detected with XRD, TPR and LRS. Moreover, crystalline MoS2, which has been reported to be less active than amorphous MoS2, was found in both the Co-Mo/Al and CoMo/Al catalysts. The Mo-Co/Al catalyst showed the best catalytic performance because it had an appropriate surface structure, i.e., no crystalline MoS2 and very little CoMoO4 species.
Abstract: We compared the genetic diversity of 24 Chinese weak-winter, Swedish winter and Swedish spring B. napus accessions using inter-simple sequence repeats (ISSRs). By cluster analysis (UPGMA) based on 125 polymorphic bands amplified with 20 primers, the 24 accessions were divided into three groups. Six Swedish winter lines and eight Chinese weak-winter lines fell into group I, and group II comprised two Chinese weak-winter lines, Xiangyou15 and Bao81. The third group contained eight Swedish spring lines. Principal co-ordinates analysis (PCO) showed groupings similar to those of the cluster analysis. Results from the cluster and PCO analyses showed very clearly that Chinese weak-winter, Swedish spring and Swedish winter accessions were distinguished from each other, and that the Chinese weak-winter accessions in this study were genetically closer to Swedish winter accessions than to Swedish spring accessions. The Chinese weak-winter accessions had greater diversity than the Swedish spring or winter accessions. This study indicated that ISSR is a suitable and effective tool for evaluating genetic diversity among rapeseed germplasm.
Abstract: Coding sequences (CDS) are commonly used for transient gene expression, in yeast two-hybrid screening, to verify protein interactions and in prokaryotic gene expression studies. CDS are most commonly obtained using complementary DNA (cDNA) generated by reverse transcription from messenger RNA (mRNA) extracted from plant tissues. However, some CDS are difficult to acquire through this process because they are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges require the development of alternative CDS cloning technologies. In this study, we found that genomic intron-containing gene coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. In transient expression experiments, the target DNA sequence is driven by a constitutive promoter. Theoretically, a sufficient amount of mRNA can therefore be extracted from the N. benthamiana leaves, which makes the system conducive to cloning the CDS of target genes. Our data demonstrate that N. benthamiana can be used as an effective host for cloning the CDS of plant genes.
Funding: Supported by the National Natural Science Foundation of China (No. 30771781) and the Natural Science Foundation of Guangdong Province (No. 06022672).
Abstract: Objective: To study the alternative expression and sequence of human elongation factor-1δ (human EF-1δ p31) during malignant transformation of human bronchial epithelial cells induced by cadmium chloride (CdCl2) and its possible mechanism. Methods: Total RNA was isolated at different stages of transformation of human bronchial epithelial cells (16HBE) induced by CdCl2 at a concentration of 5.0 μM. Specific primers and a probe for human EF-1δ p31 were designed, and expression of human EF-1δ mRNA from different cell lines was detected with a fluorescent quantitative PCR technique. EF-1δ cDNA from different cell lines was purified and cloned into the pMD 18-T vector, followed by confirmation and sequencing analysis. Results: The expression of human EF-1δ p31 at different stages of 16HBE cells transformed by CdCl2 was elevated (P < 0.01 or P < 0.05). Compared with their corresponding non-transformed cells, the expression level of EF-1δ p31 was increased on average 2.9-fold in Cd-pretransformed cells, 4.3-fold in Cd-transformed cells and 7.2-fold in Cd-tumorigenic cells. No change was found in the sequence of overexpressed EF-1δ p31 at different stages of 16HBE cells transformed by CdCl2. Conclusion: Overexpression of human EF-1δ p31 is positively correlated with malignant transformation of 16HBE cells induced by CdCl2, but is not correlated with DNA mutations.
Funding: Supported by the National Natural Science Foundation of China (Grants: 42388102, 42192533, and 42192531), the Fundamental Research Funds for the Central Universities (Grant: 2042023kfyq01), and the Special Fund of Hubei Luojia Laboratory (Grant: 220100002).
Abstract: Since the inception of the optimal sequence estimation (OSE) method, various research teams have substantiated its efficacy as the optimal stacking technique for handling array data, leading to its successful application in numerous geoscience studies. Nevertheless, concerns persist regarding the potential impact of aliasing resulting from the choice of distinct station distributions on the outcomes derived from OSE. In this investigation, I employ theoretical deduction and experimental analysis to elucidate the reasons behind the immunity of the Y_(l'm')-related common signal obtained through OSE to variations in station distribution selection. The primary objective of OSE is also underscored, i.e., to restore/strip a Y_(l'm')-related common periodic signal from various stations. Furthermore, I provide additional clarification that the 'Y_(l'm')-related common signal' and the 'Y_(l'm')-related equivalent excitation sequence' are distinct concepts. These analyses will facilitate the utilization of the OSE technique by other researchers in investigating intriguing geophysical phenomena and attaining sound explanations.
Funding: This work was supported by the National Key Research and Development Program of China (grant no. 2022YFF1000500), the National Natural Science Foundation of China (grant no. 31941007), and the Zhejiang Province Agriculture (Livestock) Varieties Breeding Key Technology R&D Program (grant no. 2016C02054-2).
Abstract: Background: The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. Results: In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in the frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with heat tolerance in Southern Chinese pigs. Conclusions: Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Abstract: Objective: To address the phylogenetic and phylogeographic relationships between different lineages of the Anopheles (An.) subpictus species complex in most parts of the Asian continent by making maximum use of the Internal Transcribed Spacer 2 (ITS2) and cytochrome C oxidase I (COI) sequences deposited in GenBank. Methods: Seventy-five ITS2, 210 COI and 26 concatenated sequences available in the NCBI database were used. Phylogenetic analysis was performed using Bayesian likelihood trees, whereas median-joining haplotype networks and time-scale divergence trees were generated for phylogeographic analysis. Genetic diversity indices and genetic differentiation were also calculated. Results: Two genetically divergent molecular forms of the An. subpictus species complex, corresponding to sibling species A and B, are established. Species A evolved around 37-82 million years ago in Sri Lanka, India, and the Netherlands, and species B evolved around 22-79 million years ago in Sri Lanka, India, and Myanmar. Vietnam, Thailand, and Cambodia have two molecular forms: one is phylogenetically similar to species B. Other forms differ from species A and B and evolved recently in the above-mentioned countries, Indonesia and the Philippines. Genetic subdivision among Sri Lanka, India, and the Netherlands is almost absent. Substantial genetic differentiation was obtained for some populations due to isolation by large geographical distances. Genetic diversity indices reveal the presence of a long-established stable mosquito population, at mutation-drift equilibrium, regardless of population fluctuations. Conclusions: The An. subpictus species complex consists of more than two genetically divergent molecular forms. Species A is highly divergent from the rest. Sri Lanka and India contain only species A and B.
Abstract: In the present paper, we mostly focus on P_(p)^(2)-statistical convergence. We look into uniform integrability via the power series method and its characterizations for double sequences. In addition to these findings, the notions of P_(p)^(2)-statistically Cauchy sequence, P_(p)^(2)-statistical boundedness and core for double sequences are described.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62073256) and the Shaanxi Provincial Science and Technology Department (Grant No. 2023-YBGY-342).
Abstract: To solve the problem of target damage assessment when fragments attack a target under uncertain projectile-target intersection in an air defense intercept, this paper proposes a method for calculating target damage probability that uses a spatio-temporal finite multilayer fragment distribution, together with a target damage assessment algorithm based on cloud model theory. Drawing on the spatial dispersion characteristics of the fragments from a projectile proximity explosion, we divide the fragments into a finite number of distribution planes based on the time series in space, and set up a fragment layer dispersion model grounded in the time series as well as an intersection criterion for determining the effective penetration of each layer of fragments into the target. On the precondition that the multilayer fragments of the time series effectively strike the target, we also establish the damage criterion for perforation and penetration damage and derive the damage probability calculation model. Taking the damage probability of each fragment layer in the spatio-temporal sequence to the target as the input state variable, we introduce cloud model theory to study the target damage assessment method. Using an equivalent simulation experiment, the scientific soundness and rationality of the proposed method were validated through quantitative calculations and comparative analysis.
Abstract: The present study is devoted to understanding the evolution of the Upper Jurassic Sab'atayn Formation in the Marib-Shabwa Basin, Yemen, through a sequence stratigraphic analysis that integrates sedimentological data, seismic sections, and well logs. The Sab'atayn Formation (Tithonian age) is represented by a series of clastics and evaporites that were deposited in fluvio-deltaic to prodeltaic settings. It is divided into four members: Yah (at the base), followed upwards by Seen, Alif, and Safir at the top. Two third-order depositional sequences, separated by three sequence boundaries, were determined for the Tithonian succession. These sequences were classified into their systems tracts, reflecting several sedimentation patterns of progradational, aggradational, and retrogradational parasequence sets. The first depositional sequence corresponds to the early-middle Tithonian Yah and Seen units, which can be classified into lowstand, transgressive, and highstand systems tracts. The second sequence comprises the late Tithonian Alif unit, which can be subdivided into transgressive and highstand systems tracts. The sandy deposits of the Alif Member (highstand deposits) represent the most productive hydrocarbon reservoir in the basin. The Upper Jurassic sediments in the study area resulted from a combination of eustatic and tectonic effects.
Funding: This work is supported by the project "Research on Methods and Technologies of Scientific Researcher Entity Linking and Subject Indexing" (Grant No. G190091) from the National Science Library, Chinese Academy of Sciences, and the project "Design and Research on a Next Generation of Open Knowledge Services System and Key Technologies" (2019XM55).
Abstract: Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and a pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors from the Chinese tokenizer, and initialize our model with the pretrained language model BERT, which was released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set and 3,094 abstracts as the test set. As baselines, we use unsupervised keyphrase extraction methods including term frequency (TF), TF-IDF and TextRank, and supervised machine learning methods including Conditional Random Field (CRF), Bidirectional Long Short Term Memory Network (BiLSTM), and BiLSTM-CRF. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, an absolute improvement of 9.64%. Research limitations: We consider only the automatic keyphrase extraction task rather than the keyphrase generation task, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is better suited to the Chinese automatic keyphrase extraction task under the general trend toward pretrained language models. Our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
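To make the character-level IOB formulation described in the abstract above concrete, here is a minimal sketch of how a text and its keyphrases could be converted into per-character B/I/O tags. The function name, the sample sentence and the sample keyphrases are all hypothetical and are not taken from the CAKE dataset; the longest-match-first policy is likewise an assumption, chosen because the abstract states that nested keyphrases are not handled.

```python
def char_level_iob(text: str, keyphrases: list[str]) -> list[tuple[str, str]]:
    """Tag every character of `text` as B (begin), I (inside) or O (outside a keyphrase)."""
    tags = ["O"] * len(text)
    # Assumption: longer phrases are matched first, so a nested shorter
    # phrase never overwrites part of a longer one.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only tag the span if none of its characters is already claimed.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(phrase, start + 1)
    return list(zip(text, tags))


# Hypothetical example sentence and keyphrases.
tagged = char_level_iob("糖尿病患者的血糖控制", ["糖尿病", "血糖控制"])
for char, tag in tagged:
    print(char, tag)
```

Because each character receives its own tag, no word segmentation step is needed, which is the motivation the abstract gives for the character-level formulation.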
Funding: The study was financially supported by projects from Shaanxi Province (2021LLRH-07-03-01 and 2023-ZDLNY-07) and Yangling Seed Industry Innovation (YLzy-yc2021-01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Abstract: Genetic manipulation (either restraint or enhancement) of the biosynthesis pathway of α-linolenic acid (ALA) in seed oil is an important goal in Brassica napus breeding. B. napus is a tetraploid plant whose genome often harbors four and six homologous copies, respectively, of the two fatty acid desaturases FAD2 and FAD3, which control the last two steps of ALA biosynthesis during seed oil accumulation. In this study, we compared their promoters, coding sequences, and expression levels in three high-ALA inbred lines 2006L, R8Q10, and YH25005, a low-ALA line A28, a low-ALA/high-oleic-acid accession SW, and the wildtype ZS11. The expression levels of most FAD2 and FAD3 homologs in the three high-ALA accessions were higher than those in ZS11 and much higher than those in A28 and SW. The three high-ALA accessions shared similar sequences with the promoters and CDSs of BnFAD3.C4 and BnFAD3.A3. In A28 and SW, substitution of three amino acid residues in BnFAD2.A5 and BnFAD2.C5, an absence of the BnFAD2.C1 locus, and a 549 bp long deletion on the BnFAD3.A3 promoter were detected. The profile of BnFAD2 mutation in the two low-ALA accessions A28 and SW is different from that reported in previous studies. The mutations in BnFAD3 in the high-ALA accessions are reported for the first time. In identifying the sites of these mutations, we provide detailed information to aid the design of molecular markers for accelerated breeding schemes.