BACKGROUND: Synchronous liver metastasis (SLM) is a significant contributor to morbidity in colorectal cancer (CRC). There are no effective predictive device integration algorithms to predict adverse SLM events during the diagnosis of CRC. AIM: To explore the risk factors for SLM in CRC and construct a visual prediction model based on gray-level co-occurrence matrix (GLCM) features collected from magnetic resonance imaging (MRI). METHODS: Our study retrospectively enrolled 392 patients with CRC from Yichang Central People's Hospital from January 2015 to May 2023. Patients were randomly divided into a training group and a validation group (3:7). The clinical parameters and GLCM features extracted from MRI were included as candidate variables. The prediction model was constructed using a generalized linear regression model, a random forest model (RFM), and an artificial neural network model. Receiver operating characteristic curves and decision curves were used to evaluate the prediction model. RESULTS: Among the 392 patients, 48 had SLM (12.24%). Fourteen GLCM imaging features were obtained for variable screening of the SLM prediction models. Inverse difference, mean sum, sum entropy, sum variance, sum of squares, energy, and difference variance were listed as candidate variables, and the prediction efficiency (area under the curve) of the subsequent RFM in the training set and internal validation set was 0.917 [95% confidence interval (95%CI): 0.866-0.968] and 0.909 (95%CI: 0.858-0.960), respectively. CONCLUSION: A predictive model combining GLCM image features with machine learning can predict SLM in CRC. This model can assist clinicians in making timely and personalized clinical decisions.
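As an illustration of the kind of GLCM texture features named in this abstract, the following minimal sketch builds a normalized co-occurrence matrix for a tiny image and derives two features from it. It is not the study's pipeline: the pixel offset, the toy image, the function names, and the choice of 1 + |i - j| as the inverse-difference denominator are all assumptions made for the example.

```python
from collections import Counter

def glcm(image, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix as {(i, j): probability}.

    Counts how often gray level i has gray level j at offset (dx, dy).
    The offset is an assumed parameter, not one taken from the abstract.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def energy(p):
    """Energy (angular second moment): sum of squared probabilities."""
    return sum(v * v for v in p.values())

def inverse_difference(p):
    """Inverse difference: probabilities weighted by 1 / (1 + |i - j|)."""
    return sum(v / (1 + abs(i - j)) for (i, j), v in p.items())

# Toy 3x3 image with three gray levels (hypothetical data).
toy_image = [[0, 0, 1],
             [0, 0, 1],
             [2, 2, 2]]
p = glcm(toy_image)
print(energy(p), inverse_difference(p))
```

Features of this kind would then be passed, alongside clinical parameters, to whichever classifier is being evaluated.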
Solving arithmetic word problems that entail deep implicit relations is still a challenging problem, although significant progress has been made in solving Arithmetic Word Problems (AWP) over the past six decades. This paper proposes to discover deep implicit relations by qualia inference in order to solve Arithmetic Word Problems entailing Deep Implicit Relations (DIR-AWP), such as problems whose solution involves commonsense or subject-domain knowledge. The paper proposes three steps to solve DIR-AWPs, which together conduct the qualia inference process. The first step uses a prepared set of qualia-quantity models to identify qualia scenes from the explicit relations extracted from the given problem by the Syntax-Semantic (S2) method. The second step adds missing entities and deep implicit relations, in that order, using the identified qualia scenes and the qualia-quantity models, respectively. The third step distills the relations needed to solve the given problem by pruning the spare branches of the qualia dependency graph of all the acquired relations. The research contributes to the field by presenting a comprehensive approach that combines explicit and implicit knowledge to enhance reasoning abilities. The experimental results on Math23K demonstrate that the proposed algorithm is superior to the baseline algorithms in solving AWPs requiring deep implicit relations.
The co-occurrence pattern of fish species plays an important role in understanding the spatio-temporal structure and the stability of a fish community. Species coexistence may vary with time and space. The co-occurrence patterns of fish species were examined using the C-score under a fixed-fixed null model for fish communities in spring and autumn over different years in Haizhou Bay, China. The results showed that fish assemblages in the whole bay had non-random patterns in spring and autumn over different years. However, the fish co-occurrence patterns differed between the northern and southern fish assemblages in spring and autumn. The northern fish assemblage showed a structured pattern, whereas the southern assemblage was randomly assembled in spring. The co-occurrence patterns of fish communities were relatively stable over different years, and the number of significant species pairs in the northern assemblage was greater than that in the southern assemblage. Environmental heterogeneity played an important role in determining the distributions of fish species that formed significant species pairs, which might further affect the co-occurrence patterns of the northern and southern assemblages in Haizhou Bay.
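For readers unfamiliar with the C-score, the sketch below computes the observed statistic from a presence-absence matrix: for each species pair, the number of checkerboard units is (r_i - S)(r_j - S), where r_i and r_j are the numbers of sites occupied by each species and S is the number of shared sites, and the C-score is the mean over all pairs. The matrix is invented for illustration, and the fixed-fixed null model randomization used in the study is not implemented here.

```python
from itertools import combinations

def c_score(matrix):
    """Mean number of checkerboard units over all species pairs.

    `matrix` is a list of rows, one per species, each a list of 0/1
    values giving absence/presence at each site.
    """
    pairs = list(combinations(matrix, 2))
    total = 0
    for a, b in pairs:
        shared = sum(x and y for x, y in zip(a, b))  # sites holding both species
        total += (sum(a) - shared) * (sum(b) - shared)
    return total / len(pairs)

# Hypothetical presence-absence data: 3 species (rows) by 4 sites (columns).
presence = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
print(c_score(presence))
```

In a null-model test, this observed value would be compared with the distribution of C-scores from randomized matrices that preserve both row and column totals.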
Background: Disentangling the relative importance of environmental variables and interspecific interaction in modulating co-occurrence patterns of sympatric species is essential for understanding the mechanisms of community assembly and biodiversity. For two sympatric Galliformes, Silver Pheasants (Lophura nycthemera) and White-necklaced Partridges (Arborophila gingica), we know little about the role of habitat use and interspecific interactions in modulating their coexistence. Methods: We adopted a probabilistic approach incorporating habitat preference and interspecific interaction, using an occupancy model to account for imperfect detection, and used daily activity pattern analysis to investigate the co-occurrence pattern of these two sympatric Galliformes in wet and dry seasons. Results: We found that the detection probabilities of Silver Pheasants and White-necklaced Partridges were related to habitat variables and interspecific interaction. The presence of Silver Pheasants increased the detection probability of White-necklaced Partridges in both the wet and dry seasons. However, the presence of White-necklaced Partridges increased the detection probability of Silver Pheasants in the wet season but decreased it in the dry season. Further, Silver Pheasants were detected frequently at sites with high values of the enhanced vegetation index (EVI) in both the wet and dry seasons, and at sites away from human residential settlements in the wet season. White-necklaced Partridges were mainly detected at low-EVI sites. The site use probabilities of the two Galliformes were best explained by habitat variables: Silver Pheasants and White-necklaced Partridges preferred steeper areas during the wet and dry seasons. Both species mainly occurred in low-EVI areas during the wet season and occupied sites away from residential settlements during the dry season. Moreover, the site use probabilities of the two species had opposite relationships with forest canopy coverage. Silver Pheasants preferred areas with high forest canopy coverage whereas White-necklaced Partridges preferred low forest canopy coverage in the dry season, and vice versa in the wet season. The species interaction factor (SIF) provided only weak evidence that the site use of one species depended on that of the other in either the dry or the wet season. Temporally, high overlap of daily activity patterns indicated no significant temporal niche differentiation between the sympatric Galliformes in either season. Conclusions: Our results demonstrated that the presence of the two species influenced detection probability interactively and that there was no temporal partitioning in activity time between Silver Pheasants and White-necklaced Partridges in the wet and dry seasons. The site use probability of the two Galliformes was best explained by habitat variables, especially forest canopy coverage. Therefore, environmental variables and interspecific interaction are the leading drivers regulating detection and site use probability and promoting the co-occurrence of Silver Pheasants and White-necklaced Partridges.
The category-based statistical language model is an important method for addressing the problem of sparse data, but it has two bottlenecks: 1) word clustering, for which it is hard to find a method that combines good performance with low computation; and 2) the tendency of class-based methods to lose the predictive ability needed to adapt to text in different domains. To solve these problems, a definition of word similarity based on mutual information is presented, and a definition of word set similarity is derived from it. Experiments show that the word clustering algorithm based on similarity is better than the conventional greedy clustering method in speed and performance, and the perplexity is reduced from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the perplexity of the vari-gram model is reduced from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
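Since the results above are reported as perplexity, a short sketch of the standard definition may help: perplexity is the exponential of the negative mean log-probability that a model assigns to the tokens of a test text. The function names and the toy probabilities are assumptions for illustration; the abstract's own models are not reproduced.

```python
from math import exp, log

def perplexity(token_probs):
    """Perplexity from the per-token probabilities assigned by a model."""
    n = len(token_probs)
    return exp(-sum(log(p) for p in token_probs) / n)

def relative_reduction(before, after):
    """Fractional drop in perplexity between two models."""
    return (before - after) / before

# A model that spreads probability uniformly over 4 choices at every
# position behaves like a 4-way guess, so its perplexity is about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Relative size of the clustering improvement quoted in the abstract.
print(relative_reduction(283, 218))
```

Read this way, the drop from 283 to 218 corresponds to a relative reduction of roughly 23%.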
We propose a method for automatic Naxi-English bilingual word alignment based on a log-linear model. The method defines Naxi-English structural feature functions, namely an English-Naxi interval switching function and a Naxi-English bilingual word position transformation function. Using a manually labeled Naxi-English word alignment corpus, the parameters of the model are trained using minimum error training, so that Naxi-English bilingual word alignment is achieved automatically. Experiments are conducted with IBM Model 3 as a benchmark, and Naxi language constraints are introduced. The final experimental results show that the proposed alignment method performs well: introducing the language characteristic function effectively improves the accuracy of Naxi-English bilingual word alignment.
One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem which plagues NLP in general given that the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has gone into finding and optimizing solutions to this problem, which is effectively one of feature selection. This paper looks at the development of these various techniques, which leverage a variety of statistical methods that rest on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, which are typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
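The distributional hypothesis mentioned in this abstract can be made concrete with a small count-based sketch: each word is represented by the counts of words appearing near it, and words are compared by cosine similarity. The three-sentence corpus, the window size, and the function names are assumptions for illustration; this is a plain co-occurrence model, not Word2Vec, GloVe, ELMo, or BERT.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus.
corpus = ["cat drinks milk", "dog drinks milk", "car needs fuel"]
vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))  # shared contexts, high similarity
print(cosine(vecs["cat"], vecs["car"]))  # no shared contexts, zero similarity
```

Dense embedding methods can be viewed as compressing this kind of sparse, vocabulary-sized representation into far fewer dimensions.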
Negatively worded (NW) items used in psychological instruments have been studied with the bifactor model to investigate whether the NW items form a secondary factor, due to negative wording, that is orthogonal to the measured latent construct. This validation procedure checks whether NW items are a source of construct-irrelevant variance (CIV) and hence constitute a validity threat. In the context of educational testing, however, no such validation attempts have been made. In this study, we examined the psychometric impact of NW items in an English proficiency reading comprehension test using a modeling approach similar to the bifactor model, namely the three-parameter logistic cross-classified testlet response theory (3PL CCTRT) model, to account for both guessing and possible local item dependence due to passage effects in the data set. The findings indicate that modeling the NW items with a separate factor leads to a noticeable improvement in model fit, and that the factor variance is marginal but nonzero. However, item and ability parameter estimates are highly similar between the 3PL CCTRT model and other models that do not model the NW items. It is concluded that the NW items introduce CIV into the data, but its magnitude is too small to change item and person ability parameter estimates to an extent of practical significance.
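As background for the model named above, the sketch below evaluates the standard three-parameter logistic (3PL) item response function, in which the guessing parameter c sets a floor on the probability of a correct answer. It covers only the 3PL core; the cross-classified testlet and wording effects of the CCTRT model are not included, the optional scaling constant used in some formulations is omitted, and the parameter values are invented.

```python
from math import exp

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person ability; a: item discrimination;
    b: item difficulty; c: pseudo-guessing lower asymptote.
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))

# When ability equals difficulty, the probability sits halfway
# between the guessing floor c and 1.
print(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

A testlet-style extension would add further terms inside the exponent, for example a passage effect or a wording effect, which is where a separate NW factor would enter.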
Retelling extraction is an important branch of Natural Language Processing (NLP), and high-quality retelling resources are very helpful for improving the performance of machine translation. However, traditional methods based on bilingual parallel corpora often ignore the document background in the process of retelling acquisition and application. To solve this problem, we introduce topic model information into the translation model and propose a topic-based statistical machine translation method to improve translation performance. In this method, Probabilistic Latent Semantic Analysis (PLSA) is used to obtain the co-occurrence relationship between words and documents by hybrid matrix decomposition. We then design a decoder to simplify the decoding process. Experiments show that the proposed method can effectively improve the accuracy of translation.
Congenital heart defects, which account for about 30% of congenital defects, are the most common type. Data show that congenital heart defects have seriously affected the birth rate of healthy newborns. In fetal and neonatal cardiology, medical imaging technology (2D ultrasound, MRI) has been shown to help detect congenital defects of the fetal heart and to assist sonographers in prenatal diagnosis. Recognizing the 2D fetal heart ultrasonic standard plane (FHUSP) manually is a highly complex task. Compared with manual identification, automatic identification through artificial intelligence can save a lot of time, ensure the efficiency of diagnosis, and improve its accuracy. In this study, a feature extraction method based on texture features (Local Binary Pattern, LBP, and Histogram of Oriented Gradient, HOG) combined with a Bag of Words (BOW) model is carried out, and feature fusion is then performed. Finally, a Support Vector Machine (SVM) is adopted to realize automatic recognition and classification of FHUSP. The data include 788 standard plane data sets and 448 normal and abnormal plane data sets. Compared with some other methods and with single-method models, the classification accuracy of our model is clearly improved, with the highest accuracy reaching 87.35%. We also verify the performance of the model on normal and abnormal planes, where the average classification accuracy is 84.92%. The experimental results show that this method can effectively classify and predict different FHUSP and can provide some assistance to sonographers in diagnosing fetal congenital heart disease.
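To show what a Local Binary Pattern feature looks like at the pixel level, the sketch below computes the basic 8-neighbour LBP code for the centre of a 3x3 patch: each neighbour contributes one bit, set when the neighbour is at least as bright as the centre. The clockwise bit order starting at the top-left corner is one common convention chosen here as an assumption, and the patches are invented; the study's HOG, Bag of Words, and SVM stages are not shown.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code (0-255) for the centre of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours taken clockwise from the top-left corner (assumed order).
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

# Hypothetical patches: a flat region and an isolated bright centre pixel.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(lbp_code(flat), lbp_code(peak))
```

A texture descriptor is typically obtained by computing this code at every pixel and histogramming the codes over an image region.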
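To address the scarcity of text corpora on livestock and poultry diseases and the large number of out-of-vocabulary words in such texts, such as disease names and phrases, a BERT-BiLSTM-CRF word segmentation model combined with dictionary matching is proposed for livestock and poultry disease text. Taking sheep diseases as the research object, a text dataset of common diseases was constructed and combined with the general-purpose PKU corpus. The BERT (Bidirectional encoder representation from transformers) pre-trained language model was used to produce vector representations of the text; a bidirectional long short-term memory network (BiLSTM) captured contextual semantic features; and a conditional random field (CRF) output the globally optimal label sequence. On this basis, a livestock and poultry disease domain dictionary was added after the CRF layer to correct the segmentation by dictionary matching, reducing the ambiguous segmentation caused by disease names and phrases and further improving segmentation accuracy. Experimental results show that the BERT-BiLSTM-CRF model combined with dictionary matching achieved an F1 score of 96.38% on the common sheep disease text dataset, an improvement of 11.01, 10.62, 8.3, and 0.72 percentage points over the jieba tokenizer, the BiLSTM-Softmax model, the BiLSTM-CRF model, and the proposed model without dictionary matching, respectively, which verifies the effectiveness of the method. Compared with a single corpus, the mixed corpus combining the general-purpose PKU corpus with the common sheep disease text dataset allows accurate segmentation of both specialized livestock and poultry disease terms and common words in disease texts, with F1 scores above 95% on both the general corpus and the disease text dataset, indicating good generalization. The method can be used for word segmentation of livestock and poultry disease text.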
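The dictionary-matching correction described above can be pictured with the following sketch, which merges adjacent tokens from a segmenter whenever their concatenation is a term in a domain lexicon, preferring the longest match. This is only one plausible reading of such a correction step, not the paper's procedure: the function name, the `max_span` parameter, the sample tokens, and the one-entry lexicon are all hypothetical, and the sketch only merges over-split terms without re-splitting wrongly joined ones.

```python
def merge_with_dictionary(tokens, lexicon, max_span=4):
    """Merge runs of adjacent tokens that together form a lexicon term."""
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in lexicon:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token match starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical segmenter output that splits a disease name in two.
tokens = ["羊", "口蹄", "疫", "症状"]
lexicon = {"口蹄疫"}  # foot-and-mouth disease
print(merge_with_dictionary(tokens, lexicon))
```

Preferring the longest match keeps multi-character disease names intact even when shorter lexicon entries are also present.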
Funding (arithmetic word problem study): Supported by the National Natural Science Foundation of China (No. 61977029) and partly by the Nurturing Program for Doctoral Dissertations at Central China Normal University (No. 2022YBZZ028).
Funding (Haizhou Bay fish co-occurrence study): Funded by the National Natural Science Foundation of China (No. 31772852), the Fundamental Research Funds for the Central Universities (Nos. 201562030, 201612004), and the Public Science and Technology Research Funds Projects of Ocean (No. 201305030).
Funding (sympatric Galliformes study): Supported by the National Key Research and Development Program of China (2017YFC0503802) and the China Postdoctoral Science Foundation (2017M620905).
Funding (category-based language model study): Project (60763001) supported by the National Natural Science Foundation of China; Project (2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China; Project (GJJ12271) supported by the Science and Technology Foundation of the Provincial Education Department of Jiangxi Province, China.
Funding (Naxi-English word alignment study): Supported by the National Nature Science Foundation of China under Grants No. 60863011, No. 61175068, No. 61100205, and No. 60873001; the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212; the National Innovation Fund for Technology-based Firms under Grant No. 11C26215305905; and the Open Fund of the Software Engineering Key Laboratory of Yunnan Province under Grant No. 2011SE14.
Funding (topic-based machine translation study): Supported by the National Social Science Fund of China (Youth Program): "A Study of Acceptability of Chinese Government Public Signs in the New Era and the Countermeasures of the English Translation" (No. 13CYY010); the Subject Construction and Management Project of Zhejiang Gongshang University: "Research on the Organic Integration Path of Constructing Ideological and Political Training and Design of Mixed Teaching Platform during Epidemic Period" (No. XKJS2020007); and the Ministry of Education Industry-University Cooperative Education Program: "Research on the Construction of Cross-border Logistics Marketing Bilingual Course Integration" (No. 202102494002).
Funding (fetal heart ultrasonic standard plane study): Supported by the Fujian Provincial Science and Technology Major Project (No. 2020HZ02014), by grants from the National Natural Science Foundation of Fujian (2021J01133, 2021J011404), and by the Quanzhou Scientific and Technological Planning Projects (Nos. 2018C113R, 2019C028R, 2019C029R, 2019C076R and 2019C099R).