Funding: The Research Management Center, Xiamen University Malaysia, under XMUM Research Program Cycle 4 (Grant No. XMUMRF/2019-C4/IECE/0012).
Abstract: Glycation is a non-enzymatic post-translational modification in which sugar molecules and residues attach to a peptide. It is clinically important in numerous age-related, metabolic, and chronic diseases such as diabetes, Alzheimer's disease, and renal failure. Identifying the sites of a non-enzymatic reaction is challenging, and manual identification in the laboratory is costly and time-consuming. In this research, we developed an accurate, valid, and robust model named Gly-LysPred to differentiate glycated sites from non-glycated sites. Position-relative features are used for feature extraction, and a random forest, combined with preprocessing and feature engineering techniques, is used to train the computational model. The model is evaluated with self-consistency, jackknife, and cross-validation testing, which yield accuracies of 100%, 99.92%, and 99.88% with MCC values of 1.00, 0.99, and 0.997, respectively. A user-friendly web server has also been developed to host the whole procedure. The results suggest that these feature vectorization methods can play a critical role in other web servers developed to classify lysine glycation.
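The jackknife test and the MCC metric named in this abstract can be illustrated with a minimal sketch. It is not the Gly-LysPred pipeline: a dependency-free 1-nearest-neighbour classifier stands in for the random forest, and the function names and toy data are hypothetical.

```python
import math


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest training sample."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(train_x[j], query))
    return train_y[min(range(len(train_x)), key=sq_dist)]


def jackknife_predictions(samples, labels, classify):
    """Leave-one-out: predict each sample from a model built on all the others."""
    preds = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(classify(train_x, train_y, samples[i]))
    return preds


def confusion(labels, preds):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical one-dimensional feature vectors: 1 = glycated, 0 = non-glycated.
    x = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
    y = [0, 0, 0, 1, 1, 1]
    counts = confusion(y, jackknife_predictions(x, y, nearest_neighbour))
    print(accuracy(*counts), mcc(*counts))
```

Because every sample is held out exactly once, the jackknife estimate is deterministic for a given dataset, which is one reason it is often reported alongside k-fold cross-validation.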
Abstract: Concurrent engineering (CE) involves considering, during the design phase, the various factors associated with the life cycle of the product. Using the principle of CE, a feature-based computer-aided process planning (CAPP) system is proposed. On the basis of feature modeling, the system is able to reason about feature relationships, produce a feature digraph of a part, and decide the machining sequence of features.
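One common way to derive a sequence from a precedence digraph is a topological sort. The sketch below assumes that the feature digraph encodes "must be machined before" constraints. The abstract does not state how the CAPP system actually sequences features, and the feature names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def machining_sequence(precedence):
    """Return one valid machining order.

    `precedence` maps each feature to the set of features that must be
    machined before it. graphlib raises CycleError if the constraints
    are contradictory.
    """
    return list(TopologicalSorter(precedence).static_order())


if __name__ == "__main__":
    # Hypothetical part: the top face is milled first, the pocket is cut into
    # it, the hole is drilled inside the pocket, and the chamfer follows the hole.
    digraph = {
        "pocket": {"top_face"},
        "hole": {"pocket"},
        "chamfer": {"hole"},
    }
    print(machining_sequence(digraph))
```

A topological order is generally not unique, so a real planner would need further criteria (tool changes, setups) to choose among the valid sequences.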
Abstract: Generally speaking, "an economic circle" refers to a group of countries and regions whose economic relations override universally accepted international practice or norms, and which have formulated new economic rules applicable only to countries and regions inside the circle.
Funding: This study was supported in part by the National Natural Science Foundation of China.
Abstract: To investigate the nonlinear properties of wind waves, experiments are carried out in a wind-wave flume with a sloping bottom at different wind speeds and fetches. Both the internal structure and the apparent features of the nonlinearity of wind waves are studied by using bispectral and statistical analysis of surface elevations. The relations between bispectra and the apparent nonlinear characteristics of wind waves are established and confirmed.
Abstract: First, the paper discusses the forming conditions and development characteristics of several typical harmful features related to frozen ground, such as flooding ice, icing mounds, frost mounds, thick ground ice, thaw slumping, thermokarst lakes, and swampland. Second, the results of winter investigations of new harmful permafrost features along the Qinghai-Tibet Railway are analysed and summarized. Lastly, some data and suggestions are provided for design and construction departments.
Abstract: Relative radiometric normalization (RRN) minimizes radiometric differences among images caused by inconsistencies in acquisition conditions rather than by changes in the surface. The scale-invariant feature transform (SIFT) can automatically extract control points (CPs) and is commonly used for remote sensing images. However, its results are frequently inaccurate and sometimes contain incorrect matches caused by a small number of false CP pairs, which give rise to a high false-alarm rate. This paper presents a modified method that improves SIFT CP matching by applying the sum of absolute differences (SAD) in a different manner, for the new generation of optical satellites in near-equatorial orbit and for multi-sensor images. The proposed method achieves a significantly high rate of correct matches. The data in this study were obtained from RazakSAT, a new near-equatorial satellite system. The proposed method involves six steps: 1) data reduction; 2) applying SIFT to automatically extract CPs; 3) refining CP matching by using the SAD algorithm with an empirical threshold; 4) calculating the intensity values of the true CPs over all image bands; 5) fitting a linear regression model between the intensity values of the CPs located in the reference and sensed image bands; and 6) conducting relative radiometric normalization using the regression transformation functions. Different thresholds (50 and 70) were tested experimentally and used in this study. Following the proposed method, the extracted SIFT CP sets were reduced from 775, 1125, 883, 804, 883, and 681 pairs containing false matches to 342, 424, 547, 706, 547, and 469 correctly matched pairs, respectively.
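Steps 3 and 5 of the method above can be sketched as follows. The patch representation, the rule "keep a pair when SAD is at or below the threshold", and the function names are assumptions made for illustration; the abstract does not specify how SAD is applied or in which units the thresholds 50 and 70 are expressed.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equally sized 2-D patches."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(patch_a, patch_b)
        for a, b in zip(row_a, row_b)
    )


def refine_pairs(pairs, threshold):
    """Keep candidate CP pairs whose patches differ by at most `threshold` (assumed rule)."""
    return [(pa, pb) for pa, pb in pairs if sad(pa, pb) <= threshold]


def fit_gain_offset(sensed, reference):
    """Ordinary least squares fit of reference = gain * sensed + offset."""
    n = len(sensed)
    mean_x = sum(sensed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in sensed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sensed, reference))
    gain = sxy / sxx  # assumes the sensed CP intensities are not all identical
    return gain, mean_y - gain * mean_x


def normalize_band(band, gain, offset):
    """Apply the per-band regression transform to every pixel value."""
    return [[gain * value + offset for value in row] for row in band]
```

In this sketch the regression is fitted separately for each band from the intensities at the retained CPs, and the resulting gain and offset are then applied to the whole sensed band.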
Abstract: This paper discusses novel algorithms for developing a brain-computer interface (BCI) speller system based on single-trial classification of electroencephalogram (EEG) signals. The idea is to employ suitable methods for reducing the number of channels and optimizing the feature vectors. Removing unnecessary channels and reducing the feature dimension lowers cost, saves time, and ultimately improves the BCI implementation. Optimal channels are obtained after two stages of sifting. In the first stage, the channels are reduced up to 30% based on the channels of the important event-related potential (ERP) components; in the second stage, the optimal channels are extracted by the backward forward selection (BFS) algorithm. We also show that suitable single-trial analysis requires a feature vector constructed by recognizing the important ERP components, and we propose an algorithm to identify the less important features in the feature vectors. The F-score criterion is used to recognize effective features, those that create more discrimination between classes, and the feature vectors are reconstructed from these effective features. The algorithm has been tested on dataset II of BCI Competition III. The results show single-trial accuracy of up to 31%, which is better than the performance of the competition winner (about 25.5%). Moreover, we use a simple classifier and few channels, whereas the winner used a more complicated classifier and all channels.
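The abstract does not give the exact F-score criterion it uses. A widely used two-class definition divides the squared distances of the class means from the overall mean by the sum of the within-class sample variances. The sketch below assumes that definition, and the function names are hypothetical.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """Two-class F-score of a single feature (assumed definition).

    Numerator: squared distances of the class means from the overall mean.
    Denominator: sum of the within-class sample variances (n - 1 divisor).
    Each class needs at least two values.
    """
    overall = mean(list(pos_values) + list(neg_values))
    numerator = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    denominator = variance(pos_values) + variance(neg_values)
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(pos_rows, neg_rows):
    """Return feature indices sorted from most to least discriminative."""
    pos_cols = list(zip(*pos_rows))
    neg_cols = list(zip(*neg_rows))
    scores = [f_score(p, n) for p, n in zip(pos_cols, neg_cols)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Features at the tail of the ranking are the candidates for removal when the feature vectors are rebuilt from the effective features.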
Abstract: To ensure that the large-scale application of photovoltaic (PV) power generation does not affect grid stability, accurate PV power generation forecasting is essential. A short-term PV power generation forecast method is proposed that combines K-means++, grey relational analysis (GRA), and support vector regression (SVR) based on feature selection (Hybrid Kmeans-GRA-SVR, HKGSVR). The historical power data are clustered with the multi-index K-means++ algorithm and divided into ideal and non-ideal weather. The GRA algorithm is used to match the similar day and the nearest-neighbor similar day of the prediction day, and appropriate input features are selected for each weather type to train the SVR model. Under ideal weather, the average values of MAE, RMSE, and R2 were 0.8101, 0.9608 kW, and 99.66%, respectively, and the method reduced the average training time by 77.27% compared with the standard SVR model. Under non-ideal weather, the average values of MAE, RMSE, and R2 were 1.8337, 2.1379 kW, and 98.47%, respectively, and the method reduced the average training time by 98.07% compared with the standard SVR model. The experimental results show that the prediction accuracy of the proposed model is significantly improved compared with the other five models, which verifies the effectiveness of the method.
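The similar-day matching step can be illustrated with the textbook grey relational grade. This is a sketch under assumptions: the sequences are taken to be already normalized to comparable scales, the distinguishing coefficient is set to the conventional 0.5, and the function and variable names are hypothetical rather than taken from the HKGSVR paper.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence against the reference.

    `candidates` maps a name (for example a historical day) to a sequence of
    the same length as `reference`. `rho` is the distinguishing coefficient.
    """
    deltas = {
        name: [abs(r - c) for r, c in zip(reference, seq)]
        for name, seq in candidates.items()
    }
    all_deltas = [d for seq in deltas.values() for d in seq]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every candidate equals the reference
        return {name: 1.0 for name in candidates}
    grades = {}
    for name, seq in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in seq]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


def most_similar(reference, candidates, rho=0.5):
    """Name of the candidate with the highest grey relational grade."""
    grades = grey_relational_grades(reference, candidates, rho)
    return max(grades, key=grades.get)
```

Under this reading, the historical day with the highest grade against the forecast day's weather features would serve as the similar day.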
Abstract: English words in pairs are a special form of English idiom; they come in different kinds and are widely used. For English learners, words in pairs are one of the difficult points. This paper discusses their form patterns, semantic relations, grammatical functions, rhetorical features, and application in translation. Its purpose is to help learners understand and use them accurately and correctly so as to improve their ability to express themselves in the language.
Funding: Natural Science Foundation of Shandong Province, Grant/Award Number: ZR2023MF05.
Abstract: Identifying drug–drug interactions (DDIs) is an important aspect of drug design research, and predicting DDIs is crucial for avoiding potential adverse effects. Current substructure-based prediction methods still have some limitations: (i) the substructure extraction process does not fully exploit the graph structure information of drugs, as it only evaluates the importance of substructures of different radii from a single perspective; (ii) the process of constructing drug representations overlooks the significant impact of relation embedding on optimizing those representations. In this work, we propose a substructure-aware graph neural network incorporating relation features (RFSA-DDI) for DDI prediction. It introduces a directed message passing neural network with a substructure attention mechanism based on graph self-adaptive pooling (GSP-DMPNN) and a substructure-aware interaction module incorporating relation features (RSAM). GSP-DMPNN uses graph self-adaptive pooling to consider node features and local drug information comprehensively for adaptive extraction of substructures. RSAM lets drug features and relation representations interact so that each is enhanced, highlighting the substructures that significantly impact predictions. RFSA-DDI is evaluated on two real-world datasets. Compared with existing methods, RFSA-DDI demonstrates certain advantages in both transductive and inductive settings, effectively handling the task of predicting DDIs for unseen drugs and exhibiting good generalization capability. The experimental results show that RFSA-DDI captures valuable structural information of drugs more accurately for DDI prediction and provides more reliable assistance for detecting potential DDIs in the drug development and treatment stages.
Funding: Supported in part by the Science and Technology Innovation 2030 "New Generation of Artificial Intelligence" Major Project under Grant No. 2021ZD0111000 and the Henan Province Science and Technology Research Project (232102311232).
Abstract: Recently, many knowledge graph embedding models for knowledge graph completion have been proposed, ranging from early translation-based models such as TransE to recent CNN-based models such as ConvE. These models fill in the missing relations between entities by focusing on capturing representation features to further complete the existing knowledge graph (KG). However, this KG-based relation prediction research ignores the interaction information among entities in the KG. To solve this problem, this work proposes a novel model called Gate Feature Interaction Network (GFINet) with a weighted loss function that exploits interaction information and deep expressive features together. Specifically, GFINet consists of a gate convolution block and an interaction attention module, which capture deep expressive features and the interaction information based on these valid features, respectively. Our method establishes state-of-the-art experimental results on the standard datasets for knowledge graph completion. In addition, we conduct ablation experiments to verify the effectiveness of the gate convolution block and the interaction attention module.
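For readers unfamiliar with the translation-based baseline mentioned above, the TransE scoring idea can be sketched in a few lines. A triple (head, relation, tail) is plausible when head + relation lies close to tail in the embedding space. This illustrates only the TransE baseline, not GFINet. The embeddings are hand-picked for illustration rather than trained, and the function names are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail; higher is better."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def rank_tails(head, relation, candidate_tails):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidate_tails,
        key=lambda name: transe_score(head, relation, candidate_tails[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hand-picked 2-D vectors (assumption): a trained model would learn these.
    paris, capital_of = [1.0, 0.0], [0.0, 1.0]
    tails = {"France": [1.0, 1.0], "Japan": [3.0, 4.0]}
    print(rank_tails(paris, capital_of, tails))
```

Knowledge graph completion then amounts to ranking every candidate entity for a query such as (head, relation, ?) and reading off the top results.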
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: Span-based models for the joint entity and relation extraction task have been studied exhaustively. However, these models sample a large number of negative entities and negative relations during training; these samples are essential but result in grossly imbalanced data distributions and, in turn, suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which proves effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Funding: Under the auspices of the Natural Science Foundation of Jiangsu Province (No. BK2008360) and the Fundamental Research Funds for the Central Universities (No. 2009B12714, 2009B11714).
Abstract: Inland freshwater lake wetlands play an important role in regional ecological balance. Hongze Lake is the fourth biggest freshwater lake in China. In the past three decades, there has been significant loss of freshwater wetlands within the lake and at the mouths of neighboring rivers, due to disturbance, primarily from human activities. The main purpose of this paper was to explore a practical technology for effectively differentiating wetlands from the upland types in close proximity to them. An integrated method, which combined per-pixel and per-field classification, was used for mapping the wetlands of Hongze Lake and their neighboring upland types. First, Landsat ETM+ imagery was segmented and classified by using spectral and textural features. Second, ETM+ spectral bands, textural features derived from ETM+ Pan imagery, relative relations between neighboring classes, shape features, and elevation were used in a decision tree classification. Third, the per-pixel classification results from the decision tree classifier were improved by using the results of object-oriented classification as context. The results show that the technology has not only overcome the salt-and-pepper effect commonly observed in past studies, but has also improved the accuracy of identification by nearly 5%.
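Abstract: Entity-relation extraction from Chinese electronic medical records is an important foundation for building medical knowledge graphs and serving downstream subtasks. At present, entity-relation extraction from Chinese electronic medical records still suffers from inaccurate recognition of medical terms, caused by the complex relations and high entity density of medical text. To address this problem, a joint entity-relation extraction model for Chinese electronic medical records based on adversarial learning and multi-feature fusion, AMFRel (adversarial learning and multi-feature fusion for relation triple extraction), is proposed. The model extracts text and part-of-speech features from the medical records to obtain encoding vectors that fuse part-of-speech information; combines the encoding vectors with perturbations produced by adversarial training to generate adversarial samples and extract sentence subjects; and uses an information fusion module to enrich the structural features of the text and, according to specific relation information, extract the corresponding objects, yielding triples for the medical text. Experiments were conducted on the CHIP2020 relation extraction dataset and a diabetes dataset. On the CHIP2020 relation extraction dataset, AMFRel achieves a Precision of 63.922%, a Recall of 57.279%, and an F1 score of 60.418%; on the diabetes dataset, its Precision, Recall, and F1 score are 83.914%, 67.021%, and 74.522%, respectively, demonstrating that the model's triple extraction performance is better than that of the other baseline models.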
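The three reported metrics are linked by the harmonic-mean definition of F1, so the figures above can be checked directly. The sketch below scores exact-match triples and recomputes F1 from the reported Precision and Recall. The function names and the example triples are hypothetical, and exact matching of whole triples is an assumption about the evaluation protocol.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def triple_prf1(gold, predicted):
    """Precision, recall, and F1 over sets of (subject, relation, object) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Hypothetical triples, for illustration only.
    gold = {("diabetes", "treated_by", "metformin"), ("diabetes", "symptom", "thirst")}
    pred = {("diabetes", "treated_by", "metformin"), ("diabetes", "symptom", "fever")}
    print(triple_prf1(gold, pred))
    # Recompute F1 from the Precision and Recall reported in the abstract.
    print(round(f1_from(63.922, 57.279), 3), round(f1_from(83.914, 67.021), 3))
```

Recomputing F1 this way reproduces the reported 60.418% and 74.522% to three decimals, so the three figures for each dataset are internally consistent.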
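Abstract: To address incomplete semantic information in word vectors and polysemy during text feature extraction, a two-pass attention weighting algorithm based on BERT (Bidirectional Encoder Representation from Transformer), TARE, is proposed. First, in the word-vector encoding stage, a self-attention dynamic encoding algorithm built on Q, K, and V matrices captures the semantic information of the preceding and following words for the current word's vector. Second, after the model outputs sentence-level feature vectors, position markers are used to extract the corresponding parameters of the fully connected layer and construct a relation attention matrix. Finally, a sentence-level attention mechanism assigns a different attention score to each sentence-level feature vector, improving the noise resistance of the sentence-level features. Experimental results show that on the NYT-10m dataset, compared with CIL (Contrastive Instance Learning), an algorithm based on a contrastive learning framework, TARE improves F1 by 4.0 percentage points and improves P@M, the mean of Precision@N over the top 100, 200, and 300 items sorted by descending confidence, by 11.3 percentage points. On the NYT-10d dataset, compared with the attention-based PCNN-ATT (Piecewise Convolutional Neural Network algorithm based on ATTention mechanism), the area under the precision-recall curve (AUC) improves by 4.8 percentage points and P@M by 2.1 percentage points. In the mainstream distantly supervised relation extraction (DSER) task, TARE effectively improves the model's ability to learn data features.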
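The P@M metric described in this abstract (the mean of Precision@N at N = 100, 200, and 300 after sorting by descending confidence) can be sketched as follows. The input format of (confidence, is_correct) pairs and the choice to divide by the number of items actually available when fewer than N exist are assumptions. The function names are hypothetical.

```python
def precision_at_n(scored, n):
    """Fraction of correct predictions among the n most confident ones.

    `scored` is a list of (confidence, is_correct) pairs. If fewer than n
    predictions exist, the fraction is taken over those available (assumption).
    """
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:n]
    if not top:
        raise ValueError("no predictions to evaluate")
    return sum(1 for _, is_correct in top if is_correct) / len(top)


def mean_precision_at(scored, cutoffs=(100, 200, 300)):
    """P@M: the mean of Precision@N over the given cutoffs."""
    return sum(precision_at_n(scored, n) for n in cutoffs) / len(cutoffs)
```

For example, calling `mean_precision_at(predictions)` on a list of scored relation predictions evaluates only the 300 most confident items, which makes the metric sensitive to how well confidence tracks correctness.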