Abstract
Key entities in a document summarize the subjects of the events or topics that the document describes, and they support research on applications such as entity-oriented information retrieval and question answering. However, the entities in a document are unordered, so ranking them is particularly important. This paper extracts entity features from the text, draws on Wikipedia and word embeddings to introduce external features, and proposes LA-FSAM (FSAM based on AUC Metric and Logistic Function), a ranking model based on forward stagewise additive modeling (FSAM). The model uses the area under the curve (AUC) metric to construct the loss function and the logistic function to integrate the entity features; its parameters are then solved with stochastic gradient descent. An experimental comparison of LA-FSAM with baseline methods demonstrates the effectiveness of the proposed method.
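The abstract does not give the exact loss or update rule, so the following is only a minimal sketch of the general idea: a logistic scoring function trained with stochastic gradient descent on a pairwise, AUC-style surrogate loss. The squared pairwise loss, the two toy features, the learning rate, and all function names are assumptions made for illustration, and the forward stagewise additive component of LA-FSAM is omitted entirely.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Logistic score of one entity's feature vector x under weights w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def auc(w, pos, neg):
    """Fraction of (key, non-key) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in pos:
        for n in neg:
            sp, sn = score(w, p), score(w, n)
            total += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return total / (len(pos) * len(neg))


def train(pos, neg, dim, lr=0.5, steps=2000, seed=0):
    """SGD on the assumed surrogate loss (1 - (s(p) - s(n)))^2 per pair."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        p, n = rng.choice(pos), rng.choice(neg)  # sample one pair
        sp, sn = score(w, p), score(w, n)
        g = -2.0 * (1.0 - (sp - sn))  # d(loss) / d(sp - sn)
        for j in range(dim):
            # Chain rule through the logistic function: s' = s * (1 - s) * x_j
            grad_j = g * (sp * (1.0 - sp) * p[j] - sn * (1.0 - sn) * n[j])
            w[j] -= lr * grad_j
    return w


# Hypothetical toy data: two made-up features plus a constant bias term.
key_entities = [[0.9, 0.8, 1.0], [0.8, 0.9, 1.0], [0.7, 0.7, 1.0]]
other_entities = [[0.1, 0.2, 1.0], [0.2, 0.1, 1.0], [0.3, 0.3, 1.0]]

auc_before = auc([0.0, 0.0, 0.0], key_entities, other_entities)
weights = train(key_entities, other_entities, dim=3)
auc_after = auc(weights, key_entities, other_entities)
print("AUC before training:", auc_before)  # all scores tie at 0.5, so AUC is 0.5
print("AUC after training:", auc_after)
```

Sampling one (key, non-key) pair per step keeps each update cheap, which is the usual reason to pair an AUC-style objective with stochastic rather than full-batch gradient descent.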
Author
王燕华
WANG Yan-hua (School of Data Science and Engineering, East China Normal University, Shanghai 200062, China)
Source
《华东师范大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2018, No. 1, pp. 91-102, 145 (13 pages in total)
Journal of East China Normal University(Natural Science)
Funding
Shanghai Agricultural Science and Technology Promotion Project (2015, No. 3-2)
Keywords
entity ranking
forward stagewise additive modeling
area under the curve
logistic function
stochastic gradient descent