For an extract description of threads information in question and answer (QnA) web forums, it is proposed to construct a QnA knowledge presentation model in the English language, and then an entire solution for the ...For an extract description of threads information in question and answer (QnA) web forums, it is proposed to construct a QnA knowledge presentation model in the English language, and then an entire solution for the QnA knowledge system is presented, including data gathering, platform building and applications design. With pre-defined dictionary and grammatical analysis, the model draws semantic information, grammatical information and knowledge confidence into IR methods, in the form of statement sets and term sets with semantic links. Theoretical analysis shows that the statement model can provide an exact presentation for QnA knowledge, breaking through any limits from original QnA patterns and being adaptable to various query demands; the semantic links between terms can assist the statement model, in terms of deducing new from existing knowledge. The model makes use of both information retrieval (IR) and natural language processing (NLP) features, strengthening the knowledge presentation ability. Many knowledge-based applications built upon this model can be improved, providing better performance.展开更多
With the help of pre-trained language models,the accuracy of the entity linking task has made great strides in recent years.However,most models with excellent performance require fine-tuning on a large amount of train...With the help of pre-trained language models,the accuracy of the entity linking task has made great strides in recent years.However,most models with excellent performance require fine-tuning on a large amount of training data using large pre-trained language models,which is a hardware threshold to accomplish this task.Some researchers have achieved competitive results with less training data through ingenious methods,such as utilizing information provided by the named entity recognition model.This paper presents a novel semantic-enhancement-based entity linking approach,named semantically enhanced hardware-friendly entity linking(SHEL),which is designed to be hardware friendly and efficient while maintaining good performance.Specifically,SHEL's semantic enhancement approach consists of three aspects:(1)semantic compression of entity descriptions using a text summarization model;(2)maximizing the capture of mention contexts using asymmetric heuristics;(3)calculating a fixed size mention representation through pooling operations.These series of semantic enhancement methods effectively improve the model's ability to capture semantic information while taking into account the hardware constraints,and significantly improve the model's convergence speed by more than 50%compared with the strong baseline model proposed in this paper.In terms of performance,SHEL is comparable to the previous method,with superior performance on six well-established datasets,even though SHEL is trained using a smaller pre-trained language model as the encoder.展开更多
基金Microsoft Research Asia Internet Services in Aca-demic Research Fund (NoFY07-RES-OPP-116)Tianjin Technological Development Program Project (No06YFGZGX05900)
文摘For an extract description of threads information in question and answer (QnA) web forums, it is proposed to construct a QnA knowledge presentation model in the English language, and then an entire solution for the QnA knowledge system is presented, including data gathering, platform building and applications design. With pre-defined dictionary and grammatical analysis, the model draws semantic information, grammatical information and knowledge confidence into IR methods, in the form of statement sets and term sets with semantic links. Theoretical analysis shows that the statement model can provide an exact presentation for QnA knowledge, breaking through any limits from original QnA patterns and being adaptable to various query demands; the semantic links between terms can assist the statement model, in terms of deducing new from existing knowledge. The model makes use of both information retrieval (IR) and natural language processing (NLP) features, strengthening the knowledge presentation ability. Many knowledge-based applications built upon this model can be improved, providing better performance.
基金the Beijing Municipal Science and Technology Program(Z231100001323004)。
文摘With the help of pre-trained language models,the accuracy of the entity linking task has made great strides in recent years.However,most models with excellent performance require fine-tuning on a large amount of training data using large pre-trained language models,which is a hardware threshold to accomplish this task.Some researchers have achieved competitive results with less training data through ingenious methods,such as utilizing information provided by the named entity recognition model.This paper presents a novel semantic-enhancement-based entity linking approach,named semantically enhanced hardware-friendly entity linking(SHEL),which is designed to be hardware friendly and efficient while maintaining good performance.Specifically,SHEL's semantic enhancement approach consists of three aspects:(1)semantic compression of entity descriptions using a text summarization model;(2)maximizing the capture of mention contexts using asymmetric heuristics;(3)calculating a fixed size mention representation through pooling operations.These series of semantic enhancement methods effectively improve the model's ability to capture semantic information while taking into account the hardware constraints,and significantly improve the model's convergence speed by more than 50%compared with the strong baseline model proposed in this paper.In terms of performance,SHEL is comparable to the previous method,with superior performance on six well-established datasets,even though SHEL is trained using a smaller pre-trained language model as the encoder.
文摘在前期研究的基础之上,本文提出了一种基于知识元迁移的ESI(essential science indicators)研究前沿知识演进分析方法,通过对研究前沿中的知识元迁移现象,进行定量分析和迁移程度计算,从语义分析和知识计算的角度,进一步探索研究前沿的演进机理。借助命名实体识别、词袋模型、PLDA(parallellatent Dirichlet allocation)主题模型、信息熵算法等文本语义挖掘和自然语言处理技术,通过设计贡献度指数CVI(contribution value index)和迁移度指数MVI(migration value index)两种计量指标,探究知识元的迁移规律。研究结果表明,以前沿主题中的个体知识元作为分析对象,可以从最为直接、最为细粒度的视角,对研究前沿随时间变化时内在知识结构特征的变迁规律进行挖掘,揭示领域知识要素在不同时期的演化状态,能够更为深入地回答研究前沿的追踪发展变迁问题,为面向学科前沿的科技情报工作提供方法论参考。