
Method for extracting experimental corpus sentences from scientific literature (科技文献的实验语料句抽取方法) Cited by: 2

Extracting experiments corpus sentence in scientific literature
Abstract: To help natural language processing (NLP) researchers choose more effective experimental corpora, this work studies the extraction of experimental-corpus information from NLP scientific literature. An experimental corpus is the text data that a paper uses in its experiments, such as training data and test data. Sentences are divided into two classes: experimental corpus sentences and non-experimental corpus sentences. The lexical and positional features of experimental corpus sentences are counted, a corresponding feature library is built, and a naive Bayes model is trained on these features. After word segmentation and part-of-speech tagging, the trained model decides whether each sentence is an experimental corpus sentence, and the sentences so identified are extracted. For the extraction experiment, 200 scientific papers were randomly selected from the NLP field, and the output of the proposed method was compared with manual judgments. The comparison indicates that the proposed method obtains experimental corpus information with reasonable accuracy.
Authors: ZHU Li-ping (朱丽萍), LIU Qiang (刘蔷), SU Fei (苏斐), YANG Zhong-guo (杨中国), WANG Xian-can (王显灿)
Affiliations: Beijing Key Lab of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249, China; College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249, China; China Petroleum Information Service Technology Center, Beijing 100000, China
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal (北大核心), 2016, Issue 11, pp. 3086-3091 (6 pages)
Keywords: information extraction; scientific and technical literature; feature extraction; machine learning; naive Bayes model
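The classification step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace-tokenized English toy sentences in place of Chinese word segmentation and part-of-speech tagging, a hypothetical three-bucket position feature, and a hand-written multinomial naive Bayes with Laplace smoothing. All names (`features`, `NaiveBayes`, the toy sentences and their positions) are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def features(tokens, rel_pos):
    """Lexical features (the tokens) plus one coarse position feature.

    rel_pos is the sentence's relative position in the paper, in [0, 1].
    The three-bucket split is an assumption made for this sketch.
    """
    bucket = min(int(rel_pos * 3), 2)  # 0 = early, 1 = middle, 2 = late
    return [f"w={t.lower()}" for t in tokens] + [f"pos={bucket}"]


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # sentences per class
        self.feat_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()                       # all features seen in training

    def fit(self, samples, labels):
        for feats, y in zip(samples, labels):
            self.class_counts[y] += 1
            self.feat_counts[y].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        total = sum(self.class_counts.values())
        scores = {}
        for y, n in self.class_counts.items():
            denom = sum(self.feat_counts[y].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((self.feat_counts[y][f] + self.alpha) / denom)
            scores[y] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy training set: (tokens, relative position, is_experimental_corpus_sentence).
train = [
    ("we use the penn treebank as training data".split(), 0.5, True),
    ("the test set contains 2000 sentences from the corpus".split(), 0.55, True),
    ("experiments are run on a news corpus of 500 documents".split(), 0.6, True),
    ("information extraction has attracted much attention".split(), 0.05, False),
    ("we propose a new model for parsing".split(), 0.1, False),
    ("future work will explore other features".split(), 0.95, False),
]

model = NaiveBayes().fit(
    [features(tokens, pos) for tokens, pos, _ in train],
    [label for _, _, label in train],
)

pred_corpus = model.predict(
    features("the training data is a corpus of news sentences".split(), 0.6)
)
pred_other = model.predict(features("we propose future work".split(), 0.95))
```

On this toy data, the first query sentence is classified as an experimental corpus sentence and the second is not. A real system along the lines of the abstract would build its feature library from annotated papers instead of six hand-written sentences.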