Abstract
To address the heavy noise in knowledge acquisition from big data, this paper proposes a method for deep mining of knowledge points in text. First, a "problem, method, result" triple ontology model is constructed to capture the creative features of academic papers. Second, pattern recognition and related techniques are applied to the abstracts of academic papers for statistical analysis, feature extraction, machine learning, and pattern determination. Finally, the triples that carry the core creative knowledge of each paper are mined in depth. Experimental results show that the method filters out much of the noise in retrieval over large collections of academic literature and helps users quickly locate the research problem of a paper in a large academic literature database; the method a paper adopts and the results it obtains can then be used to judge whether the paper is worth reading. The approach also provides a basis for research on deep knowledge mining and association discovery in big data. No method of this kind has been reported in the published literature; the work is exploratory research and experimentation.
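The idea of a "problem, method, result" triple can be illustrated with a minimal Python sketch. This is not the authors' method, which relies on statistical analysis, feature extraction, and machine learning; here a few hand-written cue phrases stand in for a learned pattern classifier. The names `Triple`, `CUES`, and `extract_triple`, the cue phrases themselves, and the sample abstract are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One (problem, method, result) triple; a slot is None if no sentence matched."""
    problem: Optional[str]
    method: Optional[str]
    result: Optional[str]


# Hypothetical cue phrases, chosen by hand for this sketch only.
CUES = {
    "problem": re.compile(r"\b(problem|disadvantages?|suffers? from|lack of)\b", re.I),
    "method": re.compile(
        r"\b(we (propose|present|construct(ed)?)|is proposed|was presented|were used)\b",
        re.I,
    ),
    "result": re.compile(r"\b(results? show|results? indicate|experiments? show)\b", re.I),
}


def extract_triple(abstract: str) -> Triple:
    """Assign the first sentence matching each role's cue pattern to that slot."""
    slots = {"problem": None, "method": None, "result": None}
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract.strip()):
        for role, cue in CUES.items():
            if slots[role] is None and cue.search(sentence):
                slots[role] = sentence
    return Triple(**slots)


# Made-up sample abstract, not taken from any real paper.
SAMPLE = (
    "Retrieval over large literature databases suffers from heavy noise. "
    "We propose a cue-based extractor for abstracts. "
    "Experimental results show that the noise is reduced."
)

if __name__ == "__main__":
    print(extract_triple(SAMPLE))
```

A reader skimming a database could inspect only the three slots of each triple rather than the full abstract, which is the kind of noise reduction the abstract describes. A real system would replace the fixed cue list with features and a classifier trained on labeled abstracts.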
Source
Computer Science (《计算机科学》)
CSCD
PKU Core (北大核心)
2016, No. 3, pp. 279-284 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (70373946)
Keywords
Pattern recognition
Text mining
Semantic triples
Direct knowledge acquisition