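Abstract
Machine learning is an important branch of artificial intelligence, and TensorFlow is Google's second-generation open-source machine learning platform. This paper introduces the basic principles of machine learning (mainly deep neural networks) and the basic methods of machine learning with TensorFlow, and explores possible applications and scenarios in the library field. Taking the automatic classification problem of the National Index of Newspapers and Magazines (《全国报刊索引》) as the experimental subject, the authors built a TensorFlow deep learning model on two graphics workstations. By setting parameters and thresholds and tuning the system, they carried out the complete process of applying TensorFlow and demonstrated its feasibility. The experiment trained and tested on more than 1.7 million bibliographic records and overcame the mismatch between the sparseness of the index data and the very fine-grained categories of the Chinese Library Classification. It achieved nearly 80% accuracy at the main-class level and nearly 70% overall accuracy at the fourth classification level (with class TP reaching 91%). The authors conclude that the approach can largely replace the manual classification workflow and provides a powerful tool for semi-automating classification of the National Index of Newspapers and Magazines, which is expected to save substantial labor costs. The next step is to continue using TensorFlow's optimization features, incorporate more field attributes, and tune the system further, with the goal of exceeding 90% accuracy in automatic classification.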
Machine learning (ML) is a particular approach to artificial intelligence. TensorFlow is Google's second-generation machine learning framework. This paper focuses on the basic principles of machine learning and the basic methods of machine learning with TensorFlow. Its purpose is to explore the possibilities and scenarios of machine learning applications in libraries. A TensorFlow ML model was established and, with index data from the National Index of Newspapers and Magazines, a complete process of automatic classification of records was accomplished and proved feasible. Through training and testing on more than 1.7 million data records, the experiment overcame the mismatch between the limited information in the index data and the fine-grained category labels, and reached an accuracy of nearly 80% for main classes and nearly 70% overall for fourth-level classes. It can be concluded that the approach is practical, at least for semi-automatic classification, which is expected to save significant labor costs. The next step will be optimizing the parameters and tuning the system, with the aim of achieving an accuracy above 90% in automatic classification.
Source
Journal of Academic Libraries (《大学图书馆学报》)
CSSCI
Peking University Core Journals (北大核心)
2017, No. 6, pp. 31-40 (10 pages)
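Funding
One of the research outputs of the National Social Science Fund of China Major Project "Research on the Mechanism and Application of Mobile Visual Search for Digital Libraries Oriented to Big Data" (Grant No. 15ZDB126)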