Abstract
As users' demands for accurate and relevant knowledge keep growing, knowledge discovery systems built on text clustering and knowledge acquisition need faster and more accurate algorithmic support. Building on a random-walk knowledge discovery system, this paper combines web crawler technology with the structured-data characteristics of academic resource websites and optimizes the system's algorithm. By applying the Laplace mechanism, all library documents are processed through a defined keyword function and a flux-based quantification of the walk process, which determines the graph-theoretic Laplacian operator. A walk end-point criterion is then added to the original mode of operation, which traversed every document node and iterated repeatedly to complete the clustering. The optimization effectively resolves the excessive time and space complexity of the random-walk knowledge discovery system and increases the accuracy of random-walk clustering.
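The general idea of random-walk clustering on a keyword graph with an early stopping test can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the keyword function or the end-point criterion, so the Jaccard similarity used for edge weights, the convergence test used as a stopping rule, the sample documents, and all function and parameter names (`build_weights`, `walk_cluster`, `tol`, `min_mass`) are assumptions made for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Keyword overlap between two documents, in [0, 1] (assumed keyword function)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_weights(docs):
    """Symmetric weight matrix W from pairwise keyword similarity."""
    n = len(docs)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = jaccard(docs[i], docs[j])
    return w


def laplacian(w):
    """Graph Laplacian L = D - W, where D is the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def walk_cluster(w, seed, tol=1e-6, max_steps=1000, min_mass=1e-3):
    """Spread a lazy random walk from `seed` and stop once it settles.

    The stopping rule (L1 change below `tol`) is a hypothetical stand-in
    for an end-point criterion; it avoids iterating for all `max_steps`.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    p = [0.0] * n
    p[seed] = 1.0
    steps = 0
    for steps in range(1, max_steps + 1):
        nxt = [0.5 * mass for mass in p]  # lazy walk: stay put with probability 1/2
        for i in range(n):
            if deg[i] == 0.0:
                nxt[i] += 0.5 * p[i]  # isolated node keeps all of its mass
                continue
            for j in range(n):
                nxt[j] += 0.5 * p[i] * w[i][j] / deg[i]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    cluster = {i for i in range(n) if p[i] > min_mass}
    return cluster, steps


# Hypothetical sample data: each document is reduced to a set of keywords.
docs = [
    {"random walk", "clustering", "graph"},
    {"random walk", "graph", "laplacian"},
    {"clustering", "laplacian", "graph"},
    {"cataloguing", "metadata"},
    {"metadata", "digital library", "cataloguing"},
]
w = build_weights(docs)
lap = laplacian(w)
cluster, steps = walk_cluster(w, seed=0)
print(sorted(cluster), steps)
```

In this sketch the walk started at document 0 only ever reaches documents that share keywords with it, directly or through intermediate documents, and the loop exits as soon as the distribution stops changing instead of running a fixed number of iterations over every node.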
Source
《新世纪图书馆》
CSSCI
2021, No. 12, pp. 38-43 (6 pages)
New Century Library