Abstract
With the continued development of agricultural information technology in Xinjiang and the spread of Internet access in rural areas, the large volume of agriculture-related knowledge and information on the Internet has brought convenience to practitioners but has also made information retrieval harder. Building on a requirements analysis of the Xinjiang Rural Information Acquisition System and a study of classifying web pages about crops characteristic of Xinjiang, this paper proposes and implements a classification method that combines K-means with SVM, to help agricultural workers obtain more accurate and useful information. The method first clusters the training samples with K-means to remove edge samples, then trains the SVM on the reduced training set. To cut the number of edge samples and save training time, two center-vector-based edge-sample reduction methods are also proposed: one keeps only the center vectors, and the other keeps the samples near the center vectors. Experimental results show that the proposed methods reduce both the number of training samples and the training time.
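The two edge-sample reduction strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say whether clustering is done per class or over the whole training set, how the number of clusters is chosen, or how many neighbors are kept, so per-class clustering and the names `kmeans`, `reduce_training_set`, `k`, and `keep_ratio` are all assumptions made here.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means. Returns (centers, cluster index per point)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(p, centers) for p in points]


def reduce_training_set(samples, labels, k, keep_ratio=None):
    """Cluster each class separately (an assumption), then drop edge samples.

    keep_ratio=None  -> keep only the center vector of each cluster.
    keep_ratio=r     -> keep the fraction r of each cluster nearest its center.
    Returns a list of (feature_vector, label) pairs.
    """
    reduced = []
    for cls in sorted(set(labels)):
        pts = [s for s, y in zip(samples, labels) if y == cls]
        centers, assign = kmeans(pts, min(k, len(pts)))
        for c, center in enumerate(centers):
            members = [p for p, a in zip(pts, assign) if a == c]
            if not members:
                continue
            if keep_ratio is None:
                reduced.append((tuple(center), cls))
            else:
                members.sort(key=lambda p: math.dist(p, center))
                n_keep = max(1, round(keep_ratio * len(members)))
                reduced.extend((tuple(p), cls) for p in members[:n_keep])
    return reduced
```

The reduced list would then be handed to an SVM trainer in place of the full training set (for example scikit-learn's `sklearn.svm.SVC`, not shown here). Keeping only the centers gives the smallest training set, while keeping the nearest neighbors trades some of that saving for more of the original data.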
Source
Computer Technology and Development (《计算机技术与发展》)
2017, No. 6, pp. 178-182 (5 pages)
Funding
Science and Technology Plan Project of Xinjiang Uygur Autonomous Region (2015X0108-1)
Keywords
agricultural information
classification
clustering
edge sample reduction