Abstract
Chinese word segmentation is an important part of a search engine. To address inaccurate query results and low query efficiency in searches for crop disease and insect pest information, the Mmseg4j segmentation algorithm is improved on top of the Lucene 3.6 search architecture: strings are processed in Simple mode, a synonym dictionary is built with a HashMap, and a singly linked list structure is used to optimize memory usage. Full-text search is implemented for the diseases and insect pests of crops such as butterfly orchid, red jujube, and potato. Experimental results show that search accuracy reaches 82.3% and that Java virtual machine memory usage is reduced by one third, indicating good practical value.
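The abstract names a HashMap-based synonym dictionary and a singly linked list used to save memory, but does not show the data structures themselves. The following is a minimal sketch of one way the two could fit together, not the authors' implementation: the class `SynonymDictionary`, its methods `addGroup` and `expand`, and the sample terms are all hypothetical. Each synonym group is stored once as a singly linked chain, and every term in the group maps to the head of that shared chain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a synonym dictionary backed by a HashMap of shared singly linked chains. */
class SynonymDictionary {

    /** One node of a singly linked synonym chain (only a forward pointer is kept). */
    private static final class Node {
        final String term;
        Node next;

        Node(String term) {
            this.term = term;
        }
    }

    // Every term of a group points at the head of the same chain,
    // so a group is stored once rather than once per term.
    private final Map<String, Node> heads = new HashMap<String, Node>();

    /** Registers a group of mutually synonymous terms. */
    void addGroup(String... terms) {
        Node head = null;
        // Build the chain back to front so it preserves the input order.
        for (int i = terms.length - 1; i >= 0; i--) {
            Node node = new Node(terms[i]);
            node.next = head;
            head = node;
        }
        for (String term : terms) {
            heads.put(term, head);
        }
    }

    /** Returns the whole synonym group of a term, or just the term if it is unknown. */
    List<String> expand(String term) {
        List<String> result = new ArrayList<String>();
        Node node = heads.get(term);
        if (node == null) {
            result.add(term);
            return result;
        }
        for (; node != null; node = node.next) {
            result.add(node.term);
        }
        return result;
    }
}
```

Under these assumptions, after `addGroup("马铃薯", "土豆", "洋芋")` (three common names for potato), a query for any one of the three terms can be expanded with `expand` to the full group before it is passed to the index search.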
Source
Ningxia Engineering Technology (《宁夏工程技术》)
CAS
2017, No. 3, pp. 229-232 (4 pages)
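Funding
Google-supported Ministry of Education Industry-University Cooperative Education Project "Intelligent Retrieval Technology" (201601005045)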