Abstract
Chinese automatic word segmentation is an important branch of text mining, and in China it is still at a developmental stage. Lucene, an open-source project of Apache Jakarta, is an excellent Java-based text retrieval toolkit that has been widely applied outside China. However, Lucene's support for Chinese word segmentation is unsatisfactory. Adding effective Chinese word segmentation to Lucene would greatly promote its development and application in China.
Source
Journal of Inner Mongolia University of Technology: Natural Science Edition (《内蒙古工业大学学报(自然科学版)》)
2007, No. 3, pp. 185-188 (4 pages)