Abstract
This paper summarizes the process of generating multi-document summaries from Chinese web page text and analyzes in detail the key techniques involved: text preprocessing, sentence similarity calculation, local topic region discovery, difference detection, and summary generation. For content selection, sentence relevance is assessed by a similarity calculation that fuses the intrinsic features of keywords and sentences, and text clustering is used to find the differences between sentences. An automatic summarization system for mobile phone terminals was designed and implemented on the Java ME platform in the MyEclipse environment, using the lightweight UI toolkit LWUIT and WTK as the development tool. Nearly 200 documents were selected as the test corpus for an acceptability evaluation and a Q&A-based informativeness evaluation, and the results were fairly satisfactory.
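The content-selection idea in the abstract (a sentence similarity that fuses keyword-level and sentence-level features, followed by clustering to separate differing sentences) might be sketched as below. This is only an illustration under stated assumptions, not the paper's method: the abstract does not give the actual features, weights, or clustering algorithm, so the keyword Jaccard score, the character-bigram cosine, the weight `alpha`, the threshold, and the greedy single-pass clustering are all hypothetical stand-ins.

```python
from collections import Counter
from math import sqrt


def char_bigrams(sentence):
    """Count character bigrams (an assumed stand-in for Chinese word segmentation)."""
    chars = [c for c in sentence if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(c1, c2):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def jaccard(s1, s2):
    """Jaccard overlap between two keyword sets."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def similarity(sent_a, sent_b, keywords, alpha=0.5):
    """Weighted mix of keyword overlap and sentence-level bigram cosine.

    alpha is a hypothetical weight; the abstract does not state one.
    """
    kw_a = {k for k in keywords if k in sent_a}
    kw_b = {k for k in keywords if k in sent_b}
    sentence_score = cosine(char_bigrams(sent_a), char_bigrams(sent_b))
    return alpha * jaccard(kw_a, kw_b) + (1 - alpha) * sentence_score


def cluster(sentences, keywords, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for s in sentences:
        for c in clusters:
            if similarity(s, c[0], keywords) >= threshold:
                c.append(s)
                break
        else:
            # No existing cluster is close enough, so this sentence starts a new one.
            clusters.append([s])
    return clusters
```

Under these assumptions, a summary could take one representative sentence per cluster, so that near-duplicate sentences contribute once while sentences in other clusters supply the differing content.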
Source
《计算机与数字工程》
2013, No. 6, pp. 943-946, 995 (5 pages in total)
Computer & Digital Engineering