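Abstract
This paper reviews the state of Web text mining in Chinese and English language environments and describes its current characteristics and technical bottlenecks. It then proposes a Web page content mining technology based on Web text mining, AIS (Augmented Information Support), and introduces the underlying technologies and functions involved in its implementation. Finally, AIS is applied to the Xiangshan Science Conference website: the AIS4XSSC text mining system was developed, and its main functions at the current stage are presented. Practice shows that AIS can effectively extract information from large volumes of Web text, improve users' retrieval efficiency, and push valuable information to users.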
Web text mining (WTM) is an information support technology that forms one component of the machine system of HWMSE. Given the deficiencies of current search engines for WWW retrieval, improvements are needed. This paper first presents a brief review of recent WTM developments. It then proposes an augmented information support (AIS) technology, based on WTM, to cope with the "information explosion". Finally, AIS is applied to the development of the AIS4XSSC (AIS for Xiangshan Science Conference) system, which is customized for information retrieval and knowledge discovery on the XSSC website. The practical application demonstrates that AIS is useful for extracting information from Web documents and improving the performance of information retrieval.
Source
《系统工程理论与实践》
EI
CSSCI
CSCD
PKU Core (北大核心)
2010, No. 1, pp. 96-104 (9 pages)
Systems Engineering-Theory & Practice
Funding
National Natural Science Foundation of China (70571078)