Abstract
Open source intelligence in the information age spreads quickly, comes in large volumes, and is highly time-sensitive, which makes manual analysis of the data impractical. To improve the efficiency of analysing massive amounts of data, a Web open source intelligence information processing method is designed. The method first uses a web crawler to fetch target intelligence by URL, then organizes the web page content with a DOM tree, extracts keywords with the TextRank algorithm, and builds a topic mining model with the Chameleon clustering algorithm. The model generates intelligence topics and performs topic analysis automatically. Performance tests show that the Chameleon clustering-based Web open source intelligence information processing method can analyse open source intelligence effectively.
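The content-cleaning and keyword-extraction stages named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the standard-library `HTMLParser` stands in for DOM-tree processing, the TextRank variant is a plain co-occurrence graph scored by PageRank-style iteration, and the names `TextExtractor`, `extract_text`, `textrank_keywords`, the stopword list, and all parameter defaults are assumptions made for illustration. Crawling and Chameleon clustering are omitted.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text while walking the HTML elements (hypothetical helper)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside <script>/<style> elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML page as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def textrank_keywords(text, top_k=3, window=3, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Link each word to the distinct words that follow it inside the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the unweighted TextRank update a fixed number of times.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Calling `textrank_keywords(extract_text(page_html))` on a fetched page would yield candidate keywords, which a clustering stage could then group into topics. The regular expression handles only lowercase Latin words, so Chinese text would first need a word segmenter.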
Author
FANG Shi-min (方世敏), School of Politics, National Defence University, Shanghai 200433, China
Source
Information Technology (《信息技术》)
2024, Issue 11, pp. 63-68, 76 (7 pages)
Funding
National Social Science Fund of China, Youth Project in Military Science (2019-SKJJ-C-064).