Abstract: The massive volume of data on the Internet needs to be exchanged and managed in a way that captures its semantics. The development and maturation of the Semantic Web make efficient, high-quality semantic information retrieval possible. Semantic Web technology has been applied to intelligent information retrieval, inter-enterprise data exchange and knowledge management, Internet services, and other areas. Using Semantic Web technology, we explore the intelligent convergence of overseas Chinese information data, design an intelligent aggregation model for that data, and define the relationships among its modules and the functions of each.
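Abstract: This paper proposes a multi-document keyword extraction method. The method introduces ATF×PDF (Average Term Frequency × Proportional Document Frequency) to compute term weights. It then recomputes the weights of candidate keywords with a joint weighting scheme based on the semantic similarity between candidates, and extracts keywords from the result. The method combines term frequency, part of speech, and semantic similarity between terms. Experiments show that it effectively extracts keywords from multiple documents: compared with a keyword-based cluster labeling method, precision improves by 3%, recall by 7%, and F-measure by 4.4%.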
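The abstract names ATF×PDF and a similarity-based joint weight but gives no formulas, so the following is only a rough sketch under stated assumptions, not the paper's definition. It assumes ATF is a term's mean relative frequency across all documents, PDF is the share of documents containing the term, and the joint weight adds a similarity-weighted share of the other candidates' weights. The names `atf_pdf`, `joint_weights`, `top_keywords`, the `similarity` callback, and the `alpha` parameter are all hypothetical, and the part-of-speech filtering mentioned in the abstract is omitted.

```python
from collections import Counter
from typing import Callable, Dict, List


def atf_pdf(docs: List[List[str]]) -> Dict[str, float]:
    """Assumed ATF x PDF weight for every term in a tokenized document set.

    ATF (assumption): mean relative term frequency over all documents.
    PDF (assumption): fraction of documents that contain the term.
    """
    n_docs = len(docs)
    tf_sums: Dict[str, float] = {}
    doc_freq: Counter = Counter()
    for doc in docs:
        if not doc:
            continue  # an empty document contributes no term frequencies
        for term, count in Counter(doc).items():
            tf_sums[term] = tf_sums.get(term, 0.0) + count / len(doc)
            doc_freq[term] += 1
    return {
        term: (tf_sums[term] / n_docs) * (doc_freq[term] / n_docs)
        for term in tf_sums
    }


def joint_weights(
    weights: Dict[str, float],
    similarity: Callable[[str, str], float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Assumed joint weight: own weight plus alpha times the
    similarity-weighted sum of the other candidates' weights."""
    joint: Dict[str, float] = {}
    for term, weight in weights.items():
        support = sum(
            similarity(term, other) * other_weight
            for other, other_weight in weights.items()
            if other != term
        )
        joint[term] = weight + alpha * support
    return joint


def top_keywords(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(weights, key=lambda term: (-weights[term], term))[:k]
```

In practice, `similarity` would be backed by a thesaurus or embedding model. Under these assumptions, a candidate that is semantically close to other high-weight candidates is promoted, which is one plausible reading of "joint weight" in the abstract.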