Abstract
Traditional tag generation algorithms for text resources ignore domain-related semantic features, so they are poorly suited to tag generation tasks in a specific field. This paper proposes a tag generation algorithm adapted to the characteristics of the military field. It first applies a word segmentation method suited to this field, and then automatically generates tags based on the topic information of the text resources and the statistical characteristics of the words. Experimental results show that the proposed method achieves some improvement in precision, recall, and F-measure compared with the traditional TF-IDF algorithm.
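The abstract names the stages of the method but not their exact form. The Python sketch below illustrates one way such a pipeline could fit together, under loudly stated assumptions: the domain segmentation is swapped for plain forward maximum matching over a hypothetical military lexicon; the document-topic weights `theta` and topic-word probabilities `phi` are hand-set toy values standing in for a trained LDA model; and the linear fusion weight `alpha`, the max-normalisation, and every function name are illustrative choices, not the paper's method. Precision, recall and F1 are computed against a gold tag set, mirroring the metrics the abstract reports.

```python
import math
from collections import Counter


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest
    lexicon entry (up to max_len chars), falling back to one character.
    Stand-in for the paper's unspecified domain segmentation method."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def tf_idf(doc, corpus):
    """TF-IDF of each token in `doc`; `corpus` is a list of token lists."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for w, c in tf.items():
        df = sum(1 for d in corpus if w in d)
        idf = math.log((n_docs + 1) / (df + 1)) + 1  # smoothed IDF, always > 0
        scores[w] = c / len(doc) * idf
    return scores


def topic_score(word, theta, phi):
    """p(word | doc) under a topic model: sum_k theta[k] * phi[k][word].
    theta: document-topic weights; phi: per-topic word probabilities."""
    return sum(t * p.get(word, 0.0) for t, p in zip(theta, phi))


def generate_tags(doc, corpus, theta, phi, k=3, alpha=0.5, min_len=2):
    """Rank candidates by alpha * TF-IDF + (1 - alpha) * topic score,
    each max-normalised to [0, 1]; tokens shorter than min_len are skipped.
    The linear fusion rule is an assumption, not the paper's formula."""
    stat = tf_idf(doc, corpus)  # assumes doc is non-empty
    topic = {w: topic_score(w, theta, phi) for w in stat}
    s_max = max(stat.values())
    t_max = max(topic.values()) or 1.0  # avoid dividing by zero
    combined = {
        w: alpha * stat[w] / s_max + (1 - alpha) * topic[w] / t_max
        for w in stat if len(w) >= min_len
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]


def prf(predicted, gold):
    """Precision, recall and F1 of predicted tags against gold tags."""
    hit = len(set(predicted) & set(gold))
    p = hit / len(predicted) if predicted else 0.0
    r = hit / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy usage: hypothetical lexicon, corpus and topic parameters.
lexicon = {"驱逐舰", "发射", "防空", "导弹", "雷达"}
docs = ["驱逐舰发射防空导弹", "雷达跟踪导弹", "驱逐舰雷达"]
corpus = [fmm_segment(d, lexicon) for d in docs]
theta = [0.8, 0.2]  # hypothetical document-topic weights for docs[0]
phi = [{"导弹": 0.4, "防空": 0.3}, {"雷达": 0.5, "驱逐舰": 0.2}]

gold = ["防空", "导弹"]
fused = generate_tags(corpus[0], corpus, theta, phi, k=2)
tfidf_only = generate_tags(corpus[0], corpus, theta, phi, k=2, alpha=1.0)
print(fused, prf(fused, gold))            # -> ['导弹', '防空'] (1.0, 1.0, 1.0)
print(tfidf_only, prf(tfidf_only, gold))  # -> ['发射', '防空'] (0.5, 0.5, 0.5)
```

On this toy corpus, pure TF-IDF (`alpha=1.0`) scores 发射 ("launch") level with 防空 ("air defence") because each occurs in only one document, while adding the topic score promotes 导弹 ("missile"). This only illustrates the mechanism of fusing topic and statistical evidence; it does not reproduce the paper's experiments.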
Author
景道月
JING Daoyue (Zhenjiang Food and Drug Supervision and Inspection Center, Zhenjiang 212004)
Source
Computer & Digital Engineering (《计算机与数字工程》), 2024, No. 5, pp. 1459-1462, 1501 (5 pages)
Keywords
keyword extraction
tag generation
word segmentation
LDA topic model
statistical characteristics