Abstract
[Purpose/significance] When LDA is applied to topic analysis in specialized subfields, the resulting topics commonly suffer from poor readability and interpretability. Using domain terms as the basis for topic analysis has gradually become a trend in intelligence analysis practice. The topics it produces therefore need to be evaluated quantitatively against those produced by traditional word-selection schemes, both to test its effectiveness and to support subsequent research and practical applications in intelligence studies. [Method/process] First, drawing on a literature review, three word-selection schemes ("nouns + verbs", "nouns", and "domain terms") were chosen to build comparative LDA experiments with multiple parameter settings (number of topics and number of words). A qualitative and quantitative evaluation method, based on domain-expert analysis and topic-coherence calculation, was then proposed to compare the interpretability and coherence of the topics each scheme produces. Next, taking the cardiovascular field as a case, specific experimental parameters were set, a total of 600 LDA runs were carried out, and the results were analyzed. [Result/conclusion] The results show that LDA topics obtained with domain terms as the word-selection scheme are more interpretable and more readable. Intelligence studies that involve topic analysis of specialized subfields should therefore use domain terms as the unit of analysis wherever possible.
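The abstract names the three word-selection schemes and a topic-coherence calculation but does not say how tokens are filtered or which coherence measure is used. As an illustration only, the sketch below assumes documents arrive already tokenized and POS-tagged as `(word, tag)` pairs with hypothetical tags `"n"` (noun) and `"v"` (verb), that domain terms are a plain set of strings, and that coherence is the widely used UMass measure; the paper's actual tagger, term list, and coherence metric may differ. Training the LDA model itself (which would supply each topic's ranked top words) is not shown.

```python
"""Illustrative sketch: word-selection schemes and UMass topic coherence.

Assumptions (not taken from the paper): tokens are pre-tagged with the
hypothetical tags "n" (noun) and "v" (verb); the domain-term list is a
plain set; coherence is the UMass co-document-frequency measure.
"""
from __future__ import annotations

import math
from typing import Iterable, List, Optional, Set, Tuple

Tagged = List[Tuple[str, str]]  # one document as (word, POS tag) pairs


def select_words(doc: Tagged, scheme: str,
                 domain_terms: Optional[Set[str]] = None) -> List[str]:
    """Keep only the tokens that a given word-selection scheme allows."""
    if scheme == "noun+verb":
        return [w for w, tag in doc if tag in {"n", "v"}]
    if scheme == "noun":
        return [w for w, tag in doc if tag == "n"]
    if scheme == "domain_term":
        terms = domain_terms or set()
        return [w for w, _ in doc if w in terms]
    raise ValueError(f"unknown scheme: {scheme}")


def umass_coherence(top_words: List[str], docs: Iterable[List[str]]) -> float:
    """UMass coherence of a topic's ranked top words.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs with j < i, where
    D(...) counts documents containing every listed word. Higher (closer
    to zero) means the top words co-occur more consistently.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # word absent from the corpus; skip to avoid dividing by zero
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be tagged cardiovascular literature.
    corpus = [
        [("stent", "n"), ("implant", "v"), ("artery", "n")],
        [("artery", "n"), ("stent", "n"), ("study", "n")],
        [("hypertension", "n"), ("treat", "v"), ("drug", "n")],
    ]
    terms = {"stent", "artery", "hypertension"}
    for scheme in ("noun+verb", "noun", "domain_term"):
        filtered = [select_words(d, scheme, terms) for d in corpus]
        print(scheme, filtered, umass_coherence(["stent", "artery"], filtered))
```

In a full experiment, each scheme's filtered corpus would be fed to LDA under every (number of topics, number of words) setting, and the coherence of every resulting topic would be averaged per run, so the schemes can be compared on equal footing alongside the expert judgments.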
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal
2019, Issue 6, pp. 138-143 (6 pages)
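Funding
This work is one of the outputs of the National Social Science Fund of China Youth Project "Research on Semantic Mining and Knowledge Evolution of Scientific and Technological Vocabulary from the Perspective of Domain Analysis" (Project No. 16CTQ024).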
Keywords
LDA
domain terminology
topic analysis
domain knowledge analysis
cardiovascular medicine