Abstract
This paper proposes ATValue (Advanced TValue and Fieldhood Integration), a method for automatic term extraction. To improve extraction quality, it adds a sixth attribute, the degree of fieldhood, to the five attributes of TValue. After the correlations among the six attributes are analyzed, they are combined by multiplication of probabilities into a single score, AValue. Strings whose AValue exceeds a predefined term-confidence threshold are selected as candidate terms. Experiments in the energy industry show that the F-score of ATValue is about 2 percentage points higher than that of TValue, because the degree of fieldhood measures how much each word in a string contributes to the domain.
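The selection step described in the abstract (combine six attribute values by multiplication, then keep strings above a confidence threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `avalue`, `select_candidates`, and `f_score`, the assumption that every attribute is already a probability-like value in [0, 1], the threshold of 0.3, and all sample strings and numbers are invented for illustration. The abstract does not say how the individual attributes are computed or how correlated attributes are handled before multiplication, so neither is shown here.

```python
from math import prod


def avalue(attributes):
    """Combine attribute scores by multiplication.

    Assumption: each attribute is a probability-like value in [0, 1].
    """
    return prod(attributes)


def select_candidates(scored_strings, threshold):
    """Return the strings whose combined score exceeds the threshold."""
    return [
        string
        for string, attributes in scored_strings.items()
        if avalue(attributes) > threshold
    ]


def f_score(extracted, gold):
    """Balanced F-score (harmonic mean of precision and recall)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: six invented attribute values per string.
scores = {
    "wind turbine": [0.9, 0.8, 0.9, 0.9, 0.8, 0.9],
    "of the": [0.9, 0.1, 0.2, 0.3, 0.1, 0.05],
    "power grid": [0.8, 0.9, 0.8, 0.9, 0.9, 0.8],
}

candidates = select_candidates(scores, threshold=0.3)
print(candidates)
print(f_score(candidates, ["wind turbine", "power grid", "solar cell"]))
```

Because the score is a product, a single near-zero attribute is enough to push a string below the threshold, which is one plausible reason for choosing multiplication over a weighted sum.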
Authors
YANG Yana (杨雅娜)
LIU Shengqi (刘胜奇)
Affiliations: Postal Savings Bank of China, Beijing 100070, China; China Patent Information Center, Beijing 100088, China
Source
Technology Intelligence Engineering (《情报工程》)
2015, No. 5, pp. 25-31 (7 pages)
Keywords
Term Extraction (术语抽取)
Term Recognition (术语识别)
Data Mining (数据挖掘)
Fieldhood (领域度)