Abstract
Entity perception in ambiguous user comments is a critical problem for target identification across huge volumes of public opinion. In this paper, a Two-Step-Matching method is proposed to identify the precise target entity among the multiple entities mentioned in a comment. First, potential entities are extracted from public comments by a BiLSTM-CRF model, and characteristic words are extracted by a TF-IDF model. Second, a first matching is performed between the potential entities and an official business directory using the Jaro-Winkler distance algorithm. Then, to find the precise entity, an industry-characteristic dictionary is introduced into a second matching process: the precise entity is identified by the number of characteristic words that match the industry-characteristic dictionary. In addition, an associated rate (a global indicator) and an accuracy rate (a sample indicator) are defined to evaluate matching accuracy. Results on three data sets of public opinion about major public health events show that the associated rate and accuracy rate reach highs of 0.93 and 0.95, which on average are 32% and 30% higher than when the first matching process is used alone. This framework provides a method for finding the true target entity that a public-opinion expression actually refers to.
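The two matching steps can be illustrated with a minimal Python sketch. It assumes the BiLSTM-CRF entities and the TF-IDF characteristic words have already been extracted. It also uses Jaro-Winkler *similarity* (1 minus the distance) with the common scaling factor p = 0.1. The threshold of 0.85, the dictionary layouts, the function names, and the sample data are hypothetical illustrations, not the authors' settings.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        # Look for an unmatched equal character within the matching window.
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = transpositions = 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (capped at 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def two_step_match(entities, directory, industry_dict, char_words, threshold=0.85):
    """Hypothetical sketch of Two-Step-Matching.

    entities:      potential entity strings (assumed output of BiLSTM-CRF)
    directory:     {official business name: industry}  (assumed layout)
    industry_dict: {industry: set of characteristic words}  (assumed layout)
    char_words:    set of characteristic words of the comment (assumed TF-IDF output)
    """
    # Step 1: map each potential entity to its closest official name,
    # keeping it only if the similarity clears the (assumed) threshold.
    candidates = set()
    for ent in entities:
        best = max(directory, key=lambda name: jaro_winkler(ent, name), default=None)
        if best is not None and jaro_winkler(ent, best) >= threshold:
            candidates.add(best)
    if not candidates:
        return None

    # Step 2: pick the candidate whose industry dictionary shares the most
    # words with the comment (ties broken alphabetically for determinism).
    def hits(name):
        return len(char_words & industry_dict.get(directory[name], set()))

    return max(sorted(candidates), key=hits)
```

For example, a comment that mentions two similarly named businesses is resolved by its vocabulary:

```python
directory = {"Sunrise Hospital": "healthcare",
             "Sunrise Supermarket": "retail",
             "Harbor Logistics": "logistics"}
industry_dict = {"healthcare": {"doctor", "patient", "ward", "fever"},
                 "retail": {"price", "shelf", "mask", "queue"},
                 "logistics": {"delivery", "parcel"}}
two_step_match(["Sunrise Hospitl", "Sunrise Supermarket"], directory,
               industry_dict, {"price", "mask", "queue"})
```

Step 1 alone would leave both businesses as candidates. Step 2 selects the retail entry because three of the comment's characteristic words fall in its industry dictionary.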
Funding
This work is partially supported by the National Natural Science Foundation of China (Grant Nos. 71901144, 71771152, 61773248);
the Major Program of National Fund of Philosophy and Social Science of China (Grant Nos. 18ZDA088, 20ZDA060);
Shanghai Planning Office of Philosophy and Social Science Foundation (Grant No. 2019EXW001);
Foundation of University of Finance and Economics (Grant No. 2017110709);
and S-Tech internet communication project (Grant Nos. 2018PHD005 and 2018TECH003).