Abstract
Sentence similarity computation measures how closely two sentences are related and has broad application prospects in sensitive data detection. This paper proposes a sentence similarity calculation method that combines dependency parsing with the Word Rotator's Distance (WRD) semantic distance. The method analyzes sentence structure through dependency parsing, extracts a word set for each type of dependency relation, computes the WRD between the corresponding word sets of the two sentences, and from these distances derives a similarity index for the sentence pair. Experiments show that the method yields relatively accurate sentence similarity results and is suitable for a variety of sensitive data detection scenarios.
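The pipeline outlined in the abstract (group words by dependency relation, compare matching groups with Word Rotator's Distance, aggregate into one score) can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the paper: the dependency parses are hand-written (head, relation, dependent) triples instead of parser output, `TOY_VECTORS` is a hypothetical three-dimensional embedding table, the transport problem is replaced by a relaxed nearest-neighbour lower bound instead of the exact optimal transport that WRD uses, and the final averaging rule in `sentence_similarity` is only one plausible way to turn distances into an index. The sketch does keep the two defining choices of WRD: word mass proportional to vector norm, and cosine distance as transport cost.

```python
import math
from collections import defaultdict

# Hypothetical toy embeddings (assumption); a real system would load pretrained vectors.
TOY_VECTORS = {
    "company": (0.9, 0.1, 0.2),
    "firm": (0.8, 0.2, 0.3),
    "leaked": (0.1, 0.9, 0.1),
    "disclosed": (0.2, 0.8, 0.2),
    "passwords": (0.2, 0.1, 0.9),
    "credentials": (0.3, 0.2, 0.8),
}


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm(u) * norm(v))


def masses(words, vectors):
    """WRD-style weights: mass proportional to vector norm, normalised to sum to 1."""
    norms = [norm(vectors[w]) for w in words]
    total = sum(norms)
    return [n / total for n in norms]


def relaxed_wrd(words_a, words_b, vectors):
    """Relaxed lower bound on the transport cost, NOT the exact optimal transport.

    Each word ships all of its mass to its nearest word in the other set;
    the larger of the two directions is returned.
    """
    def one_way(src, dst):
        return sum(
            mass * min(cosine_distance(vectors[s], vectors[d]) for d in dst)
            for mass, s in zip(masses(src, vectors), src)
        )
    return max(one_way(words_a, words_b), one_way(words_b, words_a))


def relation_word_sets(triples):
    """Group the words of (head, relation, dependent) triples by relation label."""
    sets = defaultdict(list)
    for head, rel, dep in triples:
        for word in (head, dep):
            if word not in sets[rel]:
                sets[rel].append(word)
    return sets


def sentence_similarity(triples_a, triples_b, vectors):
    """Assumed aggregation: average of (1 - distance / 2) over shared relations."""
    sets_a = relation_word_sets(triples_a)
    sets_b = relation_word_sets(triples_b)
    shared = sorted(set(sets_a) & set(sets_b))
    if not shared:
        return 0.0
    # Cosine distance lies in [0, 2], so 1 - d / 2 maps it to a score in [0, 1].
    scores = [1.0 - relaxed_wrd(sets_a[r], sets_b[r], vectors) / 2.0 for r in shared]
    return sum(scores) / len(shared)


# Hand-written parses (assumption): "company leaked passwords" and two variants.
PARSE_A = [("leaked", "nsubj", "company"), ("leaked", "dobj", "passwords")]
PARSE_B = [("disclosed", "nsubj", "firm"), ("disclosed", "dobj", "credentials")]
PARSE_C = [("leaked", "nsubj", "passwords"), ("leaked", "dobj", "company")]

if __name__ == "__main__":
    print("paraphrase:", sentence_similarity(PARSE_A, PARSE_B, TOY_VECTORS))
    print("roles swapped:", sentence_similarity(PARSE_A, PARSE_C, TOY_VECTORS))
```

`PARSE_C` uses the same words as `PARSE_A` with subject and object swapped. Because words are compared only within the same dependency relation, the swapped sentence scores lower than the paraphrase in `PARSE_B`, which a plain bag-of-words distance over whole sentences would not capture.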
Authors
周俊 (ZHOU Jun)
石元兵 (SHI Yuanbing)
魏忠 (WEI Zhong)
金贵涛 (JIN Guitao)
郭红 (GUO Hong)
Westone Information Industry Ltd., Chengdu, Sichuan 610041, China
Source
Communications Technology (《通信技术》)
2021, No. 1, pp. 181-187 (7 pages)