Funding: Supported by Chongqing Education Committee (20SKGH059).
Abstract: To improve the accuracy of text similarity calculation, this paper presents a sentence-vector-based text similarity function, part of speech and word order-smooth inverse frequency (PO-SIF), which optimizes the classical SIF calculation method in two respects: part of speech and word order. The classical SIF algorithm calculates sentence similarity by building a sentence vector through weighting and noise reduction. However, the choice of weighting and noise-reduction method affects both the efficiency and the accuracy of the similarity calculation. In the proposed PO-SIF, the weight parameters of the SIF sentence vector are first updated by a part-of-speech subtraction factor to determine the most crucial words. Furthermore, PO-SIF takes word order into account when calculating sentence vector similarity, which overcomes the drawback of similarity analysis based mostly on word frequency. Experimental results validate that the proposed PO-SIF improves the accuracy of text similarity calculation.
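For readers unfamiliar with the baseline, the following is a minimal sketch of the classical SIF pipeline that the abstract refers to as "weighting and reducing noise". It assumes the commonly used formulation: each word is weighted by a / (a + p(w)), the weighted word vectors are averaged, and the projection onto the first singular vector of the sentence matrix is removed. The function names, the dictionary inputs, and the default a = 1e-3 are illustrative assumptions, not taken from the paper. The PO-SIF part-of-speech factor and word-order term are not specified in the abstract and are therefore not shown.

```python
import numpy as np

def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
    """Classical SIF sketch (hypothetical helper, not the paper's code).

    sentences: list of token lists
    word_vecs: dict mapping token -> 1-D np.ndarray (all the same length)
    word_prob: dict mapping token -> unigram probability p(w)
    a:         smoothing constant (assumed default)
    """
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        known = [t for t in tokens if t in word_vecs]
        if not known:
            continue  # leave a zero vector for sentences with no known words
        # Weighting step: frequent words get small weights.
        weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in known])
        vecs = np.stack([word_vecs[t] for t in known])
        emb[i] = weights @ vecs / len(known)
    # Noise-reduction step: remove the projection onto the first
    # right singular vector of the sentence-embedding matrix.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0
```

Sentence similarity is then `cosine(emb[i], emb[j])` for any two rows of the returned matrix. Because this score depends only on which words occur and how often, it is insensitive to word order, which is the limitation the abstract says PO-SIF addresses.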