
APPLYING AN EFFICIENT TEXTUAL REPLICAS DETECTION ALGORITHM IN E-COMMERCE

Cited by: 1
Abstract: This paper studies an efficient algorithm for detecting duplicate textual information. The algorithm automatically classifies and sorts similar information on an e-commerce site, which greatly improves the efficiency and accuracy of information review. Tests show that the algorithm is very effective when the number of messages is between 100 and 1000: comparing 1000 text messages against one another can be kept within 2 seconds. Beyond 1000 messages, the computation time rises sharply. The precision can be tuned by adjusting the relevant parameters of the algorithm. For very short messages (fewer than 10 characters), the algorithm can be combined with the Levenshtein algorithm to make the duplicate detection more flexible.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2009, No. 1, pp. 197-199 (3 pages)
Keywords: replicas detection; algorithm; e-commerce
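
The abstract suggests falling back to the Levenshtein algorithm for very short messages. The paper's own algorithm is not described on this page, so the following is only a minimal sketch of that fallback, not the authors' implementation. The 10-character limit comes from the abstract; the function names, the relative-distance threshold of 0.2, and the greedy grouping strategy are assumptions made for illustration.

```python
from typing import List

SHORT_LIMIT = 10          # from the abstract: messages under 10 characters
MAX_RELATIVE_DIST = 0.2   # hypothetical threshold, not taken from the paper


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the row short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def is_short_duplicate(a: str, b: str,
                       max_relative_dist: float = MAX_RELATIVE_DIST) -> bool:
    """Treat two short messages as duplicates if their edit distance,
    relative to the longer message, is at or below the threshold."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return levenshtein(a, b) / longest <= max_relative_dist


def group_short_messages(messages: List[str]) -> List[List[str]]:
    """Greedy grouping: a message joins the first group whose first
    member it duplicates, otherwise it starts a new group."""
    groups: List[List[str]] = []
    for msg in messages:
        for group in groups:
            if is_short_duplicate(msg, group[0]):
                group.append(msg)
                break
        else:
            groups.append([msg])
    return groups


if __name__ == "__main__":
    short = [m for m in ["iphone 4", "iphone 4s", "nokia n95"]
             if len(m) < SHORT_LIMIT]
    print(group_short_messages(short))
```

Comparing every message with every group representative is quadratic in the worst case, which is acceptable only for small batches; raising or lowering `MAX_RELATIVE_DIST` trades recall against precision in the same spirit as the parameter tuning mentioned in the abstract.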
