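Abstract
Digital goods, that is, goods that exist in digital form, mainly take four forms: text, images, video, and audio. In the Internet e-commerce environment, digital goods are easily copied and distributed illegally, which clearly hinders the healthy development of e-commerce. This paper attempts to solve the problem of illegal copying and distribution of digital text in the e-commerce environment. It first proposes a multi-level, multi-granularity representation method for digital text and then, on this basis, presents a corresponding overlap-measuring algorithm. The method can fairly accurately detect not only whole-text illegal copying, such as equivalence copying, superset copying, and shifted whole copying, but also partial illegal copying, such as subset copying and shifted partial copying. The algorithm also scales well. The proposed method was tested from five different aspects, and the experimental results show that the algorithm is effective.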
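The copy categories named in the abstract can be illustrated with a minimal sketch that compares a registered text and a suspect text at one granularity only (the sentence). This is an assumption for illustration, not the paper's method: the abstract does not define the categories formally, the helper names `split_sentences` and `classify_copy` are hypothetical, and the "shifted" variants are not modeled here.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into lowercase, whitespace-normalized sentences (hypothetical granularity)."""
    parts = re.split(r"[.!?。！？]+", text)
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def classify_copy(registered: str, suspect: str) -> str:
    """Classify a suspect text against a registered one by sentence-set overlap."""
    r = set(split_sentences(registered))
    s = set(split_sentences(suspect))
    if not r or not s or not (r & s):
        return "none"          # no shared sentences at all
    if r == s:
        return "equivalence"   # same sentences as the registered text
    if r < s:
        return "superset"      # whole registered text plus extra material
    if s < r:
        return "subset"        # only part of the registered text
    return "partial"           # some shared sentences plus unrelated ones
```

Using sets ignores sentence order and repetition, which keeps the sketch short; detecting reordered ("shifted") copies would need an order-aware comparison on top of it.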
The purpose of this paper is to address the problem of illegal copying and distribution of digital texts. Based on a registration mechanism, the paper presents a representation method and the corresponding overlap-measuring algorithms. In the proposed method, the protected digital texts are first represented in terms of both physical structure and semantic content, at multiple levels and granularities; next, they are registered in a database; a web crawler then automatically and periodically traverses the Internet and returns suspect digital texts; finally, using the proposed overlap-measuring algorithms, the returned texts are compared with those stored in the database at the semantic content level and then at the physical structure level. If the semantic similarity between two digital texts is below a given threshold, the comparison stops early, which greatly improves efficiency. Because the representation method and the overlap-measuring algorithms are flexible, the proposed method can detect not only whole-text illegal copying and distribution, such as equivalence replication, superset replication, and shifted whole replication, but also partial illegal copying and distribution, such as subset replication and shifted partial replication. It also scales well to massive data. To evaluate the effectiveness of the algorithms, experiments were conducted from five aspects. The results show that the average precision of the algorithms is 3.4% higher than that of the sentence-based exhaustive comparison method, and the average running time is 40.2% lower.
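The two-stage comparison with early termination described above can be sketched as follows. The abstract does not specify either similarity measure, so this sketch substitutes assumptions: a word-frequency cosine for the semantic level and the fraction of registered sentences found in the suspect text for the structural level. The function `screen`, its `sem_threshold` parameter, and the in-memory `registry` dictionary standing in for the database are all hypothetical.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-frequency vectors (assumed semantic measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentences(text: str) -> set[str]:
    return {" ".join(words(p)) for p in re.split(r"[.!?]+", text) if words(p)}


def structural_overlap(registered: str, suspect: str) -> float:
    """Fraction of registered sentences that also occur in the suspect text."""
    r, s = sentences(registered), sentences(suspect)
    return len(r & s) / len(r) if r else 0.0


def screen(registry: dict[str, str], suspect: str, sem_threshold: float = 0.5) -> dict[str, float]:
    """Compare one crawled text against every registered text.

    Stage 1 (semantic): cheap cosine check; below the threshold, stop early.
    Stage 2 (structural): sentence-level overlap, only for texts that pass stage 1.
    """
    suspect_vec = Counter(words(suspect))
    results = {}
    for doc_id, text in registry.items():
        if cosine(Counter(words(text)), suspect_vec) < sem_threshold:
            continue  # early stop: skip the costlier structural comparison
        results[doc_id] = round(structural_overlap(text, suspect), 3)
    return results
```

For example, with `registry = {"d1": "The cat sat. The dog ran.", "d2": "Stock prices fell sharply today."}` and the suspect text `"The cat sat. The dog ran. Birds sang."`, only `d1` passes the semantic check, and its structural overlap is `1.0`. The design point is that the cheap semantic filter prunes most registered texts, so the finer comparison runs on few candidates.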
Source
Chinese Journal of Computers (《计算机学报》), 2002, No. 11, pp. 1206-1211 (6 pages)
Indexed in: EI, CSCD, Peking University Core Journals
Funding
Supported by the National Natural Science Foundation of China (60173058)
Supported by the National "863" Program Youth Fund (863-306-QN2000-5)
Keywords
digital goods
illegal copying
detection algorithm
electronic commerce
Internet
digital goods, copying and distribution detection, similar pattern mining, electronic commerce