Abstract
This paper proposes a document similarity algorithm based on XML schemas. Similarity between XML schemas is an important basis for clustering XML documents. Elements form the main body of an XML schema, so the similarity of two schemas is composed of the similarities of their elements. The algorithm takes into account both the structural and the semantic information of the elements in a schema, which further improves the precision of the similarity computation. In addition, by computing similarity between XML schemas, the algorithm reduces computational complexity, improves clustering accuracy, and makes it easy to extract a common XML schema for each cluster.
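The abstract gives no formulas, so the following is only a minimal sketch of the general idea (element similarity that mixes a semantic and a structural score, aggregated into a schema similarity), not the authors' algorithm. Everything in it is an assumption made for illustration: the `Element` record, the weight `w_sem`, the use of string matching on element names as a stand-in for semantic similarity, Jaccard overlap of ancestor and child names as the structural score, and best-match averaging as the aggregation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical flattened view of one schema element."""
    name: str
    path: Tuple[str, ...]      # ancestor names, root first
    children: FrozenSet[str]   # names of direct child elements


def name_similarity(a: str, b: str) -> float:
    # Assumption: string similarity of names stands in for semantic similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set, b: set) -> float:
    # Two empty sets are treated as identical context.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def element_similarity(e1: Element, e2: Element, w_sem: float = 0.5) -> float:
    semantic = name_similarity(e1.name, e2.name)
    # Structural score: equal mix of ancestor overlap and child overlap.
    structural = 0.5 * jaccard(set(e1.path), set(e2.path)) \
        + 0.5 * jaccard(set(e1.children), set(e2.children))
    return w_sem * semantic + (1.0 - w_sem) * structural


def schema_similarity(s1: Sequence[Element], s2: Sequence[Element],
                      w_sem: float = 0.5) -> float:
    if not s1 or not s2:
        return 0.0

    def directed(a: Sequence[Element], b: Sequence[Element]) -> float:
        # Average, over elements of a, of the best match found in b.
        return sum(max(element_similarity(x, y, w_sem) for y in b)
                   for x in a) / len(a)

    # Average both directions so neither schema dominates the score.
    return 0.5 * (directed(s1, s2) + directed(s2, s1))
```

A pairwise score of this kind could feed any standard clustering procedure; because schemas are usually far fewer and smaller than the documents that conform to them, comparing schemas is one plausible reading of how the complexity reduction mentioned above arises.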
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2010, No. 21, pp. 54-56 (3 pages)