Abstract
To estimate the size of a Web database, this paper proposes a method based on attribute relevance and sample independence. First, the text attribute values returned by submitted queries are segmented with ICTCLAS, the word segmentation system of the Chinese Academy of Sciences, so that attribute relevance can be computed. The relevance between attributes is then used to obtain approximately attribute-independent samples, and the size of the Web database is estimated from the independence of these samples. Experiments show that the method achieves high accuracy.
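The abstract does not give the estimation formula, so the following is only a minimal sketch of how two approximately independent samples can yield a size estimate. It assumes the classic Lincoln-Petersen capture-recapture estimator, which may differ from the estimator used in the paper; the function name `estimate_size` and the record-ID inputs are hypothetical.

```python
def estimate_size(sample_a, sample_b):
    """Lincoln-Petersen estimate of a database's size.

    sample_a, sample_b: iterables of record IDs obtained from two query
    batches that are ASSUMED to be (approximately) independent.
    This is an illustrative estimator, not necessarily the paper's own.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)  # records captured by both samples
    if overlap == 0:
        # With no shared records the estimate is undefined (division by zero).
        raise ValueError("samples share no records; estimate is undefined")
    # If the samples are independent, |A ∩ B| / |B| ≈ |A| / N,
    # so N ≈ |A| * |B| / |A ∩ B|.
    return len(a) * len(b) / overlap


if __name__ == "__main__":
    # Two samples of 100 records each that share 50 records.
    print(estimate_size(range(0, 100), range(50, 150)))
```

The estimate is only as good as the independence assumption: if the two samples are positively correlated, the overlap is inflated and the size is underestimated, which is why selecting weakly related attributes for sampling matters.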
Source
《信息技术与信息化》
2010, No. 2, pp. 63-66 (4 pages)
Information Technology and Informatization