摘要
WWW为各行各业提供了大量的信息,但如何准确地从这些信息中提取出相关领域的权威信息是目前研究的热点问题之一。该文提出评判网站信息的多因素综合评估模型,该模型对网站的权威值进行合理计算,给出基于表格数据的语法树模型,完成了表格数据的自动提取。通过实例证明,该方法很好地解决了权威信息的准确和自动提取。
Although WWW has provided much information for all fields, how to extract the authoritative information from related fields exactly is becoming a hot topic. This paper provides a process of extracting table data it provides a multiple factors assessment model to judge the Web page. Using the model, the authoritative value of Web page can be gained correctly. It provides a table-based phrase tree method to extract the interesting data automatically. Example proves that this method can extract the authoritative information exactly and automatically.
出处
《计算机工程》
CAS
CSCD
北大核心
2008年第13期54-55,66,共3页
Computer Engineering
基金
上海高校优秀青年教师科研专项基金资助项目