
Review Discovery and Extraction Based on Page Segmentation and Information Entropy (cited by: 4)

Reviews Discovery and Opinions Extraction Based on Segment and Entropy
Abstract: This paper proposes REA (Review Extract Algorithm), a novel algorithm for discovering and extracting review information. REA iteratively segments a page and computes the information entropy of the resulting blocks to discover and extract review blocks automatically. Page segmentation effectively removes noise, and block-level entropy computation accurately locates each individual user review. Experimental results show that the algorithm achieves relatively high recall and precision.
Source: Application Research of Computers (CSCD; Peking University Core Journal), 2007, No. 2, pp. 269-271, 291 (4 pages)
Funding: Natural Science Foundation of Jiangsu Province (BK2005046)
Keywords: opinion extraction; automatic; semantic block; entropy values
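The abstract does not give REA's exact entropy formula, segmentation method, or stopping rule, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes the page has already been segmented into a tree of blocks (the hypothetical `Block` class; in practice this might come from the DOM or a visual segmentation), walks that tree iteratively, and scores each leaf block by the Shannon entropy of its word distribution, H = -Σ p_i log2 p_i. High-entropy blocks are kept as candidate reviews. The names `tokenize`, `shannon_entropy`, and `extract_reviews`, and the `threshold` and `min_tokens` values, are illustrative assumptions rather than parameters from the paper.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for one segment of an already-segmented page."""
    text: str = ""
    children: list[Block] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    # Assumption: space-delimited text; Chinese pages would need word segmentation first.
    return re.findall(r"\w+", text.lower())


def shannon_entropy(tokens: list[str]) -> float:
    """H = -sum(p_i * log2 p_i) over the token distribution of one block, in bits."""
    n = len(tokens)
    if n == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def extract_reviews(root: Block, threshold: float = 3.0, min_tokens: int = 5) -> list[str]:
    """Walk the block tree with an explicit stack; keep leaf blocks with high token entropy.

    threshold and min_tokens are illustrative values, not taken from the paper.
    """
    reviews: list[str] = []
    stack = [root]
    while stack:
        block = stack.pop()
        if block.children:
            # Descend into sub-blocks; reversed so leaves are visited in page order.
            stack.extend(reversed(block.children))
            continue
        tokens = tokenize(block.text)
        # Short or repetitive blocks (menus, "Reply" bars, footers) score low and are dropped.
        if len(tokens) >= min_tokens and shannon_entropy(tokens) >= threshold:
            reviews.append(block.text)
    return reviews


# Hypothetical page: a menu, a comment area with two reviews and a link bar, and a footer.
page = Block(children=[
    Block("Home About Contact"),
    Block(children=[
        Block("Great phone, battery lasts two days and the screen is bright"),
        Block("Reply Reply Reply Reply Reply Reply"),
        Block("Shipping was slow but the seller answered every question quickly"),
    ]),
    Block("Copyright 2007 All rights reserved"),
])

for review in extract_reviews(page):
    print(review)
```

Running the script prints only the two review sentences. A user comment tends to use many distinct words, so its entropy is close to log2 of its length (about 3.46 bits for the first review's 11 distinct tokens), whereas the menu is too short, the repeated "Reply" bar has zero entropy, and the footer's five distinct tokens give only about 2.32 bits.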