Web News Extraction via Tag Path Feature Fusion Using DS Theory (cited by: 4)
Authors: Gong-Qing Wu, Lei Li, Xindong Wu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, Issue 4, pp. 661-672 (12 pages).
Contents, layout styles, and parse structures of web news pages differ greatly from one page to another. In addition, the layout style and the parse structure of a web news page may change from time to time. For these reasons, designing features that extract content well from massive and heterogeneous web news pages is a challenging issue. Our extensive case studies indicate a potential relationship between web content layouts and their tag paths. Inspired by this observation, we design a series of tag path extraction features to extract web news. Because each feature has its own strength, we fuse all of these features with the DS (Dempster-Shafer) evidence theory and then design a content extraction method, CEDS. Experimental results on both the CleanEval datasets and web news pages selected randomly from well-known websites show that the F1-score of CEDS is 8.08% and 3.08% higher than that of the existing popular content extraction methods CETR and CEPR-TPR, respectively.
Keywords: content extraction; web news; tag path extraction feature; Dempster-Shafer (DS) theory
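
The abstract says the tag path features are fused with Dempster-Shafer evidence theory but does not give the details. As a rough illustration of the general idea only, the sketch below applies Dempster's rule of combination to two mass functions. The names `combine`, `CONTENT`, `NOISE`, `feature_a`, and `feature_b`, and all mass values, are hypothetical and chosen for illustration; they are not taken from the paper, and this is not the CEDS method itself.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: the product goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint hypotheses: the product counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


# Hypothetical frame of discernment for one text node of a page.
CONTENT = frozenset({"content"})
NOISE = frozenset({"noise"})
EITHER = CONTENT | NOISE  # mass left unassigned (ignorance)

# Hypothetical evidence from two extraction features (illustrative values).
feature_a = {CONTENT: 0.6, NOISE: 0.1, EITHER: 0.3}
feature_b = {CONTENT: 0.5, NOISE: 0.2, EITHER: 0.3}

fused = combine(feature_a, feature_b)
for hypothesis, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis), round(mass, 4))
```

In this toy example, two features that each lean moderately toward "content" produce a fused belief that leans more strongly toward it than either input does, which is the kind of complementary effect that motivates fusing several features rather than relying on one.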