Abstract: Purpose: Communicating scientific results to the public is essential to inspire future researchers and ensure that discoveries are exploited. News stories about research are a key communication pathway for this and have been manually monitored to assess the extent of press coverage of scholarship. Design/methodology/approach: To make larger scale studies practical, this paper introduces an automatic method to extract citations from newspaper stories to large sets of academic journals. Curated ProQuest queries were used to search for citations to 9,639 Science and 3,412 Social Science Web of Science (WoS) journals from eight UK daily newspapers during 2006–2015. False matches were automatically filtered out by a new program, with 94% of the remaining stories meaningfully citing research. Findings: Most Science (95%) and Social Science (94%) journals were never cited by these newspapers. Half of the cited Science journals covered medical or health-related topics, whereas 43% of the Social Science journals were related to psychiatry or psychology. Of the citing news stories, 60% described research extensively and 53% used multiple sources, but few commented on research quality. Research limitations: The method has only been tested in English and on the ProQuest Newspapers database. Practical implications: Others can use the new method to systematically harvest press coverage of research. Originality/value: An automatic method was introduced and tested to extract citations from newspaper stories to large sets of academic journals.
Funding: This work was supported by the National Basic Research 973 Program of China under Grant No. 2013CB329604, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education of China under Grant No. IRT13059, and the National Natural Science Foundation of China under Grant Nos. 61273297, 61229301 and 61503114.
Abstract: The contents, layout styles, and parse structures of web news pages differ greatly from one page to another. In addition, the layout style and parse structure of a web news page may change over time. For these reasons, designing features that extract content well from massive and heterogeneous web news pages is a challenging issue. Our extensive case studies indicate a potential relationship between web content layouts and their tag paths. Inspired by this observation, we design a series of tag path extraction features to extract web news. Because each feature has its own strengths, we fuse all of these features with the DS (Dempster-Shafer) evidence theory and then design a content extraction method, CEDS. Experimental results on both CleanEval datasets and web news pages selected randomly from well-known websites show that the F1-score of CEDS is 8.08% and 3.08% higher than those of the existing popular content extraction methods CETR and CEPR-TPR, respectively.
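The fusion step mentioned in this abstract relies on Dempster's rule of combination. The following is only a minimal sketch of that general rule, not the CEDS implementation: the two sources `feature_a` and `feature_b`, the hypothesis labels `content` and `noise`, and all mass values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses within one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the common hypothesis set.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint hypothesis sets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical frame of discernment for one text block on a page.
CONTENT = frozenset({"content"})
NOISE = frozenset({"noise"})
EITHER = CONTENT | NOISE  # mass left uncommitted by a source

# Hypothetical masses from two tag-path features (illustrative values only).
feature_a = {CONTENT: 0.6, NOISE: 0.1, EITHER: 0.3}
feature_b = {CONTENT: 0.5, NOISE: 0.2, EITHER: 0.3}

fused = combine(feature_a, feature_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this toy example both sources lean towards `content`, so the fused mass on `content` ends up higher than either source assigned on its own, which is the intuition behind combining features that each have their own strengths.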
Abstract: News and media websites have evolved over time, with increasing complexity in their design, content, and monetization strategy. In this study, we examined and reported the trends of rich media (images and video), social sharing (newer and older social media), and ad placements (display ads and native ads) for eight high-ranked news and media websites in four categories: online television news (CNN, FoxNews), online newspapers (LA Times, NY Times), online magazines (Wired, Forbes), and technology blogs (TechCrunch, TheNextWeb) over the course of 10 years.