Funding: This work was supported by the National Basic Research 973 Program of China under Grant No. 2013CB329604, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education of China under Grant No. IRT13059, and the National Natural Science Foundation of China under Grant Nos. 61273297, 61229301, and 61503114.
Abstract: The contents, layout styles, and parse structures of web news pages differ greatly from one page to another. In addition, the layout style and parse structure of a given web news page may change over time. For these reasons, designing features that deliver excellent extraction performance on massive and heterogeneous web news pages is a challenging problem. Our extensive case studies indicate a potential correlation between web content layouts and their tag paths. Inspired by this observation, we design a series of tag path extraction features for extracting web news. Because each feature has its own strengths, we fuse all of these features using DS (Dempster-Shafer) evidence theory and, on that basis, design a content extraction method called CEDS. Experimental results on both the CleanEval datasets and web news pages selected randomly from well-known websites show that the F1-score of CEDS is 8.08% and 3.08% higher than those of the existing popular content extraction methods CETR and CEPR-TPR, respectively.
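The fusion step mentioned in the abstract can be illustrated with Dempster's rule of combination, the standard way of merging two bodies of evidence in DS theory. The following is only a minimal sketch: the two-hypothesis frame (`content` versus `noise`), the function name `combine`, and the mass values assigned to `feature_a` and `feature_b` are all hypothetical and are not taken from the CEDS paper, whose actual features and mass assignments are not given here.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its belief mass.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            # Disjoint hypotheses contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in fused.items()}


# Hypothetical frame of discernment for one text block.
CONTENT = frozenset({"content"})
NOISE = frozenset({"noise"})
EITHER = CONTENT | NOISE  # mass left on the whole frame = uncertainty

# Illustrative (made-up) masses from two extraction features.
feature_a = {CONTENT: 0.6, NOISE: 0.1, EITHER: 0.3}
feature_b = {CONTENT: 0.5, NOISE: 0.2, EITHER: 0.3}

fused = combine(feature_a, feature_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these illustrative inputs, the fused mass on `content` is larger than the mass either feature assigned alone, which is the intuition behind fusing features that each have their own strengths.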
Funding: This project was funded by the National Natural Science Foundation of China (61872139, 41871320), the Provincial and Municipal Joint Fund of the Hunan Provincial Natural Science Foundation of China (2018JJ4052), the Hunan Provincial Natural Science Foundation of China (2017JJ2081), the Key Project of the Hunan Provincial Education Department (17A070, 19A172), and the Project of the Hunan Provincial Education Department (17C0646).
Abstract: Road network extraction is vital to both vehicle navigation and road planning. Existing approaches focus on mining urban trunk roads from the GPS trajectories of floating cars. However, path extraction, which plays an important role in earthquake relief and village tours, is often overlooked. To address this issue, we propose a novel approach for extracting a campus road network from walking GPS trajectories. It consists of data preprocessing and road centerline generation. Patrol GPS trajectories collected at Hunan University of Science and Technology were used as the experimental data. The experimental evaluation shows that our approach effectively and accurately extracts both the trunk roads and the paths of the campus. The coverage rate is 96.21%, while the error rate is 3.26%.