Information Classification and Extraction on Official Web Pages of Organizations

Abstract: As a real-time and authoritative source, the official Web pages of organizations contain a large amount of information. Because Web content and formats are diverse, pre-processing is essential to obtain unified, attributed data that is valuable for organizational analysis and mining. Existing research handles multiple Web scenarios inadequately and falls short on accuracy. This paper proposes a method to transform organizational official Web pages into data with attributes. After the active blocks in the Web pages are located, structural and content features are used to classify the information with a specific model. Extraction methods based on a trigger lexicon and LSTM (Long Short-Term Memory) are then proposed, which efficiently process the classified information and extract data that matches the attributes. Together, these steps form an accurate and efficient method to classify and extract information from organizational official Web pages. Experimental results on a real data set from organizational official Web pages show that the approach improves the performance indicators and exceeds the state of the art.
Source: Computers, Materials & Continua (SCIE, EI; English-language journal), 2020, No. 9, pp. 2057-2073, 17 pages in total.
Funding: This work was supported by the National Key Research and Development Program of China (Nos. 2016QY03D0501, 2017YFB0803300), the National Natural Science Foundation of China (Nos. 61601146, 61732022), and the Sichuan Science and Technology Program (No. 2019YFSY0049).
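
The abstract mentions extraction based on a trigger lexicon but gives no detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a trigger lexicon maps each attribute to words that signal it, and that the value is whatever follows the trigger on the same line. The lexicon entries, the attribute names, and the functions `build_pattern` and `extract_attributes` are all hypothetical, and the LSTM-based extractor is not covered.

```python
import re

# Hypothetical trigger lexicon: attribute name -> trigger words that signal it.
# These entries are illustrative assumptions and are not taken from the paper.
TRIGGER_LEXICON = {
    "phone": ["tel", "phone", "telephone"],
    "email": ["email", "e-mail"],
    "address": ["address", "addr"],
}


def build_pattern(triggers):
    """Compile a regex matching any trigger word followed by a value."""
    # Try longer triggers first so "telephone" is preferred over "tel".
    ordered = sorted(triggers, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    # Trigger word, optional ASCII or fullwidth colon, then the rest of the line.
    return re.compile(rf"\b(?:{alternatives})\b\s*[::]?\s*(.+)", re.IGNORECASE)


PATTERNS = {attr: build_pattern(trigs) for attr, trigs in TRIGGER_LEXICON.items()}


def extract_attributes(block_text):
    """Return {attribute: value}, keeping the first match found per attribute."""
    record = {}
    for line in block_text.splitlines():
        for attr, pattern in PATTERNS.items():
            if attr in record:
                continue  # keep only the first value seen for each attribute
            match = pattern.search(line)
            if match:
                record[attr] = match.group(1).strip()
    return record


if __name__ == "__main__":
    sample = "Tel: +86 28 1234 5678\nE-mail: info@example.org\nAddress: 1 Example Road"
    print(extract_attributes(sample))
```

A pure lexicon match like this is cheap and transparent but brittle when a page omits the trigger word, which is presumably where a learned sequence model such as the LSTM extractor named in the abstract would complement it.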