Content extraction of HTML pages is the basis of the web page clustering and information retrieval,so it is necessary to eliminate cluttered information and very important to extract content of pages accurately.A nove...Content extraction of HTML pages is the basis of the web page clustering and information retrieval,so it is necessary to eliminate cluttered information and very important to extract content of pages accurately.A novel and accurate solution for extracting content of HTML pages was proposed.First of all,the HTML page is parsed into DOM object and the IDs of all leaf nodes are generated.Secondly,the score of each leaf node is calculated and the score is adjusted according to the relationship with neighbors.Finally,the information blocks are found according to the definition,and a universal classification algorithm is used to identify the content blocks.The experimental results show that the algorithm can extract content effectively and accurately,and the recall rate and precision are 96.5% and 93.8%,respectively.展开更多
Memory analysis gains a weight in the area of computer live forensics.How to get network connection information is one of the challenges in memory analysis and plays an important role in identifying sources of malicio...Memory analysis gains a weight in the area of computer live forensics.How to get network connection information is one of the challenges in memory analysis and plays an important role in identifying sources of malicious cyber attack. It is more difficult to fred the drivers and get network connections information from a 64-bit windows 7 memory image file than from a 32-bit operating system memory image f'de. In this paper, an approach to fred drivers and get network connection information from 64-bit windows 7 memory images is given. The method is verified on 64-bit windows 7 version 6.1.7600 and proved reliable and efficient.展开更多
基金Project(2012BAH18B05) supported by the Supporting Program of Ministry of Science and Technology of China
文摘Content extraction of HTML pages is the basis of the web page clustering and information retrieval,so it is necessary to eliminate cluttered information and very important to extract content of pages accurately.A novel and accurate solution for extracting content of HTML pages was proposed.First of all,the HTML page is parsed into DOM object and the IDs of all leaf nodes are generated.Secondly,the score of each leaf node is calculated and the score is adjusted according to the relationship with neighbors.Finally,the information blocks are found according to the definition,and a universal classification algorithm is used to identify the content blocks.The experimental results show that the algorithm can extract content effectively and accurately,and the recall rate and precision are 96.5% and 93.8%,respectively.
基金This work is supported by the National Natural Science Foundation of China(61070163) and Shandong Natural Science Foundation (Y2008G35).
文摘Memory analysis gains a weight in the area of computer live forensics.How to get network connection information is one of the challenges in memory analysis and plays an important role in identifying sources of malicious cyber attack. It is more difficult to fred the drivers and get network connections information from a 64-bit windows 7 memory image file than from a 32-bit operating system memory image f'de. In this paper, an approach to fred drivers and get network connection information from 64-bit windows 7 memory images is given. The method is verified on 64-bit windows 7 version 6.1.7600 and proved reliable and efficient.