Funding: Project (2012BAH18B05) supported by the Supporting Program of the Ministry of Science and Technology of China
Abstract: Content extraction from HTML pages is the basis of web page clustering and information retrieval, so it is necessary to eliminate cluttered information and to extract page content accurately. A novel and accurate method for extracting the content of HTML pages is proposed. First, the HTML page is parsed into a DOM object and IDs are generated for all leaf nodes. Second, a score is calculated for each leaf node and then adjusted according to the node's relationships with its neighbors. Finally, information blocks are located according to their definition, and a universal classification algorithm is used to identify the content blocks. Experimental results show that the algorithm extracts content effectively and accurately, with a recall of 96.5% and a precision of 93.8%.
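The abstract names the pipeline stages but does not give the ID scheme, the scoring function, the neighbor adjustment, the block definition or the classifier. The Python sketch below fills each of those with a hypothetical stand-in so the three steps can be followed end to end. None of these choices come from the paper:

- Leaf IDs are dotted child-index paths.
- A leaf's score is its text length, negated for link text.
- The neighbor adjustment blends each score with the previous and next leaf.
- A block is a run of consecutive leaves under the same container element.
- A fixed score threshold replaces the classification algorithm.

It also uses the standard-library `html.parser` walk instead of building a full DOM object.

```python
from html.parser import HTMLParser

# Assumed: text inside these elements is never content.
SKIP_TAGS = {"head", "script", "style", "noscript"}
# Void elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LeafCollector(HTMLParser):
    """Records every non-empty text leaf with a hypothetical path ID.

    A leaf ID is the dotted chain of child indices from the root,
    e.g. "0.1.2.0"; this stands in for the paper's leaf-ID scheme.
    """

    def __init__(self):
        super().__init__()
        self.stack = []          # (tag, child_index) of each open element
        self.child_counts = [0]  # children seen so far at each depth
        self.leaves = []         # dicts with id, parent, text, in_link

    def handle_starttag(self, tag, attrs):
        idx = self.child_counts[-1]
        self.child_counts[-1] += 1
        if tag not in VOID_TAGS:
            self.stack.append((tag, idx))
            self.child_counts.append(0)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag (tolerates unclosed <p>).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.child_counts[i + 1:]
                return

    def handle_data(self, data):
        text = data.strip()
        tags = [t for t, _ in self.stack]
        if not text or SKIP_TAGS.intersection(tags):
            return
        idx = self.child_counts[-1]
        self.child_counts[-1] += 1
        parent = ".".join(str(i) for _, i in self.stack)
        self.leaves.append({
            "id": f"{parent}.{idx}" if parent else str(idx),
            "parent": parent,
            "text": text,
            "in_link": "a" in tags,
        })


def score_leaves(leaves):
    """Hypothetical leaf score: text length, negated for link text."""
    return [-len(leaf["text"]) if leaf["in_link"] else len(leaf["text"])
            for leaf in leaves]


def smooth(scores, weight=0.25):
    """Assumed neighbor rule: blend each score with the previous and next leaf."""
    out = []
    for i, s in enumerate(scores):
        prev = scores[i - 1] if i > 0 else 0
        nxt = scores[i + 1] if i + 1 < len(scores) else 0
        out.append((1 - 2 * weight) * s + weight * (prev + nxt))
    return out


def extract_content(html, min_block_score=40.0):
    parser = LeafCollector()
    parser.feed(html)
    parser.close()
    leaves = parser.leaves
    scores = smooth(score_leaves(leaves))

    # Hypothetical block: consecutive leaves whose parents share one container.
    blocks = []
    for leaf, s in zip(leaves, scores):
        container = leaf["parent"].rsplit(".", 1)[0]
        if blocks and blocks[-1]["container"] == container:
            blocks[-1]["texts"].append(leaf["text"])
            blocks[-1]["score"] += s
        else:
            blocks.append({"container": container, "texts": [leaf["text"]], "score": s})

    # Stand-in for the paper's classifier: keep blocks above a fixed threshold.
    return [" ".join(b["texts"]) for b in blocks if b["score"] >= min_block_score]
```

With these assumptions, a navigation bar made of short link texts gets negative leaf scores and falls below the threshold. Long paragraph text pulls its neighbors' adjusted scores upward, so an article body tends to survive as one block. The paper applies a classification algorithm at the last step. Replacing the hypothetical `min_block_score` threshold with a classifier trained on block features is how this sketch could move closer to the described method.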
Abstract: With the rapid development of modern high technology, human society is advancing toward the information age. The advent of Internet technology has given people a new communication experience and has fundamentally changed the way they communicate with one another. Today's information and communication have broken with traditional patterns: the storage, use and exchange of information are all inseparable from the network as a platform. Network development affects every aspect of human behavior, production and daily life. Within network technology, web interface design is a very important part. Web interface design is inseparable from the visual arts, both in their mutual integration and in the overall design, which gives the web interface the aesthetic appeal of the visual arts and a humanistic connotation. As network technology develops, people also place higher demands on web design: a web interface should not only present excellent content but also offer the most intuitive form of visual presentation. Starting from the design and production of a web page, this paper analyzes its visual art elements in order to achieve a reasonable and attractive web interface.