Content extraction from HTML pages is the basis of web page clustering and information retrieval, so it is necessary to eliminate cluttered information and important to extract page content accurately. A novel and accurate solution for extracting the content of HTML pages is proposed. First, the HTML page is parsed into a DOM object and IDs are generated for all leaf nodes. Second, a score is calculated for each leaf node and adjusted according to the node's relationship with its neighbors. Finally, information blocks are identified according to their definition, and a universal classification algorithm is used to identify the content blocks. Experimental results show that the algorithm extracts content effectively and accurately, with a recall of 96.5% and a precision of 93.8%.
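The first two steps of the abstract above (leaf-node IDs, then scores adjusted by neighbors) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the path-style ID, the text-length score, the `link_penalty` and `neighbor_weight` parameters, and the class and function names are hypothetical and are not the authors' method.

```python
from html.parser import HTMLParser


class LeafCollector(HTMLParser):
    """Collect text leaves with a path-style ID (hypothetical ID scheme)."""

    SKIP = {"script", "style"}                       # non-content containers
    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tag names
        self.leaves = []   # (leaf_id, text, inside_link)

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in self.SKIP for t in self.stack):
            return
        leaf_id = "/".join(self.stack) + f"#{len(self.leaves)}"
        self.leaves.append((leaf_id, text, "a" in self.stack))


def score_leaves(leaves, link_penalty=0.2, neighbor_weight=0.25):
    """Assumed scoring: text length, discounted for link text, then
    blended with the raw scores of the previous and next leaf."""
    raw = [len(text) * (link_penalty if is_link else 1.0)
           for _, text, is_link in leaves]
    adjusted = []
    for i, s in enumerate(raw):
        left = raw[i - 1] if i > 0 else s
        right = raw[i + 1] if i + 1 < len(raw) else s
        adjusted.append((1 - 2 * neighbor_weight) * s
                        + neighbor_weight * (left + right))
    return adjusted


def extract_leaf_scores(html):
    """Parse HTML and return (leaf_id, adjusted_score) pairs."""
    collector = LeafCollector()
    collector.feed(html)
    scores = score_leaves(collector.leaves)
    return [(leaf[0], s) for leaf, s in zip(collector.leaves, scores)]
```

Under these assumptions, short navigation links receive low scores while long paragraphs that sit next to other long paragraphs receive high ones; a later classification step, which the abstract mentions but does not specify, would operate on blocks built from such scores.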
To address the loss of ornamental and research value of ancient mural paintings caused by low resolution and blurred texture details, a super-resolution (SR) method based on a generative adversarial network (GAN) is proposed, which reconstructs the detailed texture of mural images more faithfully. First, because shallow image features are insufficiently utilized, information distillation blocks (IDB) are introduced to extract shallow image features and enhance the output of the subsequent network layers. Second, residual dense blocks with residual scaling and feature fusion (RRDB-Fs) are used to extract deep image features; these blocks remove the batch normalization (BN) layer of the residual block, which degrades the quality of the generated image, and thereby also improve the training speed of the network. Furthermore, local feature fusion and global feature fusion are applied in the generator network so that features of different levels are merged adaptively and the reconstructed image contains rich details. Finally, when calculating the perceptual loss, features before activation are used, which enhances the brightness consistency between the reconstructed mural and the original mural while avoiding artificial interference. Experimental results show that the peak signal-to-noise ratio and structural similarity improve over other algorithms by 0.512 dB to 3.016 dB and by 0.009 to 0.089, respectively, and that the proposed method produces better visual effects.
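For readers unfamiliar with the first reported metric, the peak signal-to-noise ratio is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and reconstructed images. The sketch below implements that standard definition for flat sequences of pixel values. The peak value of 255 (8-bit images) and the function name are assumptions; the abstract does not state the authors' evaluation settings.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values. max_value=255 assumes 8-bit images."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2
              for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 3 dB (close to the upper end of the reported improvement) corresponds to roughly halving the mean squared error.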
Foundation item: Project (2012BAH18B05) supported by the Supporting Program of the Ministry of Science and Technology of China