Funding: Supported by Grant No. SFRH/BPD/40135/2008, funded by FCT (POPH-QREN-Typology 4.1, FCI and MEC)
Abstract: The pinewood nematode (PWN), Bursaphelenchus xylophilus, has become one of the most severe threats to pine forests worldwide. Nematodes, migrating through resin canals and feeding on the living cells, induce rapid metabolic changes in ray parenchyma cells, create cavitation areas, decrease xylem water content and oleoresin exudation, and cause necrosis of parenchyma and cambial cells. This study focused on the impact of PWN infection on technological parameters of wood and evaluated the impact of the anatomical and biochemical effects of tree defense reactions on the basic density, extractive content and moisture sorption properties of Pinus pinaster wood. Samples of infected and uninfected wood were studied. The presence of nematodes reduced wood basic density by 2% and decreased the total content of extractives in infected wood compared with uninfected wood (5.98% and 8.90% of dry wood mass, respectively). Extractives in infected trees had an inverse distribution along the trunk compared with uninfected trees. The adsorption isotherms for infected and uninfected wood had similar positioning. We recorded differences (some statistically significant) in the equilibrium moisture content of infected and uninfected wood under varying environmental conditions. Despite the verified differences in wood basic density, extractive content and moisture sorption properties, the overall conclusion is that the PWN had only a slight impact on these characteristics of wood.
Funding: Supported by the National Natural Science Foundation of China (51174091, 61364013, 61164013) and the Earlier Research Project of the State Key Development Program for Basic Research of China (2014CB360502)
Abstract: For measuring component content in the extraction and separation process of praseodymium/neodymium (Pr/Nd), a soft measurement method based on modeling of ion color features is proposed, which is suitable for fast estimation of component content in the production field. Feature analysis is conducted on images of the solution captured from the Pr/Nd extraction/separation field. The H and S components of the HSI color space are selected as model inputs to establish a least squares support vector machine (LSSVM) model for Nd (Pr) content, and the model parameters are determined with a genetic algorithm (GA). To improve the adaptability of the model, an adaptive iteration algorithm is used to correct the parameters of the LSSVM model on the basis of a model correction strategy and new sample data. Using field data collected from rare earth extraction production, prediction results for component content and comparisons are given. The results indicate that the proposed method has good adaptability and high prediction precision, so it is applicable to the fast detection of element content in rare earth extraction.
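As an illustration of the H/S model inputs mentioned in the abstract above, here is a minimal sketch of the textbook RGB-to-HSI conversion for one pixel. The function name `rgb_to_hs` is hypothetical, and the abstract does not say how the authors aggregate pixel values into features, so this shows only the standard color-space formula, not their pipeline.

```python
import math

def rgb_to_hs(r, g, b):
    """Return (H in degrees, S in [0, 1]) of the HSI model for r, g, b in [0, 1]."""
    intensity = (r + g + b) / 3.0
    if intensity == 0.0:
        return 0.0, 0.0  # black pixel: hue and saturation set to 0 by convention
    saturation = 1.0 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    # max() guards against a tiny negative value caused by rounding.
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0.0:
        return 0.0, saturation  # grey pixel: hue undefined, 0 by convention
    # Clamp the ratio so rounding cannot push it outside acos's domain.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = 360.0 - theta if b > g else theta
    return hue, saturation

# Illustrative pixel values only (not data from the paper).
hue, sat = rgb_to_hs(0.0, 1.0, 0.0)
```

In a soft-measurement setting, values like these would be computed over the solution image and passed to the regression model as inputs.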
Funding: Project (2012BAH18B05) supported by the Supporting Program of Ministry of Science and Technology of China
Abstract: Content extraction from HTML pages is the basis of web page clustering and information retrieval, so it is necessary to eliminate cluttered information and important to extract page content accurately. A novel and accurate solution for extracting the content of HTML pages is proposed. First, the HTML page is parsed into a DOM object and the IDs of all leaf nodes are generated. Second, the score of each leaf node is calculated and adjusted according to its relationship with its neighbors. Finally, the information blocks are found according to the definition, and a universal classification algorithm is used to identify the content blocks. The experimental results show that the algorithm can extract content effectively and accurately, with a recall rate and precision of 96.5% and 93.8%, respectively.
Abstract: Variations in wood specific gravity and extractive content from pith to bark and from the base to the top of the tree were investigated in a 14-year-old commercial pulpwood species, Sterculia setigera Del., growing in the savanna zone of Nigeria. Tree mean specific gravity averaged 0.37; wood at the base had significantly higher specific gravity than wood at the top, and specific gravity increased from pith to bark. The mean extractive content was 1.20% for wood and 1.72% for bark; it varied significantly between trees, from the base of the tree to the top, and from pith to bark. Extractive content at the butt and at breast height is more than double the value at the top of the tree. The high extractive content at the base parallels the high specific gravity observed for wood samples from the base. Extractive content of the bast was significantly higher than that of the wood. The low specific gravity indicates possible suitability of the species for paper making in Nigerian paper mills. The wood of Sterculia setigera showed significant between-tree and within-tree variation in the two properties considered. Although the wood is light with low extractive content, it is a potential raw material for large-scale pulpwood production in Nigeria.
Abstract: This paper develops and tests an optimized content extraction algorithm that uses NLP methods, TF-IDF for term weighting, the vector space model (VSM) for information search, and the cosine method for similarity calculation, applied to learning documents in a distance learning system database. The tests covered the following: 1) parsing word structure in the distance learning system database documents and in Cyrillic Mongolian language documents, and forming new documents with an algorithm for identifying word stems; 2) testing optimized content extraction from text material based on e-test results (key word, correct answer, base form with affix, and new form formed by the word stem without affix) in the distance learning system, and searching for key words selected automatically by a word extraction algorithm; 3) testing Boolean and probabilistic retrieval methods through an extended vector space retrieval method. This chapter covers processing the document content extraction and retrieval algorithm and proposing query recommendations through word stems, independent of word position, based on the distinctive features of Cyrillic Mongolian language documents.
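The combination of TF-IDF weighting and cosine similarity named in the abstract above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' algorithm: the function names, the particular TF-IDF variant (relative term frequency times the log of inverse document frequency), and the toy token lists are all hypothetical, and stemming is assumed to have been done beforehand.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy stemmed documents (illustrative only).
docs = [
    ["content", "extraction", "algorithm"],
    ["word", "stem", "extraction"],
    ["vector", "space", "retrieval"],
]
vecs = tf_idf_vectors(docs)
similarity = cosine(vecs[0], vecs[1])
```

Documents that share no terms get a similarity of zero, which is why stemming matters for an agglutinative language: inflected forms of the same word only match after affixes are removed.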
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFC0830300), the Science and Technology Program of Fujian, China (Grant No. 2018H0035), the Science and Technology Program of Xiamen, China (3502Z20183011), and the Fund of XMU-ZhangShu FinTech Joint Lab
Abstract: The main content of a news web page is a source of data for Natural Language Processing (NLP) and Information Retrieval (IR) and contains large quantities of valuable information. This paper proposes a method that formulates the main content extraction problem as a DOM tree node classification problem. For feature extraction, we use DOM tree nodes to represent the HTML document and then develop multiple features from the DOM tree node properties, such as text length, tag path, tag properties and so on. Because the essence of the problem is the classification model, we use XGBoost to help select nodes. Experimental results show that the proposed approach is effective and efficient in extracting the main content of news web pages: it achieves about 98% accuracy over 1083 news pages from 10 different news sites, and the average processing time per page is within 10 ms.
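Two of the node features named in the abstract above, text length and tag path, can be collected with the Python standard library as sketched below. The class name `NodeFeatureParser` and the sample markup are hypothetical, and the classifier itself is not shown; this only illustrates what per-node features might look like, not the paper's feature set.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not extend the path.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class NodeFeatureParser(HTMLParser):
    """Record the tag path and text length of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.path = []       # stack of currently open tags
        self.features = []   # one dict per text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed inner tags.
        if tag in self.path:
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.features.append({
                "tag_path": "/".join(self.path),
                "text_length": len(text),
            })

# Hypothetical sample page.
sample = '<html><body><div><p>Hello world</p><a href="#">More</a></div></body></html>'
parser = NodeFeatureParser()
parser.feed(sample)
parser.close()
features = parser.features
```

Each feature dict could then be encoded numerically and passed to a classifier that labels the node as main content or not.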
Funding: Supported by the National Natural Science Foundation of China (NSFC U1462123), the PetroChina Innovation Foundation, and the Fundamental Research Funds for the Central Universities of China (22201313007)
Abstract: When evaluating ionic liquids (ILs) for extractive desulfurization (EDS) of fuel oils, the inevitable presence of water in the system may have a significant and in many cases strongly negative effect. However, few studies have considered this particular issue, and a promoting effect of water on EDS is scarcely reported. In this work, COSMO-RS was first employed to calculate the capacity and selectivity for EDS of various IL/H2O mixtures, which cover different IL characters and a wide water concentration range. Experiments were then conducted with a representative IL, [C4MIM][H2PO4], whose stable and even promoted extraction performance with a small amount of water was suggested by COSMO-RS. Through analyses of the desulfurization ratio, the cross-solubility and the water content in the desulfurized fuel, the promoting effect of water within a certain range (< 10 wt%) was experimentally demonstrated. Moreover, this effect of water was explained by combining the viscosity, the solvent-solute interactions and the COSMO-RS based analysis.
Funding: Supported by the Program for Beijing Municipal Commission of Education (No. 1320037010601), the 111 Project of Beijing Institute of Technology, and the National Key Basic Research and Development (973) Program of China (No. 2012CB7207002)
Abstract: Density-based approaches in content extraction, whose task is to extract contents from Web pages, are commonly used to obtain page contents that are critical to many Web mining applications. However, traditional density-based approaches cannot effectively manage pages that contain short contents and long noises. To overcome this problem, in this paper, we propose a content extraction approach for obtaining content from news pages that combines a segmentation-like approach and a density-based approach. A tool called BlockExtractor was developed based on this approach. BlockExtractor identifies contents in three steps. First, it looks for all Block-Level Elements (BLE) & Inline Elements (IE) blocks, which are designed to roughly segment pages into blocks. Second, it computes the densities of each BLE&IE block and its elements to eliminate noises. Third, it removes all redundant BLE&IE blocks that have emerged in other pages from the same site. Compared with three other density-based approaches, our approach shows significant advantages in both precision and recall.
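The density idea in the abstract above can be illustrated with a simple text density per top-level block: visible characters divided by the number of tags in the block. This is a generic sketch under assumptions, not BlockExtractor's actual definition of a BLE&IE block or its density formula; the tag sets, class name and sample page are hypothetical.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "section", "article", "ul", "ol", "table"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class BlockDensityParser(HTMLParser):
    """Accumulate text and tag counts for each top-level block element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the current top-level block
        self.blocks = []    # one dict per top-level block

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if tag in BLOCK_TAGS:
                # The block's own tag counts as its first tag.
                self.blocks.append({"text": "", "tags": 1})
                self.depth = 1
            return
        self.blocks[-1]["tags"] += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1]["text"] += data

def block_densities(html):
    """Return (chars, tags, chars / tags) for each top-level block."""
    parser = BlockDensityParser()
    parser.feed(html)
    parser.close()
    result = []
    for block in parser.blocks:
        chars = len(" ".join(block["text"].split()))  # collapse whitespace
        result.append((chars, block["tags"], chars / block["tags"]))
    return result

# Hypothetical page: a navigation bar followed by an article paragraph.
page = (
    '<div><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About</a></div>'
    "<div><p>The council approved the new budget on Tuesday after a long debate.</p></div>"
)
densities = block_densities(page)
```

The link-heavy navigation block ends up with a much lower density than the article block. A plain density threshold works in that case, but it struggles with short articles next to long noise, which is the gap the abstract describes.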
Funding: Supported by the National Basic Research 973 Program of China under Grant No. 2013CB329604, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of Ministry of Education of China under Grant No. IRT13059, and the National Natural Science Foundation of China under Grant Nos. 61273297, 61229301 and 61503114.
Abstract: Contents, layout styles, and parse structures of web news pages differ greatly from one page to another. In addition, the layout style and the parse structure of a web news page may change from time to time. For these reasons, designing features with excellent extraction performance for massive and heterogeneous web news pages is a challenging issue. Our extensive case studies indicate that there is a potential relevance between web content layouts and their tag paths. Inspired by this observation, we design a series of tag path extraction features to extract web news. Because each feature has its own strength, we fuse all those features with the DS (Dempster-Shafer) evidence theory, and then design a content extraction method, CEDS. Experimental results on both CleanEval datasets and web news pages selected randomly from well-known websites show that the F1-score with CEDS is 8.08% and 3.08% higher than those of the existing popular content extraction methods CETR and CEPR-TPR, respectively.
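The fusion step mentioned in the abstract above relies on Dempster's rule of combination, which can be sketched for two evidence sources as follows. The mass values and the two-hypothesis frame (content versus noise) are made-up illustrations; the abstract does not state how CEDS derives masses from its tag path features.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

CONTENT = frozenset({"content"})
NOISE = frozenset({"noise"})
EITHER = CONTENT | NOISE  # mass left uncommitted between the two hypotheses

# Hypothetical masses assigned by two features to the same page node.
feature_a = {CONTENT: 0.6, NOISE: 0.1, EITHER: 0.3}
feature_b = {CONTENT: 0.5, NOISE: 0.2, EITHER: 0.3}
fused = combine(feature_a, feature_b)
```

With these illustrative masses, the fused belief in "content" is higher than either feature's individual belief, because the two sources largely agree and the conflicting mass is renormalized away.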