China’s Conversion of Cropland to Forest Program (CCFP) is one of the largest national ecological construction programs; it has effectively improved the ecological environment and produced large ecological benefits. To provide references for further improving the ecological benefits of the CCFP, we analyzed the features, differences and relationships of the categorized forest ecological “benefit values” (B-Vs) of 3 forest restoration ways in different regions of the CCFP, using data from the Chinese Forest Ecosystem Research Network (CFERN) from 1999 to 2013 and the methods of the national standards LY/T 1606-2003, LY/T 1721-2008 and LY/T 1952-2011. The results showed that the annual B-Vs per unit area ranged from 3.5×10⁴ to 10.0×10⁴ RMB/(hm²·a). Water conservation B-Vs and species conservation B-Vs were the 2 largest constituents of the total B-Vs, and nutrient accumulation B-V was the smallest. The B-Vs varied inconsistently among the forest restoration ways and regions. The ranking of average annual total B-Vs per unit area, from high to low, was “hillside forest conservation”, “returning cropland to forest”, and “afforestation on suitable barren hills and wasteland”. Species conservation B-Vs and water conservation B-Vs in southern regions were higher than those in northern and northwestern regions of China, and hot, rainy regions could produce higher species conservation B-Vs. Regression analysis indicated that water conservation B-Vs were significantly positively correlated with the corresponding total B-Vs, and positively correlated with the corresponding atmosphere purification B-Vs, at both the regional and the unit-area scale. Unit-area species conservation B-V was negatively correlated with the corresponding nutrient accumulation B-Vs, except for the “afforestation on suitable barren hills and wasteland” way. Regional total species conservation B-Vs were significantly negatively correlated with the corresponding nutrient accumulation B-Vs, except for the “hillside forest conservation” way. We suggest that suitable forest restoration ways must be selected according to regional specifics, B-V features and local ecological goals.
Don't be surprised if someone tells you that you can win a world championship by playing online games. Competitive online gaming, or eSports, debuted as a demonstration sport at the 18th Asian Games held in Jakarta.
Objective: To explore the compatibility law governing the cold and hot natures of the complex prescriptions Mahuang Decoction (麻黄汤, MHD) and Maxing Shigan Decoction (麻杏石甘汤, MXSGD), two formulas of the same category but with different hot/cold natures. Methods: Oxygen consumption of mice was determined in three groups: MHD, MXSGD and control. A cold-hot pad differentiating assay was used to observe differences in temperature tropism among groups of mice treated with MHD, MXSGD and their compositions. Meanwhile, total anti-oxidant capability (T-AOC) activity was measured. Results: After administration of MHD, the mice showed increased oxygen consumption (P<0.01). Compared with the MHD group, the remaining rate of MXSGD mice on the hot pad in the cold-hot pad differentiating assay was significantly increased (P<0.05). There was no significant difference (P>0.05) among the remaining rates of the MXSGD group, the MXSGD with high-dose Gypsum Fibrosum (MXHGF) group and the MXSGD with low-dose Gypsum Fibrosum (MXLGF) group. Compared with the MHD group, T-AOC activity in the Consensus Compositions group was significantly decreased (P=0.0494). Compared with the MXSGD group, T-AOC activity in the Gypsum Fibrosum (GF) group was significantly increased (P=0.0013). Conclusions: The difference in cold and hot nature between MHD (hot) and MXSGD (cold) could be represented objectively. It may be attributable to Gypsum Fibrosum, which decreased the efficacy of the consensus compositions. However, increasing or decreasing the dose of Gypsum Fibrosum does not change the cold and hot nature of MXSGD.
The school placement processes of students from immigrant backgrounds who are considered to be “in difficulty” are an international concern at the intersection of research on special education and research on the school experiences of students from immigrant backgrounds or racialized groups. The research problem of this article concerns the identification of these students as disabled or as having adjustment or learning difficulties. From a perspective anchored in Disability Critical Race Studies, this ethnographic study documents the different interpretations that school actors made of the perceived difficulties of seven primary school students from immigrant backgrounds. Five interpretation types are presented: (1) medicalization by dismissal of cultural markers, (2) medicalization by professional constraint, (3) medicalization by cultural deficit, (4) precautionary wait, and (5) cultural differentialism. Our results help to shed light on the overrepresentation of these students in special education and to understand how ableism and (neo)racism contribute to it.
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform gives young people improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas to better track sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process using text-mining techniques. First, the study investigates the current text message categorization process and identifies the list of categories adopted by counselors over time, which is then used to build and train a categorization model. Second, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed model. Finally, it compares the performance and effectiveness of the proof-of-concept tool against the manual system. The study used a dataset of 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text-message-based platforms.
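The size of the speed-up can be made concrete from the figures quoted above. This is a minimal sketch that assumes the 2.82 years are 365-day calendar years; the abstract does not say whether working time or calendar time was measured, so the per-message manual figure is an upper bound on effort, not a measured handling time.

```python
# Figures quoted in the U-Report abstract.
N_MESSAGES = 206_625
MANUAL_YEARS = 2.82
SVM_MINUTES = 6.4

# Assumption: one "year" = 365 calendar days.
MINUTES_PER_YEAR = 365 * 24 * 60

manual_minutes = MANUAL_YEARS * MINUTES_PER_YEAR
speedup = manual_minutes / SVM_MINUTES                    # how many times faster the SVM is
manual_min_per_msg = manual_minutes / N_MESSAGES          # calendar minutes per message
svm_ms_per_msg = SVM_MINUTES * 60 * 1000 / N_MESSAGES     # milliseconds per message

print(f"manual total: {manual_minutes:,.0f} min")
print(f"speed-up: about {speedup:,.0f}x")
print(f"manual: {manual_min_per_msg:.2f} min/message")
print(f"SVM: {svm_ms_per_msg:.2f} ms/message")
```

Under this assumption the SVM is roughly five orders of magnitude faster, which is the scale behind the "significantly faster" claim; the 70.4% accuracy figure is the trade-off against that speed.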
Mandarin 在 (pinyin: zài) is the most frequently used character for representing spatial and temporal relationships. Current studies mostly focus on its lexical meaning and syntactic structure, while the cognitive features of its grammatical categories have been neglected. This paper investigates the categorization of zài through a morphosyntactic test among college English majors in China. The results show that: prototypes organize the grammatical categories of zài at all levels in terms of intra-categorial gradience; the semantic construal of the zài construction can significantly influence the accuracy of the grammatical categorization of zài; the syntactic structure can provide a viable cue for identifying the grammatical categories of zài; and spatiality, temporality and the status of existing are three essential semantic features encoded by zài, whose co-occurrence leads to varying degrees of inter-categorial vagueness. This indicates a conflict between rigid grammatical classification and the indeterminate nature of the grammatical functions of zài, and suggests the need to reconsider the efficacy of indiscriminately applying Anglo-Saxon grammar to the study of Chinese spatial-temporal constructions.
In cognitive linguistics, the status and functions of categorization have been hotly debated. In semantics and second language acquisition, scholars have studied vocabulary acquisition from different perspectives and at different academic levels. Vocabulary learning plays a fundamental role in second language vocabulary acquisition (SLVA) and is closely related to learners’ cognitive competence. However, studies of second language vocabulary acquisition under the categorization theory of cognitive linguistics have received less attention from linguists than other studies. This paper uses two representative dimensions of categorization theory, the basic-level effect and the prototype effect, to examine their implications for second language vocabulary acquisition. The article first provides a comprehensive introduction to the nature and approaches of categorization theory, and then analyzes the relations and implications for second language vocabulary acquisition from the perspective of the basic-level and prototype effects. The results showed that the basic-level effect on SLVA mainly concerns the classification of word categories as distinct from superordinate and subordinate categories, while the prototype effect mainly concerns understanding the complexity and use of word meaning.
Recent developments in cognitive theories may provide a better interpretation, rather than a mere description, for translation studies. The paper tries to apply categorization and metaphor to the process of translating and to translators’ psychology so as to produce a more powerful interpretation.
This paper proposes a new feature selection approach for text categorization based on an independence measure between features. A fundamental hypothesis widely used in probabilistic models for text categorization (TC), namely that the occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, and a feature selection algorithm based on it is given to obtain a feature subset. The selected subset is highly relevant to the categories and strongly independent between features, so it satisfies the basic hypothesis to the maximum degree. Compared with other traditional feature selection methods in TC, which take only relevance into account, the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.
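The abstract does not give the paper's own independence measure. As a stand-in, the sketch below swaps in a well-known mRMR-style criterion to illustrate the same idea of preferring terms that are relevant to the category yet independent of each other: relevance is the mutual information between a term's presence and the class, and dependence is the mean mutual information with terms already selected. The function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def select_features(docs, labels, k):
    """Greedily pick k terms that are relevant to the labels but weakly dependent on each other.

    docs: one collection of terms per document; labels: one class label per document.
    score(t) = MI(t, class) - mean MI(t, s) over already selected terms s (mRMR-style).
    """
    vocab = sorted(set().union(*docs))
    presence = {t: [int(t in d) for d in docs] for t in vocab}   # binary term-presence features
    relevance = {t: mutual_information(presence[t], labels) for t in vocab}

    selected, remaining = [], list(vocab)
    while remaining and len(selected) < k:
        def score(t):
            if not selected:
                return relevance[t]
            dependence = sum(
                mutual_information(presence[t], presence[s]) for s in selected
            ) / len(selected)
            return relevance[t] - dependence

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# "goal" and "team" always co-occur, so keeping both adds no independent information.
docs = [{"goal", "team"}, {"goal", "team"}, {"vote"}, {"vote"}, {"code"}, {"code"}]
labels = ["sport", "sport", "politics", "politics", "tech", "tech"]
print(select_features(docs, labels, 3))
```

A purely relevance-based ranking would score "goal" and "team" equally and could keep both; the dependence penalty pushes the second of the pair below the other classes' terms.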
This paper summarizes several automatic text categorization algorithms in common use in recent years, and analyzes and compares their advantages and disadvantages. It provides clues for choosing appropriate automatic classification algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out. Key words: text categorization; naive Bayes; KNN; SVM; neural network. CLC number: TP391. Foundation item: Supported by the National Natural Science Foundation of China (70031010) and the Research Foundation of Beijing Institute of Technology. Biography: SHI Yong-feng (1980-), male, Master candidate, research direction: web information mining.
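Naive Bayes is the simplest of the algorithms listed in the key words. A minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing is shown here for orientation; it is a textbook formulation, not necessarily the variant evaluated in the paper, and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over token lists with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)            # documents per class (for priors)
        self.word_counts = defaultdict(Counter)        # class -> token frequencies
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        """log P(class) + sum of log P(token | class), up to a shared constant."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


train = [["ball", "goal", "team"], ["goal", "match"], ["vote", "party"], ["party", "election", "vote"]]
labels = ["sport", "sport", "politics", "politics"]
nb = MultinomialNB().fit(train, labels)
print(nb.predict(["goal", "team"]))  # -> sport
```

Working in log space avoids numerical underflow when documents are long, which is why the scores are sums of logarithms rather than products of probabilities.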
BACKGROUND: According to the latest American Joint Committee on Cancer and Union for International Cancer Control manuals, cystic duct cancer (CC) is categorized as a type of gallbladder cancer (GC), which has the worst prognosis among all types of biliary cancers. We hypothesized that this categorization could be verified using taxonomic methods. AIM: To investigate the categorization of CC based on population-level data. METHODS: Cases of biliary cancers were identified from the Surveillance, Epidemiology, and End Results 18 registries database. Together with routinely used statistical methods, three taxonomic methods, namely Fisher’s discriminant, binary logistic regression and artificial neural network (ANN) models, were used to resolve the categorization problem of CC. RESULTS: The T staging system of perihilar cholangiocarcinoma [a type of extrahepatic cholangiocarcinoma (EC)] discriminated CC prognosis better than that of GC. After adjustment for other covariates, the hazard ratio of CC tended to be closer to that of EC, although this did not reach statistical significance. Three taxonomic models were built to differentiate EC from GC, and all showed good accuracy; the ANN model had an area under the receiver operating characteristic curve of 0.902. Using the three models, the majority (75.0%-77.8%) of CC cases were categorized as EC. CONCLUSION: Our study suggests that CC should be categorized as a type of EC, not GC. An aggressive surgical approach might be considered in CC cases to see whether long-term prognosis could be greatly improved, as it is in EC.
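The reported area under the ROC curve (0.902 for the ANN model) has a direct probabilistic reading: it is the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A minimal sketch of that Mann-Whitney formulation follows; treating EC as the positive class is an assumption for illustration, and the scores are made up, not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative).

    labels: 1 for the positive class (assumed here to be EC), 0 for the negative class (GC).
    Ties count as half a win, as in the Mann-Whitney U statistic.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs: 5 of the 6 positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]))
```

The pairwise loop is O(P·N) and is only meant to make the definition explicit; rank-based formulas give the same value faster on large registries.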
Given the importance of analyzing public opinion on the Internet, the authors propose a high-performance method for extracting public opinion. In this method, a space model for classification is adopted to describe the relationship between words and categories. A combined feature selection method is used to effectively remove noisy words from the original feature space. The category weight of each word is then calculated with an improved formula that combines word frequency and word distribution. Finally, the class weights of uncategorized documents are obtained from the category weights of their words to realize opinion extraction. Experimental results show that the method achieves comparatively high classification performance and good stability.
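The abstract does not state the improved weighting formula, so the following is only a hypothetical illustration of how a word's category weight might combine frequency and distribution, and how a document's class weights can then be summed from its words. Here weight(w, c) is assumed to be the share of w's total frequency that falls in class c multiplied by the fraction of class-c documents containing w; the function names and data are invented.

```python
from collections import Counter


def category_word_weights(docs, labels):
    """Hypothetical word-category weight combining frequency and distribution.

    weight(w, c) = (share of w's total frequency that falls in class c)
                 * (fraction of class-c documents that contain w)
    """
    term_freq, doc_freq = {}, {}
    n_docs = Counter(labels)
    total_tf = Counter()
    for tokens, c in zip(docs, labels):
        term_freq.setdefault(c, Counter()).update(tokens)       # frequency component
        doc_freq.setdefault(c, Counter()).update(set(tokens))   # distribution component
        total_tf.update(tokens)
    return {
        c: {
            w: (tf / total_tf[w]) * (doc_freq[c][w] / n_docs[c])
            for w, tf in term_freq[c].items()
        }
        for c in n_docs
    }


def classify(tokens, weights):
    """Class weight of an uncategorized document = sum of its words' category weights."""
    scores = {c: sum(wc.get(w, 0.0) for w in tokens) for c, wc in weights.items()}
    return max(scores, key=scores.get), scores


docs = [["price", "rise", "inflation"], ["price", "inflation"], ["team", "win"], ["win", "final"]]
labels = ["economy", "economy", "sports", "sports"]
weights = category_word_weights(docs, labels)
print(classify(["price", "inflation", "today"], weights))
```

The distribution factor rewards words spread evenly across a category's documents over words that are frequent in only one document, which is one common motivation for mixing the two signals.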
With the rapid development of information technology, many network applications with unknown communication protocols have appeared. Although many protocol reverse engineering methods based on protocol keyword recognition have been proposed, most keyword recognition algorithms are time-consuming. This paper first uses the traffic clustering method F-DBSCAN to cluster unknown protocol traffic. An improved CFSM (Closed Frequent Sequence Mining) algorithm is then used to mine closed frequent sequences from the messages and identify protocol keywords. Finally, the CFGM (Closed Frequent Group Mining) algorithm is proposed to explore the parallel, sequential and hierarchical relations between the protocol keywords and obtain accurate protocol message formats. Experimental results show that the proposed protocol format extraction method has lower time complexity than the Apriori algorithm and the sequence alignment algorithm, and achieves high keyword recognition accuracy. In addition, based on the relations between the keywords, the method can obtain accurate protocol formats. Compared with the protocol formats obtained by existing methods, ours better capture the overall structure of the target protocols and perform better in protocol reverse engineering applications such as fuzz testing.
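To show what "closed frequent sequences" means for keyword candidates, here is a brute-force sketch that mines contiguous substrings occurring in at least `min_support` messages and keeps only the closed ones (those with no longer frequent super-substring of equal support). It is not the paper's improved CFSM algorithm, it omits the F-DBSCAN clustering step, and it assumes text messages for readability; the function names and sample traffic are hypothetical.

```python
def frequent_substrings(messages, min_support, min_len=2, max_len=8):
    """Map each contiguous substring to the number of messages containing it (if >= min_support)."""
    support = {}
    for msg in messages:
        seen = set()  # count each substring at most once per message
        for i in range(len(msg)):
            for j in range(i + min_len, min(i + max_len, len(msg)) + 1):
                seen.add(msg[i:j])
        for s in seen:
            support[s] = support.get(s, 0) + 1
    return {s: n for s, n in support.items() if n >= min_support}


def closed_only(frequent):
    """Keep substrings that no longer frequent substring contains with the same support."""
    return {
        s: n
        for s, n in frequent.items()
        if not any(len(t) > len(s) and s in t and m == n for t, m in frequent.items())
    }


msgs = ["GET /a HTTP/1.1", "GET /b HTTP/1.1", "POST /c HTTP/1.1"]
keywords = closed_only(frequent_substrings(msgs, min_support=2))
for s, n in sorted(keywords.items(), key=lambda kv: (-kv[1], kv[0])):
    print(repr(s), n)
```

Closedness is what removes redundant fragments: "HTTP" never appears without "HTTP/1.1", so only the longer string survives as a candidate keyword.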
This paper provides a brief introduction to the methods for generating fuzzy categorical maps from remotely sensed images (in graphical and digital forms). This is followed by a description of the slicing process for deriving fuzzy boundaries from fuzzy categorical maps, which can be based on the maximum fuzzy membership values, the confusion index, or a measure of entropy. Results from an empirical test performed in an Edinburgh suburb show that fuzzy boundaries of land cover can be derived from aerial photographs and satellite images using the three criteria with small differences, and that slicing based on the maximum fuzzy membership values is the easiest and most straightforward solution. This, in turn, implies the suitability of maintaining both a crisp classification and its underlying certainty map for deriving fuzzy boundaries at different thresholds, which is a flexible and compact way of managing categorical map data and their uncertainty.
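The three slicing criteria can each be computed from a pixel's vector of class memberships. The sketch below assumes common textbook definitions that the abstract does not spell out: confusion index CI = 1 - (mu_1 - mu_2) using the two largest memberships, and Shannon entropy of the memberships rescaled to sum to 1 and normalized by log(K). The thresholds and sample pixels are hypothetical.

```python
import math


def confusion_index(mu):
    """1 - (largest - second largest membership); values near 1 mean two classes compete."""
    first, second = sorted(mu, reverse=True)[:2]  # needs at least two classes
    return 1.0 - (first - second)


def normalized_entropy(mu):
    """Entropy of the memberships rescaled to sum to 1, divided by log(K) so it lies in [0, 1]."""
    total = sum(mu)
    probs = [m / total for m in mu if m > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(mu))


def is_boundary(mu, criterion, threshold):
    """Slice one pixel into the fuzzy boundary zone using one of the three criteria."""
    if criterion == "max":
        return max(mu) < threshold               # weak best membership
    if criterion == "confusion":
        return confusion_index(mu) > threshold   # top two classes are close
    if criterion == "entropy":
        return normalized_entropy(mu) > threshold  # memberships spread out
    raise ValueError(f"unknown criterion: {criterion}")


interior = [0.90, 0.05, 0.05]   # clearly one land-cover class
edge = [0.45, 0.40, 0.15]       # transition between two classes
for crit in ("max", "confusion", "entropy"):
    print(crit, is_boundary(interior, crit, 0.5), is_boundary(edge, crit, 0.5))
```

Varying the threshold widens or narrows the boundary zone, which is why keeping the underlying membership (certainty) map alongside the crisp classification allows boundaries to be re-derived at different thresholds.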
A central claim of Cognitive Grammar is that basic grammatical categories, such as noun and verb, have conceptual characterizations valid for all members. The standard argument against this position is invalid. The proposed characterizations rely only on basic cognitive abilities that are clearly evident on non-linguistic grounds. Responses are made to arguments against the proposal as well as to putative alternatives. It is shown in specific detail how the characterization offered for nouns applies to a broad range of cases.
Fires play a significant role in ecological and environmental losses in Mediterranean forests. In addition to ecological impacts, fire may cause economic, social and cultural changes. Detecting fire scars is critically important for reducing losses. In the present study, forest fires recorded in Antalya, one of the most important ecological and tourist regions of the Western Mediterranean, were clustered and mapped. Since the dominant factors and devastation records derived from the cases were nominal-scale variables, a nonparametric clustering algorithm for categorical data was used in this evaluation. The proposed tool, the k-modes algorithm, uses modes instead of means for clustering. The algorithm can be implemented quickly and makes no distributional assumptions about the data. It uses a frequency-based method to update the modes of the fire clusters. The modes derived from the maps may provide useful information for management by local authorities. In conclusion, the proposed nonparametric clustering procedure may be used to build a decision-support system to monitor and identify fire activity and to improve the efficiency of fire management.
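The k-modes steps described above (simple-matching dissimilarity, assignment to the nearest mode, frequency-based mode update) can be sketched in a few lines. This is a minimal version of the standard algorithm with random initial modes; the attribute names and fire records are hypothetical, not Antalya data.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Frequency-based update: the most common category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    modes = rng.sample(records, k)  # initial modes drawn from the data
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_assignment == assignment:  # converged: no record changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return assignment, modes


# Hypothetical records: (cause, season, size class)
fires = [
    ("lightning", "summer", "large"),
    ("lightning", "summer", "large"),
    ("lightning", "summer", "medium"),
    ("human", "winter", "small"),
    ("human", "winter", "small"),
    ("human", "spring", "small"),
]
assignment, modes = k_modes(fires, k=2)
print(assignment)
print(modes)
```

Because the cluster centres are actual category combinations rather than averages, each mode reads directly as a typical fire profile, which is what makes the output usable as a map legend for local authorities.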
There is a lack of information about the factors responsible for the effectiveness of environmental policies in Brazilian agriculture. This study aimed to identify the perceptions and practices of agrarian professionals. Data were collected through a survey and analyzed with methodological approaches focusing on environmental complexity and the categorization of environmental actions. Quantitative analysis was based on descriptive statistics. Atmospheric problems were perceived as the main problems for the current and next two generations, while hydrological problems were indicated as those most urgently requiring solutions. On the other hand, the main actions already developed and those planned were classified within the responsibility category. Because of these reductionist perceptions, introducing the concept of a socio-ecological system through methodological interventions during the training of agrarian professionals is recommended; in addition, to stimulate actions in the competence and citizenship category, a methodological intervention focusing on resilience thinking is proposed. Typical actions of individuals with either reductionist or complex conceptions of the environment can be captured, and educational strategies can therefore be designed based on the resulting profiles.
To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, FS generally causes information loss, which has significant side effects on the overall performance of TC algorithms. Based on the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, LFS can greatly reduce the dimensionality of the features without any information loss, which can greatly improve both the efficiency and the performance of algorithms. Experiments show that the new algorithm achieves both higher performance and higher efficiency than several other classical TC algorithms.
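One plausible reading of "lazy" feature selection, assumed here because the abstract gives no details, is to defer the choice of features until a document arrives for classification and then use only the terms that document actually contains. Because text vectors are sparse, that subset is tiny, yet no term relevant to the query is discarded. The sketch applies this idea to a k-nearest-neighbour classifier with cosine similarity; the function name and data are hypothetical.

```python
import math
from collections import Counter


def lazy_knn(train_docs, train_labels, test_tokens, k=3):
    """kNN whose feature space is chosen lazily: only the test document's own terms."""
    query = Counter(test_tokens)   # sparse vector: only non-zero terms are stored
    features = set(query)          # the lazily selected feature subset
    q_norm = math.sqrt(sum(v * v for v in query.values()))
    sims = []
    for tokens, label in zip(train_docs, train_labels):
        doc = Counter(t for t in tokens if t in features)  # project onto the subset
        d_norm = math.sqrt(sum(v * v for v in doc.values()))
        if d_norm == 0:
            continue  # no overlap with the query's features
        dot = sum(query[t] * doc[t] for t in doc)
        sims.append((dot / (q_norm * d_norm), label))
    sims.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0] if votes else None


train = [["goal", "team", "win"], ["goal", "match", "team"], ["vote", "party"], ["party", "election"]]
labels = ["sport", "sport", "politics", "politics"]
print(lazy_knn(train, labels, ["goal", "team", "today"], k=2))
```

The per-query subset replaces a global, fixed vocabulary cut, so a rare but decisive term in the test document is never thrown away in advance.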
Accurate estimation of dew point temperature (Tdew) plays a very important role in water resource management, agricultural engineering, climatology and energy utilization. However, there are few studies on the applicability of local Tdew algorithms at regional scales. This study evaluated the performance of a new machine learning algorithm, gradient boosting on decision trees with categorical features support (CatBoost), for estimating daily Tdew using limited local and cross-station meteorological data. The random forests (RF) algorithm was also assessed for comparison. Daily meteorological data from 2016 to 2019, including maximum, minimum and average temperature (Tmax, Tmin and Tmean), maximum, minimum and average relative humidity (RHmax, RHmin and RHmean), and maximum, minimum and average global solar radiation (Rsmax, Rsmin and Rsmean) from three weather stations in Hunan, China, were used to evaluate the CatBoost and RF algorithms. The results showed that both algorithms achieved satisfactory estimation accuracy at the target stations (on average RMSE=1.020℃, R²=0.969, MAE=0.718℃ and NRMSE=0.087) in the absence of complete meteorological parameters (with only temperature data as input). The CatBoost algorithm (on average RMSE=1.900℃ and R²=0.835) performed better than the RF algorithm (on average RMSE=2.214℃ and R²=0.828). The accuracy and stability of the CatBoost and RF algorithms were positively correlated with the number of input parameters, and the three-parameter algorithms achieved higher estimation accuracy than the two-parameter algorithms. The developed methodology is helpful for predicting Tdew at the regional scale.
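The four accuracy metrics quoted for the Tdew models can be computed from paired observed and estimated values as sketched below. Two definitions are assumptions, since the abstract does not state them: R² is taken as the coefficient of determination 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation), and NRMSE is RMSE divided by the mean observation. Dew point in ℃ can average close to zero, where that normalization becomes unstable. The sample values are invented.

```python
import math


def regression_metrics(observed, predicted):
    """RMSE, MAE, R² (1 - SS_res/SS_tot) and NRMSE (RMSE / mean of observations)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "RMSE": rmse,
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
        "NRMSE": rmse / mean_obs,  # assumption: normalized by the observed mean
    }


# Hypothetical daily Tdew values in ℃
print(regression_metrics([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 16.0]))
```

RMSE and MAE share the units of Tdew (℃), whereas R² and NRMSE are dimensionless, which is why the abstract reports only the first two with a unit.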
Funding (CCFP forest ecological benefit study): Hebei Provincial Science & Technology Supporting Program (No. 15227652D). Guided by Observation Methodology for Long-term Forest Ecosystem Research of National Standards of the People’s Republic of China (GB/T 33027-2016).
Funding (Mahuang Decoction and Maxing Shigan Decoction study): Supported by the Major State Basic Research Development Program of China (973 Program, No. 2007CB512607), the National Science Fund for Distinguished Young Scholars (No. 30625042), and the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2009ZX10005-017).
文摘Objective:To explore the complex prescription compatibility law of the cold and hot nature of Mahuang Decoction(麻黄汤,MHD) and Maxing Shigan Decoction(麻杏石甘汤,MXSGD),both categorized formulas but with different hot/cold natures.Methods:Oxygen consumption of mice was determined among three groups:MHD,MXSGD and the control;a cold-hot pad differentiating assay was used to observe the variability of temperature tropism among the groups of mice which was treated with MHD,MXSGD,and their compositions. Meanwhile,the total anti-oxidant capability(T-AOC) activity were detected.Results:After administration of MHD, the mice showed increased oxygen consumption(P0.01).Compared with MHD group,the remaining rate of MXSGD mice on the hot pad was found to be significantly increased with the cold-hot pad differentiating assay (P0.05).There was no significant difference(P0.05) among the remaining rates of MXSGD,MXSGD with high dose Gypsum Fibrosum(MXHGF) group,and MXSGD with low dose Gypsum Fibrosum(MXLGF) group mice.Compared with the MHD group,T-AOC activity of the mice in the Consensus Compositons group was significantly decreased(P=0.0494).Compared with the MXSGD group,T-AOC activity of Gypsum Fibrosum (GF) group was increased significantly(P=0.0013).Conclusions:The differences in cold and hot nature could be represented objectively between MHD with a hot nature and MXSGD with a cold nature.The reason may be the Gypsum Fibrosum which decreased the efficacy of the consensus compositions.However,increasing or decreasing the dose of Gypsum Fibrosum will not change the cold and hot nature of MXSGD.
文摘The school placement processes of students from immigrant backgrounds considered to be in“difficulty”is an international concern at the intersection of works relating to special education and those concerning the school experiences of students from immigrant backgrounds or racialized groups.The research problem of this article concerns the identification of these students as disabled or as having adjustment or learning difficulties.From a perspective anchored in Disability Critical Race Studies,this ethnographic study documents different interpretations of perceived difficulties made by school actors with regard to seven primary school students from immigrant backgrounds.Five interpretation types are presented:(1)medicalization by dismissal of cultural markers,(2)medicalization by professional constraint,(3)medicalization by cultural deficit,(4)precautionary wait,and(5)cultural differentialism.Our results help to shed light on the special education overrepresentation phenomenon regarding these students and to understand how ableism and(neo)racism contribute to it.
文摘To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time which are then used to build and train a categorization model. Secondly, the study presents a proof of concept tool that automates the categorization of U-report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof of concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4% demonstrating that the automated method is significantly faster, more scalable, and consistent when compared to the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-report platform and other similar text messages-based platforms.
Abstract: Mandarin 在 (pinyin: zài) is the most frequently used character for expressing spatial and temporal relationships. Current studies mostly focus on its lexical meaning and syntactic structure, while the cognitive features of its grammatical categories have been neglected. This paper investigates the categorization of zài by conducting a morphosyntactic test among college English majors in China. The results show that: prototypes organize the grammatical categories of zài at all levels in terms of intra-categorial gradience; the semantic construal of zài constructions can significantly influence the accuracy of the grammatical categorization of zài; the syntactic structure can provide a viable cue for identifying the grammatical categories of zài; and spatiality, temporality and the status of existing are three essential semantic features encoded by zài, whose co-occurrence leads to varying degrees of inter-categorial vagueness. This indicates a conflict between rigid grammatical classification and the indeterminate nature of the grammatical functions of zài, and suggests the need to reconsider the efficacy of indiscriminately applying Anglo-Saxon grammar to the study of Chinese spatial-temporal constructions.
Funding: “Research on the Development Path of Ideological Leadership of Ideological and Political Education in Colleges and Universities in the New Era”, Counselor Special Research Projects of Furong College, Hunan University of Science and Arts, 2023 (Project number: FRfdy2307).
Abstract: In cognitive linguistics, the status and functions of categorization have been hotly debated. In semantics and second language acquisition, scholars have studied vocabulary acquisition from different perspectives and at different academic levels. Vocabulary learning plays a fundamental role in second language vocabulary acquisition (SLVA) and is closely related to learners’ cognitive competence. However, compared with other topics, SLVA studied under the categorization theory of cognitive linguistics has received little attention from linguists. This paper uses two representative dimensions of categorization theory, the basic-level effect and the prototype effect, to explore their implications for SLVA. It first introduces the nature of, and the approaches to, categorization theory, and then analyzes the relations between categorization theory and SLVA and its implications from the perspective of the basic-level and prototype effects. The results show that the basic-level effect on SLVA lies mainly in classifying word categories as distinct from superordinate and subordinate categories, whereas the prototype effect lies more in understanding the complexity and use of word meanings.
Abstract: Recent developments in cognitive theory may offer translation studies an explanatory account rather than a merely descriptive one. This paper applies categorization and metaphor to the translation process and to translators’ psychology in order to produce a more powerful interpretation.
Funding: Supported by the National Natural Science Foundation of China (60373066, 60503020), the Outstanding Young Scientist’s Fund (60425206), and the Doctor Foundation of Nanjing University of Posts and Telecommunications (2003-02).
Abstract: This paper proposes a new feature selection approach for text categorization (TC) based on an independence measure between features. A fundamental hypothesis widely used in probabilistic TC models, namely that the occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, and a feature selection algorithm based on it is given to obtain a feature subset. The selected subset is highly relevant to the categories and strongly independent between features, and thus satisfies the basic hypothesis to the greatest possible degree. Compared with traditional feature selection methods in TC, which take only relevance into account, the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.
Abstract: This paper summarizes several automatic text categorization algorithms in common recent use and analyzes and compares their advantages and disadvantages, providing guidance for choosing appropriate automatic classification algorithms in different fields. Finally, evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out.
Key words: text categorization; naive Bayes; KNN; SVM; neural network
CLC number: TP 391
Foundation item: Supported by the National Natural Science Foundation of China (70031010) and the Research Foundation of Beijing Institute of Technology
Biography: SHI Yong-feng (1980-), male, Master candidate, research direction: web information mining.
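To make one of the surveyed algorithms concrete, here is a minimal sketch of multinomial naive Bayes with add-one (Laplace) smoothing in plain Python. The function names `train_nb` and `predict_nb` and the toy corpus are hypothetical illustrations, not code or data from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Estimate log class priors and per-class term counts (multinomial naive Bayes)."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    priors = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    return priors, term_counts, vocab


def predict_nb(tokens, priors, term_counts, vocab):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    best_class, best_score = None, -math.inf
    v = len(vocab)
    for c, log_prior in priors.items():
        total = sum(term_counts[c].values())
        score = log_prior
        for t in tokens:
            if t in vocab:  # terms never seen in training are skipped
                score += math.log((term_counts[c][t] + 1) / (total + v))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy corpus
docs = [["goal", "match", "team"], ["team", "win", "score"],
        ["stock", "market", "price"], ["price", "bank", "market"]]
labels = ["sport", "sport", "finance", "finance"]
model = train_nb(docs, labels)
print(predict_nb(["market", "price", "team"], *model))  # finance
```

Working in log space avoids numerical underflow when many small term probabilities are multiplied; skipping unseen terms is one of several reasonable conventions.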
Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China, No. LQ17H030003.
Abstract: BACKGROUND According to the latest American Joint Committee on Cancer and Union for International Cancer Control manuals, cystic duct cancer (CC) is categorized as a type of gallbladder cancer (GC), which has the worst prognosis among all types of biliary cancer. We hypothesized that this categorization could be verified using taxonomic methods. AIM To investigate the categorization of CC based on population-level data. METHODS Cases of biliary cancer were identified from the Surveillance, Epidemiology, and End Results (SEER) 18 registries database. Together with routinely used statistical methods, three taxonomic methods, namely Fisher’s discriminant, binary logistic regression and artificial neural network (ANN) models, were used to resolve the categorization problem of CC. RESULTS The T staging system of perihilar cholangiocarcinoma [a type of extrahepatic cholangiocarcinoma (EC)] discriminated CC prognosis better than that of GC. After adjusting for other covariates, the hazard ratio of CC tended to be closer to that of EC, although this did not reach statistical significance. To differentiate EC from GC, three taxonomic models were built, and all showed good accuracy. The ANN model had an area under the receiver operating characteristic curve of 0.902. Using the three models, the majority (75.0%-77.8%) of CC cases were categorized as EC. CONCLUSION Our study suggests that CC should be categorized as a type of EC, not GC. A more aggressive surgical approach might be considered in CC cases to see whether long-term prognosis could be substantially improved, as in EC.
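The reported area under the ROC curve (AUC) can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it that way; the scores and labels are invented toy values, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as P(score_pos > score_neg) over all positive-negative pairs; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs: label 1 = EC, label 0 = GC
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is O(P x N) and is meant only to show the definition; a rank-based computation gives the same value more efficiently on large datasets.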
Funding: Supported by the National High Technology Research and Development Program of China (2005AA147030).
Abstract: Given the importance of analyzing public opinion on the Internet, the authors propose a high-performance method for extracting public opinion. In this method, a space model for classification is adopted to describe the relationship between words and categories. A combined feature selection method is used to effectively remove noisy words from the original feature space. The category weight of each word is then calculated with an improved formula that combines word frequency and word distribution. Finally, the class weights of uncategorized documents are obtained from the category weights of their words to realize opinion extraction. Experimental results show that the method achieves comparatively high classification performance and good stability.
Funding: Supported by the National Key R&D Subsidized Project (2017YFB0802900).
Abstract: With the rapid development of information technology, many network applications have appeared whose communication protocols are unknown. Although many protocol reverse engineering methods based on protocol keyword recognition have been proposed, most keyword recognition algorithms are time-consuming. This paper first uses the traffic clustering method F-DBSCAN to cluster unknown protocol traffic. An improved CFSM (Closed Frequent Sequence Mining) algorithm is then used to mine closed frequent sequences from the messages and identify protocol keywords. Finally, a CFGM (Closed Frequent Group Mining) algorithm is proposed to explore the parallel, sequential and hierarchical relations between the protocol keywords and obtain accurate protocol message formats. Experimental results show that the proposed protocol format extraction method outperforms the Apriori algorithm and the sequence alignment algorithm in time complexity and achieves high keyword recognition accuracy. In addition, based on the relations between the keywords, the method can obtain accurate protocol formats. Compared with the protocol formats obtained by existing methods, ours better capture the overall structure of the target protocols and perform better in protocol reverse engineering applications such as fuzz testing.
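The notion of a closed frequent pattern used by CFSM can be illustrated with a simplified sketch. Assuming, for illustration only, that candidate keywords are contiguous substrings of message payloads (the paper's CFSM mines more general sequences, and the F-DBSCAN clustering step is omitted), a substring is frequent if it occurs in at least `min_support` messages, and closed if no longer frequent substring containing it has the same support. The function name and the toy HTTP-like messages are hypothetical.

```python
def closed_frequent_substrings(messages, min_support, max_len=8):
    """Hypothetical simplification of closed frequent sequence mining:
    contiguous substrings (up to max_len) found in >= min_support messages
    that are not contained in a longer frequent substring of equal support."""
    support = {}
    for msg in messages:
        seen = set()  # count each substring at most once per message
        for i in range(len(msg)):
            for j in range(i + 1, min(i + max_len, len(msg)) + 1):
                seen.add(msg[i:j])
        for s in seen:
            support[s] = support.get(s, 0) + 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    closed = [
        s for s, c in frequent.items()
        if not any(len(t) > len(s) and s in t and frequent[t] == c for t in frequent)
    ]
    # Most frequent first, then lexicographically
    return sorted(closed, key=lambda s: (-frequent[s], s))


msgs = ["GET /index", "GET /about", "POST /form"]
print(closed_frequent_substrings(msgs, min_support=2))  # ['T /', 'GET /', 'o']
```

The spurious `'o'` in the output illustrates why closed patterns alone are not yet keywords; a practical method would likely need further filtering, for example by field position or minimum length.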
Abstract: This paper briefly introduces methods for generating fuzzy categorical maps from remotely sensed images (in graphical and digital forms). It then describes the slicing process for deriving fuzzy boundaries from fuzzy categorical maps, which can be based on the maximum fuzzy membership values, the confusion index, or a measure of entropy. Results from an empirical test performed in an Edinburgh suburb show that fuzzy land-cover boundaries can be derived from aerial photographs and satellite images using any of the three criteria, with small differences among them, and that slicing based on the maximum fuzzy membership values is the easiest and most straightforward solution. This, in turn, implies that maintaining both a crisp classification and its underlying certainty map, from which fuzzy boundaries can be derived at different thresholds, is a flexible and compact way of managing categorical map data and their uncertainty.
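The three slicing criteria can be sketched for a single pixel's vector of class memberships. The abstract does not give formulas, so this sketch assumes common forms: the confusion index is taken as 1 minus the difference between the largest and second-largest memberships, and entropy is the Shannon entropy of the memberships (rescaled to sum to 1), divided by the log of the number of classes so that it lies in [0, 1]. The thresholds and membership values are hypothetical.

```python
import math


def confusion_index(mu):
    """Assumed form: CI = 1 - (largest membership - second largest membership)."""
    if len(mu) < 2:
        raise ValueError("need memberships for at least two classes")
    first, second = sorted(mu, reverse=True)[:2]
    return 1.0 - (first - second)


def normalized_entropy(mu):
    """Assumed form: Shannon entropy of rescaled memberships, divided by log(n_classes)."""
    if len(mu) < 2:
        raise ValueError("need memberships for at least two classes")
    total = sum(mu)
    probs = [m / total for m in mu if m > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(mu))


def is_boundary(mu, criterion, threshold):
    """Slice one pixel: low maximum membership, high CI or high entropy marks a boundary."""
    if criterion == "max":
        return max(mu) < threshold
    if criterion == "ci":
        return confusion_index(mu) > threshold
    if criterion == "entropy":
        return normalized_entropy(mu) > threshold
    raise ValueError(f"unknown criterion: {criterion}")


# Hypothetical pixels: clearly one class, two competing classes, near-uniform
pixels = [[0.9, 0.05, 0.05], [0.45, 0.4, 0.15], [0.34, 0.33, 0.33]]
for criterion, threshold in [("max", 0.6), ("ci", 0.7), ("entropy", 0.7)]:
    print(criterion, [is_boundary(mu, criterion, threshold) for mu in pixels])
```

Changing the threshold moves the boundary zone wider or narrower, which is why keeping the membership (certainty) map alongside a crisp classification allows boundaries to be re-derived later.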
Abstract: A central claim of Cognitive Grammar is that basic grammatical categories, such as noun and verb, have conceptual characterizations valid for all members. The standard argument against this position is invalid. The proposed characterizations rely only on basic cognitive abilities that are clearly evident on non-linguistic grounds. Responses are made to arguments against the proposal as well as to putative alternatives. It is shown in specific detail how the characterization offered for nouns applies to a broad range of cases.
Abstract: Fires play a notable role in ecological and environmental losses in Mediterranean forests. In addition to ecological impacts, fire may cause economic, social and cultural changes. Detecting fire scars is critically important for reducing losses. In the present study, forest fires recorded in Antalya, one of the most important ecological and tourist regions in the Western Mediterranean, were clustered and mapped. Since the dominant factors and damage records derived from the cases were nominal-scaled, a nonparametric clustering algorithm for categorical data was used in this evaluation. The proposed tool, the k-modes algorithm, uses modes instead of means for clustering. The algorithm can be implemented quickly and makes no distributional assumptions about the available data. It uses a frequency-based method to update the cluster modes. The modes derived from the maps may provide useful information for management by local authorities. In conclusion, the proposed nonparametric clustering procedure may be used to build a decision-support system to monitor and identify fire activity and to improve the efficiency of fire management.
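A minimal k-modes sketch follows, using the simple matching dissimilarity (the number of attributes on which two records differ) and the frequency-based mode update described above. Initializing with the first k distinct records is a simplifying assumption (published variants use other initialization schemes), and the fire attributes below are invented for illustration, not taken from the Antalya records.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=100):
    """Cluster categorical tuples; returns (modes, cluster index per record)."""
    modes = []
    for r in records:  # simplifying assumption: first k distinct records as initial modes
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("fewer distinct records than clusters")
    assignments = None
    for _ in range(max_iter):
        # Assign each record to its nearest mode (ties go to the lower index)
        new_assignments = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_assignments == assignments:
            break  # converged: no record changed cluster
        assignments = new_assignments
        for j in range(k):
            members = [r for r, a in zip(records, assignments) if a == j]
            if members:
                # Frequency-based update: most common category for each attribute
                modes[j] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return modes, assignments


# Hypothetical fire records: (cause, season, vegetation)
fires = [
    ("lightning", "summer", "pine"),
    ("lightning", "summer", "pine"),
    ("lightning", "autumn", "pine"),
    ("arson", "winter", "maquis"),
    ("arson", "winter", "maquis"),
    ("negligence", "winter", "maquis"),
]
modes, assignments = k_modes(fires, k=2)
print(modes, assignments)
```

Because both the distance and the update use only category equality and counts, no numeric encoding or distributional assumption is needed, which matches the motivation given in the abstract.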
Funding: FAPEMIG and CNPq, for financial support, studentships (to M.R.F. and S.V.B.G.M.) and fellowships (to R.L.G.M. and N.V.).
Abstract: There is a lack of information about the factors responsible for the effectiveness of environmental policies in Brazilian agriculture. This study aimed to identify the perceptions and practices of agrarian professionals. Data were collected through a survey and analyzed using methodological approaches focusing on environmental complexity and the categorization of environmental actions. Quantitative analysis was based on descriptive statistics. Atmospheric problems were perceived as the main problems for the current and the next two generations, while hydrological problems were indicated as those most urgently requiring solutions. On the other hand, the main actions already developed and those planned were classified within the responsibility category. Because of these reductionist perceptions, the introduction of the concept of a socio-ecological system through methodological interventions during the training of agrarian professionals is recommended; in addition, to stimulate actions in the competence and citizenship category, a methodological intervention focusing on resilience thinking is proposed. Typical actions of individuals with either reductionist or complex conceptions of the environment can be identified, and educational strategies can therefore be designed based on the profiles obtained.
Abstract: To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, FS generally causes information loss, which can harm the overall performance of TC algorithms. Exploiting the sparsity of text vectors, this paper presents a new TC algorithm based on lazy feature selection (LFS). As a new type of embedded feature selection approach, LFS can greatly reduce the feature dimension without any information loss, which substantially improves both the efficiency and the performance of the algorithm. Experiments show that the new algorithm achieves both higher performance and higher efficiency than several other classical TC algorithms.
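One way to see how sparsity permits dimension reduction without information loss is sketched below. This is a hypothetical reading, not the paper's LFS algorithm: a kNN classifier defers "feature selection" until the test document is known and inspects only that document's non-zero terms, since every other dimension contributes zero to the dot product. Cosine similarity is therefore computed exactly, yet over far fewer dimensions than the full vocabulary. All names and data are illustrative.

```python
import math
from collections import Counter


def to_sparse(tokens):
    """Sparse term-frequency vector: only non-zero entries are stored."""
    return Counter(tokens)


def lazy_cosine(test_vec, train_vec, train_norm):
    # Only the test document's own terms are inspected: all other dimensions
    # have weight 0 in test_vec and add nothing to the dot product.
    dot = sum(w * train_vec.get(t, 0) for t, w in test_vec.items())
    if dot == 0:
        return 0.0
    test_norm = math.sqrt(sum(w * w for w in test_vec.values()))
    return dot / (test_norm * train_norm)


def knn_classify(test_tokens, train, k=3):
    """train: list of (sparse_vector, label); majority vote among the k most similar."""
    test_vec = to_sparse(test_tokens)
    scored = []
    for vec, label in train:
        norm = math.sqrt(sum(w * w for w in vec.values()))
        scored.append((lazy_cosine(test_vec, vec, norm), label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set
train = [
    (to_sparse(["goal", "team", "match"]), "sport"),
    (to_sparse(["team", "win"]), "sport"),
    (to_sparse(["market", "price", "bank"]), "finance"),
    (to_sparse(["price", "stock"]), "finance"),
]
print(knn_classify(["price", "market", "team"], train, k=3))  # finance
```

In a real system the training-vector norms would be precomputed once; they are recomputed here only to keep the sketch short.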
Funding: Shandong Provincial Natural Science Fund (ZR2020ME254 and ZR2020QD061).
Abstract: Accurate estimation of dew point temperature (Tdew) plays a very important role in water resource management, agricultural engineering, climatology and energy utilization. However, few studies have examined the applicability of local Tdew algorithms at regional scales. This study evaluated the performance of a new machine learning algorithm, gradient boosting on decision trees with categorical features support (CatBoost), for estimating daily Tdew using limited local and cross-station meteorological data. The random forests (RF) algorithm was also assessed for comparison. Daily meteorological data from 2016 to 2019, including maximum, minimum and average temperature (Tmax, Tmin and Tmean), maximum, minimum and average relative humidity (RHmax, RHmin and RHmean), and maximum, minimum and average global solar radiation (Rsmax, Rsmin and Rsmean) from three weather stations in Hunan, China, were used to evaluate the CatBoost and RF algorithms. The results showed that both algorithms achieved satisfactory estimation accuracy at the target stations (on average RMSE = 1.020 ℃, R² = 0.969, MAE = 0.718 ℃ and NRMSE = 0.087) in the absence of complete meteorological parameters (with only temperature data as input). The CatBoost algorithm (on average RMSE = 1.900 ℃ and R² = 0.835) performed better than the RF algorithm (on average RMSE = 2.214 ℃ and R² = 0.828). The accuracy and stability of the CatBoost and RF algorithms were positively correlated with the number of input parameters, and the three-parameter algorithms achieved higher estimation accuracy than the two-parameter algorithms. The developed methodology is helpful for predicting Tdew at the regional scale.
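The four accuracy statistics reported above can be computed as in the following sketch. Definitions vary between studies, so two are assumptions here: NRMSE is taken as RMSE divided by the mean observed value, and R² as the coefficient of determination, 1 - SS_res/SS_tot (some papers instead normalize by the range, or report the squared Pearson correlation). The observed and predicted values are invented.

```python
import math


def evaluate(observed, predicted):
    """RMSE, MAE, NRMSE (assumed: RMSE / mean observed) and R^2 (assumed: 1 - SS_res/SS_tot)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if mean_obs == 0 or ss_tot == 0:
        raise ValueError("NRMSE or R^2 undefined for these observations")
    return {"RMSE": rmse, "MAE": mae, "NRMSE": rmse / mean_obs, "R2": 1.0 - ss_res / ss_tot}


# Hypothetical daily Tdew values in degrees Celsius
obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 11.0, 15.0, 15.0]
print(evaluate(obs, pred))
```

Because Tdew can be close to or below 0 ℃, a mean-normalized NRMSE can become unstable for cold stations; a range-based normalization avoids this.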