This article presents an innovative approach to automatic rule discovery for data transformation tasks that leverages XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses diverse feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. First, the article describes the methodology for extracting these features from raw data and the pre-processing steps that prepare the data for the model. Subsequent sections explain feature optimization using Recursive Feature Elimination (RFE) with linear regression, which retains the most contributive features and eliminates redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets. The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and particularly the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article underscores a paradigm shift toward more scalable, efficient, and less human-dependent data transformation systems. This research opens the door to further exploration of automated rule discovery systems and their applications in various sectors.
As digital technologies have advanced, the number of paper documents converted into digital format has increased exponentially. To respond to the urgent need to categorize the growing number of digitized documents, real-time classification of digitized documents is the primary goal of our study. Document classification is the first stage in automating document control and efficient knowledge discovery with little or no human involvement. Artificial intelligence methods such as deep learning are now combined with segmentation to study and interpret document traits, which was not conceivable ten years ago. Deep learning aids in comprehending input patterns so that object classes can be predicted. The segmentation process divides the input image into separate segments for a more thorough image study. This study proposes a deep learning-enabled framework for automated document classification that can be implemented in higher education. To this end, a dataset was developed that includes seven categories: Diplomas, Personal documents, Journal of Accounting of higher education diplomas, Service letters, Orders, Production orders, and Student orders. Subsequently, a deep learning model based on Conv2D layers is proposed for the document classification process. In the final part of this research, the proposed model is evaluated and compared with other machine learning techniques. The results demonstrate that the proposed deep learning model outperforms the other machine learning models in document categorization, reaching 94.84%, 94.79%, 94.62%, 94.43%, and 94.07% in accuracy, precision, recall, F-score, and AUC-ROC, respectively. These results indicate that the proposed deep model is acceptable for practical use as an assistant to an office worker.
This report is a continuation of [2-5]. We introduce several notions, such as Skolem functions and sets of indiscernibles, saturated and atomic models, and theories stable in power, in a lattice-valued version. On the basis of [2-5], the Morley categoricity theorem for finite valued lattices is deduced.
The school placement of students from immigrant backgrounds considered to be in "difficulty" is an international concern at the intersection of work on special education and work on the school experiences of students from immigrant backgrounds or racialized groups. The research problem of this article concerns the identification of these students as disabled or as having adjustment or learning difficulties. From a perspective anchored in Disability Critical Race Studies, this ethnographic study documents different interpretations of perceived difficulties made by school actors with regard to seven primary school students from immigrant backgrounds. Five interpretation types are presented: (1) medicalization by dismissal of cultural markers, (2) medicalization by professional constraint, (3) medicalization by cultural deficit, (4) precautionary wait, and (5) cultural differentialism. Our results help to shed light on the overrepresentation of these students in special education and to understand how ableism and (neo)racism contribute to it.
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform gives young people improved access to information on various sexual reproductive health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas to better track sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process using text-mining techniques. First, the study investigates the current categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Second, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed model. Finally, it compares the performance and effectiveness of the proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model requires only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and on similar text message-based platforms.
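The timing comparison above can be turned into a rough speedup figure. The sketch below uses only the three numbers reported in the abstract (206,625 messages, 2.82 years, 6.4 minutes); it assumes, purely for illustration, that "2.82 years" means calendar years of 365 days, which the abstract does not state. If the figure refers to working hours instead, the ratio would be correspondingly smaller.

```python
# Figures reported in the abstract.
MESSAGES = 206_625
MANUAL_YEARS = 2.82
SVM_MINUTES = 6.4

# Assumption (not from the abstract): a year is 365 calendar days.
MINUTES_PER_YEAR = 365 * 24 * 60


def speedup(manual_years: float, auto_minutes: float) -> float:
    """Ratio of manual time to automated time, both expressed in minutes."""
    return manual_years * MINUTES_PER_YEAR / auto_minutes


def seconds_per_message(total_minutes: float, n_messages: int) -> float:
    """Average processing time per message, in seconds."""
    return total_minutes * 60 / n_messages


ratio = speedup(MANUAL_YEARS, SVM_MINUTES)
svm_seconds = seconds_per_message(SVM_MINUTES, MESSAGES)
manual_seconds = seconds_per_message(MANUAL_YEARS * MINUTES_PER_YEAR, MESSAGES)

print(f"speedup: about {ratio:,.0f}x")
print(f"SVM: {svm_seconds * 1000:.2f} ms per message")
print(f"manual: {manual_seconds / 60:.1f} min per message")
```

Under that assumption the automated run is faster by roughly five orders of magnitude, while the 70.4% accuracy remains the limiting factor for replacing manual review outright.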
Mandarin 在 (pinyin: zài) is the character most frequently used to represent spatial and temporal relationships. Current studies mostly focus on its lexical meaning and syntactic structure, while the cognitive features of its grammatical categories have been neglected. This paper investigates the categorization of zài through a morphosyntactic test among college English majors in China. The results show that: prototypes organize the grammatical categories of zài at all levels in terms of intra-categorial gradience; the semantic construal of a zài construction can significantly influence the accuracy of the grammatical categorization of zài; syntactic structure can provide a viable cue for identifying the grammatical categories of zài; and spatiality, temporality, and the status of existing are three essential semantic features encoded by zài, whose concurrence leads to varying degrees of inter-categorial vagueness. This vagueness indicates a conflict between rigid grammatical classification and the indeterminate nature of the grammatical functions of zài, and suggests the need to reconsider the efficacy of indiscriminately applying Anglo-Saxon grammar to the study of Chinese spatial-temporal constructions.
In cognitive linguistics, the status and functions of categorization have been hotly debated. In semantics and second language acquisition, scholars have examined vocabulary acquisition from different perspectives and at different academic levels. Vocabulary learning plays a fundamental role in second language vocabulary acquisition (SLVA), and it is closely related to learners' cognitive competence. However, studies of second language vocabulary acquisition under the categorization theory of cognitive linguistics have received less attention from linguists than other studies. This paper employs two representative dimensions of categorization theory, the basic-level effect and the prototype effect, to examine the implications for second language vocabulary acquisition. The article first provides a comprehensive introduction to the nature and approaches of categorization theory, and then analyzes its relations to and implications for second language vocabulary acquisition from the perspective of the basic-level and prototype effects. The results show that the basic-level effect on SLVA mainly concerns the classification of word categories as distinguished from superordinate and subordinate categories, while the prototype effect bears more on understanding the complexity and use of word meaning.
The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules in which many words are repeated. In comparison with classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies of words appearing in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in the text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
The recent developments of cognitive theories may provide a better interpretation for studies of translation rather than a description. The paper tries to put categorization and metaphor into the process of translating and translators' psychology so as to produce a more powerful interpretation.
Whether a collection of scientific data can be explained only by a unique theory or whether such data can be equally explained by multiple theories is one of the more contested issues in the history and philosophy of science. This paper argues that the case for multiple explanations is strengthened by the widespread failure of models in mathematical logic to be unique, i.e., categorical. Science is taken to require replicable and explicit public knowledge; this necessitates an unambiguous language for its transmission. Mathematics has been chosen as the vehicle to transmit scientific knowledge, both because of its "unreasonable effectiveness" and because of its unambiguous nature, hence the vogue of axiomatic systems. But mathematical logic tells us that axiomatic systems need not refer to uniquely defined real structures. Hence what is accepted as science may be only one of several possibilities.
This paper proposes a new approach to feature selection for text categorization (TC) based on a measure of independence between features. It first discusses a fundamental hypothesis widely used in probabilistic models for TC, namely that the occurrences of terms in documents are independent of each other. However, this basic hypothesis does not by itself guarantee the independence of a feature set. From the viewpoint of feature selection, a new measure of independence between features is designed, and a feature selection algorithm based on it is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent between features, so it satisfies the basic hypothesis to the greatest degree. Compared with traditional feature selection methods in TC, which take only relevance into account, the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.
This paper summarizes several automatic text categorization algorithms in common use recently, and analyzes and compares their advantages and disadvantages. It provides clues for choosing appropriate automatic classification algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out.
Key words: text categorization; naive Bayes; KNN; SVM; neural network
CLC number: TP 391
Foundation item: Supported by the National Natural Science Foundation of China (70031010) and the Research Foundation of Beijing Institute of Technology
Biography: SHI Yong-feng (1980-), male, Master candidate, research direction: web information mining.
BACKGROUND: According to the latest American Joint Committee on Cancer and Union for International Cancer Control manuals, cystic duct cancer (CC) is categorized as a type of gallbladder cancer (GC), which has the worst prognosis among all types of biliary cancers. We hypothesized that this categorization could be verified by using taxonomic methods. AIM: To investigate the categorization of CC based on population-level data. METHODS: Cases of biliary cancers were identified from the Surveillance, Epidemiology, and End Results 18 registries database. Together with routinely used statistical methods, three taxonomic methods, namely Fisher's discriminant, binary logistic regression, and artificial neural network (ANN) models, were used to clarify the categorization of CC. RESULTS: The T staging system of perihilar cholangiocarcinoma [a type of extrahepatic cholangiocarcinoma (EC)] discriminated CC prognosis better than that of GC. After adjusting for other covariates, the hazard ratio of CC tended to be closer to that of EC, although the difference did not reach statistical significance. To differentiate EC from GC, three taxonomic models were built, and all showed good accuracy. The ANN model had an area under the receiver operating characteristic curve of 0.902. Using the three models, the majority (75.0%-77.8%) of CC cases were categorized as EC. CONCLUSION: Our study suggests that CC should be categorized as a type of EC, not GC. An aggressive surgical approach might be considered in CC cases to see whether long-term prognosis can be substantially improved, as it has been in EC.
Given the importance of analyzing public opinion on the Internet, the authors propose a high-performance extraction method for public opinion. In this method, a space model for classification is adopted to describe the relationship between words and categories. A combined feature selection method is used to remove noisy words from the original feature space effectively. The category weight of each word is then calculated by an improved formula that combines word frequency and word distribution. Finally, the class weights of uncategorized documents are obtained from the category weights of their words to realize opinion extraction. Experimental results show that the method has comparatively high classification performance and good stability.
The scientific evidence that climate is changing due to greenhouse gas emissions is now incontestable, and this change may put many social, biological, and geophysical systems in the world at risk. In this paper, we first identified the main risks induced or aggravated by climate change. We then categorized them by applying a new risk categorization system put forward by Renn within the framework of the International Risk Governance Council. We proposed that "uncertainty" could be treated as the classification criterion. On this basis, we established a quantitative method using fuzzy set theory, in which "confidence" and "likelihood", the main quantitative terms for expressing uncertainties in IPCC, were used as the feature parameters to construct the fuzzy membership functions of four risk types. According to the maximum principle, most of the identified climate change risks were classified into the appropriate risk types. Meanwhile, given that not all the quantitative terms are available, a qualitative approach was also adopted as a complementary classification method. Finally, we obtained preliminary results of climate change risk categorization, which might lay the foundation for future integrated risk management of climate change.
As information technology develops rapidly, many network applications appear whose communication protocols are unknown. Although many protocol reverse engineering methods based on protocol keyword recognition have been proposed, most keyword recognition algorithms are time-consuming. This paper first uses the traffic clustering method F-DBSCAN to cluster the unknown protocol traffic. An improved CFSM (Closed Frequent Sequence Mining) algorithm is then used to mine closed frequent sequences from the messages and identify protocol keywords. Finally, the CFGM (Closed Frequent Group Mining) algorithm is proposed to explore the parallel, sequential, and hierarchical relations between the protocol keywords and obtain accurate protocol message formats. Experimental results show that the proposed protocol format extraction method is better than the Apriori algorithm and the sequence alignment algorithm in terms of time complexity, and that it achieves high keyword recognition accuracy. Additionally, based on the relations between the keywords, the method can obtain accurate protocol formats. Compared with the protocol formats obtained by existing methods, our protocol formats better capture the overall structure of target protocols and perform better in applications of protocol reverse engineering such as fuzz testing.
This paper provides a brief introduction to the methods for generating fuzzy categorical maps from remotely sensed images (in graphical and digital forms). This is followed by a description of the slicing process for deriving fuzzy boundaries from fuzzy categorical maps, which can be based on the maximum fuzzy membership values, the confusion index, or a measure of entropy. Results from an empirical test performed in an Edinburgh suburb show that fuzzy boundaries of land cover can be derived from aerial photographs and satellite images by using the three criteria with small differences, and that slicing based on the maximum fuzzy membership values is the easiest and most straightforward solution. This, in turn, implies the suitability of maintaining both a crisp classification and its underlying certainty map for deriving fuzzy boundaries at different thresholds, which is a flexible and compact way to manage categorical map data and their uncertainty.
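The three slicing criteria named above can be sketched for a single pixel as follows. This is an illustration only, not the paper's implementation: it assumes each pixel carries a membership vector that sums to 1, uses the commonly cited confusion index (one minus the gap between the two largest memberships), and normalizes Shannon entropy by the log of the class count. The function names and the 0.6 threshold are hypothetical.

```python
import math


def max_membership(memberships):
    """Largest class membership of the pixel; low values signal uncertainty."""
    return max(memberships)


def confusion_index(memberships):
    """1 - (largest - second largest); values near 1 mean two classes compete."""
    top, second = sorted(memberships, reverse=True)[:2]
    return 1.0 - (top - second)


def normalized_entropy(memberships):
    """Shannon entropy scaled to [0, 1] by log(number of classes)."""
    h = -sum(p * math.log(p) for p in memberships if p > 0)
    return h / math.log(len(memberships))


def in_boundary_zone(memberships, threshold=0.6):
    """Slice by maximum membership: no class reaches the (assumed) threshold."""
    return max_membership(memberships) < threshold


# Hypothetical pixels: one well inside a class, one between two classes.
for pixel in ([0.90, 0.05, 0.05], [0.45, 0.40, 0.15]):
    print(pixel, in_boundary_zone(pixel),
          round(confusion_index(pixel), 3),
          round(normalized_entropy(pixel), 3))
```

Slicing on the maximum membership needs only the winning class and its membership value, which is consistent with the abstract's point that a crisp classification plus a certainty map is enough to derive boundaries at different thresholds.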
A central claim of Cognitive Grammar is that basic grammatical categories, such as noun and verb, have conceptual characterizations valid for all members. The standard argument against this position is invalid. The proposed characterizations rely only on basic cognitive abilities that are clearly evident on non-linguistic grounds. Responses are made to arguments against the proposal as well as to putative alternatives. It is shown in specific detail how the characterization offered for nouns applies to a broad range of cases.
Fires play a noteworthy role in ecological and environmental losses in Mediterranean forests. In addition to ecological impacts, fire may bring about economic, social, and cultural changes. The detection of fire scars is critically important for reducing losses. In the present study, forest fires recorded in Antalya, one of the most important ecological and tourist regions in the Western Mediterranean, were clustered and mapped. Since the dominant factors and devastation records derived from the cases had nominal-scaled properties, a nonparametric clustering algorithm for categorical data was used in this evaluation. The proposed tool, the k-modes algorithm, uses modes instead of means for clustering. The algorithm can be implemented quickly and makes no distributional assumptions about the available data. It uses a frequency-based method to update the modes of the fires. The modes derived from the maps may be useful information for local authorities. In conclusion, the proposed nonparametric clustering procedure may be employed to build a decision-support system to monitor and identify fire activities and to enhance fire management efficiency.
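The idea of clustering with modes instead of means can be sketched as below. This is a simplified batch variant, not the study's code: it measures distance by counting mismatched attributes and recomputes each cluster's mode from per-attribute frequencies after every assignment pass. The fire records, their attributes (cause, season, damage), and the choice of initial modes are all hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute column (frequency-based update)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, initial_modes, max_iter=20):
    """Assign records to the nearest mode, then update modes, until stable."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical nominal-scaled fire records: (cause, season, damage).
fires = [
    ("negligence", "summer", "high"),
    ("negligence", "summer", "high"),
    ("negligence", "autumn", "high"),
    ("lightning", "spring", "low"),
    ("lightning", "summer", "low"),
    ("lightning", "spring", "low"),
]
labels, modes = k_modes(fires, initial_modes=[fires[0], fires[3]])
print(labels)
print(modes)
```

Because the distance is a mismatch count and the update is a frequency count, no means or distributional assumptions are involved, which is why the approach suits nominal-scaled records of this kind.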
文摘This article presents an innovative approach to automatic rule discovery for data transformation tasks leveraging XGBoost,a machine learning algorithm renowned for its efficiency and performance.The framework proposed herein utilizes the fusion of diversified feature formats,specifically,metadata,textual,and pattern features.The goal is to enhance the system’s ability to discern and generalize transformation rules fromsource to destination formats in varied contexts.Firstly,the article delves into the methodology for extracting these distinct features from raw data and the pre-processing steps undertaken to prepare the data for the model.Subsequent sections expound on the mechanism of feature optimization using Recursive Feature Elimination(RFE)with linear regression,aiming to retain the most contributive features and eliminate redundant or less significant ones.The core of the research revolves around the deployment of the XGBoostmodel for training,using the prepared and optimized feature sets.The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure.Finally,the process of rule discovery(prediction phase)by the trained XGBoost model is explained,underscoring its role in real-time,automated data transformations.By employingmachine learning and particularly,the XGBoost model in the context of Business Rule Engine(BRE)data transformation,the article underscores a paradigm shift towardsmore scalable,efficient,and less human-dependent data transformation systems.This research opens doors for further exploration into automated rule discovery systems and their applications in various sectors.
文摘As digital technologies have advanced more rapidly,the number of paper documents recently converted into a digital format has exponentially increased.To respond to the urgent need to categorize the growing number of digitized documents,the classification of digitized documents in real time has been identified as the primary goal of our study.A paper classification is the first stage in automating document control and efficient knowledge discovery with no or little human involvement.Artificial intelligence methods such as Deep Learning are now combined with segmentation to study and interpret those traits,which were not conceivable ten years ago.Deep learning aids in comprehending input patterns so that object classes may be predicted.The segmentation process divides the input image into separate segments for a more thorough image study.This study proposes a deep learning-enabled framework for automated document classification,which can be implemented in higher education.To further this goal,a dataset was developed that includes seven categories:Diplomas,Personal documents,Journal of Accounting of higher education diplomas,Service letters,Orders,Production orders,and Student orders.Subsequently,a deep learning model based on Conv2D layers is proposed for the document classification process.In the final part of this research,the proposed model is evaluated and compared with other machine-learning techniques.The results demonstrate that the proposed deep learning model shows high results in document categorization overtaking the other machine learning models by reaching 94.84%,94.79%,94.62%,94.43%,94.07%in accuracy,precision,recall,F-score,and AUC-ROC,respectively.The achieved results prove that the proposed deep model is acceptable to use in practice as an assistant to an office worker.
文摘This report is a continuation of (2—5)We introduce several notions such as Skolem functions and sets of indiscernibles, saturated and atomic models, and stable theories in power in lattice-valued version. On the basis of [2—5] Morley categoricity theorem for finite valued lattice is deduced.
文摘The school placement processes of students from immigrant backgrounds considered to be in“difficulty”is an international concern at the intersection of works relating to special education and those concerning the school experiences of students from immigrant backgrounds or racialized groups.The research problem of this article concerns the identification of these students as disabled or as having adjustment or learning difficulties.From a perspective anchored in Disability Critical Race Studies,this ethnographic study documents different interpretations of perceived difficulties made by school actors with regard to seven primary school students from immigrant backgrounds.Five interpretation types are presented:(1)medicalization by dismissal of cultural markers,(2)medicalization by professional constraint,(3)medicalization by cultural deficit,(4)precautionary wait,and(5)cultural differentialism.Our results help to shed light on the special education overrepresentation phenomenon regarding these students and to understand how ableism and(neo)racism contribute to it.
文摘To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time which are then used to build and train a categorization model. Secondly, the study presents a proof of concept tool that automates the categorization of U-report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof of concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4% demonstrating that the automated method is significantly faster, more scalable, and consistent when compared to the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-report platform and other similar text messages-based platforms.
Abstract: Mandarin 在 (pinyin: zài) is the character most frequently used to represent spatial and temporal relationships. Current studies mostly focus on its lexical meaning and syntactic structure, while the cognitive features of its grammatical categories have been neglected. This paper investigates the categorization of zài by conducting a morphosyntactic test among college English majors in China. The results show that: prototypes organize the grammatical categories of zài at all levels in terms of intra-categorial gradience; the semantic construal of the zài construction can significantly influence the accuracy of the grammatical categorization of zài; syntactic structure can provide a viable cue for identifying the grammatical categories of zài; and spatiality, temporality and the status of existing are three essential semantic features encoded by zài, the concurrence of which leads to various degrees of inter-categorial vagueness. This indicates a conflict between rigid grammatical classification and the indeterminate nature of the grammatical functions of zài, and suggests the necessity of reconsidering the efficacy of indiscriminately applying Anglo-Saxon grammar to the study of Chinese spatial-temporal constructions.
Funding: “Research on the Development Path of Ideological Leadership of Ideological and Political Education in Colleges and Universities in the New Era”, Counselor Special Research Projects of Furlong College, Hunan University of Science and Arts, 2023 (Project number: FRfdy2307).
Abstract: In cognitive linguistics, the status and functions of categorization have been hotly debated. In semantics and second language acquisition, scholars have studied vocabulary acquisition from different perspectives and at different academic levels. Vocabulary learning plays a fundamental role in second language vocabulary acquisition (SLVA), and it is closely related to learners’ cognitive competence. However, studies of second language vocabulary acquisition under the categorization theory in cognitive linguistics have received less attention from linguists than other studies. This paper employs two representative dimensions of the categorization theory, the basic-level effect and the prototype effect, to examine the implications for second language vocabulary acquisition. The article first provides a comprehensive introduction to the nature and approaches of the categorization theory, and then analyzes the relations and implications for second language vocabulary acquisition from the perspective of the basic-level and prototype effects. The results show that the basic-level effect on SLVA lies mainly in the classification of word categories as distinguished from the superordinate and subordinate categories, while the prototype effect bears more on understanding the complexity and use of word meaning.
Funding: The National Natural Science Foundation of China (No. 60473045), the Technology Research Project of Hebei Province (No. 05213573), the Research Plan of Education Office of Hebei Province (No. 2004406).
Abstract: To deal with the low efficiency that arises when the conventional fuzzy class-association method repeatedly scans the classifier to classify new texts, a new approach based on the FCR-tree (fuzzy classification rules tree) for text categorization is proposed. The compactness of the FCR-tree saves significant space in storing a large set of rules when there are many repeated words in the rules. In comparison with classification rules, the fuzzy classification rules contain not only words, but also the fuzzy sets corresponding to the frequencies of words appearing in texts. Therefore, the construction of an FCR-tree and its structure are different from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in this text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
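The pruning idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' FCR-tree: the names `Node`, `insert_rule`, `matching_rules`, `fuzzify` and `classify` are hypothetical, and the fuzzy sets over word frequencies are replaced here by a crisp two-level binning ("low"/"high") purely as a simplifying assumption. What it does show is a shared-prefix rule tree in which sub-trees led by items absent from the text are never visited.

```python
class Node:
    """One node of a hypothetical rule tree; rules sharing a prefix share nodes."""
    def __init__(self):
        self.children = {}  # (word, level) item -> child Node
        self.rules = []     # (class_label, confidence) of rules ending at this node


def insert_rule(root, items, label, confidence):
    """Store a rule whose antecedent is a set of (word, level) items."""
    node = root
    for item in sorted(items):  # fixed item order lets rules share prefixes
        node = node.children.setdefault(item, Node())
    node.rules.append((label, confidence))


def matching_rules(node, text_items):
    """Collect rules satisfied by the text, skipping sub-trees led by absent items."""
    found = list(node.rules)
    for item, child in node.children.items():
        if item in text_items:
            found.extend(matching_rules(child, text_items))
    return found


def fuzzify(word_counts, low_max=2):
    """Assumption: crisp binning stands in for fuzzy sets over word frequency."""
    return {(w, "low" if c <= low_max else "high") for w, c in word_counts.items()}


def classify(root, word_counts):
    """Return the label of the most confident matching rule, or None."""
    rules = matching_rules(root, fuzzify(word_counts))
    return max(rules, key=lambda r: r[1])[0] if rules else None


root = Node()
insert_rule(root, [("goal", "high"), ("team", "low")], "sports", 0.9)
insert_rule(root, [("goal", "high")], "sports", 0.6)
insert_rule(root, [("market", "high")], "finance", 0.8)
print(classify(root, {"goal": 5, "team": 1}))
```

Because traversal follows only the children whose item occurs in the text, the cost depends on the rules that can possibly match rather than on the full rule set, which is the efficiency argument the abstract makes.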
Abstract: Recent developments in cognitive theories may provide an interpretation of translation studies rather than merely a description. The paper tries to bring categorization and metaphor into the process of translating and translators' psychology so as to produce a more powerful interpretation.
Abstract: Whether a collection of scientific data can be explained only by a unique theory or whether such data can be equally explained by multiple theories is one of the more contested issues in the history and philosophy of science. This paper argues that the case for multiple explanations is strengthened by the widespread failure of models in mathematical logic to be unique, i.e., categorical. Science is taken to require replicable and explicit public knowledge; this necessitates an unambiguous language for its transmission. Mathematics has been chosen as the vehicle to transmit scientific knowledge, both because of its "unreasonable effectiveness" and because of its unambiguous nature, hence the vogue of axiomatic systems. But mathematical logic tells us that axiomatic systems need not refer to uniquely defined real structures. Hence what is accepted as science may be only one of several possibilities.
Funding: Supported by the National Natural Science Foundation of China (60373066, 60503020), the Outstanding Young Scientist's Fund (60425206), and the Doctor Foundation of Nanjing University of Posts and Telecommunications (2003-02).
Abstract: This paper proposes a new approach to feature selection for text categorization based on an independence measure between features. A fundamental hypothesis widely used in probabilistic models for text categorization (TC), namely that the occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, and a feature selection algorithm based on it is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent between features, and satisfies the basic hypothesis to the maximum degree. Compared with traditional feature selection methods in TC (which take only relevance into account), the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.
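The abstract above does not define its independence measure, so the following is only a hedged sketch of the general idea (prefer features that are relevant to the category and weakly dependent on features already chosen). Mutual information is used here as a stand-in for both relevance and dependence, which is an assumption and not the authors' measure; `mutual_information` and `select_features` are hypothetical names, and the toy term-occurrence data are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, k):
    """Greedily pick k features by relevance minus worst-case redundancy.

    columns maps a feature name to its per-document values (e.g. 0/1 term
    occurrence). The scoring rule is an illustrative assumption.
    """
    selected = []
    remaining = set(columns)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(columns[name], labels)
            redundancy = max(
                (mutual_information(columns[name], columns[s]) for s in selected),
                default=0.0,
            )
            return relevance - redundancy

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "goal":  [1, 1, 1, 0, 0, 0, 0, 0],
    "score": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "goal"
    "ball":  [0, 0, 0, 1, 0, 0, 0, 0],
}
print(select_features(columns, labels, 2))
```

In this toy data a purely relevance-based ranking would pick both "goal" and its duplicate "score"; penalizing dependence on already selected features picks "ball" second instead, which is the contrast with relevance-only selection that the abstract draws.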
Abstract: This paper summarizes several automatic text categorization algorithms in common use recently, and analyzes and compares their advantages and disadvantages. It provides clues for choosing appropriate automatic classification algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out.
Key words: text categorization, naive Bayes, KNN, SVM, neural network
CLC number: TP 391
Funding: Supported by the National Natural Science Foundation of China (70031010) and the Research Foundation of Beijing Institute of Technology.
Biography: SHI Yong-feng (1980-), male, Master candidate, research direction: web information mining.
Funding: Supported by Zhejiang Provincial Natural Science Foundation of China, No. LQ17H030003.
Abstract: BACKGROUND: According to the latest American Joint Committee on Cancer and Union for International Cancer Control manuals, cystic duct cancer (CC) is categorized as a type of gallbladder cancer (GC), which has the worst prognosis among all types of biliary cancers. We hypothesized that this categorization could be verified by using taxonomic methods. AIM: To investigate the categorization of CC based on population-level data. METHODS: Cases of biliary cancers were identified from the Surveillance, Epidemiology, and End Results 18 registries database. Together with routinely used statistical methods, three taxonomic methods, including Fisher’s discriminant, binary logistic regression and artificial neural network (ANN) models, were used to clarify the categorization of CC. RESULTS: The T staging system of perihilar cholangiocarcinoma [a type of extrahepatic cholangiocarcinoma (EC)] better discriminated CC prognosis than that of GC. After adjusting for other covariates, the hazard ratio of CC tended to be closer to that of EC, although this did not reach statistical significance. To differentiate EC from GC, three taxonomic models were built and all showed good accuracy. The ANN model had an area under the receiver operating characteristic curve of 0.902. Using the three models, the majority (75.0%-77.8%) of CC cases were categorized as EC. CONCLUSION: Our study suggests that CC should be categorized as a type of EC, not GC. An aggressive surgical approach might be considered in CC cases, to see whether long-term prognosis can be substantially improved as in EC.
Funding: Supported by the National High Technology Research and Development Program of China (2005AA147030).
Abstract: Given the importance of analyzing public opinion on the Internet, the authors propose a high-performance extraction method for public opinion. In this method, the space model for classification is adopted to describe the relationship between words and categories. A combined feature selection method is used to remove noisy words from the original feature space effectively. Then the category weight of each word is calculated by an improved formula that combines word frequency and word distribution. Finally, the class weights of uncategorized documents are obtained from the category weights of their words to realize opinion extraction. Experimental results show that the method has comparatively high classification performance and good stability.
Funding: Under the auspices of the National Science & Technology Pillar Program during the 11th Five-Year Plan Period (No. 2006BAD20B05).
Abstract: The scientific evidence that climate is changing due to greenhouse gas emissions is now incontestable, and this change may put many social, biological, and geophysical systems in the world at risk. In this paper, we first identified the main risks induced or aggravated by climate change. We then categorized them by applying a new risk categorization system put forward by Renn in the framework of the International Risk Governance Council. We proposed that "uncertainty" could be treated as the classification criterion. Based on this, we established a quantitative method using fuzzy set theory, in which "confidence" and "likelihood", the main quantitative terms for expressing uncertainties in IPCC, were used as the feature parameters to construct the fuzzy membership functions of four risk types. According to the maximum principle, most of the climate change risks identified were classified into the appropriate risk types. Meanwhile, given that not all the quantitative terms are available, a qualitative approach was also adopted as a complementary classification method. Finally, we obtained preliminary results of climate change risk categorization, which might lay the foundation for the future integrated risk management of climate change.
Funding: Supported by the National Key R&D Subsidized Project (No. 2017YFB0802900).
Abstract: As information technology develops rapidly, many network applications have appeared whose communication protocols are unknown. Although many protocol reverse engineering methods based on protocol keyword recognition have been proposed, most of the keyword recognition algorithms are time-consuming. This paper first uses the traffic clustering method F-DBSCAN to cluster the unknown protocol traffic. Then an improved CFSM (Closed Frequent Sequence Mining) algorithm is used to mine closed frequent sequences from the messages and identify protocol keywords. Finally, the CFGM (Closed Frequent Group Mining) algorithm is proposed to explore the parallel, sequential and hierarchical relations between the protocol keywords and obtain accurate protocol message formats. Experimental results show that the proposed protocol format extraction method outperforms the Apriori algorithm and the sequence alignment algorithm in terms of time complexity, and that it can achieve high keyword recognition accuracy. Additionally, based on the relations between the keywords, the method can obtain accurate protocol formats. Compared with the protocol formats obtained from existing methods, our protocol format better captures the overall structure of target protocols, and the results perform better in applications of protocol reverse engineering such as fuzz testing.
Abstract: This paper provides a brief introduction to the methods for generating fuzzy categorical maps from remotely sensed images (in graphical and digital forms). This is followed by a description of the slicing process for deriving fuzzy boundaries from fuzzy categorical maps, which can be based on the maximum fuzzy membership values, the confusion index, or a measure of entropy. Results from an empirical test performed in an Edinburgh suburb show that fuzzy boundaries of land cover can be derived from aerial photographs and satellite images by using the three criteria with small differences, and that slicing based on the maximum fuzzy membership values is the easiest and most straightforward solution. This, in turn, implies the suitability of maintaining both a crisp classification and its underlying certainty map for deriving fuzzy boundaries at different thresholds, which is a flexible and compact way of managing categorical map data and their uncertainty.
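The three slicing criteria named in the abstract above can be computed per pixel from its vector of class memberships. The sketch below is illustrative only and makes several assumptions: the function names and the 0.6 threshold are hypothetical, the confusion index is taken in its common form of one minus the gap between the two largest memberships, and the entropy is normalized by the logarithm of the number of classes under the assumption that memberships sum to 1.

```python
import math


def slicing_measures(memberships):
    """Return (max membership, confusion index, normalized entropy) for one pixel.

    memberships: fuzzy membership values of the pixel in each class,
    assumed to be non-negative and to sum to 1 (at least two classes).
    """
    ordered = sorted(memberships, reverse=True)
    mu_max, mu_second = ordered[0], ordered[1]
    # Confusion index: close to 1 when the top two classes are nearly tied.
    confusion = 1.0 - (mu_max - mu_second)
    # Shannon entropy, scaled to [0, 1] by the maximum possible value log(n).
    entropy = -sum(m * math.log(m) for m in memberships if m > 0)
    entropy /= math.log(len(memberships))
    return mu_max, confusion, entropy


def is_boundary(memberships, threshold=0.6):
    """Flag a pixel as fuzzy boundary when its maximum membership is below the
    threshold (the threshold value here is an arbitrary illustration)."""
    return max(memberships) < threshold


print(slicing_measures([0.5, 0.5, 0.0]))
print(is_boundary([0.9, 0.1, 0.0]))
```

Slicing on the maximum membership needs only the crisp class and one certainty value per pixel, which is consistent with the abstract's remark that it is the most straightforward of the three criteria and that a crisp map plus its certainty map suffices for different thresholds.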
Abstract: A central claim of Cognitive Grammar is that basic grammatical categories, such as noun and verb, have conceptual characterizations valid for all members. The standard argument against this position is invalid. The proposed characterizations rely only on basic cognitive abilities that are clearly evident on non-linguistic grounds. Responses are made to arguments against the proposal as well as to putative alternatives. It is shown in specific detail how the characterization offered for nouns applies to a broad range of cases.
Abstract: Fires play a noteworthy role in ecological and environmental losses in Mediterranean forests. In addition to ecological impacts, fire may cause economic, social and cultural changes. The detection of fire scars is critically important in helping to decrease losses. In the present study, forest fires recorded in Antalya, one of the most important ecological and tourist regions in the Western Mediterranean, were clustered and mapped. Since the dominant factors and devastation records derived from the cases had nominal-scaled properties, a nonparametric clustering algorithm for categorical data was used in this evaluation. The proposed tool, the k-modes algorithm, uses modes instead of means for clustering. The algorithm can be implemented quickly and does not make distributional assumptions about the available data. It uses a frequency-based method to update the modes of the fires. The modes derived from the maps may provide useful information for local authorities responsible for fire management. In conclusion, the proposed nonparametric clustering procedure may be employed to build a decision-support system to monitor and identify fire activities and to enhance fire management efficiency.
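The k-modes procedure described in the abstract above can be sketched in a few lines. This is a simplified batch variant written for illustration, not the study's implementation: the function names are hypothetical, the initial modes are supplied by the caller, and the fire records (cause, season, damage level) are invented toy data. Dissimilarity is the number of mismatching attributes, and each mode is updated to the most frequent category per attribute.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, initial_modes, max_iter=20):
    """Cluster categorical records; returns (labels, modes).

    Batch variant: assign every record to its nearest mode, then recompute
    each mode from attribute-wise category frequencies, until modes stop changing.
    """
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if not members:
                new_modes.append(old)  # keep the mode of an empty cluster
                continue
            # Frequency-based update: most common category in each attribute.
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


# Hypothetical fire records: (cause, season, damage level).
fires = [
    ("human", "summer", "high"),
    ("human", "summer", "high"),
    ("human", "summer", "low"),
    ("lightning", "autumn", "low"),
    ("lightning", "autumn", "low"),
    ("lightning", "spring", "low"),
]
labels, modes = k_modes(fires, [fires[0], fires[3]])
print(labels, modes)
```

Since the update uses only category counts, no distributional assumption about the attributes is needed, which matches the abstract's motivation for choosing modes over means with nominal-scaled data.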