To provide predictable runtime performance for text categorization (TC) systems, an innovative system design method is proposed for soft real-time TC systems. An analyzable mathematical model is established to approximately describe the nonlinear and time-varying TC systems. Based on this model, feedback control theory is adopted to prove the system's stability and zero steady-state error. The experimental results show that the error of the deadline satisfied ratio in the system is kept within 4 of the desired value. In addition, the number of classifiers can be dynamically adjusted by the system itself to save computation resources. The proposed methodology enables theoretical analysis and evaluation of TC systems, leading to a high-quality and low-cost implementation approach.
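The feedback idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the controller gain, the classifier bounds, and the toy plant (deadline satisfied ratio = capacity / active classifiers) are all assumptions made here for illustration.

```python
def adjust_classifiers(active, measured_ratio, target_ratio,
                       gain=10.0, min_n=1, max_n=16):
    """One control step: add classifiers when deadlines are met more often
    than required, remove them when too many deadlines are missed.
    The gain and bounds are hypothetical values, not taken from the paper."""
    delta = round(gain * (measured_ratio - target_ratio))
    return max(min_n, min(max_n, active + delta))


def toy_plant(active, capacity=8.0):
    """Hypothetical plant: the fraction of deadlines met falls as more
    classifiers compete for a fixed computation capacity."""
    return min(1.0, capacity / active)


def simulate(steps, active=16, target_ratio=0.9):
    """Run the closed loop and return (active classifiers, measured ratio)."""
    ratio = toy_plant(active)
    for _ in range(steps):
        active = adjust_classifiers(active, ratio, target_ratio)
        ratio = toy_plant(active)
    return active, ratio


if __name__ == "__main__":
    print(simulate(10))
```

Because the number of classifiers is an integer, this sketch settles near the target rather than exactly on it; the zero steady-state error stated in the abstract refers to the authors' own analytical model.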
Mandarin 在 (pinyin: zài) is the most frequently used character for representing spatial and temporal relationships. Current studies mostly focus on its lexical meaning and syntactic structure, while the cognitive features of its grammatical categories have been neglected. This paper investigates the categorization of zài by conducting a morphosyntactic test among college English majors in China. The results show that: prototypes organize the grammatical categories of zài at all levels in terms of intra-categorial gradience; the semantic construal of the zài construction can significantly influence the accuracy of the grammatical categorization of zài; syntactic structure can provide a viable cue for identifying the grammatical categories of zài; and spatiality, temporality and the status of existing are three essential semantic features encoded by zài, the concurrence of which leads to various degrees of inter-categorial vagueness. This indicates a conflict between rigid grammatical classification and the indeterminate nature of the grammatical functions of zài, and suggests the need to reconsider the efficacy of indiscriminately applying Anglo-Saxon grammar to the study of Chinese spatial-temporal constructions.
In cognitive linguistics, the status and functions of categorization have been heatedly debated. In semantics and second language acquisition, scholars have discussed vocabulary acquisition from different perspectives and at different academic levels. Vocabulary learning plays a fundamental role in second language vocabulary acquisition (SLVA), and it is closely related to learners' cognitive competence. However, studies on second language vocabulary acquisition under the categorization theory of cognitive linguistics have received less attention from linguists than other studies. This paper employs two representative dimensions of categorization theory, the basic-level effect and the prototype effect, to examine the implications for second language vocabulary acquisition. The article first provides a comprehensive introduction to the nature and approaches of categorization theory, and then analyzes the relations and implications for second language vocabulary acquisition from the perspective of the basic-level and prototype effects. The results show that the basic-level effect on SLVA mainly concerns the classification of word categories as distinguished from the superordinate and subordinate categories, while the prototype effect bears more on understanding the complexity and use of word meaning.
The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space in storing a large set of rules when many words are repeated across the rules. In contrast to classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies of words appearing in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in the text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
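The pruning step described in this abstract (skipping sub-trees led by words absent from the text) can be sketched with a plain prefix tree of rules. This is only an illustration: the fuzzy sets attached to word frequencies are omitted, and the rule set, function names, and node layout are hypothetical rather than the authors' FCR-tree.

```python
def build_rule_tree(rules):
    """Store each rule (a set of words mapped to a category) as a path of
    sorted words, so rules sharing leading words share tree nodes."""
    root = {"children": {}, "labels": []}
    for words, label in rules:
        node = root
        for word in sorted(words):
            node = node["children"].setdefault(
                word, {"children": {}, "labels": []})
        node["labels"].append(label)
    return root


def matching_labels(tree, text_words):
    """Collect labels of rules whose words all occur in the text.
    Sub-trees led by a word missing from the text are never entered.
    Returns (sorted labels, number of nodes visited)."""
    present = set(text_words)
    found, visited = [], 0
    stack = [tree]
    while stack:
        node = stack.pop()
        visited += 1
        found.extend(node["labels"])
        for word, child in node["children"].items():
            if word in present:
                stack.append(child)
    return sorted(found), visited


# Hypothetical toy rules, for illustration only.
RULES = [
    (("ball", "goal"), "sport"),
    (("ball", "team"), "sport"),
    (("market", "stock"), "finance"),
    (("stock",), "finance"),
]

if __name__ == "__main__":
    tree = build_rule_tree(RULES)
    print(matching_labels(tree, ["stock", "market", "falls"]))
```

In this toy case the tree has seven nodes including the root, and the sub-tree under "ball" is skipped entirely for a text that does not contain that word.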
As digital technologies have advanced, the number of paper documents converted into digital format has increased exponentially. To respond to the urgent need to categorize the growing number of digitized documents, real-time classification of digitized documents is the primary goal of this study. Document classification is the first stage in automating document control and efficient knowledge discovery with little or no human involvement. Artificial intelligence methods such as deep learning are now combined with segmentation to study and interpret document traits, which was not conceivable ten years ago. Deep learning aids in comprehending input patterns so that object classes can be predicted. The segmentation process divides the input image into separate segments for a more thorough image analysis. This study proposes a deep learning-enabled framework for automated document classification that can be implemented in higher education. To this end, a dataset was developed that includes seven categories: Diplomas, Personal documents, Journal of Accounting of higher education diplomas, Service letters, Orders, Production orders, and Student orders. A deep learning model based on Conv2D layers is then proposed for the document classification process. In the final part of this research, the proposed model is evaluated and compared with other machine learning techniques. The results demonstrate that the proposed deep learning model outperforms the other machine learning models in document categorization, reaching 94.84%, 94.79%, 94.62%, 94.43%, and 94.07% in accuracy, precision, recall, F-score, and AUC-ROC, respectively. These results indicate that the proposed deep model is acceptable for use in practice as an assistant to an office worker.
This article presents an innovative approach to automatic rule discovery for data transformation tasks leveraging XGBoost, a machine learning algorithm renowned for its efficiency and performance. The proposed framework fuses diversified feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. First, the article describes the methodology for extracting these distinct features from raw data and the pre-processing steps undertaken to prepare the data for the model. Subsequent sections describe feature optimization using Recursive Feature Elimination (RFE) with linear regression, which aims to retain the most contributive features and eliminate redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets. The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and particularly the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article underscores a shift towards more scalable, efficient, and less human-dependent data transformation systems. This research opens doors for further exploration of automated rule discovery systems and their applications in various sectors.
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. First, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Second, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text message-based platforms.
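The timing figures quoted in this abstract can be put on a common scale with a short calculation. The message count and the two durations come from the abstract; treating the 2.82 years as continuous 365-day calendar years is an assumption made here, since the abstract does not say how the manual estimate was derived.

```python
MESSAGES = 206_625        # dataset size reported in the abstract
SVM_MINUTES = 6.4         # reported SVM categorization time
MANUAL_YEARS = 2.82       # reported manual categorization estimate

# Assumption: a year is 365 days of continuous time. If the manual
# estimate was based on working hours, the speedup below would differ.
MINUTES_PER_YEAR = 365 * 24 * 60

svm_rate_per_minute = MESSAGES / SVM_MINUTES
svm_rate_per_second = svm_rate_per_minute / 60
manual_minutes = MANUAL_YEARS * MINUTES_PER_YEAR
speedup = manual_minutes / SVM_MINUTES

if __name__ == "__main__":
    print(f"SVM throughput: {svm_rate_per_second:.0f} messages per second")
    print(f"Speedup under the calendar-year assumption: {speedup:,.0f}x")
```

Under that assumption the SVM model processes roughly 538 messages per second, a speedup of about five orders of magnitude over the manual estimate.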
The school placement of students from immigrant backgrounds considered to be in "difficulty" is an international concern at the intersection of work on special education and work on the school experiences of students from immigrant backgrounds or racialized groups. The research problem of this article concerns the identification of these students as disabled or as having adjustment or learning difficulties. From a perspective anchored in Disability Critical Race Studies, this ethnographic study documents the different interpretations that school actors make of the perceived difficulties of seven primary school students from immigrant backgrounds. Five interpretation types are presented: (1) medicalization by dismissal of cultural markers, (2) medicalization by professional constraint, (3) medicalization by cultural deficit, (4) precautionary wait, and (5) cultural differentialism. The results help to shed light on the overrepresentation of these students in special education and to understand how ableism and (neo)racism contribute to it.
This paper proposes a new approach to feature selection for text categorization based on a measure of independence between features. A fundamental hypothesis widely used in probabilistic models for text categorization (TC), namely that the occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new measure of independence between features is designed, and a feature selection algorithm based on it is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent across features, and so satisfies the basic hypothesis to the maximum degree. Compared with traditional feature selection methods in TC, which take only relevance into account, the feature subset selected by the proposed method achieves superior performance in experiments on the 20 Newsgroups benchmark dataset.
This paper summarizes several automatic text categorization algorithms in common use, and analyzes and compares their advantages and disadvantages. It provides guidance for choosing appropriate automatic classification algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out. Key words: text categorization; naive Bayes; KNN; SVM; neural network. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (70031010) and the Research Foundation of Beijing Institute of Technology. Biography: SHI Yong-feng (1980-), male, Master's candidate, research direction: web information mining.
The scientific evidence that the climate is changing due to greenhouse gas emissions is now incontestable, and this change may put many social, biological, and geophysical systems in the world at risk. In this paper, we first identify the main risks induced or aggravated by climate change. We then categorize them by applying a new risk categorization system put forward by Renn in the framework of the International Risk Governance Council. We propose that "uncertainty" can be treated as the classification criterion. On this basis, we establish a quantitative method using fuzzy set theory, in which "confidence" and "likelihood", the main quantitative terms for expressing uncertainty in IPCC reports, are used as the feature parameters to construct the fuzzy membership functions of four risk types. According to the maximum principle, most of the identified climate change risks are classified into the appropriate risk types. Given that not all the quantitative terms are available, a qualitative approach is also adopted as a complementary classification method. Finally, we obtain preliminary results of climate change risk categorization, which may lay the foundation for future integrated risk management of climate change.
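The classification step described in this abstract (fuzzy membership functions over "confidence" and "likelihood", followed by the maximum principle) can be sketched as follows. The membership shapes, their parameters, and the generic type names are hypothetical choices for illustration; the abstract does not give the authors' actual functions.

```python
def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical membership parameters for four risk types, each given as
# (confidence triangle, likelihood triangle) on a 0..1 scale.
RISK_TYPES = {
    "type_1": ((0.5, 1.0, 1.5), (0.5, 1.0, 1.5)),    # high confidence, high likelihood
    "type_2": ((0.5, 1.0, 1.5), (-0.5, 0.0, 0.5)),   # high confidence, low likelihood
    "type_3": ((-0.5, 0.0, 0.5), (0.5, 1.0, 1.5)),   # low confidence, high likelihood
    "type_4": ((-0.5, 0.0, 0.5), (-0.5, 0.0, 0.5)),  # low confidence, low likelihood
}


def classify(confidence, likelihood):
    """Maximum principle: assign the risk to the type with the largest
    membership degree. The two features are combined with min (fuzzy AND),
    which is one common choice and an assumption here."""
    degrees = {
        name: min(triangular(confidence, *conf_tri),
                  triangular(likelihood, *like_tri))
        for name, (conf_tri, like_tri) in RISK_TYPES.items()
    }
    best = max(degrees, key=degrees.get)
    return best, degrees[best]


if __name__ == "__main__":
    print(classify(0.9, 0.8))
    print(classify(0.2, 0.3))
```

A risk whose largest membership degree is zero, or tied across types, cannot be assigned by this rule, which is consistent with the abstract's remark that only most risks were classified quantitatively.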
To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and thus has a considerable negative effect on the overall performance of TC algorithms. Based on the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of the features without any information loss, which can greatly improve both the efficiency and the performance of algorithms. The experiments show that the new algorithm achieves both much higher performance and much higher efficiency than several classical TC algorithms.
A study examining affective information processing in persons with multiple sclerosis (MS) and healthy adults was carried out. It was hypothesized that individual characteristics could modulate participants' emotional categorization and reaction times for categorization decisions. For example, individuals with a negatively valenced emotional profile (e.g., anxious) should choose negative emotional alternatives faster and more frequently. Participants came from two populations: 80 right-handed healthy French speakers and 40 right-handed French speakers with multiple sclerosis. The results showed a positive correlation between a high level of negative emotional sensitivity and emotional categorization (decision and decision speed) for affective information presented on the right side of the screen. For all participants, emotional choices were more frequent and decisions faster for emotional alternatives presented on the left side. It seems that individual emotional differences, in general and in MS populations, modulate the hemispheric asymmetry of processing emotional judgments.
The ability to achieve a semantic understanding of workspaces is an important capability for mobile robots. A method is proposed to categorize different places in a typical indoor environment using a Kinect sensor for mobile robot exploration. First, an invariant-feature-based image stitching approach is adopted to form a panoramic image from Kinect visual information, and Kinect depth information is translated into obstacle distance information to obtain virtual LIDAR data. Then, a semantic classifier is designed using convolutional neural networks (CNN) for indoor place categorization based on Kinect visual observations with a panoramic view. Finally, a frontier-based exploration method that integrates the CNN-based categorization approach is applied to carry out autonomous indoor exploration by mobile robots. The proposed method has been implemented and tested on a real robot, and the experimental results demonstrate its effectiveness in solving the semantic categorization problem for mobile robot exploration.
The concept of word classes (parts of speech) has always generated controversy among linguists. The earlier Prescriptive and Descriptive Schools may have set the pace for this controversy, but the present dilemma is much deeper. Learners and even teachers are sometimes in a quandary as to how to prove that a particular word belongs to a particular class. This is because a word may sometimes belong to several classes depending on context, as with the word "watch". This paper therefore tries to provide answers to the problem of word class classification by using morphological and syntactic evidence to show that English words follow a particular range of inflections, belong to strictly ordered categories, and do not change their class arbitrarily. This is in line with the natural order of homogeneity in creation, which precludes one species from merging effectively with another without undergoing some fundamental changes. Other variables were also examined, and it was concluded that teachers and learners alike can rely on this sub-categorization approach as a reliable paradigm for their assumptions concerning word classes.
This paper aims to show that conceptual categorization can be used to understand a text by reconstructing the semantic categories through which the author's meaning is conveyed, and proposes an alternative way to look at reading comprehension. It is proposed that categorization can be taken as an alternative approach to second/foreign language reading instruction. That is, while reading comprehension is defined in terms of the ability to recognize the inclusion and membership properties of contextually determined semantic categories in a text, the learner needs to arrange the events, actions, or concepts into a structured unit, both horizontally and vertically. Categorization theory is introduced in relation to Rosch's well-known studies (1973, 1975), examples taken from a graded reader illustrate how to identify items with category structure, and finally issues not addressed in this paper are discussed.
A hierarchical system that performs automatic categorization and reorientation of images using content analysis is presented. The proposed system first assigns images to a priori defined categories using rotation-invariant features. In the second stage, it detects their correct orientation out of {0°, 90°, 180°, 270°} using a category-specific model. The system has been specially designed for embedded device applications and uses only low-level color and edge features. Machine learning algorithms optimized for embedded implementation, such as support vector machines (SVMs) and scalable boosting, have been used to develop classifiers for categorization and orientation detection. Results are presented on a collection of about 7000 consumer images collected from open resources. The proposed system has applications in various digital media products and brings pattern recognition solutions to the consumer electronics domain.
In this paper, the role of rare or infrequent terms in enhancing the accuracy of English text categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms on the accuracy of PN-based text categorization, different term reduction criteria and different term weighting schemes were tested on the Reuters Corpus using PNs. Each term weighting scheme on each reduced term set was tested twice, once keeping the rare terms and once removing them. All the experiments conducted in this research show that keeping rare terms substantially improves the performance of Polynomial Networks in text categorization, regardless of the term reduction method, the number of terms used in classification, or the term weighting scheme adopted.
In the current study, behavioral measures were used to investigate clothing color. The purpose was to identify how color brightness influences positive-negative emotional categorization. Results showed that the effect of brightness on the emotional categorization of clothing color was significant. As brightness increases, the variation curve of positive emotion is U-shaped, whereas that of negative emotion is an inverted U. Compared with low-brightness colors, the emotional reaction to high-brightness colors was more positive; most colors across the brightness scales were classified as positive and a minority as negative; and positive colors were categorized much faster than negative ones.
Funding (soft real-time text categorization system design study): Supported by the National Natural Science Foundation of China (90104032) and the National High-Tech Research and Development Plan of China (2003AA1Z2090).
Funding (categorization theory and second language vocabulary acquisition study): "Research on the Development Path of Ideological Leadership of Ideological and Political Education in Colleges and Universities in the New Era", Counselor Special Research Projects of Furlong College, Hunan University of Science and Arts, 2023 (Project number: FRfdy2307).
Funding (FCR-tree text categorization study): The National Natural Science Foundation of China (No. 60473045), the Technology Research Project of Hebei Province (No. 05213573), and the Research Plan of the Education Office of Hebei Province (No. 2004406).
文摘As digital technologies have advanced more rapidly,the number of paper documents recently converted into a digital format has exponentially increased.To respond to the urgent need to categorize the growing number of digitized documents,the classification of digitized documents in real time has been identified as the primary goal of our study.A paper classification is the first stage in automating document control and efficient knowledge discovery with no or little human involvement.Artificial intelligence methods such as Deep Learning are now combined with segmentation to study and interpret those traits,which were not conceivable ten years ago.Deep learning aids in comprehending input patterns so that object classes may be predicted.The segmentation process divides the input image into separate segments for a more thorough image study.This study proposes a deep learning-enabled framework for automated document classification,which can be implemented in higher education.To further this goal,a dataset was developed that includes seven categories:Diplomas,Personal documents,Journal of Accounting of higher education diplomas,Service letters,Orders,Production orders,and Student orders.Subsequently,a deep learning model based on Conv2D layers is proposed for the document classification process.In the final part of this research,the proposed model is evaluated and compared with other machine-learning techniques.The results demonstrate that the proposed deep learning model shows high results in document categorization overtaking the other machine learning models by reaching 94.84%,94.79%,94.62%,94.43%,94.07%in accuracy,precision,recall,F-score,and AUC-ROC,respectively.The achieved results prove that the proposed deep model is acceptable to use in practice as an assistant to an office worker.
Abstract: This article presents an innovative approach to automatic rule discovery for data transformation tasks leveraging XGBoost, a machine learning algorithm renowned for its efficiency and performance. The framework proposed herein utilizes the fusion of diversified feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. Firstly, the article delves into the methodology for extracting these distinct features from raw data and the pre-processing steps undertaken to prepare the data for the model. Subsequent sections expound on the mechanism of feature optimization using Recursive Feature Elimination (RFE) with linear regression, aiming to retain the most contributive features and eliminate redundant or less significant ones. The core of the research revolves around the deployment of the XGBoost model for training, using the prepared and optimized feature sets. The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and particularly the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article underscores a paradigm shift towards more scalable, efficient, and less human-dependent data transformation systems. This research opens doors for further exploration into automated rule discovery systems and their applications in various sectors.
Abstract: To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text message-based platforms.
Abstract: The school placement of students from immigrant backgrounds considered to be in "difficulty" is an international concern at the intersection of work on special education and work on the school experiences of students from immigrant backgrounds or racialized groups. The research problem of this article concerns the identification of these students as disabled or as having adjustment or learning difficulties. From a perspective anchored in Disability Critical Race Studies, this ethnographic study documents different interpretations of perceived difficulties made by school actors with regard to seven primary school students from immigrant backgrounds. Five interpretation types are presented: (1) medicalization by dismissal of cultural markers, (2) medicalization by professional constraint, (3) medicalization by cultural deficit, (4) precautionary wait, and (5) cultural differentialism. Our results help to shed light on the overrepresentation of these students in special education and to understand how ableism and (neo)racism contribute to it.
Funding: Supported by the National Natural Science Foundation of China (60373066, 60503020), the Outstanding Young Scientist's Fund (60425206), and the Doctor Foundation of Nanjing University of Posts and Telecommunications (2003-02)
Abstract: This paper proposes a new approach to feature selection for text categorization based on an independence measure between features. A fundamental hypothesis widely used in probabilistic models for text categorization (TC), namely that the occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, and a feature selection algorithm based on it is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent between features, and so satisfies the basic hypothesis to the maximum degree. Compared with other traditional feature selection methods in TC (which take only relevance into account), the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.
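The trade-off between relevance to the category and independence between features can be sketched with a greedy selection loop. The abstract does not define the paper's independence measure, so this sketch substitutes mutual information for both relevance and dependence (a relevance-minus-redundancy heuristic); the function names and the toy term-occurrence data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(columns, labels, k):
    """Greedily pick up to k features with high class relevance and low mutual dependence."""
    relevance = {name: mutual_information(col, labels) for name, col in columns.items()}
    selected = []
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for name, col in columns.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                # Average dependence on the features chosen so far.
                redundancy = sum(
                    mutual_information(col, columns[s]) for s in selected
                ) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best, best_score = name, score
        selected.append(best)
    return selected


# Toy binary term-occurrence columns over eight documents (assumed data).
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
columns = {
    "ball": [1, 1, 1, 0, 0, 0, 0, 0],
    "game": [1, 1, 1, 0, 0, 0, 0, 0],  # duplicates "ball", so it is fully redundant
    "bank": [0, 1, 0, 0, 1, 1, 1, 0],
}
chosen = select_features(columns, labels, 2)
print(chosen)
```

A relevance-only ranking would pick "ball" and "game" here, although the second adds no information; penalizing dependence makes the weaker but nearly independent term "bank" the second choice.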
Abstract: This paper summarizes several automatic text categorization algorithms in common use, and analyzes and compares their advantages and disadvantages. It provides guidance for choosing appropriate automatic classification algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out.
Key words: text categorization; naive Bayes; KNN; SVM; neural network
CLC number: TP 391
Foundation item: Supported by the National Natural Science Foundation of China (70031010) and the Research Foundation of Beijing Institute of Technology
Biography: SHI Yong-feng (1980-), male, Master candidate, research direction: web information mining.
Funding: Under the auspices of the National Science & Technology Pillar Program during the 11th Five-Year Plan Period (No. 2006BAD20B05)
Abstract: The scientific evidence that climate is changing due to greenhouse gas emissions is now incontestable, and this change may put many social, biological, and geophysical systems in the world at risk. In this paper, we first identified the main risks induced or aggravated by climate change. Then we categorized them by applying a new risk categorization system brought forward by Renn in the framework of the International Risk Governance Council. We proposed that "uncertainty" could be treated as the classification criterion. Based on this, we established a quantitative method with fuzzy set theory, in which "confidence" and "likelihood", the main quantitative terms for expressing uncertainties in IPCC, were used as the feature parameters to construct the fuzzy membership functions of four risk types. According to the maximum principle, most of the climate change risks identified were classified into the appropriate risk types. Meanwhile, given that not all the quantitative terms are available, a qualitative approach was also adopted as a complementary classification method. Finally, we obtained preliminary results of climate change risk categorization, which might lay the foundation for future integrated risk management of climate change.
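Classification by the maximum (membership) principle can be sketched as follows. The abstract gives neither the membership functions nor the mapping from "confidence" and "likelihood" to risk types, so everything concrete here is an assumption: both inputs are taken as scaled to [0, 1], the membership functions are triangular, the four types are given placeholder names, and a risk's membership in a type is taken as the minimum of its two component memberships.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over a [0, 1] scale.
LOW = (-0.5, 0.0, 1.0)
HIGH = (0.0, 1.0, 1.5)

# Hypothetical risk types, each defined by a (confidence set, likelihood set) pair.
RISK_TYPES = {
    "type_1": (HIGH, HIGH),
    "type_2": (HIGH, LOW),
    "type_3": (LOW, HIGH),
    "type_4": (LOW, LOW),
}


def classify(confidence, likelihood):
    """Return (risk type, membership) with the largest membership degree."""
    memberships = {
        name: min(triangular(confidence, *conf_set), triangular(likelihood, *like_set))
        for name, (conf_set, like_set) in RISK_TYPES.items()
    }
    best = max(memberships, key=memberships.get)
    return best, memberships[best]


best_type, degree = classify(confidence=0.9, likelihood=0.3)
print(best_type, round(degree, 2))
```

When one of the two inputs is unavailable, no membership degree can be computed this way, which is consistent with the abstract's use of a complementary qualitative approach.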
Abstract: To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and thus has considerable side effects on the overall performance of TC algorithms. On the basis of the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of features without any information loss, which can greatly improve both the efficiency and the performance of algorithms. The experiments show that the new algorithm achieves both much higher performance and much higher efficiency than some other classical TC algorithms.
Abstract: A study examining affective information processing in persons with multiple sclerosis (MS) and healthy adults was carried out. It was hypothesized that individual characteristics could modulate participants' emotional categorization and reaction times for categorization decisions. For example, individuals with a negatively valenced emotional profile (e.g. anxious) should choose negative emotional alternatives faster and more frequently. Participants came from two different populations: 80 right-handed healthy French speakers and 40 right-handed French speakers with multiple sclerosis. The results showed a positive correlation between a high level of negative emotional sensitivity and emotional categorization (decision and decision speed) for affective information presented on the right side of the screen. For all participants, there were more frequent emotional choices and faster decisions for emotional alternatives presented on the left side. It seems that individuals' emotional differences, in general and in MS populations, modulate the hemispheric asymmetry of processing emotional judgments.
Funding: Supported by the National Key Basic Research Program of China (No. 2013CB035503)
Abstract: The ability to achieve a semantic understanding of workspaces is an important capability for mobile robots. A method is proposed to categorize different places in a typical indoor environment by using a Kinect sensor for mobile robot exploration. At first, an invariant feature-based image stitching approach is adopted to form a panoramic image from Kinect visual information, and Kinect depth information is converted into obstacle distance information to obtain virtual LIDAR data. Then, a semantic classifier is designed by using convolutional neural networks (CNN) for indoor place categorization based on Kinect visual observations with a panoramic view. At last, a frontier-based exploration method, which integrates the CNN-based categorization approach, is applied to carry out autonomous indoor exploration by mobile robots. The proposed method has been implemented and tested on a real robot, and experimental results demonstrate the effectiveness of the approach in solving the semantic categorization problem for mobile robot exploration.
Abstract: The concept of word classes (parts of speech) has always generated controversy among linguists. The earlier Prescriptive and Descriptive Schools might have set the pace for this controversy, but the present dilemma is much deeper. Learners and even teachers are sometimes in a quandary as to how to prove that a particular word belongs to a particular class. This is because a word may sometimes belong to several classes depending on context, as with the word "watch". This paper therefore tries to provide answers to the problem of word class classification by using morphological and syntactic evidence to prove that English words follow a particular range of inflections, belong to strictly ordered categories, and do not change their class arbitrarily. This is in line with the natural perfect order of homogeneity in creation, which precludes a species from merging effectively with another species without undergoing some fundamental changes. Other variables were also looked into, and it was concluded that teachers and learners alike can rely on this sub-categorization approach as a reliable paradigm for their assumptions concerning word classes.
Abstract: This paper is intended to show that conceptual categorization can be used to understand a text by reconstructing the semantic categories through which the author's meaning is conveyed, and it proposes an alternative way to look into reading comprehension. It is proposed that categorization can be taken as an alternative approach to second/foreign language reading instruction. That is, while reading comprehension is defined in terms of the ability to recognize the inclusion and membership properties of contextually determined semantic categories in a text, the learner needs to arrange the events, actions, or concepts into a structured unit, both horizontally and vertically. Categorization theory will be introduced in relation to Rosch's famous studies (1973, 1975), examples taken from a graded reader will illustrate how to identify items with category structure, and finally issues that are not addressed in this paper will be discussed.
Abstract: A hierarchical system to perform automatic categorization and reorientation of images using content analysis is presented. The proposed system first categorizes images into a priori defined categories using rotation-invariant features. At the second stage, it detects their correct orientation out of {0°, 90°, 180°, 270°} using a category-specific model. The system has been specially designed for embedded device applications and uses only low-level color and edge features. Machine learning algorithms optimized to suit embedded implementation, such as support vector machines (SVMs) and scalable boosting, have been used to develop classifiers for categorization and orientation detection. Results are presented on a collection of about 7000 consumer images collected from open resources. The proposed system finds applications in various digital media products and brings pattern recognition solutions to the consumer electronics domain.
Abstract: In this paper, the role of rare or infrequent terms in enhancing the accuracy of English text categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms on the accuracy of PN-based text categorization, different term reduction criteria as well as different term weighting schemes were tested on the Reuters Corpus using PNs. Each term weighting scheme on each reduced term set was tested twice, once keeping the rare terms and once removing them. All the experiments conducted in this research show that keeping rare terms substantially improves the performance of Polynomial Networks in text categorization, regardless of the term reduction method, the number of terms used in classification, or the term weighting scheme adopted.
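The experimental contrast described here (the same weighting scheme applied once with rare terms kept and once with them removed) can be sketched as below. The abstract does not name the weighting schemes or the rarity threshold, so TF-IDF and a minimum document frequency (`min_df`) are assumptions, as are the function name and the toy documents; the Polynomial Network classifier itself is not modeled.

```python
import math
from collections import Counter


def tfidf_vectors(docs, min_df=1):
    """Return (vocabulary, sparse TF-IDF vectors), dropping terms seen in fewer than min_df docs."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(term for term, count in df.items() if count >= min_df)
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Raw term frequency times log inverse document frequency.
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in vocab if t in tf})
    return vocab, vectors


# Toy tokenized documents (assumed data).
docs = [
    ["oil", "price", "opec"],
    ["oil", "price"],
    ["wheat", "price", "harvest"],
]
vocab_with_rare, vectors_with_rare = tfidf_vectors(docs, min_df=1)
vocab_without_rare, vectors_without_rare = tfidf_vectors(docs, min_df=2)
print(vocab_with_rare)
print(vocab_without_rare)
```

In this toy corpus the terms that occur in a single document ("opec", "wheat", "harvest") are exactly the ones that separate the documents, which illustrates why discarding rare terms can remove discriminative information.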
Abstract: In the current study, behavioral measures were used to investigate clothing color. The purpose was to examine how color brightness influences positive-negative emotional categorization. Results showed that the effect of brightness on the emotional categorization of clothing color was significant. With increasing brightness, the variation curve of positive emotion appears to be U-shaped, whereas that of negative emotion is an inverted U. Compared with low-brightness colors, the emotional reaction to high-brightness colors was more positive. Most of the colors at different brightness levels were classified as positive emotions and a minority were classified as negative emotions; positive colors were categorized much faster than negative ones.