Municipalities are autonomous economic and administrative entities with common actions and responsibilities. At the same time, they differ considerably in specific characteristics such as geography, demography, and economy. The aim of this research is to separate the entire sample of Municipalities in Greece into effective and ineffective ones, based on the effectiveness of their financial management and financial performance. Cluster analysis was used to separate the sample into groups, with three variables: the lending capacity of the Municipality, flexibility in making non-investment costs, and flexibility in investment spending. These three variables were considered to be the key dimensions of effectiveness in financial management, so together they describe the effectiveness of Greek Municipalities representatively. The paper presents a literature review of the financial effectiveness of Municipalities and the methodology of an empirical study based on a structured questionnaire sent to the entire population of Greek Municipalities, which is characterized by considerable heterogeneity. It investigates the views of Mayors in the two categories of Municipalities (effective and ineffective financial management and financial performance) regarding (a) the biggest problems faced by the citizens in their Municipality, and (b) the biggest personnel problems faced by their Municipality. In conclusion, the prioritization of both kinds of problems seems to be the same for both groups of Municipalities. The frequency of responses differs slightly, but the differences are not large enough for financial performance to be considered to affect respondents' opinions.
Marx's hermeneutics introduced the concept of praxis into the basic dimension of all understanding and interpretation, and thus accomplished the "Copernican Revolution" in the history of hermeneutics. This means that we cannot understand and interpret human existential practical activities from the perspective of idealistic texts, but should understand and interpret the idealistic texts from the perspective of human existential practical activities. In this way, Marx's hermeneutics of praxis has pointed out the general direction for the development of hermeneutics.
Funding: The National Natural Science Foundation of China (No. 60672056); Open Fund of MOE-MS Key Laboratory of Multimedia Computing and Communication (No. 06120809)
Abstract: To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal topic concept sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in these sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization based on text content can improve clustering precision.
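The role of initialization can be illustrated with a minimal fuzzy c-means sketch that takes its starting centers as an argument instead of drawing them at random. It is not the TCS2FCM algorithm itself: how the centers are derived from concept vectors is not specified in the abstract, and the function name `fcm`, the fuzzifier `m = 2.0`, and the iteration count are assumptions made for illustration.

```python
import math

def fcm(points, centers, m=2.0, iters=50):
    """Fuzzy c-means that starts from caller-supplied centers (no random init)."""
    centers = [list(c) for c in centers]
    u = []
    for _ in range(iters):
        # Membership update: u[i][j] is proportional to d(i, j) ** (-2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [x ** (-2.0 / (m - 1.0)) for x in d]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of the points weighted by u[i][j] ** m.
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            centers[j] = [
                sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                for k in range(len(points[0]))
            ]
    return centers, u
```

Calling `fcm(points, initial_centers)` with centers chosen from document content, rather than random ones, is the kind of substitution the abstract describes; each row of the returned membership matrix sums to 1.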
Funding: The National Natural Science Foundation of China (No. 60373099); the Natural Science Foundation for Young Scholars of Northeast Normal University (No. 20061005)
Abstract: To improve document clustering results and the selection among them, ontology semantics are combined with document clustering. A new document clustering algorithm that uses WordNet in the document processing phase is proposed. First, after the documents are represented by tf-idf, every word vector is extended with new entities. Then a feature extraction algorithm is applied to the documents. Finally, an ontology aggregation clustering (OAC) algorithm is proposed to improve the result of document clustering. Experiments are based on the Reuters 20 News Group data set, and the experimental results are compared with those obtained by mutual information (MI). The conclusion is that the proposed ontology-based document clustering algorithm performs better than other existing clustering algorithms such as MNB, CLUTO, and co-clustering.
Funding: The Young Teachers Scientific Research Foundation (YTSRF) of Nanjing University of Science and Technology, 2005-2006.
Abstract: A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure is proposed that is based on the cosine vector measure and keyword frequency in documents, together with an input ontology. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology. By means of this semantic knowledge, the ontology can improve the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
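One way to picture a cosine measure that also takes ontology keywords into account is sketched below. The abstract does not give the actual formula, so the multiplicative `keyword_weight` mapping (keyword to importance) and the function name are hypothetical.

```python
import math
from collections import Counter

def weighted_cosine(doc_a, doc_b, keyword_weight=None):
    """Cosine similarity of term-frequency vectors; listed keywords are up-weighted."""
    keyword_weight = keyword_weight or {}

    def vec(tokens):
        # Term frequency times the (assumed) ontology importance, default 1.0.
        tf = Counter(tokens)
        return {t: n * keyword_weight.get(t, 1.0) for t, n in tf.items()}

    a, b = vec(doc_a), vec(doc_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

With no weights this reduces to the standard cosine measure, which makes the two easy to compare on the same pair of documents.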
Abstract: Different texts call for different translation strategies. This paper presents a case study of lines from the Disney animated film Mulan in the light of the functional concept of translation. It concludes that, in film translation, the translator is allowed to make some adaptations to make the translated version more appropriate in the original context, and that the translator should have both "semantic awareness" and "functional awareness" in order to achieve functional equivalence between the SL text and the TL text.
Funding: The National Natural Science Foundation of China (No. 60473045); the Technology Research Project of Hebei Province (No. 05213573); the Research Plan of Education Office of Hebei Province (No. 2004406)
Abstract: The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To deal with this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space in storing a large set of rules when many words are repeated across the rules. Unlike ordinary classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which words appear in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of sub-trees led by words that do not appear in the text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
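The pruning idea (skip every sub-tree led by a word that is absent from the text) can be sketched with a plain prefix tree of rules. This is a simplification under stated assumptions: rules here are crisp word sets without the fuzzy sets an FCR-tree stores, and the dictionary-based node layout and function names are hypothetical.

```python
def build_rule_tree(rules):
    """rules: iterable of (words, label). Words are sorted so shared prefixes merge."""
    root = {"children": {}, "labels": []}
    for words, label in rules:
        node = root
        for w in sorted(set(words)):
            node = node["children"].setdefault(w, {"children": {}, "labels": []})
        node["labels"].append(label)
    return root

def matching_labels(tree, text_words):
    """Collect the labels of rules whose words all occur in the text."""
    present, found, stack = set(text_words), [], [tree]
    while stack:
        node = stack.pop()
        found.extend(node["labels"])
        # Descend only through words that occur in the text; other sub-trees are skipped.
        stack.extend(child for w, child in node["children"].items() if w in present)
    return found
```

Rules that share leading words are stored once, and a rule is reached only if every word on its path appears in the text, so no rule list has to be scanned in full.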
Funding: Project (KC18071) supported by the Application Foundation Research Program of Xuzhou, China; Projects (2017YFC0804401, 2017YFC0804409) supported by the National Key R&D Program of China
Abstract: The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify these data. To solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the programming model of resilient distributed datasets (RDD). For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and for the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA in key indicators such as speedup ratio and scalability. Spark-based parallel algorithms can therefore better meet the requirements of large-scale Chinese text data mining.
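Naive Bayes training parallelizes well because it only needs counts, and counts from separate data partitions can simply be added. The sketch below shows that map-and-reduce structure in plain Python; it does not use Spark or its RDD API, and the partition layout, function names, and Laplace smoothing are assumptions rather than details from the paper.

```python
import math
from collections import Counter
from functools import reduce

def count_partition(docs):
    """Map step: class counts and per-class word counts for one partition."""
    class_n, word_n = Counter(), {}
    for label, tokens in docs:
        class_n[label] += 1
        word_n.setdefault(label, Counter()).update(tokens)
    return class_n, word_n

def merge(a, b):
    """Reduce step: add the counts from two partitions."""
    class_n = a[0] + b[0]
    labels = set(a[1]) | set(b[1])
    word_n = {c: a[1].get(c, Counter()) + b[1].get(c, Counter()) for c in labels}
    return class_n, word_n

def train(partitions):
    return reduce(merge, map(count_partition, partitions))

def predict(model, tokens):
    class_n, word_n = model
    vocab = {w for counts in word_n.values() for w in counts}
    total_docs = sum(class_n.values())

    def score(c):
        denom = sum(word_n[c].values()) + len(vocab)  # Laplace smoothing
        prior = math.log(class_n[c] / total_docs)
        return prior + sum(math.log((word_n[c][w] + 1) / denom) for w in tokens)

    return max(class_n, key=score)
```

Because `merge` is associative, the partitions can be counted on different workers and combined in any order, which is the property a distributed implementation relies on.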
Funding: National Natural Science Foundation of China (No. 61562057); Gansu Science and Technology Plan Project (No. 18JR3RA104).
Abstract: With the development of the short-video industry, videos and bullet screens have become important channels for spreading public opinion. Sentiment analysis of bullet screens makes it possible to capture public attitudes in a timely way and eases the management of online public opinion. A convolutional neural network model based on multi-head attention is proposed to address the problem of effectively modeling relations among words and identifying key words in emotion classification tasks where texts are short and lack complete context. First, word positions are encoded so that the model can use the order information of input sequences. Second, a multi-head attention mechanism is used to obtain semantic representations in different subspaces, capture internal relevance, strengthen dependencies among words, and highlight the emotional weights of key emotional words. Then a dilated convolution is used to enlarge the receptive field and extract more features. On this basis, the multi-head attention mechanism is combined with a convolutional neural network to model and analyze seven emotional categories of bullet screens. Experiments from the perspectives of both model and dataset validate the effectiveness of the approach. Finally, the emotions of bullet screens are visualized to provide data support for hot-event control and other fields.
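The effect of dilation on the receptive field can be checked with a short calculation. For stacked one-dimensional convolutions with stride 1, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d. The kernel size and dilation rates below are illustrative only; the abstract does not state the values used in the model.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 1-D convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Three ordinary layers versus three layers with growing dilation, kernel size 3.
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

With the same number of layers and parameters, the dilated stack covers roughly twice as many input positions, which matters for short texts where every token of context counts.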
Abstract: Although researchers in the ATC field have done a wide range of work based on SVM, almost all existing approaches rely on empirical model selection algorithms. Attempts at automatic model selection in practical, large-scale text classification systems have been limited. In this paper, we propose a new model selection algorithm that utilizes the DDAG learning architecture. This architecture yields a new large-scale text classifier with very good performance. Experimental results show that the proposed algorithm is efficient and has the necessary generalization capability when handling large-scale multi-class text classification tasks.
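Assuming DDAG refers to the decision directed acyclic graph scheme for combining pairwise binary classifiers, its evaluation step can be sketched as follows. The callback `prefers` stands in for a trained pairwise SVM and is hypothetical, as is the function name; the paper's model selection algorithm is not reproduced here.

```python
def ddag_predict(classes, prefers, x):
    """Evaluate a decision DAG: every pairwise test rules out one class.

    prefers(a, b, x) is a hypothetical pairwise classifier that returns a or b.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if prefers(first, last, x) == first:
            remaining.pop()     # "last" is ruled out
        else:
            remaining.pop(0)    # "first" is ruled out
    return remaining[0]
```

For N classes this evaluates N - 1 pairwise classifiers per document instead of all N(N - 1)/2, which is why the scheme is attractive for large-scale multi-class tasks.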
Abstract: In text classification, labeling documents is a tedious and costly task that consumes a lot of expert time. On the other hand, it is usually easier to obtain a large number of unlabeled documents with the help of tools such as digital libraries, crawler programs, and search engines. To learn a text classifier from labeled and unlabeled examples, a novel fuzzy method is proposed. First, a Seeded Fuzzy c-means Clustering algorithm is proposed to learn fuzzy clusters from a set of labeled and unlabeled examples. Second, based on the resulting fuzzy clusters, examples with high confidence are selected to construct a training data set. Finally, the constructed training data set is used to train a Fuzzy Support Vector Machine (FSVM) and obtain the text classifier. Empirical results on two benchmark datasets indicate that, by incorporating unlabeled examples into the learning process, the method performs significantly better than an FSVM trained with only a small number of labeled examples. The proposed method also performs at least as well as the related method, EM with Naive Bayes. One advantage of the proposed method is that it does not rely on any parametric assumptions about the data, as is usually the case with the generative methods widely used in semi-supervised learning.
Abstract: Support vector machines have met with significant success in the information retrieval field, especially in handling text classification tasks. Although various performance estimators for SVMs have been proposed, they focus only on accuracy, based on the leave-one-out cross-validation procedure. Information-retrieval-related performance measures are generally neglected in kernel learning methodology. In this paper, we propose a set of information-retrieval-oriented performance estimators for SVMs, which are based on the span bound of the leave-one-out procedure. Experiments show that the proposed estimators are both effective and stable.
Abstract: The paper is devoted to quantitative methods in linguistics and describes two studies that were conducted. The purpose is to give a general idea of these studies. The first considers one of the principal logical categories, quality. The research was based on lexicographical resources and concludes with a text study. The second dwells on the use of the typological indices method in research on comparatives and superlatives in English, German, and Russian texts. As a result, the prospects of this method in linguistics can be observed.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61100205 and No. 60873001; the Hi-Tech Research and Development Program of China under Grant No. 2011AA010705; and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212
Abstract: Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which make it harder to obtain effective features. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and term clustering with multiple strategies, to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering works better than term clustering in coping with document content. However, the best overall performance is obtained by spectral clustering combined with document clustering. Moreover, because images appear in the same webpage as the document content, the proposed method is also applied to extract meaningful terms for images, and the experimental results show that this too is effective in improving webpage classification.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61170269, No. 61170272, No. 61202082, and No. 61003285, and the Fundamental Research Funds for the Central Universities under Grants No. BUPT2013RC0308 and No. BUPT2013RC0311.
Abstract: Group distance coding is suitable for secret communication hidden in printed documents. However, there is no effective method against it. This study finds that the hiding method makes the group distances of text lines converge on specified values and reduces the variance of group distances among N-Window text lines. Inspired by this finding, the study proposes a Support Vector Machine (SVM) based steganalysis algorithm. To avoid the disturbance caused by large differences in word length within the same line, only samples whose occurrence frequencies are within ±10 dB of the maximum frequency are retained. The results show that the correct rate of the SVM classifier is higher than 90%.
Abstract: The N+N nominal sentence is an important structure type among nominal sentences in Mandarin Chinese. Its main structure types are attributive-center, combination, apposition, and subject-predicate. Across the main literary genres, the distribution of N+N nominal sentences shows a dominance hierarchy: poem > drama > novel > prose. Whatever the literary genre, the attributive-center structure is the most frequent type, while the appositive structure is the least frequent. Statistical results indicate that most N+N nominal sentences are nominal and that their use is limited by genre. The function of the N+N nominal sentence is textual. In discourse, it can serve as theme, as rheme, or in the dual identity of theme and rheme, according to the theory of the Theme-Rheme (T-R) structure pattern. It not only constructs the information structure that delivers textual information, but is also a vital means of discourse cohesion and coherence.
Abstract: This paper proposes a method for automatically constructing an effective domain ontology. The main idea is to use Formal Concept Analysis to establish the domain ontology automatically. The ontology then serves as the basis for a Naive Bayes classifier in order to demonstrate its effectiveness for document classification. A total of 1752 documents divided into 10 categories are used to assess the ontology, with 1252 training documents and 500 testing documents. The F1-measure is used as the assessment criterion, and the following three results are obtained. The average recall of the Naive Bayes classifier is 0.94, so its recall performance based on the automatically constructed ontology is excellent. The average precision is 0.81, so its precision performance is good. The average F1-measure over the 10 categories is 0.86, so the classifier is effective in terms of the F1-measure. Thus, the automatically constructed domain ontology can indeed serve as the document categories and achieve effective document classification.
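Note: the F1-measure used above as the assessment criterion is the harmonic mean of precision and recall. The following is a minimal sketch of that computation, not the authors' evaluation code; the function names `f1_score` and `macro_f1` are illustrative assumptions, and only the averages 0.81 and 0.94 come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(pairs):
    """Average of per-category F1 values over (precision, recall) pairs."""
    return sum(f1_score(p, r) for p, r in pairs) / len(pairs)


# Harmonic mean of the average precision and recall reported in the abstract.
print(round(f1_score(0.81, 0.94), 2))
```

The harmonic mean of the two reported averages is about 0.87, which need not coincide with the reported 0.86: an average of per-category F1 values (as `macro_f1` computes) is generally not equal to the F1 of the averaged precision and recall.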
Abstract: This paper explores the possibility that conceptual categorization can be used to understand a text by reconstructing the semantic categories through which the author's meaning is conveyed, and proposes an alternative way to look at reading comprehension. It is proposed that categorization can be taken as an alternative approach to second/foreign language reading instruction. That is, if reading comprehension is defined in terms of the ability to recognize the inclusion and membership properties of contextually determined semantic categories in a text, the learner needs to arrange the events, actions, or concepts into a structured unit, both horizontally and vertically. Categorization theory is introduced in relation to Rosch's famous studies (1973, 1975), examples taken from a graded reader illustrate how to identify items with category structure, and finally issues not addressed in this paper are discussed.
Abstract: Municipalities are autonomous economic and administrative entities with common actions and responsibilities. At the same time, Municipalities differ considerably in specific characteristics, such as geography, demography, and economy. The aim of this research is to separate the entire sample of Municipalities in Greece into two categories, effective and ineffective, based on the effectiveness of their financial management and financial performance. Cluster analysis was used to separate the sample into groups, with three variables: the lending capacity of the Municipality, flexibility in making non-investment costs, and flexibility in investment spending. These three variables were considered to be the key dimensions of effectiveness in financial management, so their use representatively describes the effectiveness of Greek Municipalities. The paper presents a literature review of the financial effectiveness of Municipalities and the methodology of an empirical study based on a structured questionnaire sent to the entire population of Greek Municipalities, which is characterized by considerable heterogeneity. It investigates the views of Mayors in the two categories of Municipalities (effective and ineffective financial management and financial performance) regarding: (a) the biggest problems faced by the citizens in their Municipality, and (b) the biggest personnel problems faced by their Municipality. In conclusion, the prioritization of both kinds of problems appears to be the same for both groups of Municipalities. The frequency of responses differs slightly, but the differences are not large enough for financial performance to be considered to affect respondents' opinions.
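Note: the abstract above does not say which cluster analysis method was used. As one possible illustration only, the sketch below splits observations described by the three variables into two groups with plain k-means (k = 2), an assumption made here. The function name `two_means`, the choice of starting centers, and all data values are hypothetical and made up for the example.

```python
from math import dist


def two_means(points, iterations=20):
    """Plain k-means with k=2; centers start at the first and last points."""
    centers = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min((0, 1), key=lambda k: dist(p, centers[k])) for p in points]
        # Recompute each center as the mean of the points assigned to it.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Made-up scores in [0, 1], one tuple per hypothetical Municipality:
# (lending capacity, non-investment cost flexibility, investment flexibility)
municipalities = [
    (0.9, 0.8, 0.7),
    (0.8, 0.9, 0.8),
    (0.2, 0.1, 0.3),
    (0.1, 0.2, 0.2),
]
labels, centers = two_means(municipalities)
print(labels)
```

With real data the variables would normally be standardized first, since k-means is sensitive to differences in scale between variables.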
Abstract: Marx's hermeneutics introduced the concept of praxis into the basic dimension of all understanding and interpretation, and thus accomplished the "Copernican Revolution" in the history of hermeneutics. This means that we cannot understand and interpret human existential practical activities from the perspective of idealistic texts, but should understand and interpret idealistic texts from the perspective of human existential practical activities. In this way, Marx's hermeneutics of praxis has shown us the general direction of the development of hermeneutics.