17 articles found
Parallel naive Bayes algorithm for large-scale Chinese text classification based on Spark (cited by: 21)
1
Authors: LIU Peng, ZHAO Hui-han, TENG Jia-yu, YANG Yan-yan, LIU Ya-feng, ZHU Zong-wei. Journal of Central South University (SCIE, EI, CAS, CSCD), 2019, No. 1, pp. 1-12 (12 pages)
The sharp increase of the amount of Internet Chinese text data has significantly prolonged the processing time of classification on these data. In order to solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel memory computing platform for big data. This algorithm has implemented parallel operation throughout the entire training and prediction process of naive Bayes classifier mainly by adopting the programming model of resilient distributed datasets (RDD). For comparison, a PNBA based on Hadoop is also implemented. The test results show that in the same computing environment and for the same text sets, the Spark PNBA is obviously superior to the Hadoop PNBA in terms of key indicators such as speedup ratio and scalability. Therefore, Spark-based parallel algorithms can better meet the requirement of large-scale Chinese text data mining.
Keywords: Chinese text classification; naive Bayes; Spark; Hadoop; resilient distributed dataset; parallelization
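The abstract above describes running naive Bayes training in parallel over resilient distributed datasets. As a rough illustration only, and not the authors' PNBA, the counting step of training can be pictured as a map over data partitions followed by a reduce that merges the partial counts. The sketch below uses plain Python rather than Spark; the function names and the `(label, terms)` document layout are assumptions made for this example.

```python
from collections import Counter
from functools import reduce


def count_partition(docs):
    """Map step: count labels and (label, term) pairs inside one partition.

    `docs` is a list of (label, terms) pairs, where `terms` is a list of
    already-segmented words (a hypothetical input layout).
    """
    term_counts, label_counts = Counter(), Counter()
    for label, terms in docs:
        label_counts[label] += 1
        for term in terms:
            term_counts[(label, term)] += 1
    return term_counts, label_counts


def merge(a, b):
    """Reduce step: add the partial counts of two partitions."""
    return a[0] + b[0], a[1] + b[1]


def train_counts(partitions):
    """Combine per-partition counts into global counts."""
    return reduce(merge, map(count_partition, partitions), (Counter(), Counter()))


partitions = [
    [("sports", ["球", "赛"])],
    [("sports", ["球"]), ("finance", ["股"])],
]
term_counts, label_counts = train_counts(partitions)
```

Because the merge is a plain addition of counters, the partitions can be counted in any order, which is the property a distributed engine would rely on. Class priors and smoothed term likelihoods would then be derived from `label_counts` and `term_counts`.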
Supervised Contrastive Learning with Term Weighting for Improving Chinese Text Classification
2
Authors: Jiabao Guo, Bo Zhao, Hui Liu, Yifan Liu, Qian Zhong. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 1, pp. 59-68 (10 pages)
With the rapid growth of information retrieval technology, Chinese text classification, which is the basis of information content security, has become a widely discussed topic. In view of the huge difference compared with English, Chinese text task is more complex in semantic information representations. However, most existing Chinese text classification approaches typically regard feature representation and feature selection as the key points, but fail to take into account the learning strategy that adapts to the task. Besides, these approaches compress the Chinese word into a representation vector, without considering the distribution of the term among the categories of interest. In order to improve the effect of Chinese text classification, a unified method, called Supervised Contrastive Learning with Term Weighting (SCL-TW), is proposed in this paper. Supervised contrastive learning makes full use of a large amount of unlabeled data to improve model stability. In SCL-TW, we calculate the score of term weighting to optimize the process of data augmentation of Chinese text. Subsequently, the transformed features are fed into a temporal convolution network to conduct feature representation. Experimental verifications are conducted on two Chinese benchmark datasets. The results demonstrate that SCL-TW outperforms other advanced Chinese text classification approaches by an amazing margin.
Keywords: Chinese text classification; Supervised Contrastive Learning (SCL); Term Weighting (TW); Temporal Convolution Network (TCN)
An Optimized Chinese Filtering Model Using Value Scale Extended Text Vector
3
Authors: Siyu Lu, Ligao Cai, Zhixin Liu, Shan Liu, Bo Yang, Lirong Yin, Mingzhe Liu, Wenfeng Zheng. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 11, pp. 1881-1899 (19 pages)
With the development of Internet technology, the explosive growth of Internet information presentation has led to difficulty in filtering effective information. Finding a model with high accuracy for text classification has become a critical problem to be solved by text filtering, especially for Chinese texts. This paper selected the manually calibrated Douban movie website comment data for research. First, a text filtering model based on the BP neural network has been built. Second, based on the Term Frequency-Inverse Document Frequency (TF-IDF) vector space model and the doc2vec method, the text word frequency vector and the text semantic vector were obtained respectively, and the text word frequency vector was linearly reduced by the Principal Component Analysis (PCA) method. Third, the text word frequency vector after dimensionality reduction and the text semantic vector were combined, the text value degree was added, and the text synthesis vector was constructed. Experiments show that the model combined with text word frequency vector degree after dimensionality reduction, text semantic vector, and text value has reached the highest accuracy of 84.67%.
Keywords: Chinese text filtering; text vector; word frequency vectors; text semantic vectors; value degree; BP neural network; TF-IDF; doc2vec; PCA
Research and Analysis of Grammatical Error Correction Technology for Chinese Documents
4
Authors: Wei Jin, Feng Jiang, Xiulai Wang, Ningling Ma, Yutao Zhang. Journal of Computer and Communications, 2024, No. 8, pp. 202-223 (22 pages)
With the widespread use of Chinese globally, the number of Chinese learners has been increasing, leading to various grammatical errors among beginners. Additionally, as domestic efforts to develop industrial information grow, electronic documents have also proliferated. When dealing with numerous electronic documents and texts written by Chinese beginners, manually written texts often contain hidden grammatical errors, posing a significant challenge to traditional manual proofreading. Correcting these grammatical errors is crucial to ensure fluency and readability. However, certain special types of text grammar or logical errors can have a huge impact, and manually proofreading a large number of texts individually is clearly impractical. Consequently, research on text error correction techniques has garnered significant attention in recent years. The advent and advancement of deep learning have paved the way for sequence-to-sequence learning methods to be extensively applied to the task of text error correction. This paper presents a comprehensive analysis of Chinese text grammar error correction technology, elaborates on its current research status, discusses existing problems, proposes preliminary solutions, and conducts experiments using judicial documents as an example. The aim is to provide a feasible research approach for Chinese text error correction technology.
Keywords: Chinese text Error; Judicial Documents; Neural Network; Deep Learning; Transformer
Sentiment Analysis for Chinese Text Based on Emotion Degree Lexicon and Cognitive Theories (cited by: 2)
5
Authors: 武星, 吕海涛, 卓少剑. Journal of Shanghai Jiaotong University (Science) (EI), 2015, No. 1, pp. 1-6 (6 pages)
The mass data of social media and social networks generated by users play an important role in tracking users' sentiments and opinions online. A good polarity lexicon which can effectively improve the classification results of sentiment analysis is indispensable to analyze the user's sentiments. Inspired by social cognitive theories, we combine basic emotion value lexicon and social evidence lexicon to improve traditional polarity lexicon. The proposed method obtains significant improvement in Chinese text sentiment analysis by using the proposed lexicon and new syntactic analysis method.
Keywords: Chinese text sentiment analysis; emotion lexicon; social cognitive theory; emotion tendency
Non-Independent Term Selection for Chinese Text Categorization (cited by: 2)
6
Authors: 李景阳, 孙茂松. Tsinghua Science and Technology (SCIE, EI, CAS), 2009, No. 1, pp. 113-120 (8 pages)
Chinese text categorization differs from English text categorization due to its much larger term set (of words or character n-grams), which results in very slow training and working of modern high-performance classifiers. This study assumes that this high-dimensionality problem is related to the redundancy in the term set, which cannot be solved by traditional term selection methods. A greedy algorithm framework named "non-independent term selection" is presented, which reduces the redundancy according to string-level correlations. Several preliminary implementations of this idea are demonstrated. Experiment results show that a good tradeoff can be reached between the performance and the size of the term set.
Keywords: Chinese text categorization; term selection; dimensionality
Intercultural Communication and Chinese Tourism Texts Translation
7
Author: 梁靖华. 英语广场(学术研究), 2012, No. 8, pp. 57-59 (3 pages)
Intercultural communication language plays a crucial role in our global tourism. When we are doing translation we are doing intercultural communication in a sense, so it is necessary for translators to have intercultural communication awareness and be sensitive to the cultural elements in translation. Taking the perspective of intercultural communication, this paper analyses the cultural elements in Chinese tourism material translation in terms of culturally-loaded words and terms, and presents certain translation techniques a translator can use to deal with culturally-loaded words in their translation.
Keywords: Culturally-loaded words; Intercultural communication; Chinese tourism texts; Translation
Chinese News Text Classification Based on Convolutional Neural Network (cited by: 1)
8
Authors: Hanxu Wang, Xin Li. Journal on Big Data, 2022, No. 1, pp. 41-60 (20 pages)
With the explosive growth of Internet text information, the task of text classification is more important. As a part of text classification, Chinese news text classification also plays an important role. In public security work, public opinion news classification is an important topic. Effective and accurate classification of public opinion news is a necessary prerequisite for relevant departments to grasp the situation of public opinion and control the trend of public opinion in time. This paper introduces a combined-convolutional neural network text classification model based on word2vec and improved TF-IDF: firstly, the word vector is trained through word2vec model, then the weight of each word is calculated by using the improved TF-IDF algorithm based on class frequency variance, and the word vector and weight are combined to construct the text vector representation. Finally, the combined-convolutional neural network is used to train and test the THUCNews data set. The results show that the classification effect of this model is better than the traditional Text-RNN model, the traditional Text-CNN model and word2vec-CNN model. The test accuracy is 97.56%, the accuracy rate is 97%, the recall rate is 97%, and the F1-score is 97%.
Keywords: Chinese news text classification; word2vec model; improved TF-IDF; combined-convolutional neural network; public opinion news
Digital Rights Management for a Chinese XML Text Centre
9
Author: Wai-man Wong (The Open University of Hong Kong Library, Hong Kong, China). 现代图书情报技术 (CSSCI, PKU Core), 2002, No. S1, pp. 172-177 (6 pages)
The Electronic Text Centre of the Open University of Hong Kong (OUHK) has been in full operation since early 2001. It currently houses 7,300+ electronic texts, including free electronic titles, electronic titles purchased directly from the market, and about 1,000 locally produced electronic titles. The locally produced titles are not available in the market but require local digitization and negotiation with publishers with regard to the right to use (RTU) them so as to meet the learning needs of the OUHK community. Nearl...
Keywords: text; Digital Rights Management for a Chinese XML Text Centre; XML
On Translation of Classical Chinese Literary Texts
10
Authors: Hu Shanshan, Pang Wenfang, Wang Diqiu. International English Education Research, 2014, No. 12, pp. 26-28 (3 pages)
Chinese classical literature is precious treasure of the world literature. In order to transmit and carry forward it, translation is an effective and necessary way, especially as the development of globalization and China's economy. This paper mainly discusses the history, difficulties, ways and skills on translation of classical Chinese literary texts.
Keywords: Translation of classical Chinese literary texts; History; Difficulties; Ways and Skills
English Translation Study of Chinese Tourism Texts Based on the Skopos Theory--A Case Study of Hubei Scenic-spot Translation
11
Author: 易威伟. 海外英语, 2018, No. 3, pp. 136-137 (2 pages)
Nowadays, China has witnessed vigorous development in tourism industry, and it has made a great contribution to Chinese economic growth. In order to draw more foreign tourists and demonstrate the unique charm and cultural deposits of Chinese landscapes, the translators should capitalize on appropriate translation methods so as to guarantee the translation quality. The thesis analyzes the guiding role of Skopos Theory in tourism texts with a lot of examples, taking the Hubei scenic-spot translation as a carrier, which has important guiding significance to translators.
Keywords: Skopos Theory; Chinese tourism texts; Translation methods
Multi-Label Chinese Comments Categorization: Comparison of Multi-Label Learning Algorithms (cited by: 4)
12
Authors: Jiahui He, Chaozhi Wang, Hongyu Wu, Leiming Yan, Christian Lu. Journal of New Media, 2019, No. 2, pp. 51-61 (11 pages)
Multi-label text categorization refers to the problem of categorizing text through a multi-label learning algorithm. Text classification for Asian languages such as Chinese is different from work for other languages such as English which use spaces to separate words. Before classifying text, it is necessary to perform a word segmentation operation to convert a continuous language into a list of separate words and then convert it into a vector of a certain dimension. Generally, multi-label learning algorithms can be divided into two categories, problem transformation methods and adapted algorithms. This work will use customer's comments about some hotels as a training data set, which contains labels for all aspects of the hotel evaluation, aiming to analyze and compare the performance of various multi-label learning algorithms on Chinese text classification. The experiment involves three basic methods of problem transformation methods: Support Vector Machine, Random Forest, k-Nearest-Neighbor; and one adapted algorithm of Convolutional Neural Network. The experimental results show that the Support Vector Machine has better performance.
Keywords: Multi-label classification; Chinese text classification; problem transformation; adapted algorithms
Hierarchical Classification of Chinese Documents Based on N-grams (cited by: 1)
13
Authors: Zhou Shui-geng (1), Guan Ji-hong (2), He Yan-xiang (2). Affiliations: (1) State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China; (2) School of Computer Science, Wuhan University, Wuhan 430072, China. Wuhan University Journal of Natural Sciences (CAS), 2001, No. Z1, pp. 416-422 (7 pages)
We explore the techniques of utilizing N-gram information to categorize Chinese text documents hierarchically so that the classifier can shake off the burden of large dictionaries and complex segmentation processing, and subsequently be domain and time independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based on N-grams can achieve satisfactory performance and outperforms the other traditional Chinese text classifiers.
Keywords: Chinese text classification; N-grams; feature selection; hierarchical classification
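The abstract above relies on N-gram features so that no dictionary or word segmenter is needed. A minimal sketch of that idea, assuming character-level N-grams and simple count features (the paper's actual feature definition and selection method are not given in this listing), might look like this. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text, sizes=(1, 2)):
    """Count the n-grams of every size in `sizes` for one document."""
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(text, n))
    return counts


bigrams = char_ngrams("中文文本", 2)
features = ngram_features("中文文本")
```

Since the features come straight from the character sequence, the same code applies to any domain or time period, which matches the independence the abstract claims. A classifier at each level of the hierarchy would consume such counts after feature selection.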
A Short Text Classification Model Based on Chinese Part-of-Speech Information and Mutual Learning
14
Authors: Yihe Deng, Zuxu Dai. 国际计算机前沿大会会议论文集 (EI), 2023, No. 2, pp. 330-343 (14 pages)
Short text classification is one of the common tasks in natural language processing. Short text contains less information, and there is still much room for improvement in the performance of short text classification models. This paper proposes a new short text classification model ML-BERT based on the idea of mutual learning. ML-BERT includes a BERT that only uses word vector information and a BERT that fuses word information and part-of-speech information and introduces a transmission flag to control the information transfer between the two BERTs to simulate the mutual learning process between the two models. Experimental results show that the ML-BERT model obtains a MAF1 score of 93.79% on the THUCNews dataset. Compared with the representative models Text-CNN, Text-RNN and BERT, the MAF1 score improves by 8.11%, 6.69% and 1.69%, respectively.
Keywords: Natural language processing; Neural network; Chinese short text classification; BERT; Mutual deep learning
Identifying Proper Names Based on Association Analysis
15
Authors: 张云涛, 龚玲. Journal of Shanghai Jiaotong University (Science) (EI), 2007, No. 5, pp. 559-562 (4 pages)
The issue of proper names recognition in Chinese text was discussed. An automatic approach based on association analysis to extract rules from corpus was presented. The method tries to discover rules relevant to external evidence by association analysis, without additional manual effort. These rules can be used to recognize the proper nouns in Chinese texts. The experimental result shows that our method is practical in some applications. Moreover, the method is language independent.
Keywords: named entity recognition; natural language processing; text processing; Chinese text; proper name
A Text Zero-Watermarking Algorithm Based on Chinese Phonetic Alphabets (cited by: 13)
16
Authors: ZHU Ping, XIANG Guangli, SONG Wenna, LI Ankang, ZHANG Yuexin, TAO Ran. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2016, No. 4, pp. 277-282 (6 pages)
The text watermarking is a feasible method to protect the copyright from being copied and tampered. In this paper, a text zero-watermarking algorithm is proposed based on the connection between the Chinese characters and the Chinese phonetic alphabets. According to the predefined interval threshold, the proposed algorithm extracts the characteristics of the text content by valuing on the basis of the custom of Chinese phonetic alphabets. After being chaotic transformed, the algorithm combines the text characteristics with the embedded watermarking information in the Chinese text. The experimental results show that the watermarking's capability of preventing tampering is up to 0.1%, which demonstrates the strong robustness and resistance to aggressive behavior of the algorithm.
Keywords: text watermarking; Chinese phonetic alphabets; chaotic transformation; text characteristic
The text design for continuous speech database of standard Chinese
17
Author: ZU Yiqing (Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100732). Chinese Journal of Acoustics, 1999, No. 1, pp. 56-69 (14 pages)
Well developed continuous speech recognition and synthesis systems demand a high quality continuous speech database which is compact and valid, and whose scientific design would benefit from incorporating linguistic and phonetic knowledge. It is argued that at the present stage the database should be limited to read speech. To describe those very complex variabilities in continuous speech, the following speech units are proposed: (1) 401 syllables without tone; (2) 415 inter-syllabic diphones; (3) 3035 inter-syllabic triphones; (4) 781 inter-syllabic final-initial structures. The 17 basic sentence patterns in standard Chinese are summarized to cover the most important prosodic phenomena. By using the automatic method, 2393 sentences and 388 phrases are selected by above phonetic rules from a large corpus, which includes People's Daily in recent years, TV play scripts and dictionary entries, as the reading text of continuous speech recognition database in standard Chinese. This set of sentences and phrases covers 99.8% syllables without counting tones, 100% inter-syllable diphones, 99.6% inter-syllable triphones and 100% sentence patterns.
Keywords: The text design for continuous speech database of standard Chinese