Abstract: By exploring a characteristics-based project planning model for agricultural sightseeing gardens, this paper analyzed and summarized the development objectives of the Taiwan Peasants Innovation Park in Gucheng Town, Gaochun County, Nanjing City. A feasibility study was conducted on the park's programs, relevant infrastructure, and overall landscape construction, from which reasoned findings and suggestions were put forward. The study provides a conceptual reference model that could be applied to the construction of agricultural sightseeing theme parks and characteristic agricultural sightseeing destinations in the Nanjing region.
Abstract: Tourism-oriented characteristic towns are a main part of China's tourism development. They make up the largest proportion of characteristic towns and have the greatest development potential and value. Taking Jiangwan Town of Jiangxi Province as an example, this paper analyzed the factors influencing the development of tourism-oriented characteristic towns, including natural resources, regional conditions, market demand, and perceived value. It refined and built the theme of Jiangwan Town: ecological, historical, and cultural tourism; a scenic spot of Fujian, Zhejiang, and Anhui provinces; and experience-type farm culture and traditional ancient culture tourism.
Abstract: Document classification is widely applied in many scientific areas and academic environments, using NLP techniques and term extraction algorithms such as CValue, TfIdf, TermEx, GlossEx, Weirdness, and the like. Nevertheless, these algorithms are weak at extracting the most important terms when the input text has not been grammatically corrected or contains non-alphabetic, mathematical, or chemical notation, and at cross-domain inference of terms and phrases. In this paper, we propose a novel text-categorization and term-extraction method based on a human expert's choice of categories. Papers serve as the training material for the proposed algorithm. They have already been labeled with predefined, field-specific scientific categories by a human expert, preferably one with extensive experience, research, and surveys in the field. Our approach then extracts the (concept) terms of the labeled papers in each category and assigns them all to that category. Test papers are then categorized by extracting their terms and comparing them with each category's terms. In addition, our approach produces semantically enabled outputs that are useful for many purposes, such as complementing the knowledge bases and data sets of the Linked Data cloud and querying them semantically with languages such as SPARQL. Furthermore, the URIs contained in the ontological outputs make it easy to find the topic or class assigned to a classified paper. Experimental results comparing the proposed method, LPTC, with five well-known term extraction algorithms on precision and recall show that our approach achieves effective categorization. In other words, LPTC is significantly superior to CValue, TfIdf, TermEx, GlossEx, and Weirdness in the target study. We also conclude that the more papers are used for training, the higher the precision.
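The train-then-compare flow described in this abstract can be illustrated with a minimal Python sketch. It is not the LPTC algorithm itself: the term extractor here is a deliberately naive stand-in (lowercase word tokens minus a tiny stop-word list), and the function names, sample papers, and category labels are all hypothetical.

```python
import re

# Assumption: a tiny illustrative stop-word list, not taken from the paper.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on", "with"}

def extract_terms(text):
    """Stand-in term extractor: lowercase word tokens minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def train(labeled_papers):
    """Assign the terms of every expert-labeled paper to its category."""
    category_terms = {}
    for text, category in labeled_papers:
        category_terms.setdefault(category, set()).update(extract_terms(text))
    return category_terms

def categorize(text, category_terms):
    """Pick the category whose term set overlaps most with the paper's terms."""
    terms = extract_terms(text)
    return max(category_terms, key=lambda c: len(terms & category_terms[c]))

# Hypothetical expert-labeled training papers.
training = [
    ("convolutional neural network for image recognition", "machine_learning"),
    ("gradient descent training of deep network", "machine_learning"),
    ("enzyme reaction rate in organic chemistry", "chemistry"),
    ("synthesis of organic polymer compounds", "chemistry"),
]
model = train(training)
predicted = categorize("deep neural network training", model)
print(predicted)
```

Plain set overlap is only one possible comparison; the abstract does not specify how test-paper terms are scored against category terms.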
Abstract: Theme is the point of departure of a message, and it is a textual phenomenon. It can help explain the comprehension and production of texts. This paper first makes a thematic analysis of an English original text and its Chinese target text, and then compares them in terms of their textual properties. It finds that, in reproducing the target text, theme equivalence not only helps to achieve coherence in the target text with the same effect on receptors but also saves translation effort. It is unnecessary to change the theme-rheme arrangement of the original unless social and cultural differences between English and Chinese require it.
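Abstract: Traditional cataloging classification and rule-matching methods suffer from low efficiency, excessive dependence on expert knowledge, a lack of deep mining of the semantics of ancient texts themselves, fuzzy boundaries between cataloging topics, and difficulty in accurately recommending domain topics for ancient texts. To address this, this paper draws on the characteristics of ancient-book corpora to explore how to accurately recommend text topic content that meets researchers' needs, in order to further advance digital humanities research. First, ancient-book corpus data previously annotated by our research group are selected for topic-category labeling and view classification. Second, a semantic mining model is built that fuses the BERT (bidirectional encoder representation from transformers) pre-trained model, an improved convolutional neural network, a recurrent neural network, and a multi-head attention mechanism. Finally, a “subject-relation-object” multi-view semantic enhancement model is incorporated to build the DJ-TextRCNN (DianJi-recurrent convolutional neural networks for text classification) model, which mines the semantics of classical texts at a finer granularity, at a deeper level, and along more dimensions. The results show that DJ-TextRCNN achieves the best accuracy on the ancient-book topic recommendation task under every view. Under the “subject-relation-object” view, precision reaches 88.54%, providing an initial realization of accurate topic recommendation for ancient texts and offering some guidance for deep, fine-grained semantic mining of Chinese culture.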
Funding: Fundamental Research Funds for the Central Universities, China (No. 2232018D3-17).
Abstract: Text makes up most of the resources on the Internet, which places ever higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model, bidirectional encoder representation from transformers-hierarchical attention networks-dilated convolutions networks (BERT_HAN_DCN), built on the BERT pre-trained model, which has a superior ability to extract features. It combines the advantages of the HAN and DCN models, which help capture rich semantic information by fusing contextual semantic features with hierarchical characteristics. Second, the traditional softmax algorithm increases the difficulty of learning samples of the same class, making similar features harder to distinguish. For this reason, AM-softmax is introduced to replace the traditional softmax. Finally, the fused model is validated on two datasets, where it shows superior accuracy and F1-score; the experimental analysis shows that it outperforms general single models such as HAN and DCN based on the BERT pre-trained model. In addition, the improved AM-softmax network model is superior to the general softmax network model.
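As a rough illustration of the AM-softmax substitution mentioned in this abstract, the sketch below computes the additive-margin softmax loss for one sample in plain Python. It follows the commonly published form of the loss, in which the target class logit is s·(cos θ − m) and the other logits are s·cos θ; the scale s = 30 and margin m = 0.35 are commonly cited defaults and are assumptions here, as are the function names and the toy vectors. It does not reproduce the BERT_HAN_DCN network.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def am_softmax_loss(feature, class_weights, target, s=30.0, m=0.35):
    """Additive-margin softmax loss for a single sample.

    s (scale) and m (margin) are assumed defaults, not values from the paper.
    """
    cosines = [cosine(feature, w) for w in class_weights]
    # Subtract the margin from the target class only, then scale all logits.
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

# Toy 2-D feature and two class weight vectors (hypothetical values).
feature = [0.6, 0.4]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_plain = am_softmax_loss(feature, class_weights, target=0, m=0.0)
loss_margin = am_softmax_loss(feature, class_weights, target=0, m=0.35)
print(loss_plain, loss_margin)
```

With m = 0 the function reduces to a scaled cosine softmax loss; a positive margin raises the loss for the same sample, which is what pushes same-class features closer together during training.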
Abstract: The development of applications based on social network text is in full swing. Studying text features and classification is of great value for extracting important information. This paper introduces common feature selection algorithms and feature representation methods; the basic principles, advantages, and disadvantages of SVM and KNN; and the evaluation metrics for classification algorithms. For the mutual information feature selection function, it describes the processing flow, shortcomings, and optimizations. To address its weakness in not balancing positively and negatively correlated features, a balance weight attribute factor and a feature difference factor are introduced. The experimental section describes the specific process: word segmentation, stop-word removal, application of various feature selection algorithms, including the optimized mutual information, and TF-IDF weighting. Under the two classification algorithms, SVM and KNN, we compare the merits and demerits of all the feature selection algorithms according to the evaluation metrics. Experiments show that the optimized mutual information feature selection performs well and that it performs better under SVM than under KNN, which supports its validity.
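For readers unfamiliar with mutual information feature selection, a minimal Python sketch of the baseline criterion follows. It uses the standard pointwise form MI(t, c) = log(P(t, c) / (P(t) P(c))) estimated from document counts. The balance weight attribute factor and feature difference factor mentioned in the abstract are not specified there and are not reproduced; the function names and the toy corpus are hypothetical.

```python
import math

def mutual_information(docs, labels, term, category):
    """Pointwise MI between a term and a category, from document counts."""
    n = len(docs)
    both = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    term_count = sum(1 for d in docs if term in d)
    cat_count = sum(1 for y in labels if y == category)
    if both == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log(both * n / (term_count * cat_count))

def select_features(docs, labels, category, k):
    """Return the k terms with the highest MI for the category (ties by name)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t, category), t))
    return ranked[:k]

# Toy corpus: each document is a set of already-segmented tokens.
docs = [
    {"goal", "match", "team"},
    {"team", "score", "goal"},
    {"stock", "market", "team"},
    {"market", "price", "stock"},
]
labels = ["sports", "sports", "finance", "finance"]
top_terms = select_features(docs, labels, "sports", k=2)
print(top_terms)
```

In this toy corpus "team" scores lower than "goal" because it also appears in a finance document, which is the kind of discrimination the criterion is meant to capture.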
Funding: This project was supported under the Fundamental Research Grant Scheme (FRGS), grant FRGS/1/2019/ICT02/UKM/02/9, entitled “Convolution Neural Network Enhancement Based on Adaptive Convexity and Regularization Functions for Fake Video Analytics”. The grant was received by Prof. Assis. Dr. S.N.H. Sheikh Abdullah, https://www.ukm.my/spifper/research_news/instrumentfunds.
Abstract: Extracting text from images using traditional image-collection techniques and machine-learning pattern recognition is time-consuming because of the number of features extracted from the images. Deep neural networks offer effective solutions for extracting text features from images using a few techniques, and they can be trained on large image datasets with significant results. This study proposes using Dual Maxpooling and concatenated convolutional neural network (CNN) layers with the activation functions ReLU and Optimized Leaky ReLU (OLRelu). The proposed method divides the word image into slices that contain characters, then passes them to deep learning layers to extract feature maps and re-form the predicted words. Bidirectional Long Short-Term Memory (BiLSTM) layers extract more compelling features and link the time sequence from the forward and backward directions during the training phase. The Connectionist Temporal Classification (CTC) function calculates the training and validation loss rates, and also decodes the extracted features to re-form characters and link them according to their time sequence. Model performance is evaluated using training and validation loss errors on the Mjsynth and Integrated Argument Mining Tasks (IAM) datasets. On IAM, the average loss error was 2.09% with the proposed Dual Maxpooling and OLRelu. On Mjsynth, the best validation loss rate shrank to 2.2% with concatenated CNN layers and ReLU.
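The decoding role of CTC described in this abstract can be sketched with greedy (best-path) decoding in plain Python: take the most likely symbol at each time step, collapse consecutive repeats, and drop the blank symbol. This covers only the decoding side, not the CTC loss or the CNN/BiLSTM layers; the blank index, the alphabet, and the helper names are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks.

    Assumption: index `blank` of the alphabet is the CTC blank symbol.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)

def one_hot_frames(symbols, alphabet):
    """Helper for the demo: build per-frame distributions peaked on each symbol."""
    frames = []
    for sym in symbols:
        frames.append([0.9 if a == sym else 0.1 / (len(alphabet) - 1) for a in alphabet])
    return frames

alphabet = ["-", "a", "c", "l", "t"]  # "-" stands for the blank
word_cat = ctc_greedy_decode(one_hot_frames("cc-aa-t", alphabet), alphabet)
word_all = ctc_greedy_decode(one_hot_frames("al-l", alphabet), alphabet)
print(word_cat, word_all)
```

The second example shows why the blank matters: a blank frame between two identical characters keeps them from being merged into one.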