Funding: The authors would like to thank all anonymous reviewers for their suggestions and feedback. This work was supported by the National Natural Science Foundation of China (Nos. 61379052, 61379103), the National Key Research and Development Program (2016YFB1000101), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. 14JJ1026), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015).
Abstract: Document processing in natural language includes retrieval, sentiment analysis, theme extraction, and other tasks. Classical methods for these tasks are based on probability models, semantic models, and machine learning networks. Probability models inherently lose semantic information, which reduces processing accuracy. Machine learning approaches may be supervised, unsupervised, or semi-supervised; semantic models and supervised learning require labeled corpora. A reliably labeled corpus has to be produced manually, which is costly and time-consuming because people must read and annotate every document. Recently, the continuous bag-of-words (CBOW) model has proved efficient for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. The model can easily be extended to learn paragraph vectors, but the result is not precise. To address these problems, this paper develops a new model for learning paragraph vectors that combines the CBOW model with convolutional neural networks (CNNs) in a new deep learning model. Experimental results show that paragraph vectors generated by the new model are better than those generated by the CBOW model in semantic relatedness and accuracy.
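As a rough illustration of the CBOW-style baseline mentioned in this abstract, the sketch below averages word vectors to obtain a paragraph vector. It is not the combined CBOW and CNN model the paper proposes; the toy embeddings and the names `EMBEDDINGS`, `mean_vector`, and `cosine` are assumptions made up for the example. Plain averaging discards word order, which is one well-known limitation of such baselines; the abstract does not say whether this is the imprecision the authors have in mind.

```python
import math
from typing import Dict, List

# Toy 3-dimensional embeddings; the values are invented for illustration only.
EMBEDDINGS: Dict[str, List[float]] = {
    "deep": [1.0, 0.0, 0.0],
    "learning": [0.0, 1.0, 0.0],
    "model": [0.0, 0.0, 1.0],
}


def mean_vector(tokens: List[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of the known tokens (CBOW-style averaging)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two "paragraphs" with the same words in a different order.
first = mean_vector(["deep", "learning"], EMBEDDINGS)
second = mean_vector(["learning", "deep"], EMBEDDINGS)
```

Here `first` and `second` are identical vectors, so an averaging baseline cannot tell the two word orders apart.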
Funding: This paper is funded by the following projects of East China University of Science and Technology: JGS01201001, YS80222301901001, YS0125322, and YS50222361903001.
Abstract: A complete English paragraph consists of five parts: an opening sentence, a topic sentence, supporting sentences, a concluding sentence, and sometimes a transitional sentence. This paper explores the rules of paragraph construction and explains techniques for applying them. The work is significant because it supports teaching and study in areas such as reading comprehension, speaking, and writing.
Abstract: Distant supervision can generate a huge amount of training data. Recently, multi-instance multi-label learning has been introduced into distant supervision to combat noisy data and improve the performance of relation extraction. However, multi-instance multi-label learning uses only hidden variables when inferring the relation between entities, and so cannot make full use of the training data. In addition, traditional lexical and syntactic features are poor at reflecting domain knowledge and the global information of a sentence, which limits system performance. This paper presents a novel approach to multi-instance multi-label learning that borrows the idea of fuzzy classification. We use cluster centers as training data, which lets us make adequate use of sentence-level features. We also extend the feature set with paragraph vectors, which carry the semantic information of sentences. We conduct an extensive empirical study to verify our contributions. The results show that our method is superior to the state-of-the-art distantly supervised baseline.
Abstract: This essay analyzes the stylistic features of Hemingway’s In Another Country from a linguistic point of view. In contrast to a qualitative analysis, the first paragraph, which is one of the rare depictions of the environment in Hemingway’s works, is chosen as a sample for quantitative analysis with AntConc. The data show that Hemingway prefers words of Anglo-Saxon origin and makes use of repetition and polysemy, revealing his “iceberg principle” of writing.
Funding: This work was supported in part by the National Natural Science Foundation of China (Nos. 61721004, 61976214, 62076078 and 62176246).
Abstract: Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods concentrate on extracting rich features from image regions and ignore the modelling of visual relationships. In this paper, we propose a novel method that generates a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network that selects relevant features and produces a set of topic vectors, which are then used to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, which is currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
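To make the graph convolution step in this abstract more concrete, here is a minimal sketch of one generic GCN layer applied to a toy scene graph. It is not the authors' architecture: the graph, the feature values, and the names `matmul`, `normalize_adjacency`, and `gcn_layer` are assumptions for illustration. The sketch also treats edges as undirected and unlabeled and uses simple row normalization, whereas the paper works with typed relationships between objects.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then divide each row by its sum (mean aggregation)."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    return [[value / sum(row) for value in row] for row in with_loops]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One propagation step: ReLU(normalized_adjacency @ features @ weights)."""
    aggregated = matmul(normalize_adjacency(adj), features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Hypothetical scene graph: node 0 = "man", node 1 = "horse", node 2 = "field",
# with edges man-horse and horse-field.
ADJ: Matrix = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
FEATURES: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # invented object features
IDENTITY: Matrix = [[1.0, 0.0], [0.0, 1.0]]  # stands in for learned weights

enriched = gcn_layer(ADJ, FEATURES, IDENTITY)
```

After one layer, each row of `enriched` mixes an object's own features with those of its neighbours in the graph, which is the sense in which a GCN can enrich object features with relationship context.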
Abstract: This paper presents, for the first time and with a model reading passage, a three-layer passage reading method (3LPRM for short) for English reading: reading beyond paragraphs; reading within paragraphs and sentences; and reading with attention to the coherence of words and expressions. The paper sets out the definition, contents, characteristics, and application of the method, which is easy to master and apply. It aims to give readers, especially beginners in E/S/FL (English as a second/foreign language), a practical passage reading method that improves their reading efficiency. The paper has some practical significance and value as guidance.