Using a model reading passage, this paper proposes, for the first time, a three-layer passage reading method (3LPRM for short) for English reading: reading beyond paragraphs; reading within paragraphs and sentences; and reading with attention to the coherence of words and expressions. The paper presents the definition, contents, characteristics, and application of the method, which is easy to master and apply. It aims to give readers, especially beginners in English as a second/foreign language (E/S/FL), a practical passage reading method that improves their reading efficiency. The paper has practical significance and guiding value.
Document processing in natural language includes retrieval, sentiment analysis, theme extraction, etc. Classical methods for these tasks are based on probabilistic models, semantic models, and network models for machine learning. Probabilistic models inherently lose semantic information, which reduces processing accuracy. Machine learning approaches may be supervised, unsupervised, or semi-supervised; both semantic models and supervised learning require labeled corpora. A reliably labeled corpus has to be built manually, which is costly and time-consuming because people must read and annotate every document. Recently, the continuous bag-of-words (CBOW) model has proved efficient for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. The model can easily be extended to learn paragraph vectors, but the resulting vectors are not precise. To address these problems, this paper develops a new deep learning model for learning paragraph vectors that combines the CBOW model with convolutional neural networks (CNNs). Experimental results show that the paragraph vectors generated by the new model are better than those generated by the CBOW model in semantic relatedness and accuracy.
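The abstract does not say how CBOW is extended to paragraphs. As a rough illustration only, one simple baseline is to average the word vectors of a paragraph; the sketch below assumes that baseline, and the function name, toy vocabulary, and two-dimensional vectors are all hypothetical rather than taken from the paper.

```python
def average_paragraph_vector(tokens, word_vectors):
    """Average the vectors of all known tokens (an assumed baseline, not the paper's model)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no token has a vector, so no paragraph vector can be formed
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical word vectors; a real CBOW model would learn these from a corpus.
word_vectors = {
    "document": [1.0, 0.0],
    "vector": [0.0, 1.0],
    "model": [1.0, 1.0],
}

paragraph = "a document vector model".split()
vector = average_paragraph_vector(paragraph, word_vectors)
print(vector)
```

Averaging discards word order, which is one plausible reason such vectors would be imprecise and why a convolutional layer over the word sequence might help.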
A complete English paragraph consists of five parts: an opening sentence, a topic sentence, supporting sentences, a concluding sentence, and sometimes a transitional sentence. The paper explores the rules of paragraph construction and explains techniques for applying them. Its significance lies in supporting teaching and study in areas such as reading comprehension, speaking, and writing.
This paper argues for an overt, innovative shift in both praxis and classroom configuration in the ESOL (English for Speakers of Other Languages) writing class. It calls for a move away from the current focus on process-based pedagogies for newcomer populations toward explicit teaching of modeling strategies, with concomitant practice opportunities. It is argued that explicit, sequenced instruction in rhetorical structure and grammatical accuracy gives ESOL learners in the emerging stages of language learning a more concrete grasp of meaning, structure, and grammar in rhetorical construction. The proposed modeling strategies build rhetorical fluency and grammatical accuracy simultaneously through spotlighted, sequenced strategies that give learners practice in smaller chunks of composition, including but not limited to thesis statement writing, varied paragraph organization, multiple modes of exposition, and grammatical complexity, all in a bid to generate rhetorical depth and grammatical detail in writing. In short, both form and function need to be taught explicitly in the ESOL writing class, with adequate opportunities for rhetorical practice. It is argued that, given a meticulous blend of meaningful, authentic, and purposeful tasks combined with one-on-one instruction that incorporates a variety of visual and rhetorical modeling strategies, emerging writers can move from controlled to automatic writing fluency within a short time span.
The proposed pedagogy spotlights the specific learner rather than the writing process and entails a move away from traditional, teacher-fronted classrooms to targeted, workshop-centered configurations that allow for one-on-one conferencing in the ESOL writing class. The visually rendered modeling strategies proposed in this paper support writing instruction for ESOL students that is learner-responsive, relevant, and practical.
Distant supervision can generate a huge amount of training data. Recently, multi-instance multi-label learning has been introduced into distant supervision to combat noisy data and improve relation extraction performance. However, multi-instance multi-label learning uses only hidden variables when inferring relations between entities, so it cannot make full use of the training data. In addition, traditional lexical and syntactic features are poor at reflecting domain knowledge and the global information of a sentence, which limits system performance. This paper presents a novel approach to multi-instance multi-label learning that draws on the idea of fuzzy classification. We use cluster centers as training data, which lets us make adequate use of sentence-level features. We also extend the feature set with paragraph vectors, which carry the semantic information of sentences. We conduct an extensive empirical study to verify our contributions. The results show that our method is superior to the state-of-the-art distantly supervised baseline.
This essay analyzes the stylistic features of Hemingway’s In Another Country from a linguistic point of view. In contrast to qualitative analysis, the first paragraph, which is one of the rare depictions of the environment in Hemingway’s works, is chosen as a sample for quantitative analysis with AntConc. The data show that Hemingway prefers words of Anglo-Saxon origin and makes use of repetition and polysemy, revealing his “iceberg principle” of writing.
Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods concentrate on extracting rich features from image regions and ignore the modelling of visual relationships. In this paper, we propose a novel method that generates a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network that selects relevant features and produces a set of topic vectors, which are then used to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, which is currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
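The step of enriching object features through a graph convolution over the scene graph can be sketched as follows. This is a minimal sketch under stated assumptions: mean aggregation over each node and its neighbors, followed by a linear projection and ReLU. The paper's actual GCN formulation is not given in the abstract; the object names, feature values, and identity weight matrix are hypothetical, and edges are treated as undirected for simplicity.

```python
def gcn_layer(features, edges, weight):
    """One graph-convolution step: mean over self and neighbors, linear map, ReLU."""
    n = len(features)
    in_dim = len(features[0])
    out_dim = len(weight[0])

    # Each node is its own neighbor (self-loop); edges are treated as undirected.
    neighbors = {i: {i} for i in range(n)}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    output = []
    for i in range(n):
        group = neighbors[i]
        # Mean of the features of node i and its neighbors.
        agg = [sum(features[j][d] for j in group) / len(group) for d in range(in_dim)]
        # Linear projection with the weight matrix (in_dim x out_dim).
        projected = [sum(agg[d] * weight[d][o] for d in range(in_dim)) for o in range(out_dim)]
        # ReLU non-linearity.
        output.append([max(0.0, x) for x in projected])
    return output


# Hypothetical scene graph: man -(riding)- horse -(on)- field.
objects = ["man", "horse", "field"]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned weight matrix

enriched = gcn_layer(features, edges, identity)
for name, vec in zip(objects, enriched):
    print(name, vec)
```

After one step, each object's vector mixes in information from the objects it is related to, which is the sense in which relationships are encoded implicitly.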
The risk classification of BBS posts is important for evaluating the level of societal risk within a given period. Using posts collected from the Tianya forum as the data source, the authors adopt societal risk indicators from social psychology and conduct document-level, multi-class societal risk classification of BBS posts. To capture the semantics and word order of documents effectively, the shallow neural network Paragraph Vector is applied to obtain distributed vector representations of the posts. Based on the document vectors, the authors apply the k-nearest neighbor (kNN) classification method to identify the societal risk category of each post. The experimental results reveal that, in document-level societal risk classification, Paragraph Vector trains much faster than Bag-of-Words and improves F-measures by at least 10%. Furthermore, Paragraph Vector also outperforms edit distance and a Lucene-based search method. The present work is the first attempt to combine a document embedding method with social psychology research results in the area of public opinion.
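The classification step, kNN over document vectors, can be sketched as below. This is a minimal sketch assuming cosine similarity and majority voting; the abstract does not state the similarity measure or the value of k, and the two-dimensional vectors and the labels `risk_a` and `risk_b` are hypothetical stand-ins for Paragraph Vector embeddings and societal risk categories.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(query, train_vectors, train_labels, k=3):
    """Return the majority label among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical document vectors and risk categories.
train_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = ["risk_a", "risk_a", "risk_b", "risk_b"]

print(knn_predict([0.8, 0.3], train_vectors, train_labels, k=3))
```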
Societal risk classification is a fundamental and complex issue in societal risk perception. To conduct societal risk classification, Tianya Forum posts are selected as the data source, and four kinds of representations of BBS posts are applied: string representation, term-frequency representation, TF-IDF representation, and distributed representation. Using edit distance or cosine similarity as the distance metric, four k-Nearest Neighbor (kNN) classifiers based on the different representations are developed and compared. Owing to the advantages of the neural network model Paragraph Vector in capturing word order and extracting semantics, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) proves effective for societal risk classification. Furthermore, to improve classification performance, kNN-PV is combined with the other three kNN classifiers, each with its own weight, into an ensemble model. The optimal weights for the kNN classifiers are found by a brute-force grid search. The experimental results reveal that the ensemble method significantly improves Macro-F over kNN-PV for societal risk classification.
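The weighted ensemble and the brute-force grid search over weights can be sketched as follows. This is a minimal sketch under assumptions: each classifier is taken to output a per-label score for every document, weights are searched on a grid that sums to 1, and macro-F1 on a labeled validation set is the selection criterion. The abstract does not give the paper's exact combination rule or grid, and the two toy classifiers and their scores are hypothetical (the paper combines four).

```python
from itertools import product


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores over the labels present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def ensemble_predict(votes_per_classifier, weights):
    """Combine per-label scores from each classifier with the given weights."""
    predictions = []
    for i in range(len(votes_per_classifier[0])):
        combined = {}
        for votes, weight in zip(votes_per_classifier, weights):
            for label, score in votes[i].items():
                combined[label] = combined.get(label, 0.0) + weight * score
        # Ties are broken by alphabetical label order.
        predictions.append(max(sorted(combined), key=combined.get))
    return predictions


def grid_search_weights(votes_per_classifier, y_true, steps=10):
    """Try every weight combination on a grid summing to 1; keep the best macro-F1."""
    best_weights, best_score = None, -1.0
    for combo in product(range(steps + 1), repeat=len(votes_per_classifier)):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        score = macro_f1(y_true, ensemble_predict(votes_per_classifier, weights))
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Hypothetical validation labels and per-document label scores from two classifiers.
y_true = ["a", "b", "a"]
votes_1 = [{"a": 0.7, "b": 0.3}, {"a": 0.7, "b": 0.3}, {"a": 0.9, "b": 0.1}]
votes_2 = [{"a": 0.4, "b": 0.6}, {"a": 0.1, "b": 0.9}, {"a": 0.4, "b": 0.6}]

best_weights, best_score = grid_search_weights([votes_1, votes_2], y_true)
print(best_weights, best_score)
```

In this toy setup neither classifier is right on every document by itself, but a suitable weighting is, which mirrors the abstract's claim that the ensemble improves Macro-F over a single classifier. The search cost grows quickly with the number of classifiers and grid steps, so brute force is only practical for a handful of classifiers.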
Rhetorical features of Chinese writers’ essays have been studied for decades, but inconsistent interpretations of deduction and induction have led to controversial results. Taking a comparative rhetoric perspective, this paper clarifies the notions of deduction and induction and investigates which rhetorical features, besides deduction and induction, characterize Chinese expository paragraphs and whether Chinese EFL learners’ English paragraphs have similar features. Two kinds of data sources were used: 29 full-score Chinese expositions from College Entrance Examinations and 29 English expositions written by Chinese EFL learners. The results show that deduction is preferred in both Chinese and EFL writing, and that rhetorical paragraphs and coordinate paragraphs are particular to Chinese writing, while the EFL learners’ paragraphs display hybrid rhetoric such as semi-coordination. It is concluded that neither Chinese paragraphs nor EFL ones are similar to the modern English rhetorical paradigm, and that English rhetoric instruction will facilitate the introspection of the two kinds of rhetoric.
Funding (paper on the combined CBOW and CNN paragraph vector model): The authors would like to thank all anonymous reviewers for their suggestions and feedback. This work was supported by the National Natural Science Foundation of China (Nos. 61379052, 61379103), the National Key Research and Development Program (2016YFB1000101), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. 14JJ1026), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015).
Funding (paper on English paragraph construction): The paper is funded by the following projects of East China University of Science and Technology: JGS01201001, YS80222301901001, YS0125322, and YS50222361903001.
Funding (paper on image paragraph generation): Supported in part by the National Natural Science Foundation of China (Nos. 61721004, 61976214, 62076078, and 62176246).
Funding (paper on societal risk classification with Paragraph Vector and kNN): Supported by the National Natural Science Foundation of China under Grant Nos. 71171187, 71371107, and 61473284.
Funding (paper on the kNN ensemble for societal risk classification): This study is supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000902 and the National Natural Science Foundation of China under Grant Nos. 61473284, 71601023, and 71371107.
Funding (paper on rhetorical features of Chinese and EFL paragraphs): Supported by the Social Science Department of the Ministry of Education of China (Grant No. 16YJA740022).