Sentence classification is the process of categorizing a sentence based on its context. It relies more on semantic features than tasks such as dependency parsing, which relies more on syntactic features. Most existing strategies focus on the general semantics of a conversation without considering the context of each sentence, recognizing the progress of the conversation, or comparing impacts. Here, an ensemble of pre-trained language models is used to classify the sentences of a conversation corpus into four categories: information, question, directive, and commission. The resulting label sequences are used to analyze the progress of a conversation and to predict its pecking order. The ensemble combines Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models, which are trained on the conversation corpus, with hyperparameter tuning carried out to improve sentence classification performance. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models, and the ensemble model with fine-tuned parameters achieved an F1 score of 0.88.
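As an illustration of how an ensemble like the one in the preceding abstract might combine its members, the sketch below averages per-class probabilities (soft voting) and picks the highest-scoring category. The abstract does not say how EPLM-HT merges its models, so soft voting, the function name `soft_vote`, and the example probabilities are all assumptions made for illustration.

```python
from typing import List

# The four categories named in the abstract.
LABELS = ["information", "question", "directive", "commission"]


def soft_vote(model_probs: List[List[float]]) -> str:
    """Average per-class probabilities across models and return the top label.

    Each inner list holds one model's probabilities, ordered as in LABELS.
    Soft voting is an assumed combination rule, not the study's stated one.
    """
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    n_models = len(model_probs)
    averaged = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(len(LABELS))
    ]
    best_index = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best_index]


# Hypothetical outputs of three fine-tuned classifiers for one sentence.
example_probs = [
    [0.10, 0.70, 0.15, 0.05],
    [0.20, 0.50, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
predicted = soft_vote(example_probs)  # "question" has the highest mean probability
```

Averaging probabilities lets a confident model outweigh hesitant ones, whereas majority voting over hard labels would not; which rule suits a given ensemble is an empirical question.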
To convey a complete meaning, Chinese often uses multiple running sentences. Xu Jingning (2023, p. 66) states, “In communication, a complete expression of meaning often requires more than one clause, which is common in human languages.” Domestic research on running sentences covers the definition of the concept, structural features, sentence properties, sentence pattern classifications and their criteria, and issues in translating running sentences into English. This article focuses on scholarly research in China into the English translation of running sentences, highlighting recent achievements and identifying existing issues. A review of the literature on running sentence translation shows that current research on non-core running sentences is limited. This paper therefore proposes strategies to address this issue.
Current encoder-decoder video captioning models mainly rely on a single video input source. Few studies have used external corpus information to guide caption generation, which limits caption content and hinders accurate description and understanding of video content. To address this issue, this paper proposes a novel video captioning method guided by a sentence retrieval generation network (ED-SRG). First, a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model are integrated to construct an encoder-decoder, which extracts the 2D features, 3D features, and object features of the video data, respectively. These features are decoded to generate textual sentences that conform to the video content, for use in sentence retrieval. Then, a sentence-transformer network model retrieves sentences from an external corpus that are semantically similar to these textual sentences, and the candidate sentences are screened by similarity measurement. Finally, a novel model is constructed based on the GPT-2 network structure. It introduces a designed random selector that randomly selects high-probability predicted words in the corpus, which guides the generation of textual sentences that are closer to natural human language. The proposed method is compared experimentally with several existing works. The results show that BLEU-4, CIDEr, ROUGE_L, and METEOR improve by 3.1%, 1.3%, 0.3%, and 1.5% on the public dataset MSVD and by 1.3%, 0.5%, 0.2%, and 1.9% on the public dataset MSR-VTT, respectively. The proposed method thus generates video captions with richer semantics than several state-of-the-art approaches.
We use many devices in daily life to communicate with others. People exchange information through email, Facebook, Twitter, and many other social network sites. They lose valuable time on misspelling and retyping, and some are reluctant to type long sentences because of unnecessary words or grammatical issues. Word prediction systems therefore help people exchange textual information more quickly, easily, and comfortably. These systems predict the most probable next words and let the user choose the needed word from the suggestions, which helps the writer complete the sentence correctly. This research aims to predict the most suitable next word to complete a sentence in any given context, working on the Bangla language. We present a process that predicts the most probable and appropriate next words and suggests a complete sentence using the predicted words. A GRU-based RNN is trained on an N-gram dataset to develop the proposed model. We collected a large Bangla dataset from multiple sources and compared the model with other approaches, such as LSTM and Naive Bayes; the proposed approach is more accurate than the others. On average, the unigram model achieves 88.22% accuracy, the bigram model 99.24%, the trigram model 97.69%, and the 4-gram and 5-gram models 99.43% and 99.78%, respectively. We believe the proposed method can have a profound impact on Bangla search engines.
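The next-word suggestion step described in the preceding abstract can be illustrated with a much simpler stand-in. The sketch below uses plain bigram counts rather than the GRU-based RNN of the study, and a toy English corpus rather than Bangla data; the function names `train_bigram_counts` and `suggest_next` and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram_counts(sentences: List[str]) -> Dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev_word, next_word in zip(tokens, tokens[1:]):
            counts[prev_word][next_word] += 1
    return counts


def suggest_next(counts: Dict[str, Counter], prev_word: str, k: int = 3) -> List[str]:
    """Return up to k most frequent followers of prev_word (empty if unseen)."""
    followers = counts.get(prev_word, Counter())
    return [word for word, _ in followers.most_common(k)]


# Toy corpus, assumed purely for illustration.
toy_corpus = [
    "i like tea",
    "i like coffee",
    "i like tea too",
    "you drink tea",
]
bigram_counts = train_bigram_counts(toy_corpus)
suggestions = suggest_next(bigram_counts, "like", k=2)  # "tea" seen twice, "coffee" once
```

A count-based model only sees one preceding word here; a recurrent model such as a GRU conditions on a longer history, which is the motivation for the approach the abstract reports.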
Cognitive grammar, as a linguistic theory that emphasizes the relationship between language and thinking, provides a more comprehensive way to understand the structure, semantics, and cognitive processing of noun predicate sentences. Within the framework of cognitive grammar, this paper analyzes the semantic connection and cognitive process in noun predicate sentences from a semantic perspective using the method of example theory, and discusses the motivation for the formation of this construction, so as to provide a reference for in-depth analysis of the cognitive laws behind noun predicate sentences.
This paper discusses the structure and hierarchy of sentence-endings in the Tibetan language. Tibetan sentence-endings are hierarchical. They can be divided into two levels from the perspectives of structure, distribution, and expressive function. The first level comes after the predicate or verb phrase and indicates the category of tense/aspect/mood (TAM). The second level, which follows a self-sufficient sentence, mainly expresses the speaker's hint, inference, evaluation, and attitude toward the information. Each level includes several types of endings, which act on different syntactic categories or manifest different degrees of subjectivity. The lower the degree of correlation between the endings and the information of the self-sufficient sentence, the higher the corresponding semantic category and the speaker's subjective participation. Some lower-level endings can also express the grammatical meaning of the higher level in certain contexts, with increasing subjectivity.
Automatic partition of Chinese sentence groups is very important to discourse-based statistical machine translation systems. This paper presents an approach to this issue: first, each sentence in a discourse is expressed as a feature vector; second, a special hierarchical clustering algorithm is applied to represent the discourse as a sentence group tree. A local reoccurrence measure is proposed for selecting key phrases and evaluating their weights. Experimental results show that the approach is promising.
A periodic sentence is often identified as a sentence with the main clause as end-weight. In fact, this definition is so confusing that it causes the same confusion in practice. This paper rethinks the periodic sentence and advocates the adoption of noble styles, with the periodic sentence as their chief representative.
To deal with issues such as overly simple image features and word noise inference in product image sentence annotation, a product image sentence annotation model focusing on image feature learning and key word summarization is described. Three kernel descriptors (gradient, shape, and color) are extracted. Late feature fusion is then performed by a multiple kernel learning model to obtain more discriminative image features. The absolute rank and relative rank of the tag-rank model are used to boost the key words' weights. A new word integration algorithm named word sequence blocks building (WSBB) is designed to create N-gram word sequences. Sentences are generated according to the N-gram word sequences and predefined templates. Experimental results show that both the BLEU-1 and BLEU-2 scores of the sentences are superior to those of the state-of-the-art baselines.
In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a great deal of social information and opinions, but they are rarely fully analyzed, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to analyze the similarity of people's opinions on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model's performance. To improve accuracy, we propose a novel method of sentence similarity computation that extracts the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates, and objects of sentence pairs and use the Stanford Parser to classify the dependency relation triples, from which the syntactic and semantic similarity between two sentences is calculated. Finally, we verify the performance of the model on the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 pairs of training sentences and 1725 pairs of test sentences, most of which come from news in social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods in terms of the correlation coefficient and the mean deviation.
Since the opening-up policy was carried out in 1979, every aspect of social modernization has developed at high speed, and the need for English has increased year by year. The occasions and scope of its use have expanded with communication among countries. English has therefore become the common international language in our country, and translation plays an important and irreplaceable part in conveying information. This paper introduces the contrasts between English and Chinese sentences and discusses some skills for translating between them. Though it is neither complete nor authoritative, it may help readers understand the differences between the two languages and grasp some practical skills.
Sentence similarity computing plays an important role in machine question-answering systems, machine translation systems, information retrieval, and automatic abstracting systems. This article first summarizes several methods for calculating the similarity between sentences and then presents a new method that considers critical words, semantic information, sentential form, and sentence length. On this basis, an automatic abstracting system based on the LexRank algorithm is implemented, with several improvements in both sentence weight computing and redundancy resolution. The system can handle single-document and multi-document summarization in both English and Chinese. In evaluations on two corpora, the system produced better summaries to a certain degree. We also show that the system is quite insensitive to noise in the data that may result from imperfect topical clustering of documents. Finally, existing problems and development trends of automatic summarization technology are discussed.
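To make the LexRank idea in the preceding abstract concrete, the sketch below ranks sentences by running a damped power iteration over a cosine-similarity graph. It is a minimal version of continuous LexRank under stated assumptions: similarity is raw term-frequency cosine (no IDF, and none of the article's additional factors such as sentential form or sentence length), and the names `cosine`, `lexrank_scores`, the damping value, and the sample sentences are illustrative only.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(sentences: List[str], damping: float = 0.85,
                   iterations: int = 50) -> List[float]:
    """Score sentences by centrality in their similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    if n == 0:
        return []
    # Build the similarity matrix and normalise each row to sum to 1,
    # so it can be read as a random-walk transition matrix.
    transition = []
    for i in range(n):
        row = [cosine(bags[i], bags[j]) for j in range(n)]
        total = sum(row)
        transition.append([v / total for v in row] if total else [1.0 / n] * n)
    # Damped power iteration, starting from a uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(transition[j][i] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


# Hypothetical sentences: the middle one shares a word with each neighbour.
example_sentences = [
    "alignment uses dictionaries",
    "dictionaries support translation",
    "translation needs corpora",
]
example_scores = lexrank_scores(example_sentences)
most_central = example_sentences[example_scores.index(max(example_scores))]
```

A summarizer built on this would pick the highest-scoring sentences and then drop near-duplicates, which corresponds to the redundancy resolution step the abstract mentions.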
My investigation will serve two purposes. First, I shall investigate the function of the subclauses in the corpus in relation to their complexity, and I shall establish whether there is a correlation between sentence length and sentence complexity. Second, I shall analyse the complexity of the subclauses collected from the two sections and compare the results from these sections, focusing on finite subclauses and non-finite subclauses. I hope to be able to point out some differences in style between the news and sports sections concerning the use of subordinate clauses in various syntactic functions in order to examine how the choice of linguistic structures differs in different sections of The Times.
Many people have great difficulty understanding long, complex English sentences. It also seems obvious that people have difficulty using complete sentences. This paper aims to solve these problems by developing sentence sense and using a chart as an aid.
Since long English sentences contain many modifiers and have complicated syntactic structures, they are difficult for Chinese readers to understand, let alone translate. The paper adopts Nida's theory of functional equivalence as the guideline for translation, since it facilitates the communication of information. In terms of concrete methods, a long sentence should be decomposed into kernel sentences and reconstructed according to the conventions of standard Chinese. Only in this way can the translation be faithful, correct, and elegant.
Purpose: Move recognition in scientific abstracts is an NLP task of classifying the sentences of abstracts into different types of language units. To improve the performance of move recognition in scientific abstracts, a novel model is proposed that outperforms the BERT-based method. Design/methodology/approach: Prevalent BERT-based models for sentence classification often classify sentences without considering their context. Inspired by the BERT masked language model (MLM), we propose a novel model called the masked sentence model, which integrates the content and contextual information of sentences in move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps. We then compare our model with the HSLN-RNN, BERT-based, and SciBERT models on the same dataset. Findings: The F1 score of our model exceeds those of the BERT-based and SciBERT models by 4.96% and 4.34%, respectively, which shows the feasibility and effectiveness of the novel model; its result comes closest to the current state-of-the-art result of HSLN-RNN. Research limitations: The sequential features of move labels are not considered, which might be one of the reasons why HSLN-RNN performs better. Our model is restricted to biomedical English literature because we fine-tune it on a dataset from PubMed, a typical biomedical database. Practical implications: The proposed model is better and simpler at identifying move structures in scientific abstracts and is worth trying in text classification experiments that need to capture the contextual features of sentences. Originality/value: The study proposes a BERT-based masked sentence model that considers the contextual features of the sentences in abstracts in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural networks.
Purpose: To uncover peers' evaluations of the academic contribution of cited research papers from the content cited by citing papers, and to provide an evidence-based tool for evaluating the academic value of cited papers. Design/methodology/approach: CiteOpinion uses a deep learning model to automatically extract citing sentences from representative citing papers. Starting from an analysis of the citing sentences, it identifies the major academic contribution points of the cited paper, the positive and negative evaluations from citing authors, and the changes in the subjects of subsequent citing authors, by means of Recognizing Categories of Moves (problems, methods, conclusions, etc.), sentiment analysis, and topic clustering. Findings: Citing sentences in a citing paper contain substantial evidence useful for academic evaluation. They can also be used to objectively and authentically reveal the nature and degree of the cited paper's contribution as reflected by citation, beyond simple citation statistics. Practical implications: The evidence-based evaluation tool CiteOpinion can provide an objective and in-depth basis for evaluating the academic value of the representative papers of scientific researchers, research teams, and institutions. Originality/value: No similar practical tool was found in the papers retrieved. Research limitations: The full text of citing papers is difficult to acquire. The calculation based on the sentiment scores of citing sentences needs refinement. Currently, the tool is used only for academic contribution evaluation; its value in policy studies, technical application, and promotion of science has not yet been tested.
Parallel corpora are of great importance to machine translation, and automatic sentence alignment is the first step in processing them. This paper puts forward a bilingual-dictionary-based sentence alignment method for Chinese-English parallel corpora, which differs from previous length-based algorithms in its knowledge-rich approach. Experimental results show that the method achieves over 93% accuracy with ordinary English-Chinese dictionaries whose translations cover 31.88% to 47.90% of the corpus.
Funding (ED-SRG video captioning study): supported in part by the National Natural Science Foundation of China under Grants 62273272 and 61873277, in part by the Chinese Postdoctoral Science Foundation under Grant 2020M673446, in part by the Key Research and Development Program of Shaanxi Province under Grant 2023-YBGY-243, and in part by the Youth Innovation Team of Shaanxi Universities.
Funding (Tibetan sentence-endings study): sponsored by the Project of Research Planning Foundation on Humanities and Social Sciences of the Ministry of Education of China (No. 15YJA740018) and the China Scholarship Council (CSC).
Funding (Chinese sentence group partition study): National High Technology Research and Development Program of China (No. 2006AA01Z139); Young Natural Science Foundation of Fujian Province of China (No. 2008F3105); Natural Science Foundation of Fujian Province of China (No. 2006J0043); Fund of Key Research Project of Fujian Province of China (No. 2006H0038).
Funding (product image sentence annotation study): the National Natural Science Foundation of China (No. 61133012); the Humanity and Social Science Foundation of the Ministry of Education (No. 12YJCZH274); the Humanity and Social Science Foundation of Jiangxi Province (No. XW1502, TQ1503); the Science and Technology Project of Jiangxi Science and Technology Department (No. 20121BBG70050, 20142BBG70011).
Abstract: To deal with issues such as overly simple image features and word noise interference in product image sentence annotation, a product image sentence annotation model focusing on image feature learning and key word summarization is described. Three kernel descriptors (gradient, shape, and color) are extracted. Late fusion of the features is then performed by a multiple kernel learning model to obtain more discriminative image features. The absolute rank and relative rank of the tag-rank model are used to boost the key words' weights. A new word integration algorithm named word sequence blocks building (WSBB) is designed to create N-gram word sequences. Sentences are generated according to the N-gram word sequences and predefined templates. Experimental results show that both the BLEU-1 and BLEU-2 scores of the sentences are superior to those of the state-of-the-art baselines.
Funding: Supported by the Major Scientific and Technological Projects of CNPC under Grant ZD2019-183-006; partially supported by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2020MF006; and partially supported by the Fundamental Research Funds for the Central Universities of China University of Petroleum (East China) under Grants 20CX05017A and 18CX02139A.
Abstract: In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a lot of social information and opinions. However, they are rarely fully analyzed, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to analyze the similarity of people's opinions on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model's performance. To improve accuracy, we propose a novel method of sentence similarity computation that extracts the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates and objects of sentence pairs and use the Stanford Parser to classify the dependency relation triples in order to calculate the syntactic and semantic similarity between two sentences. Finally, we verify the performance of the model with the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 pairs of training sentences and 1725 pairs of test sentences, most of which come from news in social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods in terms of the correlation coefficient and the mean deviation.
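To make the idea of comparing subjects, predicates and objects concrete, here is a minimal sketch. It assumes the (subject, predicate, object) triples have already been extracted by a parser, and it scores each slot with a simple token-overlap (Jaccard) measure. The function names, the equal slot weights and the Jaccard measure are all hypothetical choices, not the similarity defined in the paper.

```python
def jaccard(a, b):
    """Token-level Jaccard overlap between two phrases."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def triple_similarity(t1, t2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of per-slot overlap for two (subject, predicate, object) triples.

    The equal weights are an assumption made for illustration only.
    """
    return sum(w * jaccard(x, y) for w, x, y in zip(weights, t1, t2))

s1 = ("the company", "bought", "a startup")
s2 = ("the company", "acquired", "a startup")
print(triple_similarity(s1, s2))  # subject and object match, predicate does not
```

A pure token overlap misses synonyms such as "bought" and "acquired"; capturing that would need the semantic component the abstract mentions, which this sketch leaves out.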
Abstract: Since the opening-up policy was carried out in 1979, every area of social modernization has developed at high speed; meanwhile, the need for English has increased year by year. The occasions and scope of its use have expanded with communication among countries. English has therefore become the common international language in our country; in particular, translation plays an important and irreplaceable part in conveying information in English. This paper introduces the contrasts between English and Chinese sentences and discusses some skills for translating between them. Though it is neither complete nor authoritative, it may help some readers understand the differences between the two languages and grasp some practical skills.
Abstract: Sentence similarity computing plays an important role in machine question-answering systems, machine translation systems, information retrieval and automatic abstracting systems. This article first summarizes several methods for calculating similarity between sentences, and then presents a new method that takes several factors into consideration, including critical words, semantic information, sentential form and sentence length. On this basis, an automatic abstracting system based on the LexRank algorithm is implemented. We made several improvements in both sentence weight computing and redundancy resolution. The system described in this article can handle single-document or multi-document summarization in both English and Chinese. In evaluations on two corpora, our system produced better summaries to a certain degree. We also show that our system is quite insensitive to noise in the data that may result from an imperfect topical clustering of documents. Finally, existing problems and the developing trend of automatic summarization technology are discussed.
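For readers unfamiliar with LexRank, the following sketch shows the basic thresholded variant: sentences become graph nodes, an edge links two sentences whose similarity exceeds a threshold, and a damped power iteration assigns each sentence a centrality score. The plain bag-of-words cosine used here is an assumption standing in for the article's combined similarity measure, and the threshold, damping and iteration values are arbitrary illustrative defaults.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Return one centrality score per sentence (thresholded LexRank sketch)."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    # Binary adjacency matrix; a non-empty sentence is always linked to itself.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives an equal share of every neighbour's score.
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / degree[j]
                            for j in range(n) if degree[j])
            for i in range(n)
        ]
    return scores

scores = lexrank(["cats chase mice", "cats sleep often", "dogs chase balls"])
print(scores)  # the first sentence overlaps with both others, so it ranks highest
```

A summarizer would then pick the top-scoring sentences and drop near-duplicates, which corresponds to the redundancy resolution step mentioned in the abstract.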
Abstract: My investigation will serve two purposes. First, I shall investigate the function of the subclauses in the corpus in relation to their complexity, and I shall establish whether there is a correlation between sentence length and sentence complexity. Second, I shall analyse the complexity of the subclauses collected from the two sections and compare the results from these sections, focusing on finite subclauses and non-finite subclauses. I hope to be able to point out some differences in style between the news and sports sections concerning the use of subordinate clauses in various syntactic functions in order to examine how the choice of linguistic structures differs in different sections of The Times.
Abstract: Many people have great difficulty understanding long, complex English sentences. It also seems obvious that people have difficulty using a complete sentence. This paper therefore aims to solve these problems by developing sentence sense and using a chart as an aid.
Abstract: Since long English sentences possess many modifiers and have complicated syntactic structures, it is difficult for Chinese readers to understand them, let alone translate them. The paper adopts Nida's theory of functional equivalence as the guideline in the process of translation, since it has the merit of facilitating the communication of information. In terms of concrete methods, the long sentence should be decomposed into kernel sentences and reconstructed according to the conventions of standard Chinese. Only in this way can the translation be faithful, correct and elegant.
Funding: Supported by the project "The demonstration system of rich semantic search application in scientific literature" (Grant No. 1734) from the Chinese Academy of Sciences
Abstract: Purpose: Move recognition in scientific abstracts is an NLP task of classifying the sentences of an abstract into different types of language units. To improve the performance of move recognition in scientific abstracts, a novel model of move recognition is proposed that outperforms the BERT-based method. Design/methodology/approach: Prevalent BERT-based models for sentence classification often classify sentences without considering their context. In this paper, inspired by the BERT masked language model (MLM), we propose a novel model called the masked sentence model that integrates the content and contextual information of the sentences in move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps. Then, we compare our model with the HSLN-RNN, BERT-based and SciBERT models using the same dataset. Findings: Compared with the BERT-based and SciBERT models, the F1 score of our model is higher by 4.96% and 4.34%, respectively, which shows the feasibility and effectiveness of the novel model. The result of our model comes closest to the current state-of-the-art result of HSLN-RNN. Research limitations: The sequential features of move labels are not considered, which might be one of the reasons why HSLN-RNN performs better. Our model is restricted to biomedical English literature because we use a dataset from PubMed, which is a typical biomedical database, to fine-tune our model. Practical implications: The proposed model is better and simpler in identifying move structures in scientific abstracts and is worthy of text classification experiments for capturing contextual features of sentences. Originality/value: The study proposes a masked sentence model based on BERT that considers the contextual features of the sentences in abstracts in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of neural networks.
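One way to picture "rebuilding the input layer" is to pair each sentence with its abstract, with the sentence's own position in the abstract replaced by a mask token, so that a classifier sees both the content and the context. The sketch below builds such pairs. The exact input format used in the paper is not given in this abstract, so the pairing scheme, the function name and the `[MASK]` placeholder are assumptions made for illustration.

```python
def build_masked_inputs(sentences, mask_token="[MASK]"):
    """For each sentence, return (content, context) where the context is the
    whole abstract with that sentence replaced by mask_token.

    The (content, context) layout is a hypothetical format, not the paper's.
    """
    inputs = []
    for i, target in enumerate(sentences):
        context = " ".join(mask_token if j == i else s
                           for j, s in enumerate(sentences))
        inputs.append((target, context))
    return inputs

abstract = [
    "We study move recognition.",
    "A masked sentence model is proposed.",
    "F1 improves over the baselines.",
]
for content, context in build_masked_inputs(abstract):
    print(content, "||", context)
```

Each pair could then be fed to a sentence-pair classifier as its two segments, leaving the network architecture itself unchanged.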
Abstract: Purpose: To uncover the evaluation information on the academic contribution of research papers cited by peers based on the content cited by citing papers, and to provide an evidence-based tool for evaluating the academic value of cited papers. Design/methodology/approach: CiteOpinion uses a deep learning model to automatically extract citing sentences from representative citing papers. It starts with an analysis of the citing sentences, then identifies the major academic contribution points of the cited paper, the positive/negative evaluations from citing authors and the changes in the subjects of subsequent citing authors by means of recognizing categories of moves (problems, methods, conclusions, etc.), sentiment analysis and topic clustering. Findings: Citing sentences in a citing paper contain substantial evidence useful for academic evaluation. They can also be used to objectively and authentically reveal the nature and degree of the contribution of the cited paper as reflected by citation, beyond simple citation statistics. Practical implications: The evidence-based evaluation tool CiteOpinion can provide an objective and in-depth basis for evaluating the academic value of the representative papers of scientific researchers, research teams, and institutions. Originality/value: No other similar practical tool was found in the papers retrieved. Research limitations: There are difficulties in acquiring the full text of citing papers. There is a need to refine the calculation based on the sentiment scores of citing sentences. Currently, the tool is only used for academic contribution evaluation, while its value in policy studies, technical application, and promotion of science has not yet been tested.
Abstract: Parallel corpora are of great importance to machine translation, and automatic sentence alignment is the first step in processing them. This paper puts forward a bilingual-dictionary-based sentence alignment method for Chinese-English parallel corpora, which differs from previous length-based algorithms in its knowledge-rich approach. Experimental results show that this method achieves over 93% accuracy with ordinary English-Chinese dictionaries whose translations cover 31.88%~47.90% of the corpus.
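A dictionary-based aligner can be illustrated with a small sketch: score each candidate sentence pair by how many source words have a dictionary translation in the target sentence, then pick a monotonic one-to-one alignment by dynamic programming. The scoring function, the restriction to 1:1 links with skips, the pre-tokenized input and the toy dictionary are all assumptions for illustration, not the method evaluated in the paper.

```python
def match_score(src_tokens, tgt_tokens, dictionary):
    """Fraction of source tokens with a dictionary translation in the target."""
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    hits = sum(1 for w in src_tokens if tgt & set(dictionary.get(w, ())))
    return hits / len(src_tokens)

def align(src_sents, tgt_sents, dictionary):
    """Monotonic 1:1 alignment maximizing total match score; sentences may be skipped."""
    n, m = len(src_sents), len(tgt_sents)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = best[i - 1][j - 1] + match_score(
                src_sents[i - 1], tgt_sents[j - 1], dictionary)
            best[i][j] = max(pair, best[i - 1][j], best[i][j - 1])
    # Backtrack from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = match_score(src_sents[i - 1], tgt_sents[j - 1], dictionary)
        if s > 0 and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Toy dictionary and pre-tokenized sentences (hypothetical data).
toy_dict = {"cat": ["猫"], "dog": ["狗"], "sleeps": ["睡觉"], "runs": ["跑"]}
english = [["the", "cat", "sleeps"], ["the", "dog", "runs"]]
chinese = [["猫", "睡觉"], ["你好"], ["狗", "跑"]]
print(align(english, chinese, toy_dict))
```

Because the score depends only on dictionary hits, partial dictionary coverage still yields usable evidence, which fits the abstract's point that a dictionary covering under half of the corpus can suffice.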