13 articles found
Alignment of the Polish-English Parallel Text for a Statistical Machine Translation
1
Authors: Krzysztof Wolk, Krzysztof Marasek. Source: Computer Technology and Application, 2013, Issue 11, pp. 575-583 (9 pages)
Text alignment is crucial to the accuracy of MT (Machine Translation) systems, some NLP (Natural Language Processing) tools, and any other text processing task requiring bilingual data. This research proposes a language-independent sentence alignment approach based on Polish-to-English experiments (Polish is not a position-sensitive language). The alignment approach was developed on the TED (Translanguage English Database) talks corpus, but can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence recognition. Some of them use synonyms and semantic text structure analysis as additional information. Minimization of data loss was ensured. The solution is compared to other sentence alignment implementations. An improvement in MT system score with text processed by the described tool is also shown.
Keywords: text alignment; NLP tools; machine learning; text corpora processing
A Survey of Web Information System and Applications
2
Authors: HAN Yanbo, LI Juanzi, YANG Nan, LIU Qing, XU Baowen, MENG Xiaofeng. Source: Wuhan University Journal of Natural Sciences (CAS), 2007, Issue 5, pp. 769-772 (4 pages)
The fourth international conference on Web information systems and applications (WISA 2007) received 409 submissions and accepted 37 papers for publication in this issue. The papers cover broad research areas, including Web mining and data warehouse, Deep Web and Web integration, P2P networks, text processing and information retrieval, as well as Web Services and Web infrastructure. After briefly introducing the WISA conference, the survey outlines the current activities and future trends concerning Web information systems and applications based on the papers accepted for publication.
Keywords: Web mining; data warehouse; Deep Web; Web integration; Web services; P2P computing; text processing; information retrieval; Web security
Identifying Proper Names Based on Association Analysis
3
Authors: 张云涛, 龚玲. Source: Journal of Shanghai Jiaotong University (Science) (EI), 2007, Issue 5, pp. 559-562 (4 pages)
The issue of proper name recognition in Chinese text is discussed. An automatic approach based on association analysis to extract rules from a corpus is presented. The method tries to discover rules relevant to external evidence by association analysis, without additional manual effort. These rules can be used to recognize the proper nouns in Chinese texts. The experimental result shows that the method is practical in some applications. Moreover, the method is language independent.
Keywords: named entity recognition; natural language processing; text processing; Chinese text; proper name
Optimization of Sentiment Analysis Using Teaching-Learning Based Algorithm
4
Authors: Abdullah Muhammad, Salwani Abdullah, Nor Samsiah Sani. Source: Computers, Materials & Continua (SCIE, EI), 2021, Issue 11, pp. 1783-1799 (17 pages)
Feature selection and sentiment analysis are two common areas of study, consistent with advances in computing and the growing use of social media. High-dimensional or large feature sets are a key issue in sentiment analysis, as they can decrease the accuracy of sentiment classification and make it difficult to obtain the optimal subset of features. Furthermore, most reviews from social media carry a lot of noise and irrelevant information. Therefore, this study proposes a new text-feature selection method that combines rough set theory (RST) and teaching-learning based optimization (TLBO), known as RSTLBO. The framework for developing the proposed RSTLBO includes several stages: (1) acquiring the standard datasets (user reviews of six major U.S. airlines), which are used to validate search result feature selection methods; (2) preprocessing the dataset using text processing methods, which involves applying natural language processing techniques combined with linguistic processing techniques to produce high classification results; (3) employing the RSTLBO method; and (4) using the selected features from the previous process for sentiment classification with the Support Vector Machine (SVM) technique. Results show an improvement in sentiment analysis when natural language processing is combined with linguistic processing for text processing. More importantly, the proposed RSTLBO feature selection algorithm is able to produce improved sentiment analysis.
Keywords: feature selection; sentiment analysis; rough set theory; teaching-learning optimization algorithms; text processing
Review on inferential situation models
5
Author: WANG Li. Source: Sino-US English Teaching, 2010, Issue 8, pp. 18-22 (5 pages)
This paper reviews the theories and studies in the field of inferential situation models. The Construction-Integration (CI) model, the Structure Building Framework (SBF) and three empirical studies are introduced. The paper concludes that future studies, from a quantitative approach, should make some improvements in test materials, language proficiency manipulation and language background.
Keywords: situation model; text processing; language proficiency; material
The State of the Art of Natural Language Processing-A Systematic Automated Review of NLP Literature Using NLP Techniques
6
Authors: Jan Sawicki, Maria Ganzha, Marcin Paprzycki. Source: Data Intelligence (EI), 2023, Issue 3, pp. 707-749 (43 pages)
Nowadays, natural language processing (NLP) is one of the most popular areas of, broadly understood, artificial intelligence. Therefore, every day, new research contributions are posted, for instance, to the arXiv repository. Hence, it is rather difficult to capture the current "state of the field" and thus, to enter it. This brought the idea to use state-of-the-art NLP techniques to analyse the NLP-focused literature. As a result, (1) meta-level knowledge concerning the current state of NLP has been captured, and (2) a guide to the use of basic NLP tools is provided. It should be noted that all the tools and the dataset described in this contribution are publicly available. Furthermore, the originality of this review lies in its full automation. This allows easy reproducibility, continuation and updating of this research in the future as new research emerges in the field of NLP.
Keywords: natural language processing; text processing; literature survey; keyword search; keyphrase search; text embeddings; text summarizations
Arabic Bank Check Processing: State of the Art
7
Authors: Irfan Ahmad, Sabri A. Mahmoud. Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2013, Issue 2, pp. 285-299 (15 pages)
In this paper, we present a general model for Arabic bank check processing indicating the major phases of a check processing system. We then survey the available databases for Arabic bank check processing research. The state of the art in the different phases of Arabic bank check processing is surveyed (i.e., pre-processing, check analysis and segmentation, feature extraction, and legal and courtesy amount recognition). The open issues for future research are stated and areas that need improvement are presented. To the best of our knowledge, it is the first survey of Arabic bank check processing.
Keywords: handwriting analysis; document analysis; text processing; feature evaluation and selection; pattern analysis
Application of Algorithm CARDBK in Document Clustering
8
Authors: ZHU Yehang, ZHANG Mingjie, SHI Feng. Source: Wuhan University Journal of Natural Sciences (CAS, CSCD), 2018, Issue 6, pp. 514-524 (11 pages)
In the K-means clustering algorithm, each data point is uniquely placed into one category. The clustering quality is heavily dependent on the initial cluster centroids. Different initializations can yield varied results; local adjustment cannot save the clustering result from poor local optima. If there is an anomaly in a cluster, it will seriously affect the cluster mean value. The K-means clustering algorithm is only suitable for clusters with convex shapes. We therefore propose a novel clustering algorithm, CARDBK, where "centroid all rank distance (CARD)" means that all centroids are sorted by their distance from one point and "BK" stands for "batch K-means". In CARDBK, one point modifies not only the cluster centroid nearest to it but also multiple cluster centroids adjacent to it, and the degree of influence of a point on a cluster centroid depends on the distance between this point and the other nearer cluster centroids. Experimental results showed that the CARDBK algorithm outperformed other algorithms when tested on a number of different data sets based on the following performance indexes: entropy, purity, F1 value, Rand index and normalized mutual information (NMI). The algorithm proved to be more stable, linearly scalable and faster.
Keywords: algorithm design and analysis; clustering; document analysis; text processing
Application of a soft competition learning method in document clustering
9
Authors: Zhu Yehang, Zhang Mingjie. Source: The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2018, Issue 3, pp. 80-91 (12 pages)
Hard competition learning has the feature that each point modifies only the one cluster centroid that wins. Correspondingly, soft competition learning has the feature that each point modifies not only the cluster centroid that wins, but also many other cluster centroids near this point. A soft competition learning method is proposed. Centroid all rank distance (CARD), CARDx, and centroid all rank distance batch K-means (CARDBK) are three clustering algorithms that adopt the proposed soft competition learning method. In these algorithms, the extent to which one point affects a cluster centroid depends on the distances from this point to the other nearer cluster centroids, rather than just the rank of the distance from this point to this cluster centroid among the distances from this point to all cluster centroids. In addition, validation experiments are carried out to compare the three soft competition learning algorithms CARD, CARDx, and CARDBK with several hard competition learning algorithms as well as the neural gas (NG) algorithm on five data sets from different sources. Judging from the values of five performance indexes in the clustering results, this kind of soft competition learning method has better clustering effect and efficiency, and has linear scalability.
Keywords: clustering methods; text processing; document handling; competition learning method
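The contrast between hard and soft competition learning described in this abstract can be illustrated with a minimal sketch. It is not the authors' CARD, CARDx or CARDBK update rule, which the abstract does not specify: the function name `soft_update`, the parameters `k`, `lr` and `temperature`, and the exponential distance weighting are all assumptions chosen for illustration.

```python
import math

def soft_update(point, centroids, k=2, lr=0.5, temperature=1.0):
    """Move the k centroids nearest to `point` toward it.

    Hypothetical weighting (an assumption, not the paper's formula):
    each selected centroid gets a share of `lr` that decays
    exponentially with its distance from the point, so the shares
    depend on the distances to the other nearer centroids too.
    """
    # Rank all centroids by distance from the point.
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(point, centroids[i]))
    nearest = order[:k]
    dists = [math.dist(point, centroids[i]) for i in nearest]
    # Subtract the smallest distance before exponentiating (numerical stability).
    raw = [math.exp(-(d - dists[0]) / temperature) for d in dists]
    total = sum(raw)
    updated = [list(c) for c in centroids]
    for i, r in zip(nearest, raw):
        weight = lr * r / total
        updated[i] = [c + weight * (p - c) for c, p in zip(centroids[i], point)]
    return updated

centroids = [[0, 0], [10, 0], [100, 0]]
hard = soft_update([2, 0], centroids, k=1)  # only the winner moves
soft = soft_update([2, 0], centroids, k=2)  # the winner and its neighbour move
```

With `k=1` the sketch reduces to a hard competitive update, where only the nearest centroid moves; with `k=2` the second-nearest centroid also moves slightly, and the far centroid is left untouched.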
Acquisition of hyponymy relations for agricultural terms from a Japanese statutory corpus
10
Authors: Makoto Nakamura, Tomohiro Ohno, Yasuhiro Ogawa, Katsuhiko Toyama. Source: Information Processing in Agriculture (EI), 2014, Issue 2, pp. 95-104 (10 pages)
This paper, which aims to expand the vocabulary of an existing thesaurus using hyponymy relations, focuses on an agricultural thesaurus called AGROVOC. Our main goal is to acquire AGROVOC-qualified candidates from the hyponymy relations of legal texts and tables. We propose a pattern-based approach to hyponymy relation acquisition. Our experimental result showed that 222 candidates are extracted from statutory sentences with 67.1% precision and 868 candidates from tables with 37.0% precision.
Keywords: AGROVOC; Japanese statutory corpus; legal text processing
Just What Is Narrative Urgency?
11
Author: Paul Simpson. Source: Language and Semiotic Studies, 2015, Issue 3, pp. 98-116 (19 pages)
This paper takes as its main point of departure a body of empirical research on reading and text processing, and makes particular reference to the type of experiments conducted in Egidi and Gerrig (2006) and Rapp and Gerrig (2006). Broadly put, these experiments (i) explore the psychology of readers’ preferences for narrative outcomes, (ii) examine the way readers react to characters’ goals and actions, and (iii) investigate how readers tend to identify with characters’ goals the more ‘urgently’ those goals are narrated. The present paper signals how stylistics can productively enrich such experimental work. Stylistics, it is argued, is well equipped to deal with subtle and nuanced variations in textual patterns without losing sight of the broader cognitive and discoursal positioning of readers in relation to these patterns. Making particular reference to what might constitute narrative ‘urgency’, the article develops a model which amalgamates different strands of contemporary research in narrative stylistics. This model advances and elaborates three key components: a Stylistic Profile, a Burlesque Block and a Kuleshov Monitor. Developing analyses of, and informal informant tests on, examples of both fiction and film, the paper calls for a more rounded and sophisticated understanding of style in empirical research on subjects’ responses to patterns in narrative.
Keywords: burlesque (in style); film narrative; Kuleshov (effect); narrative style; Psycho; text processing; ‘urgency’
Event Extraction via DMCNN in Open Domain Public Sentiment Information
12
Authors: Zhanghui Wang, Le Sun, Xiaoguang Li, Linteng Wang. Source: 国际计算机前沿大会会议论文集 (Proceedings of the International Computer Frontier Conference), 2020, Issue 2, pp. 90-100 (11 pages)
Event extraction (EE) is a difficult task in natural language processing (NLP). The target of EE is to obtain and present key information described in natural language in a structured form. Internet opinion, as an essential bearer of social information, is crucial. To help readers quickly get the main idea of news, a method of analyzing public sentiment information on the Internet and extracting events from news information is proposed. It enables users to quickly obtain the information they need. An event extraction method based on Chinese-language public opinion information is proposed, aiming to automatically classify different types of public opinion events using sentence-level features, with neural networks applied to extract events. A sentence feature model is introduced to classify different types of public opinion events. To ensure the effective retention of text information in the calculation process, an attention mechanism is added to the semantic information, and an effective public opinion event extractor is trained through CNN and LSTM networks. Experiments show that structured information can be extracted from unstructured text, and that public opinion event entities, event-entity relationships, and entity attribute information can be obtained.
Keywords: text processing; attention mechanism; syntax analysis; neural network
AOL4PS:A Large-scale Data Set for Personalized Search
13
Authors: Qian Guo, Wei Chen, Huaiyu Wan. Source: Data Intelligence (EI), 2021, Issue 4, pp. 548-567 (20 pages)
Personalized search is a promising way to improve the quality of Web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for such reasons as privacy protection and information security. This leads to a serious lack of accessible public data sets in this field. The few publicly available data sets have not become widely used in academia because of the complexity of the processing required to study personalized search methods. The lack of data sets, together with the difficulties of data processing, has hindered fair comparison and evaluation of personalized search models. In this paper, we constructed a large-scale data set, AOL4PS, to evaluate personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the challenges of processing time and storage space demands brought by massive data volumes, we optimized the process of data set construction and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the experiment results demonstrate that AOL4PS can measure the effect of personalized search models.
Keywords: personalized search; text data processing; data set construction
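For orientation, the standard Okapi BM25 ranking function that this abstract builds on can be sketched as follows. This is the textbook formulation with a non-negative IDF variant, not the improved BM25 algorithm proposed in the paper, whose details the abstract does not give. The function name `bm25_scores` and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Standard BM25 with the non-negative IDF log((N - df + 0.5) / (df + 0.5) + 1);
    an illustrative baseline, not the paper's improved algorithm.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["cheap", "flights", "paris"], ["paris", "hotel"], ["weather", "today"]]
scores = bm25_scores(["paris", "flights"], docs)
```

In this toy example the first document matches both query terms and ranks highest, the second matches one term, and the third matches none and scores zero.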