2,420 articles found
Orbit Weighting Scheme in the Context of Vector Space Information Retrieval
1
Authors: Ahmad Ababneh, Yousef Sanjalawe, Salam Fraihat, Salam Al-E’mari, Hamzah Alqudah
Source: Computers, Materials & Continua (SCIE, EI), 2024, Issue 7, pp. 1347-1379 (33 pages)
This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of Vector Space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle with accurately capturing document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS’s impact on model accuracy using Information Retrieval metrics like Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS’s effectiveness in reducing the inverted index size, crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. Results reveal OWS’s superiority, achieving a 54% Recall and 81% MAP, and a notable 38% reduction in the inverted index size. This highlights OWS’s potential in optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS’s capabilities in information retrieval methodologies.
Keywords: information retrieval; orbit weighting scheme; semantic text analysis; tf-idf weighting scheme; vector space model
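The abstract above reports results in terms of Mean Average Precision (MAP). As a reminder of what that metric measures, here is a minimal sketch of the standard textbook definition. It is not taken from the paper; the function names and the toy document IDs are assumptions made for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list (standard definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy queries: document IDs are made up for the example.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
map_score = mean_average_precision([(["a", "b"], {"b"}), (["a"], {"a"})])
```

In the first toy query the relevant documents sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.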
Unlocking the Potential:A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
2
Authors: Ebtesam Ahmad Alomari
Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 10, pp. 43-85 (43 pages)
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT’s adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT’s flexibility in annotation tasks reduces manual efforts and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. Subsequently, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Keywords: Generative AI; large language model (LLM); natural language processing (NLP); ChatGPT; GPT (generative pretraining transformer); GPT-4; sentiment analysis; NER; information extraction; annotation; text classification
Effective short text classification via the fusion of hybrid features for IoT social data (Cited by: 3)
3
Authors: Xiong Luo, Zhijian Yu, Zhigang Zhao, Wenbing Zhao, Jenq-Haur Wang
Source: Digital Communications and Networks (SCIE, CSCD), 2022, Issue 6, pp. 942-954 (13 pages)
Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature together by using the weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, Convolutional Neural Network (CNN) and Bidirectional Gated Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset and the results show that our proposed method effectively improves the classification performance.
Keywords: Information fusion; Short text classification; BERT; Bidirectional encoder representations from transformers; Deep learning; Social data
The Effect of Text Familiarity on Summary Writing Skill
4
Authors: Abbas Pourhosein Gilakjani
Source: Sino-US English Teaching, 2011, Issue 10, pp. 640-647 (8 pages)
Research has been done on the effects of text familiarity on reading comprehension. It has been shown that there is no relationship between familiarity with the content of a passage and better comprehension of that passage through reading it. More specifically, it has been indicated that students perform better on a passage with an unfamiliar content (Carrel, 1983). This study focuses on the effect of text familiarity on summary writing skill of foreign language students. In order to achieve this goal, 60 male and female students participated in this study. Four passages were selected from among reading comprehension TOEFL (Test of English as a Foreign Language) texts. Two passages were familiar and the other two were unfamiliar. They were given to students for summary writing. Through the processes of test administration, each subject was tested on four passages. The subjects performed better on the passage with unfamiliar content.
Keywords: text familiarity; unfamiliarity; passage; summary writing
Translation Strategies of Instructional Text--Taking Murder Your Darlings and Other Gentle Writing Advice from Aristotle to Zinsser as an Example
5
Authors: KE Li
Source: Journal of Literature and Art Studies, 2022, Issue 9, pp. 925-932 (8 pages)
With the popularity of English learning in the world, the demand for instructional materials is increasing, and how to accurately convey the information of these texts to English learners and provide the learners with an effective learning experience has become a major problem. Therefore, the role of the translator is crucial. It is found that translators should make linguistic choices based on the text functions and target readers’ expectations, and flexibly adopt translation strategies such as addition, conversion, and cohesion to convey the intention of the source author and generate the source context accurately and appropriately. Applying relevant theories to the analysis of translation cases, the paper tentatively puts forward solutions to the problems encountered in the E-C translation of chapter one of Murder Your Darlings and Other Gentle Writing Advice from Aristotle to Zinsser. Hopefully, it could provide a reference for other translators working on instructional texts.
Keywords: informative texts; instructional text; function plus loyalty; reader-response; translation strategies
Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery (Cited by: 7)
6
Authors: Neil R. Smalheiser
Source: Journal of Data and Information Science (CSCD), 2017, Issue 4, pp. 43-64 (22 pages)
Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don's contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed. Design/methodology/approach: Personal recollections and literature review. Findings: The Swanson A-B-C model of literature-based discovery has been successfully used by laboratory investigators analyzing their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g. so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and revival of neglected or discarded research questions). Research limitations: This paper reflects the opinions of the author and is not a comprehensive nor technically based review of literature-based discovery. Practical implications: The general scientific public is still not aware of the availability of tools for literature-based discovery. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public (http://arrowsmith.psych.uic.edu), as does BITOLA which is maintained by Dmitar Hristovski (http://ibmi.mf.uni-lj.si/bitola), and Epiphanet which is maintained by Trevor Cohen (http://epiphanet.uth.tme.edu/). Bringing user-friendly tools to the public should be a high priority, since even more than advancing basic research in informatics, it is vital that we ensure that scientists actually use discovery tools and that these are actually able to help them make experimental discoveries in the lab and in the clinic. Originality/value: This paper discusses problems and issues which were inherent in Don's thoughts during his life, including those which have not yet been fully taken up and studied systematically.
Keywords: Literature-based discovery; Biography; Text mining; Knowledge discovery in databases; Implicit information; Information science
Improving the precision of the keyword-matching pornographic text filtering method using a hybrid model (Cited by: 3)
7
Authors: 苏贵洋, 李建华, 马颖华, 李生红
Source: Journal of Zhejiang University Science (EI, CSCD), 2004, Issue 9, pp. 1106-1113 (8 pages)
With the flooding of pornographic information on the Internet, how to keep people away from that offensive information is becoming one of the most important research areas in network information security. Some applications which can block or filter such information are used. Approaches in those systems can be roughly classified into two kinds: metadata based and content based. With the development of distributed technologies, content based filtering technologies will play a more and more important role in filtering systems. Keyword matching is a content based method used widely in harmful text filtering. Experiments to evaluate the recall and precision of the method showed that the precision of the method is not satisfactory, though the recall of the method is rather high. According to the results, a new pornographic text filtering model based on reconfirming is put forward. Experiments showed that the model is practical, has less loss of recall than the single keyword matching method, and has higher precision.
Keywords: Pornographic text filtering; Content based filtering; Information filtering; Network content security
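The abstract above observes that plain keyword matching tends to give high recall but unsatisfactory precision. A minimal sketch of that effect on made-up, neutral data follows. The texts, labels and keyword set are hypothetical, and the paper's reconfirming model is not reproduced here.

```python
def keyword_match(text, keywords):
    """Flag a text if any keyword appears among its lowercase tokens."""
    tokens = set(text.lower().split())
    return any(keyword in tokens for keyword in keywords)


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against boolean labels."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    flagged = sum(1 for p in predicted if p)
    positives = sum(1 for a in actual if a)
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / positives if positives else 0.0
    return precision, recall


# Hypothetical toy corpus: only the first text is truly unwanted.
texts = [
    "buy cheap pills now",
    "cheap flights to rome",
    "pills for your garden pests",
    "weather report",
]
labels = [True, False, False, False]
keywords = {"pills", "cheap"}

predicted = [keyword_match(t, keywords) for t in texts]
precision, recall = precision_recall(predicted, labels)
```

On this toy data every unwanted text is caught, yet two harmless texts are flagged as well, which is the pattern a second confirmation stage would aim to correct.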
The Application of R Software in Text Analysis
8
Authors: Weihao Shi, Zhezhi Jin
Source: 数学计算(中英文版) (Mathematical Computation, Chinese-English edition), 2018, Issue 1, pp. 1-5 (5 pages)
With the development of Web 2.0, more and more people choose to use the Internet to express their opinions. Together, these opinions form a new kind of text that contains a lot of valuable emotional information, which is why processing these texts and analyzing their emotional information is significant. Sentiment analysis has three main tasks: sentiment extraction, sentiment classification, and sentiment application and summarization. In this paper, based on the R software, we introduce the steps of sentiment analysis in detail. Finally, we collect movie reviews from the Internet and use the R software to perform sentiment analysis in order to judge the emotional tendency of the text.
Keywords: text sentiment analysis; R software; information extraction; emotion recognition
Dimensionality Reduction by Mutual Information for Text Classification (Cited by: 2)
9
Authors: 刘丽珍, 宋瀚涛, 陆玉昌
Source: Journal of Beijing Institute of Technology (EI, CAS), 2005, Issue 1, pp. 32-36 (5 pages)
The framework of a text classification system was presented. The high dimensionality of the feature space in text classification was studied. Mutual information is a widely used information-theoretic measure that describes the stochastic dependency of discrete random variables. This measure was used as a criterion to reduce the high dimensionality of feature vectors in text classification on the Web. Feature selections or conversions were performed by using maximum mutual information, including linear and non-linear feature conversions. Entropy was used and extended to find suitable features in pattern recognition systems. This establishes a favorable foundation for text classification mining.
Keywords: text classification; mutual information; dimensionality reduction
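The abstract above uses mutual information as a criterion for discarding features. A minimal sketch of one common formulation is shown here: the expected mutual information between a binary "term occurs" variable and a binary "document is in class" variable, computed from a 2x2 contingency table. The paper's exact formulation is not given in the abstract, so the function names, the count layout and the toy counts are assumptions.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term occurrence and class membership.

    n11: docs containing the term and in the class
    n10: docs containing the term, not in the class
    n01: docs without the term, in the class
    n00: docs without the term, not in the class
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        return 0.0
    # (joint count, term-side marginal, class-side marginal) for each cell.
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, term_margin, class_margin in cells:
        if joint > 0:  # empty cells contribute nothing
            mi += (joint / total) * log2(total * joint / (term_margin * class_margin))
    return mi


def rank_terms(term_counts):
    """Sort terms by mutual information, highest first."""
    return sorted(term_counts, key=lambda t: mutual_information(*term_counts[t]), reverse=True)


# Hypothetical counts: "goal" tracks the class perfectly, "the" is independent of it.
ranking = rank_terms({"the": (25, 25, 25, 25), "goal": (50, 0, 0, 50)})
```

Keeping only the top-ranked terms is one simple way to cut the dimensionality of the feature vectors.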
A comprehensive review of existing corpora and methods for creating annotated corpora for event extraction tasks
10
Authors: Mohd Hafizul Afifi Abdullah, Norshakirah Aziz, Said Jadid Abdulkadir, Kashif Hussain, Hitham Alhussian, Noureen Talpur
Source: Journal of Data and Information Science (CSCD), 2024, Issue 4, pp. 196-238 (43 pages)
Purpose: The purpose of this study is to serve as a comprehensive review of the existing annotated corpora. This review study aims to provide information on the existing annotated corpora for event extraction, which are limited but essential for training and improving the existing event extraction algorithms. In addition to the primary goal of this study, it provides guidelines for preparing an annotated corpus and suggests suitable tools for the annotation task. Design/methodology/approach: This study employs an analytical approach to examine available corpora that are suitable for event extraction tasks. It offers an in-depth analysis of existing event extraction corpora and provides systematic guidelines for researchers to develop accurate, high-quality corpora. This ensures the reliability of the created corpus and its suitability for training machine learning algorithms. Findings: Our exploration reveals a scarcity of annotated corpora for event extraction tasks. In particular, the English corpora are mainly focused on the biomedical and general domains. Despite the issue of annotated corpora scarcity, there are several high-quality corpora available and widely used as benchmark datasets. However, access to some of these corpora might be limited owing to closed-access policies or discontinued maintenance after being initially released, rendering them inaccessible owing to broken links. Therefore, this study documents the available corpora for event extraction tasks. Research limitations: Our study focuses only on well-known corpora available in English and Chinese. Nevertheless, this study places a strong emphasis on the English corpora due to its status as a global lingua franca, making it widely understood compared to other languages. Practical implications: We genuinely believe that this study provides valuable knowledge that can serve as a guiding framework for preparing and accurately annotating events from text corpora. It provides comprehensive guidelines for researchers to improve the quality of corpus annotations, especially for event extraction tasks across various domains. Originality/value: This study comprehensively compiled information on the existing annotated corpora for event extraction tasks and provided preparation guidelines.
Keywords: Information extraction; Event extraction; Text mining; Large language model; Natural language processing
Feature selection algorithm for text classification based on improved mutual information (Cited by: 1)
11
Authors: 丛帅, 张积宾, 徐志明, 王宇颖
Source: Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2011, Issue 3, pp. 144-148 (5 pages)
In order to solve the poor performance in text classification when using the traditional formula of mutual information (MI), a feature selection algorithm was proposed based on improved mutual information. The improved mutual information algorithm, which is on the basis of traditional improved mutual information methods that enhance the MI value of negative characteristics and feature's frequency, supports the concept of concentration degree and dispersion degree. In accordance with the concept of concentration degree and dispersion degree, formulas which embody concentration degree and dispersion degree were constructed and the improved mutual information was implemented based on these. In this paper, the feature selection algorithm based on improved mutual information was applied to a text classifier based on Biomimetic Pattern Recognition and it was compared with several other feature selection methods. The experimental results showed that the improved mutual information feature selection method greatly enhances the performance compared with traditional mutual information feature selection methods and the performance is better than that of information gain. Through the introduction of the concept of concentration degree and dispersion degree, the improved mutual information feature selection method greatly improves the performance of the text classification system.
Keywords: text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition
LRV: A Tool for Academic Text Visualization to Support the Literature Review Process
12
Authors: Tahani Almutairi, Maha Al-yahya
Source: Computers, Materials & Continua (SCIE, EI), 2019, Issue 6, pp. 741-751 (11 pages)
Text visualization is concerned with the representation of text in a graphical form to facilitate comprehension of large textual data. Its aim is to improve the ability to understand and utilize the wealth of text-based information available. An essential task in any scientific research is the study and review of previous works in the specified domain, a process that is referred to as the literature survey process. This process involves the identification of prior work and evaluating its relevance to the research question. With the enormous number of published studies available online in digital form, this becomes a cumbersome task for the researcher. This paper presents the design and implementation of a tool that aims to facilitate this process by identifying relevant work and suggesting clusters of articles by conceptual modeling, thus providing different options that enable the researcher to visualize a large number of articles in a graphical easy-to-analyze form. The tool helps the researcher in analyzing and synthesizing the literature and building a conceptual understanding of the designated research area. The evaluation of the tool shows that researchers have found it useful and that it supported the process of relevant work analysis given a specific research question, and 70% of the evaluators of the tool found it very useful.
Keywords: text visualization; information extraction; text mining; literature review
Modeling of unsupervised knowledge graph of events based on mutual information among neighbor domains and sparse representation
13
Authors: Jing-Tao Sun, Jing-Ming Li, Qiu-Yu Zhang
Source: Defence Technology (防务技术) (SCIE, EI, CAS, CSCD), 2022, Issue 12, pp. 2150-2159 (10 pages)
Text event mining, as an indispensable method of text mining processing, has attracted the extensive attention of researchers. A modeling method for knowledge graph of events based on mutual information among neighbor domains and sparse representation is proposed in this paper, i.e. UKGE-MS. Specifically, UKGE-MS can improve the existing text mining technology's ability of understanding and discovering high-dimensional unmarked information, and solves the problems of traditional unsupervised feature selection methods, which only focus on selecting features from a global perspective and ignore the impact of local connection of samples. Firstly, considering the influence of local information of samples in feature correlation evaluation, a feature clustering algorithm based on average neighborhood mutual information is proposed, and the feature clusters with certain event correlation are obtained. Secondly, an unsupervised feature selection method based on the high-order correlation of multi-dimensional statistical data is designed by combining the dimension reduction advantage of the local linear embedding algorithm and the feature selection ability of sparse representation, so as to enhance the generalization ability of the selected feature items. Finally, the events knowledge graph is constructed by means of sparse representation and l1 norm. Extensive experiments are carried out on five real datasets and synthetic datasets, and UKGE-MS is compared with five corresponding algorithms. The experimental results show that UKGE-MS is better than the traditional method in event clustering and feature selection, and has some advantages over other methods in text event recognition and discovery.
Keywords: Text event mining; Knowledge graph of events; Mutual information among neighbor domains; Sparse representation
Source Text Borrowing in Summary of Integrated Writing
14
Authors: XIE Xin-ran
Source: Journal of Literature and Art Studies, 2023, Issue 9, pp. 699-703 (5 pages)
Integrated writing tasks, requiring writers to summarize source content and integrate their ideas or experience with those in the source text, are widely used in standardized tests of English proficiency. Integrated writing adds an element not found in traditional independent writing: the use of source text material. Recently, concern has been raised that over-reliance on source texts in integrated writing may lead to plagiarism as well as inaccurate assessment, which calls for further investigation. Research on L2 students’ writing using sources has broadened its focus from transgressive citing practices to stages of skill development in this complicated literacy, the successive challenges students face, and how instruction might be of benefit. Previous literature mainly focused on source use in academic writing and seldom considered summary writing in standardized tests, while summary strategy is particularly important in standardized tests, in which summary accounts for a significant part of the assessment criteria. Given that a series of previous literature have indicated that source text use, source texts, and prompts are associated with the quality of summary writing, this article reviews the source text borrowing in the summary of integrated writing to further explore the relationship between some variables and the quality of summary writing and therefore finds some research gaps in this field.
Keywords: summary writing; source text borrowing; integrated writing; paraphrasing
Survey of Data Value Evaluation Methods Based on Open Source Scientific and Technological Information
15
Authors: Xiaolin Wang, Cheng Dong, Wen Zeng, Zhen Xu, Junsheng Zhang
Source: 国际计算机前沿大会会议论文集 (conference proceedings), 2019, Issue 1, pp. 183-185 (3 pages)
It is important to effectively identify the data value of open source scientific and technological information and to help intelligence analysts select high-value data from a large volume of open-source scientific and technological information. Data value evaluation methods for scientific and technological information in the open source environment are presented. According to their characteristics, the data value evaluation methods are divided into three groups: methods based on information metrology, methods based on an economic perspective, and methods based on text analysis. For each method, the main ideas, application scenarios, advantages and disadvantages are indicated.
Keywords: Data value evaluation; Open-source scientific and technological information; Information metrology; Economic perspective; Text analysis
Ontology-based similarity measure for text clustering (Cited by: 1)
16
Authors: 颜端武, 李晓鹏, 王磊, 成晓
Source: Journal of Southeast University (English Edition) (EI, CAS), 2006, Issue 3, pp. 389-393 (5 pages)
A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure based on cosine vector and keywords frequency in documents is proposed, but also with an input ontology. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology, and by means of semantic knowledge, the ontology can improve the effects of document similarity measure and feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure and the comparison with standard cosine vector similarity measure are also described.
Keywords: similarity measure; text clustering; ontology; information retrieval system
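The abstract above compares its measure against the standard cosine vector similarity. A minimal sketch of that baseline over raw term-frequency vectors is given here for reference. The ontology-based keyword weighting described in the paper is not modeled, and the whitespace tokenization is an assumption.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two documents."""
    freq_a = Counter(doc_a.lower().split())
    freq_b = Counter(doc_b.lower().split())
    # Only terms shared by both documents contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy documents.
same = cosine_similarity("text clustering", "text clustering")
partial = cosine_similarity("text clustering", "text retrieval")
disjoint = cosine_similarity("text clustering", "image search")
```

An ontology-aware variant could scale each term count by an importance weight before taking the cosine, but that weighting scheme would have to come from the paper itself.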
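Keyword extraction method for scientific and technical text based on improved TextRank (Cited by: 2)
17
Authors: 杨冬菊, 胡成富
Source: 计算机应用 (CSCD, Peking University Core), 2024, Issue 6, pp. 1720-1726 (7 pages)
In keyword extraction from scientific and technical texts, words that occur rarely but express the main idea of the text well are extracted poorly. To address this problem, a keyword extraction method based on improved TextRank is proposed. First, the term frequency-inverse document frequency (TF-IDF) statistical features and position features of words are used to optimize the probability transition matrix between words in the co-occurrence graph, and the initial scores of words are obtained by iterative computation. Then, the K-Core (K-Core decomposition) algorithm is used to mine K-Core subgraphs to obtain the hierarchical features of words, and the average information entropy feature is used to measure each word's ability to represent topics. Finally, the hierarchical features and the average information entropy features are fused with the initial word scores to determine the keywords. Experimental results show that, on a public dataset, the proposed method improves the mean F1 by 6.5 and 3.3 percentage points over the TextRank method and the OTextRank (Optimized TextRank) method, respectively, in experiments extracting different numbers of keywords; on a science and technology service project dataset, it improves the mean F1 by 7.4 and 3.2 percentage points over TextRank and OTextRank, respectively. The results verify the effectiveness of the proposed method in extracting keywords that occur infrequently but express the main idea of the text well.
Keywords: scientific and technical text; keyword extraction; TextRank; K-Core graph; average information entropy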
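The method above builds on TextRank, which scores words by running a PageRank-style iteration over a word co-occurrence graph. A minimal sketch of the plain, unweighted baseline follows. The TF-IDF, position, K-Core and entropy refinements from the abstract are not implemented, and the window size, damping factor, iteration count and toy tokens are assumptions.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with an unweighted TextRank-style iteration.

    Two words are linked if they co-occur within `window` consecutive tokens.
    Words with no neighbors (e.g. a one-word text) are absent from the result.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


def top_keywords(tokens, k=3):
    """Return the k highest-scoring words."""
    scores = textrank(tokens)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy token list: "graph" is adjacent to every other word.
tokens = ["graph", "rank", "graph", "score", "graph", "node"]
best = top_keywords(tokens, k=1)
```

Because this baseline rewards well-connected words, it tends to favor frequent ones, which is the weakness the abstract sets out to address.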
Automatic User Goals Identification Based on Anchor Text and Click-Through Data (Cited by: 5)
18
Authors: YUAN Xiaojie, DOU Zhicheng, ZHANG Lu, LIU Fang
Source: Wuhan University Journal of Natural Sciences (CAS), 2008, Issue 4, pp. 495-500 (6 pages)
Understanding the underlying goal behind a user's Web query has been proved to be helpful to improve the quality of search. This paper focuses on the problem of automatic identification of query types according to the goals. Four novel entropy-based features extracted from anchor data and click-through data are proposed, and a support vector machines (SVM) classifier is used to identify the user's goal based on these features. Experimental results show that the proposed entropy-based features are more effective than those reported in previous work. By combining multiple features the goals for more than 97% of the queries studied can be correctly identified. Besides these, this paper reaches the following important conclusions: First, anchor-based features are more effective than click-through-based features; Second, the number of sites is more reliable than the number of links; Third, click-distribution-based features are more effective than session-based ones.
Keywords: query classification; user goals; anchor text; click-through data; information retrieval
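The abstract above relies on entropy-based features but does not define them. As a rough illustration of the general idea only, the sketch below computes the Shannon entropy of a query's click distribution: clicks concentrated on one URL give low entropy, while clicks spread over many URLs give high entropy. The function name and the click counts are hypothetical and are not the paper's four features.

```python
from math import log2


def click_entropy(click_counts):
    """Shannon entropy (in bits) of a click distribution over result URLs."""
    total = sum(click_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in click_counts:
        if count > 0:
            p = count / total
            entropy += p * log2(1 / p)
    return entropy


# Hypothetical click logs for two queries.
concentrated = click_entropy([40])            # every click on a single URL
spread = click_entropy([10, 10, 10, 10])      # clicks evenly split across four URLs
```

A feature like this could then be passed to a classifier such as an SVM alongside other signals.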
A Text Categorization System with Soft Real-Time Guarantee 被引量:1
19
作者 WANG Hua-yong CHEN Yu DAI Yi-qi 《Wuhan University Journal of Natural Sciences》 EI CAS 2006年第1期226-229,共4页
To provide predictable runtime performance for text categorization (TC) systems, an innovative design method is proposed for soft real-time TC systems. An analyzable mathematical model is established to approximately describe the nonlinear, time-varying TC system. Based on this model, feedback control theory is used to prove the system's stability and zero steady-state error. Experimental results show that the error of the deadline-satisfied ratio is kept within 4 of the desired value, and that the system can dynamically adjust the number of classifiers itself to save computation resources. The proposed methodology enables theoretical analysis and evaluation of TC systems, leading to a high-quality, low-cost implementation approach.
Keywords: information retrieval; text categorization; soft real-time system; feedback control theory
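The abstract does not give the controller itself, so the following is only a rough sketch of how a feedback loop on the deadline-satisfied ratio might adjust the number of classifiers. The function name, the proportional gain, the bounds, and the direction of adjustment (fewer classifiers when deadlines are being missed) are all assumptions, not details from the paper.

```python
def adjust_classifier_count(current, measured_ratio, target_ratio,
                            gain=10.0, min_count=1, max_count=32):
    """One step of a proportional controller on the deadline-satisfied ratio.

    Assumption: running fewer classifiers shortens processing time, so the
    count is lowered when the measured ratio falls below the target and
    raised when there is slack.
    """
    error = measured_ratio - target_ratio
    proposed = current + round(gain * error)
    # Clamp to the allowed range so the system never runs with no classifier.
    return max(min_count, min(max_count, proposed))
```

Calling this once per sampling period with the most recent measured ratio gives a simple closed loop; a real design would derive the gain from the system model so that stability can be proved.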
Priorities for Social and Humanities Projects Based on Text Analysis (cited by: 1)
20
Author: Ülle Must. Journal of Data and Information Science (CSCD), 2020, No. 4, pp. 116-125, 10 pages.
Purpose: Changes in the world show that the role, importance, and coherence of SSH (social sciences and the humanities) will increase significantly in the coming years. This paper aims to monitor and analyze the evolution (or overlapping) of the SSH thematic pattern through three funding instruments since 2007. Design/methodology/approach: The goal of the paper is to check to what extent the EU Framework Program (FP) affects research at the national level, and to highlight hot topics from a given period with the help of text analysis. Funded project titles and abstracts derived from the EU FP, Slovenian, and Estonian RIS were used. The final analysis and comparisons between datasets were based on the 200 most frequent words. After removing punctuation marks, numeric values, articles, prepositions, conjunctions, and auxiliary verbs, 4,854 unique words in ETIS, 4,421 unique words in the Slovenian Research Information System (SICRIS), and 3,950 unique words in FP were identified. Findings: Across all funding instruments, about a quarter of the top words constitute half of the word occurrences. The text analysis results show that in the majority of cases words do not overlap between FP and nationally funded projects. In some cases, this may be due to the use of different vocabulary. There is more overlap between words in the case of Slovenia (SL) and Estonia (EE) and less in the case of Estonia and the EU Framework Programmes (FP). At the same time, overlapping words indicate a wider reach (culture, education, social, history, human, innovation, etc.). In nationally funded projects (bottom-up), it was relatively difficult to observe changes in thematic trends over time. More specific results emerged from the comparison of the different programs throughout FP (top-down). Research limitations: Only projects with English titles and abstracts were analyzed. Practical implications: The specifics of SSH have to be taken into account: the one-to-one meaning of terms/words is not as important as, for example, in the exact sciences. Thus, even in co-word analysis, the final content may go unnoticed. Originality/value: This was the first attempt to monitor the trends of SSH projects using text analysis. The text analysis of the SSH projects of the two new EU Member States used in the study showed that SSH's thematic coverage is not much affected by the EU Framework Program. Whether this result is field-specific or country-specific should be shown in a following study that targets SSH projects in the so-called old Member States.
Keywords: text analysis; SSH; Estonian Research Information System (ETIS); Slovenian Research Information System (SICRIS); Community Research and Development Information Service (CORDIS)
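The comparison described in the abstract, taking the most frequent words of each dataset after filtering and checking how many overlap, can be sketched as below. This is an illustrative approximation only: the stop-word list is a tiny hypothetical sample, and the function name `top_words` and the tokenization rule are assumptions rather than the study's actual procedure.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (articles, prepositions,
# conjunctions, auxiliary verbs); a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "to",
              "and", "or", "is", "are", "be", "was", "were"}


def top_words(texts, n=200):
    """Return the set of the n most frequent non-stop words in texts."""
    counts = Counter()
    for text in texts:
        # Keeping only alphabetic runs drops punctuation and numeric values.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return {word for word, _ in counts.most_common(n)}
```

The overlap between two funding instruments is then the set intersection, for example `top_words(fp_texts) & top_words(etis_texts)`, where `fp_texts` and `etis_texts` are placeholder lists of project titles and abstracts.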