2,377 articles found
Unlocking the Potential: A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
1
Author: Ebtesam Ahmad Alomari. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 10, pp. 43-85 (43 pages)
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT's significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual efforts and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. Subsequently, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Keywords: Generative AI; large language model (LLM); natural language processing (NLP); ChatGPT; GPT (generative pretraining transformer); GPT-4; sentiment analysis; NER; information extraction; annotation; text classification
Effective short text classification via the fusion of hybrid features for IoT social data (cited by 2)
2
Authors: Xiong Luo, Zhijian Yu, Zhigang Zhao, Wenbing Zhao, Jenq-Haur Wang. Digital Communications and Networks (SCIE, CSCD), 2022, Issue 6, pp. 942-954 (13 pages)
Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature by using a weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN) and a Bidirectional Gated Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset, and the results show that our proposed method effectively improves the classification performance.
Keywords: Information fusion; Short text classification; BERT; Bidirectional encoder representations from transformers; Deep learning; Social data
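The TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. It assumes the common definition (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency) and is not the authors' implementation; the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Length-normalized term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus of short texts, already tokenized.
docs = [
    "5g iot short text".split(),
    "short text classification".split(),
    "deep learning model".split(),
]
weights = tf_idf(docs)
```

A term that occurs in fewer documents (here `5g`) receives a larger weight than one shared across documents (`short`), which is the statistical signal the abstract fuses with the semantic features.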
Translation Strategies of Instructional Text--Taking Murder Your Darlings and Other Gentle Writing Advice from Aristotle to Zinsser as an Example
3
Author: KE Li. Journal of Literature and Art Studies, 2022, Issue 9, pp. 925-932 (8 pages)
With the popularity of English learning in the world, the demand for instructional materials is increasing, and how to accurately convey the information of these texts to English learners and provide them with an effective learning experience has become a major problem. Therefore, the role of the translator is crucial. It is found that translators should make linguistic choices based on the text functions and target readers' expectations, and flexibly adopt translation strategies, such as addition, conversion, and cohesion, to convey the intention of the source author and generate the source context accurately and appropriately. Applying relevant theories to the analysis of translation cases, the paper tentatively puts forward solutions to the problems encountered in the E-C translation of chapter one of Murder Your Darlings and Other Gentle Writing Advice from Aristotle to Zinsser. Hopefully, it could provide a reference for other translators working on instructional texts.
Keywords: informative texts; instructional text; function plus loyalty; reader-response; translation strategies
The Effect of Text Familiarity on Summary Writing Skill
4
Author: Abbas Pourhosein Gilakjani. Sino-US English Teaching, 2011, Issue 10, pp. 640-647 (8 pages)
Keywords: English learning; English teaching; English prose; English reading
Source Text Borrowing in Summary of Integrated Writing
5
Author: XIE Xin-ran. Journal of Literature and Art Studies, 2023, Issue 9, pp. 699-703 (5 pages)
Integrated writing tasks, requiring writers to summarize source content and integrate their ideas or experience with those in the source text, are widely used in standardized tests of English proficiency. Integrated writing adds an element not found in traditional independent writing: the use of source text material. Recently, concern has been raised that over-reliance on source texts in integrated writing may lead to plagiarism as well as inaccurate assessment, which calls for further investigation. Research on L2 students' writing using sources has broadened its focus from transgressive citing practices to stages of skill development in this complicated literacy, the successive challenges students face, and how instruction might be of benefit. Previous literature mainly focused on source use in academic writing and seldom considered summary writing in standardized tests, while summary strategy is particularly important in standardized tests, in which summary accounts for a significant part of the assessment criteria. Given that previous studies have indicated that source text use, source texts, and prompts are associated with the quality of summary writing, this article reviews source text borrowing in the summary of integrated writing to further explore the relationship between some variables and the quality of summary writing, and identifies some research gaps in this field.
Keywords: summary writing; source text borrowing; integrated writing; paraphrasing
Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery (cited by 7)
6
Author: Neil R. Smalheiser. Journal of Data and Information Science (CSCD), 2017, Issue 4, pp. 43-64 (22 pages)
Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don's contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed. Design/methodology/approach: Personal recollections and literature review. Findings: The Swanson A-B-C model of literature-based discovery has been successfully used by laboratory investigators analyzing their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g. so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and revival of neglected or discarded research questions). Research limitations: This paper reflects the opinions of the author and is not a comprehensive nor technically based review of literature-based discovery. Practical implications: The general scientific public is still not aware of the availability of tools for literature-based discovery. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public (http://arrowsmith.psych.uic.edu), as does BITOLA which is maintained by Dmitar Hristovski (http://ibmi.mf.uni-lj.si/bitola), and Epiphanet which is maintained by Trevor Cohen (http://epiphanet.uth.tme.edu/). Bringing user-friendly tools to the public should be a high priority, since even more than advancing basic research in informatics, it is vital that we ensure that scientists actually use discovery tools and that these are actually able to help them make experimental discoveries in the lab and in the clinic. Originality/value: This paper discusses problems and issues which were inherent in Don's thoughts during his life, including those which have not yet been fully taken up and studied systematically.
Keywords: Literature-based discovery; Biography; Text mining; Knowledge discovery in databases; Implicit information; Information science
Improving the precision of the keyword-matching pornographic text filtering method using a hybrid model (cited by 3)
7
Authors: 苏贵洋, 李建华, 马颖华, 李生红. Journal of Zhejiang University Science (EI, CSCD), 2004, Issue 9, pp. 1106-1113 (8 pages)
With the flooding of pornographic information on the Internet, how to keep people away from that offensive information is becoming one of the most important research areas in network information security. Some applications which can block or filter such information are used. Approaches in those systems can be roughly classified into two kinds: metadata based and content based. With the development of distributed technologies, content based filtering technologies will play a more and more important role in filtering systems. Keyword matching is a content based method used widely in harmful text filtering. Experiments to evaluate the recall and precision of the method showed that the precision of the method is not satisfactory, though the recall of the method is rather high. According to the results, a new pornographic text filtering model based on reconfirming is put forward. Experiments showed that the model is practical, has less loss of recall than the single keyword matching method, and has higher precision.
Keywords: information filtering; network information security; information-based filtering; pornographic text filtering; keyword matching; hybrid model
The Application of R Software in Text Analysis
8
Authors: Weihao Shi, Zhezhi Jin. 数学计算(中英文版) [Mathematical Computation (Chinese-English Edition)], 2018, Issue 1, pp. 1-5 (5 pages)
With the development of Web 2.0, more and more people choose to use the Internet to express their opinions. Together, these opinions form a new kind of text that contains a lot of valuable emotional information, which is why processing these texts and analyzing their emotional information is significant. We identify three main tasks of sentiment analysis: sentiment extraction, sentiment classification, and sentiment application and summarization. In this paper, based on the R software, we introduce the steps of sentiment analysis in detail. Finally, we collect movie reviews from the Internet and use the R software to perform sentiment analysis in order to judge the emotional tendency of the text.
Keywords: Text sentiment analysis; R software; information extraction; emotion recognition
Keyword extraction method for scientific and technical text based on improved TextRank
9
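Authors: 杨冬菊, 胡成富. 计算机应用 [Journal of Computer Applications] (CSCD, PKU Core), 2024, Issue 6, pp. 1720-1726 (7 pages)
To address the poor extraction of words that occur infrequently but express the main idea of the text well in keyword extraction from scientific and technical texts, a keyword extraction method based on improved TextRank is proposed. First, the Term Frequency-Inverse Document Frequency (TF-IDF) statistical feature and the position feature of words are used to optimize the probability transition matrix between words in the co-occurrence graph, and the initial scores of words are obtained by iterative computation. Then, the K-Core (K-Core decomposition) algorithm is used to mine K-Core subgraphs and obtain the hierarchical feature of words, and the average information entropy feature is used to measure each word's ability to represent the topic. Finally, the hierarchical feature and the average information entropy feature are fused with the initial word scores to determine the keywords. Experimental results show that on public datasets, compared with the TextRank method and the OTextRank (Optimized TextRank) method, the proposed method improves the mean F1 by 6.5 and 3.3 percentage points respectively in experiments extracting different numbers of keywords; on a science and technology service project dataset, compared with the TextRank method and the OTextRank method, the proposed method improves the mean F1 by 7.4 and 3.2 percentage points respectively. The experimental results verify the effectiveness of the proposed method in extracting keywords that occur infrequently but express the main idea of the text well.
Keywords: scientific and technical text; keyword extraction; TextRank; K-Core graph; average information entropy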
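For orientation, the baseline that this entry improves on can be sketched as follows. This is plain TextRank over a word co-occurrence graph, written under assumptions: the window size, damping factor, and iteration count are illustrative defaults, and none of the paper's additions (TF-IDF and position weighting of the transition matrix, K-Core hierarchy, average information entropy) are included.

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by plain TextRank on an undirected co-occurrence graph."""
    # Words at most `window` positions apart share an edge;
    # the edge weight counts how often they co-occur.
    weight = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                weight[w][v] += 1.0
                weight[v][w] += 1.0
    nodes = list(weight)
    out_sum = {n: sum(weight[n].values()) for n in nodes}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each neighbor m passes on its score in proportion to edge weight.
        score = {
            n: (1 - damping) + damping * sum(
                weight[m][n] / out_sum[m] * score[m] for m in weight[n]
            )
            for n in nodes
        }
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical token sequence (stop words already removed).
tokens = "keyword extraction graph keyword ranking graph keyword".split()
ranked = textrank(tokens)
```

Because the scores here depend only on co-occurrence, a word that appears rarely tends to rank low even when it is central to the topic, which is the weakness the abstract sets out to address.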
Orbit Weighting Scheme in the Context of Vector Space Information Retrieval
10
Authors: Ahmad Ababneh, Yousef Sanjalawe, Salam Fraihat, Salam Al-E'mari, Hamzah Alqudah. Computers, Materials & Continua (SCIE, EI), 2024, Issue 7, pp. 1347-1379 (33 pages)
This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of Vector Space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle with accurately capturing document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS's impact on model accuracy using Information Retrieval metrics like Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS's effectiveness in reducing the inverted index size, crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. Results reveal OWS's superiority, achieving 54% Recall and 81% MAP, and a notable 38% reduction in the inverted index size. This highlights OWS's potential in optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS's capabilities in information retrieval methodologies.
Keywords: information retrieval; orbit weighting scheme; semantic text analysis; tf-idf weighting scheme; vector space model
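The Mean Average Precision metric cited in this abstract can be sketched as below. This assumes the standard (non-interpolated) definition of average precision; the function names, document ids, and relevance sets are hypothetical, and the sketch says nothing about how OWS itself ranks documents.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """Average the per-query AP over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical queries: (system ranking, set of relevant documents).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Relevant documents that are never retrieved still count in the denominator, so a ranking that misses them is penalized.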
Dimensionality Reduction by Mutual Information for Text Classification (cited by 2)
11
Authors: 刘丽珍, 宋瀚涛, 陆玉昌. Journal of Beijing Institute of Technology (EI, CAS), 2005, Issue 1, pp. 32-36 (5 pages)
The framework of a text classification system is presented, and the high dimensionality of the feature space in text classification is studied. Mutual information is a widely used information-theoretic measure that describes the stochastic dependency of discrete random variables. This measure is used as a criterion to reduce the high dimensionality of feature vectors in text classification on the Web. Feature selections or conversions are performed using maximum mutual information, including linear and non-linear feature conversions. Entropy is used and extended to find suitable features in pattern recognition systems. This lays a favorable foundation for text classification mining.
Keywords: text classification; mutual information; dimensionality reduction
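The mutual information criterion described here can be made concrete with a small sketch. It assumes the textbook definition, I(T;C) = sum over t, c of P(t,c) log2(P(t,c) / (P(t) P(c))), applied to a binary term-presence variable T and a class label C; the paper's linear and non-linear feature conversions are not reproduced, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """I(T; C) in bits between presence of `term` and the class label."""
    n = len(docs)
    # Joint counts over (term present?, class label).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    n_t, n_c = Counter(), Counter()
    for (present, label), count in joint.items():
        n_t[present] += count
        n_c[label] += count
    # Sum P(t,c) * log2(P(t,c) / (P(t) * P(c))) over observed cells.
    return sum(
        (count / n) * math.log2(count * n / (n_t[present] * n_c[label]))
        for (present, label), count in joint.items()
    )

# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"goal", "match"}, {"match", "team"}, {"stock", "market"}, {"market", "team"}]
labels = ["sport", "sport", "finance", "finance"]
vocab = sorted(set().union(*docs))
# Keep the terms that share the most information with the class label.
ranked = sorted(vocab, key=lambda t: mutual_information(docs, labels, t), reverse=True)
```

In this toy data `match` fully determines the class (1 bit) while `team` is independent of it (0 bits), so keeping only the top-ranked terms reduces dimensionality without discarding the discriminative features.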
Gate-Attention and Dual-End Enhancement Mechanism for Multi-Label Text Classification
12
Authors: Jieren Cheng, Xiaolong Chen, Wenghang Xu, Shuai Hua, Zhu Tang, Victor S. Sheng. Computers, Materials & Continua (SCIE, EI), 2023, Issue 11, pp. 1779-1793 (15 pages)
In the realm of Multi-Label Text Classification (MLTC), the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches. Many studies in semantic feature extraction have turned to external knowledge to augment the model's grasp of textual content, often overlooking intrinsic textual cues such as label statistical features. In contrast, these endogenous insights naturally align with the classification task. In our paper, to complement this focus on intrinsic knowledge, we introduce a novel Gate-Attention mechanism. This mechanism adeptly integrates statistical features from the text itself into the semantic fabric, enhancing the model's capacity to understand and represent the data. Additionally, to address the intricate task of mining label correlations, we propose a Dual-end enhancement mechanism. This mechanism effectively mitigates the challenges of information loss and erroneous transmission inherent in traditional long short-term memory propagation. We conducted an extensive battery of experiments on the AAPD and RCV1-2 datasets. These experiments serve the dual purpose of confirming the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism. Our final model unequivocally outperforms the baseline model, attesting to its robustness. These findings emphatically underscore the imperativeness of taking into account not just external knowledge but also the inherent intricacies of textual data when crafting potent MLTC models.
Keywords: Multi-label text classification; feature extraction; label distribution information; sequence generation
Feature selection algorithm for text classification based on improved mutual information (cited by 1)
13
Authors: 丛帅, 张积宾, 徐志明, 王宇颖. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2011, Issue 3, pp. 144-148 (5 pages)
In order to solve the poor performance in text classification when using the traditional formula of mutual information (MI), a feature selection algorithm was proposed based on improved mutual information. The improved mutual information algorithm, which builds on traditional improved mutual information methods that enhance the MI value of negative characteristics and the feature's frequency, supports the concept of concentration degree and dispersion degree. In accordance with the concept of concentration degree and dispersion degree, formulas which embody concentration degree and dispersion degree were constructed and the improved mutual information was implemented based on these. In this paper, the feature selection algorithm based on improved mutual information was applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results showed that the improved mutual information feature selection method greatly enhances the performance compared with traditional mutual information feature selection methods, and the performance is better than that of information gain. Through the introduction of the concept of concentration degree and dispersion degree, the improved mutual information feature selection method greatly improves the performance of the text classification system.
Keywords: text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition
A Weighted Multi-Layer Analytics Based Model for Emoji Recommendation
14
Authors: Amira M. Idrees, Abdul Lateef Marzouq Al-Solami. Computers, Materials & Continua (SCIE, EI), 2024, Issue 1, pp. 1115-1133 (19 pages)
The developed system for eye and face detection using Convolutional Neural Networks (CNN) models, followed by eye classification and voice-based assistance, has shown promising potential in enhancing accessibility for individuals with visual impairments. The modular approach implemented in this research allows for a seamless flow of information and assistance between the different components of the system. This research significantly contributes to the field of accessibility technology by integrating computer vision, natural language processing, and voice technologies. By leveraging these advancements, the developed system offers a practical and efficient solution for assisting blind individuals. The modular design ensures flexibility, scalability, and ease of integration with existing assistive technologies. However, it is important to acknowledge that further research and improvements are necessary to enhance the system's accuracy and usability. Fine-tuning the CNN models and expanding the training dataset can improve eye and face detection as well as eye classification capabilities. Additionally, incorporating real-time responses through sophisticated natural language understanding techniques and expanding the knowledge base of ChatGPT can enhance the system's ability to provide comprehensive and accurate responses. Overall, this research paves the way for the development of more advanced and robust systems for assisting visually impaired individuals. By leveraging cutting-edge technologies and integrating them into a modular framework, this research contributes to creating a more inclusive and accessible society for individuals with visual impairments. Future work can focus on refining the system, addressing its limitations, and conducting user studies to evaluate its effectiveness and impact in real-world scenarios.
Keywords: Social networks; text analytics; emoji prediction; features extraction; information retrieval
LRV: A Tool for Academic Text Visualization to Support the Literature Review Process
15
Authors: Tahani Almutairi, Maha Al-yahya. Computers, Materials & Continua (SCIE, EI), 2019, Issue 6, pp. 741-751 (11 pages)
Text visualization is concerned with the representation of text in a graphical form to facilitate comprehension of large textual data. Its aim is to improve the ability to understand and utilize the wealth of text-based information available. An essential task in any scientific research is the study and review of previous works in the specified domain, a process that is referred to as the literature survey process. This process involves the identification of prior work and evaluating its relevance to the research question. With the enormous number of published studies available online in digital form, this becomes a cumbersome task for the researcher. This paper presents the design and implementation of a tool that aims to facilitate this process by identifying relevant work and suggesting clusters of articles by conceptual modeling, thus providing different options that enable the researcher to visualize a large number of articles in a graphical easy-to-analyze form. The tool helps the researcher in analyzing and synthesizing the literature and building a conceptual understanding of the designated research area. The evaluation of the tool shows that researchers have found it useful and that it supported the process of relevant work analysis given a specific research question, and 70% of the evaluators of the tool found it very useful.
Keywords: text visualization; information extraction; text mining; literature review
Modeling of unsupervised knowledge graph of events based on mutual information among neighbor domains and sparse representation
16
Authors: Jing-Tao Sun, Jing-Ming Li, Qiu-Yu Zhang. Defence Technology (防务技术) (SCIE, EI, CAS, CSCD), 2022, Issue 12, pp. 2150-2159 (10 pages)
Text event mining, as an indispensable method of text mining processing, has attracted the extensive attention of researchers. A modeling method for a knowledge graph of events based on mutual information among neighbor domains and sparse representation, UKGE-MS, is proposed in this paper. Specifically, UKGE-MS can improve the ability of existing text mining technology to understand and discover high-dimensional unmarked information, and solves the problem that traditional unsupervised feature selection methods only select features from a global perspective and ignore the impact of local connections between samples. Firstly, considering the influence of local sample information in feature correlation evaluation, a feature clustering algorithm based on average neighborhood mutual information is proposed, and feature clusters with a certain event correlation are obtained. Secondly, an unsupervised feature selection method based on the high-order correlation of multi-dimensional statistical data is designed by combining the dimension reduction advantage of the local linear embedding algorithm and the feature selection ability of sparse representation, so as to enhance the generalization ability of the selected feature items. Finally, the events knowledge graph is constructed by means of sparse representation and the l1 norm. Extensive experiments are carried out on five real datasets and synthetic datasets, and UKGE-MS is compared with five corresponding algorithms. The experimental results show that UKGE-MS is better than the traditional method in event clustering and feature selection, and has some advantages over other methods in text event recognition and discovery.
Keywords: text event mining; Knowledge graph of events; Mutual information among neighbor domains; Sparse representation
A survey of Text-to-SQL text information processing techniques
17
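Authors: 彭钰寒, 乔少杰, 薛骐, 李江敏, 谢添丞, 徐康镭, 冉黎琼, 曾少北. 无线电工程 [Radio Engineering], 2024, Issue 5, pp. 1053-1062 (10 pages)
The demand for signal and information processing is growing and depends on data processing technology, which in turn requires database support; however, untrained users run into many problems because they are unfamiliar with database operations. The emergence of Text-to-SQL (Text to Structured Query Language) lets users operate databases proficiently without mastering Structured Query Language (SQL). This survey introduces the research background of Text-to-SQL and the challenges it faces; introduces its key technologies, benchmark datasets, model evolution, and latest research progress, where the key technologies include mainstream techniques such as Transformer and the benchmark datasets used for model training include WikiSQL and Spider; describes the characteristics of Text-to-SQL models at different stages and explains in detail how the latest Text-to-SQL research works, including model construction, parser design, and dataset generation; and summarizes future development directions and research priorities for Text-to-SQL.
Keywords: Text-to-SQL (text to structured query language); parser; text information processing; database; deep learning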
Survey of Data Value Evaluation Methods Based on Open Source Scientific and Technological Information
18
Authors: Xiaolin Wang, Cheng Dong, Wen Zeng, Zhen Xu, Junsheng Zhang. 国际计算机前沿大会会议论文集 [Proceedings of the International Computer Frontier Conference], 2019, Issue 1, pp. 183-185 (3 pages)
It is important to effectively identify the data value of open source scientific and technological information and to help intelligence analysts select high-value data from a large volume of such information. Data value evaluation methods for scientific and technological information in the open source environment are presented. According to their characteristics, the methods are divided into three groups: data value evaluation methods based on information metrology, data value evaluation methods based on an economic perspective, and data value assessment methods based on text analysis. For each method, the main ideas, application scenarios, advantages and disadvantages are described.
Keywords: Data value evaluation; open-source scientific and technological information; information metrology; economic perspective; text analysis
Automatic User Goals Identification Based on Anchor Text and Click-Through Data (cited by 5)
19
Authors: YUAN Xiaojie, DOU Zhicheng, ZHANG Lu, LIU Fang. Wuhan University Journal of Natural Sciences (CAS), 2008, Issue 4, pp. 495-500 (6 pages)
Understanding the underlying goal behind a user's Web query has been proved to be helpful to improve the quality of search. This paper focuses on the problem of automatic identification of query types according to the goals. Four novel entropy-based features extracted from anchor data and click-through data are proposed, and a support vector machine (SVM) classifier is used to identify the user's goal based on these features. Experimental results show that the proposed entropy-based features are more effective than those reported in previous work. By combining multiple features, the goals for more than 97% of the queries studied can be correctly identified. Besides these, this paper reaches the following important conclusions: first, anchor-based features are more effective than click-through-based features; second, the number of sites is more reliable than the number of links; third, click-distribution-based features are more effective than session-based ones.
Keywords: query classification; user goals; anchor text; click-through data; information retrieval
A Text Categorization System with Soft Real-Time Guarantee (cited by 1)
20
Authors: WANG Hua-yong, CHEN Yu, DAI Yi-qi. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, Issue 1, pp. 226-229 (4 pages)
In order to provide predictable runtime performance for text categorization (TC) systems, an innovative system design method is proposed for soft real-time TC systems. An analyzable mathematical model is established to approximately describe the nonlinear and time-varying TC systems. According to this mathematical model, feedback control theory is adopted to prove the system's stability and zero steady-state error. The experimental results show that the error of the deadline satisfied ratio in the system is kept within 4 of the desired value, and the number of classifiers can be dynamically adjusted by the system itself to save computation resources. The proposed methodology enables theoretical analysis and evaluation of TC systems, leading to a high-quality and low-cost implementation approach.
Keywords: information retrieval; text categorization; soft real-time system; feedback control theory