Funding: the High Technology Research and Development Program of China (No. 2006AA01Z150); the National Natural Science Foundation of China (No. 60435020)
Abstract: To eliminate the mismatch between the words of relevant documents and those of the user's query, and the serious negative effects this mismatch has on information retrieval performance, a query expansion method based on a new term co-occurrence representation is put forward by analyzing the process of producing a query. The expansion terms are selected according to their correlation with the whole query. At the same time, the position information between terms is considered. Experimental results on Text Retrieval Conference (TREC) data collections show that the proposed method consistently improves on the language modeling method without expansion by 5% to 19%. Compared with pseudo feedback, the popular approach to query expansion, the precision of the proposed method is competitive.
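The idea in the abstract above, scoring each candidate term against the whole query while taking term positions into account, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names `expansion_scores` and `expand_query`, the fixed co-occurrence `window`, the `1 / distance` position weight, and the log-sum used to combine evidence across query terms.

```python
import math
from collections import defaultdict


def expansion_scores(docs, query_terms, window=5):
    """Score candidate terms by position-weighted co-occurrence with the query.

    Hypothetical scoring, not the paper's formula: a candidate that appears
    d positions away from a query term contributes 1/d for that query term.
    """
    query = set(query_terms)
    pair_weight = defaultdict(float)  # (candidate, query term) -> weight
    for tokens in docs:
        for i, term in enumerate(tokens):
            if term not in query:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                cand = tokens[j]
                if j == i or cand in query:
                    continue
                pair_weight[(cand, term)] += 1.0 / abs(i - j)

    # Combine evidence over all query terms (assumed rule: sum of log(1 + w)),
    # so that a candidate related to several query terms is favoured.
    scores = defaultdict(float)
    for (cand, _term), weight in pair_weight.items():
        scores[cand] += math.log1p(weight)
    return dict(scores)


def expand_query(docs, query_terms, k=2, window=5):
    """Append the k best-scoring candidate terms to the query."""
    scores = expansion_scores(docs, query_terms, window)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


docs = [
    ["query", "expansion", "improves", "retrieval"],
    ["query", "expansion", "terms"],
    ["retrieval", "of", "documents"],
]
print(expand_query(docs, ["query", "retrieval"], k=2, window=2))
```

In this toy collection, "expansion" and "improves" occur close to both query terms, so they outrank terms that co-occur with only one of them.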
Abstract: As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and raises critical questions about ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition, and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. We employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and identify relevant studies. Our review reveals ChatGPT's significant potential in enhancing various NLP tasks. Its adaptability in information extraction, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as complementary mechanisms, empowering users to guide the model and enhance overall accuracy. Despite this promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. Moreover, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigation to address these issues.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R161), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4310373DSR33).
Abstract: Recent developments in Multimedia Internet of Things (MIoT) devices empowered with Natural Language Processing (NLP) models point to a promising future for smart devices. NLP plays an important role in industrial applications such as speech understanding, emotion detection, and home automation. To caption an image, the objects in the image, their actions and connections, and any salient feature that is under-projected or missing from the image should be identified. The aim of the image captioning process is to generate a caption for an image. In the next step, the image should be provided with a significant and detailed description that is syntactically as well as semantically correct. In this scenario, a computer vision model is used to identify the objects, and NLP approaches are followed to describe the image. The current study develops a Natural Language Processing with Optimal Deep Learning Enabled Intelligent Image Captioning System (NLPODL-IICS). The aim of the NLPODL-IICS model is to produce a proper description for an input image. To attain this, NLPODL-IICS follows two stages: encoding and decoding. At the encoding side, the model uses Hunger Games Search (HGS) with a Neural Architecture Search Network (NASNet) model, which represents the input data by encoding it into a vector of predefined length. During the decoding phase, the Chimp Optimization Algorithm (COA) with a deeper Long Short Term Memory (LSTM) approach is followed to concatenate the description sentences produced by the method. The HGS and COA algorithms provide proper parameter tuning for the NASNet and LSTM models, respectively. The NLPODL-IICS model was experimentally validated on two benchmark datasets. An extensive comparative analysis confirmed the superior performance of the NLPODL-IICS model over other models. (CMC, 2023, vol. 74, no. 2)
Funding: The National Natural Science Foundation of China (No. 60473004); the Science and Research Foundation Program of Henan University of Science and Technology (No. 2004ZY041); the Natural and Science Foundation Program of the Education Department of Henan Province (No. 200410464004)
Abstract: A language model for information retrieval is built by using a query language model to generate queries and a document language model to generate documents. The documents are ranked according to the relative entropies of the estimated document language models with respect to the estimated query language model. Two popular and relatively efficient smoothing methods, the Jelinek-Mercer method and the absolute discounting method, are used to smooth the document language model during its estimation. A combined model composed of the feedback document language model and the collection language model is used to estimate the query model. A performance comparison between the new retrieval method and the existing method with feedback is made, and the retrieval performance of the proposed method with the two different smoothing techniques is evaluated on three Text Retrieval Conference (TREC) data sets. Experimental results show that the method is effective and performs better than the basic language modeling approach; moreover, the method using the Jelinek-Mercer technique performs better than that using the absolute discounting technique, and the performance is sensitive to the smoothing parameters.
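The ranking scheme described in the abstract above can be sketched with the textbook forms of the two smoothing methods and a relative-entropy (KL divergence) score. This is a minimal illustration under stated assumptions, not the paper's implementation: the query model is a plain maximum-likelihood estimate rather than the feedback-based combined model from the abstract, the values `lam=0.5` and `delta=0.7` are arbitrary, every query term is assumed to occur somewhere in the collection, and all function names are hypothetical.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood unigram model p(w|C) over the whole collection."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jelinek_mercer(doc, coll, lam=0.5):
    """p(w|d) = (1 - lam) * c(w,d)/|d| + lam * p(w|C)."""
    tf, n = Counter(doc), len(doc)
    return lambda w: (1 - lam) * tf[w] / n + lam * coll.get(w, 0.0)


def absolute_discounting(doc, coll, delta=0.7):
    """p(w|d) = max(c(w,d) - delta, 0)/|d| + delta * |d|_u/|d| * p(w|C),
    where |d|_u is the number of distinct terms in the document."""
    tf, n = Counter(doc), len(doc)
    unique = len(tf)
    return lambda w: max(tf[w] - delta, 0.0) / n + delta * unique / n * coll.get(w, 0.0)


def kl_divergence(query_model, doc_prob):
    """Relative entropy of the query model with respect to a document model."""
    return sum(p * math.log(p / doc_prob(w)) for w, p in query_model.items() if p > 0)


def rank(docs, query, smoother):
    """Return document indices ordered from smallest to largest divergence."""
    coll = collection_model(docs)
    q_counts = Counter(query)
    q_model = {w: c / len(query) for w, c in q_counts.items()}
    scored = [(kl_divergence(q_model, smoother(doc, coll)), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scored)]


docs = [
    ["language", "model", "retrieval"],
    ["image", "caption", "model"],
    ["language", "language", "model"],
]
print(rank(docs, ["language", "model"], jelinek_mercer))
print(rank(docs, ["language", "model"], absolute_discounting))
```

Both smoothers redistribute probability mass to unseen words through the collection model, which is what keeps the divergence finite when a document lacks a query term.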
Funding: National Natural Science Foundation of China under Project No. 61876062; Scientific Research Fund of Hunan Provincial Education Department of China under Project No. 16K030; Hunan Provincial Natural Science Foundation of China under Project No. 2017JJ2101; Hunan Provincial Innovation Foundation for Postgraduate under Project No. CX2018B671.
Abstract: Bilingual word vectors have been widely exploited in cross-language information retrieval research. However, most of this research currently focuses on similar language pairs, and very few studies explore the impact of using bilingual word vectors for cross-language information retrieval in long-distance language pairs. This paper systematically analyzes the retrieval performance of various European languages (English, German, Italian, French, Finnish, Dutch) as well as Asian languages (Chinese, Japanese) in the ad hoc task of the CLEF 2002–2003 campaign. Genetic proximity is used to visually represent the relationships between languages and to compare their cross-lingual retrieval performance in various settings. The results show that differences in language vocabulary dramatically affect retrieval performance. At the same time, the term-by-term translation retrieval method performs slightly better than the simple vector addition retrieval method. This indicates that the translation-based retrieval model can still maintain its advantage under the new semantic scheme.
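The two retrieval strategies compared in the abstract above can be contrasted in a toy sketch. Everything below is assumed for illustration: the two-dimensional "shared space" embeddings are invented, the function names are hypothetical, and the translated query is scored by raw term-frequency overlap, whereas the paper presumably uses a full retrieval model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def add_vectors(words, emb):
    """Represent a text as the sum of the vectors of its known words."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for w in words:
        if w in emb:
            total = [a + b for a, b in zip(total, emb[w])]
    return total


def rank_by_vector_addition(query, docs, src_emb, tgt_emb):
    """Rank documents by cosine between summed query and document vectors."""
    q = add_vectors(query, src_emb)
    scores = [cosine(q, add_vectors(doc, tgt_emb)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


def translate_term(word, src_emb, tgt_emb):
    """Translate a source word to its nearest target word in the shared space."""
    return max(tgt_emb, key=lambda t: cosine(src_emb[word], tgt_emb[t]))


def rank_by_term_translation(query, docs, src_emb, tgt_emb):
    """Translate the query term by term, then rank by term-frequency overlap."""
    translated = [translate_term(w, src_emb, tgt_emb) for w in query if w in src_emb]
    scores = [sum(doc.count(t) for t in translated) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Invented bilingual embeddings that already live in one shared space.
src_emb = {"hund": [1.0, 0.1], "katze": [0.1, 1.0]}
tgt_emb = {"dog": [0.9, 0.2], "cat": [0.2, 0.9], "pet": [0.7, 0.7]}
docs = [["cat", "pet"], ["dog", "dog", "pet"]]

print(rank_by_vector_addition(["hund"], docs, src_emb, tgt_emb))
print(rank_by_term_translation(["hund"], docs, src_emb, tgt_emb))
```

Vector addition compares whole texts in the embedding space, while term-by-term translation uses the embeddings only as a dictionary and leaves the matching to a conventional term-based scorer.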
Abstract: Because SQL is ineffective for querying data from spatial databases, querying based on natural or visual language has gradually become an attractive research field. However, how to define and represent natural language related to spatial data remains a major problem, because existing models of direction relations cannot describe it using common concepts. First, detailed direction relations are proposed to describe directions related to the interior of spatial objects, such as "east part of a region" and "east boundary of a region". Second, by integrating the detailed directions with exterior direction relations and topological relations, several NLSRs are defined, such as "a road goes across the east part of a lake" and "a river goes along the east boundary of a province". Finally, based on these NLSRs, a natural spatial query language (NSQL) is formed to retrieve data from spatial databases.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 2/142/43).
Abstract: Sentiment analysis, or opinion mining (OM), has become familiar owing to advances in networking technologies and social media. A massive amount of text is now generated on the Internet every day, which makes pattern recognition and decision making difficult. Since OM is useful in business sectors for improving the quality of products and services, machine learning (ML) and deep learning (DL) models can be taken into account. Moreover, the hyperparameters of DL models need proper adjustment to boost classification. Therefore, this paper develops a new Artificial Fish Swarm Optimization with Bidirectional Long Short Term Memory (AFSO-BLSTM) model for OM. The main intention of the AFSO-BLSTM model is to effectively mine the opinions present in textual data. The AFSO-BLSTM model performs pre-processing and TF-IDF based feature extraction. The BLSTM model is then employed for the detection and classification of opinions. Finally, the AFSO algorithm is used for hyperparameter adjustment of the BLSTM model, which constitutes the novelty of the work. A complete simulation study of the AFSO-BLSTM model was carried out on a benchmark dataset, and the experimental values obtained revealed the high potential of the AFSO-BLSTM model for mining opinions.
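The TF-IDF feature extraction step mentioned in the abstract above can be sketched as follows. TF-IDF has many variants and the abstract does not say which one is used, so this sketch assumes one common form: term frequency normalized by document length and an unsmoothed `log(N / df)` inverse document frequency. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


reviews = [["good", "phone", "good"], ["bad", "phone"]]
vocab, vectors = tfidf_vectors(reviews)
for term_weights in vectors:
    print(dict(zip(vocab, term_weights)))
```

A term such as "phone" that occurs in every document receives weight zero under this variant, so the resulting vectors emphasize the opinion-bearing words that distinguish the reviews.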
Funding: the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1G1A1003312); the Ministry of Education (NRF-2021R1I1A3052815).
Abstract: Analyzing Research and Development (R&D) trends is important because it can influence future decisions regarding R&D direction. In typical trend analysis, topic or technology taxonomies are employed to compute the popularity of topics or codes over time. Although this approach is simple and effective, the taxonomies are difficult to manage because new technologies are introduced rapidly. Therefore, recent studies exploit deep learning to extract pre-defined targets such as problems and solutions. Based on recent advances in question answering (QA) using deep learning, we adopt a multi-turn QA model to extract problems and solutions from Korean R&D reports. Relative to previous research, we use the reports directly and analyze the difficulties of handling them with QA-style Information Extraction (IE) developed for sentence-level benchmark datasets. After investigating the characteristics of Korean R&D reports, we propose a model that deals with multiple and repeated appearances of targets in the reports. The model includes an algorithm with two novel modules and a prompt. The proposed methodology focuses on reformulating a question without a static template or pre-defined knowledge. We show the effectiveness of the proposed model on a Korean R&D report dataset that we constructed, and we present an in-depth analysis of the benefits of the multi-turn QA model.
Funding: Supported by the Sichuan Science and Technology Program under Grants No. 2022YFQ0052 and No. 2021YFQ0009.
Abstract: At present, knowledge embedding methods are widely used in knowledge graph (KG) reasoning and have been successfully applied to KGs with large numbers of entities and relations. However, in research and production environments there are many KGs with a small number of entities and relations, which are called sparse KGs. Limited by the performance of knowledge extraction methods or for other reasons (some common-sense information does not appear in natural corpora), the relations between entities are often incomplete. To solve this problem, a method based on a graph neural network and information enhancement is proposed. The improved method increases the mean reciprocal rank (MRR) and Hit@3 by 1.6% and 1.7%, respectively, when the sparsity of the FB15K-237 dataset is 10%. When the sparsity is 50%, MRR and Hit@10 increase by 0.8% and 1.8%, respectively.
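The two evaluation measures reported in the abstract above, MRR and Hit@k, can be computed from the rank assigned to the correct entity in each test query. The sketch below uses the standard definitions; the function names and the example ranks are made up for illustration and are not results from the paper.

```python
def rank_of_true(scores, true_index):
    """Rank (1 = best) of the correct candidate, counting strictly higher scores."""
    target = scores[true_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up ranks of the correct entity for five link-prediction queries.
ranks = [1, 3, 2, 10, 5]
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, 3))
print(hits_at_k(ranks, 10))
```

MRR rewards placing the correct entity near the top of the list, while Hit@k only checks whether it falls inside the top k, which is why papers commonly report both.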
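Abstract: The natural language to structured query language (NL2SQL) task aims to convert natural language questions into structured query language (SQL) statements that a database can execute. This paper proposes an auxiliary-task-enhanced algorithm for Chinese cross-domain NL2SQL. Its core idea is to add an auxiliary task at the decoding stage and combine it with the original model for multi-task training, thereby improving the model's accuracy. The auxiliary task models the database schema as a graph and predicts the dependency relations between the natural language question and the nodes of the schema graph, thus explicitly modeling the dependencies between the question and the database schema. For a given natural language question, the improvement from the auxiliary task enables the model to better identify which tables and columns in the database schema are more useful for predicting the target SQL. Experimental results on the Chinese NL2SQL dataset DuSQL show that the algorithm with the auxiliary task outperforms the original model and handles cross-domain NL2SQL tasks better.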