In machine learning,sentiment analysis is a technique to find and analyze the sentiments hidden in the text.For sentiment analysis,annotated data is a basic requirement.Generally,this data is manually annotated.Manual...In machine learning,sentiment analysis is a technique to find and analyze the sentiments hidden in the text.For sentiment analysis,annotated data is a basic requirement.Generally,this data is manually annotated.Manual annotation is time consuming,costly and laborious process.To overcome these resource constraints this research has proposed a fully automated annotation technique for aspect level sentiment analysis.Dataset is created from the reviews of ten most popular songs on YouTube.Reviews of five aspects—voice,video,music,lyrics and song,are extracted.An N-Gram based technique is proposed.Complete dataset consists of 369436 reviews that took 173.53 s to annotate using the proposed technique while this dataset might have taken approximately 2.07 million seconds(575 h)if it was annotated manually.For the validation of the proposed technique,a sub-dataset—Voice,is annotated manually as well as with the proposed technique.Cohen’s Kappa statistics is used to evaluate the degree of agreement between the two annotations.The high Kappa value(i.e.,0.9571%)shows the high level of agreement between the two.This validates that the quality of annotation of the proposed technique is as good as manual annotation even with far less computational cost.This research also contributes in consolidating the guidelines for the manual annotation process.展开更多
As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects in...As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.展开更多
在现有的新闻领域标注语料库研究的基础上,结合时政新闻文本的特点,构建了面向时政新闻文本的事件标注语料库(event annotation corpus for current political news,EACPN)。EACPN从事件元素、人物角色及事件子类别等多个层面对21455篇...在现有的新闻领域标注语料库研究的基础上,结合时政新闻文本的特点,构建了面向时政新闻文本的事件标注语料库(event annotation corpus for current political news,EACPN)。EACPN从事件元素、人物角色及事件子类别等多个层面对21455篇时政新闻进行标注,涵盖了128523个事件元素和17919个子类别,整体标注一致性达到85.9%。所构建的EACPN为今后的时政新闻文本事件抽取研究和事件知识图谱构建提供了数据基础。展开更多
In this paper we propose a novel model "recursive directed graph" based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usages o...In this paper we propose a novel model "recursive directed graph" based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usages of postpositive attributive are complex and variable, especially three categories: present participle phrase, past participle phrase, and preposition phrase as postpositire attributive, which always bring the difficulties of automatic parsing. We summarize these categories and annotate the semantic information. Compared with dependency structure, feature structure, being recursive directed graph, enhances semantic information extraction in biomedical field. The annotation results show that recursive directed graph is more suitable to extract complex semantic relations for biomedical text mining.展开更多
为有效地解决当前相关标准和标准数据匮乏的问题,通过分析中文文本中地理空间关系描述的语言特点,提出中文文本的地理空间关系标注体系,并以GATE(General Architecture for Text Engineering)为标注工具,以《中国大百科全书中国地理》...为有效地解决当前相关标准和标准数据匮乏的问题,通过分析中文文本中地理空间关系描述的语言特点,提出中文文本的地理空间关系标注体系,并以GATE(General Architecture for Text Engineering)为标注工具,以《中国大百科全书中国地理》为文本数据源,采用交叉校验方式建立了地理空间关系标注语料库。实现了中文文本中地理空间关系描述的结构化表达,提供了地理空间关系信息抽取的标准化测试数据。展开更多
地理信息的语义解析有效地解决自然语言与地理信息系统之间的语义障碍问题。在分析中文文本和地理信息系统中地理实体描述和表达机制差异的基础上,结合地理命名实体描述的语言特点,制定中文文本的地理命名实体标注体系和标注规范,并以GA...地理信息的语义解析有效地解决自然语言与地理信息系统之间的语义障碍问题。在分析中文文本和地理信息系统中地理实体描述和表达机制差异的基础上,结合地理命名实体描述的语言特点,制定中文文本的地理命名实体标注体系和标注规范,并以GATE(General Architecture for Text Engineering)作为标注平台,构建基于《中国大百科全书中国地理》的大规模标注语料库,以解决当前相关标准和规模化标准数据匮乏的问题。展开更多
文摘In machine learning,sentiment analysis is a technique to find and analyze the sentiments hidden in the text.For sentiment analysis,annotated data is a basic requirement.Generally,this data is manually annotated.Manual annotation is time consuming,costly and laborious process.To overcome these resource constraints this research has proposed a fully automated annotation technique for aspect level sentiment analysis.Dataset is created from the reviews of ten most popular songs on YouTube.Reviews of five aspects—voice,video,music,lyrics and song,are extracted.An N-Gram based technique is proposed.Complete dataset consists of 369436 reviews that took 173.53 s to annotate using the proposed technique while this dataset might have taken approximately 2.07 million seconds(575 h)if it was annotated manually.For the validation of the proposed technique,a sub-dataset—Voice,is annotated manually as well as with the proposed technique.Cohen’s Kappa statistics is used to evaluate the degree of agreement between the two annotations.The high Kappa value(i.e.,0.9571%)shows the high level of agreement between the two.This validates that the quality of annotation of the proposed technique is as good as manual annotation even with far less computational cost.This research also contributes in consolidating the guidelines for the manual annotation process.
文摘As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
文摘在现有的新闻领域标注语料库研究的基础上,结合时政新闻文本的特点,构建了面向时政新闻文本的事件标注语料库(event annotation corpus for current political news,EACPN)。EACPN从事件元素、人物角色及事件子类别等多个层面对21455篇时政新闻进行标注,涵盖了128523个事件元素和17919个子类别,整体标注一致性达到85.9%。所构建的EACPN为今后的时政新闻文本事件抽取研究和事件知识图谱构建提供了数据基础。
基金Supported by the National Natural Science Foundation of China(61202193,61202304)the Major Projects of Chinese National Social Science Foundation(11&ZD189)the Chinese Postdoctoral Science Foundation(2013M540593,2014T70722)
文摘In this paper we propose a novel model "recursive directed graph" based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usages of postpositive attributive are complex and variable, especially three categories: present participle phrase, past participle phrase, and preposition phrase as postpositire attributive, which always bring the difficulties of automatic parsing. We summarize these categories and annotate the semantic information. Compared with dependency structure, feature structure, being recursive directed graph, enhances semantic information extraction in biomedical field. The annotation results show that recursive directed graph is more suitable to extract complex semantic relations for biomedical text mining.
文摘为有效地解决当前相关标准和标准数据匮乏的问题,通过分析中文文本中地理空间关系描述的语言特点,提出中文文本的地理空间关系标注体系,并以GATE(General Architecture for Text Engineering)为标注工具,以《中国大百科全书中国地理》为文本数据源,采用交叉校验方式建立了地理空间关系标注语料库。实现了中文文本中地理空间关系描述的结构化表达,提供了地理空间关系信息抽取的标准化测试数据。
文摘地理信息的语义解析有效地解决自然语言与地理信息系统之间的语义障碍问题。在分析中文文本和地理信息系统中地理实体描述和表达机制差异的基础上,结合地理命名实体描述的语言特点,制定中文文本的地理命名实体标注体系和标注规范,并以GATE(General Architecture for Text Engineering)作为标注平台,构建基于《中国大百科全书中国地理》的大规模标注语料库,以解决当前相关标准和规模化标准数据匮乏的问题。