
Short Text Feature Enrichment Method Based on Heterogeneous Information Network

Cited by: 1
Abstract: As computer technology becomes deeply integrated into social life, more and more short text messages are spread across web platforms. To address the data sparsity problem of short texts, this paper constructs a robust heterogeneous information network framework (HTE) for modeling short texts. The framework can integrate any type of additional information and capture the relationships among these pieces of information, thereby alleviating data sparsity. Based on this framework, six short text expansion methods are designed using different external knowledge: entity information (entities, entity categories, and inter-entity relationships) from the Wikipedia and Freebase knowledge bases, together with textual information such as text topics, is introduced to enrich short text features. Finally, similarity measurement results are used to verify the effectiveness of the proposed feature enrichment methods. The six expansion methods are evaluated with three traditional similarity measures and compared against current mainstream short text matching algorithms on two short text datasets. The results show that all six proposed expansion methods yield improvements, and the best method improves the similarity measurement result by 5.97% compared with BERT. This demonstrates that the proposed framework is robust, can incorporate multiple types of external knowledge, overcomes the data sparsity problem of short texts, and measures short text similarity with high accuracy in an unsupervised manner.
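The abstract does not give the HTE framework's formulas. As a rough illustration of how meta-paths in a heterogeneous information network can relate two short texts that share no words, the sketch below scores text pairs with PathSim, a standard meta-path similarity that may differ from the measure used in the paper. It assumes a network with three node types (text T, entity E, category C) and two meta-paths, T-E-T (texts sharing an entity) and T-E-C-E-T (texts whose entities share a category). The toy texts, entity links, categories, and the equal path weights are all hypothetical.

```python
# Hypothetical toy data: entities each short text mentions (e.g. after
# entity linking) and the knowledge-base category of each entity.
TEXT_ENTITIES = {
    "t1": {"Apple_Inc", "iPhone"},
    "t2": {"Apple_Inc", "Steve_Jobs"},
    "t3": {"Samsung"},
}
ENTITY_CATEGORIES = {
    "Apple_Inc": {"Company"},
    "iPhone": {"Product"},
    "Steve_Jobs": {"Person"},
    "Samsung": {"Company"},
}


def matmul(a, b):
    """Multiply two dense matrices stored as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def adjacency(rows, cols, links):
    """0/1 adjacency matrix between two node types."""
    return [[1 if c in links.get(r, set()) else 0 for c in cols] for r in rows]


def pathsim(commuting, i, j):
    """PathSim for a symmetric meta-path: 2*M[i][j] / (M[i][i] + M[j][j])."""
    denom = commuting[i][i] + commuting[j][j]
    return 2 * commuting[i][j] / denom if denom else 0.0


texts = sorted(TEXT_ENTITIES)
entities = sorted(ENTITY_CATEGORIES)
categories = sorted({c for cs in ENTITY_CATEGORIES.values() for c in cs})

a_te = adjacency(texts, entities, TEXT_ENTITIES)           # text -> entity
a_ec = adjacency(entities, categories, ENTITY_CATEGORIES)  # entity -> category

# Commuting matrix for meta-path T-E-T: counts of shared entities.
m_tet = matmul(a_te, transpose(a_te))
# Commuting matrix for meta-path T-E-C-E-T: path counts via shared categories.
a_tc = matmul(a_te, a_ec)
m_tecet = matmul(a_tc, transpose(a_tc))


def similarity(x, y, weights=(0.5, 0.5)):
    """Weighted sum of the two meta-path scores (weights are hypothetical)."""
    i, j = texts.index(x), texts.index(y)
    return weights[0] * pathsim(m_tet, i, j) + weights[1] * pathsim(m_tecet, i, j)


print(similarity("t1", "t2"))  # 0.5
print(similarity("t1", "t3"))  # about 0.333
```

Here t1 and t3 mention no common entity, so the T-E-T score is 0, but both mention an entity in the category Company, so the T-E-C-E-T path still links them. This is the kind of sparsity relief that expanding short texts with knowledge-base information aims for.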
Authors: LYU Xiao-feng; ZHAO Shu-liang; GAO Heng-da; WU Yong-liang; ZHANG Bao-qi (College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China; Software College, Hebei Normal University, Shijiazhuang 050024, China; School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China)
Source: Computer Science, 2022, No. 9, pp. 92-100 (9 pages). Indexed in CSCD; Peking University core journal.
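Funding: Major Projects of the National Social Science Fund of China (13&ZD091, 18ZDA200); Key Research and Development Program of Hebei Province (20370301D); Major Key Technology Research Project of Hebei Normal University (L2020K01).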
Keywords: Heterogeneous information network; Short text enrichment method; Short text matching; Knowledge base; Meta-path