
A Multi-Modal Feature Fusion Embedding Method for Similar Ad Retrieving (Cited by: 2)
Abstract With the rapid development of Internet artificial intelligence technology, learning user characteristics and accurately placing advertisements can significantly increase the click-through rate (CTR) and conversion rate (CVR) of advertisements. Automatically targeting the potential customers of a product is an extremely important part of the ad placement problem. The mainstream method in industry is to use converted and non-converted users to train a classification model, based on user characteristics, that judges whether a given user would buy the advertised product. The performance of this classifier depends largely on the scale of the ad's actual converted population: the larger the scale, the more accurate the judgment. In practice, however, some ads have insufficient converted users. This paper uses widely studied content-based retrieval technology to expand the collection of ads similar to a given ad and treats the converted users of those similar ads as converted users of the ad itself, thereby expanding the corresponding conversion population.

On the one hand, existing single-modal retrieval methods focus on the features of a single modality (text or image) and ignore the inherent relationships between modalities, yielding incomplete features and a large amount of noise; this lowers the quality of similar-ad retrieval and, in turn, the quality of the expanded conversion population. On the other hand, the cross-modal retrieval schemes that have emerged in recent years mainly focus on searching images by text or text by images, and do not account for the fact that general-purpose object detectors are unsuitable for domain-specific image data. To solve these problems, this paper proposes a multi-modal feature fusion model, trained with ad classification as its objective, that extracts comprehensive features to improve similar-ad retrieval. Specifically, we adopt a Transformer to extract semantic features from text and employ the YOLO object detector to mine fine-grained visual features from images. To mine text-image relations, a text-based attention mechanism identifies product-related objects in images, reducing the noise that irrelevant objects introduce into the ad features. We further adopt a multi-modal fusion attention layer to fuse the text and image features. The resulting model is named ToTYEmb (Text oriented Transformer-Yolo fusion Embedding). Trained on the classification objective, ToTYEmb generates an embedding for each ad, which is then used to search for similar ads. In addition, we propose an algorithmic framework that embeds similar-ad expansion and conversion-crowd expansion into the existing intelligent crowd targeting workflow.

In the experiments, we compared the classification accuracy and similar-ad recall precision of ToTYEmb against baseline classification models and multi-modal pre-training models, and analyzed the impact of different modules and of the number of selected image regions on similar-ad recall. The results show that the proposed method effectively improves the retrieval quality of similar ads and avoids many errors caused by incorrect single-modal information. An offline target-crowd update experiment further shows that expanding the converted population with similar ads can substantially optimize the existing intelligent crowd targeting algorithm.
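The text-guided fusion the abstract describes (attending over detected image regions with the text feature, then fusing the two modalities) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact ToTYEmb formulation: the dot-product scoring, feature dimensions, and concatenation-based fusion are simplifying assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def text_guided_fusion(text_emb, region_feats):
    """Weight detector region features by their similarity to the text
    embedding (so product-related regions dominate), then concatenate
    the attended visual feature with the text feature into one ad embedding."""
    d = text_emb.shape[0]
    scores = region_feats @ text_emb / np.sqrt(d)   # (k,) text-region affinities
    weights = softmax(scores)                       # attention over the k regions
    visual = weights @ region_feats                 # (d,) attended visual feature
    return np.concatenate([text_emb, visual])       # (2d,) fused ad embedding

rng = np.random.default_rng(0)
t = rng.standard_normal(64)        # toy stand-in for a Transformer text feature
R = rng.standard_normal((5, 64))   # toy stand-in for 5 YOLO region features
emb = text_guided_fusion(t, R)
print(emb.shape)
```

In the paper, such fused embeddings are trained under an ad-classification objective and then compared (e.g., by nearest-neighbor search) to retrieve similar ads; irrelevant detected objects receive low attention weight and so contribute little noise to the final embedding.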
Authors: FENG Yi; ZHOU Xiao-Song; LI Chuan-Yi; WANG Ting; GE Ji-Dong; HU Yu-Cheng; ZHANG Xiao-Peng; LUO Bin (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046; Software Institute, Nanjing University, Nanjing 210093; Tencent, Shenzhen, Guangdong 518000)
Source: Chinese Journal of Computers (计算机学报), 2022, No. 7, pp. 1500-1516 (17 pages); indexed in EI, CAS, CSCD, Peking University core journals
Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (61802167) and by Tencent.
Keywords: multi-modal feature fusion; similar ads retrieval; Transformer; attention mechanism