
A Multi-Modal Feature Fusion Embedding Method for Similar Ad Retrieving (Cited by: 2)
Abstract With the rapid development of Internet artificial intelligence technology, learning user characteristics and accurately placing advertisements can significantly increase the click-through rate (CTR) and conversion rate (CVR) of advertisements. Automatically targeting the potential customers of a product is an extremely important part of the ad placement problem. The mainstream method in industry is to use converted and non-converted users to train a classification model, based on user characteristics, that judges whether a given user would buy the advertised product. The performance of this classifier depends largely on the scale of the ad's actual converted population: the larger the scale, the more accurate the judgment. In practice, however, some ads have insufficient converted users. This paper uses widely studied content-based retrieval technology to expand the collection of ads similar to a given ad and treats the converted users of those similar ads as converted users of the ad itself, thereby expanding the corresponding conversion population.

On the one hand, existing single-modal retrieval methods focus on the features of a single modality (text or image) and ignore the inherent relationships between modalities, yielding incomplete features and a large amount of noise; this lowers the quality of similar-ad retrieval and, in turn, the quality of the expanded conversion population. On the other hand, the cross-modal retrieval schemes that have emerged in recent years mainly focus on searching images by text or text by images, and do not account for the fact that general-purpose object detectors are unsuitable for domain-specific image data. To solve these problems, this paper proposes a multi-modal feature fusion model, trained with ad classification as its objective, that extracts comprehensive features to improve similar-ad retrieval. Specifically, we adopt a Transformer to extract semantic features from text and employ the YOLO object detector to mine fine-grained visual features from images. To mine text-image relations, a text-based attention mechanism identifies product-related objects in images, reducing the noise that irrelevant objects introduce into the ad features. We further adopt a multi-modal fusion attention layer to fuse the text and image features. The resulting model is named ToTYEmb (Text oriented Transformer-Yolo fusion Embedding). Trained on the classification objective, ToTYEmb generates an embedding for each ad, which is then used to search for similar ads. In addition, we propose an algorithmic framework that embeds similar-ad expansion and conversion-crowd expansion into the existing intelligent crowd targeting workflow.

In the experiments, we compared the classification accuracy and similar-ad recall precision of ToTYEmb against baseline classification models and multi-modal pre-training models, and analyzed the impact of different modules and of the number of selected image regions on similar-ad recall. The results show that the proposed method effectively improves the retrieval quality of similar ads and avoids many errors caused by incorrect single-modal information. An offline target-crowd update experiment further shows that expanding the converted population with similar ads can substantially optimize the existing intelligent crowd targeting algorithm.
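The text-guided fusion the abstract describes (attending over detected image regions with the text feature, then fusing the two modalities) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact ToTYEmb formulation: the dot-product scoring, feature dimensions, and concatenation-based fusion are simplifying assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def text_guided_fusion(text_emb, region_feats):
    """Weight detector region features by their similarity to the text
    embedding (so product-related regions dominate), then concatenate
    the attended visual feature with the text feature into one ad embedding."""
    d = text_emb.shape[0]
    scores = region_feats @ text_emb / np.sqrt(d)   # (k,) text-region affinities
    weights = softmax(scores)                       # attention over the k regions
    visual = weights @ region_feats                 # (d,) attended visual feature
    return np.concatenate([text_emb, visual])       # (2d,) fused ad embedding

rng = np.random.default_rng(0)
t = rng.standard_normal(64)        # toy stand-in for a Transformer text feature
R = rng.standard_normal((5, 64))   # toy stand-in for 5 YOLO region features
emb = text_guided_fusion(t, R)
print(emb.shape)
```

In the paper, such fused embeddings are trained under an ad-classification objective and then compared (e.g., by nearest-neighbor search) to retrieve similar ads; irrelevant detected objects receive low attention weight and so contribute little noise to the final embedding.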
Authors: FENG Yi; ZHOU Xiao-Song; LI Chuan-Yi; WANG Ting; GE Ji-Dong; HU Yu-Cheng; ZHANG Xiao-Peng; LUO Bin (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046; Software Institute, Nanjing University, Nanjing 210093; Tencent, Shenzhen, Guangdong 518000)
Source: Chinese Journal of Computers (计算机学报), 2022, No. 7, pp. 1500-1516 (17 pages); indexed in EI, CAS, CSCD, Peking University core journals
Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (61802167) and by Tencent.
Keywords: multi-modal feature fusion; similar ads retrieval; Transformer; attention mechanism