Abstract
By predicting the adoptability of pets efficiently and accurately, shelters and rescuers can be guided toward making pet profiles more attractive, reducing animal suffering and euthanization. Previous prediction methods usually trained on only a single type of content. However, many pet profiles contain not only text but also images. To make full use of both textual and visual information, this paper proposes a novel method for processing pets with multimodal information. We employ several CNN (Convolutional Neural Network) based models and other methods to extract features from images and texts, obtaining an initial multimodal representation, then reduce its dimensionality and fuse the modalities. Finally, we train the fused features with two GBDT (Gradient Boosting Decision Tree) based models and a Neural Network (NN), and compare their performance individually and as an ensemble. The evaluation results demonstrate that the proposed ensemble learning improves prediction accuracy.
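To make the fuse-then-ensemble pipeline concrete, below is a minimal sketch under illustrative assumptions: image features are taken as precomputed CNN embeddings, TF-IDF stands in for the text features, PCA for the dimensionality reduction, LightGBM and XGBoost for the two GBDT-based models, and a simple probability average for the ensemble. The abstract does not fix these specific components; they are placeholders for the general scheme.

```python
# Hedged sketch of the multimodal fuse-then-ensemble pipeline described above.
# Assumptions (not specified in the abstract): TF-IDF text features, PCA
# reduction, LightGBM/XGBoost as the two GBDT models, simple-average ensemble.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
import lightgbm as lgb
import xgboost as xgb

def fuse_features(image_features, texts, n_components=64):
    """Reduce each modality's dimensionality, then concatenate (fuse) them.

    image_features: array of precomputed CNN embeddings, one row per pet.
    texts: list of profile description strings, one per pet.
    """
    text_features = TfidfVectorizer(max_features=2000).fit_transform(texts).toarray()
    img_reduced = PCA(n_components=n_components).fit_transform(image_features)
    txt_reduced = PCA(n_components=n_components).fit_transform(text_features)
    return np.hstack([img_reduced, txt_reduced])

def train_and_ensemble(X_train, y_train, X_test):
    """Train two GBDT models and an NN on the fused features, then ensemble."""
    models = [
        lgb.LGBMClassifier(n_estimators=200),
        xgb.XGBClassifier(n_estimators=200),
        MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300),
    ]
    preds = []
    for model in models:
        model.fit(X_train, y_train)
        preds.append(model.predict_proba(X_test))
    # Simple-average ensemble of class probabilities across the three models.
    return np.mean(preds, axis=0)
```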
Funding
This work is supported by the National Key Research and Development Program of China (2018YFB1800202, 2016YFB1000302, SQ2019ZD090149, 2018YFB0204301).