Funding: the Special Research Fund for the China Postdoctoral Science Foundation (No. 2015M582832), the Major National Science and Technology Program (No. 2015ZX01040201), and the National Natural Science Foundation of China (No. 61371196).
Abstract: To address the tendency of existing cross-modal entity resolution methods to overlook the high-level semantic correlations between cross-modal data, we propose a novel cross-modal entity resolution method for image and text that integrates global and fine-grained joint attention mechanisms. First, we map the cross-modal data to a common embedding space using a feature extraction network. Then, we integrate a global joint attention mechanism with a fine-grained joint attention mechanism, enabling the model to learn both the global semantic characteristics and the local fine-grained semantic characteristics of the cross-modal data, so as to fully exploit the cross-modal semantic correlation and boost the performance of cross-modal entity resolution. Experiments on the Flickr-30K and MS-COCO datasets show that the overall R@sum improves by 4.30% and 4.54%, respectively, compared with 5 state-of-the-art methods, which demonstrates the superiority of the proposed method.
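For readers unfamiliar with the R@sum figure quoted above, the following is a minimal sketch of how such a score is commonly computed in image-text retrieval: Recall@K is evaluated in both retrieval directions and the values are summed. It is not the authors' evaluation code. The function names, the toy similarity matrix, the choice of K = 1, 5, 10, and the one-to-one pairing of image i with text i are all assumptions made for illustration.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose true match (same index) ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j. The
    one-to-one pairing (query i matches candidate i) is an assumption.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the true match: candidates scored strictly higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def r_sum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions (assumed K values)."""
    # Transposing swaps the roles of queries and candidates.
    sim_t = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(sim_t, k) for k in ks)


# Hypothetical 3x3 similarity matrix (rows: images, columns: texts).
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.4],
    [0.1, 0.2, 0.7],
]

print("image-to-text R@1:", recall_at_k(sim, 1))
print("R@sum:", r_sum(sim))
```

With three K values and two directions, the maximum attainable R@sum under this definition is 600.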
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 61370229, 61370178, 61272067), the National Key Technology R&D Program (Grant No. 2013BAH72B01), the MOE-China Mobile Research Fund (Grant No. MCM20130651), the Natural Science Foundation of GDP (Grant No. S2013010015178), and the Science-Technology Project of GDED (Grant No. 2012KJCX0037).
Abstract: With the rapid development of future networks, there has been explosive growth in multimedia data such as web images. Hence, an efficient image retrieval engine is necessary. Previous studies concentrate on single-concept image retrieval, which has limited practical usability. In practice, users typically issue multi-concept queries to an Internet image retrieval system, but the existing approaches are often ineffective because they merely combine single-concept query techniques. Semantic-concept-based multi-concept image retrieval is therefore becoming an urgent issue. In this paper, a novel Multi-Concept image Retrieval Model (MCRM) based on the multi-concept detector is proposed, which takes a multi-concept as a whole and directly learns each multi-concept from the rearranged multi-concept training set. The corresponding retrieval algorithm is then presented, and the log-likelihood function of predictions is maximized by the gradient descent approach. In addition, semantic correlations among single-concepts and multi-concepts are employed to improve the retrieval performance: the semantic correlation probability is estimated with three correlation measures, and the visual evidence is expressed by Bayes' theorem and estimated by a Support Vector Machine (SVM). Experimental results on the Corel and IAPR data sets show that the approach outperforms state-of-the-art methods. Furthermore, the model is beneficial for multi-concept retrieval and for difficult retrieval with few relevant images.
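The idea of learning a multi-concept "as a whole" by maximizing a log-likelihood with gradient steps can be illustrated with a small sketch. This is not the MCRM model from the abstract: a plain logistic detector is swapped in as a stand-in, and the feature vectors, labels, learning rate, and function names are all hypothetical. Each training image gets label 1 only when it contains every concept in the query (for example, both of two concepts), so the detector is fitted to the combination directly rather than to each concept separately.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that image features x contain the whole multi-concept."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_likelihood(w, b, X, y):
    """Bernoulli log-likelihood of the labels under the logistic detector."""
    total = 0.0
    for x, t in zip(X, y):
        p = predict(w, b, x)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def fit(X, y, lr=0.5, steps=500):
    """Gradient ascent on the log-likelihood (descent on its negative)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = t - predict(w, b, x)  # derivative of the log-likelihood
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Averaged gradient step toward higher log-likelihood.
        w = [wi + lr * g / n for wi, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Hypothetical 2-D features: one score per single concept.
X = [[1.0, 1.0], [0.9, 0.8], [1.0, 0.0], [0.0, 1.0], [0.1, 0.2]]
# Label 1 only when both concepts are present (the multi-concept as a whole).
y = [1, 1, 0, 0, 0]

w, b = fit(X, y)
print("log-likelihood before:", log_likelihood([0.0, 0.0], 0.0, X, y))
print("log-likelihood after: ", log_likelihood(w, b, X, y))
```

The rearranged training set in this sketch is simply the relabeling in `y`; how the original work rearranges its training data is not specified in the abstract.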
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 61025007, 60933001, 61100024, the National Basic Research 973 Program of China under Grant No. 2011CB302200-G, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011004, and the Fundamental Research Funds for the Central Universities of China under Grant No. N110404011.
Abstract: With the rapid development of Internet and multimedia technology, cross-media retrieval aims to retrieve all related media objects across multiple modalities when a query media object is submitted. Unfortunately, the complexity and heterogeneity of multi-modality pose two major challenges for cross-media retrieval: 1) how to construct a unified and compact model for media objects with multi-modality, and 2) how to improve retrieval performance on large-scale cross-media databases. In this paper, we propose a novel method dedicated to solving these issues and achieving effective and accurate cross-media retrieval. First, a multi-modality semantic relationship graph (MSRG) is constructed using the semantic correlation among the media objects with multi-modality. Second, all the media objects in the MSRG are mapped onto an isomorphic semantic space. Further, an efficient index, the MK-tree, based on heterogeneous data distribution is proposed to manage the media objects within the semantic space and improve the performance of cross-media retrieval. Extensive experiments on real large-scale cross-media datasets indicate that our proposal dramatically improves the accuracy and efficiency of cross-media retrieval, significantly outperforming the existing methods.
Abstract: This issue of Science China Physics, Mechanics & Astronomy celebrates the Centenary of Einstein's General Theory of Relativity, which changed the way humanity understood the concepts of space, time and matter. Prior to 1915 Einstein had introduced his theory of Special Relativity, and Minkowski had introduced the spacetime metric. General Relativity overthrew the Newtonian idea that space, time and matter were independent, replacing it with the idea that space, time and matter are inextricably linked. Within a year of the publication of General Relativity came Schwarzschild's exact solution of Einstein's field equations, which describes the spacetime structure of black holes. In 1916 and 1918 Einstein showed that his theory predicted the existence of gravitational waves. Within 7 years, in 1922, Friedmann published a solution of Einstein's field equations applied to a homogeneous universe, uncovering the basic physics of Big Bang cosmology.