Abstract: Joint learning of words and entities benefits various NLP tasks, but most work focuses on the single-language setting. Cross-lingual representation learning has recently received considerable attention, but is still restricted by the availability of parallel data. In this paper, a method is proposed to jointly embed texts and entities on comparable data. In addition to evaluation on public semantic textual similarity datasets, a task (cross-lingual text extraction) is proposed to assess the similarities between texts and to contribute a corresponding dataset. The results show that the proposed method outperforms cross-lingual representation methods that use parallel data on cross-lingual tasks, and achieves competitive results on mono-lingual tasks.
Funding: Supported by the Major Program of the National Natural Science Foundation of China [42090015].
Abstract: Satellite remote sensing, characterized by extensive coverage, frequent revisits, and continuous monitoring, provides essential data support for addressing global challenges. Over the past six decades, thousands of Earth observation satellites and sensors have been deployed worldwide. These valuable Earth observation assets are contributed independently by various nations and organizations employing diverse methodologies. This poses a significant challenge in effectively discovering global Earth observation resources and realizing their full potential. In this paper, we describe the development of GEOSatDB, the most complete semantic database of civil Earth observation satellites developed based on a unified ontology model. A similarity matching method is used to integrate satellite information and a prompt strategy is used to extract unstructured sensor information. The resulting semantic database contains 127,949 semantic statements for 2,340 remote sensing satellites and 1,021 observation sensors. The global Earth observation capabilities of 195 countries worldwide have been analyzed in detail, and a concrete use case along with an associated query demonstration is presented. This database provides significant value in effectively facilitating the semantic understanding and sharing of Earth observation resources.
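The abstract does not say which similarity measure is used to integrate satellite records, so the following is only a minimal sketch of name-based similarity matching, assuming character-level similarity from Python's standard `difflib`. The function names, the 0.85 threshold, and the sample satellite names are illustrative assumptions, not details of GEOSatDB.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def normalize(name: str) -> str:
    """Lowercase a name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_match(name: str, candidates: List[str],
               threshold: float = 0.85) -> Optional[str]:
    """Return the candidate most similar to `name`, or None if no
    candidate reaches the (assumed) similarity threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(
            None, normalize(name), normalize(candidate)
        ).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical names as they might appear in two different source catalogs.
catalog = ["LANDSAT 8", "Sentinel-2A"]
matched = best_match("Landsat-8", catalog)
unmatched = best_match("Terra", catalog)
```

Records whose names match above the threshold could then be merged into one entry, while unmatched records would be kept as separate satellites.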
Funding: National Natural Science Foundation of China (NSFC) key project (No. 61533018, No. U1736204 and No. 61661146007), Ministry of Education and China Mobile Research Fund (No. 20181770250), and THUNUS NExT Co-Lab.
Abstract: Knowledge bases (KBs) are often greatly incomplete, creating a demand for KB completion. Although XLORE is an English-Chinese bilingual knowledge graph, there are only 423,974 cross-lingual links between English instances and Chinese instances. We present XLORE2, an extension of XLORE that is built automatically from Wikipedia, Baidu Baike and Hudong Baike. We add more facts through cross-lingual knowledge linking, cross-lingual property matching and fine-grained type inference. We also design an entity linking system to demonstrate the effectiveness and broad coverage of XLORE2.
Funding: This work is supported by the National Key Research and Development Program of China (2018YFB1005100). It also received partial support from the National Engineering Laboratory for Cyberlearning and Intelligent Technology and the Beijing Key Lab of Networked Multimedia.
Abstract: Cognitive diagnosis, which aims to diagnose students' knowledge proficiency, is crucial for numerous online education applications, such as personalized exercise recommendation. Existing methods in this area mainly exploit students' exercising records, ignoring students' full learning process in online education systems. Besides, the latent relation of exercises with course structure and texts is still underexplored. In this paper, a learning behavior-aware cognitive diagnosis (LCD) framework is proposed for students' cognitive modeling with both learning behavior records and exercising records. First, the concept of LCD was introduced to characterize students' knowledge proficiency more completely. Second, a course graph was designed to explore the rich information in course texts and structures. Third, an interaction function was put forward to explore complex relationships between students, exercises and videos. Extensive experiments on a real-world dataset show that LCD predicts student performance more effectively and that its output is interpretable.
Funding: Supported by the National High-Tech Research and Development (863) Program (No. 2015AA015401), the National Natural Science Foundation of China (Nos. 61533018 and 61402220), the State Scholarship Fund of CSC (No. 201608430240), the Philosophy and Social Science Foundation of Hunan Province (No. 16YBA323), and the Scientific Research Fund of Hunan Provincial Education Department (Nos. 16C1378 and 14B153).
Abstract: Word embedding has drawn a lot of attention due to its usefulness in many NLP tasks. So far, a handful of neural-network-based word embedding algorithms have been proposed without considering the effects of pronouns in the training corpus. In this paper, we propose using co-reference resolution to improve word embedding by extracting better context. We evaluate four word embeddings with co-reference resolution taken into account and compare the quality of the word embeddings on the tasks of word analogy and word similarity on multiple data sets. Experiments show that by using co-reference resolution, word embedding performance in the word analogy task can be improved by around 1.88%. We find that words that are names of countries are affected the most, which is as expected.
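The idea of "extracting better context" through co-reference resolution can be illustrated with a minimal sketch. It assumes the coreference links are already available as a mapping from pronoun positions to antecedent words (hand-written here, standing in for the output of a real resolver) and uses a skip-gram-style window. The function names, the window size, and the example sentence are assumptions for illustration, not the paper's implementation.

```python
from typing import Dict, List, Tuple


def resolve_pronouns(tokens: List[str],
                     antecedents: Dict[int, str]) -> List[str]:
    """Replace the token at each resolved pronoun position with its antecedent."""
    return [antecedents.get(i, tok) for i, tok in enumerate(tokens)]


def context_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Collect (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["france", "is", "large", "and", "it", "borders", "spain"]
# Hypothetical resolver output: the pronoun at index 4 refers to "france".
antecedents = {4: "france"}

resolved = resolve_pronouns(tokens, antecedents)
pairs_before = context_pairs(tokens, window=2)
pairs_after = context_pairs(resolved, window=2)
```

Without resolution, "spain" only co-occurs with the pronoun "it"; after substitution, the pair ("france", "spain") enters the training contexts, which is consistent with the observation that country names are affected the most.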
Funding: This work is supported by the National Key Research and Development Program of China (2017YFB1002101), NSFC Key Project (U1736204), and a grant from Beijing Academy of Artificial Intelligence (BAAI2019ZD0502).
Abstract: With the development of tourism knowledge graphs (KGs), recommendation, question answering (QA) and other functions they support enable various applications to better understand users and provide services. Existing Chinese tourism KGs do not contain enough entity information and relations. Besides, the knowledge storage usually contains only the text modality and lacks other modalities such as images. In this paper, a multi-modal Chinese tourism knowledge graph (MCTKG) is proposed based on Beijing tourist attractions to support QA and help tourists plan tourism routes. An MCTKG ontology was constructed to maintain the semantic consistency of heterogeneous data sources. To increase the number of entities and relations related to the tourist attractions in MCTKG, entities belonging to the concepts of building, organization, relic, and person were automatically expanded based on Baidu Encyclopedia. In addition, based on the types of tourist attractions and the styles of tourism routes, a tourism route generation algorithm was proposed, which can automatically schedule tourism routes by incorporating tourist attractions and the route style. Experimental results show that the generated tourist routes achieve satisfaction similar to that of tourism routes crawled from specific travel websites.
Abstract: The World Wide Web (WWW) has revolutionized many aspects of society in the past two decades. Emerging applications of the World Wide Web have not only fostered industrial development worth billions of dollars, but have also led to research in a number of disciplines including computer science, social science, mathematics, and economics.