Abstract: Knowledge graphs (KGs) play an important role in enhancing the performance of many intelligent systems. In this paper, we introduce Sogou Inc.'s solution for building a large-scale multi-source knowledge graph from scratch, including its architecture, technical implementation, and applications. Unlike previous works that build knowledge graphs on graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the Sogou knowledge graph data, which are collected from 136 different websites and constantly updated, consist of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering, and a knowledge-based dialog system. These applications have been used in Web search products to help users acquire information more efficiently.
Abstract: Camera-equipped mobile devices are encouraging people to take more photos, and the development and growth of social networks are making it increasingly popular to share photos online. When objects appear in overlapping Fields of View (FOVs), they are drawing much attention, which indicates their popularity. Successfully discovering and locating these objects can be very useful for many applications, such as criminal investigations, event summaries, and crowdsourcing-based Geographical Information Systems (GIS). Existing methods require either prior knowledge of the environment or intentional photographing. In this paper, we propose a seamless approach called 'Spotlight', which performs passive localization using crowdsourced photos. Using a graph-based model, we combine object images across multiple camera views. Within each set of combined object images, a photographing map is built, on which object localization is performed using plane geometry. We evaluate the system's localization accuracy using photos taken in various scenarios; the results show that our approach is effective for passive object localization and achieves a high level of accuracy.
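The abstract does not spell out the plane-geometry step, so the following is only a minimal sketch of one common way to localize an object seen from two viewpoints: intersecting the two viewing rays in a 2D plane. It is not the Spotlight algorithm itself. The function name `locate_object`, the use of camera positions plus bearing angles as inputs, and the tolerance `eps` are all assumptions made for illustration.

```python
import math


def locate_object(cam1, bearing1, cam2, bearing2, eps=1e-9):
    """Intersect two viewing rays in the plane (hypothetical helper).

    cam1, cam2: (x, y) camera positions.
    bearing1, bearing2: viewing directions in radians, measured
    counter-clockwise from the positive x axis.
    Returns the (x, y) intersection, or None if the rays are parallel
    or the intersection lies behind either camera.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))

    # 2D cross product of the directions; zero means parallel rays.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < eps:
        return None

    # Solve cam1 + t1 * d1 == cam2 + t2 * d2 for t1 and t2.
    rx, ry = cam2[0] - cam1[0], cam2[1] - cam1[1]
    t1 = (rx * d2[1] - ry * d2[0]) / denom
    t2 = (rx * d1[1] - ry * d1[0]) / denom

    # A negative parameter means the point is behind that camera.
    if t1 < 0 or t2 < 0:
        return None
    return (cam1[0] + t1 * d1[0], cam1[1] + t1 * d1[1])
```

With more than two views, the pairwise intersections would generally disagree slightly because of measurement noise, so a practical system would need some form of aggregation; how Spotlight does this is not stated in the abstract.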
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61872218, 61673241, and 61721003), the Tsinghua-Fuzhou Institute Research Program, and the Beijing National Research Center for Information Science and Technology (BNRist), China.
Abstract: Identification of significant biological relationships or patterns is central to many metagenomic studies. Methods that estimate association networks have been proposed for this purpose; however, they assume that associations are static, neglecting the fact that relationships in a microbial ecosystem may vary with changes in environmental factors (EFs), which can result in inaccurate estimations. Therefore, in this study, we propose a computational model, called the k-Lognormal-Dirichlet-Multinomial (kLDM) model, which estimates multiple association networks that correspond to specific environmental conditions and simultaneously infers microbe-microbe and EF-microbe associations for each network. The effectiveness of the kLDM model was demonstrated on synthetic data, a colorectal cancer (CRC) dataset, the Tara Oceans dataset, and the American Gut Project dataset. The results revealed that the widely used Spearman's rank correlation coefficient method performed much worse than the other methods, indicating the importance of separating samples by environmental conditions. Cancer fecal samples were then compared with cancer-free samples, and the estimation achieved by kLDM exhibited fewer associations among microbes but stronger associations between specific bacteria, especially five CRC-associated operational taxonomic units, indicating gut microbe translocation in cancer patients. Some EF-dependent associations were then found within a marine eukaryotic community. Finally, the gut microbial heterogeneity of inflammatory bowel disease patients was detected. These results demonstrate that kLDM can elucidate the complex associations within microbial ecosystems. The kLDM program, R and Python scripts, and all experimental datasets are accessible at https://github.com/tinglab/kLDM.git.
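The point about static associations can be illustrated with a toy example. The sketch below is not kLDM; it only computes Spearman's rank correlation on pooled samples and then separately per condition, using made-up abundance values in which two hypothetical taxa are positively associated under condition A and negatively associated under condition B. The helper names `average_ranks` and `spearman` are assumptions for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy, made-up abundances of two hypothetical taxa under two conditions.
taxon1_a, taxon2_a = [1, 2, 3, 4], [1, 2, 3, 4]  # condition A: co-occurrence
taxon1_b, taxon2_b = [1, 2, 3, 4], [4, 3, 2, 1]  # condition B: exclusion

rho_a = spearman(taxon1_a, taxon2_a)
rho_b = spearman(taxon1_b, taxon2_b)
rho_pooled = spearman(taxon1_a + taxon1_b, taxon2_a + taxon2_b)
```

In this constructed case the per-condition correlations are perfectly positive and perfectly negative, while the pooled correlation cancels out, which is the kind of masking that motivates estimating one network per environmental condition.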
Funding: Supported by the National Key R&D Program of China (No. 2017YFB0202003).
Abstract: The radiation damage effect of key structural materials is one of the main research subjects of the numerical reactor. From the perspective of experimental safety and feasibility, Molecular Dynamics (MD) in the materials field is an ideal method for simulating the radiation damage of structural materials. Crystal-MD is a massively parallel MD simulation software package based on the key material characteristics of reactors. Compared with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) and ITAP Molecular Dynamics (IMD) software, Crystal-MD reduces the memory required for software operation to a certain extent, but it is very time-consuming. Moreover, the calculation results of Crystal-MD have large deviations, and there are also some problems, such as memory limitation and frequent communication, during its migration and optimization. In this paper, to solve the above problems, the memory access mode of the Crystal-MD software is studied. Based on the memory access mode, a memory access optimization strategy is proposed for the unique architecture of China's Sunway TaihuLight supercomputer. The proposed optimization strategy is verified by experiments, and the experimental results show that it significantly increases the running speed of Crystal-MD.