8 articles found
DynamicRetriever: A Pre-trained Model-based IR System Without an Explicit Index
1
Authors: Yu-Jia Zhou, Jing Yao, Zhi-Cheng Dou, Ledell Wu, Ji-Rong Wen. Machine Intelligence Research (EI, CSCD), 2023, Issue 2, pp. 276-288 (13 pages)
Web search provides a promising way for people to obtain information and has been extensively studied. With the surge of deep learning and large-scale pre-training techniques, various neural information retrieval models have been proposed, and they have demonstrated the power to improve search (especially ranking) quality. All these existing search methods follow a common paradigm, i.e., index-retrieve-rerank, where they first build an index of all documents based on document terms (i.e., sparse inverted index) or representation vectors (i.e., dense vector index), then retrieve and rerank the retrieved documents based on the similarity between the query and documents via ranking models. In this paper, we explore a new paradigm of information retrieval without an explicit index but only with a pre-trained model. Instead, all of the knowledge of the documents is encoded into model parameters, which can be regarded as a differentiable indexer and optimized in an end-to-end manner. Specifically, we propose a pre-trained model-based information retrieval (IR) system called DynamicRetriever, which directly returns document identifiers for a given query. Under such a framework, we implement two variants to explore how to train the model from scratch and how to combine the advantages of dense retrieval models. Compared with existing search methods, the model-based IR system parameterizes the traditional static index with a pre-trained model, which converts the document semantic mapping into a dynamic and updatable process. Extensive experiments conducted on the public search benchmark Microsoft machine reading comprehension (MS MARCO) verify the effectiveness and potential of our proposed new paradigm for information retrieval.
Keywords: information retrieval (IR); document retrieval; model-based IR; pre-trained language model; differentiable search index
Editorial for Special Issue on Large-scale Pre-training: Data, Models, and Fine-tuning
2
Authors: Ji-Rong Wen, Zi Huang, Hanwang Zhang. Machine Intelligence Research (EI, CSCD), 2023, Issue 2, pp. 145-146 (2 pages)
In recent years, there has been a surge of interest and rapid development in large-scale pre-training due to the explosive growth of both data and model parameters. Large-scale training has achieved impressive performance milestones across a wide range of practical problems, including natural language processing, computer vision, recommendation systems, robotics, and other basic research areas like bioinformatics.
Keywords: COMPUTER; SCALE; PARAMETERS
Dynamic Shortest Path Monitoring in Spatial Networks (cited by: 2)
3
Authors: Shuo Shang, Lisi Chen, Zhe-Wei Wei, Dan-Huai Guo, Ji-Rong Wen. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, Issue 4, pp. 637-648 (12 pages)
With the increasing availability of real-time traffic information, dynamic spatial networks are pervasive nowadays, and path planning in dynamic spatial networks has become an important issue. In this light, we propose and investigate a novel problem of dynamically monitoring shortest paths in spatial networks (DSPM query). When a traveler heads to a destination, his/her shortest path to the destination may change for two reasons: 1) the travel costs of some edges have been updated and 2) the traveler deviates from the pre-planned path. Our target is to accelerate shortest path computation in dynamic spatial networks, and we believe that this study may be useful in many mobile applications, such as route planning and recommendation, car navigation and tracking, and location-based services in general. This problem is challenging for two reasons: 1) how to maintain and reuse the existing computation results to accelerate subsequent computations, and 2) how to prune the search space effectively. To overcome these challenges, a filter-and-refinement paradigm is adopted. We maintain an expansion tree and define a pair of upper and lower bounds to prune the search space. A series of optimization techniques are developed to accelerate shortest path computation. The performance of the developed methods is studied in extensive experiments based on real spatial data.
Keywords: shortest path; dynamic spatial network; spatial database; location-based service
An Experimental Study of Text Representation Methods for Cross-Site Purchase Preference Prediction Using the Social Text Data (cited by: 2)
4
Authors: Ting Bai, Hong-Jian Dou, Xin Zhao, Ding-Yi Yang, Ji-Rong Wen. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, Issue 4, pp. 828-842 (15 pages)
Nowadays, many e-commerce websites allow users to log in with their existing social networking accounts. When a new user comes to an e-commerce website, it is interesting to study whether the information from external social media platforms can be utilized to alleviate the cold-start problem. In this paper, we focus on a specific task in cross-site information sharing, i.e., leveraging the text posted by a user on a social media platform (termed social text) to infer his/her purchase preference over product categories on an e-commerce platform. To solve the task, a key problem is how to effectively represent the social text in a way that its information can be utilized on the e-commerce platform. We study two major kinds of text representation methods for predicting cross-site purchase preference: shallow textual features and deep textual features learned by deep neural network models. We conduct extensive experiments on a large linked dataset, and our experimental results indicate that it is promising to utilize the social text for predicting purchase preference. In particular, the deep neural network approach shows stronger predictive ability when the number of categories becomes large.
Keywords: social media; e-commerce website; purchase preference; deep neural network
KB4Rec: A Data Set for Linking Knowledge Bases with Recommender Systems (cited by: 6)
5
Authors: Wayne Xin Zhao, Gaole He, Kunlin Yang, Hongjian Dou, Jin Huang, Siqi Ouyang, Ji-Rong Wen. Data Intelligence, 2019, Issue 2, pp. 121-136 (16 pages)
To develop a knowledge-aware recommender system, a key issue is how to obtain rich and structured knowledge base (KB) information for recommender system (RS) items. Existing data sets or methods either use side information from original RSs (containing very few kinds of useful information) or utilize a private KB. In this paper, we present KB4Rec v1.0, a data set linking KB information for RSs. It has linked three widely used RS data sets with two popular KBs, namely Freebase and YAGO. Based on our linked data set, we first perform qualitative analysis experiments, and then we discuss the effect of two important factors (i.e., popularity and recency) on whether an RS item can be linked to a KB entity. Finally, we compare several knowledge-aware recommendation algorithms on our linked data set.
Keywords: knowledge-aware recommendation; recommender system; knowledge base
Generating timeline summaries with social media attention (cited by: 1)
6
Authors: Wayne Xin ZHAO, Ji-Rong WEN, Xiaoming LI. Frontiers of Computer Science (SCIE, EI, CSCD), 2016, Issue 4, pp. 702-716 (15 pages)
Timeline generation is an important research task which can help users quickly understand the overall evolution of a given topic. Previous methods simply split the time span into fixed, equal time intervals without studying the role of the evolutionary patterns of the underlying topic in timeline generation. In addition, few of these methods take users' collective interests into consideration when generating timelines. We consider utilizing social media attention to address these two problems for two reasons: 1) social media is an important pool of real users' collective interests; 2) the information cascades generated in it might be good indicators of the boundaries of topic phases. Employing Twitter as a basis, we propose to incorporate topic phases and users' collective interests, both learnt from social media, into a unified timeline generation algorithm. We construct one informativeness-oriented and three interestingness-oriented evaluation sets over five topics. We demonstrate that the approach is very effective at generating timelines that are both informative and interesting. In addition, our idea naturally leads to a novel presentation of timelines, i.e., phase-based timelines, which can potentially improve user experience.
Keywords: timeline; social media attention; phase; users' collective interests
Ranking and tagging bursty features in text streams with context language models
7
Authors: Wayne Xin ZHAO, Chen LIU, Ji-Rong WEN, Xiaoming LI. Frontiers of Computer Science (SCIE, EI, CSCD), 2017, Issue 5, pp. 852-862 (11 pages)
Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models based on sentence-level context and document-level context. We then propose a novel topic diversity-based metric using the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models for bursty features can effectively help rank bursty features based on their newsworthiness and assign meaningful tags to annotate bursty features. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.
Keywords: bursty features; bursty features ranking; bursty feature tagging; context modeling
Index-free triangle-based graph local clustering
8
Authors: Zhe YUAN, Zhewei WEI, Fangrui LV, Ji-Rong WEN. Frontiers of Computer Science (SCIE, EI), 2024, Issue 3, pp. 143-153 (11 pages)
Motif-based graph local clustering (MGLC) is a popular method for graph mining tasks due to its various applications. However, the traditional two-phase approach of precomputing motif weights before performing local clustering loses locality and is impractical for large graphs. While some attempts have been made to address the efficiency bottleneck, there is still no applicable algorithm for large-scale graphs with billions of edges. In this paper, we propose a purely local and index-free method called Index-free Triangle-based Graph Local Clustering (TGLC*) to solve the MGLC problem w.r.t. a triangle. TGLC* directly estimates the Personalized PageRank (PPR) vector using random walks with the desired triangle-weighted distribution and produces the clustering result using a standard sweep procedure. We demonstrate TGLC*'s scalability through theoretical analysis and its practical benefits through a novel visualization layout. TGLC* is the first algorithm to solve the MGLC problem without precomputing the motif weight. Extensive experiments on seven real-world large-scale datasets show that TGLC* is applicable and scalable for large graphs.
Keywords: graph local clustering; triangle motif; index-free; sampling method; visualization