Funding: Supported by the National Natural Science Foundation of China (No. 61070156), Special Youth Research and Innovation Programs (Nos. 2009QNA5025 and 2010QNA5044), and IBM-ZJU Joint Research Projects.
Abstract: Building data mashups is complicated and error-prone because the process requires not only finding suitable APIs but also combining them appropriately to get the desired result. This paper describes an ontology-driven mashup auto-completion approach for a data API network to facilitate this task. First, a microformats-based ontology was defined to describe the attributes and activities of the data APIs. A semantic Bayesian network (sBN) and a semantic graph template were then used for link prediction on the Semantic Web and to construct a data API network denoted as N_p. Performance is improved by a semi-supervised learning method that uses both labeled and unlabeled data. This network is then used to build an ontology-driven mashup auto-completion system that helps users build mashups by providing three kinds of recommendations. Tests demonstrate that the approach has a precision_p of about 80%, a recall_p of about 60%, and an F_0.5 of about 70% for predicting links between APIs. Compared with the API network N_e, which is composed of existing links on the current Web, N_p contains more links, including those that should exist but do not. The ontology-driven mashup auto-completion system gives a much better recall_r and discounted cumulative gain (DCG) on N_p than on N_e. The tests suggest that this approach gives users more room for creativity by constructing the API network through predicting mashup APIs rather than using only existing links on the Web.
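The evaluation measures named in this abstract can be illustrated with a short sketch. It assumes the standard definitions of the F-beta score and of DCG with a log2(rank + 1) discount; the abstract does not say which DCG variant the paper uses, and the function names `f_beta` and `dcg` are hypothetical. Plugging in the rounded figures (precision 0.8, recall 0.6) gives 0.75, in the same range as the reported F_0.5 of about 70%; the exact figures are not given in the abstract.

```python
import math


def f_beta(precision, recall, beta=0.5):
    """Weighted harmonic mean of precision and recall; beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def dcg(relevances):
    """Discounted cumulative gain, assuming the common log2(rank + 1) discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


# Rounded figures from the abstract, not the paper's exact values.
print(round(f_beta(0.8, 0.6), 3))

# Illustrative relevance list: hits at ranks 1 and 3, a miss at rank 2.
print(round(dcg([1, 0, 1]), 3))
```

With beta = 0.5, precision counts more than recall, which suits a recommender where a wrongly predicted link costs more than a missed one.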
Funding: Supported by the National Natural Science Foundation of China (No. 61702526), the Defense Industrial Technology Development Program of China (No. JCKY2017204B064), and the National Advanced Research Project of China (No. 6141B0801010b).
Abstract: Query auto-completion (QAC) facilitates query formulation by predicting completions for given query prefix inputs. Most web search engines use behavioral signals to customize query completion lists for users. To be effective, such personalized QAC models rely on access to sufficient context about each user's interests and intentions. Hence, they often suffer from data sparseness problems. For this reason, we propose the construction and application of cohorts to address context sparsity and to enhance QAC personalization. We build an individual's interest profile by learning his or her topic preferences through topic models and then aggregate users who share similar profiles. As conventional topic models are unable to learn cohorts automatically, we propose two cohort topic models that handle topic modeling and cohort discovery in the same framework. We present four cohort-based personalized QAC models that employ four different cohort discovery strategies. Our proposals use cohorts' contextual information together with query frequency to rank completions. We perform extensive experiments on the publicly available AOL query log and compare the ranking effectiveness with that of models that discard cohort contexts. Experimental results suggest that our cohort-based personalized QAC models can solve the sparseness problem and yield significant relevance improvement over competitive baselines.
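The idea of ranking completions by query frequency together with cohort context can be sketched as follows. This is an illustration only, not the paper's model: it assumes the cohort is already known (the paper learns cohorts with topic models, which is not reproduced here) and uses a simple linear interpolation of global and cohort frequencies. The function name `rank_completions`, the `weight` parameter, and the sample logs are all hypothetical.

```python
from collections import Counter


def rank_completions(prefix, global_log, cohort_log, weight=0.5, k=3):
    """Rank completions of `prefix` by a mix of global and cohort popularity.

    `weight` is an assumed interpolation factor: 0 ignores the cohort,
    1 uses only the cohort's queries.
    """
    global_freq = Counter(q for q in global_log if q.startswith(prefix))
    cohort_freq = Counter(q for q in cohort_log if q.startswith(prefix))
    g_total = sum(global_freq.values()) or 1  # avoid division by zero
    c_total = sum(cohort_freq.values()) or 1
    candidates = set(global_freq) | set(cohort_freq)

    def score(query):
        return ((1 - weight) * global_freq[query] / g_total
                + weight * cohort_freq[query] / c_total)

    # Sort by descending score, breaking ties alphabetically.
    return sorted(candidates, key=lambda q: (-score(q), q))[:k]


# Made-up logs for illustration; not taken from the AOL query log.
global_log = ["java download", "java download", "java tutorial", "jaguar car"]
cohort_log = ["java tutorial", "java tutorial", "jaguar car"]

print(rank_completions("ja", global_log, cohort_log))
print(rank_completions("ja", global_log, cohort_log, weight=0.0))
```

Setting `weight=0.0` reduces the sketch to a frequency-only ranking, which corresponds to the kind of model that discards cohort context. Pooling queries across similar users is what lets the cohort term contribute evidence when a single user's own history is too sparse.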