Semi-supervised learning by constructing query-document heterogeneous information network (cited by: 2)

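Abstract: Graph-based semi-supervised learning has been widely studied in recent years. However, most existing semi-supervised learning algorithms can only be applied to homogeneous networks. The paper builds a query-document heterogeneous information network from two sources: the content features of queries and documents, and the click relationships between them. It then introduces the discriminative (class) information of labeled samples to strengthen the network structure. On this network, it proposes a regularization framework and an iterative algorithm for semi-supervised clustering. In the regularization framework, a cost function on the heterogeneous information network is constructed from the manifold assumption. Its closed-form solution is derived and used to predict the class labels of unlabeled queries and documents. Experiments on query logs from a large-scale commercial search engine show that the method outperforms traditional semi-supervised learning methods.

Various graph-based algorithms for semi-supervised learning have been proposed in the recent literature. Classification on homogeneous networks has been studied for decades, but classification on heterogeneous networks has only recently begun to be explored. This paper considers semi-supervised classification on a query-document heterogeneous information network, which combines the click bipartite graph with content information from both sides. To strengthen the network structure, the class information of labeled sample nodes is introduced. A semi-supervised learning algorithm based on two frameworks is investigated: a novel graph-based regularization framework and an iterative framework. In the regularization framework, a new cost function is developed that considers both the direct relationships between the two entity sets and the content information from both sides. This leads to a significant improvement over the baseline methods. Experimental results show that the proposed method achieves the best performance, with consistent and promising improvements.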
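The abstract does not state the cost function itself. As a rough illustration only, the sketch below makes one simplifying assumption. It merges the three blocks into a single affinity matrix W: query-query content similarity, document-document content similarity, and the query-document click matrix. It then applies the regularizer of Zhou et al. ("Learning with local and global consistency"). With S = D^(-1/2) W D^(-1/2), the cost tr(Fᵀ(I − S)F) + μ‖F − Y‖² is minimized by F* = (1 − α)(I − αS)^(-1) Y, where α = 1/(1 + μ). The iteration F ← αSF + (1 − α)Y converges to the same point. This mirrors the abstract's pairing of a closed-form regularization framework with an iterative algorithm. The sketch is not the authors' cost function and does not model their use of sample class information to reweight edges. The block weights `w_q`, `w_d`, `w_b`, the value of `alpha`, all function names, and the toy data are hypothetical.

```python
import numpy as np


def build_hetero_affinity(W_q, W_d, B, w_q=1.0, w_d=1.0, w_b=1.0):
    """Assumption: stack query-query similarity W_q (n x n), document-document
    similarity W_d (m x m) and the click matrix B (n x m) into one symmetric
    (n+m) x (n+m) affinity matrix, queries first. w_q, w_d, w_b are hypothetical."""
    W_q, W_d, B = (np.asarray(M, dtype=float) for M in (W_q, W_d, B))
    W = np.block([[w_q * W_q, w_b * B],
                  [w_b * B.T, w_d * W_d]])
    np.fill_diagonal(W, 0.0)  # drop self-similarities (no self-loops)
    return W


def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}; isolated nodes (degree 0) get all-zero rows."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def propagate_closed_form(W, Y, alpha=0.9):
    """Minimizer of tr(F^T (I - S) F) + mu * ||F - Y||^2 with alpha = 1 / (1 + mu):
    F* = (1 - alpha) (I - alpha S)^{-1} Y."""
    S = normalized_affinity(W)
    I = np.eye(W.shape[0])
    return (1.0 - alpha) * np.linalg.solve(I - alpha * S, np.asarray(Y, dtype=float))


def propagate_iterative(W, Y, alpha=0.9, max_iter=1000, tol=1e-10):
    """Fixed-point iteration F <- alpha S F + (1 - alpha) Y. It converges to the
    closed-form solution because the spectral radius of alpha * S is below 1."""
    S = normalized_affinity(W)
    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        if np.max(np.abs(F_next - F)) < tol:
            return F_next
        F = F_next
    return F


def toy_example():
    """Hypothetical data: three queries, three documents, two classes."""
    W_q = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    W_d = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]]
    B = [[3, 1, 0], [2, 0, 0], [0, 0, 4]]  # click counts, query rows x document columns
    Y = np.zeros((6, 2))
    Y[0, 0] = 1.0  # query q0 labeled class 0
    Y[5, 1] = 1.0  # document d2 labeled class 1
    W = build_hetero_affinity(W_q, W_d, B)
    return propagate_closed_form(W, Y), propagate_iterative(W, Y)


if __name__ == "__main__":
    F_closed, F_iter = toy_example()
    labels = F_closed.argmax(axis=1)
    print("queries:", labels[:3], "documents:", labels[3:])
    print("max |closed - iterative|:", np.abs(F_closed - F_iter).max())
```

On the toy data, q0, q1, d0 and d1 are all connected to the labeled query q0, so they receive class 0. Query q2 and document d2 receive class 1. The closed-form and iterative results agree to within the iteration tolerance.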

Authors: 刘钰峰 (LIU Yufeng), 李仁发 (LI Renfa)

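Affiliations: College of Information Science and Engineering, Hunan University; Embedded Systems and Networking Laboratory, Hunan University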

Source: Journal on Communications (通信学报), 2014, No. 8, pp. 40-47 (8 pages). Indexed in EI, CSCD, and the Peking University core journal list.

Funding: National Natural Science Foundation of China (61173036)

Keywords: heterogeneous information networks; semi-supervised learning; information retrieval; click-through data

Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]

Citation network
Related literature

References (20)

1. SUN Y, YU Y, HAN J. Ranking-based clustering of heterogeneous information networks with star network schema[A]. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. Paris, France, 2009. 797-806.
2. SUN Y, HAN J. Mining heterogeneous information networks: a structural analysis approach[J]. SIGKDD Explorations, 2012, 14(2): 20-28.
3. BELKIN M, NIYOGI P, SINDHWANI V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples[J]. The Journal of Machine Learning Research, 2006, 7: 2399-2434.
4. ZHOU D, BOUSQUET O, LAL T N, et al. Learning with local and global consistency[J]. Advances in Neural Information Processing Systems, 2004, 16: 321-328.
5. LI X, WANG Y Y, ACERO A. Learning query intent from regularized click graphs[A]. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval[C]. Singapore, Singapore, 2008. 339-346.
6. WU W, LI H, XU J. Learning query and document similarities from click-through bipartite graph with metadata[A]. Proceedings of the Sixth ACM International Conference on Web Search and Data Mining[C]. Rome, Italy, 2013. 687-696.
7. CHEN Y, WANG L, DONG M. Non-negative matrix factorization for semisupervised heterogeneous data coclustering[J]. Knowledge and Data Engineering, 2010, 22(10): 1459-1474.
8. DENG H, HAN J, ZHAO B, et al. Probabilistic topic models with biased propagation on heterogeneous information networks[A]. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. San Diego, CA, 2011. 1271-1279.
9. DENG H, HAN J, LYU M R, et al. Modeling and exploiting heterogeneous bibliographic networks for expertise ranking[A]. Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries[C]. New York, USA, 2012. 71-80.
10. ZHOU Z H, LI M. Semi-supervised learning by disagreement[J]. Knowledge and Information Systems, 2010, 24(3): 415-439.

Co-cited literature (9)

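1. 陈黎飞, 姜青山, 王声瑞. A method for determining the optimal number of clusters based on hierarchical partitioning[J]. Journal of Software, 2008, 19(1): 62-72. Cited by: 82.
2. 赵泽亚, 贾岩涛, 王元卓, 靳小龙, 程学旗. Temporal relation prediction based on dynamic heterogeneous information networks[J]. Journal of Computer Research and Development, 2015, 52(8): 1735-1741. Cited by: 8.
3. 胡凌超, 于洪. A voting-based three-way decision clustering ensemble method[J]. Journal of Chinese Computer Systems, 2016, 37(8): 1741-1745. Cited by: 5.
4. 赵军, 徐晓燕. Distributed power iteration clustering based on GraphX[J]. Journal of Computer Applications, 2016, 36(10): 2710-2714. Cited by: 3.
5. 魏霖静, 练智超, 王联国, 侯振兴. A document clustering algorithm based on term and semantic difference measures[J]. Computer Science, 2016, 43(12): 229-233. Cited by: 1.
6. 赵孝礼, 赵荣珍. Dimensionality reduction of rotor fault data sets by fusing global and local discriminant information[J]. Acta Automatica Sinica, 2017, 43(4): 560-567. Cited by: 34.
7. 徐森, 皋军, 花小朋, 李先锋, 徐静. An improved adaptive clustering ensemble selection method[J]. Acta Automatica Sinica, 2018, 44(11): 2103-2112. Cited by: 8.
8. 陈湘涛, 丁平尖, 王晶. Meta-path-based dynamic similarity search in heterogeneous information networks[J]. Journal of Computer Applications, 2014, 34(9): 2604-2607. Cited by: 2.
9. 李玉, 甄畅, 石雪, 赵泉华. Hyperspectral image classification based on entropy-weighted K-means global information clustering[J]. Journal of Image and Graphics, 2019, 24(4): 630-638. Cited by: 14.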

Citing literature (2)

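1. 汤小康, 曹步文. Graph-based semi-supervised learning in heterogeneous information networks[J]. Journal of Chinese Computer Systems, 2017, 38(10): 2258-2262. Cited by: 1.
2. 王留洋, 俞扬信, 陈伯伦, 章慧. A discriminative information method for improving document clustering based on consensus and classification[J]. Journal of Computer Applications, 2020, 40(4): 1069-1073. Cited by: 6.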

Second-level citing literature (7)

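1. 陈惠娟, 赵旭, 陈亮. An intrusion sensing and prediction algorithm for low-matching heterogeneous information in mobile networks in cloud computing environments[J]. Journal of Jilin University (Science Edition), 2019, 57(6): 1449-1455. Cited by: 2.
2. 刘鹏, 宁鹏飞. A VSM-based optimized clustering model for specific information in massive medical resources[J]. Computer Simulation, 2021, 38(6): 383-386.
3. 徐一鸣, 潘伟民. Research on a deep-learning-based method for recognizing multiple document structures[J]. Electronic Design Engineering, 2021, 29(21): 53-56. Cited by: 1.
4. 李金讯, 郭娜, 林树鸿, 颜清. Research on a new anti-tampering method for official documents based on multiple-image hidden anti-counterfeiting marks[J]. Power Systems and Big Data, 2021, 24(9): 1-8. Cited by: 2.
5. 刘江平. An identification method for redundant transmission information in optical communication networks based on feature selection[J]. Journal of Baoshan University, 2022, 41(2): 71-77. Cited by: 1.
6. 吴南辉, 沈炎松. Research on correcting grammatical mistranslations in English-Chinese translation: based on K-means clustering[J]. Journal of Zhangzhou Institute of Technology, 2022, 24(2): 67-75.
7. 顾志芹. Research on automatic retrieval of library information-base resources[J]. Techniques of Automation and Applications, 2023, 42(11): 77-81. Cited by: 1.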

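1. 王继民, 彭波. Analysis of search engine users' click behavior[J]. Journal of the China Society for Scientific and Technical Information, 2006, 25(2): 154-162. Cited by: 45.
2. 张永健, 王漫. A survey of fault diagnosis techniques for wireless sensor network systems[J]. Computer and Modernization, 2012(1): 129-131.
3. 刘钰峰, 李仁发. Query recommendation based on a Term-Query-URL heterogeneous information network[J]. Journal of Hunan University (Natural Sciences), 2014, 41(5): 106-112. Cited by: 3.
4. 邓晓妹, 武刚. Research on evaluating search engine user satisfaction based on click logs[J]. Computer Engineering and Applications, 2015, 51(8): 245-249. Cited by: 1.
5. Loadmemory. Advanced tips for MSN Space (Part 2)[J]. Computer Application Digest, 2005, 21(14): 81-82.
6. 徐姗姗, 刘应安, 徐昇. An algorithm for reinforcing boundary information in stereo matching[J]. Journal of Shandong University (Engineering Science), 2012, 42(6): 43-49.
7. 孙付伟, 李娟, 杨达. A click model based on Bayesian inference and its implementation[J]. Computer Applications and Software, 2013, 30(1): 7-10. Cited by: 1.
8. 陈斌辉, 白清源. Semi-supervised dimensionality reduction for complex structured data[J]. Computer Engineering and Applications, 2011, 47(35): 135-138. Cited by: 1.
9. 石雁, 李朝锋. A query recommendation method based on naive Bayes click prediction[J]. Computer Applications and Software, 2016, 33(10): 19-22. Cited by: 3.
10. 王家卓, 刘奕群, 马少平, 张敏. Analysis of sponsored search advertising effectiveness based on user behavior[J]. Journal of Computer Research and Development, 2011, 48(1): 133-138. Cited by: 10.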
