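Abstract: Counterfactual prediction and selection bias are major challenges in causal effect estimation. To effectively represent the complex confounding distribution of latent covariates while improving the generalization of counterfactual prediction, this work proposes a reweighted adversarial variational autoencoder network (RVAENet) for industrial causal effect estimation. To debias the confounding distribution, the model draws on ideas from domain adaptation and uses an adversarial learning mechanism to balance the distribution of the representations learned from the latent variables of a variational autoencoder (VAE). On this basis, samples are reweighted with learned propensity weights to further reduce the distributional difference between the treatment and control groups. Experimental results show that, in two scenarios of a real-world industrial dataset, the proposed model improves the area under the uplift curve (AUUC) by 15.02% and 16.02%, respectively, over TEDVAE (Treatment Effect with Disentangled VAE). On public datasets, the proposed model generally achieves the best results on the average treatment effect (ATE) and precision in estimation of heterogeneous effect (PEHE) metrics.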
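The reweighting idea can be illustrated with classic inverse propensity weighting (IPW), together with the two evaluation metrics named above. This sketch is not RVAENet. It assumes the propensity scores are known, whereas RVAENet learns its sample weights and adversarially balances VAE latent representations; that part is not reproduced here. The function names `ipw_ate`, `pehe`, and `ate_error` and the synthetic data are hypothetical. PEHE is taken as the root-mean-squared error of individual effects, and the ATE error as the absolute gap between mean effects, following their usual definitions.

```python
import numpy as np


def ipw_ate(y, t, e, eps=1e-3):
    """Inverse-propensity-weighted ATE estimate (Horvitz-Thompson form)."""
    e = np.clip(e, eps, 1 - eps)  # guard against exploding weights near 0 or 1
    return float(np.mean(t * y / e - (1 - t) * y / (1 - e)))


def pehe(tau_true, tau_hat):
    """Root-mean-squared error of individual treatment effects."""
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))


def ate_error(tau_true, tau_hat):
    """Absolute error of the average treatment effect."""
    return float(abs(np.mean(tau_hat) - np.mean(tau_true)))


# Hypothetical confounded data: x drives both treatment and outcome; true effect = 2.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))            # true propensity P(t = 1 | x)
t = rng.binomial(1, e)
y = x + 2.0 * t + rng.normal(scale=0.1, size=n)

naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive difference: {naive:.2f}")            # biased upward by confounding
print(f"IPW estimate:     {ipw_ate(y, t, e):.2f}")  # close to the true effect
```

On this data, the naive difference in group means overstates the true effect of 2, because treated units tend to have larger x. Reweighting by 1/e and 1/(1 - e) removes that bias. Clipping e is a common practical safeguard and is not specified by the abstract.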
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 12275179 and 11875042) and the Natural Science Foundation of Shanghai Municipality, China (Grant No. 21ZR1443900).
Abstract: In recent years, there has been growing interest in graph convolutional networks (GCNs). However, existing GCNs and their variants are predominantly based on simple graph or hypergraph structures, which restricts their ability to handle complex data correlations in practical applications. These limitations stem from the difficulty of establishing multiple hierarchies and of acquiring adaptive weights for each of them. To address this issue, this paper introduces the recent concept of complex hypergraphs and constructs a versatile high-order, multi-level data correlation model. The model is realized by establishing a three-tier structure of complexes, hypergraphs, and vertices. Specifically, we start by establishing hyperedge clusters on a foundational network, using a second-order hypergraph structure to depict potential correlations. For this second-order structure, truncation methods are used to assess and generate a three-layer composite structure. During the construction of the composite structure, an adaptive learning strategy merges correlations across different levels. We evaluate the model on several popular datasets and compare it with recent state-of-the-art methods. The comprehensive assessment results demonstrate that the proposed model surpasses existing methods, particularly in modeling implicit data correlations. The node classification accuracies on the five public datasets Cora, Citeseer, Pubmed, Github Web ML, and Facebook are 86.1±0.33, 79.2±0.35, 83.1±0.46, 83.8±0.23, and 80.1±0.37, respectively. This indicates that our approach has advantages in handling datasets with implicit multi-level structures.
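For readers unfamiliar with hypergraph convolution, this sketch shows the standard normalized HGNN propagation operator (Feng et al., 2019), Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2). It also shows a hypothetical way to merge several levels with softmax-normalized learnable weights. It is an illustration under stated assumptions, not the paper's complex-hypergraph construction:

- the truncation procedure is not reproduced;
- the second "level" here is simply a coarser hypergraph;
- `hypergraph_operator`, `mixed_layer`, and the toy incidence matrices are hypothetical;
- every vertex is assumed to belong to at least one hyperedge, so no vertex degree is zero.

```python
import numpy as np


def hypergraph_operator(H, w=None):
    """Normalized HGNN propagation matrix Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H is the |V| x |E| incidence matrix; w holds optional hyperedge weights.
    """
    n_edges = H.shape[1]
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    dv = H @ w             # weighted vertex degrees (assumed nonzero)
    de = H.sum(axis=0)     # hyperedge degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    return dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ dv_inv_sqrt


def mixed_layer(X, operators, logits, theta):
    """Hypothetical multi-level layer: softmax-weighted sum of operators, then ReLU."""
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()
    A = sum(a * op for a, op in zip(alpha, operators))
    return np.maximum(A @ X @ theta, 0.0)


# Level 1: hyperedges {0, 1, 2} and {2, 3}; level 2: one hyperedge over all vertices.
H1 = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
H2 = np.ones((4, 1))
ops = [hypergraph_operator(H1), hypergraph_operator(H2)]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))       # 4 vertices, 3 input features
theta = rng.normal(size=(3, 2))   # projection to 2 output features
out = mixed_layer(X, ops, logits=np.zeros(2), theta=theta)
print(out.shape)  # (4, 2)
```

In a trained model, `logits` would be learned jointly with `theta`. With equal logits, as here, each level is weighted by 0.5.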
Funding: Supported by the National Natural Science Foundation of China under Grant No. U19B2021, the Key Research and Development Program of Shaanxi under Grant No. 2020ZDLGY08-04, the Key Technologies R&D Program of He'nan Province under Grant No. 212102210084, and the Innovation Scientists and Technicians Troop Construction Projects of Henan Province.
Abstract: As an emerging joint learning paradigm, federated learning is a promising way to combine the model parameters of different users for training and inference without collecting users' original data. However, previous work has not established a practical and efficient solution, because privacy-preserving federated learning lacks efficient matrix computation and cryptographic schemes, especially in partially homomorphic cryptosystems. In this paper, we propose a Practical and Efficient Privacy-preserving Federated Learning (PEPFL) framework. First, we present a lifted distributed ElGamal cryptosystem for federated learning, which can solve the multi-key problem in federated learning. Second, we develop a Practical Partially Single Instruction Multiple Data (PSIMD) parallelism scheme that encodes a plaintext matrix into a single plaintext for encryption, improving encryption efficiency and reducing communication cost in partially homomorphic cryptosystems. In addition, based on a Convolutional Neural Network (CNN) and the designed cryptosystem, a novel privacy-preserving federated learning framework is built using Momentum Gradient Descent (MGD). Finally, we evaluate the security and performance of PEPFL. The experimental results demonstrate that the scheme is practical, effective, and secure, with low communication and computation costs.
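This sketch shows the additive homomorphism behind a lifted (exponential) ElGamal scheme with distributed decryption. It is the textbook construction, not PEPFL's exact protocol:

- each party holds a secret share x_i, and the joint public key is h = g^(x_1 + ... + x_n);
- a message m is encrypted as (g^r, g^m h^r);
- ciphertexts are added by component-wise multiplication;
- decryption requires every party's partial result c1^(x_i) plus a small discrete logarithm.

The group parameters are deliberately tiny and insecure. The PSIMD matrix packing is not shown, and all function names are hypothetical.

```python
import random

# Toy parameters, INSECURE and for illustration only: P = 2Q + 1 is a safe prime
# and G = 4 (a quadratic residue) generates the subgroup of prime order Q.
P, Q, G = 467, 233, 4


def keygen(rng):
    """Each party draws a secret share x_i and publishes g^x_i."""
    x = rng.randrange(1, Q)
    return x, pow(G, x, P)


def joint_public_key(public_shares):
    """h = prod g^x_i = g^(x_1 + ... + x_n); no single party knows the full secret."""
    h = 1
    for y in public_shares:
        h = h * y % P
    return h


def encrypt(m, h, rng):
    """Lifted ElGamal: the message sits in the exponent, (g^r, g^m * h^r)."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(h, r, P) % P


def add(c, d):
    """Component-wise product of ciphertexts encrypts the sum of plaintexts."""
    return c[0] * d[0] % P, c[1] * d[1] % P


def partial_decrypt(c, x):
    """Each party contributes c1^x_i using only its own share."""
    return pow(c[0], x, P)


def combine(c, partials, max_m):
    """Remove h^r using all partials, then brute-force the small discrete log.

    max_m must stay below Q so that the recovered exponent is unique.
    """
    s = 1
    for d in partials:
        s = s * d % P
    gm = c[1] * pow(s, -1, P) % P  # modular inverse via pow needs Python 3.8+
    acc = 1
    for m in range(max_m + 1):
        if acc == gm:
            return m
        acc = acc * G % P
    raise ValueError("plaintext outside the searchable range")


rng = random.Random(0)
keys = [keygen(rng) for _ in range(3)]             # three participants
h = joint_public_key([y for _, y in keys])
c = add(encrypt(12, h, rng), encrypt(30, h, rng))  # ciphertext of 12 + 30
print(combine(c, [partial_decrypt(c, x) for x, _ in keys], max_m=200))  # 42
```

Because the plaintext sits in the exponent, decryption only recovers small values (here by brute force). This is one reason lifted ElGamal is typically paired with bounded, quantized values such as aggregated gradients.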
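Abstract: Existing spatio-temporal representation learning frameworks cannot offer a unified solution to the three questions, "When", "Where", and "What", that arise in real-world scenarios with strong spatio-temporal semantics. Moreover, existing approaches to temporal and spatial modeling have shortcomings that keep them from achieving optimal performance in complex real-world scenarios. To address these problems, this paper proposes GTRL (geography and time aware representation learning), a unified user representation framework that jointly models users' historical behavior trajectories along both the temporal and spatial dimensions. For temporal modeling, GTRL uses functional time encoding together with a continuous-time, context-aware graph attention network to flexibly capture high-order structured temporal information on a dynamic user behavior graph. For spatial modeling, GTRL uses hierarchical geographic encoding and a deep historical trajectory modeling module to efficiently characterize users' location preferences. GTRL adopts a unified joint optimization scheme that trains the model simultaneously on three tasks: interaction prediction, interaction time prediction, and interaction location prediction. Finally, extensive experiments on public and industrial datasets verify GTRL's advantages over academic baseline models and its effectiveness in real business scenarios.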
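The abstract names functional time encoding but does not give its form. A common choice is the Bochner-based encoding popularized by TGAT:

Φ(t) = sqrt(1/d) [cos(ω_1 t), sin(ω_1 t), ..., cos(ω_d t), sin(ω_d t)]

Its inner product depends only on the time difference. This sketch assumes that form with fixed, geometrically spaced frequencies; in TGAT the ω_i are learned. Whether GTRL uses exactly this encoding is an assumption, and `time_encoding` is a hypothetical name.

```python
import numpy as np


def time_encoding(t, omegas):
    """Bochner-style functional time encoding (TGAT form), one row per timestamp.

    Row i is sqrt(1/d) * [cos(w_1 t_i), ..., cos(w_d t_i), sin(w_1 t_i), ..., sin(w_d t_i)];
    grouping all cosines before all sines does not change inner products.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]  # shape (n, 1)
    phase = t * omegas[None, :]                            # shape (n, d)
    enc = np.concatenate([np.cos(phase), np.sin(phase)], axis=1)
    return enc / np.sqrt(len(omegas))


# Assumed fixed frequencies spanning several time scales (learnable in TGAT).
omegas = 1.0 / 10 ** np.linspace(0, 4, 8)
enc = time_encoding([10.0, 7.0, 103.0, 100.0], omegas)
print(enc.shape)  # (4, 16)

# <phi(t1), phi(t2)> = (1/d) * sum_i cos(w_i (t1 - t2)): only the gap t1 - t2 matters.
print(np.isclose(enc[0] @ enc[1], enc[2] @ enc[3]))  # True
```

Translation invariance is what makes this family of encodings convenient for continuous-time attention: two events three time units apart produce the same similarity regardless of when they occurred.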
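Abstract: Micro and small enterprises (MSEs) are the capillaries of China's economy and a reservoir of employment. They are also an important force driving the high-quality development of the Chinese economy. However, survey data focusing on MSEs are extremely limited. In response, the Enterprise Big Data Research Center of Peking University, the Institute of Social Science Survey of Peking University, and the Ant Group Research Institute have conducted the quarterly Online Survey of Micro-and-small Enterprises (OSOME) since the third quarter of 2020. The survey comprehensively collects data on firms' basic information, operating conditions, financing, policy coverage, digital transformation, and confidence in the future. Based on these data, this paper profiles China's micro and small business operators and their current conditions, and constructs a confidence index for them. The OSOME data reveal five stylized facts about micro and small business operators:

- China has a large number of unregistered individual businesses, most of them subsistence-oriented.
- In recent years, their survival pressure has come mainly from rising operating costs and weak market demand.
- Compared with large and medium-sized enterprises, they rely more on online financing: the share applying for and obtaining loans online is more than twice the share doing so offline.
- A relatively small share of them benefit from relief policies such as tax and fee reductions.
- They are accelerating their digital transformation.

The OSOME survey not only describes the operations of individual businesses, effectively complementing existing enterprise surveys, but also provides empirical evidence, through continuous quarterly data collection and analysis, to help policymakers better promote the healthy development of MSEs.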