We investigated the application of Causal Bayesian Networks (CBNs) to large data sets in order to predict user intent via internet search prediction. Sample data were taken from search engine logs (Excite, Altavista, and Alltheweb). These logs were parsed and sorted to create a data structure that was used to build a CBN. This network is used to predict the next term or terms that the user may be about to type. We compared CBNs with Naive Bayes and Bayes Net classifiers on very large datasets. To simulate our proposed results, we took a small sample of search data logs to predict intentional query typing. Additionally, problems that arise with the use of such a data structure are addressed individually, along with the solutions used and their prediction accuracy and sensitivity.
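To make the idea of predicting the next search term concrete, the following is a minimal sketch, not the authors' CBN. It assumes a far simpler stand-in: a single conditional probability table P(next term | current term) estimated from term-pair counts in whitespace-tokenized queries. The function names `build_cpt` and `predict_next` and the toy log are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def build_cpt(queries):
    """Estimate P(next_term | current_term) from a list of query strings.

    Assumption: queries are tokenized on whitespace and lowercased.
    """
    counts = defaultdict(Counter)
    for query in queries:
        terms = query.lower().split()
        # Count each adjacent (current, next) pair within a query.
        for current, nxt in zip(terms, terms[1:]):
            counts[current][nxt] += 1

    cpt = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        cpt[current] = {term: n / total for term, n in followers.items()}
    return cpt


def predict_next(cpt, term, k=1):
    """Return up to k most probable next terms after `term`."""
    followers = cpt.get(term.lower(), {})
    # Sort by descending probability, then alphabetically for determinism.
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical toy log, standing in for parsed search engine logs.
toy_log = [
    "cheap flights london",
    "cheap flights paris",
    "cheap hotels london",
    "flights london",
]
cpt = build_cpt(toy_log)
top_after_cheap = predict_next(cpt, "cheap")
top_two_after_flights = predict_next(cpt, "flights", k=2)
```

A full CBN would condition on more than the immediately preceding term and would encode directed dependencies between variables; this table corresponds only to the simplest two-node case.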
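To identify intensive care unit (ICU) patients at high risk of acute kidney injury (AKI) early, provide them with appropriate care, and make rational use of medical resources, this study builds a causal Bayesian network model to predict mortality risk in high-risk AKI patients. From the Medical Information Mart for Intensive Care III (MIMIC-III) database, 25 study variables and 3,870 patient records were selected, and a causal discovery algorithm was used for feature dimensionality reduction. A causal graph was constructed with the NO TEARS algorithm and a causal Bayesian network was built for the experiments; machine learning algorithms were used to verify the plausibility of the important features, and causal effects were estimated on the network structure. The model achieved the highest area under the receiver operating characteristic curve (AUROC), 81.7%, outperforming logistic regression (LR), random forest (RF), and eXtreme Gradient Boosting (XGBoost). In addition, the predictive ability of the model's important features was robust across modeling approaches, and the resulting causal Bayesian network offers better predictive performance together with good interpretability.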
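For readers unfamiliar with NO TEARS (usually written NOTEARS), its central device is a smooth acyclicity measure on a weighted adjacency matrix W with d nodes: h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when W describes a directed acyclic graph. The sketch below evaluates that function only. It is not the study's pipeline, the example matrices are hypothetical, and it assumes a truncated Taylor series in place of a library matrix exponential.

```python
import numpy as np


def notears_acyclicity(W, n_terms=30):
    """Approximate h(W) = tr(exp(W * W)) - d with a truncated Taylor series.

    Assumption: n_terms series terms are enough for small, modestly
    scaled matrices; a library matrix exponential is the usual choice.
    """
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    A = W * W  # element-wise (Hadamard) square, so all entries are >= 0

    # tr(exp(A)) - d = sum over k >= 1 of tr(A^k) / k!
    h = 0.0
    power = np.eye(d)
    factorial = 1.0
    for k in range(1, n_terms + 1):
        power = power @ A
        factorial *= k
        h += np.trace(power) / factorial
    return h


# Hypothetical 3-node chain 0 -> 1 -> 2 (acyclic).
W_dag = np.array([
    [0.0, 1.5, 0.0],
    [0.0, 0.0, -0.7],
    [0.0, 0.0, 0.0],
])

# Same graph with an extra edge 2 -> 0, which closes a cycle.
W_cyclic = W_dag.copy()
W_cyclic[2, 0] = 0.5

h_dag = notears_acyclicity(W_dag)
h_cyclic = notears_acyclicity(W_cyclic)
```

Because tr(A^k) sums the weights of closed walks of length k, every term vanishes for an acyclic graph and at least one is positive once a cycle exists. This is what lets structure learning be posed as a continuous optimization with the constraint h(W) = 0.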