Abstract
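The accumulation of organic compounds in living organisms is usually expressed by the bioconcentration factor (BCF), an important indicator in assessing the ecotoxicity of chemicals. To predict whether an organic compound is prone to bioconcentration, 624 compounds with different BCF values were first collected from the U.S. Environmental Protection Agency website. Seven molecular fingerprints were then combined with five machine learning methods (support vector machine, C4.5 decision tree, k-nearest neighbors, random forest, and naive Bayes) to build classification models for BCF, and all models were validated with an independent external validation set. Among them, the binary classification model built with the ChemoTyper fingerprint and the support vector machine achieved the best overall predictive accuracy, 85.4%. Using information gain, frequency analysis, and related methods, key substructures that tend to cause bioconcentration were further identified, including aryl chlorides, diaryl ethers, and chloroalkanes. The methods used in this study provide a reliable predictive tool for the ecological risk assessment of toxic chemicals.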
Bioconcentration is an important endpoint in evaluating the adverse effects of chemicals on ecosystems. In this study, in silico methods were used to predict the bioconcentration factor (BCF) of chemicals. First, a data set containing 624 chemicals with BCF values was collected from the Estimation Program Interface Suite of the U.S. Environmental Protection Agency. Using seven fingerprints to represent the molecules, binary classification models were developed with five machine learning methods: support vector machine (SVM), C4.5 decision tree (C4.5 DT), k-nearest neighbors (kNN), random forest (RF), and Naive Bayes (NB). Reliable predictive models were then obtained and validated by 10-fold cross-validation and an external validation set. Among them, the model built by SVM with the ChemoTyper fingerprint performed best, with predictive accuracy up to 85.4%. Moreover, several substructures were identified as key for bioconcentration, such as aryl chloride, diaryl ether, and chloroalkene. The approaches used in this study provide a useful tool for environmental risk assessment of chemicals.
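The abstract names information gain as one of the methods used to identify key substructures. As a minimal sketch of that general idea, the snippet below computes the standard information gain of a single binary fingerprint bit with respect to a binary BCF class label. The function names, the toy labels, and the two example bits are hypothetical and invented for illustration; this is not the authors' implementation and uses none of the study's data.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(present: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy obtained by splitting on one fingerprint bit.

    present[i] is 1 if compound i contains the substructure, else 0.
    labels[i] is the class of compound i (e.g. 1 = bioconcentrative).
    """
    if len(present) != len(labels):
        raise ValueError("present and labels must have the same length")
    n = len(labels)
    if n == 0:
        return 0.0
    with_bit = [y for x, y in zip(present, labels) if x]
    without_bit = [y for x, y in zip(present, labels) if not x]
    conditional = (
        len(with_bit) / n * entropy(with_bit)
        + len(without_bit) / n * entropy(without_bit)
    )
    return entropy(labels) - conditional


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
separating_bit = [1, 1, 1, 1, 0, 0, 0, 0]     # present only in class 1
uninformative_bit = [1, 1, 0, 0, 1, 1, 0, 0]  # evenly spread over both classes

print(information_gain(separating_bit, toy_labels))
print(information_gain(uninformative_bit, toy_labels))
```

Ranking every fingerprint bit by this score and inspecting the highest-ranked ones is one common way to shortlist candidate substructural alerts; how the study combined it with frequency analysis is not described in the abstract.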
Source
Asian Journal of Ecotoxicology (《生态毒理学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2015, Issue 2, pp. 173-182 (10 pages)
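Funding
National Natural Science Foundation of China (No. 81373329)
Program of Introducing Talents of Discipline to Universities, i.e. the 111 Project (No. B07023)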
Keywords
bioconcentration factor
in silico prediction
binary classification models
substructural alerts
environmental toxicology