Abstract
Emeralds, the green variety of beryl, occur as gem-quality specimens in over fifty deposits globally. While digital traceability methods for emerald have limitations, sample-based approaches offer robust alternatives, particularly for determining the geographic origin of emerald. Three factors make emerald suitable for provenance studies and hence for developing models for origin determination. First, the diverse elemental chemistry of emerald at minor (<1 wt%) and trace levels (<1 to hundreds of ppmw) exhibits unique inter-element fractionations between global deposits. Second, minimally destructive techniques, including laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS), enable measurement of these diagnostic elemental signatures. Third, when applied to extensive datasets, machine learning (ML) techniques enable predictive models and statistical discrimination, given adequate characterization of the deposits. This study employs a carefully selected dataset comprising more than 1000 LA-ICP-MS analyses of gem-quality emeralds, enriched with new analyses. This dataset represents the largest available for global emerald deposits. We conducted unsupervised exploratory analysis using Principal Component Analysis (PCA). For machine learning-based classification, we employed Support Vector Machine Classification (SVM-C), achieving an initial accuracy of 79%. This was improved to 96.8% by using hierarchical SVM-C with PCA filters as our modeling approach. The ML models were trained on the concentrations of eight statistically significant elements (Li, V, Cr, Fe, Sc, Ga, Rb, Cs). By leveraging high-quality LA-ICP-MS data and ML techniques, accurate identification of the geographical origin of emerald becomes possible. These models are important for accurate provenance determination of emerald and, from a geochemical perspective, for understanding the formation environments of beryl-bearing pegmatites and shales.
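To make the workflow concrete, the following is a minimal sketch of the general pattern described above: an unsupervised PCA projection for exploration, followed by a supervised SVM classifier trained on the eight named elements. It is not the study's actual pipeline. It uses a flat (non-hierarchical) SVM rather than the hierarchical SVM-C with PCA filters. The data are synthetic, and the class centers, noise level, log-scale treatment, and hyperparameters are all illustrative assumptions. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The eight elements named in the abstract, used as feature columns.
ELEMENTS = ["Li", "V", "Cr", "Fe", "Sc", "Ga", "Rb", "Cs"]

# HYPOTHETICAL class centers (treated as log10 ppmw) for three made-up
# "deposits". These numbers are invented for illustration only and do
# not describe any real emerald deposit.
CENTERS = np.array([
    [1.0, 2.0, 2.5, 3.0, 1.5, 1.0, 0.5, 1.0],
    [2.0, 1.0, 3.0, 3.5, 2.0, 1.5, 1.5, 2.5],
    [2.5, 3.0, 2.0, 2.5, 1.0, 1.2, 1.0, 3.0],
])


def make_synthetic_dataset(n_per_class=60, noise=0.15, seed=0):
    """Draw Gaussian clouds around the hypothetical class centers."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        center + rng.normal(0.0, noise, size=(n_per_class, len(ELEMENTS)))
        for center in CENTERS
    ])
    y = np.repeat(np.arange(len(CENTERS)), n_per_class)
    return X, y


def run_sketch():
    X, y = make_synthetic_dataset()

    # Unsupervised exploration: project standardized data onto 2 PCs.
    pc_scores = PCA(n_components=2).fit_transform(
        StandardScaler().fit_transform(X)
    )

    # Supervised classification: standardize, then fit an RBF-kernel SVM.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # held-out fraction correct
    return pc_scores, accuracy


pc_scores, accuracy = run_sketch()
print(pc_scores.shape, accuracy)
```

Because the synthetic classes are deliberately well separated, the held-out accuracy here says nothing about real emerald data. A hierarchical variant would train further classifiers on subsets of deposits that the first stage confuses, but the details of that scheme are not given in the abstract and are not reproduced here.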