Abstract
To aid the development of machine learning models for automated spectroscopic data classification, we created a universal synthetic dataset for the validation of their performance. The dataset mimics the characteristic appearance of experimental measurements from techniques such as X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy, among others. We applied eight neural network architectures to classify artificial spectra, evaluating their ability to handle common experimental artifacts. While all models achieved over 98% accuracy on the synthetic dataset, misclassifications occurred when spectra had overlapping peaks or intensities. We found that non-linear activation functions, specifically ReLU in the fully-connected layers, were crucial for distinguishing between these classes, while adding more sophisticated components, such as residual blocks or normalization layers, provided no performance benefit. Based on these findings, we summarize key design principles for neural networks in spectroscopic data classification and publicly share all scripts used in this study.
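As a rough illustration of what an artificial spectrum with overlapping peaks can look like, the following minimal sketch builds two one-dimensional spectra from a few peaks each. The function name `synthetic_spectrum`, the Gaussian peak shape, the normalized 0 to 1 axis, and all peak parameters are assumptions made for this example; they are not taken from the scripts released with this study.

```python
import math
import random


def synthetic_spectrum(peaks, n_points=500, noise=0.0, seed=0):
    """Return a list of intensities sampled on a normalized 0..1 axis.

    peaks: list of (position, width, height) tuples. The Gaussian peak
    shape is an assumption for illustration, not the study's peak model.
    noise: standard deviation of additive Gaussian noise (0.0 disables it).
    """
    rng = random.Random(seed)
    spectrum = []
    for i in range(n_points):
        x = i / (n_points - 1)
        # Sum the contribution of every peak at this axis position.
        y = sum(h * math.exp(-0.5 * ((x - p) / w) ** 2) for p, w, h in peaks)
        spectrum.append(y + rng.gauss(0.0, noise))
    return spectrum


# Two hypothetical classes whose main peaks nearly coincide, the kind of
# case where a classifier has little signal to separate them.
class_a = synthetic_spectrum([(0.30, 0.02, 1.0), (0.70, 0.02, 0.5)])
class_b = synthetic_spectrum([(0.31, 0.02, 1.0), (0.70, 0.02, 0.5)])
```

Here the two classes differ only by a small shift of the main peak, so most sampled intensities are nearly identical between them; raising `noise` makes the pair harder still to tell apart.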
Funding
N.J.S. was supported in part by the National Science Foundation Graduate Research Fellowship under grant #1752814. We also thank Gerbrand Ceder for the helpful discussion and invitation to UC Berkeley.