Reducing metabolomics data processing time with the CCTMS credential library
With the advent of ultra-high-performance liquid chromatography coupled with high-resolution tandem mass spectrometry (LC-MS), the scale of metabolomics has grown to tens of thousands of compounds, including endogenous metabolites, environmental contaminants, and chemical noise. Typical approaches search large databases (~100,000 compounds), which makes the searches computationally intensive. To address this issue, we created a credentialed library of primary endogenous compounds that captures major metabolic pathways and reduces data processing time. We first analyzed previously acquired positive-mode metabolomics data from various biological matrices spiked with 13C isotopically labeled metabolites from yeast and acquired using hydrophilic interaction liquid chromatography (HILIC). We then selected metabolites based on signal-to-noise ratio, biological origin, and retention times matching those of the 13C internal standards. The resulting library contained 493 fully annotated and verified metabolites. We demonstrated the utility of this library in several biological matrices (serum, plasma, muscle, liver, a medulloblastoma cell line, and yeast). We identified metabolites from several chemical classes, such as lipids, amino acids, sugars, and nucleotides, that are involved in central carbon and nitrogen metabolic pathways, and we observed tissue-specific metabolic signatures. Using our library, we observed a significant reduction in search time (54-fold, p-value = 0.004) without loss of data quality. We plan to continuously expand our library to characterize endogenous metabolites more accurately and efficiently in future studies.
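The three selection criteria named above (signal-to-noise ratio, biological origin, and retention-time agreement with a 13C internal standard) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `Candidate` fields, the `select_for_library` function, and the thresholds `min_sn` and `rt_tol_min` are all hypothetical and chosen only for illustration, and the example records are toy values, not study data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate metabolite annotation (all fields are hypothetical)."""
    name: str
    signal_to_noise: float   # peak signal-to-noise ratio
    endogenous: bool         # True if judged to be of biological origin
    rt_min: float            # observed retention time, in minutes
    rt_13c_min: float        # retention time of the 13C-labeled standard, in minutes


def select_for_library(
    candidates: List[Candidate],
    min_sn: float = 10.0,      # assumed threshold, not taken from the study
    rt_tol_min: float = 0.1,   # assumed retention-time tolerance, in minutes
) -> List[Candidate]:
    """Keep only candidates that pass all three selection criteria."""
    return [
        c for c in candidates
        if c.signal_to_noise >= min_sn
        and c.endogenous
        and abs(c.rt_min - c.rt_13c_min) <= rt_tol_min
    ]


if __name__ == "__main__":
    # Toy records for illustration only.
    toy = [
        Candidate("passes_all", 50.0, True, 5.02, 5.00),
        Candidate("low_signal", 2.0, True, 5.00, 5.00),
        Candidate("not_endogenous", 80.0, False, 4.00, 4.00),
        Candidate("rt_mismatch", 60.0, True, 3.00, 3.50),
    ]
    for c in select_for_library(toy):
        print(c.name)
```

Requiring all three criteria at once keeps such a library small, which is the property the abstract credits for the reduced search time; the actual cutoffs would have to be set from the study's own data.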