Welcome to CFM-ID 3.0! To use the earlier version 2.0, click here.

[Figure: caffeine spectra]

CFM-ID 3.0 is an improved version of CFM-ID, a freely available tool for MS-spectra prediction, MS-spectra annotation, and MS-based compound identification. The improvements to CFM-ID’s performance and speed include:

  1. the implementation of a rule-based fragmentation approach for lipid MS/MS spectral prediction, which greatly improves the speed and accuracy of CFM-ID;
  2. the inclusion of experimental MS/MS spectra and other metadata to enhance CFM-ID’s compound identification abilities;
  3. the development of new scoring functions that improve CFM-ID’s accuracy by 21.1%; and
  4. the implementation of a chemical classification algorithm that correctly classifies unknown chemicals (based on their MS/MS spectra) in more than 80% of cases.

CFM-ID provides a method for accurately and efficiently identifying metabolites in spectra generated by electrospray tandem mass spectrometry (ESI-MS/MS). The program uses Competitive Fragmentation Modeling to produce a probabilistic generative model for the MS/MS fragmentation process and machine learning techniques to adapt the model parameters from data. This generated model can be used for:

Spectra Prediction: Predicting the spectra for a given chemical structure. This task predicts low/10V, medium/20V, and high/40V energy MS/MS spectra for an input structure provided in SMILES or InChI format. Spectra are predicted using combinatorial fragmentation, except for select lipids, for which a rule-based fragmentation approach is used. The rule-based fragmentation module is based on a library of 344 rules covering 21 lipid classes and seven adducts.
Peak Assignment: Annotating the peaks in a set of spectra given a known chemical structure. This task takes a set of three input spectra (for ESI spectra, low/10V, medium/20V, and high/40V energy levels) or a single input spectrum (for EI spectra, 70eV energy level) in peak list format and a chemical structure in SMILES or InChI format, then assigns a putative fragment annotation to the peaks in each spectrum.
Compound Identification: Ranking possible candidate structures for a target spectrum. This task takes a set of three input spectra (for ESI spectra, low/10V, medium/20V, and high/40V energy levels) or a single input spectrum (for EI spectra, 70eV energy level) in peak list format, and ranks a list of candidate structures according to how well they match the input spectra. This candidate list may be provided by the user, or can be generated from select databases (CASMI2016, ContaminantDB, DrugBank, FiehnLib, HMDB, KEGG, MassBank, MetaboBASE, NIST, PhytoHub, and iTree). Chemical classes are predicted for each candidate molecule. The original similarity score used in the ranking (Jaccard or DotProduct) is computed by comparing the predicted spectra of a candidate compound with the input spectra. The new similarity score takes into account candidate molecule metadata (citation frequency and chemical classification) in addition to the original score. Users can choose to use either scoring method.
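
To illustrate the idea behind the original similarity scores, the following is a minimal Python sketch of Jaccard and dot-product scoring between a predicted and a measured peak list, followed by a simple candidate ranking. It is not CFM-ID's implementation. The function names, the absolute m/z tolerance of 0.01, the greedy peak matching, and the cosine-style normalization of the dot product are all assumptions made for illustration, and the sketch scores a single energy level rather than combining three.

```python
from math import sqrt

# A spectrum is a list of (m/z, intensity) tuples. All names below are
# hypothetical and chosen for this sketch only.

def match_peaks(predicted, measured, tol=0.01):
    """Greedily pair each predicted peak with the closest unused measured
    peak whose m/z differs by at most `tol` (an assumed absolute tolerance).
    Returns a list of (predicted_intensity, measured_intensity) pairs."""
    pairs = []
    used = set()
    for mz_p, int_p in predicted:
        best = None
        for j, (mz_m, _) in enumerate(measured):
            if j in used or abs(mz_p - mz_m) > tol:
                continue
            if best is None or abs(mz_p - mz_m) < abs(mz_p - measured[best][0]):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((int_p, measured[best][1]))
    return pairs

def jaccard(predicted, measured, tol=0.01):
    """Matched peaks divided by the total number of distinct peaks."""
    matched = len(match_peaks(predicted, measured, tol))
    union = len(predicted) + len(measured) - matched
    return matched / union if union else 0.0

def dot_product(predicted, measured, tol=0.01):
    """Cosine-normalized dot product over the intensities of matched peaks
    (the normalization is an assumption of this sketch)."""
    pairs = match_peaks(predicted, measured, tol)
    numerator = sum(p * m for p, m in pairs)
    norm = (sqrt(sum(i * i for _, i in predicted))
            * sqrt(sum(i * i for _, i in measured)))
    return numerator / norm if norm else 0.0

def rank_candidates(candidates, measured, score=jaccard, tol=0.01):
    """Score each candidate's predicted spectrum against the measured
    spectrum and return (name, score) pairs, best match first."""
    scored = [(name, score(spectrum, measured, tol))
              for name, spectrum in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data, invented for illustration only.
measured = [(100.0, 50.0), (150.0, 100.0), (200.0, 25.0)]
candidates = {
    "candidate_A": [(100.0, 50.0), (150.0, 100.0), (200.0, 25.0)],
    "candidate_B": [(100.0, 50.0), (175.0, 100.0)],
}
ranking = rank_candidates(candidates, measured, score=jaccard)
```

In this toy example candidate_A shares all three peaks with the measured spectrum and ranks above candidate_B, which shares only one. Passing score=dot_product to rank_candidates switches the ranking to the intensity-weighted score.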

The models used here were trained using Single Energy Competitive Fragmentation Modeling on ESI-MS/MS spectra measured at three different collision energies (low/10V, medium/20V, and high/40V) and EI-MS spectra measured at 70eV that were obtained from the METLIN Metabolite Database.

More information on the Competitive Fragmentation Modeling method and this web server can be found in the following publications:

Djoumbou-Feunang Y, Pon A, Karu N, Zheng J, Li C, Arndt D, Gautam M, Allen F, and Wishart D. Significantly Improved ESI-MS/MS Prediction and Compound Identification. Metabolites. April 13, 2019, 9(4), 72.

Allen F, Greiner R, and Wishart D. Computational prediction of electron ionization mass spectra to assist in GC-MS compound identification. Anal. Chem. July 6, 2016, 88(15), 7689–7697.
Supporting Data: https://sourceforge.net/p/cfm-id/code/HEAD/tree/supplementary_material/2016_ei_ms_paper/

Allen F, Greiner R, and Wishart D. Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification. Metabolomics. June 2014.
Supporting Data: https://sourceforge.net/p/cfm-id/code/HEAD/tree/supplementary_material/2015_esi_msms_paper/

Allen F, Pon A, Wilson M, Greiner R, and Wishart D. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra. Nucleic Acids Res. June 2014.

Source code for the rule-based fragmentation tool is available at https://bitbucket.org/wishartlab/msrb-fragmenter. Windows executables and cross-platform source code for the combinatorial fragmentation tool are freely available at http://sourceforge.net/projects/cfm-id. Supplementary files containing test molecule lists and trained models are also available on that site.