Spectra Prediction

The Spectra Prediction utility predicts the spectra for a given input molecule.

Input:

InChI/SMILES: The molecule must be given in either InChI or SMILES format. InChI strings must start with "InChI=" and are expected to be uncharged; an additional H+ will be added. For the computation to work, an InChI string must contain at least the main layer, including its chemical formula and atom connections sublayers.

Examples:
CN1CCC[C@H]1c2cccnc2
InChI=1S/C10H14N2/c1-12-7-3-5-10(12)9-4-2-6-11-8-9/h2,4,6,8,10H,3,5,7H2,1H3

Spectra Type: The type of spectra, either ESI (Electrospray Ionization) or EI (Electron Ionization/Impact).

Ion Mode: Indicates whether the precursor ion has a positive or negative adduct.

Adduct: Indicates the specific adduct used.

Output:

Spectra are computed for low (10V), medium (20V) and high (40V) collision energy levels and are represented by a list of 'mass intensity' pairs, each corresponding to a peak in the spectra.

Columns: mass   intensity

energy0 (low energy level)
132.0813243   2.334652703
134.0969744   3.34923502
136.1126244   4.048843483
146.0969744   2.149324997
147.0922234   1.635296179
163.1235235   86.48264762
energy1 (medium energy level)
84.08132432   6.006301708
132.0813243   8.196052008
134.0969744   9.194699278
136.1126244   10.23621718
146.0969744   6.107338911
147.0922234   4.933320787
163.1235235   55.32607013
energy2 (high energy level)
30.03437413   2.742873396
41.03912516   6.65846756
44.05002419   4.235872856
46.06567426   1.836649163
51.0234751    5.705668897
53.03912516   3.026849879
55.05477522   6.651243286
56.05002419   1.699714643
57.07042529   3.407180068
65.03912516   2.771825374
68.05002419   1.700204766
80.05002419   6.265672776
82.06567426   3.807914718
84.08132432   8.114929747
92.05002419   4.008757209
94.06567426   6.926908744
96.08132432   1.504021177
104.0500242   2.293258876
105.0704253   1.810507476
106.0656743   2.507700064
108.0813243   2.524395945
118.0656743   2.464943213
120.0813243   3.650643451
121.0765733   3.021784682
132.0813243   2.057084015
134.0969744   1.892947239
136.1126244   2.008486652
137.1078734   1.694987566
147.0922234   3.008506562
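
To work with predicted spectra programmatically, the output can be read into one peak list per energy level. The following is a minimal sketch, assuming the mass and intensity on each line are separated by whitespace and each energy level starts with a line beginning with "energy"; the function name parse_predicted_spectra is hypothetical and not part of the tool.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (mass, intensity)


def parse_predicted_spectra(text: str) -> Dict[str, List[Peak]]:
    """Group 'mass intensity' lines under their energy-level header."""
    spectra: Dict[str, List[Peak]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("energy"):
            current = line.split()[0]  # e.g. "energy0"
            spectra[current] = []
        elif current is not None:
            mass, intensity = line.split()[:2]
            spectra[current].append((float(mass), float(intensity)))
    return spectra


# A few peaks taken from the example output above.
example = """energy0
132.0813243 2.334652703
163.1235235 86.48264762
energy1
84.08132432 6.006301708
"""
spectra = parse_predicted_spectra(example)
# The most intense peak of the low-energy spectrum.
base_peak = max(spectra["energy0"], key=lambda peak: peak[1])
```

Lines that appear before the first energy header are ignored, so a legend line above the listing does not break the parse.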

Peak Assignment

The Peak Assignment utility annotates the peaks in a provided set of spectra given a known molecule. The complete list of feasible fragments is computed, then the most likely fragments for each spectrum peak are determined using a pre-trained model.

Input:

InChI/SMILES: The molecule must be given in either InChI or SMILES format. InChI strings must start with "InChI=" and are expected to be uncharged; an additional H+ will be added. For the computation to work, an InChI string must contain at least the main layer, including its chemical formula and atom connections sublayers.

Examples:
Oc1ccc(CC(NC(=O)C(N)CO)C(=O)NC(CC(O)=O)C(O)=O)cc1
InChI=1S/C16H21N3O8/c17-10(7-20)14(24)18-11(5-8-1-3-9(21)4-2-8)15(25)19-12(16(26)27)6-13(22)23/h1-4,10-12,20-21H,5-7,17H2,(H,18,24)(H,19,25)(H,22,23)(H,26,27)

Spectra: The spectra should be given as a list of peaks, one 'mass intensity' pair per line. For ESI spectra, each energy level begins with a header line, either 'low', 'medium', and 'high' or 'energy0', 'energy1', and 'energy2' (in that order). Only one energy level is required; the others are optional. EI spectra need only one energy level. Spectra may also be supplied in .msp file format, in which case the energy level of each ESI spectrum should be specified in the "Comment: " field (EI spectra do not need a specified energy level). For .msp input, the ID of the corresponding spectrum must be selected, and every spectrum in the file must have "ID" and "Num peaks" attributes.

Example peak list format:
low
87.054687 7.567280
105.069174 1.791050
136.07616 13.081500
160.076289 2.225420
178.084616 5.319120
223.106608 100.000000
251.10173 40.722900
297.107567 3.945980
384.140384 11.216900
medium
60.044545 2.476820
87.056965 9.632580
119.046086 2.367850
135.066335 1.865000
136.077192 46.373600
160.074417 6.652730
178.08705 20.078100
223.109344 100.000000
251.108668 3.127750
297.113687 1.892360
high
42.033909 3.047230
60.043746 26.520300
70.027268 3.162400
87.056272 18.342000
91.054494 23.516200
119.04828 15.711000
121.063402 7.273900
133.06551 5.039960
135.066238 3.626030
136.074907 100.000000
160.074409 26.458000
178.085454 12.211700

Example .msp format:
Name: Diazirine
NISTNO: 305841
ID: ID_3
Num peaks: 12
Comment: energy0
12 108.00
13 228.99
14 999.00
15 21.98
26 17.98
27 58.05
28 178.04
29 22.98
40 17.98
41 108.00
42 431.01
43 7.99
Name: Methane, diazo-
NISTNO: 57
ID: ID_4
Num peaks: 12
Comment: energy1
12 110.10
13 220.30
14 999.00
15 25.18
26 12.59
27 58.25
28 179.34
29 20.48
40 21.98
41 110.10
42 424.82
43 10.99
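
Before submitting an .msp file, it can help to check that every record carries the required "ID" and "Num peaks" attributes. The following is a rough sketch of such a check, assuming each record opens with a "Name:" line and peaks are whitespace-separated 'mass intensity' pairs; parse_msp and REQUIRED_FIELDS are hypothetical names, and this is not the tool's own parser.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "num peaks")  # attribute names, lower-cased


def parse_msp(text: str) -> List[Dict]:
    """Read .msp-style records into dicts of attributes plus a peak list."""
    records: List[Dict] = []
    record = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            key = key.strip().lower()
            if key == "name":  # assumption: "Name:" opens a new record
                record = {"peaks": []}
                records.append(record)
            if record is not None:
                record[key] = value.strip()
        elif record is not None:
            mass, intensity = line.split()[:2]
            record["peaks"].append((float(mass), float(intensity)))
    for rec in records:
        missing = [field for field in REQUIRED_FIELDS if field not in rec]
        if missing:
            raise ValueError(f"record {rec.get('name')!r} is missing {missing}")
    return records


# The Diazirine record from above, shortened to three peaks for brevity.
sample = """Name: Diazirine
NISTNO: 305841
ID: ID_3
Num peaks: 3
Comment: energy0
12 108.00
13 228.99
14 999.00
"""
records = parse_msp(sample)
```

For ESI spectra, the energy level is then available as records[0]["comment"].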

Spectra Type: The type of spectra, either ESI (Electrospray Ionization) or EI (Electron Ionization/Impact).

Ion Mode: Indicates whether the precursor ion has a positive or negative adduct.

Mass Tolerance: The mass tolerance to use when matching peaks within the dot product comparison. The default value is 10.0 ppm.
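
A tolerance in ppm is relative to the mass being compared, so the absolute window widens as mass increases. The following sketch shows one common way to apply it. Whether the tool measures the difference relative to the observed or the theoretical mass, or combines it with an absolute tolerance, is not stated here, so within_ppm is an illustrative helper only.

```python
def within_ppm(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """True if the two masses differ by at most tol_ppm parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Peak at 136.07616 against a fragment mass of 136.0762389 (about 0.6 ppm apart).
close_match = within_ppm(136.07616, 136.0762389)
# The same peak against a mass of 136.1126244 (hundreds of ppm apart).
far_match = within_ppm(136.07616, 136.1126244)
# Width of a 10 ppm window at mass 136, in mass units (about 0.00136).
window = 136.0 * 10.0 / 1e6
```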

Output:

Results contain the original spectra, with each peak followed by the IDs of any fragments with a corresponding mass, listed in order from most likely to least likely. A list of fragments with their masses and SMILES is also provided, along with a list of transitions between pairs of fragments and their corresponding neutral losses.

Spectra Peaks and Possible Matching Fragments for Oc1ccc(CC(NC(=O)C(N)CO)C(=O)NC(CC(O)=O)C(O)=O)cc1

Columns: mass   intensity   corresponding_fragment(s)

energy0 (low energy level)
87.054687    4.071272337    16 15
105.069174   0.9636028163   4
136.07616    7.037977857    17
160.076289   1.197298221    14
178.084616   2.861739768
223.106608   53.80100032    7 21 24
251.10173    21.90932756    8 19 18 20
297.107567   2.122976713    5
384.140384   6.034804405    0
energy1 (medium energy level)
60.044545    1.273646775    22
87.056965    4.953329049    16 15
119.046086   1.217611501    25
135.066335   0.9590326451
136.077192   23.84653956    17
160.074417   3.421010857    14
178.08705    10.32469349
223.109344   51.42266194    7 21 24
251.108668   1.608372309    8 19 18 20
297.113687   0.9731018854   5
energy2 (high energy level)
42.033909    1.244230912    26
60.043746    10.82864669    22
70.027268    1.291256596
87.056272    7.489320919    16 15
91.054494    9.60202642
119.04828    6.415043123    25
121.063402   2.97004533     27
133.06551    2.057893243
135.066238   1.480563861
136.074907   40.8315392     17
160.074409   10.80320864    14
178.085454   4.986225072

28 Fragments Generated

Columns: fragment_number   fragment_mass   fragment_SMILES

0    384.1406897   NC(CO)C(=O)NC(CC1=CC=C(O)C=C1)C(=O)NC(CC(=O)O)C(=O)
1    278.0988249   N=C(C=O)C(=O)[NH+]=CC(O)NC(CC(O)O)C(O)O
2    276.0831748   N=C(C=O)C(=O)[NH+]=C=C(O)NC(CC(O)O)C(O)O
3    274.0675247   N=C(C=O)C(=O)[NH+]=C=C(O)N=C(CC(O)O)C(O)O
4    105.0664025   NC(CO)C(=[NH2+])O
5    297.1086613   [NH3+]C(=C=C1C=CC(=O)CC1)C(=O)N=C(CC(O)O)C(O)O
6    367.1141406   O=C(NC(CC(O)O)C(O)O)C(=C=C1C=CC(=O)CC1)[NH+]=C(O)C#CO
7    223.1082673   NC(CO)C(O)[NH+]=C=C=C1C=CC(=O)CC1
8    251.103182    N=C(CO)C(O)=[NH+]C(=C=C1C=CC(=O)CC1)CO
9    253.118832    NC(CO)C(O)=[NH+]C(=C=C1C=CC(=O)CC1)CO
10   266.114081    N=C(CO)C(O)=[NH+]C(=C=C1C=CC(=O)CC1)C(N)O
11   268.1297311   NC(O)C(=C=C1C=CC(=O)CC1)[NH+]=C(O)C(N)CO
12   270.1453811   NC(O)C(=C=C1C=CC(=O)CC1)[NH2+]C(O)C(N)CO
13   354.130125    C#CC(=C=C([NH+]=C(O)C(=N)C=O)C(=O)NC(CC(O)O)C(O)O)CC
14   160.0722162   N=C(CO)C(=O)[NH+]=CC(N)O
15   87.05583784   [NH+]#CC(N)CO
16   87.05583784   CC(=N)C(=[NH2+])O
17   136.0762389   [NH3+]C=C=C1C=CC(=O)CC1
18   251.103182    CC(=NC(=O)C([NH3+])=C=C1C=CC(=O)CC1)C(O)O
19   251.103182    [NH3+]C(=C=C1C=CC(=O)CC1)C(=O)N=CCC(O)O
20   251.103182    NC(O)C(=C=C1C=CC(=O)CC1)[NH+]=C(O)C=CO
21   223.1082673   C#CC(=C=C(CO)[NH+]=C(O)C(=N)CO)CC
22   60.04493881   [NH2+]=CCO
23   354.130125    N#CC(O)=[NH+]C(=C=C1C=CC(=O)CC1)C(O)NC(CC(O)O)C(O)O
24   223.1082673   NCC(O)=[NH+]C(=C=C1C=CC(=O)CC1)CO
25   119.0496898   C=C=C1C=CC(=[OH+])C=C1
26   42.03437413   CC#[NH+]
27   121.0653399   C=C=C1C=CC(=[OH+])CC1

Transitions

Columns: fragment_number   fragment_number   transition_between_fragments (neutral loss)

0    2    C=C1C=CC(=O)CC1
0    3    CC1C=CC(=O)CC1
0    4    O=C1C=CC(=C=C=C(O)N=C(C=C(O)O)C(O)O)CC1
0    5    NC(=C=O)CO
0    6    N
0    7    O=C=NC(=CC(O)O)C(O)O
0    8    N=C(C=C(O)O)C(O)O
0    9    N=C(C=C(O)O)C(=O)O
0    10   OC(O)C#CC(O)O
0    11   O=C(O)C#CC(O)O
0    12   O=C(O)C#CC(=O)O
0    13   C=O
1    14   OC(O)C#CC(O)O
2    14   O=C(O)C#CC(O)O
3    14   O=C(O)C#CC(=O)O
4    15   O
4    16   O
5    17   O=C=NC(=CC(O)O)C(O)O
5    18   O=CO
5    19   O=CO
6    5    O=C=C=CO
6    20   O=C(O)C#CC(O)O
7    4    C=C=C1C=CC(=O)C=C1
7    17   NC(=C=O)CO
8    4    O=C=C=C=C1C=CC(=O)CC1
9    7    C=O
10   14   C=C1C=CC(=O)C=C1
10   7    N=C=O
11   14   C=C1C=CC(=O)CC1
11   4    NC(O)=C=C=C1C=CC(=O)CC1
11   20   N
11   7    NC=O
11   8    N
12   14   CC1C=CC(=O)CC1
13   21   N=C(C=C(O)O)C(=O)O
0    22   O=C=NC(=C=C1C=CC(=O)CC1)C(=O)NC(CC(O)O)C(O)O
0    23   C=O
2    22   O=CN=C=C(O)N=C(C=C(O)O)C(O)O
4    22   N=CO
23   5    NC=C=O
23   24   N=C(C=C(O)O)C(=O)O
7    25   N=C(O)C(N)CO
7    22   O=C1C=CC(=C=C=NCO)CC1
8    22   O=C=NC(=C=C1C=CC(=O)CC1)CO
9    21   C=O
11   22   NC(O)C(=C=C1C=CC(=O)CC1)N=CO
13   22   C#CC(=C=C(N=C=O)C(=O)N=C(CC(O)O)C(O)O)CC
1    22   O=CN=C=C(O)N=C(CC(O)O)C(O)O
3    22   O=CN=C=C(O)N=C(C=C(O)O)C(=O)O
22   26   O
7    27   N=C(O)C(=N)CO
9    22   O=C1C=CC(=C=C(CO)N=CO)CC1
10   22   NC(O)C(=C=C1C=CC(=O)CC1)N=C=O
12   22   NC(O)C(=C=C1C=CC(=O)CC1)NCO
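
In the example, the intensities in the annotated output appear to be the input intensities rescaled so that each energy level sums to 100; for instance, the low-energy peak at 223.106608 goes from 100.000000 in the input to 53.80100032 in the output. The sketch below reproduces that rescaling. It is inferred from the example data rather than from a stated specification, and normalize_to_100 is a hypothetical name.

```python
def normalize_to_100(peaks):
    """Rescale intensities so they sum to 100, keeping masses unchanged."""
    total = sum(intensity for _, intensity in peaks)
    return [(mass, 100.0 * intensity / total) for mass, intensity in peaks]


# Low-energy input peaks from the example peak list.
low = [
    (87.054687, 7.567280),
    (105.069174, 1.791050),
    (136.07616, 13.081500),
    (160.076289, 2.225420),
    (178.084616, 5.319120),
    (223.106608, 100.000000),
    (251.10173, 40.722900),
    (297.107567, 3.945980),
    (384.140384, 11.216900),
]
normalized = normalize_to_100(low)
# Rescaled intensity of the peak at 223.106608.
base_peak_intensity = normalized[5][1]
```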

Compound Identification

The Compound Identification utility determines the compounds that most closely match a given set of spectra. The spectra for each candidate compound are predicted using a pre-trained model and compared to the input spectra. The candidate compounds may be provided by the user as a list or extracted from a database.

Input:

Spectra: The spectra should be given as a list of peaks, one 'mass intensity' pair per line. For ESI spectra, each energy level begins with a header line, either 'low', 'medium', and 'high' or 'energy0', 'energy1', and 'energy2' (in that order). Only one energy level is required; the others are optional. EI spectra need only one energy level. Spectra may also be supplied in .msp file format, in which case the energy level of each ESI spectrum should be specified in the "Comment: " field (EI spectra do not need a specified energy level). For .msp input, the ID of the corresponding spectrum must be selected, and every spectrum in the file must have "ID" and "Num peaks" attributes.

Example:
low
87.054687 7.567280
105.069174 1.791050
136.07616 13.081500
160.076289 2.225420
178.084616 5.319120
223.106608 100.000000
251.10173 40.722900
297.107567 3.945980
384.140384 11.216900
medium
60.044545 2.476820
87.056965 9.632580
119.046086 2.367850
135.066335 1.865000
136.077192 46.373600
160.074417 6.652730
178.08705 20.078100
223.109344 100.000000
251.108668 3.127750
297.113687 1.892360
high
42.033909 3.047230
60.043746 26.520300
70.027268 3.162400
87.056272 18.342000
91.054494 23.516200
119.04828 15.711000
121.063402 7.273900
133.06551 5.039960
135.066238 3.626030
136.074907 100.000000
160.074409 26.458000
178.085454 12.211700

Example .msp format:
Name: Diazirine
NISTNO: 305841
ID: ID_3
Num peaks: 12
Comment: energy0
12 108.00
13 228.99
14 999.00
15 21.98
26 17.98
27 58.05
28 178.04
29 22.98
40 17.98
41 108.00
42 431.01
43 7.99
Name: Methane, diazo-
NISTNO: 57
ID: ID_4
Num peaks: 12
Comment: energy1
12 110.10
13 220.30
14 999.00
15 25.18
26 12.59
27 58.25
28 179.34
29 20.48
40 21.98
41 110.10
42 424.82
43 10.99

Search Candidates: The candidates should be given as a list of compounds, one 'ID SMILES_or_InChI' entry per line. The list can contain at most 100 compounds. Each compound must be in proper InChI or SMILES format. InChI strings must start with "InChI=" and are expected to be uncharged; an additional H+ will be added. For the computation to work, an InChI string must contain at least the main layer, including its chemical formula and atom connections sublayers.

Example:
7156455 CC(C)N1C(=O)C2C(CCN2S(C)(=O)=O)N(Cc2cccc(F)c2)C1=O
485776 CSCC(=O)NCC1CN(c2ccc(N3CCOCC3)c(F)c2)C(=O)O1
485687 CC(=O)NNCC1CN(c2ccc(C3CCS(=O)CC3)c(F)c2)C(=O)O1
45556239 O=C(NC1CC1)N1CCC2(CC1)OCCN2S(=O)(=O)c1ccc(F)cc1
19459759 Cc1cc(C(F)(F)Cl)n2nc(C(=O)NC3CC4CCC(C3)N4C)cc2n1
59444507 Cc1cc(CN(CC(=O)O)CC(=O)O)nc(CN(CC(=O)O)CC(=O)O)c1
58984199 C=CC(=O)OCCn1c(=O)n(CCOC)c(=O)n(CCOC(=O)C=C)c1=O
58753253 NC(CO)C(=O)NC(CC(=O)O)C(=O)NC(Cc1ccc(O)cc1)C(=O)O
54199399 NC(CN(CC(=O)O)CC(=O)O)(c1ccccc1)N(CC(=O)O)CC(=O)O
45644415 CNC(=O)NC(=O)COC(=O)C1C(C(=O)OC)=C(C)NC(C)=C1C(=O)OC
44585322 COc1cc(C(=O)NCC(=O)NCC(=O)NCC(=O)O)cc(OC)c1OC
36010709 COc1cc(C(=O)NCC(=O)OC(C)C(=O)NC(N)=O)cc(OC)c1OC
21494927 Nc1ccccc1C(C(=O)O)N(CCN(CC(=O)O)CC(=O)O)CC(=O)O
21273011 NC(C(=O)O)(c1ccccc1)N(CCN(CC(=O)O)CC(=O)O)CC(=O)O
20147059 Nc1ccc(C(C(=O)O)N(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1
18232127 NC(Cc1ccc(O)cc1)C(=O)NC(CO)C(=O)NC(CC(=O)O)C(=O)O
18231916 NC(Cc1ccc(O)cc1)C(=O)NC(CC(=O)O)C(=O)NC(CO)C(=O)O
18224136 NC(CO)C(=O)NC(Cc1ccc(O)cc1)C(=O)NC(CC(=O)O)C(=O)O
18219720 NC(CC(=O)O)C(=O)NC(Cc1ccc(O)cc1)C(=O)NC(CO)C(=O)O
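
A candidate list can be checked against the rules above before submission. The following is a minimal sketch, assuming the ID and structure on each line are separated by whitespace; parse_candidates and MAX_CANDIDATES are hypothetical names, and the check only distinguishes InChI from SMILES by the "InChI=" prefix without validating the chemistry.

```python
MAX_CANDIDATES = 100  # limit stated for the candidate list


def parse_candidates(text):
    """Return (id, kind, structure) tuples from 'ID SMILES_or_InChI' lines."""
    candidates = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 'ID SMILES_or_InChI'")
        candidate_id, structure = parts
        kind = "InChI" if structure.startswith("InChI=") else "SMILES"
        candidates.append((candidate_id, kind, structure))
    if len(candidates) > MAX_CANDIDATES:
        raise ValueError(f"at most {MAX_CANDIDATES} candidates are allowed")
    return candidates


# Two entries from the example list above.
sample = """18224136 NC(CO)C(=O)NC(Cc1ccc(O)cc1)C(=O)NC(CC(=O)O)C(=O)O
485776 CSCC(=O)NCC1CN(c2ccc(N3CCOCC3)c(F)c2)C(=O)O1
"""
candidates = parse_candidates(sample)
```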

Database: Instead of providing a candidate list, one can be generated from a selected database. Additional input options for generating a compound list from a database are:

Parent Ion Mass: The parent ion mass of the compound used in the mass spectrometry.

Adduct Type: The adduct type used in the mass spectrometry.

Candidate Mass Tolerance: The mass tolerance to use when identifying candidate compounds in the database. The default value is 100.0 ppm.

Candidate Limit: The maximum number of candidates to return. The maximum and default value is 100.

Spectra Type: The type of spectra, either ESI (Electrospray Ionization) or EI (Electron Ionization/Impact).

Ion Mode: Indicates whether the precursor ion has a positive or negative adduct.

Number of Results: The number of results to return. The default value is 10. If left blank, all results will be returned.

Mass Tolerance: The mass tolerance to use when matching peaks within the dot product comparison. The default value is 10.0 ppm.

Scoring Function: The type of scoring function to use when comparing spectra. The options are Jaccard and DotProduct.
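
The exact definitions used for the two scoring options are not given here. As a rough illustration only, the sketch below uses common textbook formulations: peaks are paired one-to-one when their masses agree within the mass tolerance, Jaccard is the number of shared peaks divided by the number of distinct peaks in either spectrum, and the dot product is the cosine similarity of the paired intensities. All function names are hypothetical.

```python
import math


def match_peaks(query, predicted, tol_ppm=10.0):
    """Greedily pair peaks whose masses agree within tol_ppm (one-to-one)."""
    pairs, used = [], set()
    for q_mass, q_int in query:
        for j, (p_mass, p_int) in enumerate(predicted):
            if j not in used and abs(q_mass - p_mass) / p_mass * 1e6 <= tol_ppm:
                pairs.append((q_int, p_int))
                used.add(j)
                break
    return pairs


def jaccard_score(query, predicted, tol_ppm=10.0):
    """Shared peaks divided by the number of distinct peaks in either spectrum."""
    shared = len(match_peaks(query, predicted, tol_ppm))
    union = len(query) + len(predicted) - shared
    return shared / union if union else 0.0


def dot_product_score(query, predicted, tol_ppm=10.0):
    """Cosine similarity over the intensities of the paired peaks."""
    pairs = match_peaks(query, predicted, tol_ppm)
    numerator = sum(q * p for q, p in pairs)
    denominator = math.sqrt(
        sum(i * i for _, i in query) * sum(i * i for _, i in predicted)
    )
    return numerator / denominator if denominator else 0.0


# Toy spectra: the first peaks are 5 ppm apart, the second peaks do not match.
query = [(100.0, 50.0), (200.0, 50.0)]
predicted = [(100.0005, 50.0), (300.0, 50.0)]
jaccard = jaccard_score(query, predicted)
dot = dot_product_score(query, predicted)
```

Jaccard ignores intensities entirely, while the dot product weights intense peaks more heavily, which is one reason the two options can rank candidates differently.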

Output:

The top candidates are ranked according to how closely they match and returned in a list.

Rank   Score        ID         SMILES
1      0.7829224    18224136   NC(CO)C(=O)NC(Cc1ccc(O)cc1)C(=O)NC(CC(=O)O)C(=O)O
2      0.71888482   18232127   NC(Cc1ccc(O)cc1)C(=O)NC(CO)C(=O)NC(CC(=O)O)C(=O)O
3      0.58806501   18231916   NC(Cc1ccc(O)cc1)C(=O)NC(CC(=O)O)C(=O)NC(CO)C(=O)O
4      0.58717254   58753253   NC(CO)C(=O)NC(CC(=O)O)C(=O)NC(Cc1ccc(O)cc1)C(=O)O
5      0.57845528   18219720   NC(CC(=O)O)C(=O)NC(Cc1ccc(O)cc1)C(=O)NC(CO)C(=O)O
6      0.30917874   21273011   NC(C(=O)O)(c1ccccc1)N(CCN(CC(=O)O)CC(=O)O)CC(=O)O
7      0.27142857   20147059   Nc1ccc(C(C(=O)O)N(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1
8      0.26190476   21494927   Nc1ccccc1C(C(=O)O)N(CCN(CC(=O)O)CC(=O)O)CC(=O)O
9      0.23333333   54199399   NC(CN(CC(=O)O)CC(=O)O)(c1ccccc1)N(CC(=O)O)CC(=O)O
10     0.20098039   44585322   COc1cc(C(=O)NCC(=O)NCC(=O)NCC(=O)O)cc(OC)c1OC

If a list of search candidates is submitted, the predicted spectra for these candidates will be found in a separate file, which will be in .msp format.