Input SMILES file


Maximum allowed length of SMILES (SMILES exceeding maxlength will be discarded)


Standardization description
The in-house standardization uses the Indigo toolkit. It consists of the following operations:
  • Discarding compounds if the number of atoms is > maxlength
  • Normalizing the structure using Indigo: neutralizes charges, resolves pentavalent nitrogen, removes hydrogens
  • Dearomatizing: converts molecules/reactions to Kekule form
  • Standardizing with the following options:
    • Keeping only the largest fragment in the molecule.
    • Removing fragments that consist of only a single heavy atom.
    • Setting all atoms and bonds to NoStereo.
    • Removing all relative stereo groupings.
    • Setting all atoms and bonds marked UnknownStereo to NoStereo.
    • Setting all atoms marked UnknownStereo to NoStereo.
    • Setting all bonds marked UnknownStereo to NoStereo.
    • Clearing any atom valence query features and resetting all implicit hydrogen counts to their standard values.
    • Setting the charges on a molecule to a standard form.
  • Applying the following 3 reactions, added in-house:


  • Aromatizing: converts molecules/reactions back to aromatic form
  • Computing the canonical SMILES (also known as absolute SMILES) string.
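
The operations listed above can be sketched with the Indigo Python bindings roughly as follows. This is a minimal illustration, not the in-house code: the function names `run_pipeline` and `standardize_smiles` are hypothetical, the `standardize-*` option names are assumptions that should be checked against the documentation of the installed Indigo version, and `countAtoms()` is assumed to be the atom count compared with maxlength. The three in-house reactions are not specified in this description, so the sketch leaves them out.

```python
# Assumed Indigo option names, one per standardization bullet above.
# Verify them against the Indigo version in use before relying on them.
STANDARDIZE_OPTIONS = (
    "standardize-keep-largest-frag",          # keep only the largest fragment
    "standardize-remove-single-atoms",        # drop single-heavy-atom fragments
    "standardize-clear-stereo",               # all atoms and bonds to NoStereo
    "standardize-clear-enhanced-stereo",      # remove relative stereo groupings
    "standardize-clear-unknown-stereo",       # UnknownStereo atoms and bonds
    "standardize-clear-unknown-atom-stereo",  # UnknownStereo atoms
    "standardize-clear-unknown-bond-stereo",  # UnknownStereo bonds
    "standardize-clear-unusual-valences",     # reset valences / implicit H counts
    "standardize-charges",                    # charges to a standard form
)


def run_pipeline(mol, maxlength):
    """Apply the listed steps, in order, to an Indigo molecule object.

    Returns the canonical SMILES, or None if the compound is discarded.
    """
    # Discard compounds with more atoms than maxlength (assumed: explicit atoms).
    if mol.countAtoms() > maxlength:
        return None
    mol.normalize("")   # neutralize charges, fix pentavalent N, remove hydrogens
    mol.dearomatize()   # convert to Kekule form
    mol.standardize()   # applies whichever standardize-* options are enabled
    # The three in-house reactions would be applied here; they are not
    # described in this section, so this sketch omits them.
    mol.aromatize()     # convert back to aromatic form
    return mol.canonicalSmiles()


def standardize_smiles(smiles, maxlength):
    """Hypothetical convenience wrapper: SMILES in, standardized SMILES out."""
    from indigo import Indigo  # imported lazily; requires the Indigo bindings

    indigo = Indigo()
    for name in STANDARDIZE_OPTIONS:
        indigo.setOption(name, True)
    return run_pipeline(indigo.loadMolecule(smiles), maxlength)
```

With the Indigo bindings installed, a call such as `standardize_smiles("CCO.[Na+]", 50)` would run one entry through these steps; the exact output depends on the Indigo version and on the option names being valid.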