SigFinder
SigFinder is a tool for automatically mining statistically significant substructures from a graph database. A substructure is considered "significant" if it is unusually rare or unusually abundant compared to a given background database. SigFinder employs a novel graph mining algorithm called GraphSig to detect and rank these structures efficiently. It provides the user with an explanation of the significance (e.g., "rare in background but x occurrences in the test set") and details where the substructures occurred. By choosing different background datasets or similarity measures, users can easily customize SigFinder to their needs.
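To make the idea of "unusually rare or abundant compared to a background" concrete, here is a minimal sketch in Python. It is not the GraphSig algorithm and not part of SigFinder: it assumes a simple binomial model in which the background database supplies the expected occurrence rate of a substructure, and it asks how surprising the observed count in the test set is under that rate. The function names, the add-one smoothing, and the example counts are all illustrative assumptions.

```python
from math import comb


def binom_pmf(n, i, p):
    """Probability of exactly i successes in n trials with success rate p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))


def lower_tail(n, k, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, i, p) for i in range(0, k + 1))


def significance(test_hits, test_size, background_hits, background_size):
    """Label a substructure as over- or under-represented and return a p-value.

    Assumption: the background rate is estimated with add-one smoothing so
    that a substructure never seen in the background does not get rate zero.
    """
    rate = (background_hits + 1) / (background_size + 2)
    p_over = upper_tail(test_size, test_hits, rate)
    p_under = lower_tail(test_size, test_hits, rate)
    if p_over <= p_under:
        return "over-represented", p_over
    return "under-represented", p_under


# Hypothetical counts: seen in 10 of 1000 background graphs,
# but in 20 of 100 graphs in the test set.
label, p_value = significance(20, 100, 10, 1000)
print(label, p_value)
```

A small p-value means the observed count would be very unlikely if the test set followed the background rate, which matches the kind of explanation quoted above ("rare in background but x occurrences in the test set").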
SigFinder/Chem: Discovering What Really Matters
SigFinder/Chem is a specialized version of SigFinder tailored to cheminformatics. Applied to a dataset of compounds that share an activity of interest, SigFinder/Chem can provide powerful SAR insights. For example, mining a dataset of blood-brain barrier permeable compounds will yield chemical substructures whose absence or presence may contribute to this activity. Further investigations can then be undertaken, for example by using our SimFinder tool to find other, similar compounds that may have the same chemical property.
SigFinder is extremely easy to use and can be applied to both small and large compound repositories. The generated output ranks the discovered fragments from most over-represented to most under-represented, allowing for meaningful browsing.
An example input and output are shown in the following figure:

The substructures identified on the right will be tagged as over- or under-represented and given a score indicating how "unusual" they are relative to the background dataset.
It is important to note that SigFinder returns not only the significant substructures but also information about why each substructure is deemed significant and where it is found. This information is extremely valuable to the medicinal chemist, and such interpretations are currently lacking in competing tools. More examples demonstrating the quality of SigFinder results can be found here: → SigFinder Quality
SigFinder can be used in conjunction with our SimFinder tool, as shown in the workflow below.

Starting from a database of known actives, SigFinder can be employed to discover fragments that may be related to the activity (in addition to, or as a replacement for, traditional visual identification). The resulting significant substructures can then be used as queries for SimFinder to find new candidate actives.
SigFinder/Chem can be easily integrated into existing workflows as it supports both MOL2 and SDF data formats.
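When preparing input for a workflow like this, it can help to check how many molecule records a file contains before submitting it. The Python sketch below is a generic pre-processing helper, not part of SigFinder/Chem. It relies only on the record delimiters of the two formats: an SDF record ends with a line containing `$$$$`, and a MOL2 record begins with a `@<TRIPOS>MOLECULE` line. The function names are hypothetical.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends at a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks the final delimiter, if non-empty.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def split_mol2(text):
    """Split MOL2 text into records; each starts at '@<TRIPOS>MOLECULE'."""
    records, current = [], None
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>MOLECULE"):
            if current is not None:
                records.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        records.append("\n".join(current))
    return records


# Tiny placeholder inputs (not chemically meaningful) to show the splitting.
sdf_text = "mol A\n\n\nM  END\n$$$$\nmol B\n\n\nM  END\n$$$$\n"
mol2_text = "@<TRIPOS>MOLECULE\nA\n@<TRIPOS>ATOM\n@<TRIPOS>MOLECULE\nB\n"
print(len(split_sdf(sdf_text)), len(split_mol2(mol2_text)))
```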