SigFinder
SigFinder is a tool that exploits the statistical significance of substructures in a given compound dataset. It is based on the GraphSig technology and can be used for two purposes:
- Identification of significant substructures
- Classification of the dataset into categories on the basis of blood-brain barrier (BBB) permeability, toxicity, or ADME properties.

Features and Capabilities
- Given a compound dataset, this tool mines statistically significant substructures: substructures that are representative of the dataset because they are structurally overrepresented or underrepresented. This can help in the design of new drugs with similar capabilities.

- It can mine substructures that would not surface in a simple frequent-substructure search on the dataset. Some of these substructures have less than 0.1% support in our test datasets.
- It can be used as an in silico tool to predict BBB permeability and toxicity, properties that depend not only on a molecule's pharmacokinetic properties but also on the arrangement and interaction of its topological fragments.
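To illustrate why a substructure with very low support can still be statistically significant, here is a minimal sketch. It assumes a simple binomial null model, which is not necessarily the model SigFinder uses internally. The function name `binomial_tail_pvalue`, the background probabilities, and the dataset sizes are all hypothetical and chosen only for illustration.

```python
import math

def binomial_tail_pvalue(n_compounds, support, background_prob):
    """Return P(X >= support) for X ~ Binomial(n_compounds, background_prob).

    Assumption: each compound contains the substructure independently with
    probability background_prob under the null model.
    """
    if not 0.0 < background_prob < 1.0:
        raise ValueError("background_prob must lie strictly between 0 and 1")
    log_p = math.log(background_prob)
    log_q = math.log1p(-background_prob)
    tail = 0.0
    for k in range(support, n_compounds + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k); log space avoids overflow
        log_pmf = (math.lgamma(n_compounds + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_compounds - k + 1)
                   + k * log_p + (n_compounds - k) * log_q)
        tail += math.exp(log_pmf)
    return min(tail, 1.0)

# Hypothetical rare substructure: 8 of 10,000 compounds (0.08% support),
# where only about 1 occurrence would be expected by chance.
rare_pvalue = binomial_tail_pvalue(10_000, 8, 1e-4)

# Hypothetical frequent substructure: 5,000 of 10,000 compounds,
# which is exactly what the null model expects.
common_pvalue = binomial_tail_pvalue(10_000, 5_000, 0.5)

print(f"rare substructure p-value:   {rare_pvalue:.2e}")
print(f"common substructure p-value: {common_pvalue:.2f}")
```

Under these assumptions the rare substructure gets a far smaller p-value than the frequent one, even though a frequency threshold would keep only the latter.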

Method
The workflow of using SigFinder for identifying significant substructures is as follows:

The workflow of using SigFinder for classifying chemical compounds is as follows:

Validation
- We discovered the following substructures from the ACE inhibitors in the MDDR database.

- We applied SigFinder to discover significant substructures from a dataset of compounds classified as blood-brain barrier permeable and non-permeable. A couple of examples are shown below.

- We applied SigFinder to classify the Anti-cancer screen dataset into actives and inactives. The BEDROC scores comparing our performance to Daylight are shown below.
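For readers unfamiliar with the metric, the following is a minimal sketch of how a BEDROC score (Boltzmann-enhanced discrimination of ROC) is commonly computed from a ranked list of compounds. It is not the evaluation code used for the comparison above. The function name `bedroc` and the toy rankings are hypothetical, and `alpha = 20` is a commonly used setting assumed here because this page does not state the value used.

```python
import math

def bedroc(ranked_is_active, alpha=20.0):
    """BEDROC for booleans ordered from best-scored to worst-scored compound.

    True marks an active compound. Larger alpha puts more weight on
    actives that appear early in the ranking.
    """
    n_total = len(ranked_is_active)
    n_active = sum(1 for flag in ranked_is_active if flag)
    if n_total == 0 or n_active == 0:
        raise ValueError("need a non-empty ranking with at least one active")
    if n_active == n_total:
        return 1.0  # every compound is active, so any ordering is perfect
    ratio = n_active / n_total

    # Exponentially weighted sum over the 1-based ranks of the actives.
    sum_exp = sum(math.exp(-alpha * rank / n_total)
                  for rank, flag in enumerate(ranked_is_active, start=1)
                  if flag)
    # Expected per-active weight if actives were spread uniformly at random.
    random_weight = ((1.0 / n_total) * (1.0 - math.exp(-alpha))
                     / (math.exp(alpha / n_total) - 1.0))
    rie = sum_exp / (n_active * random_weight)

    # Rescale the enrichment (RIE) to [0, 1] using its best and worst values.
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)

# Toy rankings of 100 compounds with 5 actives (illustrative data only).
perfect = [True] * 5 + [False] * 95
worst = [False] * 95 + [True] * 5
mixed = [True, False, True, False, False, True] + [False] * 92 + [True, True]

for name, ranking in [("perfect", perfect), ("mixed", mixed), ("worst", worst)]:
    print(f"{name}: BEDROC = {bedroc(ranking):.3f}")
```

A ranking that places every active first scores 1, one that places every active last scores 0, and a higher score means actives are recognized earlier in the ranked list.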

More details on this technology can be found here.

References
- Huahai He; Ambuj K. Singh. GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space. Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), December 2006. doi:10.1109/ICDM.2006.79
- Sayan Ranu; Ambuj Singh. Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification. J. Chem. Inf. Model., 2009, 49(11), pp 2537–2550. doi:10.1021/ci900035z

