Validating clustering for gene expression data

Part of this work was done in the context of the Syscol project, in which our partners at the Karolinska Institute (Prof. Taipale and his team) have characterized the DNA-binding profiles of more than 400 mammalian TFs (7). It will be tempting to compare the similarities of their matrices with the DBD classification reported here, and with our own approaches to classifying DNA-binding profiles (8).

Software and online resources used by, or developed as part of, the HMP are provided here.

Please be aware that HMP1 funding ended in 2012, and therefore some of these resources may have changed, moved, or been discontinued. Core Gene Evaluation Script: screens for core gene sets as an indicator of the completeness of draft genomes.
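The idea of using core genes as a completeness indicator can be illustrated with a minimal sketch. This is not the HMP script itself: the function name `core_gene_completeness` and the gene names in the example are hypothetical, and the sketch assumes that gene detection has already been done upstream and is available as a simple list of gene identifiers.

```python
def core_gene_completeness(core_genes, genes_found):
    """Return (fraction of core genes present, sorted list of missing core genes).

    core_genes:  identifiers expected in every complete genome (assumed input)
    genes_found: identifiers detected in the draft genome (assumed input)
    """
    core = set(core_genes)
    if not core:
        raise ValueError("core gene set is empty")
    present = core & set(genes_found)
    missing = sorted(core - present)
    return len(present) / len(core), missing


# Hypothetical example: four core genes, three of them detected in the draft.
core = ["rpoB", "recA", "gyrB", "dnaK"]
found = ["rpoB", "recA", "dnaK", "lacZ"]
fraction, missing = core_gene_completeness(core, found)
print(fraction, missing)
```

A low fraction suggests that the draft assembly is incomplete, while the list of missing genes shows where to look first.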

UniFrac takes as input a single phylogenetic tree that contains sequences derived from at least two different environmental samples, together with a file describing which sequences came from which sample.

UniFrac is no longer available as a standalone tool; however, it has been incorporated into QIIME and mothur.
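To make the two inputs concrete (a tree plus a sequence-to-sample mapping), here is a minimal sketch of the unweighted UniFrac distance: the fraction of branch length that leads to sequences from only one of the two samples. It is an illustration only and does not reproduce the QIIME or mothur implementations. The tree encoding (the `parent` and `length` dictionaries), the function name, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def unweighted_unifrac(parent, length, leaf_sample, sample_a, sample_b):
    """Unweighted UniFrac between two samples on one rooted tree.

    parent:      node -> parent node (the root is absent from this dict)
    length:      node -> length of the branch joining the node to its parent
    leaf_sample: leaf (sequence) -> sample it came from
    """
    # For every node, record which of the two samples occur among its leaves.
    samples_below = defaultdict(set)
    for leaf, sample in leaf_sample.items():
        if sample not in (sample_a, sample_b):
            continue
        node = leaf
        while node is not None:
            samples_below[node].add(sample)
            node = parent.get(node)

    unique = shared = 0.0
    for node, samples in samples_below.items():
        if node not in length:  # the root has no branch above it
            continue
        if len(samples) == 1:
            unique += length[node]  # branch leads to one sample only
        else:
            shared += length[node]  # branch leads to both samples
    total = unique + shared
    return unique / total if total else 0.0


# Toy tree: root -> X (1.0), root -> C (2.0), X -> A (1.0), X -> B (1.0)
parent = {"X": "root", "C": "root", "A": "X", "B": "X"}
length = {"X": 1.0, "C": 2.0, "A": 1.0, "B": 1.0}
leaf_sample = {"A": "s1", "B": "s2", "C": "s1"}

distance = unweighted_unifrac(parent, length, leaf_sample, "s1", "s2")
print(distance)  # 4.0 unique out of 5.0 total branch length -> 0.8
```

In practice the tree would be parsed from a Newick file and the mapping from a tab-separated file. Both are hard-coded here to keep the sketch self-contained.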

The data sets comprise 1,329,758 fragments bound by 98 distinct transcription factors, of which 66 factors were not yet covered by ChIP-Seq data.

