Human MSigDB Collections

The 33196 gene sets in the Human Molecular Signatures Database (MSigDB) are divided into 9 major collections and several sub-collections. See the table below for a brief description of each, and the Human MSigDB Collections: Details and Acknowledgments page for more detailed descriptions. See also the MSigDB Release Notes.

Click on the "browse gene sets" links in the table below to view the gene sets in a collection, or download them by clicking on the links below the "Download GMT Files" headings. For a description of the GMT file format, see Data Formats in the Documentation section. The gene sets can be downloaded as NCBI (Entrez) Gene Identifiers or HUGO (HGNC) Gene Symbols. There are also JSON bundles containing the HUGO (HGNC) Gene Symbols along with some useful metadata. An XML file containing all the Human MSigDB gene sets is available as well.
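
As a rough illustration of working with a downloaded GMT file, the sketch below parses GMT text into a dictionary. It assumes the usual tab-delimited layout (gene set name, a description field, then one gene identifier per remaining column); the function name `parse_gmt` and the two example rows are made up for illustration and are not real MSigDB gene sets.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Map each gene set name to its list of gene identifiers.

    Assumes one gene set per line, tab-delimited:
    name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # no genes listed; ignore the malformed row
        name, _description, *genes = fields
        # Drop empty trailing columns, which some exports leave behind.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Hypothetical rows, not taken from MSigDB.
example = (
    "EXAMPLE_SET_A\thttps://example.org/a\tTP53\tBRCA1\tEGFR\n"
    "EXAMPLE_SET_B\tna\tMYC\tKRAS\n"
)
sets = parse_gmt(example)
```

The same function works for either identifier type, since Gene Symbols and NCBI (Entrez) Gene IDs are both read as plain strings; convert the IDs to integers afterwards if needed.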

H: hallmark gene sets
(browse 50 gene sets)
Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying overlaps between gene sets in other MSigDB collections and retaining genes that display coordinate expression. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C1: positional gene sets
(browse 299 gene sets)
Gene sets corresponding to human chromosome cytogenetic bands. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C2: curated gene sets
(browse 6449 gene sets)
Gene sets in this collection are curated from various sources, including online pathway databases and the biomedical literature. Many sets are also contributed by individual domain experts. The gene set page for each gene set lists its source. The C2 collection is divided into the following two sub-collections: Chemical and genetic perturbations (CGP) and Canonical pathways (CP). details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
CGP: chemical and genetic perturbations
(browse 3399 gene sets)
Gene sets represent expression signatures of genetic and chemical perturbations. A number of these gene sets come in pairs: an xxx_UP gene set representing genes induced by the perturbation and an xxx_DN gene set representing genes repressed by it. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
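
The xxx_UP / xxx_DN naming convention in CGP makes it possible to match the two halves of a signature by name. The sketch below is one way to do that, assuming only that paired sets share a common prefix and differ in the `_UP` or `_DN` suffix; the helper `pair_up_dn` and the set names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def pair_up_dn(names: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {prefix: (up_name, dn_name)} for names that have both halves."""
    name_set = set(names)
    pairs: Dict[str, Tuple[str, str]] = {}
    for name in sorted(name_set):
        if name.endswith("_UP"):
            prefix = name[: -len("_UP")]
            dn_name = prefix + "_DN"
            # Keep the pair only when the matching _DN set is present.
            if dn_name in name_set:
                pairs[prefix] = (name, dn_name)
    return pairs


# Hypothetical gene set names, for illustration only.
names = [
    "STUDY_X_TREATMENT_UP",
    "STUDY_X_TREATMENT_DN",
    "STUDY_Y_KNOCKDOWN_UP",  # no matching _DN set
    "STUDY_Z_SIGNATURE",     # not part of an UP/DN pair
]
pairs = pair_up_dn(names)
```

Sets without a partner are left out of the result, which matches the description above: only some CGP gene sets come in pairs.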
CP: Canonical pathways
(browse 3050 gene sets)
Gene sets from pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
BioCarta subset of CP
(browse 292 gene sets)
Canonical Pathways gene sets derived from the BioCarta pathway database. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
KEGG subset of CP
(browse 186 gene sets)
Canonical Pathways gene sets derived from the KEGG pathway database. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
PID subset of CP
(browse 196 gene sets)
Canonical Pathways gene sets derived from the PID pathway database. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
Reactome subset of CP
(browse 1635 gene sets)
Canonical Pathways gene sets derived from the Reactome pathway database. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
WikiPathways subset of CP
(browse 712 gene sets)
Canonical Pathways gene sets derived from the WikiPathways pathway database. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C3: regulatory target gene sets
(browse 3725 gene sets)
Gene sets representing potential targets of regulation by transcription factors or microRNAs. The sets consist of genes grouped by elements they share in their non-protein coding regions. The elements represent known or likely cis-regulatory elements in promoters and 3'-UTRs. The C3 collection is divided into two sub-collections: microRNA targets (MIR) and transcription factor targets (TFT). details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
MIR: microRNA targets
(browse 2598 gene sets)
All miRNA target prediction gene sets. Combined superset of both miRDB prediction methods and legacy sets. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
miRDB subset of MIR
(browse 2377 gene sets)
Gene sets containing high-confidence gene-level predictions of human miRNA targets as catalogued by the miRDB v6.0 algorithm (Chen and Wang, 2020). details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
MIR_Legacy subset of MIR
(browse 221 gene sets)
Older gene sets that contain genes sharing putative target sites (seed matches) of human mature miRNA in their 3'-UTRs. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
TFT: transcription factor targets
(browse 1127 gene sets)
All transcription factor target prediction gene sets. Combined superset of both GTRD prediction methods and legacy sets. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
GTRD subset of TFT
(browse 517 gene sets)
Genes that share GTRD (Kolmykov et al. 2021) predicted transcription factor binding sites in the region from -1000 to +100 bp around the transcription start site (TSS) for the indicated transcription factor. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
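
To make the -1000 to +100 bp window around the TSS concrete, the sketch below computes the genomic interval it would cover for a single gene. This is only an illustration: the strand handling (upstream means lower coordinates on the plus strand and higher coordinates on the minus strand), the 1-based inclusive coordinates, and the function `promoter_window` are assumptions, not the documented GTRD or MSigDB procedure.

```python
from typing import Tuple


def promoter_window(
    tss: int, strand: str, upstream: int = 1000, downstream: int = 100
) -> Tuple[int, int]:
    """Return (start, end) of the window around a TSS, 1-based inclusive.

    Assumption: "upstream" is measured against the direction of
    transcription, so the window is mirrored for minus-strand genes.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp so the window never starts before the first base.
    return max(1, start), end


plus_window = promoter_window(5000, "+")   # (4000, 5100)
minus_window = promoter_window(5000, "-")  # (4900, 6000)
```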
TFT_Legacy subset of TFT
(browse 610 gene sets)
Older gene sets that share upstream cis-regulatory motifs which can function as potential transcription factor binding sites. Based on work by Xie et al. 2005. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C4: computational gene sets
(browse 858 gene sets)
Computational gene sets defined by mining large collections of cancer-oriented microarray data. The C4 collection is divided into two sub-collections: CGN and CM. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
CGN: cancer gene neighborhoods
(browse 427 gene sets)
Gene sets defined by expression neighborhoods centered on 380 cancer-associated genes. This collection is described in Subramanian, Tamayo et al. 2005. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
CM: cancer modules
(browse 431 gene sets)
Gene sets defined by Segal et al. 2004. Briefly, the authors compiled gene sets ('modules') from a variety of resources such as KEGG, GO, and others. By mining a large compendium of cancer-related microarray data, they identified 456 such modules as significantly changed in a variety of cancer conditions. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C5: ontology gene sets
(browse 15703 gene sets)
Gene sets that contain genes annotated by the same ontology term. The C5 collection is divided into two sub-collections: the first is derived from the Gene Ontology (GO) resource, which contains BP, CC, and MF components, and the second is derived from the Human Phenotype Ontology (HPO). details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
GO: Gene Ontology gene sets
(browse 10561 gene sets)
All gene sets derived from Gene Ontology. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
BP: subset of GO
(browse 7763 gene sets)
Gene sets derived from the GO Biological Process ontology. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
CC: subset of GO
(browse 1035 gene sets)
Gene sets derived from the GO Cellular Component ontology. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
MF: subset of GO
(browse 1763 gene sets)
Gene sets derived from the GO Molecular Function ontology. Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
HPO: Human Phenotype Ontology
(browse 5142 gene sets)
Gene sets derived from the Human Phenotype Ontology. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C6: oncogenic signature gene sets
(browse 189 gene sets)
Gene sets that represent signatures of cellular pathways which are often dysregulated in cancer. The majority of signatures were generated directly from microarray data from NCBI GEO or from internal unpublished profiling experiments involving perturbation of known cancer genes. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C7: immunologic signature gene sets
(browse 5219 gene sets)
Gene sets that represent cell states and perturbations within the immune system. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
ImmuneSigDB subset of C7
(browse 4872 gene sets)
Gene sets representing chemical and genetic perturbations of the immune system generated by manual curation of published studies in human and mouse immunology. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
VAX: vaccine response gene sets
(browse 347 gene sets)
Gene sets curated by the Human Immunology Project Consortium (HIPC) describing human transcriptomic immune responses to vaccinations. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle
C8: cell type signature gene sets
(browse 704 gene sets)
Gene sets that contain curated cluster markers for cell types identified in single-cell sequencing studies of human tissue. details Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle



All gene sets
(browse all gene sets)
Bundles containing all Human gene sets in MSigDB.

NOTE: We strongly discourage running analyses against the full Human MSigDB GMTs. We recommend using the collection-specific GMTs above instead for more focused results.
Download GMT Files
Gene Symbols
NCBI (Entrez) Gene IDs

JSON bundle

XML bundle with full metadata