How to use BioMart, Ensembl's data retrieval site

Selecting Filters

After you've selected your dataset, the left-side menu will change.

Your next step is to select filters for your search. The available filters depend on which database you chose. This example uses the Ensembl Genes dataset for Human.

In the Ensembl Genes datasets, there are many filter categories to choose from:

  • Region: Genome or chromosome-specific regions. You can upload or paste a list of multiple locations in Chr:Start:End:Strand format.
  • Gene: This is probably the most common filter to use.
    • If you have a list of identifiers, you can paste or upload it to retrieve features for only those identifiers. The list of identifier types you can use is extensive. If you are interested in only certain transcripts of a gene, it is best to avoid using a list of HGNC Gene Symbols (e.g., BRCA1, ACE, TP53), since some gene symbols represent multiple gene products. A gene symbol list will work, but you may retrieve results that are not of interest.
      • That said, using a gene symbol list as filter input is a good way to retrieve other identifiers (e.g., Ensembl IDs, NCBI IDs, PDB IDs) as your results, which makes BioMart a useful identifier conversion tool.
    • Alternatively, you can skip input identifiers and instead filter your results to include only certain gene or transcript types (such as pseudogenes or lncRNAs).
  • Phenotype: There are hundreds of phenotypes to select for filtering your search, from the sources DDG2P, MIM morbid and Orphanet.
    • Be careful when combining phenotype filters with many other filters: some phenotype annotations cover only a small set of genes, so a heavily filtered query may return few or no results. The phenotype filter is most useful for retrieving all genes known to match phenotype descriptions from the available sources.
  • Gene Ontology: Filter by an input list of GO Accession IDs, GO Term Names, or GO Evidence Codes.
  • Multi species comparisons: Use this filter to retrieve all orthologous genes (matching your filters in the Gene or other filter category) in another species.
  • Protein domains and families: Filter to limit results to genes with IDs in a variety of protein databases (e.g., InterPro, Pfam, PANTHER, SMART), or filter by an input list of family or domain IDs from protein database sources.
  • Variant: Filter for genes with variant evidence in specific databases: ClinVar, dbSNP, HGMD (for germline variants) or COSMIC (for somatic variants); by variant supporting evidence; or by specific variant consequences (e.g., NMD, splice donor/acceptor, stop gained/lost). You can select more than one variant consequence category to filter by.
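
The filter categories above can also be written down outside the web interface. The following is a minimal sketch, assuming BioMart's XML query format, of how a Region filter and a gene type filter might be combined into one query. The dataset name (hsapiens_gene_ensembl), the filter names (chromosomal_region, biotype), and the attribute names (ensembl_gene_id, external_gene_name) are assumptions that should be checked against the filter and attribute lists of the dataset you selected. The helper names format_region and build_query are hypothetical, and the coordinates are arbitrary. The code only builds the query text; it does not contact any server.

```python
import xml.etree.ElementTree as ET

# Declaration that BioMart XML query examples usually start with.
XML_PREFIX = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def format_region(chromosome, start, end, strand):
    """Format one location as Chr:Start:End:Strand."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    return f"{chromosome}:{start}:{end}:{strand}"


def build_query(dataset, filters, attributes):
    """Build the text of a BioMart XML query for a single dataset.

    filters maps a filter name to one value or a list of values;
    lists are joined with commas. attributes lists the result columns.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset_el = ET.SubElement(
        query, "Dataset", {"name": dataset, "interface": "default"}
    )
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return XML_PREFIX + ET.tostring(query, encoding="unicode")


# Arbitrary coordinates, for illustration only.
regions = [format_region("1", 100, 10000000, -1)]

# Assumed filter and attribute names; verify them for your dataset.
query_xml = build_query(
    "hsapiens_gene_ensembl",
    {"chromosomal_region": regions, "biotype": "lncRNA"},
    ["ensembl_gene_id", "external_gene_name"],
)
print(query_xml)
```

Each Filter element corresponds to one filter you would tick in the left-side menu, and each Attribute element to one column of the results. BioMart servers that expose a martservice endpoint typically accept this text as the query parameter; check the Ensembl BioMart help pages for the current address before relying on it.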

As you select your filters, they will be added to the left-side menu, so you can see them all as you build your query.

For this example, I uploaded a list of NCBI Gene IDs, taken from an NCBI Gene search, for genes that are associated with frontotemporal dementia and have transcript variants.
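
Before uploading or pasting an identifier list like this one, it can help to tidy it first. The sketch below assumes the IDs were copied from search results as text separated by whitespace, commas, or semicolons, and that NCBI Gene IDs are plain integers. The function name clean_gene_ids is hypothetical, and the numbers in the sample input are arbitrary placeholders, not the list used in this example.

```python
import re


def clean_gene_ids(raw_text):
    """Split pasted text into unique numeric IDs, keeping first-seen order."""
    seen = set()
    ids = []
    for token in re.split(r"[\s,;]+", raw_text.strip()):
        if not token:
            continue  # empty input or stray separator
        if not token.isdigit():
            raise ValueError(f"not a numeric NCBI Gene ID: {token!r}")
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


# Placeholder input with a duplicate and mixed separators.
pasted = "4137\n2896, 4137\n23435\n"

ids = clean_gene_ids(pasted)

# One ID per line, ready to paste into the filter's text box
# or to save as a file for upload.
upload_text = "\n".join(ids)
print(upload_text)
```

Rejecting non-numeric tokens is a deliberate choice here: it catches gene symbols or header text that slipped into the list before they reach a filter that expects NCBI Gene IDs.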