BioMart

How to use BioMart, Ensembl's data retrieval site

Selecting attribute dataset

After you've selected the filter(s) for your data query, it's time to select the attributes. Attributes are the features that will be retrieved in your dataset. You can select as many or as few as you'd like.
BioMart includes four attributes by default (but you can de-select them):
  • Gene stable ID
  • Gene stable ID version
  • Transcript stable ID
  • Transcript stable ID version

Select your attribute dataset from the top panel in the main window:

  • Features retrieves identifiers from other databases, genomic coordinates, and other feature information
  • Structures retrieves structures or structural features from multiple sources
  • Homologues retrieves orthologous sequences or features in up to 6 other species
  • Variant (Germline) retrieves germline variants from many variant sources
    • Note: you can also retrieve variants by adding a second dataset to your query and selecting a variant database. More on that below.
  • Variant (Structural) retrieves structural variants from multiple sources
  • Sequences retrieves sequences for all input (filter) elements, allowing selection of transcript, gene, cDNA, UTRs, exons or peptide

For this example, we'll select Features.

ID conversion

The Features dataset is the best one to use if you want to use BioMart as an ID conversion tool. You input your list of gene names, accession numbers, or symbols, and retrieve the corresponding identifiers from other databases.

Selecting attributes

The expandable menus for attributes will change depending on your choice of dataset in the top panel.

The Features dataset has many attributes under each expandable menu, from multiple databases. You can select as many attributes as you wish, and they will be added to your query on the left menu. Attributes are displayed in the order that you select them, so you may wish to select them in an order that makes the most sense to you.

In this example, I am retrieving the chromosome, strand, gene and transcription start and end sites, some phenotype information and other identifiers from various databases.
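Once results like these are exported, they can be read back in with a few lines of code. The following is a minimal Python sketch, assuming a tab-separated export with a header row. The column names and the two data rows are made-up placeholders for illustration, not real BioMart output; your own export will carry whichever attributes you selected, in the order you selected them.

```python
import csv
import io

# Made-up stand-in for a downloaded BioMart results file (TSV with a header row).
# Column names and values are placeholders, not real BioMart output.
SAMPLE_EXPORT = (
    "Chromosome/scaffold name\tStrand\tGene start (bp)\tGene end (bp)\tGene stable ID\n"
    "1\t1\t1000\t2000\tGENE_PLACEHOLDER_A\n"
    "X\t-1\t5000\t7500\tGENE_PLACEHOLDER_B\n"
)


def read_export(text):
    """Return (column names in file order, list of row dicts) from TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = list(reader.fieldnames)  # reads the header line
    rows = list(reader)                # reads the remaining lines
    return columns, rows


columns, rows = read_export(SAMPLE_EXPORT)
print(columns)
for row in rows:
    # Gene length in base pairs, counting both the start and end positions.
    length = int(row["Gene end (bp)"]) - int(row["Gene start (bp)"]) + 1
    print(row["Gene stable ID"], row["Chromosome/scaffold name"], length)
```

Because the columns come back in file order, selecting attributes in a sensible order in BioMart also keeps downstream scripts easier to read.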

ID conversion

In the screenshot below, the input list (Filters) is a list of NCBI Gene IDs. The selected Attributes retrieve the corresponding Ensembl gene IDs, along with the Ensembl protein ID, HGNC gene symbol, and a couple of protein database IDs.
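An ID conversion like this one can also be written as a BioMart XML query, the format shown by the XML button in the BioMart toolbar. The Python sketch below only builds the query text and does not contact any server. The dataset name and the internal filter and attribute names (`hsapiens_gene_ensembl`, `entrezgene_id`, `ensembl_gene_id`, and so on) are assumptions about how the human gene mart names these fields; check them against the XML that BioMart generates for your own query before relying on them. The two NCBI Gene IDs are arbitrary examples.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string.

    filters: dict mapping a filter name to a list of values
    attributes: list of attribute names, in the desired column order
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated results
        "header": "1",        # include a header row
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, values in filters.items():
        # Multiple filter values are passed as one comma-separated string.
        ET.SubElement(ds, "Filter", {"name": name, "value": ",".join(values)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed internal names; verify them with BioMart's own XML button.
xml_query = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"entrezgene_id": ["672", "675"]},  # example NCBI Gene IDs
    attributes=[
        "entrezgene_id",
        "ensembl_gene_id",
        "ensembl_peptide_id",
        "hgnc_symbol",
        "uniprotswissprot",
    ],
)
print(xml_query)
```

Including the input ID (`entrezgene_id` here) among the attributes keeps each output row traceable to the identifier that produced it.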

Select another attribute dataset (optional)

You are not required to add a second dataset to your query, but you have that option.

When searching for variants for genes, you can either start with the Variation database for your species of choice, or add variants as a second dataset when starting with the Features dataset, as demonstrated in this example. This is especially useful if you want to retrieve BOTH detailed genomic location and identifier information, as well as specific information on variants.

Filtering the second dataset

When selecting Filters for a second (variant) dataset, be careful NOT to over-filter your results. For example, selecting a single, narrowly defined phenotype may filter so heavily that you retrieve no results at all. If you are interested in phenotypes associated with variants, it's better to select the phenotype field from the Attributes menu, review the results, and then decide whether you need to apply phenotype filters and re-run the search to generate a smaller dataset. Similarly, filtering by clinical significance can give ambiguous results, since there are dozens of combinations in the "Clinical significance" filter category (for example, some variant filter values include BOTH benign and pathogenic).

Filters that can be useful in the second (variant) dataset include a minimum SIFT or PolyPhen score, minor allele frequency (MAF), only deleterious/damaging variants, or the specific variant consequences you are interested in (e.g., frameshift, missense). Just be aware that detailed annotations do not exist for all variants, so a filter may eliminate variants in your gene of interest when the filtered field contains no data for them.
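The effect of missing annotations is easy to see on a small table. The sketch below is an illustration in Python, not part of BioMart: the four variant rows, the field names, and the phenotype labels are all made up. It first tallies what is in the phenotype field (the "retrieve as an attribute, then review" approach), then shows how many rows survive a filter that requires a SIFT score to be present.

```python
from collections import Counter

# Made-up rows for illustration only; an empty string means "no annotation".
variants = [
    {"variant": "var_1", "sift_score": "0.01", "phenotype": "phenotype_A"},
    {"variant": "var_2", "sift_score": "",     "phenotype": "phenotype_B"},
    {"variant": "var_3", "sift_score": "0.40", "phenotype": ""},
    {"variant": "var_4", "sift_score": "",     "phenotype": "phenotype_A"},
]


def count_values(rows, field):
    """Tally the values in one field, labelling empty ones explicitly."""
    return Counter(row[field] or "(no annotation)" for row in rows)


def keep_annotated(rows, field):
    """Keep only rows where the field holds a value, as any filter on it would."""
    return [row for row in rows if row[field]]


# Step 1: review what the attribute contains before deciding on a filter.
phenotype_counts = count_values(variants, "phenotype")
print(phenotype_counts)

# Step 2: see how many variants any filter on sift_score would silently drop.
kept = keep_annotated(variants, "sift_score")
dropped = [row["variant"] for row in variants if not row["sift_score"]]
print(f"kept {len(kept)} of {len(variants)}; dropped without a score: {dropped}")
```

Here half of the variants disappear before any score threshold is even applied, simply because they carry no score.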

Selecting attributes for second dataset

Setting attributes for an added variant dataset allows you to retrieve attributes for just the retrieved variants (as opposed to attributes for the gene as a whole), which can be useful when your genes of interest may have multiple variants associated with different outcomes. The Attributes menus are similar to what you select for Features, except they are specific to the variant only.
