In this tutorial, FABIAN-variant is used to analyze variants from a whole genome sequencing (WGS) run using transcription factor flexible models (TFFMs). You will learn how to use different features of FABIAN-variant and how to interpret the results.
The sample results from this tutorial can be viewed and downloaded at this link without repeating the analysis.
Download the zipped file VCF_Sample_File_WGS_GRCh37.zip. It contains a public WGS VCF file with 4 million variants and a disease mutation in the promoter of the PKLR gene spiked in (PMID: 11054094). It is good practice to zip or gzip large VCF files before upload to FABIAN-variant.
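For your own data, the compression step mentioned above can be scripted. The following is a minimal sketch using only the Python standard library; the helper name gzip_file is hypothetical and not part of FABIAN-variant.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return the new path.

    Hypothetical helper for illustration; not part of FABIAN-variant.
    """
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # copyfileobj streams in chunks, so multi-gigabyte VCF files are fine
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling gzip_file(Path("my_sample.vcf")) would create my_sample.vcf.gz in the same directory and leave the original file untouched.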
Open the FABIAN-variant homepage at https://www.genecascade.org/fabian and click on 'VCF file' in the menu on the left side.
In the 'VCF file' input field, upload the VCF_Sample_File_WGS_GRCh37.zip file. Next, click on 'Show options'.
Click on 'Variants near selected genes (select to enter)'. A box should appear that can be used to filter the VCF file for regions near genes. Enter PKLR.
The minimum coverage filter removes positions with low coverage, which usually do not provide reliable data. Variants can also be excluded from the analysis if they are present in gnomAD, ExAC or the 1000 Genomes Project. In this tutorial, these filters can be left at their default values.
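To illustrate the idea behind a minimum coverage filter, here is a minimal sketch. It assumes that read depth is stored as DP in the INFO column of the VCF line; where FABIAN-variant actually reads coverage from, and what its default threshold is, are not stated in this tutorial. The function names and the choice to keep lines without a DP entry are assumptions.

```python
def info_depth(vcf_line: str):
    """Return the DP value from the INFO column of a VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:  # INFO is the 8th column of a VCF data line
        return None
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def passes_min_coverage(vcf_line: str, min_depth: int) -> bool:
    """Keep a line if its depth is at least min_depth.

    Assumption: lines without a DP entry are kept rather than dropped.
    """
    depth = info_depth(vcf_line)
    return depth is None or depth >= min_depth
```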
FABIAN-variant currently supports 1387 different human transcription factors (TFs). By clicking on 'Select individually...', the TFs to include in the analysis can be selected manually. The search field ('Search by name') and the alphabet letters ('Select by first letter') can help with the selection.
'Known TFBSs' searches for transcription factors that have been shown to bind at the location of the variants based on ENCODE, Ensembl or FANTOM5 data. In this tutorial, we use this option.
5014 different models are implemented in FABIAN. These include TFFMs and position weight matrices (PWMs) pooled from multiple publicly accessible data sources. TFFMs tend to be a better representation of transcription factor binding sequences compared with PWMs.
Unselect 'PWM (3790)', which leaves 612 TFFM detailed models and 612 TFFM first-order models to be included in the analysis.
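The model counts quoted in this tutorial can be checked with a few lines of arithmetic. The variable names are only illustrative.

```python
# Counts as stated in the tutorial text
total_models = 5014
pwm_models = 3790
tffm_detailed = 612
tffm_first_order = 612

# Models left after unselecting the PWMs
remaining_after_pwm_removed = total_models - pwm_models

# The remainder should equal the two TFFM groups combined
assert remaining_after_pwm_removed == tffm_detailed + tffm_first_order
```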
You may have noticed that after unselecting PWMs, '(0)' appears after every source name except 'jaspar2022'. This is because TFFMs are currently only available from the JASPAR database. You can nevertheless leave all sources checked and click on 'Analyse'.
During execution of the analysis pipeline, FABIAN-variant displays different kinds of information in the 'computation log'. Filter options are disabled at this time.
Once computation is done, the log will disappear but it can be brought back by checking the 'Options > Show log' checkbox on the top menu.
The search revealed 13 variants near the PKLR gene (1:155259084-155271725) and 33 transcription factors with a known TFBS in this region.
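The gene-region filter can be pictured as a simple interval test on the first two VCF columns. The sketch below is an illustration only and not FABIAN-variant's implementation; the function names are hypothetical, and it assumes inclusive coordinates and that the 'chr' prefix can be ignored when comparing chromosome names.

```python
def strip_chr(name: str) -> str:
    """Drop a leading 'chr' so that '1' and 'chr1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def parse_region(region: str):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def in_region(vcf_line: str, region: str) -> bool:
    """True if a VCF data line lies inside the region (bounds inclusive)."""
    if vcf_line.startswith("#"):  # header lines carry no position
        return False
    chrom, start, end = parse_region(region)
    fields = vcf_line.split("\t")
    same_chrom = strip_chr(fields[0]) == strip_chr(chrom)
    return same_chrom and start <= int(fields[1]) <= end
```

With the region reported above, in_region(line, "1:155259084-155271725") would keep only the lines that fall within the PKLR search window.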
Checkboxes and radio buttons in the top menu may be used to refine and sort the table of results. To hide TFs that are not predicted to be affected by the variants, you can uncheck 'Neutral' in the 'Show effects' options.
Changes are reflected immediately in the results table.
To further refine the view and focus on a single variant, click on 'Filter variants > Manual selection ...'. The message explains how to use the selection filtering. Click on 'OK'.
Now click on variant 'chr1:155271258T>C.12' so that all other variants have a grey background (selected state).
Press the BACKSPACE key to hide all selected variants. Now only the single variant 'chr1:155271258T>C.12' is displayed in the table. Filtering and sorting only apply to visible TFs and variants.
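When working with results outside the web interface, a variant label such as 'chr1:155271258T>C.12' can be split into its parts. The sketch below is inferred from that single label only: the pattern and the function name are assumptions, and since the tutorial does not explain the trailing '.12', it is returned unchanged as 'suffix'.

```python
import re

# Pattern inferred from one example label; an assumption, not a documented format
LABEL = re.compile(
    r"^(?P<chrom>[^:]+):(?P<pos>\d+)(?P<ref>[ACGT]+)>(?P<alt>[ACGT]+)"
    r"(?:\.(?P<suffix>\d+))?$"
)


def parse_label(label: str) -> dict:
    """Split a label like 'chr1:155271258T>C.12' into its components."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    return {
        "chrom": m["chrom"],
        "pos": int(m["pos"]),
        "ref": m["ref"],
        "alt": m["alt"],
        "suffix": m["suffix"],  # None when the label has no '.N' part
    }
```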
In the 'Sort/filter TFs' options, uncheck 'With known BS' and then click on 'Sort by loss'.
If only a few variants are selected, FABIAN-variant automatically breaks the table into multiple columns (above: 6 columns). Users who prefer to display results in a single column can simply uncheck 'Options > Columns view'.
Hovering the mouse pointer over any table cell displays information about the scores from the different models and the locations of the identified motifs relative to the variant.
FABIAN-variant predicts that the variant chr1:155271258T>C leads to a loss of a TFBS for GATA1. The prediction is based on the four TFFM models, all of which yield a much higher binding score in the reference sequence (wild type, WT) compared to the variant sequence (mutant, MT). See the documentation for scoring and the evaluation of models.
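The comparison of WT and MT scores can be pictured with a toy example. This is a sketch only and not FABIAN-variant's scoring, which is described in its documentation: averaging the per-model score difference, the threshold of 0.1, the 'gain' label and the function name are all assumptions made for illustration.

```python
from statistics import mean


def classify_effect(model_scores, threshold=0.1):
    """Label a variant from (wt_score, mt_score) pairs, one pair per model.

    Toy rule (assumption): average MT minus WT over all models and compare
    the result against a symmetric, hypothetical threshold.
    """
    if not model_scores:
        raise ValueError("at least one (wt, mt) score pair is required")
    delta = mean(mt - wt for wt, mt in model_scores)
    if delta <= -threshold:
        return "loss"  # binding score drops in the variant sequence
    if delta >= threshold:
        return "gain"  # binding score rises in the variant sequence
    return "neutral"
```

Under this rule, a variant whose models all score markedly higher on the reference sequence than on the variant sequence would be labelled 'loss', matching the direction of the GATA1 prediction described above.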
Clicking on the cell will open the details in a blank window (e.g., for printing).
FABIAN-variant offers different ways to download results. Here, we will download the complete results file with all scores and analysis performed. On the results page click on 'Download full results'. The file from this tutorial has a size of 1,700 KB and contains 15,913 lines. The format is described in the documentation.
To continue exploration of FABIAN-variant, you can look at the example of disease mutation CHRND_L63P:
If you have suggestions about this software, please do not hesitate to email robin.steinhaus (at) bih-charite.de. If you discover a bug, please submit a ticket via email using this link.