Documentation

Quick start

Simply enter the DNA variant you would like to analyse into the variant field, select one or multiple transcription factors and click on Analyse. If you do not know the location but have a wild-type and a variant sequence, you can still enter them by clicking on Enter sequences directly.

Interfaces

Screenshots: search interface, results page, detailed results page.

Variant input

FABIAN-variant supports five different input modes for variants. In each mode the supported formats can be displayed by clicking on the link "Format info" below the input field.

Transcription factors

FABIAN supports more than 5,000 different binding models for 1,387 human transcription factors. The models were pooled from various publicly accessible data sources:

Many of these data sources were obtained from MotifDb, which is an annotated collection of PWM models. In addition, 1,224 transcription factor flexible models (TFFMs) from JASPAR are included. For each transcription factor, FABIAN-variant combines the results of different models for a final prediction of the resulting binding affinity change.

The underlying data is available for download. It contains:

TFFM definitions were converted from XML to a flat file format to improve processing in FABIAN-variant.

Known transcription factor binding sites

On the results page, FABIAN-variant highlights known binding sites for transcription factors by a black rectangle around the score. Genome locations of known binding sites were pooled from these sources:

Please note that this function is only available if you entered genomic positions. As the TFBS regions provided by ENCODE and Ensembl are several hundred bases long, your TF does not necessarily bind at the exact position of your variant.

Evaluation of models

TFFMs and PWMs are evaluated in the window [-15,15] around the variant in both strands and in both the reference sequence (WT) and the mutated sequence (MT). The highest score in the mutated sequence is compared with the highest score in the reference sequence. A greater WT score indicates a weakened binding affinity, and a greater MT score indicates an increased binding affinity due to the variant. For each model, FABIAN-variant generates a joint score S between -1 (likely TFBS loss) and +1 (likely TFBS gain),

with pseudocount α = 0.1 to avoid zero in the denominator. This link illustrates the function in an interactive plot for different values 0 ≤ WT ≤ 1 and 0 ≤ MT ≤ 1.
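As a rough illustration of the scanning step described above, the following shell sketch extracts the window around a variant together with its reverse complement, i.e. the two strands on which the models are evaluated. It assumes a single-base variant at a 1-based position in an upper-case sequence. The function name scan_window and its arguments are hypothetical, and the sketch does not cover how FABIAN-variant places motifs inside the window or how it scores them.

```shell
# Hypothetical helper, not part of FABIAN-variant.
# $1 = sequence (upper case), $2 = 1-based variant position, $3 = flank size (default 15)
scan_window() {
    awk -v seq="$1" -v pos="$2" -v flank="${3:-15}" 'BEGIN {
        # Clip the window [-flank, +flank] to the ends of the sequence.
        start = pos - flank; if (start < 1) start = 1
        end = pos + flank;   if (end > length(seq)) end = length(seq)
        fwd = substr(seq, start, end - start + 1)

        # Build the reverse complement; unknown symbols become N.
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
        rc = ""
        for (i = length(fwd); i >= 1; i--) {
            b = substr(fwd, i, 1)
            c = (b in comp) ? comp[b] : "N"
            rc = rc c
        }
        print "+", fwd
        print "-", rc
    }'
}

# Example with a shortened flank of 2 bases around position 4:
scan_window ACGTTGCA 4 2
# prints:
# + CGTTG
# - CAACG
```

With the default flank of 15, the forward window covers up to 31 bases. The same extraction would be applied to both the reference and the mutated sequence.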

To obtain the combined prediction from multiple models, FABIAN-variant calculates the average of joint scores S of the individual models. If both TFFMs and PWMs are available, by default only the results from TFFMs are used for the combined prediction (this setting can be changed by unchecking "Options > Prefer TFFMs" on the results page).

The WT score, MT score, the joint score S per model and the combined score are shown on the results page. For example:

Evaluation of TFFMs

A C++ implementation of the forward-backward algorithm evaluates TFFMs. See this article to learn more about TFFMs:

Mathelier A, Wasserman WW. The next generation of transcription factor binding site prediction. PLoS computational biology. 2013 Sep 5;9(9):e1003214. https://doi.org/10.1371/journal.pcbi.1003214

There are two types of TFFMs: detailed models and first-order models. Detailed models are always listed as jaspar2022DetailedTFFMs and first-order models as jaspar2022FirstOrderTFFMs in the database field in the results table. The model ID field starts with TFFM (e.g., TFFM0040.1).

Evaluation of PWMs

Position count matrices (PCMs) were converted to position weight matrices (PWMs) using the method described in:

Bucher P. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. Journal of molecular biology. 1990 Apr 20;212(4):563-78. https://doi.org/10.1016/0022-2836(90)90223-9

A custom C++ implementation computes the scores.

In the results table, the database field for PWMs is one of the following: jaspar2022, cisbp_1.02, HOCOMOCOv11-core-B, HOCOMOCOv11-core-C, HOCOMOCOv11-secondary-D, HOCOMOCOv11-core-A, HOCOMOCOv11-secondary-A, HOCOMOCOv11-secondary-B, HOCOMOCOv11-secondary-C, hPDI, jolma2013, SwissRegulon, UniPROBE.

Results page

The results table summarizes predictions from different models per variant and transcription factor on coloured scales for a possible loss (red) or gain (blue) of a TFBS. Deeper shades of the colour represent a higher loss or gain. Known TFBSs are displayed with a border around the cell.

Moving the mouse pointer over a coloured cell reveals the individual model scores. Clicking on the table cell shows the detailed results page.

Variants have the format chr1:713950CTG>C.1 or GGCCCTCAC>TCACACTCTCCAACCT*.1. In both cases, .1 is simply the line number of the variant in the input. * indicates that some bases of a long sequence are not displayed. Clicking on a variant opens the corresponding location in the UCSC Genome Browser. Clicking on a transcription factor opens Ensembl.
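If you post-process results, a genomic variant label can be split into its parts with standard tools. The sketch below is only an illustration: the function name parse_variant_label is hypothetical, and the pattern is inferred from the single example chr1:713950CTG>C.1 above, so other label shapes (including sequence-mode labels) are simply not matched.

```shell
# Hypothetical helper: split a genomic variant label into chromosome,
# position, reference allele, alternative allele and input line number.
# Prints nothing if the label does not match the assumed pattern.
parse_variant_label() {
    printf '%s\n' "$1" | sed -nE \
        's/^(chr[^:]+):([0-9]+)([A-Za-z]*)>([A-Za-z]*)\.([0-9]+)$/chrom=\1 pos=\2 ref=\3 alt=\4 line=\5/p'
}

parse_variant_label 'chr1:713950CTG>C.1'
# prints: chrom=chr1 pos=713950 ref=CTG alt=C line=1
```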

The results table can be filtered and sorted in the browser using the checkboxes and radio buttons in the header of the page:

Results retention

Results are kept available on the server for three days after the analysis is complete. After this time, they are automatically deleted. You can also manually delete your results by checking the "Options > Show log" checkbox on the top menu on the results page and clicking on the "delete" link. Deleting results also removes all information about your search parameters and uploaded variants from our servers. Deleted results cannot be restored.

Download format

The full download of all results has the following columns:

variant tf model_id database model_db wt_score mt_score start_wt end_wt start_mt end_mt strand_wt strand_mt prediction score

The summary download is similar to the results table and includes any filters and sorting options at the top of the results page. Scores for known TFBSs are marked with *.
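The combined prediction described under "Evaluation of models" (the average of the joint scores S, using only TFFMs when both TFFMs and PWMs are available) could be recomputed from the full download roughly as follows. This is a sketch under several assumptions that the documentation does not confirm: that the file is tab-separated, that its first line holds the column names listed above, and that the score column contains the joint score S of each model. TFFMs are recognised by a model ID starting with TFFM. The function name combined_scores is hypothetical.

```shell
# Hypothetical helper: average the per-model scores for every
# variant / transcription factor pair, preferring TFFMs over PWMs.
# $1 = full download file (assumed: tab-separated, header in line 1)
combined_scores() {
    awk -F '\t' '
        NR == 1 {
            # Map column names to column numbers.
            for (c = 1; c <= NF; c++) col[$c] = c
            next
        }
        {
            key = $(col["variant"]) "\t" $(col["tf"])
            seen[key] = 1
            if ($(col["model_id"]) ~ /^TFFM/) {
                tffm_sum[key] += $(col["score"]); tffm_n[key]++
            } else {
                pwm_sum[key] += $(col["score"]); pwm_n[key]++
            }
        }
        END {
            for (key in seen) {
                if (tffm_n[key] > 0)
                    printf "%s\t%.4f\n", key, tffm_sum[key] / tffm_n[key]
                else
                    printf "%s\t%.4f\n", key, pwm_sum[key] / pwm_n[key]
            }
        }
    ' "$1" | LC_ALL=C sort
}
```

Calling the function with the path of the unpacked full download prints one line per variant and transcription factor with the averaged score. Values may differ from those shown on the results page if the assumptions above do not hold.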

Programmatic access

On Unix-based systems, you can use cURL to post variants to and receive results from FABIAN-variant. The general pattern is printed below.

printf "($(date +%T)) Submitting " && \
FABIANID=$( curl -sLD - -o /dev/null \
    -F "mode=vcf" \
    -F "filename=@TinyExample.vcf" \
    -F "genome=hg19" \
    -F "tfs_filter=all" \
    -F "models_filter=tffm_d" \
    -F "models_filter=tffm_fo" \
    -F "models_filter=pwm" \
    -F "dbs_filter=jaspar2022" \
    -F "dbs_filter=cisbp_1.02" \
    -F "dbs_filter=HOCOMOCOv11" \
    -F "dbs_filter=hPDI" \
    -F "dbs_filter=jolma2013" \
    -F "dbs_filter=SwissRegulon" \
    -F "dbs_filter=UniPROBE" \
    https://www.genecascade.org/fabian/analyse.cgi \
    | grep -m 1 "Location: " | grep -o "\([0-9]\+_[0-9]\+\)" ) && \
i=1; until curl -sfo "fabian.data_${FABIANID}.zip" \
    "https://www.genecascade.org/temp/QE/FABIAN/${FABIANID}/fabian.data.zip"; \
do printf "\r($(date +%T)) Waiting for $FABIANID"; \
    [ "$i" -eq 30 ] && sleep "$i" || sleep $((i++)); done && \
printf "\r($(date +%T)) Saved file fabian.data_${FABIANID}.zip\n"

Some parameters depend on the input mode and on the transcription factors you want to analyse. A few examples are listed below.

If the request is correct, cURL polls our server until results are available, which are then saved under a project-specific name (e.g., fabian.data_1650751034_19489.zip). Please note that your request may wait indefinitely in case of an error. You can always check the status at the project-specific URL (e.g., https://www.genecascade.org/fabian/1650751034_19489).
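Because the polling loop above never gives up by itself, you may prefer a bounded variant in scripts. The following is a minimal sketch and not part of FABIAN-variant: the helper name poll_with_limit and the chosen limits are hypothetical, and it only wraps the same download command used in the general pattern.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds,
# giving up after a maximum number of attempts.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command
poll_with_limit() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    until "$@"; do
        # Stop with a non-zero status once the limit is reached.
        [ "$try" -ge "$max_tries" ] && return 1
        sleep "$delay"
        try=$((try + 1))
    done
}

# Usage sketch (FABIANID as obtained in the submission step):
# poll_with_limit 60 10 curl -sfo "fabian.data_${FABIANID}.zip" \
#     "https://www.genecascade.org/temp/QE/FABIAN/${FABIANID}/fabian.data.zip" \
#     || echo "No results yet, see https://www.genecascade.org/fabian/${FABIANID}" >&2
```

With these example values the download is attempted at most 60 times with 10 seconds between attempts, after which the script reports the status URL instead of waiting further.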

Please do not run more than three automated requests at the same time! If you require more processing slots, please send us a short email with details of your request.

Team

FABIAN has been developed at the Berlin Institute of Health (BIH) by

FABIAN is an update of the ePOSSUM software.

Contact

If you have suggestions about this software, please do not hesitate to email robin.steinhaus (at) bih-charite.de. If you discover a bug, please submit a ticket via email using this link.
