Just-DNA-Seq Documentation

Just-DNA-Seq is a project to facilitate working with human genomes at all levels, from clinical cases to personal curiosity, education and longevity. We envision a future in which genomics becomes more available, understandable and useful for everybody, especially those interested in life extension and improving the human condition.

For this purpose we use the OakVar framework to integrate annotators: tools or databases that accumulate what is actually known about the genome, such as genes, their influence on health or drug response, polygenic risk scores and so on.

We created the Oakvar-Longevity module for OakVar. It annotates a user's genome with longevity-associated gene variants using the longevitymap annotator, and provides the longevity PRS, variants, drugs and major risks (oncorisk) reports with the longevity-combined reporter.

Check out the Getting Started section for further information, including the assemblies section to understand whether you have the appropriate genome assembly.

NOTE: This project is under active development.

Getting Started

NOTE: Both OpenCRAVAT and OakVar can be used to annotate a human genome. At the beginning of the project we used OpenCRAVAT as the framework. However, because OakVar is based on OpenCRAVAT and contains more advanced features customized specifically for personal longevity genomics, we decided to base further development of the project on OakVar.

Installing OakVar

Since our module is based on OakVar, you have to install OakVar first to run our module, if it is not already installed. OakVar docs: https://oakvar.readthedocs.io/en/latest/

Prerequisites for OakVar:
  • the conda or mamba environment management system (or a lighter version: miniconda or micromamba)

  • Python and pip

You can find documentation for mamba here: https://mamba.readthedocs.io/en/latest/index.html

And for conda here: https://docs.conda.io/en/latest/

Install OakVar and carry out all further work after activating an environment created with Conda/Mamba or Miniconda/Micromamba.

Installing Annotators

For the Longevity module to work, you need to install the following annotators:

  1. clinvar

  2. dbsnp

  3. gnomad

  4. ncbigene

  5. omim

  6. pubmed

  7. longevitymap

You can install them from the terminal or from the OakVar GUI.

Installation using terminal:

Use the following command, replacing module_name with the name of each annotator:

ov module install module_name
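Installing the seven annotators one by one can be scripted. The following is a minimal sketch, not part of OakVar itself: it only repeats the `ov module install module_name` command shown above for each annotator in the list. The helper name `install_all` and its `dry_run` parameter are hypothetical, and the sketch assumes that `ov` is on the PATH of the activated environment.

```python
import shlex
import subprocess

# Annotator module names taken from the list above.
ANNOTATORS = ["clinvar", "dbsnp", "gnomad", "ncbigene", "omim", "pubmed", "longevitymap"]


def install_all(modules, dry_run=True):
    """Build (and optionally run) one `ov module install <name>` command per module.

    `install_all` is a hypothetical helper, not an OakVar function.
    With dry_run=True the commands are only printed, never executed.
    """
    commands = [["ov", "module", "install", name] for name in modules]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Runs the command and raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands


commands = install_all(ANNOTATORS)  # dry run: prints the seven commands
```

Calling `install_all(ANNOTATORS, dry_run=False)` would execute the commands in order and stop at the first one that fails.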

Installation using GUI:

To start the OakVar GUI, use the following command:

ov gui

After the command is executed, the GUI will open in your browser.

Go to “Store”:

gui-installation store

Find annotators and install them:

gui-installation annotators

Installing the Reporter

Installation using terminal:

Use the following command in terminal:

ov module install longevity-combinedreporter

Installation using GUI:

To start the OakVar GUI, use the following command:

ov gui

Go to “Store”:

gui-installation store

Find the reporter called “longevity-combinedreporter” and install it:

gui-installation reporter

Loading Genome Files

  1. Open OakVar in your browser. You will see the index page:

Index page

  2. In the Variants section, choose the correct assembly version of the genome: hg38/GRCh38, hg19/GRCh37, or hg18/GRCh36.

As an example, we will use a small VCF file of the hg19/GRCh37 version named example.vcf.

  3. Click Add input files. A file upload dialog will open, allowing you to browse and select the VCF file (or multiple files at once).

After loading, the file(s) will be shown next to the Add input files button, along with a Clear file(s) button and a small X button next to each file name. If you click that X, the corresponding file will be removed. If you click Clear file(s), all the files you loaded will be removed.

vcf files loaded

Installing and Selecting Necessary Annotators

Scroll the left area down to the Annotations section.

Here you can see the categories of annotators available for selection (above) and checkboxes for particular annotators.

Annotators are software modules which can be developed, added and installed as needed. If any necessary annotator is not yet installed, you can install it on the STORE tab in the upper left corner.

All annotators can be divided into 2 groups:

  1. Tools that predict pathogenicity (bold)

  2. Tools that provide information like databases

Here are their internal (coded) module names:

  • cadd_exome (1.6.1) - CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome

  • gnomad_gene (2.2.1) - gene level population statistics from gnomAD

  • pubmed (1.1.5) - articles related to a particular gene

  • clingen (1.0.1) - NIH-funded resource that defines the clinical relevance of genes and variants

  • clinpred (1.0.0) - prediction tool to identify disease-relevant nonsynonymous single nucleotide variants

  • clinvar (2021.10.01) - ClinVar is an archive of reports of the relationships among human variations and phenotypes, as well as interpretations of clinically relevant variants (Uncertain significance, Likely pathogenic, Pathogenic etc.)

  • mitomap (1.1.0) - a human mitochondrial genome database

  • ncbigene (2019.08.02) - gene descriptions from NCBI (National Center for Biotechnology Information) Gene database

  • omim (1.0.0) - catalog of human genes and genetic disorders and traits

  • prec (3.6.0) - provides a database identifying rare and likely deleterious loss-of-function (LoF) alleles

  • provean (1.0.0) - tool which predicts whether an amino acid substitution or indel has an impact on the biological function of a protein

  • revel (2020.12.02) - ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools

  • sift (1.2.0) - predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids

  • gnomAD - a resource aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects

  • PharmGKB - an NIH-funded resource that provides information about how human genetic variation affects response to medications

  • dbSNP - the Single Nucleotide Polymorphism Database is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI)

Once an annotator is installed, you can select it on the JOBS tab in the upper left corner.

For example, let’s select the ClinVar annotator from the Clinical Relevance category:

Selecting annotators

Note: An annotator may belong to multiple categories at once.

The checkbox and X buttons between the categories and the annotators sections let you select all of the displayed annotator checkboxes or clear all of them.

If you right-click any annotator, a pop-up window with its description will open in the right area:

Annotator description

For our purposes we will need the following annotators: ClinVar (clinvar), dbSNP (dbsnp), gnomAD3 (gnomad), LongevityMap (longevitymap), NCBI Gene (ncbigene), OMIM (omim), and PubMed (pubmed). If any of them are missing, install them on the STORE tab, then go back to JOBS, in the Annotations section select All categories, and then select each of the annotator checkboxes.

Annotating Your Genome

Once you have selected all the annotators you need, click the large ANNOTATE button further down in the left area.

Annotating a large genome file may take some time. While it is being processed, the job appears at the top of the list in the right area, with the Status column showing the current processing stage. When processing has finished, the Open Results Viewer button appears in that column of the genome's row:

Genome annotated

Opening Your Annotated Genome

Now click the Open Results Viewer button, and the annotated genome will open in a new browser tab/window.

Filtering Variants

Filters in OakVar let you select only the relevant variants. Because the number of variants in a genome is usually very large, you need to filter them first: OakVar cannot load more than 100,000 variants at once.

The Filter page

Select the Filter page in the Result Viewer. There are sections where you can filter the variants:

  • Variant Properties, with Smart Filters and Query Builder tabs

  • Genes, where you can type in any particular gene names

  • Samples, which is used mainly for oncological purposes and is not used in Just-DNA-Seq.

Filter page

Using Smart Filters

Here are various useful filters:

Population AF <= sets the maximum allele frequency in the population.

Sequence Ontology lets you choose one or more sequence ontologies.

Chromosome lets you choose one or more chromosomes or their specific versions, e.g. chr1, chr10 and so on.

Coding lets you include only coding or only noncoding variants.

ClinGen includes only variants with data from ClinGen.

ClinVar includes only variants with data from ClinVar.

dbSNP Common ID includes only variants with dbSNP common IDs.

PROVEAN Rank Score >= includes variants with PROVEAN rank score not less than 0.9.

Revel Rank Score >= includes variants with Revel rank score not less than 0.9.

SIFT Prediction can be set as Damaging or Tolerated.

Using Query Builder

Here you can create a set of filter rules.

By default, an opening (left) parenthesis appears with + and ( buttons in the lower left corner. If you hover the mouse over the upper left corner, a greyed-out NOT switch appears; clicking it negates the following rule, and clicking it again deactivates the negation.

Click + to add a rule. A line of boxes will appear:

Adding a rule in Query Builder

The first drop-down box is the source to which the rule will apply, for example Variant Annotation, ClinVar, PharmGKB etc. The second drop-down box lets you select the item in that source to which the rule applies, e.g. UID, Chrom, Position, Gene etc. The “not” switch that follows, greyed out (inactive) by default, determines whether the condition must hold or must not hold. To negate the condition, click the word “not”, and it will become black (active). To remove “not” from the condition, click it again, and it will be greyed out. The next drop-down lets you select the condition from one of the following:

has data - if the item being searched contains any data

equals - opens one more box where you can enter what the item should be equal to

is empty - if the item being searched is empty

in range - opens two boxes where you enter the boundaries of the range where the item should be

<= - if the item is less than or equal to the value in the following box

>= - if the item is greater than or equal to the value in the following box

At the end of the line, a small “x” deletes the whole rule when clicked.
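To make the meaning of the conditions concrete, here is a minimal sketch that evaluates one rule against a single value. It is an illustration only and not OakVar's implementation: the function `check` is hypothetical, treating `None` and the empty string as "no data" is an assumption, and so is treating the "in range" boundaries as inclusive.

```python
def check(value, condition, *args, negate=False):
    """Evaluate one Query Builder-style condition on a single cell value.

    Hypothetical helper for illustration. `negate=True` mimics the "not" switch.
    """
    # Assumption: None and "" both mean that the cell holds no data.
    present = value is not None and value != ""
    if condition == "has data":
        result = present
    elif condition == "is empty":
        result = not present
    elif condition == "equals":
        result = value == args[0]
    elif condition == "in range":
        low, high = args
        # Assumption: both boundaries are included in the range.
        result = present and low <= value <= high
    elif condition == "<=":
        result = present and value <= args[0]
    elif condition == ">=":
        result = present and value >= args[0]
    else:
        raise ValueError(f"unknown condition: {condition}")
    return not result if negate else result


# A variant with a ClinVar entry and a population allele frequency of 0.4%.
print(check("Pathogenic", "has data"))                 # True
print(check(0.004, "<=", 0.01))                        # True
print(check("chr1", "equals", "chr2", negate=True))    # True
```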

If you click + once again, another rule is added, and between them the and operator is displayed by default, meaning that both rules must apply to satisfy the filter. You can change it to or by clicking on it, so that the filter is satisfied when at least one of the rules is true. Clicking or once again turns it back to and.

You can add as many rules as you wish, and the operators and / or between them will follow the general priority logic of boolean operations, i.e. and takes priority over or, as in most programming languages.

To change the priority and build more complex logical rules, you can click ( to create a separate set of rules (in parentheses), which has higher priority, as in mathematical operations. Note that the and / or operator which appears before the parentheses depends on the previously selected operator, i.e. if it was or, the next one will also be or, and vice versa. You can always change the operators by clicking on them.
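The priority rule and the effect of parentheses can be checked with ordinary booleans. The sketch below uses Python, where and also binds more tightly than or; the three rules a, b and c are placeholders for any Query Builder rules, and the function names are chosen only for this illustration.

```python
from itertools import product


def flat(a, b, c):
    # Three rules joined without parentheses: a or b and c
    return a or b and c


def and_first(a, b, c):
    # What the priority rule means: "and" is evaluated before "or"
    return a or (b and c)


def or_first(a, b, c):
    # Parentheses around the "or" part change the grouping
    return (a or b) and c


# All eight combinations of three true/false rules.
cases = list(product([False, True], repeat=3))

same_as_and_first = all(flat(*v) == and_first(*v) for v in cases)
differs_from_or_first = [v for v in cases if flat(*v) != or_first(*v)]

print(same_as_and_first)       # True
print(differs_from_or_first)   # [(True, False, False), (True, True, False)]
```

The two differing cases are those where the first rule is true and the third is false: without parentheses the filter passes, while with parentheses around the or part it does not.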

Within the parentheses, you can create any number of rules, and there are separate + and ( buttons to add new rules and nested parentheses inside the parentheses. Also in the upper left corner a separate NOT switch appears if you hover the mouse over it.

You can also move any rule to another position. To do this, drag the || anchor, which appears on the left side of the rule when you hover the mouse there, and drop it on any of the rounded + anchors that appear between rules and/or parentheses (not on the + button that adds rules).

BUG NOTE: Dropping a rule just before or after itself redirects the browser to an error page, and any filter settings made so far may be lost. Please avoid dropping a rule before or after itself until this bug is fixed in OakVar.

Filtering by Genes

Switch to the Genes section and enter any particular gene names, one per line. Also you can load them from a file by clicking Browse…

Filtering by genes

Clearing Filters

Under any section you can click the Clear button to remove any filter settings from that section.

Saving and Importing Filters

You can save the filter (the whole set of rules) in OakVar for further loading, as well as exporting to a file, or import it from a file.

To save the filter, click the middle button (“inward arrow”) in the lower right corner of the page, and enter the filter name.

NOTE: Filters are saved internally in OakVar, i.e. on the server if using a remote installation. To have a filter saved into a local file, export it after saving.

The saved filter appears in the left part of the page in the Saved Filters list:

Saved filters

To load a saved filter, just click its name. To export a saved filter into a file, click the icon with a down arrow next to its name. To delete a saved filter, click the X icon in its line.

To import a filter from a file, click the “up arrow” (rightmost) button in the lower right corner of the page, and browse for a file to import (e.g. pathogenic.json). Clicking Open in the browse window loads the filter. NOTE: The filter is not saved automatically; you need to save it using the “Save filter” (inward arrow) button if you want to keep it on the server for further work.

Loading Filtered Variants

When building a filter, you can click the refresh button next to the number of variants (e.g. 68/12,015,254 variants) in the lower left corner of the page to check how many results the filter provides. If the number is small enough, when the filter is ready, click Load in the lower right corner of the page. After loading the filter, the number of variants in the lower left corner (the first number before the slash, while the second one is the total number of variants and doesn’t change) may be updated.
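The counter text can also be checked programmatically against the 100,000-variant limit mentioned in the Filtering Variants section. This is a minimal sketch that assumes the counter always has the form shown in the example above ("68/12,015,254 variants"); the function names are hypothetical.

```python
MAX_LOADABLE = 100_000  # limit stated in the Filtering Variants section


def parse_variant_count(text):
    """Split a counter such as '68/12,015,254 variants' into (filtered, total)."""
    counts = text.split()[0]                 # e.g. "68/12,015,254"
    filtered, total = counts.split("/")
    return int(filtered.replace(",", "")), int(total.replace(",", ""))


def can_load(text, limit=MAX_LOADABLE):
    """Return True if the filtered number of variants does not exceed the limit."""
    filtered, _total = parse_variant_count(text)
    return filtered <= limit


print(parse_variant_count("68/12,015,254 variants"))   # (68, 12015254)
print(can_load("68/12,015,254 variants"))              # True
print(can_load("12,015,254/12,015,254 variants"))      # False
```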

When the filtered variants are loaded, you can proceed to the Variants page to analyse them (see the next section).

Working With Annotated Variants

After applying the necessary filters, select the Variants page.

By default a combined view is displayed, with both table and widgets:

Variants default view

By clicking the icons in the upper right corner, you can toggle on/off the table view (window-like icon) and the widgets view (piechart-like icon). For our purposes first of all we need the table view:

Variants table view

The table contains columns and column sets with general information about the filtered variants, as well as those connected to certain annotators. Column sets grouped by a particular annotator can be expanded or collapsed by clicking the +/- sign in the upper right corner of the column set (the topmost row). If you filtered by particular annotators, especially using the “has data” condition, the column sets of other annotators may show nothing for those particular variants, and they can be collapsed for convenience.

Each row of the table represents a variant that you can research.

The most important column groups for us are listed below, along with columns:

Variant Annotation

UID - the variant number in this (filtered) sequence

Chrom - chromosome where the variant is located. Chromosome names are ‘chr1’ to ‘chr22’, ‘chrX’, ‘chrY’ and ‘chrM’.

Position - chromosomal position of the variant. The first position in each chromosome is position 1.

Ref Base - reference allele at this chromosomal position (one of A, C, G, T, and N).

Alt Base - alternative allele; called based on reads mapping to this chromosomal position.

Note - note for the variant, if available.

Coding - whether this gene variant is coding.

Gene - the gene this variant belongs to.

Transcript - GENCODE transcript.

RefSeq - the reference sequence.

Sequence Ontology - could be: missense variant, start lost, stop gained, or stop lost.

cDNA change - change of coding DNA.

Protein Change - change of protein being synthesized.

All Mappings - expression showing all the mappings.

Sample Count - the number of samples which contain the variant.

Samples - samples which contain the variant.

Tags - variant tags from the input file.
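The chromosome naming convention given for the Chrom column above (‘chr1’ to ‘chr22’, ‘chrX’, ‘chrY’ and ‘chrM’) can be expressed as a small validator. This is a sketch only: the function `normalize_chrom` is hypothetical, and the conversion of names without the "chr" prefix, or of "MT" for the mitochondrial chromosome, is an assumption about how some input files name chromosomes, not a description of what OakVar does.

```python
# The 25 chromosome names listed for the Chrom column.
VALID_CHROMS = {f"chr{n}" for n in range(1, 23)} | {"chrX", "chrY", "chrM"}


def normalize_chrom(name):
    """Return the 'chr'-prefixed chromosome name, or raise ValueError.

    Hypothetical helper: accepts e.g. '1', 'X' or 'MT' as well as 'chr1'.
    """
    name = name.strip()
    if not name.startswith("chr"):
        name = "chr" + name          # assumption: prefix-less names are allowed as input
    if name == "chrMT":
        name = "chrM"                # assumption: 'MT' denotes the mitochondrial chromosome
    if name not in VALID_CHROMS:
        raise ValueError(f"unexpected chromosome name: {name}")
    return name


print(normalize_chrom("chr10"))  # chr10
print(normalize_chrom("MT"))     # chrM
```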

ClinVar

Clinical Significance - the level of clinical significance of the variant.

Disease Ref Nums - disease reference numbers.

Disease Names - names of diseases associated with the variant.

Review Status - the level of review supporting clinical significance.

ClinVar ID - ID in the ClinVar database.

Significance Detail - additional detail on clinical significance used when it is conflicting.

dbSNP

rsID - the database identifier (“rs” number) of this variant in dbSNP.

This column is empty if the observed variant is not described in dbSNP. Such variants can be extremely rare variants or technical artifacts.

LongevityMap Annotator

LongevityMap ID - ID(s) of the variant in LongevityMap.

Significance - could be: significant, non-significant, or conflicted.

Source Population - population the data were obtained on, e. g. Danish or American (Caucasian).

dbSNP id - ID of the variant in dbSNP.

Associated Genes - genes associated with the variant.

PubMed ID - ID of the variant in PubMed.

Info - additional information.

Description - detailed description of the research.

allele - allele associated with the variant.

state - state of the variant.

zygosity - zygosity of the variant vs OV.

weight - weight of the variant.

priority - priority of the variant.

VCF Info

Phred - Phred quality score.

VCF Filter - if the VCF filter is passed (PASS).

Zygosity - most likely zygosity of the variant in this chromosomal position, computed from the observed variant frequency (column 8). Can be “FP/HET” (<15%), “HET” (15-75%), “HET/HOM” (75-85%), or “HOM” (>85%).

Alternate reads - the number of reads showing the alternative allele.

Total reads - the total number of reads.

Variant AF - the variant allele frequency.

Haplotype block ID - ID of the haplotype block.

Haplotype strand ID - ID of the haplotype strand.
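The zygosity thresholds listed for the Zygosity column above can be written as a short function. This is a sketch under stated assumptions, not the code that produces the column: it assumes that the observed variant frequency is the number of alternate reads divided by the total number of reads, and it assigns the boundary values, which the text leaves open, as follows: exactly 15% and 75% count as "HET", exactly 85% as "HET/HOM".

```python
def classify_zygosity(alt_reads, total_reads):
    """Map read counts to the zygosity labels used in the VCF Info columns.

    Hypothetical helper; the handling of boundary values is an assumption.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = 100.0 * alt_reads / total_reads   # observed variant frequency in percent
    if freq < 15:
        return "FP/HET"      # below 15%: possible false positive or heterozygous
    if freq <= 75:
        return "HET"         # 15-75%
    if freq <= 85:
        return "HET/HOM"     # 75-85%
    return "HOM"             # above 85%


for alt, total in [(1, 10), (5, 10), (8, 10), (10, 10)]:
    print(alt, total, classify_zygosity(alt, total))
```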

Viewing Reports

Just-DNA-Seq makes a set of reports on a genome which contain data about longevity-related gene variants, known cancer risks and drug responses.

Opening Reports

  1. Go to the Reports page.

  2. In the Longevity Combined section, click Download.

  3. When asked by your web browser what to do with the report file, open it in the browser. The report will open in a new tab.

  4. Switch to the report tab in your browser.

Part 1: Longevity Significant Variations

This report contains gene variants which have significant influence on longevity. It has the following columns:

+ - clicking this green button opens detailed information on each entry (row), and the button becomes red with a - sign. Clicking this - closes the details. Clicking + in the header opens the details for all rows and behaves in the same way (clicking - in the header closes all detail sections).

RSID - the reference SNP ID (“rs” number) of the variant in dbSNP.

Population - population(s) on which the research was conducted, e.g. Greek, Ashkenazi Jewish etc., or multiple (for more details, open +).

Gene - gene the variant belongs to.

Your Genotype - which variants your genome contains. Note that in case of homozygosity two letters should be the same, and for heterozygosity they differ.

Ref allele - the reference allele.

Alt allele - the alternative allele.

Zygosity - hom (homozygosity) or het (heterozygosity).

Weight - weight of this variant (the degree of significance).
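The relation between the Your Genotype and Zygosity columns (two identical letters mean homozygosity, two different letters mean heterozygosity) can be illustrated as follows. The sketch assumes a genotype written as two single-nucleotide letters, with or without a separator such as "/"; the exact format used in the report is not specified here, so the function `zygosity_from_genotype` is hypothetical.

```python
def zygosity_from_genotype(genotype):
    """Return 'hom' if both alleles are the same letter, otherwise 'het'.

    Assumption: the genotype consists of exactly two single-nucleotide alleles,
    e.g. 'A/G', 'AG' or 'a/a'; any separator characters are ignored.
    """
    alleles = [ch for ch in genotype.upper() if ch in "ACGT"]
    if len(alleles) != 2:
        raise ValueError(f"expected two single-letter alleles, got: {genotype!r}")
    return "hom" if alleles[0] == alleles[1] else "het"


print(zygosity_from_genotype("A/A"))  # hom
print(zygosity_from_genotype("A/G"))  # het
```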

Part 2: Cancer Report

This report shows variants with known cancer risks and contains the following columns:

+ - acting the same way in all reports (see above).

# - the number of an entry (row).

Chrom - chromosome the variant belongs to, e.g. chr1.

Position - position of the variant on the chromosome (number).

Gene - gene the variant belongs to, like in the previous report.

RSID - the reference SNP ID (“rs” number) of the variant in dbSNP.

cDNA Change - change in the coding DNA by the variant.

Zygosity - hom or het (see above).

Allele Frequency - frequency of the allele.

Phenotype Name - description of condition(s) associated with the variant.

Significance - description of significance of this variant.

Part 3: Drugs Report

This report contains known issues of response to certain drugs associated with gene variants. It has the following columns:

# - the number of an entry (row).

Variant/Haplotypes - by rsID.

Drug(s) - name(s) of the drug(s) whose response is affected by the variant.

Phenotype Category - Efficacy, Dosage, or Other.

Significance - yes or no.

Sentence - description of the case.

Allele Of Frequency In Cases - allele of the variant (one or more letters A, T, C or G) in cases involved.

Allele Of Frequency In Controls - allele of the variant in controls.

Ratio Stat Type - the ratio statistics type.

Effect - information on the effect.

Note: In some browsers the last one or two columns may be found beyond the visible area at 100% zoom level; in such cases try zooming out to 90%, 80% and so on, until everything is visible.