Description of sequence changes: 
examples DNA-level

Last modified October 21, 2009

Since references to WWW-sites are not yet acknowledged as citations, please mention den Dunnen JT and Antonarakis SE (2000). Hum.Mutat. 15: 7-12 when referring to these pages.

 

Introduction

This page gives examples of how to describe sequence variations. Examples are given separately for descriptions at the DNA, RNA and protein level. All examples are described relative to a Reference Sequence: depending on the level, a genomic or coding DNA sequence (DNA level), an mRNA sequence (RNA level) or an amino acid sequence (protein level).

 

Reference sequence DNA-level

Within this page examples will be given for the description of sequence variations in a DNA sequence. For other examples go to those describing changes in RNA. Examples for protein level are given at the protein page. All examples are described relative to a Reference Sequence, here a coding DNA sequence.

Part of gene             genomic Reference Sequence   coding DNA Reference Sequence           protein Reference Sequence
                         (nucleotide numbering)       (nucleotide numbering)
5' gene flanking region  1 to 270                     (-300 to -31)                           -
exon 1, 5' UTR           271 to 300                   -30 to -1                               -
exon 1, coding region    301 to 312                   1 to 12                                 1 to 4
intron 1                 313 to 412                   12+1 ... 12+50, 13-50 ... 13-1          -
exon 2                   413 to 488                   13 to 88                                5 to 29 (30)
intron 2                 489 to 688                   88+1 ... 88+100, 89-100 ... 89-1        -
exon 3                   689 to 723                   89 to 123                               30 to 41
intron 3 (a)             724 to 1023                  123+1 ... 123+150, 124-150 ... 124-1    -
exon 4                   1024 to 1200                 124 to 300                              42 to 100
intron 4                 1201 to 1600                 300+1 ... 300+200, 301-200 ... 301-1    -
exon 5, coding region    1601 to 1630                 301 to 330                              101 to 109
exon 5, 3' UTR (b)       1631 to 1850                 *1 to *220                              -
3' gene flanking region  1851 to 2000                 (*221 to *370)                          -

(a) intron 3 contains a rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136)
(b) the 3' UTR contains a (CA)7-stretch from nts 1700 to 1713 (coding DNA *70 to *83); poly-A addition site at 1825 (coding DNA *195)

NOTE: nucleotides in introns in the 5' UTR are numbered like -23+1, -23+2, ..., -22-2, -22-1. Nucleotides in introns in the 3' UTR are numbered like *154+1, *154+2, ..., *155-2, *155-1. 

Legend:
Reference sequence of the imaginary gene used for the examples given on this page. Nucleotide +1 in the coding DNA reference sequence is the A of the ATG translation initiation codon. Abbreviations used: nt = nucleotide, nts = nucleotides, UTR = untranslated region of the mRNA. For a picture of part of this hypothetical sequence see Figure.
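The relation between the genomic and coding DNA numbering in the table can be sketched in Python. This is only an illustration under stated assumptions: the exon coordinates are copied from the table for the imaginary gene, the function names are hypothetical, and positions in the gene flanking regions are rejected rather than numbered.

```python
# Exons of the imaginary gene as (genomic start, genomic end), 1-based inclusive.
EXONS = [(271, 312), (413, 488), (689, 723), (1024, 1200), (1601, 1850)]
CDS_START, CDS_END = 301, 1630  # A of the ATG and last nt of the coding region


def transcript_index(g):
    """Number of exonic nucleotides up to and including exonic position g."""
    total = 0
    for start, end in EXONS:
        if start <= g <= end:
            return total + g - start + 1
        total += end - start + 1
    raise ValueError(f"g.{g} is not exonic")


def exonic_label(g):
    """Coding DNA number of an exonic genomic position."""
    t = transcript_index(g)
    first, last = transcript_index(CDS_START), transcript_index(CDS_END)
    if t < first:
        return str(t - first)       # 5' UTR: -1 lies directly upstream of the ATG
    if t > last:
        return f"*{t - last}"       # 3' UTR: numbered *1, *2, ...
    return str(t - first + 1)       # coding region: the A of the ATG is 1


def genomic_to_coding(g):
    """Coding DNA number for an exonic or intronic genomic position."""
    for start, end in EXONS:
        if start <= g <= end:
            return exonic_label(g)
    for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:]):
        if prev_end < g < next_start:
            up, down = g - prev_end, next_start - g
            # The 5' half of an intron counts from the preceding exon ("+"),
            # the 3' half counts back from the following exon ("-").
            if up <= down:
                return f"{exonic_label(prev_end)}+{up}"
            return f"{exonic_label(next_start)}-{down}"
    raise ValueError(f"g.{g} lies in a gene flanking region")


for g in (289, 303, 490, 688, 812, 1700):
    print(f"g.{g} -> c.{genomic_to_coding(g)}")
```

The loop prints the coding DNA numbers for the genomic positions used in the substitution examples on this page.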

 

General

Publications that report changes in different sequences (genes), or that report linkage or association studies, should prevent any confusion regarding which variant resides in which sequence. An easy way to achieve this is to include an unequivocal identifier of the reference sequence used in the description, e.g. NM_004006.2:c.3G>T (see Discussion).
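As a small illustration, the following Python sketch splits such a description into its reference sequence identifier and the variant description. The function name split_description is hypothetical, and the sketch assumes that the description starts with the identifier and that the first colon separates the two parts.

```python
def split_description(full):
    """Split 'NM_004006.2:c.3G>T' into (accession, version, variant description)."""
    reference, sep, variant = full.partition(":")
    if not sep or not variant:
        raise ValueError(f"no reference sequence identifier in {full!r}")
    accession, dot, version = reference.partition(".")
    # A missing version number is reported as None rather than guessed.
    return accession, int(version) if dot else None, variant


print(split_description("NM_004006.2:c.3G>T"))
```

Keeping the version number separate makes it easy to notice when two descriptions refer to different versions of the same reference sequence.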

 

Substitutions

Substitutions are designated by a ">"-character after the number of the affected nucleotide.

  • 5' gene flanking region  -  T to C substitution of nt 241 (located 30 nts upstream of the transcription initiation site, i.e. in the promoter region)
    genomic Reference Sequence coding DNA Reference Sequence
    g.241T>C -
  •  5' UTR  -  G to A substitution of nt 289, 12 nts upstream of the ATG translation initiation codon (coding DNA -12). For nucleotide numbering in a case where the ATG is not in exon 1 see here
    genomic Reference Sequence coding DNA Reference Sequence
    g.289G>A c.-12G>A
  • coding region  -  G to C substitution of nt 303, i.e. nt 3 of the coding region (coding DNA 3)
    genomic Reference Sequence coding DNA Reference Sequence
    g.303G>C c.3G>C
  • intron (regarding the numbering of intronic nucleotides see Discussion)
    • 5' part intron  - T to G substitution of the second nt in the intron (88+2) positioned between coding DNA nts 88 and 89 (intron 2)
      genomic Reference Sequence coding DNA Reference Sequence
      g.490T>G c.88+2T>G
    • 3' part intron  - G to T substitution of the last nt of the intron  (89-1) positioned between coding DNA nts 88 and 89 (intron 2)
      genomic Reference Sequence coding DNA Reference Sequence
      g.688G>T c.89-1G>T
  • alternatively spliced exon  -  C to T substitution of intronic nt 812 (coding DNA 123+89)
    genomic Reference Sequence coding DNA Reference Sequence
    g.812C>T c.123+89C>T
  • 3' UTR  -  T to A substitution of nt 1700 (coding DNA *70), located in the 3' UTR (70 nts downstream of the termination codon)
    genomic Reference Sequence coding DNA Reference Sequence
    g.1700T>A c.*70T>A
  • 3' gene flanking region  -  C to A substitution of nt 1923 (located 123 nucleotides downstream of the gene, i.e. the polyA-addition site)
    genomic Reference Sequence coding DNA Reference Sequence
    g.1923C>A *293
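The substitution format used in the examples above can be checked with a short Python sketch. The regular expression is a simplification written for this page's examples (single-nucleotide substitutions with a "g." or "c." prefix), not a complete grammar, and the name parse_substitution is hypothetical.

```python
import re

# prefix, position (optionally "-" or "*" and an intronic offset), reference nt, ">", new nt
SUBSTITUTION = re.compile(
    r"^(?P<level>[gc])\."
    r"(?P<position>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description):
    """Return the parts of a substitution description as a dict."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    return match.groupdict()


for text in ("g.241T>C", "c.-12G>A", "c.88+2T>G", "c.89-1G>T", "c.*70T>A"):
    print(text, parse_substitution(text))
```

The sketch does not verify that the position exists in the reference sequence or that the reference nucleotide is correct; it only checks the shape of the description.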

 

Deletion

Deletions are designated by "del" after a description of the deleted segment, i.e. the first (and last) nucleotide(s) deleted (see also Discussion). To describe deletions with unknown breakpoints, e.g. based on Southern blotting, PCR, arrayCGH, SNP array data, etc. see Uncertainties.

  • single nucleotide deletion  -  deletion of nt 13 of the coding region
    genomic Reference Sequence coding DNA Reference Sequence
    g.413del   (g.413delG) c.13del   (c.13delG)
    g.304del   (g.304delG)   (not g.303del / g.303delG) c.4del   (c.4delG)   (not c.3del / c.3delG)
    g.1598delG   (not g.1596del / g.1596delG) c.301-3del   (c.301-3delT)   (not c.301-5del / c.301-5delT)
  •  several nucleotide deletion 
    • deletion of nts 92 to 94 (GAC) of the coding region
      genomic Reference Sequence coding DNA Reference Sequence
      g.692_694del
      (g.692_694delGAC, g.692_694del3) 
      c.92_94del   
      (c.92_94delGAC, c.92_94del3) 
    • deletion across the exon 3 / intron 3 border, nts 120 to 123 of the coding region (exon 3) and the first 48 nts of intron 3 (nts 123+1 to 123+48)
      genomic Reference Sequence coding DNA Reference Sequence
      g.720_771del   (g.720_771del52) c.120_123+48del   (c.120_123+48del52)
    • deletion across the intron 3 / exon 4 border, the last 12 nts of intron 3 (nts 124-12 to 124-1) and nts 124 to 129 of the coding region (exon 4) 
      genomic Reference Sequence coding DNA Reference Sequence
      g.1012_1029del   (g.1012_1029del18) c.124-12_129del   (c.124-12_129del18) 
    • deletion of a TG dinucleotide in the sequence ATGTTGTGCC to ATGTTG_CC
      genomic Reference Sequence coding DNA Reference Sequence
      g.307_308del 
      (g.307_308delTG, g.307_308del2)
      NOT g.305_306del
      c.7_8del 
       (c.7_8delTG, c.7_8del2)
      NOT c.5_6del
    • deletion of an A nucleotide in the sequence CAAgt... / ..agAAG to CAgt... / ..agAAG
      genomic Reference Sequence coding DNA Reference Sequence
      g.723del
      (g.723delA)
      c.123del 
      (c.123delA)
      NOT c.125delA
  • variability in short sequence repeat - see below
  • (multi) exon deletion
    • breakpoints not sequenced
      • deletion of exons 2 to 4 (e.g. detected on Southern blot, see Discussion)
      genomic Reference Sequence coding DNA Reference Sequence
      - c.13-?_300+?del  
      • deletion of the entire gene; coding DNA reference sequence runs from -30 (cap site) to *220 (polyA-addition site); see Recommendations
      genomic Reference Sequence coding DNA Reference Sequence
      - c.(?_-30)_(*220_?)del   
    • breakpoints sequenced - deletion of exons 2 to 4; the sequences of the deletion breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession number should be given
      genomic Reference Sequence coding DNA Reference Sequence
      g.390_1458del   (g.390_1458del1069) c.13-23_301-143del   (c.13-23_301-143del1069)
  • large deletion fusing two genes  (see Discussion)
    • genes with identical transcriptional orientation - a large deletion, starting in intron 4, removes the entire 3' end of the gene and fuses it to position 457 (intron position 233+17) of the gene XYZ
      genomic Reference Sequence coding DNA Reference Sequence
      g.1458_XYZ:457del c.301-143_XYZ:233+17del
    • genes with opposite transcriptional orientation - a large deletion, starting in intron 4, removes the entire 3' end of the gene and fuses it to position 457 (intron position 233+17) of the XYZ-gene, which has an opposite transcriptional orientation (indicated by the "o")
      genomic Reference Sequence coding DNA Reference Sequence
      g.1458_oXYZ:457del c.301-143_oXYZ:233+17del
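The "NOT" entries in the deletion examples above reflect that, when a deletion can be placed at several equivalent positions, the most 3' position is used. A minimal Python sketch of that shift follows. It assumes positions are 1-based within one contiguous sequence (so it does not handle the exon / intron border case shown above), and the function names are hypothetical.

```python
def shift_3prime(seq, start, end):
    """Move a deleted range to its most 3' equivalent position.

    Positions are 1-based and inclusive within one contiguous sequence.
    """
    # Sliding the window one step 3' gives the same result when the
    # nucleotide dropped on the left equals the one picked up on the right.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(seq, start, end, prefix="c"):
    """Describe the deletion of nts start..end after shifting it 3'."""
    start, end = shift_3prime(seq, start, end)
    if start == end:
        return f"{prefix}.{start}del"
    return f"{prefix}.{start}_{end}del"


# TG dinucleotide example: ATGTTGTGCC to ATGTTG_CC
print(describe_deletion("ATGTTGTGCC", 5, 6))
```

For the TG dinucleotide example the sketch returns c.7_8del rather than c.5_6del, in line with the description given above.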

Duplication

Duplications are designated by "dup" after a description of the duplicated segment, i.e. the first (and last) nucleotide(s) duplicated (even when a mono-nucleotide is duplicated, see Recommendations). To describe duplications with unknown breakpoints, e.g. based on Southern blotting, PCR, arrayCGH, SNP array data, etc. see Uncertainties.

  • single nucleotide duplication  -  duplication of nt 13 of the coding region
    genomic Reference Sequence coding DNA Reference Sequence
    g.413dup   (g.413dupT) c.13dup   (c.13dupT)
  •  several nucleotide duplication 
    • duplication of nts 92 to 94 (GAC) of the coding region
      genomic Reference Sequence coding DNA Reference Sequence
      g.692_694dup
      (g.692_694dupGAC, g.692_694dup3) 
      c.92_94dup   
      (c.92_94dupGAC, c.92_94dup3) 
    • duplication across the exon 3 / intron 3 border, nts 120 to 123 of the coding region (exon 3) and the first 48 nts of intron 3 (nts 123+1 to 123+48)
      genomic Reference Sequence coding DNA Reference Sequence
      g.720_771dup   (g.720_771dup52) c.120_123+48dup   (c.120_123+48dup52) 
    • duplication of a TG dinucleotide in the sequence ATGTTGTGCC to ATGTTGTGTGCC
      genomic Reference Sequence coding DNA Reference Sequence
      g.307_308dup 
      (g.307_308dupTG, g.307_308dup2)
      NOT g.305_306dup
      c.7_8dup 
      (c.7_8dupTG, c.7_8dup2)
      NOT c.5_6dup
  • variability in short sequence repeat - see below
  • (multi) exon duplication
    • breakpoints not sequenced - duplication of exons 2 to 4 (e.g. detected on Southern blot, see Discussion)
      genomic Reference Sequence coding DNA Reference Sequence
      - c.13-?_300+?dup
    • breakpoints not sequenced - duplication of the entire gene; coding DNA reference sequence runs from -30 (cap site) to *220 (polyA-addition site); see Recommendations
      genomic Reference Sequence coding DNA Reference Sequence
      - c.(?_-30)_(*220_?)dup
    • breakpoints sequenced - duplication of exons 2 to 4; the sequences of the duplication breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession numbers should be given
      genomic Reference Sequence coding DNA Reference Sequence
      g.390_1458dup 
      (g.390_1458dup1069)
      c.13-23_301-143dup   (c.13-23_301-143dup1069)

 

Insertion

Insertions are designated by "ins" after the nucleotides flanking the insertion. NOTE: duplicating insertions (incl. duplication of a mono-nucleotide) should be described as duplications (see above).

  • single nucleotide insertion  -  insertion of a T between nts 51 and 52 of the coding region
    genomic Reference Sequence coding DNA Reference Sequence
    g.451_452insT c.51_52insT
  •  several nucleotide insertion
    • insertion of a GAGA-sequence between nts 51 and 52 of the coding region
      genomic Reference Sequence coding DNA Reference Sequence
      g.451_452insGAGA c.51_52insGAGA
    • insertion of a TG dinucleotide in the sequence ATGTTGTGCC to ATGTTGTGTGCC; is a duplication (see above)
    • variability in short sequence repeat (see Recommendations)
  • variability in short sequence repeat - see below
  • large insertion  -  insertion of a 345 nucleotide sequence in intron 3; the sequence of the insertion should be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession number should be given
    genomic Reference Sequence coding DNA Reference Sequence
    g.777_778ins345 (GenBank AB012345.2) c.123+54_123+55ins345 (GenBank AB012345.2)
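The rule that duplicating insertions are described as duplications can be sketched in Python. The sketch assumes 1-based positions within one contiguous sequence and first moves the insertion to its most 3' equivalent position; the function name describe_insertion is hypothetical.

```python
def describe_insertion(seq, after, inserted, prefix="c"):
    """Describe `inserted` placed between positions `after` and `after + 1`."""
    # Rotate the insertion 3' while the next reference nucleotide equals
    # the first inserted nucleotide; the resulting sequence is unchanged.
    while after < len(seq) and inserted and seq[after] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        after += 1
    size = len(inserted)
    # An insertion that copies the nucleotides directly 5' of the
    # insertion point is a duplication of those nucleotides.
    if 0 < size <= after and seq[after - size:after] == inserted:
        if size == 1:
            return f"{prefix}.{after}dup"
        return f"{prefix}.{after - size + 1}_{after}dup"
    return f"{prefix}.{after}_{after + 1}ins{inserted}"


# ATGTTGTGCC to ATGTTGTGTGCC: a TG insertion that is really a duplication
print(describe_insertion("ATGTTGTGCC", 6, "TG"))
# an insertion that does not copy the flanking sequence
print(describe_insertion("ATGTTGTGCC", 8, "A"))
```

For the TG example the sketch returns c.7_8dup, as in the duplication section, and for the second call it returns a plain insertion description.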

 

 

Variability of short sequence repeats

  For recommendations on how to describe variable short sequence repeats, see Recommendations
  • polymorphic CA-repeat  -  the 3' UTR of the gene contains a (CA)7-stretch from nucleotides 1700 to 1713 (coding DNA *70 to *83) which in the population has a length between 6 and 13 CA's
    genomic Reference Sequence coding DNA Reference Sequence
    g.1700CA(6_13)
    NOT g.1713CA(6_13)
    c.*70CA(6_13)
    NOT c.*83CA(6_13)
  • polymorphic CA-repeat  -  a person carries a CA di-nucleotide repeat of length 6 on one allele (the reference sequence has a repeat length of 7)
    genomic Reference Sequence coding DNA Reference Sequence
    g.1700CA[6]
    NOT g.1712_1713delCA
    c.*70CA[6]
    NOT c.*82_83delCA
  • polymorphic CA-repeat  -  a person carries a CA di-nucleotide repeat of length 6 on one allele and of length 11 on the other allele
    genomic Reference Sequence coding DNA Reference Sequence
    g.1700CA[6]+[11] c.*70CA[6]+[11]
  • polymorphic CA-repeat  -  a person carries a CA di-nucleotide repeat of length 8 on one allele (the reference sequence has a repeat length of 7)
    genomic Reference Sequence coding DNA Reference Sequence
    g.1700CA[8]
    NOT g.1712_1713dupCA
    c.*70CA[8]
    NOT c.*82_83dupCA
  • FMR1 GGC-repeat - based on the coding DNA Reference Sequence (GenBank NM_002024.3), c.-158GGC(1000) describes the presence of an extended GGC-repeat of about 1000 units
    NOTE: "()" is used to indicate uncertainties (see Uncertainties); c.-158GGC[79] describes the presence of an extended GGC-repeat of exactly 79 units
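A short Python sketch of the repeat format used above: the description names the first nucleotide of the repeat, the repeat unit and the number of units. The helper names are hypothetical, and the test sequence is made up for illustration.

```python
def count_repeat_units(seq, start, unit):
    """Count consecutive copies of `unit` beginning at 1-based position `start`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    count = 0
    index = start - 1
    while seq.startswith(unit, index):
        count += 1
        index += len(unit)
    return count


def describe_repeat(prefix, position, unit, *counts):
    """One count gives 'g.1700CA[6]'; two counts give 'g.1700CA[6]+[11]'."""
    alleles = "+".join(f"[{count}]" for count in counts)
    return f"{prefix}.{position}{unit}{alleles}"


sequence = "GG" + "CA" * 7 + "TT"           # made-up sequence with a (CA)7 stretch
print(count_repeat_units(sequence, 3, "CA"))
print(describe_repeat("g", 1700, "CA", 6))
print(describe_repeat("c", "*70", "CA", 6, 11))
```

The position is passed through unchanged, so coding DNA positions such as *70 can be given as strings.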

Inversion

Inversions are designated by "inv" after the nt number of the nucleotides inverted.

  • short inversion  -  inversion of nts 177 to 180 (CTAG) of the coding region
    genomic Reference Sequence coding DNA Reference Sequence
    g.1077_1080inv 
    (g.1077_1080inv4, g.1077_1080invCTAG)
    c.177_180inv 
    (c.177_180inv4, c.177_180invCTAG)
  • large inversion  -  a large inversion (212,434 nucleotides in length), starting in intron 4, inverts the entire 3' end of the gene and fuses it to position 233+17 (intron) of the XYZ-gene, having an opposite transcriptional orientation (indicated by the "o")
    genomic Reference Sequence coding DNA Reference Sequence
    g.1458_XYZo:457inv   (g.1458_XYZo:457inv212434) c.301-143_XYZo:233+17inv   (c.301-143_XYZo:233+17inv212434)
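At the sequence level an inversion replaces the affected nucleotides by their reverse complement. The following Python sketch applies a short inversion to a made-up sequence; the function names are hypothetical and positions are 1-based within one contiguous sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_inversion(seq, start, end):
    """Return seq with nts start..end (inclusive) replaced by their reverse complement."""
    segment = seq[start - 1:end]
    inverted = segment.translate(COMPLEMENT)[::-1]
    return seq[:start - 1] + inverted + seq[end:]


def describe_inversion(start, end, prefix="g"):
    """Format the description of an inversion of nts start..end."""
    return f"{prefix}.{start}_{end}inv"


print(apply_inversion("AAGGTCTT", 3, 6))
print(describe_inversion(1077, 1080))
```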

Gene conversion

Gene conversions are designated by "con" after the nt number of the nucleotides converted, followed by a description of the origin of the new sequence; "region_changed" con "region of origin" (see Discussion).

  • a gene conversion replacing a segment of the coding region of a gene with a segment derived from elsewhere in the genome (as described in another GenBank file; AC096506.5 and NM_004006.1 resp.)
    genomic Reference Sequence coding DNA Reference Sequence
    g.415_1655conAC096506.5:g.409_1683 c.15_355conNM_004006.1:c.15_355

Translocation

Translocations are designated in the format  "t(X;4)(p21.2;q35)", followed by the usual description, placed between brackets, indicating the exact translocation breakpoint. The sequences of the translocation breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession numbers should be given (see Discussion).

  • a translocation breakpoint in the 3' half of intron 4, between nucleotides 1453 and 1454 (coding DNA 301-148 and 301-147), joining chromosome bands Xp21.2 and 4q35
    genomic Reference Sequence coding DNA Reference Sequence
    t(X;4)(p21.2;q35)(g.1453_1454) t(X;4)(p21.2;q35)(c.301-148_301-147)   [t(X;4)(p21.2;q35)(c.IVS4)]

Complex

Complex rearrangements consist of several different types of the six elementary changes: substitution, deletion, duplication, insertion, inversion and translocation. Such rearrangements can be very complex and difficult to describe. Specific recommendations to describe such changes have not been made. Complex rearrangements are best described as a combination of the elementary changes. 

Deletion / insertions ("indels") are described as a deletion ("del"), followed by an insertion ("ins") after a description of the deleted segment, i.e. the first (and last) nucleotide(s) deleted (see Discussion).

  • deletion of nts 712 to 717 (coding DNA 112 to 117) and a TG insertion at the same site
    genomic Reference Sequence coding DNA Reference Sequence
    g.712_717delinsTG
    (g.712_717del6insTG, g.712_717delAGGGCAinsTG)
    c.112_117delinsTG
    (c.112_117del6insTG, c.112_117delAGGGCAinsTG)
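A deletion / insertion can be sketched in Python as replacing the deleted range by the inserted nucleotides. The function names are hypothetical, the example sequence is made up, and positions are 1-based within one contiguous sequence.

```python
def apply_delins(seq, start, end, inserted):
    """Return seq with nts start..end (inclusive) replaced by `inserted`."""
    return seq[:start - 1] + inserted + seq[end:]


def describe_delins(start, end, inserted, prefix="g"):
    """Format a deletion / insertion description."""
    position = str(start) if start == end else f"{start}_{end}"
    return f"{prefix}.{position}delins{inserted}"


# made-up sequence in which AGGGCA (nts 3 to 8) is replaced by TG
print(apply_delins("TTAGGGCATT", 3, 8, "TG"))
print(describe_delins(712, 717, "TG"))
print(describe_delins(112, 117, "TG", prefix="c"))
```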

Miscellaneous

  • Two changes in one allele are described as "[first change; second change]" (see Discussion)
    • one allele (chromosome) containing a C to T change at nt 476 (coding DNA nt 76) and a G to C change at nt 483 (coding DNA 83)
      genomic Reference Sequence coding DNA Reference Sequence
      g.[476C>T;483G>C]  c.[76C>T;83G>C] 
  • Two changes in one individual with alleles unknown are described as "[first change (+) second change]" (see Discussion)
    • one individual containing a C to T change at nt 476 (coding DNA nt 76) and a G to C change at nt 1083 (coding DNA 183) while it is unknown whether these changes are on the same or different alleles
      genomic Reference Sequence coding DNA Reference Sequence
      g.[476C>T(+)1083G>C]  c.[76C>T(+)183G>C] 
  • Recessive disease - changes in different alleles are described as "[change allele 1]+[change allele 2]" (see Discussion)
    • both changes identified  -  a homozygous C to T change at nt 76 of the coding region
      genomic Reference Sequence coding DNA Reference Sequence
      g.[476C>T]+[476C>T] c.[76C>T]+[76C>T]
    • one allele containing a TG di-nucleotide repeat of length 4, the other allele containing a repeat of length 5 (see Recommendation)
      genomic Reference Sequence coding DNA Reference Sequence
      g.983TG[4]+[5] c.88+495TG[4]+[5]
    • one change not yet identified  -  one allele containing a C to T change at nt 76 of the coding region, the other allele containing an unknown change
      genomic Reference Sequence coding DNA Reference Sequence
      g.[476C>T]+[?] c.[76C>T]+[?]
    • one changed allele and one normal allele  -  one allele containing a C to T change at nt 76 of the coding region, the other allele having the normal sequence (wild type)
      genomic Reference Sequence coding DNA Reference Sequence
      g.[476C>T]+[=] c.[76C>T]+[=]
    • several changes in both alleles  -  two alleles containing a C to G change at nt -5, one allele containing a C to T change at nt 76, and two alleles with a G to C change at nt 183, all in relation to the coding region
      genomic Reference Sequence coding DNA Reference Sequence
      g.[296C>G;476C>T;1083G>C]+
      g.[296C>G;=;1083G>C]
      c.[-5C>G;76C>T;183G>C]+
      c.[-5C>G;=;183G>C]
    • two changes in different genes  -  the GJB2 gene contains a deletion of a G, the GJB6 gene a T to C substitution
      genomic Reference Sequence coding DNA Reference Sequence
      NM_004004.2:c.[35delG] + NM_006783.1:c.[689T>C]

      (GJB2:c.[35delG]+GJB6:c.[689T>C])
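The bracket formats for alleles listed above can be assembled with a few lines of Python. This is only a formatting sketch with hypothetical helper names; it does not check the individual changes.

```python
def allele(changes):
    """Changes on one allele (chromosome): '[first;second]'."""
    return "[" + ";".join(changes) + "]"


def both_alleles(prefix, first, second):
    """Changes in different alleles: '[allele 1]+[allele 2]'."""
    return f"{prefix}.{allele(first)}+{allele(second)}"


def phase_unknown(prefix, first, second):
    """Two changes in one individual, alleles unknown: '[first(+)second]'."""
    return f"{prefix}.[{first}(+){second}]"


print("c." + allele(["76C>T", "83G>C"]))          # two changes in one allele
print(phase_unknown("c", "76C>T", "183G>C"))      # alleles unknown
print(both_alleles("c", ["76C>T"], ["76C>T"]))    # homozygous change
print(both_alleles("c", ["76C>T"], ["="]))        # other allele normal
print(both_alleles("c", ["76C>T"], ["?"]))        # other change not yet identified
```

Passing "=" or "?" as the only entry of the second allele gives the forms for a normal allele and for a change that has not yet been identified.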


Copyright © HGVS 2009
All Rights Reserved

Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen - Disclaimer

 


    Last update: 9 Nov 2009