
Description of sequence changes: examples at DNA level
Last modified October 21, 2009
Since references to WWW sites are not yet accepted as citations, please cite den
Dunnen JT and Antonarakis SE (2000). Hum. Mutat. 15: 7-12 when referring to
these pages.
Introduction
This page gives examples of how to describe sequence variations. Examples are given
separately for descriptions at the DNA, RNA and protein level. All examples are described
relative to a Reference Sequence. Depending on the level, this is a genomic or coding DNA
sequence (DNA level), an mRNA sequence (RNA level) or an amino acid sequence (protein level).
Reference sequence DNA-level
This page gives examples of how to describe sequence variations in a DNA sequence. Examples
of changes at the RNA level are given on the RNA page, and examples at the protein level on
the protein page. All examples are described relative to a Reference Sequence, here a
genomic and a coding DNA Reference Sequence.
| Part of gene | Nucleotide numbering, genomic Reference Sequence | Nucleotide numbering, coding DNA Reference Sequence | Amino acid numbering, protein Reference Sequence |
| 5' gene flanking region | 1 to 270 | (-300 to -31) | - |
| exon 1, 5' UTR | 271 to 300 | -30 to -1 | - |
| exon 1, coding region | 301 to 312 | 1 to 12 | 1 to 4 |
| intron 1 | 313 to 412 | 12+1 ... 12+50, 13-50 ... 13-1 | - |
| exon 2 | 413 to 488 | 13 to 88 | 5 to 29 (30) |
| intron 2 | 489 to 688 | 88+1 ... 88+100, 89-100 ... 89-1 | - |
| exon 3 | 689 to 723 | 89 to 123 | 30 to 41 |
| intron 3, containing a rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136) | 724 to 1023 | 123+1 ... 123+150, 124-150 ... 124-1 | - |
| exon 4 | 1024 to 1200 | 124 to 300 | 42 to 100 |
| intron 4 | 1201 to 1600 | 300+1 ... 300+200, 301-200 ... 301-1 | - |
| exon 5, coding region | 1601 to 1630 | 301 to 330 | 101 to 109 |
| exon 5, 3' UTR, containing a (CA)7 stretch from nts 1700 to 1713 (coding DNA *70 to *83); polyA-addition site at 1825 (coding DNA *195) | 1631 to 1850 | *1 to *220 | - |
| 3' gene flanking region | 1851 to 2000 | (*221 to *370) | - |
NOTE: nucleotides in introns in the 5' UTR are numbered like -23+1,
-23+2, ..., -22-2, -22-1. Nucleotides in introns in the 3' UTR are numbered like
*154+1, *154+2, ..., *155-2, *155-1.
Legend:
Reference sequence of the imaginary gene used for the examples given on this page. Nucleotide
+1 in the coding DNA Reference Sequence is the A of the ATG translation initiation
codon.
Abbreviations used: nt = nucleotide, nts = nucleotides, UTR = untranslated region of the
mRNA. For a picture of part of this hypothetical sequence, see
Figure.
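The numbering rules in the table above can be expressed as a short conversion from genomic to coding DNA positions. The sketch below illustrates this for the imaginary gene only. The exon coordinates are read from the table, and the function names (`g_to_c`, `_exonic_c`) are hypothetical, not part of any existing tool. Intronic nucleotides are numbered from the nearest exon border. This sketch assumes that the central nucleotide of an odd-length intron takes the "+" form; all introns in this example have an even length.

```python
# Exons of the imaginary gene as (first, last) genomic positions, taken
# from the table above; exon 1 starts at the cap site (c.-30).
EXONS = [(271, 312), (413, 488), (689, 723), (1024, 1200), (1601, 1850)]
UTR5_LEN = 30  # c.-30 to c.-1
CDS_LEN = 330  # c.1 to c.330, including the termination codon


def _exonic_c(g):
    """Coding DNA position (as text) of an exonic genomic position g."""
    t = 0  # position in the processed transcript, 1 = cap site
    for start, end in EXONS:
        if start <= g <= end:
            t += g - start + 1
            break
        t += end - start + 1
    else:
        raise ValueError(f"g.{g} is not in an exon")
    if t <= UTR5_LEN:
        return str(t - UTR5_LEN - 1)  # 5' UTR: -30 .. -1
    if t <= UTR5_LEN + CDS_LEN:
        return str(t - UTR5_LEN)  # coding region: 1 .. 330
    return f"*{t - UTR5_LEN - CDS_LEN}"  # 3' UTR: *1 .. *220


def g_to_c(g):
    """Convert a genomic position of the imaginary gene to coding DNA numbering."""
    first, last = EXONS[0][0], EXONS[-1][1]
    if g < first:  # 5' gene flanking region
        return str(int(_exonic_c(first)) - (first - g))
    if g > last:  # 3' gene flanking region
        return f"*{int(_exonic_c(last)[1:]) + (g - last)}"
    for (_, end5), (start3, _) in zip(EXONS, EXONS[1:]):
        if end5 < g < start3:  # intronic: count from the nearest exon border
            up, down = g - end5, start3 - g
            if up <= down:
                return f"{_exonic_c(end5)}+{up}"
            return f"{_exonic_c(start3)}-{down}"
    return _exonic_c(g)
```

For example, `g_to_c(490)` returns "88+2" and `g_to_c(1700)` returns "*70". Both agree with the substitution examples on this page.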
General
Publications that report changes in different sequences (genes), or that report
linkage or association studies, should avoid any confusion about which
variant resides in which sequence. An easy way to achieve this is to include an unequivocal
identifier of the reference sequence used in the
description, e.g. NM_004006.2:c.3G>T (see
Discussion).
Substitutions
Substitutions are designated by a ">" character between the reference and the new
nucleotide, following the number of the affected nucleotide (e.g. g.241T>C).
- 5' gene flanking region - T to C substitution of nt 241 (located 30
nts upstream of the transcription initiation site, i.e. in the promoter region)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.241T>C | - |
- 5' UTR - G to A substitution of nt 289, 12 nts upstream of the
ATG translation initiation codon (coding DNA -12). For nucleotide numbering
in a case where the ATG is not in exon 1 see
here.
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.289G>A | c.-12G>A |
- coding region - G to C substitution of nt 303, i.e. nt 3 of the
coding region (coding DNA 3)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.303G>C | c.3G>C |
- intron (regarding the numbering of intronic nucleotides see Discussion)
- 5' part intron - T to G substitution of the second nt in the intron
(88+2) positioned between coding DNA nts 88 and 89 (intron 2)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.490T>G | c.88+2T>G |
- 3' part intron - G to T substitution of the last nt of the
intron (89-1) positioned between coding DNA nts 88 and 89 (intron 2)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.688G>T | c.89-1G>T |
- alternatively spliced exon - C to T substitution of intronic nt 812
(coding DNA 123+89)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.812C>T | c.123+89C>T |
- 3' UTR - T to A substitution of nt 1700 (coding DNA *70), located in the 3'
UTR (70 nts downstream of the termination codon)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1700T>A | c.*70T>A |
- 3' gene flanking region - C to A substitution of nt 1923 (located 73
nts downstream of the 3' end of the gene at nt 1850)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1923C>A | c.*293C>A |
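The substitution descriptions above all follow one pattern: a prefix, a position, the reference nucleotide, ">", and the new nucleotide. Below is a minimal sketch of a parser for just these simple forms, using Python's standard `re` module. The pattern and the name `parse_substitution` are illustrative assumptions. They do not cover the full nomenclature, such as uncertain positions, ranges or protein descriptions.

```python
import re

# Position grammar assumed here (only the forms used on this page):
# optional "-" (5' UTR/upstream) or "*" (3' UTR/downstream), a number,
# and an optional intronic offset "+n" or "-n".
SUBSTITUTION = re.compile(
    r"^(?P<prefix>[gc])\."
    r"(?P<pos>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description):
    """Split e.g. "c.88+2T>G" into its parts, or raise ValueError."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    parts = match.groupdict()
    if parts["ref"] == parts["alt"]:
        raise ValueError("reference and new nucleotide are identical")
    return parts


print(parse_substitution("c.88+2T>G"))
```

The call above returns the prefix "c", the position "88+2", the reference nucleotide "T" and the new nucleotide "G" as a dictionary.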
Deletion
Deletions are designated by "del" after a description of the
deleted segment, i.e. the first (and last) nucleotide(s) deleted (see also Discussion).
To describe deletions with unknown breakpoints, e.g. based on Southern blotting,
PCR, arrayCGH, SNP array data, etc. see
Uncertainties.
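- single nucleotide deletion - deletion of nt 13 of the coding region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.413del (g.413delG) | c.13del (c.13delG) |
- single nucleotide deletion in a mononucleotide stretch - the most 3' nucleotide of the stretch is described as deleted
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.304del (g.304delG) (not g.303del / g.303delG) | c.4del (c.4delG) (not c.3del / c.3delG) |
| g.1598del (g.1598delG) (not g.1596del / g.1596delG) | c.301-3del (c.301-3delT) (not c.301-5del / c.301-5delT) |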
- several nucleotide deletion
- deletion of nts 92 to 94 (GAC) of the coding region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.692_694del (g.692_694delGAC, g.692_694del3) | c.92_94del (c.92_94delGAC, c.92_94del3) |
- deletion across the exon 3 / intron 3 border, nts 120 to 123 of the coding region
(exon 3) and the first 48 nts of intron 3 (nts 123+1 to 123+48)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.720_771del (g.720_771del52) | c.120_123+48del (c.120_123+48del52) |
- deletion across the intron 3 / exon 4 border, the last 12 nts of intron 3
(nts 124-12 to 124-1) and nts 124 to 129 of the coding region (exon 4)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1012_1029del (g.1012_1029del18) | c.124-12_129del (c.124-12_129del18) |
- deletion of a TG dinucleotide in the sequence ATGTTGTGCC, changing it to
ATGTTG_CC
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.307_308del (g.307_308delTG, g.307_308del2), NOT g.305_306del | c.7_8del (c.7_8delTG, c.7_8del2), NOT c.5_6del |
- deletion of an A nucleotide in the sequence CAAgt... / ..agAAG, changing it to
CAgt... / ..agAAG
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.723del (g.723delA) | c.123del (c.123delA), NOT c.125delA |
- variability in short sequence repeat - see below
- (multi) exon deletion
- breakpoints not sequenced
- deletion of exons 2 to 4 (e.g. detected on Southern blot, see Discussion)
| genomic Reference Sequence | coding DNA Reference Sequence |
| - | c.13-?_300+?del |
- deletion of the entire gene; the coding DNA reference sequence runs from
-30 (cap site) to *220 (polyA-addition site); see
Recommendations
| genomic Reference Sequence | coding DNA Reference Sequence |
| - | c.(?_-30)_(*220_?)del |
- breakpoints sequenced - deletion of exons 2 to 4; the sequences of the
deletion breakpoints need to be submitted to a sequence database (GenBank,
EMBL, DDBJ) and the accession number should be given
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.390_1458del (g.390_1458del1069) | c.13-23_301-143del (c.13-23_301-143del1069) |
- large deletion fusing two genes (see Discussion)
- genes with identical transcriptional orientation - a large deletion,
starting in intron 4, removes the entire 3' end of the gene and fuses it to position 457
(intron position 233+17) of the gene XYZ
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1458_XYZ:457del | c.301-143_XYZ:233+17del |
- genes with opposite transcriptional orientation - a large deletion,
starting in intron 4, removes the entire 3' end of the gene and fuses it to position 457
(intron position 233+17) of the XYZ gene, which has an opposite transcriptional
orientation (indicated by the "o")
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1458_oXYZ:457del | c.301-143_oXYZ:233+17del |
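Several examples above, such as g.304del rather than g.303del and c.7_8del rather than c.5_6del, apply the rule that the most 3' position is described when a deletion in a repeated stretch can be placed at more than one position. Below is a minimal sketch of that shift on a plain sequence string. The function name is hypothetical and positions are 1-based. The sketch shifts along whatever sequence it is given, so it assumes a contiguous genomic sequence. Applied to the spliced coding DNA, it would wrongly move the c.123del example to c.125del.

```python
def normalize_deletion(seq, start, end):
    """Shift the deletion of seq positions start..end (1-based, inclusive)
    to its most 3' equivalent position and return e.g. "7_8del"."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("deletion lies outside the sequence")
    single = start == end
    # Deleting start..end gives the same sequence as deleting start+1..end+1
    # exactly when the nucleotide dropped at the 5' end equals the one
    # reached at the 3' end, so slide right while that holds.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return f"{start}del" if single else f"{start}_{end}del"


print(normalize_deletion("ATGTTGTGCC", 5, 6))  # 7_8del
print(normalize_deletion("ATGGC", 3, 3))  # 4del
```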
Duplication
Duplications are designated by "dup" after a description of the
duplicated segment, i.e. the first (and last) nucleotide(s) duplicated (even
when a mono-nucleotide is duplicated, see
Recommendations). To
describe duplications with unknown breakpoints, e.g. based on Southern blotting,
PCR, arrayCGH, SNP array data, etc. see
Uncertainties.
- single nucleotide duplication - duplication of nt 13 of the coding
region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.413dup (g.413dupT) | c.13dup (c.13dupT) |
- several nucleotide duplication
- duplication of nts 92 to 94 (GAC) of the coding region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.692_694dup (g.692_694dupGAC, g.692_694dup3) | c.92_94dup (c.92_94dupGAC, c.92_94dup3) |
- duplication across the exon 3 / intron 3 border, nts 120 to 123 of the coding region
(exon 3) and the first 48 nts of intron 3 (nts 123+1 to 123+48)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.720_771dup (g.720_771dup52) | c.120_123+48dup (c.120_123+48dup52) |
- duplication of a TG dinucleotide in the sequence ATGTTGTGCC, changing it to ATGTTGTGTGCC
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.307_308dup (g.307_308dupTG, g.307_308dup2), NOT g.305_306dup | c.7_8dup (c.7_8dupTG, c.7_8dup2), NOT c.5_6dup |
- variability in short sequence repeat - see below
- (multi) exon duplication
- breakpoints not sequenced - duplication of exons 2 to 4 (e.g. detected on
Southern blot, see Discussion)
| genomic Reference Sequence | coding DNA Reference Sequence |
| - | c.13-?_300+?dup |
- breakpoints not sequenced - duplication of the entire gene; the coding DNA reference sequence runs from
-30 (cap site) to *220 (polyA-addition site); see
Recommendations
| genomic Reference Sequence | coding DNA Reference Sequence |
| - | c.(?_-30)_(*220_?)dup |
- breakpoints sequenced - duplication of exons 2 to 4; the sequences of the
duplication breakpoints need to be submitted to a sequence database (GenBank,
EMBL, DDBJ) and the accession numbers should be given
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.390_1458dup (g.390_1458dup1069) | c.13-23_301-143dup (c.13-23_301-143dup1069) |
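A duplication inserts a copy of the described segment directly 3' of the original segment. The hypothetical helper below applies a duplication to a sequence string using 1-based positions. It shows that c.5_6dup and c.7_8dup give the same result in ATGTTGTGCC, which is why the 3'-most description is the one used.

```python
def apply_duplication(seq, start, end):
    """Return seq with positions start..end (1-based, inclusive) duplicated;
    the copy is placed directly 3' of the original segment."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("duplication lies outside the sequence")
    return seq[:end] + seq[start - 1:end] + seq[end:]


ref = "ATGTTGTGCC"
print(apply_duplication(ref, 7, 8))  # ATGTTGTGTGCC
print(apply_duplication(ref, 5, 6) == apply_duplication(ref, 7, 8))  # True
```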
Insertion
Insertions are designated by "ins" after the two nucleotides flanking the insertion
site, followed by the inserted nucleotides. NOTE: duplicating insertions (including
duplication of a mononucleotide) should be described as
duplications (see above).
- single nucleotide insertion - insertion of a T between nts 51 and 52
of the coding region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.451_452insT | c.51_52insT |
- several nucleotide insertion
- insertion of a GAGA sequence between nts 51 and 52 of the coding region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.451_452insGAGA | c.51_52insGAGA |
- insertion of a TG dinucleotide in the sequence ATGTTGTGCC, changing it to ATGTTGTGTGCC,
is a duplication (see above)
- variability in short sequence repeat - see below (and
Recommendations)
- large insertion - insertion of a 345 nucleotide sequence in intron 3;
the sequence of the insertion should be submitted to a sequence database (GenBank,
EMBL, DDBJ) and the accession number should be given
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.777_778ins345 (GenBank AB012345.2) | c.123+54_123+55ins345 (GenBank AB012345.2) |
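Duplicating insertions must be described as duplications (see the NOTE at the start of this section), so an insertion can be checked before it is named. The sketch below has a hypothetical function name and works on a plain sequence string with 1-based positions. It first shifts the insertion as far 3' as possible. It then reports a dup when the inserted bases copy the nucleotides immediately 5' of the site, and an ins otherwise.

```python
def describe_insertion(seq, after, inserted):
    """Describe inserting `inserted` between positions `after` and `after`+1
    (1-based) of seq as either a duplication or an insertion."""
    if not inserted or not 0 <= after <= len(seq):
        raise ValueError("empty insertion or position outside the sequence")
    # Inserting X after p gives the same result as inserting X[1:] + seq[p]
    # after p+1 when X starts with the nucleotide following the site
    # (seq[p] in 0-based indexing), so move the site 3' while that holds.
    while after < len(seq) and inserted[0] == seq[after]:
        inserted = inserted[1:] + seq[after]
        after += 1
    k = len(inserted)
    if after >= k and seq[after - k:after] == inserted:
        return f"{after}dup" if k == 1 else f"{after - k + 1}_{after}dup"
    return f"{after}_{after + 1}ins{inserted}"


print(describe_insertion("ATGTTGTGCC", 6, "TG"))  # 7_8dup
print(describe_insertion("ATGCCCTTT", 4, "GAGA"))  # 4_5insGAGA
```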
Variability of short sequence repeats
For recommendations on how to describe variable short sequence repeats, see
Recommendations.
- polymorphic CA repeat - the 3' UTR of the gene contains a (CA)7 stretch from
nucleotides 1700 to 1713 (coding DNA *70 to *83), which in the population has
a length between 6 and 13 CA units
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1700CA(6_13), NOT g.1713CA(6_13) | c.*70CA(6_13), NOT c.*83CA(6_13) |
- polymorphic CA repeat - a person carries a CA dinucleotide repeat of length 6
on one allele (the reference sequence has a repeat length of 7)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1700CA[6], NOT g.1712_1713delCA | c.*70CA[6], NOT c.*82_*83delCA |
- polymorphic CA repeat - a person carries a CA dinucleotide repeat of length 6
on one allele and of length 11 on the other allele
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1700CA[6]+[11] | c.*70CA[6]+[11] |
- polymorphic CA repeat - a person carries a CA dinucleotide
repeat of length 8 on one allele (the reference sequence has a repeat
length of 7)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1700CA[8], NOT g.1712_1713dupCA | c.*70CA[8], NOT c.*82_*83dupCA |
- FMR1 GGC repeat - based on the coding DNA Reference Sequence (GenBank NM_002024.3),
c.-158GGC(1000) describes the presence of an extended GGC repeat of about
1000 units
NOTE: "()" is used to indicate uncertainties (see
Uncertainties); c.-158GGC[79] describes the presence of an
extended GGC repeat of exactly 79 units
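The bracket notation above gives the number of repeat units at the first position of the repeat. The sketch below counts tandem units in a sequence string and formats one or two alleles in this style. The helper names are hypothetical, and the sequence in the usage example is made up rather than taken from the actual 3' UTR.

```python
def count_repeat_units(seq, start, unit):
    """Count consecutive copies of `unit` in seq beginning at 1-based `start`."""
    if not unit:
        raise ValueError("empty repeat unit")
    i, count = start - 1, 0
    while seq[i:i + len(unit)] == unit:
        count += 1
        i += len(unit)
    return count


def describe_repeat(prefix, position, unit, counts):
    """Format e.g. "g.1700CA[6]" or, for two alleles, "c.*70CA[6]+[11]"."""
    alleles = "+".join(f"[{n}]" for n in counts)
    return f"{prefix}.{position}{unit}{alleles}"


allele = "GT" + "CA" * 6 + "TT"  # made-up allele with six CA units
n = count_repeat_units(allele, 3, "CA")
print(describe_repeat("g", 1700, "CA", [n]))  # g.1700CA[6]
print(describe_repeat("c", "*70", "CA", [6, 11]))  # c.*70CA[6]+[11]
```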
Inversion
Inversions are designated by "inv" after the numbers of the first and last
nucleotides inverted.
- short inversion - inversion of nts 177 to 180 (CTAG) of the coding
region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1077_1080inv (g.1077_1080inv4, g.1077_1080invCTAG) | c.177_180inv (c.177_180inv4, c.177_180invCTAG) |
- large inversion - a large inversion (212,434 nucleotides in length),
starting in intron 4, inverts the entire 3' end of the gene and fuses it to position
233+17 (intron) of the XYZ gene, which has an opposite transcriptional orientation (indicated
by the "o")
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.1458_XYZo:457inv (g.1458_XYZo:457inv212434) | c.301-143_XYZo:233+17inv (c.301-143_XYZo:233+17inv212434) |
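An inverted segment is replaced by its reverse complement. The short sketch below applies an inversion to a sequence string, which can be used to check what a described inversion produces. The helper names are hypothetical, positions are 1-based and inclusive, and the example sequence is made up.

```python
# Map each nucleotide to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_inversion(seq, start, end):
    """Replace positions start..end (1-based, inclusive) by their reverse complement."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("inversion lies outside the sequence")
    return seq[:start - 1] + reverse_complement(seq[start - 1:end]) + seq[end:]


print(apply_inversion("AAGGTTCC", 3, 6))  # AAAACCCC
```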
Gene conversion
Gene conversions are designated by "con" after the numbers of the
nucleotides converted, followed by a description of the origin of the new
sequence: "region_changed"con"region of origin" (see Discussion).
- a gene conversion replacing a segment of the coding region of a gene with
a segment derived from elsewhere in the genome (as described in other GenBank files:
AC096506.5 and NM_004006.1, respectively)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.415_1655conAC096506.5:g.409_1683 | c.15_355conNM_004006.1:c.15_355 |
Translocation
Translocations are designated in the format "t(X;4)(p21.2;q34)",
followed by the usual description, placed between parentheses, indicating the
exact translocation breakpoint. The sequences of the translocation breakpoints need to be
submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession numbers should be
given (see Discussion).
- a translocation breakpoint in the 3' half of intron 4, between nucleotides 1453 and 1454
(coding DNA 301-148 and 301-147), joining chromosome bands Xp21.2 and 4q34
| genomic Reference Sequence | coding DNA Reference Sequence |
| t(X;4)(p21.2;q34)(g.1453_1454) | t(X;4)(p21.2;q34)(c.301-148_301-147) [t(X;4)(p21.2;q34)(c.IVS4)] |
Complex
Complex rearrangements consist of several different types of the six elementary
content changes: substitution, deletion, duplication, insertion, inversion and
translocation. Such rearrangements can be very complex and difficult to describe.
Specific recommendations to describe such changes have not been made; complex
rearrangements are best described as a combination of the elementary changes.
Deletion/insertions ("indels")
are described as a deletion ("del") followed by an insertion ("ins"),
after a description of the deleted segment, i.e. the first (and last) nucleotide(s)
deleted (see Discussion).
- deletion of nts 712 to 717 (coding DNA 112 to 117) and insertion of TG
at the same site
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.712_717delinsTG (g.712_717del6insTG, g.712_717delAGGGCAinsTG) | c.112_117delinsTG (c.112_117del6insTG, c.112_117delAGGGCAinsTG) |
Miscellaneous
- Two changes in one allele are described as "[first change;
second change]" (see Discussion)
- one allele (chromosome) containing a C to T change at nt 476 (coding DNA nt 76) and a G to C
change at nt 483 (coding DNA 83)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.[476C>T;483G>C] | c.[76C>T;83G>C] |
- Two changes in one individual with alleles unknown are described as "[first
change (+) second change]" (see Discussion)
- one individual containing a C to T change at nt 476 (coding DNA nt 76) and a G to C
change at nt 1083 (coding DNA 183), while it is unknown whether these changes
are on the same or different alleles
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.[476C>T(+)1083G>C] | c.[76C>T(+)183G>C] |
- Recessive disease (changes in different alleles) - described as
"[change allele 1]+[change allele 2]" (see
Discussion)
- both changes identified - a homozygous C to T change at nt 76
of the coding region
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.[476C>T]+[476C>T] | c.[76C>T]+[76C>T] |
- one allele containing a TG dinucleotide repeat of length 4, the other allele containing
a repeat of length 5 (see Recommendations)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.983TG[4]+[5] | c.124-41TG[4]+[5] |
- one change not yet identified - one allele containing a C to T
change at nt 76 of the coding region, the other allele containing an unknown change
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.[476C>T]+[?] | c.[76C>T]+[?] |
- one changed allele and one normal allele - one allele
containing a C to T change at nt 76 of the coding region, the other allele having the
normal sequence (wild type)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.[476C>T]+[=] | c.[76C>T]+[=] |
- several changes in both alleles - both alleles
containing a C to G change at nt -5, one allele containing a C to T change at nt 76,
and both alleles containing a G to C change at nt 183 (all numbered relative to the
coding region)
| genomic Reference Sequence | coding DNA Reference Sequence |
| g.[296C>G;476C>T;1083G>C]+g.[296C>G;=;1083G>C] | c.[-5C>G;76C>T;183G>C]+c.[-5C>G;=;183G>C] |
- two changes in different genes - the GJB2 gene contains a deletion of a G, the GJB6 gene a T to C substitution
| genomic Reference Sequence | coding DNA Reference Sequence |
| - | NM_004004.2:c.[35delG]+NM_006783.1:c.[689T>C] (GJB2:c.[35delG]+GJB6:c.[689T>C]) |
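The allele notation in this section is a plain string convention. ";" joins changes on one allele, "(+)" joins changes whose allele is unknown, and "+" separates the two alleles. Below is a minimal sketch that builds such strings from lists of individual changes; the helper names are hypothetical.

```python
def in_cis(prefix, changes):
    """Changes on one allele, e.g. "g.[476C>T;483G>C]"."""
    return f"{prefix}.[{';'.join(changes)}]"


def phase_unknown(prefix, changes):
    """Changes in one individual, alleles unknown, e.g. "c.[76C>T(+)183G>C]"."""
    return f"{prefix}.[{'(+)'.join(changes)}]"


def in_trans(prefix, allele1, allele2):
    """Changes on the two alleles, e.g. "c.[76C>T]+[=]"; "?" marks an
    unidentified change and "=" the normal sequence."""
    return f"{prefix}.[{';'.join(allele1)}]+[{';'.join(allele2)}]"


print(in_cis("c", ["76C>T", "83G>C"]))  # c.[76C>T;83G>C]
print(in_trans("g", ["476C>T"], ["="]))  # g.[476C>T]+[=]
```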