Checklist
for the description of sequence variants
Last modified April 2, 2006
Because references to websites
are not yet accepted as citations, please cite den
Dunnen JT and Antonarakis SE (2000), Hum. Mutat. 15:7-12, when referring to
these pages.
Purpose
A review of publications (JT den Dunnen, in prep.) readily shows where
authors tend to deviate from the "Current recommendations for the
description of sequence variants". The checklist below covers the most
problematic issues and should help those preparing a publication to describe sequence
variants according to the current recommendations.
Checklist
- Reference Sequence - do you clearly describe the sequence used as
a reference sequence?
A publication should mention, preferably in the Materials & Methods
section and/or a table legend, which
sequence file was used as the reference sequence for numbering the residues (DNA, RNA and
protein); see Recommendations, Discussion
and mtDNA variants.
- do you mention a GenBank (not GeneBank) RefSeq-file
accession number with version number? Do not forget the underscore in the accession number (correct is
NM_004006.2, not NM004006.2).
- for a coding DNA reference sequence, do you clearly state that nucleotide numbering uses
the A of the ATG translation initiation codon as nucleotide +1?
- a genomic reference sequence starts with nucleotide 1; a genomic reference sequence can
therefore not have negative nucleotide numbers
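The accession check above can be sketched in Python. This is only an illustration, not part of the recommendations: the function name and the simplified pattern (two capital letters, an underscore, digits, a dot and a version number) are assumptions and do not cover every RefSeq accession type.

```python
import re

# Assumed, simplified pattern: e.g. "NM_004006.2".
# Real RefSeq accessions come in more variants than this covers.
REFSEQ_WITH_VERSION = re.compile(r"^[A-Z]{2}_\d+\.\d+$")


def has_versioned_accession(accession: str) -> bool:
    """Return True if the string has an underscore and a version number."""
    return REFSEQ_WITH_VERSION.match(accession) is not None


print(has_versioned_accession("NM_004006.2"))  # True
print(has_versioned_accession("NM004006.2"))   # False: underscore missing
print(has_versioned_accession("NM_004006"))    # False: version missing
```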
- Intronic variants - do you indicate where the reference intron sequence
can be found?
The recommendation is to describe intronic variants in the format
"c.89-2A>G" and not as "c.IVS4-2A>G" (see Discussion). When the format "c.IVS4-2A>G" is
used, do you clearly indicate on which GenBank file the intron/exon numbering is based and
where the reference intron sequence can be found?
- Tabular overview - do you provide a clear, unequivocal overview of all
changes reported?
Preferably, a publication contains a tabular overview of all sequence changes
reported. This overview contains columns describing the change at DNA level
(absolutely essential) and, optionally, at RNA and protein level.
When data at RNA and/or protein level are provided, it should be made clear whether the
data were deduced or experimentally verified (e.g. state
explicitly when RNA was analysed to confirm a putative splice mutation).
- Insertions
- are insertions reported in the format c.51_52insT?
Because c.52insT leaves it unclear whether the insertion is at or
after position 52, insertions should not be reported as c.52insT
but in the format c.51_52insT (see Discussion).
- are the insertions reported really insertions or are they
in fact duplications?
Duplicating insertions should be described as duplications, not as insertions;
c.92_94dup (or c.92_94dupGAC) is correct, c.94_95insGAC is not correct (see Discussion).
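The distinction between insertion and duplication can be checked mechanically. The sketch below is a hypothetical helper, not an official tool: it assumes the reference is a plain string whose first character is coding nucleotide 1, and it only compares the inserted bases with the bases directly preceding the insertion point. It does not apply the most 3' rule.

```python
def describe_insertion(ref: str, after_pos: int, inserted: str) -> str:
    """Describe bases inserted after 1-based position `after_pos`.

    Returns a duplication if the inserted bases copy the reference bases
    that end at `after_pos`, otherwise a flanked insertion.
    """
    n = len(inserted)
    start = after_pos - n + 1  # first base of the candidate duplicated stretch
    if start >= 1 and ref[start - 1:after_pos] == inserted:
        if n == 1:
            return f"c.{after_pos}dup"
        return f"c.{start}_{after_pos}dup"
    return f"c.{after_pos}_{after_pos + 1}ins{inserted}"


# Made-up reference with GAC at positions 92 to 94.
ref = "A" * 91 + "GAC" + "TTT"
print(describe_insertion(ref, 94, "GAC"))  # c.92_94dup
print(describe_insertion(ref, 51, "T"))    # c.51_52insT
```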
- Most 3' position - do you correctly assign the change to the most 3' position
possible?
For deletions, duplications and insertions, the most 3' position possible is arbitrarily
assigned to have been changed (see Recommendations). This is
especially important in stretches of a single residue (nucleotide or amino acid) and in tandem
repeats. Example: the change from ACTTTGTGCC to ACTTGCC is described as c.5_7delTGT (not
as c.4_6delTTG).
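The example above can be reproduced with a short Python sketch. The function name is hypothetical, and the reference is assumed to be a plain string whose first character is nucleotide 1. The deleted window is moved one base in the 3' direction for as long as the resulting sequence stays the same.

```python
from typing import Tuple


def shift_deletion_3prime(ref: str, start: int, end: int) -> Tuple[int, int]:
    """Shift a deletion (1-based, inclusive) to its most 3' equivalent."""
    # Moving the window one base 3' gives the same result exactly when the
    # first deleted base equals the base directly after the window.
    while end < len(ref) and ref[start - 1] == ref[end]:
        start += 1
        end += 1
    return start, end


ref = "ACTTTGTGCC"
start, end = shift_deletion_3prime(ref, 4, 6)
deleted = ref[start - 1:end]
print(f"c.{start}_{end}del{deleted}")  # c.5_7delTGT
```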
- Recessive diseases - do you clearly describe which changes are found in
which combination?
A publication describing sequence changes found in patients suffering from a recessive
disease should explicitly mention, for each patient, which combination of
(pathogenic) changes was identified (see Recommendations).
Example: c.[76C>T]+[87G>A] or c.[76C>T]+[?].
NOTE: this description differs from that
used for several changes in one allele, which has the format
c.[76A>C; 113G>C].
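The two bracket formats above can be generated from per-allele lists of changes. This is a minimal sketch with hypothetical function names; it only reproduces the formats shown in the examples and assumes that an allele in which no change was found is passed as None.

```python
from typing import List, Optional


def format_allele(changes: Optional[List[str]]) -> str:
    """Format one allele; None means the second change was not identified."""
    if not changes:
        return "[?]"
    return "[" + "; ".join(changes) + "]"


def describe_genotype(allele1: Optional[List[str]],
                      allele2: Optional[List[str]]) -> str:
    """Combine the changes found on two alleles of one patient."""
    return "c." + format_allele(allele1) + "+" + format_allele(allele2)


print(describe_genotype(["76C>T"], ["87G>A"]))  # c.[76C>T]+[87G>A]
print(describe_genotype(["76C>T"], None))       # c.[76C>T]+[?]
print("c." + format_allele(["76A>C", "113G>C"]))  # c.[76A>C; 113G>C]
```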
- Range - is the sign used to indicate a range a "_"
(underscore) and not a "-" (minus)?
To prevent confusion, the underscore, not the minus sign, should be used to indicate a range.
The minus sign should only be used to indicate negative
numbers. The correct description of a deletion of coding
DNA nucleotides 12 to 14
is c.12_14del. Not correct is c.12-14del, which describes a deletion of nucleotide -14 in
the intron directly preceding coding DNA nucleotide 12 (see Discussion).
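A simple screen for this mistake could look as follows. This is a hedged sketch: the pattern and function name are assumptions, and a match only means the description deserves a manual check, because c.12-14del is a formally valid description of an intronic nucleotide.

```python
import re

# Assumed pattern: two numbers joined by a minus sign, followed by del or dup.
MINUS_BETWEEN_POSITIONS = re.compile(r"^c\.\d+-\d+(del|dup)")


def may_misuse_minus(description: str) -> bool:
    """Flag descriptions where a minus sign may have been meant as a range."""
    return MINUS_BETWEEN_POSITIONS.match(description) is not None


print(may_misuse_minus("c.12-14del"))  # True: check whether c.12_14del was meant
print(may_misuse_minus("c.12_14del"))  # False
print(may_misuse_minus("c.89-2A>G"))   # False: intronic substitution
```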
- Deletion - do you indicate the first and last residue involved in a deletion?
A deletion of more than one residue should mention the first and last residue deleted,
separated by a "_" (underscore). Example: c.21_24del or p.Ala13_Gln16del.
- Describe at DNA-level - do you describe all changes reported at DNA-level?
All changes reported must be described at DNA-level.
- when descriptions at RNA or protein level are given in the text,
use a format like "c.78G>C (p.Trp26Cys)" upon first appearance
- description of "silent mutations" in the format "p.Leu54Leu (or
p.L54L)" is not allowed. Such a description is
non-informative and not unequivocal (there are five possibilities at DNA level); the change
should be described at DNA level, for example c.162C>G
- RNA level descriptions
recommendations exist to describe alternative transcripts deriving from one
allele (see Recommendations). Since these
descriptions are rather complex to explain, it is wise to include a link to
the HGVS recommendations in the publication.
- Protein level descriptions
- protein reference sequence - the protein reference
sequence should represent the primary
translation product, not a processed mature protein, and thus include any signal
peptide sequences (see Recommendations).
- one/three letter amino acid code - are the correct amino acid codes
used at protein level? Several amino acids start with the same initial letter (Ala, Arg, Asn and Asp start with A;
Gln, Glu and Gly with G; Leu and Lys with L; Phe and Pro
with P; Thr and Tyr with T), but that initial
letter is used as the one-letter amino acid code for only one of these (see Discussion and Codons and
amino acids)
- initiating methionine (Met1) -
p.Met1? denotes that amino acid Methionine-1 (the translation
initiation site) is changed and that the
consequence of this change is unclear. When experimental data show that no
protein is made, the description p.0 should be used. The
description p.Met1Val is not allowed (see Discussion)
- nonstop change - recommendations have recently been made
to describe substitutions in the stop codon, so-called nonstop changes, for example
p.X110Tyrext16 (see Recommendations)
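For the one/three letter amino acid codes discussed above, a lookup table avoids guessing the one-letter code from the initial letter. The sketch below uses the standard codes for the 20 amino acids. The function name is hypothetical and it only handles simple substitutions such as p.Trp26Cys.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Simple substitution only: p.<three-letter><position><three-letter>
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_one_letter(description: str) -> str:
    """Convert a three-letter substitution to one-letter codes."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description}")
    ref, pos, alt = match.groups()
    # An unknown three-letter code raises KeyError instead of being guessed.
    return f"p.{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


print(to_one_letter("p.Trp26Cys"))  # p.W26C
print(to_one_letter("p.Ile43Val"))  # p.I43V
```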
- Polymorphisms - do not describe polymorphic variants as c.127A/G
or p.43Ile/Val
(or p.43I/V). The description of a variant should be neutral;
polymorphisms and pathogenic changes should not be described differently
(see Discussion). Correct
descriptions are c.127A>G and p.Ile43Val.
Copyright © HGVS 2007 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen