Recommendations for the description of protein sequence variants
Last modified May 12, 2007
Since references to WWW sites are not yet acknowledged as citations, please cite den Dunnen JT and Antonarakis SE (2000), Hum. Mutat. 15:7-12, when referring to these pages.
Protein level
(suggestions extending the published recommendations in italics)
Designations at protein level describe the consequence of a change whose origin lies at DNA level, and they will rarely be verified experimentally. Sequence changes at protein level are described like those at DNA level, with the following modifications / additions:
- when changes at protein level are described it should be clear whether the changes were experimentally determined or only theoretically deduced
- a "p." is used to indicate a description at protein level
- descriptions at protein level should describe the changes observed at protein level and should not try to incorporate any knowledge regarding the change at DNA level (see FAQ)
- the description of frame shifts does not include the
deletion at protein level from the site of the frame shift to the natural
C-terminal end (stop codon) of the protein (so p.Arg97ProfsX23 and
not p.Arg97_Pro109delfsX23). Similarly, for frame shifting insertions
the inserted amino acid residues are not described, only the total length of
the new shifted frame is given (i.e. including the inserted amino acids), so
p.Glu5ValfsX5 and not something like p.Glu5Valins2fsX3.
- amino acid numbering
  - the translation initiator Methionine is numbered as +1
  - the protein reference sequences should represent the primary translation product, not a processed mature protein, and thus include any signal peptide sequences (see FAQ)
  - amino acids originating from changes introducing upstream translation initiation are numbered like nucleotides (like ..., Gln-2, Thr-1)
  - amino acids originating from changes resulting in translation of intronic sequences are numbered like nucleotides (like Val4+1, Ser4+2, ..., Phe5-2, Gln5-1)
  - amino acids originating from no-stop changes causing translation downstream of the translation termination codon are numbered like nucleotides (like Gln*1, Ser*2, ...)
- the three letter amino acid code is preferred to describe the amino acids (see Discussion)
- amino acids are described as "Trp26" (i.e. with a capital first letter, not as "trp26" or "TRP26")
- "X" is used to designate a translation termination codon
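The preference for the three letter code can be illustrated with a small converter. This is only a sketch: the function name `to_three_letter` is hypothetical, and it handles nothing beyond simple one-letter substitutions of the form p.W26C.

```python
import re

# Standard one-letter to three-letter amino acid codes.
# "X" is kept as-is because this page uses it for a stop codon.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "X",
}


def to_three_letter(description: str) -> str:
    """Rewrite a simple substitution such as 'p.W26C' as 'p.Trp26Cys'."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", description)
    if match is None:
        raise ValueError(f"not a simple one-letter substitution: {description!r}")
    ref, position, alt = match.groups()
    if ref not in THREE or alt not in THREE:
        raise ValueError(f"unknown amino acid code in {description!r}")
    return f"p.{THREE[ref]}{position}{THREE[alt]}"


print(to_three_letter("p.W26C"))  # p.Trp26Cys
print(to_three_letter("p.W26X"))  # p.Trp26X
```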
Silent changes
Description of so-called "silent" changes in the format p.Leu54Leu (or p.L54L) is not allowed; descriptions should be given at DNA level. The description at protein level is neither informative nor unequivocal (there are five possibilities at DNA level which may underlie p.Leu54Leu).
A correct description has the format c.162C>G (p.=), with "p.=" indicating that no effect at protein level is expected (see Discussion).
Substitutions
Substitutions are designated by a ">"-character (indicating "changes
to").
- missense changes
p.Trp26Cys denotes that amino acid Tryptophan-26 (Trp, W) is changed to a Cysteine
(Cys).
- initiating methionine (Met1) (see Discussion)
  - p.Met1? denotes that amino acid Methionine-1 (translation initiation site) is changed and that the consequence of this change is unclear. When experimental data show that no protein is made, the description p.0 can be used.
  - p.Met1ValextMet-12 denotes that the translation initiating Methionine is changed to a Valine, activating an upstream translation initiation site starting at position -12 (Methionine-12)
- nonsense changes
p.Trp26X denotes that amino acid Tryptophan-26 (Trp, W) is changed to a stop codon (X).
- no-stop change (substitution in stop codon)
"extX#" is used to indicate the extension of a protein sequence when the stop codon changes to an amino acid, with "X#" indicating the length of the extended reading frame.
NOTE: the count of the length of the extended reading frame includes the stop codon (X110) that changes to an amino acid.
  - p.X110GlnextX17 (alternatively p.X110QextX17) denotes a change in the stop codon (X) at position 110, changing it to a codon for Glutamine (Gln, Q) and adding a tail of 16 new amino acids to the protein's C-terminus, after which a new stop codon (X17) is reached.
- unknown effect
  - p.? - protein has not been analysed, an effect is expected but difficult to predict
  - p.(=) - protein has not been analysed, but no change is expected
- amount of protein
Changes which affect the promoter of a gene, the transcription initiation site (cap site), the translation initiation site, etc. may affect the amount of protein produced. Similarly, a deletion of the promoter / exon 1 region usually has the effect that no protein is produced (or that other promoters are activated).
  - p.0 - no protein can be detected
  - p.0? - probably no protein is produced
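The counting rule for no-stop changes given above (the changed stop codon counts as position 1 of the extension) can be sketched as follows. The function name `describe_extension` and the downstream DNA sequence are hypothetical and chosen only to reproduce the p.X110GlnextX17 example; the standard genetic code is assumed.

```python
# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
ONE = "ARNDCQEGHILKMFPSTWYV"
NAMES = "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
THREE = dict(zip(ONE, NAMES.split()))


def describe_extension(stop_position: int, downstream_dna: str) -> str:
    """Describe a no-stop change.

    downstream_dna starts at the changed stop codon (which now codes for an
    amino acid) and continues in frame into the former 3' region.
    """
    codons = [downstream_dna[i:i + 3] for i in range(0, len(downstream_dna) - 2, 3)]
    residues = [CODON_TABLE[codon] for codon in codons]
    if not residues or residues[0] == "*":
        raise ValueError("the first codon must code for an amino acid")
    if "*" not in residues:
        raise ValueError("no new stop codon found in the supplied sequence")
    # The changed stop codon is counted as 1, so the new stop is at index + 1.
    length = residues.index("*") + 1
    return f"p.X{stop_position}{THREE[residues[0]]}extX{length}"


# Hypothetical tail: Gln, then 15 Ala codons, then a stop (16 new amino acids).
tail = "CAG" + "GCT" * 15 + "TAA"
print(describe_extension(110, tail))  # p.X110GlnextX17
```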
Deletions
Deletions are designated by "del" after an indication of the first and last amino acid(s) deleted.
- p.Lys2del in the sequence MKMGHQQQCC denotes a deletion of amino acid Lysine-2 (Lys, K) to MMGHQQQCC
- p.Gln8del in the sequence MKMGHQQQCC denotes a Glutamine-8 (Gln, Q) deletion to MKMGHQQCC
- p.Cys28_Met30del denotes a deletion of three amino acids, from Cysteine-28 to Methionine-30
- if a non-frame shifting deletion creates a new amino acid at the deletion junction the change is described as an insertion/deletion, e.g. p.Cys28_Met29delinsTrp (see Discussion)
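The deletion examples above can be checked mechanically. The sketch below uses a hypothetical helper, `apply_deletion`, that parses a simple "del" description, verifies that the named residues match the reference sequence, and returns the resulting sequence. It does not handle "delins" descriptions.

```python
import re

ONE_LETTER = "ARNDCQEGHILKMFPSTWYV"
NAMES = "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
ONE = dict(zip(NAMES.split(), ONE_LETTER))  # three-letter -> one-letter

DELETION = re.compile(r"p\.([A-Z][a-z]{2})(\d+)(?:_([A-Z][a-z]{2})(\d+))?del")


def apply_deletion(sequence: str, description: str) -> str:
    """Apply e.g. 'p.Lys2del' or 'p.Cys28_Met30del' to a one-letter sequence."""
    match = DELETION.fullmatch(description)
    if match is None:
        raise ValueError(f"not a simple deletion: {description!r}")
    first_aa, first, last_aa, last = match.groups()
    first = int(first)
    last = int(last) if last else first
    last_aa = last_aa or first_aa
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("positions fall outside the reference sequence")
    # Positions are 1-based; the named residues must agree with the reference.
    if sequence[first - 1] != ONE.get(first_aa) or sequence[last - 1] != ONE.get(last_aa):
        raise ValueError("description does not match the reference sequence")
    return sequence[:first - 1] + sequence[last:]


print(apply_deletion("MKMGHQQQCC", "p.Lys2del"))  # MMGHQQQCC
print(apply_deletion("MKMGHQQQCC", "p.Gln8del"))  # MKMGHQQCC
```

Deleting Gln6 or Gln7 from this sequence would give the same product; the example on this page names Gln8.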
Duplications
Duplications are designated by "dup" after an indication of the first and last amino acid(s) duplicated.
- p.Gly4_Gln6dup in the sequence MKMGHQQQCC denotes a duplication of amino acids Glycine-4 (Gly, G) to Glutamine-6 (Gln, Q) (i.e. MKMGHQGHQQQCC)
- duplicating insertions in single amino acid stretches (or short tandem repeats) are described as a duplication, e.g. a duplicating HQ insertion in the HQ-tandem repeat sequence of MKMGHQHQCC to MKMGHQHQHQCC is described as p.His7_Gln8dup (not p.Gln8_Cys9insHisGln)
Insertions
Insertions are designated by "ins" after an indication of the amino acids flanking the insertion site, followed by a description of the amino acid(s) inserted. Duplicating insertions should be described as duplications (see Discussion), not as insertions. For large insertions the number of inserted amino acids should be mentioned, together with an accession.version number referring to a sequence database file containing the complete inserted sequence.
- p.Lys2_Met3insGlnSerLys denotes that the sequence GlnSerLys (QSK) was inserted between
amino acids Lysine-2 (Lys, K) and Methionine-3 (Met, M), changing MKMGHQQQCC to MKQSKMGHQQQCC
- if a non-frame shifting insertion creates a new amino acid at the insertion junction the change
is described as an insertion/deletion, e.g. p.Cys28delinsTrpVal (see Discussion)
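The rule that duplicating insertions are described as duplications rather than insertions can be sketched as below. The function `describe_insertion` is hypothetical. It assumes the insertion site is already given at its most C-terminal equivalent position, and the single-residue "dup" form is an assumption made by analogy with p.Lys2del.

```python
ONE_LETTER = "ARNDCQEGHILKMFPSTWYV"
NAMES = "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
THREE = dict(zip(ONE_LETTER, NAMES.split()))  # one-letter -> three-letter


def describe_insertion(sequence: str, after: int, inserted: str) -> str:
    """Describe residues inserted after 1-based position `after`.

    Returns a "dup" description when the inserted residues copy the residues
    directly preceding the insertion site, otherwise an "ins" description.
    """
    if not 1 <= after < len(sequence) or not inserted:
        raise ValueError("insertion must lie between two existing residues")

    def name(position: int) -> str:
        return f"{THREE[sequence[position - 1]]}{position}"

    size = len(inserted)
    if size <= after and sequence[after - size:after] == inserted:
        if size == 1:
            return f"p.{name(after)}dup"  # assumed form, by analogy with "del"
        return f"p.{name(after - size + 1)}_{name(after)}dup"
    residues = "".join(THREE[aa] for aa in inserted)
    return f"p.{name(after)}_{name(after + 1)}ins{residues}"


print(describe_insertion("MKMGHQHQCC", 8, "HQ"))   # p.His7_Gln8dup
print(describe_insertion("MKMGHQQQCC", 2, "QSK"))  # p.Lys2_Met3insGlnSerLys
```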
Variability of short sequence repeats
Variability of short sequence repeats is described as p.Gln6(3_6); the description indicates that a stretch of Glutamines (Gln, Q) is present, starting at amino acid position 6 (e.g. in MKMGHQQQCC), which is found with a variable length from 3 to 6 in the population.
NOTE: the underscore is used to indicate the range (3 to 6 times).
Insertion-deletions (indels)
Insertion/deletions (indels) are described as a deletion followed by an insertion after an indication of the amino acid(s) flanking the site of the insertion/deletion (see Discussion).
- p.Cys28_Lys29delinsTrp denotes a 3 bp deletion that affects the codons for Cysteine-28 and Lysine-29, replacing them with a codon for Tryptophan
- p.Cys28delinsTrpVal denotes a 3 bp insertion in the codon for Cysteine-28, generating codons for Tryptophan (Trp, W) and Valine (Val, V)
Frame shifts
Frame shifting mutations are designated by
"fs" after the amino acid(s) affected by the change. Descriptions
either use a short ("fs") or long ("fsX#")
description. The description of frame shifts does not include the
deletion at protein level from the site of the frame shift to the natural end of
the protein (stop codon).
NOTE: typing error in "den Dunnen & Antonarakis (2000)". The suggestion to use ">" to indicate "delins" in frame shift descriptions has been retracted.
NOTE: for frame shifts too, the description should give the changes observed at protein level and should not try to incorporate any knowledge regarding the change at DNA level (see above). Thus, p.His150HisfsX10 is not correct, but p.Gln151ThrfsX9 is.
- short description - uses "fs" only, e.g. p.Arg97fs
- long description - uses "fsX#" (see Discussion)
  - includes the change occurring at the site of the frame shift, e.g. p.Arg97Gly
  - "X#" indicates at which codon position the new reading frame ends in a stop (X). The position of the stop in the new reading frame is calculated starting at the first changed amino acid that is created by the frame shift, and ending at the first stop codon (X#), e.g. p.Arg97GlyfsX16
    NOTE: the shifted reading frame is thus open for '#-1' amino acids
- Examples:
  - p.Arg97ProfsX23 (not p.Arg97_Pro109delfsX23; short p.Arg97fs) denotes a frame shifting change with Arginine-97 as the first affected amino acid, changing to a Proline and creating a new reading frame ending in a stop at position 23 (counting starts with the Proline as amino acid 1).
  - p.Leu30SerfsX3 (not p.Leu30_Cys42delinsSerfsX3; short p.Leu30fs) denotes a frame shifting change that deletes amino acids Leucine-30 to Cysteine-42, replacing these with a Serine at the deletion junction and creating a new reading frame ending in a stop at position 3 (counting starts with the Serine as amino acid 1).
  - p.Tyr4X describes the consequence of the change c.12delC in the sequence ATG-GAT-GCA-TAC-GTG-ACG to ATG-GAT-GCA-TA.-G TG-A CG.
  - p.Asp2MetfsX4 (alternatively p.Asp2fs) describes the consequence of the change c.4delG in the sequence ATG-GAT-GCA-TAC-GTG-ACG to ATG- .AT-G CA-T AC-G TG-A CG.
  - p.Glu5ValfsX5 (alternatively p.Glu5fs) describes the consequence of the change c.6_13dup in the sequence ATG-GAT-GCA-TAC-GAG-ATG-AGG to ATG-GAT-GCA-TAC-GT-G CA-T AC-G AG-A TG-A GG.
    NOTE: the inserted amino acids are not specified (see Recommendation)
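The three DNA-based examples above can be reproduced with a short translation routine. This is a sketch under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and only frame shifts, nonsense changes and simple substitutions are handled. In-frame insertions or deletions and changes in the stop codon are rejected.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
ONE = "ARNDCQEGHILKMFPSTWYV"
NAMES = "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
THREE = dict(zip(ONE, NAMES.split()))


def translate(dna: str) -> list:
    """Translate in frame from position 1, stopping after the first stop."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        residues.append(residue)
        if residue == "*":
            break
    return residues


def describe_protein_change(ref_dna: str, var_dna: str) -> str:
    ref, var = translate(ref_dna), translate(var_dna)
    # The description starts at the first amino acid that actually changes.
    for index, (old, new) in enumerate(zip(ref, var)):
        if old != new:
            break
    else:
        raise ValueError("no difference in the overlapping translation")
    if old == "*":
        raise ValueError("changes in the stop codon are not handled here")
    prefix = f"p.{THREE[old]}{index + 1}"
    if new == "*":
        return prefix + "X"
    if (len(var_dna) - len(ref_dna)) % 3 == 0:
        if len(var_dna) != len(ref_dna):
            raise NotImplementedError("in-frame insertion/deletion")
        return prefix + THREE[new]
    tail = var[index:]
    if "*" not in tail:
        return prefix + "fs"  # short form when no new stop is in view
    stop_at = tail.index("*") + 1  # first changed amino acid counts as 1
    return f"{prefix}{THREE[new]}fsX{stop_at}"


ref = "ATGGATGCATACGTGACG"
print(describe_protein_change(ref, ref[:11] + ref[12:]))  # c.12delC: p.Tyr4X
print(describe_protein_change(ref, ref[:3] + ref[4:]))    # c.4delG: p.Asp2MetfsX4
ref2 = "ATGGATGCATACGAGATGAGG"
dup = ref2[:13] + ref2[5:13] + ref2[13:]                   # c.6_13dup
print(describe_protein_change(ref2, dup))                  # p.Glu5ValfsX5
```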
More changes in one individual
Two or more changes in one individual are described by combining the changes,
per allele (chromosome) between brackets ("[]").
Changes in different alleles (e.g. in recessive diseases) are
described as "[change allele 1]+[change allele 2]" (see
Discussion).
- p.[Ala25Thr]+[Ala25Thr] denotes a homozygous change of amino acid
Alanine-25 to Threonine.
- p.[Ala25Thr]+[?] denotes a change of amino acid Alanine-25 to
Threonine in one allele and an
unknown change in the other allele
NOTE: "unknown change in the other allele"
does not only mean that no DNA-change was detected in that other allele
but includes cases where the consequence of a detected change is unclear
or can not be predicted (e.g. the consequence of a change at the splice
site)
- p.[Ala25Thr]+[=] denotes a change of amino acid Alanine-25 to
Threonine in one allele and a normal
sequence (indicated by "=") in the other allele (see FAQ)
Two variations in one allele
- deriving from two independent changes at DNA level are described as "[first
change;second change]" (see Discussion).
- p.[Ala25Thr;Gly28Val] denotes two changes in one allele; amino acid
Alanine-25 to Threonine and Glycine-28 to Valine
- deriving from one change at DNA level that has more than
one effect on RNA/protein level are described as "[first change, second change]" (see Discussion).
- p.[Ala5Thr, Ala5_Gly30delfsX] denotes two protein changes
deriving from a change in one allele at DNA level (c.13G>A, see
Figure) resulting in two transcripts (r.[13g>a, 13_88del]); amino acid
Alanine-5 to Threonine and a deletion of amino acids Alanine-5 to
Glycine-30 followed by a frame shift
Two sequence changes with alleles unknown are described as "[change allele 1(+)change allele 2]" (see Discussion).
- p.[Ala25Thr(+)Gly94Val] denotes that two changes were
identified in one individual (amino acid Alanine-25 to Threonine and
Glycine-94 to Valine), but it is not
known whether these changes are in the same allele or in different
alleles
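The bracket conventions above can be summarised in a few formatting helpers. The function names are hypothetical; the helpers only assemble strings from changes that are already written at protein level and do not validate them.

```python
from typing import Sequence


def one_allele(changes: Sequence[str]) -> str:
    """Independent changes in the same allele are separated by ';'."""
    return "[" + ";".join(changes) + "]"


def two_alleles(allele1: Sequence[str], allele2: Sequence[str]) -> str:
    """Changes in different alleles are joined with '+'."""
    return "p." + one_allele(allele1) + "+" + one_allele(allele2)


def unknown_phase(change1: str, change2: str) -> str:
    """Two changes for which it is not known whether they share an allele."""
    return f"p.[{change1}(+){change2}]"


print(two_alleles(["Ala25Thr"], ["Ala25Thr"]))         # p.[Ala25Thr]+[Ala25Thr]
print(two_alleles(["Ala25Thr"], ["="]))                # p.[Ala25Thr]+[=]
print("p." + one_allele(["Ala25Thr", "Gly28Val"]))     # p.[Ala25Thr;Gly28Val]
print(unknown_phase("Ala25Thr", "Gly94Val"))           # p.[Ala25Thr(+)Gly94Val]
```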
Copyright © HGVS 2007 All Rights Reserved
Website created by Rania Horaitis, nomenclature by J.T. den Dunnen