(last modified November 11, 2005)
Exon | Length (bp) | 5' cDNA position | Splice | Length intron (kb) | GenBank | Remarks |
---|---|---|---|---|---|---|
1 | >67 | -15 | 1 | >7 | AF018958 | 5'-UTR, 52 bp coding |
2 | 26 | 53 | 0 | 0.9 | AF018959 | |
3 | 106 | 79 | 1 | 1.6 | AF018960 | |
4 | 142 | 185 | 2 | 1.9 | AF018961 | |
5 | 196 | 327 | 0 | 2.2 | AF018962 | |
6 | 2235 | 523 | - | - | AF018963 | 150 bp coding, 3'-UTR |
Legend:
Exon: exons and intron/exon boundaries are numbered according to Sauer et al., with the first base of the Met codon counted as position 1 (see coding
DNA Reference Sequence). Length (bp): size of the exon in
base pairs. 5' cDNA position: first base of the exon (according to the cDNA sequence
reported by Sauer et al.). Splice: splicing occurs between two coding triplets (0), after
the first (1) or after the second (2) base of a triplet. Length intron (kb): size of
the intron in kilobase pairs. GenBank: accession number of the sequence
in GenBank. Remarks: 5'-UTR = 5' untranslated region, 3'-UTR = 3' untranslated
region.
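The Splice column can be cross-checked against the exon lengths and cDNA positions in the table. The following is a minimal sketch in Python, not part of the original database. It assumes that the splice value equals the last coding base of an exon modulo 3, and it uses only the 52 coding bases of exon 1, starting at position 1. The names `EXONS`, `last_coding_base` and `splice_phase` are hypothetical.

```python
# Coding part of exons 1-5, taken from the table above:
# (exon number, first coding cDNA position, coding length in bp, Splice value)
# Assumption: exon 1 is entered with its 52 coding bases only (positions 1-52).
EXONS = [
    (1, 1, 52, 1),
    (2, 53, 26, 0),
    (3, 79, 106, 1),
    (4, 185, 142, 2),
    (5, 327, 196, 0),
]


def last_coding_base(start, length):
    """cDNA position of the last coding base of an exon."""
    return start + length - 1


def splice_phase(start, length):
    """0 = between two triplets, 1 = after the first base, 2 = after the second."""
    return last_coding_base(start, length) % 3


# Phase computed from positions, and the expected start of the following exon.
phases = [splice_phase(start, length) for _, start, length, _ in EXONS]
next_starts = [start + length for _, start, length, _ in EXONS]
table_phases = [phase for _, _, _, phase in EXONS]
```

Under these assumptions, `phases` agrees with the Splice column, and each entry of `next_starts` is the 5' cDNA position listed for the following exon.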
1 15 16 30 31 45 46 60 61 75 76 90
* ** * * *
human: MSRKIEGFLLLL--- --------------- --------------- --------------- --------------L FGYEATLGLSSTEDE 28
mouse: MPHKIEGFFLLL--- --------------- --------------- --------------- --------------L FGYEATLGLSSTEDE 28
fugu:  MHLPREAFLLALAGA FIFPSSQQEKRTQRD LRVVCFYQKDFRNGT KAEQWGKKTPTQIWR GCMS--VCNAIVVCL F----ELGVSET--- 81

91 105 106 120 121 135 136 150 151 165 166 180
* * * * **** * ** * * * *** ***
human: GEDPWYQKACKCDCQ GGPNALW-------S AGATSLDCIPECPYH KPLGFESGEVTPDQI TCSNPEQYVGWYSSW TANKARLNSQGFGCA 111
mouse: GEDPWYQKACKCDCQ VGANALW-------S AGATSLDCIPECPYH KPLGFESGEVTPDQI TCSNPEQYVGWYSSW TANKARLNSQGFGCA 111
fugu:  ----WNSKSCKCDCE GGESPTEFPSIRTGS SMVRGVDCMPECPYH RPLGFEAGSISPDQI TCSNQDQYTAWFSSW LPSKARLNTQGFGCA 167

181 195 196 210 211 225 226 240 241 255 256 270
** * ** * ** ** * ** * * * * ** ** * **
human: WLSKFQDSSQWLQID LKEIKVISGILTQGR CDIDEWMTKYSVQYR TDERLNWIYYKDQTG NNRVFYGNSDRTSTV QNLLRPPIISRFIRL 201
mouse: WLSKYQDSSQWLQID LKEIKVISGILTQGR CDIDEWVTKYSVQYR TDERLNWIYYKDQTG NNRVFYGNSDRSSTV QNLLRPPIISRFIRL 201
fugu:  WLSKFQDNTQWLQID LIDAKVVSGILTQGR CDADEWITKYSLQYR TDEKLNWIYYKDQTG NNRVFYGNSDRSSSV QNLLRPPIVARYIRI 257

271 285 286 300
** ** * * ** * **
human: IPLGWHVRIAIRMEL LECVSKCA 224
mouse: IPLGWHVRIAIRMEL LECASKCA 224
fugu:  LPLGWHTRIALRLEL LLCMNKCS 280
Legend:
The mouse (Reid et al., GenBank AF073780) and fugu (Brunner et al., GenBank AF146687) RS1
proteins show 95% and 71% amino acid identity to the human RS1 protein, respectively.
Identical amino acids are indicated in red. The identity between the three proteins is
mainly in the discoidin domains (shown in boldface) and less in the putative leader
sequences (amino acids 1-23). All amino acids showing variations in human are indicated
by an asterisk above the sequence.
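As an illustration of how an identity figure can be derived from aligned sequences, the following minimal Python sketch counts identical residues over the last alignment block only (positions 271-300). It assumes that columns containing a gap in either sequence are skipped. The legend does not state which denominator was used for the 95% and 71% values, so this fragment is not expected to reproduce them. The function name `identity` is hypothetical.

```python
def identity(seq_a, seq_b, gap="-"):
    """Return (matches, compared) for two aligned sequences.

    Assumption: columns with a gap in either sequence are not compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    matches = sum(1 for a, b in pairs if a == b)
    return matches, len(pairs)


# Last alignment block from the figure above, with the spacing removed.
human = "IPLGWHVRIAIRMELLECVSKCA"
mouse = "IPLGWHVRIAIRMELLECASKCA"
fugu = "LPLGWHTRIALRLELLLCMNKCS"

human_mouse = identity(human, mouse)
human_fugu = identity(human, fugu)
```

For this block the sketch gives 22 of 23 identical residues for human versus mouse and 15 of 23 for human versus fugu.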
mis C C * * ** * *C C * * **C C*C C* * ** * * Cc*
con C PLGmesG I d QItASS s W p +aRLn g nAW p d qWLQvDL + + vtGv TQGa ++d 55 g e i i i i
hRS1    TSLDCIPECPYHKPLGFESGEVTPDQITCSNPEQYVGWYS-S-WTANKARLNSQGFGCAWLSK--FQ-DS-SQWLQIDLKEIKVISGILTQG--RCD---
mRS1    TSLDCIPECPYHKPLGFESGEVTPDQITCSNPEQYVGWYS-S-WTANKARLNSQGFGCAWLSK--YQ-DS-SQWLQIDLKEIKVISGILTQG--RCD---
fRS1    RGVDCMPECPYHRPLGFEAGSISPDQITCSNQDQYTAWFS-S-WLPSKARLNTQGFGCAWLSK--FQ-DN-TQWLQIDLIDAKVVSGILTQG--RCD---
TRK3    KAQVNPAICRY--PLGMSGGQIPDEDITASSQ---WS---EST-AAKYGRLDSEEGDGAWCPEIPVEPDDLKEFLQIDLHTLHFITLVGTQG--RHAGGH
EDD1    KGHFDPAKCRY--ALGMQDRTIPDSDISASS--S-WS---DST-AARHSRLESSDGDGAWCPAGS-VFPKEEEYLQVDLQRLHLVALVGTQG--RHAGGL
AEBP    WTPTEKVKCP---PIGMESHRIEDNQIRASSMLRHGLGAQRGRLNMQTGATEDDYYDGAWCA----EDDARTQWIEVDTRRTTRFTGVITQG---RDSSI
CAP     AEGWGYYGC-DEELVGPLYARS----LGASSYYSLL------T-APRFARLH---GISGWSPRIG-DPNP---WLQIDLMKKHRIRAVATQGSF--NS--
MFGM-1  AGNHCETKC--VEPLGMENGNIANSQIAASSVRVTF-LGLQH-WVPELARLNRAGMVNAWTPS----SNDDNPWIQVNLLRRMWVTGVVTQGA-SRLAS-
MFGM-2  ----ELNGCA--NPLGLKNNSIPDKQITASSSYKTWGL-HLFSWNPSYARLDKQGNFNAWVAG-SYGND---QWLQVDLGSSKEVTGIITQGA--RN-FG
HEMO-1  STVSPPPECSPDNYIDLVMGDEPLPD-TAFSASSEFS-EIFAPHNARLNRGPTNSGAGSWNPKV----NNDKQYIQVELPRREPIYGVVLQGSPIFD---
HEMO-2  PTSESPLQCTE--PLGLI-GELPLENIQVSSNSEEKD--YLSINGNRGWKPLYNT--PGWV--MFDFT-GPRNITGILTKGGN-------------D---
COAG5-1 PFLIMDRDCRM--PMGLSTGIISDSQIKAS----EF-LGY---WEPRLARLNNGGSYNAWSVEKLAAEFASKPWIQVDMQKEVIITGIQTQGA-KHYLK-
COAG5-2 ----EVNGCST--PLGMENGKIENKQITASSFKKSW--WG-DYWEPFRARLNAQGRVNAWQA-K--ANNN-KQWLEIDLLKIKKITAIITQGC-KSLSS-
COAG8-1 LFLVYSNKC--QTPLGMASGHIRDFQITASG------QYG-Q-WAPKLARLHYSGSINAWST-K----EP-FSWIKVDLLAPMIIHGIKTQGA--RQKFS
COAG8-2 ----DLNSCSM--PLGMESKAISDAQITASSYF-TNMF--AT-WSPSKARLHLQGRSNAWRP----QVNNPKEWLQVDFQKTMKVTGVTTQG-VKSLLT-
NRP-1   SSVSEDFKC-ME-ALGMESGEIHSDQITASS---QY---S-TNWSAERSRLNYPE--NGW----TPGEDSYREWIQVDLGLLRFVTAVGTQGAISKETKK
NRP-2   --KITDYPCSG--MLGMVSGLISDSQITSSNQGDRN-------WMPENIRLVT-SR-SGWALPPA-PHSYINEWLQIDLGEEKIVRGIIIQGG-KH----
NRP2-1  QEPLENFQC--NVPLGMESGRIANEQISASSTY------SDGRWTPQQSRLHGDD--NGW----TPNLDSNKEYLQVDLRFLTMLTAIATQGAISRET--
NRP2-2  --RVTDAPCSN--MLGMLSGLIADSQISASSTQEYL-------WSPSAARLVS-SR-SGWFPR-IPQAQPGEEWLQVDLGTPKTVKGVIIQGARGGDSIT
DISCa   Q-LLANAQCH--------LRTSTNYNGV-HT----QF---NSALNYKNNGTNTIDGSEAWCSSIVDTN----QYIVAGCEVPRTFMCVALQG--RGDA--

mis * *C * C * C ** ** C *C ** ** C * ** C *C
con e v+syki yS ng W y+d kvF GN D V +nlf PPI ARyiRi P tWh +IaLRlELlGC 144 f d n fv 224
hRS1    IDEW-MTKYSVQYR-TDERLNWIYYKDQTG-NNRVFYGNSDRTSTV-QNLLRPPIISRFIRLIPLGWHV--RIAIRMELLECVSKCA*
mRS1    IDEW-VTKYSVQYR-TDERLNWIYYKDQTG-NNRVFYGNSDRSSTV-QNLLRPPIISRFIRLIPLGWHV--RIAIRMELLECASKCA*
fRS1    ADEW-ITKYSLQYR-TDEKLNWIYYKDQTG-NNRVFYGNSDRSSSV-QNLLRPPIVARYIRILPLGWHT--RIALRLELLLCMNKCS*
TRK3    GIEF-APM--YKINYSRDGTRWISWRNRHGKQV--LDGNSNPYDIFLKDL-EPPIVARFVRFIPVTDHSMN-VCMRVELYGCVWLDGL
EDD1    GKEF--S-RSYRLRYSRDGRRWMGWKDRWGQ-E-VISGNEDPEGVVLKDL-GPPMVARLVRFYPRADRVMS-VCLRVELYGCLWRDGL
AEBP    HDDF---VTTFFVGFSNDSQTWVMYTNGYE--EMTFHGNVDKDTPVLSELPE-PVVARFIRIYPLTWNG-S-LCMRLEVLGCSVAPVY
CAP     -WDW---VTRYMLLYGDRVDSWTPFYQRGHNST--FFGNVNESAVVRHDL-HFHFTARYIRIVPLAWNPRGKIGLRLGLYGCPYKADI
MFGM-1  -HEY---LKAFKVAYSLNGHEFDFIHD-VNKKHKEFVGNWNKNA-VHVNLFETPVEAQYVRLYPTSCHT-A-CTLRFELLGC------
MFGM-2  SVQF---VASYKVAYSNDSANWTEYQDPRTGSSKIFPGNWDNHS-HKKNLFETPILARYVRILPVAWHN--RIALRLELLGC*
HEMO-1  --QYVTSY-EIMYGDDGNTFSTVDGPDGKPK-I--FRGPIDNTHPV-KQMISPPIEAKVVRIRPLTWHD-E-ISLRLEIIGC------
HEMO-2  -GWVTS-YK-VLYTSDFETFNPVIDKD--GKE-KIFPANFDGIVSVTNE-FHPPIRARYLKVLPQKWNK-N-IELRIEPIGCFEPYPE
COAG5-1 -SCY-T--TEFYVAYSSNQINWQIFKGNSTRNVMYFNGNSD-ASTIKENQFDPPIVARYIRISPTRAYN--RPTLRLELQGC------
COAG5-2 --EM--YVKSYTIHYSEQGVEWKPYRLKSSMVDKIFEGNTNTKG-HVKNFFNPPIISRFIRVIPKTWNQ-S-IALRLELFGCDIY*
COAG8-1 SL----YISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSG-IKHNIFNPPIIARYIRLHPTHYSI--RSTLRMELMGC------
COAG8-2 --SM--YVKEFLISSSQDGHQWTLFFQNGKV--KVFQGNQDSFTPV-VNSLDPPLLTRYLRIHPQSWVH-Q-IALRMEVLGCEAQDLY
NRP-1   K--Y--YVKTYKIDVSSNGEDWITIKEGNK--PVLFQGNTNPTDVVVA-VFPKPLITRFVRIKPATWET-G-ISMRFEVYGC------
NRP-2   REN-KVFMRKFKIGYSNNGSDWKMIMDDSKRKAKSFEGNNNYDTPELR-TF-PALSTRFIRIYPERATHGG-LGLRMELLGCEVEAPT
NRP2-1  -QNG-YYVKSYKLEVSTNGEDWMVYRHGKNH--KVFQANND-ATEVVLNKLHAPLLTRFVRIRPQTWHS-G-IALRLELFGC------
NRP2-2  AVEARAFVRKFKVSYSLNGKDWEYIQDPRTQQPKLFEGNMHYDTPDIRRF-D-PIPAQYVRVYPERWSPAG-IGMRLEVLGCDWTDSK
DISCa   -DQW---VTSYKIRYSLDNVSWFEYRN-GAA----VTGVTDRNTVVNH-FFDTPIRARSIAIHPLTWNG--HISLRCEFYTQ
Legend:
Discoidin domain alignment. The alignment was made between 31 proteins containing one or
two discoidin domains (14 and 17 proteins, respectively). Where sequences from more than
one organism were known for a protein, only the human sequence is presented
(except for RS1, for which all three known discoidin sequences are displayed).
Top lines: mis marks the amino acids hit by RS1 missense mutations (C if a cysteine was hit or
created, asterisk for other changes); con is the consensus sequence, with an amino acid in
bold capital when found in at least 18/20 sequences, capital when found in at least 12/20
sequences, lowercase when found in at least 7/20 sequences, and + when a positively charged
amino acid was found in at least 7/20 sequences. For proteins containing two discoidin
domains, the first and second domains are indicated as -1 and -2, respectively. Consensus
amino acids are shown in color in the proteins.
The proteins aligned are (in parentheses: known number of species, full name and GenBank
accession number): hRS1 (1* human X-linked juvenile retinoschisis precursor protein,
AF014459), mRS1 (1* mouse X-linked juvenile retinoschisis precursor protein, AF073780),
fRS1 (1* fugu X-linked juvenile retinoschisis precursor protein, AF146687), TRK3 (2*
tyrosine receptor kinase, Q16832), EDD1 (3* epithelial discoidin domain receptor 1
precursor, Q08345), AEBP1 (2* adipocyte transcription factor, JC5256), CAP (2* contactin
associated protein, U87223), MFGM (5* milk fat globule membrane protein, Q08431), HEMO (1*
silkworm hemocytin, S52093), COAG5 (2* coagulation factor V precursor, M14335), COAG8 (2*
coagulation factor VIII precursor, P00451), NRP (4* neuropilin, AF018956), NRP2 (3*
neuropilin 2, AF022859) and DISCa (1* slime mold discoidin I chain A, J01282).
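The consensus thresholds described in the legend above can be expressed as a small classification rule per alignment column. The following Python sketch is an illustration only and not the procedure used to build the figure. It assumes that "positively charged" means lysine (K) and arginine (R), that gap characters are ignored, and that the most frequent residue is tested before the positive-charge rule. The name `consensus_symbol` and the tier labels are hypothetical.

```python
from collections import Counter

# Assumption: "positively charged" covers K and R; the legend does not say
# whether histidine (H) was counted as well.
POSITIVE = frozenset("KR")


def consensus_symbol(column):
    """Classify one alignment column (20 residues, one per sequence).

    Returns (symbol, tier) with tier one of "bold", "capital", "small",
    "positive" or "none". Gap characters ("-") are ignored.
    """
    residues = [aa.upper() for aa in column if aa != "-"]
    if residues:
        aa, count = Counter(residues).most_common(1)[0]
        if count >= 18:
            return aa, "bold"          # bold capital in the figure
        if count >= 12:
            return aa, "capital"
        if count >= 7:
            return aa.lower(), "small"
    positives = sum(1 for aa in residues if aa in POSITIVE)
    if positives >= 7:
        return "+", "positive"
    return " ", "none"
```

For example, a column with 19 W and one F would be reported as `("W", "bold")`, and a column with four K and three R among otherwise mixed residues as `("+", "positive")`.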