The Structural Biology Of Telomere Associated Proteins Biology Essay

Published: November 2, 2015 Words: 7224

Telomeres are essential for solving the protection and replication problems associated with linear chromosome ends. A group of telomere-associated proteins cooperates closely to elongate and protect telomeres. Although the structural biology of telomeres is still in its infancy, the available structures of several telomere-associated proteins and their sub-complexes have provided valuable insights into the evolutionary relationships and functional mechanisms of telomeric proteins. This review summarizes recent progress in structural studies of the telomerase complex and telomere-binding protein complexes, with the goal of highlighting the common themes and evolutionary plasticity of telomeric proteins in telomere synthesis and end protection.

Telomeres are specialized regions at chromosome ends that play important roles in genome maintenance and the faithful transmission of genetic information. Telomeres act as end guards, solving the protection and replication challenges associated with the natural termini of linear chromosomes. They prevent natural chromosome ends from being recognized as DNA double-strand breaks (DSBs). Telomere dysfunction triggers deleterious degradation and fusion events at the exposed DNA ends. In addition, telomeres provide a "buffer" region that absorbs the gradual chromosome shortening caused by incomplete replication of linear ends during semi-conservative DNA replication.

The establishment of these functions relies on specialized telomeric DNA and its associated protein complexes. Telomeric DNA consists of several hundred to thousands of non-coding, guanine-rich tandem repeats whose sequence differs between organisms (e.g. GGTTAG in vertebrates and GGTTAC(A)(C)G(0-6) in fission yeast), and it terminates in a 3' G-overhang at the very end. Telomeric DNA is coated by numerous proteins that form distinct structural and functional complexes to regulate telomere protection and replication. Telomere-associated proteins can be divided into three groups according to their functions and localizations. The first group is the telomerase complex, which extends telomeric DNA de novo at the 3' ends of chromosomes to complete genome replication. The second group comprises the telomere-binding proteins. They associate specifically with telomeric DNA to form a protective cap that shields telomeres from inappropriate DNA repair, and they also cooperate to regulate telomerase activity and thereby maintain telomere length. The third group comprises telomere accessory factors, including proteins of the DNA repair, damage signaling and protein degradation pathways. These accessory factors have important roles in telomere biogenesis and maintenance, but they also have more general roles in other cellular processes. Often they associate with telomeres only transiently during the cell cycle and are much less abundant than telomere-specific proteins. In this review, we focus on the telomere-specific proteins of the first and second groups.
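To make the idea of a tandem repeat array concrete, the minimal Python sketch below (not from the original review; the function names and the example read are hypothetical) counts how many intact copies of a repeat unit sit back to back at the 3' end of a G-rich strand written 5'→3'. Because a tandem array can be written in any frame, GGTTAG and TTAGGG describe the same vertebrate repeat read from different starting points, so the helper also tries every rotation of the unit.

```python
def count_terminal_repeats(strand: str, unit: str) -> int:
    """Count complete, back-to-back copies of `unit` at the 3' end of `strand`.

    The strand is written 5'->3', so its 3' end is the right-hand end of the string.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    strand, unit = strand.upper(), unit.upper()
    count, end = 0, len(strand)
    # Step leftwards one repeat at a time while the preceding block matches the unit.
    while end >= len(unit) and strand[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


def best_frame(strand: str, unit: str) -> tuple:
    """Return (repeat count, rotation) for the rotation of `unit` that fits the 3' end best.

    Ties on the count are broken by the lexicographically largest rotation (max on tuples).
    """
    unit = unit.upper()
    rotations = {unit[i:] + unit[:i] for i in range(len(unit))}
    return max((count_terminal_repeats(strand, r), r) for r in rotations)


# Hypothetical read: a non-telomeric stretch followed by three vertebrate repeats.
read = "ACGT" + "TTAGGG" * 3
print(count_terminal_repeats(read, "GGTTAG"))  # 0: wrong frame for this 3' end
print(best_frame(read, "GGTTAG"))              # (3, 'TTAGGG')
```

Searching over rotations keeps the sketch independent of which frame a given organism's repeat happens to be written in.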

Structural elucidation has been an important tool for inferring the evolutionary relationships of telomeric proteins across species. To date, high-resolution structures of several telomere-associated proteins have been reported, revealing important features of the conservation and plasticity of these proteins. Here, recent progress in structural studies of the telomerase complex and telomere-binding protein complexes is discussed to reveal conserved structural elements that serve diversified functions. The organization and functions of telomeric complexes in three representative species are also addressed to highlight the common themes and evolutionary plasticity of the telomeric proteins involved in telomere homeostasis and protection.

1. Structural biology of telomerase

Telomeres can be extended by transposition, homologous recombination and de novo synthesis. De novo telomere synthesis is carried out by a specialized ribonucleoprotein complex, telomerase. Telomerase contains a reverse transcriptase protein component (TERT) and an integral RNA component (TR) that provides the template for nucleotide addition to the 3' end of the G-overhang. Additional regulatory or structural proteins associate with the core TERT-TR complex to form a functional holoenzyme.

TERT proteins in most species contain four domains: a telomerase essential N-terminal (TEN) domain, a telomerase RNA-binding domain (TRBD), a reverse transcriptase (RT) domain and a C-terminal extension (CTE) domain. The TEN domain interacts with both telomeric ssDNA and the TR RNA component, and can promote processive repeat synthesis by telomerase. The TRBD ensures specific recognition between TERT and TR. The RT domain is similar to retrotransposon RTs and contains the active site that catalyzes dNTP addition. The CTE domain may enhance DNA association, but its exact role is not clear. The domain organization of TERT is well conserved, with variation occurring mostly in the loops connecting these domains. In contrast, the RNA component of telomerase varies greatly, with disparate sequences and sizes ranging from about 150 nt in ciliates to 1300 nt in budding yeasts. Although conserved core secondary structure elements of TR have been identified, the precise roles of these elements are still under debate.

To date, several individual telomerase domains and RNA fragments have been structurally characterized (Table 1). However, high-resolution structures of telomerase holoenzymes are still missing. Here, we focus on structural studies of Tetrahymena thermophila telomerase, since it was the first telomerase identified and remains the best characterized. Besides TERT and TR, the T. thermophila telomerase holoenzyme contains four regulatory proteins, p65, p75, p45 and p19. TERT, TR and p65 comprise a stable catalytic core complex, whereas p75, p45 and p19 comprise a telomere adaptor subcomplex (TASC) (Figure 1A). Another telomerase-associated factor, Teb1, collaborates with TASC to stimulate the repeat addition processivity (RAP) of telomerase. High-resolution structures are available for several TERT domains, including TtTEN and TtTRBD from T. thermophila and a homologous RT domain from Tribolium castaneum TERT (Figure 1B). The structures of two important elements of T. thermophila TR (stem II and stem IV) have been determined. Structures of some telomerase regulatory proteins bound to nucleic acids, namely p65 with RNA and Teb1 with ssDNA, have also been reported (Figure 1B). These structures reveal important information about the assembly and regulation of telomerase activity.

The TtTEN domain adopts a novel protein fold. Mutagenesis of conserved residues in a groove on its surface shows that these residues are crucial for telomeric ssDNA binding and for telomerase activity. A flexible, positively charged C-terminal loop of the TEN domain is involved in RNA binding. These multiple roles highlight the importance of the TEN domain in telomerase catalysis.

The TtTRBD domain adopts a kinked structure with two asymmetric helical lobes, which is well suited to TR binding. Two conserved motifs important for RNA binding (the CP and T motifs) are located at the central hinge region of the TRBD, forming a potential RNA-binding pocket. The structure of the isolated TtTRBD is almost identical to the TRBD in full-length TcTERT (T. castaneum TERT), suggesting that T. thermophila TERT may adopt a structure similar to that of T. castaneum TERT.

The TcTERT structure shows a striking ring-like architecture composed of three domains: TRBD, RT and CTE. The RT domain possesses the canonical RT motifs 1, 2, A, B', C, D and E, organized in a right-hand-like structure. The RT and CTE domains form a canonical palm-finger-thumb structure, and the TRBD contacts the CTE domain to close the ring. The structure of TcTERT in complex with a model RNA-DNA hybrid confirms that the nucleic acid sits in the center of the ring, making significant contacts with the fingers, palm and thumb. The active site lies in the palm of the RT domain and contains the universally conserved catalytic aspartates (Asp251, Asp343 and Asp344). Protein-nucleic acid contacts place the 3'-hydroxyl of the DNA primer at the active site for nucleotide addition, suggesting that the structure captures an active telomerase conformation. However, because the TR component associated with TcTERT has not been identified, a complete picture of telomerase action awaits the structure of TERT in complex with its endogenous TR.

T. thermophila possesses the shortest known TR, only 159 nt long, yet it contains all the conserved motifs. It contains four helical stems, stem I to stem IV, and several single-stranded regions with important functions (Figure 1A). Stem I is involved in long-range base pairing and is crucial for TERT binding. Stem II contains the template boundary element (TBE), which defines the 5' template boundary, and also interacts with the TRBD of TERT. Stems IIIa and IIIb form an RNA pseudoknot involved in telomerase assembly and activity. The 9-nt template sequence lies between stems II and III. Stem IV is involved in nucleotide addition processivity and interacts with the TEN domain of TERT. The solution structure of the 23-nt T. thermophila stem II reveals a well-defined helix with two unpaired adenines and a pentaloop, in sharp contrast to previous biochemical data suggesting that stem II alone is unstructured. Comparison with the same TBE loop sequence in the structure of Haloarcula marismortui 23S rRNA suggests that similar RNA-protein interactions may occur in the TERT/TR complex. The NMR structure of the 43-nt T. thermophila stem IV was obtained from two overlapping fragments corresponding to the template-proximal and template-distal parts of the stem. The complete stem IV structure is severely kinked, with a rigidly defined distal loop linked to a conformationally flexible template-proximal region. The GA bulge in the middle of stem-loop IV sharply kinks the entire structure. The nucleotides that form this structure are well conserved, and mutating them disrupts enzyme activity without affecting TERT binding, suggesting a complex mechanism for repositioning stem IV during the catalytic cycle.
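The template-copying cycle that underlies repeat addition processivity can be sketched in a few lines of Python. This is an illustrative toy model, not from the review: it assumes the widely cited T. thermophila template 5'-CAACCCCAA-3' and a 3-nt alignment region at the template's 3' end, and it ignores kinetics, stem IV and all protein factors. Each round copies the template up to its 5' boundary (nucleotide addition) and then re-aligns the new 3' end (translocation).

```python
# Watson-Crick partners used when copying an RNA template into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def extend_primer(primer: str, template: str, align_len: int, rounds: int) -> str:
    """Toy model of telomerase repeat addition.

    primer:    telomeric DNA, written 5'->3'.
    template:  RNA template, written 5'->3'.
    align_len: template nucleotides at its 3' end that pair with the primer 3' end (assumed).
    rounds:    number of addition + translocation cycles.
    """
    # DNA copied from the whole template, 5'->3' (the antiparallel complement).
    encoded = "".join(RNA_TO_DNA[base] for base in reversed(template.upper()))
    anchor, added = encoded[:align_len], encoded[align_len:]
    product = primer.upper()
    for _ in range(rounds):
        # The 3' end must base-pair with the alignment region before synthesis can start.
        if not product.endswith(anchor):
            raise ValueError(f"3' end does not pair with alignment region {anchor!r}")
        # Nucleotide addition: copy the rest of the template up to its 5' boundary.
        product += added
        # Translocation: for a template like this one the new 3' end again ends in
        # `anchor`; the check at the top of the loop verifies this on the next round.
    return product


print(extend_primer("GGGGTTG", "CAACCCCAA", align_len=3, rounds=2))
# GGGGTTGGGGTTGGGGTTG: two new 6-nt repeats (GGGTTG, i.e. TTGGGG in another frame)
```

In this model a low-processivity enzyme would correspond to stopping after one round, while a high-RAP holoenzyme keeps cycling; the sketch deliberately says nothing about how Teb1 or TASC achieve that.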

The catalytic core of T. thermophila telomerase is composed of TR, TERT and an essential La-family protein, p65. The protein is important for TR accumulation and holoenzyme assembly. It belongs to the LARP7 family and contains four domains: an N-terminal domain, a La motif (LAM), an RNA recognition motif (RRM) and a C-terminal domain. The NMR structure shows that the C-terminal domain is an atypical RRM. The structure of the p65 C-terminal domain bound to stem IV of TR revealed that specific protein-RNA recognition induces significant conformational changes in both the protein and the RNA. The C-terminal extension, which is disordered in unliganded p65, becomes an α-helix in the complex, and this is necessary for the hierarchical assembly of TERT with p65-TR. Protein binding also induces a bend in stem IV of TR, possibly positioning it for interaction with the TRBD.

Association of Teb1 with the T. thermophila telomerase catalytic core can convert the limited repeat addition processivity (RAP) of the catalytic core into the high RAP of the endogenously assembled holoenzyme. Teb1 contains three putative OB (oligonucleotide/oligosaccharide-binding) folds, which were confirmed by crystal structures of these domains. The two N-terminal OB folds (Teb1OB1+2) achieve high-affinity, selective recognition of telomeric single-stranded DNA (ssDNA). The C-terminal OB fold (Teb1OB3) contributes only marginally to DNA binding but is crucial for high RAP. These results suggest a model in which the initial recruitment of telomerase to telomeric ssDNA tracts involves recognition of telomeric repeats by Teb1OB1+2. Subsequent capture of the 3' end by the active site of the telomerase catalytic core could then favor Teb1OB3-ssDNA contacts, trapping the product in a sliding-clamp-like manner that does not require high-affinity DNA binding to keep the enzyme-product association stable.

2. The common structural elements in telomere-associated proteins

Besides telomerase, a group of telomere-binding proteins plays vital roles in telomere protection and maintenance. Telomere-binding proteins have been suggested to be evolutionarily conserved, yet their sequences are highly divergent across species. Recent structural studies have established that these proteins often adopt a flexible, modular architecture built from a few common structural elements, which reveal their conservation. Here, we discuss several structural domains identified in telomeric proteins to illustrate the structural and functional roles of these conserved domains in telomere protection and maintenance.

2.1 OB fold

The most common domain in telomere-binding proteins is the oligonucleotide/oligosaccharide-binding (OB) fold. The OB fold has served as a gold-standard marker for identifying evolutionary orthologs of telomeric proteins from different organisms. For example, structural comparison confirmed that HsTPP1 (formerly known as TINT1, PTOP or PIP1) is a homolog of Oxytricha nova TEBPβ. The OB fold is a structural domain of 70-220 amino acids with diverse functions. It comprises two orthogonally packed antiparallel β-sheets, with β1a:β4:β5 strand topology in one sheet and β1b:β2:β3 topology in the other (Figure 2A). The N-terminal strand β1 forms the outer edge of both sheets, dividing into β1a and β1b. Strands β4 and β5 often fold over to extend the other sheet and thus complete a closed, β-barrel-like structure. The N-terminal region and the loop connecting β3 and β4 seal the top and bottom of the OB barrel, respectively. Most telomere-specific OB folds are further characterized by a C-terminal α-helix. The loops connecting the β-strands vary in length, especially the loop joining β3 and β4, which often contains an extra α-helix (Figure 2A). These variable loops, together with different N- or C-terminal extensions, make the OB fold an ideal domain for divergent evolution and functional diversification. The growing number of OB-fold structures from telomeric proteins has highlighted both the remarkable structural conservation and the diversity of this fold, and the myriad ways in which it can mediate protein-protein and protein-DNA interactions (Table 1).

The primary function of OB-containing telomeric proteins is to recognize and protect telomeric ssDNA. The first ssDNA-binding protein complex identified was the O. nova TEBP (telomere end-binding protein) complex, composed of TEBPα and TEBPβ. TEBPα comprises three OB folds, whereas TEBPβ contains one. ssDNA binding is achieved by OB1 and OB2 of O. nova TEBPα. Pot1 (protection of telomeres 1) proteins in fission yeast and mammalian cells, identified as distant homologs of O. nova TEBPα, likewise use one or two OB folds to recognize telomeric ssDNA of varying length. In Saccharomyces cerevisiae and other budding yeasts, the ssDNA is bound by Cdc13 (cell division cycle 13), a protein with multiple OB folds. The third OB fold of ScCdc13 is involved in ssDNA binding. Besides these telomeric ssDNA-binding proteins, OB folds are also found in telomerase regulatory proteins. For example, the T. thermophila telomerase processivity factor Teb1 contains three OB folds, all of which contribute, to different extents, to ssDNA binding. Collectively, the structures of OnTEBP, ScCdc13, SpPot1, HsPot1 and TtTeb1 in complex with ssDNA revealed a conserved protein-DNA interface. The ssDNA binds primarily in a groove formed by one side of the β-barrel and two flanking loops, L12 and L45 (Figure 2B). In these structures, both basic and aromatic residues in the ssDNA-binding groove are required for electrostatic and van der Waals interactions: basic residues stabilize the negatively charged phosphate groups of the DNA backbone, whereas aromatic residues stack with the DNA bases. However, the interfaces differ in detail, which explains the different substrate specificities and optimal recognition lengths of these proteins.

More recent studies have established that the OB fold is also versatile in binding protein partners. An OB fold can recognize a polypeptide or another OB fold, from either the same or a different protein. The first OB fold of ScCdc13 interacts with a polypeptide from Pol1, the catalytic subunit of the DNA polymerase α-primase complex. Disrupting the ScCdc13-Pol1 interaction causes cell growth defects and telomere lengthening. The crystal structure of Cdc13OB1 in complex with Pol1 reveals that the Pol1 peptide folds into a single amphipathic α-helix that binds in the deep basic groove (Figure 2C), the same groove used for ssDNA binding in the Cdc13OB3/ssDNA structure, suggesting a conserved oligomer-binding mode. The lack of aromatic residues in this groove, which are important for stacking with DNA bases, explains why Cdc13OB1 has no ssDNA-binding activity.

OB-OB interaction is another conserved protein interaction mode found in numerous telomeric proteins. O. nova TEBPα uses its third OB fold to associate with the OB fold of the TEBPβ subunit, and the resulting heterodimer can modulate telomerase access to the telomere end. The third OB fold of TEBPα can also act as a homodimerization domain, forming a TEBPα homodimer that recognizes telomeric ssDNA differently from the TEBPα/β heterodimer. In another structural context, yeast Stn1 interacts with Ten1 via an OB-OB interaction, resembling the Rpa32-Rpa14 interaction in the nonspecific ssDNA-binding RPA (replication protein A, comprising Rpa70, Rpa32 and Rpa14) complex. SpStn1 and Ten1 pack in parallel with a side-by-side interface. The interaction is mediated primarily by the crossover of the C-terminal α-helices of both proteins, with fewer contacts between an N-terminal loop and one side of the β-barrel (Figure 2D). OB-OB interactions also mediate homodimerization of telomeric proteins. Most Saccharomyces and Kluyveromyces Cdc13 proteins form dimers through their N-terminal OB domains, whereas homodimerization of Candida Cdc13 proteins is mediated by the C-terminal OB fold. The structures of the ScCdc13OB1 dimer and the CgCdc13OB4 dimer reveal dramatically different modes of dimerization (Figures 2E and 2F). In ScCdc13OB1, the two protomers are arranged end to end, and the dyad axis is perpendicular to the axis of the β-barrel (Figure 2F). By contrast, the CgCdc13OB4 dimer has a two-fold symmetry axis parallel to the β-barrel axis and a side-by-side dimerization interface (Figure 2E). Together, these structural studies confirm that the OB fold is a versatile protein-protein interaction domain whose structural plasticity makes it ideal for divergent functions.

2.2 Homeodomain

Just as the OB fold is the signature of all single-stranded G-overhang-binding proteins, the homeodomain is found in all telomere-binding proteins that recognize telomeric double-stranded DNA (dsDNA). The homeodomain, like the related Myb domain, contains a typical helix-turn-helix motif for DNA recognition. Unlike the Myb domain, the homeodomain has an N-terminal arm that makes extra contacts with DNA, increasing both stability and specificity. The crystal structures of the DNA-binding domains of ScRap1, HsTRF1 and HsTRF2 all exhibit this N-terminal arm, so these domains belong to the homeodomain family rather than the Myb domain family (Figure 3A).

The first homeodomain-containing telomeric protein identified was Rap1 (repressor-activator protein 1) from the budding yeast S. cerevisiae. ScRap1 binds the irregular budding yeast telomeric sequence (GTG1-3) directly via two tandem homeodomains in the center of the protein. The crystal structure of the ScRap1 DNA-binding domain bound to an 18-bp telomeric DNA revealed that ScRap1 uses two canonical homeodomains to bind DNA in a tandem orientation, each recognizing 6 bp of the binding site. Each homeodomain consists of three α-helices arranged in an orthogonal bundle around a hydrophobic core (Figure 3A). The second and third helices form the helix-turn-helix signature, which presents residues that make sequence-specific contacts with bases in the major groove of DNA. Additional affinity and specificity come from the N-terminal arms, which make extra contacts in the minor groove (Figure 3A). The extra-long linker between the two homeodomains allows them to sit in two adjacent major grooves, ensuring binding specificity for tandem telomeric repeats. The ScRap1 homologs in fission yeast and humans, SpRap1 and HsRap1, also contain one or two homeodomains, but they cannot bind telomeric DNA directly. The NMR structure of the human Rap1 homeodomain exhibits a canonical three-helix bundle closely resembling that of ScRap1 (Figure 3A), but HsRap1 lacks a significant number of positively charged surface residues, which accounts for its inability to bind DNA.

In mammalian cells, double-stranded telomeric DNA is bound by two closely related proteins, TRF1 and TRF2. In both proteins, DNA binding is mediated by a single C-terminal homeodomain. The structures of the DNA-binding domains of human TRF1 and TRF2 bound to human telomeric DNA reveal that this homeodomain specifically recognizes the telomeric sequence motif AGGGTT, using a protein-DNA interface conserved with the ScRap1-DNA structure (Figure 3B). TRF1 and TRF2 each form homodimers in vivo, suggesting that a more specific, higher-affinity interaction with telomeric DNA can be achieved by juxtaposing two homeodomains through dimerization. Indeed, crystal structures of the HsTRF1 and HsTRF2 homeodomains in complex with 18-bp telomeric DNA revealed two homeodomains bound on opposite faces of the duplex at two adjacent TAGGGTT sites. The dimeric nature and conformational flexibility of TRF1/2 enable a TRF1 or TRF2 dimer to recognize two TAGGGTT sites simultaneously across highly variable spacings. It has been proposed that using two homeodomains to recognize DNA may not simply increase affinity but may also facilitate the formation of specific higher-order structures, such as the T-loop.
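A small Python sketch (illustrative only; the function names and the example duplex are hypothetical) can list the candidate TAGGGTT sites on the G-rich strand of a telomeric duplex and the spacing between them. On a perfect TTAGGG array the sites recur every 6 bp, so an 18-bp duplex of three repeats contains two complete, adjacent sites. Which pairs of sites a TRF1 or TRF2 dimer actually occupies is outside the scope of this string search.

```python
from typing import List


def find_sites(g_strand: str, site: str = "TAGGGTT") -> List[int]:
    """0-based start positions of `site` on the G-rich strand (5'->3'), overlaps allowed."""
    g_strand, site = g_strand.upper(), site.upper()
    return [i for i in range(len(g_strand) - len(site) + 1)
            if g_strand[i:i + len(site)] == site]


def spacings(starts: List[int]) -> List[int]:
    """Distance in bp between consecutive site starts."""
    return [b - a for a, b in zip(starts, starts[1:])]


duplex = "TTAGGG" * 3          # hypothetical 18-bp telomeric duplex, G-rich strand shown
sites = find_sites(duplex)
print(sites, spacings(sites))  # [1, 7] [6]
```

Scanning only the G-rich strand is enough here because the site is defined on that strand; each hit corresponds to one duplex position.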

Unlike telomere-binding proteins in yeast and mammals, plant telomere-binding proteins represent a novel homeodomain family with a C-terminal α-helix extension. These domains adopt a unique four-helix organization with a tetrahedral shape, as reported in the structures of rice RTBP1, Arabidopsis AtTRP1 and tobacco NgTRF1. The first three helices of these plant homeodomains are arranged almost identically to those of ScRap1, and the protein-DNA interfaces are largely conserved (Figure 3C). The peculiar fourth helix adds extra contacts with DNA and also stabilizes the overall structure of the DNA-binding domain, a characteristic of telomeric proteins in plants (Figure 3C).

2.3 TRFH domain

The TRFH (telomeric repeat factor homology) domain was first identified in human TRF1 and TRF2, two telomeric dsDNA-binding proteins. Although the TRFH domains of TRF1 and TRF2 share only 27% sequence identity, their crystal structures are almost identical: entirely α-helical dimers with a twisted horseshoe shape (Figure 4A). Each monomer contains 10 α-helices, and dimerization is mediated by helices α1, α2 and α10 of each monomer. Two key residues at the dimerization interface of each protein (Met78 and Leu82 in TRF1; Val56 and Tyr60 in TRF2) prevent heterodimerization of TRF1 and TRF2. The dimeric structure is important for TRF1/2 function, as point mutations at the dimer interface prevent telomere localization in vivo.

TRF1 and TRF2 serve as general docking sites for recruiting various telomere-associated proteins. The twisted horseshoe-shaped TRFH dimer provides a large surface for interaction with other proteins, and differences in surface residues determine the interaction specificity of TRF1 or TRF2. Structural analyses of several TRF1 and TRF2 complexes demonstrate that the TRFH domains of TRF1 and TRF2 recognize a conserved sequence motif, F(Y)-x-L-x-P, with distinct specificities (Figures 4A and 4B). This conserved binding mode has been confirmed in various telomeric proteins, including TIN2 (TRF1- and TRF2-interacting nuclear protein 2), Apollo, PinX1 (Pin2/TRF1-interacting protein 1), PNUTS (phosphatase nuclear targeting subunit) and MCPH1 (microcephalin 1), and more candidates remain to be discovered. Note that not all proteins that bind the TRF1/TRF2 TRFH domains do so through this motif. For example, FBX4, an F-box protein of the SCF ubiquitin E3 ligase, interacts with the TRFH domain of TRF1 in an entirely different mode.
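The F(Y)-x-L-x-P motif maps naturally onto a regular expression. The Python sketch below is illustrative only (the peptide is made up, not a real TRF1/2 partner sequence): it reports every match of [FY]-x-L-x-P in a protein sequence. A match only flags a candidate; as the FBX4 example shows, binding can occur without the motif, and a sequence match alone does not demonstrate binding.

```python
import re
from typing import List, Tuple

# [FY] = Phe or Tyr, "." = any residue (the "x" positions), then Leu and Pro.
# The lookahead lets overlapping matches be reported.
TRFH_MOTIF = re.compile(r"(?=([FY].L.P))")


def find_trfh_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for each F(Y)-x-L-x-P match."""
    return [(m.start(), m.group(1)) for m in TRFH_MOTIF.finditer(sequence.upper())]


peptide = "MKAFNLRPSGYELSP"  # hypothetical sequence for illustration
print(find_trfh_motifs(peptide))  # [(3, 'FNLRP'), (10, 'YELSP')]
```

Splitting the first position into separate F-only and Y-only patterns would be a natural extension if one wanted to reflect the distinct TRF1 and TRF2 specificities mentioned above.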

Taz1, the telomeric dsDNA-binding protein in fission yeast, has been proposed to be the ortholog of human TRF1 and TRF2. Taz1 has a domain organization similar to that of TRF1 and TRF2, containing a central TRFH domain and a C-terminal homeodomain. However, our recent structural study of the Taz1 TRFH domain shows that Taz1TRFH is not a structural homolog of the TRF1 and TRF2 TRFH domains (Wang and Lei, in preparation). Besides Taz1, another TRF1 ortholog, SpTbf1, has been identified in fission yeast; it is essential for viability and telomere length regulation. Whether SpTbf1 contains a Taz1-like or a TRF1/2-like TRFH domain is unknown owing to the lack of structural information.

TRFH domains have been widely found in other eukaryotic telomeric dsDNA-binding proteins, such as S. cerevisiae Tbf1, Trypanosoma brucei TRF, Trypanosoma cruzi TRF and Leishmania amazonensis TRF. The structures and functions of these TRFH domains need further investigation, and it would not be surprising to find structural variation among them, as seen between Taz1 and TRF1/2. This suggests that the TRFH domain may be an ancient domain of telomeric dsDNA-binding proteins that has acquired considerable structural plasticity during evolution.

2.4 RCT domain

The RCT (Rap1 C-terminal) domain is a protein-protein interaction domain that mediates Rap1 interactions with a range of proteins to exert diverse functions in different organisms. In mammals and fission yeast, this module interacts with TRF2 and Taz1, respectively, targeting Rap1 to chromosome ends. In contrast, S. cerevisiae Rap1 uses its RCT domain to recruit Rif1, Rif2, Sir3 and Sir4 to telomeres to mediate telomere homeostasis and telomere silencing. Initially, sequence similarity in this domain could be detected only between ScRap1 and HsRap1, with no conserved pattern in SpRap1. Moreover, RCT domains from different species vary greatly in length, from 97 residues in HsRap1 (residues 303-399) to 133 residues in ScRap1 (residues 695-827) (Figure 5A), raising questions about their structural similarity. Systematic structural studies of Rap1 RCT domains from multiple organisms in complex with their respective binding partners reveal a common structural motif as well as species-specific features (Figure 5B). As shown in the HsRap1-TRF2, SpRap1-Taz1 and ScRap1-Sir3 structures, Rap1 uses a conserved three-helix bundle to recognize a helical peptide from its binding partner, driven by hydrophobic interactions (Figure 5C). This three-helix RIM (Rap1-interaction motif) forms the major protein-interaction surface and is well conserved in all organisms. Interestingly, this motif resembles a UBA (ubiquitin-associated) domain (Figure 5D), but the functional implications are unclear. C-terminal to the RIM, ScRap1 and HsRap1 contain another three-helix motif (RCTC) that is absent from SpRap1 (Figure 5B). This conserved motif also corresponds to the significant sequence similarity, noted previously, between the last 50 residues of ScRap1 and HsRap1. The function of the RCTC motif requires further investigation.
Besides the RIM and RCTC, ScRap1 has a unique N-terminal four-helix extension that does not contribute to the Sir3 interaction but might serve other ScRap1-specific functions. The structural conservation and plasticity of RCT domains from different organisms account for the functional divergence of Rap1.

2.5 BRCT domain

The BRCT (BRCA1 C-terminal) domain is an important protein-interaction domain found in a number of proteins involved in DNA repair. In Rap1 from all species, the BRCT domain lies in the N-terminal region and is connected to the homeodomain by a long flexible linker. The function of the Rap1 BRCT domain is still a puzzle. Deletion of the BRCT domain from HsRap1 diminished the heterogeneity of human telomeres, although the underlying mechanism is not clear. BRCT domains are generally accepted to mediate protein interactions, especially with phosphorylated peptides, but the interaction partners of Rap1 BRCT domains remain unidentified. One possible partner is the budding yeast protein Gcr1, whose interaction with the ScRap1 BRCT domain may regulate glycolysis. The NMR structure of the ScRap1 BRCT domain and the crystal structures of the HsRap1 and SpRap1 BRCT domains (unpublished) all exhibit a global fold with three β-strands and three to six α-helices. A β-sheet composed of the three β-strands is sandwiched between two layers of helices: a bottom layer of three conserved helices and a top layer with a variable number of helices (from none in ScRap1 to four in SpRap1) (Figure 6). Compared with canonical BRCT domains, the ScRap1 BRCT domain adopts a rather loosely packed conformation with more flexible loops, which may be suitable for substrate binding. More BRCT-interacting proteins are expected to be identified in the future, which will help elucidate the functions of the Rap1 BRCT domain in vivo.

3. The structures of telomeric complexes in different species

In eukaryotes, telomeres are bound by specialized proteins that regulate telomere length and end capping. These telomere-associated proteins form telomeric complexes through a complicated protein-protein interaction network. Recent advances in structural studies of telomeric proteins and complexes provide mechanistic insights into how these proteins cooperate to regulate telomere protection and maintenance. Here, we summarize progress in the structural characterization of telomeric complexes in three representative species.

3.1 The human shelterin complex

The mammalian telomere is coated by a six-protein complex, shelterin, which consists of TRF1, TRF2, Rap1, TIN2, TPP1 and Pot1 (Figure 7A). The double-stranded telomeric DNA is bound by TRF1 and TRF2, while the single-stranded DNA is coated by Pot1. TRF1/2 and Pot1 serve as docking sites on telomeres for the other components. Rap1 is recruited to telomeres by its interaction with TRF2. Both TRF1 and TRF2 bind TIN2, which links TRF1 and TRF2 to form a dsDNA sub-complex. TPP1 forms a complex with Pot1 at the single-stranded DNA region. The interaction between TPP1 and TIN2 bridges the ssDNA and dsDNA sub-complexes to form the intact shelterin complex.
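The bridging logic of shelterin can be captured as a small interaction graph. The Python sketch below is a simplification, not a structural model: it encodes only the pairwise contacts listed in the paragraph above, with hypothetical "dsDNA" and "ssDNA" nodes standing for the two telomeric DNA regions, and uses breadth-first search to find how the two regions are connected. Removing TIN2 disconnects them, which mirrors its described role as the link between the ssDNA and dsDNA sub-complexes.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Pairwise contacts named in the text; "dsDNA"/"ssDNA" are placeholder nodes for DNA regions.
SHELTERIN_EDGES: List[Tuple[str, str]] = [
    ("TRF1", "dsDNA"), ("TRF2", "dsDNA"), ("Pot1", "ssDNA"),
    ("Rap1", "TRF2"), ("TIN2", "TRF1"), ("TIN2", "TRF2"),
    ("TPP1", "Pot1"), ("TPP1", "TIN2"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build undirected adjacency sets from a list of interacting pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path, or None if goal is unreachable."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            current: Optional[str] = node
            while current is not None:  # walk back to the start
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):  # sorted for reproducible output
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


full = build_graph(SHELTERIN_EDGES)
print(shortest_path(full, "ssDNA", "dsDNA"))
# ['ssDNA', 'Pot1', 'TPP1', 'TIN2', 'TRF1', 'dsDNA']
no_tin2 = build_graph([e for e in SHELTERIN_EDGES if "TIN2" not in e])
print(shortest_path(no_tin2, "ssDNA", "dsDNA"))  # None
```

The graph deliberately ignores stoichiometry, affinities and the additional factors discussed later (such as CST); it only shows which links the text names.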

Shelterin regulates telomere maintenance and inhibits the DNA damage response at telomeres. Removal of the entire shelterin complex activates six different DNA damage response pathways: ATM (ataxia telangiectasia mutated) signaling, ATR (ataxia telangiectasia and Rad3 related) signaling, classical NHEJ (non-homologous end joining), alternative NHEJ, HR (homologous recombination) and resection. The functions of individual shelterin components have been extensively explored. TRF1 regulates telomere length through a telomerase-dependent negative feedback mechanism. TRF1 also promotes efficient replication of telomeric dsDNA, as deletion of TRF1 leads to a fragile-site phenotype. TRF2, in contrast, plays a minor role in telomere length regulation; its primary role appears to be capping and protecting chromosome ends by inhibiting ATM signaling and the NHEJ pathway. TIN2 occupies a central position in shelterin: its ability to interact with TRF1, TRF2 and TPP1 provides a scaffold for shelterin assembly. Besides this structural role, TIN2 contributes to both telomere length regulation and telomere protection. TIN2 participates in TRF1-mediated telomere length regulation through two mechanisms. First, TIN2 protects TRF1 from poly(ADP-ribosyl)ation by tankyrase 1, which in turn stabilizes TRF1 association with telomeres. Second, TIN2 competes with SCF(FBX4) for binding to TRF1, thus protecting TRF1 from ubiquitin-dependent proteolysis. Meanwhile, TIN2 can stabilize the Pot1/TPP1 complex on the 3' G-overhang, thereby allowing repression of ATR signaling. The TRF2-interacting protein Rap1 inhibits homologous recombination, as recently revealed by structure-based mutagenesis and a knock-out mouse model. Rap1 can also negatively regulate telomere length and affect the heterogeneity of telomere length distribution, although the underlying mechanism is unclear. Pot1 is essential for both telomere end protection and length regulation. Pot1 can repress the ATR signaling pathway.
Pot1 plays dual, yin-yang roles in regulating telomerase-dependent telomere elongation. Pot1 shares the same substrate as telomerase, and depending on its position relative to the DNA 3' end, Pot1 can either inhibit telomerase action or form a preferred substrate for telomerase. The telomere localization of Pot1 depends on its interacting partner TPP1, which serves as a bridge to the TIN2/TRF1/TRF2 complex in the dsDNA region. Moreover, Pot1 and TPP1 form a complex that increases telomerase activity and processivity, probably through an interaction between TPP1 and telomerase. Recently, an RPA-like CST complex (Ctc1, Stn1 and Ten1) was identified in the single-stranded region of human telomeres. The CST complex competes with Pot1-TPP1 for telomeric ssDNA binding, thereby regulating both telomerase-mediated elongation and end protection.

Alongside these functional studies, structural studies of shelterin components have advanced greatly, providing a better understanding of the molecular mechanisms of telomere maintenance. Although the flexible nature of the shelterin complex impedes structure determination of the intact complex, most structural domains of its proteins and some sub-complexes have been structurally characterized (Figure 7B). TRF1 and TRF2 share a similar domain organization. Both contain an N-terminal charged domain (acidic in TRF1 and basic in TRF2), a central TRFH domain and a C-terminal DNA-binding homeodomain. The structures of the TRFH domains and homeodomains of TRF1 and TRF2 have been reported, and their features were summarized above. Rap1 contains three structural domains, BRCT, homeodomain and RCT, and the structures of all three have been determined. TIN2 consists of an N-terminal α-helix-rich domain and a disordered C-terminal loop. TIN2 remains the most enigmatic shelterin protein: only a C-terminal fragment that interacts with TRF1 has been structurally characterized. Pot1 is composed of two N-terminal OB folds and one predicted C-terminal OB fold. The structure of the Pot1 N-terminal OB folds in complex with telomeric ssDNA has been solved. TPP1 is mostly disordered except for one OB fold, whose structure is available.

Besides these individual domains, structures of several complexes formed by interaction domains have been obtained. The shelterin complex can be isolated as a stable complex in the absence of DNA, indicating strong interactions among its components. The stable association of shelterin is mediated by five interactions: TRF1-TIN2, TRF2-TIN2, TRF2-Rap1, TIN2-TPP1 and Pot1-TPP1. The TRF1-TIN2 and TRF2-Rap1 interactions have been well characterized by structural analysis. TRF1 uses its TRFH domain to recognize a short TIN2 peptide (residues 256-268; TIN2TBM, TRF1-Binding Motif). TIN2TBM, which harbors an F-x-L-x-P motif, binds to a conserved hydrophobic groove on the top side of the horseshoe-shaped TRFH homodimer. TIN2-F258 sits on a concave hydrophobic surface, the side chain of TIN2-L260 is surrounded by a group of hydrophobic residues from TRF1TRFH, and TIN2-P262 stacks with TRF1-F142 (Figure 4B). Complementing these hydrophobic contacts, TIN2 residues R265, R266 and R267 make extensive electrostatic interactions with TRF1TRFH. Interestingly, the hot spot of TIN2 mutations in dyskeratosis congenita is located in the vicinity of TIN2TBM. These mutations do not affect TRF1 binding but impair association with telomerase through an unknown mechanism. Given the sequence and structural similarities between the TRFH domains of TRF1 and TRF2, we initially hypothesized that TRF2 would bind TIN2 in a similar manner. However, later experiments established that TIN2 uses its N-terminal domain to recognize a short motif on TRF2, although TRF2TRFH can also bind TIN2TBM with much weaker affinity than TRF1TRFH. The same domain-peptide interaction mode is employed by the Rap1-TRF2 complex. The RCT domain of Rap1 recognizes a short fragment of TRF2 immediately following the TRFH domain, a region absent from TRF1. This TRF2RBM (Rap1-Binding Motif) forms a helix-turn-helix motif that packs against helices α1 and α2 of RAP1RCT to form an intermolecular four-helix bundle.
Beyond this helical packing, the terminal regions of TRF2RBM act as the two arms of a clamp that holds helix α2 of RAP1RCT.

Apart from the TRF1-TIN2 and TRF2-Rap1 complexes, the other sub-complexes have not been structurally characterized, although their interaction domains have been identified. The N-terminal region of TIN2 associates with a short fragment of TRF2 between TRF2RBM and the TRF2 homeodomain. Meanwhile, the same TIN2 N-terminal region interacts with the C-terminal domain of TPP1. These interactions are mutually compatible, as TIN2 can bind TRF2 and TPP1 simultaneously. The C-terminal region of Pot1 is predicted to contain an OB fold and has been found to interact with a short fragment of TPP1 near its OB fold. This interaction may resemble the TEBPα-TEBPβ interaction, but it still awaits structural elucidation.
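The five pairwise interactions listed above, together with the observation that TIN2 can bind TRF2 and TPP1 simultaneously, can be read as a small interaction network. The Python sketch below is a toy topological model only, under stated assumptions: it includes just the protein-protein contacts named in the text, ignores DNA binding (which independently anchors TRF1, TRF2 and Pot1 in vivo), stoichiometry and binding affinities, and the helper names (build_graph, components, split_counts) are hypothetical. Deleting a node simply asks which proteins would lose their physical link to the rest of the complex.

```python
from collections import defaultdict

# Protein-protein contacts named in the text; DNA binding is deliberately omitted.
SHELTERIN_EDGES = [
    ("TRF1", "TIN2"),
    ("TRF2", "TIN2"),
    ("TRF2", "Rap1"),
    ("TIN2", "TPP1"),
    ("Pot1", "TPP1"),
]


def build_graph(edges):
    """Undirected adjacency map: protein -> set of direct binding partners."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def components(graph, removed=frozenset()):
    """Connected components (as frozensets) left after deleting the `removed` proteins."""
    seen, result = set(), []
    for start in graph:
        if start in removed or start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(p for p in graph[node] if p not in removed and p not in comp)
        seen |= comp
        result.append(frozenset(comp))
    return result


def split_counts(graph):
    """Map each protein whose removal splits the remaining complex to the number of pieces."""
    counts = {}
    for protein in sorted(graph):
        n = len(components(graph, removed={protein}))
        if n > 1:
            counts[protein] = n
    return counts


if __name__ == "__main__":
    g = build_graph(SHELTERIN_EDGES)
    print(len(components(g)))  # 1: all six proteins form one connected complex
    print(split_counts(g))     # {'TIN2': 3, 'TPP1': 2, 'TRF2': 2}
```

In this toy model the six proteins and five interactions form a tree, so each contact is a single link. TIN2 is the only protein whose removal leaves three disconnected pieces (TRF1 alone, TRF2-Rap1 and TPP1-Pot1), which mirrors its described role as the central scaffold of shelterin.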

3.2. The shelterin-like complex in S. pombe

A multi-protein telomeric complex with a shelterin-like architecture has also been identified in the fission yeast S. pombe. This complex has seven components, many of which are structural and functional homologues of mammalian shelterin proteins (Figure 8A). Taz1 binds double-stranded telomeric DNA and shares limited homology with human TRF1 and TRF2. Taz1 combines the functions of TRF1 and TRF2, regulating telomere homeostasis, telomeric DNA replication and end protection. S. pombe Rap1 has a domain organization similar to that of its human counterpart (Figure 5A), but plays more essential roles in telomere end protection and length regulation. The telomere localization of Rap1 depends on both Taz1 and the Pot1-interacting partner Tpz1. At single-stranded G-overhangs, Pot1 binds ssDNA directly and associates with Tpz1 to form a heterodimer that is the homolog of mammalian Pot1-TPP1. Consistent with this notion, Pot1-Tpz1 protects telomeres and regulates telomerase activity. Poz1, a novel protein with no obvious sequence similarity to any component of mammalian shelterin, interacts with both Tpz1 and Rap1, and thus connects the single-stranded and double-stranded DNA-binding proteins. This bridging function of Poz1 closely resembles the architectural role of TIN2 in the shelterin complex, raising the possibility that Poz1 is a functional homolog of TIN2. One important difference is that Poz1 links Rap1 to Tpz1, whereas TIN2 links TRF1/2 to TPP1. Another unique component is coiled-coil quantitatively enriched protein 1 (Ccq1), which interacts with Tpz1 and plays a key role in recruiting telomerase to telomeres.

Compared with human shelterin, structural studies of the S. pombe shelterin-like complex have only just begun (Figure 8B). The structure of the N-terminal OB fold of SpPot1 has been determined in complex with single-stranded DNA. SpPot1 was later found to contain a second putative OB fold that collaborates with the first to bind two telomeric repeats with much higher affinity, suggesting that SpPot1 may use a DNA-recognition mode similar to that of HsPot1. However, whether the second OB fold contributes to binding in vivo is unclear, and its structure remains to be confirmed. The other reported structure is of the SpRap1-Taz1 complex. SpRap1RCT (residues 639-693) forms a three-helix bundle that recognizes a helical Taz1-derived peptide through hydrophobic contacts. This SpRap1-Taz1 interface closely resembles that of the HsRap1-TRF2 structure (Figure 5C), suggesting that the interaction between Rap1 and the double-stranded telomeric DNA-binding protein is evolutionarily conserved. Recently, we carried out a systematic structural analysis of Taz1. The crystal structure of the Taz1 TRFH domain reveals that Taz1TRFH differs significantly from TRF1TRFH and TRF2TRFH and cannot dimerize by itself (Wang and Lei, in preparation). Instead, Taz1 dimerizes via a two-helix segment following the TRFH domain (Wang and Lei, in preparation). We also solved the structure of the SpRap1 BRCT domain, which exhibits a canonical BRCT fold (unpublished). Nevertheless, most of the interactions mediating assembly of the S. pombe shelterin-like complex still await structural investigation, including the SpPot1-Tpz1, SpRap1-Tpz1, SpTpz1-Poz1 and SpTpz1-Ccq1 complexes.

3.3. The telomeric complex in budding yeast

In sharp contrast to the telomeric complexes of fission yeast and mammals, the budding yeast S. cerevisiae uses a distinct mechanism for telomere maintenance and protection. Rap1, rather than a TRF-like protein, is the major telomeric dsDNA-binding protein, while an RPA-like complex containing Cdc13/Stn1/Ten1, rather than Pot1, binds telomeric ssDNA (Figure 9A). The central molecule at budding yeast telomeres is Rap1, which plays crucial roles in regulating telomere length, telomere silencing and end protection. Rap1 recruits Rif1/Rif2 for telomere maintenance and Sir3/Sir4 for telomere silencing (Figure 9A). Rap1 is composed of three structural domains: an N-terminal BRCT domain, a central DNA-binding domain containing two tandem homeodomains, and a C-terminal protein-interacting domain. High-resolution structures of these individual domains have been obtained, and the architecture of the intact Rap1 protein has been revealed by SAXS (small-angle X-ray scattering) (Figure 9B). On its own, Rap1 is a partially unstructured molecule with an elongated shape. Upon DNA binding, the overall shape remains similar but the orientation of the RCT domain becomes restrained, indicating a potential conformational change in the DNA-bound state. The recent crystal structure of the ScRap1 DNA-binding domain in complex with a longer telomeric DNA (31 bp) revealed that the C-terminal loop of the second homeodomain wraps around the DNA along the major groove. As a result, the N-terminal BRCT domain and the C-terminal RCT domain are constrained to extend in opposite directions, consistent with the SAXS analysis. The RCT domain of ScRap1 adopts a novel, entirely helical fold that mediates interactions with Rif1, Rif2, Sir3 and Sir4. The structure of the ScRap1RCT-ScSir3 complex shows striking similarity to HsRap1-TRF2 and SpRap1-Taz1: a helical peptide from Sir3 packs against a three-helix bundle from Rap1.
Disruption of the Rap1RCT-Sir3 interaction affects telomere silencing but not mating-type silencing.

The telomeric ssDNA region of budding yeast is coated by the multiple-OB-fold protein Cdc13, which is essential for both chromosome capping and telomere length homeostasis. Cdc13 associates with two other OB-fold-containing proteins, Stn1 and Ten1, and direct structural comparison indicates that together they act as a telomere-dedicated RPA-like complex. Stn1/Ten1 structures from both budding and fission yeast show striking similarity to the Rpa32-Rpa14 structure. The similarity extends beyond the individual OB folds to the conserved heterodimerization interfaces. Moreover, the C-terminal domain of ScStn1 contains two tandem WH (winged helix-turn-helix) motifs, which are most similar to the WH motif at the C terminus of Rpa32. In contrast, ScCdc13 resembles its Rpa70 counterpart in domain organization but differs significantly in structure, as shown by crystal structures of multiple OB folds at the N- and C-terminal ends of ScCdc13.

ScCdc13 functions as a large platform harboring four OB folds with different functions, such as specific telomeric ssDNA binding, interaction with DNA polymerase α and telomerase recruitment. The first OB fold of ScCdc13 serves as both a homodimerization domain and a Pol1-interaction domain. The crystal structure of ScCdc13OB1 in complex with Pol1 shows that a helical Pol1 peptide binds into a deep basic groove on Cdc13, and this association requires dimerization of Cdc13OB1. The centrally located recruitment domain (RD) and the putative second OB fold (ScCdc13OB2) are both required for interaction with Est1, a regulatory component of telomerase. How they recognize Est1 and regulate telomerase activity remains elusive, largely owing to the lack of a Cdc13-Est1 complex structure. The third OB fold (ScCdc13OB3) is involved in ssDNA binding, and the binding interface has been revealed in detail by the structure of ScCdc13OB3 complexed with an 11-mer telomeric ssDNA. The last OB fold of Cdc13 interacts with the C-terminal WH motif of Stn1; this complex also requires structural characterization to provide functional insights into the Cdc13-Stn1 interaction.

4. Perspectives

Recent progress in structural studies of telomere-associated proteins has provided informative insights into their evolution and molecular mechanisms. However, it should be noted that most structural studies have focused on isolated components or core complexes composed of interaction domains, which precludes a precise understanding of how the intact complexes function. The next challenge will be to decipher how telomeres and telomerase work by studying larger complexes. For example, high-resolution structures of TERTs in complex with TRs from different species will significantly advance our understanding of the mechanism of telomerase action, and structures of telomerase in complex with regulatory proteins will reveal how telomerase is regulated. Structural analysis of the intact shelterin complex will show how telomeric proteins cooperate in telomere maintenance. Structural studies of such complicated assemblies will require a combination of biophysical techniques, including X-ray crystallography, NMR, cryo-electron microscopy, SAXS and computational modeling.

Another long-standing question is how these telomere-associated complexes organize into higher-order structures in vivo. Quantitative estimates suggest that hundreds of shelterin complexes may coat a single telomere (Takai et al. 2010). Do these hundreds of complexes pack into a specific structure, or do they simply form separate beads on a string? A lariat structure called the t-loop has been visualized in telomeres isolated from various species, suggesting a complex telomere organization in vivo. A recent EM (electron microscopy) analysis of Pot1 in complex with 72- to 144-mer telomeric ssDNA revealed that multiple Pot1-TPP1 complexes bound along ssDNA can form compact, ordered structures, further indicating that shelterin proteins have an intrinsic capacity for self-organization. Moreover, little is known about the contribution of nucleosomes to shelterin assembly and telomere organization. The next frontier will be to reveal the higher-order structural organization of telomeres in the context of hundreds of telomeric repeats and in the presence of nucleosomes, which is more relevant to telomere organization in vivo.