Copy Number Variation And Male Infertility Biology Essay

Published: November 2, 2015 Words: 5611

Genomic disorders are defined as diseases caused by rearrangements of the genome incited by a genomic architecture that conveys instability. Y-chromosome-related dysfunctions are frequently associated with gross DNA rearrangements resulting from the chromosome's peculiar genomic architecture, making the Y chromosome a useful model for studying genomic disorders. The Y chromosome has evolved into a highly specialized chromosome that performs male functions, mainly spermatogenesis. Direct and inverted repeats, some of them palindromes with highly identical nucleotide sequences, characterize the genomic structure of the Y-chromosome long arm. Certain Y-chromosome deletions can cause spermatogenic failure, likely because they remove one or more transcriptional units with a role in spermatogenesis. The potential phenotypic consequences of genomic duplications at these same loci remain virtually unexplored. This review summarizes the mechanisms underlying the formation of human genomic rearrangements in general, with a special focus on Y-chromosome deletions associated with male infertility, an important genomic disorder that is not yet fully understood.

The concept of genomic disorders

Genomic disorders are diseases in which the molecular mechanism underlying pathogenicity is a DNA rearrangement rather than a point mutation, i.e. a copy number variant (CNV) resulting from a deletion or duplication rather than a single nucleotide variant (SNV). Ultimately, the DNA rearrangement results from genomic instability incited by the local genomic architecture (Lupski 1998). The concept of genomic disorders was delineated from experimental observations on two autosomal dominant diseases: Charcot-Marie-Tooth disease type 1A [CMT1A (MIM 118220)] (Lupski et al. 1991) and hereditary neuropathy with liability to pressure palsies [HNPP (MIM 162500)] (Chance et al. 1993). Both diseases arise from genomic rearrangements of the same locus at 17p12: a 1.4 Mb duplication causes CMT1A, whereas the reciprocal 1.4 Mb deletion causes HNPP. Genomic disorders can be either inherited or sporadic, depending on whether the rearrangement was transmitted through the germ line or occurred de novo.

Genomic rearrangements resulting in CNVs can be responsible for sporadic traits including birth defects (Lu et al. 2008); for Mendelian diseases, such as familial juvenile nephronophthisis 1 [NPHP1 (MIM 256100)] and neurofibromatosis type 1 [NF1 (MIM 162200)]; and for complex traits such as Parkinson disease [PD (MIM 168600)] (Singleton et al. 2003), obesity (Bochukova et al. 2010), autism (MIM 209850) (Ben-Shachar et al. 2009) or autism spectrum disorders (Sharp et al. 2007), and schizophrenia and other neuropsychiatric disorders (International Schizophrenia Consortium 2008; Stefansson et al. 2008). The average estimated rate of de novo locus-specific point mutations is ~2 × 10⁻⁸ new mutations per locus per haploid genome. In contrast, the average estimated rate of de novo genomic rearrangements is ~10⁻⁶ to 10⁻⁴, based on locus-specific data from pooled sperm PCR assays (Turner et al. 2008) and on prevalence estimates for some particular genomic disorders (Lupski 2007).
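To put the two rate estimates side by side, a back-of-the-envelope Python sketch can compute the fold difference and the expected number of de novo events among a given number of haploid genomes (for example, sperm). It uses only the rates quoted above. The function names are illustrative, and treating each rate as a simple per-genome probability is a simplifying assumption.

```python
# Per-locus, per-haploid-genome de novo rates quoted in the text.
POINT_MUTATION_RATE = 2e-8
REARRANGEMENT_RATE_LOW = 1e-6
REARRANGEMENT_RATE_HIGH = 1e-4


def fold_excess(rearrangement_rate: float, point_rate: float = POINT_MUTATION_RATE) -> float:
    """How many times more often a rearrangement arises than a point mutation at a locus."""
    return rearrangement_rate / point_rate


def expected_events(rate: float, haploid_genomes: int) -> float:
    """Expected de novo events at one locus among N haploid genomes.

    Assumption: the rate is treated as an independent per-genome probability.
    """
    return rate * haploid_genomes


if __name__ == "__main__":
    for rate in (REARRANGEMENT_RATE_LOW, REARRANGEMENT_RATE_HIGH):
        print(f"rate {rate:.0e}: ~{fold_excess(rate):.0f}-fold above point mutations")
    # Expected events at a single locus per one million sperm screened.
    print(expected_events(POINT_MUTATION_RATE, 1_000_000))
    print(expected_events(REARRANGEMENT_RATE_LOW, 1_000_000))
```

With these figures, de novo rearrangements at a locus arise roughly 50 to 5,000 times more often than point mutations at that locus.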

The instability of the human genome can be caused by the ubiquitous presence of repeats. These include low copy repeats (LCRs), also called segmental duplications (SDs), and repetitive sequences such as short interspersed nuclear elements (SINEs) and long interspersed nuclear elements (LINEs). LCRs are intrachromosomal duplications of ~10-400 kb with ≥97% sequence identity that probably arose by duplication of genomic segments, resulting in paralogous regions (Stankiewicz and Lupski 2002). It has been estimated that 5.4% of the human genome is duplicated according to specific definitional criteria (≥1 kb and ≥90% identity) (Bailey and Eichler 2006). Repetitive sequences, by contrast, constitute a different class of repeats, first detected experimentally by Britten and Kohne (Britten and Kohne 1968) through the reassociation kinetics of single-stranded DNA. LCRs are usually the substrate for the best-characterized mechanism underlying genomic rearrangements described to date: non-allelic homologous recombination (NAHR) (Stankiewicz and Lupski 2002). NAHR results from the alignment of two highly similar non-allelic (paralogous) LCRs, usually >10 kb, followed by homologous recombination (HR), resulting in loss, gain or inversion of the segment in between; gene conversion can also accompany the process. The crossover product is a junction fragment. It can be detected by Southern blot or pulsed-field gel electrophoresis (PFGE) if the junction is larger than 10 kb, or by long-range PCR if it is smaller than 10 kb (Reiter et al. 1996; Chen et al. 1997; Shaw et al. 2002; Bi et al. 2003). NAHR events are biased toward regions containing LCRs, which are especially frequent in pericentromeric and subtelomeric regions of the human genome (Bailey and Eichler 2006). NAHR can also occur between different chromosomes (Ou et al. submitted). SINEs and LINEs can also be substrates for NAHR (Shaw and Lupski 2004), but these elements can potentially stimulate alternative, non-homologous recombination (NHR) mechanisms as well (Zhang et al. 2009a).

Particular features of LCRs, such as size, level of DNA sequence similarity, orientation and distance, are important factors influencing the probability that a given LCR pair will act as a substrate for NAHR. For example, longer LCRs seem to be able to produce larger rearrangements (Lupski 1998). NAHR also seems to require a minimal length of uninterrupted DNA sequence identity, the minimal efficient processing segment (MEPS), which likely represents the minimal homology requirement for HR. Experimental data suggest that the MEPS in humans is 300-500 bp long, although mitotic and meiotic NAHR might have different MEPS requirements (Stankiewicz and Lupski 2002; Gu et al. 2008).
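The MEPS idea can be made concrete. What matters is not the overall identity between two LCRs but the longest stretch of uninterrupted identity. The sketch below makes two simplifying assumptions: the two paralogous copies are already aligned gap-free and have equal length, whereas real analyses use proper alignments. Its 300 bp default is also an assumption, taken from the low end of the 300-500 bp range quoted above. The sequences are made up.

```python
def longest_identical_run(seq_a: str, seq_b: str) -> int:
    """Longest run of identical positions between two gap-free, equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    best = current = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == base_b:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a mismatch interrupts the stretch of identity
    return best


def meets_meps(seq_a: str, seq_b: str, meps_bp: int = 300) -> bool:
    """True if the pair shares at least one uninterrupted identical segment of >= meps_bp.

    The 300 bp default is an assumption (low end of the quoted 300-500 bp range).
    """
    return longest_identical_run(seq_a, seq_b) >= meps_bp


if __name__ == "__main__":
    # Toy 1,000 bp paralogs (made-up sequence) that differ at positions 200 and 600.
    copy_1 = "ACGT" * 250
    copy_2 = copy_1[:200] + "T" + copy_1[201:600] + "T" + copy_1[601:]
    print(longest_identical_run(copy_1, copy_2))  # runs of 200, 399 and 399 bp
    print(meets_meps(copy_1, copy_2))
```

Under this sketch, take a 1,000 bp pair with a mismatch every 250 bp. It would still be more than 99.5% identical overall, yet it would fail a 300 bp threshold. This is why uninterrupted identity, rather than overall identity, is the relevant quantity.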

LCRs can also stimulate rearrangements within or near them rather than mediate them (Inoue et al. 2002; Woodward et al. 2005; Lee et al. 2006; Lee et al. 2007; Carvalho et al. 2009), although the mechanism involved may differ from NAHR. Inoue et al. (Inoue et al. 2002), Woodward et al. (Woodward et al. 2005) and Lee et al. (Lee et al. 2006; Lee et al. 2007) studied the deletions and duplications that encompass the PLP1 gene at Xq22 and cause Pelizaeus-Merzbacher disease [PMD (MIM 312080)]. They observed a remarkable grouping of one of the breakpoints of these rearrangements. These authors showed that the genomic region surrounding PLP1 is laden with LCRs of different sizes, levels of identity and orientations; interestingly, most of the breakpoints mapped within or flanking such LCRs. Carvalho et al. (Carvalho et al. 2009) performed a Monte Carlo simulation using breakpoint data from 30 DNA samples carrying MECP2 duplications. They showed statistically that the distal breakpoints were not randomly distributed but mapped within or flanking the adjacent LCRs. This nonrandom grouping of one end of the nonrecurrent rearrangements suggested an association with these particular features of genomic architecture.
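The logic of a Monte Carlo test for breakpoint clustering can be sketched as follows. It is in the spirit of the analysis by Carvalho et al., but it does not reproduce it. The test counts how many observed breakpoints fall within or near LCRs, then asks how often uniformly placed random breakpoints do at least as well. Several elements are assumptions of this illustration and are not taken from that study: uniform placement as the null model, the `flank` window used to model "flanking", and every coordinate below, which is hypothetical.

```python
import random
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, half-open


def hits(positions: Sequence[int], lcrs: Sequence[Interval], flank: int = 0) -> int:
    """Number of positions falling inside an LCR or within `flank` bp of one."""
    return sum(
        any(start - flank <= pos < end + flank for start, end in lcrs) for pos in positions
    )


def monte_carlo_p(
    observed: Sequence[int],
    lcrs: Sequence[Interval],
    region: Interval,
    flank: int = 0,
    n_sim: int = 10_000,
    seed: int = 1,
) -> float:
    """Empirical P(uniform random breakpoints hit LCRs at least as often as observed)."""
    rng = random.Random(seed)
    observed_hits = hits(observed, lcrs, flank)
    start, end = region
    extreme = 0
    for _ in range(n_sim):
        # Null model (assumption): each breakpoint placed uniformly in the region.
        simulated = [rng.randrange(start, end) for _ in observed]
        if hits(simulated, lcrs, flank) >= observed_hits:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    region = (0, 1_000_000)                          # hypothetical 1 Mb interval
    lcrs = [(300_000, 320_000), (700_000, 711_000)]  # hypothetical LCRs (~3% of region)
    breakpoints = [301_500, 305_000, 310_200, 702_000, 650_000, 708_800]  # hypothetical
    print(monte_carlo_p(breakpoints, lcrs, region, flank=5_000))
```

A small p-value indicates that breakpoints coincide with LCRs more often than uniform placement would predict. In a real analysis, the width of the flanking window and the choice of null model would need to be justified from the biology.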

The genomic instability associated with LCRs at the breakpoints of nonrecurrent rearrangements might be explained in some cases by the formation of non-canonical (non-B) DNA structures, such as cruciforms and hairpins. Non-B DNA conformations seem to interfere with replication and cause replication forks to stall. Voineagu et al. (Voineagu et al. 2008) assayed replication intermediates in bacteria, yeast and mammalian cells. They observed that hairpin structures formed by inverted repeats (e.g. Alu repeats) stall the replication fork during lagging-strand synthesis in vivo. The role of fork stalling in the generation of rearrangements remains to be studied. Nevertheless, this in vivo demonstration that inverted repeats can form hairpins and stall the replisome may be important for replicative mechanisms such as Fork Stalling and Template Switching (FoSTeS) and Microhomology-Mediated Break-Induced Replication (MMBIR). FoSTeS is based on the phenomenology observed at the breakpoints of nonrecurrent rearrangements that cause genomic disorders. It proposes that, after the replication fork stalls because of a secondary structure on the lagging-strand template, the 3' end of a recently synthesized fragment (an Okazaki fragment) may be released from its template. This end may then switch to another nearby replication fork and prime DNA synthesis there, often resulting in complex rearrangements (Slack et al. 2006; Lee et al. 2007; Zhang et al. 2009a). Depending on the relative position of the other replication fork on the chromosome, the resulting product can carry deletions, duplications, inversions and/or complex rearrangements. MMBIR provides mechanistic detail based on experimental observations in humans, bacteria (E. coli) and yeast. It proposes that a collapsed replication fork, which yields a one-ended double-strand break, is processed to expose a 3' end. This end anneals with another replication fork, priming new synthesis and initiating break-induced replication (BIR). The process uses a poorly processive DNA polymerase, potentially resulting in multiple fork collapses and template switches that produce complex rearrangements (Hastings et al. 2009; Hastings et al. 2009).

In addition to its potential role in perturbing the replicative machinery, non-B DNA can induce double-strand breaks (DSBs) in the genome, ultimately leading to genomic rearrangements. Bacolla et al. (Bacolla et al. 2004) showed that the end points of gross deletions in humans map to regions able to form non-B DNA. Wang and Vasquez (Wang and Vasquez 2004) took an H-DNA-forming sequence (H-DNA being a type of non-B DNA) from the breakage hot spot in the promoter of the human proto-oncogene c-MYC and introduced it into shuttle vectors. They found that the sequence was highly mutagenic and able to induce DSBs in mammalian cells. Further, Wang et al. (Wang et al. 2008) constructed transgenic mice carrying a genomic region susceptible to the H-DNA conformation plus a region susceptible to Z-DNA formation. They found a high level of genetic instability in mouse strains carrying the transgenic non-B DNA compared with those carrying canonical B-DNA, supporting the idea that such regions are highly mutagenic. The exact mechanism by which a DSB is repaired may vary according to the cell-cycle stage and the macromolecular process (transcription, replication, etc.) that the particular DNA region is undergoing. DSBs are frequently repaired by non-homologous end joining (NHEJ), a non-homologous, non-replicative mechanism also responsible for generating antigen-receptor diversity in the immune system (Lieber 2008). NHEJ is the third mechanism described in this review. Importantly, if nicks occur during DNA replication, the replication fork may collapse and produce a one-ended DSB (as opposed to a two-ended DSB). Such a break might be repaired not by NHEJ or NAHR but by replication-based mechanisms such as FoSTeS/MMBIR.

Genomic rearrangements can be classified as recurrent or nonrecurrent, and this classification can point to the underlying mechanism. Recurrent rearrangements share the same size and genomic content, with breakpoints that cluster in hot spots within the LCRs flanking the altered region. Such LCRs both stimulate and mediate the rearrangement, the latter by providing paralogous substrates for recombination. This implicates NAHR as the mechanism for recurrent rearrangements whose breakpoints cluster in LCRs (Gu et al. 2008). Nonrecurrent rearrangements vary in size and genomic content and show no breakpoint clustering, although LCRs mapping at or close to the breakpoints are frequently found to "group" at one end. Repetitive sequences such as SINEs and LINEs are often observed at the breakpoints as well, especially in deletions (Vissers et al. 2009). NAHR between repetitive sequences can explain some cases in which the breakpoints map within two highly identical repetitive elements and share extensive homology. In several cases, however, the junction shows only limited microhomology, far too short to support HR, and the recombinant product does not preserve the structure of the repetitive element involved; both observations rule out NAHR as the mechanism. In those cases other mechanisms, such as FoSTeS/MMBIR and NHEJ, may be involved (Zhang et al. 2009a).

Establishing the rules for NAHR enabled definition of novel genomic disorders

Rearrangements involving the 17p12-17p11.2 genomic interval played a fundamental role in unveiling important features of NAHR. Four genomic disorders map to this region: CMT1A, HNPP, Smith-Magenis syndrome (SMS) and Potocki-Lupski syndrome (PTLS). CMT1A is a distal symmetric polyneuropathy (England et al. 2009) caused by a 1.4 Mb duplication mediated by the distal and proximal CMT1A-REP repeats (Lupski et al. 1991). HNPP, a milder condition characterized by susceptibility to asymmetric neuropathy, results from the reciprocal deletion of the same genomic segment (Chance et al. 1993). SMS is a multiple congenital anomaly/mental retardation syndrome with obesity, sleep disturbance and behavioral abnormalities. It is caused by a recurrent 3.7 Mb deletion generated by NAHR between two LCRs (Chen et al. 1997; Park et al. 2002). PTLS is due to the reciprocal duplication and manifests as neurobehavioral abnormalities, including features of autism (Potocki et al. 2000; Bi et al. 2003; Potocki et al. 2007; Treadwell-Deering et al.). Both recurrent and nonrecurrent rearrangements, leading to disease as well as to polymorphisms, are common in 17p12-17p11.2. The recurrent ones include both common rearrangements and uncommon ones that result from alternative, more distantly spaced LCRs. This is likely because the region has a complex genomic architecture laden with LCRs in direct and inverted orientation (Stankiewicz et al. 2003; Shaw and Lupski 2004; Shaw et al. 2004; Stankiewicz et al. 2004; Carvalho and Lupski 2008; Carvalho et al. 2010; Zhang et al. 2010; Zhang et al. In press).

Analysis of breakpoint sequences in the genomic disorders involving 17p12-17p11.2 documented positional preferences, or recombination hot spots, for crossovers within the LCRs (Reiter et al. 1996; Bi et al. 2003; Lupski 2004). Recently, a homologous recombination (HR) "hot spot motif" was proposed in humans. Remarkably, this motif coincides with breakpoints of rearrangements generated by NAHR, suggesting that allelic homologous recombination (AHR) hot spots can coincide with NAHR hot spots (Lindsay et al. 2006; Myers et al. 2008; Zhang et al. 2010). NAHR can be interchromosomal, intrachromosomal or intrachromatid, depending on the location of the LCRs involved in a particular rearrangement. Interchromosomal and intrachromosomal NAHR can produce both deletions and duplications, whereas intrachromatid NAHR can only produce deletions (Stankiewicz and Lupski 2002). Whether the product is a duplication, deletion, inversion or translocation also depends on the orientation of the LCRs (Lupski 1998; Stankiewicz and Lupski 2002; Ou et al. submitted). In the male germline, de novo recurrent deletions on autosomes occur about twice as often as duplications (Turner et al. 2008).
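The dependence of NAHR products on LCR orientation, and on whether recombination occurs between or within chromatids, can be illustrated with a toy model. In this model a chromosome is simply a list of labelled segments. The segment names (A, B, C, LCR1, LCR2) and the "LCR*" label for a recombinant copy are hypothetical. The model ignores gene conversion, the exact crossover position within the LCR and real sequence, so it is a sketch rather than a model of any specific locus.

```python
from typing import List, Tuple

Chromosome = List[str]
HYBRID = "LCR*"  # hypothetical label for the recombinant (hybrid) LCR copy


def flip(segment: str) -> str:
    """Reverse the orientation of one segment ('B' <-> '-B')."""
    return segment[1:] if segment.startswith("-") else "-" + segment


def invert(segments: Chromosome) -> Chromosome:
    """Invert a run of segments: reverse their order and flip each one."""
    return [flip(s) for s in reversed(segments)]


def interchromatid_direct(chrom: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """NAHR between direct LCRs at positions i < j on two misaligned identical
    chromatids (or homologs): returns the (duplication, deletion) products."""
    duplication = chrom[:j] + [HYBRID] + chrom[i + 1:]
    deletion = chrom[:i] + [HYBRID] + chrom[j + 1:]
    return duplication, deletion


def intrachromatid_direct(chrom: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Looping-out within one chromatid between direct LCRs: a deletion plus an
    acentric circle carrying the intervening segment, which is lost."""
    deletion = chrom[:i] + [HYBRID] + chrom[j + 1:]
    excised_circle = [HYBRID] + chrom[i + 1:j]
    return deletion, excised_circle


def intrachromatid_inverted(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Crossover between inverted LCRs within one chromatid: the segment between
    them is inverted and both LCR copies become hybrids."""
    return chrom[:i] + [HYBRID] + invert(chrom[i + 1:j]) + [HYBRID] + chrom[j + 1:]


if __name__ == "__main__":
    direct = ["A", "LCR1", "B", "LCR2", "C"]
    inverted = ["A", "LCR1", "B", "-LCR2", "C"]
    print(interchromatid_direct(direct, 1, 3))
    print(intrachromatid_direct(direct, 1, 3))
    print(intrachromatid_inverted(inverted, 1, 3))
```

In this toy model, only the between-chromatid case yields a duplication. The within-chromatid case gives a deletion with direct LCRs and an inversion with inverted ones.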

These experimental data allowed the delineation of rules for how and where in the genome NAHR can take place (Lupski 1998; Stankiewicz and Lupski 2002). Sharp et al. (Sharp et al. 2006) used those rules to construct a BAC array targeting genomic regions containing potential "NAHR-mediated rearrangement hot spots". These were defined by the presence of paired SDs ≥10 kb in length with ≥95% sequence identity, separated by 50 kb to 5 Mb. The array was used to search for pathogenic copy-number variation in a group of patients with severe clinical phenotypes, including mental retardation and dysmorphic features. In doing so, the authors detected five microdeletions (at 17q21.31, 17q12, 15q24, 15q13.3 and 1q21.1), unveiling the molecular cause of five novel genomic disorders.
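The selection criteria used by Sharp et al. translate directly into a filter over a table of segmental-duplication pairs. The sketch below makes several assumptions:

- It uses a hypothetical `SDPair` record (coordinates, identity) rather than any particular database format.
- It takes the shorter copy as the pair's length.
- It interprets "distance" as the gap between the two copies.

The coordinates in the example are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SDPair:
    """Hypothetical record for two paralogous copies on the same chromosome."""
    chrom: str
    start_a: int  # bp, first copy (half-open)
    end_a: int
    start_b: int  # bp, second copy; assumed to lie downstream of the first
    end_b: int
    identity: float  # fraction, e.g. 0.97

    @property
    def length(self) -> int:
        # Assumption: the shorter copy defines the pair's length.
        return min(self.end_a - self.start_a, self.end_b - self.start_b)

    @property
    def spacing(self) -> int:
        # Assumption: distance = gap between the end of copy A and the start of copy B.
        return self.start_b - self.end_a


def nahr_hotspot_candidates(
    pairs: Iterable[SDPair],
    min_length: int = 10_000,
    min_identity: float = 0.95,
    min_spacing: int = 50_000,
    max_spacing: int = 5_000_000,
) -> List[SDPair]:
    """Keep SD pairs meeting the size, identity and spacing thresholds quoted in the text."""
    return [
        p for p in pairs
        if p.length >= min_length
        and p.identity >= min_identity
        and min_spacing <= p.spacing <= max_spacing
    ]


if __name__ == "__main__":
    example = [
        SDPair("chr17", 1_000_000, 1_024_000, 2_400_000, 2_424_000, 0.98),  # passes
        SDPair("chr17", 3_000_000, 3_005_000, 3_200_000, 3_205_000, 0.99),  # too short
        SDPair("chr15", 0, 20_000, 30_000, 50_000, 0.97),                   # too close
        SDPair("chr1", 0, 50_000, 1_000_000, 1_050_000, 0.92),              # too divergent
    ]
    for pair in nahr_hotspot_candidates(example):
        print(pair)
```

Keeping the thresholds as keyword arguments makes it easy to explore how stricter or looser definitions change the candidate set.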

How genomic disorders can convey a phenotype

Genomic rearrangements can convey a phenotype through different molecular mechanisms, e.g. gene dosage, gene interruption, gene fusion, position effects, unmasking of recessive alleles or functional polymorphisms, and even potential transvection effects (Lupski and Stankiewicz 2005; Zhang et al. 2009b). Changes in the copy number of a dosage-sensitive gene can produce a clinical phenotype. For example, PMP22, which encodes peripheral myelin protein 22, causes CMT1A when overexpressed (Lupski et al. 1991; Lupski and Weinstock 1992) and HNPP when underexpressed (Chance et al. 1993). Interestingly, duplication CNVs of the region upstream of PMP22 are postulated to perturb PMP22 expression, as they are associated with a CMT neuropathy phenotype (Weterman et al. 2010; Zhang et al. In press). Thus, CNVs that do not include a gene's coding sequence may still alter its expression. Gene interruption can occur when a rearrangement breakpoint maps within a gene and leads to loss of function; e.g. inversions disrupting the factor VIII (F8) gene cause hemophilia A [HEMA (MIM 306700)] (Lakich et al. 1993). Gene fusion can also result from rearrangements if the breakpoint joins exons of two otherwise separate genes; such a mechanism can be common in cancer (Wang et al. 2009). Position effects can alter the expression or regulation of a gene near the breakpoint through the disruption or addition of a regulatory region. For example, a 250 kb deletion upstream of NR0B1 (which encodes DAX1) was found in patients with male-to-female sex reversal (Smyk et al. 2007). Kurotaki et al. (Kurotaki et al. 2005) studied patients with Sotos syndrome who carry the recurrent 2 Mb deletion. In these patients they reported unmasking of a functional polymorphism in the gene encoding plasma coagulation factor XII.

Male infertility

Infertility affects approximately one in 10 couples of reproductive age (de Kretser 1997). Male infertility alone can account for as many as 20% of cases (de Kretser 1997). The etiological factors underlying male infertility fall into three general classes (de Kretser 1997):

- pre-testicular, such as endocrine and coital disorders;
- testicular, including abnormalities of sperm production;
- post-testicular, such as obstructions.

Chromosomal alterations are highly prevalent in spermatogenic impairment, and their frequency increases with decreasing sperm count. Chromosomal aberrations have been reported in 4.6% of oligozoospermic males and in up to 10-15% of azoospermic males (Van Assche et al. 1996; Lidegaard et al. 1998). Intriguingly, some infertile males have a normal karyotype but are clinically diagnosed with oligoasthenoteratozoospermia (low sperm count with a high percentage of slow-moving and abnormal sperm). Their sperm show a more than tenfold increase in diverse chromosomal abnormalities, including diploidy, autosomal disomy and autosomal nullisomy (Van Assche et al. 1996; Pang et al. 1999). An important cause of spermatogenic failure is structural abnormality of the Y chromosome, as discussed below.

Is male infertility a genomic disorder? Y chromosome rearrangements are an important cause of male infertility

The association between male infertility and the absence of genes that map to the long arm of the Y chromosome was established by Tiepolo and Zuffardi in 1976 (Tiepolo and Zuffardi 1976). They reported that 0.5% of idiopathic subfertile individuals carry cytogenetically visible deletions of Yq11. The authors termed the missing locus "Azoospermia Factor (AZF)". Later, several studies using both karyotyping and molecular techniques confirmed the association between deletions involving Yq11 and male infertility (Andersson et al. 1988; Chandley et al. 1989). Further, molecular markers such as Y-chromosome locus-specific probes and sequence-tagged sites (STSs) enabled the detection of small, non-overlapping interstitial deletions. This work culminated in the definition of three AZF regions at Yq11 (Azfa, Azfb, Azfc) (Ma et al. 1992; Vogt et al. 1992; Reijo et al. 1995; Vogt et al. 1996) (figure 1). Males with Y-chromosome microdeletions most frequently have non-obstructive idiopathic azoospermia (absence of sperm in semen) or severe oligozoospermia. However, other factors (such as age) may influence the phenotype, as suggested by rare familial cases reported in the literature (Reijo et al. 1995; Gatta et al. 2002; Kuhnert et al. 2004; Krausz et al. 2006; Luddi et al. 2009). The testicular histology of these patients is also variable. It ranges from a rare complete absence of germ cells (Sertoli-cell-only syndrome, SCO) to germ cells arrested at different meiotic stages, occasionally with production of mature sperm (Reijo et al. 1995). Such phenotypic variation makes genotype-phenotype correlation challenging. Each of the three regions carries a number of candidate male-infertility genes. With the exception of Azfa, however, no point mutations causing infertility have been reported thus far in Azfb or Azfc genes.

A 10-year study of 1997 infertile men in Italy reported a prevalence of Y-chromosome microdeletions of 3.2% in unselected infertile males. The prevalence rose to 8.3% in males with non-obstructive azoospermia and 5.5% in males with severe oligozoospermia (Ferlin et al. 2007). Other studies reported a frequency of de novo AZF deletions of ~13% in males with idiopathic azoospermia (Reijo et al. 1995) and 6-8% in severely oligozoospermic males (Walsh et al. 2009). Among men with Y-chromosome microdeletions and severe oligozoospermia or idiopathic azoospermia, approximately 60% have Azfc deletions, compared with 16% with Azfb deletions and <5% with Azfa deletions (Walsh et al. 2009). The genomic architecture at each azoospermia factor locus likely plays a role in the recurrence rate of each rearrangement. For instance, the LCRs underlying Azfa deletions are 10-12 kb in size, share 94% identity and lie ~790 kb apart. By contrast, those underlying Azfc deletions are 229 kb in size, share 99.9% nucleotide identity and lie ~3.5 Mb apart. These large differences in deletion frequency might reflect differences in the efficiency of ectopic homologous recombination (NAHR). NAHR efficiency is known to correlate positively with the size of the LCRs involved and to be influenced by the physical distance between them (Lupski 1998; Stankiewicz and Lupski 2002).

Azfa

Deletions encompassing only the Azfa region are rarely found among infertile males and are associated with a more severe phenotype, such as SCO syndrome (Kamp et al. 2001). These recurrent deletions are caused by intrachromosomal NAHR between two 10-12 kb blocks of HERV15 retroviral sequence located in proximal Yq11. The two blocks share an average identity of 94% and lie ~790 kb apart (Blanco et al. 2000; Kamp et al. 2000; Sun et al. 2000). Smaller nonrecurrent deletions encompassing parts of Azfa and associated with male infertility are well documented (Foresta et al. 2000; Kamp et al. 2001; Ferlin et al. 2007). These deletions support the hypothesis that more than one gene with a potential role in spermatogenesis maps to that region. Two widely expressed genes are the primary candidates for the infertility phenotype: USP9Y (ubiquitin-specific protease 9) and DDX3Y (DEAD/H box polypeptide). USP9Y is thus far the only gene in which a mutation, producing a truncated protein, has been identified as leading to azoospermia (Sun et al. 1999). However, two studies reported transmission of deletions spanning USP9Y by fertile fathers, suggesting that the gene might not be as essential for spermatogenesis as previously thought (Krausz et al. 2006; Luddi et al. 2009).

Azfb

Azfb was the last azoospermia factor to be defined in terms of deletion extent, genomic architecture and mechanism of formation. Azfb was initially considered a distinct deletion interval on Yq11, responsible for male infertility in a fraction of idiopathic infertile males (Ferlin et al. 2003). In 2002, however, Repping et al. (Repping et al. 2002) demonstrated that the Azfb segment is laden with palindromic amplicons, as previously shown for the neighboring Azfc region (figure 1). In fact, deletions spanning Azfb result from homologous recombination between amplicon P5, within Azfb, and one of two minipalindromes, P1.1 or P1.2, within Azfc. This produces large deletions of 6.2 to 7.7 Mb. Therefore, what had been defined as a discrete region, the so-called Azfb locus, turned out to overlap its neighbor, Azfc. The Azfb deletion removes 1.5 Mb of the Azfc region, including two copies of the main Azfc candidate gene, DAZ. This deletion was renamed after the palindromes underlying the rearrangement, P5/proximal-P1. Another recurrent deletion, previously called Azfb + Azfc, was renamed P5/distal-P1. Additionally, an uncommon deletion, apparently not reported before, was detected and named P4/distal-P1 (Repping et al. 2002). The authors characterized the breakpoint junctions of most of the deletions. On this basis, they defined a 25-30 kb "hot spot" for the recurrent homologous recombination event within each of the palindromes flanking the rearrangements. Interestingly, 78% (7/9) of the deletions were generated by NAHR, but 22% (2/9) were produced by a nonhomologous mechanism. These constitute the first formal example of such events on the Y chromosome (Repping et al. 2002).

Azfc

Azfc is the most frequently deleted region among infertile males; Azfc deletions are estimated to arise de novo in approximately 1 in 4000 males (Kuroda-Kawaguchi et al. 2001). The 4.5 Mb genomic segment that includes Azfc at Yq11 consists of six families of amplicons (paralogous sequences, or LCRs). These are organized as inverted repeats (including palindromes or quasi-palindromes) and direct repeats, which together comprise 93% of the region (Kuroda-Kawaguchi et al. 2001) (figure 1). The amplicon units range from 115 kb to 678 kb, with nucleotide sequence identity varying from 99.82% to 99.98%. Most Azfc deletions seem to be recurrent and span ~3.5 Mb of Yq11. Two LCRs, b2 and b4, each 229 kb in size and sharing 99.9% nucleotide identity, map in direct orientation at the proximal and distal breakpoints of Azfc (Kuroda-Kawaguchi et al. 2001). All of these features support NAHR as the major mechanism for formation of Azfc deletions (Kuroda-Kawaguchi et al. 2001).

DAZ (Deleted in Azoospermia) is the primary candidate gene in Azfc. It encodes an RNA-binding protein that is transcribed in the adult testis and expressed exclusively in germ cells (Reijo et al. 1995). Saxena et al. (Saxena et al. 2000) used the Y chromosome of a single man as a reference and reported at least four DAZ copies with different numbers of intragenic tandem repeats. All of these copies map to the Azfc region, organized in two blocks, each comprising an inverted pair of DAZ genes. DAZ has an autosomal homolog on chromosome 3, DAZL. It belongs to the same gene family as the Drosophila boule gene, mutations in which cause spermatogenic failure in the fruit fly. Recently, DAZ was shown to promote germ cell progression and the formation of haploid germ cells (Kee et al. 2009), which supports its proposed role as one of the "azoospermia factors". Interestingly, the Y-linked DAZ genes are a recent acquisition in primates (humans and Old World monkeys) (Saxena et al. 2000). Mulhall et al. (Mulhall et al. 1997) proposed that the DAZ cluster is involved preferentially in the quantitative rather than the qualitative aspects of sperm production.

TSPY1 copy number and male infertility

TSPY is the testis-specific protein Y-encoded 1 gene. A role for its copy number in male infertility was first suggested because the gene was isolated from human testis and shown to be expressed in chimpanzee testis as well (Arnemann et al. 1987; Zhang et al. 1992). TSPY maps to Yp11.2 (figure 1). The locus consists of an array of tandemly repeated units of approximately 20 kb each, with high copy-number variability among individuals, probably generated through frequent ectopic recombination (Tyler-Smith et al. 1988). The TSPY array ranges in size from 23 to 64 units (0.47 to 1.3 Mb), with a median of 32 units (0.65 Mb) (Repping et al. 2006). The range of copy numbers nonetheless appears limited, supporting the contention that the locus has been under selective constraint to maintain a certain number of units in the human Y chromosome (Repping et al. 2006). Some studies suggest an association between a low TSPY copy number and male infertility due to lower sperm count (Vodicka et al. 2007; Giachini et al. 2009). TSPY is one of the gonadoblastoma candidate genes (Tsuchiya et al. 1995) and may be involved in the development of other types of cancer (reviewed in Lau et al. 2009).
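The relationship between unit count and array length is a simple multiplication. The sketch below assumes a uniform unit length of 20 kb; the text gives "approximately 20 kb", and real units may vary. Under that assumption, its products differ slightly from the published Mb figures. It therefore also back-calculates the unit length implied by each pair of published numbers. The function names are illustrative.

```python
TSPY_UNIT_KB = 20.0  # assumption: uniform unit length ("approximately 20 kb" in the text)


def array_length_mb(units: int, unit_kb: float = TSPY_UNIT_KB) -> float:
    """Approximate length of a tandem array of identical units, in Mb."""
    return units * unit_kb / 1000


def implied_unit_kb(array_mb: float, units: int) -> float:
    """Unit length implied by a reported array length and unit count, in kb."""
    return array_mb * 1000 / units


if __name__ == "__main__":
    # (units, reported Mb) pairs quoted in the text: minimum, median, maximum.
    for units, reported_mb in [(23, 0.47), (32, 0.65), (64, 1.3)]:
        print(units, round(array_length_mb(units), 2), round(implied_unit_kb(reported_mb, units), 1))
```

The back-calculated unit length comes out at roughly 20.3-20.4 kb for all three published pairs, consistent with the "approximately 20 kb" unit size.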

Are partial microdeletions in Yq11 a risk factor for male infertility?

In 1986, Vergnaud et al. (Vergnaud et al. 1986) used DNA samples from 27 individuals carrying different portions of the Y chromosome to propose a "deletion map". The map was based on hybridization of chromosome-specific probes and on genotype-phenotype correlations. The repetitive nature of the Y chromosome was already known at that time, and the multiple hybridization patterns of some of those probes made ordering the intervals particularly challenging. Later, analysis of the complete Y-chromosome sequence confirmed and extended that picture (Kuroda-Kawaguchi et al. 2001; Tilford et al. 2001; Skaletsky et al. 2003). Palindromes comprise ~25% of the male-specific region of the Y chromosome (MSY). They range from 30 kb to 2.9 Mb in size, and their arms share ≥99.9% nucleotide sequence identity (Skaletsky et al. 2003). Their abundance fosters frequent homologous recombination within the Y chromosome (Y-Y recombination) (Rozen et al. 2003). Through gene conversion, this recombination in turn maintains the high similarity between the inverted repeats.
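A palindrome in this sense consists of two arms that are reverse complements of each other, separated by a spacer. The arm-to-arm identity behind the ≥99.9% figure can be computed by reverse-complementing one arm and comparing the two position by position. The sketch below assumes equal-length, gap-free arms and a known arm length; real MSY palindromes require proper alignment. The function names and the toy sequence are illustrative.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def arm_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of identical positions between the left arm and the reverse
    complement of the right arm (gap-free comparison, an assumption)."""
    mirrored = reverse_complement(right_arm)
    if len(left_arm) != len(mirrored):
        raise ValueError("arms must have equal length in this gap-free sketch")
    matches = sum(a == b for a, b in zip(left_arm.upper(), mirrored.upper()))
    return matches / len(left_arm)


def split_palindrome(seq: str, arm_len: int) -> Tuple[str, str, str]:
    """Split seq into (left arm, spacer, right arm), given an assumed arm length."""
    return seq[:arm_len], seq[arm_len:len(seq) - arm_len], seq[len(seq) - arm_len:]


if __name__ == "__main__":
    arm = "ATGCCGTAAGCT" * 10  # made-up 120 bp arm
    palindrome = arm + "TTTTT" + reverse_complement(arm)
    left, spacer, right = split_palindrome(palindrome, len(arm))
    print(arm_identity(left, right))  # perfect palindrome
```

Because a mismatch in one arm is compared against the mirrored position in the other arm, gene conversion that copies one arm onto the other restores identity at that position. This is the homogenizing effect described above.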

Lange et al. (Lange et al. 2009) proposed a model to explain the formation of isodicentric Y chromosomes. In this model, "palindrome maintenance" is in fact one of the alternative products of the intrachromatid or sister-chromatid homologous recombination that frequently occurs on the long arm of the Y chromosome. A similar model, involving sister-chromatid NAHR between inverted LCRs, had previously been proposed for isochromosome 17q formation (Barbouti et al. 2004; Carvalho and Lupski 2008). Lange et al. (Lange et al. 2009) identified 51 isodicentric (idic) Y chromosomes in patients with clinical abnormalities, including spermatogenic failure. Interestingly, molecular analysis revealed that the idic breakpoints in all cases mapped within palindromes or inverted repeats, involving 8 of the 9 such structures present on the Y-chromosome long arm (Lange et al. 2009). Remarkably, eight of the nine testis-expressed genes identified in the MSY are located within these palindromes (Skaletsky et al. 2003). The biological role of this association, if any, is still speculative. One hypothesis is that the functionality of Y-linked testis genes is conserved through frequent gene conversion. Alternatively, the formation of secondary structures, such as cruciforms, may have a role in regulating the transcription of these genes (Skaletsky et al. 2003).

Because the MSY is rich in repeats in both inverted and direct orientation, its architectural complexity predicts that rearrangements other than those known to cause the AZFa, AZFb and AZFc microdeletions may occur. Repping et al. (Repping et al. 2003) reported partial AZFc deletions (gr/gr and b1/b3, named after the repeats that mediate each rearrangement) as well as duplications. On the basis of association analysis, and given that the gr/gr deletion removes 1.6 Mb of the Y chromosome including nine transcription units with testis-specific expression, they proposed that the gr/gr deletion confers a higher risk of spermatogenic failure. This association was confirmed in some studies (Ferlin et al. 2005; Lynch et al. 2005; Giachini et al. 2008; Visser et al. 2009) but not in others (Machev et al. 2004; Hucklenbroich et al. 2005; Carvalho et al. 2006; Carvalho et al. 2006; Zhang et al. 2006) [see Carvalho and Santos 2005 for a detailed discussion]. Notably, the gr/gr deletion is common in East Asian men (~8.0%) (Zhang et al. 2006) and even more frequent in the Japanese population (~30%), in which it was not associated with a higher infertility risk (Carvalho et al. 2006; Zhang et al. 2006). Another partial AZFc deletion, the 1.8 Mb b2/b3 deletion, is frequently found in Northern Eurasia; as with gr/gr, studies of b2/b3 in different populations have yielded conflicting results regarding its association with infertility risk (Repping et al. 2004; Wu et al. 2007). Possible reasons for these contrasting results include the influence of environmental factors and genetic background on the phenotype; in addition, differences in Y chromosome structure between individuals can confound association studies. For example, Machev et al. (Machev et al. 2004) reported the coexistence of four different gr/gr deletion types within the French population, along with inversions and duplications involving the AZFc region. Repping et al. (Repping et al. 2006) used diverse molecular assays, including PFGE and fluorescence in situ hybridization (FISH), to analyze the genomic structure of 47 Y chromosomes representing the major branches of the Y chromosome genealogy. They observed that 20 of the 47 chromosomes carried structural variation involving AZFc, including one 3.5 Mb duplication (Figure 1). Therefore, detailed molecular knowledge of the structure of the different Y chromosome haplogroups appears to be a prerequisite for assessing the biological importance of partial AZFc deletions.
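Partial deletions named after their mediating repeats, such as gr/gr, and the duplications reported at the same loci are usually explained by NAHR between directly oriented repeats, which produces reciprocal deletion and duplication products. A toy Python sketch of that logic follows. It assumes unequal crossing over between misaligned sister chromatids, with the exchange falling inside two paralogous, directly oriented repeats; the segment labels and the function name are hypothetical, and no real AZFc coordinates or repeat structure are modelled.

```python
# Toy sketch of unequal crossing over (NAHR) between two directly
# oriented paralogous repeats on misaligned sister chromatids.
# Segment labels are hypothetical placeholders, not AZFc coordinates.
from typing import List, Tuple


def unequal_crossover(
    segments: List[str], i: int, j: int
) -> Tuple[List[str], List[str]]:
    """Return (deletion, duplication) for NAHR between the direct repeats at
    indices i < j, assuming the exchange falls inside the repeats, so each
    product carries one hybrid repeat labelled "<proximal>/<distal>"."""
    if not 0 <= i < j < len(segments):
        raise ValueError("need 0 <= i < j < len(segments)")
    left, right = segments[i], segments[j]
    # Repeat i on one chromatid pairs with repeat j on its sister.
    # Deletion: everything before repeat i, a hybrid repeat, then everything
    # after repeat j; the interval between the repeats is lost.
    deletion = segments[:i] + [f"{left}/{right}"] + segments[j + 1:]
    # Duplication: everything before repeat j, a hybrid repeat, then
    # everything after repeat i; the interval is present twice.
    duplication = segments[:j] + [f"{right}/{left}"] + segments[i + 1:]
    return deletion, duplication


if __name__ == "__main__":
    chrom = ["A", "R1", "X", "R2", "B"]
    deletion, duplication = unequal_crossover(chrom, 1, 3)
    print(deletion)     # ['A', 'R1/R2', 'B']
    print(duplication)  # ['A', 'R1', 'X', 'R2/R1', 'X', 'R2', 'B']
```

The interval between the repeats (`X`) is missing from one product and present twice in the other, and each product carries a hybrid repeat. NAHR between inverted repeats, as in the palindromes discussed above, is not covered by this sketch.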

Conclusions

Male infertility is a common clinical phenotype whose cause often remains unknown. Rearrangements involving the Y chromosome, promoted by a genomic architecture laden with directly and inversely oriented LCRs and large palindromes, are an important cause of male infertility. The same structural complexity also predisposes Y chromosomes to structural polymorphism across human populations. The clinical phenotype of males with Y chromosome deletions varies, depending primarily on the region deleted, from a complete absence of germ cells (Sertoli cell-only syndrome) to germ cells arrested at different meiotic stages, occasionally with production of mature sperm. Nevertheless, the roles of the genes within the azoospermia factor regions remain to be elucidated. Genetic background may also contribute to differences in fertility between individuals, since it has been estimated that more than 4000 genes may be involved in human spermatogenesis. Environmental factors, including advanced paternal age, are also likely to influence the phenotype.

Disclosure

J.R.L. is a consultant for Athena Diagnostics and Ion Torrent Systems and holds multiple United States and European patents for DNA diagnostics. Furthermore, the Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from molecular diagnostic testing (Medical Genetics Laboratories).

Legends

Figure 1: Schematic representation of the Y chromosome. In red, pseudoautosomal regions (PAR1 and PAR2); in black, heterochromatic regions; light blue arrows represent palindromes or inverted repeats. (a) Major deletions observed in infertile males: AZFa, P5/proximal-P1 (AZFb), P5/distal-P1, P4/distal-P1 and AZFc. Green bars delimit the regions according to the reference genome (adapted from Repping et al. 2002; Skaletsky et al. 2003). (b) Structural variation observed in Y chromosomes from males worldwide (representing different haplogroups), as reported in the literature. In dark blue, regions that show length variation within the worldwide population; in purple, regions that show other kinds of variation, such as inversions, partial deletions and duplications (adapted from Repping et al. 2006).