11.6 Genetic variation and alternative splicing

In this chapter the relationship between gene expression and genetic variation has been explored, with strong evidence of heritable differences in expression that can be mapped to specific genomic loci, local or distant to the gene encoding the transcript whose abundance has been quantified. But in terms of the transcript, what has been measured? For the majority of human genes, several different mRNAs will be expressed as a result of alternative splicing (Section 11.6.1). The occurrence of different, alternatively spliced isoforms of varying abundance encoded by a given gene adds a significant level of complexity to the analysis of the genetics of gene expression, as such diversity needs to be defined and quantified if the consequences of underlying genetic variation are to be resolved.

Indeed, almost all the expression array datasets reviewed in this chapter so far used probes targeted at the 3' ends of genes. Such array designs give an incomplete picture of the true extent of transcript diversity and abundance as they are not designed to resolve and quantify specific splice isoforms. Published studies have been based on the knowledge and technological platforms available at the time, and the detection and quantification of alternatively spliced isoforms at a genome-wide level remains a very considerable analytical challenge (Johnson et al. 2003; Wang et al. 2003a; Xing et al. 2006; Anton et al. 2008). This has to date largely precluded analysis of the impact of genetic variation on gene expression at the level of the abundance of specific alternatively spliced isoforms. However, the advent of exon-specific microarray platforms (Section 11.6.2) and the future application of 'next generation' sequencing are set to significantly advance this field of research (Salehi-Ashtiani et al. 2008).

11.6.1 Alternative splicing in health and disease

The process of splicing involves identifying and joining together coding exonic sequence in pre-mRNA through a complex process involving the splicing machinery (the spliceosome) and the 'splicing code', which includes consensus splice site sequences at exon-intron boundaries together with cis-regulatory elements to which specific proteins bind (Fig. 11.11) (Wang and Cooper 2007). The latter include enhancer and suppressor elements within intronic and exonic sequences and are important to splice site recognition and regulation, including control of alternative splicing. Current estimates are that between 40% and 70% of human genes show evidence of alternative splicing, with on average four to six alternatively spliced isoforms per gene (Kapranov et al. 2002; Johnson et al. 2003; Wang and Cooper 2007). These alternatively spliced isoforms can differ in many ways, including exon skipping, intron retention, alternative splice site usage, and more complex events (Fig. 11.11C) (Kim et al. 2008).

The process of alternative splicing is critical to our ability to generate proteomic diversity but also modulates gene expression. The latter occurs through isoform variation affecting control of mRNA stability, translation efficiency, and mRNA localization as well as mRNA degradation resulting from the introduction of premature termination codons (Wang and Cooper 2007). Dysregulation of splicing as a result of genetic variation is a major cause of inherited disease and is increasingly recognized to be involved in common complex traits.

There are many different ways in which genetic variation may modulate splicing. Disease may result directly from disruption of the splicing code by cis-acting genetic variants or of the splicing machinery by trans-acting variants (Wang and Cooper 2007). These are relatively common events: between 15% and 60% of disease-causing point mutations may act by affecting splicing (Krawczak et al. 1992; Lopez-Bigas et al. 2005). The complex, often tissue-specific, consequences of dysregulated splicing were highlighted for adult myotonic dystrophy resulting from triplet repeat expansion (Section 7.6.3). As well as causing disease, variants can alter splicing of modifier genes affecting disease severity, as seen for example at the CFTR gene in cystic fibrosis (Section 2.3.1) (Niksic et al. 1999), and susceptibility to common disease (Wang and Cooper 2007). Examples of the latter include IRF5 at chromosome 7q32 encoding interferon regulatory factor 5 and systemic lupus erythematosus (Section 11.6.2); CTLA4 at chromosome 2q33 encoding cytotoxic T lymphocyte-associated protein 4 and autoimmune disease (Ueda et al. 2003); ERBB4 on chromosome 2q34 encoding the neuregulin 1 receptor and schizophrenia (Law et al. 2007); and BTNL2 at chromosome 6p21.3 encoding butyrophilin-like 2 and sarcoidosis (Section 12.8).

11.6.2 Common genetic variation and alternative splicing

A number of studies have highlighted how alternative splicing varies between individuals and shows significant heritability (Hull et al. 2007; Kwan et al. 2007). Insights into the potential relationship with underlying genetic variation have been gained by searching databases of alternatively spliced isoforms (Modrek et al. 2001) for association with transcribed SNPs, which showed that 6-21% of alternatively spliced genes had evidence of complete or relative isoform abundance varying with specific alleles (Nembaware et al. 2004). Variation between unrelated individuals for specific splice events has been investigated for simple cassette exon events in a panel of 22 lymphoblastoid cell lines, showing that consistent differences between individuals occur and could be associated with local SNP diversity (Hull et al. 2007). In this study six exons were resolved that showed variable skipping or inclusion and association with specific SNPs. When further unrelated lymphoblastoid lines were investigated, the SNP genotype accurately predicted the splicing pattern. For five exons, the strongest correlation with splicing pattern was found with the SNP closest to the intron-exon boundary. Intriguingly, four of the six linked SNPs were located within alternative exons.
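The idea that SNP genotype can predict the splicing pattern of a cassette exon can be illustrated with a minimal sketch. It is not the method of Hull et al. (2007): the inclusion fraction (included signal divided by included plus skipped signal), the genotype labels, and all numbers are assumptions made up for illustration.

```python
from collections import defaultdict


def inclusion_fraction(included, skipped):
    """Fraction of transcripts that include the cassette exon (assumed measure)."""
    total = included + skipped
    if total <= 0:
        raise ValueError("total signal must be positive")
    return included / total


def mean_inclusion_by_genotype(samples):
    """Average inclusion fraction for each genotype group.

    samples: iterable of (genotype, included_signal, skipped_signal) tuples,
    one per cell line (hypothetical layout).
    """
    groups = defaultdict(list)
    for genotype, included, skipped in samples:
        groups[genotype].append(inclusion_fraction(included, skipped))
    return {g: sum(values) / len(values) for g, values in groups.items()}


def predict_inclusion(genotype, group_means):
    """Predict inclusion in a new line as the mean of lines sharing its genotype."""
    return group_means[genotype]


# Made-up signals for one exon and one SNP in four cell lines.
training_lines = [
    ("GG", 90.0, 10.0),
    ("GG", 80.0, 20.0),
    ("GA", 50.0, 50.0),
    ("AA", 10.0, 90.0),
]
group_means = mean_inclusion_by_genotype(training_lines)
print(group_means)
print(predict_inclusion("AA", group_means))
```

If the group means are well separated and lines within a group agree closely, genotype alone is a good predictor of the splicing pattern in further unrelated lines.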

[Figure 11.11 (continued). Diagram labels: start codon (AUG); stop codon (UGA); exon; intron; 5' splice site; 3' splice site; branch site; U1 snRNP; regulatory complex; exonic splicing enhancers and suppressors; intronic splicing enhancers; intronic cis-acting elements constituting the splicing code. Alternative splicing events shown: exon skipping, alternative acceptor site, alternative donor site, intron retention.]

Further advances have been made through exon targeted arrays such as the Affymetrix GeneChip Human Exon 1.0 ST array, which allowed expression to be quantified for more than 1 million known and predicted exons using multiple probes to individual exons (Clark et al. 2007). Kwan and colleagues investigated exon level variation in gene expression between two unrelated individuals based on a 'splicing index' that divided expression of a given probe set corresponding to one exon by the sum of expression from probes representing the gene (Kwan et al. 2007). The splicing index was analysed as a quantitative trait in two lymphoblastoid cell lines with most of the observed variation due to individual differences; up to 2.5% of expressed exons were estimated to show differential expression between the two lines. When a small number of validated alternative splicing events were analysed in a three generation CEPH family, evidence of linkage was found with segregation of splicing pattern and associated haplotype in the pedigree (Kwan et al. 2007).
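The splicing index as described above (signal for one exon's probe set divided by the summed signal of the probes representing the gene) can be sketched in a few lines. This follows the verbal definition only; the dictionary layout, exon names, and signal values are hypothetical, and the published analysis may have used additional normalization not shown here.

```python
def splicing_index(exon_signal, gene_probe_signals):
    """Exon probe-set signal divided by the summed signal for the whole gene."""
    total = sum(gene_probe_signals)
    if total <= 0:
        raise ValueError("gene-level signal must be positive")
    return exon_signal / total


def index_difference(line_a, line_b, exon):
    """Difference in splicing index for one exon between two cell lines.

    Each line is a dict mapping exon name -> probe-set signal for a single
    gene (a hypothetical layout, not the array's native file format).
    """
    si_a = splicing_index(line_a[exon], line_a.values())
    si_b = splicing_index(line_b[exon], line_b.values())
    return si_a - si_b


# Made-up signals for a three-exon gene in two cell lines.
line_1 = {"exon1": 100.0, "exon2": 100.0, "exon3": 100.0}
line_2 = {"exon1": 200.0, "exon2": 50.0, "exon3": 200.0}

for exon in line_1:
    print(exon, round(index_difference(line_1, line_2, exon), 3))
```

In this toy example the second line expresses the gene at a higher overall level but with a relative drop at exon2. Dividing by the gene-level signal is what separates an exon-specific change from a change in whole gene expression.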

Kwan and colleagues proceeded to a genome-wide analysis relating common SNP diversity to the expression of specific transcript isoforms using the same exon targeted array (Kwan et al. 2008). Here gene expression in 57 lymphoblastoid cell lines established from individuals genotyped in the CEU panel of the HapMap Project was analysed, each using three different RNA preparations. Genetic association was sought for expression intensity with SNP markers within a 50 kb region flanking the transcribed region using a linear regression analysis. Based on a 5% false discovery rate and cut-off P value of 9.7 × 10⁻⁹, significant SNP association was found for 324 transcripts. The complexity of potential gene expression differences at all stages of transcript processing was illustrated by the breakdown of these flanking SNP associations: 39% involved whole gene expression changes and 6% were classified as complex; the remaining 55% were at the level of transcript isoforms, with 11% involving changes in transcriptional initiation, 26% alternative splicing, and 18% transcription termination changes.
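A minimal sketch of this kind of local association test is given below, under stated assumptions: genotypes are coded additively as the count of one allele (0, 1, or 2), the search window is taken as the transcribed region plus 50 kb on either side, and ordinary least squares is used. The coding, SNP names, positions, and expression values are all illustrative and are not taken from Kwan et al. (2008).

```python
import math


def snps_in_window(snps, tx_start, tx_end, flank=50_000):
    """Keep SNPs within the transcribed region plus `flank` bp on either side."""
    return [s for s in snps if tx_start - flank <= s["pos"] <= tx_end + flank]


def regress(genotypes, expression):
    """Ordinary least squares of expression on allele count.

    Returns (slope, intercept, t_statistic). Turning t into a P value needs a
    t distribution with n - 2 degrees of freedom, which is omitted here.
    """
    n = len(genotypes)
    if n < 3 or n != len(expression):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, expression))
    if rss == 0:
        # Perfect fit: t is unbounded unless the slope itself is zero.
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, t_stat


# Hypothetical SNPs near a transcript spanning positions 100,000-200,000.
snps = [{"id": "snpA", "pos": 60_000}, {"id": "snpB", "pos": 400_000}]
candidates = snps_in_window(snps, tx_start=100_000, tx_end=200_000)

# Made-up allele counts and probe-set intensities for six cell lines.
allele_counts = [0, 0, 1, 1, 2, 2]
intensities = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
slope, intercept, t_stat = regress(allele_counts, intensities)
print([s["id"] for s in candidates], slope, intercept, t_stat)
```

Because a test like this is repeated for every probe set and every nearby SNP, the significance threshold has to account for multiple testing, which is why a false discovery rate and a very small P value cut-off are used.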

Overall the authors estimated that 50-55% of variation in gene expression is isoform based (Kwan et al. 2008). However, this analysis represents only the beginning of the story as significant advances are needed in tools to enable splice isoform reconstruction and accurate quantification (Anton et al. 2008).

As the lymphoblastoid cell lines analysed by Kwan and colleagues had been subject to previous detailed array analysis (Cheung et al. 2005; Stranger et al. 2007a), the additional resolution provided by the exon level probe sets could be compared. For example, a clear isoform-specific effect was resolved at IRF5 encoding interferon regulatory factor 5 on chromosome 7q32 (Fig. 11.12). This replicated recently published data which showed that the associated SNP rs10954213 (c.*555G>A) created a functional polyadenylation site in the presence of the A allele, which was correlated with the short isoform (Cunninghame Graham et al. 2007). This SNP was part of a haplotype associated with susceptibility to systemic lupus erythematosus (Cunninghame Graham et al. 2007). The role of genetic diversity at IRF5 in disease susceptibility is incompletely understood. The disease association has been robustly demonstrated (Sigurdsson et al. 2005; Graham et al. 2006), and there is evidence that complex modulation of splicing may be involved, as a further disease associated SNP was shown to create a 5' donor site in an alternative exon 1 of IRF5 (Graham et al. 2006).
