Mathematics Department, NUI, Galway

Alternative mRNA splicing

The transcriptome of higher eukaryotes is complex and diverse, with multiple isoforms present for most genes. These isoforms result from heterogeneity at several stages of the generation and processing of RNA, from transcription initiation to splicing and polyadenylation. The diversity of the splicing step has been intensively studied. Microarrays that target splice junctions were used to demonstrate that the majority of human genes are alternatively spliced and, more recently, next-generation sequencing has provided even greater resolution, such that alternatively spliced isoforms have now been observed from almost all human multi-exon genes.

Allele-specific splicing

In 2004 we carried out the first genome-scale estimation of allele-specific splicing and reported that approximately 20% of alternatively spliced genes are spliced differently between different alleles. Mutations that affect splicing appear to be responsible for a large proportion of human genetic disease and may even be the largest contributor to human genetic diseases resulting from single point mutations. With the increasing power of genome-wide association studies (GWAS), a large number of genetic variants linked to common diseases have been identified, but attempts to elucidate the molecular mechanisms responsible for these disease associations have lagged behind. Understanding the effects of human polymorphisms on splicing is therefore important. We recently followed up our earlier study with a more detailed survey of the effects of human polymorphisms on splicing, combining evidence from genome sequence context, published exon microarray data and ESTs. For the ESTs, we developed a statistical model of allele-specific splicing, which we estimate using maximum likelihood. Graphical depictions of the candidate splicing polymorphisms we identified are available here. Some well-known examples are provided below:

Legend (Figure 3 from Nembaware et al. 2008) Allele-specific splicing evidence in the OAS1 gene based on exon array analysis. Support for allele-specific acceptor site use in the OAS1 gene. (A) Genomic sequence of OAS1 showing the alternatively spliced exons. The boxed section is magnified and drawn to scale in the next panel. (B) Relationship between the genotypes of the SNP and splicing indices of nearby probesets, illustrating that there is likely to be a complex pattern of allele-specific splicing in this gene. Probesets in red are significantly associated with the SNP genotype. The p-values for the association of these probesets with SNP genotypes are also included. Unfilled rectangles represent probesets that were not tested for an association with the genotype because they were not detected above background in a sufficient number of the cell lines, or were too distant from the SNP. (C) Histograms showing the splicing index distribution as a function of the genotype of a SNP, rs10774671, at the G nucleotide of the canonical splice acceptor site. (D) Association plot illustrating that rs10774671 is more strongly associated with a probeset between the SNP and an alternative acceptor site than any other SNP in the region for which genotype data were available. Nembaware et al. BMC Genomics 2008 9:265 doi:10.1186/1471-2164-9-265

Legend (Figure 4 from Nembaware et al. 2008) Allele-specific splicing evidence in the GLO1 gene based on exon array analysis. Support for allele-specific exon-skipping in the GLO1 gene. (A) Genomic sequence of the GLO1 gene showing the alternatively spliced exons. The boxed section is magnified and drawn to scale in the next panel. (B) Illustration of the relationship between the genotypes of this SNP and splicing indices of nearby probesets, using the same conventions as in Figure 3. (C) Histograms showing the splicing index distribution as a function of the genotype of a SNP, rs2736654, predicted to affect an exonic splice enhancer site. (D) Association plot illustrating that rs2736654 is marginally more strongly associated with a probeset spanning exon 4 than any of the other SNPs in the region for which genotype data were available. Nembaware et al. BMC Genomics 2008 9:265 doi:10.1186/1471-2164-9-265
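
The figure legends above relate splicing indices of probesets to SNP genotypes. As a rough illustration of that kind of analysis, the sketch below assumes the commonly used definition of the splicing index (log2 of probeset signal divided by gene-level signal) and a simple additive coding of genotype as an allele count of 0, 1 or 2. It is not the method of Nembaware et al.; the function names and all intensity values are hypothetical and made up for the example.

```python
import math


def splicing_index(probeset_intensity, gene_intensity):
    """log2 ratio of exon-level (probeset) signal to gene-level signal.

    Assumed definition, used here for illustration only.
    """
    return math.log2(probeset_intensity / gene_intensity)


def genotype_association(genotypes, indices):
    """Least-squares slope and Pearson r of splicing index on allele count."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_s = sum(indices) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((s - mean_s) ** 2 for s in indices)
    sxy = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, indices))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r


# Made-up data for six hypothetical cell lines (not from the paper).
genotypes = [0, 0, 1, 1, 2, 2]                      # copies of one allele
probeset = [100.0, 120.0, 200.0, 210.0, 400.0, 380.0]
gene = [400.0] * 6                                  # gene-level signal

indices = [splicing_index(p, g) for p, g in zip(probeset, gene)]
slope, r = genotype_association(genotypes, indices)
print(f"slope per allele copy: {slope:.3f}, Pearson r: {r:.3f}")
```

A positive slope here means that the probeset is included more often as the allele count rises. A real analysis would also need a significance test and a filter for probesets not detected above background, as described in the legend for panel (B).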