Origin and Evolution of Introns in the Nuclear Genome : As Selfish Genetic Elements

The contents of this lecture can be viewed on the YouTube site : Life Science Lectures for You

1. Contents of this lecture

This lecture covers the evolution of spliceosomal introns, which originated from eubacterial Group II introns. This article contains 28 figures.

In this lecture, I will explain:

1. Intron transposition by enzymatic activities encoded in Group I and Group II introns.

2. The relationship between prokaryotic Group II introns and the introns in the nuclear genome.

3. The splicing reaction of prokaryotic Group I introns.

Key Words:

Group I, Group II, intron, spliceosomal, intronic ORF, eubacteria, archaea, chloroplast, mitochondria, ribozyme, snRNA, U1RNA, U6RNA, IBS, EBS, non-LTR retrotransposon, reverse transcription, retrohoming, intron transfer

2. Various introns in eukaryotic cell

I suppose you are all familiar with introns found in eukaryotic genes that are spliced out by the spliceosome.

However, in the genomes of true bacteria, mitochondria, and chloroplasts, there are entirely different types of introns.

Here are schematic diagrams of Group I introns and Group II introns. These introns, which form such three-dimensional structures and encode proteins within themselves, are found in abundance in these genomes.

3. Spliceosomal intron : Introns within protein-coding genes in the nuclear genome

This figure illustrates an overview of splicing for protein-coding genes encoded in the nuclear genome of eukaryotes. Messenger RNA that has just been transcribed in the nucleus contains introns.

Messenger RNA in this state is called immature messenger RNA. The apparatus that removes these introns and connects the remaining exon regions is called the spliceosome. The spliceosome is a complex of five types of short RNAs called U1, U2, U4, U5, and U6, and more than 100 types of proteins.

Messenger RNA that has completed the splicing reaction is called mature messenger RNA. Mature messenger RNA is transported through nuclear pores to the cytoplasm, where it is translated.

4. Distribution of introns in bacteria

First, no bacterial genome contains introns that are spliced out by the spliceosome, but bacterial genomes do contain Group I and Group II introns.

Prokaryotes are classified into eubacteria and archaea. Archaea are well known as the host cells in an evolutionary event where they engulfed eubacteria, leading to symbiosis and eventually giving rise to eukaryotes.

Archaea have very few introns. Only rarely are Group II introns discovered in them. On the other hand, many Group I and Group II introns have been found in various eubacteria.

5. Group I and Group II introns bear 3D structures

Group I and Group II introns form distinctive three-dimensional structures, unlike the introns excised by spliceosomes in eukaryotes, which lack special structural features.

Group I introns consist of several hairpin structures combined. The number of hairpins varies. A notable characteristic is the presence of an open reading frame (ORF) within one of the hairpin loops, referred to as an “intronic open reading frame.” This ORF typically encodes a protein with homing endonuclease activity.

Group II introns are composed of six arm structures. The first arm is particularly large, followed by the second, third, fourth, fifth, and sixth arms. The fourth arm often contains an intronic ORF. In the upstream exon region, there are sequences called Intron Binding Sequence 2 (IBS2) and Intron Binding Sequence 1 (IBS1). Complementary sequences in the first arm, called Exon Binding Sequence 2 (EBS2) and Exon Binding Sequence 1 (EBS1), can form hydrogen bonds with IBS2 and IBS1 respectively.
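The pairing between the exon binding sequences and the intron binding sequences can be illustrated with a small sketch. The sequences below are hypothetical and chosen only for illustration, and the check assumes strict Watson-Crick pairing in antiparallel orientation; real introns may also tolerate other pairs such as G-U, which this sketch ignores.

```python
# Watson-Crick partners for RNA bases (a simplification: no G-U wobble pairs).
RNA_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the RNA sequence (5'->3') that pairs antiparallel with seq."""
    return "".join(RNA_PARTNER[base] for base in reversed(seq))


def can_pair(ebs: str, ibs: str) -> bool:
    """True if an EBS is perfectly complementary to an IBS."""
    return ebs == reverse_complement_rna(ibs)


# Hypothetical IBS1 in the upstream exon, and the EBS1 that would pair with it.
ibs1 = "GGACUA"
ebs1 = reverse_complement_rna(ibs1)

print(ebs1)                      # UAGUCC
print(can_pair(ebs1, ibs1))      # True
print(can_pair("UAGUCA", ibs1))  # False: one mismatch breaks the pairing
```

The same check applies to the EBS2 and IBS2 pair.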

A common feature of both Group I and Group II introns is that when properly folded, the RNA of these introns functions as an enzyme. While most enzymes are proteins, it was discovered in the 1980s that RNA can also possess enzymatic activity.

Such enzymatically active RNAs are called ribozymes. Correctly folded introns act as ribozymes, capable of self-splicing and joining the remaining exon parts. In contrast, introns excised by spliceosomes do not possess this activity. These characteristics make Group I and Group II introns significantly different from the nuclear introns in eukaryotes that are excised by spliceosomes.

6. Three types of introns

Let’s summarize the three types of introns once again. Group I introns are characterized by an open reading frame within one of the hairpin structures of the intron. This open reading frame encodes a homing enzyme, also called a homing endonuclease, which is a DNase that cuts specific sequences in the genomic DNA. Group I introns are widely found in the genomes of eubacteria, chloroplasts, and mitochondria. In eukaryotic nuclear genomes, they are found only exceptionally, within ribosomal RNA genes.

Group II introns have a larger and more complex tertiary structure than Group I introns. Like Group I introns, they also contain an open reading frame within the intron. This open reading frame encodes reverse transcriptase and endonuclease. Regarding their distribution, they are widely found in eubacteria and very rarely in archaea. They are also numerous in chloroplast and mitochondrial genomes.

On the other hand, spliceosomal introns, which are removed by the spliceosome, do not contain open reading frames within the intron. Additionally, the intron portion does not have a tertiary structure. They are embedded within protein-coding genes in eukaryotic genomes.

These are the general characteristics of the three types of introns.

7. Comparison of intron frequencies within the mitochondrial genome

As I explained, there are Group I and Group II introns in the mitochondrial genome. Let’s examine this in more detail. We find that there is a significant bias in the frequency of these introns among genomes.

Regarding multicellular animals, one Group I intron has been exceptionally found in sea anemones, but there are no reports of either Group I or Group II introns in the mitochondria of other multicellular animals.

In protozoans, while Group I and Group II introns are not very frequent, they have been reported. Similarly, for algal mitochondria, there are reports of Group I and Group II introns, but their frequency is not particularly high. In contrast, in fungi, both Group I and Group II introns are frequently discovered.

8. Fifteen Group I introns within the mitochondrial cox1 gene of the fungus Podospora anserina

This is the cox1 gene of the fungus Podospora anserina. The white areas with hatching indicate Group I introns.

There are a total of 15 Group I introns within this cox1 gene. Additionally, there is one Group II intron, which is shown in gray.

On the other hand, the red areas indicate exons. The total length of the exons in the cox1 gene is 1.6 kilobases, whereas the total length of the introns is 22.9 kilobases. As in this example, numerous Group I and Group II introns have been discovered in the mitochondria of fungi.
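To make the proportions concrete, the two lengths quoted above can be turned into a fraction. This is only a back-of-the-envelope sketch; it assumes that the gene consists of nothing but the exons and introns mentioned, so the total length is simply their sum.

```python
# Lengths quoted in the lecture for the Podospora anserina cox1 gene.
exon_kb = 1.6
intron_kb = 22.9

# Assumption: gene length = exons + introns, with nothing else counted.
total_kb = exon_kb + intron_kb
intron_fraction = intron_kb / total_kb

print(f"total length: {total_kb:.1f} kb")
print(f"intronic fraction: {intron_fraction:.1%}")
```

Under this assumption, more than nine tenths of the gene is intronic sequence.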

9. Expression of the ORF encoded in intronic region

10. Splicing reaction mechanism in Group II and Spliceosomal intron

I’d like to compare the splicing reactions of Group II introns and spliceosomal introns. This figure shows a spliceosomal intron. In this case, small nuclear RNAs within the spliceosome first bind to specific locations. For example, U1 small nuclear RNA attaches to the upstream exon-intron boundary, while U2 small nuclear RNA binds at a specific adenosine within the intron. Once the spliceosome correctly binds to key sites on the immature messenger RNA, the splicing reaction begins. The 2’-OH oxygen of this adenosine attacks the upstream exon-intron boundary in a nucleophilic reaction. This results in the cleavage of the exon-intron boundary and the formation of a lariat structure. After the lariat forms, the 3’-OH oxygen at the end of the upstream exon attacks the downstream exon-intron boundary, simultaneously excising the intron and joining the upstream and downstream exons. This is the splicing reaction carried out by the spliceosome.

On the other hand, this is the splicing reaction of Group II introns. The 2’-OH oxygen of an adenosine within the intron acts as a nucleophile, attacking the upstream exon-intron boundary. This cleaves the exon-intron boundary, and simultaneously forms a lariat structure. The 3’-OH of the upstream exon then attacks the downstream exon-intron boundary in a nucleophilic reaction, joining the two exons and simultaneously ejecting the intron with its lariat structure.

In other words, Group II introns and spliceosomal introns are excised through exactly the same chemical reactions. The difference is that in Group II introns, the intron portion forms a tertiary structure and functions as a ribozyme, whereas in spliceosomal introns, the intron portion does not form a tertiary structure and does not function as a ribozyme. Therefore, a spliceosome, an apparatus with RNA cleavage and ligation activity, is essential.
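The two-step reaction shared by Group II and spliceosomal introns can be sketched as simple string operations. This is a toy model, not a chemical simulation: the sequences, the function name `splice_via_lariat`, and the `Lariat` class are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    loop: str  # intron 5' end through the branch-point A, closed by a 2'-5' bond
    tail: str  # nucleotides 3' of the branch point


def splice_via_lariat(exon1: str, intron: str, exon2: str, branch_index: int):
    """Toy model of the two transesterification steps described above."""
    if intron[branch_index] != "A":
        raise ValueError("the branch point must be an adenosine")

    # Step 1: the 2'-OH of the branch-point A attacks the upstream
    # exon-intron boundary. The upstream exon is released with a free
    # 3'-OH, and the 5' end of the intron is joined to the A (lariat).
    lariat = Lariat(loop=intron[: branch_index + 1],
                    tail=intron[branch_index + 1:])

    # Step 2: the free 3'-OH of the upstream exon attacks the downstream
    # boundary. The exons are joined and the lariat intron is released.
    mature = exon1 + exon2
    return mature, lariat


# Hypothetical sequences; index 9 of the intron is an A.
mature, lariat = splice_via_lariat("AUGGCU", "GUAAGUCUAACUUUCAG", "GGAUAA",
                                   branch_index=9)
print(mature)
print(lariat)
```

The same function describes both systems because the chemistry is the same; what differs is whether the intron RNA itself or the spliceosome supplies the catalytic structure.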

11. Comparison of the Group I and Group II splicing reactions

Now, using this diagram, we will compare the splicing reactions of Group I introns and Group II introns.

In the case of Group I introns, the oxygen atom of the 3’-OH of a guanosine nucleotide (such as GTP) bound to the intron nucleophilically attacks and cleaves the upstream exon-intron boundary. Next, the 3’-OH oxygen atom at the newly exposed end of the upstream exon nucleophilically attacks the downstream exon-intron boundary. As a result, the intron is excised and ejected while the two exons are joined together.

This reaction proceeds due to the ribozyme activity that arises from the Group I intron forming a three-dimensional structure. In this way, the excision reaction of Group I introns differs from the excision reactions of Group II introns and spliceosomal introns.
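For comparison with the lariat pathway, the Group I reaction can be sketched in the same toy style. The sequences and the function name `splice_group_i` are hypothetical. One detail not stated in the lecture is included as general background: the attacking guanosine ends up attached to the 5' end of the released intron, which stays linear instead of forming a lariat.

```python
def splice_group_i(exon1: str, intron: str, exon2: str, cofactor: str = "G"):
    """Toy model of Group I splicing with an external guanosine cofactor."""
    # Step 1: the 3'-OH of the bound guanosine attacks the upstream
    # exon-intron boundary. The upstream exon is released with a free
    # 3'-OH and the guanosine is joined to the 5' end of the intron.
    released_intron = cofactor + intron

    # Step 2: the free 3'-OH of the upstream exon attacks the downstream
    # boundary, joining the exons and releasing a linear intron.
    mature = exon1 + exon2
    return mature, released_intron


# Hypothetical sequences for illustration only.
mature_rna, released = splice_group_i("AUGGCU", "AAAUAGCAAUG", "GGAUAA")
print(mature_rna)
print(released)
```

The spliced exons are identical to what the lariat pathway would produce; only the nucleophile in the first step and the shape of the released intron differ.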

12. Structural similarity between the spliceosomal small nuclear RNAs and Group II introns

Next, let’s focus on the structural similarity between the three-dimensional structure of spliceosomal small nuclear RNAs and that formed by Group II introns.

13. Spliceosome

The spliceosome is a complex composed of five types of small RNAs called U1, U2, U4, U5, and U6, along with approximately 50 to 100 different proteins.

The small RNAs are about 150 nucleotides in length and are rich in uracil bases, which is why they are given the prefix ‘U’. The number of proteins varies depending on the species. This is an overview of the spliceosome.

14. Structure of Group II intron

On the other hand, here is a two-dimensional schematic of Group II intron RNA, which has six structures called arms, or domains, numbered D1 to D6. This is a diagram of the three-dimensional structure formed by these arms.

15. Catalytic active centre of spliceosome

We have discussed that in Group II introns, if the RNA folds correctly, it can act as a ribozyme that can excise itself and join exons.

It has been discovered that small nuclear RNAs within the spliceosome can also function as ribozymes, when they fold correctly and associate with each other. This figure depicts a complex formed by the association of U2 small nuclear RNA and U6 small nuclear RNA, binding to the adenosine located at the branch point of the lariat structure to be excised. As shown in this figure, in the spliceosome, splicing reactions occur when small nuclear RNAs form complexes and attach to messenger RNA.

Although the spliceosome is a complex of proteins and RNA, the active site for the splicing reaction is on the RNA side, not on the protein side. Proteins are thought to primarily serve the purpose of helping small nuclear RNAs form the correct three-dimensional structure and positioning them correctly.

16. Maturase coded in Group II intron

As I mentioned earlier, small nuclear RNAs within the spliceosome require protein assistance to fold correctly. The same is true for the three-dimensional structures of Group II introns.

The proteins used for this purpose are products of the intronic open reading frame. This figure shows the activities of a protein encoded by the intronic ORF of a Group II intron. This protein has several different activities, each associated with a specific region. One of these activities is called the maturase activity, which resides in the X-domain.

As the name ‘maturase’ suggests, it has an activity that helps something mature. Specifically, the maturase has the activity to mature the three-dimensional structure of the intronic RNA into its correct form. Thus, the intron carries within its own intronic ORF the protein necessary for maturing its own three-dimensional structure.

17. Structural similarities in the core reaction center

Group II introns, when correctly folded, possess ribozyme activity and proceed with splicing. Similarly, small nuclear RNAs within the spliceosome can function as ribozymes that carry out splicing reactions when properly folded. Notably, the chemical reactions for intron excision in these two systems are identical.

While these two splicing systems already share remarkably similar properties, recent findings have revealed structural similarities between the Group II intron RNA and the spliceosomal small nuclear RNAs as well.

This figure shows domains 5 and 6 of the Group II intron. In domain 6, there is an adenine base that corresponds to the branch point, or knot, of the lariat structure. On the other hand, this image depicts the complex of U6 small nuclear RNA and U2 small nuclear RNA bound to the messenger RNA to be spliced in the spliceosome. Here too, an adenine base corresponding to the knot of the lariat structure is present. You can see that the overall structures of both RNAs are very similar.

18. Birth of spliceosomal Introns from Group II Intron

The theory that spliceosomal introns originated from Group II introns can be explained as follows. As is widely known, eukaryotes emerged when an archaeon engulfed a eubacterium and established a symbiotic relationship with it. In present-day archaea, Group II introns are extremely rare, and Group I introns have not been found at all. In contrast, eubacteria, the group to which the engulfed cell belonged, have been found to contain numerous Group II and Group I introns.

It is also known that in early eukaryotes, many genes from the engulfed eubacterium were transferred to the archaeal nuclear genome. This gene transfer would naturally include the transfer of Group I and Group II introns. If we consider that the Group II introns transferred to the archaeal nuclear genome eventually evolved into the spliceosomal introns we see today, we can better understand the distribution of Group II introns across the biological world, and the similarities in the splicing mechanisms between the two types of introns.

This explanation effectively accounts for the origin of spliceosomal introns from Group II introns, considering the evolutionary history of eukaryotes, and the distribution and characteristics of different intron types. However, in eukaryotic nuclear genomes, Group I introns are currently found only in ribosomal RNA genes. The reason for this limited distribution of Group I introns in eukaryotic genes remains unexplained.

19. Retrohoming of Group II intron: Translocation to homologous genes without intron

Next, I will explain the intron transposition reaction, in which a spliced out Group II intron is inserted into a specific site with a particular nucleotide sequence in a genome through a reverse reaction of intron splicing.

In many cases, such specific nucleotide sequences are found in homologous genes that do not contain the intron. This transposition reaction is called ‘retrohoming’, because the intron RNA returns, through reverse transcription, to the same site it originally occupied. Retrohoming is a reaction that does not occur with spliceosomal introns.

20. Reverse splicing to target DNA is retrohoming of Group II intron

I will now explain the retrohoming of Group II introns. In this figure, this is a gene containing a Group II intron, and here is its homologous gene without the intron.

First, messenger RNA is produced from the gene containing the intron, and the Group II intron is spliced out from this immature messenger RNA. The excised intron RNA is then inserted into the intron-less gene, at the position corresponding to the exon-intron boundary in the intron-containing gene, through a reverse of the splicing reaction.

The site where the intron is newly inserted is the same as where the original intron was located, hence it’s called ‘homing’. Additionally, because this reaction involves reverse transcriptase, the prefix ‘retro’ is added, and the process is termed ‘retrohoming’. I will explain this reaction in more detail.

21. Retrohoming reaction of Group II intron

Group II intron retrohoming is a complex process involving several steps. The retrohoming process occurs as follows:

This is a spliced-out Group II intron with a lariat structure. The protein that assisted the splicing is still attached to the spliced-out intronic RNA. In addition to its RNA folding activity, this protein has reverse transcriptase activity and endonuclease activity. The target DNA sequence for reverse splicing consists of the IBS1 and IBS2 sequences joined together. I will explain these sequences later. What is characteristic of the reverse-splicing reaction is that the intronic RNA is inserted into the target DNA.

In this reaction, first, one strand of the target DNA is cleaved, and the intronic RNA is ligated into the cleavage point by reverse splicing. Following this reaction, the other DNA strand is cleaved by the endonuclease. Then, cDNA synthesis from the intronic RNA begins at the 3’ end of the cleaved DNA. When the initial cDNA synthesis is complete, the intronic RNA is degraded, and the gap in the DNA strand is filled in, so that the intronic RNA sequence is eventually converted into double-stranded DNA.

In this reaction, the endonuclease activity and the reverse transcriptase activity are provided by the protein attached to the spliced-out Group II intron.
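The conversion of the inserted intronic RNA into double-stranded DNA can be sketched at the sequence level. This is a minimal illustration with a hypothetical sequence and hypothetical function names; it models only base pairing, not the priming from the cleaved DNA end or the degradation of the RNA.

```python
# Base-pairing rules used by the two synthesis steps.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """cDNA (written 5'->3') synthesized on an RNA template."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


def second_strand(cdna: str) -> str:
    """DNA strand (written 5'->3') synthesized on the cDNA template."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


intron_rna = "GUGCGAU"  # hypothetical fragment of intronic RNA
cdna = reverse_transcribe(intron_rna)
sense_dna = second_strand(cdna)

print(cdna)       # ATCGCAC
print(sense_dna)  # GTGCGAT
```

The second strand carries the same sequence as the original RNA with T in place of U, which is why the intron ends up as an ordinary DNA copy at the target site.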

22. Enzyme activities coded in Domain IV-ORF Group II intron

This is an immature messenger RNA containing a Group II intron. In the upstream exon region, there is an RNA sequence in which IBS1 and IBS2 are connected. On the other hand, in the intron part, there is EBS1, which has a sequence complementary to IBS1, and EBS2, which has a sequence complementary to IBS2.

For the excision of Group II introns, it is important that stable hydrogen bonds are formed between these RNA sequences. This refers to the hydrogen bond formation between IBS1 and EBS1, and between IBS2 and EBS2. In contrast, during the reverse-splicing reaction, EBS1 and EBS2, which are RNA sequences of the excised Group II intron, need to form hydrogen bonds with DNA sequences homologous to IBS1 and IBS2, respectively.

In the splicing reaction, the intron-exon boundary was determined by RNA-RNA hydrogen bonding, whereas in reverse-splicing, the insertion site of the intron is determined by RNA-DNA hydrogen bonding.
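How RNA-DNA pairing could pick out the insertion site can be sketched as a sequence search. Everything here is hypothetical: the sequences, the function names, and the simplifying assumptions that pairing is perfect and that the intron is inserted immediately downstream of the IBS1-like DNA sequence, which lies directly after the IBS2-like sequence.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_partner(rna: str) -> str:
    """DNA sequence (5'->3') that pairs antiparallel with an RNA sequence."""
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(rna))


def find_insertion_site(dna: str, ebs1: str, ebs2: str):
    """Index just after the IBS2-IBS1-like DNA target, or None if absent."""
    target = dna_partner(ebs2) + dna_partner(ebs1)
    pos = dna.find(target)
    return None if pos < 0 else pos + len(target)


# Hypothetical EBS sequences of an excised intron and a target gene.
ebs1, ebs2 = "UAGUCC", "ACGAUU"
intronless_gene = "CCGTAATCGTGGACTATTGACC"

site = find_insertion_site(intronless_gene, ebs1, ebs2)
print(site)  # 16

# Insert a (hypothetical, lowercase) DNA copy of the intron at that site.
with_intron = intronless_gene[:site] + "gtgcgat" + intronless_gene[site:]
print(with_intron)
```

A gene without the matching sequence returns `None`, which mirrors the point that the insertion site is fixed by base pairing and is not random.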

23. Reverse transcriptase and Endonuclease coded in an intronic ORF

Group II intron reverse-splicing requires activities such as reverse transcriptase and endonuclease.

Let me now explain where these enzymatic activities are encoded.

24. Enzyme activities coded in Domain IV-ORF Group II intron

These are encoded in the intronic Open Reading Frame located in domain 4 of the Group II intron.

This single protein carries reverse transcriptase activity, endonuclease activity, and DNA binding activity. The intronic ORF is translated before intron splicing occurs, and the resulting protein is used for both the splicing and the reverse-splicing processes.

25. Selfish-genetic elements

Group II introns are considered selfish genetic elements, as they increase their own copy number through reverse splicing. There are other selfish genetic elements in the genome that use similar mechanisms to multiply their own coding sequence.

One such example is the Long Interspersed Nuclear Element, called LINE for short, which belongs to the non-LTR (non-Long Terminal Repeat) retrotransposon class. L1, a type of LINE found in the human genome, contains open reading frames that encode an endonuclease and a reverse transcriptase. The protein produced from these open reading frames attaches to the messenger RNA that encoded it, forming a complex. This complex then attacks genomic DNA at random positions. The endonuclease first cleaves one strand of the double-stranded DNA, and reverse transcription of the LINE RNA begins from this cleavage point. Eventually, the LINE messenger RNA is converted into double-stranded DNA, and integrated into the genomic DNA. Through this process, LINE elements can increase their own copy number at random locations within the genome.

Interestingly, the reverse transcriptase possessed by LINE shows high homology with the reverse transcriptase of Group II introns. Elements that have the ability to reverse transcribe their own RNA into double-stranded DNA exhibit the characteristics of selfish genetic elements.

26. Homing reaction in Group I intron

Group I introns are considered selfish genetic elements that can insert DNA fragments encoding themselves into specific sites of genes through a process called homing, thereby increasing their copy number. The gene shown here contains a Group I intron with an intronic open reading frame, encoding an enzyme called a homing enzyme or homing endonuclease.

The homing enzyme recognizes a very long DNA sequence of 15 to 20 base pairs and cleaves double-stranded DNA near the center of this sequence. The recognition sequence for the homing enzyme is the combined sequence of the upstream and downstream exon parts, where this intron is inserted. In other words, genes lacking this Group I intron contain the recognition sequence for the homing enzyme within them. On the other hand, in genes where the Group I intron is inserted, the recognition sequence is interrupted by the intronic DNA sequence.

When an intron-less gene encounters the homing enzyme, its double-stranded DNA is cleaved due to the presence of the recognition sequence. Once double-stranded DNA is cut by the homing enzyme, it needs to be repaired immediately. However, simply rejoining the cut ends would recreate the sequence recognized by the homing enzyme, leading to repeated cleavage. There is only one way to escape cleavage by the homing enzyme, while maintaining normal gene function. That is to perform recombinational repair, using a homologous gene containing the intron as a template.

The figure shows the cleaved gene and the homologous gene containing the Group I intron. When the intron-less gene is cut, it can undergo recombination with the intron-containing homologous gene, incorporating the intronic DNA and thus escaping cleavage by the homing enzyme.

Moreover, since the incorporated fragment is intronic DNA, it is removed from the messenger RNA, resulting in the production of a protein with normal function. In this way, when a gene containing a homing enzyme and an intron-less homologous gene are present in the same cell, the intron-less homologous gene can only escape cleavage by the homing enzyme by incorporating the DNA region encoding the Group I intron, while still producing a normal protein. The reaction in which an intron-less homologous gene becomes an intron-containing gene through homologous recombination is called homing. This reaction does not involve reverse transcriptase, so it is called homing rather than retrohoming.
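The logic of homing can be sketched with strings: an allele is cleavable only while the recognition sequence is uninterrupted, and copying the intron into the site removes it. The 18-base recognition sequence, the intron, and the function names below are hypothetical, and placing the insertion exactly at the center of the site is an assumption made for simplicity.

```python
def is_cleavable(allele: str, site: str) -> bool:
    """The homing enzyme cuts only alleles with an intact recognition site."""
    return site in allele


def homing(intronless: str, intron_dna: str, site: str) -> str:
    """Toy recombinational repair: copy the intron into the cut site."""
    pos = intronless.find(site)
    if pos < 0:
        return intronless  # no recognition site, so no cleavage
    cut = pos + len(site) // 2  # assumption: insertion at the site's center
    return intronless[:cut] + intron_dna + intronless[cut:]


# Hypothetical recognition site spanning the exon-exon junction.
SITE = "ACGTTGCAGGATCCTTGA"
intronless_allele = "CCC" + SITE + "GGG"
intron = "gtgcgacta"  # lowercase marks intronic DNA

repaired = homing(intronless_allele, intron, SITE)

print(is_cleavable(intronless_allele, SITE))  # True
print(repaired)
print(is_cleavable(repaired, SITE))           # False
```

Once the intron interrupts the site, the allele is no longer a substrate, so every cleavage and repair cycle converts one more intron-less copy into an intron-containing one.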

27. Homing enzyme as a tool for genetic engineering

Homing enzymes with such long recognition sequences are now used as tools in genetic engineering. The example shown here is a type of homing enzyme named I-SceI. The recognition sequence of this homing enzyme is 18 base pairs long.

This enzyme is encoded in the open reading frame of a Group I intron in the mitochondria of Saccharomyces cerevisiae, a species of yeast. Since 4 to the power of 18 is about 68.7 billion, a simple probability calculation suggests that such a sequence would appear only once in about 68.7 billion base pairs.

Considering that the human genome is about 3 billion base pairs long, this demonstrates that it is an enzyme with extremely high specificity. Therefore, it is used in cases where there is a need to fragment the genome into very large pieces.
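The probability estimate can be reproduced in a few lines. This sketch assumes, as the simple calculation in the text does, that all four bases are equally likely and independent at every position, and it counts only one DNA strand; real genomes deviate from both assumptions.

```python
SITE_LENGTH = 18             # length of the I-SceI recognition sequence
GENOME_SIZE = 3_000_000_000  # approximate human genome size in base pairs

# Number of distinct sequences of this length.
possible_sequences = 4 ** SITE_LENGTH

# Expected number of chance occurrences in a random genome of this size.
expected_sites = GENOME_SIZE / possible_sequences

print(f"4^{SITE_LENGTH} = {possible_sequences:,}")
print(f"expected chance occurrences: {expected_sites:.3f}")
```

The expected count is well below one, which is the quantitative basis for calling the enzyme extremely specific.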

28. Coexistence of intron-containing and intronless homologous genes : Bacterial conjugation

Group II intron retrohoming and Group I intron homing require the coexistence of an intron-containing gene and an intronless homologous gene within the same cell. However, in many bacteria, each gene typically exists as a single copy, and there is no situation where intron-containing and intronless genes are simultaneously encoded in the genome. Let’s explore how the coexistence of intron-containing genes and intronless homologous genes can occur.

In bacteria, conjugation is a well-known process. Conjugation refers to the sharing and recombination of genomic DNA between bacteria of the same species through mating. Even in bacteria, there is a form of sex, where DNA can be transferred from a male bacterium to a female bacterium. The female and male bacteria connect through a sex pilus, and DNA is transferred from the male bacterium to the female bacterium through this pilus. In this way, a situation can arise where the intronless gene from the male bacterium and the intron-containing gene from the female bacterium coexist within the same cell.

In eukaryotes, during fertilization, not only do the nuclei fuse, but mitochondria and chloroplasts also merge. If the male’s mitochondrial DNA genes are intronless while the female’s mitochondrial DNA genes contain introns, a situation can occur where intronless genes and intron-containing genes coexist. Under such circumstances, retrohoming or homing can take place.

This concludes the explanation of this lecture.
