M.Sc. (Zoology) (MSCZOO)
Term-End Examination
December, 2024
MZO-005 : GENOMICS AND PROTEOMICS
Time : 2 Hours | Maximum Marks : 50
Note: Attempt any five questions. All questions carry equal marks.
1. (a) Differentiate between short interspersed nuclear elements and long interspersed nuclear elements. (5 Marks)
Short Interspersed Nuclear Elements (SINEs) and Long Interspersed Nuclear Elements (LINEs) are two major types of non-coding repetitive DNA sequences found in eukaryotic genomes, especially in mammals. Both belong to the category of transposable elements but there are many differences between them based on various aspects.
1. Based on Length
SINEs are short sequences, usually about 100 to 400 base pairs long.
In contrast, LINEs are much longer, typically ranging from 6,000 to 8,000 base pairs.
2. Based on Autonomy and Transposition
SINEs are non-autonomous. They cannot move by themselves and depend on the enzymatic machinery of LINEs for their retrotransposition.
On the other hand, LINEs are autonomous elements. They have their own promoter and encode proteins like reverse transcriptase and endonuclease, which are necessary for their own transposition.
3. Based on Origin and Composition
SINEs are generally derived from cellular RNAs such as tRNA or 7SL RNA. They are non-coding, and many copies in the genome resemble pseudogenes.
In contrast, LINEs originate independently and may carry open reading frames (ORFs) coding for proteins needed for their mobility.
4. Based on Abundance and Distribution
SINEs are more numerous than LINEs in the human genome. The most common SINE in humans is the Alu element.
LINEs, especially LINE-1 (L1), are also abundant but are present in fewer copies than Alu repeats.
5. Based on Presence of Coding Regions
SINEs do not contain any protein-coding regions.
LINEs contain open reading frames (ORFs) that code for functional proteins like reverse transcriptase.
6. Based on Role in Genome Evolution
SINEs mainly act as regulatory elements and influence gene expression by inserting near or within genes. However, they cannot cause structural genome changes on their own.
LINEs can actively reshape the genome because they are mobile and can insert themselves into new locations. This may lead to mutations, gene disruption, or new regulatory sequences.
(b) Discuss the different mechanisms of gene duplication with diagrams. (5 Marks)
Gene duplication is an important evolutionary process that increases genetic material and provides raw material for new gene functions. There are several known mechanisms, but five are considered the most important and well-studied in molecular biology and genetics:
1. Whole Genome Duplication (Polyploidy)
In this case, the entire genome gets duplicated. It is common in plants and some vertebrates like amphibians and fish. This leads to a massive increase in gene number and can allow evolution of new functions through divergence of duplicated genes.
2. Tandem Duplication
In this type, a gene is duplicated and placed right next to the original gene on the same chromosome. This often happens due to errors like unequal crossing over during meiosis. It can result in gene families, such as those coding for hemoglobin.
3. Transposon-Mediated Duplication (Mobile Element-Mediated Duplication)
Here, transposable elements (jumping genes) help to copy and move gene sequences from one location to another in the genome. These duplicated genes may stay active or become non-functional depending on where they are inserted.
4. Segmental Duplication (Low Copy Repeats)
This involves duplication of medium to large DNA segments, often 1,000 to 200,000 base pairs long. These segments may include one or more complete genes. They may be present on the same chromosome or different ones and can lead to structural variations.
5. Retroduplication (Retroposition)
In this type, mRNA of a gene is copied back into DNA by an enzyme called reverse transcriptase and inserted into the genome. These copies usually lack introns and often lack original promoter sequences. Many become non-functional (pseudogenes), but some may gain new functions.
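The unequal crossing over behind tandem duplication (mechanism 2 above) can be illustrated with a toy Python sketch; the chromosome layout and the gene name "G" are hypothetical:

```python
# Toy illustration of tandem duplication via unequal crossing over.
# Two homologous chromosomes misalign during meiosis, and the crossover
# yields one product with a tandem duplication and one with a deletion.

def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange ends at different positions, mimicking misaligned pairing."""
    dup = chrom_a[:break_a] + chrom_b[break_b:]   # duplication product
    dele = chrom_b[:break_b] + chrom_a[break_a:]  # reciprocal deletion product
    return dup, dele

# Each homologue carries one copy of a hypothetical gene "G".
chrom = ["flank1", "G", "flank2"]
# Misaligned break: after the gene on one homologue, before it on the other.
dup, dele = unequal_crossover(chrom, chrom, break_a=2, break_b=1)
print(dup)   # ['flank1', 'G', 'G', 'flank2'] -> tandem duplication
print(dele)  # ['flank1', 'flank2'] -> gene lost on the other product
```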
2. Describe the hybridization based approach to transcriptome analysis along with advantages and limitations. (10 Marks)
Hybridization-based approaches are used to study the transcriptome by detecting RNA transcripts through base pairing between complementary nucleic acid strands. These methods mainly rely on pre-designed probes that can bind specifically to target RNA sequences. The most common and well-established method under this category is microarray technology.
How It Works
In a typical microarray experiment, mRNA is first isolated from the cell or tissue of interest. This mRNA is then converted into complementary DNA (cDNA) using reverse transcriptase. The cDNA is labeled with fluorescent dyes and hybridized onto a microarray slide. This slide contains thousands of DNA probes fixed in a grid-like pattern, each specific to a known gene. When labeled cDNA binds to its matching probe, it produces a fluorescent signal. The intensity of this signal indicates the level of gene expression. The data is collected using a laser scanner and analyzed using computational tools.
This method helps researchers compare gene expression between different samples like normal vs diseased tissue, or treated vs untreated cells. It is often used in cancer research, drug discovery and developmental biology.
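The intensity comparison described above is usually summarised as a log2 ratio of the two fluorescence channels. A minimal Python sketch, with made-up probe names and intensity values:

```python
import math

# Hypothetical two-colour microarray readings: Cy5 (diseased) and Cy3
# (normal) fluorescence intensities per probe. Names and values are made up.
intensities = {
    "geneA": (8000.0, 1000.0),   # (Cy5, Cy3)
    "geneB": (500.0, 500.0),
    "geneC": (200.0, 1600.0),
}

def log2_ratio(cy5, cy3):
    """Relative expression: positive = up in diseased, negative = down."""
    return math.log2(cy5 / cy3)

for gene, (cy5, cy3) in intensities.items():
    print(gene, round(log2_ratio(cy5, cy3), 2))
# geneA 3.0 (8x up), geneB 0.0 (unchanged), geneC -3.0 (8x down)
```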
Advantages
- High throughput: Thousands of genes can be studied at once in a single experiment.
- Cost-effective for known genes: Once the microarray is designed, it can be reused for multiple samples, making it economical for studying known transcripts.
- Comparative analysis: It is very useful for comparing gene expression profiles between different conditions.
- Standardization: The method is well-established with standardized protocols, making it suitable for routine use.
Limitations
- Limited to known sequences: Only genes for which probes are already available can be detected. Novel or unknown transcripts cannot be studied.
- Limited sensitivity and dynamic range: Sensitivity is lower than that of sequencing-based methods, and low-abundance transcripts may not be detected effectively.
- Cross-hybridization: Non-specific binding between similar sequences can give false results or background noise.
- Quantitative limitations: While it gives relative expression levels, it is not very accurate for absolute quantification.
- Cannot detect splice variants properly: Different isoforms of a gene are often missed unless probes are specifically designed.
3. (a) Explain the steps of whole genome shotgun sequencing with illustrations. (5 Marks)
Whole genome shotgun sequencing is a method used to determine the complete DNA sequence of an organism. In this approach, the whole genome is randomly broken into small fragments and then sequenced. The overlapping sequences are later assembled using computational tools. This method is fast and widely used for large-scale genome projects.
The following are the main steps in whole genome shotgun sequencing:
1. Random Fragmentation
In the first step, the genomic DNA is randomly broken into small fragments. This can be done using physical methods like sonication or chemical/enzymatic treatment. The goal is to get multiple small DNA pieces from all over the genome.
2. Library Preparation
These DNA fragments are then processed to make a sequencing library. Special adapter sequences are attached to the ends of each DNA fragment. These adapters help the fragments bind to the sequencing platform and also help in amplification.
3. Sequencing of DNA Fragments
Once the library is ready, each DNA fragment is sequenced. High-throughput sequencing methods such as Illumina are used. Usually, both ends of each fragment are sequenced. This is called paired-end sequencing and it improves the accuracy of assembly.
4. Sequence Assembly
After sequencing, thousands or millions of short DNA reads are obtained. Using bioinformatics tools, these reads are assembled based on overlapping regions. This produces longer sequences called contigs. Multiple contigs are then joined to make scaffolds and eventually the full genome.
5. Gap Filling and Annotation
Some regions may remain unsequenced or may contain gaps. These are filled using targeted sequencing. Finally, the assembled genome is annotated to identify genes, coding regions, regulatory sequences and other important genomic elements.
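The overlap-based assembly in step 4 can be sketched as a toy greedy assembler in Python. The reads below are made up, and real assemblers use far more sophisticated graph-based methods:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, i, j)
        for i in range(len(reads)):
            for j in range(len(reads)):
                if i != j:
                    n = overlap(reads[i], reads[j])
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: remaining reads stay separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Toy reads covering the sequence ATGCGTACGTTAG
reads = ["ATGCGTA", "GTACGTT", "CGTTAG"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAG']
```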
(b) Enumerate the different tools for the prediction of secondary structure of proteins. (5 Marks)
Proteins fold into specific shapes, and one of the early levels of folding is the secondary structure, which includes alpha helices, beta sheets and random coils. Because these structures affect protein function, predicting them is useful in research, drug development and biotechnology. Many computational tools exist for secondary structure prediction; the following are the major ones:
1. Chou-Fasman Method
This is one of the oldest methods for secondary structure prediction. It checks how likely each amino acid is to form a particular structure, such as a helix or sheet, using propensity values calculated from proteins whose structures are already known. If a stretch of the sequence contains amino acids that usually form helices, a helix is predicted there. The method is simple and purely statistical, which makes it useful for understanding basic prediction logic, but its accuracy is low.
2. GOR Method (Garnier-Osguthorpe-Robson)
GOR looks not only at one amino acid but also at its neighbors (nearby amino acids). It uses statistics from many known proteins to see what shape a particular group of amino acids usually forms. So, the shape prediction depends on the amino acid and its surroundings.
3. PSIPRED
PSIPRED uses advanced computer programs called neural networks. It learns from many proteins whose structures are known. It also compares the given sequence with similar sequences in databases. This helps PSIPRED make very accurate predictions about secondary structure.
4. JPred
JPred is a web-based tool that combines several prediction methods. It uses multiple sequence alignment and machine learning. It gives not only the prediction but also the confidence level for each part of the sequence. This helps users judge how reliable the prediction is.
5. SOPMA (Self-Optimized Prediction Method with Alignment)
SOPMA predicts secondary structure by using multiple sequence alignments and statistical patterns, which increases accuracy over single-sequence methods. It provides flexibility to adjust parameters for better predictions. SOPMA is widely used because it is reliable and fast.
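The propensity idea behind the Chou-Fasman method (tool 1 above) can be sketched in a few lines of Python. The propensity values below are illustrative only, not the published Chou-Fasman table, and the real method also applies nucleation and extension rules:

```python
# Toy propensity-based prediction: assign each residue the structure
# (H = helix, E = sheet, C = coil) favoured by its propensity values.
propensity = {
    #        (helix, sheet) -- illustrative values
    "A": (1.42, 0.83), "E": (1.51, 0.37), "L": (1.21, 1.30),
    "V": (1.06, 1.70), "G": (0.57, 0.75), "P": (0.57, 0.55),
}

def predict(seq):
    out = []
    for aa in seq:
        h, e = propensity[aa]
        if h >= 1.0 and h >= e:
            out.append("H")   # helix former
        elif e >= 1.0:
            out.append("E")   # sheet former
        else:
            out.append("C")   # neither favoured -> coil
    return "".join(out)

print(predict("AAEELVVGP"))  # HHHHEEECC
```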
4. (a) Outline the basic principle of quantitative RT-PCR (qRT-PCR) with schematic diagram. (5 Marks)
Quantitative RT-PCR (qRT-PCR), also called real-time PCR, is a molecular technique used to measure the quantity of specific RNA present in a biological sample. The principle involves two main stages:
- First, the RNA is converted into complementary DNA (cDNA) using the enzyme reverse transcriptase. This is necessary because DNA is more stable and PCR amplification works on DNA, not RNA.
- Second, this cDNA is amplified through polymerase chain reaction (PCR), where the amount of DNA is measured in real-time using fluorescent dyes or fluorescent-labelled probes.
During each PCR cycle, the amount of cDNA doubles and the fluorescence signal increases. This increase is detected by the instrument and the point at which the fluorescence crosses a threshold level is used to calculate the starting quantity of RNA. The more RNA in the original sample, the earlier the signal is detected.
This method is widely used for gene expression studies, viral RNA quantification and disease diagnostics. It is highly sensitive, specific and provides accurate quantification of RNA.
There are two commonly used formats in qRT-PCR:
- One-step qRT-PCR: The reverse transcription and PCR amplification happen in a single tube. This is faster, reduces contamination risk and is preferred for high-throughput analysis.
- Two-step qRT-PCR: The reverse transcription is done first in one tube to make cDNA. Then a portion of that cDNA is used in a separate PCR reaction. This allows storage of cDNA and testing of multiple genes from the same sample.
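The threshold-cycle calculation described above is commonly applied as the 2^-ΔΔCt method. A minimal Python sketch with hypothetical Ct values; it assumes roughly 100% amplification efficiency (template doubles each cycle):

```python
# Relative quantification from qRT-PCR threshold cycles (Ct).
# Gene names and Ct values are hypothetical.

def ddct_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """2^-ΔΔCt: fold change of target in treated vs control sample,
    normalised to a reference gene (e.g. GAPDH)."""
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)

# Target crosses the threshold 2 cycles earlier (after normalisation)
# in the treated sample:
fold = ddct_fold_change(22.0, 18.0, 24.0, 18.0)
print(fold)  # 4.0 -> target ~4x more abundant in treated cells
```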
(b) Write a brief note on c-DNA-AFLP with a neat diagram. (5 Marks)
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a powerful technique used for large-scale transcriptome analysis, especially useful when the full genome sequence is not available. It helps in identifying and comparing gene expression patterns across different biological samples, such as control vs stressed or infected vs uninfected. This technique is highly sensitive and reproducible and does not require any prior knowledge of sequence data. It is particularly useful in non-model organisms and has been applied in plant stress studies, disease diagnostics and developmental biology.
This method is based on the AFLP (Amplified Fragment Length Polymorphism) principle but it uses complementary DNA (cDNA) instead of genomic DNA, which reflects the mRNA expression pattern of the sample. By comparing banding patterns on gel, differentially expressed genes can be visualized and studied.
The procedure includes the following eight major steps:
1. cDNA synthesis – Total RNA is extracted from the sample and converted into double-stranded complementary DNA (cDNA) using reverse transcriptase enzyme.
2. First restriction digestion – The synthesized cDNA is digested using a restriction enzyme like EcoRI to generate DNA fragments.
3. 3' end capturing – To enrich for coding sequences, the 3' ends of the mRNAs are targeted using oligo(dT) primers; this capture is set up during the cDNA synthesis step.
4. Second restriction digestion – A second enzyme like MseI is used for further fragmentation to improve resolution.
5. Adapter ligation – Synthetic DNA adapters are attached to the ends of the fragments. These adapters act as primer-binding sites for PCR.
6. Preamplification – A non-selective PCR is done using primers that bind to the adapter regions. This increases the total fragment amount.
7. Selective amplification – PCR is repeated with primers containing additional selective nucleotides to reduce the number of amplified fragments, which gives a clearer banding pattern.
8. Gel electrophoresis – The amplified DNA fragments are separated using polyacrylamide gel. Different banding patterns represent different gene expression levels between samples.
By comparing band patterns from different samples, researchers can identify which genes are upregulated or downregulated under certain conditions.
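The two restriction digests (steps 2 and 4) can be illustrated with a toy Python sketch that splits a made-up cDNA string at EcoRI (G^AATTC) and MseI (T^TAA) recognition sites. Real digestion leaves sticky ends on both strands, which this single-strand sketch ignores:

```python
# Toy double digest of a cDNA string. The sequence is hypothetical.

def digest(seq, site, cut_offset):
    """Cut seq at cut_offset bases into each occurrence of site."""
    frags, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        frags.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    frags.append(seq[start:])
    return frags

cdna = "AAGAATTCGGTTAACC"
# EcoRI cuts G^AATTC (offset 1); MseI cuts T^TAA (offset 1)
after_ecori = digest(cdna, "GAATTC", 1)
print(after_ecori)  # ['AAG', 'AATTCGGTTAACC']
after_both = [f for frag in after_ecori for f in digest(frag, "TTAA", 1)]
print(after_both)   # ['AAG', 'AATTCGGT', 'TAACC']
```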
5. Discuss the technique of western blotting for detecting any specific protein in a sample. Comment on the blotting efficiency of this technique. (10 Marks)
Western blotting is a molecular technique used to detect a specific protein from a mixture of proteins in a biological sample such as tissue extract, blood, or cultured cells. It is commonly used in research and diagnostics. This technique works by separating proteins using gel electrophoresis, transferring them onto a membrane and finally using antibodies to identify the target protein. It tells whether a particular protein is present in a sample and also gives an idea of its size and amount.
There are five main steps in the western blotting process:
1. Protein Extraction and Quantification
First, proteins are extracted from the sample using a lysis buffer, which breaks open the cells and releases their contents. The total protein concentration is then measured using simple color-based methods like the Bradford or BCA assay. This step ensures that equal amounts of protein from each sample can be loaded into the gel, which is important for comparing results accurately and avoiding false differences.
2. SDS-PAGE (Sodium Dodecyl Sulfate–Polyacrylamide Gel Electrophoresis)
Proteins are treated with SDS, a detergent that coats them with a uniform negative charge and unfolds them into a linear shape. They are then separated by size on a polyacrylamide gel under an electric field; small proteins move faster than larger ones.
3. Transfer to Membrane (Blotting)
After separation, the proteins are transferred from the gel to a membrane made of nitrocellulose or PVDF. This step is called blotting. It is usually done using an electric field in a technique called electroblotting. This membrane holds the proteins in the same arrangement and makes them accessible for antibody detection.
4. Blocking and Antibody Incubation
The membrane is soaked in a blocking solution (like skimmed milk or BSA) to prevent non-specific antibody binding. After this, the membrane is treated with a primary antibody that binds only to the target protein. Then a secondary antibody is added which binds to the primary antibody and is linked to an enzyme like horseradish peroxidase (HRP).
5. Detection and Analysis
When a special chemical (substrate) is added, the enzyme on the secondary antibody reacts and gives a signal like light or color. This signal shows where the target protein is and how much of it is present.
Blotting Efficiency of Western Blotting
Western blotting is known for its good specificity and sensitivity. It can detect even small amounts of protein if the process is done correctly. The success of blotting depends on how well proteins move from the gel to the membrane, how well the membrane is blocked to avoid background noise and how specific the antibodies are. If not done carefully, small proteins may be lost or the signal may not be clear. Still, it is one of the most trusted methods for protein detection in research.
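The size estimate mentioned above comes from a standard curve: in SDS-PAGE, migration distance is roughly linear in log10 of molecular weight. A minimal Python sketch with hypothetical marker positions and an unknown band:

```python
import math

# Hypothetical marker lane: (migration distance in cm, size in kDa).
markers = [(1.0, 100.0), (2.0, 50.0), (3.0, 25.0)]

def fit_line(points):
    """Least-squares fit of log10(kDa) against migration distance."""
    xs = [d for d, _ in points]
    ys = [math.log10(m) for _, m in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

slope, intercept = fit_line(markers)
# Estimate the size of an unknown band that migrated 2.5 cm:
size = 10 ** (slope * 2.5 + intercept)
print(round(size, 1))  # ~35.4 kDa
```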
6. (a) Discuss the different methods for post-electrophoretic protein detection and visualization. (5 Marks)
After electrophoresis, proteins are separated on the gel based on their molecular weight, but they remain invisible. To study them further, specific detection and visualization methods are needed. These help to locate the proteins, estimate their quantity and sometimes even identify specific proteins. The following are the major methods used for post-electrophoretic protein detection:
1. Coomassie Brilliant Blue Staining
This is the most common method. After electrophoresis, the gel is soaked in Coomassie dye which binds to proteins. Then the gel is washed to remove excess stain. It is simple, low-cost and suitable for detecting medium to high amounts of proteins. However, its sensitivity is lower than silver or fluorescent methods.
2. Silver Staining
This method is more sensitive than Coomassie brilliant blue staining. Silver ions bind to protein and are reduced to metallic silver to give dark bands. It can detect nanogram levels of protein. But it is more time-consuming and can sometimes give background staining if not done carefully.
3. Fluorescent Staining
Fluorescent dyes such as SYPRO Ruby or Deep Purple are used for staining proteins in gels. They provide high sensitivity and wide dynamic range. Fluorescent imaging systems are required to visualize the proteins. This method is more expensive but gives precise quantification.
4. Immunodetection (Western Blotting)
In this method, proteins are transferred to a membrane and then detected using specific antibodies. It allows highly specific detection of individual proteins. This technique is important when the sample has complex protein mixtures.
(b) Elucidate the process of preparation of microarray. (5 Marks)
Microarray is a powerful technique used to study the expression of thousands of genes at once. It works on the principle of hybridization between nucleic acid sequences. Before using it in experiments, a microarray slide or chip has to be prepared where known DNA probes are fixed in an organized manner. These probes will later bind to complementary target sequences from the sample. Although different types of microarrays exist, the basic preparation steps are mostly common across them.
The general steps in microarray preparation are as follows:
1. Probe Selection:
Probes are short DNA sequences that match the target genes or regions of interest. These can be generated through PCR or chemical synthesis depending on the system being used. The aim is to ensure high specificity and binding ability.
2. Purification of Probes:
If the probes are produced through amplification, they are purified to remove unwanted components such as enzymes, salts, or unincorporated nucleotides. This step ensures that only pure probes are spotted on the surface of the array.
3. Spotting of Probes on Solid Surface:
The purified probes are placed on a solid support, usually a glass slide or silicon surface. This is done using a robotic spotting machine that places the probes in an ordered grid pattern. Some systems may directly synthesize probes on the surface itself.
4. Immobilization of Probes:
After spotting, the probes are permanently attached to the surface using methods like UV crosslinking or heating. This step ensures the probes do not wash away during later hybridization.
5. Surface Blocking:
To prevent non-specific binding, the remaining surface of the slide is blocked using agents like BSA or detergents. This helps in getting accurate and clean results during sample hybridization.
6. Quality Control:
A test hybridization or signal check is done using control DNA to ensure that the array is correctly prepared. It helps verify that the probes are present and active.
7. Storage:
The final prepared microarrays are stored in dry and clean conditions at controlled temperatures to protect the probes until they are used.
Note: This general preparation process is widely followed, though exact details can vary depending on the microarray platform.
7. Write short account on the following: (2.5 × 4 = 10 Marks)
(a) Human Genome Project
The Human Genome Project (HGP) was a large international research initiative started in 1990 and completed in 2003. Its main aim was to decode the complete sequence of human DNA and identify all the genes present in the genome. Scientists found that the human genome contains about 3.2 billion base pairs and nearly 20,000 to 25,000 protein-coding genes.
One major discovery was that only about 1.5% of the human genome codes for proteins. The remaining DNA is non-coding, but much of it is still functionally important, like in regulation, gene expression and maintaining chromosome structure.
The project helped create detailed genetic maps, physical maps and DNA sequence data which are now used in biomedical research. It has contributed to understanding genetic diseases, cancer biology and gene functions.
The project was mainly led by the United States, along with major contributions from the UK, Japan, France, Germany and China. In the US, it was coordinated by the National Institutes of Health (NIH) and the Department of Energy (DOE). Dr. Francis Collins led the publicly funded HGP, while Dr. Craig Venter led a parallel private effort through Celera Genomics. Both teams worked together to finish the genome early and shared the data openly with scientists worldwide.
(b) Yeast artificial chromosome
Yeast Artificial Chromosome (YAC) is a type of vector used to clone very large fragments of DNA, often up to 1 megabase. It was developed in the 1980s to help in studying eukaryotic genomes, especially the human genome. YAC is based on the natural chromosome of Saccharomyces cerevisiae (baker's yeast) and it contains all the necessary elements for yeast chromosome function. These include a centromere (CEN), telomeres (TEL) and an origin of replication (ARS).
A YAC also carries selectable marker genes that allow researchers to identify successful clones. Since it mimics the structure and behavior of real chromosomes, it supports stable and long-term propagation of inserted DNA fragments inside yeast cells. YACs played a major role in the Human Genome Project by helping to map and sequence large regions of the human genome. However, due to instability and recombination problems, they were later replaced by other systems like Bacterial Artificial Chromosomes (BACs).
(c) Dicer
Dicer is an important enzyme involved in the process of RNA interference (RNAi), which helps regulate gene expression. It is a type of ribonuclease (RNase) enzyme that cuts long double-stranded RNA (dsRNA) or precursor microRNA (pre-miRNA) into small fragments called small interfering RNA (siRNA) or microRNA (miRNA). These small RNA molecules are about 20-25 nucleotides long.
Dicer works by recognizing the double-stranded RNA and cleaving it into these shorter pieces. The siRNA or miRNA produced by Dicer then guide a protein complex called RISC (RNA-induced silencing complex) to target specific messenger RNA (mRNA) molecules, leading to their degradation or blocking their translation into proteins.
Thus, Dicer plays a key role in controlling which genes are turned off, protecting cells from viruses and regulating development and cellular functions. It is found in many organisms, including plants, animals and fungi.
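Dicer's "molecular ruler" activity can be caricatured in Python as chopping a dsRNA string into ~21-nt pieces. The sequence below is made up, and real Dicer measures from the end it binds and leaves 2-nt 3' overhangs, both of which this sketch ignores:

```python
# Toy sketch of Dicer processing a long dsRNA into siRNA-sized pieces.

def dice(dsrna, size=21):
    """Cut a dsRNA (here a plain string) into consecutive size-nt pieces."""
    return [dsrna[i:i + size] for i in range(0, len(dsrna), size)]

rna = "AUGCUAGCUAGGCUAUCGAUCGAUUACGGCAUGCUAGCAUCG"  # 42 nt, hypothetical
sirnas = dice(rna)
print([len(s) for s in sirnas])  # [21, 21]
```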
(d) Isotope coded affinity tag
Isotope Coded Affinity Tag (ICAT) is a method used to compare and measure proteins from different samples. It helps scientists find out which proteins change in amount between two conditions, like normal and diseased states.
Isotope Coded Affinity Tag (ICAT) has three important parts:
- First is the isotope-coded tag, which comes in two forms, light and heavy. These tags attach to cysteine residues of proteins. Both forms behave chemically the same but differ in mass because of the isotopes.
- Second is the affinity tag, which helps to catch only the tagged proteins from a mix using special beads or columns. This makes it easy to separate proteins we want to study.
- Third is mass spectrometry, an instrument that reads the mass difference between light- and heavy-tagged peptides. The relative intensities of the paired peaks show how much of the protein is present in each sample.
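The quantification step can be sketched in Python: for each tagged cysteine peptide, the ratio of heavy to light peak intensity estimates the relative protein amount in the two samples. Peptide names and intensity values below are hypothetical:

```python
# Sketch of ICAT-style quantification from paired MS peak intensities.
peaks = {
    # peptide: (light-tag intensity, sample 1; heavy-tag intensity, sample 2)
    "LCTVATLR": (1.2e6, 2.4e6),
    "GCFAVEGPK": (5.0e5, 5.0e5),
}

for peptide, (light, heavy) in peaks.items():
    ratio = heavy / light
    print(peptide, round(ratio, 2))  # >1 means more protein in sample 2
```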