The Human Genome Project (HGP) was a comprehensive, international mega project
dedicated to mapping the entire human genome by determining the sequence of
nucleotides in the DNA of each of the 22 autosomes and the X and Y
chromosomes, and to studying the functions of human genes. It has also been called
INTERNATIONAL GENOME SEQUENCING.
HGP is considered to be the most ambitious project ever
undertaken, for the following reasons:
1. It deals with 3 × 10^9 base pairs: determining their exact
sequence in different genes and on different chromosomes, and determining
the relationships among different genes.
2. It deals with the storage of huge amounts of data. Imagine storing the sequence
of 3 × 10^9 base pairs from just one human cell in books with 1000 letters per
page and 1000 pages per book: about 3000 such books would be needed.
3. The cost of sequencing was estimated at about US $9 billion in the 1990s
(about $3 per base pair).
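The storage and cost figures in points 2 and 3 follow from simple arithmetic. The short Python sketch below reproduces them, assuming one printed letter per base, a round genome size of 3 × 10^9 base pairs, and the 1990s estimate of $3 per base pair; the variable names are illustrative only.

```python
# Back-of-the-envelope figures for the human genome (round numbers from the notes).
GENOME_BP = 3 * 10**9      # approximate genome size in base pairs
LETTERS_PER_PAGE = 1000    # assumption: one printed letter (A, T, G or C) per base
PAGES_PER_BOOK = 1000
COST_PER_BP_USD = 3        # 1990s estimate, US dollars per base pair

letters_per_book = LETTERS_PER_PAGE * PAGES_PER_BOOK   # 1,000,000 letters
books_needed = GENOME_BP // letters_per_book
total_cost_usd = GENOME_BP * COST_PER_BP_USD

print(f"Books needed: {books_needed}")          # Books needed: 3000
print(f"Estimated cost: ${total_cost_usd:,}")   # Estimated cost: $9,000,000,000
```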
The Human Genome Project was a 13-year project. The UK, through the Wellcome
Trust, became a major partner in the project. The project was coordinated and
funded by the NATIONAL INSTITUTES OF HEALTH and the US DEPARTMENT OF ENERGY.
The project officially began on October 1, 1990. Thanks to advances in
computational technology for data processing, data sorting and data retrieval,
the first working draft of the entire human genome was announced in June 2000,
and the first detailed analyses appeared in February 2001 in the journals
NATURE and SCIENCE. The project was completed in April 2003.
GOALS OF THE HUMAN GENOME PROJECT:
1: To sequence all of the roughly 3 billion
base pairs of the genome.
2: To store
this information in databases easily accessible to scientists across the world.
3: To identify all of the approximately 20,000-25,000 genes in human DNA.
4: To construct a physical map of the human genome by cloning DNA into yeast
artificial chromosomes (YACs) and bacterial artificial chromosomes (BACs).
5: To advance
genetic technologies and methodologies such as gene cloning, assembly of sequence
contigs and genome sequencing.
6: To transfer related technologies to other sectors (e.g., industries).
A few salient features are:
1: Gene sizes vary greatly; the largest known human gene,
DYSTROPHIN, has about 2.4 million bases.
2: The functions of more than 50% of the discovered genes are not known.
3: Less than 2% of the genome codes for proteins.
4: Repetitive sequences, which do not code for proteins, make up about 50% of the
human genome. (Repetitive sequences are stretches of DNA repeated up to a thousand times.)
5: About 1 billion copies of 5-8 bp repeated sequences are
clustered around centromeres and telomeres. They are regarded as junk DNA.
6: The HGP found as many
as 1.4 million single-base differences between individuals. These are called SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs).
SNPs promise accurate identification and localization of disease-associated sequences and help in tracing human evolutionary history.
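To make the idea of an SNP concrete: if two people's versions of the same stretch of DNA are lined up base by base, every position where the bases differ is a candidate single nucleotide polymorphism. Below is a minimal Python sketch, assuming the two sequences are already aligned and of equal length (real SNP discovery must also handle alignment gaps and sequencing errors). The function name `find_snps` and the example sequences are hypothetical, and positions are counted from 0.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every single-base difference.

    Assumes both sequences are already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


# Two hypothetical short stretches of the same region from two individuals.
person_1 = "ATGCCGTAAGCT"
person_2 = "ATGCTGTAAGGT"
print(find_snps(person_1, person_2))  # [(4, 'C', 'T'), (10, 'C', 'G')]
```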
STRATEGY AND METHODOLOGY:
It includes the following stages:
The genetic and physical maps of the human genome are
prepared using MOLECULAR MARKERS such as SIMPLE SEQUENCE REPEATS (also called
MICROSATELLITES) and SEQUENCE TAGGED SITES (STSs), and by PCR
amplification of particular microsatellites.
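Microsatellites work as markers because the number of times a short unit (for example `CA`) is repeated back to back differs between individuals; in the laboratory this difference shows up as PCR products of different lengths. The Python sketch below counts the repeat number directly from a sequence instead. The helper `longest_repeat_run` and the two alleles are hypothetical illustrations, not part of any HGP software.

```python
def longest_repeat_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Extend the run while the next block of bases matches the repeat unit.
        while seq.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


# Two hypothetical alleles of the same locus, differing only in repeat number.
allele_1 = "GGT" + "CA" * 5 + "TTG"
allele_2 = "GGT" + "CA" * 8 + "TTG"
print(longest_repeat_run(allele_1, "CA"))  # 5
print(longest_repeat_run(allele_2, "CA"))  # 8
```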
The methods used for sequencing the entire genomic DNA of humans follow two basic
approaches: the EXPRESSED SEQUENCE TAGGING METHOD and SEQUENCE ANNOTATION.
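EXPRESSED SEQUENCE TAGGING METHOD:
This method involves identifying all genes that are expressed as RNA. They are
represented as EXPRESSED SEQUENCE TAGS (ESTs).
SEQUENCE ANNOTATION:
This approach involves sequencing the whole genome, determining all coding and
noncoding sequences, and assigning functions to
different regions in the sequence. The following steps are used: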
Total DNA from a cell is isolated and broken randomly into fragments by sonication
(a technique that uses high-frequency sound waves to break DNA into random fragments).
The fragments are then separated by agarose gel electrophoresis or pulsed-field gel
electrophoresis.
The fragments are then cloned in a suitable host using special vectors. These hosts
are yeast and bacteria, and the vectors are YACs and BACs.
Cloning results in the amplification of the inserted DNA fragments.
The amplified fragments are then sequenced by automated DNA sequencers. The
resulting sequence contigs are arranged on the basis of the overlapping regions
present in them. On this basis, a continuous sequence of nucleotides can be
established for a chromosome region.
Because aligning so many sequences by hand is not humanly possible, specialised
computer-based programs were developed for sequence alignment.
With the help of these programs, the sequences were annotated and assigned
to each chromosome.
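The arrangement of fragments by their overlapping regions can be illustrated with a greedy overlap merge, one of the simplest assembly strategies: repeatedly join the two fragments whose end and start overlap the most. This is a teaching sketch, not the programs actually used in the HGP. It assumes error-free fragments, no repeated sequences, and no fragment lying wholly inside another; the function names and example fragments are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the ordered pair of fragments with the largest overlap.

    Returns the contigs left once no pair overlaps by at least `min_len` bases.
    """
    contigs = list(fragments)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the result is several separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        # Remove both merged fragments (higher index first), then add the merge.
        for k in sorted((best_i, best_j), reverse=True):
            contigs.pop(k)
        contigs.append(merged)
    return contigs


# Hypothetical overlapping fragments of the sequence ATGGCGTACGTTAGC.
fragments = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]
print(greedy_assemble(fragments))  # ['ATGGCGTACGTTAGC']
```

With real data, repeated sequences and sequencing errors can make simple greedy merging join the wrong fragments, which is one reason more careful computer programs are needed.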
3: GENERATION OF PHYSICAL
AND GENETIC MAPS:
Genetic and physical maps are generated using information on the
polymorphism of restriction endonuclease recognition sites and on
repetitive DNA sequences called microsatellites.
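Restriction-site polymorphism can be pictured as follows: a single base change can destroy (or create) a restriction endonuclease recognition site, so the same region cut from two individuals gives fragments of different lengths. The Python sketch below assumes the EcoRI site `GAATTC`, which EcoRI cuts between G and A (G^AATTC); it looks at one strand only and ignores sticky ends. The function name and both allele sequences are hypothetical.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts after the first base of the site: G^AATTC


def digest_fragment_lengths(seq: str, site: str = ECORI_SITE,
                            cut_offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Lengths of the pieces produced by cutting `seq` at every occurrence of `site`."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries: start of sequence, each cut position, end of sequence.
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


allele_a = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
allele_b = "AAAA" + "GAATTC" + "CCCCCC" + "GATTTC" + "TT"  # one base changed: A -> T
print(digest_fragment_lengths(allele_a))  # [5, 12, 7]
print(digest_fragment_lengths(allele_b))  # [5, 19]
```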