Vision and Goals:
Fundamental plant biological information from a model plant: Because rice is both a member of the Gramineae and a crop plant, its genomic sequence can teach us a wealth of fundamental information about important aspects of plant biology. Rice is a model for learning about yield, hybrid vigor, and single-gene and multigenic disease resistance. Different races of rice are adapted to a wide variety of environmental situations, from tropical flooding to temperate dry land, so it is a model for real-life adaptive responses. Because its genome is collinear with those of the other grasses, rice is a key to knowledge of their genomic organization. Comparison of the sequence of the dicot Arabidopsis thaliana with that of rice, a model monocot, will tell us what genomic structures these two different groups of angiosperms have in common and how they differ.
Rice is the principal food of half of the world's population. Over the last 30 years there have been great strides in rice production, but greater world demand for rice, combined with reduced land and water inputs, puts even greater pressure on us to improve rice production. The complete genome sequence will provide the ultimate genetic map and, in combination with functional genomics tools, will tell us where all of the genes are and what they do. In the near future a plant breeder will map a trait to a chromosomal region and then go to the database to find the likely candidate gene. Not only does this provide the ultimate genetic marker, it also allows the breeder to look at other germplasms for new and potentially better alleles.
While the goals of the International Rice Genome Project must be focused, the information provided by the International Project can be exploited by the entire research community to learn:
The function and map location of cereal and ultimately all plant genes.
How map-based sequence information can be used to identify and provide markers for
agronomically significant genes.
The molecular basis of plant growth and development so that fundamental questions
in plant physiology, biochemistry, cell biology, and pathology can be addressed.
The relationship of genome structure to gene expression.
The primary goal is the complete genome sequence of rice.
The primary activity in the first year will be to prepare and distribute
clones for sequencing. During this period, it is anticipated that the libraries
will be quality controlled and that the clones will be end sequenced and fingerprinted.
Subsequent years will be devoted to large-scale genomic sequencing. The objective
is to complete the task in ten years.
The purpose of the international collaboration is to accelerate the completion of this goal.
The International Collaboration is best achieved by sharing materials and
technologies and by the timely release of sequence and related information.
To this end, scientists interested in the genomic sequencing of rice participated
in a workshop held in conjunction with the International Symposium on Plant
Molecular Biology in Singapore on September 23, 1997 (rice.html).
A Working Group, nominated in Singapore, met on February 5, 1998 to develop
this document.
Membership in the Rice Genome Project:
Any group willing to sequence large stretches of contiguous genomic DNA is welcome to join the collaborative effort as long as it is willing to follow the agreed-upon guidelines. Participants agree to share materials, including libraries, and to the timely release to public databases of physical mapping information and annotated DNA sequences. A group must agree to sequence one megabase of DNA per year to maintain membership. Members agree to declare their sequencing plans and to provide detailed plans and progress on their respective web pages.
Individual sequencing groups are encouraged to claim large chromosomal regions or entire chromosomes, if they have the sequencing capacity, to increase the likelihood that entire chromosomes are completed. Groups may claim chromosomal regions which they agree to sequence within one to three years.
Post-sequencing activities, such as functional genomics, are beyond the
scope of the International Rice Sequencing Project. Further, the Project does
not encompass the cloning and sequencing of specific rice genes for research
purposes or industrial sequencing efforts. While the International Project
will be happy to share information with these individual efforts, their conduct
is beyond the scope of these agreements.
The Rice Genome Working Group:
The Working Group is the body that will make decisions that pertain to the goals, strategies, and coordination of the collaborative effort. The Working Group will be responsible for planning the most efficient means of completing the project. Among its responsibilities will be assigning regions to be sequenced that will avoid duplication and maximize overall progress.
The Working Group is comprised of representatives of each research group participating in the International Rice Genome Sequencing Project. As Japan is recognized as having a leadership role in the Project, the head of the RGP will be the permanent chairman of the Working Group.
Major policy decisions, including sequencing assignments, will be taken by representatives from each of the major national groups participating in the Project. Currently, these representatives come from Japan, Canada, China, France, Korea, India, Taiwan, Thailand, the UK and the U.S.
The Working Group will meet annually in Japan. Interim meetings, as needed,
may be held elsewhere. The meetings will be open to the public. Results of
Working Group meetings will be posted on web sites and published in ORYZA.
Methodology:
The Oryza sativa ssp. japonica cultivar, Nipponbare, also known as GA3, will be sequenced. Seed from a single plant will be distributed by Dr. Sasaki for the purpose of making libraries. The primary reasons for choosing this cultivar are that more than 40,000 EST sequences from the strain have been released to DDBJ and that a physical map based on YACs that covers over 60% of the genome has been published. Sequencing other cultivars is strongly discouraged as genetic polymorphisms cannot be distinguished from sequencing errors. Moreover, groups not sequencing from one of the shared libraries would not benefit from the associated accumulated knowledge and the other advantages of collaboration. It is recognized that comparative mapping and sequencing of other rice subspecies is valuable information that the International Rice Genome Sequencing Project would like to share. Nevertheless, the primary goal of the Project is the complete sequence of the genome of a single cultivar.
The primary substrate for sequencing will be large insert libraries prepared with PAC or BAC vectors. Currently, the RGP has made a PAC library with 20-fold genome coverage. Dr. Rod Wing (CUGI) has made two BAC libraries which together provide about 28-fold coverage of the genome. The quality of these libraries and their coverage has been verified by hybridizing each with many single copy EST probes and with organellar DNA.
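As a rough illustration of what these fold-coverage figures imply, the sketch below uses the common approximation that a library with c-fold coverage contains a given locus with probability 1 - e^(-c). The function name is hypothetical, and the estimate assumes that clones are drawn uniformly at random from the genome, which real libraries only approximate; it is not a description of how the libraries were actually evaluated.

```python
import math


def probability_locus_covered(fold_coverage: float) -> float:
    """Estimate the chance that a random locus lies in at least one clone.

    Uses the approximation P = 1 - exp(-c) for c-fold coverage, which assumes
    clones are drawn uniformly at random from the genome (an idealization).
    """
    if fold_coverage < 0:
        raise ValueError("fold_coverage must be non-negative")
    return 1.0 - math.exp(-fold_coverage)


if __name__ == "__main__":
    # Coverage figures quoted in the text: 20-fold (PAC) and about 28-fold (BACs).
    for label, coverage in (("PAC library", 20.0), ("BAC libraries", 28.0)):
        missed = 1.0 - probability_locus_covered(coverage)
        print(f"{label}: estimated fraction of loci missed is about {missed:.1e}")
```

Under this idealized model either library alone would be expected to miss only a tiny fraction of the genome, which is why the hybridization checks described above matter: they test whether the real libraries behave as the random model predicts.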
The BACs have been fingerprinted for the purposes of preparing contigs and checking the integrity (deletions or rearrangements) of the clones. The information generated will also be invaluable where repeated sequences make BAC end sequences ambiguous. In addition, where there is multi-fold coverage, the assembly program can pick out inserts that have deletions or rearrangements. Fingerprinting information is publicly available so that individual laboratories can verify the quality of the contigs they plan to sequence.
The RGP has used 4,000 mapped ESTs to physically map their PAC clones. These mapped ESTs are an unmatched resource in preparing a physical map as they provide sequence, map location, and direction.
In parallel with fingerprinting, the BAC clones have been subjected to
end-sequencing. This should provide an STS every 4 to 5 Kb on average and
will allow genome sequencers to pick the clones with minimum overlap.
Accuracy:
The Rice Genome Sequencing Project will serve as a model for all other grasses and will cost about $200M. The sequence will be used by other researchers and will thus be scrutinized. It is imperative that these resources not be squandered on inaccurate results. In part, this problem has been addressed by insisting on sequencing DNA from the same cultivar, if not the same plant, to minimize variation due to genetic polymorphism.
Fingerprinting of multiply overlapping inserts is a means of verifying that the BACs chosen for sequencing have not been rearranged. Collinearity with the genome should also be verified by probing restriction enzyme digests of genomic or the appropriate YAC DNA with the BAC and comparing this with digests of the BAC itself.
The Rice Genome Sequencing Project has adopted the following finishing standards:
Minimum Standards (exceptions are noted in annotation comments):
(i) A single contig is generated.
(ii) The bulk of the sequence should be derived from multiple subclones sequenced from both strands. Less than 3% of the sequence should be derived from multiple subclones sequenced only from the same strand with the same chemistry. These regions must pass manual inspection by the finisher for any sequence problems, but do not need to be annotated unless the sequence quality falls below phred 30. Less than 1% of the sequence should be derived from a single subclone. In the case of a region covered by a single subclone, the clone must be sequenced either on both strands or with two different chemistries, and the region must be annotated.
(iii) More than 99% of the sequence has less than one error in 10,000 base pairs as reported by phrap or other sequence assembly consensus scores. The RGP has empirically determined that a phrap score of 30 or above exceeds the standard of less than one error in 10,000 bp. Exceptions must be manually checked and must have passed inspection for possible sequencing problems. These areas must be annotated.
(iv) The assembled sequence is confirmed by restriction enzyme digestion.
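The numeric thresholds in standards (ii) and (iii) can be sketched as simple checks. The code below is only an illustration: the function names and inputs are hypothetical and are not part of any Project software. It also shows the nominal phred relation Q = -10 log10(P), under which a score of 30 corresponds to an estimated error probability of 1 in 1,000; the statement that a phrap score of 30 meets the 1-in-10,000 standard is the RGP's empirical finding and does not follow from the formula.

```python
from typing import Sequence


def phred_to_error_probability(quality: float) -> float:
    """Nominal phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def meets_quality_standard(
    qualities: Sequence[int], threshold: int = 30, min_fraction: float = 0.99
) -> bool:
    """Standard (iii): more than 99% of bases score at or above the threshold."""
    if not qualities:
        raise ValueError("qualities must not be empty")
    passing = sum(1 for q in qualities if q >= threshold)
    return passing / len(qualities) > min_fraction


def meets_coverage_standard(
    total_bp: int, same_strand_only_bp: int, single_subclone_bp: int
) -> bool:
    """Standard (ii): under 3% same-strand-only and under 1% single-subclone."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return (
        same_strand_only_bp / total_bp < 0.03
        and single_subclone_bp / total_bp < 0.01
    )
```

A finisher could run checks of this kind on the per-base consensus scores of an assembled clone to locate the regions that need manual inspection and annotation. They do not replace that inspection.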
Exceptions, all of which require an annotation note:
(i) In instances where gap closure/finishing is difficult to complete, sequences should be submitted to DDBJ/EMBL/Genbank as complete under the following conditions:
a) the sequence within a single BAC or PAC clone contains at most one gap of less than 500 bp
b) the contigs on either side of the gap are oriented and ordered correctly
c) all currently available closure/finishing techniques have been attempted to close the gap
In addition, the sequencing group is strongly encouraged to continue making a good faith effort to close the gap as long as possible, and to revise their submission to DDBJ/EMBL/Genbank if and when they close it.
(ii) In the case of regions consisting only of PCR fragments (including PCR products from subclones), high fidelity polymerase should be used and if the PCR products are cloned before sequencing, at least two PCR clones are necessary.
(iii) In the case of simple repeat sequences, including single nucleotide repeats, where the number of repeats cannot be determined, the length of the repeat region should be estimated by restriction enzyme digestion or PCR.
(iv) Every effort must be made to resolve large repeats, particularly if they contain unique sequence. Should problems persist, the size of the repeat region, confirmed by restriction enzyme digestion or PCR, the nature of the repeats, the size of repeats, and the finishing problem should be indicated.
(v) Sequences of bacterial transposons and other obvious contaminants are screened and deleted from the finished sequence; the size, sequence and position of the deleted region are indicated.
(vi) Where the confirmed sequences of overlapping regions between adjacent
PACs or BACs differ, these differences should be indicated.
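The conditions in exception (i) for submitting a gapped clone as complete can be sketched as a single check. This is a hypothetical helper, not Project software, and it rests on one interpretation of condition a): a clone may contain no gap or exactly one gap, and that gap must be shorter than 500 bp. Conditions b) and c) are judgments made by the sequencing group, so they are passed in as flags.

```python
from typing import Sequence

MAX_GAP_BP = 500  # exception (i)a: the single permitted gap must be shorter than this


def may_submit_as_complete(
    gap_sizes_bp: Sequence[int],
    contigs_ordered_and_oriented: bool,
    closure_methods_exhausted: bool,
) -> bool:
    """Return True if a BAC or PAC clone qualifies for submission as complete."""
    if not gap_sizes_bp:
        # No gaps: the clone is a single contig and needs no exception.
        return True
    if len(gap_sizes_bp) > 1:
        # Condition a): at most one gap per clone.
        return False
    if gap_sizes_bp[0] >= MAX_GAP_BP:
        # Condition a): the gap must be shorter than 500 bp.
        return False
    # Conditions b) and c) are supplied by the sequencing group.
    return contigs_ordered_and_oriented and closure_methods_exhausted
```

For example, a clone with one 300 bp gap, correctly ordered and oriented contigs, and exhausted closure attempts would qualify, while a clone with two short gaps would not. In every case where the exception is used, an annotation note is still required.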
Sequence Release:
The Rice Genome Sequencing Project agrees to the immediate release of finished,
but not necessarily annotated, sequence in units of intact BAC or PAC inserts.
These finished sequences will conform to the accuracy standards described above.
Release means submission to a public database such as DDBJ, EMBL, or GenBank.
Immediate submission of preliminary assemblies - at least the phase 2 stage
- to the HTG divisions of DDBJ, EMBL, or GenBank is also required. Phase
2 sequences are unfinished BACs or PACs, in ordered, oriented contigs, with
or without gaps. Members agree to follow the Bermuda guidelines
for data release and intellectual property. In particular, "all genomic
sequence information should be freely available and in the public domain in
order to encourage research and development and to maximize its benefit to
society." IRGSP members will not patent primary sequence data generated
under the auspices of IRGSP.
Annotation:
Members of the Working Group, while recognizing the importance of annotation
to the value of sequence information, view annotation as separate from release
of finished sequence. Each sequencing group is responsible for annotating
the sequence they contribute. A uniform standard of annotation has been agreed
upon that checks the integrity of the sequence, assigns and identifies regions
of homologies, delineates potential open reading frames, and names and indicates
the beginnings and ends of genes. Common annotation software will be adopted.
The annotator must state whether coding sequences and splice sites were
determined experimentally or by using software. It is recognized that the
use of published cDNA sequences greatly facilitates this task. We recognize
that annotation tools and standards are changing. Current annotation guidelines
can be found at http://demeter.bio.bnl.gov/Annotation.html
Rice Genome Database:
An integrated database established in Japan will facilitate collaboration, coordinate sequencing work, and provide methods for submitting, using, and sharing information. Sequences will be released to one of the public databases, DDBJ, EMBL, or GenBank. The Rice Genome Database will pick up new submissions from the public databases. The Database will store and manage the annotation information. Each participant will maintain a Web site with a standardized format that describes work in progress and sequences completed. The Database will be linked with the Web sites of each of the Project's participating laboratories and will thus be able to maintain a registry of clones being sequenced, monitor progress, and coordinate activities. The Database will also be linked with sites that provide fingerprinting information and end sequences. With ever-expanding databases, annotation is never complete. It may be advisable to assign the task of periodically updating the annotation of rice genomic sequence to the Rice Genome Database.
The larger goals for the Project envision the use of sequence information
to provide biological lessons for rice and other cereals. The Rice Genome
Database is a means for linking all genomic information related to rice DNA
sequence. This information comes from existing genomic databases and from
work that derives from DNA sequencing, such as determination of gene function.
The Rice Genome Database will thus be linked with other rice and cereal databases
and to international groups that will be learning about the function of rice
and other cereal genes.
Outreach:
To be successful, this large sequencing effort needs the broad support
of scientists working on rice and other cereals, who will be the potential
end users of the sequence information. Ultimately, it is the public at large
that supports the project, so steps toward public education should be undertaken.
The public must believe that the project is worthwhile, well organized, and
credible. There are a number of ways that the Rice Genome Sequencing Project
will attempt to engender this support:
Timely release of finished, annotated sequence blocks
as well as the availability of mapped BACs and YACs.
ORYZA will report the results from
the Working Group meetings as well as news of the Project.
Internet access to The Rice Genome Database will
engender awareness and utility of the Project.
Publications from participating sequencing laboratories
should acknowledge that they are part of the Project.
The IRGSP welcomes the participation in its activities of all scientists who can contribute to its goals.
Last modified on April 23, 2002 by B. Burr and T. Sasaki.