The promise of the complete sequence of the rice genome in ten years offers
great potential for rice as an agronomic and model plant.
There have already been important advances in rice genomics:
The germplasms of Oryza sativa and related species have been widely collected and
surveyed.
There are abundant mapped molecular markers that provide convenient linkage with
important phenotypes.
A physical map of YACs covers most of the genome.
Expressed genes from a number of tissues are known through large scale cDNA
analyses.
Databases linking this information are available through the Internet.
These tools make it possible to identify a gene corresponding to a phenotype.
However, gene identification must still be done on a gene-by-gene basis and only a fraction
of the genes can be discovered this way. The rice genome is expected to contain more than
20,000 genes expressed in different parts of the plant or in response to different stimuli.
Genome sequencing will provide a more direct route to decoding the information content of
the genome. The complete genomic sequence is a bridge to understanding the rice plant as
the result of the expression of an assemblage of genes. The availability of the complete
sequence of the rice genome will form the basis for understanding the function and time of
expression of most of the genes. Knowing the order of the sequence permits association of
candidate genes and phenotypes. It is the means to more effectively modify the plant and its
responses to the environment. All cereals exhibit remarkable syntenic relationships to each
other. The rice nuclear genome, which contains about 400 Mb of DNA, is the smallest of the
cereal grasses. A complete and accurate map-based rice genomic sequence is the key to
deciphering the genomes of the other cereal grasses whose genomes are too large to
sequence.
Realizing the small size of the rice genome and the key role it could play as a model
cereal, a number of scientists began thinking about the possibility of sequencing rice. A
workshop was held in Singapore in September 1997 to see if there was interest in an
international collaborative project. Based on the positive outcome of that meeting,
representatives of Japan, the United States, the European Union, China, and Korea met in
Tsukuba, Japan, in February 1998 to formalize the principles of an international
collaboration. The major points agreed upon were the use of a single germplasm, sharing of
materials, immediate sequence release, accuracy standards, and mechanisms for
coordinating the work.
The International Rice Genome Sequencing Project comprises all the
laboratories that agree to the principles of the Tsukuba meeting. Scientists engaged in the
Project and representing each of the participating countries form the Working Group that
jointly administers the Project. In addition to the original five countries named above,
France, Taiwan, and Singapore will soon join the Project. The expanded participation can be
expected to accelerate the speed of progress and increase the benefits to all members
involved. Only with intimate cooperation of many nations can a complete, high-quality rice
genome sequence be obtained and the promise be kept. The rice community and other cereal
scientists do not need to wait ten years to begin to reap the benefits of this promise. A
detailed physical map and more complete knowledge of syntenic
relationships will provide tools for map-based cloning. Incremental genome sequences will
provide an ever-expanding resource for gene discovery. The database that supports the
Project will provide access to this information as well as a means of coordinating the
sequencing effort.
This Newsletter attempts to serve two functions: It is a means of fostering
communication between laboratories engaged in the International Rice Genome Sequencing
Project. It is also our way of reaching out to the broader community to describe our
progress and to publicize our plans. The current issue contains the 1998 Tsukuba
agreement for international cooperation, reports of previous meetings, current progress, and
plans for the next Working Group meeting. We welcome your comments and contributions
to future issues.
Takuji Sasaki, RGP, NIAR/STAFF
Benjamin Burr, Brookhaven National Laboratory
The Formation of an International Rice Genome
Sequencing Project
The International Rice Genome Project would not have been possible without the
enthusiastic support of the scientific leader of the Rice Genome Research Program (RGP)
in Japan, Dr. Takuji Sasaki. Basically, RGP came to believe that there was a greater
advantage in forming an international consortium than in sequencing the entire rice
genome by itself. There are two strong arguments for an international consortium.
First, a whole-genome sequencing project of more than 400 million base pairs conducted by
several sequencing centers will be done faster. Second, the quality of the data is likely to be
better if DNA sequences are assembled and annotated by more than one group. However,
these benefits are only realized if the international effort is built on all the data and materials
that have already been assembled primarily by the RGP in Japan.
One of the great attractions of choosing rice for a whole-genome sequencing effort is
the high density physical map and the EST data that have already been produced.
Furthermore, participation of several countries and sequencing centers will require speedy
distribution of data and materials by all participants. It will require both coordination and
management of sequencing target sites and data processing. In addition to the multitasking
of the technical aspects of the project, a grass roots approach has to be initiated to bring rice
genetics and breeding to its full research potential. Despite the role of the rice genome as a
reference genome, plant biologists around the world and in particular those already working
with rice have to rally around rice as a focal research point so that the DNA sequence data can
be readily leveraged in the public domain. Given these many considerations for the
preparation of an international effort to sequence the rice genome, it became imperative to
conduct a series of meetings with plant biologists of different expertise and interest and to
develop a position paper on the basic framework for such a project. In addition, different
local governments and their agencies had to be persuaded to allocate the necessary
resources for the participating sequencing centers. Here is the chronology of this first
round of meetings and the outcome from each of them.
Spring of 1997. The Japanese Rice Genome Project was under review for a
five-year renewal on April 1, 1998, and for the first time a proposal was submitted to the
Japanese government for the complete sequencing of the entire rice genome over ten years.
April 97. In the US, Senator Bond of Missouri introduced a bill to provide
$40 million annually for an NSF Plant Genome Initiative (PGI). The Office of Science and
Technology Policy (OSTP) assembled an Interagency Working Group (IWG) consisting
of representatives of NIH, NSF, DOE, and USDA and chaired by the corn geneticist
Ronald Phillips to develop a scientific plan for such an initiative.
June 97. A colloquium of the US National Academy of Sciences on Food
Supply and the Role of Genome Projects was held at the Beckman Center in Irvine, CA. The
organizers of the colloquium seized the opportunity to call on the participants to discuss the
scope of a US PGI with participants from other countries. The underlying congressional
activities in the US were explained by members of the IWG. It was recognized that if
another genome besides Arabidopsis is going to be sequenced on the whole genome level, it
should be the rice genome. Although rice was not a major crop in the US, there would be
great value in an international format and a contribution by the US.
June 97. Participants in the colloquium who attended the Gordon Conference on
Plant Genetics and Development a week afterwards brought the subject of the PGI to the
floor of the conference for additional input. A subgroup consisting of Jeff Bennetzen, Joe
Ecker, Michael Gale, Jo Messing, Ron Phillips, Takuji Sasaki, and Satoshi Tabata met to
discuss the feasibility of an international rice genome project. When Takuji Sasaki, the
leader of the RGP in Japan, indicated support and willingness to share all information and
material, a strong consensus developed that an international rice genome project could be
modelled after the Arabidopsis genome project. Michael Gale was asked by the group to
develop an agenda for the International Plant Molecular Biology meeting in Singapore to
have a broader discussion with the scientific community.
September 97. Ben Burr and Michael Gale chaired a Rockefeller-sponsored
workshop on the feasibility of an international format to sequence the rice genome at the
International Plant Molecular Biology Conference in Singapore. Several hundred people
attended, and with broad input four key points were established.
1) Participants agreed to participate in an international collaboration that includes the
sharing of clones and the timely release of sequence and mapping information. The
Nipponbare variety used to produce STS maps, cDNA sequences, and YAC clones was
selected as a single DNA source.
2) PAC and BAC libraries for sequencing would be simultaneously constructed in Japan
and the U.S.
3) PAC and BAC-end sequencing adopted from the human genome project would be
applied to the libraries to create a database complemented by DNA fingerprinting.
4) A genome working group was nominated to develop a position paper and to coordinate
the international effort in annual meetings at the Japanese site.
A meeting report was posted
on the Web and further discussed in preparation for the rice genome forum and workshop
in Japan in February 98.
October 97. The US Congress appropriated $40 million to NSF for a PGI on
significant crops. The IWG met with different constituencies to prepare a report to OSTP
on guidelines for a PGI.
January 98. The IWG report recommended US participation in an international
effort to sequence the rice genome with a five-year, $8 million per year budget. The Rice
Genome Project in Japan was renewed for another 5-year term with $10 million per year.
China and Korea each support one rice genome sequencing center.
February 98. The rice genome working group, consisting of Takuji Sasaki (Japan), Guofan
Hong (China), Michael Bevan (EU), Moo Young Eun (Korea), Jo Messing (U.S.), and Ben
Burr representing the Rockefeller Foundation, met to finalize a position paper.
Methodologies, data release policies, scientific standards, time frames, and funding sources
were summarized and published on the Web:
ftp://genome1.bio.bnl.gov/pub/maize/RiceProject.html. The group also met with Japanese
officials to explain the importance and the value of international collaborations.
April 98. Ben Burr and Ralph Quatrano organized a workshop to present to the
U.S. scientific community the results of the working group's meeting in Tsukuba.
Additional discussion on rice biology and its suitability as a model organism was presented
to NSF, USDA, and DOE officials as well. For the first time a representative from France
indicated an interest in participating. The Rockefeller Foundation supported the construction
of a Nipponbare BAC library and Novartis announced the support of a public BAC-end
sequence database.
May 98. The USDA organized a meeting on a Food Genome Initiative. It
indicated its intention to coordinate with NSF a genome initiative that seeks to address
problems in agriculture that cannot be met under the scope of the NSF PGI alone. A diverse
group of scientists discussed the value of different aspects of plant, animal, and microbial
genome projects. One of the foci of a USDA PGI is to support a rice genome sequencing
effort.
July 98. A public hearing at USDA was conducted to hear stakeholders on a
Food Genome Initiative. Again, only rice was proposed as a whole genome project and it
was recognized as a model crop for applied and basic plant genetics.
September 98. At the International Symposium on Rice Germplasm Evaluation
and Enhancement on the occasion of the dedication of the new U.S. rice germplasm facility
in Stuttgart, Arkansas, rice breeders and geneticists from abroad and the U.S. were
presented with the outline of an international rice genome project. The importance of
integrating the sequencing effort with the ongoing work in the rice research community was
highlighted. Participants were enthusiastic about the potential of the project for rice research
and its public format.
September 98. At the International Genome Sequencing and Analysis
Conference in Miami, a semiannual meeting of the rice genome working group reaffirmed
cooperation and strategies for the International Rice Genome Project. Progress on the
construction of PAC and BAC libraries, BAC-end sequencing, and DNA fingerprinting
was reported. A total genome shotgun sequencing approach proposed by the John Innes
Centre was discussed as a possibility, but was found to be unsuitable for an international
project and raised the concern that accuracy and completeness would be jeopardized.
Although NSF decided not to fund the rice genome project in its first year of the US PGI,
representatives from NSF announced the preparation of a new triagency program, similar to
the one for Arabidopsis, solely for the rice genome sequencing effort. It is clear that the
delay will require a change in the current project schedule (Figure 1).
Joachim Messing,
Waksman Institute,
Rutgers University, USA
messing@waksman.Rutgers.EDU
A Report from Singapore, September 1997:
An International Collaboration to Sequence the Rice Genome
There is strong interest among many cereal biologists in sequencing the rice genome. Given
the relatively small size of the genome, this is a feasible undertaking with present technology. Nevertheless,
with a genome size of approximately 400 Mb, the task is so great that it is unlikely that any
one country can devote the resources to sequence the rice genome in the next ten years. In
any case, an international effort will accelerate the process and ensure public access to the
data. Such a collaboration will greatly benefit from tools that have already been developed in
key laboratories working on rice genomics and from parallel collaborative efforts on other
genomes.
The purpose of this document is to summarize the current state of an international
collaboration to sequence the rice genome and to form the basis
for future decisions. Members of the Working Group solicit your comments and
suggestions.
Why sequence rice?
Rice as a model cereal
Genome size: Oryza sativa ssp. japonica is reported to have a 2C value of 0.88 pg, three
times the size of the Arabidopsis thaliana genome. The predicted gene density is one gene
every 15 kb. As such, rice has the smallest genome of the major cereals.
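The figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the commonly used conversion of roughly 978 Mb of DNA per picogram, which is not stated in this report, and the function names are made up for the example.

```python
# Rough genome-size arithmetic for the figures quoted above.
# Assumption (not from the report): 1 pg of DNA is about 978 Mb.
MB_PER_PG = 978.0


def haploid_size_mb(two_c_pg: float) -> float:
    """Convert a 2C DNA content in picograms to a haploid (1C) size in Mb."""
    return (two_c_pg / 2.0) * MB_PER_PG


def estimated_gene_count(genome_mb: float, kb_per_gene: float) -> int:
    """Gene number implied by an average spacing of one gene per kb_per_gene."""
    return round(genome_mb * 1000.0 / kb_per_gene)


rice_mb = haploid_size_mb(0.88)           # 2C value quoted for japonica rice
genes = estimated_gene_count(rice_mb, 15.0)  # one gene every 15 kb
print(f"{rice_mb:.0f} Mb, about {genes} genes")
```

Under these assumptions the haploid genome comes out near 430 Mb and the implied gene number near 29,000, which is in line with the genome sizes and the "more than 20,000 genes" estimate used elsewhere in this Newsletter.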
Well-mapped genome: The rice molecular map with over 2300 markers has already been
useful in helping align physical maps. Over 30,000 ESTs have been reported and many are
mapped. At least 200 mapped SSTs have been published. A YAC library that has been
fingerprinted and ordered with mapped markers currently covers 52% of the rice genome.
Several BAC libraries have been described. A recent report suggests that 92% of the
genome is covered by ordered BACs in many contigs.
Molecular genetics: With the introduction of new methods for Agrobacterium tumefaciens
transformation, rice is the easiest of all cereal plants to transform genetically. This tool
permits geneticists to complement mutations or confer dominant phenotypes to verify gene
function.
Synteny
While grass genomes differ markedly in the amount of total DNA, they share a common set
of genes. Recent work indicates that the grass genomes - wheat, rye, barley, maize, sorghum,
millet, and rice - have similar genetic maps over large blocks of the chromosomes. When
examined in detail, local gene order has been found to be preserved, but the genes are
separated by greater amounts of repetitive DNA - mostly retrotransposons - in the species
with larger genomes. This syntenic relationship can be exploited, for instance, by geneticists
who are interested in the map-based cloning of a gene controlling chromosome pairing
which has been mapped in wheat but are faced with dealing with a genome 37 times larger
than that of rice. By selecting tightly linked single copy markers in wheat, the wheat
geneticists will be able to screen the homologous region in rice for their candidate
homologue. The syntenic relations can be exploited in the other direction as well. For
example, mapping data can be taken from maize where there is extensive work in both
transmission and molecular genetics to predict the location of a homologue in rice.
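The way a syntenic relationship narrows a search can be sketched in a few lines. Everything in the example below is hypothetical: the probe names, the rice chromosome, and the map positions are invented for illustration and do not come from any published map.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical rice map positions (chromosome, position in Mb) for
# single-copy probes that have also been mapped in wheat.
RICE_POSITIONS: Dict[str, Tuple[int, float]] = {
    "probe_1": (9, 12.4),
    "probe_2": (9, 13.1),
    "probe_3": (9, 15.0),
}


def rice_candidate_interval(
    flanking_markers: Iterable[str],
    rice_positions: Dict[str, Tuple[int, float]],
) -> Tuple[int, float, float]:
    """Return (chromosome, start_mb, end_mb) of the rice region bounded by
    markers that flank the target gene in the larger genome."""
    hits = [rice_positions[name] for name in flanking_markers]
    chromosomes = {chrom for chrom, _ in hits}
    if len(chromosomes) != 1:
        # Markers landing on different rice chromosomes suggest that
        # collinearity is interrupted in this region.
        raise ValueError("flanking markers are not syntenic in rice")
    coords = [pos for _, pos in hits]
    return chromosomes.pop(), min(coords), max(coords)


interval = rice_candidate_interval(["probe_1", "probe_3"], RICE_POSITIONS)
print(interval)
```

The rice interval returned by such a lookup is the region that would then be screened for the candidate homologue.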
Commercial Value
Rice, wheat, and maize account for approximately half of the world's food production. Rice
itself is the principal food of half of the world's population. Over the last 30 years world rice
production has doubled as the result of the introduction of new varieties and improved
technology. However, the annual rate of rice production has slowed to the point that it is no
longer keeping pace with the growth in the number of consumers. Rice production in the
next fifty years faces even greater challenges. On the one hand, with a larger and more
affluent population there will be greater demands for higher production and better quality
rice. On the other hand, the same constraints mean that there will be less land, water, and
labor to produce the crop. In short, there will be great demands on biotechnology to improve
rice production.
Map-based sequence information: The objective of plant breeding is the selection of
favorable combinations of genes. In recent years, plant breeding has been enhanced by
molecular marker technology that permits one to screen larger populations with less
progeny testing. Knowledge of the location of all genes in a genome extends molecular
marker technology because it becomes possible to identify candidate genes controlling
specific traits. The genes then become the markers and the process becomes more accurate
and more efficient. For example, knowing the location and sequence of candidate genes
makes it possible to design allele specific markers which readily lend themselves to
automation.
Models
International alliances formed to sequence yeast, C. elegans, humans, and Arabidopsis
provide examples of how to manage an international collaboration to sequence rice.
Fortunately, government agencies are already familiar with the writing and
approval of memoranda of agreement in this area. Some lessons we can learn from other
genome efforts are:
1) Shared tools and information. In other projects, it has proven useful for all groups to have
access to and work from the same few libraries. All data - physical mapping information
and sequences - should be released in a timely fashion. These are principles that the
participants in the rice genome sequencing collaboration have already agreed to.
2) Scientists initiate the collaboration. Scientific rather than political decisions should
dictate the specifics of the collaboration. Individual sequencing projects will be funded
nationally, locally managed, and subject to oversight of their respective funding agencies.
Nevertheless, a system of peer oversight should guide these projects.
3) Sequencing should be done in the most efficient manner based on the science. The effort
should not be diluted by peripheral projects.
Rice Genome Workshop
On September 23, 1997, scientists interested in the genomic sequencing of rice met to
participate in a workshop held in conjunction with the International Symposium on Plant
Molecular Biology in Singapore. The meeting was chaired by Ben Burr, Brookhaven
National Laboratory, NY, USA, and Mike Gale, John Innes Centre, Norwich, UK.
The participants who spoke were:
Dr. Takuji Sasaki, Rice Genome Program, NIAR/STAFF, Tsukuba, Japan
Dr. Moo Young Eun, National Institute of Agricultural Science and Technology,
Suweon, Korea
Dr. Rod Wing, Clemson University, Clemson, SC, USA
Dr. Guo-liang Wang, Institute of Molecular Agrobiology, Singapore
Dr. Michael Roberts, John Innes Centre, Norwich, UK
Dr. John McPherson, Washington University, St. Louis, MO, USA
Dr. Jo Messing, Waksman Institute, Rutgers University, Piscataway, NJ, USA
Dr. Andy Pereira, CPRP-DLO, Wageningen, The Netherlands
Dr. John Bennett, International Rice Research Institute, Los Banos, The Philippines
Dr. Apichart Vanavichit, Kasetsart University, Nakorn Pathom, Thailand
Dr. Cliff Gabriel, Office of Science and Technology Policy, Washington DC, USA
Dr. Zhi-Hong Xu, Chinese Academy of Sciences, Beijing, China
Dr. Gary Toenniesson, The Rockefeller Foundation, NY, USA
In addition, there was discussion from the floor.
At that meeting, the participants agreed to participate in an international collaboration to
sequence the rice genome. Participants explicitly agreed to share materials, including
libraries, and to the timely release to public databases of physical mapping information and
annotated DNA sequences.
Furthermore, general agreement was reached on the initial steps in methodology:
1) The cultivar, Nipponbare, also known as GA3, will be sequenced. Seed from a single
plant will be distributed by Dr. Sasaki for the purpose of making libraries. The primary
reasons for choosing this cultivar are that more than 10,000 EST sequences from the strain
have been released to DDBJ and that a physical map based on YACs that covers over 50%
of the genome has been published. Sequencing other cultivars is strongly discouraged as
genetic polymorphisms cannot be distinguished from sequencing errors. Moreover, groups
not sequencing from one of the shared libraries would not benefit from the associated
accumulated knowledge and the other advantages of collaboration.
2) The RGP will make a PAC library. Dr. Rod Wing will make three BAC libraries using
partial digests of different enzymes to generate the inserts. 60,000 BAC clones will be
isolated to provide a 20-fold coverage of the genome.
3) The BACs and PACs will be fingerprinted for the purposes of preparing contigs and
checking the integrity (deletions or rearrangements) of the clones. The information
generated will also be invaluable where repeated sequences make BAC and PAC end
sequences ambiguous.
4) In parallel with fingerprinting, the BAC and PAC clones will be subjected to end-
sequencing. This should provide an STS every 3 to 5 kb on average, allow genome
sequencers to pick the clones with minimum overlap, and provide further information for
the physical map.
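The library numbers in the steps above can be checked against each other. The sketch below assumes a 430 Mb genome (the reports in this Newsletter cite 400 to 430 Mb) and counts only the BAC clones; the function names are illustrative.

```python
GENOME_MB = 430.0  # assumed haploid rice genome size


def implied_insert_kb(n_clones: int, fold_coverage: float,
                      genome_mb: float = GENOME_MB) -> float:
    """Average insert size (kb) needed for n_clones to give fold_coverage."""
    return fold_coverage * genome_mb * 1000.0 / n_clones


def end_sequence_spacing_kb(n_clones: int,
                            genome_mb: float = GENOME_MB) -> float:
    """Average distance (kb) between end sequences, two ends per clone."""
    return genome_mb * 1000.0 / (2 * n_clones)


insert_kb = implied_insert_kb(60_000, 20)      # 60,000 BACs, 20-fold coverage
spacing_kb = end_sequence_spacing_kb(60_000)   # one STS per clone end
print(f"insert ~{insert_kb:.0f} kb, one end sequence every ~{spacing_kb:.1f} kb")
```

With these inputs the implied average insert is a little over 140 kb, and the two ends of each BAC alone would give a sequence tag roughly every 3.6 kb, consistent with the 3 to 5 kb spacing expected in step 4. Adding PAC ends would reduce the spacing further.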
It is important that none of these early steps delay large scale sequencing. Preparation of the
PAC library is currently underway and preparation of the BAC libraries will begin shortly.
Both types of libraries will be available before the end sequencing can begin. It is estimated
that with the participation of several laboratories, end sequencing could be completed within
six months. The analyses of fingerprinted Arabidopsis libraries are expected to be
completed by the end of 1997. These results will indicate what we might expect for the rice
project in terms of speed, cost, and the degree of closure.
The Workshop concluded with the nomination of a provisional Working Group chosen to
direct the collaboration and to decide future directions. This document will appear on Web
sites viewed by rice researchers and comments are solicited. The next meeting of the
Working Group will be held in conjunction with the Rice Genome Forum, February 5, 1998,
in Tsukuba, Japan. Members of the Working Group are:
Dr. Takuji Sasaki, Japan
Dr. Zhi-Hong Xu, China
Dr. Moo Young Eun, S. Korea
Dr. Jo Messing, USA
Dr. Mike Bevan, Europe
Dr. Ben Burr, representing the Rockefeller Foundation
The Rockefeller Foundation has offered to facilitate administration of the collaboration.
Future Decisions
Membership in the International Rice Genome Sequencing Initiative:
Any group willing to sequence large stretches of contiguous genomic DNA is welcome to
join the collaborative effort as long as they are willing to follow the agreed upon guidelines.
In Singapore there was some discussion about the minimum amount of sequence a group
would have to contribute annually to maintain membership.
The Rice Genome Working Group:
The Working Group is the body that will make decisions that pertain to the goals, strategies,
and coordination of the collaborative effort. The Working Group will be responsible for
planning the most efficient means of completing the project. Among its responsibilities will
be assigning regions to be sequenced that will avoid duplication and maximize overall
progress.
The Working Group is envisioned as being composed of representatives of the major
groups participating in rice genome sequencing. The current group is provisional and it is
recognized that some of the major contributors to the effort might change. Rules for
deciding membership in the Working Group need to be established.
Sequencing strategy:
It has been implicit in the discussions, but never stated, that once the BAC- and PAC-end
sequencing is completed and the relevant fingerprinting data is available, the most efficient
sequencing strategy of complete BACs or PACs will be from random subclone libraries. It
will be useful to standardize this technology to ensure high quality libraries that are
completely randomized with non-chimeric inserts of a uniform size. Sequencing in a
specific region of the genome should not start until a sufficient number of tiled BACs or
PACs are available to ensure an unfragmented sequence.
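Whether a region has enough tiled clones to start sequencing can be tested mechanically once clones are placed on the physical map. The following is a minimal sketch of one way to do this, a greedy minimum tiling path. It is not a method adopted by the Working Group, and the clone names and coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Clone = Tuple[str, int, int]  # (name, start_kb, end_kb) on the physical map


def minimal_tiling_path(clones: List[Clone], region_start: int,
                        region_end: int) -> Optional[List[Clone]]:
    """Greedy interval cover: return the fewest clones that span the region
    without a gap, or None if the region cannot yet be covered."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best: Optional[Clone] = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            return None  # a gap remains; sequencing here should wait
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical clones on a 400 kb region.
clones = [("A", 0, 120), ("B", 40, 150), ("C", 100, 260),
          ("D", 140, 230), ("E", 250, 400)]
path = minimal_tiling_path(clones, 0, 400)
print([name for name, _, _ in path] if path else "region not yet covered")
```

In practice a minimum overlap between neighbouring clones would also be required so that the finished sequences can be joined; that refinement is left out here to keep the sketch short.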
In the Human Genome Project it has been found that assembly of shotgun sequences leads
to contigs of about 30 kb. Sequence closure is the most difficult step in the sequencing
process because it cannot be automated. Closure will be aided by restriction site information
available from fingerprinting and possibly sequence information from overlapping BACs or
PACs. Should ambiguities remain, they should be marked on the final sequence. The final
product of this phase will be a single contiguous sequence representing the entire PAC or
BAC.
Accuracy:
The Rice Genome sequencing project, which will serve as a model for all other grasses, will
cost an estimated $200M. Given the significant costs in material and manpower, it is
imperative that the results be of the highest quality.
In part, this problem has been addressed by agreeing to sequence DNA from the same
cultivar, if not the same plant, to minimize variation due to genetic polymorphism. The
Human Genome Project has agreed to accept a standard of less than one error in 10,000 bp.
While the level of accuracy is difficult to verify, this standard is achievable by a combination
of high quality shotgun sequence reads, a seven-fold redundancy, and the requirement that
every base be sequenced on both strands. Rice is expected to have 50% repetitive DNA.
Because of this, the accuracy of final assembly of shotgun sequences will be dependent on
the length and quality of individual sequence reads. The Working Group might wish to
establish some guidelines here.
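To give a feel for what these standards mean at genome scale, the sketch below works through two numbers. It assumes a 430 Mb genome and, for the coverage estimate, a simple Poisson model of randomly placed reads; neither the model nor the function names come from this report.

```python
import math

GENOME_BP = 430_000_000  # assumed rice genome size in base pairs


def expected_errors(genome_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases at a given per-base error rate."""
    return genome_bp * error_rate


def uncovered_fraction(redundancy: float) -> float:
    """Fraction of bases hit by no read at all, assuming reads fall at
    random (Poisson model, an assumption of this sketch)."""
    return math.exp(-redundancy)


errors = expected_errors(GENOME_BP, 1 / 10_000)  # one error in 10,000 bp
gap_fraction = uncovered_fraction(7.0)           # seven-fold redundancy
print(f"about {errors:.0f} errors genome-wide; "
      f"{gap_fraction:.4%} of bases unsampled per clone")
```

Even at the one-in-10,000 standard, tens of thousands of errors would remain across the genome, and under the random model a small fraction of each clone is missed by seven-fold shotgun coverage. Repetitive DNA makes both figures worse in practice, which is why read length and quality matter for final assembly.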
Annotation and Sequence Release:
In other genomic sequencing efforts, it has been recognized that the most useful releases are
large contiguous stretches of annotated sequence. A uniform standard of annotation must
be agreed upon that checks the integrity of the sequence, assigns and identifies regions of
homologies, and delineates potential open reading frames. This should not preclude
individual groups from publishing unannotated sequences on their local web sites.
In Singapore the participants agreed to timely release of the sequence information. It might
be useful for each participating group to agree to release of the complete annotated sequence
of a BAC or PAC within three to six months after beginning to sequence the clone.
Rice Genome Database:
An integrated database will facilitate collaboration and data sharing. Sequences will be
released to one of the public databases, DDBJ, EMBL, or GenBank, but a central database
for the project will be required to store and manage the annotation information. With ever
expanding databases, annotation is never complete. It may be advisable to assign the task of
periodic update of the annotation of rice genomic sequence to the centralized rice genome
database.
The database should also be linked with other rice and cereal databases, serve as a means of
coordinating sequencing work, and provide methods for submitting and using information.
Functional Genomics:
To date at least 50% of newly discovered open reading frames do not have homologues with
identifiable function. The use of populations with transposable element-induced knockout
mutations has been a powerful tool for identifying the function of some of these unknown
genes. While it is beyond the scope of this project, it should be recognized that a consortium
of international laboratories has formed to develop knockout populations of rice for the
purpose of discovering gene function. This consortium will provide useful tools for the
downstream analysis of genomic sequence information.
Intellectual Property Rights:
Intellectual property rights issues will be raised because of the obvious commercial interest
in the sequence for rice and other cereals. In the Human Genome Project, as well as other
international sequencing efforts, withholding data for patent application is recognized as
being incompatible with the policy of immediate release. Patent issues are regarded as being
downstream of data generation and release. These are issues that must be confronted but are
probably beyond the scope of the Working Group and should be discussed at a meeting
called for that specific purpose.
Outreach:
To be successful, this large sequencing effort needs the broad support of scientists working
on rice and other cereals who will be the potential end-users of the sequence information.
They must believe that the project is worthwhile, well-organized and credible. There are a
number of ways that this support might be engendered. Roles for the general community to
influence general strategies and policies should be considered. Outside scientists can serve
as peer reviewers of individual projects. Timely release of finished, annotated sequence
blocks, as well as the availability of mapped BACs and YACs, increases end-user support.
Periodic progress reports, similar to the RGP's RICE GENOME newsletter, and internet
access to a useful database, will engender awareness and utility of the project. Interested
members of the community can begin to influence the project by commenting and making
suggestions on this document.
Benjamin Burr
Biology Department
Brookhaven National Laboratory,
Upton, NY 11973, USA
e-mail: burr@sun2.bnl.gov
A Report from Tsukuba, February 1998:
International Rice Genome Sequencing Project
Vision and Goals:
Fundamental plant biological information from a model plant: Because rice is a member of the
Gramineae and a crop plant, a wealth of fundamental information about important aspects of
plant biology can be learned from its genomic sequence. Rice is a model for learning
about yield, hybrid vigor, and single and multigenic disease resistance. Different races of rice
are adapted to a wide variety of environmental situations, from tropical flooding to temperate
dry land, so it is a model for real life adaptive responses. Because its genome is collinear with
theirs, rice is a key to knowledge of the genomic organization of the other grasses.
Comparison of the sequence of the dicot, Arabidopsis thaliana, with that of rice, a model
monocot, will tell us what genomic structures these two different groups of angiosperms
have in common and how they differ.
While the goals of the International Rice Genome Project must be focused, the information
provided by the International Project can be exploited by the entire research
community to learn:
The function and map location of cereal and ultimately all plant genes.
Use of map-based sequence information to identify and provide markers for agronomically
significant genes.
The molecular basis of plant growth and development so that fundamental questions in
plant physiology, biochemistry, cell biology, and pathology can be addressed.
The relationship, if any, of genome structure to gene expression.
The primary goal is the complete genome sequence of rice.
The primary activity in the first year will be to prepare and distribute clones for sequencing.
During this period, it is anticipated that the libraries will be quality controlled and that the
clones will be end sequenced and fingerprinted. Subsequent years will be devoted to large-
scale genomic sequencing. The objective is to complete the task in ten years.
The time line below for the first five years indicates that more than 170 Mb of the 430 Mb
genome will be sequenced by 2003, that chromosomes 6 and 10 will have been
completed, and the sequencing of chromosomes 1 and 2 will be well underway.
Figure: Five Year Plan: 1998-2003.
The purpose of the international collaboration is to accelerate the completion
of this goal.
The International Collaboration is best achieved by sharing materials and technologies and
by the timely release of sequence and related information. To this end, scientists interested
in the genomic sequencing of rice participated in a workshop held in conjunction with the
International Symposium on Plant Molecular Biology in Singapore on September 23, 1997
(ftp://genome1.bio.bnl.gov/pub/maize/rice.html). A Working Group, nominated in
Singapore, met on February 5, 1998 to develop this document.
Membership in the Rice Genome Project:
Any group willing to sequence large stretches of contiguous genomic DNA is welcome to
join the collaborative effort as long as they are willing to follow the agreed-upon
guidelines. Participants agree to share materials, including libraries, and to the timely release
to public databases of physical mapping information and annotated DNA
sequences. A group must agree to sequence one megabase of DNA per year to maintain
membership. Members agree to declare their sequencing plans and to provide detailed
plans and progress on their respective web pages.
Individual sequencing groups are encouraged to claim large chromosomal regions or entire
chromosomes, if they have the sequencing capacity, to increase the likelihood that entire
chromosomes are completed. Groups may claim chromosomal regions which they agree to
sequence within one to three years.
Post-sequencing activities, such as functional genomics, are beyond the scope of the
International Rice Sequencing Project. Further, the Project does not encompass the cloning
and sequencing of specific rice genes for research purposes or industrial sequencing efforts.
While the International Project will be happy to share information with these individual
efforts, their conduct is beyond the scope of these agreements.
The Rice Genome Working Group:
The Working Group is the body that will make decisions that pertain to the goals, strategies,
and coordination of the collaborative effort. The Working Group will be responsible for
planning the most efficient means of completing the project. Among its responsibilities will
be assigning regions to be sequenced that will avoid duplication and maximize overall
progress.
The Working Group is comprised of representatives of each research group participating in
the International Rice Genome Sequencing Project. As Japan is recognized as having a
leadership role in the Project, the head of the RGP will be the permanent chairman of the
Working Group.
Major policy decisions, including sequencing assignments, will be taken by representatives
from each of the major national groups participating in the Project. Currently, these regional
representatives are Japan, China, Korea, Europe and the U.S.
The Working Group will meet annually in Japan. Interim meetings, as needed, may be held
elsewhere. The meetings will be open to the public. Results of Working Group
meetings will be posted on web sites and published in the RICE GENOME.
Methodology:
The Oryza sativa ssp. japonica cultivar, Nipponbare, also known as GA3, will be sequenced.
Seed from a single plant will be distributed by Dr. Sasaki for the purpose of making
libraries. The primary reasons for choosing this cultivar are that more than 20,000 EST
sequences from the strain have been released to DDBJ and that a physical map based on
YACs that covers over 50% of the genome has been published. Sequencing other cultivars
is strongly discouraged as genetic polymorphisms cannot be distinguished from
sequencing errors. Moreover, groups not sequencing from one of the shared libraries would
not benefit from the associated accumulated knowledge and the other advantages of
collaboration. It is recognized that comparative mapping and sequencing of other rice
subspecies is valuable information that the International Rice Genome Sequencing Project
would like to share. Nevertheless, the primary goal of the Project is the complete sequence
of the genome of a single cultivar.
The RGP will make a PAC library with a 20-fold genome coverage. Dr. Rod Wing will
make three BAC libraries using partial digests of different enzymes to generate the inserts.
60,000 BAC clones will be isolated to provide a 20-fold coverage of the genome. The
quality of these libraries and their coverage will be verified by hybridizing each with 100
single copy EST probes and the number of clones and their insert size will be measured. It
is expected that inserts will be greater than 120 kb. The number of clones with organellar
DNA and rRNA repeats will also be determined.
The BACs will be fingerprinted for the purposes of preparing contigs and checking the
integrity (deletions or rearrangements) of the clones. The information generated will also be
invaluable where repeated sequences make BAC end sequences ambiguous. In addition,
where there is multi-fold coverage, the assembly program can pick out inserts that have
deletions or rearrangements. Fingerprinting information will be publicly available so that
individual laboratories can verify the quality of the contigs they plan to sequence.
The RGP plans to increase the number of currently mapped ESTs to 8,000 in order to
physically map their PAC clones. These mapped ESTs are an unmatched resource in
preparing a physical map as they provide sequence, map location, and direction.
In parallel with fingerprinting, the BAC and PAC clones will be subjected to end
sequencing. This should provide an STS every 3 to 4 kb on average and will allow genome
sequencers to pick the clones with minimum overlap.
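As a rough check on the spacing quoted above, the average distance between end-sequence tags is the genome size divided by the number of sequenced ends. The sketch below is illustrative only: it assumes the 430 Mb genome size and the 60,000 BAC clones mentioned in this report, counts two ends per clone, and ignores failed reads and the PAC clones. The function name is hypothetical.

```python
def mean_sts_spacing_kb(genome_mb: float, n_clones: int, ends_per_clone: int = 2) -> float:
    """Average spacing (kb) between end-sequence tags.

    Assumes every clone end yields one usable read and that ends are
    spread evenly over the genome (a simplification).
    """
    n_ends = n_clones * ends_per_clone
    return genome_mb * 1000.0 / n_ends


# Assumed inputs: 430 Mb genome, 60,000 BAC clones (figures quoted in this report)
spacing = mean_sts_spacing_kb(genome_mb=430, n_clones=60_000)
print(f"about {spacing:.1f} kb between end sequences")
```

Under these assumptions the result falls within the 3 to 4 kb range; adding the PAC ends would shorten the average spacing further.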
Accuracy:
The Rice Genome Sequencing Project will serve as a model for all other grasses and cost
about $200M. The sequence will be used by other researchers and will thus be scrutinized.
It is imperative that these resources not be squandered on inaccurate results. In part, this
problem has been addressed by insisting on sequencing DNA from the same cultivar, if not
the same plant, to minimize variation due to genetic polymorphism.
Fingerprinting of multiply overlapping inserts is a means of verifying that the BACs chosen
for sequencing have not been rearranged. Collinearity with the genome should also be
verified by probing restriction enzyme digests of genomic or the appropriate YAC DNA
with the BAC and comparing this with digests of the BAC itself.
The Rice Genome Sequencing Project will adopt the standards of The Human Genome
Project, established at its Bermuda meetings in 1996 and 1997, which has agreed to accept a
standard of less than one error in 10,000 bp. While the level of accuracy is difficult to verify,
this standard is achievable by a combination of high quality shotgun sequence reads, a
seven-fold redundancy, and the insistence that 97% of all bases be sequenced on both strands or
with two chemistries. In addition, minimum error estimation values provided by PHRED
of 75 over protein coding regions and 40 over the remainder of the genome must be
obtained. Further, restriction sites predicted from the sequence must conform to observed
digest patterns.
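The relation between an error rate and a quality value can be sketched as follows. The conversion Q = -10 log10(P) is the standard Phred definition; treating the values of 75 and 40 quoted above as Phred-scaled quality values is an assumption made for this illustration, and the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality value Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred-scaled quality value Q = -10*log10(P)."""
    return -10 * math.log10(p)


# Quality value corresponding to one error in 10,000 bp
q_standard = error_prob_to_phred(1 / 10_000)
print(f"1 error in 10,000 bp corresponds to Q of about {q_standard:.0f}")

# Assumed interpretation of the thresholds quoted in the text
for label, q in [("protein coding regions", 75), ("remainder of the genome", 40)]:
    p = phred_to_error_prob(q)
    print(f"{label}: Q{q} -> about 1 error in {1 / p:,.0f} bases")
```

On this reading, the value of 40 for the bulk of the genome matches the standard of one error in 10,000 bp, and the value of 75 for coding regions is considerably stricter.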
Sequence Release:
The Rice Genome Sequencing Project agrees to the immediate release of finished, but not
necessarily annotated, sequence in units of intact BAC or PAC inserts. These finished
sequences will conform to the accuracy standards described above. Release means submission to a
public database such as DDBJ, EMBL, or GenBank. In keeping with the NHGRI
recommendations, automated release of assemblies greater than 2 kb to local Web sites is
encouraged.
Annotation:
Members of the Working Group, while recognizing the importance of annotation to the
value of sequence information, view annotation as separate from release of finished
sequence. Each sequencing group is responsible for annotating the sequence they
contribute. A uniform standard of annotation has been agreed upon that checks the integrity
of the sequence, assigns and identifies regions of homologies, delineates potential open
reading frames, and names and indicates the beginnings and ends of genes. Common
annotation software will be adopted. The annotator must state:
Whether coding sequences and splice sites were determined experimentally or by using
software. It is recognized that the use of published cDNA sequences greatly facilitates this
task.
If gaps cannot be closed, the method of sizing and the reasons for not closing must be
stated.
Exact details on how adjacent BACs or PACs were assembled with a minimum overlap of
100 bp should also be stated.
It is hoped that annotation will be expanded to include recognition of genetic markers, ESTs,
known genes, and syntenic regions. An annotation workshop is projected for the Working
Group meetings.
Rice Genome Database:
An integrated database established in Japan will facilitate collaboration, coordinate
sequencing work, and provide methods for submitting, using, and sharing information.
Sequences will be released to one of the public databases, DDBJ, EMBL, or GenBank. The
Rice Genome Database will pick up new submissions from the public databases. The
Database will store and manage the annotation information. Each participant will maintain a
Web site with a standardized format that describes work in progress and sequences
completed. The Database will be linked with the Web sites of each of the Project's
participating laboratories and thus be able to maintain a registry of clones being sequenced,
monitor progress, and coordinate activities. The database will also be linked with sites that
are providing fingerprinting information and end sequences. With ever expanding
databases, annotation is never complete. It may be advisable to assign the task of periodic
update of the annotation of rice genomic sequence to the Rice Genome Database.
The larger goals for the Project envision the use of sequence information to provide
biological lessons for rice and other cereals. The Rice Genome Database is a means for
linking all genomic information related to rice DNA sequence. This information comes
from existing genomic databases and from work that derives from DNA sequencing, such
as determination of gene function. The Rice Genome Database will thus be linked with
other rice and cereal databases and to international groups that will be learning about the
function of rice and other cereal genes.
Outreach:
To be successful, this large sequencing effort needs the broad support of scientists working
on rice and other cereals who will be the potential end users of the sequence information.
Ultimately, it is the public at large that supports the project, and steps toward public education
should be undertaken. They must believe that the project is worthwhile, well-organized,
and credible. There are a number of ways that The Rice Genome Sequencing
Project will attempt to engender this support:
Timely release of finished, annotated sequence blocks as well as the availability of mapped
BACs and YACs.
RICE GENOME will report the results from the Working Group meetings as well as
news of the Project.
Internet access to The Rice Genome Database will engender awareness and utility of the
Project.
Publications from participating sequencing laboratories should acknowledge that they are
part of the Project.
Benjamin Burr
Biology Department
Brookhaven National Laboratory
Upton, NY 11973, USA
e-mail: burr@sun2.bnl.gov
A Report from Miami Beach, September 1998:
Interim Working Group Meeting
Funding: Japan: Takuji Sasaki reports that his group has received funding of US$10M
for 10 years subject to review.
France: Michel Delseny reports that initial work to prepare sequence ready
BACs will begin in his lab in Perpignan and in Alain Ghesquiere's and Jean Christophe
Glaszmann's labs at ORSTOM/CIRAD in Montpellier, and that the work will move to
Genoscope in Evry for sequencing.
Taiwan: Yue-ie Hsing reports that the National Science Council, Academia
Sinica and the Education Ministry will support rice genome sequencing at 3 to 5 sites with 2
to 4 PIs in each site.
Korea: Moo Young Eun reports that they have applied to their government for the first
5-year (of a total 10-year) special budget for rice sequencing. For FY99 this would
support 1.2 Mb of sequencing and new automated sequencers. The proposal passed several
reviewing steps and is in the final evaluation period. He is optimistic. In the meantime, they have
been conducting pilot sequencing with their BAC clones, and will begin sequencing
Nipponbare as soon as the requested libraries arrive.
Canada: Tom Bureau at McGill University has received a Provincial equipment
grant to establish a sequencing facility from which he hopes to launch a Canadian rice
genome sequencing program.
China: Guofan Hong reports that his group receives funds from the State
Commission of Science and Technology, the Chinese Academy of Sciences and the
Shanghai Municipal Government to support the current project of sequencing chromosome
4 of the rice genome at the rate set in the International Rice Genome Sequencing Agreement.
The funding period is 5 years and may be renewed based on review.
US: Rod Wing (Clemson University) received funding from the Rockefeller
Foundation to prepare BAC libraries. He has also received funding from Novartis to do
BAC end sequencing and fingerprinting of the BACs. All of these results are being made
publicly available. Machi Dilworth (NSF) reported that an interagency program
announcement for sequencing rice, with USDA as the lead agency, would be published
early in the 1999 fiscal year beginning October 1.
IRRI: While strictly not part of the Rice Genome Sequencing Program, it is
important to note that Hei Leung has begun a functional genomics program based on
deletion mutagenesis. He has used primarily fast neutrons and diepoxybutane to generate
M3 populations with mutation rates of 1/1000 per locus and has obtained preliminary
evidence supporting this work at the Xa21 locus conditioning bacterial blight resistance.
Progress: Japanese Rice Genome Research Program:
Tomoya Baba reported that the RGP mapped 2000 ESTs on a YAC map of the rice genome
previously constructed with 2275 DNA rice genome markers. They achieved 306
Mb of total coverage with 82% of chromosome 6 and 74% of chromosome 1 covered. Baba
pointed out that particularly on chromosome 6, the gaps corresponded to regions of the map
that lacked DNA markers.
The RGP constructed PAC libraries with the pCYPAC2 vector from Pieter de Jong. The
vector employs the SacB gene for positive selection of inserts. They used partial Sau3AI
digests to prepare their library which comprises 71,000 clones. The mean insert size is
112 kb with 18% under 80 kb and 11% chloroplast contamination. The RGP is using a
combination of restriction digests and hybridization or PCR of ESTs to establish contigs.
They plan to obtain PAC end sequences using tail PCR. A second PAC library made with
partial MboI digests of genomic DNA is now under construction.
Jo Messing:
Jo presented an overview of the International Agreement for rice genomic sequencing with
specific emphasis on methodology, in particular BAC end sequencing and shotgun
sequencing. He illustrated his talk with assembled sequences from syntenic regions of
sorghum and maize. He pointed out that it was important to have long, single pass reads from
both ends of a random clone as distance is very important in assembling repeated sequences.
The LTR regions of some retrotransposons that his group discovered exceeded 4 kb, and
these repeated sequences could not have been assembled without this protocol. In
summarizing his talk, Jo showed how sequencing from individual BAC clones was both
important and necessary.
Jo is collaborating with the Clemson group. He confirmed quality checks of the
Nipponbare BAC library with BAC end-sequencing (see below). He also selected BAC
clones generated by Rod Wing's group from japonica and indica with an orthologous probe
from maize for a joint DNA sequencing project to compare sequence differences between
the two subspecies.
Rod Wing:
Rod described the work of the Clemson Genomics Institute. A partial HindIII digest of
Nipponbare DNA has produced a BAC library of 36,864 clones with an
average insert size of 128.5 kb, of which 8.2% have inserts smaller than 80 kb. The library
represents a greater than 10-fold coverage of the genome. 2.25% of the inserts appear to be
chloroplast DNA and have been removed from the library. The library was probed with 12
RFLP single-copy markers and each probe produced from 2 to 21 hits. An EcoRI partial
library is also being constructed.
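The coverage figure can be checked with simple arithmetic: fold coverage is the number of clones times the mean insert size, divided by the genome size. The sketch below is a back-of-the-envelope illustration; it assumes the 430 Mb genome size used in the Tsukuba report, does not correct for chloroplast or small-insert clones, and uses a hypothetical function name.

```python
def fold_coverage(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Nominal genome equivalents represented by a library (no correction for
    organellar clones, empty vectors, or uneven representation)."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


GENOME_MB = 430  # assumption: genome size quoted in the Tsukuba report

# Clemson HindIII BAC library: 36,864 clones, 128.5 kb mean insert
bac = fold_coverage(36_864, 128.5, GENOME_MB)

# RGP Sau3AI PAC library as reported by Baba: 71,000 clones, 112 kb mean insert
pac = fold_coverage(71_000, 112, GENOME_MB)

print(f"HindIII BAC library: about {bac:.1f}-fold")
print(f"Sau3AI PAC library:  about {pac:.1f}-fold")
```

With these inputs the BAC library comes out at roughly 11 genome equivalents, consistent with the statement that coverage exceeds 10-fold.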
The BAC libraries will be end sequenced with a goal of obtaining 148,000 ends. Currently
13,932 reads have been obtained with an average read length of 380 bp. These sequences
are released monthly. 8,432 sequences and their BLAST hits are currently posted on the
Genomics Institute's web pages, which also provide for searching by sequence.
Rod's colleague, Ralph Dean, spoke about fingerprinting the BACs. Their goal is to
fingerprint and assemble 74,000 BACs. They are currently proving the technology on the
rice blast fungal pathogen.
Dick McCombie:
Dick spoke about lessons learned from the Arabidopsis sequencing project with specific
emphasis on the usefulness of a physical map based on BACs. He pointed out that the
current cost of obtaining finished sequence is US$ 0.4 to 0.5 per base with the possibility
that this might drop to US$ 0.3. Thus the cost of the rice sequencing project will be at least
US$150M. The rice sequence will provide value in the form of rice improvement, the
identity of almost all cereal genes, and a reference for larger cereal genomes based on
comparative genomics. In order for this value to be realized, the sequence needs to be
accurate, complete, and correlated with genetic markers and maps.
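The scale of the cost estimate follows from multiplying the per-base cost by the genome size. The sketch below is a rough illustration only; it assumes the 430 Mb genome size from the Tsukuba report and covers finished sequence alone, leaving out mapping, library construction, and annotation. The function name is hypothetical.

```python
def project_cost_musd(genome_mb: float, usd_per_base: float) -> float:
    """Total sequencing cost in millions of US$.

    (genome_mb * 1e6 bases) * usd_per_base / 1e6 simplifies to
    genome_mb * usd_per_base.
    """
    return genome_mb * usd_per_base


GENOME_MB = 430  # assumption: genome size quoted in the Tsukuba report

# Per-base costs mentioned in the talk: possible future rate and current range
for rate in (0.3, 0.4, 0.5):
    cost = project_cost_musd(GENOME_MB, rate)
    print(f"US${rate:.1f} per base -> about US${cost:.0f}M")
```

The range obtained this way brackets the figure of US$150M quoted above; the exact total depends on the genome size and per-base cost assumed.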
The group at Washington University headed by John McPherson and Marco Marra has
undertaken a whole genome approach to fingerprinting Arabidopsis. Mapped markers or
ESTs are used to anchor the physical map to the genetic and cytogenetic maps. The group
has fingerprinted BACs from two libraries representing an 18-fold coverage of the genome.
17,000 clones have been assembled into 394 contigs. Dick pointed out that the automated
assembly gives you bins (although they need to be checked) but does not give you precise
order information without manual editing at the rate of 15 Mb per month for an experienced
person. Currently, 75 contigs (comprising 13,000 clones) have been edited and cover 81
Mb.
Dick estimated that it would cost about US$ 2M to map the rice genome and that it would
take an experienced team two years to complete the task. Dick pointed out that recent
advances in mapping technology and strategies allow the task to be done quickly for a
fraction of the former cost. In addition to greatly aiding sequencing, the map is likely to pay
for itself as a tool for positional cloning prior to the completion of the sequence.
Proposal for a whole genome approach to rice sequencing:
Ian Bancroft and Mike Bevan submitted "A whole-genome approach to sequencing the rice
genome" for discussion at this meeting. In their absence, the proposal was outlined by
George Murphy.
While a number of people found the proposal attractive for larger genomes, there was little
support for this proposal for rice. Basically, the question was: do you want to discover the
most genes in the shortest possible time, or do you want to proceed most efficiently toward a
complete reference sequence? The group assembled in Miami was decidedly of the second
opinion. In this context, there were three areas that presented problems:
1) A number of people said that you needed sequence plus position for rice as a model
organism and they didn't believe that the proposed outline would lead to map-based
sequence.
2) If, as outlined in the proposal, most of the genes can be found in 20% of the DNA, it was
not obvious how you would use this method to obtain the complete sequence.
3) The principal assumptions, that most of the genes can be discovered from a narrow range
of buoyant density and that whole-genome shotgun reads can be assembled, are untested.
Attendees suggested two preliminary experiments. The first was to prepare clones from the
"gene-space DNA" as well as from the whole genome, and examine the frequency of
homologous hits in sample single pass reads from both sets of clones. The second was to
assemble the random single pass reads, but not the finishing reads, from the extensive
contiguous sequence already obtained. For example, the 2 Mb that the European
consortium has obtained from Arabidopsis would represent a best case scenario.
The meeting closed with the announcement that the next working group meeting would take
place in Tsukuba, Japan during the week of February 7-10, 1999.
Benjamin Burr
Biology Department
Brookhaven National Laboratory
Upton, NY 11973, USA
e-mail: burr@sun2.bnl.gov
Rice Genome Sequencing in RGP
Rice genome sequencing in RGP depends primarily on the construction of
sequence-ready P1-derived artificial chromosome (PAC) contigs. To construct contigs, first
we screen our PAC library with Sequence-Tagged Site (STS) markers generated from
restriction fragment length polymorphism (RFLP) or expressed sequence tag (EST) markers.
Among 2275 RFLP markers on the linkage map, about half might be converted to
unambiguous STS markers. EST markers are generated by mapping cDNAs on a yeast
artificial chromosome (YAC) physical map by using 3'-UTR as a source of specific PCR
primers and 7000 YACs as templates for the polymerase chain reaction (PCR). So far,
about 3000 ESTs have been located on YACs that have been assigned positions along the linkage
map. RGP will sequence rice chromosomes 1 and 6 as the first targets, and EST mapping
will be done mainly for these chromosomes.
Our PAC library is composed of 72,000 clones created by using pCYPAC2 as a
vector for Sau3AI-digested rice DNA. The average insert size is 120 kb. Contamination
by chloroplast DNA is about 11%. A preliminary screening of 20,000 clones by several
random EST markers revealed no failure to identify positive clones by any markers.
To complete a sequence-ready PAC physical map, additional tools other than EST
or STS will be required. End sequences of PACs are used to assign overlapping sequences
within adjacent PACs to get the most efficient sequencing and, more importantly, to provide
PCR primers for searching for new PACs carrying the same end sequence. Even by using
more than 10000 ESTs, we might not complete a sequence-ready PAC physical map using
only the ESTs because of the expected existence of an uneven distribution of genes within
the rice genome. So, primer walking will have to be used to complement the lack of EST
markers.
Subcloning by the shotgun method is used to prepare templates for sequencing.
Two types of shotgun library are constructed, with average insert sizes of 2 kb and 5 kb. The
former library is used usually for sequencing after PCR amplification; the latter is used
mainly to fill gaps between the sequence contigs obtained with the former. At present, only
ABI377 sequencers are used, but in the near future a newly developed sequencer with
capillaries will be introduced. For sequence assembly, PHRED-PHRAP software, developed by
Phil Green, is used, and attaining a quality score of more than 70 is the challenge. At the
moment, RGP sequences 1 PAC (100-150 kb) every two weeks with about 5000 sequence reads.
The edited sequence is swiftly released to a public database (DDBJ), and an annotation is
made on our web page, <http://www.staff.or.jp>.
Takuji Sasaki, Yoshiaki Nagamura, Tomoya Baba, Kimiko Yamamoto, Takashi Matsumoto,
Katsumi Sakata
RGP, NIAR/STAFF
1-2, Kannondai 2-chome
Tsukuba, Ibaraki 305-8602, Japan
e-mail: tsasaki@nias.affrc.go.jp
Status of the Sequencing Project in China
A BAC library of O. sativa ssp. indica Guang Lu Ai 4 was constructed early
in 1994 at the National Center for Gene Research, Chinese Academy of Sciences, Shanghai
(Cell Research, 1994, 4, 127-133). The library represents approx. 8 equivalents of
the rice genome and has proved to be genetically stable. Based on this library, BAC
contigs of various lengths were constructed in 1997 by a combination of fingerprinting and
DNA marker anchoring; the contigs cover approx. 92% of the rice
genome (Journal of Sequencing & Mapping, 1997, 319-335). The contigs with mapped
markers were assigned to and ordered along the chromosomes, based on the order of the
markers used. The gaps between contigs are thought to be caused either by the lack of
gap-bridging clones in the BAC library used in the project or by failure to identify the
gap-bridging clones because of the high threshold values of tolerance and match probability
cutoff set in the mapping program. The gaps are to be closed by, for instance, contig
end clone walking. Some contigs are of as yet unknown assignment, which will be
determined by FISH. Genomic sequencing of chromosome 4 was initiated in 1998. The
sequencing strategy we have adopted consists of the following: (1) BAC tiling to identify
clones with minimum overlaps; (2) subcloning of the BACs identified; (3) building up
scaffolding; (4) filling gaps in the scaffolding; and (5) primer walking to close sequence
gaps. Tiles of BACs derived from the contigs to be sequenced are first established; these are
sequenced by a shotgun strategy followed by a "semi-random" approach
and primer walking. To test the reliability of this approach, the sequence
obtained was analyzed, which showed that the scaffolding formed by random sequencing
had an average depth of 22 clones, with an average eight-fold sequence
redundancy. The current sequencing rate is approx. 5 Mb a year. This is a renewable
multi-year project funded jointly by the Ministry of Science & Technology, the Chinese Academy
of Sciences and the Shanghai Commission of Science and Technology.
Guofan Hong
National Center for Gene Research
Chinese Academy of Sciences
500 CaoBao Road, Shanghai 200233, China
e-mail: gfhong@newnetra.ncgr.ac.cn
Korea Rice Genome Research Program
The Korea Rice Genome Research Program (KRGRP) was established at the National
Institute of Agricultural Science and Technology (NIAST) in September 1994.
KRGRP has focused on the construction of a molecular map and map-based cloning,
molecular breeding in practice and transformation, and large-scale cDNA sequencing and
database construction.
The 'Milyang 23/Gihobyeo RI population', which consists of 164 F14 lines, has been
developed by single seed descent, and the 'NIAST map', an international reference map of rice
in RiceGenes, has been constructed using this population. Using this molecular map, yearly
and regional variations of 30 quantitative trait loci (QTLs) for agronomic traits, such as yield
and yield components, have been analyzed with three years of data ('95-'97), and the analysis
will continue to be replicated over years for tagging QTLs.
Partial nucleotide sequences of about 8,000 individual cDNAs derived from
Milyang 23 immature seeds have been characterized and analyzed for sequence similarity
with the databases. So far, over 6,000 partially sequenced cDNA clones and 175 full-length
cDNA clones have been registered in the EMBL and GenBank databases. The Korea Rice
Genome Network Server (http://krgrp.niast.go.kr and http://bioserver.myongji.ac.kr) is
maintained to provide information on the sequences, their homologies, and alignments of the
identified genes of the ESTs.
The complete sequencing of the rice genome by the IRGSP will lead to even more efficient
identification and manipulation of traits. KRGRP is actively participating in the
international collaborative effort and will sequence DNA clones on chromosome 1, as we
proposed at the 1st Working Group Meeting in Tsukuba on February 5, 1998.
Moo Young Eun
National Institute of Agricultural Science and Technology
Suwon, 441-707, Korea
e-mail: myeun@niast.go.kr
France Targets Chromosome 12
A few French groups, namely at CIRAD and ORSTOM (recently renamed
IRD) in Montpellier, have long been working on rice breeding and genetics, essentially for
their overseas territories, their own production in the Camargue, and as part of their contribution
to international cooperation with developing countries. These groups have numerous
collaborations in East and South East Asia as well as in Africa, and in recent years they have
participated in the Rockefeller Foundation Rice Biotechnology Network.
When it was decided at the international level to start organising an International
Rice Genome Sequencing Project, we felt that our country should participate in such a
project. In the absence of any short-term plan for a European initiative and because a new
large-scale national sequencing facility was established at Genoscope in Evry, near Paris,
we proposed to sequence rice chromosome 12 and successfully applied for a first-phase
project in collaboration with Genoscope. There are several reasons for choosing this
chromosome. The first is that previous mapping work was carried out on this
chromosome in Montpellier, where our ORSTOM colleagues have mapped a series of
disease and pest resistance genes, including resistance to Rice Yellow Mottle Virus and
Magnaporthe grisea, on the mapping population IR 64 x Azucena. The second reason is that this
chromosome is homologous to part of wheat chromosome 5, which is of great interest to
French wheat breeders.
The aim of this first phase of the project is to rapidly establish one or two Nipponbare BAC
contigs of approximately one Mbp with minimum overlap so that sequencing can start
at Genoscope as soon as possible, most likely in July or August. During this phase of the
project essentially four labs will be involved, with coordination in Perpignan. Most of the
physical mapping will be carried out in Perpignan (Michel Delseny) and
in the IRD lab (Alain Ghesquière) in Montpellier. BAC libraries are maintained at CIRAD
(Jean Christophe Glaszmann), in the building next to IRD in Montpellier. Most of the sequencing
will be done in Paris by Genoscope. Thanks to the availability of a short-term training
fellowship we could start the project by mid-November, before receiving any funds, after
collecting filters from Rod Wing at the Miami meeting and collecting a number of probes
already available in our different labs. Most of them were previously obtained by
ORSTOM from the RGP and Cornell programmes. We are in the process of expanding this
collection of chromosome 12 specific probes by requesting additional ones from both
groups.
The strategy used is rather straightforward: we hybridize chromosome 12 probes on
the high-density filters given by Rod Wing. The positive clones are evaluated by HindIII
fingerprinting to build up contigs. All the probes are sequenced, at least in part, to check
their identity when it is known, or to compare them with the EST and BAC end databases when
the probe is a genomic clone which has not yet been sequenced. So far 6 probes have been used
and a few BACs have been collected for establishing the fingerprinting technique. Two of the
probes detected overlapping BACs. We also sequenced a few of the BAC ends in order to use
them as probes to extend this first contig. One of the ends corresponds to a repetitive
sequence which is already present several times in the CUGI database, illustrating the
difficulties which might arise in a complete shotgun sequencing strategy. Further controls
will be carried out in the future to verify collinearity between BAC contigs and the
corresponding genomic DNA.
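As an illustration only, the idea of building contigs from restriction fingerprints can be sketched in a few lines of Python. This is not the procedure or software used in the project: the clone names, fragment sizes, the 2% size tolerance and the 50% shared-band threshold are all invented for the example, and dedicated fingerprinting programs use statistical overlap scores rather than the simple shared-band fraction assumed here.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tolerance=0.02):
    """Count bands of fp_a that pair with a distinct band of fp_b.

    Two fragment sizes are treated as the same band when they differ by
    no more than `tolerance` (a relative value, assumed here) of the larger.
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def build_contigs(fingerprints, min_fraction=0.5, tolerance=0.02):
    """Group clones into contigs by single linkage on shared bands.

    Two clones are linked when the shared bands make up at least
    `min_fraction` (hypothetical threshold) of the smaller fingerprint.
    """
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        n = shared_bands(fingerprints[a], fingerprints[b], tolerance)
        smaller = min(len(fingerprints[a]), len(fingerprints[b]))
        if smaller and n / smaller >= min_fraction:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Invented HindIII fragment sizes (bp) for three hypothetical BAC clones.
example = {
    "BAC_A": [1200, 2500, 4100, 6300, 9000],
    "BAC_B": [2510, 4080, 6310, 7700, 11000],
    "BAC_C": [800, 1500, 3300, 5200],
}
contigs = build_contigs(example)
print(contigs)
```

In this toy data set BAC_A and BAC_B share three bands and fall into one contig, while BAC_C stays on its own. A repetitive BAC end of the kind mentioned above would tend to create spurious links in any such grouping, which is one reason independent controls of collinearity remain necessary.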
We should receive some money to hire two people sometime in January or
February so that the project can accelerate in March. This initial project, which is approved
for 1999, will be followed by a second phase, which should start at the end of 1999 or in early
2000. It will consist of more massive sequencing, once we have selected a large number
of entry points in the physical map and once most of the BAC end sequencing has been
completed in Clemson. For this project to be developed, new proposals should be
submitted to and approved by the Genoscope Scientific Council by the end of 1999. An important
point is that by that time most of the effort now devoted to the Arabidopsis programme at
Genoscope can be shifted to rice. We hope that this first commitment to sequencing the rice
genome will be confirmed and that our country will be able to contribute to the progress of the
Rice Genome Sequencing Initiative. After 1999 we anticipate that most of the sequencing work
will be carried out in Evry and that our task will shift towards annotation,
characterisation of cognate cDNAs and a return to biological problems with genomic tools at
hand. Genoscope, like other large-scale sequencing centres, applies the Bermuda
rules, which call for the immediate release of pre-finished sequences, so that most of the
sequence is immediately available on-line to the community.
While we were setting up the Nipponbare project, a new BAC library was prepared
from Azucena by our colleagues from CIRAD and INRA in collaboration with Rod Wing
in Clemson; it will complement the IR 64 library existing at IRRI.
Michel Delseny
Lab Plant Physiology and Molecular Biology
UMR 5545 CNRS/ University of Perpignan
66860 Perpignan cedex, France
e-mail: delseny@univ-perp.fr
Status of the Rice Sequencing Project in Taiwan
As indicated earlier, we are interested in collaborating closely with laboratories
involved in the International Rice Genome Sequencing Project. Taiwan will join the project
and we propose to sequence chromosome 5.
One central laboratory in the Institute of Botany, Academia Sinica, will conduct
pilot sequencing work starting in early spring of 1999 for about six months. Four PIs will
participate in the program: one molecular biologist, one plant physiologist, one
biochemist, and one geneticist. The National Science Council will provide funding for the
pilot work. If the pilot work is satisfactory, subsequent funding may be awarded. Currently,
there is a program on structural and functional genomics of human hepatoma in our country,
and we have agreed to share experience and technology concerning genome sequencing
between the rice and human groups.
The home page of this lab's web site is now under construction, and the temporary
URL is http://biometrics.sinica.edu.tw/genome. This lab agrees to the immediate
release of sequences upon completion. Please contact Yue-ie Hsing at
bohsing@ccvax.sinica.edu.tw with any further questions.
Yue-ie Hsing
Institute of Botany
Academia Sinica
Taipei, Taiwan, ROC
e-mail: bohsing@ccvax.sinica.edu.tw