1999 January Issue

Contents

Rice Genome Sequencing as a Gold Mine for All

The promise of the complete sequence of the rice genome in ten years offers great potential for rice as an agronomic and model plant. There have already been important advances in rice genomics: The germplasms of Oryza sativa and related species have been widely collected and surveyed. There are abundant mapped molecular markers that provide convenient linkage with important phenotypes. A physical map of YACs covers most of the genome. Expressed genes from a number of tissues are known through large scale cDNA analyses. Databases linking this information are available through the Internet. These tools make it possible to identify a gene corresponding to a phenotype. However, gene identification must still be done on a gene-by-gene basis and only a fraction of the genes can be discovered this way. The rice genome is expected to contain more than 20,000 genes expressed in different parts of the plant or in response to different stimuli. Genome sequencing will provide a more direct route to decoding the information content of the genome. The complete genomic sequence is a bridge to understanding the rice plant as the result of the expression of an assemblage of genes. The availability of the complete sequence of the rice genome will form the basis for understanding the function and time of expression of most of the genes. Knowing the order of the sequence permits association of candidate genes and phenotypes. It is the means to more effectively modify the plant and its responses to the environment. All cereals exhibit remarkable syntenic relationships to each other. The rice nuclear genome, which contains about 400 Mb of DNA, is the smallest of the cereal grasses. A complete and accurate map-based rice genomic sequence is the key to deciphering the genomes of the other cereal grasses whose genomes are too large to sequence. 
Recognizing the small size of the rice genome and the key role it could play as a model cereal, a number of scientists began thinking about the possibility of sequencing rice. A workshop was held in Singapore in September 1997 to see if there was interest in an international collaborative project. Based on the positive outcome of that meeting, representatives of Japan, the United States, the European Union, China, and Korea met in Tsukuba, Japan, in February 1998 to formalize the principles of an international collaboration. The major points agreed upon were the use of a single germplasm, sharing of materials, immediate sequence release, accuracy standards, and mechanisms for coordinating the work. The International Rice Genome Sequencing Project is comprised of all the laboratories that agree to the principles of the Tsukuba meeting. Scientists engaged in the Project and representing each of the participating countries form the Working Group that jointly administers the Project. In addition to the original five countries named above, France, Taiwan, and Singapore will soon join the Project. The expanded participation can be expected to accelerate progress and increase the benefits to all members involved. Only with close cooperation among many nations can a complete, high-quality rice genome sequence be obtained and the promise be kept. The rice community and other cereal scientists do not need to wait ten years to begin to reap the benefits of this promise. A detailed physical map and more complete knowledge of syntenic relationships will provide tools for map-based cloning. Incremental genome sequences will provide an ever-expanding resource for gene discovery. The database that supports the Project will provide access to this information as well as a means of coordinating the sequencing effort.
This Newsletter attempts to serve two functions: It is a means of fostering communication between laboratories engaged in the International Rice Genome Sequencing Project. It is also our way of reaching out to the broader community to describe our progress and to publicize our plans. The current issue contains the 1998 Tsukuba agreement for international cooperation, reports of previous meetings, current progress, and plans for the next Working Group meeting. We welcome your comments and contributions to future issues.
Takuji Sasaki, RGP, NIAR/STAFF
Benjamin Burr, Brookhaven National Laboratory

The Formation of an International Rice Genome Sequencing Project

The International Rice Genome Project would not have been possible without the enthusiastic support of the scientific leader of the Rice Genome Research Program (RGP) in Japan, Dr. Takuji Sasaki. The RGP came to believe that there was greater advantage in forming an international consortium than in sequencing the entire rice genome by itself. There are two strong arguments for an international consortium. First, a whole-genome sequencing project of more than 400 million base pairs will be done faster if it is conducted by several sequencing centers. Second, the quality of the data is likely to be better if DNA sequences are assembled and annotated by more than one group. However, these benefits are only realized if the international effort is built on all the data and materials that have already been assembled, primarily by the RGP in Japan. One of the great attractions of choosing rice for a whole-genome sequencing effort is the high-density physical map and the EST data that have already been produced. Furthermore, participation of several countries and sequencing centers will require speedy distribution of data and materials by all participants. It will require both coordination and management of sequencing target sites and data processing. In addition to managing the many technical aspects of the project in parallel, a grass roots approach has to be initiated to bring rice genetics and breeding to its full research potential. Although the rice genome already serves as a reference genome, plant biologists around the world, and in particular those already working with rice, have to rally around rice as a focal research point so that the DNA sequence data can be readily leveraged in the public domain.
Given these many considerations in preparing an international effort to sequence the rice genome, it became imperative to conduct a series of meetings with plant biologists of different expertise and interests and to develop a position paper on the basic framework for such a project. In addition, different local governments and their agencies had to be persuaded to allocate the necessary resources for the participating sequencing centers. Here is the chronology of this first round of meetings and the outcome of each.
Spring 1997. The Japanese Rice Genome Project was under review for a five-year renewal on April 1, 1998, and for the first time a proposal was submitted to the Japanese government for the complete sequencing of the entire rice genome over ten years.
April 1997. In the US, Senator Bond of Missouri introduced a bill to provide $40 million annually for an NSF Plant Genome Initiative (PGI). The Office of Science and Technology Policy (OSTP) assembled an Interagency Working Group (IWG), consisting of representatives of NIH, NSF, DOE, and USDA and chaired by the corn geneticist Ronald Phillips, to develop a scientific plan for such an initiative.
June 1997. A colloquium of the US National Academy of Sciences on Food Supply and the Role of Genome Projects was held at the Beckman Center in Irvine, CA. The organizers of the colloquium seized the opportunity to call on the participants to discuss the scope of a US PGI with participants from other countries. The underlying congressional activities in the US were explained by members of the IWG. It was recognized that if another genome besides Arabidopsis was going to be sequenced in its entirety, it should be the rice genome. Although rice was not a major crop in the US, there would be great value in an international format and a contribution by the US.
June 1997. Participants of the colloquium who attended the Gordon Conference on Plant Genetics and Development a week afterwards brought the subject of the PGI to the floor of the conference for additional input. A subgroup consisting of Jeff Bennetzen, Joe Ecker, Michael Gale, Jo Messing, Ron Phillips, Takuji Sasaki, and Satoshi Tabata met to discuss the feasibility of an international rice genome project. When Takuji Sasaki, the leader of the RGP in Japan, indicated support and willingness to share all information and material, a strong consensus developed that an international rice genome project could be modelled after the Arabidopsis genome project. Michael Gale was asked by the group to develop an agenda for the International Plant Molecular Biology meeting in Singapore to have a broader discussion with the scientific community.
September 1997. Ben Burr and Michael Gale chaired a Rockefeller-sponsored workshop on the feasibility of an international format to sequence the rice genome at the International Plant Molecular Biology Conference in Singapore. Several hundred people attended and, with broad input, four key points were established.
1) Participants agreed to participate in an international collaboration that includes the sharing of clones and the timely release of sequence and mapping information. The Nipponbare variety used to produce STS maps, cDNA sequences, and YAC clones was selected as a single DNA source.
2) PAC and BAC libraries for sequencing would be simultaneously constructed in Japan and the U.S.
3) PAC and BAC-end sequencing adopted from the human genome project would be applied to the libraries to create a database complemented by DNA fingerprinting.
4) A genome working group was nominated to develop a position paper and to coordinate the international effort in annual meetings at the Japanese site.
A meeting report was posted on the Web and further discussed in preparation for the rice genome forum and workshop in Japan in February 1998.
October 1997. The US Congress appropriated $40 million to NSF for a PGI on significant crops. The IWG met with different constituencies to prepare a report to OSTP on guidelines for a PGI.
January 1998. The IWG report recommended US participation in an international effort to sequence the rice genome with a five-year, $8 million per year budget. The Rice Genome Project in Japan was renewed for another five-year term at $10 million per year. China and Korea each supported one rice genome sequencing center.
February 1998. The rice genome working group, consisting of Takuji Sasaki (Japan), Guofan Hong (China), Michael Bevan (EU), Moo Young Eun (Korea), Jo Messing (U.S.), and Ben Burr representing the Rockefeller Foundation, met to finalize a position paper. Methodologies, data release policies, scientific standards, time frames, and funding sources were summarized and published on the Web: ftp://genome1.bio.bnl.gov/pub/maize/RiceProject.html. The group also met with Japanese officials to explain the importance and the value of international collaborations.
April 1998. Ben Burr and Ralph Quatrano organized a workshop to present to the U.S. scientific community the results of the working group's meeting in Tsukuba. Additional discussion of rice biology and its suitability as a model organism was presented to NSF, USDA, and DOE officials as well. For the first time, a representative from France indicated an interest in participating. The Rockefeller Foundation supported the construction of a Nipponbare BAC library and Novartis announced its support of a public BAC-end sequence database.
May 1998. The USDA organized a meeting on a Food Genome Initiative. It indicated its intention to coordinate with NSF a genome initiative that seeks to address problems in agriculture that cannot be met under the scope of the NSF PGI alone. A diverse group of scientists discussed the value of different aspects of plant, animal, and microbial genome projects. One of the foci of a USDA PGI is to support a rice genome sequencing effort.
July 1998. A public hearing at USDA was conducted to hear stakeholders on a Food Genome Initiative. Again, only rice was proposed as a whole genome project and it was recognized as a model crop for applied and basic plant genetics.
September 1998. At the International Symposium on Rice Germplasm Evaluation and Enhancement, held on the occasion of the dedication of the new U.S. rice germplasm facility in Stuttgart, Arkansas, rice breeders and geneticists from abroad and the U.S. were presented with the outline of an international rice genome project. The importance of integrating the sequencing effort with the ongoing work in the rice research community was highlighted. Participants were enthusiastic about the potential of the project for rice research and its public format.
September 1998. At the International Genome Sequencing and Analysis Conference in Miami, a semiannual meeting of the rice genome working group reaffirmed cooperation and strategies for the International Rice Genome Project. Progress on the construction of PAC and BAC libraries, BAC-end sequencing, and DNA fingerprinting was reported. A total genome shotgun sequencing approach proposed by the John Innes Centre was discussed as a possibility, but it was found to be unsuitable for an international project and raised the concern that accuracy and completeness would be jeopardized. Although NSF decided not to fund the rice genome project in the first year of the US PGI, representatives from NSF announced the preparation of a new triagency program, similar to the one for Arabidopsis, solely for the rice genome sequencing effort. It is clear that the delay will require a change in the current project schedule (Figure 1).
Joachim Messing, Waksman Institute, Rutgers University, USA
messing@waksman.Rutgers.EDU

A Report from Singapore, September 1997: An International Collaboration to Sequence the Rice Genome

There is strong interest among many cereal biologists in sequencing the rice genome. Given its relatively small size, this is a feasible undertaking with present technology. Nevertheless, with a genome size of approximately 400 Mb, the task is so great that it is unlikely that any one country can devote the resources to sequence the rice genome in the next ten years. In any case, an international effort will accelerate the process and ensure public access to the data. Such a collaboration will greatly benefit from tools that have already been developed in key laboratories working on rice genomics and from parallel collaborative efforts on other genomes. The purpose of this document is to summarize the current state of an international collaboration to sequence the rice genome and to form the basis for future decisions. Members of the Working Group solicit your comments and suggestions.
Why sequence rice?
Rice as a model cereal
Genome size: Oryza sativa ssp. japonica is reported to have a 2C value of 0.88 pg, three times the size of the Arabidopsis thaliana genome. The predicted gene density is one gene every 15 kb. At this size, rice has the smallest genome of the major cereals.
Well-mapped genome: The rice molecular map, with over 2300 markers, has already been useful in helping align physical maps. Over 30,000 ESTs have been reported and many are mapped. At least 200 mapped SSTs have been published. A YAC library that has been fingerprinted and ordered with mapped markers currently covers 52% of the rice genome. Several BAC libraries have been described. A recent report suggests that 92% of the genome is covered by ordered BACs in many contigs.
Molecular genetics: With the introduction of new methods for Agrobacterium tumefaciens transformation, rice is the easiest of all cereal plants to transform genetically. This tool permits geneticists to complement mutations or confer dominant phenotypes to verify gene function.
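The genome size and gene density figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: it assumes the commonly used conversion of about 978 Mb per picogram of DNA, and the function names are hypothetical. Under that assumption, a 2C value of 0.88 pg corresponds to a haploid genome of roughly 430 Mb, and one gene every 15 kb implies a gene count in the high twenty thousands, consistent with the expectation of more than 20,000 genes.

```python
# Rough cross-check of the genome-size and gene-density figures in the report.
# Assumption (not stated in the report): 1 pg of DNA is about 978 Mb.
MB_PER_PG = 978.0


def haploid_genome_mb(two_c_pg: float) -> float:
    """Convert a 2C nuclear DNA content (pg) into a haploid (1C) size in Mb."""
    return (two_c_pg / 2.0) * MB_PER_PG


def estimated_gene_count(genome_mb: float, kb_per_gene: float) -> int:
    """Estimate the number of genes from genome size and average gene spacing."""
    return round(genome_mb * 1000.0 / kb_per_gene)


rice_mb = haploid_genome_mb(0.88)            # 2C value quoted in the report
genes = estimated_gene_count(rice_mb, 15.0)  # one gene every 15 kb

print(f"haploid genome: about {rice_mb:.0f} Mb")
print(f"estimated genes: about {genes}")
```

The estimate is only as good as the two inputs: the measured C-value and the assumption that gene density is uniform across the genome.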
Synteny
While grass genomes differ markedly in the amount of total DNA, they share a common set of genes. Recent work indicates that the grass genomes - wheat, rye, barley, maize, sorghum, millet, and rice - have similar genetic maps over large blocks of the chromosomes. When examined in detail, local gene order has been found to be preserved, but the genes are separated by greater amounts of repetitive DNA - mostly retrotransposons - in the species with larger genomes. This syntenic relationship can be exploited, for instance, by geneticists who are interested in the map-based cloning of a gene controlling chromosome pairing that has been mapped in wheat, but who are faced with a genome 37 times larger than that of rice. By selecting tightly linked single-copy markers in wheat, the wheat geneticists will be able to screen the homologous region in rice for their candidate homologue. The syntenic relations can be exploited in the other direction as well. For example, mapping data can be taken from maize, where there is extensive work in both transmission and molecular genetics, to predict the location of a homologue in rice.
Commercial Value
Rice, wheat, and maize account for approximately half of the world's food production. Rice itself is the principal food of half of the world's population. Over the last 30 years world rice production has doubled as the result of the introduction of new varieties and improved technology. However, the annual growth in rice production has slowed to the point that it is no longer keeping pace with the growth in the number of consumers. Rice production in the next fifty years faces even greater challenges. On the one hand, with a larger and more affluent population there will be greater demands for higher production and better quality rice. On the other hand, the same pressures mean that there will be less land, water, and labor to produce the crop.
In short, there will be great demands on biotechnology to improve rice production.
Map-based sequence information: The objective of plant breeding is the selection of favorable combinations of genes. In recent years, plant breeding has been enhanced by molecular marker technology that permits one to screen larger populations with less progeny testing. Knowledge of the location of all genes in a genome extends molecular marker technology because it becomes possible to identify candidate genes controlling specific traits. The genes then become the markers and the process becomes more accurate and more efficient. For example, knowing the location and sequence of candidate genes makes it possible to design allele-specific markers which readily lend themselves to automation.
Models
International alliances formed to sequence yeast, C. elegans, humans, and Arabidopsis provide examples of how to manage an international collaboration to sequence rice. Fortunately, government agencies are already familiar with writing and approving memoranda of agreement in this area. Some lessons we can learn from other genome efforts are:
1) Shared tools and information. In other projects, it has proven useful for all groups to have access to and work from the same few libraries. All data - physical mapping information and sequences - should be released in a timely fashion. These are principles that the participants in the rice genome sequencing collaboration have already agreed to.
2) Scientists initiate the collaboration. Scientific rather than political decisions should dictate the specifics of the collaboration. Individual sequencing projects will be funded nationally, locally managed, and subject to oversight of their respective funding agencies. Nevertheless, a system of peer oversight should guide these projects.
3) Sequencing should be done in the most efficient manner based on the science. The effort should not be diluted by peripheral projects.
Rice Genome Workshop
On September 23, 1997, scientists interested in the genomic sequencing of rice met to participate in a workshop held in conjunction with the International Symposium on Plant Molecular Biology in Singapore. The meeting was chaired by Ben Burr, Brookhaven National Laboratory, NY, USA, and Mike Gale, John Innes Centre, Norwich, UK. The participants who spoke were:
Dr. Takuji Sasaki, Rice Genome Program, NIAR/STAFF, Tsukuba, Japan
Dr. Moo Young Eun, National Institute of Agricultural Science and Technology, Suweon, Korea
Dr. Rod Wing, Clemson University, Clemson, SC, USA
Dr. Guo-liang Wang, Institute of Molecular Agrobiology, Singapore
Dr. Michael Roberts, John Innes Centre, Norwich, UK
Dr. John McPherson, Washington University, St. Louis, MO, USA
Dr. Jo Messing, Waksman Institute, Rutgers University, Piscataway, NJ, USA
Dr. Andy Pereira, CPRP-DLO, Wageningen, The Netherlands
Dr. John Bennett, International Rice Research Institute, Los Banos, The Philippines
Dr. Apichart Vanavichit, Kasetsart University, Nakorn Pathom, Thailand
Dr. Cliff Gabriel, Office of Science and Technology Policy, Washington DC, USA
Dr. Zhi-Hong Xu, Chinese Academy of Sciences, Beijing, China
Dr. Gary Toenniessen, The Rockefeller Foundation, NY, USA
In addition, there was discussion from the floor. At that meeting, the participants agreed to participate in an international collaboration to sequence the rice genome. Participants explicitly agreed to share materials, including libraries, and to the timely release to public databases of physical mapping information and annotated DNA sequences. Furthermore, general agreement was reached on the initial steps in methodology:
1) The cultivar, Nipponbare, also known as GA3, will be sequenced. Seed from a single plant will be distributed by Dr. Sasaki for the purpose of making libraries.
The primary reasons for choosing this cultivar are that more than 10,000 EST sequences from the strain have been released to DDBJ and that a physical map based on YACs that covers over 50% of the genome has been published. Sequencing other cultivars is strongly discouraged because genetic polymorphisms cannot be distinguished from sequencing errors. Moreover, groups not sequencing from one of the shared libraries would not benefit from the associated accumulated knowledge and the other advantages of collaboration.
2) The RGP will make a PAC library. Dr. Rod Wing will make three BAC libraries using partial digests of different enzymes to generate the inserts. 60,000 BAC clones will be isolated to provide a 20-fold coverage of the genome.
3) The BACs and PACs will be fingerprinted for the purposes of preparing contigs and checking the integrity (deletions or rearrangements) of the clones. The information generated will also be invaluable where repeated sequences make BAC and PAC end sequences ambiguous.
4) In parallel with fingerprinting, the BAC and PAC clones will be subjected to end-sequencing. This should provide an STS every 3 to 5 kb on average, allow genome sequencers to pick the clones with minimum overlap, and provide further information for the physical map.
It is important that none of these early steps delay large scale sequencing. Preparation of the PAC library is currently underway and preparation of the BAC libraries will begin shortly. Both types of libraries will be available before the end sequencing can begin. It is estimated that with the participation of several laboratories, end sequencing could be completed within six months. The analyses of fingerprinted Arabidopsis libraries are expected to be completed by the end of 1997. These results will indicate what we might expect for the rice project in terms of speed, cost, and the degree of closure.
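The library figures in the methodology steps above hang together numerically, as the following minimal sketch suggests. It is an illustration only: the genome size of 430 Mb is an assumption (the reports quote 400 to 430 Mb), the function names are hypothetical, and the chance of missing a locus uses the standard Poisson approximation for a library of randomly placed clones, which real libraries only approximate.

```python
import math

# Figures from the report, plus one assumption.
GENOME_MB = 430.0   # assumption: haploid rice genome size in Mb
N_CLONES = 60_000   # BAC clones to be isolated
COVERAGE = 20.0     # target fold coverage of the genome


def implied_insert_kb(genome_mb: float, n_clones: int, coverage: float) -> float:
    """Average insert size (kb) needed for n_clones to give the stated coverage."""
    return coverage * genome_mb * 1000.0 / n_clones


def end_spacing_kb(genome_mb: float, n_clones: int) -> float:
    """Average spacing (kb) between end sequences, with two ends per clone."""
    return genome_mb * 1000.0 / (2 * n_clones)


def prob_locus_missed(coverage: float) -> float:
    """Poisson approximation: chance that a given locus is in no clone at all."""
    return math.exp(-coverage)


insert_kb = implied_insert_kb(GENOME_MB, N_CLONES, COVERAGE)
spacing_kb = end_spacing_kb(GENOME_MB, N_CLONES)
missed = prob_locus_missed(COVERAGE)

print(f"implied average insert: {insert_kb:.0f} kb")
print(f"average end-sequence spacing: {spacing_kb:.1f} kb")
print(f"chance a locus is absent from the library: {missed:.1e}")
```

Under these assumptions the 60,000 clones need inserts averaging a little over 140 kb, and their 120,000 end sequences fall within the 3 to 5 kb spacing the report anticipates. Cloning bias and repeated sequences will make real coverage less even than the random model suggests.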
The Workshop concluded with the nomination of a provisional Working Group chosen to direct the collaboration and to decide future directions. This document will appear on Web sites viewed by rice researchers and comments are solicited. The next meeting of the Working Group will be held in conjunction with the Rice Genome Forum, February 5, 1998, in Tsukuba, Japan. Members of the Working Group are:
Dr. Takuji Sasaki, Japan
Dr. Zhi-Hong Xu, China
Dr. Moo Young Eun, S. Korea
Dr. Jo Messing, USA
Dr. Mike Bevan, Europe
Dr. Ben Burr, representing the Rockefeller Foundation
The Rockefeller Foundation has offered to facilitate administration of the collaboration.
Future Decisions
Membership in the International Rice Genome Sequencing Initiative: Any group willing to sequence large stretches of contiguous genomic DNA is welcome to join the collaborative effort as long as they are willing to follow the agreed upon guidelines. In Singapore there was some discussion about the minimum amount of sequence a group would have to contribute annually to maintain membership.
The Rice Genome Working Group: The Working Group is the body that will make decisions that pertain to the goals, strategies, and coordination of the collaborative effort. The Working Group will be responsible for planning the most efficient means of completing the project. Among its responsibilities will be assigning regions to be sequenced that will avoid duplication and maximize overall progress. The Working Group is envisioned as being comprised of representatives of the major groups participating in rice genome sequencing. The current group is provisional and it is recognized that some of the major contributors to the effort might change. Rules for deciding membership in the Working Group need to be established.
Sequencing strategy: It has been implicit in the discussions, but never stated, that once the BAC- and PAC-end sequencing is completed and the relevant fingerprinting data are available, the most efficient strategy for sequencing complete BACs or PACs will be from random subclone libraries. It will be useful to standardize this technology to ensure high quality libraries that are completely randomized with non-chimeric inserts of a uniform size. Sequencing in a specific region of the genome should not start until a sufficient number of tiled BACs or PACs are available to ensure an unfragmented sequence. In the Human Genome Project it has been found that assembly of shotgun sequences leads to contigs of about 30 kb. Sequence closure is the most difficult step in the sequencing process because it cannot be automated. Closure will be aided by restriction site information available from fingerprinting and possibly sequence information from overlapping BACs or PACs. Should ambiguities remain, they should be marked on the final sequence. The final product of this phase will be a single contiguous sequence representing the entire PAC or BAC.
Accuracy: The Rice Genome sequencing project, which will serve as a model for all other grasses, will cost an estimated $200M. Given the significant costs in material and manpower, it is imperative that the results be of the highest quality. In part, this problem has been addressed by agreeing to sequence DNA from the same cultivar, if not the same plant, to minimize variation due to genetic polymorphism. The Human Genome Project has agreed to accept a standard of less than one error in 10,000 bp. While the level of accuracy is difficult to verify, this standard is achievable by a combination of high quality shotgun sequence reads, a seven-fold redundancy, and the requirement that every base be sequenced on both strands. Rice is expected to have 50% repetitive DNA.
Because of this, the accuracy of final assembly of shotgun sequences will be dependent on the length and quality of individual sequence reads. The Working Group might wish to establish some guidelines here.
Annotation and Sequence Release: In other genomic sequencing efforts, it has been recognized that the most useful releases are large contiguous stretches of annotated sequence. A uniform standard of annotation must be agreed upon that checks the integrity of the sequence, assigns and identifies regions of homology, and delineates potential open reading frames. This should not preclude individual groups from publishing unannotated sequences on their local web sites. In Singapore the participants agreed to timely release of the sequence information. It might be useful for each participating group to agree to release the complete annotated sequence of a BAC or PAC within three to six months after beginning to sequence the clone.
Rice Genome Database: An integrated database will facilitate collaboration and data sharing. Sequences will be released to one of the public databases, DDBJ, EMBL, or GenBank, but a central database for the project will be required to store and manage the annotation information. With ever expanding databases, annotation is never complete. It may be advisable to assign the task of periodically updating the annotation of rice genomic sequence to the centralized rice genome database. The database should also be linked with other rice and cereal databases, serve as a means of coordinating sequencing work, and provide methods for submitting and using information.
Functional Genomics: To date at least 50% of newly discovered open reading frames do not have homologues with identifiable function. The use of populations with transposable element-induced knockout mutations has been a powerful tool for identifying the function of some of these unknown genes.
While it is beyond the scope of this project, it should be recognized that a consortium of international laboratories has formed to develop knockout populations of rice for the purpose of discovering gene function. This consortium will provide useful tools for the downstream analysis of genomic sequence information.
Intellectual Property Rights: Intellectual property rights issues will be raised because of the obvious commercial interest in the sequence for rice and other cereals. In the Human Genome Project, as well as other international sequencing efforts, withholding data for patent application is recognized as being incompatible with the policy of immediate release. Patent issues are regarded as being downstream of data generation and release. These are issues that must be confronted, but they are probably beyond the scope of the Working Group and should be discussed at a meeting called for that specific purpose.
Outreach: To be successful, this large sequencing effort needs the broad support of scientists working on rice and other cereals who will be the potential end-users of the sequence information. They must believe that the project is worthwhile, well-organized and credible. There are a number of ways that this support might be engendered. Roles for the general community to influence general strategies and policies should be considered. Outside scientists can serve as peer reviewers of individual projects. Timely release of finished, annotated sequence blocks, as well as the availability of mapped BACs and YACs, increases end-user support. Periodic progress reports, similar to the RGP's RICE GENOME newsletter, and internet access to a useful database will increase awareness and utility of the project. Interested members of the community can begin to influence the project by commenting and making suggestions on this document.
Benjamin Burr
Biology Department
Brookhaven National Laboratory, Upton, NY 11973, USA
e-mail: burr@sun2.bnl.gov

A Report from Tsukuba, February 1998: International Rice Genome Sequencing Project

Vision and Goals:
Fundamental plant biological information from a model plant: Because rice is a member of the Gramineae and a crop plant, a wealth of fundamental information about important aspects of plant biology can be learned from its genomic sequence. Rice is a model for learning about yield, hybrid vigor, and single-gene and multigenic disease resistance. Different races of rice are adapted to a wide variety of environmental situations, from tropical flooding to temperate dry land, so it is a model for real life adaptive responses. Because its genome is collinear with those of the other grasses, rice is a key to knowledge of their genomic organization. Comparison of the sequence of the dicot, Arabidopsis thaliana, with that of rice, a model monocot, will tell us what genomic structures these two different groups of angiosperms have in common and how they differ. While the goals of the International Rice Genome Project must be focused, the information provided by the International Project can be exploited by the entire research community to learn:
1) The function and map location of cereal and ultimately all plant genes.
2) How to use map-based sequence information to identify and provide markers for agronomically significant genes.
3) The molecular basis of plant growth and development so that fundamental questions in plant physiology, biochemistry, cell biology, and pathology can be addressed.
4) The relationship, if any, of genome structure to gene expression.
The primary goal is the complete genome sequence of rice. The primary activity in the first year will be to prepare and distribute clones for sequencing. During this period, it is anticipated that the libraries will be quality controlled and that the clones will be end sequenced and fingerprinted. Subsequent years will be devoted to large-scale genomic sequencing. The objective is to complete the task in ten years.
The time line below for the first five years indicates that greater than 170 Mb of the 430 Mb genome will be sequenced by 2003, that chromosomes 6 and 10 will have been completed, and that the sequencing of chromosomes 1 and 2 will be well underway. Figure: Five Year Plan: 1998-2003. The purpose of the international collaboration is to accelerate the completion of this goal. The International Collaboration is best achieved by sharing materials and technologies and by the timely release of sequence and related information. To this end, scientists interested in the genomic sequencing of rice participated in a workshop held in conjunction with the International Symposium on Plant Molecular Biology in Singapore on September 23, 1997 (ftp://genome1.bio.bnl.gov/pub/maize/rice.html). A Working Group, nominated in Singapore, met on February 5, 1998 to develop this document. Membership in the Rice Genome Project: Any group willing to sequence large stretches of contiguous genomic DNA is welcome to join the collaborative effort as long as it is willing to follow the agreed-upon guidelines. Participants agree to share materials, including libraries, and to the timely release to public databases of physical mapping information and annotated DNA sequences. A group must agree to sequence one megabase of DNA per year to maintain membership. Members agree to declare their sequencing plans and to provide detailed plans and progress on their respective web pages. Individual sequencing groups are encouraged to claim large chromosomal regions or entire chromosomes, if they have the sequencing capacity, to increase the likelihood that entire chromosomes are completed. Groups may claim chromosomal regions which they agree to sequence within one to three years. Post-sequencing activities, such as functional genomics, are beyond the scope of the International Rice Sequencing Project. 
Further, the Project does not encompass the cloning and sequencing of specific rice genes for research purposes or industrial sequencing efforts. While the International Project will be happy to share information with these individual efforts, their conduct is beyond the scope of these agreements. The Rice Genome Working Group: The Working Group is the body that will make decisions that pertain to the goals, strategies, and coordination of the collaborative effort. The Working Group will be responsible for planning the most efficient means of completing the project. Among its responsibilities will be assigning regions to be sequenced that will avoid duplication and maximize overall progress. The Working Group is comprised of representatives of each research group participating in the International Rice Genome Sequencing Project. As Japan is recognized as having a leadership role in the Project, the head of the RGP will be the permanent chairman of the Working Group. Major policy decisions, including sequencing assignments, will be taken by representatives from each of the major national groups participating in the Project. Currently, these regional representatives are Japan, China, Korea, Europe and the U.S. The Working Group will meet annually in Japan. Interim meetings, as needed, may be held elsewhere. The meetings will be open to the public. Results of Working Group meetings will be posted on web sites and published in the RICE GENOME. Methodology: The Oryza sativa ssp. japonica cultivar, Nipponbare, also known as GA3, will be sequenced. Seed from a single plant will be distributed by Dr. Sasaki for the purpose of making libraries. The primary reasons for choosing this cultivar are that more than 20,000 EST sequences from the strain have been released to DDBJ and that a physical map based on YACs that covers over 50% of the genome has been published. 
Sequencing other cultivars is strongly discouraged as genetic polymorphisms cannot be distinguished from sequencing errors. Moreover, groups not sequencing from one of the shared libraries would not benefit from the associated accumulated knowledge and the other advantages of collaboration. It is recognized that comparative mapping and sequencing of other rice subspecies is valuable information that the International Rice Genome Sequencing Project would like to share. Nevertheless, the primary goal of the Project is the complete sequence of the genome of a single cultivar. The RGP will make a PAC library with 20-fold genome coverage. Dr. Rod Wing will make three BAC libraries using partial digests of different enzymes to generate the inserts. 60,000 BAC clones will be isolated to provide a 20-fold coverage of the genome. The quality of these libraries and their coverage will be verified by hybridizing each with 100 single copy EST probes, and the number of clones and their insert size will be measured. It is expected that inserts will be greater than 120 kb. The number of clones with organellar DNA and rRNA repeats will also be determined. The BACs will be fingerprinted for the purposes of preparing contigs and checking the integrity (deletions or rearrangements) of the clones. The information generated will also be invaluable where repeated sequences make BAC end sequences ambiguous. In addition, where there is multi-fold coverage, the assembly program can pick out inserts that have deletions or rearrangements. Fingerprinting information will be publicly available so that individual laboratories can verify the quality of the contigs they plan to sequence. The RGP plans to increase the number of currently mapped ESTs to 8,000 in order to physically map their PAC clones. These mapped ESTs are an unmatched resource in preparing a physical map as they provide sequence, map location, and direction. 
In parallel with fingerprinting, the BAC and PAC clones will be subjected to end-sequencing. This should provide an STS every 3 to 4 kb on average and will allow genome sequencers to pick the clones with minimum overlap. Accuracy: The Rice Genome Sequencing Project will serve as a model for all other grasses and cost about $200M. The sequence will be used by other researchers and will thus be scrutinized. It is imperative that these resources not be squandered on inaccurate results. In part, this problem has been addressed by insisting on sequencing DNA from the same cultivar, if not the same plant, to minimize variation due to genetic polymorphism. Fingerprinting of multiply overlapping inserts is a means of verifying that the BACs chosen for sequencing have not been rearranged. Collinearity with the genome should also be verified by probing restriction enzyme digests of genomic or the appropriate YAC DNA with the BAC and comparing this with digests of the BAC itself. The Rice Genome Sequencing Project will adopt the standards of The Human Genome Project, established at its Bermuda meetings in 1996 and 1997, which has agreed to accept a standard of less than one error in 10,000 bp. While the level of accuracy is difficult to verify, this standard is achievable by a combination of high quality shotgun sequence reads, a seven-fold redundancy, and the insistence that 97% of all bases be sequenced on both strands or with two chemistries. In addition, minimum error estimation values provided by PHRED of 75 over protein coding regions and 40 over the remainder of the genome must be obtained. Further, restriction sites predicted from the sequence must conform to observed digest patterns. Sequence Release: The Rice Genome Sequencing Project agrees to the immediate release of finished, but not necessarily annotated, sequence in units of intact BAC or PAC inserts. These finished sequences will conform to the accuracy standards described above. 
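The accuracy standard can be related to PHRED-style quality values through the usual definition Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The Python sketch below only illustrates that conversion; the function names are hypothetical and are not part of PHRED itself.

```python
import math

def error_probability(q):
    """Estimated per-base error probability for a PHRED-style quality value q."""
    return 10 ** (-q / 10)

def quality_value(p):
    """PHRED-style quality value for an estimated per-base error probability p."""
    return -10 * math.log10(p)

# The project standard: less than one error in 10,000 bp.
standard = 1 / 10_000
print("quality value matching the standard:", quality_value(standard))

# Thresholds quoted in the report (75 over coding regions, 40 elsewhere).
for q in (40, 75):
    print("quality", q, "-> estimated error probability", error_probability(q))
```

On this scale a value of 40 corresponds to one expected error in 10,000 bases, which is consistent with the standard adopted for the genome as a whole.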
Release is submission to a public database such as DDBJ, EMBL, or GenBank. In keeping with the NHGRI recommendations, automated release of assemblies greater than 2 kb to local Web sites is encouraged. Annotation: Members of the Working Group, while recognizing the importance of annotation to the value of sequence information, view annotation as separate from release of finished sequence. Each sequencing group is responsible for annotating the sequence it contributes. A uniform standard of annotation has been agreed upon that checks the integrity of the sequence, assigns and identifies regions of homology, delineates potential open reading frames, and names and indicates the beginnings and ends of genes. Common annotation software will be adopted. The annotator must state: Whether coding sequences and splice sites were determined experimentally or by using software. It is recognized that the use of published cDNA sequences greatly facilitates this task. If gaps cannot be closed, the method of sizing and the reasons for not closing must be stated. Exact details on how adjacent BACs or PACs were assembled with a minimum overlap of 100 bp should also be stated. It is hoped that annotation will be expanded to include recognition of genetic markers, ESTs, known genes, and syntenic regions. An annotation workshop is projected for the Working Group meetings. Rice Genome Database: An integrated database established in Japan will facilitate collaboration, coordinate sequencing work, and provide methods for submitting, using, and sharing information. Sequences will be released to one of the public databases, DDBJ, EMBL, or GenBank. The Rice Genome Database will pick up new submissions from the public databases. The Database will store and manage the annotation information. Each participant will maintain a Web site with a standardized format that describes work in progress and sequences completed. 
The Database will be linked with the Web sites of each of the Project's participating laboratories and thus be able to maintain a registry of clones being sequenced, monitor progress, and coordinate activities. The database will also be linked with sites that are providing fingerprinting information and end sequences. With ever expanding databases, annotation is never complete. It may be advisable to assign the task of periodic update of the annotation of rice genomic sequence to the Rice Genome Database. The larger goals for the Project envision the use of sequence information to provide biological lessons for rice and other cereals. The Rice Genome Database is a means for linking all genomic information related to rice DNA sequence. This information comes from existing genomic databases and from work that derives from DNA sequencing, such as determination of gene function. The Rice Genome Database will thus be linked with other rice and cereal databases and to international groups that will be learning about the function of rice and other cereal genes. Outreach: To be successful, this large sequencing effort needs the broad support of scientists working on rice and other cereals who will be the potential end users of the sequence information. Ultimately, it is the public at large who supports the project, and steps toward public education should be undertaken. They must believe that the project is worthwhile, that it is well-organized and credible. There are a number of ways that The Rice Genome Sequencing Project will attempt to engender this support: Timely release of finished, annotated sequence blocks as well as the availability of mapped BACs and YACs. RICE GENOME will report the results from the Working Group meetings as well as news of the Project. Internet access to The Rice Genome Database will engender awareness and utility of the Project. Publications from participating sequencing laboratories should acknowledge that they are part of the Project. 
Benjamin Burr Biology Department Brookhaven National Laboratory Upton, NY 11973, USA e-mail: burr@sun2.bnl.gov

A Report from Miami Beach, September 1998: Interim Working Group Meeting

Funding: Japan: Takuji Sasaki reports that his group has received funding of US$10M for 10 years subject to review. France: Michel Delseny reports that initial work to prepare sequence-ready BACs will begin in his lab in Perpignan and in Alain Ghesquiere's and Jean Christophe Glaszmann's labs at ORSTOM/CIRAD in Montpellier, and that the work will move to Genoscope in Evry for sequencing. Taiwan: Yue-ie Hsing reports that the National Science Council, Academia Sinica and the Education Ministry will support rice genome sequencing at 3 to 5 sites with 2 to 4 PIs in each site. Korea: Moo Young Eun reports that they have applied to their government for the first 5-year (of a total 10-year) special budget for rice sequencing. For FY99 this would support 1.2 Mb of sequencing and new automated sequencers. The proposal has passed several reviewing steps and is in the final evaluation period. He is optimistic. In the meantime, they have been conducting pilot sequencing with their BAC clones, and will begin sequencing Nipponbare as soon as the requested libraries arrive. Canada: Tom Bureau at McGill University has received a Provincial equipment grant to establish a sequencing facility from which he hopes to launch a Canadian rice genome sequencing program. China: Guofan Hong reports that his group receives funds from the State Commission of Science and Technology, the Chinese Academy of Sciences and the Shanghai Municipal Government to support the current project of sequencing chromosome 4 of the rice genome at the rate set in the International Rice Genome Sequencing Agreement. The funding period is 5 years and may be renewed based on review. US: Rod Wing (Clemson University) received funding from the Rockefeller Foundation to prepare BAC libraries. He has also received funding from Novartis to do BAC end sequencing and fingerprinting of the BACs. All of these results are being made publicly available. 
Machi Dilworth (NSF) reported that an interagency program announcement for sequencing rice, with USDA as the lead agency, would be published early in the 1999 fiscal year beginning October 1. IRRI: While not strictly part of the Rice Genome Sequencing Program, it is important to note that Hei Leung has begun a functional genomics program based on deletion mutagenesis. He has used primarily fast neutrons and diepoxybutane to generate M3 populations with mutation rates of 1/1000 per locus and has obtained preliminary evidence supporting this work at the Xa21 locus conditioning bacterial blight resistance. Progress: Japanese Rice Genome Research Program: Tomoya Baba reported that the RGP mapped 2000 ESTs on a YAC map of the rice genome previously constructed with 2275 rice genome DNA markers. They achieved 306 Mb of total coverage with 82% of chromosome 6 and 74% of chromosome 1 covered. Baba pointed out that, particularly on chromosome 6, the gaps corresponded to regions of the map that lacked DNA markers. The RGP constructed PAC libraries with the pCYPAC2 vector from Pieter de Jong. The vector employs the SacB gene for positive selection of inserts. They used partial Sau3AI digests to prepare their library, which comprises 71,000 clones. The mean insert size is 112 kb with 18% under 80 kb and 11% chloroplast contamination. The RGP is using a combination of restriction digests and hybridization or PCR of ESTs to establish contigs. They plan to obtain PAC end sequences using tail PCR. A second PAC library made with partial MboI digests of genomic DNA is now under construction. Jo Messing: Jo presented an overview of the International Agreement for rice genomic sequencing with specific emphasis on methodology, in particular BAC end sequencing and shotgun sequencing. He illustrated his talk with assembled sequences from syntenic regions of sorghum and maize. 
He pointed out that it was important to have long, single pass reads from both ends of a random clone as distance is very important in assembling repeated sequences. The LTR regions of some retrotransposons that his group discovered exceeded 4 kb, and these repeated sequences could not have been assembled without this protocol. In summarizing his talk, Jo showed how sequencing from individual BAC clones was both important and necessary. Jo is collaborating with the Clemson group. He confirmed quality checks of the Nipponbare BAC library with BAC end-sequencing (see below). He also selected BAC clones generated by Rod Wing's group from japonica and indica with an orthologous probe from maize for a joint DNA sequencing project to compare sequence differences between the two subspecies. Rod Wing: Rod described the work of the Clemson Genomics Institute. A partial HindIII digest of Nipponbare DNA has produced a BAC library of 36,864 clones with an average insert size of 128.5 kb, of which 8.2% have inserts of less than 80 kb. The library represents a greater than 10-fold coverage of the genome. 2.25% of the inserts appear to be chloroplast DNA and have been removed from the library. The library was probed with 12 RFLP single-copy markers and each probe produced from 2 to 21 hits. An EcoRI partial library is also being constructed. The BAC libraries will be end sequenced with a goal of obtaining 148,000 ends. Currently 13,932 reads have been obtained with an average read length of 380 bp. These sequences are released monthly. 8432 sequences and their BLAST hits are currently posted on the Genomics Institute's web pages, which also provide for searching by sequence. Rod's colleague, Ralph Dean, spoke about fingerprinting the BACs. Their goal is to fingerprint and assemble 74,000 BACs. They are currently proving the technology on the rice blast fungal pathogen. 
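The coverage quoted for the HindIII library can be checked with simple arithmetic. The Python sketch below is an illustration only. It assumes the 430 Mb genome size used in the Tsukuba report, ignores the chloroplast clones, and estimates the chance that a given locus is represented with the standard Clarke-Carbon formula, which is not mentioned in the meeting report. The function names are hypothetical.

```python
def fold_coverage(n_clones, mean_insert_kb, genome_mb):
    """Genome equivalents contained in a library of n_clones inserts."""
    return n_clones * mean_insert_kb / (genome_mb * 1000)

def prob_locus_represented(n_clones, mean_insert_kb, genome_mb):
    """Clarke-Carbon estimate that a given locus is in at least one clone."""
    fraction_per_clone = mean_insert_kb / (genome_mb * 1000)
    return 1 - (1 - fraction_per_clone) ** n_clones

GENOME_MB = 430  # assumed genome size, taken from the Tsukuba report

coverage = fold_coverage(36_864, 128.5, GENOME_MB)
p_locus = prob_locus_represented(36_864, 128.5, GENOME_MB)
print(f"coverage: {coverage:.1f}-fold")
print(f"probability a locus is represented: {p_locus:.5f}")
```

Under these assumptions the library holds about 11 genome equivalents, in line with the "greater than 10-fold" figure reported.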
Dick McCombie: Dick spoke about lessons learned from the Arabidopsis sequencing project with specific emphasis on the usefulness of a physical map based on BACs. He pointed out that the current cost of obtaining finished sequence is US$ 0.4 to 0.5 per base with the possibility that this might drop to US$ 0.3. Thus the cost of the rice sequencing project will be at least US$150M. The rice sequence will provide value in the form of rice improvement, the identity of almost all cereal genes, and a reference for larger cereal genomes based on comparative genomics. In order for this value to be realized, the sequence needs to be accurate, complete, and correlated with genetic markers and maps. The group at Washington University headed by John McPherson and Marco Marra has undertaken a whole genome approach to fingerprinting Arabidopsis. Mapped markers or ESTs are used to anchor the physical map to the genetic and cytogenetic maps. The group has fingerprinted BACs from two libraries representing an 18-fold coverage of the genome. 17,000 clones have been assembled into 394 contigs. Dick pointed out that the automated assembly gives you bins (although they need to be checked) but does not give you precise order information without manual editing at the rate of 15 Mb per month for an experienced person. Currently, 75 contigs (comprising 13,000 clones) have been edited and cover 81 Mb. Dick estimated that it would cost about US$ 2M to map the rice genome and that it would take an experienced team two years to complete the task. Dick pointed out that recent advances in mapping technology and strategies allow the task to be done quickly for a fraction of the former cost. In addition to greatly aiding sequencing, the map is likely to pay for itself as a tool for positional cloning prior to the completion of the sequence. 
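The cost and effort figures in Dick's talk follow from straightforward arithmetic. The Python sketch below restates it under one assumption not made in the talk itself: a genome size of 430 Mb, as given in the Tsukuba report. The function names are hypothetical.

```python
GENOME_MB = 430  # assumed genome size, taken from the Tsukuba report

def sequencing_cost_usd(genome_mb, cost_per_base_usd):
    """Cost of finished sequence for the whole genome at a given price per base."""
    return genome_mb * 1_000_000 * cost_per_base_usd

def editing_person_months(genome_mb, mb_per_month=15):
    """Person-months of manual contig editing at the quoted rate of 15 Mb per month."""
    return genome_mb / mb_per_month

for price in (0.3, 0.4, 0.5):
    cost_millions = sequencing_cost_usd(GENOME_MB, price) / 1e6
    print(f"US$ {price:.1f} per base -> about US$ {cost_millions:.0f}M")

print(f"manual editing: about {editing_person_months(GENOME_MB):.0f} person-months")
```

At the quoted prices this gives roughly US$129M to US$215M for the whole genome, which brackets the estimate of at least US$150M.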
Proposal for a whole genome approach to rice sequencing: Ian Bancroft and Mike Bevan submitted "A whole-genome approach to sequencing the rice genome" for discussion at this meeting. In their absence, the proposal was outlined by George Murphy. While a number of people found the proposal attractive for larger genomes, there was little support for this proposal for rice. Basically, the question was: do you want to discover the most genes in the shortest possible time, or do you want to proceed most efficiently toward a complete reference sequence? The group assembled in Miami was decidedly of the second opinion. In this context, there were three areas that presented problems: 1) A number of people said that you needed sequence plus position for rice as a model organism and they didn't believe that the proposed outline would lead to map-based sequence. 2) If, as outlined in the proposal, most of the genes can be found in 20% of the DNA, it was not obvious how you would use this method to obtain the complete sequence. 3) The principal assumptions, that most of the genes can be discovered from a narrow range of buoyant density and that whole-genome shotgun reads can be assembled, are untested. Attendees suggested two preliminary experiments. The first was to prepare clones from the "gene-space DNA" as well as from the whole genome, and examine the frequency of homologous hits in sample single pass reads from both sets of clones. The second was to assemble the random single pass reads, but not the finishing reads, from the extensive contiguous sequence already obtained. For example, the 2 Mb that the European consortium has obtained from Arabidopsis would represent a best case scenario. The meeting closed with the announcement that the next working group meeting would take place in Tsukuba, Japan during the week of February 7-10, 1999. Benjamin Burr Biology Department Brookhaven National Laboratory Upton, NY 11973, USA e-mail: burr@sun2.bnl.gov

Rice Genome Sequencing in RGP

Rice genome sequencing in RGP depends primarily on the construction of sequence-ready P1-derived artificial chromosome (PAC) contigs. To construct contigs, we first screen our PAC library with sequence-tagged site (STS) markers generated from restriction fragment length polymorphism (RFLP) or expressed sequence tag (EST) markers. Among 2275 RFLP markers on the linkage map, about half might be converted to unambiguous STS markers. EST markers are generated by mapping cDNAs on a yeast artificial chromosome (YAC) physical map, using the 3'-UTR as a source of specific PCR primers and 7000 YACs as templates for the polymerase chain reaction (PCR). So far, about 3000 ESTs have been located on YACs that have been assigned positions along the linkage map. RGP will sequence rice chromosomes 1 and 6 as the first targets, and EST mapping will be done mainly for these chromosomes. Our PAC library is composed of 72,000 clones created by using pCYPAC2 as a vector for Sau3AI digests of rice DNA. The average insert size is 120 kb. Contamination by chloroplast DNA is about 11%. In a preliminary screening of 20,000 clones with several random EST markers, every marker identified positive clones. To complete a sequence-ready PAC physical map, tools other than EST or STS markers will be required. End sequences of PACs are used to assign overlapping sequences within adjacent PACs to get the most efficient sequencing and, more importantly, to provide PCR primers for searching for new PACs carrying the same end sequence. Even by using more than 10,000 ESTs, we might not complete a sequence-ready PAC physical map using only the ESTs because genes are expected to be unevenly distributed within the rice genome. So, primer walking will have to be used to compensate for the lack of EST markers. Subcloning by the shotgun method is used to prepare templates for sequencing. Two types of shotgun library are constructed, with average insert sizes of 2 kb and 5 kb. 
The former library is usually used for sequencing after PCR amplification; the latter is used mainly to fill gaps between the sequence contigs obtained with the former. At present, only ABI377 sequencers are used, but in the near future a newly developed capillary sequencer will be introduced. For sequence assembly, the PHRED-PHRAP software developed by Philip Green is used, and attaining a quality score of more than 70 is the challenge. At the moment, RGP sequences 1 PAC (100-150 kb) every two weeks with about 5000 sequence reads. The edited sequence is swiftly released to a public database (DDBJ), and an annotation is made on our web page, <http://www.staff.or.jp>. Takuji Sasaki, Yoshiaki Nagamura, Tomoya Baba, Kimiko Yamamoto, Takashi Matsumoto, Katsumi Sakata RGP, NIAR/STAFF 1-2, Kannondai 2-chome Tsukuba, Ibaraki 305-8602, Japan e-mail: tsasaki@nias.affrc.go.jp

Status of the Sequencing Project in China

A BAC library of O. sativa ssp. indica Guang Lu Ai 4 was constructed early in 1994 at the National Center for Gene Research, Chinese Academy of Sciences, Shanghai (Cell Research, 1994, 4, 127-133). The library represents approx. 8 equivalents of the rice genome and has proved to be genetically stable. Based on this library, BAC contigs of various lengths were constructed in 1997 by a combination of fingerprinting and DNA marker anchoring; the contigs cover approx. 92% of the rice genome (Journal of Sequencing & Mapping, 1997, 319-335). The contigs with mapped markers were assigned to and ordered along the chromosomes, based on the order of the markers used. The gaps between contigs are thought to be caused either by the lack of gap-bridging clones in the BAC library used in the project or by failure to identify the gap-bridging clones owing to the high threshold values of tolerance and match probability cutoff set in the mapping program. Gaps are to be closed by, for instance, contig end clone walking. There are contigs of as yet unknown assignment, which will be determined by FISH. Genomic sequencing of chromosome 4 was initiated in 1998. The sequencing strategy we have adopted consists of the following: (1) BAC tiling to identify clones with minimum overlaps; (2) subcloning of the BACs identified; (3) building up scaffolding; (4) filling gaps in the scaffolding; and (5) primer walking to close sequence gaps. Tiles of BACs derived from the contigs to be sequenced are first established; these are sequenced by a shotgun strategy followed by a "semi-random" approach and primer walking. To test the reliability of the approach, the sequence obtained was analyzed, which showed that the scaffolding formed by random sequencing was 22 clones deep on average, with an average eight-fold sequence redundancy. The current sequencing rate is approx. 5 Mb a year. 
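Step (1) of the strategy, BAC tiling with minimum overlaps, can be illustrated with a small greedy selection. The Python sketch below is not the procedure used in Shanghai. It assumes that each clone already has known left and right coordinates on a contig, whereas in practice overlaps are inferred from fingerprints and markers. The clone names, the coordinates, and the `min_overlap` parameter are all hypothetical.

```python
def minimal_tiling_path(clones, start, end, min_overlap=0):
    """Greedily pick the fewest clones covering [start, end].

    clones: list of (name, left, right) tuples, coordinates in kb.
    Each clone added must overlap the previous one by at least min_overlap.
    Returns the list of chosen clone names, or None if a gap cannot be bridged.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    pos = start          # right-most position covered so far
    limit = start        # a candidate's left end must not exceed this
    i = 0
    while pos < end:
        best = None
        # Among clones starting early enough, keep the one reaching farthest right.
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= pos:
            return None  # no clone extends the path: a gap remains
        path.append(best[0])
        pos = best[2]
        limit = pos - min_overlap
    return path

# Hypothetical contig of five clones (coordinates in kb).
clones = [("A", 0, 120), ("B", 30, 150), ("C", 100, 230),
          ("D", 110, 210), ("E", 220, 340)]
print(minimal_tiling_path(clones, 0, 340, min_overlap=5))
```

For this made-up contig the path is A, C, E: clones B and D lie entirely within regions already covered by their neighbours, so sequencing them would add only redundancy.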
This is a renewable multi-year project funded jointly by the Ministry of Science & Technology, the Chinese Academy of Sciences and the Shanghai Commission of Science and Technology. Guofan Hong National Center for Gene Research Chinese Academy of Sciences 500 CaoBao Road, Shanghai 200233, China e-mail: gfhong@newnetra.ncgr.ac.cn

Korea Rice Genome Research Program

The Korea Rice Genome Research Program (KRGRP) was established at the National Institute of Agricultural Science and Technology (NIAST) in September 1994, and KRGRP has focused on the construction of a molecular map and map-based cloning, molecular breeding in practice and transformation, and large-scale cDNA sequencing and database construction. The 'Milyang 23/Gihobyeo RI population', which consists of 164 F14 lines, has been developed by single seed descent, and the 'NIAST map', an international reference map of rice in RiceGenes, has been constructed using this population. Using this molecular map, yearly and regional variations of 30 quantitative trait loci (QTLs) for agronomic traits, such as yield and yield components, have been analyzed with three years of data ('95-'97), and the analysis will be replicated over further years for tagging QTLs. Partial nucleotide sequences of about 8,000 individual cDNAs derived from Milyang 23 immature seeds have been characterized and analyzed for sequence similarity against the databases. So far, over 6,000 partially sequenced cDNA clones and 175 full-length cDNA clones have been registered in the EMBL and GenBank databases. The Korea Rice Genome Network Server (http://krgrp.niast.go.kr and http://bioserver.myongji.ac.kr) is maintained to provide information on the sequences, homologies and alignments of the genes identified from the ESTs. The complete sequencing of the rice genome in IRGSP will lead to even more efficient identification and manipulation of traits. KRGRP is actively participating in the international collaborative effort and will sequence DNA clones on chromosome 1, as we proposed at the 1st Working Group Meeting in Tsukuba on February 5, 1998. Moo Young Eun National Institute of Agricultural Science and Technology Suwon, 441-707, Korea e-mail: myeun@niast.go.kr

France Targets Chromosome 12

A few French groups, namely at CIRAD and ORSTOM (recently renamed IRD) in Montpellier, have long been working on rice breeding and genetics, essentially for their overseas territories, their own production in the Camargue, and as part of their contribution to international cooperation with developing countries. These groups have numerous collaborations in East and South East Asia as well as in Africa, and in recent years they have participated in the Rockefeller Foundation Rice Biotechnology Network. When it was decided at the international level to start organising an International Rice Genome Sequencing Project, we felt that our country should participate in such a project. In the absence of any short-term plan for a European initiative, and because a new large-scale national sequencing facility was established at Genoscope in Evry, near Paris, we proposed to sequence rice chromosome 12 and successfully applied for a first-phase project in collaboration with Genoscope. There are several reasons for choosing this chromosome: the first is that previous mapping work was carried out on this chromosome in Montpellier, where our ORSTOM colleagues have mapped a series of disease and pest resistance genes, including resistance to Rice Yellow Mottle Virus and Magnaporthe grisea, on the mapping population IR 64 X Azucena. The second reason is that this chromosome is homologous to part of wheat chromosome 5, which is of great interest for French wheat breeders. The first phase of the project is to rapidly establish one or two Nipponbare BAC contigs of approximately one Mbp, with minimum overlap between clones, so that sequencing can start at Genoscope as soon as possible, most likely in July or August. During this phase of the project essentially four labs will be involved, with coordination in Perpignan: for physical mapping most of the work will be carried out in Perpignan (Michel Delseny) and in the IRD lab (Alain Ghesquière) in Montpellier. 
BAC libraries are maintained at CIRAD (Jean Christophe Glaszmann), in the building next to IRD in Montpellier. Most of the sequencing will be done in Paris by Genoscope. Due to the availability of a short-term training fellowship we could start the project by mid-November, before receiving any funds, after collecting filters from Rod Wing at the Miami meeting and collecting a number of probes already available in our different labs. Most of them were previously obtained by ORSTOM from the RGP and Cornell programmes. We are in the process of expanding this collection of chromosome 12-specific probes by requesting additional ones from both groups. The strategy used is rather straightforward: we hybridize chromosome 12 probes on the high density filters given by Rod Wing. The positive clones are evaluated by HindIII fingerprinting to build up contigs. All the probes are sequenced, at least in part, to check their identity when it is known or to compare with the EST and BAC end databases when the probe is a genomic clone which has not yet been sequenced. So far 6 probes have been used and a few BACs have been collected for establishing the fingerprint technique. Two of the probes detected overlapping BACs. We also sequenced a few of the ends in order to use them as probes to extend this first contig. One of the ends corresponds to a repetitive sequence which is already present several times in the CUGI database, illustrating the difficulties which might arise in a complete shotgun sequencing strategy. Further controls will be made in the future to verify collinearity between BAC contigs and the corresponding genomic DNA. We should receive some money to hire two people sometime in January or February so that the project can accelerate in March. 
This initial project, which is approved for 1999, will be followed by a second phase, which should start at the end of 1999 or in early 2000. It will consist of more massive sequencing, once we have selected a large number of entry points in the physical map and most of the BAC end sequencing has been completed in Clemson. For this second phase to go ahead, new proposals must be submitted to and approved by the Genoscope Scientific Council by the end of 1999. An important point is that by that time most of the capacity currently devoted to the Arabidopsis programme at Genoscope can be shifted to rice. We hope that this first commitment to sequencing the rice genome will be confirmed and that our country will be able to contribute to the progress of the Rice Genome Sequencing Initiative. After 1999 we anticipate that most of the sequencing work will be carried out in Evry and that our task will shift towards annotation, characterisation of cognate cDNAs, and a return to biological problems with genomic tools at hand. Genoscope, like other large-scale sequencing centres, applies the Bermuda rules, which call for immediate release of pre-finished sequences, so that most of the sequence is immediately available on-line to the community. While we have been setting up the Nipponbare project, a new BAC library has been prepared from Azucena by our colleagues from CIRAD and INRA in collaboration with Rod Wing in Clemson; this will complement the existing IR 64 library at IRRI.

Michel Delseny
Laboratory of Plant Physiology and Molecular Biology
UMR 5545 CNRS / University of Perpignan
66860 Perpignan cedex, France
e-mail: delseny@univ-perp.fr

Status of the Rice Sequencing Project in Taiwan

As indicated earlier, we are interested in collaborating closely with laboratories involved in the International Rice Genome Sequencing Project. Taiwan will join the project, and we propose to sequence chromosome 5. One central laboratory in the Institute of Botany, Academia Sinica, will conduct pilot sequencing work for about six months, starting in early spring 1999. Four PIs will participate in the program: one molecular biologist, one plant physiologist, one biochemist, and one geneticist. The National Science Council will provide funding for the pilot work. If the pilot work is satisfactory, further funding may be awarded. There is currently a program on structural and functional genomics of human hepatoma in our country, and the rice and human groups have agreed to share experience and technology concerning genome sequencing. The lab's web site is now under construction, and the temporary URL is http://biometrics.sinica.edu.tw/genome. This lab agrees to the immediate release of sequences upon completion. Please contact Yue-ie Hsing at bohsing@ccvax.sinica.edu.tw with any further questions.

Yue-ie Hsing
Institute of Botany
Academia Sinica
Taipei, Taiwan, ROC
e-mail: bohsing@ccvax.sinica.edu.tw
Copyright (C) The International Rice Genome Sequencing Project (IRGSP). 2005 All rights reserved.