Monsanto Sequence and Physical Map Contribution to the IRGSP

Today, the Japanese Ministry of Agriculture, Forestry and Fisheries (MAFF) announced the contribution to the IRGSP of a working draft sequence of the rice genome, together with the associated files and physical mapping data.

The working draft sequence was produced in Dr. Lee Hood's laboratory at the University of Washington under contract to the Monsanto Company. Monsanto's interest in the sequence lies in gene discovery in the cereal grasses and in support of its crop R&D. Monsanto is sharing the materials with the IRGSP in support of greater openness in global agriculture. It expects the IRGSP to place all of this sequence in the public domain once it has been combined with new sequence information generated by the IRGSP.

The announcement of this contribution comes in the final phases of preparing the physical map and assembling the sequence data. Delivery of the sequence and the related materials to the IRGSP is expected between May and June.

The approach taken by Dr. Hood's laboratory is outlined in JC Roach et al., "Gaps in the Human Genome Project," Nature 401: 843-845 (Oct. 28, 1999). A partial HindIII digest BAC library with 80,000 members was made from Nipponbare DNA. These BACs were end-sequenced and fingerprinted with HindIII. The end sequences (sequence-tagged connectors, or STCs) were compared with mapped STSs to anchor the assembled BACs to the map. A key feature of this approach was the use of STCs to choose, as the next BACs to sequence, those with minimal overlap with BACs already sequenced. Shotgun libraries with 2-3 kb inserts in pGEM were prepared from individual BACs, most of which were sequenced to 5X coverage (~10 reads/kb). The amount of sequence is expected to grow to about 400 Mb, derived from about 3500 BACs. No finishing was done.
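
The STC-based choice of the next BAC can be illustrated with a small sketch. This is a hypothetical simplification, not the Hood laboratory's pipeline: it assumes the STC matches inside a sequenced "seed" BAC have already been found (for example by sequence alignment), and the names StcHit, pick_next_bac and min_overlap are invented for the example. Among candidate BACs whose end sequence falls inside the seed and whose remaining length extends past one end of it, the sketch picks the one that overlaps the seed least.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class StcHit(NamedTuple):
    """Hypothetical record of one BAC-end sequence (STC) matched inside a sequenced seed BAC."""
    bac_id: str
    position: int        # bp coordinate of the STC match within the seed sequence
    extends_right: bool  # True if the rest of the candidate BAC lies beyond the seed's right end


def pick_next_bac(
    seed_length: int,
    hits: Iterable[StcHit],
    direction: str = "right",
    min_overlap: int = 0,
) -> Tuple[Optional[str], Optional[int]]:
    """Return (bac_id, overlap_bp) for the candidate that overlaps the seed least.

    Candidates whose overlap is below min_overlap (a hypothetical safety margin) are
    skipped; returns (None, None) if no candidate extends in the requested direction.
    """
    if direction not in ("right", "left"):
        raise ValueError("direction must be 'right' or 'left'")
    best_id, best_overlap = None, None
    for hit in hits:
        if direction == "right" and hit.extends_right:
            overlap = seed_length - hit.position  # shared span: from the match to the seed's right end
        elif direction == "left" and not hit.extends_right:
            overlap = hit.position  # shared span: from the seed's left end to the match
        else:
            continue
        if overlap < min_overlap:
            continue
        if best_overlap is None or overlap < best_overlap:
            best_id, best_overlap = hit.bac_id, overlap
    return best_id, best_overlap


# Made-up example: a 160 kb seed BAC with four STC matches.
hits = [
    StcHit("a01", 150_000, True),
    StcHit("b07", 95_000, True),
    StcHit("c12", 2_000, False),
    StcHit("d03", 159_000, True),
]
print(pick_next_bac(160_000, hits, "right", min_overlap=2_000))  # ('a01', 10000)
print(pick_next_bac(160_000, hits, "left", min_overlap=2_000))   # ('c12', 2000)
```

Here "d03" is rejected to the right because its 1 kb overlap falls below the assumed 2 kb margin; a real selection would also check fingerprint consistency before committing a BAC to sequencing.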

What will be transferred?

1) Database information on the physical map.

2) The approximately 3500 sequenced BACs, but not their shotgun libraries.

Individual clones from the ~80,000-member HindIII library will be available on request.

3) BAC-end sequences (STCs). About 110,000 high-quality STCs and about 40,000 more of lower quality are available.

4) Trace files of the shotgun reads for each of the sequenced BACs.

5) Fingerprint files: GIF + txt files

6) A rice repeated sequence database.

The entire assembled, unannotated sequence will not be released as a whole. Instead, the sequence is expected to be reassembled region by region or chromosome by chromosome, as IRGSP members need it to make sequencing choices and decisions.

In addition to these materials, Monsanto will set up a Company server on which registered researchers from the public can run limited BLAST searches against the entire assembled, unannotated sequence.

IRGSP Registration Agreements:

Individual laboratories participating in the IRGSP will be asked to sign a registration agreement. Data from Monsanto are expected to remain confidential until they are combined with new sequence from IRGSP laboratories and submitted to a public database. Unmodified data cannot be submitted directly, but are expected to be released once additional sequencing has been done. For example, joining the contigs within a BAC and improving sequence quality would be an acceptable enhancement. In the simplest case, all data based on or including data from Monsanto are expected to be improved and published. Should an institution decide to file a patent on the Monsanto sequence before it is made public, Monsanto asks for the first opportunity to negotiate a nonexclusive, royalty-bearing license to the patent. There is no "reach-through" provision to the combined sequence once it has been made public.

(Users of the Company server will likewise have to sign a registration agreement with similar provisions. This service is a boon for members of the IRGSP because it will lessen the pressure on them to release confidential information in their possession.)

Value to the IRGSP:

The contribution offers substantial savings in the time and effort required to select sequence-ready BACs for production sequencing. The physical map information, combined with the rough draft sequence, shows what remains to be done on each chromosome arm and is in itself a useful planning tool. This material may reduce the total cost of the project by as much as 50% and may allow us to finish within three years once the data and revised strategies are in place.

The physical mapping information will be invaluable to the IRGSP: it will greatly speed the location of sequence-ready BACs and the filling of gaps. In practical terms, it is also needed to manage the data. BACs or contigs linked to the map by only a single marker will need to be RFLP mapped to verify their location. Integration of the physical maps is encouraged, provided that the integrated information is made publicly available. Once the maps are integrated, RGP PACs or Clemson BACs can serve as the substrate for subsequent sequencing and may also be the source of bridge clones.
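
The single-marker rule can be expressed as a simple filter over anchoring data. This sketch is illustrative only: it assumes anchors arrive as hypothetical (contig_id, marker_id) pairs, for example from STC-to-STS matches, and the function name and marker labels are invented. Contigs with no anchors at all do not appear in the input and so are not reported.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def contigs_needing_rflp(anchors: Iterable[Tuple[str, str]]) -> List[str]:
    """Return contigs anchored to the map by only one distinct marker (RFLP candidates)."""
    markers_per_contig = defaultdict(set)
    for contig_id, marker_id in anchors:
        markers_per_contig[contig_id].add(marker_id)  # duplicate hits to one marker count once
    return sorted(c for c, markers in markers_per_contig.items() if len(markers) < 2)


# Made-up example: ctg3 has two hits, but both are to the same marker.
anchors = [("ctg1", "C123"), ("ctg1", "R456"), ("ctg2", "C789"), ("ctg3", "R10"), ("ctg3", "R10")]
print(contigs_needing_rflp(anchors))  # ['ctg2', 'ctg3']
```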

As stated above, additional shotgun sequencing (perhaps an additional 2X coverage) will probably be required for all of the regions covered by the rough draft. Even so, the contribution greatly reduces the amount of production sequencing required.
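
The scale of an extra 2X pass can be estimated from the figures given earlier: 5X coverage at about 10 reads/kb implies reads averaging roughly 500 bp, and about 400 Mb spread over about 3500 BACs. The sketch below assumes an even spread over BACs and uses the standard Lander-Waterman (Poisson) model for the fraction of bases covered, which is an idealization swapped in here (it ignores repeats, cloning bias and read quality) rather than the project's own estimate.

```python
import math

# Figures from the text; spreading the sequence evenly over BACs is an assumption.
COVERAGE_DRAFT = 5               # draft BACs were sequenced to about 5X
READS_PER_KB_DRAFT = 10          # "~10 reads/kb" at 5X
TOTAL_SEQUENCE_BP = 400_000_000  # about 400 Mb
N_BACS = 3500


def implied_read_length(coverage: float, reads_per_kb: float) -> float:
    """Mean read length in bp, from coverage = reads_per_kb * read_length / 1000."""
    return coverage * 1000 / reads_per_kb


def extra_reads(extra_coverage: float, length_bp: int, read_length_bp: float) -> int:
    """Reads needed to add extra_coverage over length_bp using reads of read_length_bp."""
    return math.ceil(extra_coverage * length_bp / read_length_bp)


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by at least one read."""
    return 1 - math.exp(-coverage)


read_len = implied_read_length(COVERAGE_DRAFT, READS_PER_KB_DRAFT)  # 500.0 bp
mean_bac_bp = TOTAL_SEQUENCE_BP // N_BACS                           # 114285 bp
print(extra_reads(2, mean_bac_bp, read_len))        # extra reads per average BAC: 458
print(extra_reads(2, TOTAL_SEQUENCE_BP, read_len))  # extra reads overall: 1600000
print(round(fraction_covered(5), 4), round(fraction_covered(7), 4))  # 0.9933 0.9991
```

Under this idealized model, going from 5X to 7X cuts the expected uncovered fraction from about 0.67% to about 0.09%, which is why a modest extra pass can close many small gaps.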

This contribution should in no way be viewed as the end of our project. Our goal is to obtain a complete and accurate sequence of the genome. The Monsanto contribution complements our work, particularly the extensive genetic and physical mapping resources that have already been developed, and combining the two data sets has immediate value. Viewed in this light, the Monsanto contribution accelerates our activities, a point that we and Monsanto have been careful to make to funding agencies.

Takuji Sasaki and Ben Burr

April 4, 2000

Copyright (C) 2005 The International Rice Genome Sequencing Project (IRGSP). All rights reserved.