Minutes of the Interim Working Group Meeting
Phuket, Thailand
September 20, 1999
A one-day meeting of the International Rice Genome Sequencing Project was held in conjunction with the Rockefeller Foundation's International Program on Rice Biotechnology (IPRB) Meeting. The sequencing meeting was attended by about 60 scientists. The major reason for holding the meeting in Phuket was to publicize the Sequencing Project to the general community of rice researchers. We described the Project at an IPRB workshop and to the Asian Rice Biotechnology Network, and a number of our members gave formal presentations during the IPRB meeting. Virtually all of the participants in the sequencing meeting attended the IPRB meeting, and this was an opportunity for those members who do not normally work on rice to become acquainted with current rice research and with other rice scientists.
Meeting Highlights
As of September 7, 1.26 Mb of Nipponbare contiguous genomic sequence had been submitted to public databases. Additionally there were 49,424 BAC-end sequences.
The sequence of the ends of 92,000 BACs should be completed by the end of the year.
The use of mapped DNA markers and ESTs will permit the coverage of 30% of the genome by PAC contigs by February, 2000.
It is important to keep nucleation points to a minimum on each chromosome. The nucleation points must be part of a contig.
A variety of methods, including verification of marker restriction fragment size, fingerprinting, use of multiple markers, and end sequences, are required for BAC validation prior to sequencing. Genetic mapping is required to confirm chromosomal assignments.
The IRGSP agreed to a minimum uniform standard for annotation.
We learned that positive funding decisions for both Thailand's and India's participation in the project are imminent.
Tomoya Baba (RGP, Japan): Rice Physical Map
Tomoya reviewed the resources and accomplishments of the RGP physical mapping effort. 2275 DNA markers have been genetically mapped. They have constructed a YAC physical map that covers 270 Mb or 63% of the genome, upon which 2600 ESTs have been mapped. They constructed a Sau3AI partial PAC library of 71,000 clones with an average insert size of 112 kb.
Their current goal is to supplement the physical map with PACs. The current method is to use DNA probing or PCR to map the PACs with mapped DNA markers or with ESTs. They will employ 4,000 markers for the entire genome. Currently, they have completed chromosomes 1 and 6 and expect to complete the genome by February, 2000. The overlapping PAC contigs are confirmed with fingerprinting and end sequencing. For chromosomes 1 and 6 they are achieving 30% coverage of the genome. They have confirmed the physical mapping strategy with sequencing data for ten overlapping PACs. There are some exceptions, but the estimates of overlaps are generally correct.
The next step to cover the remaining 60% of the genome, exclusive of the telomere and centromere repeated regions, is to employ PAC end walking.
Tomoya pointed out that they could also complement their PACs with the CUGI BACs. The mapped PAC contigs can be sequenced as nucleation points for gene rich regions. Fingerprinted and end-sequenced BACs could then be used to provide minimal overlap tiling paths.
Takuji Sasaki (Japan): Genome Sequencing at the RGP
Current sequencing capacity at the RGP is ten ABI377s and six ABI3700s. They confirm their finished sequence with restriction mapping and PCR of genomic DNA. Takuji emphasized the importance of genetic mapping to confirm chromosomal location.
Currently, they have finished, annotated, and submitted four chromosome 6 PACs. So far they have observed a gene density of one gene per 5 kb. Takuji pointed out the high frequency of repeats greater than 100 bp.
Rod Wing (USA): BAC Libraries at CUGI
Rod described two Nipponbare BAC libraries his lab has prepared. There are about 37,000 members of a HindIII library in pBeloBAC11 with an average insert size of 128 kb. The second library, recently finished, employed EcoRI in the pBACIndigo vector and has 55,000 members with an average insert size of 121 kb. Rod estimates a 2.9% contamination with chloroplast DNA and a 0.6% contamination with mitochondrial DNA. He expects that they will focus on the HindIII library for sequencing.
They are currently end-sequencing the BACs with the goal of completing the ends of all 92,000 BACs, and expect to take three to four months to finish. Their current results are:
| Library | Attempted | Succeeded | Avg. High-Quality Sequence Length |
| HindIII | 77,407 | 60,841 | 353 bp |
| EcoRI | 19,531 | 13,661 | 313 bp |
Rod estimates that this work comprises 26.5 Mb of high-quality sequence. Of 60,000 ends, 37.5% had positive BLAST hits (probability values < e-7).
Rod also reported fingerprinting the HindIII library. A set of 33,300 BACs was assembled into 1431 contigs and 1241 singletons using a stringency of 10-12. These data will be released in the same manner as the Arabidopsis fingerprinting database maintained at Washington University. Rod said that CUGI planned to share raw fingerprinting data with TIGR and Genoscope.
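The figures in the table can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes that the total high-quality sequence is simply the number of successful reads multiplied by the average high-quality length, which may not be how the 26.5 Mb estimate was derived.

```python
# Illustrative cross-check of the end-sequencing table (names are hypothetical).
# Assumption: total high-quality sequence = successful reads x average length.

LIBRARIES = {
    # library: (attempted reads, successful reads, avg. high-quality length in bp)
    "HindIII": (77_407, 60_841, 353),
    "EcoRI": (19_531, 13_661, 313),
}


def success_rate(attempted: int, succeeded: int) -> float:
    """Fraction of attempted end reads that succeeded."""
    return succeeded / attempted


def total_quality_bp(libraries: dict) -> int:
    """Sum over libraries of successful reads times average length, in bp."""
    return sum(ok * avg_len for _, ok, avg_len in libraries.values())


if __name__ == "__main__":
    for name, (attempted, ok, avg_len) in LIBRARIES.items():
        print(f"{name}: {success_rate(attempted, ok):.1%} of attempted reads succeeded")
    print(f"Total: {total_quality_bp(LIBRARIES) / 1e6:.1f} Mb")
```

Under this assumption the product comes to roughly 25.8 Mb, somewhat below the 26.5 Mb quoted, so the reported estimate was presumably derived from slightly different numbers.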
Rod spoke at length about validating BACs before they are sequenced. The identifying marker is sequenced to confirm its identity. BACs with positive hits are subjected to Southern blotting to confirm that the positive restriction fragment is the same size as the mapped genomic fragment. The fingerprint is compared to the fingerprints of overlapping clones to verify the absence of insertions or deletions. The insert size measured by CHEF gel electrophoresis is used to confirm the size based on fingerprinting.
Rod described CCW, a consortium formed with CUGI, Cold Spring Harbor Laboratory, and Washington University, to sequence a portion of chromosome 10.
Jo Messing (USA): Sequencing at the Rutgers Plant Genome Institute
Jo described and illustrated four methods for verifying BAC location:
Multiple markers
Fingerprinting
Restriction fragment size
End sequences
He demonstrated how he had constructed a contig of six BACs with increasingly minimal but verifiable overlap. Three of the BACs comprise a contig of 453,624 bp near the centromere on chromosome 10. It is in a region of no detectable (< 0.3 cM) recombination. GenScan indicates 38 genes, or one gene per 12 kb. Jo pointed out that this level of gene density, together with the five retroviral elements and the five transposons, was not enough to account for the observed reduction in recombination. He believes that there must be some other structural feature that accounts for the lack of recombination. He also pointed out that DNA sequence and physical mapping could resolve the order, spacing, and orientation of DNA markers that had hitherto been unresolved by genetic mapping.
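The gene-density figure follows directly from the contig length and the number of predicted genes. A minimal sketch of that arithmetic (the function name is hypothetical and not part of any tool mentioned at the meeting):

```python
# Illustrative check of the gene-density figure (function name is hypothetical).

def kb_per_gene(contig_bp: int, predicted_genes: int) -> float:
    """Average contig length per predicted gene, in kilobases."""
    return contig_bp / predicted_genes / 1000


if __name__ == "__main__":
    # 453,624 bp contig with 38 GenScan-predicted genes, as reported by Jo.
    print(f"{kb_per_gene(453_624, 38):.1f} kb per gene")
```

This gives about 11.9 kb per gene, consistent with the rounded figure of one gene per 12 kb.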
Moo Young Eun (RDA): Sequencing in Korea
Moo Young's group has received a Nipponbare BAC library (HindIII) from Rod Wing and has a sequencing capacity of four ABI377s. His group has chosen to work in two regions of chromosome 1: on the short arm around RLG8 and on the long arm around sd-1. He reported that four BACs are nearly completed, but that they are observing a problem of loss of sequencing signal following GC repeats in either direction.
The-Yuan Chow (Academia Sinica): Sequencing in Taiwan
The-Yuan reported that his group has received a contig of 14 PACs for chromosome 5 from the RGP. Sequencing of the first PAC is nearly complete. The-Yuan also observed the same loss of signal after GC repeats as reported by Moo Young Eun. He also pointed out that they have sometimes encountered deletions in PACs grown with 5% sucrose, but not in those grown in LB.
Apichart Vanavichit (Kasetsart University): Sequencing in Thailand
Apichart reported developing ten mapped markers on chromosome 9 around an important QTL for submergence tolerance. His group developed physical maps of the region and identified a PAC that contains the flanking markers. The project is three months old and preliminary sequencing at the RGP and in Thailand has begun on the target PAC.
Apichart pointed out that internet traffic makes it prohibitive for his group to use the NCBI for homology searching, so they must mirror the GenBank database locally.
Robin Buell (USA): Sequencing at TIGR
Robin described the TIGR operation and their sequencing protocols. She especially emphasized their recently updated Rice Gene Index and described its use. The Rice Gene Index is free of charge to academic laboratories (www.tigr.org). She also illustrated her talk with the methods they used for Arabidopsis sequencing and the sequencing of a random 180 kb insert Nipponbare BAC from CUGI.
Mike Gale (United Kingdom): Rice sequencing at the John Innes Centre
Mike described a 340 kb contig from chromosome 2 assembled by Ian Bancroft from a variety of sources and sequenced by Renato Tarchini. The region was chosen because of apparent homology with a region on chromosome 4 of Arabidopsis. Mike pointed out that only five genes interspersed among 40 preserved synteny.
Discussion of Preparation of Sequence-Ready Contigs
Jo Messing led a discussion on the production of minimum tiling paths. In his introductory remarks, he made the following points:
1) Nucleation points must be reliably anchored to the map.
2) It will require close coordination to manage nucleation by different groups on the same chromosome.
3) A web page with a graphic summary should be developed that would show the nucleation points of each group and the progress made.
4) Released sequences should show identity of sequenced mapped markers.
Dick McCombie (Cold Spring Harbor) made the point that sequencing from a variety of libraries is not a problem as long as one is well characterized.
Robin Buell suggested limiting nucleation points so that you can afford to close the region you have staked out in the time for which you have funding. For example, they used eight seed BACs for 19 Mb on chromosome 2 of Arabidopsis. Don't go beyond your capacity. Otherwise, there may be a mess for the next group to clean up. She also suggested that you go straight to the genome to fill gaps rather than relying on YACs, which may not have complete fidelity. Alternatively, it would be very useful to have a lambda library to rely on in place of doing long-distance PCR. Most gaps are relatively small, less than 12 kb.
Dick McCombie cautioned that seed BACs must be parts of contigs.
Melissa de la Bastide (Cold Spring Harbor, USA): Finishing Tutorial
Melissa gave a detailed overview of the finishing process that included a description of the current protocols being used at Cold Spring Harbor Laboratory. Much of this information can be found on the Web at http://nucleus.cshl.org/genseq/Protocol%20Index.htm.
Maria-Ines Benito (TIGR, USA): Annotation Tutorial
Maria-Ines gave a detailed description of the annotation procedures, especially for Arabidopsis, that are being used at TIGR. She said that a trained annotator could finish two BACs a week. The practice is broken into the following steps:
1) Homology searches and prediction searches.
Among the prediction tools TIGR uses are GenScan, GeneMark, and PlantNetGene for intron prediction. (Splice Predictor is designed to do the same task for maize.) Repeat Masker is used for simple repeats and Printrepeats is used for long repeats. TIGR will develop a training set of rice sequences to be used for retraining prediction programs.
2) The Annotator program is used to graphically display the prediction results.
3) Human editing
4) Assignment of putative function.
Annotation Standards:
Maria-Ines Benito led the group in adopting the following Annotation Standards:
1) In addition to running sequence similarity searches, all groups agree to use, at a minimum, GenScan and Gene Finder. GenScan has been trained for rice at the RGP, GeneMark is currently being evaluated, and it is anticipated that Grail will be trained for rice.
2) All groups agree on a standard nomenclature for predicted proteins:
Sequences with 100% identity at the amino acid level to known proteins will receive the same name.
Sequences with less than 100% identity but with stringent homology to known proteins will be called "putative" proteins of the same name. (The alternative, "same protein-like," will be readdressed in Tsukuba.)
Sequences with homology to unknown ESTs will be called "unknown protein." The EST hit will be included in a note.
Sequences predicted by at least two gene prediction programs with no homology to an EST will be called "hypothetical protein." The gene prediction programs will be included in a note.
3) Coordinates of predicted proteins or other recognizable features such as repeated sequences or transposable elements will be deposited in the databases.
4) A repeated sequence database will be created at TIGR. All groups are urged to submit their discoveries.
5) An annotator e-mail group will be started. Annotators should submit e-mail addresses to Maria-Ines Benito at mbenito@tigr.org.
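The nomenclature rules in item 2 amount to a short decision procedure. The sketch below is one possible reading of them, not an agreed implementation: the `Evidence` record, its field names, and the `assign_name` function are all hypothetical, and because the standard gives no numeric cutoff for "stringent homology," that judgment is left as a flag set by the annotator.

```python
# Illustrative sketch of the naming rules (all names and fields are hypothetical).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Evidence:
    known_protein: Optional[str] = None  # name of the best known-protein match
    identity_pct: float = 0.0            # amino-acid identity to that match
    stringent_homology: bool = False     # annotator's call; no cutoff is specified
    est_hit: Optional[str] = None        # identifier of a matching EST, if any
    predictors: List[str] = field(default_factory=list)  # programs predicting the gene


def assign_name(ev: Evidence) -> Tuple[Optional[str], Optional[str]]:
    """Return (protein name, note) following the order of the rules in item 2."""
    if ev.known_protein and ev.identity_pct >= 100.0:
        return ev.known_protein, None
    if ev.known_protein and ev.stringent_homology:
        return f"putative {ev.known_protein}", None
    if ev.est_hit:
        # The EST hit is recorded in a note.
        return "unknown protein", f"EST hit: {ev.est_hit}"
    if len(ev.predictors) >= 2:
        # The supporting gene prediction programs are recorded in a note.
        return "hypothetical protein", "predicted by: " + ", ".join(ev.predictors)
    return None, None  # case not covered by the standard


if __name__ == "__main__":
    example = Evidence(predictors=["GenScan", "GeneMark"])
    print(assign_name(example))
```

The rules are checked in the order they are listed, so a sequence with both a known-protein match and an EST hit takes the protein-based name.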
Rob Martienssen (Cold Spring Harbor, USA): Arabidopsis as a model for Rice Sequencing
Rob, the current chairman of the Arabidopsis Genome Initiative, told the group that the Arabidopsis sequencing project was a model for the rice sequencing project. He followed with more general remarks about lessons from the Arabidopsis project that he felt were relevant to rice. He emphasized the necessity of a deep map before sequencing can begin.
RGP Plans:
MAFF has requested funds to double the sequencing budget of the RGP which would permit the completion of the International Rice Genome Sequencing Project in 2004.
US Plans:
The USDA/NSF/DOE awarded $12 million (3 years) to TIGR and CCW for rice genome sequencing. TIGR and CCW will have a combined capacity of 50 Mb. They will work together to generate a minimum tiling path of chromosome 10 in six months. CCW and TIGR will commence sequencing on the top and bottom arms, respectively, of chromosome 10 in October, 1999. For the first six months, TIGR and CCW will provide information on their sequencing progress on their own web sites. Within six months, a central web site for the US Rice Genome Sequencing Project will be developed where TIGR and CCW rice data will be displayed. TIGR and CCW will have the same standards for annotation. TIGR and CCW expect to complete sequencing of chromosome 10 by March 2001.
Assurances that there were no links between TIGR and Celera were relayed to the group.
Jo Messing is not funded by this award but will continue to sequence near the centromere of chromosome 10 in coordination with TIGR and CCW. Messing will remain a member of the International Rice Genome Sequencing Project and is committed to sequence 1 Mb/year.
Plans of new groups:
Michel Delseny and Marcel Salanoubat (France):
Michel talked about preliminary steps that his group has taken to begin sequencing chromosome 12. The operation will shortly move from Perpignan to Genoscope (Evry). Marcel Salanoubat described the sequencing capacity and operation of Genoscope and explained that they were prepared to begin sequencing on chromosome 12 pending the authorization of their scientific committee.
Villoo Morwalla-Patel (India):
Villoo described the initial steps her group, Auesthagen Graine Technologies, in Bangalore has taken in preparation for sequencing chromosome 8 while funding is pending. These steps involve training, assembly of resources, and an aggressive bioinformatics initiative to adopt CGI tools into a workbench for sequencing rice.
Sally Leong (University of Wisconsin, USA):
Sally Leong described the faculty group assembled by Fred Blattner at the University of Wisconsin, which would permit them to initiate a rice genome project. The basis for their project is a rice genome optical map being constructed by David Schwartz. The group would like to sequence up to 25% of chromosome 11, a chromosome not yet claimed, in a few contiguous sequences with nucleation points that are likely to contain disease resistance genes. They have begun sequencing a few BACs presumed to be from chromosome 11 as well as a centromeric BAC selected by Jiming Jiang.
Increasing Coordination:
The following means of increasing coordination among the international groups were discussed:
1) Mechanisms of material exchange.
Takuji Sasaki said that sequencing groups could request probes and individual YACs or PACs. They did not currently have the resources to distribute libraries. However, they have discussed the possibility that these could be distributed by the clone center at Clemson or by some other national group such as the John Innes Centre.
Rod Wing repeated that his BAC libraries were available for distribution and listed the laboratories that had received them.
2) INE and the IRGSP web site.
Takuji Sasaki briefly described INE, the RGP central database for rice genome information. A unique web URL is planned and will be implemented shortly. Mirror sites are envisioned. The RGP is planning to request all of the sequencing groups to maintain a uniform format for data display to facilitate data transfer to INE.
3) Monthly conference calls between PIs are expected to begin shortly.
4) The possibility of developing consortia to purchase supplies for the smaller projects was brought up.
The next Working Group Meeting:
The next regular Working Group meeting will be held on February 8, 2000, in Tsukuba, Japan. This will be followed by a database workshop on February 9 and the Rice Genome Forum on February 10. Detailed information including lodging will soon be announced.
16 October 1999, B. Burr and T. Sasaki