WORKING GROUP MEETING
1) Mechanism for assigning and claiming regions to sequence.
Claims will be based on capacity and funding and will be for discrete regions with specified boundaries. The requests will be coordinated with the group or groups sequencing the chromosome. Progress, based on public database submissions, will be monitored at each of our meetings.
As it is important to tackle chromosome 9 as soon as possible, Japan will claim this chromosome in the absence of any other claims.
2) Goal for completing phase 2.
Phase 2 will be completed by the end of 2002.
3) Enforcement of the 1 Mb rule for membership.
Groups not meeting the 1 Mb rule will have non-voting status in the Working Group.
4) More meetings?
The working group agreed to three meetings a year: the annual sequencing workshop in Tsukuba in February, a workshop in May or June, and an interim meeting in September or October.
5) Syngenta sequence.
Takuji Sasaki has had no reply from Adrian Dubock to his questions on the proposal Dubock sent prior to the June 8 meeting in Tsukuba.
WORKSHOP
The meeting was held in a round table format.
Sequencing Progress and Plans
Robin Buell, TIGR
A total of 26 Mb (199 BACs) is in production, of which 11.12 Mb is for chromosome 10. On 10L all BACs are in closure/finishing/annotation/complete. Robin anticipates having complete sequence from their border with PGIR (~41 cM) to the telomere with the exception of a 30 kbp gap. They have sequenced to within 40 kbp of the telomere. TIGR is also sequencing a portion of 10S and all BACs are in production.
On chromosome 3L TIGR's allocation is 101 cM, about 27.7 Mb. Of this, 15.11 Mb (114 BACs) is in production. Nineteen gaps covering 7-9 Mb remain.
TIGR is preparing to sequence in a region of 46 cM (12.6 Mb) on chromosome 11. About 77 BACs (10.8 Mb) have been identified for production.
Bin Han, China
20 Mb of Nipponbare sequence on chromosome 4 has been completed to phase 1 or 2. 93% of the chromosome is covered by the physical map constructed from CUGI and Monsanto BACs. Bin showed a dramatic chromosome 4 spread from Jiming Jiang that indicates that the short arm is very heterochromatic and likely to be difficult to sequence.
Yue-ie Hsing, Taiwan
5 Mb (39 BACs) has been submitted to GenBank. Taiwan expects to finish phase 2 sequencing of chromosome 5 by next September.
Ho-Il Kim, Korea
In the chromosomal region between 68.2 cM and 77.7 cM, covering about 1.7 Mb on chromosome 9, Ho-Il's group sequenced and submitted 3 BACs. They further selected 14 BAC contigs for sequencing. Shotgun libraries are being constructed and will be sequenced before the 2002 Tsukuba meeting. Their next target area is between 93.0 cM and 94.4 cM.
Jo Messing, PGIR, Rutgers
Jo presented the following helpful summary of claims on chromosome 11:
PGIR has completed 7 BACs, about 1 Mb, in the two contigs they are working on. Ten additional BACs, about 1.2 Mb are in production.
Jo also showed some interesting comparative sequence results for rice, maize and sorghum on chromosome 11. One interesting point is that rice does not consistently have the smallest genome. In one region, a number of genes not present in the other two genomes were inserted into the rice sequence.
Francis Quetier, Genoscope, France
A physical map of chromosome 12 has been constructed and 60 BACs are in production. Genoscope expects to be finished with the chromosome by the end of 2002.
Francis reported that the primary bottlenecks are lack of access to the raw fingerprint data and the fact that some available FPC contigs appear to link clones that map to different chromosomes.
Takashi Matsumoto, RGP
Takashi reported that a total of 121 Mb sequence on chromosomes 1, 2, 6, 7, and 8 has been submitted.
The RGP has completed 80% of chromosome 1 to at least phase 2 quality. 55% of the chromosome has been finished and the RGP continues to finish BACs at the rate of 20 a month.
The RGP has doubled production by using robotics:
11520 shotgun templates are sequenced a day
6 BACs or PACs are sequenced with 10X coverage a day
Automated informatics is used to classify clones sequenced to 10X coverage. 85% of clones sequenced to 10X by the RGP have phase 2 quality, while 80% of clones sequenced 5X by Monsanto and 5X by the RGP have phase 2 quality. 80% of the clones classified as phase 2 by automation agree with manual classification.
Rod Wing, CCW
Rod reported that the CCW consortium has submitted 12.16 Mb of sequence. On chromosome 10S, 7.88 Mb are finished or in phase 3 and 2.21 Mb are in phase 2. A total of 99 BACs cover the 10S region, 82 sequenced by CCW and 17 by TIGR, covering 14.53 Mb. CCW will have all of their 82 BACs in phase 3 by the end of November. This includes 6 Monsanto BACs.
On chromosome 3S, CCW has completed 3 BACs. Fourteen BACs are in phase 1 finishing (about 2.01 Mb) and 27 BACs are in production sequencing (about 3.57 Mb). Fifteen BACs are selected and in library construction. The BACs chosen on 3S include 20 from Monsanto.
Rod reported that two deep-coverage 10 kb insert libraries had been constructed for the purpose of filling gaps. A Hae III library has 4.6 genome equivalents and a Sau3AI library has 4.2 genome equivalents. Sequence-specific probing with overgos gives about 12 hits. Rod illustrated the use of the libraries for gap filling.
Akhilesh Tyagi, India
Akhilesh reported that eleven contigs of BACs/PACs cover about 85% of India's allotment (56.9-109.3 cM) of chromosome 11. Sequencing of 25 clones is in production and about 2 Mb will be submitted by the end of 2001.
Sally Leong, Wisconsin
Sally reported that ten chromosome 11 BACs have been sequenced in Wisconsin. The quality is unknown. The trace files are expected to be made available to other groups sequencing on this chromosome.
Summary and Projection
| Chromosome | Total (Mb) | Completed (phase 1 or 2) (Mb) | Remain (Mb) | Expect By February (Mb) | Remain After February (Mb) | Expect to Complete by 12/02 | Deficit (Mb) |
| 1 | 52 | 42 | 1 | 1 | yes | ||
| 2 | 44 | 18 | 26 | 6 | 20 | yes | |
| 3 | 46 | 21 | 25 | 7 | 18 | no | 12 |
| 4 | 36 | 23 | 13 | 13 | yes | ||
| 5 | 34 | 5 | 29 | 8 | 21 | yes | |
| 6 | 35 | 20 | 15 | 6 | 9 | yes | |
| 7 | 33 | 15 | 18 | 6 | 12 | yes | |
| 8 | 34 | 10 | 24 | 6 | 18 | yes | |
| 9 | 26 | 1.5 | 24.5 | 24.5 | yes | ||
| 10 | 23 | 24.5 | 1 | 1 | yes | ||
| 11 | 34 | 3 | 31 | 4.5 | 27.5 | no | 2.5 |
| 12 | 30 | 1.2 | 28.8 | 8 | 20.8 | yes | |
| Totals | 427 | 184.2 | 236.3 | 65.5 | 147.3 | | 14.5 |
Bioinformatics and Annotation Reports
Katsumi Sakata, RGP
Katsumi demonstrated automated daily recovery of rice genome sequences from GenBank for automated annotation and posting in INE. Most of the sequences submitted by IRGSP members have been annotated in this manner, and the results can be viewed by going to http://rgp.dna.affrc.go.jp/cgi-bin/statusdb/seqcollab.pl and selecting the hypertext clone numbers. The listing of most individual clones has a highlighted "RiceGAAS" tag. The integration of all clones sequenced by IRGSP members into INE will be fully implemented by the end of 2001.
INE will be enhanced by the end of 2001 by the assembly of 6,500 EST markers and the addition of 900 RFLP markers. The total number of RFLP markers will then equal 3,200.
Katsumi compared RGP annotation with that done by TIGR. The RGP calls about 20% more genes and identifies fewer putative and more hypothetical sequences. In part, these differences are the result of the use of different EST databases, gene prediction programs, and gene modeling tools. However, it is also evident that the IRGSP Annotation Standards are being interpreted differently by the two groups.
Katsumi suggested additions (in red) to the IRGSP Annotation Standards with the aim of reducing these inconsistencies:
IRGSP standard nomenclature of predicted genes
Robin Buell, TIGR
Robin described annotation tools developed at TIGR. She illustrated in silico mapping of sequenced BACs. The Rice Repeat database has been updated and now includes over 4,500 sequences that can be assembled to less than 2.5 Mb. She described automated annotation for unfinished rice BACs. She also described a BLAST server for the whole rice genome, a forthcoming whole rice genome annotation database, and TOGA (TIGR Orthologous Gene Alignments), a utility for identification of related sequences.
All of these extensive resources can be accessed from http://www.tigr.org/tdb/e2k1/osa1/
Dick McCombie, Cold Spring Harbor
Dick described new gene prediction software written by Lance Palmer. The results are loaded into a graphical viewer. The software can process half a BAC per day with human intervention.
Dick also said that the Genome Browser developed at the University of California at Santa Cruz would be modified so that it could also be pointed at rice.
Discussion:
The group discussed the advisability of adopting specific gene calling software. Robin Buell stated that the most reliable package is FGENESH, but she and others expressed reservations about relying on a single package.
The group suggested additional language to be added to the IRGSP standards: "A minimum standard for gene prediction is FGENESH and the following databases ... , but other programs are recognized as being complementary."
Status of the Physical Map
Jianzhong Wu, RGP
The original sequence-ready map of chromosomes 1 and 6 was constructed in 1998 and has continued to be improved. Jianzhong's group began to construct sequence-ready physical maps of chromosomes 2, 7 and 8 this April. Using about 8000 genetic and EST marker sequences, Monsanto BAC clones were mapped to each of these three chromosomes and selected for sequencing. More than 30 Monsanto BACs were also mapped to the gap regions of the sequence-ready maps of chromosomes 1 and 6. About 1/3 of the RGP marker sequences were not found in the Monsanto BAC genomic sequences. These markers were used to screen PAC clones from the RGP PAC library by PCR. The screened clones were fingerprinted and analyzed by FPC software. PAC clones comprising the minimum tiling path were selected from each contig based on the clone overlap and marker positions within the contig and added to the sequence-ready maps of chromosomes 2, 7 and 8. The RGP is now using the CUGI BAC contigs for gap closure, based on STCs, as well as using end-walking to increase the chromosomal coverage.
| Chromosome | Number of Monsanto BACs | Total Length of Path (Mb)* | Coverage (%) | Added Number of PACs | Combined Length of Map (Mb) | Combined Coverage (%) | Gaps |
| 1 | | | | | 41.6 | 81.4 | 12 |
| 6 | | | | | 27.6 | 77.7 | >40 |
| 2 | 199 | 20.4 | 46 | 147 | 32 | 72.1 | >40 |
| 7 | 154 | 15.3 | 46 | 126 | 23.7 | 71.2 | >40 |
| 8 | 116 | 12.2 | 35.8 | 128 | 22.4 | 65.7 | >40 |
| Total | | | | | 147.3 | 74.2 | |
*Sequence redundancy of 12.9 to 16%
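The selection of a minimum tiling path described above can be illustrated with a small sketch. This is not the RGP's procedure: it assumes each clone in a contig has already been given start and end coordinates (hypothetical values in kb), and it uses a standard greedy interval cover, ignoring the marker positions that the RGP also took into account. The function name and data layout are illustrative only.

```python
from typing import List, Tuple

# (clone name, start in kb, end in kb) within a single contig; hypothetical layout
Clone = Tuple[str, int, int]


def minimum_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy interval cover: at each step pick, among the clones that start
    at or before the current covered end, the one reaching furthest right."""
    if not clones:
        return []
    # Sort by start; for equal starts put the longest clone first.
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    contig_end = max(c[2] for c in ordered)
    covered_end = ordered[0][1]
    path: List[Clone] = []
    i = 0
    while covered_end < contig_end:
        best = None
        # Scan every not-yet-seen clone that overlaps the covered region.
        while i < len(ordered) and ordered[i][1] <= covered_end:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_end:
            raise ValueError(f"gap in clone coverage at {covered_end} kb")
        path.append(best)
        covered_end = best[2]
    return path


if __name__ == "__main__":
    contig = [("a", 0, 150), ("b", 50, 200), ("c", 100, 300),
              ("d", 180, 320), ("e", 290, 450)]
    print([name for name, _, _ in minimum_tiling_path(contig)])  # ['a', 'c', 'e']
```

A gap in coverage raises an error instead of being silently bridged, which matches the need to track gaps separately, as in the Gaps column above.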
Cary Sonderland, Clemson, described new developments in FPC software.
BSS allows one to BLAST STCs against sequenced clones so that they may be placed on the physical map. The STCs can also be queried with markers. FSD performs simulated restriction enzyme digests to make FPC fingerprints.
Mingsheng Chen illustrated the use of the FSD utility to check genomic sequences.
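The idea behind a simulated digest can be sketched in a few lines. This is not FSD itself, only a minimal illustration: it cuts a linear sequence at every occurrence of a recognition site and reports the fragment lengths, which could then be compared with the bands of an experimental fingerprint. The choice of HindIII (A^AGCTT) and the function name are assumptions made for the example.

```python
import re
from typing import List


def simulated_digest(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Return the sorted fragment lengths produced by cutting a linear
    sequence at every occurrence of `site`. `cut_offset` is the number of
    bases from the start of the site to the cut position."""
    seq = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


if __name__ == "__main__":
    # Toy sequence with two HindIII sites (assumed enzyme: A^AGCTT, offset 1).
    toy = "CCAAGCTTGGGGAAGCTTTT"
    print(simulated_digest(toy, "AAGCTT", 1))  # [3, 7, 10]
```

Only the top strand of a linear molecule is handled here, which is sufficient for a palindromic site such as the one used in the example.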
Mingsheng Chen, Clemson, showed that recombination rate varied throughout the genome. One interesting observation was that rates are higher at the ends of chromosomes. He used local recombination rate to estimate the size of gaps. Mingsheng used a systematic analysis of the physical map to estimate the size of the genome and came up with an estimate of 370-390 Mb exclusive of centromeres. Mingsheng has kindly provided the following summary view of his work:
| Chromosome | Genetic Markers | Probes (markers included) | Contig Number | Predicted Chromosome Size (Mb) | Previously Estimated Size (Mb) | Size of Anchored Contigs (Mb) | Coverage(%) |
| 1 | 231 | 413 | 32 | 41.4 | 51.5 | 39.5 | 95.4 |
| 2 | 184 | 316 | 26 | 36.8 | 43.4 | 32.8 | 89 |
| 3 | 224 | 364 | 26 | 38 | 47.5 | 32.9 | 86.6 |
| 4 | 119 | 273 | 24 | 36.6 | 36.8 | 32.1 | 87.7 |
| 5 | 139 | 239 | 27 | 30.8 | 33.6 | 28.5 | 92.5 |
| 6 | 129 | 229 | 22 | 29.6 | 35.1 | 26 | 87.8 |
| 7 | 158 | 292 | 25 | 32.6 | 33.1 | 27.9 | 85.6 |
| 8 | 88 | 181 | 18 | 25.6 | 33.6 | 23.8 | 93 |
| 9 | 80 | 139 | 16 | 20 | 27 | 18.7 | 93.5 |
| 10 | 136 | 337 | 20 | 24.9 | 23.7 | 22.7 | 91.2 |
| 11 | 118 | 245 | 27 | 27.9 | 33.7 | 26.2 | 93.9 |
| 12 | 98 | 171 | 20 | 28.6 | 30.9 | 23.5 | 82 |
| Total | 1704 | 3199 | 283 | 372.8 | 430 | 334.6 | 89.8% |
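The use of local recombination rate to size a gap, mentioned before the table, can be written out as simple arithmetic. The sketch below is an assumption about the general approach rather than Mingsheng's actual procedure: the local rate is taken as the genetic distance divided by the physical length of an anchored flanking region, and the gap's genetic distance is divided by that rate. The function names and all numbers are hypothetical.

```python
def local_rate_cm_per_mb(cm_span: float, mb_span: float) -> float:
    """Recombination rate of an anchored region: genetic length / physical length."""
    if mb_span <= 0:
        raise ValueError("physical span must be positive")
    return cm_span / mb_span


def estimate_gap_mb(gap_cm: float, rate_cm_per_mb: float) -> float:
    """Physical size of a gap, assuming the local rate also holds inside the gap."""
    if rate_cm_per_mb <= 0:
        raise ValueError("recombination rate must be positive")
    return gap_cm / rate_cm_per_mb


if __name__ == "__main__":
    # Hypothetical flanking contigs: 8 cM spread over 2 Mb of anchored sequence.
    rate = local_rate_cm_per_mb(8.0, 2.0)   # 4.0 cM per Mb
    # Hypothetical gap whose flanking markers are 1 cM apart.
    print(estimate_gap_mb(1.0, rate))       # 0.25
```

Because rates are reported to be higher near chromosome ends, a single genome-wide rate would tend to overestimate gaps there, which is the motivation for using a local value.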
A version of WebFPC that integrates CUGI and Monsanto data is available at http://www.genome.clemson.edu/projects/rice/fpc/integration/ . This version will remain unchanged. The dynamic version of WebFPC at http://www.genome.clemson.edu/projects/rice/fpc/ will be periodically updated. The CUGI team has fingerprinted 4000-5000 more clones and added them to the updated FPC site. Contig numbers are different in the two sites and can be connected through clone names or marker names.
Rod Wing, Clemson, proposed a workshop to construct a minimum tiling path. Rod and Takuji Sasaki agreed to first integrate PAC fingerprints and STS data into the CUGI map prior to the workshop. Tentative plans have the workshop taking place at Clemson within the next two months, and representatives from all active groups are welcome to participate.
Planning for Completing Phase 2
When is a chromosome complete? Various suggestions were made for a standard for when to call a chromosome complete. One suggestion was to sequence until there are no more clones (or templates) to sequence. Dick McCombie said that the standard for Arabidopsis had been to sequence from the telomere to as deep as possible into the centromere. It was pointed out that there are no telomere-adjacent sequences for rice at the moment. A tentative agreement was reached on the suggestion that an arm would be called "complete" when a certain high percentage of markers known to map to that arm had been identified in the sequence. This percentage was provisionally set at 98%, but will be revised based on experience.
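The provisional completeness test can be expressed as a short check. The sketch below assumes that the markers known to map to an arm and the markers actually found in the submitted sequence are available as lists of marker names; the function name, the marker names, and the default threshold argument are illustrative, with 0.98 taken from the provisional figure above.

```python
from typing import Iterable, Tuple


def arm_is_complete(mapped_markers: Iterable[str],
                    markers_found: Iterable[str],
                    threshold: float = 0.98) -> Tuple[bool, float]:
    """Return (complete?, fraction found) for one chromosome arm.
    Only markers known to map to the arm count toward the fraction."""
    mapped = set(mapped_markers)
    if not mapped:
        raise ValueError("no markers are mapped to this arm")
    found = mapped & set(markers_found)
    fraction = len(found) / len(mapped)
    return fraction >= threshold, fraction


if __name__ == "__main__":
    arm_markers = [f"M{i}" for i in range(100)]   # hypothetical marker names
    print(arm_is_complete(arm_markers, arm_markers[:98]))  # (True, 0.98)
    print(arm_is_complete(arm_markers, arm_markers[:97]))  # (False, 0.97)
```

Keeping the threshold as a parameter reflects the agreement that the percentage will be revised based on experience.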
Quality Standards for phase 2 sequence.
Planning for Finishing
Dick McCombie, Cold Spring Harbor, reviewed proposals he made at the June 8 meeting. He emphasized that we should begin a BAC-by-BAC countdown to monitor progress. The number of BACs required to complete a region can be estimated from the minimum tiling path. He also repeated the idea that phase 2 sequence permits problem areas to be identified in advance, which in the end will speed up finishing.
Dick described preliminary experiments that would permit a sequencing laboratory to finish sequence that had previously been sequenced to phase 2 quality. The procedure he outlined uses software that identifies misassemblies, gaps, and low quality sequences in reassembled reads. Clones from a large insert library that span problem areas are picked and sequenced using transposed priming sites, and the reads are reassembled. Assemblies from two clones were tried. One was done manually and the handling of the second was automated. In the first case there were seven gaps and the procedure managed to fill four of them. In the second case, there were three gaps and two were closed. In all cases, failure to close gaps was the result of a lack of gap-spanning clones.
Next Meeting
The annual meeting of the IRGSP will take place at the Tsukuba International Conference Center, Tsukuba, Japan. There will be an all day public meeting on February 6, 2002 as part of the International Rice Genome Meeting 2002. This will be preceded by a half day meeting for P.I.'s on February 5 at the STAFF Institute.
There will be a meeting May-June, 2002 (no location proposed or decided) and a meeting in September-October at Genoscope.
Akhilesh Tyagi has proposed a meeting in New Delhi in 2003.
Participants
Robin Buell, TIGR
Ben Burr, Brookhaven
Mingsheng Chen, Clemson
Bin Han, China
Yue-ie Hsing, Taiwan
Jiming Jiang, TIGR
Ho-Il Kim, Korea
Sally Leong, Wisconsin
Takashi Matsumoto, RGP
Dick McCombie, Cold Spring Harbor
Jo Messing, PGIR, Rutgers
Francis Quetier, Genoscope
Akhilesh Tyagi, India
Katsumi Sakata, RGP
Takuji Sasaki, RGP
Cary Sonderland, Clemson
Rod Wing, Clemson
Jianzhong Wu, RGP
Kimiko Yamamoto, RGP
Other Attendees
Machi Dilworth, NSF
Ed Kaleikau, USDA
Kenzo Miyahara, MAFF
Judy Plesset, NSF