Summary of the Sequencing Workshop and Working Group Meeting, February 8, 2001 (Tsukuba, Japan)
Highlights:
- The RGP announced that chromosome 1 would be completed in two months. They also announced the mapping of 6,605 EST sites on the physical map.
- The useful incorporation of Monsanto data to fill gaps in tiling paths was demonstrated in three laboratories.
- Takuji Sasaki proposed a change in strategy that would achieve a 10X coverage of the rice genome in one year. This proposal was accepted by the Working Group.
Meeting:
Tomoya Baba (RGP) A sequence-ready physical map for the rice genome sequencing: Overview of chromosome 1
Abstract: A sequence-ready physical map of the rice genome is being constructed as the core of rice genome sequencing in RGP. We are mainly using a PAC-Sau3AI library for screening with STS and EST primers. In addition, we have developed a new BAC-MboI library which is effective for filling the gaps between contigs. The RGP BAC-ends and the CUGI BAC-end database (from BAC-HindIII and BAC-EcoRI libraries) are used for chromosome walking and the STC strategy, respectively. A total of 262 PAC and 94 BAC clones have been mapped using the primers of 216 DNA markers, 268 ESTs and 1595 PAC or BAC-ends. This corresponds to about 35 Mb (82%) of chromosome 1. Nine sequence-ready contigs are more than 1 Mb in length, including a 10.1 Mb contig which is the longest obtained so far. Linkage analysis using an F2 population of 500 plants has enabled us to confirm the orientation and position of three contigs in the centromeric region. Moreover, comparison of genetic and physical distance suggests that the entire size of rice chromosome 1 is shorter than the estimated 52 Mb. Our ultimate goal is to generate a single contig corresponding to the entire chromosome using these resources and strategies.
Gerard Barry (Monsanto) reviewed Monsanto sequence figures and resources. 3,391 BACs were sequenced to an average depth of 11.7 reads per kb. Altogether, 4.45 million reads were produced, with an average read length of 782 bp. The resulting assembly had 68,572 contigs covering 376 Mb. Based on sequence, overlapping BACs fell into 583 clusters covering 259 Mb.
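The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "reads per kb" refers to reads per kb of BAC insert, which the summary does not state explicitly.

```python
# Figures quoted in the summary for the Monsanto data set.
N_BACS = 3_391            # sequenced BACs
N_READS = 4.45e6          # total reads
READS_PER_KB = 11.7       # average depth, reads per kb (assumed: per kb of insert)
MEAN_READ_BP = 782        # average read length in bp


def implied_total_kb(n_reads, reads_per_kb):
    """Total length (kb) implied by the read count and the reads-per-kb depth."""
    return n_reads / reads_per_kb


def raw_fold_coverage(reads_per_kb, mean_read_bp):
    """Raw fold coverage: bases sequenced per base of template."""
    return reads_per_kb * mean_read_bp / 1000.0


total_kb = implied_total_kb(N_READS, READS_PER_KB)
mean_bac_kb = total_kb / N_BACS
coverage = raw_fold_coverage(READS_PER_KB, MEAN_READ_BP)

print(f"implied total length: {total_kb / 1000:.1f} Mb")
print(f"implied mean BAC insert: {mean_bac_kb:.1f} kb")
print(f"raw coverage: {coverage:.2f}X")
```

Under that assumption, the implied total length comes out at roughly 380 Mb, close to the 376 Mb reported for the assembly, with a mean insert of about 112 kb and raw coverage of about 9X.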
The sequenced BACs were anchored to the map through STSs. 1,432 BACs were anchored through 1,255 markers. An additional 1,711 BACs were anchored by assembly. Currently 2,825 BACs have been unambiguously assigned.
The assembled sequence has been posted to www.rice-research.org for use by public researchers, and around 400 researchers are currently registered and using the data. The assembled sequence was scanned for SSRs. 6,655 SSRs with 24 bp or more repeats along with 100 bp of flanking sequence have been deposited in GenBank. These are actively being mapped by a consortium of scientists in the USA, the Philippines, and China, and the results will be posted to a new Cornell University web site.
Rod Wing (CUGI) talked about the CCW organization and strategy. The Clemson BAC libraries and STCs were reviewed. 1,260 STCs contain SSRs with 20 bp or more of repeats. 86% of the BACs were fingerprinted and 82% of these are in 1,039 contigs. 472 of the contigs are anchored for a coverage of about 50% of the genome.
Rod fingerprinted 303 of the Monsanto BACs assigned to chromosomes 3 and 10. He showed the distribution of these on 3S and 10S relative to the genetic and physical map. On 3S, the Monsanto BACs permitted merging of a 1.5 Mb contig. Rod demonstrated how the combination of these libraries could produce a minimum rice tiling path.
Rod described the use of Gentinterpret for automated annotation.
Takashi Matsumoto (RGP) Progress of rice genome sequencing for chr. 1 and 6
Abstract: The Rice Genome sequencing project was initiated in 1998 with the aim of elucidating all biological information that constitutes the rice plant with nucleotide-level resolution. Over these three years, we have constructed and revised the systems for both high-throughput template production and sequence production. The core of our sequencing facilities, 20 capillary sequencers, can produce 5.8 Mb per day, which corresponds to 10x shotgun sequence for 3.8 PACs per day. Our current system, which is compatible with high-throughput capillary sequencing, will be presented. Finishing is the step that converts Phase II contigs to high-quality sequence devoid of gaps. Here, some structural features of the rice genome such as repeats, GC-biased sequences and secondary structures make it difficult to get reliable sequence. Basic strategies for overcoming those regions will be shown. We have chosen chromosome 1 as the first target. As of Feb. 1st, most of the 336 PAC/BAC clones on the physical map are in the sequencing pipeline. A total of 121 PACs and BACs (16.3 Mb in total) have been completed and their genomic sequences are already deposited in DDBJ. The HTG phase II sequence data, in which some gaps still remain but all of the assembled contigs are ordered and oriented, will be released in accordance with the new data release policy of IRGSP. Combining phase II sequence with completed sequence, 243 PAC/BAC clones are already sequenced and the total length is estimated as 25-30 Mb. The status of sequencing chromosomes 1 and 6 will be described in detail.
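The throughput figures in this abstract imply an average clone size, which can be worked out as a quick sanity check. This is a minimal sketch with hypothetical function and variable names; it assumes that the 5.8 Mb figure is the combined daily output of all 20 sequencers and that all of it goes to 10x shotgun coverage.

```python
# Throughput figures quoted in the abstract.
DAILY_OUTPUT_BP = 5.8e6   # bases produced per day (assumed: all 20 machines combined)
N_SEQUENCERS = 20
FOLD_COVERAGE = 10        # 10x shotgun coverage per clone
CLONES_PER_DAY = 3.8      # PACs covered per day


def implied_clone_size_bp(daily_output_bp, fold_coverage, clones_per_day):
    """Average clone size implied by daily output, coverage depth and clone rate."""
    return daily_output_bp / (fold_coverage * clones_per_day)


def output_per_sequencer_bp(daily_output_bp, n_sequencers):
    """Average daily output of a single sequencer."""
    return daily_output_bp / n_sequencers


clone_bp = implied_clone_size_bp(DAILY_OUTPUT_BP, FOLD_COVERAGE, CLONES_PER_DAY)
per_machine_bp = output_per_sequencer_bp(DAILY_OUTPUT_BP, N_SEQUENCERS)

print(f"implied average PAC size: {clone_bp / 1000:.0f} kb")
print(f"output per sequencer: {per_machine_bp / 1000:.0f} kb/day")
```

Under these assumptions the implied average PAC insert is a little over 150 kb, and each sequencer contributes about 290 kb per day.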
Robin Buell (TIGR) Progress of TIGR rice genome sequencing
Abstract: TIGR is participating in the International Rice Genome Sequencing Project and has been assigned chromosomes 3 and 10. We are collaborating with Clemson University/Cold Spring Harbor Laboratory/Washington University and the Plant Genome Initiative at Rutgers to complete these chromosomes. We currently have ~14 Mb of rice genomic DNA in our high throughput sequencing pipeline at TIGR (http://www.tigr.org/tdb/rice). All sequence is released to GenBank/DDBJ/EMBL to either the High Throughput Sequence (HTGS) or the PLANT division. A total of ~11.4 Mb has been deposited in GenBank. We have in production all clones for our allocation on chromosome 10 (lower arm) with the exception of two small gaps. All completed BACs are annotated for genes and this information can be accessed through GenBank and the TIGR web site (http://www.tigr.org/tdb/rice). We have begun providing automated annotation of our unfinished rice BACs, and this annotation is also available on the TIGR web site. We have evaluated five gene prediction programs for their accuracy in rice and results from these comparisons will be presented. We are experimentally verifying the hypothetical genes from our gene prediction programs to provide empirical data on the accuracy of current gene prediction programs for rice. We have further extended our bioinformatic analyses of rice and have identified putative orthologues of rice using the TIGR Orthologous Gene Alignments (http://www.tigr.org/tdb/toga/toga.shtml). We also have performed global alignments of rice genome sequences with all available plant transcripts to further identify orthologous genes. As with all other TIGR Rice information, these data can be accessed on the TIGR Rice web site at www.tigr.org/tdb/rice.
Jo Messing (Waksman Inst., Rutgers Univ.) The rice chromosome 10 initiative
Abstract: Rice (Oryza sativa L.) has been considered an ideal model system for the study of grasses based on its commercial value, relatively small genome size (~430 Mb), diploid origin (2X = 24), and close relationship to other important cereal crops. An international effort has been initiated to sequence the genome of rice. The Plant Genome Initiative at Rutgers (PGIR) was assigned to sequence 12 cM of the middle portion of rice chromosome 10, from positions 29.8 to 41.8. The Rice Genome Research Program (RGP) in Japan has provided PGIR with 17 markers that have been mapped to this region. RGP has also used these markers to map YAC clones to this region, which has facilitated the construction of a physical map of 2,228 kb. However, not all YAC clones overlap, leaving two gaps. This posed the question of whether these gaps can be resolved without additional markers. PGIR has now constructed a minimum tiling path of 21 BAC clones and one subclone of another BAC clone that comprise the entire 12 cM region. From this analysis it became clear that one gap was smaller than one BAC clone, but the other one required sequencing 6 overlapping BAC clones. The two mapped markers flanking this region are 922.7 kb apart. This gap contains numerous repetitive DNA elements and a low gene content, which explains the lack of markers and of coverage by anchored YAC clones. This is an indication that the current BAC libraries and the end sequence database can be used to form contiguous sequence information through regions even if they are missing genetic markers and are enriched with repetitive DNA sequences. All clones result in a total of 3,091,527 base pairs that overlap on average by 7.7%. The overall physical-to-genetic distance ratio in the region (237 kb/cM) is similar to that estimated for the whole genome (283 kb/cM). Local physical-to-genetic distance ratios vary considerably and are correlated to the presence of repetitive elements and genes.
Out of 523 genes, at least 190 have conserved homologs in the A. thaliana genome and/or are represented in the O. sativa EST database. Overall density is at least 1 gene/15 kb for non-transposon-related genes, with considerable local variation.
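The physical-to-genetic distance ratio quoted in this abstract can be approximately reproduced from the other numbers it gives. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the 7.7% average overlap is the fraction of the summed clone length that is redundant, which the abstract does not define precisely.

```python
# Figures quoted in the abstract for the chromosome 10 region.
TOTAL_CLONE_BP = 3_091_527   # summed length of all sequenced clones
MEAN_OVERLAP = 0.077         # average overlap (assumed: redundant fraction of the sum)
MAP_START_CM = 29.8          # genetic map positions bounding the region
MAP_END_CM = 41.8


def nonredundant_bp(total_bp, overlap_fraction):
    """Estimate unique sequence by removing the overlapping fraction."""
    return total_bp * (1.0 - overlap_fraction)


def kb_per_cm(physical_bp, start_cm, end_cm):
    """Physical-to-genetic distance ratio in kb per centimorgan."""
    return physical_bp / 1000.0 / (end_cm - start_cm)


unique_bp = nonredundant_bp(TOTAL_CLONE_BP, MEAN_OVERLAP)
ratio = kb_per_cm(unique_bp, MAP_START_CM, MAP_END_CM)

print(f"non-redundant sequence: {unique_bp / 1e6:.2f} Mb")
print(f"physical/genetic ratio: {ratio:.0f} kb/cM")
```

With this reading of the overlap figure, the estimate comes out at roughly 2.85 Mb of unique sequence over 12 cM, in line with the 237 kb/cM stated in the abstract.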
Bin Han (NCGR, Chinese Academy of Sciences) Rice genome chromosome 4 sequencing progress in China
Abstract: We are currently working on sequencing chromosome 4 of Oryza sativa ssp indica cv Guangluai 4. Our strategy to completely sequence rice chromosome 4 is a map-based, BAC clone-by-clone shotgun sequencing approach. Two BAC libraries containing DNA fingerprinting information and 13,000 BAC end sequences were used for constructing the extensive physical map of chromosome 4 with 139 marker-aided hybridizations. 59 BAC-contigs were constructed and located on chromosome 4. More than 100 BAC clones have been completely sequenced, and 27 of them have been released to GenBank. The strategies for contig extension and finishing analysis of sequences will be presented. We have also begun preparing to sequence chromosome 4 of Oryza sativa ssp japonica cv Nipponbare. To construct the chromosome 4 physical map of Nipponbare, Monsanto's rice BAC sequence databases and our sequence scaffolds have been compared by alignment. The strategy for sequencing chromosome 4 of Nipponbare will also be presented.
Bin Han also reported that his group had been authorized by the Chinese Academy of Sciences to begin sequencing Nipponbare chromosome 4 immediately.
Yue-ie Hsing (Academia Sinica) Sequence analysis of rice chromosome 5
Abstract: Rice is a model species for the cereals and a good candidate for genome sequencing due to its relatively small genome (430 Mb), dense physical and genetic maps, and good transgenic systems. As part of an international effort to decode the rice genome, Taiwan is working on the sequencing of chromosome 5. A central lab has been established at the Institute of Botany, Academia Sinica. Currently the funding comes from Academia Sinica, the Institute of Botany, the National Science Council, and the Council of Agriculture.
Fourteen PAC clones localized on the short arm of chromosome 5 were kindly supplied by RGP. Shotgun sequencing was used to complete these regions. As of January 2001, we have over 2 Mb of DNA sequence released to GenBank/DDBJ/EMBL and displayed on the ASPGC Web Site (http://genome.sinica.edu.tw/index.htm). Many repeated sequences, GC-rich regions and AT-rich regions are present in the rice genome and cause problems during finishing. There are also many transposons, retrotransposons and MITEs in the sequences. Several proteins contain internal tandem repeats, which take quite a while to annotate accurately. We used Artemis (http://www.sanger.ac.uk/Software/Artemis/) to fine-tune the splice sites and found it very helpful.
Francis Quetier (Genoscope) has just received approval to begin sequencing chromosome 12 and has already reported initial results. Eleven BACs distributed on the chromosome have been chosen by Michel Delseny. Of these, five are finished, two are in phase 2, and one is in phase 1.
Moo Young Eun (NIAST, RDA) summarized Korean sequencing results from chromosome 1 and the submission of four BAC sequences.
Apichart Vanavichit (Kasetsart University) described the Thai sequencing efforts on chromosome 9 and the submission of four BAC sequences. These sequences are in the vicinity of an important submergence tolerance gene.
Hiroyuki Kanamori (RGP) Troubleshooting strategies in the finishing phase
Abstract: Sequencing can be divided into 3 phases: sequencing phase, finishing phase and final check phase. The bottleneck is the finishing phase, in which gaps and low-quality regions due to sequence ambiguities, repeated sequences and specific structure of template DNA have to be resolved to achieve sequence closure and high-quality data. In general, problems due to specific structure of template DNA (secondary and tertiary structure, GC-rich regions, etc.) can be resolved by using different chemistries, optional enzymes and extraction of fragments cut with restriction enzymes. Repeated sequences characteristic of the rice genome typically represent the most difficult aspect of rice genome sequencing. As a major strategy to resolve repeats, an approach involving sonication of template DNA is used in several cases. Repeated sequences often result in misassembly of the subclone sequences, depending on the repeat data quality and quantity, as well as ambiguities in ordering shotgun clones and border regions. This can be resolved by making a data file for at least 10 kb of repeated sequence including the border region and then assembling the sequence. We report here several troubleshooting strategies in the finishing phase that we have tried so far to accelerate our sequencing effort.
Dick McCombie (CSHL) Genome sequence quality assessment
Abstract: The generation of high-quality, contiguous genomic sequence is an arduous task requiring quality control at all steps. One activity that has been very valuable for human genome sequencing centers has been to carry out quality control exercises. In these exercises, random clones are chosen from public sequence repositories for checking. The sequencing center that completed a clone sends its trace data to two other centers, which reassemble and computationally attempt to finish the clone and check its accuracy. They also digest the DNA from the clone and compare this to the predicted assembly. They then submit a written report on the results to the center that submitted the sequence. The submitting center responds, again in writing, to the report. This is very valuable in improving overall quality. One way in which it has improved quality is by making each center think more about the way in which it assesses quality internally.
Katsumi Sakata (RGP) Recent informatics activities at RGP - Annotation and database development
Abstract: An extensive sequencing effort of the rice genome has resulted in rapid accumulation of sequence data. It has also been a driving force in accelerating the annotation of the sequence. The short arm region of rice chromosome 1 has been almost completely sequenced. Annotation of the sequence is now in progress and so far, we have finished preliminary gene prediction covering 80% of the region. Further, the acceleration of sequencing also requires an improvement of informatics facilities such as annotation software, databases and computer hardware. We describe the following activities of the RGP informatics group:
- annotation of chromosome 1 short arm including such aspects as gene density, sequence feature and classification of predicted genes;
- improvement of annotation process by using an automated annotation system RiceGAAS (Rice Genome Automated Annotation System, http://RiceGAAS.dna.affrc.go.jp/) and integrating RiceHMM (Rice Hidden Markov Model);
- current status of the integrated rice genome database INE and IRGSP central database at Tsukuba and future plans; and
- upgrade of computer facilities for sequencing and annotation.
Rod Wing (CUGI) described the Syngenta sequencing results and gave figures based on conversations with Steve Goff.
Takuji Sasaki (RGP) underlined the importance of putting high quality sequence in the public domain at a much quicker pace. Talks in the previous day's workshop demonstrated that functional genomics tools for reverse genetics are at hand and that publicly available sequence information is needed to exploit these tools. He described recent advances in the IRGSP that will speed the public release of rice sequence. These are the realization that base quality equivalent to phrap 30 is sufficient to ensure an error rate less than 1 in 10,000 and the change in IRGSP policy that mandates immediate release of phase 2 sequence.
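For reference, phred-scaled quality scores relate to per-base error probability by Q = -10 log10(p). The sketch below shows only this standard per-base conversion; it assumes that "phrap 30" refers to a phred-scaled score of 30, and the function names are hypothetical. The summary does not spell out how a per-base threshold translates into the overall error rate of finished sequence, so that step is not modeled here.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for a phred-scaled quality score q."""
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Phred-scaled quality score for a per-base error probability p."""
    return -10.0 * math.log10(p)


for q in (20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

Under this definition a single base at a score of 30 has an error probability of 1 in 1,000, and a score of 40 corresponds to 1 in 10,000.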
Takuji outlined a proposal to obtain phase 2 coverage of the entire rice genome within 10 months to a year. This proposal is based on two new developments:
1) Takashi Matsumoto's talk demonstrated that production sequencing with 5X coverage with 2 kb inserts plus 5X coverage with 5 kb inserts most often leads to better than phase 2 quality with sequence in a single contig containing few low quality bases.
2) Rod Wing showed that when his group fingerprinted the Monsanto sequenced BACs for chromosomes 3 and 10 and assembled these with the CUGI BACs, he could obtain extensive contigs. This demonstration presents a way for the IRGSP to obtain a minimum tiling path that would cover much of the genome.
Takuji emphasized that this plan would not abandon the IRGSP goal of obtaining complete sequence for the entire genome.
The key points in the plan are the following:
1) Use fingerprinting and STCs to merge the CUGI, Monsanto, and RGP BACs/PACs to obtain a minimum tiling path.
2) Using the Monsanto sequenced BACs as a starting point, obtain 10X coverage from BACs covering the remainder of the genome.
3) Recognizing that not all members will be able to switch to this strategy, chromosomal boundaries will be adjusted based on mutually agreed upon goals during this phase of the work.
4) Return to completing and annotating the sequence of the entire genome.
Takuji said that the IRGSP currently lacks the capacity to complete this ambitious plan in the time specified but said that Japanese officials were exploring ways to dramatically increase capacity for phase 2 sequencing.
Working Group Meeting
The members asked that Syngenta be contacted immediately to see whether it was possible to form a collaboration that would permit us to use their sequence and physical map information to speed the completion of a complete public rice sequence.
The members approved a plan proposed by Takuji Sasaki to rapidly move to obtain phase 2 quality sequence for the entire rice genome. A section of the Guidelines is being prepared to cover this activity.
A proposal by Dick McCombie to mutually check sequence quality and assemblies was accepted. Dick will provide wording for this agreement.
Copyright (C) The International Rice Genome Sequencing Project (IRGSP). 2005 All rights reserved.