IRGSP Sequencing Meeting 2002
Highlights:
Sequencing Round Table February 5, 7 pm Epochal Center, Tsukuba
As at TIGR, this evening meeting was held for P.I.s and essential colleagues; funding agency representatives and other auditors were also welcome. The point of the round table sessions is to cover those issues that can be best resolved by discussion among the P.I.s.
1) Sequencing progress and anticipated output in the next four months:
| Chrom. | Total¹ (Mb) | Phase 3 or 2² (Mb) | Remain³ (Mb) | Expected by May⁴ (Mb) | Deficit⁵ (Mb) | Expected Completion⁶ |
| 1 | 47a | 42 | 5 | 2 | 0 | yes |
| 2 | 40b | 19.4 | 20.6 | 5.8 | 14.8 | yes |
| 3 | 41b | 21.4 | 19.6 | 6 | 13.6 | probably |
| 4 | 36a | 35.7 | 0.3 | 0.3 | 0 | yes |
| 5 | 33b | 10 | 23 | 8 | 15 | yes |
| 6 | 32b | 22.6 | 9.4 | 5.8 | 3.6 | yes |
| 7 | 35b | 21.9 | 13.1 | 5.8 | 7.3 | yes |
| 8 | 28b | 23 | 5 | 5 | 0 | yes |
| 9 | 22b | 3.8 | 18.2 | 6.4 | 11.8 | yes |
| 10 | 23a | 23 | 0 | 0 | 0 | yes |
| 11 | 30b | 2 | 28 | 7.5 | 20.5 | probably |
| 12 | 31b | 1.2 | 29.8 | 15 | 14.8 | yes |
| Totals | 398 | 226 | 172 | 67.8 | 104.4 | |
1 Current size of chromosomes estimated by sequencing groups (a) or by Chen et al. (b).
2 Estimated non-overlapping amount submitted to public databases.
3 Amount remaining to be sequenced.
4 Additional sequence expected to be submitted by May, 2002.
5 Deficit remaining after May.
6 Expected completion of phase 2 sequence by December, 2002.
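The columns of the progress table are related by simple arithmetic: the amount remaining is the total minus the sequence already submitted, and the deficit is the remainder minus what is expected by May. The following Python sketch recomputes those columns from the rows printed above. The function and variable names are illustrative and are not part of the meeting record; the check only reports rows where the printed and recomputed deficits differ, it does not correct them.

```python
# Rows copied from the progress table above, all amounts in Mb:
# (chromosome, total, submitted as phase 3 or 2, expected by May, printed deficit)
ROWS = [
    (1, 47, 42.0, 2.0, 0.0),
    (2, 40, 19.4, 5.8, 14.8),
    (3, 41, 21.4, 6.0, 13.6),
    (4, 36, 35.7, 0.3, 0.0),
    (5, 33, 10.0, 8.0, 15.0),
    (6, 32, 22.6, 5.8, 3.6),
    (7, 35, 21.9, 5.8, 7.3),
    (8, 28, 23.0, 5.0, 0.0),
    (9, 22, 3.8, 6.4, 11.8),
    (10, 23, 23.0, 0.0, 0.0),
    (11, 30, 2.0, 7.5, 20.5),
    (12, 31, 1.2, 15.0, 14.8),
]


def remaining(total_mb, submitted_mb):
    """Footnote 3: amount still to be sequenced."""
    return total_mb - submitted_mb


def deficit(total_mb, submitted_mb, expected_mb):
    """Footnote 5: what is left after the May submissions, floored at zero."""
    return max(0.0, remaining(total_mb, submitted_mb) - expected_mb)


def inconsistent_rows(rows, tol=0.05):
    """Chromosomes whose printed deficit differs from the recomputed one."""
    return [chrom for chrom, total, done, expected, printed in rows
            if abs(deficit(total, done, expected) - printed) > tol]


recomputed_total = sum(deficit(t, d, e) for _, t, d, e, _ in ROWS)
print(f"recomputed total deficit: {recomputed_total:.1f} Mb")
print("rows to double-check:", inconsistent_rows(ROWS))
```

For chromosome 1 the table lists 5 Mb remaining and 2 Mb expected by May, so the recomputed deficit is 3 Mb rather than the printed 0; with that value the column sums to the printed total of 104.4 Mb.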
2) Revision of annotation standards.
All groups agree on a standard nomenclature for predicted proteins:
Sequences with 100% identity at the amino acid level to known proteins will receive the same, original gene name.
Sequences with less than 100% identity but with significant homology to known proteins will be called "putative" proteins of the same name. The name of the nearest hit will be included as a note. Sequences that are clearly related to a gene family can be called "XXX-like" or "similar to XXX" protein.
Protein matches with BLASTP bit scores of >100, e-values of < e-20, or equivalent criteria, will be regarded as significant homologies.
Sequences with homology to unknown ESTs will be called "unknown." The EST hit will be included in a note. The homology standard is at least 95% identity at the nucleic acid level over ~90% of the length of the entire EST, and the match should cover two adjacent exons.
Sequences predicted by multiple gene prediction programs with no homology to an EST will be called "hypothetical protein." The gene prediction programs will be included in a note.
Homology to proteins with higher, but still significant, e values or bit scores should be examined to estimate the function of as many predicted genes as possible.
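The naming rules above amount to a small decision procedure. A minimal Python sketch follows, assuming the rules are applied in the order listed (protein identity, significant protein homology, EST homology, multiple gene predictions). The `Evidence` record, its field names, and the treatment of "~90%" as a hard 0.90 threshold are illustrative assumptions, not part of the agreed standard; the "XXX-like" family naming and the case of a single gene prediction are left out because the text gives no rule precise enough to encode.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Evidence:
    """Hypothetical summary of the evidence for one predicted gene."""
    protein_hit: Optional[str] = None    # name of the nearest known protein
    identity_pct: float = 0.0            # amino acid identity to that protein
    bit_score: float = 0.0               # BLASTP bit score
    e_value: float = 1.0                 # BLASTP e-value
    est_hit: Optional[str] = None        # identifier of the matching EST
    est_identity_pct: float = 0.0        # nucleotide identity to the EST
    est_length_fraction: float = 0.0     # fraction of the EST length matched
    est_spans_two_exons: bool = False    # match covers two adjacent exons
    predictors: Tuple[str, ...] = ()     # gene prediction programs that agree


def is_significant(bit_score: float, e_value: float) -> bool:
    """Bit score > 100 or e-value < 1e-20, as stated in the standard."""
    return bit_score > 100 or e_value < 1e-20


def name_gene(ev: Evidence) -> Tuple[Optional[str], str]:
    """Return (name, note) for a predicted gene, or (None, '') if no rule applies."""
    if ev.protein_hit and ev.identity_pct == 100.0:
        return ev.protein_hit, ""
    if ev.protein_hit and is_significant(ev.bit_score, ev.e_value):
        return f"putative {ev.protein_hit}", f"nearest hit: {ev.protein_hit}"
    if (ev.est_hit and ev.est_identity_pct >= 95.0
            and ev.est_length_fraction >= 0.90 and ev.est_spans_two_exons):
        return "unknown", f"EST hit: {ev.est_hit}"
    if len(ev.predictors) >= 2:
        return "hypothetical protein", "predicted by: " + ", ".join(ev.predictors)
    return None, ""


# Example with made-up evidence values.
print(name_gene(Evidence(protein_hit="sucrose synthase",
                         identity_pct=87.0, bit_score=250.0, e_value=1e-60)))
```

The protein and program names used in the example call are placeholders chosen only to exercise the rules.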
3) Plans for Clemson workshop to make a minimum tiling path. Takuji Sasaki and Jianzhong Wu explained how other groups using RGP clones could integrate them into the Clemson physical map. There seemed to be diminished interest in holding such a workshop.
4) BAC Registry. The group decided not to act on Dick McCombie's earlier proposal to participate in the NIH registry of BACs in progress.
5) Commitment to completing the genome with finished quality sequence.
Reasons rice and cereal grass communities support obtaining finished quality sequence:
Representatives of the participating countries were polled about their commitment to continue to obtain finished quality sequence past the end of 2002. Representatives of Japan, Korea, India, Brazil, and France said that they were committed to complete the finished sequence and believed that they had support to do so. Representatives of Taiwan and the US said that while they were committed to finishing the genome with high quality sequence, they were uncertain of continued support.
6) Publications of completed chromosomes.
Takuji Sasaki said that Japan would be submitting a paper on chromosome 1 within a month. Dick McCombie said that the US was about ready to start writing and was assigning tasks in preparation. They would submit by mid May at the earliest. Bin Han said that annotation would be completed soon and they were prepared for writing. They could submit within two months.
There was strong support for submitting the papers at the same time. This would have a greater impact and, as no paper submitted now would be published in time for the expected Beijing and Syngenta publications, it was most important for the papers to have a 2002 publication date.
Dick McCombie further made the point that it was most important for these papers to be intellectually interesting. Possible comparisons with draft sequence and between japonica and indica were examples.
7) Problem of exploitation of genome wide annotation work.
Francis Quetier pointed out that while genome sequencers want and are happy to have their sequence used with appropriate attribution, they rightfully regard the use of their sequences for whole genome analysis as theft of unpublished material. This is a difficult area wherein responsible journal editors generally keep unscrupulous authors in check. The only solution proposed was to follow the example of sequencing groups who post use policies on sequencing group web sites stating that the material is unpublished and should be treated as such by groups who want to use it to prepare publications. Dick McCombie suggests that the Data Release Policy, especially the wording for microbial organisms, posted by the Sanger Center (http://www.sanger.ac.uk/Projects/release-policy.shtml) is an appropriate model.
8) There will be two round table format meetings in 2002. The first will take place at Genoscope May 23-24. The second will take place at Cold Spring Harbor in October.
9) The draft of a letter from the U.S. Advisory Committee to support the completion of a high quality public rice sequence was read, and suggested amendments were made and transmitted.
Public Sequencing Workshop, February 6, 9 am to 5 pm Epochal Center, Tsukuba
Sequencing Reports:
Rice Genome Research Program, NIAS/STAFF-Institute, Tsukuba, Japan, K. Yamamoto, H. Kanamori, S. Hosokawa, M. Hamada, H. Yamagata, S. Hijishita, M. Nakamura, K. Kamiya, K. Machita, M. Nigishi, A. Kikuta, Y. Nakama, N. Ono, T. Mizubayashi, K. Tsuji, H. S. Zhong, S. Ito, N. Namiki, Y. Katayose, K. Sakata, T. Matsumoto, T. Sasaki.
Progress report of RGP genomic sequencing
The rice genome sequencing project in Japan started in April 1998 with the aim of sequencing the entire genome of rice. The project is also a part of the International Rice Genome Sequencing Project (IRGSP), in which Japan is in charge of sequencing chromosomes 1, 2, 6, 7, 8 and 9.
In an effort to facilitate the immediate and efficient completion of genome sequencing, we have introduced a semi-automated laboratory system and a fully automated data analysis system. The system includes an automated liquid handling system (Biomek FX) for preparing sequencing samples in 384-well plates; sequencing in 384-well plates on ABI3700 capillary sequencers; automatic transfer of sequence data files; automatic assembly by Phred/Phrap; and phase classification.
As a result, we have already released ca. 136 Mb of sequence data to the public domain, corresponding to 1003 PAC/BAC clones. For chromosome 1, about 90% (47 Mb) has been published as phase 2 sequence as of January 2002, and 67% (34.8 Mb) is completed as phase 3.
National Centre for Gene Research, Shanghai, Bin Han
Rice chromosome 4 sequencing progress
To completely sequence chromosome 4 of Oryza sativa ssp. japonica cultivar Nipponbare, we have produced a fine bacterial artificial chromosome (BAC)-based physical map of Nipponbare chromosome 4 by integrating 114 sequenced Oryza sativa ssp. indica Guangluai 4 BAC clones, 182 RFLP markers and 407 EST markers with the fingerprint data of the Nipponbare genome from the Clemson University Genomics Institute. The map consists of nine contigs accounting for 35.2 Mb and covering 96% of the estimated chromosome size (36.6 Mb). BAC clones corresponding to the telomeres and the centromere position were determined by BAC-pachytene chromosome fluorescence in situ hybridization (FISH). FISH gave an estimated long arm to short arm length ratio of 5.13, larger than the ratio of 2.9 based on the physical map, indicating that the short arm is highly condensed. Combining the FISH analysis and physical mapping also showed that the short arm and the pericentromeric region of the long arm are rich in heterochromatin, which occupies about 45% of the chromosome, indicating that this chromosome is likely to be very difficult to sequence.
We have completely sequenced 282 BAC clones, 235 of which have been released to the public databases. The total length of the sequenced BACs is 34.8 Mb excluding overlaps (15.1% overlap among the sequenced BACs). We therefore believe that 95% of chromosome 4 has been sequenced completely. The total length of the remaining 8 physical gaps should be less than 1.4 Mb. Most of the physical gaps are located in highly heterochromatic regions, indicating that these gaps will be difficult to fill. The annotation and findings of chromosome 4 were presented in the meeting.
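The coverage percentages and the gap estimate quoted in the chromosome 4 report follow directly from the three sizes given. A short Python check, with illustrative names, is sketched below; it assumes the 36.6 Mb estimate is the denominator for both percentages and that the gap bound is the chromosome estimate minus the physical map length.

```python
CHR4_ESTIMATED_MB = 36.6   # estimated chromosome size
PHYSICAL_MAP_MB = 35.2     # nine contigs of the physical map
SEQUENCED_MB = 34.8        # sequenced BACs, overlaps excluded


def percent_covered(covered_mb, chromosome_mb):
    """Share of the chromosome represented, in percent."""
    return 100.0 * covered_mb / chromosome_mb


map_coverage = percent_covered(PHYSICAL_MAP_MB, CHR4_ESTIMATED_MB)
sequence_coverage = percent_covered(SEQUENCED_MB, CHR4_ESTIMATED_MB)
gap_upper_bound_mb = CHR4_ESTIMATED_MB - PHYSICAL_MAP_MB

print(f"physical map: {map_coverage:.1f}%")
print(f"sequenced:    {sequence_coverage:.1f}%")
print(f"gaps:         {gap_upper_bound_mb:.1f} Mb")
```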
Academia Sinica Plant Genome Center, Taipei, Teh-Yuan Chow
Progress report of rice chromosome 5 sequencing project in Taiwan
The Academia Sinica Plant Genome Center (ASPGC), working on rice chromosome 5 DNA sequencing, has submitted about 90 BAC/PACs, comprising more than 10 megabase pairs at the phase 2 level (http://genome.sinica.edu.tw/). Although most of the submitted BAC DNAs were isolated from Monsanto's BAC library and their sequences assembled using raw sequence data provided by Monsanto, some of the BAC sequences were finished with our own sequence data. Another 22 BACs are at the shotgun library and sequencing stages. BAC contigs cover about 30% of chromosome 5. Physical mapping of BAC clones in the gaps between BAC contigs is being carried out in cooperation with RGP/STAFF.
CCW Rice Genome Sequencing Consortium, Rod Wing and Dick McCombie
Sequencing the short arms of chromosomes 10 and 3
The objectives of the CCW Rice Genome Sequencing Consortium, funded in October 1999¹, are to sequence and annotate the short arms of rice chromosomes 10 and 3. As a prelude to rice genome sequencing, CUGI has focused on the development of a sequence tagged connector (STC)/BAC fingerprint framework to facilitate the International Rice Genome Sequencing Project (IRGSP). The framework consists of 4 elements: 1) two deep-coverage large insert BAC libraries (HindIII and EcoRI)²; 2) an STC database³; 3) a Fingerprint database³; and 4) a Genome Anchoring database.
1. The HindIII BAC library contains 36,874 clones with an average insert size of 130 kb and the EcoRI library contains 55,296 clones with an average insert size of 120 kb. Combined, the BAC libraries cover approximately 26 genome equivalents.
2. The STC-DB is composed of DNA sequence derived from the ends of the DNA inserts in the HindIII and EcoRI BAC libraries. Sequencing was completed in April 2000 and generated 63,432 and 47,006 end sequences from the HindIII and EcoRI libraries respectively, which comprise over 36 Mb of high-quality rice genomic sequence deposited in Genbank.
3. The Fingerprint DB contains 63,233 high-resolution HindIII fingerprints derived from both BAC libraries. BAC clones were fingerprinted in duplicate on 1% agarose gels and assembled into contigs using FPC (FingerPrinted Contigs: Soderlund et al. 2000). At a cutoff of 1 × 10^-12, the 63,233 clones assembled into 1038 contigs and 2927 singletons.
4. The Genome Anchoring DB contains hybridization data (wet lab and in silico) to the BAC libraries and STC and FP databases. Currently 706 markers (RFLPs and Overgos derived from STCs) have been placed on the physical map thereby anchoring 268 of the 454 BAC contigs (59%). We estimate the 268 contigs cover about 323 Mb of the rice genome.
Using this framework, CCW has validated and sequenced 148 BAC clones from chr 3 and 10 (about 10 Mb finished and 10 Mb draft sequence). Data will be presented describing the Framework Project, CCW's sequencing progress to date, and early applications of our sequencing efforts to better understand and sequence the rice genome.
Our data is updated regularly and can be viewed at http://www.genome.clemson.edu/projects/rice/ccw/.
Funded by: ¹ USDA-CSREES/NSF/DOE Rice Genome Sequencing Program, ² Rockefeller Foundation, ³ Novartis
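The "genome equivalents" figure in the CCW library description is the total insert length of both libraries divided by the genome size. The Python sketch below reproduces that estimate and the anchored-contig percentage from the numbers given in the report. The genome size of 430 Mb is an assumption taken from the "previously estimated" total in the CUGI physical map table later in this report; a different genome size changes the result proportionally. Function and constant names are illustrative.

```python
ASSUMED_GENOME_MB = 430.0  # assumption: previously estimated rice genome size


def library_coverage_mb(n_clones, mean_insert_kb):
    """Total insert DNA in a BAC library, in Mb."""
    return n_clones * mean_insert_kb / 1000.0


hindiii_mb = library_coverage_mb(36_874, 130)   # HindIII library
ecori_mb = library_coverage_mb(55_296, 120)     # EcoRI library
genome_equivalents = (hindiii_mb + ecori_mb) / ASSUMED_GENOME_MB

# Share of BAC contigs anchored by the 706 markers, as quoted in the report.
anchored_fraction = 268 / 454

print(f"{genome_equivalents:.1f} genome equivalents")
print(f"{100 * anchored_fraction:.0f}% of contigs anchored")
```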
The Institute for Genomic Research, Robin Buell
Progress of chromosomes 3, 10 & 11 sequencing
TIGR is participating in the International Rice Genome Sequencing Project and has been assigned portions of chromosomes 3, 10, and 11. We currently have ~30 Mb of rice genomic DNA in our high throughput sequencing pipeline (http://www.tigr.org/tdb/e2k1/osa1/). All sequence is released to Genbank/DDBJ/EMBL, in either the High Throughput Genomic Sequence (HTGS) or the PLANT division. A total of ~25 Mb has been deposited in Genbank. All clones for our allocation on chromosome 10 (10S and 10L) are in production, with the exception of one small (30 kbp) gap. All completed BACs are manually annotated for genes, and this information can be accessed through Genbank and the TIGR web site (http://www.tigr.org/tdb/e2k1/osa1/). We also provide automated annotation of all rice BACs (>1200), which is available on the TIGR web site. We have extended our bioinformatic analyses of rice and have identified putative orthologues of rice genes using the TIGR Orthologous Gene Alignments (http://www.tigr.org/tdb/tgi/ego/). We have also performed global alignments of rice genome sequences with all available plant transcripts to further identify candidate orthologous genes. As with all other TIGR Rice information, these data can be accessed on the TIGR Rice web site at http://www.tigr.org/tdb/e2k1/osa1/.
The newly updated Rice Gene Index and Rice Repeat Database can be found at the TIGR Rice Genome Project Site (http://www.tigr.org/tdb/e2k1/osa1/). The TIGR Orthologous Gene Alignment (TOGA) database has also recently been updated.
Robin described automated annotation tools at TIGR and said that while five gene prediction programs are used, they have found that FGENESH is the most robust. A Whole Automated Genome Annotation Database, constructed by processing all the rice BAC/PAC sequences (phase 2 and phase 3) in GenBank, is available.
Other tools that are available are a whole rice genome BLAST search and databases of in silico mapping of rice genetic markers to rice STCs and to rice BAC/PAC sequences.
Plant Genome Initiative at Rutgers, Joachim Messing
Progress of chromosome 11 sequencing
Chromosome 11 has been divided between India, France, and the United States. The region north of the centromere has been divided between France, TIGR, Wisconsin University and our group, PGIR. The region south of the centromere has been assigned to India and TIGR. Within the southern region is an area of collaboration between India and PGIR around position 97.3 cM. Because France is sequencing chromosome 12, which contains a segmental duplication from 0.3 to 8.5 cM of rice chromosome 11, the US will coordinate its efforts with France. The regions assigned to PGIR include the sequence between marker C104B at position 4.1 and R2916 at position 10.3 and the sequence between marker S20163S at position 27.8 and G320 at position 34.8. Six clones have been sequenced from each region. An additional 3 clones from the region around 97.3 cM have been sequenced as well, yielding a total of 15 new clones in about 6 months. The region between 4.1 and 10.3 is contained within FingerPrinted Contig (FPC) #203. A Minimum Tiling Path (MTP) has been selected that spans 4 cM from position 4.1 to 8.1. To complete this sequencing contig, one more clone to the north and 4 clones to the south will be necessary. The second region contains two FPCs, #205 and #206, which have been merged by the sequence of a single BAC clone. Here we will need only one or two more clones to the south to complete this sequence contig.
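Selecting a minimum tiling path, as mentioned in the chromosome 11 report above, can be viewed as an interval-covering problem: choose the fewest clones whose positions together span a region. The Python sketch below uses the standard greedy approach (at each step take the clone that starts at or before the current position and reaches farthest). It is an illustration only, not the procedure used by any of the sequencing groups; real MTP selection relies on fingerprint overlaps and sequence confirmation, and the clone names and coordinates here are made up.

```python
def minimum_tiling_path(clones, start, end):
    """Greedy interval cover.

    clones: list of (name, left, right) positions, e.g. in kb.
    Returns the clone names covering [start, end] with the fewest clones,
    or raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    position = start
    i = 0
    while position < end:
        best = None
        # Consider every clone that starts at or before the current position.
        while i < len(ordered) and ordered[i][1] <= position:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= position:
            raise ValueError(f"no clone extends past position {position}")
        path.append(best[0])
        position = best[2]
    return path


# Hypothetical clones on a 400 kb region.
example_clones = [
    ("a", 0, 120), ("b", 50, 160), ("c", 100, 250),
    ("d", 150, 300), ("e", 240, 380), ("f", 290, 400),
]
print(minimum_tiling_path(example_clones, 0, 400))
```

In this toy case clones b and d are redundant, so the path uses four of the six clones.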
Korea Rice Genome Research Program, Ho-il Kim
Progress of chromosome 9 sequencing
The National Inst. of Agricultural Science and Technology (NIAST)/RDA has participated in IRGSP to sequence the whole rice genome since 1998. We are now focusing on the 68.2-77.7 cM and 93.2-94.4 cM regions of chromosome 9. We screened seed BACs using RFLP markers on our maps and combined the CUGI BAC fingerprinting contigs with RGP map data. We finally selected 16 sequence-ready BAC clones totaling about 1.7 Mb in the 68.2-77.7 cM region and 3 BACs totaling 0.4 Mb in the 93.2-94.4 cM region. Shotgun libraries of those BACs were constructed by sonication, then sequenced and assembled by PhredPhrap. Currently 2.1 Mb (11 BACs from chr. 1 and 3 BACs from chr. 9) are registered in GenBank and 12 sequences from chr. 9 are ready to submit as phase 2. Our rice genome sequence information has been released on the NIAST web site (http://biogen.niast.go.kr).
Indian Initiative for Rice Genome Sequencing, Akhilesh K. Tyagi
Progress of Indian Initiative for Rice Genome Sequencing (IIRGS)
The Indian Initiative for Rice Genome Sequencing was started in June 2000 with the financial support from the Department of Biotechnology, Government of India, with the main purpose of sequencing a region spanning 56.9 to 109.3 cM on the long arm of chromosome 11 as part of IRGSP. The work is being carried out at the University of Delhi South Campus and the Indian Agricultural Research Institute, New Delhi. The infrastructure for physical mapping, sequencing and genome informatics has been established and two ABI3700, two ABI377 and one MegaBace1000 DNA sequencers have already been made functional.
By screening one PAC library at RGP and two BAC libraries at CUGI, along with analysis of fingerprints and in silico information, it has become possible to develop a physical map covering ~80% of the region assigned to India in the form of 13 major contigs encompassing ~80 genetic markers and ~200 ESTs. Attempts have also been made to overlap and integrate PAC and BAC physical maps. About 100,000 sequencing reactions have been performed to generate data of ~50 million bases at phase 0 level from 2K/5K shotgun clones of 10 PACs and 14 BACs. Several of them have been processed to phase 1 and phase 2 level with a potential to achieve >3 Mb sequence coverage at >10X level. The sequencing of six more BACs is in progress. Assembled and ordered sequences of 5 BAC/PAC clones have already been submitted to GenBank. In these assembled BAC/PAC sequences, the overall gene frequency is estimated to be about 1 gene/6 kb based on annotation using RiceGAAS software, notwithstanding individual BAC/PAC variations.
IIRGS appreciates RGP, Japan, as well as CUGI, TIGR, PGIR and CSH, USA, for hosting short duration visits of various Indian scientists.
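The throughput figures in the IIRGS report above can be related with a few lines of arithmetic. The Python sketch below derives the mean yield per sequencing reaction, the raw depth over a 3 Mb target, and the number of genes implied by a density of 1 gene per 6 kb. It assumes that all ~50 million raw bases count toward coverage, which overstates usable depth because low-quality and vector sequence is normally trimmed; the variable names are illustrative.

```python
RAW_BASES = 50_000_000     # ~50 million bases at phase 0 level
REACTIONS = 100_000        # about 100,000 sequencing reactions
TARGET_MB = 3.0            # ">3 Mb" target quoted in the report
GENE_SPACING_KB = 6.0      # about 1 gene per 6 kb


def mean_bases_per_reaction(raw_bases, reactions):
    return raw_bases / reactions


def raw_depth(raw_bases, target_mb):
    """Raw fold-coverage over a target of the given size."""
    return raw_bases / (target_mb * 1_000_000)


def max_target_mb(raw_bases, depth):
    """Largest target that the raw data could cover at the given depth."""
    return raw_bases / depth / 1_000_000


def expected_genes(target_mb, spacing_kb):
    return target_mb * 1000 / spacing_kb


print(mean_bases_per_reaction(RAW_BASES, REACTIONS))    # bases per reaction
print(round(raw_depth(RAW_BASES, TARGET_MB), 1))        # fold coverage of 3 Mb
print(max_target_mb(RAW_BASES, 10))                     # Mb coverable at 10X
print(expected_genes(TARGET_MB, GENE_SPACING_KB))       # genes in 3 Mb
```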
GENOSCOPE and UMR, Nathalie Choisne, Gisela Orjeda, Nadia Deivange, Eric Pellitier, Marcel Salanoubat, Jean Weissenbach and Francis Quetier
Progress report on the sequencing of rice chromosome 12
Chromosome 12 of the rice genome is estimated at 30 Mb. The physical map constructed on an FPC basis by CUGI (USA) is currently being checked and completed by PCR and hybridization. The anchoring of seed BACs is achieved by using genetic markers complemented with the set of EST markers recently published by the Japanese group.
34 BACs from the Clemson library are being sequenced. In addition, 75 BACs sequenced by MONSANTO with moderate coverage are receiving a further 2-3 x coverage to be finished.
The segmental duplication between chromosome 12 and chromosome 11, which is estimated at 2.5 Mb, is being worked out in collaboration with the Rutgers University group (USA) and TIGR (USA).
Universidade Federal de Pelotas, Brazil, Antonio Costa de Oliveira, Fernando Iraja Felix de Carvalho, Paulo Dejalma Zimmer, Luiz Anderson Teixeira de Mattos.
The Brazilian Rice Genome Initiative - BRIGI
Rice is an important crop in Brazil, as it is worldwide, and demand is currently above internal supply. Advances in the discovery of the rice sequence can lead to increases in productivity and yield. Also, rice can be seen as a model crop for the grasses, leading to further gains in other important cereal crops. Officially a member of IRGSP since September 2000, the Brazilian Rice Genome Initiative (BRIGI) has obtained support from the Ministry of Education to start sequencing part of the rice genome in January 2002. Reorganized in 1995, the Setor de Fitomelhoramento at the Faculdade de Agronomia Eliseu Maciel (FAEM) of the Universidade Federal de Pelotas (UFPel) is now entering a new era. Two main labs, one called Genomics and Plant Breeding (Laboratorio de Genomica e Fitomelhoramento - LGF) and the other called Hydroponics and Tissue Culture (Laboratorio de Hidroponia e Cultura de Tecidos - LHCT), support studies focusing on inheritance, morphological and molecular characterization, genetic and physical mapping, mutation induction and now sequencing of genes related to root development and agronomically important traits. Comparative genetic and genomic studies of rice-oats and rice-maize involving those traits are becoming the major focus of our research group.
We propose to IRGSP to sequence a region of ~1.4 Mb spanning contig number 263 from chromosome 9, obtained from CUGI's library. Sixteen BAC clones aligned on a minimum path will be shotgun sequenced and released to public databases as a contribution to the goal of having the complete high quality draft of the rice genome by December 2002.
Laboratorio de Genomica e Fitomelhoramento - LGF, Departamento de Fitotecnia, FAEM, UFPel. Campus Universitario, P.O. Box 354, Capão do Leão - RS, 96001-970, Brazil.
Special Topics:
Why we must finish the rice genome and how we can do it, W. Richard McCombie
The draft of a complex genome such as rice is a valuable resource for the plant biology and agronomic communities. However, it is an inadequate resource for many purposes, particularly studies of the structure of the genome (as opposed to the genes) and especially for comparative genomics. The lack of completion in the draft as well as the lack of order provided to the islands of sequence it contains make overall genome structure studies and comparative studies difficult to carry out. Since rice is to serve as a reference sequence for monocot genomics, it is crucial that the sequence be brought to finished levels of quality and contiguity so that it will provide a structural and genetic anchor for studies on other cereals.
To do this we have been developing software and lab protocols designed to rapidly and cheaply finish draft level sequence, even if subclones from the draft are not available. We have been optimizing a transposon based finishing strategy and directed primer walking of BAC templates. The software we have developed generates a worklist for this finishing process. It also monitors the progress of clones through finishing to allow efficient management of the process. In combination, these tools will rapidly accelerate finishing. Dick's presentation may be viewed here.
Current status of sequence-ready physical maps constructed for rice chr. 1, 2, 6, 7, 8 and 9, Jianzhong Wu, Satoshi Katagiri, Yoshino Chiden, Mika Hayashi, Masako Okamoto, Yukiyo Ito, Shoko Saji, Hiroshi Mizuno, Shoji Yoshiki, Wataru Karasawa, Yoko Ichikawa, Maiko Ikeno, Rie Yoshihara, Masao Hamada, Marina Nakashima, Tomoya Baba, Takashi Matsumoto and Takuji Sasaki
RGP is now constructing sequence-ready physical maps of rice chromosomes 1, 2, 6, 7, 8 and 9 as part of an international collaboration to sequence the rice genome. BAC clones with 5 x sequence coverage provided by the Monsanto company were positioned on these six chromosomes by BLASTing the BAC sequences against about 8,000 RGP genetic and EST marker sequences. The resulting physical maps showed a coverage of 35-50%, varying among the six chromosomes. In regions where no Monsanto BAC clones were confirmed, we used our STS/EST markers to screen the RGP PAC library by PCR and then fingerprinted the selected PAC clones. As a result, the coverage increased to 70-75% for each chromosome. At present, we are filling the gaps by the STC strategy using the CUGI (Clemson Univ. Genomics Institute) BAC clones. BLASTing our finished BAC/PAC sequences against the CUGI STC database identifies bridge clones that close the gaps between contigs or extend the contigs. The remaining gap regions correspond mainly to regions close to the centromeres, and we are now making efforts to cover these regions by the end-walking method. These combined strategies should enable us to construct a minimum number of contigs covering the entire region of each chromosome to complete the rice genome sequence. This work is supported by the MAFF Rice Genome Project grant GS-1101.
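The STC gap-filling step described in the RGP abstract above can be illustrated with a toy example. In the Python sketch below, finished contig sequences are searched for BAC end sequences (STCs): a clone whose two ends land in different contigs is reported as a bridge, and a clone with only one end placed is reported as a candidate for extending that contig. Exact substring matching on either strand stands in for BLAST here, which is a deliberate simplification; insert-size and fingerprint checks are ignored, and all sequences and clone names are invented.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def contigs_hit(end_seq, contigs):
    """Names of contigs containing the end sequence on either strand."""
    rc = reverse_complement(end_seq)
    return {name for name, seq in contigs.items() if end_seq in seq or rc in seq}


def classify_clones(contigs, stc_ends):
    """Split clones into bridges (two contigs hit) and extenders (one contig hit).

    contigs:  dict of contig name -> finished sequence
    stc_ends: dict of clone name -> (end sequence 1, end sequence 2)
    """
    bridges, extenders = {}, {}
    for clone, ends in stc_ends.items():
        hits = set()
        for end_seq in ends:
            hits |= contigs_hit(end_seq, contigs)
        if len(hits) >= 2:
            bridges[clone] = sorted(hits)
        elif len(hits) == 1:
            extenders[clone] = next(iter(hits))
    return bridges, extenders


# Invented toy data.
toy_contigs = {
    "ctgA": "ATGGCGTACGTTAGCCTAGGATCCA",
    "ctgB": "TTGACCGGTAAGCTTCGATCGGCTA",
}
toy_stcs = {
    "BAC_001": ("CCTAGGATCC", "TTGACCGGTA"),   # one end in each contig
    "BAC_002": ("TAACGTACGC", "GGGGGGGGGG"),   # one end in ctgA (reverse strand)
    "BAC_003": ("AAAAAAAAAA", "CCCCCCCCCC"),   # no placement
}
bridges, extenders = classify_clones(toy_contigs, toy_stcs)
print(bridges)
print(extenders)
```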
An integrated physical and genetic map of the rice genome, Mingsheng Chen, CUGI
Rice was chosen as a model organism for genome sequencing due to its economic importance, its small genome size, and its syntenic relationship with other cereal species. We have constructed a BAC based fingerprint physical map of the rice genome (Oryza sativa ssp. japonica cv. Nipponbare) to facilitate whole genome sequencing of rice. Over 91% of the physical contigs are genetically anchored by overgo hybridization, Southern hybridization, and in silico anchoring. Genome sequencing data were also integrated into the rice physical map. Correlation of the genetic and physical maps reveals that recombination is severely suppressed in centromeric regions as well as on the short arms of chromosomes 4 and 10. This integrated physical and genetic map of the rice genome will greatly facilitate whole genome sequencing by identifying a minimally redundant tiling path of clones to sequence and organizing the efforts of the many sites participating in this effort. Furthermore, the high-resolution physical map will aid map-based cloning of agronomically important genes, and will provide an important tool for comparative genomic analysis of grass genomes.
| Chromosome | Genetic Markers | Probes | Contig Number | Previously Estimated Size (Mb) | Predicted Chromosome Size (Mb) | Size of Anchored Contigs (Mb) | Coverage (%) |
| 1 | 231 | 413 | 32 | 51.5 | 44 | 42.7 | 97 |
| 2 | 184 | 316 | 26 | 43.4 | 39.8 | 35.8 | 90 |
| 3 | 224 | 364 | 26 | 47.5 | 40.8 | 35.7 | 87.5 |
| 4 | 119 | 273 | 24 | 36.8 | 39 | 34.5 | 88.5 |
| 5 | 139 | 239 | 27 | 33.6 | 33.2 | 30.9 | 93.1 |
| 6 | 129 | 229 | 22 | 35.1 | 31.8 | 28.2 | 88.7 |
| 7 | 158 | 292 | 25 | 33.1 | 35 | 30.3 | 86.6 |
| 8 | 88 | 181 | 18 | 33.6 | 27.6 | 25.8 | 93.5 |
| 9 | 80 | 139 | 16 | 27 | 21.6 | 20.3 | 94 |
| 10 | 136 | 337 | 20 | 23.7 | 26.8 | 24.6 | 91.8 |
| 11 | 118 | 245 | 28 | 33.7 | 30.3 | 28.6 | 94.4 |
| 12 | 98 | 171 | 20 | 30.9 | 30.6 | 25.5 | 83.3 |
| Total | 1704 | 3199 | 284 | 430 | 400.5 | 362.9 | 90.6 |
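The coverage column in the table above appears to be the size of the anchored contigs divided by the predicted chromosome size. The Python sketch below recomputes it from the printed values to make that relationship explicit; the interpretation of the column is inferred from the numbers rather than stated in the abstract, and the names are illustrative.

```python
# (chromosome, predicted size Mb, anchored contigs Mb, printed coverage %)
MAP_ROWS = [
    (1, 44.0, 42.7, 97.0), (2, 39.8, 35.8, 90.0), (3, 40.8, 35.7, 87.5),
    (4, 39.0, 34.5, 88.5), (5, 33.2, 30.9, 93.1), (6, 31.8, 28.2, 88.7),
    (7, 35.0, 30.3, 86.6), (8, 27.6, 25.8, 93.5), (9, 21.6, 20.3, 94.0),
    (10, 26.8, 24.6, 91.8), (11, 30.3, 28.6, 94.4), (12, 30.6, 25.5, 83.3),
]


def coverage_pct(anchored_mb, predicted_mb):
    """Anchored contig length as a percentage of the predicted size."""
    return 100.0 * anchored_mb / predicted_mb


total_predicted = sum(pred for _, pred, _, _ in MAP_ROWS)
total_anchored = sum(anch for _, _, anch, _ in MAP_ROWS)
genome_coverage = coverage_pct(total_anchored, total_predicted)

# Largest gap between a recomputed and a printed per-chromosome value.
max_row_difference = max(
    abs(coverage_pct(anch, pred) - printed) for _, pred, anch, printed in MAP_ROWS
)

print(f"predicted genome size: {total_predicted:.1f} Mb")
print(f"anchored contigs:      {total_anchored:.1f} Mb")
print(f"genome coverage:       {genome_coverage:.1f}%")
```

The recomputed per-chromosome values agree with the printed column to within rounding.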
Landscape of rice chromosome 1, Takashi Matsumoto, Baltazar A. Antonio, Masatoshi Masukawa, Atsuko Idonuma, Manami Negishi, Michie Shibata, Yuichi Ito, Kimiko Yamamoto, Katsumi Sakata and Takuji Sasaki, RGP
The Rice Genome Research Program (RGP) has been participating in the worldwide sequencing collaboration centered at the International Rice Genome Sequencing Project (IRGSP). Our initial target for constructing a physical map and for sequencing analysis was chromosome 1, which is the longest (51 Mb) of the 12 rice chromosomes. Since sequencing of 80% of chromosome 1, excluding the centromere, telomeres and some gap regions in the physical map, was finished at either phase 2 or 3 HTG grade in March 2001, we started a chromosome-level annotation. Sequences from individual PAC or BAC clones were connected by taking the overlapping regions into account, and the resulting long sequence contig was divided into 1-3 Mb sections. Using the newly installed high performance auto-annotation program RiceGAAS, we could predict the locations of protein-coding genes, tRNA genes, ESTs (including genetic markers) and some repeats on the chromosome. Statistical analysis of gene structure showed that the rice genome has more GC-rich exons and longer introns than Arabidopsis. About 7000 genes were predicted by multiple gene prediction programs, including RiceHMM, which is tuned for gene prediction from the rice genome, but only 30% of them were functionally categorized. Many transposons and MITEs were found throughout the chromosome, indicating that the rice genome is abundant in repeat structures. Many genes are redundant and arrayed tandemly. Forty predicted genes showed strong homology to the Arabidopsis proteome, while many have no shared sequences.
Two relevant reports from the Chromatin Dynamics Workshop (February 7) and the Rice Genome Forum (February 8):
Characterizing the genomic regions surrounding rice centromeres, Jianzhong Wu, Kimiko Yamamoto, Takashi Matsumoto and Takuji Sasaki, RGP
Centromeres play a key role in chromosomal segregation and transmission during cell division, in karyotypic stability, and in generating artificial chromosomes as cloning and expression vectors. Because centromeres are the central site for mitotic and meiotic spindle fiber attachment, their molecular organization has been studied extensively in some eukaryotes; the majority of these species have plenty of highly repetitive DNA (~several Mb). In the plant kingdom, for instance, the 180 bp repeat forms large tandem arrays and occupies the central domain of the centromeres of all five Arabidopsis chromosomes (Fransz et al. 1998, 2000; Heslop-Harrison et al. 1999). In rice, the RCS2 family is a tandem repeat of 168 bp (about 6000 copies in the genome) and is organized into long uninterrupted arrays within the centromeric regions of Oryza species (Dong et al. 1998). We have screened and mapped a number of PAC or BAC clones onto the regions close to the centromeres of chromosomes 1, 2, 6, 7 and 8. Some of these clones have been sequenced. Repetitive sequences including the RCS2 family have been identified. Details on the possible structure and sequence organization within these rice centromere regions will be reported. This work is supported by the MAFF Rice Genome Project grant GS-1101.
References
Dong et al. (1998) Proc. Natl. Acad. Sci. USA, 95, 8135-8140.
Fransz et al. (1998) Plant J., 13, 867-876.
Fransz et al. (2000) Cell, 100, 367-376.
Heslop-Harrison et al. (1999) Plant Cell, 11, 31-42.
Rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon, Jiming Jiang, University of Wisconsin
The centromere is the most characteristic landmark of chromosomes in higher eukaryotic species. Centromeres are responsible for sister chromatid cohesion and are the sites of kinetochore assembly and spindle fiber attachment, allowing for faithful pairing and segregation of sister chromatids during cell division. In the majority of eukaryotic species, centromeres are embedded within megabases of highly repetitive DNA. Thus determining the precise DNA boundaries of centromeres has proven to be a difficult task. Several centromeric repetitive DNA elements have been reported in rice (Oryza sativa) (Dong et al., 1998; Nonomura and Kurata, 1999). Sequence analysis revealed that most of these centromeric DNA elements are derived from a Ty3/gypsy class retrotransposon family that is specific to the centromeric regions of grass chromosomes. However, a 155-bp satellite repeat, RCS2, is unique to rice and is exclusively located in the rice centromeres. We have mapped the RCS2 satellite within the chromosomal regions to which the spindle fibers attach. RCS2 is organized as long tandem arrays and is interrupted irregularly by centromere-specific retrotransposon elements. Each rice centromere contains from 100 kb to 2 Mb of the RCS2 repeat. Quantification of the RCS2 satellite in normal and telocentric rice chromosomes revealed that the breakpoints for several centromere misdivision events are located in the middle of the RCS2 arrays. Our results demonstrated that the centromeric DNA of rice chromosomes is organized similarly to that of several other model eukaryotic species, including humans and Arabidopsis thaliana, and that the RCS2 satellite is a key DNA element of rice centromeres.