Interim IRGSP Meeting
Tucson, Arizona
November 18, 2004
Summary
A proposal was made to submit all of the remaining individual chromosome
papers that are ready at the same time, to achieve the greatest possible impact.
Submitting them together with the paper describing the results of RAP1 in
March was discussed.
Current sequencing progress was reviewed.
About 50 of 3458 minimum tiling path clones will not be finished by the
end of the year. The IRGSP agreed to deposit the trace files for all unfinished
clones in an electronic repository at the RGP so that these could be examined
and possibly finished by other groups.
Dick McCombie described the initial results
from a quality control exercise in which in silico digests of finished clones
were compared with experimental digests. This exercise will be followed
by a reassembly of sample clones from each group.
Takashi Gojobori and Takeshi Itoh described
the goals and planning for the First Rice Annotation Project Meeting (RAP1).
This annotation jamboree, to be held December 13 to 18 in Tsukuba, will be
attended by an international group representing genome and plant databases
as well as by rice geneticists and members of the IRGSP. RAP1 will be modeled
on the successful H-Invitational human annotation meetings.
Individual Chromosome Papers
Several scenarios were presented for the
publication of individual chromosome papers. One view presented was that
it would be difficult to publish these in high-impact journals without
some special features.
One idea was that all of the papers that
were ready at the time would be submitted with the paper to be prepared
as a result of RAP1. Another suggestion was that chromosomes 11 and 12
would be submitted together. Finally, as the chromosome 3 paper is in its
final stages of preparation, it might be submitted along with the joint
paper. An e-mail from Chris Gunter to Dick McCombie made this latter scenario
unlikely.
State of the rice genome - what has to be done?
Minimum Tiling Path Clones, November 28, 2004
Site | Total | Finished | Phase 1 | Phase 2
ASPGC | 270 | 266 | 3 | 1
CSHL | 113 | 110 | 0 | 3
Genoscope | 277 | 271 | 6 | 0
KRGRP | 15 | 11 | 3 | 1
NCGR | 293 | 275 | 18 | 0
PGIR | 44 | 43 | 1 | 0
TIGR | 386 | 369 | 14 | 3
RGP | 1810 | 1799 | 11 | 0
IRGS | 116 | 116 | 0 | 0
AGI | 132 | 132 | 0 | 0
BRIGI | 2 | 2 | 0 | 0
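The per-site counts in the table can be cross-checked mechanically. The short Python sketch below copies the table rows, confirms that finished + phase 1 + phase 2 equals the total for every site, and sums the columns. The names MTP_STATUS and summarize are hypothetical and used only for illustration. The column totals come to 3,458 clones (the figure used in the text), of which 64 were still at phase 1 or phase 2 on the table date.

```python
# Minimum tiling path (MTP) clone counts per site, copied from the table above.
# Columns: total, finished, phase 1, phase 2.
MTP_STATUS = {
    "ASPGC": (270, 266, 3, 1),
    "CSHL": (113, 110, 0, 3),
    "Genoscope": (277, 271, 6, 0),
    "KRGRP": (15, 11, 3, 1),
    "NCGR": (293, 275, 18, 0),
    "PGIR": (44, 43, 1, 0),
    "TIGR": (386, 369, 14, 3),
    "RGP": (1810, 1799, 11, 0),
    "IRGS": (116, 116, 0, 0),
    "AGI": (132, 132, 0, 0),
    "BRIGI": (2, 2, 0, 0),
}


def summarize(status):
    """Return column totals after checking that each row adds up."""
    totals = [0, 0, 0, 0]
    for site, row in status.items():
        total, finished, phase1, phase2 = row
        # Every clone should be counted exactly once: finished, phase 1 or phase 2.
        if finished + phase1 + phase2 != total:
            raise ValueError(f"{site}: row does not add up")
        totals = [t + x for t, x in zip(totals, row)]
    return dict(zip(("total", "finished", "phase1", "phase2"), totals))


if __name__ == "__main__":
    s = summarize(MTP_STATUS)
    print(s, "unfinished:", s["phase1"] + s["phase2"])
```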
It appears that about 50 of the 3458 minimum
tiling path clones will not be finished by the end of the year. Dick McCombie
suggested that all unfinished clones, plus all clones submitted as finished
but containing internal N's, be deposited in a common repository so that other
labs could attempt to finish them. He suggested that both trace files
and assemblies be deposited. The RGP has agreed to host this electronic
repository.
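Clones "submitted as finished with internal N's" can be flagged by scanning each sequence for runs of N that do not touch either end. The sketch below is a hypothetical helper, not a tool used by the IRGSP; it assumes the sequence is available as a plain string (for example, read from a FASTA record) and reports 0-based, half-open coordinates of each internal run.

```python
import re


def internal_n_runs(seq, min_len=1):
    """Return (start, end) 0-based, half-open coordinates of runs of N/n
    that lie strictly inside the sequence (not touching either end)."""
    runs = []
    for m in re.finditer(r"[Nn]+", seq):
        start, end = m.span()
        if end - start < min_len:
            continue  # ignore runs shorter than the requested minimum
        if start == 0 or end == len(seq):
            continue  # terminal runs are not "internal"
        runs.append((start, end))
    return runs


if __name__ == "__main__":
    print(internal_n_runs("ACGTNNNNACGT"))
```

A clone would be a candidate for the repository whenever this list is non-empty.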
Physical map - Jianzhong Wu
Jianzhong has identified all of the fosmid
clones that contain the telomere repeat. His group has used this fosmid library
to fill telomere gaps on seven chromosomes. He proposes to sequence the ends
of these clones and make the sequences available so that members might fill
other telomere gaps.
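The minutes do not say how the telomere-containing fosmids were identified. As one illustration, clone-end sequences can be screened for tandem arrays of the plant-type telomere repeat. The sketch below assumes the repeat unit is TTTAGGG (checked on both strands) and uses an arbitrary threshold of five consecutive copies; the function names and the threshold are hypothetical.

```python
import re

# Assumed plant-type telomere repeat unit; not stated in the minutes.
TELOMERE_UNIT = "TTTAGGG"


def revcomp(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTacgtNn", "TGCAtgcaNn"))[::-1]


def longest_telomere_array(seq, unit=TELOMERE_UNIT, min_copies=5):
    """Length (bp) of the longest tandem array of `unit` on either strand
    with at least `min_copies` consecutive copies, or 0 if there is none."""
    seq = seq.upper()
    best = 0
    for motif in (unit, revcomp(unit)):
        # e.g. "(?:TTTAGGG){5,}" matches five or more back-to-back copies
        pattern = re.compile(f"(?:{motif}){{{min_copies},}}")
        for m in pattern.finditer(seq):
            best = max(best, m.end() - m.start())
    return best


if __name__ == "__main__":
    end_read = "ACGT" + "TTTAGGG" * 6 + "ACGT"
    print(longest_telomere_array(end_read))
```

Clones whose end reads return a non-zero length would be candidates for filling telomere gaps.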
Jianzhong also proposed rebuilding the pseudomolecules based on the sequences
submitted by December 31.
Quality control - Dick McCombie
Dick had previously proposed checking the
quality of a sample of the submitted, finished IRGSP BACs/PACs.
1) Ten finished clones from each of the
seven groups doing finished sequencing will be chosen at random from
GenBank.
2) Multiple digests will be performed at
AGI on BAC DNA taken from the original collection rather than from the clone
used by the sequencing group, because of the difficulty of reimporting
clones into the US. Enzymes will be chosen based on the sizes of the fragments
predicted from the sequence. The sizes of the fragments in the
digests will be scored automatically and compared with in silico
digests.
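The comparison in step 2 can be illustrated with a minimal sketch: predict fragment sizes from the positions of an enzyme's recognition site in the finished sequence, then pair them largest-first with the sizes scored from the gel. The AGI scoring software is not described in the minutes, so everything below is an assumption: the sequence is treated as linear (a real BAC is circular), the enzyme site is palindromic so one strand suffices, and the 5% tolerance and 500 bp cutoff are arbitrary. EcoRI (G^AATTC, cut after the first base) is used only as an example.

```python
def digest(seq, site, cut_offset):
    """Predicted fragment sizes (bp, largest first) from cutting a *linear*
    sequence at every occurrence of a palindromic recognition `site`,
    `cut_offset` bases into the site."""
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


def fragments_agree(predicted, observed, rel_tol=0.05, min_size=500):
    """Pair predicted and observed sizes largest-first; each pair must differ
    by at most `rel_tol` of the predicted size. Fragments below `min_size`
    are ignored because they are hard to resolve on a gel."""
    pred = sorted((s for s in predicted if s >= min_size), reverse=True)
    obs = sorted((s for s in observed if s >= min_size), reverse=True)
    if len(pred) != len(obs):
        return False  # a missing or extra band is an immediate mismatch
    return all(abs(p - o) <= rel_tol * p for p, o in zip(pred, obs))


if __name__ == "__main__":
    clone = "A" * 1000 + "GAATTC" + "C" * 2000 + "GAATTC" + "T" * 500
    predicted = digest(clone, "GAATTC", 1)  # EcoRI cuts G^AATTC
    print(predicted, fragments_agree(predicted, [1980, 1020, 510]))
```

Using several enzymes per clone, as planned, makes it less likely that a misassembly happens to preserve every fragment size.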
Except for clones from one group, at least
90% of the clones passed this test.
For clones that did not pass, it was suggested
that the laboratories responsible for the finished sequences repeat
the digests on the questionable clones and send the electronic images to
Dick.
Dick originally thought of requesting trace
files for only three clones from each group to check assemblies. He subsequently
decided to request trace files for all clones tested from each
group. This request has been made, and many of the groups have already arranged
to transfer files.
In the meantime, the RGP is checking about
20 clones from CSHL/AGI.
Planning for the RAP1 annotation meeting - Takashi Gojobori and Takeshi Itoh
Takashi Gojobori described
the goals of RAP1 as follows:
- Coordination of rice genome and
full-length cDNA sequencing and annotation projects world-wide
- Manually curated annotation under
unified criteria (with possible reference to the TIGR semi-automatic
annotation)
- Creation of a database and tools
to present an integrated view to the wider scientific public
- Promotion of scientific findings through
annotation and database construction
Dr. Gojobori described the involvement
of the DDBJ in the annotation of the mouse genome based on full-length
cDNAs, published in Nature (2001) 409: 685-690 and Nature (2002) 420: 563-573.
The DDBJ was instrumental in organizing the Human Full-Length cDNA Annotation
Invitational (H-Invitational) meetings, also based on full-length
cDNAs, with the participation of 40 international organizations.
The two meetings resulted in publications in Nature (2002) 419: 3-4 and
PLoS Biology (2004) 2: 1-21. Among the interesting findings of the meetings
were:
- 21,037 human genes were annotated,
of which 5,155 were unique to H-Inv.
- In most cases, first introns and
last exons were found to be longer than previously described.
- 847 cDNA clusters could not be completely mapped to the human genome,
suggesting that 4% of the current human genome sequence was incomplete
or incorrectly assembled.
- An experimentally validated alternative splicing dataset of 8,553 isoforms
encoded by 3,181 loci was defined. In 55% of cases, the alternatively used
exons were found to contain different functional domains.
- Non-protein-coding genes accounted
for 6.5% (1,371 loci); of these, 296 were classified as putative non-coding
RNAs.
Takashi Gojobori emphasized that the
first three days of the initial H-Invitational meeting were devoted to
establishing unified criteria.
Takeshi Itoh described
the preparation and planning for RAP1. The primary goals of the meeting
are:
- Construction of a fully curated
database of all rice genes and the entire genome, based on manual curation
of automated annotations.
- Development of collaborations between
geneticists/breeders and bioinformaticians
In preparation for the international
meeting in Tsukuba, the pseudomolecules from build 3 were masked with the
TIGR repeat database, and automated annotation was performed with Fgenesh,
Genscan (maize and Arabidopsis models), and GLocate, yielding 69,002 predicted
models. A total of 20,545 full-length cDNAs (90% of the clusters) were mapped
to the pseudomolecules. 18,790 of these mapped full-length cDNAs matched
predicted models, and an additional 3,897 models matched ESTs, for a total
of 26,714 gene models with transcriptional support.
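The minutes do not give the criteria used to decide that a cDNA or EST "matched" a predicted model. A simple stand-in, assumed here, is same-strand overlap of at least one base on the same pseudomolecule. The hypothetical sketch below finds the models supported by each kind of evidence and combines them with a set union, so that a model backed by both a cDNA and an EST is counted only once. It compares every model with every alignment, which is fine for illustration; a genome-scale version would sort intervals or use an interval tree.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A gene model or transcript alignment (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the intervals share at least one base on the same strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def supported_models(models, alignments):
    """Indices of models overlapped by at least one alignment (brute force)."""
    return {i for i, m in enumerate(models)
            if any(overlaps(m, a) for a in alignments)}


if __name__ == "__main__":
    models = [Interval("chr1", 100, 500, "+"), Interval("chr1", 1000, 2000, "+")]
    cdnas = [Interval("chr1", 400, 800, "+")]
    ests = [Interval("chr1", 450, 460, "+")]
    # Union: a model supported by both kinds of evidence is counted once.
    print(len(supported_models(models, cdnas) | supported_models(models, ests)))
```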
In other preparatory activities, cDNAs
and predicted coding genes will be subjected to BLAST searches against DNA
and protein databases and to InterProScan searches. Other activities include
mapping of FSTs and BAC ends of other Oryza spp. to the pseudomolecules,
identification of non-coding RNAs, genome-wide alignment between japonica
and indica (93-11), and identification of orthologs among the genes of
other monocots and Arabidopsis.
The results of the preliminary annotation
are available at https://www.jbirc.aist.go.jp/intedb/h-inv/scargot/index.jsp.
The results of the RAP1 annotation will be released to UniProt, and the database
will be made public in conjunction with publication of the work. Two subsequent
RAP meetings are envisioned, based on feedback from researchers and other
developments.
The following suggestions were made by
IRGSP members:
- An electronic discussion should
be conducted with international participants to reach agreement on criteria
in advance of the meeting in Tsukuba. As a basis for this discussion, we
will communicate the IRGSP annotation standards and the gene number standards
proposed by TIGR and Gramene.
- Additional rice peptide information
(http://proteomics.arl.arizona.edu/research.html), as well as cross-species
peptide information, should be considered to validate hypothetical genes.
- Massively Parallel Signature Sequencing
(MPSS) data from the University of Delaware (http://mpss.udel.edu/rice/)
should be utilized.
- Robin Buell suggested the use of the
Program to Assemble Spliced Alignments (PASA), developed at TIGR, which uses
experimental evidence to update gene models.
Proposed discussion topics to be covered at RAP1
Database issues
RAP database structure
Submission to DDBJ, NCBI, EMBL
Links to other databases: Gramene, TIGR, Rice Proteome DB
GO issues
Nomenclature
Assigning gene numbers
Agreement on names
What to do with improperly named genes
Coordination with other databases
Transposable Element Annotation and TE Models
Simultaneous updates of databases
Distributed Annotation by Scientists
Comparative genomics
Comparison with other Oryza species
Comparison with Arabidopsis
Functional domains
Species-specific comparison of ESTs against the rice genome
Functional Genomics
Insertional element flanking sequences
Mutations
SNPs?
Experimental genomics to validate or discover gene models
Whole genome arrays
SAGE
Non-coding RNAs
Proteomics
Full-length cDNA validation
Tentative agenda for the IRGSP members who will be attending
Thursday, December 16
9:00 am Registration of Researchers
(IRGSP Tutorial for non-IRGSP attendees)
10:00 am Discuss Topics
12:00 pm Lunch
1:00 pm Break up into interest groups
7:00 pm Reception
Friday, December 17
9:00 am Interim Report of the Genome Annotation
10:00 am Database Release and Publication Policy
11:00 am Preliminary reports of interest groups
12:00 pm Lunch
1:00 pm Reassemble Breakout Groups
Saturday, December 18
9:00 am Summary and Consensus on Specific Issues
12:00 pm Lunch
1:00 pm Perspective of the Rice Genome Annotation
2:00 pm Feedback and Conclusion
3:00 pm End of Meeting
Announcement of Completion
Takuji Sasaki will give members advance notice of the formal announcement
of the completion of the genome so that the announcement can be coordinated
internationally. It will be made either the week before RAP1 or during
RAP1, and will coincide with a visit to the Minister of MAFF to make the announcement.
Posted December 3, 2004 by B. Burr and
T. Sasaki