By default, PlantIGB accesses the PlantQuickload data web site hosted at http://www.bioviz.org/plant_quickload.
The PlantQuickload Web site is just a set of directories (folders) on
our server that IGB can access and load via the Internet. If you are
a computational biologist interested in doing data-mining experiments
with Arabidopsis data, the files in these directories may be very
useful. Be sure to read the README.html file(s) for notes on formats
and other topics. If you have questions, send us an email:
aloraine@uab.edu.
The data files currently available in PlantQuickload include the following:
| IGB menu name | Description |
|---|---|
| TAIR7_protein_coding_gene | genes encoding proteins |
| TAIR7_mirna | genes encoding microRNAs (example: AT4G05105.1) |
| TAIR7_rrna | genes encoding ribosomal RNAs |
| TAIR7_pre-trna | genes encoding tRNAs |
| TAIR7_snorna | genes encoding small nucleolar RNAs (example: AT4G13245.1) |
| TAIR7_pseudogene | pseudogenes (example: AT5G20800.1) |
| TAIR7_snrna | genes encoding small nuclear RNAs (example: AT5G09585.1) |
| TAIR7_other_rna | genes encoding other types of RNAs not in the previously listed categories, such as potential natural antisense genes (example: AT5G40348.1) |
All of these data are from the file named TAIR7_GFF, available from The Arabidopsis Information Resource (TAIR) ftp site and downloaded in April 2007.
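If you want to see how the annotations in a GFF file break down by type before loading them, a short script can tally the feature-type column. The following is a minimal sketch, assuming the file follows the usual GFF layout of nine tab-separated columns with the feature type in the third column. The function name `count_feature_types` is our own, and the actual type names used in the TAIR file may differ from the IGB menu names listed above.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally GFF records by the value in the feature-type column (column 3)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines such as "##gff-version".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Assumption: a well-formed GFF record has nine tab-separated columns.
        if len(cols) < 9:
            continue
        counts[cols[2]] += 1
    return counts
```

To use it on a downloaded copy of the file, pass an open file handle, for example `count_feature_types(open("TAIR7_GFF"))`, and print the resulting counts.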
To load data into IGB, click the checkboxes under the Data Access tab. Each data set will appear in a separate track. To find out more about a particular annotation, right-click the annotation (or control-click on Mac) and select the arabidopsis.org option, which should tell your Web browser to open the corresponding locus page at TAIR. The new page should tell you which category ("Gene Model Type") the gene belongs to; this should match the tier label in IGB.
TAIR version 7 EST and cDNA alignments
These datasets include the following:
| IGB label | Description |
|---|---|
| EST_TAIR7mm | ESTs that align reasonably well to more than one location in the genome |
| EST_TAIR7sm | ESTs that align to just one location in the genome |
| cDNA_TAIR7sm | full-length cDNA sequences that align to a single location in the genome |
| cDNA_TAIR7mm | cDNAs that align reasonably well to more than one location in the genome |
These data represent genomic alignments for Arabidopsis ESTs and cDNA sequences provided by TAIR; they correspond to the "Transcripts" track in the TAIR SeqViewer tool.
We've divided them into four different tracks for display in IGB. Please note that the EST_TAIR7sm data set is quite large and may take more time to load than the other data sets.
Please note also that there are a variety of methods available for aligning expressed sequences to genomic sequence, and they do not all operate in the same way or produce the same answers for every sequence.
To find out more about the computational pipeline that generated these alignments, visit the Genome Annotation page at TAIR, which describes how the alignment pipeline operates.
These data are from the TAIR ftp site (see: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR7_pre-release/TAIR7_Transcripts_by_map_position). They correspond to the "transcripts" tier in the TAIR on-line seqviewer (genome browser) tool.
If you would like to work with the source data file from TAIR, you will need to know what the different fields represent:
Fields:
0 Locus - AGI locus code
1 Locus_orientation_is_5 - gene orientation relative to genomic sequence
2 Genbank_acc - Genbank accession
3 external_id - Genbank gi number
4 Type(1=cDNA_2=EST)
5 Chromosome number (6 is chloroplast, 7 is mitochondrion)
6 Transcript_orientation_is_5 - transcript orientation relative to genomic sequence
Note that a 3-prime EST will likely appear on the opposite
strand from the associated gene.
7 Map_start_coordinate - one-based
8 Map_end_coordinate - one-based
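The field list above can be turned into a small reader. This is only a sketch: it assumes the source file is tab-delimited with one alignment per line, which the field list does not state. It leaves the two orientation fields as raw strings because their encoding is not described here. The names `TranscriptRecord` and `parse_transcripts`, and the `chrC`/`chrM` labels (borrowed from the IGB file names), are our own choices.

```python
from typing import Iterable, Iterator, NamedTuple


class TranscriptRecord(NamedTuple):
    locus: str                        # field 0: AGI locus code
    locus_orientation_is_5: str       # field 1: kept as the raw value
    genbank_acc: str                  # field 2: Genbank accession
    external_id: str                  # field 3: Genbank gi number
    seq_type: str                     # field 4: "cDNA" or "EST"
    chromosome: str                   # field 5: "chr1".."chr5", "chrC", "chrM"
    transcript_orientation_is_5: str  # field 6: kept as the raw value
    map_start: int                    # field 7: one-based
    map_end: int                      # field 8: one-based


SEQ_TYPES = {"1": "cDNA", "2": "EST"}
ORGANELLES = {"6": "C", "7": "M"}  # 6 is chloroplast, 7 is mitochondrion


def parse_transcripts(lines: Iterable[str]) -> Iterator[TranscriptRecord]:
    """Yield one record per data line; assumes tab-delimited input."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        # Skip blank lines, header rows, and anything without numeric coordinates.
        if len(f) < 9 or not (f[7].isdigit() and f[8].isdigit()):
            continue
        yield TranscriptRecord(
            locus=f[0],
            locus_orientation_is_5=f[1],
            genbank_acc=f[2],
            external_id=f[3],
            seq_type=SEQ_TYPES.get(f[4], f[4]),
            chromosome="chr" + ORGANELLES.get(f[5], f[5]),
            transcript_orientation_is_5=f[6],
            map_start=int(f[7]),
            map_end=int(f[8]),
        )
```

Because the coordinates are one-based, subtract one from `map_start` if you convert the records to a zero-based, half-open format.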
For visualization in IGB, we subdivided the EST and cDNA alignment annotations into two new annotation subsets based on the number of map positions TAIR has reported for a single expressed sequence:
IGB menu name: EST_TAIR7mm - TAIR7 ESTs that map to more than one location in the genome. (The suffix "mm" stands for multi-mapper.)
IGB menu name: EST_TAIR7sm - TAIR7 ESTs that map to just one location in the genome. (The suffix "sm" stands for single-mapper.)
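The single-mapper and multi-mapper split described above can be sketched as follows. This is an illustration rather than the script we actually used: it assumes each input row reports one map position for one accession, and it treats repeated identical rows as the same position. The function name `split_by_map_count` and the row layout `(accession, chromosome, start, end)` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Position = Tuple[str, int, int]    # (chromosome, start, end)
Row = Tuple[str, str, int, int]    # (accession, chromosome, start, end)


def split_by_map_count(
    rows: Iterable[Row],
) -> Tuple[Dict[str, List[Position]], Dict[str, List[Position]]]:
    """Return (single_mappers, multi_mappers), keyed by accession."""
    positions: Dict[str, Set[Position]] = defaultdict(set)
    for acc, chrom, start, end in rows:
        # A set collapses exact duplicate rows into one map position.
        positions[acc].add((chrom, start, end))
    single = {a: sorted(p) for a, p in positions.items() if len(p) == 1}
    multi = {a: sorted(p) for a, p in positions.items() if len(p) > 1}
    return single, multi
```

The first dictionary corresponds to the "sm" tracks and the second to the "mm" tracks.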
These data are from a file named EST-2006-12-19.txt (dated 12-Mar-2007 12:54, 14M), which is from: http://natural.salk.edu/database/transcriptome/ hosted at the Salk Institute.
ATH1 probeset-to-genome alignments from Affymetrix
This data set contains ATH1 probeset-to-genome alignments from
Affymetrix. Note that numerous probe sets align to the genome in
multiple locations. We have not yet done any quality-testing or
screening to sort out why this is the case, but hope to do so in
future.
In IGB, probes will appear as light-colored bars superimposed on the
genomic alignment of the original "design" sequence, which is
the sequence provided to Affymetrix that represents an intended target
transcript for interrogation on the array. To find out more
about how the ATH1 array was designed, see:
Note that probes typically occupy positions near the three-prime end of the design sequences, with some exceptions.
The data shown in IGB are from a data file provided by Affymetrix.
The file from Affymetrix uses a data representation format
that captures gaps or insertions in the design sequence relative to
the genomic sequence. When you examine the probe set alignments in
IGB, you may see immediately adjacent or overlapping
blocks in some probe sets. This means that a portion of the design
sequence is missing in the genomic sequence and corresponds to
a gap in the genomic sequence relative to the design sequence.
Note also that some probes overlap with each other;
this is quite common.
You can obtain a copy of the ATH1 probe set
alignments from the "Support" section of the Affymetrix
Web site. It is likely to be identical to the version posted here.
The TAIR version 6 annotations were generated from the sequence viewer
data files on the TAIR ftp site. We subdivided the annotations into two
datasets: annotations that included an open reading frame (TAIRv6prot)
and annotations that did not (TAIRv6non-coding).
The TAIR version 5 annotations were generated in the same way, using the
sequence viewer data files on the TAIR ftp site. We subdivided the
annotations into two datasets: annotations that included an open reading
frame (TAIRv5prot) and annotations that did not (TAIRv5noncoding).
Genomic sequence
The sequence data are from:
ftp://ftp.arabidopsis.org//home/tair/home/tair/Sequences/whole_chromosomes
and are identical to the Genbank versions listed below, except for the mitochondrial sequence file, which differed in length by one base.
Sequence data files
| chromosome | TAIR sequence file | Genbank equivalent | Size (bp) | IGB .bnib file |
|---|---|---|---|---|
| 1 | ATH1_chr1.1con.01222004 | NC_003070.5 | 30432563 | chr1.bnib |
| 2 | ATH1_chr2.1con.01222004 | NC_003071.3 | 19705359 | chr2.bnib |
| 3 | ATH1_chr3.1con.01222004 | NC_003074.4 | 23470805 | chr3.bnib |
| 4 | ATH1_chr4.1con.01222004 | NC_003075.3 | 18585042 | chr4.bnib |
| 5 | ATH1_chr5.1con.04172003 | NC_003076.4 | 26992728 | chr5.bnib |
| chloroplast | ATH1_chloroplast.1con.01072002 | NC_000932.1 | 154478 | chrC.bnib |
| mitochondrion | ATH1_mitochondria.1con.01072002 | Y08501.2 | 366923 | chrM.bnib |
The IGB "bnib" files are compressed versions of the sequence data files. They are compressed to reduce the time they take to load when you click the "Load all sequence" button under the "Data Access" tab in IGB.
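If you download the TAIR sequence files yourself, you may want to confirm that their lengths match the sizes in the table above. The sketch below assumes the sequence files are in FASTA format, which is not stated here, and it does not read .bnib files. The names `EXPECTED_SIZES`, `fasta_lengths`, and `size_matches` are our own, and the `chr` labels follow the IGB file names.

```python
from typing import Dict, Iterable

# Sizes (bp) copied from the "Sequence data files" table.
EXPECTED_SIZES: Dict[str, int] = {
    "chr1": 30432563,
    "chr2": 19705359,
    "chr3": 23470805,
    "chr4": 18585042,
    "chr5": 26992728,
    "chrC": 154478,
    "chrM": 366923,
}


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the residue count for each record in FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def size_matches(label: str, length: int) -> bool:
    """Compare a measured length with the table entry for an IGB label."""
    return EXPECTED_SIZES.get(label) == length
```

Since the header line in a TAIR file may not use the `chr1`-style labels, pass the label explicitly, for example `size_matches("chr1", n)` where `n` is the length reported by `fasta_lengths` for the chromosome 1 file.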