The bigGenePred format stores annotation items that are a linked collection of exons, much as BED files indexed as bigBeds do, but bigGenePred has additional information about the coding frames and other gene-specific information in eight additional fields.
bigGenePred files are created using the program bedToBigBed with a special AutoSQL file that
defines the fields of the bigGenePred. The resulting bigBed files are in an indexed binary
format. The main advantage of the bigBed files is that only the portions of the files needed to display
a particular region are transferred to UCSC. So for large data sets, bigBed is considerably faster
than regular BED files. The bigBed file remains on your web-accessible server (http, https, or ftp),
not on the UCSC server. Only the portion that is needed for the chromosomal position you are
currently viewing is locally cached as a "sparse file".
The following autoSQL definition is used for bigGenePred gene prediction files. This is the
bigGenePred.as file passed to bedToBigBed with the -as= option. See this bed12+8 file for an
example of bigGenePred input. In alternative-splicing situations, each transcript has its own row.
table bigGenePred
"bigGenePred gene models"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name or ID of item, ideally both human readable and unique"
uint score; "Score (0-1000)"
char[1] strand; "+ or - for strand"
uint thickStart; "Start of where display should be thick (start codon)"
uint thickEnd; "End of where display should be thick (stop codon)"
uint reserved; "RGB value (use R,G,B string in input file)"
int blockCount; "Number of blocks"
int[blockCount] blockSizes; "Comma separated list of block sizes"
int[blockCount] chromStarts; "Start positions relative to chromStart"
string name2; "Alternative/human readable name"
string cdsStartStat; "enum('none','unk','incmpl','cmpl')"
string cdsEndStat; "enum('none','unk','incmpl','cmpl')"
int[blockCount] exonFrames; "Exon frame {0,1,2}, or -1 if no frame for exon"
string type; "Transcript type"
string geneName; "Primary identifier for gene"
string geneName2; "Alternative/human readable gene name"
string geneType; "Gene type"
)
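To make the field layout concrete, here is a minimal Python sketch (not a UCSC tool) that splits one tab-separated bed12+8 line into the twenty fields named in bigGenePred.as and runs a few sanity checks. The function name parse_big_gene_pred and the particular checks are illustrative assumptions; bedToBigBed performs its own validation, which may differ.

```python
# Field names in the order given by the bigGenePred.as definition above.
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
    "type", "geneName", "geneName2", "geneType",
]
UINT_FIELDS = {"chromStart", "chromEnd", "score", "thickStart", "thickEnd"}
LIST_FIELDS = {"blockSizes", "chromStarts", "exonFrames"}  # int[blockCount]
CDS_STATUS = {"none", "unk", "incmpl", "cmpl"}


def parse_big_gene_pred(line):
    """Split one tab-separated bed12+8 line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    rec = dict(zip(FIELDS, values))
    for f in UINT_FIELDS:
        rec[f] = int(rec[f])
    rec["blockCount"] = int(rec["blockCount"])
    for f in LIST_FIELDS:
        # Comma-separated lists often end with a trailing comma, e.g. "300,400,".
        rec[f] = [int(x) for x in rec[f].split(",") if x != ""]
        if len(rec[f]) != rec["blockCount"]:
            raise ValueError(f"{f} has {len(rec[f])} entries, blockCount is {rec['blockCount']}")
    if rec["strand"] not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    for f in ("cdsStartStat", "cdsEndStat"):
        if rec[f] not in CDS_STATUS:
            raise ValueError(f"{f} must be one of {sorted(CDS_STATUS)}")
    if any(fr not in (-1, 0, 1, 2) for fr in rec["exonFrames"]):
        raise ValueError("exonFrames values must be -1, 0, 1, or 2")
    return rec
```

For a two-exon transcript on chr1 with blockSizes "300,400," and exonFrames "0,2,", the returned dict holds those lists as [300, 400] and [0, 2], with the coordinate fields converted to integers.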
Note that the bedToBigBed utility uses a substantial amount of memory: approximately
25% more RAM than the uncompressed BED input file.
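As a rough worked example of that figure, a 4 GB uncompressed input would call for on the order of 5 GB of RAM. The hypothetical helpers below simply apply the approximate 25% overhead to a byte count or to a file on disk; real memory use will vary with the data.

```python
import os

RAM_OVERHEAD = 1.25  # approximately 25% more RAM than the uncompressed BED input


def estimated_ram_bytes(bed_size_bytes):
    """Rough RAM estimate for bedToBigBed given an uncompressed BED size in bytes."""
    return round(bed_size_bytes * RAM_OVERHEAD)


def estimated_ram_for_file(path):
    """Apply the same estimate to an input file on disk."""
    return estimated_ram_bytes(os.path.getsize(path))
```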
To create a bigGenePred track, follow these steps:
1. Create a bed12+8 bigGenePred format file that has the first twelve fields described by a
normal BED file as described here. (You can also read about genePred here.) The remaining eight
fields are: name2, cdsStartStat, cdsEndStat, exonFrames, type, geneName, geneName2, geneType.
Sort the file by chromosome and then by start position:
sort -k1,1 -k2,2n unsorted.bed > input.bed
2. Download the bedToBigBed program from the directory of binary utilities.
3. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for
the UCSC database you are working with (e.g. hg38).
4. Create the bigGenePred file from your input file using the bedToBigBed utility:
bedToBigBed -as=bigGenePred.as -type=bed12+8 bigGenePred.txt chrom.sizes myBigGenePred.bb
5. Move the newly created bigGenePred file (myBigGenePred.bb) to an http, https, or ftp location.
6. Construct a custom track using a single track line:
track type=bigGenePred name="My Big GenePred" description="A Gene Set Built from Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigGenePred.bb
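The sort command in step 1 orders rows by chromosome name and then numerically by chromStart. The Python sketch below reproduces that ordering for illustration. It assumes tab-separated input and byte-wise chromosome ordering, as with LC_ALL=C sort, and the function names are hypothetical. Numeric ordering places a start of 20 before 100, which a plain text sort would not.

```python
def bed_sort_key(line):
    """Sort key: chromosome name (byte-wise), then chromStart as an integer."""
    chrom, start = line.split("\t", 2)[:2]
    return (chrom, int(start))


def sort_bed_lines(lines):
    """Return non-blank BED lines ordered like `sort -k1,1 -k2,2n` in the C locale.

    Rows with equal chrom and chromStart keep their input order (Python's sort
    is stable), whereas GNU sort would break such ties by comparing whole lines.
    """
    return sorted((ln for ln in lines if ln.strip()), key=bed_sort_key)
```

To sort a file, read unsorted.bed with readlines(), pass the lines through sort_bed_lines, and write the result to input.bed.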
The bedToBigBed program can also be run with several additional options. A full list of the
available options can be seen by running bedToBigBed with no arguments to display the usage message.
Example #1
In this example, you will use an existing bigGenePred file to create a bigGenePred custom track. A bigGenePred file that contains data on the hg38 assembly has been placed on our http server. You can create a custom track using this bigGenePred file by constructing a "track" line that references this file:
track type=bigGenePred name="bigGenePred Example One"
description="A bigGenePred file"
bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
Paste the above "track" line into the custom track management page for the human assembly hg38 (Dec. 2013), then press the submit button. Note that line breaks have been inserted into the above track line for readability; they must be removed for the example to work correctly in the browser.
Custom tracks can also be loaded via one URL line. The link below loads the same bigGenePred track, but includes parameters on the URL line (line break inserted for readability):
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=track%20type=bigGenePred
%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
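The URL above is the hgTracks address, with the assembly given in db and the percent-encoded track line given in hgct_customText. Here is a minimal Python sketch of assembling such a URL with urllib.parse.quote. Like the example, it leaves '=', ':' and '/' unescaped. The helper name custom_track_url is hypothetical.

```python
from urllib.parse import quote

HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def custom_track_url(db, track_line):
    """Build an hgTracks URL that loads track_line as a custom track on assembly db."""
    # Encode spaces (as %20), quotes, '&', etc., but keep '=', ':' and '/' readable
    # to match the style of the example URL above.
    encoded = quote(track_line, safe="=:/")
    return f"{HGTRACKS}?db={quote(db)}&hgct_customText={encoded}"
```

Calling custom_track_url("hg38", "track type=bigGenePred name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb") yields the same link as above, without the inserted line break.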
With this example bigGenePred loaded, click on a gene in the track. Note that the details page
has a "Links to sequence:" section that includes "Translated Protein",
"Predicted mRNA", and "Genomic Sequence" links. Click the "Go to ... track
controls" link. There, change the "Color track by codons:" option from "OFF"
to "genomic codons", make sure "Display mode:" is "full", and then click
"Submit". Then zoom to a region where amino acids display, such as
chr9:133,255,650-133,255,700, and see how bigGenePred allows the display of codons.
Click back into the track controls page and click the box next to "Show codon numbering".
Return to the browser to see amino acid numbering.
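Correct codon coloring relies on reading-frame information, which bigGenePred carries per exon in the exonFrames field. The Python sketch below shows one way to derive exonFrames from the BED12 block fields. It assumes the convention used by UCSC genePred tools as we understand it: exons are walked in transcription order (ascending for "+", descending for "-"), each exon overlapping the CDS (thickStart to thickEnd) gets the number of CDS bases seen so far modulo 3, and exons with no CDS get -1. The helper exon_frames is hypothetical and is meant for checking your own input files, not as the browser's code.

```python
def exon_frames(chrom_start, block_sizes, block_starts, thick_start, thick_end, strand):
    """Compute exonFrames (0, 1, 2, or -1 per exon) from BED12-style fields."""
    # Absolute [start, end) coordinates of each exon.
    exons = [(chrom_start + s, chrom_start + s + size)
             for s, size in zip(block_starts, block_sizes)]
    frames = [-1] * len(exons)  # -1 means the exon has no CDS
    order = range(len(exons)) if strand == "+" else range(len(exons) - 1, -1, -1)
    cds_bases = 0  # CDS bases already seen, in transcription order
    for i in order:
        start = max(exons[i][0], thick_start)
        end = min(exons[i][1], thick_end)
        if start < end:  # this exon overlaps the CDS
            frames[i] = cds_bases % 3
            cds_bases += end - start
    return frames
```

For example, exons of 300 and 400 bases starting at chr1:1000 and 1600 with thickStart 1100 and thickEnd 1900 on the "+" strand give [0, 2], because the first exon contributes 200 CDS bases.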
You can also add a parameter in the custom track line, baseColorDefault=genomicCodons,
to set the display of codons:
browser position chr10:67,884,600-67,884,900
track type=bigGenePred baseColorDefault=genomicCodons name="bigGenePred Example Two"
description="A bigGenePred file" visibility=pack
bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
Paste the above into the hg38 custom track page (removing the line breaks within the track line, while keeping the browser line on its own line) to see an example of bigGenePred amino acid display around the beginning of the gene SIRT1 on chromosome 10.
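Because each track line must end up on a single line, it can be easier to generate it than to hand-edit wrapped text. This Python sketch joins attributes into one line and wraps values that contain whitespace in double quotes, as the examples on this page do. The helper track_line is hypothetical, and it does not escape values that themselves contain double quotes.

```python
def track_line(**attrs):
    """Build a single-line custom track definition from keyword attributes."""
    parts = ["track"]
    for key, value in attrs.items():  # keyword order is preserved
        if any(c.isspace() for c in value):
            value = f'"{value}"'  # quote values with spaces, e.g. name="My Big GenePred"
        parts.append(f"{key}={value}")
    return " ".join(parts)


custom_text = "\n".join([
    "browser position chr10:67,884,600-67,884,900",  # browser lines stay separate
    track_line(type="bigGenePred", baseColorDefault="genomicCodons",
               name="bigGenePred Example Two", description="A bigGenePred file",
               visibility="pack",
               bigDataUrl="http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb"),
])
```

The resulting custom_text has exactly two lines, the browser line and the track line, and can be pasted as-is.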
Example #2
In this example, you will create your own bigGenePred file from an existing bigGenePred input file.
1. Save the bed12+8 bigGenePred.txt example input file to your machine (satisfies above step 1).
2. Download the bedToBigBed utility (step 2).
3. Save the hg38.chrom.sizes text file to your machine. It contains the chrom.sizes for the human (hg38) assembly (step 3).
4. Save the bigGenePred.as text file to your machine.
5. Run the bedToBigBed utility to create the bigGenePred output file (step 4):
bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as bigGenePred.txt hg38.chrom.sizes bigGenePred.bb
6. Place the bigGenePred file you just created (bigGenePred.bb) on a web-accessible server (step 5).
Note the description in Example #1 above on how to view genomic codons, including numbering.
If you would like to share your bigGenePred data track with a colleague, learn how to create a URL by looking at Example 11 on this page.
Since the bigGenePred files are an extension of bigBed files, which are indexed binary files, they can be difficult to extract data from. We have developed the following programs, all of which are available from the directory of binary utilities.
bigBedToBed: this program converts a bigBed file to ASCII BED format.
bigBedSummary: this program extracts summary information from a bigBed file.
bigBedInfo: this program prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name at the command line with no parameters to see the usage statement.
If you encounter an error when you run the bedToBigBed program, it may be because your
input bigGenePred file has data off the end of a chromosome. In this case, run the bedClip
program on your input file before running bedToBigBed. It will remove the row(s) in your
input BED file that are off the end of a chromosome.
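For illustration, the filtering described above can be approximated in Python: read chrom.sizes into a dictionary and keep only rows with 0 <= chromStart <= chromEnd <= chromosome size. This sketches the idea and is not bedClip's implementation. It assumes tab-separated input, and it also drops rows on chromosomes missing from chrom.sizes, which bedClip may handle differently. The function names are hypothetical.

```python
def read_chrom_sizes(lines):
    """Parse chrom.sizes text: one 'chrom<TAB>size' pair per line."""
    sizes = {}
    for line in lines:
        if line.strip():
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes


def clip_rows(bed_lines, chrom_sizes):
    """Keep only BED rows that lie entirely within a known chromosome."""
    kept = []
    for line in bed_lines:
        if not line.strip():
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)  # None if the chromosome is unknown
        if size is not None and 0 <= start <= end <= size:
            kept.append(line)
    return kept
```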