Annotation databases
MAJIQ requires an annotation of the transcriptome as input and currently supports only the GFF3 format.
The GFF3 annotation files used in the MAJIQ paper for mouse and human can be downloaded here:
Mouse (Ensembl, mm10 build), Human (Ensembl, hg19 build).
If you would like to use a different annotation file, you can use the following tool to
convert a GTF file (e.g. one downloaded from Ensembl) to the GFF3 format that MAJIQ accepts.
It is important to note that MAJIQ makes some assumptions when parsing the hierarchical GFF3 file and has some specific requirements:
- At the top level, we only consider sequence features whose type (column 3) is “gene”.
- For every gene, we only consider isoforms with a type of “mRNA” or “transcript”.
- All entries (except for genes) should have a Parent attribute.
- All genes should have a unique ID attribute.
- Within a gene, all entries should have a unique ID attribute.
- A gene can have a Name attribute; otherwise, the ID is used in the output instead.
Keep these requirements in mind when deciding which types of transcripts you want to analyze; you may need to modify your GFF3 annotation file to meet them.
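The requirements above can be checked before running MAJIQ. The following is a minimal sketch of such a check in Python, not MAJIQ's own parser: the function names `parse_attributes` and `check_gff3` are hypothetical, and the script makes several simplifying assumptions. It assumes parents appear before their children in the file, follows only the first Parent when several are listed, and reads "unique ID" strictly, so multi-line features that share one ID (which the GFF3 format itself permits) are reported.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse GFF3 column 9 ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3(lines):
    """Return a list of (line_number, message) for entries that break the rules above.

    Hypothetical helper for illustration; it is not part of MAJIQ.
    """
    problems = []
    gene_ids = set()                 # IDs of all genes seen so far
    root_of = {}                     # feature ID -> ID of the gene it descends from
    ids_in_gene = defaultdict(set)   # gene ID -> IDs already used inside that gene

    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break                    # sequence data follows; no more features
        if not line.strip() or line.startswith("#"):
            continue                 # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((n, "expected 9 tab-separated columns"))
            continue

        ftype = cols[2]
        attrs = parse_attributes(cols[8])
        fid = attrs.get("ID")

        if ftype == "gene":
            if fid is None:
                problems.append((n, "gene has no ID attribute"))
            elif fid in gene_ids:
                problems.append((n, f"gene ID '{fid}' is not unique"))
            else:
                gene_ids.add(fid)
                root_of[fid] = fid
                ids_in_gene[fid].add(fid)
            continue

        parent = attrs.get("Parent")
        if parent is None:
            problems.append((n, f"{ftype} entry has no Parent attribute"))
            continue
        if fid is None:
            problems.append((n, f"{ftype} entry has no ID attribute"))
            continue

        # Assumption: only the first listed parent is followed.
        root = root_of.get(parent.split(",")[0])
        if root is None:
            continue                 # parent not seen yet, or not under a gene
        if fid in ids_in_gene[root]:
            problems.append((n, f"ID '{fid}' is repeated within gene '{root}'"))
        else:
            ids_in_gene[root].add(fid)
            root_of[fid] = root

    return problems
```

To use it on a file, pass an open file handle, for example `check_gff3(open("annotation.gff3"))` where the file name is a placeholder, and print each returned line number and message. An empty list means no violations were found under the assumptions stated above.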