Reference no: EM1389771
QUESTION 1
For each of the following tasks you'll use the 'xxxxxxxx' database. You need only provide the query or command you used within MySQL for each task.
a. How many columns are in the cv table? (The command here doesn't need to return a number; it only needs to show the columns so you can count them.)
b. Which ID (cv_id) corresponds to the GO ontology stored in the database?
c. How many controlled vocabulary terms (cvterm table) are linked to the GO ontology?
d. How many entries in the feature table are linked to any of the GO terms? (See the feature_cvterm linking table.)
QUESTION 2
The GFF3 format is commonly used in bioinformatics for representing sequence annotation. You can find the specification here:
sequenceontology.org/gff3.shtml
Opened in the vi editor, the top of a GFF3 file looks like this:
______________________________________________________________________________
##gff-version 3
#date Tue Feb 8 19:50:12 2011
#
# Saccharomyces cerevisiae S288C genome
#
# Features from the 16 nuclear chromosomes labeled chrI to chrXVI,
# plus the mitochondrial genome labeled chrMito and the 2-micron plasmid.
#
# Created by Saccharomyces Genome Database
#
# Weekly updates of this file are available via Anonymous FTP from:
# ftp.yeastgenome.org/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
#
#
____________________________________________________________________________
Within the feature table (the tab-delimited rows that follow the header), another column of note is the 9th, which stores any key=value pairs relevant to that row's feature, such as ID, Ontology_term or Note.
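To make the layout of column 9 concrete, here is a minimal sketch of how its key=value pairs might be parsed. It is written in Python purely for illustration (the task itself asks for a Perl script). The function name `parse_attributes` and the sample row are hypothetical. The sketch assumes the GFF3 conventions that pairs are separated by semicolons, multiple values for one key by commas, and special characters are percent-encoded.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn a GFF3 attributes string into a dict mapping key -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing or doubled semicolon
        key, _, value = pair.partition("=")
        # One key may carry several comma-separated values;
        # each value may contain percent-encoded characters.
        attributes[key] = [unquote(v) for v in value.split(",")]
    return attributes


# Made-up attribute string, for illustration only (not taken from the real file).
example = "ID=YAR003W;Ontology_term=GO:0000001,GO:0000002;Note=an%20example%20note"
parsed = parse_attributes(example)
print(parsed["ID"])             # the ID values as a list
print(parsed["Ontology_term"])  # both ontology terms
```

Keeping every value as a list means a query such as `ID YAR003W` can be checked with a simple membership test, whether the key holds one value or several.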
Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:
$ export_gff3_feature.pl /path/to/some.gff3 gene ID YAR003W
There are four arguments here: the path to a GFF3 file, followed by three values that are matched against the GFF3 columns. In this case, your script should read the GFF3 file at the given path and find any gene (column 3) that has ID=YAR003W among its attributes (column 9). When it finds one, it should use the coordinates for that feature (columns 4, 5 and 7: start, end and strand) together with the FASTA sequence at the end of the document to return the feature's sequence in FASTA format.
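The coordinate step described above can be sketched as follows, again in Python for illustration rather than as the expected Perl solution. The function names `split_gff3` and `extract_feature_sequence` are hypothetical. The sketch assumes the GFF3 conventions that the sequence section begins at a `##FASTA` line, that start and end are 1-based and inclusive, and that a feature on the `-` strand should be reported as the reverse complement.

```python
# Translation table for complementing DNA (upper and lower case, N kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def split_gff3(lines):
    """Split GFF3 lines into (feature_rows, sequences keyed by FASTA header ID)."""
    feature_rows, chunks = [], {}
    in_fasta, current_id = False, None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                header = line[1:].split()
                current_id = header[0] if header else ""
                chunks[current_id] = []
            elif current_id is not None:
                chunks[current_id].append(line.strip())
        elif line and not line.startswith("#"):
            feature_rows.append(line.split("\t"))
    sequences = {name: "".join(parts) for name, parts in chunks.items()}
    return feature_rows, sequences


def extract_feature_sequence(sequence, start, end, strand):
    """Return the bases from start to end (1-based, inclusive), honouring strand."""
    subseq = sequence[start - 1:end]
    if strand == "-":
        subseq = subseq.translate(COMPLEMENT)[::-1]  # reverse complement
    return subseq
```

In a full script, column 1 of the matching row would select which entry of `sequences` to slice, and the result would be printed under a `>` header line to form a FASTA record.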
Your script should work regardless of the parameters passed, warning the user if no features were found that matched their query. (It should also check and warn if more than one feature matches the query.)
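One way to handle the zero-match and multiple-match cases is sketched below, in Python for illustration only. The names `find_matches` and `match_warning` are hypothetical, and each row is assumed to be a list of the nine tab-separated GFF3 columns. Sending warnings to STDERR is a design choice of this sketch, not something the task specifies; it keeps STDOUT clean for the FASTA output.

```python
import sys


def find_matches(rows, feature_type, key, value):
    """Return rows whose type (column 3) and attribute key=value (column 9) match."""
    matches = []
    for row in rows:
        if len(row) != 9 or row[2] != feature_type:
            continue  # skip malformed rows and rows of another feature type
        for pair in row[8].split(";"):
            k, _, v = pair.partition("=")
            if k == key and value in v.split(","):
                matches.append(row)
                break
    return matches


def match_warning(matches):
    """Return a warning message for zero or several matches, or None for exactly one."""
    if not matches:
        return "WARNING: no features matched the query"
    if len(matches) > 1:
        return f"WARNING: {len(matches)} features matched the query"
    return None


# Made-up rows, for illustration only.
rows = [
    ["chrT", "SGD", "gene", "1", "4", ".", "+", ".", "ID=geneA"],
    ["chrT", "SGD", "CDS", "1", "4", ".", "+", "0", "ID=geneA"],
]
warning = match_warning(find_matches(rows, "gene", "ID", "geneB"))
if warning:
    print(warning, file=sys.stderr)  # warnings go to STDERR in this sketch
```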
The output should just be printed on STDOUT (no writing to a file is necessary.)