Bioinformatics for representing sequence annotation

Reference no: EM1389771

QUESTION 1

For each of the following tasks you will use the 'xxxxxxxx' database. Provide only the query or command you used within MySQL for each task.

a. How many columns are in the cv table? (The command here doesn't need to return a number, just show the columns so you can count them.)

b. Which ID (cv_id) corresponds to the GO ontology stored in the database?

c. How many controlled vocabulary terms (cvterm table) are linked to the GO ontology?

d. How many entries in the feature table are linked to any of the GO terms? (see the feature_cvterm linking table.)
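
The four lookups above can be sketched against a toy in-memory database. This is only an illustration in Python with SQLite, not the MySQL session the question asks for. The table layouts are heavily simplified assumptions, every row is invented, and the cv name 'GO' is a placeholder for whatever name the real database uses. In MySQL itself, `DESCRIBE cv;` would list the columns for part (a).

```python
import sqlite3

# Toy stand-in for the course database: schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT, definition TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE feature_cvterm (feature_cvterm_id INTEGER PRIMARY KEY,
                             feature_id INTEGER, cvterm_id INTEGER);
INSERT INTO cv VALUES (1, 'relationship', NULL), (2, 'GO', NULL);
INSERT INTO cvterm VALUES (10, 2, 'term A'), (11, 2, 'term B'), (12, 1, 'part_of');
INSERT INTO feature_cvterm VALUES (100, 500, 10), (101, 500, 11), (102, 501, 12);
""")

# (a) List the columns of cv (SQLite's counterpart to MySQL's DESCRIBE).
cv_columns = [row[1] for row in conn.execute("PRAGMA table_info(cv)")]

# (b) Look up the cv_id of the ontology by its (assumed) name.
go_cv_id = conn.execute(
    "SELECT cv_id FROM cv WHERE name = ?", ("GO",)
).fetchone()[0]

# (c) Count the cvterm rows that belong to that cv.
go_term_count = conn.execute(
    "SELECT COUNT(*) FROM cvterm WHERE cv_id = ?", (go_cv_id,)
).fetchone()[0]

# (d) Count distinct features linked to any term of that cv.
#     Drop DISTINCT to count linking rows instead of features.
go_feature_count = conn.execute(
    """
    SELECT COUNT(DISTINCT fc.feature_id)
    FROM feature_cvterm AS fc
    JOIN cvterm AS t ON t.cvterm_id = fc.cvterm_id
    WHERE t.cv_id = ?
    """,
    (go_cv_id,),
).fetchone()[0]

print(cv_columns, go_cv_id, go_term_count, go_feature_count)
```

Whether part (d) should count distinct features or linking rows depends on how the question is read; the sketch counts distinct features and notes the alternative in a comment.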

QUESTION 2

The GFF3 format is commonly used in bioinformatics for representing sequence annotation. You can find the specification here:

sequenceontology.org/gff3.shtml

The top of a GFF3 file, as seen in the vi editor, looks like this:
______________________________________________________________________________
##gff-version 3
#date Tue Feb 8 19:50:12 2011
#
# Saccharomyces cerevisiae S288C genome
#
# Features from the 16 nuclear chromosomes labeled chrI to chrXVI,
# plus the mitochondrial genome labeled chrMito and the 2-micron plasmid.
#
# Created by Saccharomyces Genome Database
#
# Weekly updates of this file are available via Anonymous FTP from:
# ftp.yeastgenome.org/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
#

#

____________________________________________________________________________

Within the feature table, another column of note is the 9th, where we can store any key=value pairs relevant to that row's feature, such as ID, Ontology_term or Note.
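
As a rough illustration of how the 9th column can be taken apart, here is a minimal sketch in Python (chosen for illustration; the assignment's script name implies Perl). It relies on the GFF3 conventions that pairs are separated by semicolons, multiple values by commas, and reserved characters are percent-encoded. The function name `parse_attributes` and the example string are hypothetical.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of key -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Split on commas first, then decode percent-escapes in each value.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


# Invented attribute string, not taken from a real annotation file.
example = "ID=geneA;Ontology_term=GO:0000001,GO:0000002;Note=made%20up%20note"
parsed = parse_attributes(example)
print(parsed["ID"])             # ['geneA']
print(parsed["Ontology_term"])  # ['GO:0000001', 'GO:0000002']
print(parsed["Note"])           # ['made up note']
```

Keeping every value as a list means single-valued keys such as ID and multi-valued keys such as Ontology_term can be handled the same way.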

Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:

$ export_gff3_feature.pl /path/to/some.gff3 gene ID YAR003W

There are 4 arguments here: the path to a GFF3 file, followed by three values that correspond to GFF3 columns. In this case, your script should read the GFF3 file at the given path and find any gene (the feature type, column 3) which has ID=YAR003W among its attributes (column 9). When it finds this, it should use the coordinates for that feature (start, end and strand: columns 4, 5 and 7) together with the FASTA section at the end of the file to return the feature's sequence in FASTA format.
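
The coordinate step can be sketched as follows, in Python rather than the Perl the script name implies. The sketch assumes that the FASTA section starts at a `##FASTA` line, that column 1 of a feature names the FASTA record to cut from, and that coordinates are 1-based and inclusive, as in the GFF3 specification. The function names and the toy file contents are hypothetical.

```python
def split_gff3(lines):
    """Return (feature_rows, sequences) from the lines of a GFF3 file.

    feature_rows is a list of 9-column lists; sequences maps the first
    word of each FASTA header to its full sequence string.
    """
    feature_rows, parts = [], {}
    in_fasta, current_id = False, None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                current_id = line[1:].split()[0]
                parts[current_id] = []
            elif current_id is not None and line:
                parts[current_id].append(line.strip())
        elif line and not line.startswith("#"):
            columns = line.split("\t")
            if len(columns) == 9:
                feature_rows.append(columns)
    return feature_rows, {name: "".join(p) for name, p in parts.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_sequence(sequence, start, end, strand):
    """Cut start..end (1-based, inclusive); reverse-complement '-' features."""
    fragment = sequence[start - 1:end]
    if strand == "-":
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    return fragment


# Invented miniature file: one minus-strand gene on a 10-base sequence.
toy = [
    "##gff-version 3\n",
    "chrToy\tdemo\tgene\t3\t8\t.\t-\t.\tID=geneA\n",
    "##FASTA\n",
    ">chrToy\n",
    "AACCGG\n",
    "TTAC\n",
]
rows, seqs = split_gff3(toy)
row = rows[0]
fragment = extract_sequence(seqs[row[0]], int(row[3]), int(row[4]), row[6])
print(">geneA\n" + fragment)
```

For the toy gene, bases 3 to 8 of AACCGGTTAC are CCGGTT, and because the feature lies on the minus strand the printed sequence is its reverse complement, AACCGG.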

Your script should work regardless of the parameters passed, warning the user if no features were found that matched their query. (It should also check and warn if more than one feature matches the query.)
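
One possible way to handle the three cases (no match, exactly one match, several matches) is sketched below in Python, again instead of the Perl implied by the script name. Warnings go to STDERR so that STDOUT carries only the FASTA output. The function names, the warning wording and the toy rows are assumptions made for illustration.

```python
import sys


def find_matches(feature_rows, feature_type, key, value):
    """Return rows whose type (column 3) and key=value (column 9) both match."""
    matches = []
    for columns in feature_rows:
        if columns[2] != feature_type:
            continue
        pairs = (p.partition("=") for p in columns[8].split(";") if p)
        if any(k == key and value in v.split(",") for k, _, v in pairs):
            matches.append(columns)
    return matches


def check_match_count(matches, stream=sys.stderr):
    """Warn about zero or multiple matches; return True for exactly one."""
    if not matches:
        print("Warning: no features matched the query", file=stream)
    elif len(matches) > 1:
        print(f"Warning: {len(matches)} features matched the query", file=stream)
    return len(matches) == 1


# Invented feature rows, already split into their 9 columns.
rows = [
    ["chrToy", "demo", "gene", "3", "8", ".", "-", ".", "ID=geneA;Note=shared"],
    ["chrToy", "demo", "gene", "12", "20", ".", "+", ".", "ID=geneB;Note=shared"],
    ["chrToy", "demo", "CDS", "3", "8", ".", "-", ".", "ID=cdsA;Parent=geneA"],
]
one = find_matches(rows, "gene", "ID", "geneA")
many = find_matches(rows, "gene", "Note", "shared")
none = find_matches(rows, "gene", "ID", "missing")

for matches in (one, many, none):
    check_match_count(matches)
```

Whether the script should still print sequences when several features match is left open by the task; this sketch only reports the situation.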

The output should simply be printed to STDOUT; writing to a file is not necessary.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd