Please answer the following three sets of questions on Sequence Z (given at the end of this sheet):
Metadata
The Gene Ontology (GO) is a widely used resource in the bioinformatics community for annotating genes and their products. Genome databases such as TAIR use GO terms as semantic metadata to annotate genes and other biological entities, enriching the data they store.
1. BLAST Sequence Z into TAIR - from which gene does this sequence derive?
2. Which Molecular Function terms is this gene annotated with?
3. What are the GO IDs for these terms?
4. How do you think that biologists can benefit from the annotation of biological data with metadata in an ontology such as GO?
5. How could a bioinformatician exploit metadata such as GO terms programmatically?
Perl scripting
BLAST Sequence Z into EMBL-Bank and retrieve the flat file (Text view) output of the record. Then write a Perl script to read in the flat file and to write out the following fields:
1. The Accession Number for this record
2. The Description of the entry
3. Any Database Cross-references to InterPro records (i.e. InterPro Accession Numbers)
4. The Protein ID in the Feature Table
5. The length (in base pairs) of the nucleotide sequence
Please append your code to your coursework script.
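As a starting point only, the field extraction described above could be sketched as follows. This is a minimal sketch, not a model answer. It assumes the standard EMBL line codes (AC, DE, DR, FT, SQ) and accepts InterPro cross-references either on DR lines or as /db_xref qualifiers in the Feature Table. The subroutine name parse_embl and the embedded sample record are hypothetical, and every identifier in the sample is a made-up placeholder, not a real entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one EMBL flat-file record from an open filehandle and return a
# hash reference holding the five requested fields.
# (parse_embl is a hypothetical helper name chosen for this sketch.)
sub parse_embl {
    my ($fh) = @_;
    my %rec = (accession => '', description => '', interpro => [],
               protein_id => '', length => 0);
    my %seen;    # InterPro accessions already recorded
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^AC\s+(\w+);/) {
            # Keep the first accession on the first AC line (the primary one).
            $rec{accession} ||= $1;
        }
        elsif ($line =~ /^DE\s+(.*\S)/) {
            # DE may span several lines; join them with single spaces.
            $rec{description} .= ($rec{description} ? ' ' : '') . $1;
        }
        elsif ($line =~ /^(?:DR|FT)\s.*InterPro[:;]\s*(IPR\d+)/) {
            # Accept both 'DR   InterPro; IPR...' and /db_xref="InterPro:IPR..."
            push @{ $rec{interpro} }, $1 unless $seen{$1}++;
        }
        elsif ($line =~ m{^FT\s+/protein_id="([^"]+)"}) {
            $rec{protein_id} ||= $1;
        }
        elsif ($line =~ /^SQ\s+Sequence\s+(\d+)\s+BP/) {
            # The SQ header line states the sequence length in base pairs.
            $rec{length} = $1;
        }
        last if $line =~ m{^//};    # '//' terminates the record
    }
    return \%rec;
}

# Placeholder record: every identifier below is made up for illustration.
my $sample = <<'END';
ID   XX000000; SV 1; linear; mRNA; STD; PLN; 120 BP.
XX
AC   XX000000;
XX
DE   Hypothetical example entry used only to
DE   exercise the parser
XX
DR   InterPro; IPR000000; -.
XX
FT   CDS             1..120
FT                   /db_xref="InterPro:IPR000001"
FT                   /protein_id="XXX00000.1"
XX
SQ   Sequence 120 BP; 30 A; 30 C; 30 G; 30 T; 0 other;
//
END

# Read from the in-memory string; for real use, open the saved flat file.
open my $fh, '<', \$sample or die "Cannot open in-memory record: $!";
my $rec = parse_embl($fh);
close $fh;

print "Accession:   $rec->{accession}\n";
print "Description: $rec->{description}\n";
print "InterPro:    @{ $rec->{interpro} }\n";
print "Protein ID:  $rec->{protein_id}\n";
print "Length (bp): $rec->{length}\n";
```

To run this against the retrieved record, replace the in-memory filehandle with one opened on the saved flat file, for example a path taken from the command line. Check the line codes against the record you actually download, since the layout of cross-references can differ between entries.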
Microarray databases
1. What is the Affymetrix probe ID of this gene?
2. Using Genevestigator, answer the following:
- In which developmental stage is the expression of this gene at its highest?
- In which part of the root does this gene typically exhibit higher expression: the lateral root or the endodermis?
3. Explain what criteria other than co-expression you would want to use in order to be convinced that two or more genes are truly transcriptionally co-regulated.
4. Name two uses of microarray technology apart from transcriptomics. Briefly describe each technique (1/3 page each).
Here is a coding sequence in FASTA format:
>Sequence Z
ATGTGGAGGCTGAGAACTGGACCGAAGGCTGGAGAGGATACTCACCTGTTCACCACCAAC
AACTATGCAGGGAGGCAGATTTGGGAATTTGATGCCAACGCAGGCTCTCCACAAGAAATT
GCCGAGGTAGAGGATGCTCGGCACAAATTCTCAGACAACACGTCACGTTTCAAGACTACT
GCCGATCTCTTATGGCGCATGCAGTTTCTTAGGGAGAAGAAATTCGAACAGAAGATTCCA
CGAGTGATAATCGAGGATGCAAGAAAGATAAAGTACGAAGATGCAAAGACAGCATTGAAA
AGAGGGTTACTCTATTTCACAGCCTTGCAGGCTGATGATGGACACTGGCCAGCTGAAAAC
TCTGGCCCAAATTTCTATACCCCTCCTTTTTTGATATGCTTGTACATCACTGGACATCTG
GAGAAAATCTTCACTCCCGAGCATGTTAAAGAGTTACTACGTCACATCTACAACATGCAG
AACGAAGATGGTGGGTGGGGTTTACACGTAGAAAGCCACAGTGTTATGTTCTGTACAGTC
ATTAATTACGTCTGTCTACGAATTGTGGGAGAAGAAGTCGGTCATGATGATCAAAGAAAT
GGTTGTGCAAAGGCTCATAAGTGGATCATGGACCATGGTGGTGCTACCTACACGCCCTTG
ATCGGAAAAGCGTTGCTTTCGGTTCTTGGAGTGTATGATTGGTCTGGCTGCAATCCTATA
CCTCCAGAGTTCTGGTTGCTTCCGTCTTCTTTTCCTGTTAATGGAGGGACTCTCTGGATT
TATTTACGGGATACTTTCATGGGGTTGTCATACTTGTATGGTAAAAAATTTGTGGCTCCC
CCAACACCTCTCATTCTCCAGCTCCGAGAAGAGCTTTATCCGGAGCCTTATGCAAAAATC
AATTGGACGCAAACACGAAACCGATGTGGAAAGGAAGATCTCTACTATCCACGCTCATTT
TTACAAGATTTGTTTTGGAAGAGTGTTCACATGTTCTCAGAGAGTATCCTAGATCGATGG
CCTTTAAACAAGCTAATAAGACAAAGAGCTCTTCAATCCACTATGGCACTCATTCACTAT
CATGACGAATCCACCAGATATATTACAGGCGGATGCCTGCCAAAGGCCTTTCATATGCTT
GCATGTTGGATAGAAGACCCTAAGAGTGATTATTTTAAAAAACATCTTGCTCGAGTTCGC
GAATACATATGGATTGGCGAGGATGGCCTGAAAATTCAATCTTTTGGTAGCCAATTATGG
GATACAGCCTTATCGCTACATGCATTACTAGACGGAATTGATGATCATGATGTTGATGAT
GAGATTAAAACAACGCTCGTTAAAGGATATGATTACTTGAAGAAATCACAAATTACAGAG
AACCCTCGCGGTGATCACTTCAAAATGTTTCGTCACAAGACAAAAGGTGGATGGACATTT
TCAGATCAAGATCAAGGATGGCCTGTTTCAGATTGTACTGCTGAAAGCTTAGAGTGTTGT
CTATTCTTCGAGAGCATGCCGTCCGAGCTTATTGGAAAAAAAATGGATGTGGAGAAACTC
TATGATGCCGTTGATTATCTTCTCTATCTGCAGAGTGATAATGGAGGCATAGCAGCATGG
CAACCAGTTGAAGGAAAAGCCTGGTTAGAGTTGTTAAATATCATGATTTTTAGGTATGTA
GAATGTACGGGGTCAGCGATTGCAGCATTGACTCAGTTTAACAAACAGTTTCCAGGGTAT
AAAAACGTAGAGGTTAAACGGTTTATAACAAAGGCTGCAAAGTACATTGAAGACATGCAA
ACGGTGGATGGTTCATGGTACGGAAATTGGGGAGTGTGTTTTATATACGGGACCTTCTTT
GCGGTAAGAGGTCTTGTGGCCGCTGGGAAGACTTACAGTAACTGTGAAGCAATTCGTAAA
GCAGTTCGTTTTCTTCTAGACACACAAAATCCGGAGGGTGGCTGGGGAGAGAGCTTTCTC
TCTTGTCCAAGCAAGAAATATACTCCTTTGAAAGGAAACAGCACAAATGTGGTGCAAACA
GCACAAGCACTTATGGTGCTAATTATGGGTGATCAGATGGAGAGAGATCCTTTACCGGTT
CATCGTGCTGCTCAAGTGTTGATCAATTCACAGTTGGATAATGGCGATTTTCCACAGCAG
GAAATAATGGGAACGTTCATGAGAACTGTGATGCTCCATTTTCCGACCTATAGGAACACG
TTCTCTCTTTGGGCTCTCACACATTACACACATGCTCTGCGACGTCTCCTCCCTTAA