PHAR5340 Bioinformatics Assignment, Leicester School of Pharmacy - De Montfort University, UK
You should perform all three tasks and submit them as a single Word document or PDF containing your results, with a discussion of the methods and parameters chosen where appropriate. Note: be selective about the data you show.
Task 1 -
(1) You are required to generate a novel plasmid, which will enable you to express a protein in a yeast expression system. You have been provided with the raw DNA sequence that gives rise to this vector (see Sequence 1). You must first construct the vector using Table 1.0 and Sequence 1.
a. Present a figure showing the map of this newly derived vector, designated pSYEADH2.
b. Indicate on the map where the multiple cloning region is located.
Table 1.0 Vector feature table
Name     | Start | End  | Complementary | Description     | Draw as
---------|-------|------|---------------|-----------------|--------
Cyct1    | 89    | 337  |               | Cyc1 terminator | Gene
pUC Ori  | 519   |      |               | pUC ORI         | Marker
AmpR     | 2197  | 1337 | C             | Beta-lactamase  | Region
URA3     | 3322  | 2215 | C             | URA3 gene       | Gene
2 Micron | 3326  | 4797 |               |                 | Region
F1 Ori   | 5190  | 4865 | C             | F1 Ori          | Region
ADH2p    | 5196  | 5775 |               | ADH2 promoter   | Gene
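In Table 1.0 the features marked "C" (complementary) are exactly those listed with Start greater than End, so their coordinates run along the complementary strand. A minimal Python sketch of how the table could be normalised before drawing the map is shown below; the `Feature` class and its methods are hypothetical helpers (not part of Clone Manager), and the pUC Ori end coordinate is kept as `None` because the table does not give it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Feature:
    name: str
    start: int
    end: Optional[int]   # None where Table 1.0 gives no End coordinate
    complementary: bool  # True where the Complementary column reads "C"

    def strand(self) -> int:
        """+1 for the top strand, -1 for the complementary strand."""
        return -1 if self.complementary else 1

    def span(self) -> Optional[Tuple[int, int]]:
        """(low, high) coordinates, whatever order they are listed in."""
        if self.end is None:
            return None
        return (min(self.start, self.end), max(self.start, self.end))

    def length(self) -> Optional[int]:
        """Inclusive length in bp, or None if the span is incomplete."""
        span = self.span()
        return None if span is None else span[1] - span[0] + 1

    def is_consistent(self) -> bool:
        """Check that complementary features are the ones listed with Start > End."""
        if self.end is None:
            return True
        return self.complementary == (self.start > self.end)


FEATURES: List[Feature] = [
    Feature("Cyct1", 89, 337, False),
    Feature("pUC Ori", 519, None, False),
    Feature("AmpR", 2197, 1337, True),
    Feature("URA3", 3322, 2215, True),
    Feature("2 Micron", 3326, 4797, False),
    Feature("F1 Ori", 5190, 4865, True),
    Feature("ADH2p", 5196, 5775, False),
]

if __name__ == "__main__":
    for f in FEATURES:
        print(f"{f.name:9} strand={f.strand():+d} span={f.span()} "
              f"length={f.length()} consistent={f.is_consistent()}")
```

Lengths computed this way (for example 861 bp for AmpR, 2197 - 1337 + 1) can be cross-checked against the features drawn on the map.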
(2) You have been asked to clone a molecule (Accession Number: NM_000103.4) into this novel vector. There are many variants of this molecule.
a. Identify the sequence with Accession Number NM_000103.4. HINT: You may need to use BLAST to determine this.
NM_000103.4 gives rise to a protein that is more than 200 amino acids long.
b. Using this information identify the correct ORF of this protein.
Design a strategy for inserting the ORF of the gene given by NM_000103.4 into the novel vector. Note: a direct restriction digest and ligation protocol is not possible in this strategy.
c. Describe the forward and reverse primers you would need to design in order to change the NM_000103.4 sequence.
d. You should present your cloning strategy documenting the steps taken to obtain the final plasmid map as generated in Clone Manager.
e. Present your final plasmid map.
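For part (2c), the core of any primer pair is the same: the forward primer matches the start of the ORF on the coding strand and the reverse primer matches the reverse complement of its end, with any extra sequence added as a 5' tail. The Python sketch below illustrates only that arithmetic; `design_primers` and the tail arguments are hypothetical, the tail content depends entirely on the cloning strategy chosen, and the Tm values are rough estimates (Wallace rule below 14 nt, otherwise 64.9 + 41 x (nGC - 16.4) / N, with no salt correction), so a dedicated primer design tool should be used for the real primers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough Tm in degrees C with no salt or concentration correction:
    Wallace rule below 14 nt, otherwise 64.9 + 41 * (nGC - 16.4) / N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if len(seq) < 14:
        return 2 * at + 4 * gc
    return 64.9 + 41 * (gc - 16.4) / len(seq)


def design_primers(orf: str, fwd_tail: str = "", rev_tail: str = "",
                   anneal_len: int = 20) -> dict:
    """Build a primer pair for an ORF written 5'->3' on the coding strand.

    The forward primer anneals to the first anneal_len nt of the ORF and the
    reverse primer to the reverse complement of the last anneal_len nt.
    Tails (hypothetical inputs) are added at the 5' ends; because they do not
    anneal to the template in the first cycles, Tm and GC are reported for
    the annealing part only.
    """
    fwd_anneal = orf[:anneal_len]
    rev_anneal = reverse_complement(orf[-anneal_len:])
    return {
        "forward": fwd_tail + fwd_anneal,
        "reverse": rev_tail + rev_anneal,
        "fwd_tm": round(basic_tm(fwd_anneal), 1),
        "rev_tm": round(basic_tm(rev_anneal), 1),
        "fwd_gc": round(gc_fraction(fwd_anneal), 2),
        "rev_gc": round(gc_fraction(rev_anneal), 2),
    }
```

With empty tails the pair simply amplifies the ORF; any added sequence (for example vector-homology arms, depending on the strategy chosen) is supplied through `fwd_tail` and `rev_tail`.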
(3) Sequence 2 gives rise to a protein that is more than 200 amino acids long.
a. Using this information identify the correct ORF of this protein and provide the amino acid sequence of the functional protein.
b. Align this protein with the protein produced from NM_000103.4.
c. Analyse the results of this alignment. Are there any differences between the two proteins?
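Parts (2b) and (3a) rely on the same criterion: the correct ORF is the one encoding a protein of more than 200 amino acids. A minimal six-frame ORF scan is sketched below in Python, assuming the standard genetic code, ATG start codons and a plain DNA or cDNA string as input; `find_orfs` and its return layout are illustrative, and a tool such as NCBI ORFfinder would normally be used instead.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_aa: int = 200) -> List[Tuple[int, int, int, int, str]]:
    """Return (strand, frame, nt_start, nt_end, protein) for ATG...stop ORFs
    encoding more than min_aa residues, longest first.

    Coordinates are 0-based, end-exclusive and include the stop codon; for
    strand -1 they refer to the reverse-complement sequence. ORFs that run
    off the end without a stop codon are not reported.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            start = None  # index of the Met that opened the current ORF
            for i, aa in enumerate(protein):
                if start is None and aa == "M":
                    start = i
                elif start is not None and aa == "*":
                    if i - start > min_aa:
                        orfs.append((strand, frame, frame + 3 * start,
                                     frame + 3 * i + 3, protein[start:i]))
                    start = None
    return sorted(orfs, key=lambda orf: len(orf[4]), reverse=True)
```

The protein string returned for the chosen ORF is the amino acid sequence (without the stop) that would then be used in the alignment.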
(4) Sequence 3 is the cDNA sequence of a mammalian secreted protein.
a. Using Clone Manager identify the signal peptide of this sequence.
b. Briefly explain the steps you undertook to perform this task.
c. Identify the mature secreted protein sequence for sequence 3.
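Once a cleavage site has been predicted, the mature secreted protein is the translated precursor minus the signal peptide. The Python sketch below shows only that final step, assuming the cDNA has already been translated and the cleavage position comes from a prediction tool; `split_signal_peptide` is a hypothetical helper, it does not predict the site itself, and it ignores any further propeptide processing. As a sanity check it tests von Heijne's (-3, -1) rule, that positions -1 and -3 usually carry small, uncharged residues.

```python
from typing import Tuple

# Small, uncharged residues typically found at positions -1 and -3 of
# signal peptidase cleavage sites (von Heijne's (-3, -1) rule).
SMALL_NEUTRAL = set("AGSTC")


def split_signal_peptide(precursor: str, cleavage_after: int) -> Tuple[str, str, bool]:
    """Split a translated precursor at a predicted cleavage site.

    cleavage_after is the 1-based position of the last signal-peptide
    residue, taken from a prediction tool; this function does not predict it.
    Returns (signal_peptide, mature_protein, fits_minus3_minus1_rule).
    """
    if not 1 <= cleavage_after < len(precursor):
        raise ValueError("cleavage site must fall inside the precursor")
    signal = precursor[:cleavage_after]
    mature = precursor[cleavage_after:]
    fits_rule = (
        len(signal) >= 3
        and signal[-1] in SMALL_NEUTRAL
        and signal[-3] in SMALL_NEUTRAL
    )
    return signal, mature, fits_rule


# Hypothetical precursor, not Sequence 3:
# split_signal_peptide("MKLLVLAVLAASAQADPETGK", 15)
# -> ("MKLLVLAVLAASAQA", "DPETGK", True)
```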
Task 2 -
(1) Perform a quick nucleic acid BLAST search with Sequence 4.
a. What is the identity of Sequence 4? HINT: Make sure you choose the mRNA.
b. Are there any related DNA sequences found by this method?
c. Are there any homologues of this gene in other species by this method?
(2) Use a protein sequence search strategy to find human paralogues of Sequence 4.
a. Comment on the graphic summary of the distribution of BLAST hits along the query sequence.
b. List any human paralogues and show the extent of their similarity to Sequence 4.
c. Why might you be able to find paralogues by searching for protein sequence but not DNA?
(3) Find the nearest homologue of the Sequence 4 protein in yeast species.
a. What is the effect of changing the BLOSUM scoring matrix and the E value threshold on the results? Explain these effects.
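The E-value part of this question can be reasoned about with the Karlin-Altschul relation E = K m n e^(-λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K depend on the scoring matrix and gap costs (BLAST reports them at the end of its output). The Python sketch below is a simplified illustration: it ignores BLAST's effective-length corrections, and the λ = 0.267, K = 0.041 values are assumed here as the ones commonly reported for BLOSUM62 with gap costs 11/1.

```python
import math

# Illustrative parameters (assumed): values commonly reported by BLAST for
# BLOSUM62 with gap costs 11/1. A different matrix gives different lambda and K.
LAMBDA, K = 0.267, 0.041


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    m, n = 300, 50_000_000  # hypothetical query length and database size (residues)
    for s in (40, 60, 80):
        print(s, round(bit_score(s, LAMBDA, K), 1), f"{e_value(s, m, n, LAMBDA, K):.2e}")
```

Raising the E-value threshold admits hits with lower scores (more distant and more spurious matches), while lowering it keeps only stronger hits; changing the matrix alters both the raw scores and λ and K, so the same pair of sequences can receive a different E-value.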
(4) Use an appropriate pair-wise alignment tool to align Sequence 4 with Q9CQ70 and A0A2L0PQQ2.
a. Comment on the alignments in each case, using the biological information available for each sequence.
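Pairwise tools such as EMBOSS Needle (global) and EMBOSS Water (local) use a substitution matrix such as BLOSUM62 with affine gap penalties. The Python sketch below shows the underlying dynamic-programming idea (Needleman-Wunsch global alignment) with simplified, assumed scores (match +1, mismatch -1, linear gap -2) and a helper that lists the columns where two aligned proteins differ, which is also the kind of comparison needed in Task 1 (3c).

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def differences(aln_a: str, aln_b: str) -> List[Tuple[int, str, str]]:
    """(1-based alignment column, residue in A, residue in B) wherever they differ."""
    return [(k + 1, x, y) for k, (x, y) in enumerate(zip(aln_a, aln_b)) if x != y]
```

A substitution shows up in `differences` as two different residues in the same column, and an insertion or deletion as a residue paired with "-".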
Task 3 -
(1) Locate contig AL035695.17 within the human genome.
a. Give the chromosomal location of this contig.
b. Identify the gene located on this contig and describe its location.
c. How many transcripts are encoded by this gene?
d. How many transcripts are expected to be expressed as proteins?
e. Do the longest and the shortest transcripts of this gene share any exons?
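Whether two transcripts share exons can be checked directly from their exon coordinates, as listed for each transcript in a genome browser. The Python sketch below distinguishes exons with identical boundaries from exons that merely overlap (for example because of an alternative splice site); the helper names and the coordinates in the usage example are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) genomic coordinates, inclusive


def identical_exons(a: List[Exon], b: List[Exon]) -> List[Exon]:
    """Exons with exactly the same boundaries in both transcripts."""
    return sorted(set(a) & set(b))


def overlapping_exons(a: List[Exon], b: List[Exon]) -> List[Tuple[Exon, Exon]]:
    """Pairs of exons that share at least one base, even if boundaries differ."""
    return [(x, y) for x in a for y in b if x[0] <= y[1] and y[0] <= x[1]]


# Hypothetical coordinates, for illustration only:
# identical_exons([(100, 200), (300, 400), (500, 650)], [(300, 400), (600, 700)])
# -> [(300, 400)]
```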
(2) Analyse the longest expressed protein for the gene identified in (1d) above.
a. Describe the likely location and topology of the encoded protein.
b. Identify the positions of amino acids that are likely to be sites of post-translational modification by
i. N-linked glycosylation and/or
ii. phosphorylation by serine, threonine or tyrosine kinases.
c. Discuss your results.
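Two of the simplest sequence scans behind (2a) and (2b i) are sketched below in Python: a Kyte-Doolittle hydropathy window (a window of 19 residues with a mean above about 1.6 is the commonly cited indication of a possible membrane-spanning segment) and a search for N-X-S/T sequons (X not Pro), optionally with the extra non-Pro residue required by the PROSITE pattern N-{P}-[ST]-{P}. Both functions are hypothetical helpers that only flag candidates; dedicated predictors (for example TMHMM, NetNGlyc and NetPhos) are more reliable, a sequon is glycosylated only if it lies in a luminal or extracellular region, and phosphorylation sites are not covered here because kinase motifs vary too much for a single pattern.

```python
import re
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> List[Tuple[int, float]]:
    """(1-based window start, mean hydropathy) for windows above threshold.

    Residues not in the scale (e.g. X) are scored as 0.
    """
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KD.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            hits.append((i + 1, round(mean, 2)))
    return hits


def n_glyc_sequons(protein: str, strict: bool = False) -> List[int]:
    """1-based positions of Asn in N-X-[ST] sequons where X is not Pro.

    strict=True also requires a non-Pro residue after S/T, as in the PROSITE
    pattern N-{P}-[ST]-{P}. The lookahead lets overlapping sequons be found.
    """
    core = "N[^P][ST][^P]" if strict else "N[^P][ST]"
    return [m.start() + 1 for m in re.finditer(f"(?={core})", protein)]
```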
(3) Search the Pfam database for matching families and domains present in the protein.
a. Identify the family (or families) and the domain(s) that match this sequence.
b. How many architectures share these domains?
c. Comment on those architectures that are present in Homo sapiens.
(4) Perform a literature search for this protein.
a. Comment on the functional role of this protein.
Attachment: Bioinformatics Assignment File.rar