Reference no: EM133129561
You should complete the two mini projects and submit them as a single Word document or PDF containing your results (copied information or screen grabs of text / alignments / images), with a discussion of the methods and parameters chosen where appropriate. Note: be selective about the data you show.
Task 1
Question 1. Nudix hydrolase 15 is an enzyme associated with thiopurine-related haematopoietic toxicity.
a. Give the gene identifier for this enzyme and explain the role it plays in reducing the toxicity of 6-mercaptopurine.
Interrogate the PharmGKB database for variations in this enzyme that have a clinical annotation level of evidence of 1A linked to the thiopurines mercaptopurine and azathioprine.
b. How many variations are there?
c. Create a table of this information. You should include the allele and / or the variation identifier, information about the change and the consequence of the change.
Question 2. Search the GWAS Catalog, using the gene identifier you gave in 1a, for studies that have linked variations in this gene with a phenotypic response.
a. How many associations are there for this gene?
b. How many traits have been linked to this gene?
c. Create a table of any additional SNPs identified by your search. You should include the variation identifier, information about the change and the consequence of the change.
Question 3. Investigate the Single Nucleotide Polymorphism database (dbSNP) for variations in Nudix hydrolase 15.
a. How many variations have been found in this human gene? Characterise them by variation class.
b. How many variations are found in the coding regions of the gene?
c. How many non-synonymous variations are found in the gene?
d. How many missense variations are there in the gene?
e. How many variations have a clinical significance and have been associated with a drug response?
f. Create a table of any additional SNPs identified by your search in 3e. You should include the variation identifier, information about the change and the consequence of the change.
View the first variation that has an impact on drug response in Variation Viewer, using the GRCh38.p13 assembly, release 109.
g. Categorise the variations that are located within ~400 bp of this SNP by variant type.
h. How many of the variations have a pathogenic clinical significance?
i. Identify any variations in this region that are characterised as a nonsense (stop gained) variation and create a table that has the variation identifier and details of the variation itself.
j. For each unique variation you have tabulated in questions 1c, 2c, 3f and 3i, view the allele frequency information and identify the population with the highest prevalence of the variant allele and the global frequency of the variant allele. Create a table of this information.
Question 4. Using the Genome Variation Server 150, identify tag SNPs for Nudix hydrolase 15 that are located within 20000 bases upstream and downstream of the gene and are common to the HapMap-JPT and HapMap-CEU panels, with an r² value of 0.8 and an allele frequency of 20%.
a. Provide a summary of the SNPs found by this search and identify how many of the SNPs are located in the gene you performed the search with.
b. How many tag SNPs are there and how many are shared between these two populations?
c. List the SNPs that are in complete linkage disequilibrium.
d. Repeat the analysis with the r² kept at 0.8 and the allele frequency altered to 30%. Describe the effect this change has on the results and explain why it does so.
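The r² and allele frequency settings in Question 4 can be illustrated with a minimal Python sketch. This is not the Genome Variation Server's own procedure: it is a simple greedy tag-SNP selection written for illustration, and the SNP names (snpA to snpD) and the ten phased haplotypes are hypothetical toy values, not NUDT15 data. It assumes biallelic SNPs coded as 0/1 per haplotype.

```python
def minor_allele_frequency(alleles):
    """Return the minor allele frequency for a list of 0/1 haplotype alleles."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(a, b):
    """Return the r^2 linkage disequilibrium measure for two phased 0/1 SNPs."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # the D coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_tag_snps(haplotypes, maf_cutoff, r2_cutoff):
    """Filter SNPs by MAF, then greedily pick tags covering the remainder."""
    common = {s: h for s, h in haplotypes.items()
              if minor_allele_frequency(h) >= maf_cutoff}
    untagged = set(common)
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs.
        best = max(sorted(untagged),
                   key=lambda s: sum(r_squared(common[s], common[t]) >= r2_cutoff
                                     for t in untagged))
        tags.append(best)
        captured = {t for t in untagged
                    if r_squared(common[best], common[t]) >= r2_cutoff}
        untagged -= captured | {best}
    return common, tags


# Hypothetical toy haplotypes (10 chromosomes), for illustration only.
toy = {
    "snpA": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "snpB": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # identical to snpA, so r^2 = 1
    "snpC": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # minor allele frequency 0.2
    "snpD": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

common_20, tags_20 = greedy_tag_snps(toy, maf_cutoff=0.2, r2_cutoff=0.8)
common_30, tags_30 = greedy_tag_snps(toy, maf_cutoff=0.3, r2_cutoff=0.8)
print(sorted(common_20), tags_20)
print(sorted(common_30), tags_30)
```

In this toy set, raising the frequency cut-off removes snpC before any r² comparison is made, so fewer SNPs enter the tagging step. Whether the real panels behave the same way is for the analysis itself to show.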
Question 5. You are required to design a test to identify people at increased risk of toxicity when taking thiopurines.
a. Discuss which, if any, of the variations you identified in questions 1 to 3 you would include in the test.
b. Perform a literature search to identify additional variations for inclusion.
Provide a critical evaluation of whether the test would be of use in individual population groups.
Task 2
Sequence 1:
MSQVTDMRSNSQGLSLTDSVYERLLSERIIFLGSEVNDEIANRLCAQILLLAAEDASKDISLYINSPGGSISAGMAIYDTMVLAPCDIATYAMGMAASMGEFLLAAGTKGKRYALPHARILMHQPLGGVTGSAADIAIQAEQFAVIKKEMFRLNAEFTGQPIERIEADSDRDRWFTAAEALEYGFVDHIITRAHVNGEAQ
Question 1. You have been provided with a sequence of a protein (above).
a) Identify what the sequence is and discuss its function.
b) Prepare a multiple sequence alignment of sequence 1 with other similar sequences in different but related organisms.
c) Using your multiple sequence alignment, produce a phylogenetic tree. In your answer, consider adding an outgroup to help with rooting. Discuss your findings.
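As a rough illustration of how an alignment feeds a distance-based tree, the sketch below computes pairwise p-distances (the proportion of differing residues over gap-free columns) in Python. Only the first row is taken from the brief (the first ten residues of Sequence 1); the other rows, the labels and the gap placement are hypothetical. A real analysis would use a full alignment and an established tree-building tool.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): p-distance} for every ordered pair of rows."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Toy alignment: 'query' is the start of Sequence 1; the rest are hypothetical.
toy_alignment = {
    "query":    "MSQVTDMRSN",
    "relative": "MSQVSDMRSN",
    "distant":  "MSE-TDLRSN",
    "outgroup": "MAEVSELKSN",
}

matrix = distance_matrix(toy_alignment)
for name in toy_alignment:
    print(name, [round(matrix[(name, other)], 3) for other in toy_alignment])
```

In this toy matrix the outgroup row has the largest distances to every other row, which is the property that makes such a sequence useful for rooting.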
Question 2. The next tasks are centred around the human cytochrome P450 genes, with particular focus on CYP2C9.
a) Determine the phylogenetic relationships between members of the human CYP family. In your answer, you should consider the points below:
i. Search strategy for genes in the human CYP family
ii. Multiple sequence alignment of CYP sequences
iii. Optimisation of alignment, if required
iv. Selection of alignment regions for phylogenetic analysis
v. Phylogenetic analysis using suitable method(s)
vi. Visualisation of the phylogeny as a phylogenetic tree
vii. Discussion of your results
b) Compare and contrast the two following studies of gene expression profiles in normal human tissues on the NCBI GEO site. In your comparison, discuss the experimental design and consider the platforms and the number and range of samples used.
i. Series GSE7905
ii. Series GSE2361
c) Determine the expression profile of CYP2C9 in normal human tissues using the above two series.
i. In which tissue is CYP2C9 expression most prominent?
ii. How similar are the results from the two studies?
d) Compare and discuss the expression profile of CYP2C9 from these two studies with that shown in the 53 GTEx RNA-Seq study in the EBI Gene Expression Atlas.
i. Based on what you know of its function, are these results to be expected?
ii. Choose 2 other members of the CYP family you have used to produce your phylogenetic tree and compare their expression profiles. Are these results expected?
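One possible way to quantify how similar two expression profiles are (Question 2c ii and 2d) is a Spearman rank correlation over the tissues the studies share, since ranks are less sensitive to platform-specific scales than raw values. The Python sketch below is an assumption about method, not something the brief prescribes. The tissue labels and numbers are placeholders and are not values from GSE7905, GSE2361 or GTEx.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var_x * var_y) ** 0.5


def spearman(profile_a, profile_b):
    """Spearman correlation over the tissues present in both profiles."""
    shared = sorted(set(profile_a) & set(profile_b))
    x = [profile_a[t] for t in shared]
    y = [profile_b[t] for t in shared]
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder profiles on different scales, for illustration only.
study_1 = {"tissue_A": 9.0, "tissue_B": 3.0, "tissue_C": 1.0,
           "tissue_D": 2.0, "tissue_E": 0.5}
study_2 = {"tissue_A": 800, "tissue_B": 120, "tissue_C": 40,
           "tissue_D": 30, "tissue_F": 10}

rho = spearman(study_1, study_2)
print(round(rho, 3))
```

Only tissues present in both profiles are compared, so the number and range of samples in each series directly limits what such a comparison can show.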
Attachment:- Bioinformatics.rar