SLE712 Bioinformatics and Molecular Biology Techniques

Reference no: EM132500117

SLE712 Bioinformatics and Molecular Biology Techniques - Deakin University

Part 1: Importing files, data wrangling, mathematical operations, plots and saving code on GitHub

The purpose of this exercise is for you to develop skills in problem solving and R coding, and to work together as a team using RStudio and GitHub. You will be provided with two data files to work with, "gene_expression.tsv" and "growth_data.csv", which are available from this URL:

To download a file with R, click on "view raw", copy the URL from the address bar, and then use the download.file command in R.
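The download step might look like the sketch below. Because the real URL is not given here, a temporary local file exposed through a file:// URL stands in for the raw GitHub address so the example runs offline on a Unix-like system; the file contents and the destination path are assumptions made for illustration.

```r
# Sketch: fetch a file with download.file() and confirm that it arrived.
# A temporary local file stands in for the raw GitHub URL (assumption).
src <- tempfile(fileext = ".csv")
writeLines(c("site,circ_start", "control,5.1"), src)  # made-up contents

# In real use, this would be the URL copied after clicking "view raw".
url <- paste0("file://", src)
dest <- file.path(tempdir(), "growth_data.csv")

# download.file() returns 0 (invisibly) on success.
status <- download.file(url = url, destfile = dest, quiet = TRUE)
stopifnot(status == 0, file.exists(dest))
```

Checking the return status and the existence of the destination file makes a failed download visible immediately rather than at the later read step.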
• For points 1-10 below:
Describe how you solved the problem.
Provide the answer as directed. The answer could be descriptive, numerical, categorical, a table or a chart.
• Provide a link to a GitHub repository with the following:
The code should run without errors and yield answers to points 1-10 below.
If working in a group, there needs to be evidence that all group members have made contributions to the code repository. This means that there need to be "commits" and "issues" from each group member.
A README that describes the purpose of each script and their inputs and outputs.

The code should contain sufficient comments so that someone else can understand what each line or chunk of code is trying to achieve.

The file "gene_expression.tsv" contains RNA-seq count data for two samples of interest.
1. Read in the file, making the gene accession numbers the row names. Show a table of values for the first six genes.
2. Make a new column which is the mean of the other columns. Show a table of values for the first six genes.
3. List the 10 genes with the highest mean expression.
4. Determine the number of genes with a mean < 10.
5. Make a histogram plot of the mean values in png format and paste it into your report.
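As a rough illustration of points 1-5, the sketch below works on a small made-up table rather than the real "gene_expression.tsv". The accessions, the column names (gene_id, sample1, sample2, mean_expr) and the counts are all assumptions; the real file may be laid out differently.

```r
# Toy stand-in for gene_expression.tsv (made-up accessions and counts).
tsv <- tempfile(fileext = ".tsv")
toy <- data.frame(
  gene_id = paste0("GENE", 1:8),
  sample1 = c(0, 12, 250, 3, 40, 9, 1000, 15),
  sample2 = c(2, 8, 300, 5, 60, 7, 900, 25)
)
write.table(toy, tsv, sep = "\t", quote = FALSE, row.names = FALSE)

# 1. Read the file, using the first column as row names.
expr <- read.table(tsv, header = TRUE, sep = "\t", row.names = 1,
                   stringsAsFactors = FALSE)
head(expr)

# 2. Add a column holding the mean of the two sample columns.
expr$mean_expr <- rowMeans(expr[, c("sample1", "sample2")])
head(expr)

# 3. Genes ranked by mean expression, highest first (up to 10 rows).
top_genes <- head(expr[order(expr$mean_expr, decreasing = TRUE), ], 10)

# 4. Number of genes with a mean below 10.
n_low <- sum(expr$mean_expr < 10)

# 5. Histogram of the mean values, saved as a PNG file.
png_file <- file.path(tempdir(), "mean_expr_hist.png")
png(png_file)
hist(expr$mean_expr, main = "Mean expression", xlab = "Mean count")
dev.off()
```

Selecting the sample columns by name in step 2 keeps the new mean column from being averaged into itself if the step is rerun.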
The file "growth_data.csv" contains measurements of tree circumference at two sites, a control site and a treatment site, which were planted 20 years ago.
6. Import this csv file into an R object. What are the column names?
7. Calculate the mean and standard deviation of tree circumference at the start and end of the study at both sites.
8. Make a box plot of tree circumference at the start and end of the study at both sites.
9. Calculate the mean growth over the past 10 years at each site.
10. Use the t.test and wilcox.test functions to estimate the p-value for the difference in 10-year growth between the two sites.
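Points 6-10 could be approached along the lines of the sketch below. It uses a made-up data frame in place of "growth_data.csv"; the column names (site, circ_start, circ_mid, circ_end) and every value are assumptions, with circ_mid standing for a hypothetical measurement taken 10 years ago.

```r
# Toy stand-in for growth_data.csv; column names and values are made up.
growth <- data.frame(
  site       = rep(c("control", "treatment"), each = 5),
  circ_start = c(5.0, 5.2, 4.9, 5.1, 5.3, 5.1, 5.0, 5.2, 4.8, 5.4),
  circ_mid   = c(20, 22, 19, 20.5, 23, 30, 29, 33, 28, 31),
  circ_end   = c(40, 43, 38, 41, 45, 70, 66, 75, 64, 72)
)

# 6. Column names of the imported object.
colnames(growth)

# 7. Mean and standard deviation at the start and end, per site.
means <- aggregate(cbind(circ_start, circ_end) ~ site, data = growth, FUN = mean)
sds   <- aggregate(cbind(circ_start, circ_end) ~ site, data = growth, FUN = sd)

# 8. Box plot of circumference at the start and end, per site.
groups <- c(split(growth$circ_start, growth$site),
            split(growth$circ_end, growth$site))
names(groups) <- c("control start", "treatment start",
                   "control end", "treatment end")
boxplot(groups, ylab = "Circumference (cm)")

# 9. Mean growth over the last 10 years at each site.
growth$growth10 <- growth$circ_end - growth$circ_mid
mean_growth <- tapply(growth$growth10, growth$site, mean)

# 10. p-values for a difference in 10-year growth between the sites.
p_t <- t.test(growth10 ~ site, data = growth)$p.value
p_w <- wilcox.test(growth10 ~ site, data = growth)$p.value
```

Running both tests is informative because t.test assumes roughly normal data, while wilcox.test is rank-based and makes no such assumption.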

Part 2: Determine the limits of BLAST

In class you will be shown how to:
• Download and unzip files
• Perform simple manipulations and analyses with sequence data
• Use a provided function to incorporate point mutations into a sequence
• Use provided functions to perform a BLAST search and interpret results
In this assignment we will be testing your ability to use supplied functions to perform an analysis into the limits of BLAST. Your group will be allocated one E. coli gene sequence found in the file:

• For points 1-6 below:

Describe how you solved the problem.

Provide the answer as directed. The answer could be numerical, categorical, a table or a chart.

• Provide a link to GitHub repository with the following:

The code should run without errors, and yield answers to questions 1-6 below.

If working in a group, there needs to be evidence that all group members have made contributions to the code repository. This means that there need to be "commits" and "issues" from each group member.

A README that describes the purpose of each script and their inputs and outputs.

The code should contain sufficient comments so that someone else can understand what each line or chunk of code is trying to achieve.

1. Download the whole set of E. coli gene DNA sequences and use gunzip to decompress. Use the makeblast() function to create a BLAST database. How many sequences are present in the E. coli set?

2. Download the sample fasta sequences and read them in as above. For your allocated sequence, determine the length (in bp) and the proportion of GC bases.

3. You will be provided with R functions to create BLAST databases and perform BLAST searches. Use BLAST to identify which E. coli gene your sequence matches best. Show a table of the top 3 hits including percent identity, E-value and bit score.

4. You will be provided with a function that enables you to make a set number of point mutations to your sequence of interest. Run the function and write R code to check the number of mismatches between the original and mutated sequence.

5. Using the provided functions for mutating and BLASTing a sequence, determine the number and proportion of sites that need to be altered to prevent the BLAST search from matching the gene of origin. Because the mutation is random, you may need to run this test multiple times to get a reliable answer.

6. Provide a chart or table that shows how the increasing proportion of mutated bases reduces the ability for BLAST to match the gene of origin. Summarise the results in 1 to 2 sentences.
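The sequence checks in points 2 and 4 can be sketched in base R as shown below. The DNA string is made up, and mutate_seq() is a hypothetical stand-in written for this example; it is not the function provided in class, which may behave differently (for instance, by allowing a base to be replaced with itself).

```r
set.seed(42)  # make the random mutations reproducible

# Made-up DNA sequence standing in for the allocated gene.
seq_chars <- strsplit("ATGGCGTTAACCGGATCCGTAGCTAGGCTTAAGCGCATTGA", "")[[1]]

# Point 2: length in bp and proportion of G/C bases.
seq_len <- length(seq_chars)
gc_prop <- mean(seq_chars %in% c("G", "C"))

# Hypothetical mutator: replace n randomly chosen sites with a different base.
mutate_seq <- function(chars, n) {
  sites <- sample(length(chars), n)
  for (i in sites) {
    chars[i] <- sample(setdiff(c("A", "C", "G", "T"), chars[i]), 1)
  }
  chars
}

mutated <- mutate_seq(seq_chars, 10)

# Point 4: count mismatches by comparing the sequences position by position.
n_mismatch <- sum(seq_chars != mutated)
prop_mismatch <- n_mismatch / seq_len
```

For points 5 and 6, the same mismatch check could be repeated over a range of mutation counts, with several random replicates at each count, before tabulating how often BLAST still recovers the gene of origin.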

Attachment:- Bioinformatics and Molecular Biology Techniques.zip
