SLE712 Bioinformatics and Molecular Biology Techniques

Reference no: EM132500117

SLE712 Bioinformatics and Molecular Biology Techniques - Deakin University

Part 1: Importing files, data wrangling, mathematical operations, plots and saving code on GitHub

The purpose of this exercise is for you to develop skills in problem solving and R coding, and to work together as a team using RStudio and GitHub. You will be provided with two data files to work with, "gene_expression.tsv" and "growth_data.csv", which are available from this URL:

To download a file with R, click on "view raw", copy the URL from the address bar, and then use the download.file command in R.
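As a minimal sketch of the download step: the real raw GitHub URL is not reproduced here, so the example below builds a local file:// URL as a stand-in (this assumes Unix-like absolute paths). With the actual data, the raw URL copied from the address bar would take the place of `raw_url`. The file contents and the names `src`, `raw_url` and `dest` are hypothetical.

```r
# Stand-in for the raw GitHub URL, which is not given here: a tiny local file
# served through a file:// URL keeps the sketch runnable offline.
src <- tempfile(fileext = ".tsv")
writeLines(c("GeneID\tSampleA\tSampleB", "gene1\t10\t12"), src)
raw_url <- paste0("file://", src)

# download.file() copies the file behind `url` to the local path `destfile`.
dest <- file.path(tempdir(), "gene_expression.tsv")
status <- download.file(url = raw_url, destfile = dest, quiet = TRUE)

# A status of 0 indicates that the download succeeded.
status
file.exists(dest)
```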
• For points 1-10 below:
Describe how you solved the problem.
Provide the answer as directed. The answer could be descriptive, numerical, categorical, a table or a chart.
• Provide a link to a GitHub repository with the following:
The code should run without errors and yield answers to points 1-10 below.
If working in a group, there needs to be evidence that all group members have made contributions to the code repository. This means that there need to be "commits" and "issues" from each group member.
A README that describes the purpose of each script and its inputs and outputs.

The code should contain sufficient comments so that someone else can understand what each line or chunk of code is trying to achieve.

The file "gene_expression.tsv" contains RNA-seq count data for two samples of interest.
1. Read in the file, making the gene accession numbers the row names. Show a table of values for the first six genes.
2. Make a new column which is the mean of the other columns. Show a table of values for the first six genes.
3. List the 10 genes with the highest mean expression.
4. Determine the number of genes with a mean < 10.
5. Make a histogram plot of the mean values in png format and paste it into your report.
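Points 1-5 could be approached along the following lines. This is only a sketch in base R: the real file's column names and counts are not shown in the brief, so the small table written below (`GeneID`, `SampleA`, `SampleB` and all values) is made up for illustration, and the output file name is likewise hypothetical.

```r
# Hypothetical stand-in for "gene_expression.tsv": column names and counts
# are invented, because the real file is not shown in the brief.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "GeneID\tSampleA\tSampleB",
  "gene1\t0\t2",
  "gene2\t150\t130",
  "gene3\t8\t6",
  "gene4\t1200\t1000",
  "gene5\t45\t55",
  "gene6\t3\t5",
  "gene7\t20\t40"
), tsv_path)

# 1. Read the file, using the first column (gene accessions) as row names.
expr <- read.table(tsv_path, header = TRUE, sep = "\t", row.names = 1,
                   stringsAsFactors = FALSE)
head(expr)

# 2. Add a column holding the mean of the sample columns.
expr$Mean <- rowMeans(expr)
head(expr)

# 3. Genes with the highest mean expression (at most 10 rows are returned).
top10 <- head(expr[order(expr$Mean, decreasing = TRUE), ], 10)
rownames(top10)

# 4. Number of genes with a mean below 10.
n_low <- sum(expr$Mean < 10)
n_low

# 5. Histogram of the mean values, written to a PNG file.
png_path <- file.path(tempdir(), "mean_expression_hist.png")
png(png_path)
hist(expr$Mean, main = "Mean expression per gene", xlab = "Mean count")
dev.off()
```

Computing `rowMeans()` before the `Mean` column exists avoids averaging the new column into itself if the script is extended later.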
The file "growth_data.csv" contains measurements of tree circumference at two sites, a control site and a treatment site, which were planted 20 years ago.
6. Import this csv file into an R object. What are the column names?
7. Calculate the mean and standard deviation of tree circumference at the start and end of the study at both sites.
8. Make a box plot of tree circumference at the start and end of the study at both sites.
9. Calculate the mean growth over the past 10 years at each site.
10. Use the t.test and wilcox.test functions to estimate the p-value that the 10-year growth is different at the two sites.
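A possible outline for points 6-10 is sketched below in base R. The real column names are exactly what point 6 asks you to find, so everything in the stand-in table is an assumption: the column names (`Site`, `TreeID`, `Circumf_start_cm`, `Circumf_mid_cm`, `Circumf_end_cm`), the site labels and the measurements are invented, and `Circumf_mid_cm` is assumed to be the measurement taken 10 years before the end of the study.

```r
# Hypothetical stand-in for "growth_data.csv"; column names, site labels and
# measurements are invented because the real file is not shown in the brief.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Site,TreeID,Circumf_start_cm,Circumf_mid_cm,Circumf_end_cm",
  "control,C1,5.0,20.0,40.0",
  "control,C2,6.0,22.0,45.0",
  "control,C3,5.5,21.0,42.0",
  "control,C4,4.5,19.0,38.0",
  "treatment,T1,5.2,25.0,60.0",
  "treatment,T2,5.8,27.0,66.0",
  "treatment,T3,4.9,24.0,58.0",
  "treatment,T4,6.1,28.0,70.0"
), csv_path)

# 6. Import the CSV and list its column names.
growth <- read.csv(csv_path, stringsAsFactors = FALSE)
colnames(growth)

# 7. Mean and standard deviation at the start and end of the study, per site.
aggregate(cbind(Circumf_start_cm, Circumf_end_cm) ~ Site, data = growth, FUN = mean)
aggregate(cbind(Circumf_start_cm, Circumf_end_cm) ~ Site, data = growth, FUN = sd)

# 8. Box plot of start and end circumference at both sites, saved as PNG.
png_path <- file.path(tempdir(), "circumference_boxplot.png")
png(png_path)
boxplot(list(
  "control start"   = growth$Circumf_start_cm[growth$Site == "control"],
  "control end"     = growth$Circumf_end_cm[growth$Site == "control"],
  "treatment start" = growth$Circumf_start_cm[growth$Site == "treatment"],
  "treatment end"   = growth$Circumf_end_cm[growth$Site == "treatment"]
), ylab = "Circumference (cm)")
dev.off()

# 9. Mean growth over the last 10 years (end minus the assumed mid column).
growth$Growth_10yr_cm <- growth$Circumf_end_cm - growth$Circumf_mid_cm
mean_growth <- tapply(growth$Growth_10yr_cm, growth$Site, mean)
mean_growth

# 10. Compare 10-year growth between sites with both tests.
t_res <- t.test(Growth_10yr_cm ~ Site, data = growth)
w_res <- wilcox.test(Growth_10yr_cm ~ Site, data = growth)
t_res$p.value
w_res$p.value
```

Note that `t.test` defaults to the Welch (unequal variance) test, and `wilcox.test` warns and falls back to an approximate p-value when the data contain ties.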

Part 2: Determine the limits of BLAST

In class you will be shown how to:
• Download and unzip files
• Perform simple manipulations and analyses with sequence data
• Use a provided function to incorporate point mutations into a sequence
• Use provided functions to perform a BLAST search and interpret results
In this assignment we will be testing your ability to use supplied functions to perform an analysis into the limits of BLAST. Your group will be allocated one E. coli gene sequence found in the file:

• For points 1-6 below:

Describe how you solved the problem.

Provide the answer as directed. The answer could be numerical, categorical, a table or a chart.

• Provide a link to GitHub repository with the following:

The code should run without errors, and yield answers to questions 1-6 below.

If working in a group, there needs to be evidence that all group members have made contributions to the code repository. This means that there need to be "commits" and "issues" from each group member.

A README that describes the purpose of each script and their inputs and outputs.

The code should contain sufficient comments so that someone else can understand what each line or chunk of code is trying to achieve

1. Download the whole set of E. coli gene DNA sequences and use gunzip to decompress. Use the makeblast() function to create a BLAST database. How many sequences are present in the E. coli set?

2. Download the sample fasta sequences and read them in as above. For your allocated sequence, determine the length (in bp) and the proportion of GC bases.

3. You will be provided with R functions to create BLAST databases and perform BLAST searches. Use BLAST to identify which E. coli gene your sequence matches best. Show a table of the top 3 hits including percent identity, E-value and bit scores.

4. You will be provided with a function that enables you to make a set number of point mutations to your sequence of interest. Run the function and write R code to check the number of mismatches between the original and mutated sequence.

5. Using the provided functions for mutating and BLASTing a sequence, determine the number and proportion of sites that need to be altered to prevent the BLAST search from matching the gene of origin. Because the mutation is random, you may need to run this test multiple times to get a reliable answer.

6. Provide a chart or table that shows how the increasing proportion of mutated bases reduces the ability for BLAST to match the gene of origin. Summarise the results in 1 to 2 sentences.
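The sequence-level checks in points 2 and 4 can be illustrated without the course functions. The sketch below uses base R only and makes several assumptions: the short sequence is invented, and `mutate_seq()` is a hypothetical stand-in for the provided mutation function, whose real name and behaviour are not shown in the brief. The BLAST steps are not covered because they depend on the provided functions.

```r
# Hypothetical short DNA sequence; the allocated E. coli gene is not shown here.
original <- "ATGGCGTACGTTAGCCTAGGCTAACGTTAG"

# Point 2: length in bp and proportion of G and C bases.
bases <- strsplit(original, "")[[1]]
seq_length <- length(bases)
gc_prop <- mean(bases %in% c("G", "C"))
seq_length
gc_prop

# Hypothetical stand-in for the provided mutation function: substitutes `n`
# distinct, randomly chosen positions with a different base each.
mutate_seq <- function(bases, n) {
  pos <- sample(length(bases), n)
  for (i in pos) {
    bases[i] <- sample(setdiff(c("A", "C", "G", "T"), bases[i]), 1)
  }
  bases
}

set.seed(1)  # make the random mutations reproducible
mutated <- mutate_seq(bases, 5)

# Point 4: count the positions where the two sequences differ.
n_mismatch <- sum(bases != mutated)
n_mismatch
n_mismatch / seq_length
```

Because this stand-in always picks a different base, the mismatch count equals the number requested. If the provided function can redraw the original base at a site, the observed count may be lower than requested, which is what the check in point 4 would reveal.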

Attachment:- Bioinformatics and Molecular Biology Techniques.zip
