Character array to store the entire human genome

Reference no: EM133334515

For this homework, you will need to use the most recent human genome assembly located on Monsoon:
/common/contrib/classroom/inf503/genomes/human.txt

• This file contains multiple scaffolds that comprise the human genome
• The genome is in FASTA format (see insert)
o The headers are unique and always begin with the ">" character. These can be discarded for this homework.

[Image: Human genome]

o Each line of the genome file is exactly 80 characters long (plus the carriage return character)
o The genomic sequences consist of the alphabet {A, C, G, T, N}
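The file-format rules above map onto a few line-level checks. Below is a minimal C++ sketch, assuming the file is read line by line with std::getline. The helper names (isHeaderLine, stripCarriageReturn, isGenomicChar, isValidSequenceLine) are hypothetical and are not required by the assignment.

```cpp
#include <cassert>
#include <string>

// True if a FASTA line is a scaffold header (headers always start with '>').
bool isHeaderLine(const std::string& line) {
    return !line.empty() && line[0] == '>';
}

// std::getline removes the trailing '\n' but keeps a '\r' from "\r\n" line
// endings; drop it so that only sequence characters remain.
void stripCarriageReturn(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// True if c belongs to the expected alphabet {A, C, G, T, N}.
bool isGenomicChar(char c) {
    return c == 'A' || c == 'C' || c == 'G' || c == 'T' || c == 'N';
}

// True if every character of a sequence line is in the alphabet.
// Precondition: the line is not a header and has already been stripped.
bool isValidSequenceLine(const std::string& line) {
    assert(!isHeaderLine(line));  // headers must be filtered out by the caller
    for (char c : line) {
        if (!isGenomicChar(c)) {
            return false;
        }
    }
    return true;
}
```

Stripping the '\r' matters. Without it, the carriage return at the end of every line would be copied into the genome array and counted as a genome character.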

Problem #1: Monsoon account creation and workshop
• Navigate to the account creation page for NAU's High Performance Computing Cluster (Monsoon)
• Complete the Self-Paced Workshop
• Obtain and submit the validation codes to self-validate your account
• Take a screenshot of the successful 'confirm user' command (see example below) and submit it as part of your write-up to complete Problem #1 of the assignment.

Problem #2: basic text processing
Write code to read, store, and analyze the latest human genome assembly (found at /common/contrib/classroom/inf503/genomes/human.txt). At minimum, your code must contain:
• A character array to store the entire human genome in a single data structure
• A separate function to read the human genome file
• A function to compute the number of A, C, G, or T characters in the human genome
• Comments describing major code blocks and control structures

A. Read in and store the human genome. There will be multiple scaffolds, each with a separate header denoted by ">". Concatenate the entire genome (discarding the headers) into a single character array data structure. Collect the following statistics as you read the file. Hint: you can keep running totals, or store scaffold sizes and names in a separate set of arrays.

• How many scaffolds were there?
• What were the longest and shortest scaffolds? Provide the names and lengths of both.
• What was the average scaffold length?
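Part A can be sketched as follows. This follows the hint's second option: scaffold names and sizes are stored in a separate array while the file is read, and the statistics are computed from them afterwards.

The names Genome, ScaffoldInfo, readGenome, ScaffoldStats and computeStats are hypothetical. The genome lives in one char array owned by a std::unique_ptr<char[]>. The sequence can never be longer than the file itself, so this sketch assumes the file size is a safe capacity, which means no reallocation is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <memory>
#include <string>
#include <vector>

// Name and length of one scaffold, recorded while the file is read.
struct ScaffoldInfo {
    std::string name;    // header text without the leading '>'
    std::size_t length;  // number of sequence characters in this scaffold
};

// The whole genome in one character array, plus per-scaffold bookkeeping.
struct Genome {
    std::unique_ptr<char[]> seq;          // concatenated sequence, no headers
    std::size_t length = 0;               // characters stored in seq
    std::vector<ScaffoldInfo> scaffolds;  // one entry per '>' header
};

// Reads a FASTA file into g. Headers are left out of the sequence, but their
// names are kept for the statistics. Returns false if the file can't be read.
bool readGenome(const std::string& path, Genome& g) {
    // Open at the end (ios::ate) so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff fileSize = in.tellg();
    if (fileSize < 0) return false;
    in.seekg(0, std::ios::beg);

    // One allocation of fileSize + 1 bytes (room for '\0') is always enough.
    const std::size_t capacity = static_cast<std::size_t>(fileSize);
    g.seq.reset(new char[capacity + 1]);
    g.length = 0;
    g.scaffolds.clear();

    std::string line;
    while (std::getline(in, line)) {
        // getline drops '\n' but keeps a '\r' from "\r\n" line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        if (line[0] == '>') {
            // Header: start a new scaffold with length 0.
            g.scaffolds.push_back({line.substr(1), 0});
        } else {
            // Sequence line: append it to the single genome array.
            assert(g.length + line.size() <= capacity);
            std::memcpy(g.seq.get() + g.length, line.data(), line.size());
            g.length += line.size();
            if (!g.scaffolds.empty()) g.scaffolds.back().length += line.size();
        }
    }
    g.seq[g.length] = '\0';  // terminate so seq can also be used as a C string
    return true;
}

// Summary statistics computed from the recorded scaffold lengths.
struct ScaffoldStats {
    std::size_t count = 0;     // number of scaffolds
    std::size_t longest = 0;   // index of the longest scaffold
    std::size_t shortest = 0;  // index of the shortest scaffold
    double average = 0.0;      // mean scaffold length
};

ScaffoldStats computeStats(const Genome& g) {
    ScaffoldStats s;
    s.count = g.scaffolds.size();
    if (s.count == 0) return s;
    std::size_t total = 0;
    for (std::size_t i = 0; i < s.count; ++i) {
        const std::size_t len = g.scaffolds[i].length;
        total += len;
        if (len > g.scaffolds[s.longest].length) s.longest = i;
        if (len < g.scaffolds[s.shortest].length) s.shortest = i;
    }
    s.average = static_cast<double>(total) / static_cast<double>(s.count);
    return s;
}
```

Here is how the pieces fit together:
• Call readGenome("/common/contrib/classroom/inf503/genomes/human.txt", g) to load the genome.
• computeStats(g) returns the scaffold count.
• g.scaffolds[s.longest] and g.scaffolds[s.shortest] give the names and lengths of the longest and shortest scaffolds.
• s.average holds the average scaffold length.

For the human genome the array alone is on the order of 3 GB, so make sure the Monsoon session has enough memory.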

B. Write a function to assess the content of the human genome: count the total number of occurrences of a given character (A, C, G, or T) in the whole genome.
• What is the 'big O' complexity of your search (linear, quadratic, cubic, etc.)?
• How long does it take (in seconds) to execute this function? Hint: You will need to use system time within your code to get accurate time estimates.
• What was the GC content of the human genome (percent of C's and G's in the genome)?
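A minimal sketch of Part B follows. It assumes the genome is already in one char array with a known length, as produced by the reading step. The function names are hypothetical.

• countBase makes a single pass over the array, so its running time is linear, O(n) in the genome length n.
• Timing uses std::chrono::steady_clock, a monotonic clock suited to measuring intervals. std::chrono::system_clock or clock() are possible alternatives if "system time" is meant literally.
• gcContentPercent assumes the percentage is taken over A, C, G and T only, with N excluded. If the whole array length is the expected denominator, that one line would change.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Counts how many times `base` occurs in the genome array.
// One pass over the array: O(n) time, O(1) extra space.
std::size_t countBase(const char* genome, std::size_t length, char base) {
    assert(genome != nullptr || length == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (genome[i] == base) ++count;
    }
    return count;
}

// Runs countBase and stores the elapsed wall-clock time, in seconds, in `seconds`.
std::size_t timedCountBase(const char* genome, std::size_t length, char base,
                           double& seconds) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t count = countBase(genome, length, base);
    const auto stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return count;
}

// GC content in percent. Assumption: the denominator counts A, C, G and T
// only, so N (unknown) positions are excluded.
double gcContentPercent(const char* genome, std::size_t length) {
    const std::size_t a = countBase(genome, length, 'A');
    const std::size_t c = countBase(genome, length, 'C');
    const std::size_t g = countBase(genome, length, 'G');
    const std::size_t t = countBase(genome, length, 'T');
    const std::size_t acgt = a + c + g + t;
    if (acgt == 0) return 0.0;  // avoid dividing by zero (e.g. all-N input)
    return 100.0 * static_cast<double>(c + g) / static_cast<double>(acgt);
}
```

gcContentPercent makes four linear passes, which is still O(n). A single pass that tallies all four bases at once would be faster in practice but has the same big-O class.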



Reviews

len3334515

2/9/2023 9:28:12 PM

Each homework submission must include:
• An archive (.zip or .gz) file of the source code containing:
o The makefile used to compile the code on Monsoon (5 pts)
o All .cpp and .h files (5 pts)
• A full write-up (.pdf or .doc) file containing answers to the homework's questions (5 pts), including the exact command line needed to execute every subproblem of the homework

Also listen to the attached audio of the student's explanation.

The source code must adhere STRICTLY to the following guidelines:



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd