Calculate the pooled variance


Reference no: EM132379442

TASK

This data set is from Alizadeh et al. at Stanford.

In this study, the investigators were evaluating diffuse large B-cell lymphoma (DLBCL).

Using expression profiling and hierarchical clustering, they were able to identify 2 distinct forms of DLBCL that indicate different stages of B-cell differentiation.

"One type expressed genes characteristic of germinal centre B cells ('germinal centre B-like DLBCL'); the second type expressed genes normally induced during in vitro activation of peripheral blood B cells ('activated B-like DLBCL')."

They also found that the germinal centre B-like DLBCL patients had a better survival rate.

• Use this data set to evaluate the power and sample size in this experiment.

• Also determine the number of samples needed to appropriately power the study.

• First, calculate the power and n required using a single-gene calculation to illustrate the formula.

• Then, produce a multivariate summary that gives an idea of the power or n required for a specific percentage of genes/probes in the experiment.

• Remember that general power formulas do not apply when attempting to summarize all genes/probes on an array.

Steps:

1- Download the Eisen DLBCL data set and save as a text file

2- Load into R, using read.table and arguments:
header=T
na.strings="NA"
blank.lines.skip=F
row.names=1

• There are missing values in this data frame because it contains cDNA data.

3- Get the class label file "eisenClasses.txt" and read it into R.
Use the header=T argument.

4- Subset the data frame with the class labels and look at the positions so you know where one class ends and the other begins.

• Remember that 'subset' here means to re-index (i.e. reorder) the columns so that they follow the order of the class labels.
• If you look at the column name order with dimnames(dat)[[2]] both before and after you reorder the columns, you will see the difference.
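Steps 2 to 4 can be sketched as follows. This is only a minimal illustration under stated assumptions: the two small files are toy stand-ins written to a temporary directory, and their names and layout (a class file with `class` and `sample` columns) are hypothetical, not the layout of the real data set or of eisenClasses.txt.

```r
# Toy stand-ins for the real files (hypothetical names and layout),
# written to a temp directory so the sketch runs anywhere.
tmp <- tempdir()
expr_file  <- file.path(tmp, "toy_expression.txt")
class_file <- file.path(tmp, "toy_classes.txt")

writeLines(c(
  "gene\tS3\tS1\tS4\tS2",
  "g1\t0.5\t-1.2\tNA\t-0.8",
  "g2\t1.1\t0.3\t0.9\tNA"
), expr_file)
writeLines(c(
  "class\tsample",
  "1\tS1",
  "1\tS2",
  "2\tS3",
  "2\tS4"
), class_file)

# Step 2: read the expression table with the arguments listed in the task.
dat <- read.table(expr_file, header = TRUE, na.strings = "NA",
                  blank.lines.skip = FALSE, row.names = 1)

# Step 3: read the class labels.
cl <- read.table(class_file, header = TRUE)

# Step 4: reorder the columns to follow the class label file.
before <- dimnames(dat)[[2]]
dat    <- dat[, as.character(cl$sample)]
after  <- dimnames(dat)[[2]]

# Number of samples in the first class, i.e. where class 1 ends.
n_class1 <- sum(cl$class == 1)
```

Comparing `before` and `after` shows the change in column order, and `n_class1` marks the position where the first class ends.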

5- Pick a gene and remove the cells that contain NA values.

6- Plot the values for both classes with a:
- boxplot (use the argument col=c("red","blue") to color the boxes separately)
- histogram (this should have 2 separate histogram plots on 1 page; call par(mfrow=c(2,1)) before plotting the first)

Color each class something different in the boxplot and histogram.
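One way steps 5 and 6 might look is sketched below. The values are simulated purely for illustration (they are not from the DLBCL data), and the names `class1_vals` and `class2_vals` are hypothetical.

```r
set.seed(1)
# Made-up values for one gene in two classes, each with one missing value.
class1_vals <- c(rnorm(20, mean = 0.8, sd = 0.5), NA)
class2_vals <- c(NA, rnorm(20, mean = 0.2, sd = 0.5))

# Step 5: drop the missing values.
class1_vals <- class1_vals[!is.na(class1_vals)]
class2_vals <- class2_vals[!is.na(class2_vals)]

# Step 6a: boxplot with one color per class.
boxplot(list(Class1 = class1_vals, Class2 = class2_vals),
        col = c("red", "blue"), ylab = "Expression value",
        main = "Toy gene by class")

# Step 6b: two histograms stacked on one page, sharing the same x range.
old_par <- par(mfrow = c(2, 1))
x_range <- range(c(class1_vals, class2_vals))
hist(class1_vals, col = "red", xlim = x_range,
     xlab = "Expression value", main = "Class 1 (toy data)")
hist(class2_vals, col = "blue", xlim = x_range,
     xlab = "Expression value", main = "Class 2 (toy data)")
par(old_par)  # restore the previous plotting layout
```

Using a shared `xlim` is a design choice that makes the two histograms directly comparable; it is not required by the task.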

7- Calculate the pooled variance.

8- Calculate the minimum sample size necessary to detect a 1.5-fold difference (at 80% power and 99% confidence).
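A minimal sketch of steps 7 and 8 follows, using simulated values rather than the real gene. It assumes that the data are on a log2 scale (so a 1.5-fold difference is log2(1.5)), that "99% confidence" means a significance level of 0.01, and that base R's power.t.test is an acceptable way to do the calculation; the task does not prescribe a particular function.

```r
set.seed(1)
x <- rnorm(20, mean = 0.8, sd = 0.5)  # toy class 1 values (assumed log2 scale)
y <- rnorm(18, mean = 0.2, sd = 0.5)  # toy class 2 values

n1 <- length(x)
n2 <- length(y)

# Step 7: pooled variance, weighting each class variance by its degrees of freedom.
pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Step 8: per-group n for a 1.5-fold difference at alpha = 0.01 and 80% power.
res <- power.t.test(delta = log2(1.5), sd = sqrt(pooled_var),
                    sig.level = 0.01, power = 0.80,
                    type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(res$n)  # round up to whole samples per group
```

`n_per_group` is the number of samples needed in each class, not the total.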

9- Calculate the sample size required for the same gene selected in #5 using the empirically determined delta between the two groups, assuming 99% confidence and 80% power.
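Step 9 differs from step 8 only in where delta comes from. The sketch below uses the observed difference in class means as the effect size, again on simulated values and with the same assumptions (significance level 0.01 for "99% confidence", base R's power.t.test as the calculator).

```r
set.seed(1)
x <- rnorm(20, mean = 0.8, sd = 0.5)  # toy class 1 values
y <- rnorm(18, mean = 0.2, sd = 0.5)  # toy class 2 values

n1 <- length(x)
n2 <- length(y)
pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))

# Empirical delta: absolute difference between the two class means.
obs_delta <- abs(mean(x) - mean(y))

res <- power.t.test(delta = obs_delta, sd = pooled_sd,
                    sig.level = 0.01, power = 0.80,
                    type = "two.sample", alternative = "two.sided")
n_empirical <- ceiling(res$n)  # samples needed per group
```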

10- Load the ssize and gdata libraries, calculate the standard deviation for each gene in the matrix (use the na.rm=T argument), and plot a histogram of the standard deviations. Label the plot accordingly.
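The per-gene standard deviation part of step 10 needs only base R, as in the sketch below. The matrix is simulated for illustration, with genes in rows and samples in columns as an assumed orientation; loading ssize and gdata is left out because this part does not use them.

```r
set.seed(1)
# Toy matrix: 200 genes (rows) x 12 samples (columns), with a different
# spread per gene and 120 values set to NA.
n_genes <- 200
n_samp  <- 12
true_sd <- runif(n_genes, 0.2, 1.5)
mat <- matrix(rnorm(n_genes * n_samp, sd = true_sd), nrow = n_genes)
mat[sample(length(mat), 120)] <- NA

# Standard deviation of each row (gene), ignoring missing values.
gene_sd <- apply(mat, 1, sd, na.rm = TRUE)

hist(gene_sd, breaks = 20, col = "grey",
     xlab = "Standard deviation",
     main = "Per-gene standard deviations (toy data)")
```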

11- Calculate AND plot a proportion of genes vs. sample size graph to get an idea of the number of genes that have an adequate sample size for confidence=95%, effect size=3 (log2 transform for the function), and power=80%.
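The idea behind step 11 can be illustrated without the ssize package. The sketch below is a base-R stand-in, not the package's own method: it uses the normal-approximation formula for a two-sample, two-sided comparison, applies no multiple-testing correction, and runs on simulated standard deviations. The effect size of 3 is taken as a fold change and log2-transformed, and "confidence=95%" is read as a significance level of 0.05.

```r
set.seed(1)
gene_sd <- runif(500, 0.3, 2.5)  # toy per-gene standard deviations

delta <- log2(3)   # 3-fold change on the log2 scale
alpha <- 0.05      # assumed meaning of 95% confidence
power <- 0.80

# Normal-approximation sample size per group for each gene,
# with a floor of 2 samples per group.
z_sum    <- qnorm(1 - alpha / 2) + qnorm(power)
n_needed <- pmax(2, ceiling(2 * z_sum^2 * gene_sd^2 / delta^2))

# Proportion of genes adequately powered at each candidate sample size.
n_grid  <- 2:40
prop_ok <- sapply(n_grid, function(n) mean(n_needed <= n))

plot(n_grid, prop_ok, type = "s", ylim = c(0, 1),
     xlab = "Sample size per group",
     ylab = "Proportion of genes with power >= 80%",
     main = "Proportion of genes vs. sample size (toy data)")
abline(h = 0.8, lty = 2)  # reference line at 80% of genes
```

Reading the curve at a given proportion of genes gives the per-group sample size needed to power that share of the array, which is the kind of summary the task asks for.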

Attachment:- Task.rar

