TASK
This data set is from Alizadeh et al. at Stanford.
In this study, the investigators evaluated diffuse large B-cell lymphoma (DLBCL).
Using expression profiling and hierarchical clustering, they identified 2 distinct forms of DLBCL that indicate different stages of B-cell differentiation.
"One type expressed genes characteristic of germinal centre B cells ('germinal centre B-like DLBCL');
the second type expressed genes normally induced during in vitro activation of peripheral blood B cells ('activated B-like DLBCL')."
They also found that the germinal centre B-like DLBCL patients had a better survival rate.
• Use this data set to evaluate the power and sample size in this experiment.
• Also look for the necessary number of samples to appropriately power the study.
• First, calculate the power and n required using a single-gene calculation to illustrate the formula.
• Then, produce a multivariate summary that gives an idea of the power or n required for a specific percentage of genes/probes in the experiment.
• Remember that general power formulas do not apply when attempting to summarize all genes/probes on an array.
Steps:
1- Download the Eisen DLBCL data set and save it as a text file.
2- Load it into R using read.table with the arguments:
header=T
na.strings="NA"
blank.lines.skip=F
row.names=1
• There are missing values in this data frame because these are cDNA data.
3- Get the class label file "eisenClasses.txt" and read it into R.
Use the header=T argument.
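Steps 2 and 3 might look like the following sketch. The real files are not reproduced here, so the block writes two tiny stand-in files to a temporary directory. The file name eisen.txt, the gene and sample names, the values, and the layout of the class file (a sample column and a class column) are all assumptions for illustration only.

```r
# Stand-in files in a temp directory; every name and value here is made up.
data_file  <- file.path(tempdir(), "eisen.txt")          # hypothetical file name
class_file <- file.path(tempdir(), "eisenClasses.txt")

writeLines(c(
  "gene S1 S2 S3 S4",
  "geneA 0.5 NA -1.2 0.3",
  "geneB 1.1 0.4 NA -0.7"
), data_file)

writeLines(c(
  "sample class",
  "S1 1",
  "S3 1",
  "S2 2",
  "S4 2"
), class_file)

# Step 2: read the expression data with the arguments listed in the task.
dat <- read.table(data_file, header = TRUE, na.strings = "NA",
                  blank.lines.skip = FALSE, row.names = 1)

# Step 3: read the class labels.
cl <- read.table(class_file, header = TRUE)

dim(dat)         # genes x samples
sum(is.na(dat))  # how many cells are missing
```

With the real data, only the two read.table calls are needed, pointed at wherever the downloaded files were saved.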
4- Subset the data frame with the class labels and look at the positions so you know where one class ends and the other begins.
• Remember that 'subset' here means to re-index (i.e. reorder) the column headers.
• If you look at the column name order with dimnames(dat)[[2]] both before and after you reorder the columns, you will see the difference.
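As a rough sketch of step 4, the columns can be re-indexed by the sample names in the class file. The toy data frame, the sample names, and the assumption that the class file lists samples class by class in columns named sample and class are hypothetical; the real eisenClasses.txt may be laid out differently.

```r
# Toy expression data: 3 genes x 4 samples (made-up values).
dat <- data.frame(S1 = c(0.5, 1.1, -0.2), S2 = c(0.1, 0.4, 0.9),
                  S3 = c(-1.2, 0.0, 0.3), S4 = c(0.3, -0.7, 1.5),
                  row.names = c("geneA", "geneB", "geneC"))

# Hypothetical class labels, listed class by class.
cl <- data.frame(sample = c("S1", "S3", "S2", "S4"),
                 class  = c(1, 1, 2, 2))

before <- dimnames(dat)[[2]]            # original column order
dat    <- dat[, as.character(cl$sample)] # reorder columns to follow the class file
after  <- dimnames(dat)[[2]]            # new column order

# Positions of each class after reordering.
n1 <- sum(cl$class == 1)
class1_cols <- 1:n1
class2_cols <- (n1 + 1):ncol(dat)

before
after
```

Comparing before and after shows the re-indexing; class1_cols and class2_cols then mark where one class ends and the other begins.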
5- Pick a gene and remove the cells that contain NAs.
6- Plot the values for both classes with a:
- boxplot (use the argument col=c("red","blue") to color the boxes separately)
- histogram (this should be 2 separate histogram plots on 1 page; call par(mfrow=c(2,1)) before plotting the first).
Color each class differently in the boxplot and histogram.
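Steps 5 and 6 could be sketched as below. The values are simulated rather than taken from the DLBCL data, and the group sizes, means, and plot titles are assumptions chosen only so the block runs on its own.

```r
set.seed(1)

# Simulated log2 ratios for one gene in each class, with a few NAs (made up).
class1 <- c(rnorm(19, mean = 0.8, sd = 0.6), NA)
class2 <- c(rnorm(18, mean = 0.0, sd = 0.6), NA, NA)

# Step 5: drop the missing cells.
x <- class1[!is.na(class1)]
y <- class2[!is.na(class2)]

# Step 6a: one boxplot, one colored box per class.
boxplot(list("Class 1" = x, "Class 2" = y), col = c("red", "blue"),
        main = "Example gene by class", ylab = "log2 ratio")

# Step 6b: two histograms stacked on one page.
old_par <- par(mfrow = c(2, 1))
hist(x, col = "red",  main = "Class 1", xlab = "log2 ratio")
hist(y, col = "blue", main = "Class 2", xlab = "log2 ratio")
par(old_par)  # restore the previous layout
```

With the real data, x and y would be one row of the reordered data frame split at the class boundary found in step 4.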
7- Calculate the pooled variance.
8- Calculate the minimum sample size necessary to detect a 1.5-fold difference (at 80% power and 99% confidence).
9- Calculate the sample size required for the same gene selected in #5 using the empirically determined delta between the two groups, assuming 99% confidence and 80% power.
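One possible way to carry out steps 7 to 9 is sketched below with base R's power.t.test. It assumes the values are log2 ratios (so a 1.5-fold change is log2(1.5)), reads "99% confidence" as sig.level = 0.01, and uses simulated data; the task may intend a hand-computed formula instead. The pooled variance is the usual ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2).

```r
set.seed(2)

# Simulated, NA-free values for one gene in each class (made up).
x <- rnorm(19, mean = 0.8, sd = 0.6)
y <- rnorm(18, mean = 0.0, sd = 0.6)

# Step 7: pooled variance of the two groups.
n1 <- length(x)
n2 <- length(y)
pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
pooled_sd  <- sqrt(pooled_var)

# Step 8: n per group to detect a 1.5-fold difference on the log2 scale.
fold   <- power.t.test(delta = log2(1.5), sd = pooled_sd,
                       sig.level = 0.01, power = 0.80)
n_fold <- ceiling(fold$n)

# Step 9: same calculation with the observed difference in group means.
delta_emp <- abs(mean(x) - mean(y))
emp   <- power.t.test(delta = delta_emp, sd = pooled_sd,
                      sig.level = 0.01, power = 0.80)
n_emp <- ceiling(emp$n)

c(pooled_var = pooled_var, n_fold = n_fold, delta_emp = delta_emp, n_emp = n_emp)
```

power.t.test reports n per group for a two-sample, two-sided test by default, so the total number of samples is twice the value shown.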
10- Load the ssize and gdata libraries,
calculate the standard deviation for each gene in the matrix
(use the na.rm=T argument),
and plot a histogram of the standard deviations.
Label the plot accordingly.
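The per-gene standard deviations in step 10 can be sketched with base R alone, as below. The matrix is simulated with made-up dimensions and gene names, and the library calls for ssize and gdata are left out so the block runs without those packages installed.

```r
set.seed(3)

# Simulated expression matrix: 200 genes x 10 samples, with some NAs (made up).
dat <- matrix(rnorm(200 * 10, sd = 0.7), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("S", 1:10)))
dat[sample(length(dat), 50)] <- NA

# Standard deviation of each gene (row), ignoring missing values.
gene_sd <- apply(dat, 1, sd, na.rm = TRUE)

# Histogram of the per-gene standard deviations, with labels.
hist(gene_sd, breaks = 20, col = "grey",
     main = "Per-gene standard deviations",
     xlab = "Standard deviation (log2 ratio)",
     ylab = "Number of genes")
```

A data frame read with read.table works the same way, since apply converts it to a matrix before taking row-wise standard deviations.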
11- Calculate and plot a proportion of genes vs. sample size graph to get an idea of the number of genes that have an adequate sample size for confidence=95%, effect size=3 (log2-transform it before passing it to the function), and power=80%.
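The idea behind step 11 can be illustrated with a base-R stand-in, shown below. This is not the ssize package's own calculation: it uses power.t.test per gene, an unadjusted significance level of 0.05, and simulated standard deviations, all of which are assumptions. With the package loaded, its own sample-size and plotting functions would be used on the real per-gene standard deviations instead.

```r
set.seed(4)

# Hypothetical per-gene standard deviations (made up; use the real ones from step 10).
gene_sd <- runif(200, min = 0.3, max = 2.5)

delta  <- log2(3)   # effect size of 3-fold, on the log2 scale
n_grid <- 2:20      # candidate sample sizes per group

# Power of a two-sample t-test for one gene at a given n.
power_at <- function(n, s) {
  power.t.test(n = n, delta = delta, sd = s, sig.level = 0.05)$power
}

# For each n, the proportion of genes reaching at least 80% power.
prop_powered <- sapply(n_grid, function(n) {
  mean(sapply(gene_sd, function(s) power_at(n, s)) >= 0.80)
})

plot(n_grid, prop_powered, type = "s", ylim = c(0, 1),
     main = "Proportion of genes with adequate sample size",
     xlab = "Sample size per group",
     ylab = "Proportion of genes with power >= 80%")
```

Reading the curve at a given proportion (for example 0.8) gives the per-group sample size at which that share of genes is adequately powered under these assumptions.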