1. Read the first file, CERES-report-2012-10.txt, into R as text.
You should end up with a character vector oct2012 that prints like this ...
> head(oct2012, 10)
[1] "Monthly Pan usage report for STATISTICS"
[2] "Date: Tue Oct 30 15:08:01 NZDT 2012"
[3] ""
[4] "Total for all department users:"
[5] "+--------+-------+-------------+------------------+----------------------+"
[6] "| user | jobs | Total_Cores | Total_Core_Hours | Average_Waiting_Time |"
[7] "+--------+-------+-------------+------------------+----------------------+"
[8] "| user35 | 430 | 430 | 67135.3453 | 0.82001744 |"
[9] "| user16 | 280 | 280 | 20280.2439 | 0.25450595 |"
[10] "| user29 | 55551 | 55551 | 13532.4547 | 0.36090980 |"
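A minimal sketch of one way to do this with readLines(). The report contents below are a shortened, hypothetical stand-in written to a temporary file so that the example runs without the real report; with the actual file, only the final readLines() call is needed.

```r
# Hypothetical stand-in for the real report (same layout, fewer lines)
reportLines <- c(
  "Monthly Pan usage report for STATISTICS",
  "Date: Tue Oct 30 15:08:01 NZDT 2012",
  "",
  "Total for all department users:",
  "+--------+-------+",
  "| user   | jobs  |",
  "+--------+-------+",
  "| user35 |   430 |"
)
reportFile <- file.path(tempdir(), "CERES-report-2012-10.txt")
writeLines(reportLines, reportFile)

# readLines() returns a character vector with one element per line
oct2012 <- readLines(reportFile)
head(oct2012, 10)
```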
2. Use text search to determine which lines in the file contain the data that we are interested in (these are lines that start with a vertical bar, |, followed by a space, then the word user, then two digits).
You should end up with a numeric vector dataLines that prints like this
> dataLines
[1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
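One possible approach uses grep() with a regular expression anchored at the start of the line. The vector below is a shortened stand-in for oct2012, so the positions it reports differ from the expected output above.

```r
# Shortened stand-in for the oct2012 character vector
oct2012 <- c(
  "Total for all department users:",
  "+--------+-------+",
  "| user   | jobs  |",
  "+--------+-------+",
  "| user35 |   430 |",
  "| user16 |   280 |",
  "+--------+-------+"
)

# "^"        start of line
# "\\|"      a literal vertical bar (escaped, since | means "or" in a regex)
# " user"    a space, then the word user
# "[0-9]{2}" two digits
dataLines <- grep("^\\| user[0-9]{2}", oct2012)
dataLines
```

The header line "| user   | jobs  |" is not matched because the word user is followed by spaces rather than digits.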
3. Extract the lines of interest and break each line of data into separate pieces, with one data value per piece.
You should end up with a list dataList that prints like this ...
> head(dataList)
[[1]]
[1] "" "user35" "430" "430" "67135.3453" "0.82001744"
[[2]]
[1] "" "user16" "280" "280" "20280.2439" "0.25450595"
[[3]]
[1] "" "user29" "55551" "55551" "13532.4547" "0.36090980"
[[4]]
[1] "" "user02" "935" "935" "1639.0689" "0.00066578"
[[5]]
[1] "" "user09" "379" "379" "813.2947" "0.02124011"
[[6]]
[1] "" "user23" "191" "191" "737.4111" "0.00002036"
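A sketch of this step using strsplit(), assuming the values in each line are separated by runs of vertical bars and spaces (the column padding in the stand-in lines below is a guess at the file layout).

```r
# Stand-in for oct2012 and dataLines from the earlier steps
oct2012 <- c(
  "+--------+-------+-------------+------------------+----------------------+",
  "| user35 |   430 |         430 |       67135.3453 |           0.82001744 |",
  "| user16 |   280 |         280 |       20280.2439 |           0.25450595 |"
)
dataLines <- c(2, 3)

# Split on any run of "|" and " " characters; inside [] the bar is literal
dataList <- strsplit(oct2012[dataLines], "[| ]+")
dataList
```

Each vector starts with an empty string because every line begins with a separator; strsplit() does not produce a matching empty string at the end.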
4. Reduce the list of vectors to a matrix.
You should end up with a matrix dataMatrix that prints like this ...
> dataMatrix
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "" "user35" "430" "430" "67135.3453" "0.82001744"
[2,] "" "user16" "280" "280" "20280.2439" "0.25450595"
[3,] "" "user29" "55551" "55551" "13532.4547" "0.36090980"
[4,] "" "user02" "935" "935" "1639.0689" "0.00066578"
[5,] "" "user09" "379" "379" "813.2947" "0.02124011"
[6,] "" "user23" "191" "191" "737.4111" "0.00002036"
[7,] "" "user14" "11" "11" "641.2681" "0.00108586"
[8,] "" "user07" "90" "90" "435.6408" "0.00108333"
[9,] "" "user06" "14" "14" "371.6989" "0.95105159"
[10,] "" "user11" "24" "24" "364.8767" "0.00158565"
[11,] "" "user32" "7" "7" "78.6547" "0.00027778"
[12,] "" "user24" "2037" "2037" "44.1633" "0.00499182"
[13,] "" "user20" "17" "17" "19.5492" "0.00003268"
[14,] "" "user31" "5" "5" "1.9553" "0.00005556"
[15,] "" "user19" "1" "1" "0.0011" "0.00027778"
[16,] "" "user01" "1" "1" "0.0003" "0.00000000"
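One common way to stack a list of equal-length vectors into a matrix is do.call() with rbind(). A short sketch, using the first two vectors from the expected dataList:

```r
# First two elements of dataList, copied from the expected output
dataList <- list(
  c("", "user35", "430", "430", "67135.3453", "0.82001744"),
  c("", "user16", "280", "280", "20280.2439", "0.25450595")
)

# do.call() passes each list element to rbind() as a separate argument,
# so each vector becomes one row of the resulting character matrix
dataMatrix <- do.call(rbind, dataList)
dataMatrix
```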
5. Create a data frame from the matrix, with the first column a character vector and all other columns numeric.
You should end up with a data frame oct2012df that prints like this ...
> oct2012df
user jobs cores coreHours waitTime
1 user35 430 430 67135.3453 0.82001744
2 user16 280 280 20280.2439 0.25450595
3 user29 55551 55551 13532.4547 0.36090980
4 user02 935 935 1639.0689 0.00066578
5 user09 379 379 813.2947 0.02124011
6 user23 191 191 737.4111 0.00002036
7 user14 11 11 641.2681 0.00108586
8 user07 90 90 435.6408 0.00108333
9 user06 14 14 371.6989 0.95105159
10 user11 24 24 364.8767 0.00158565
11 user32 7 7 78.6547 0.00027778
12 user24 2037 2037 44.1633 0.00499182
13 user20 17 17 19.5492 0.00003268
14 user31 5 5 1.9553 0.00005556
15 user19 1 1 0.0011 0.00027778
16 user01 1 1 0.0003 0.00000000
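A sketch of the conversion, assuming the column layout shown in dataMatrix (column 1 empty, column 2 the user name, columns 3 to 6 numeric values):

```r
# First two rows of dataMatrix, copied from the expected output
dataMatrix <- rbind(
  c("", "user35", "430", "430", "67135.3453", "0.82001744"),
  c("", "user16", "280", "280", "20280.2439", "0.25450595")
)

# Drop the empty first column, keep user as text, convert the rest to numeric
oct2012df <- data.frame(
  user      = dataMatrix[, 2],
  jobs      = as.numeric(dataMatrix[, 3]),
  cores     = as.numeric(dataMatrix[, 4]),
  coreHours = as.numeric(dataMatrix[, 5]),
  waitTime  = as.numeric(dataMatrix[, 6]),
  stringsAsFactors = FALSE
)
oct2012df
```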
6. Write a function that takes the name of a report file as its argument and returns a data frame as its result. The data frame should include a column that contains the date of the report.
You should end up with a function that works like this ...
> readFile("CERES-report-2012-10.txt")
user jobs cores coreHours waitTime date
1 user35 430 430 67135.3453 0.82001744 2012-10
2 user16 280 280 20280.2439 0.25450595 2012-10
3 user29 55551 55551 13532.4547 0.36090980 2012-10
4 user02 935 935 1639.0689 0.00066578 2012-10
5 user09 379 379 813.2947 0.02124011 2012-10
6 user23 191 191 737.4111 0.00002036 2012-10
7 user14 11 11 641.2681 0.00108586 2012-10
8 user07 90 90 435.6408 0.00108333 2012-10
9 user06 14 14 371.6989 0.95105159 2012-10
10 user11 24 24 364.8767 0.00158565 2012-10
11 user32 7 7 78.6547 0.00027778 2012-10
12 user24 2037 2037 44.1633 0.00499182 2012-10
13 user20 17 17 19.5492 0.00003268 2012-10
14 user31 5 5 1.9553 0.00005556 2012-10
15 user19 1 1 0.0011 0.00027778 2012-10
16 user01 1 1 0.0003 0.00000000 2012-10
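The previous steps can be wrapped into a single function along these lines. This is a sketch, not a definitive solution: it assumes the date is taken from the year-month part of the filename (which matches the expected output), and it uses a small hypothetical report written to a temporary file so that it runs on its own.

```r
readFile <- function(filename) {
  lines <- readLines(filename)
  # Keep only the data lines, e.g. "| user35 | 430 | ..."
  dataText <- grep("^\\| user[0-9]{2}", lines, value = TRUE)
  dataMatrix <- do.call(rbind, strsplit(dataText, "[| ]+"))
  data.frame(
    user      = dataMatrix[, 2],
    jobs      = as.numeric(dataMatrix[, 3]),
    cores     = as.numeric(dataMatrix[, 4]),
    coreHours = as.numeric(dataMatrix[, 5]),
    waitTime  = as.numeric(dataMatrix[, 6]),
    # Assumption: the filename ends in YYYY-MM.txt
    date      = sub("^.*([0-9]{4}-[0-9]{2})\\.txt$", "\\1", basename(filename)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in file (two data rows from the expected output)
reportFile <- file.path(tempdir(), "CERES-report-2012-10.txt")
writeLines(c(
  "Total for all department users:",
  "| user   | jobs  | Total_Cores | Total_Core_Hours | Average_Waiting_Time |",
  "| user35 |   430 |         430 |       67135.3453 |           0.82001744 |",
  "| user16 |   280 |         280 |       20280.2439 |           0.25450595 |"
), reportFile)

oct2012df <- readFile(reportFile)
oct2012df
```

As written, the function expects at least one data line in the file; a report with no matching lines would need separate handling.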
7. Generate a set of filenames for the 22 report files.
You should end up with a character vector filenames that prints like this ...
> filenames
[1] "CERES-report-2012-10.txt" "CERES-report-2012-11.txt"
[3] "CERES-report-2012-12.txt" "CERES-report-2013-01.txt"
[5] "CERES-report-2013-02.txt" "CERES-report-2013-03.txt"
[7] "CERES-report-2013-04.txt" "CERES-report-2013-05.txt"
[9] "CERES-report-2013-06.txt" "CERES-report-2013-07.txt"
[11] "CERES-report-2013-08.txt" "CERES-report-2013-09.txt"
[13] "CERES-report-2013-10.txt" "CERES-report-2013-11.txt"
[15] "CERES-report-2013-12.txt" "CERES-report-2014-01.txt"
[17] "CERES-report-2014-02.txt" "CERES-report-2014-03.txt"
[19] "CERES-report-2014-04.txt" "CERES-report-2014-05.txt"
[21] "CERES-report-2014-06.txt" "CERES-report-2014-07.txt"
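One way to generate the names without typing them out is to build a monthly date sequence and format it. A sketch, assuming the reports run without gaps from October 2012 for 22 months:

```r
# 22 consecutive months starting at October 2012
months <- seq(as.Date("2012-10-01"), by = "month", length.out = 22)

# format() gives "2012-10", "2012-11", ...; paste0() adds prefix and suffix
filenames <- paste0("CERES-report-", format(months, "%Y-%m"), ".txt")
filenames
```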
8. Read all of the report files into R and combine them into a single large data frame.
You should end up with a data frame allFiles that prints like this ...
> dim(allFiles)
[1] 197 6
> head(allFiles)
user jobs cores coreHours waitTime date
1 user35 430 430 67135.3453 0.82001744 2012-10
2 user16 280 280 20280.2439 0.25450595 2012-10
3 user29 55551 55551 13532.4547 0.36090980 2012-10
4 user02 935 935 1639.0689 0.00066578 2012-10
5 user09 379 379 813.2947 0.02124011 2012-10
6 user23 191 191 737.4111 0.00002036 2012-10
> tail(allFiles)
user jobs cores coreHours waitTime date
192 user10 36 36 460.1714 0.08055556 2014-07
193 user28 1818 6385 344.2986 0.06411258 2014-07
194 user29 1140 1140 248.2494 0.03663962 2014-07
195 user04 2 10 17.4903 0.00611111 2014-07
196 user05 13 13 14.0053 0.00262821 2014-07
197 user07 1 1 0.4675 0.18361111 2014-07
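A sketch of the read-and-combine pattern with lapply() and rbind(). To keep the example runnable on its own, it redefines a compact readFile() and writes two tiny hypothetical reports to a temporary directory; the row in the second report uses made-up values.

```r
readFile <- function(filename) {
  dataText <- grep("^\\| user[0-9]{2}", readLines(filename), value = TRUE)
  m <- do.call(rbind, strsplit(dataText, "[| ]+"))
  data.frame(
    user = m[, 2], jobs = as.numeric(m[, 3]), cores = as.numeric(m[, 4]),
    coreHours = as.numeric(m[, 5]), waitTime = as.numeric(m[, 6]),
    date = sub("^.*([0-9]{4}-[0-9]{2})\\.txt$", "\\1", basename(filename)),
    stringsAsFactors = FALSE
  )
}

# Two hypothetical stand-in reports
reportDir <- tempdir()
filenames <- c("CERES-report-2012-10.txt", "CERES-report-2012-11.txt")
writeLines(c("| user35 | 430 | 430 | 67135.3453 | 0.82001744 |",
             "| user16 | 280 | 280 | 20280.2439 | 0.25450595 |"),
           file.path(reportDir, filenames[1]))
writeLines("| user29 | 10 | 10 | 5.5 | 0.1 |",   # made-up values
           file.path(reportDir, filenames[2]))

# Read every file into a list of data frames, then stack them row-wise
allFiles <- do.call(rbind, lapply(file.path(reportDir, filenames), readFile))
dim(allFiles)
```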
9. Extract the information for the researcher who used the most Total_Core_Hours in a single month.
You should end up with a data frame biggestUser that prints like this ...
> biggestUser
user jobs cores coreHours waitTime date
105 user33 37286 37286 338691.4 90.17762 2013-08
(That is over 38 and a half years of compute time in one month!)
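A sketch using which.max() to pick the row with the largest coreHours. The three-row allFiles below is a stand-in built from rows shown in the expected outputs; the last line checks the "years of compute time" remark by dividing core hours by the number of hours in a 365-day year.

```r
# Stand-in for allFiles: three rows taken from the expected outputs
allFiles <- data.frame(
  user      = c("user35", "user33", "user10"),
  jobs      = c(430, 37286, 36),
  cores     = c(430, 37286, 36),
  coreHours = c(67135.3453, 338691.4, 460.1714),
  waitTime  = c(0.82001744, 90.17762, 0.08055556),
  date      = c("2012-10", "2013-08", "2014-07"),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the (first) largest value
biggestUser <- allFiles[which.max(allFiles$coreHours), ]
biggestUser

# Core hours expressed as years of continuous single-core compute time
biggestUser$coreHours / (24 * 365)
```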
10. Calculate the total number of jobs for each user (over all 22 months), ordered from largest to smallest number of jobs.
You should end up with a data frame userJobsOrdered that prints like this ...
> userJobsOrdered
user jobs
29 user29 1155542
33 user33 363760
13 user13 81854
10 user10 38057
19 user19 19865
22 user22 16926
14 user14 8041
2 user02 7574
24 user24 5045
8 user08 3743
17 user17 3582
11 user11 2947
28 user28 1818
23 user23 862
16 user16 741
30 user30 719
27 user27 668
4 user04 656
7 user07 610
12 user12 447
35 user35 430
9 user09 426
25 user25 330
34 user34 237
15 user15 163
18 user18 148
6 user06 125
26 user26 76
20 user20 38
5 user05 37
21 user21 12
32 user32 12
31 user31 5
1 user01 1
3 user03 1
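One possible approach is aggregate() followed by order(). The allFiles below is a small stand-in, so the totals differ from the expected output above.

```r
# Stand-in for allFiles with a repeated user
allFiles <- data.frame(
  user = c("user35", "user16", "user29", "user29"),
  jobs = c(430, 280, 55551, 1140),
  stringsAsFactors = FALSE
)

# Sum jobs within each user; groups come back sorted by user name
userJobs <- aggregate(jobs ~ user, data = allFiles, FUN = sum)

# Reorder rows from the largest to the smallest job total
userJobsOrdered <- userJobs[order(userJobs$jobs, decreasing = TRUE), ]
userJobsOrdered
```

The row names keep each user's position in the unordered aggregate result, which is why the row labels in the expected output are not in sequence.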
11. Calculate the average job size for each researcher (over all 22 months), ordered from largest to smallest, where the average job size for each researcher is their total Total_Core_Hours divided by their total number of jobs.
You should end up with a data frame jobSizeOrdered that prints like this ...
> jobSizeOrdered
user jobs coreHours size
6 user06 125 45155.3056 361.2424448
35 user35 430 67135.3453 156.1287100
18 user18 148 18300.0717 123.6491331
16 user16 741 37189.4808 50.1882332
27 user27 668 13686.0014 20.4880260
30 user30 719 13784.4089 19.1716396
14 user14 8041 147641.4976 18.3610866
26 user26 76 1318.9308 17.3543526
12 user12 447 6562.1198 14.6803575
20 user20 38 535.2561 14.0856868
11 user11 2947 28317.8951 9.6090584
25 user25 330 3038.1591 9.2065427
23 user23 862 7421.1687 8.6092444
19 user19 19865 145747.1024 7.3368791
32 user32 12 81.2500 6.7708333
34 user34 237 1543.3420 6.5119916
7 user07 610 3840.4983 6.2958989
15 user15 163 685.9997 4.2085871
10 user10 38057 102941.5050 2.7049296
2 user02 7574 17164.3167 2.2662156
33 user33 363760 756315.1053 2.0791596
9 user09 426 856.5858 2.0107648
24 user24 5045 8424.0484 1.6697816
8 user08 3743 5861.6250 1.5660232
4 user04 656 992.8083 1.5134273
21 user21 12 12.0733 1.0061083
5 user05 37 26.7850 0.7239189
29 user29 1155542 774963.5608 0.6706494
31 user31 5 1.9553 0.3910600
22 user22 16926 3684.3647 0.2176749
28 user28 1818 344.2986 0.1893832
13 user13 81854 12630.9523 0.1543108
17 user17 3582 368.8628 0.1029768
1 user01 1 0.0003 0.0003000
3 user03 1 0.0003 0.0003000
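A sketch of this calculation, taking average job size to be total coreHours divided by total jobs (the size column in the expected output is consistent with this, e.g. 67135.3453 / 430 for user35). The allFiles below is a small stand-in.

```r
# Stand-in for allFiles with a repeated user
allFiles <- data.frame(
  user      = c("user35", "user06", "user29", "user29"),
  jobs      = c(430, 14, 55551, 1140),
  coreHours = c(67135.3453, 371.6989, 13532.4547, 248.2494),
  stringsAsFactors = FALSE
)

# Sum both columns per user in a single aggregate() call
jobSize <- aggregate(cbind(jobs, coreHours) ~ user, data = allFiles, FUN = sum)

# Average core hours per job, then sort from largest to smallest
jobSize$size <- jobSize$coreHours / jobSize$jobs
jobSizeOrdered <- jobSize[order(jobSize$size, decreasing = TRUE), ]
jobSizeOrdered
```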
12. Calculate the total number of jobs per month in 2013, ordered by calendar month.
You should end up with a data frame jobsPerMonth that prints like this ...
> jobsPerMonth
month jobs
1 January 67070
2 February 65532
3 March 61299
4 April 13789
5 May 130891
6 June 14490
7 July 474124
8 August 103171
9 September 241781
10 October 55768
11 November 19902
12 December 13729
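A sketch that filters on the year part of the date column, sums jobs per month, and maps the month number to a name with the built-in month.name vector. The stand-in allFiles uses made-up job counts for 2013.

```r
# Stand-in for allFiles; the 2013 job counts are made up
allFiles <- data.frame(
  date = c("2012-10", "2013-02", "2013-01", "2013-01", "2014-07"),
  jobs = c(430, 20, 100, 50, 1140),
  stringsAsFactors = FALSE
)

# Keep only rows whose date starts with 2013
only2013 <- allFiles[substr(allFiles$date, 1, 4) == "2013", ]

# Total jobs per year-month, sorted so months are in calendar order
monthly <- aggregate(jobs ~ date, data = only2013, FUN = sum)
monthly <- monthly[order(monthly$date), ]

# Characters 6-7 of "2013-01" are the month number; look up its name
jobsPerMonth <- data.frame(
  month = month.name[as.integer(substr(monthly$date, 6, 7))],
  jobs  = monthly$jobs,
  stringsAsFactors = FALSE
)
jobsPerMonth
```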
13. Write a function that takes the name of a report file as its argument and returns the percentage value from the bottom table in the report (the table with the heading Percentage of the available department resource:) as its result.
You should end up with a function readFilePercent that works like this ...
> readFilePercent("CERES-report-2012-10.txt")
[1] 230
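A heavily hedged sketch: the layout of the bottom table is not shown in this document, so the stand-in report below is an assumption (a heading line followed by a small bordered table whose only line containing digits holds the value). The function would need adjusting if the real table is laid out differently.

```r
readFilePercent <- function(filename) {
  lines <- readLines(filename)
  # Locate the heading of the bottom table
  heading <- grep("^Percentage of the available department resource", lines)[1]
  # Only look at lines after the heading
  rest <- lines[seq_along(lines) > heading]
  # Assumption: the first following line containing a digit holds the value
  valueLine <- grep("[0-9]", rest, value = TRUE)[1]
  # Strip everything except digits and the decimal point
  as.numeric(gsub("[^0-9.]", "", valueLine))
}

# Hypothetical stand-in report; the layout of the bottom table is assumed
reportFile <- file.path(tempdir(), "CERES-report-2012-10.txt")
writeLines(c(
  "Total for all department users:",
  "| user35 | 430 | 430 | 67135.3453 | 0.82001744 |",
  "",
  "Percentage of the available department resource:",
  "+------------+",
  "| percentage |",
  "+------------+",
  "|        230 |",
  "+------------+"
), reportFile)

readFilePercent(reportFile)
```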
14. Read all of the percentage values from the monthly report files and create a data frame with the year-month for each report in one column and the percentage values in another column.
You should end up with a data frame percentDF that prints like this ...
> percentDF
dates percentages
1 2012-10 230
2 2012-11 278
3 2012-12 436
4 2013-01 863
5 2013-02 96
6 2013-03 151
7 2013-04 397
8 2013-05 110
9 2013-06 53
10 2013-07 391
11 2013-08 807
12 2013-09 294
13 2013-10 112
14 2013-11 225
15 2013-12 152
16 2014-01 36
17 2014-02 7
18 2014-03 8
19 2014-04 96
20 2014-05 60
21 2014-06 15
22 2014-07 8
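A sketch of how the data frame could be assembled. So that it runs without the report files, it uses a hypothetical stand-in, readFilePercentStub(), that simply looks up the first three values from the expected output; in a real solution readFilePercent() would be called on each file instead.

```r
filenames <- c("CERES-report-2012-10.txt",
               "CERES-report-2012-11.txt",
               "CERES-report-2012-12.txt")

# Hypothetical stand-in for readFilePercent(): looks up known values
knownPercent <- c(230, 278, 436)
names(knownPercent) <- filenames
readFilePercentStub <- function(filename) knownPercent[[filename]]

percentDF <- data.frame(
  # Year-month is the part of the name between the prefix and ".txt"
  dates = sub("^CERES-report-(.*)\\.txt$", "\\1", filenames),
  # vapply() calls the reader once per file and checks each result is numeric
  percentages = vapply(filenames, readFilePercentStub, numeric(1),
                       USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
percentDF
```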