ITECH3101 Business Analytics and Decision Support - Federation University
Exercise - SAS programming - Using SAS function and Array
Project 1: Knowing functions other than character functions
The SAS programming language has a rich assortment of functions that can greatly simplify some complex programming problems. We summarize some useful facts about SAS functions:
• All SAS function names are followed by zero or more arguments.
• Arguments to SAS functions can be variables, constants, expressions or even other functions.
• Functions return only a single value and, in most cases, the arguments to SAS functions do not change after the function is executed.
1. MISSING function
The MISSING function returns a value of true (1) if the argument is a missing value and false (0) otherwise. The argument can be a character or numeric value.
Example
In the code above, the MISSING function tests whether age is a missing value. If so, we assign a missing value to the variable age_group.
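The listing for this example is not reproduced in this text. A minimal sketch that matches the description might look like the following; the data set name age_groups, the age cut-offs, and the data values are assumptions made only for illustration.

```sas
/* Hypothetical data: four ages, one of them missing */
data age_groups;
   input age;
   /* MISSING returns 1 when age is missing, 0 otherwise */
   if missing(age) then age_group = .;
   else if age < 20 then age_group = 1;   /* assumed cut-off */
   else if age < 40 then age_group = 2;   /* assumed cut-off */
   else age_group = 3;
datalines;
15
.
35
62
;
run;

proc print data=age_groups;
run;
```

Testing for the missing value first matters here: SAS treats a numeric missing value as smaller than any number, so without the first test the missing age would fall into group 1.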
After clicking the Run tab, we get:
2. INPUT and PUT functions
The INPUT function performs character-to-numeric conversion, and the PUT function performs numeric-to-character conversion.
For the INPUT function, the first argument is a character value, typically a character variable, and the second argument is the informat you want to associate with the first argument.
Example
In the code above, the double trailing @@ symbol "holds the line strongly." In other words, the reading pointer moves to the next record only when there are no more data values to be read on the current line. Use the RETAIN statement when you want SAS to preserve a variable's value from the previous iteration of the DATA step. In this program, we want to read data values that may represent either groups ('A' or 'B') or numeric scores. Because we don't know whether we'll be reading a character or a number, we read every value as a character and test whether it is an 'A' or a 'B'. If not, we assume it is a score and use the INPUT function to convert the character value to numeric.
In the statement score = input(test,5.);, the first argument test holds a character value such as '45', and the second argument, the informat 5., is wider than we need (2. would be enough). However, the INPUT function will not read past the end of a character value, so there is no harm in choosing a large width for the numeric informat.
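The listing itself is missing from this text. A minimal sketch consistent with the description is given below; the data set name freeform, the variable name group, and the data line are assumptions, while test and score come from the statement quoted above.

```sas
data freeform;
   length group $ 1;          /* assumed name for the retained group code */
   retain group;              /* keep the group value across iterations */
   input test $ @@;           /* read every value as character; hold the line */
   if test in ('A', 'B') then do;
      group = test;           /* remember the current group */
      delete;                 /* a group code is not a score: no output row */
   end;
   else score = input(test, 5.);   /* character-to-numeric conversion */
   drop test;
datalines;
A 45 55 B 87 A 44 23 B 88 99
;
run;

proc print data=freeform;
run;
```

Each score is written out together with the most recently read group code, which is why group has to be retained.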
After clicking the Run tab, we get:
The PUT function performs numeric-to-character conversion. Its first argument is a numeric or character value, and its second argument is a format (either a built-in SAS format or one that you wrote). The PUT function takes the first argument, formats it using the second argument, and returns the result as a character value.
Example
In the code above, we use PROC FORMAT to create our own format. The FORMAT procedure creates formats that can later be associated with variables in a FORMAT statement. The procedure starts with the statement PROC FORMAT and continues with one or more VALUE statements:
PROC FORMAT;
   VALUE name range-1 = 'formatted-text-1'
              range-2 = 'formatted-text-2'
              .
              .
              ;
where name must start with a $ if the format is for character data. Each range is the value of a variable that is assigned to the text given in quotation marks on the right side of the equal sign.
In this example, we have data on age and create a format that places the ages into four groups. The variable age4 is a character variable with values of '1', '2', '3', or '4'.
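Since the listing is not included in this text, here is a minimal sketch of what such a program could look like. The format name agegrp, the age boundaries, the data set name convert, and the data values are assumptions; only age and age4 are named in the description.

```sas
proc format;
   /* Assumed boundaries for the four age groups */
   value agegrp low -< 20  = '1'
                20  -< 40  = '2'
                40  -< 60  = '3'
                60  - high = '4';
run;

data convert;
   input age @@;
   /* PUT formats the numeric age and returns a character value */
   age4 = put(age, agegrp.);
datalines;
12 25 47 63 81
;
run;

proc print data=convert;
run;
```

Because PUT always returns a character value, age4 is a character variable even though the format is applied to a numeric one.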
After clicking the Run tab, we get:
3. LAG and DIF functions
The LAG (lagged) function returns the value its argument had the last time the function executed. If we execute the LAG function in every iteration of the DATA step, it returns the value of its argument from the previous observation. In SAS, we may want to compare a data value from the current observation with a value from a previous observation, and we may also want to look back several observations.
Example
In the code above, the variable up_down is the current day's price minus the price from the previous day, because the program executes the LAG function in every iteration of the DATA step.
After clicking the Run tab, we get:
The DIF(X) function is equal to X - LAG(X). In the program above, we can replace the assignment to up_down with up_down = dif(price);
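The relationship between LAG and DIF can be sketched as follows. The data set name stocks, the extra variable change, and the prices are assumptions used only to put the two calculations side by side; price and up_down come from the text.

```sas
data stocks;
   input price @@;
   up_down = price - lag(price);   /* today's price minus yesterday's */
   change  = dif(price);           /* same calculation in one call */
datalines;
34 35 39 30 35
;
run;

proc print data=stocks;
run;
```

In the first observation there is no previous value, so both up_down and change are missing.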
4. Arithmetic and mathematical functions
Some of the more common arithmetic and mathematical functions and their purposes are listed below:
Function name    Action
LOG              Base e (natural) log
LOG10            Base 10 log
SIN              Sine of the argument (in radians)
COS              Cosine (in radians)
TAN              Tangent (in radians)
INT              Drops the fractional part of a number
SQRT             Square root
Example
In the code above, the program creates a new variable called loglos, the natural (base e) log of los (length of stay in hospital), and writes it to the data set func_eg.
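A minimal sketch of such a step is shown below. The names func_eg, los, and loglos come from the description; the length-of-stay values are made up for illustration.

```sas
data func_eg;
   input los @@;
   loglos = log(los);   /* natural (base e) log of length of stay */
datalines;
1 3 7 14 30
;
run;

proc print data=func_eg;
run;
```

LOG is defined only for positive arguments, so a value of 0 for los would produce a missing loglos and a note in the log.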
After clicking the Run tab, we get:
Project 2: Character functions
In this project, we discuss some functions that deal with character values.
1. LENGTHN and LENGTHC functions
The LENGTHN function returns the length of a character value, not counting trailing blanks. If the argument is a missing value, the function returns 0. The LENGTHC function returns the storage length of a character variable.
Example
After clicking the Run tab, we get:
In the code above, the variable string is assigned a length of 5 and the variable miss is assigned a length of 4. The LENGTHN function returns 3, the length of string with the two trailing blanks removed, while LENGTHC shows that string has a storage length of 5. For the variable miss, LENGTHN returns 0 while LENGTHC returns the storage length (4).
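A sketch that reproduces the values described above is given here. The variables string and miss and their lengths come from the text; the data set name and the four result variable names are assumptions.

```sas
data length_eg;
   length string $ 5 miss $ 4;
   string = 'abc';    /* stored as 'abc' plus two trailing blanks */
   miss   = ' ';      /* a missing character value */
   len_string   = lengthn(string);   /* 3: trailing blanks not counted */
   store_string = lengthc(string);   /* 5: storage length */
   len_miss     = lengthn(miss);     /* 0: missing value */
   store_miss   = lengthc(miss);     /* 4: storage length */
run;

proc print data=length_eg;
run;
```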
2. COMPRESS function - Remove characters from a string
The COMPRESS function can remove any number of specified characters from a character value. With the 'k' modifier, we can also use this function to extract characters (for example, all digits) from a string. This is one of the most powerful character functions in SAS. If we provide only one argument (a character value), the function removes blanks from the string. An optional second argument is a string of characters that we want to remove from the first argument.
The optional third argument is a string of modifiers. Some of the more useful modifiers are:
Modifier    Description
'a'         All upper- and lowercase letters
'd'         All digits
'p'         All punctuation (such as periods, commas, etc.)
's'         All whitespace characters (spaces, tabs, linefeeds, carriage returns)
'i'         Ignore case
'k'         Keep the specified characters; remove all others (very useful)
Example
string1='abc def 123'; string2='(908) 782-1234'; string3='120 Lbs.';
(1) compress(string1) = abcdef123
Because there is only one argument, the COMPRESS function removes all blanks.
(2) compress(string1,'0123456789') = abc def
The characters listed in the second argument (the digits) are removed.
(3) compress(string1,,'d') = abc def
Using two commas tells the function that 'd' is the third argument (a modifier) and that you want to remove all digits. Note that with only one comma, 'd' would be the second argument, and the function would remove every letter 'd' from the string.
(4) compress(string2,,'kd')= 9087821234
This keeps digits and throws everything else away.
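The calls above can be tried in a short DATA step that writes each result to the log. This is only a sketch: the result variable names r1 to r5 are illustrative, and r5, which applies the 'kd' modifier to string3, is an extra call that does not appear in the list above.

```sas
data _null_;
   string1 = 'abc def 123';
   string2 = '(908) 782-1234';
   string3 = '120 Lbs.';
   r1 = compress(string1);                 /* blanks removed */
   r2 = compress(string1, '0123456789');   /* listed digits removed */
   r3 = compress(string1, , 'd');          /* 'd' modifier: all digits removed */
   r4 = compress(string2, , 'kd');         /* keep digits only */
   r5 = compress(string3, , 'kd');         /* extra example: keep digits only */
   put r1= / r2= / r3= / r4= / r5=;        /* one result per log line */
run;
```

Using DATA _NULL_ avoids creating a data set when the only goal is to inspect the values in the log.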
3. Character data verification
We sometimes want to be sure that only certain values are present in a character variable. In this case, we can use the VERIFY function to test whether any invalid characters are present.
After clicking the Run tab, we get:
In the code above, only the values 'A', 'B', 'C', 'D', and 'E' are valid data values. To verify the data, the VERIFY function inspects every character in the first argument, and if it finds any character not in the verify string (the second argument), it returns the position of the first offending character. If all the characters of the string are found in the verify string, a value of 0 is returned. In this program, p is 0 for the first observation, 3 for the second, 1 for the third, and 4 for the fourth.
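The original listing and its data are not reproduced in this text. The sketch below uses hypothetical records chosen so that p takes the values 0, 3, 1, and 4 described above; the data set name check and the variable names id and answer are assumptions.

```sas
data check;
   input id $ 1-3 answer $ 5-9;      /* column input keeps embedded blanks */
   p = verify(answer, 'ABCDE');      /* position of first invalid character */
datalines;
001 ACBED
002 ABXDE
003 12CCE
004 ABC E
;
run;

proc print data=check;
run;
```

Note that a blank counts as an invalid character unless it is included in the verify string, which is why the fourth record returns 4.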
Project 3: Using Array
In this project, we work with arrays. SAS arrays can reduce the amount of code in a DATA step and save a great deal of time.
A SAS array is a collection of SAS variables. Using the array name and a subscript, an array element can represent any one of the variables included in the array.
1. Converting all numeric values of 999 to missing values
In some applications, we may have data on a group of subjects, such as age, height, and weight, in which missing values are coded as 999 for each variable. In this case, we need to convert every value of 999 to a SAS missing value because SAS doesn't treat values of 999 as missing.
In the code above, the first iteration of the DO loop executes the statement if miss[1] = 999 then miss[1] = .;, which becomes if age = 999 then age = .;.
Once the DO loop has finished, each of the variables in the array has been processed. Because we don't need the DO loop counter (i) in the output data set, we use a DROP statement to remove it.
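A minimal sketch of the conversion described above follows. The array name miss, the counter i, and the variable age come from the text; the data set name, the variables height and weight as array members, and the data values are assumptions.

```sas
data clean;
   input age height weight;
   array miss[3] age height weight;   /* miss[1] is age, miss[2] is height, ... */
   do i = 1 to 3;
      if miss[i] = 999 then miss[i] = .;   /* recode 999 as missing */
   end;
   drop i;                            /* loop counter is not needed in output */
datalines;
23 68 155
999 72 999
45 999 180
;
run;

proc print data=clean;
run;
```

With more variables, only the ARRAY statement and the upper bound of the loop change; the IF statement stays the same.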
After clicking the Run tab, we get:
2. Using temporary arrays
A temporary array does not actually refer to a list of variables. In other words, no real variables are created when you use a temporary array. You can declare an array to be temporary and use the array elements in their subscripted form in the DATA step.
In the code above, the program uses a temporary array to hold the passing scores on five exams. Students' scores are then read and compared to these passing scores, and the number of failed courses is recorded. We define the temporary array pass by placing the keyword _temporary_ after the brackets. The elements pass[1] through pass[5] hold the initial values, the five passing scores. These are available for comparison with the student grades in every iteration of the DATA step because the values of temporary array elements are automatically retained. The statement array score[5]; is equivalent to array score[5] score1-score5;. Placing an asterisk in the brackets, as in array score[*], saves us from counting the number of numeric variables in the array.
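The listing is missing from this text, so the following is only a sketch of the approach described. The array names pass and score come from the description; the data set name, the passing scores, the variable names id and num_failed, and the student records are assumptions.

```sas
data passing;
   /* Assumed passing scores; held in memory only, no variables created */
   array pass[5] _temporary_ (65 70 60 62 68);
   array score[5];                    /* creates score1-score5 */
   input id $ score1-score5;
   num_failed = 0;
   do i = 1 to 5;
      if score[i] < pass[i] then num_failed = num_failed + 1;
   end;
   drop i;
datalines;
001 90 88 92 95 90
002 64 64 77 72 71
003 68 69 80 75 70
;
run;

proc print data=passing;
run;
```

Because pass is temporary, the output data set contains only id, score1-score5, and num_failed.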
After clicking the Run tab, we get:
Project 4. Answering questions (Please do this at home using your own computer)
1. What is the difference between white hat and black hat SEO (Search Engine Optimization) activities?
2. A data set (MANY) contains the variables X1-X5,Y1-Y5. First, run the following program to create this data set:
DATA MANY;
INPUT X1-X5 Y1-Y5;
DATALINES;
1 2 3 4 5 6 7 8 9 10
3 . 5 . 7 5 . . . 15
9 8 . . . 4 4 4 4 1
;
Write a program to include the following in data set MANY:
(1) The mean (average) of X1-X5 (call it MEAN_X) and the mean of Y1-Y5 (call it MEAN_Y). Hint: use the MEAN function, such as MEAN(OF X1-X5).
(2) The minimum value of X1-X5 (call it MIN_X) and the minimum value of Y1-Y5 (call it MIN_Y). Hint: use the MIN function, such as MIN(OF X1-X5).
3. Describe and explain the structure of a typical internet search engine.
4. What were the challenges, the proposed solution, and the obtained results for IGN company to increase search traffic?
Project 5. Creating a Professional Report
Summarize the above experimental procedures, results, answers to questions, and screenshots (Projects 1, 2, 3, and 4) in one report. Your report is the assignment that must be submitted for evaluation in week 11. Create the report by following the steps below.
You can add a chapter called Chapter 9 in your previous report.
1. Open your last week's report and find the end of last week's report.
2. Copy this week's related experimental results, your findings, and screenshots, and paste them at the end of last week's report.
3. Delete the original Table of Contents you created.
4. Select all content and justify the text (align it to both the left and right margins).
5. Use the shortcut key approach to generate Chapter 9: SAS programming -3 Using SAS function and Array
6. Then use the shortcut key approach to generate proper sub-chapters for this week's lab work.
7. Insert a Table of Contents into your report.
Attachment:- Using SAS function and Array.rar