Reference no: EM133096441
Series Analysis
Objective
The objective of this exercise is to use time series analysis to forecast sales tax collections in the state of California
Activities
• Acquire, import, and prepare data
• Apply data mining algorithms
• Configure forecasting models
• Create data visualizations
• Analyze and interpret output from models
• Publish results
Scenario
In the state of California, the Board of Equalization (BOE) is responsible for collecting sales and use taxes. These taxes are charged to the end consumer of goods and services and remitted to the State by the businesses that collect the taxes. Sales and use taxes are used by the state, counties, and cities of California to fund their general operations.
Forecasting of future tax revenues is an important part of government budgeting and legislation. Too little tax revenue and the State will need to issue bonds to borrow money or reduce spending on important programs and projects. Excess tax revenues will allow the State to pay down debt or increase spending on programs and projects.
You would like to forecast sales tax revenues five years into the future to get an estimate of the revenues during the upcoming years. While not 100% certain, you strongly believe that the current sales and use tax rate will remain the same because voters have voted down tax increases during the past few years and polls indicate that California residents are unhappy with what they feel are high sales taxes in their state.
ACQUIRE DATA FROM THE SOURCE
1. Go to the BOE data portal
a. Agree to the terms of use.
2. Select Sales and Use Tax.
3. Choose the link to "State Sales & Use Tax Collections and Number of Permits".
4. Export the file to Excel as shown in Figure 2.
a. The file name is SUTStateCollNoPermits.xlsx.
b. Check to see that you have 9 columns and 83 rows of data.
Time Series Analysis
Time series analysis is a technique that analysts use to (a) uncover any implicit structure (patterns or trends) in the data and (b) model that structure to make forecasts. The assumption is that the future, at least in the short term, will continue the structure of the past. This technique is useful wherever forecasting values such as sales quantities, airline passenger volume, economic metrics, and traffic volume is needed.
6. Launch SAP Predictive Analytics.
7. Click on Expert Analytics → Expert Analytics.
8. Import the data.
a. Click on File → New.
b. Choose Microsoft Excel → Next.
c. Choose the file you just downloaded from the BOE site, SUTStateCollNoPermits.xlsx.
(Alternatively the file is downloaded as Hands-on_4_Sales_and_Use_Tax_Data.xlsx)
d. Open.
e. See the preview of the data. You will notice that the last two columns of data are incomplete. It appears that the State only began collecting data on number of Use Tax permits separate from the number of Sales and Use Tax permits starting in 2009, most likely due to changes in legislation. If these values were important to our analysis, we would need to decide how to handle the incomplete data, perhaps by eliminating those two columns or by extending the total tax permits from the Number of permits: Sales and Use Tax into the Number of permits: Total column. Since we are only interested in tax revenues (not the number of permits), we will ignore the missing data.
f. Click Create.
g. After data are acquired, Click on Prepare tab if you are not already there. Notice that Expert Analytics has chosen 7 measures and all columns are formatted as numbers.
9. Prepare the data.
a. To make our analysis more readable (and more professional looking), change the Display Formatting on General Fund Tax Rate to a percentage.
b. Use Display Formatting to remove the decimal places for Collections Taxes and Collections Total.
c. Optional: Use Display Formatting to add comma separators to the Collections: Fee and all Number of Permits columns.
d. Notice that Fiscal Year From and Fiscal Year To both imported as numbers versus dates because the values are numeric in the Excel spreadsheet. Expert Analytics does recognize that these values are related to time because of the column names. You can see the clock icons next to the two Year column names. In order to do a time series analysis, at least one column needs to be a date. One way we can do this is to create a time hierarchy of one time period; in this case, a year.
i. Choose the cog next to Fiscal Year To and select Create a Time Hierarchy.
ii. Keep the default values as shown below.
iii. Confirm.
iv. You will now see a new column called CalculatedTimeStamp_id_12. This column will be the date column you will use for your time series forecast.
10. Save the PA file now and as you work through the exercise.
11. Visualize the data.
a. Click on the Visualize tab.
b. On a line chart, plot Collections: Total by Year.
Question 1: Notice that tax collections increase consistently year over year until 2012. Why does a consistent increase in revenues make sense? What happened in 2012 to cause the dip in collections?
12. You will now use exponential smoothing to forecast the tax collections for the next five years 2012 based on collections data from year ending 1934 to year ending 2015.
Question 2: Based on what you know about Exponential Smoothing, which algorithm would be appropriate for the data you plotted in step 11 and why?
a. Click on Predict tab.
b. Expert Analytics has some preset algorithms that were developed using R code. Rather than create our own R code, we will use one of the available algorithms. Under
Algorithms, double click on Triple Exponential Smoothing. This will connect the Triple Exponential Smoothing algorithm to the filtered data source.
c. Now Configure Settings for the Triple Exponential Smoothing icon by clicking on the cog next to it.
d. Configure the settings as shown in Figure 6 and Figure 7.
e. Click Done
13. Run the algorithm.
f. Click Run.
g. After the model runs, Click OK.
h. You will see the data grid view of the forecasted values of tax collections.
i. Click on Trend Chart.
j. You see a combined column and line chart showing the historical tax collections, fitted curve of collections, and forecast for collection for 5 years into the future (Figure 8).
k. The forecast depends on the choice of alpha, gamma, beta parameters for triple exponential smoothing. You can Click on the Designer tab and then reconfigure the setting for the Triple Exponential Smoothing algorithm. Choose different values for alpha, gamma, beta in the Advanced tab. Then rerun the model.
l. Save your PA file.
Question 3: Why did we originally choose a gamma value of zero to run the algorithm? Explain how alpha and beta settings affect the results of your forecast.
Question 4: Explore the results tab for your original alpha, beta and gamma settings to find the R-square factor. What is the R-square factor telling you about the model? Also look for Goodness of Fit. Comment on it as well.
Attachment:- Series Analysis.rar