Reference no: EM132539358
Assignment: Competencies
• Set up an analytical program.
• Apply data structures, objects, and classes by utilizing the built-in learning resources.
• Utilize scripting and programming languages to read and write data files used in Data Science.
• Utilize core programming fundamentals to achieve desired analytical outcomes.
• Employ advanced data structures.
• Utilize advanced programming fundamentals.
Scenario: After doing some marketing, your organization was able to procure a potential customer who is interested in examining different sales scenarios to find the business model that is most profitable.
You have been provided with a set of sales data that characterizes clothing stores by a number of attributes along with their associated sales. You are asked to build a model that reads in the data and loads it into a suitable data structure (a data frame or an associative array). After splitting the data into two sets (over and under the median sales), you will apply a decision-tree analysis to find out which attributes contribute to high or low sales.
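The median split described in the scenario can be sketched with the Python standard library alone. This is only an illustration: the rows are made-up placeholders, the function name split_by_median is hypothetical, and placing rows that sit exactly on the median in the low group is an assumption, since the brief does not say how ties are handled.

```python
from statistics import median

# Made-up placeholder rows; the real values would come from the sales data.
rows = [
    {"Store": 1, "Promo": 1, "Sales": 5200},
    {"Store": 2, "Promo": 0, "Sales": 3100},
    {"Store": 3, "Promo": 1, "Sales": 7400},
    {"Store": 4, "Promo": 0, "Sales": 2800},
    {"Store": 5, "Promo": 1, "Sales": 6100},
]

def split_by_median(rows, key="Sales"):
    """Return (cutoff, high, low): rows strictly above the median vs. the rest."""
    cutoff = median(r[key] for r in rows)
    high = [r for r in rows if r[key] > cutoff]
    # Assumption: rows exactly at the median count as "low".
    low = [r for r in rows if r[key] <= cutoff]
    return cutoff, high, low

cutoff, high, low = split_by_median(rows)
```

The two groups can then be turned into a 0/1 label (low/high) that serves as the target for the decision tree.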
Instructions: Using the Rossmann data set from Kaggle, read in the associated data set with either the Python or R programming language. Next, load the data into either an associative array or a frame-based representation to make it suitable for analysis.
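One possible way to read the data into an associative array is shown below, using Python's csv module and a dict keyed by store. The in-memory sample, its values, and the helper names load_rows and group_by_store are made up for illustration; the column names are assumed to match the downloaded file and should be checked against it.

```python
import csv
import io

# Stand-in for the downloaded file; these rows are made up for illustration.
SAMPLE_CSV = """Store,DayOfWeek,Sales,Promo
1,5,5000,1
2,5,6000,1
1,4,4500,0
"""

def load_rows(handle):
    """Read CSV text into a list of dicts, converting every field to int."""
    return [{name: int(value) for name, value in record.items()}
            for record in csv.DictReader(handle)]

def group_by_store(rows):
    """Build an associative array mapping store id -> list of that store's rows."""
    by_store = {}
    for row in rows:
        by_store.setdefault(row["Store"], []).append(row)
    return by_store

rows = load_rows(io.StringIO(SAMPLE_CSV))
by_store = group_by_store(rows)
```

With a real file, the StringIO object would be replaced by an open file handle, and any column that is not an integer (a date, for instance) would need its own conversion instead of int. A data frame library such as pandas is the frame-based alternative mentioned in the instructions.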
Next, apply the appropriate Python or R libraries, which may include, but are not limited to, the R CART module or the corresponding Python library (scikit-learn).
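As a minimal sketch of the scikit-learn route, the example below fits a DecisionTreeClassifier on a high/low sales label and reads the feature_importances_ attribute to see which attributes the tree relied on. The toy rows, the three attribute names, and the rule of keeping every attribute with non-zero importance are assumptions made for illustration, not requirements of the assignment.

```python
from sklearn.tree import DecisionTreeClassifier

# Assumed attribute names, in the column order used for X below.
feature_names = ["Promo", "DayOfWeek", "SchoolHoliday"]

# Made-up toy rows; real rows would come from the loaded data set.
X = [
    [1, 1, 0], [1, 3, 1], [1, 5, 0], [1, 2, 1],
    [0, 1, 0], [0, 3, 1], [0, 5, 0], [0, 2, 1],
]
# 1 = sales above the median, 0 = sales at or below it.
y = [1, 1, 1, 1, 0, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Each importance is that attribute's share of the total impurity reduction.
importances = dict(zip(feature_names, tree.feature_importances_))

# Assumption: keep only the attributes the tree actually split on.
selected = [name for name, score in importances.items() if score > 0]
```

In this toy data only Promo separates the two classes, so it is the only attribute retained. On real data a threshold on the importance score, or a limit on tree depth, may be a more useful way to arrive at a limited feature set.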
Perform the analysis and output a file containing only the limited feature set.
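Writing out only the limited feature set could look like the sketch below, which uses csv.DictWriter and drops every column that was not selected. The helper name write_limited, the HighSales label column, and the sample rows are hypothetical; the selected list stands for whatever attributes the analysis retained.

```python
import csv
import io

def write_limited(rows, selected, target, handle):
    """Write only the selected attributes plus the target column as CSV."""
    columns = list(selected) + [target]
    # extrasaction="ignore" silently drops keys that are not in fieldnames.
    writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Made-up rows; "selected" stands for the attributes kept by the analysis.
rows = [
    {"Store": 1, "DayOfWeek": 5, "Promo": 1, "HighSales": 1},
    {"Store": 2, "DayOfWeek": 5, "Promo": 0, "HighSales": 0},
]
selected = ["Promo"]

buffer = io.StringIO()
write_limited(rows, selected, "HighSales", buffer)
output_text = buffer.getvalue()
```

To produce an actual file, the buffer would be replaced by a handle opened with open(path, "w", newline=""), where path is whatever output file name the submission uses.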
Note: You will have only a single submission, consisting of your source code in a plain text file and the output it generates, implemented in your choice of either the Python or R programming language.