Develop and implement solutions for processing datasets

Reference no: EM133005762

CIS7031 Programming for Data Analysis - Cardiff University

The aim of this module is to help students acquire the skills needed for the roles of Data Scientist, Data Modeller and Data Analyst. Students taking this module will have the opportunity to understand and implement a range of statistical and computational techniques for analysing datasets, using industry-standard software and programming languages.

Learning Outcome 1: Critically analyse and evaluate various statistical and computational techniques for analysing datasets and determine the most appropriate technique for a business problem;

Learning Outcome 2: Critically evaluate, develop and implement solutions for processing datasets and solving complex problems in various environments using relevant programming paradigms;

Learning Outcome 3: Evaluate and apply the key steps and issues involved in data preparation, cleaning and exploration, and in creating, optimising and evaluating models;

Learning Outcome 4: Evaluate and apply aspects of data science applications and their use.

Production Planning Analytics Data Challenge

Overview

Production planning is one of the major activities carried out by the planning department of every garment factory. This dataset contains six months of production planning data together with the actual production data for the same period. The planning data is in the "Plan" folder, and the actual production data is in the "Production Quantities" folder.
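The brief asks for the data to be loaded into R. Purely as an illustration of the loading step, here is a minimal Python sketch (standard library only) that reads every CSV file in one folder into memory. It assumes the files have been exported as UTF-8 CSV with a single header row. The helper name `load_folder` is hypothetical.

```python
import csv
from pathlib import Path


def load_folder(folder):
    """Return {file name without extension: list of row dicts} for each .csv in folder.

    Assumes every file is UTF-8 CSV with exactly one header row.
    """
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # "utf-8-sig" also strips the byte-order mark that spreadsheet exports often add
        with path.open(newline="", encoding="utf-8-sig") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables
```

For example, `load_folder("Plan")` and `load_folder("Production Quantities")` would return one table per file. Per the brief, that is 49 plan files and 118 daily production files. All values come back as strings, so numeric columns still need explicit conversion during cleaning.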

"Plan" Folder

There are 49 files giving production planning data for four line sections (LC Sec 1, LC Sec 2, LC Sec 3, LC Sec 4) over different periods. For example, the file named LC Sec 1- 01.02-01.12 contains the planning data for line section 1 for the period 01.02 to 01.12.
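Because the line section and planning period are encoded only in the file name, they have to be extracted before the plan files can be combined. The following Python sketch does this with a regular expression. The pattern is inferred from the single example in the brief, so real file names may need a looser rule. `PlanFile` and `parse_plan_name` are hypothetical names. The dates are kept as text because the brief does not say whether 01.02 means day.month or month.day.

```python
import re
from typing import NamedTuple

# Pattern assumed from the one example given, "LC Sec 1- 01.02-01.12";
# spacing around the hyphens is allowed to vary, and a trailing ".csv" is ignored.
PLAN_NAME = re.compile(
    r"^LC Sec\s*(?P<section>\d+)\s*-\s*(?P<start>\d{2}\.\d{2})\s*-\s*(?P<end>\d{2}\.\d{2})"
)


class PlanFile(NamedTuple):
    section: int
    start: str  # period start, left as text: day-first vs month-first is not stated
    end: str    # period end, same caveat


def parse_plan_name(name):
    """Extract the line section and planning period from a plan file name."""
    match = PLAN_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected plan file name: {name!r}")
    return PlanFile(int(match["section"]), match["start"], match["end"])
```

Raising an error on names that do not match makes misnamed or stray files visible early, rather than letting them slip silently into the merged data.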
"Production Quantities" Folder

There are 118 files giving the actual production data for the same six-month period as the planning data. Each file holds the actual production data for a single day; for example, PR 01.02.2018 - D051 gives the actual production data for 01.02.2018.
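Each daily file's date must likewise be recovered from its name. The sketch below parses names shaped like the example "PR 01.02.2018 - D051". The date format is an explicit, labelled assumption, because the brief does not say whether the date is day-first or month-first. The meaning of the trailing code (D051) is not stated either, so it is simply returned as text. `parse_production_name` is a hypothetical helper.

```python
import re
from datetime import date, datetime

# Assumed shape, from the example "PR 01.02.2018 - D051": a date, then a code.
PRODUCTION_NAME = re.compile(r"^PR\s+(?P<date>\d{2}\.\d{2}\.\d{4})\s*-\s*(?P<code>\w+)")


def parse_production_name(name: str, date_format: str = "%d.%m.%Y") -> tuple[date, str]:
    """Return (production date, trailing code) for a daily production file name.

    date_format is an assumption: "%d.%m.%Y" reads 01.02.2018 as 1 February 2018;
    pass "%m.%d.%Y" if the files turn out to use month-first dates.
    """
    match = PRODUCTION_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected production file name: {name!r}")
    day = datetime.strptime(match["date"], date_format).date()
    return day, match["code"]
```

One way to settle the format question is to check which reading keeps all 118 dates inside the periods covered by the plan files.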

The following information is also provided to help you explore the datasets:
• S/O - Sales Order
• LI - Line Item: an item appearing on a single line of a sales order, with a unique colour, size, etc. Line items differ from one another in colour, size and other features.
• S/O and LI combined form a unique key
• SMV - Standard Minute Value (the standard time, in minutes, taken to finish one unit of a particular product)
• Style - Product
• Efficiency = Standard Hours / Work Hours
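The definitions above translate directly into code: a composite (S/O, LI) key, a uniqueness check on that key, the efficiency ratio, and a join of plan and actual rows on the key (task e). In this Python sketch the column names "S/O" and "LI" are hypothetical stand-ins for whatever headers the real files use. Standard Hours is assumed to be output quantity × SMV / 60, the usual garment-industry convention, since the brief does not define it.

```python
from collections import Counter


def key(row):
    """Composite key: S/O and LI together identify one line item (column names assumed)."""
    return (row["S/O"], row["LI"])


def duplicate_keys(rows):
    """Return keys that occur more than once; the brief says S/O + LI should be unique."""
    counts = Counter(key(row) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def efficiency(quantity, smv, work_hours):
    """Efficiency = Standard Hours / Work Hours.

    Assumes Standard Hours = quantity * SMV / 60, with SMV in minutes per unit.
    """
    if work_hours <= 0:
        raise ValueError("work_hours must be positive")
    standard_hours = quantity * smv / 60
    return standard_hours / work_hours


def merge_plan_actual(plan_rows, actual_rows):
    """Inner-join plan and actual rows on the (S/O, LI) key.

    Assumes each key appears at most once in plan_rows (otherwise the last one wins).
    Actual rows with no matching plan row are dropped; actual columns get an "actual_" prefix.
    """
    plan_by_key = {key(row): row for row in plan_rows}
    merged = []
    for row in actual_rows:
        planned = plan_by_key.get(key(row))
        if planned is not None:
            merged.append({**planned, **{f"actual_{k}": v for k, v in row.items()}})
    return merged
```

Running `duplicate_keys` on each table before merging is a cheap way to test the stated uniqueness assumption. Because the actual data is daily, the same line item can appear on many days, so the key may be unique only within the plan files.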

As a data scientist, your task is to clean, normalise and transform these data into R-compatible formats and to undertake extensive data mining using Machine Learning. The main objective of this data challenge is to develop a Machine Learning model that identifies patterns in the data and forecasts actual production from the plan. Report any interesting patterns (for example, order patterns) revealed by your analysis, with visualisations where possible.
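The brief does not prescribe an outlier rule or a scaling method. This Python sketch uses Tukey's interquartile-range fences as one common outlier rule, chosen here for illustration, and pairs them with the two most widely used scalings: min-max and z-score. These are the kind of alternatives that a comparison of normalisation approaches would weigh against each other.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values inside the Tukey fences, preserving their order."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def min_max_scale(values):
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def z_score(values):
    """Centre on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant column: nothing to scale
    return [(v - mean) / sd for v in values]
```

Min-max scaling is sensitive to extreme values, because a single outlier stretches the range. That is one reason to apply an outlier rule before scaling, or to prefer z-scores when outliers are kept deliberately.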

In your discussion, provide a critical synopsis of the data analysis, integration and visualisation challenges you faced during this exercise, and state the assumptions you made, with valid justifications.

Assignment Tasks

a. Provide a detailed description of each dataset, its properties and the relationships between the datasets

b. Read the data from the CSV files into the R environment for processing

c. Clean outliers and exceptional values from the datasets

d. Apply normalisation and scaling

e. Merge the datasets

f. Create training and test datasets, if required

g. Train a model on the data

h. Apply different Machine Learning approaches and discuss them

i. Evaluate the accuracy of each model

j. Explore alternative approaches to normalisation and model building, and compare their performance

k. Report the patterns identified, with visualisations

l. Provide a detailed comparative analysis of the scaling and Machine Learning approaches, covering their strengths, limitations and uniqueness

m. Relate the comparative analysis to integration, transformation, visualisation and data mining

n. Provide a brief discussion about the knowledge gained
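Tasks f, g and i (split, train, measure accuracy) can be illustrated with a deliberately simple baseline. This is an assumption, not a method the brief prescribes: regress actual quantity on planned quantity with ordinary least squares, then score the fit with MAE and RMSE. Because the goal is forecasting daily production, the split below is chronological (train on earlier days, test on later ones) rather than random, which avoids training on the future. Any richer Machine Learning model can then be compared against this baseline.

```python
import math


def chronological_split(rows, date_of, test_fraction=0.2):
    """Train on the earliest rows and test on the most recent ones (no look-ahead)."""
    ordered = sorted(rows, key=date_of)
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mae(actual, predicted):
    """Mean absolute error, in the same units as the quantities."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

In use, the model is fitted only on the training rows, for example `intercept, slope = fit_line(planned_train, actual_train)`. It is then scored on the held-out days by predicting `intercept + slope * x` for each planned quantity. `planned_train` and `actual_train` are hypothetical column extracts from the merged data.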
