Implementation of a time-series prediction method

Reference no: EM131031562

Detailed Question: Use random forest prediction, implemented in the R or Python programming language.

In this project, you are asked to study time-series data mining in general and time-series trend prediction in particular. This is not a new topic: studies existed well before data mining emerged as a research field (e.g., in the control theory and pattern recognition literature). Within data mining, however, time-series data mining is considered an advanced topic with many important real-world applications, such as e-commerce, stock analysis, and weather forecasting.

The specific problem in this project is time-series trend prediction, and the application scenario is e-commerce. You are given a real dataset from a real-world e-commerce application with 1000 products and 31490 customers (i.e., buyers) who bought these products. Of the 1000 products, 100 are key (popular) products, and the 1000 products fall into 15 categories. The data are given in seven tables, described below. The dataset covers a time window of 118 days, with data recorded for each day. The time unit is therefore one day, and the timeline runs from the 0-th day to the 117-th day (17 weeks less one day in total). You are asked to predict the sale quantity of the 100 key products for each day from the 118-th day to the 146-th day (29 days).

-buyer_basic_info.txt: the basic attribute information of the buyers; in particular, the column names of this table are "buyer_id", "registration_time", "seller_level", "buyer_level", "age", and "gender". If we do not know the gender of a buyer, we set this buyer's gender attribute as -1.

-buyer_historical_category15_quantity.txt: the consumption quantities in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption quantity in the 1st category", ..., and "consumption quantity in the 15th category". The 15 categories are the ones of the products the customers bought in this dataset.

-buyer_historical_category15_money.txt: the consumption amounts in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption amount in the 1st category", ..., and "consumption amount in the 15th category".

-product_features.txt: the basic attribute information of the products; in particular, the column names of this table are "product_id", "attribute_1", "attribute_2", and "original price".

-Key_product_IDs.txt: the key product IDs

-trade_info_training.txt: the trade information between the key products and the buyers from the 0-th day to the 117-th day; in particular, the column names of this table are "product_id", "buyer_id", "trade_time", "trade_quantity", and "trade_price".
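The trade table relates to the per-day quantities roughly as follows. This is a minimal sketch, assuming that "trade_time" has already been converted to an integer day index (the source does not state its raw format); the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def daily_quantities(trades, n_days):
    """Sum trade_quantity per product and per day.

    Each trade is (product_id, buyer_id, day, trade_quantity, trade_price),
    where `day` is assumed to be an integer in [0, n_days).
    """
    totals = defaultdict(lambda: [0] * n_days)
    for product_id, _buyer_id, day, quantity, _price in trades:
        totals[product_id][day] += quantity
    return dict(totals)


# Hypothetical sample rows, not taken from the real dataset.
sample_trades = [
    (1, 501, 0, 2, 9.5),
    (1, 502, 0, 1, 9.5),
    (1, 501, 2, 4, 9.0),
    (7, 777, 1, 3, 20.0),
]
daily = daily_quantities(sample_trades, n_days=3)
```

For the sample rows, product 1 ends up with three units on day 0 and four on day 2, and product 7 with three units on day 1.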

-product_distribution_training_set.txt: there are 119 columns, where the 1-st column shows the "product_id" and the 2-nd to the 119-th columns show the "quantities" of the key products from the 0-th day to the 117-th day; for example, the element at the 5-th row and the 10-th column in this table shows the quantity of the 5-th product at the 8-th day.
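The column-to-day mapping described above can be sketched as follows. The sketch assumes the file is whitespace-delimited with no header row, which the source does not confirm; the helper names and the sample text are hypothetical.

```python
import io


def column_to_day(column):
    """Map a 1-based column number (2..119) to its day index (0..117)."""
    return column - 2


def read_distribution(handle):
    """Return {product_id: [quantity on day 0, quantity on day 1, ...]}."""
    table = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        table[int(fields[0])] = [int(value) for value in fields[1:]]
    return table


# Hypothetical two-product sample with only three days of data.
sample_text = "1\t3\t0\t5\n7\t2\t2\t1\n"
distribution = read_distribution(io.StringIO(sample_text))
```

With this mapping, the 10-th column corresponds to the 8-th day, which matches the example in the table description.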

Students are asked to predict the overall sale quantity of the 100 key products for each day of the time window from the 118-th day to the 146-th day, and also the sale quantity of each key product for each day of that window.
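One possible way to apply a random forest to this task is to train on lagged values of a product's daily series and forecast recursively. The sketch below uses scikit-learn's RandomForestRegressor. The 7-day lag window, the recursive strategy, the clipping at zero, and the synthetic series are all assumptions chosen for illustration, not part of the assignment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

N_LAGS = 7    # assumption: one week of history per training row
HORIZON = 29  # days 118..146 inclusive


def make_lag_matrix(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values preceding y."""
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    y = np.array(series[n_lags:])
    return X, y


def forecast_recursive(series, horizon=HORIZON, n_lags=N_LAGS, seed=0):
    """Fit a random forest on lag features, then predict one day at a
    time, feeding each prediction back in as input for the next day."""
    history = [float(value) for value in series]
    X, y = make_lag_matrix(history, n_lags)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)

    predictions = []
    for _ in range(horizon):
        x_next = np.array(history[-n_lags:]).reshape(1, -1)
        y_hat = max(0.0, float(model.predict(x_next)[0]))  # quantities are non-negative
        predictions.append(y_hat)
        history.append(y_hat)
    return predictions


# Synthetic 118-day series with a weekly spike (not real data).
demo_series = [10 + 5 * (day % 7 == 0) for day in range(118)]
demo_forecast = forecast_recursive(demo_series)
```

The same function could be run once per key product, and the overall forecast obtained either by summing the per-product forecasts or by fitting a separate model on the summed series. Which works better would need to be checked on held-out days.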

You are given 10 minutes for the presentation. In the presentation, you must give the following information:

-Explain conceptually what time-series data mining is about
-Showcase the specific problem and the specific method you have implemented or developed as a solution to the problem you are given
-Demonstrate your implementation results in the prediction

The second phase is the coding part of the project and concerns the implementation of a time-series prediction method that you either take from the literature or have developed yourself as the result of your research in the first phase. You may use any programming language to implement the method, and you may also use any existing libraries.

The first two phases begin at the beginning of the semester, and the coding results are due on 24 April. Please follow the format requirement for the text output file specified here. The file has one prediction per line: the first line is the overall prediction, and each subsequent line is the prediction for one key product. Each line begins with the key product id, where the id of the overall prediction is 0. A space separates the key product id from the first prediction, and a space separates the predictions of neighboring days. After the overall line (product id 0), the lines run from the key product with the smallest product id (i.e., 1) to the last key product (i.e., id = 964). Note that for undergrad students the output file has only one line, the overall prediction beginning with product id = 0.
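The output format above can be sketched as a small formatter. Two things here are assumptions, since the source does not specify them: the overall prediction is taken as the per-day sum of the key-product predictions, and values are rounded to whole numbers. The function name and the demo values are hypothetical.

```python
def build_output_lines(per_product):
    """Format predictions as lines of 'id p_day1 p_day2 ...'.

    per_product maps product_id -> list of daily predictions, all of the
    same length. Line 0 is the overall prediction (id 0), followed by
    the key products in ascending id order.
    """
    horizon = len(next(iter(per_product.values())))
    # Assumption: overall = sum over key products for each day.
    overall = [sum(preds[day] for preds in per_product.values())
               for day in range(horizon)]
    rows = [(0, overall)] + sorted(per_product.items())
    return [
        " ".join([str(product_id)] + [str(int(round(value))) for value in preds])
        for product_id, preds in rows
    ]


# Hypothetical three-day predictions for two key products.
demo_predictions = {964: [1.2, 0.0, 3.6], 1: [2.0, 2.4, 0.4]}
output_lines = build_output_lines(demo_predictions)
```

Writing `"\n".join(output_lines)` to a text file would then give one prediction line per product. An undergrad submission would keep only the first line.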

What you need to turn in: a zipped package containing the source code of your implementation of the prediction method, with appropriate comments and documentation in the code; a README file explaining how to compile and run your code and under what specific environment; and a text file containing the output matrix in exactly the format stated above.

Verified Expert

Provides clear workings on exponential time-series forecasting and future predictions. Various forecasting techniques are compared, and the appropriate technique for model building is selected using R.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd