Project on time-series data trend prediction

Reference no: EM13968814

This project addresses trend prediction for time-series data in an e-commerce setting. You are given a real dataset from a real-world e-commerce application with 1000 products and 31490 customers (i.e., buyers) who bought these products. Of the 1000 products, 100 are key (popular) products, and the 1000 products fall into 15 categories. The data are provided in seven tables, described below. The dataset covers 119 days, with data recorded for each day. The time unit is therefore one day, and the timeline runs from the 0th day to the 118th day (17 weeks in total). Your task is to predict the sale quantity of the 100 key products for each day from the 119th day to the 146th day (four weeks).

• buyer_basic_info.txt: the basic attribute information of the buyers; in particular, the column names of this table are "buyer_id", "registration_time", "seller_level", "buyer_level", "age", and "gender". If the gender of a buyer is unknown, the buyer's gender attribute is set to -1.

• buyer_historical_category15_quantity.txt: the consumption quantities in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption quantity in the 1st category", ..., and "consumption quantity in the 15th category". The 15 categories are those of the products that the customers bought in this dataset.

• buyer_historical_category15_money.txt: the consumption amounts in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption amount in the 1st category", ..., and "consumption amount in the 15th category".

• product_features.txt: the basic attribute information of the products; in particular, the column names of this table are "product_id", "attribute_1", "attribute_2", and "original price".

• Key_product_IDs.txt: the IDs of the key products.

• trade_info_training.txt: the trade information between the key products and the buyers from the 0-th day to the 118-th day (17 weeks); in particular, the column names of this table are "product_id", "buyer_id", "trade_time", "trade_quantity", and "trade_price".

• product_distribution_training_set.txt: this table has 120 columns, where the 1st column holds the "product_id" and the 2nd to the 120th columns hold the "quantities" of the key products from the 0th day to the 118th day; for example, the element in the 5th row and the 10th column of this table is the quantity of the 5th product on the 8th day.
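The column layout of product_distribution_training_set.txt can be illustrated with a minimal Java sketch that parses one row. The class name DistributionRow is hypothetical, and the sketch assumes that fields are separated by whitespace and that quantities are integers; the assignment text states neither, so check the actual file before relying on it.

```java
// Hypothetical helper for one row of product_distribution_training_set.txt.
class DistributionRow {
    static final int COLUMNS = 120; // product_id + 119 daily quantities
    static final int DAYS = 119;    // day 0 .. day 118

    final int productId;
    final int[] dailyQuantities;    // dailyQuantities[d] = quantity on day d

    DistributionRow(int productId, int[] dailyQuantities) {
        this.productId = productId;
        this.dailyQuantities = dailyQuantities;
    }

    // ASSUMPTION: whitespace-delimited integer fields (not stated in the assignment).
    static DistributionRow parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != COLUMNS) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS + " columns, got " + fields.length);
        }
        int id = Integer.parseInt(fields[0]);
        int[] quantities = new int[DAYS];
        for (int day = 0; day < DAYS; day++) {
            // The 1st column is the id, so day d sits in the (d + 2)-th column,
            // i.e. at 0-based index d + 1.
            quantities[day] = Integer.parseInt(fields[day + 1]);
        }
        return new DistributionRow(id, quantities);
    }
}
```

With this indexing, the 10th column of a row ends up in dailyQuantities[8], which matches the example in the table description.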

Graduate students are asked to predict the overall sale quantity of the 100 key products for each day of the four weeks (i.e., for each day from the 119th day to the 146th day), and also the sale quantity of each individual key product for each day of the four weeks.
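One plausible reading of "overall sale quantity" is the per-day sum over all key products. The Java sketch below builds that aggregate series from the per-product history; the class and method names are hypothetical, and the summation itself is an interpretation rather than something the assignment states explicitly.

```java
// Hypothetical helper: aggregates per-product daily quantities into one series.
class OverallSeries {
    // rows[p][d] = quantity of key product p on day d.
    // ASSUMPTION: "overall" means the sum over all key products for each day.
    static long[] sumByDay(int[][] rows) {
        if (rows.length == 0) {
            return new long[0];
        }
        long[] total = new long[rows[0].length];
        for (int[] row : rows) {
            if (row.length != total.length) {
                throw new IllegalArgumentException("rows must have equal length");
            }
            for (int day = 0; day < total.length; day++) {
                total[day] += row[day];
            }
        }
        return total;
    }
}
```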

This phase is the coding part of the project and concerns the implementation of a time-series prediction method that you have either taken from the literature or developed yourself as a result of your research in the first phase.
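As a point of comparison for whatever method you implement, a simple baseline can be sketched in Java. The sketch below is not a method prescribed by the assignment: it predicts each future day as the average of the same weekday position over the last few observed weeks. The class name WeeklyAverageForecast and the parameter weeksBack are hypothetical.

```java
// Hypothetical baseline: average of the same weekday position over recent weeks.
class WeeklyAverageForecast {
    // history[d] = quantity on day d; returns forecasts for the next `horizon` days.
    static double[] forecast(int[] history, int horizon, int weeksBack) {
        int n = history.length;
        if (weeksBack < 1 || n < 7 * weeksBack) {
            throw new IllegalArgumentException("need at least weeksBack full weeks");
        }
        double[] out = new double[horizon];
        for (int h = 0; h < horizon; h++) {
            // Most recent observed day with the same position in the 7-day cycle.
            int last = n - 7 + (h % 7);
            double sum = 0.0;
            for (int w = 0; w < weeksBack; w++) {
                sum += history[last - 7 * w];
            }
            out[h] = sum / weeksBack;
        }
        return out;
    }
}
```

For this dataset, a call such as forecast(history, 28, 4) would take the 119 observed days and return 28 values for the 119th to the 146th day. A baseline like this ignores trend, so it mainly serves as a sanity check for a more capable model.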

Please make sure to follow exactly the format requirement for the text output file specified here. The file contains one prediction per line: the first line is the overall prediction, and each subsequent line is the prediction for one key product. Each line begins with the key product id, where the id of the overall prediction is 0. A space separates the key product id from the first prediction, and a space separates the predictions of each pair of neighboring days. After the overall prediction (product id 0), the lines run from the prediction for the key product with the smallest product id (i.e., 1) to the prediction for the last key product (i.e., id = 964). Note that for undergraduate students the output file has only one line, the overall prediction beginning with product id = 0.
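The output format can be sketched in Java as follows. The class and method names are hypothetical. The sketch rounds each prediction to the nearest integer, which is an assumption, since the assignment does not say whether fractional predictions are allowed. It also assumes that key product lines are written in ascending order of product id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical formatter for the prediction output file.
class PredictionFormatter {
    // One line: product id, then one space before each daily prediction.
    // ASSUMPTION: predictions are rounded to whole quantities.
    static String formatLine(int productId, double[] predictions) {
        StringBuilder sb = new StringBuilder();
        sb.append(productId);
        for (double p : predictions) {
            sb.append(' ').append(Math.round(p));
        }
        return sb.toString();
    }

    // Overall prediction (id 0) first, then key products by ascending id.
    static List<String> formatAll(double[] overall, Map<Integer, double[]> perProduct) {
        List<String> lines = new ArrayList<>();
        lines.add(formatLine(0, overall));
        for (Map.Entry<Integer, double[]> e : new TreeMap<>(perProduct).entrySet()) {
            lines.add(formatLine(e.getKey(), e.getValue()));
        }
        return lines;
    }
}
```

For the single-line undergraduate output, formatLine(0, overall) alone is enough. The resulting lines can be written to disk with java.nio.file.Files.write.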

What you need to turn in: a zipped package containing the source code of your implementation of the prediction method, with appropriate comments and documentation in the code; a README file explaining how to compile and run your code and in which specific environment; and a text file containing the output matrix in exactly the format stated above.

Attachment:- Data.rar
