Explanation of your algorithm and pseudocode


Reference no: EM131142323

I need help with my big data project.

We are free to define the specific design and to limit or expand the scope, but the project should involve substantial design, analysis, programming, and validation. I have listed the topic and technology that should be used for the project. You can make changes and consider other attributes for the system. Let me know the price and the process for moving forward.

Topic: Recommendation or clustering system for IMDB movie dataset.

Technology: Spark will be used for solving the problem.

Data: https://www.imdb.com/interfaces

The movie recommendation system will consider the following attributes of each movie.

1) Title

2) Genre

3) Actors

4) Actresses

5) Directors

6) Rating

Hypothesis: We wish to build a movie recommendation system that suggests movies a user might be interested in, based on the user's tastes, interests, and people connections. I also need a report.
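As a rough illustration of the kind of algorithm the hypothesis points toward, here is a minimal content-based sketch in plain Python (not Spark). It scores movies by the overlap of their genres, cast, and directors using Jaccard similarity. The `Movie` class, the weights, and the example titles are all invented for illustration; the posting does not prescribe this method, and a real solution would apply similar scoring over the full dataset in Spark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    title: str
    genres: frozenset
    cast: frozenset       # actors and actresses combined (an assumption)
    directors: frozenset
    rating: float


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(m1, m2, weights=(0.5, 0.3, 0.2)):
    """Weighted overlap of genres, cast, and directors (weights are hypothetical)."""
    w_genre, w_cast, w_dir = weights
    return (w_genre * jaccard(m1.genres, m2.genres)
            + w_cast * jaccard(m1.cast, m2.cast)
            + w_dir * jaccard(m1.directors, m2.directors))


def recommend(liked, catalog, k=3):
    """Return the titles of the k movies most similar to `liked`.

    Ties in similarity are broken by the higher rating.
    """
    candidates = [m for m in catalog if m.title != liked.title]
    candidates.sort(key=lambda m: (similarity(liked, m), m.rating), reverse=True)
    return [m.title for m in candidates[:k]]


# Invented example data, not taken from the IMDB dataset.
catalog = [
    Movie("Movie A", frozenset({"Action", "Sci-Fi"}), frozenset({"p1", "p2"}), frozenset({"d1"}), 8.0),
    Movie("Movie B", frozenset({"Action", "Sci-Fi"}), frozenset({"p1"}), frozenset({"d1"}), 7.0),
    Movie("Movie C", frozenset({"Drama"}), frozenset({"p3"}), frozenset({"d2"}), 9.0),
    Movie("Movie D", frozenset({"Action"}), frozenset({"p2"}), frozenset({"d3"}), 6.5),
]
print(recommend(catalog[0], catalog, k=2))
```

Set overlap is used here only because the listed attributes (genre, actors, actresses, directors) are naturally sets; a clustering or collaborative-filtering approach would be an equally valid reading of the topic.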

Parts of the report:

I. Design:

Design document should contain your proposed design of the solution.

Summary of problem definition - Focus on explaining what you want to do in the project, any assumptions, and limitations.
Description of input data - In the design document, you have to provide a summary of your data - for example, data format, attributes, and metadata. Please do not include data inside the report.
Explanation of your algorithm and pseudocode
Explanation of your Big Data strategy - Which Big Data strategy did you use, and how does it fit into the overall picture? If you used multiple technologies, list the project phases where you used each.
Create a data flow diagram for your application. - An example of DFD for MapReduce is here: https://creately.com/diagram/example/h21wfdxq2/MapReduce Similarly, you can create a data flow diagram for your machine learning/data analysis strategy.
Details of how your application handles bad or missing data, and whether your strategy is robust, i.e. whether it can recover from errors. Similarly, if you use machine learning, how do you handle overfitting?
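The point about bad or missing data can be made concrete with a small cleaning step. The sketch below is plain Python and assumes a tab-separated input with the hypothetical columns `title`, `genres`, and `rating`, and assumes that missing values are marked with `\N`; adjust both to the actual files. Malformed rows are counted and skipped rather than aborting the run, which is one simple way to keep a pipeline robust.

```python
import csv

MISSING = r"\N"  # assumed marker for a missing value in the TSV input


def clean_rows(lines, required=("title", "genres")):
    """Parse TSV lines, returning (kept_rows, dropped_count).

    A row is dropped when it has the wrong number of fields, lacks a
    required value, or has a rating that is not a number. A missing
    rating is kept as None so later stages can decide what to do.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept, dropped = [], 0
    for row in reader:
        # DictReader stores surplus fields under the key None and
        # fills absent fields with the value None.
        if None in row or any(v is None for v in row.values()):
            dropped += 1
            continue
        row = {k: (None if v == MISSING else v) for k, v in row.items()}
        if any(row[col] is None for col in required):
            dropped += 1
            continue
        try:
            if row["rating"] is not None:
                row["rating"] = float(row["rating"])
        except ValueError:
            dropped += 1
            continue
        row["genres"] = row["genres"].split(",")
        kept.append(row)
    return kept, dropped


# Invented sample input covering each failure case.
sample_lines = [
    "title\tgenres\trating",
    "Movie A\tAction,Sci-Fi\t8.1",
    "Movie B\t\\N\t7.0",      # missing genre: dropped
    "Movie C\tDrama\t\\N",    # missing rating: kept with rating None
    "Movie D\tDrama",         # too few fields: dropped
    "Movie E\tComedy\tn/a",   # non-numeric rating: dropped
]
kept, dropped = clean_rows(sample_lines)
print(len(kept), "rows kept,", dropped, "rows dropped")
```

Reporting the dropped count in the design document gives a simple, numerical answer to how much of the input was unusable.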

II. Analysis of Results:

In this section, you will present your final results and analyze them. Following are certain key points:

• Summarize your results well. This could be in the form of tables, graphs, plots, or other visualization tools.

• Validate your results.

For example, you can compute the accuracy of your model on the test dataset, or use cross-validation on the training dataset, or you can show that there is a correlation between positive review and star rating, or that there is a correlation between positive sentiment and stock price. Try to come up with numerical results.

• If your results are below expectations, explain the probable reasons.
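The validation step mentioned above (test-set accuracy or cross-validation) can be sketched as follows. This is a minimal plain-Python example, assuming the quantity being predicted is a numeric rating and using a deliberately trivial baseline model that always predicts the mean training rating. The function names and the baseline are illustrative choices, not requirements of the assignment; the real model's predictions would replace the baseline.

```python
import math
import random


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def k_fold_rmse(ratings, k=5, seed=0):
    """Average RMSE of a mean-rating baseline over k folds.

    Requires k >= 2 and at least k ratings so that every fold and
    every training set is non-empty.
    """
    data = list(ratings)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        mean = sum(train) / len(train)  # the "model": predict the training mean
        scores.append(rmse([mean] * len(test), test))
    return sum(scores) / k


# Invented ratings used only to exercise the functions.
example_ratings = [6.5, 7.0, 8.1, 5.4, 9.0, 7.7, 6.1, 8.3, 4.9, 7.2]
print(k_fold_rmse(example_ratings, k=5))
```

A baseline like this is useful mainly as a reference point: a recommendation model whose error is no better than the mean-rating baseline is a candidate for the "below expectations" discussion.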

III. Conclusion

Following are key points:

• Explain how using Big Data helped you with this project, i.e. how it helped you arrive at a better, faster, or more efficient solution.

• Describe what you learned in this project.

• Describe how your technique/strategy can be improved.

PROJECT SOURCE CODE - Source Code and Sample data files:

Some of the coding requirements are as follows:

Please include a README file indicating which language and technology you used and how to compile your code.
You need to use HDFS for at least some part of the project.

- This could mean that you use HDFS for data extraction, pre-processing, or actual classification or clustering. The key is you have to use HDFS somewhere.

Your code should be well documented.
Ideally, you should create a UNIX script such that the entire workflow - data extraction, parsing, pre-processing, analysis, MapReduce, machine learning task - can be run using that script. The script can accept parameters from the command line.

Please attach a sample of your data. This should not be the entire dataset, but just the top few lines, so the TA can run your code. About 1000 lines/records should be fine.
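Producing the sample file can be automated. The helper below is a small sketch with a hypothetical function name and file paths; it copies the first `n` lines of a text file, so a header line, if present, counts toward the 1000.

```python
from itertools import islice


def write_sample(src_path, dst_path, n=1000):
    """Copy the first n lines of src_path to dst_path; return lines written."""
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in islice(src, n):
            dst.write(line)
            written += 1
    return written
```

For example, `write_sample("movies.tsv", "movies_sample.tsv")` would write at most 1000 lines, where both file names are placeholders.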
