Cluster and cloud computing assignment

Reference no: EM133114188

Problem Description

Your task in this programming assignment is to implement a simple, parallelized application leveraging the University of Melbourne HPC facility SPARTAN. Your application will use a large Twitter dataset and a grid/mesh for Sydney to identify the languages used in tweets. Your objective is to count, for each of the given cells, the number of different languages used and the number of tweets in each of those languages, and hence to gauge the multicultural nature of Sydney!

You should be able to log in to SPARTAN through running the attached command:

If you are a Windows user then you may need to install an application like PuTTY to run ssh. (If you are connecting from elsewhere with different firewall rules, then you may need to use a VPN.)

The files to be used in this assignment are accessible at:

You should make a symbolic link to these files, i.e. you should run the following commands at the Unix prompt from your own user directory on SPARTAN:

The sydGrid.json file includes the latitudes and longitudes of a range of gridded boxes as illustrated in the figure below, i.e., the latitude and longitude of each corner of each box are given in the file.

Your assignment is to (eventually!) search the large Twitter data set (bigTwitter.json) and, using each tweet's language and location (lat/long), count the total number of tweets made in each language in each cell. The final result will be a score for each cell with the following format, where the numbers are representative only.

Here cell A1 has 11,111 tweets in total, made in 11 different languages, with the most popular being English (9,000 tweets), Chinese (555 tweets) and French (444 tweets), and the 10th most popular being Greek (66 tweets). Cell A2 has 22 languages used for tweeting, with the most popular being English (21,000 tweets), Turkish (77 tweets) and Swedish (66 tweets), and French being the 10th most popular language (2 tweets).
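The per-cell figures described above (total tweets, number of languages, top ten languages) could be derived from a per-cell tally roughly as follows. This is a minimal sketch: the function name `summarise_cell` and the use of a `Counter` keyed by language name are assumptions for illustration, not part of the assignment.

```python
from collections import Counter

def summarise_cell(lang_counts, top_n=10):
    """Return (total tweets, number of languages, top_n languages) for one cell.

    lang_counts is a hypothetical Counter mapping language -> tweet count.
    """
    total = sum(lang_counts.values())
    n_languages = len(lang_counts)
    # most_common sorts by count, highest first
    top = lang_counts.most_common(top_n)
    return total, n_languages, top

# Example with made-up numbers
cell_a1 = Counter({"English": 9000, "Chinese": 555, "French": 444})
total, n_languages, top = summarise_cell(cell_a1)
```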

You may treat Simplified Chinese (zh-cn) and Traditional Chinese (zh-tw) as both being Chinese. Tweets with null or undefined (und) for the language attribute can be ignored. Further information on languages that might be used for tweeting is given in [...]. Tweets with no location information can be ignored. Tweets made outside of the grid can also be ignored.
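The language rules above can be expressed as a small normalisation step. The sketch below is illustrative only: `normalise_language` is a hypothetical helper, and it assumes the language attribute arrives as a string code (or `None`) and that the merged Chinese code is written as `zh`.

```python
from typing import Optional

def normalise_language(code: Optional[str]) -> Optional[str]:
    """Return a cleaned language code, or None if the tweet should be ignored."""
    if code is None:
        return None  # null language: ignore
    code = code.strip().lower()
    if code in ("", "und"):
        return None  # undefined language: ignore
    if code in ("zh-cn", "zh-tw"):
        return "zh"  # treat Simplified and Traditional Chinese as Chinese
    return code
```

A tweet whose normalised language is `None` would simply be skipped before any counting takes place.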

If a tweet occurs right on the border of two cells, e.g., exactly on the B1/B2 cell border, then assume the tweet occurs in B1 (i.e., in the cell on the left). If a tweet occurs exactly on the border between B2/C2 then assume the tweet occurs in C2 (i.e., in the cell below). If a tweet occurs anywhere else on the boundary of a cell, e.g., the upper or leftmost border of A1, then it can be regarded as being in that cell (here, A1).
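One way to apply these boundary rules is sketched below. The `Cell` tuple and `locate` function are hypothetical names, and the sketch assumes each cell is an axis-aligned box described by its minimum and maximum longitude and latitude, which is not necessarily how sydGrid.json stores them. A point on a corner shared by four cells is not covered by the rules above; resolving it to the lower-left cell is an assumption of this sketch.

```python
from typing import NamedTuple, Optional, Sequence

class Cell(NamedTuple):
    name: str
    xmin: float  # western longitude
    xmax: float  # eastern longitude
    ymin: float  # southern latitude
    ymax: float  # northern latitude

def locate(cells: Sequence[Cell], lon: float, lat: float) -> Optional[str]:
    """Return the name of the cell containing (lon, lat), or None if outside the grid."""
    # Treat every cell as closed, so border points match all adjacent cells
    candidates = [c for c in cells
                  if c.xmin <= lon <= c.xmax and c.ymin <= lat <= c.ymax]
    if not candidates:
        return None
    # Prefer the left-most cell, then the lower one, per the border rules
    return min(candidates, key=lambda c: (c.xmin, c.ymin)).name

# A made-up 2x2 grid: row A above row B, column 1 left of column 2
grid = [
    Cell("A1", 0.0, 1.0, 1.0, 2.0), Cell("A2", 1.0, 2.0, 1.0, 2.0),
    Cell("B1", 0.0, 1.0, 0.0, 1.0), Cell("B2", 1.0, 2.0, 0.0, 1.0),
]
```

Collecting all matching cells and then choosing among them keeps the outer borders of the grid inside their own cells without any special cases.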

Your application should allow a given number of nodes and cores to be utilized. Specifically, your application should be run once to search the bigTwitter.json file on each of the following resources:
• 1 node and 1 core;
• 1 node and 8 cores;
• 2 nodes and 8 cores (with 4 cores per node).

The resources should be set when submitting the search application with the appropriate SLURM options. Note that you should run a single SLURM job on each of the three resource configurations given here (three runs in total), i.e., you should not need to run the same job 3 times on 1 node 1 core, for example, to benchmark the application. (This is a shared facility and many COMP90024 students running repeated jobs will consume a lot of resources!)

You can implement your solution using any routines that you wish from existing libraries; however, it is strongly recommended that you follow the guidelines provided on access and use of the SPARTAN cluster. Do not, for example, think that the job scheduler/SPARTAN automatically parallelizes your code - it doesn't! You may wish to use the pre-existing MPI libraries that have been installed for C, C++ or Python. You should feel free to make use of the Internet to identify which JSON processing libraries you might use.
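Because the scheduler does not parallelize anything, the work has to be divided explicitly. The sketch below shows one possible division, assuming the data file holds one tweet per line (an assumption, not stated in the brief): each process handles every `size`-th line and the partial tallies are merged on one process. The helper names `count_for_rank`, `merge` and `run_with_mpi`, and the `key_of` callback, are hypothetical; the MPI calls use the mpi4py package, which is assumed to be available on the cluster.

```python
from collections import Counter

def count_for_rank(lines, rank, size, key_of):
    """Tally the lines assigned to this rank (line i goes to rank i % size)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % size != rank:
            continue  # another process handles this line
        key = key_of(line)  # e.g. a (cell, language) pair, or None to skip
        if key is not None:
            counts[key] += 1
    return counts

def merge(partials):
    """Combine the per-process tallies into one Counter."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def run_with_mpi(path, key_of):
    """Sketch of the MPI wrapper; requires mpi4py and an MPI launcher."""
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    with open(path, encoding="utf-8") as fh:
        local = count_for_rank(fh, rank, size, key_of)
    partials = comm.gather(local, root=0)  # list on rank 0, None elsewhere
    return merge(partials) if rank == 0 else None
```

In this sketch every process reads the whole file and skips the lines it does not own, which is simple but not the only option; splitting the file by byte ranges is a common alternative.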

Your application should return the final results and the time taken to run the job itself, i.e., the time from the first job starting on a given SPARTAN node to the time the last job completes. You may ignore the queuing time. The focus of this assignment is not to optimize the application to run faster, but to learn about HPC, how basic benchmarking of applications on an HPC facility can be achieved, and the lessons learned in doing this on a shared resource.
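A simple way to report the run time from inside the application is to wrap the main work in a wall-clock timer. The helper `timed` below is a hypothetical name used for illustration; it measures a single process only, so in a multi-process run one would typically start the clock before any work begins and stop it after the final results are gathered.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example: time a trivial computation
result, elapsed = timed(sum, [1, 2, 3])
print(f"result={result}, elapsed={elapsed:.6f} s")
```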

Attachment:- Cluster and Cloud Computing Assignment.rar





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd