Reference no: EM133114188
Cluster and Cloud Computing Assignment
Problem Description
Your task in this programming assignment is to implement a simple, parallelized application that leverages the University of Melbourne HPC facility SPARTAN. Your application will use a large Twitter dataset and a grid/mesh for Sydney to identify the languages used in tweets. Your objective is to count, for each of the given cells, the number of different languages used for tweets and the number of tweets in each of those languages, and hence to quantify the multicultural nature of Sydney!
You should be able to log in to SPARTAN by running the attached command:
If you are a Windows user then you may need to install an application such as PuTTY to run ssh. (If you are connecting from elsewhere with different firewall rules, then you may need to use a VPN.)
The files to be used in this assignment are accessible at:
You should make a symbolic link to these files, i.e. you should run the following commands at the Unix prompt from your own user directory on SPARTAN:
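The exact commands and shared directory are given in the hand-out; purely as a sketch, with a hypothetical shared path, creating the symbolic links might look like this:

```shell
# Hypothetical shared path - substitute the actual directory from the hand-out.
SHARED=/data/projects/COMP90024
mkdir -p "$HOME/assignment1" && cd "$HOME/assignment1"

# Symlink the large data files rather than copying them into your home directory.
ln -sf "$SHARED/bigTwitter.json" bigTwitter.json
ln -sf "$SHARED/sydGrid.json" sydGrid.json

ls -l bigTwitter.json sydGrid.json   # confirm where the links point
```

Symlinking avoids duplicating a multi-gigabyte file into your (quota-limited) user directory.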
The sydGrid.json file includes the latitudes and longitudes of a range of gridded boxes as illustrated in the figure below, i.e., the latitude and longitude of each of the corners of the boxes is given in the file.
Your assignment is to (eventually!) search the large Twitter dataset (bigTwitter.json) and, using the language of each tweet and the tweet location (lat/long), count the total number of tweets in a given cell that are made in each language. The final result will be a score for each cell in the following format, where the numbers are purely representative.
Here cell A1 has 11,111 tweets in total, with 11 different languages used; the most popular is English (9,000 tweets), followed by Chinese (555 tweets) and French (444 tweets), with the 10th most popular being Greek (66 tweets). Cell A2 has 22 languages used for tweeting, with the most popular being English (21,000 tweets), then Turkish (77 tweets) and Swedish (66 tweets), and French being the 10th most popular language (2 tweets).
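A per-cell summary of this shape can be produced from a mapping of language to tweet count. A minimal sketch using Python's `collections.Counter` (the function name and the sample counts are my own, purely for illustration):

```python
from collections import Counter

def summarise_cell(cell, lang_counts):
    """Return (cell, total tweets, number of languages, top-10 languages)."""
    top10 = lang_counts.most_common(10)
    return cell, sum(lang_counts.values()), len(lang_counts), top10

# Made-up counts purely for illustration.
a1 = Counter({"English": 9000, "Chinese": 555, "French": 444, "Greek": 66})
cell, total, nlangs, top = summarise_cell("A1", a1)
print(f"{cell}: {total} total tweets, {nlangs} languages, "
      + ", ".join(f"{lang} ({n})" for lang, n in top))
```

`Counter.most_common(10)` gives the ten most popular languages in descending order, which matches the ranking described above.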
You may treat Simplified Chinese (zh-cn) and Traditional Chinese (zh-tw) as both being Chinese. Tweets with a null or undefined (und) language attribute can be ignored. Further information on the languages that might be used for tweeting is given in the linked reference. Tweets with no location information can be ignored, as can tweets made outside of the grid.
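These filtering rules can be captured in a small normalisation helper; the sketch below is one way to do it (the function name is my own):

```python
def normalise_language(code):
    """Map a tweet's language code per the assignment rules.

    Returns a canonical code, or None if the tweet should be ignored.
    """
    if code is None or code == "und":
        return None          # null/undefined languages are ignored
    if code in ("zh-cn", "zh-tw"):
        return "zh"          # both Chinese variants count as Chinese
    return code
```

Applying this before counting means zh-cn and zh-tw fall into a single bucket, and ignorable tweets are dropped early.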
If a tweet occurs exactly on the border of two cells, e.g. on the B1/B2 cell border, then assume the tweet occurs in B1 (i.e., the cell on the left). If a tweet occurs exactly on the border between B2/C2, then assume the tweet occurs in C2 (i.e., the cell below). If a tweet occurs anywhere else on the boundary of a cell, e.g. the upper or leftmost border of A1, then it can be regarded as being in cell A1.
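These tie-break rules can be implemented with a binary search over the grid's boundary coordinates. The sketch below assumes the grid can be reduced to ascending lists of latitude and longitude edges (the actual corner layout in sydGrid.json may differ), with rows labelled A, B, ... from north to south and columns 1, 2, ... from west to east:

```python
from bisect import bisect_left

def assign_cell(lat, lon, lat_edges, lon_edges):
    """Assign a point to a grid cell under the assignment's tie-break rules.

    lat_edges/lon_edges are ascending cell-boundary coordinates.  A point on a
    vertical border belongs to the cell on its left; a point on a horizontal
    border belongs to the cell below.  Returns e.g. "B2", or None if the point
    falls outside the grid.
    """
    if not (lon_edges[0] <= lon <= lon_edges[-1]
            and lat_edges[0] <= lat <= lat_edges[-1]):
        return None
    # Column: a point exactly on an internal edge goes to the left cell.
    col = max(1, bisect_left(lon_edges, lon))
    # Row band counted from the south: a point exactly on an internal edge
    # goes to the band below (south of) that edge.
    band = max(0, bisect_left(lat_edges, lat) - 1)
    row = chr(ord("A") + (len(lat_edges) - 2 - band))  # top-down row letter
    return f"{row}{col}"
```

With a 4x4 unit grid (`lat_edges = lon_edges = [0, 1, 2, 3, 4]`), a point exactly on the B1/B2 border lands in B1, and a point exactly on the B2/C2 border lands in C2, as required.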
Your application should allow a given number of nodes and cores to be utilized. Specifically, your application should be run once to search the bigTwitter.json file on each of the following resources:
• 1 node and 1 core;
• 1 node and 8 cores;
• 2 nodes and 8 cores (with 4 cores per node).
The resources should be set when submitting the search application with the appropriate SLURM options. Note that you should run a single SLURM job once on each of the three resource configurations given here; you should not, for example, run the same job 3 times on 1 node and 1 core to benchmark the application. (This is a shared facility, and the many COMP90024 students will consume a lot of resources!)
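As an illustrative sketch only, a SLURM submission script for the 2-node/8-core configuration might look like the following; the module names and script name are assumptions, so check the SPARTAN documentation for the actual values:

```shell
#!/bin/bash
# Sketch only - job name, modules and script name are placeholders.
#SBATCH --job-name=twitter-lang
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4     # 2 nodes x 4 cores = 8 cores in total
#SBATCH --time=00:30:00

# module load foss mpi4py       # site-specific; adjust to SPARTAN's modules
mpirun -np 8 python3 count_languages.py bigTwitter.json sydGrid.json
```

The other two configurations change only the resource lines: `--nodes=1 --ntasks-per-node=1` for 1 node/1 core, and `--nodes=1 --ntasks-per-node=8` for 1 node/8 cores.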
You can implement your solution using any routines that you wish from existing libraries; however, it is strongly recommended that you follow the guidelines provided on access and use of the SPARTAN cluster. Do not, for example, assume that the job scheduler/SPARTAN automatically parallelizes your code - it doesn't! You may wish to use the pre-existing MPI libraries that have been installed for C, C++ or Python. You should feel free to make use of the Internet to identify which JSON processing libraries you might use.
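One common pattern for parallelizing this kind of scan over a line-oriented JSON file - an approach you could take, not the prescribed one - is to split the file into one byte range per MPI rank and have each rank count languages over only the complete lines in its range. The helpers below sketch that idea in plain Python (the `doc.lang` field path is an assumption about bigTwitter.json's schema):

```python
import json
from collections import Counter

def byte_ranges(total_size, n_ranks):
    """Split a file of total_size bytes into n_ranks contiguous (start, end) ranges."""
    chunk = total_size // n_ranks
    ranges = []
    for rank in range(n_ranks):
        start = rank * chunk
        end = total_size if rank == n_ranks - 1 else start + chunk
        ranges.append((start, end))
    return ranges

def count_in_chunk(lines):
    """Count language codes over an iterable of JSON tweet lines (one object per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip().rstrip(",")      # tolerate trailing commas in a JSON array dump
        if not line or line in ("[", "]"):
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        lang = tweet.get("doc", {}).get("lang")   # assumed field location
        if lang:
            counts[lang] += 1
    return counts
```

With mpi4py, each rank would seek to its start offset, skip forward to the next newline so it only handles whole lines, process until its end offset, and then the per-rank `Counter`s would be combined on rank 0 (e.g. via `comm.reduce`).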
Your application should return the final results and the time taken to run the job itself, i.e. the time from the first task starting on a given SPARTAN node to the time the last task completes. You may ignore the queuing time. The focus of this assignment is not to optimize the application to run faster, but to learn about HPC, how basic benchmarking of applications on an HPC facility can be achieved, and the lessons learned in doing this on a shared resource.
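That definition of job time - earliest task start to latest task completion - reduces to a max-minus-min over per-rank timestamps. A minimal sketch (gathering the timestamps to one rank, e.g. via mpi4py's `comm.gather`, is an assumption about the chosen approach):

```python
def job_elapsed(starts, ends):
    """Elapsed wall time from the earliest task start to the latest task end.

    starts/ends are per-rank timestamps in seconds (e.g. from time.time()
    or MPI.Wtime()), collected onto a single rank before this is called.
    """
    return max(ends) - min(starts)

# Illustrative per-rank timestamps (seconds).
starts = [100.0, 100.5, 100.2]
ends = [180.0, 179.5, 181.2]
print(f"job took {job_elapsed(starts, ends):.1f} s")   # prints "job took 81.2 s"
```

Note that summing per-rank durations would be wrong here: ranks run concurrently, so only the span from first start to last finish reflects the job's wall-clock time.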
Attachment:- Cluster and Cloud Computing Assignment.rar