Cluster and cloud computing assignment

Reference no: EM133114188

Problem Description

Your task in this programming assignment is to implement a simple, parallelized application leveraging the University of Melbourne HPC facility SPARTAN. Your application will use a large Twitter dataset and a grid/mesh for Sydney to identify the languages used in making Tweets. Your objective is to count the number of different languages used for tweets in the given cells and the number of tweets in those languages and hence to calculate the multicultural nature of Sydney!

You should be able to log in to SPARTAN through running the attached command:

If you are a Windows user then you may need to install an application such as PuTTY to run ssh. (If you are connecting from elsewhere with different firewall rules, then you may need to use a VPN.)

The files to be used in this assignment are accessible at:

You should make a symbolic link to these files, i.e. you should run the following commands at the Unix prompt from your own user directory on SPARTAN:

The sydGrid.json file includes the latitudes and longitudes of a range of gridded boxes as illustrated in the figure below, i.e., the latitude and longitude of each of the corners of the boxes is given in the file.

Your assignment is to (eventually!) search the large Twitter data set (bigTwitter.json) and, using each tweet's language and location (lat/long), count the total number of tweets made in each cell, the number of different languages used in that cell, and the number of tweets in each of those languages. The final result will be a score for each cell in the following format, where the numbers are purely illustrative.

Here cell A1 has 11,111 tweets in total, with 11 different languages used. The most popular are English (9,000 tweets), Chinese (555 tweets) and French (444 tweets), and the 10th most popular is Greek (66 tweets). Cell A2 has 22 languages used for tweeting, the most popular being English (21,000 tweets), Turkish (77 tweets) and Swedish (66 tweets), with French the 10th most popular (2 tweets).
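As a rough illustration of how a per-cell score like this could be produced, the sketch below merges partial language counts and extracts the total, the number of distinct languages and the top 10. The function name `summarise_cell` and the sample counts are hypothetical, chosen only for this example.

```python
from collections import Counter


def summarise_cell(language_counts, top_n=10):
    """Return (total tweets, distinct languages, top-N languages) for one cell."""
    total = sum(language_counts.values())
    n_languages = len(language_counts)
    top = language_counts.most_common(top_n)  # sorted by count, descending
    return total, n_languages, top


# Hypothetical partial counts for one cell, e.g. produced by two workers.
part_a = Counter({"English": 5000, "Chinese": 300, "French": 200})
part_b = Counter({"English": 4000, "Chinese": 255, "French": 244, "Greek": 66})

# Adding Counters sums the counts language by language.
merged = part_a + part_b
total, n_languages, top = summarise_cell(merged)
print(total, n_languages, top)
```

Keeping one `Counter` per cell makes it straightforward to combine results from several processes, since the partial counters can simply be added together.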

You may treat Simplified Chinese (zh-cn) and Traditional Chinese (zh-tw) as both being Chinese. Tweets with null or undefined (und) for the language attribute can be ignored. Further information on languages that might be used for tweeting is given in [link omitted]. Tweets with no location information can be ignored. Tweets made outside of the Grid can also be ignored.
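The language rules above could be expressed as a small helper along the following lines. This is a minimal sketch: the function name `normalise_language` and the choice of `"zh"` as the merged code are assumptions made for illustration, and how the language code is extracted from each tweet record is left out because it depends on the data layout.

```python
def normalise_language(code):
    """Map a tweet language code to a counting key, or None if it should be ignored."""
    if code is None:              # JSON null becomes None in Python
        return None
    code = code.strip().lower()
    if code in ("", "und"):       # undefined language: ignore the tweet
        return None
    if code in ("zh-cn", "zh-tw"):
        return "zh"               # assumed merged key for both Chinese variants
    return code


for raw in ["en", "zh-cn", "zh-tw", "und", None]:
    print(raw, "->", normalise_language(raw))
```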

If a tweet occurs right on the border of two cells, e.g. exactly on the B1/B2 cell border, then assume the tweet occurs in B1 (i.e. the cell on the left). If a tweet occurs exactly on the border between B2 and C2, then assume the tweet occurs in C2 (i.e. the cell below). If a tweet occurs anywhere else on the boundary of a cell, e.g. the upper or leftmost border of A1, then it can be regarded as being in cell A1.
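One possible way to encode these border rules is to test each cell with closed intervals and, when several cells match, prefer the leftmost and then the lowest one. The sketch below assumes each cell is described by an `id` and `xmin`/`xmax`/`ymin`/`ymax` bounds (longitude as x, latitude as y); these field names and the toy 2x2 grid are hypothetical and may not match the layout of sydGrid.json. A point on a corner shared by four cells is not covered by the rules above; here it falls to the lower-left cell.

```python
def assign_cell(x, y, cells):
    """Return the id of the cell containing (x, y), or None if outside the grid."""
    # Closed intervals: a point on a shared border matches every adjacent cell.
    matches = [
        c for c in cells
        if c["xmin"] <= x <= c["xmax"] and c["ymin"] <= y <= c["ymax"]
    ]
    if not matches:
        return None
    # Tie-break: leftmost cell first (smallest xmin), then the cell below (smallest ymin).
    return min(matches, key=lambda c: (c["xmin"], c["ymin"]))["id"]


# Toy grid (hypothetical coordinates): row A above row B, column 1 left of column 2.
grid = [
    {"id": "A1", "xmin": 0, "xmax": 1, "ymin": 1, "ymax": 2},
    {"id": "A2", "xmin": 1, "xmax": 2, "ymin": 1, "ymax": 2},
    {"id": "B1", "xmin": 0, "xmax": 1, "ymin": 0, "ymax": 1},
    {"id": "B2", "xmin": 1, "xmax": 2, "ymin": 0, "ymax": 1},
]

print(assign_cell(1, 1.5, grid))   # on the A1/A2 border
print(assign_cell(1.5, 1, grid))   # on the A2/B2 border
print(assign_cell(3, 3, grid))     # outside the grid
```

Because the tie-break only applies when more than one cell matches, points on the outer boundary of the grid are assigned to the single cell they touch.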

Your application should allow a given number of nodes and cores to be utilized. Specifically, your application should be run once to search the bigTwitter.json file on each of the following resources:
• 1 node and 1 core;
• 1 node and 8 cores;
• 2 nodes and 8 cores (with 4 cores per node).

The resources should be set when submitting the search application with the appropriate SLURM options. Note that you should run a single SLURM job on each of the three resource configurations given here (three jobs in total), i.e. you should not need to run the same job 3 times on 1 node 1 core, for example, to benchmark the application. (This is a shared facility and with this many COMP90024 students the jobs will consume a lot of resources!)

You can implement your solution using any routines that you wish from existing libraries; however, it is strongly recommended that you follow the guidelines provided on access and use of the SPARTAN cluster. Do not, for example, think that the job scheduler/SPARTAN automatically parallelizes your code - it doesn't! You may wish to use the pre-existing MPI libraries that have been installed for C, C++ or Python. You should feel free to make use of the Internet to identify which JSON processing libraries you might use.
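Since the parallelism has to come from the application itself, one common approach is to give each process its own byte range of the input file. The sketch below is one way this might look, under the assumption that the file holds one tweet per line; the helper names `byte_range` and `lines_for_rank` are hypothetical. In an MPI program written with mpi4py, `rank` and `size` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; here they are plain parameters so the logic can be tried without MPI.

```python
import os


def byte_range(file_size, rank, size):
    """Split [0, file_size) into `size` contiguous ranges and return the one for `rank`."""
    start = file_size * rank // size
    end = file_size * (rank + 1) // size
    return start, end


def lines_for_rank(path, rank, size):
    """Yield the lines whose first byte lies inside this rank's byte range."""
    start, end = byte_range(os.path.getsize(path), rank, size)
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard up to the next newline, so that
            # reading resumes at the first line starting at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            yield line
```

Each line is read by exactly one rank, because a line belongs to the range that contains its first byte. Every process can then build its own per-cell counts and the partial results can be combined at the end.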

Your application should return the final results and the time to run the job itself, i.e. the time from the first job starting on a given SPARTAN node to the time the last job completes. You may ignore the queuing time. The focus of this assignment is not to optimize the application to run faster, but to learn about HPC, how basic benchmarking of applications on an HPC facility can be achieved, and the lessons learned in doing this on a shared resource.
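A simple way to report the run time is to wrap the main processing step in a wall-clock timer. The sketch below uses the standard library's `time.perf_counter`; the helper name `timed` and the stand-in workload are hypothetical. In an MPI program, one alternative would be mpi4py's `MPI.Wtime()` taken on each process, with the earliest start and latest finish reported.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; the real job would process the Twitter file instead.
result, elapsed = timed(sum, range(1_000_000))
print(f"result={result} elapsed={elapsed:.3f}s")
```

Measured this way, the time excludes queuing, since the timer only starts once the job is actually running.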
