Critically reflect on the software

Reference no: EM133268737

Industrial Programming

Data Analytics

Data Analysis of a Document Tracker

The aim of this coursework is to develop a simple, data-intensive application in Python.

This is a pair project, and you will have to submit your own, original solution for this coursework specification, consisting of a report, the source code and an executable.

The learning objective of this coursework is for students to develop proficiency in advanced programming concepts, stemming from both object-oriented and functional programming paradigms, and to apply these programming skills to a concrete application of moderate size. Design choices regarding languages, tools, and libraries chosen for the implementation need to be justified in the accompanying report.

This coursework will develop personal abilities in using modern scripting languages as a "glueware" to build, configure and maintain a moderately complex application and deepen the understanding of integrating components on a Linux system.

In a dedicated section, the report needs to critically reflect on the software used for implementing this application, and discuss advantages and disadvantages of this choice. The report should also contain a discussion contrasting software development on Windows and Linux systems and comparing software development in scripting vs. systems languages (based on the experience from the two pieces of coursework).

Lab Environment

Software environment: You should use Python 3 as installed on the Linux lab machines or on the Linux MACS VM for the implementation. This installation also provides the pandas, tkinter, and matplotlib libraries. These Linux lab machines are available remotely using the x2go client for remote desktops, running on jove (from there, use ssh to log into the lab machines). For technical HOWTOs about accessing software relevant to this course, see the resources section of the Canvas course page.

Data Analysis of a Document Tracker

In this assignment, you are required to develop a simple Python-based application that analyses and displays document tracking data from a major web site.

The issuu.com platform is a web site for publishing documents. It is widely used by many online publishers and currently hosts about 15 million documents. The web site tracks usage of the site and makes the resulting anonymised data available to a wider audience. For example, it records who views a certain document, the browser used for viewing it, how the user arrived at this page, etc. In this exercise, we use one of these data sets to perform data processing and analysis in Python.

The data format uses JSON and is described in the data spec on this local page. Note that the data files contain a sequence of entries in JSON format, rather than one huge JSON construct, in order to aid scalability. Familiarise yourself with the details of the data representation before you start implementation. As the assignment opens, two data sets are available: a small data set (10k lines) and a tiny sample data set for testing. At a later stage, larger data sets, in the range of 100k-5M lines, will be posted, and your final implementation should be able to cope with these sizes of input data.
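Because each line of an input file is a self-contained JSON object, the file can be read lazily, one record at a time, which keeps memory use bounded even for the multi-million-line data sets. The sketch below assumes UTF-8 input; the function name `iter_records` is hypothetical, and silently skipping malformed lines is a design choice you may prefer to replace with logging or a hard error.

```python
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a malformed line instead of aborting the run
```

If you opt for pandas instead, `pandas.read_json(path, lines=True)` parses the same line-delimited format into a DataFrame, at the cost of holding the whole file in memory.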

The application needs to run on an up-to-date Linux platform (Ubuntu 22.04 or equivalent). The application should be developed in Python 3.10, using appropriate libraries for input, data processing and visualisation. Possible choices are the json library for parsing, the pandas library for processing the input data (optional), the tkinter library for GUI functionality and the matplotlib library for visualising the results. You need to identify the advantages of your choice of libraries.

The application must provide the following functionality:

1. Python: The core logic of the application should be implemented in Python 3.10.

2. Views by country/continent: We want to analyse, for a given document, from which countries and continents the document has been viewed. The data should be displayed as a histogram of countries, i.e. counting the number of occurrences for each country in the input file.

(a) The application should take a string as input, which uniquely specifies a document (a document UUID), and return a histogram of countries of the viewers. The histogram can be displayed using matplotlib.
(b) Use the data you have collected in the previous task, group the countries by continent, and generate a histogram of the continents of the viewers. The histogram can be displayed using matplotlib.
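A minimal sketch of tasks 2a and 2b, assuming records are dicts parsed from the input file and that the document UUID and viewer country live in fields named `subject_doc_id` and `visitor_country` (check both names against the data spec). The `COUNTRY_TO_CONTINENT` table is a hypothetical, deliberately partial mapping; a real solution needs a complete one, and codes it does not cover are counted as "Unknown".

```python
from collections import Counter
from typing import Iterable

# Hypothetical partial mapping from ISO 3166 alpha-2 codes to continents.
COUNTRY_TO_CONTINENT = {
    "GB": "Europe", "DE": "Europe", "FR": "Europe",
    "US": "North America", "CA": "North America", "MX": "North America",
    "BR": "South America", "AR": "South America",
    "CN": "Asia", "IN": "Asia", "JP": "Asia",
    "NG": "Africa", "ZA": "Africa", "EG": "Africa",
    "AU": "Oceania", "NZ": "Oceania",
}


def country_histogram(records: Iterable[dict], doc_uuid: str) -> Counter:
    """Task 2a: count the views of doc_uuid per viewer country."""
    return Counter(
        r["visitor_country"]
        for r in records
        if r.get("subject_doc_id") == doc_uuid and "visitor_country" in r
    )


def continent_histogram(countries: Counter) -> Counter:
    """Task 2b: regroup a per-country Counter by continent."""
    continents = Counter()
    for code, count in countries.items():
        continents[COUNTRY_TO_CONTINENT.get(code, "Unknown")] += count
    return continents
```

Reusing the country counts for the continent histogram matches the spec's instruction to build 2b from the data collected in 2a. Either Counter can be passed to matplotlib as `plt.bar(list(h.keys()), list(h.values()))`.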

3. Views by browser: In this task we want to identify the most popular browser. To this end, the application has to examine the visitor useragent field and count the number of occurrences for each value in the input file.
(a) The application should return and display a histogram of all browser identifiers of the viewers.
(b) In the previous task, you will see that the browser strings are very verbose, distinguishing browsers by, e.g., version and OS. Process the input of the above task so that only the main browser name is used to distinguish them (e.g. Mozilla), and again display the result as a histogram.
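A sketch of tasks 3a and 3b, assuming the user-agent string is stored in a field named `visitor_useragent`. The "main browser name" is extracted with a crude, hypothetical heuristic: keep the product token before the first `/`, which turns `Mozilla/5.0 (...)` into `Mozilla`, as in the spec's example.

```python
from collections import Counter
from typing import Iterable


def browser_histogram(records: Iterable[dict]) -> Counter:
    """Task 3a: count each distinct, full user-agent string."""
    return Counter(r["visitor_useragent"] for r in records if "visitor_useragent" in r)


def main_browser_name(useragent: str) -> str:
    """Heuristic: the product token before the first '/', e.g. 'Mozilla'."""
    return useragent.split("/", 1)[0].strip() or "Unknown"


def main_browser_histogram(full: Counter) -> Counter:
    """Task 3b: merge the verbose counts by main browser name."""
    merged = Counter()
    for agent, count in full.items():
        merged[main_browser_name(agent)] += count
    return merged
```

Since most modern browsers announce themselves as `Mozilla/5.0`, this heuristic lumps Chrome, Firefox and Safari together; looking for tokens such as `Chrome/` or `Firefox/` inside the string would give a finer split.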

4. Reader profiles: In order to develop a readership profile for the site, we want to identify the most avid readers. We want to determine, for each user, the total time spent reading documents. The top 10 readers, based on this analysis, should be printed.
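A sketch of the reader-profile task: sum the reading time per visitor, then take the ten largest totals. It assumes that reading time is carried by records whose `event_type` is `"pagereadtime"`, in a numeric `event_readtime` field, keyed by `visitor_uuid`; all three names are assumptions to verify against the data spec.

```python
from collections import Counter
from typing import Iterable


def top_readers(records: Iterable[dict], n: int = 10) -> list[tuple[str, int]]:
    """Return the n visitors with the largest total reading time."""
    totals = Counter()
    for r in records:
        # Assumed event/field names; adjust them to the actual data spec.
        if r.get("event_type") == "pagereadtime" and "visitor_uuid" in r:
            totals[r["visitor_uuid"]] += r.get("event_readtime", 0)
    return totals.most_common(n)
```

`Counter.most_common(n)` returns `(visitor, total)` pairs in descending order, which can be printed directly as the top-10 list.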

5. "Also likes" functionality: Popular document-hosting web sites, such as Amazon, provide information about related documents based on document tracking information. One such feature is the "also likes" functionality: for a given document, identify which other documents have been read by this document's readers. The idea is that, without examining the detail of either document, the information that both documents have been read by the same reader relates the two documents to each other. Figure 1 gives an example of this functionality. In this task, you should write a function that generates such an "other readers of this document also like" list, which is parametrised over the function that determines the order of the documents in the list. Display the top 10 documents that are "liked" by other readers.

To achieve this task you will need to do the following:

(a) Implement a function that takes a document UUID and returns all visitor UUIDs of readers of that document.
(b) Implement a function that takes a visitor UUID and returns all document UUIDs that have been read by this visitor.
(c) Using the two functions above, implement the "also like" functionality as a function that takes as parameters the above document UUID, (optionally) a visitor UUID, and a sorting function on documents. The function should return a list of "liked" documents, sorted by the sorting function parameter. Note: the implementation of this function must not hard-code how documents are sorted; it must use the sorting function parameter instead.
(d) Use this function to produce an "also like" list of documents, using a sorting function based on the number of readers of the same document. Provide a document UUID and visitor UUID as input and produce a list of the top 10 document UUIDs as a result.

Figure 1: Example of identifying also-likes documents. Starting from the current reader and document (green), all readers who have also read the input document are identified (blue). From the other documents read by these readers, the top 10 documents, counted by number of readers, are identified and displayed. In this example the red document is top of this list, and the two pink documents are also on the result list. The automatically generated graph should display all three result documents, but doesn't have to distinguish between "best" and "others" by shading. The unused, black users and documents shouldn't be shown in that graph.
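A sketch of tasks 5a-5d under stated assumptions: records are dicts with `visitor_uuid` and `subject_doc_id` fields (names to check against the data spec), and any record linking a visitor to a document counts as "has read" (you may want to restrict this to read events). The sorting parameter is modelled as a hypothetical function that receives `{liked_doc: set_of_readers}` and returns the ordered document UUIDs, so `also_likes` itself never decides the order.

```python
from collections import defaultdict
from typing import Callable, Optional

# A sorting function receives {liked_doc_uuid: set_of_reader_uuids}
# and returns the document UUIDs in the desired order.
SortFn = Callable[[dict[str, set[str]]], list[str]]


def readers_of(records: list[dict], doc_uuid: str) -> set[str]:
    """Task 5a: visitor UUIDs of every record that references doc_uuid."""
    return {r["visitor_uuid"] for r in records
            if r.get("subject_doc_id") == doc_uuid and "visitor_uuid" in r}


def docs_read_by(records: list[dict], visitor_uuid: str) -> set[str]:
    """Task 5b: document UUIDs of every record made by visitor_uuid."""
    return {r["subject_doc_id"] for r in records
            if r.get("visitor_uuid") == visitor_uuid and "subject_doc_id" in r}


def also_likes(records: list[dict], doc_uuid: str, sort_fn: SortFn,
               visitor_uuid: Optional[str] = None) -> list[str]:
    """Task 5c: documents read by the other readers of doc_uuid, ordered by sort_fn."""
    other_readers = readers_of(records, doc_uuid) - {visitor_uuid}
    liked: dict[str, set[str]] = defaultdict(set)
    for reader in other_readers:
        for doc in docs_read_by(records, reader):
            if doc != doc_uuid:
                liked[doc].add(reader)
    # The ordering policy lives entirely in sort_fn, as task 5c requires.
    return sort_fn(dict(liked))


def by_reader_count(liked: dict[str, set[str]]) -> list[str]:
    """Task 5d: most readers first; ties broken by UUID for a deterministic result."""
    return sorted(liked, key=lambda doc: (-len(liked[doc]), doc))
```

Task 5d then becomes `also_likes(records, doc_uuid, by_reader_count, visitor_uuid)[:10]`. Note that `docs_read_by` rescans all records once per reader; for the 100k-5M line data sets, building a visitor-to-documents index in a single pass would be considerably faster.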

6. "Also likes" graph: For the above "also like" functionality, generate a graph that displays the relationship between the input document and all documents that have been found as "also like" documents (and only these documents). Highlight the input document and user by shading in that graph, and use arrows to capture the "has-read" relationship (i.e. an arrow from reader to document). In the graph, shorten all visitor UUIDs and document UUIDs to the last 4 hex digits. As an example, the graph below uses document b4fe and reader 6771 as input (shaded green) and displays 7 "also like" documents, together with the readers that relate these documents to the input document. For added clarity, shade the documents according to how many other readers also read them.
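A sketch of generating the .dot text for the also-likes graph, assuming the input is a mapping from each liked document UUID to the set of reader UUIDs that link it to the input document. Node IDs stay as full UUIDs while the visible labels are shortened to the last 4 characters, so two UUIDs that happen to share their last 4 hex digits are not merged into one node. The grey shading scale is a hypothetical choice (darker means more readers).

```python
from typing import Optional


def short(uuid: str) -> str:
    """Last 4 hex digits of a UUID, used only as the visible node label."""
    return uuid[-4:]


def also_likes_dot(doc_uuid: str, liked: dict[str, set[str]],
                   visitor_uuid: Optional[str] = None) -> str:
    """Build Graphviz dot text with reader -> document edges."""
    shades = ["gray90", "gray75", "gray60", "gray45"]  # hypothetical scale
    out = ["digraph also_likes {"]
    # Input document and (optional) input visitor, shaded green.
    out.append(f'  "{doc_uuid}" [label="{short(doc_uuid)}", shape=box, style=filled, fillcolor=green];')
    if visitor_uuid:
        out.append(f'  "{visitor_uuid}" [label="{short(visitor_uuid)}", style=filled, fillcolor=green];')
        out.append(f'  "{visitor_uuid}" -> "{doc_uuid}";')
    # Every reader that links a liked document back to the input document.
    for reader in sorted(set().union(*liked.values())):
        out.append(f'  "{reader}" [label="{short(reader)}"];')
        out.append(f'  "{reader}" -> "{doc_uuid}";')
    # Liked documents, shaded by how many of these readers read them.
    for doc, readers in sorted(liked.items()):
        shade = shades[min(len(readers), len(shades)) - 1]
        out.append(f'  "{doc}" [label="{short(doc)}", shape=box, style=filled, fillcolor={shade}];')
        for reader in sorted(readers):
            out.append(f'  "{reader}" -> "{doc}";')
    out.append("}")
    return "\n".join(out)
```

Documents are drawn as boxes and readers with Graphviz's default ellipse shape, which keeps the two kinds of node visually distinct.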

Hint: Use the .dot format as graph representation. Use the graphviz package with the dot tool to translate the .dot file into .ps format (and then optionally into .pdf format). For a detailed description see the dot User Manual. You can install the graphviz package on an Ubuntu machine by typing (in a terminal window): sudo apt-get install graphviz
As an example of graphviz/dot usage, the source file for the above graph is available in Canvas. You can generate the resulting graph as follows:
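To call the dot tool from inside the application rather than from a terminal, one option is to run it with `subprocess`. This sketch uses the standard `dot -T<format> input.dot -o output` invocation; the function name `render_dot` and the choice of output file name (the input name with a new suffix) are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def render_dot(dot_path: str, fmt: str = "ps") -> Path:
    """Run Graphviz 'dot -T<fmt> in.dot -o out.<fmt>' and return the output path."""
    if shutil.which("dot") is None:
        raise RuntimeError(
            "Graphviz 'dot' not found; install it with: sudo apt-get install graphviz"
        )
    src = Path(dot_path)
    out = src.with_suffix("." + fmt)
    # check=True raises CalledProcessError if dot reports a syntax error.
    subprocess.run(["dot", f"-T{fmt}", str(src), "-o", str(out)], check=True)
    return out
```

Passing `fmt="pdf"` asks dot for PDF output directly, skipping the intermediate .ps step.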

7. GUI usage: To read the required data and to display the statistical data, develop a simple GUI based on tkinter or another package of your choice that reads the user inputs described above and provides buttons to process the data as required by each task. If you use a package other than tkinter, document its requirements in detail in the report.

8. Command-line usage: The application shall provide a command-line interface to test its functionality in an automated way, like this:

to check the results of implementing task task_id using inputs user_uuid for the user UUID and doc_uuid for the document UUID; file_name is the name of the JSON file with the input data. The task IDs should be 2a, 2b, 3a, 3b, 4, 5d, 6, 7, matching the tasks above (task ID 7 should run Task 6 and automatically launch a GUI with fields to input the document and (optionally) user IDs, and show the resulting also-likes graph).
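A sketch of the command-line parsing with `argparse`. The exact command line is not reproduced in this text, so the flag letters `-u`, `-d`, `-t` and `-f` below are hypothetical, chosen only to mirror the placeholders user_uuid, doc_uuid, task_id and file_name. Restricting `-t` to the listed task IDs lets argparse reject typos before any data is loaded.

```python
import argparse

TASK_IDS = ["2a", "2b", "3a", "3b", "4", "5d", "6", "7"]
# Tasks assumed to need a document UUID up front (task 7 asks for it in the GUI).
NEEDS_DOC = {"2a", "2b", "5d", "6"}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the (hypothetical) -u/-d/-t/-f command line."""
    parser = argparse.ArgumentParser(description="Document tracker analysis")
    parser.add_argument("-u", dest="user_uuid", help="visitor UUID (optional)")
    parser.add_argument("-d", dest="doc_uuid", help="document UUID")
    parser.add_argument("-t", dest="task_id", required=True, choices=TASK_IDS)
    parser.add_argument("-f", dest="file_name", required=True,
                        help="line-delimited JSON input file")
    args = parser.parse_args(argv)
    if args.task_id in NEEDS_DOC and not args.doc_uuid:
        # parser.error prints a usage message and exits with status 2.
        parser.error(f"task {args.task_id} needs a document UUID (-d)")
    return args
```

The parsed `task_id` can then select a handler from a dictionary keyed by task ID, which keeps the dispatch logic in one place.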

The report should be between 8 and 12 pages long and use the following format (if you need space for additional screenshots, put them into an appendix, which does not count against the page limit, but don't rely on the screenshots in your discussion):

1. Introduction: State the purpose of the report, your remit and any assumptions you have made during the development process.

2. Requirements' checklist: Here you should clearly show which requirements you have delivered and which you haven't.

3. Design Considerations: Here you should clearly state what you have done to your application to make it more usable and accessible.

4. User Guide: Use screen shots of the running application along with text descriptions to help you describe how to operate the application.

5. Developer Guide: Describe your application design and main areas of code in order to help another developer understand your work and how they might develop it. You may find it useful to supplement the text with code fragments.

6. Testing: Show the results of testing all cases and demonstrate that the outputs are as expected. If certain conditions cause erroneous results or make the application crash, report these honestly.

7. Reflections on programming language and implementation: Based on your experience in implementing this application, reflect on which language features and technologies have been most helpful, identify limitations of your application and suggest ways to overcome these limitations. Also reflect on the usability of the (kind of) language (either systems or scripting language) for this application domain, and on its wider applicability.

8. What did I learn from CW1? A short discussion on lessons learnt from the feedback given on CW1 and a discussion of how you integrated this feedback into CW2. Cover both coding and report writing, and possibly more (project management, preparing for interview-style questions, etc.).

9. Conclusions: Reflect on what you are most proud of in the application and what you'd have liked to have done differently.
