Design a web scraping program in Python to collect weather forecast report data

Reference no: EM131963781

SIT742 Modern Data Science 2018 T1

Assignment: General data processing and using big data

This is an individual assignment on the understanding of data science, big data and their applications. It contains written answers and some programming tasks based on topics presented in Weeks 1 to 3.

The assignment is broken down into the three tasks below. You can use Google to find the data sources (i.e., websites). After your practice, write down your executable Python code and present the collected data in tables to support the demonstrations each task asks for. You also need to write several paragraphs explaining your comparison and drawing a conclusion.

Task 1. Data Acquisition

Design a web scraping program in Python to collect weather forecast report data for a city (e.g., Melbourne) from a website, such as temperature, humidity and weather status (cloudy, sunny, etc.), and store the data in a csv file. Do this task in both of the following ways:

(1) Collecting data by regular interval sampling. Find the best sampling interval in terms of space efficiency and demonstrate, using numeric results, why it is the best solution.

(2) Collecting data by change detection. Store a data object only when any item of the weather forecast report data changes on the website.
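Both ways listed above share the same scraping step. The following is a minimal sketch of that step using only the Python standard library. The page URL (`FORECAST_URL`) and the class names `temperature`, `humidity` and `status` are hypothetical placeholders; a real forecast page uses its own markup, so inspect its HTML and adjust `FIELD_CLASSES` first. Each call to `scrape_once` appends one timestamped row to the csv file.

```python
import csv
import os
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical page URL: replace with the forecast page you actually choose.
FORECAST_URL = "https://example.com/melbourne-forecast"
# Hypothetical class names marking each field in that page's HTML.
FIELD_CLASSES = {"temperature", "humidity", "status"}
CSV_HEADER = ["timestamp", "temperature", "humidity", "status"]


class ForecastParser(HTMLParser):
    """Collects the first text found inside elements whose class names a field."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        for name in (dict(attrs).get("class") or "").split():
            if name in FIELD_CLASSES:
                self._current = name

    def handle_endtag(self, tag):
        self._current = None  # stop waiting once any element closes

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None


def parse_forecast(html):
    parser = ForecastParser()
    parser.feed(html)
    return parser.fields


def format_row(fields, timestamp):
    """Order the scraped fields to match CSV_HEADER (missing fields become empty cells)."""
    return [timestamp, fields.get("temperature"), fields.get("humidity"), fields.get("status")]


def scrape_once(path="weather.csv"):
    """Fetch the page once and append a UTC-timestamped row to `path`."""
    with urlopen(FORECAST_URL, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    fields = parse_forecast(html)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(CSV_HEADER)
        writer.writerow(format_row(fields, stamp))
    return fields
```

Libraries such as `requests` and BeautifulSoup are common alternatives; the standard library is used here only to keep the sketch dependency-free.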

In both cases, record the weather data together with their timestamps. Then compare the two collection methods, conclude which one is optimal and demonstrate this using numeric results. (Please refer to Lecture 2.)
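One way to obtain the numeric results, sketched under stated assumptions: first log a high-resolution reference trace (assumed here to be one `(minute, reading)` pair per minute, where `reading` is a tuple such as `(temperature, status)`), then replay it offline through both methods. For each candidate interval the sketch counts stored rows and "missed states" (runs of identical readings that no sample falls into), and it measures each method's output size as CSV bytes. The ranking rule in `best_interval` (fewest missed states, then fewest rows) is an illustrative assumption, not a criterion given in the assignment.

```python
import csv
import io


def runs(trace):
    """Split a per-minute trace of (minute, reading) pairs into (start, end) runs of identical readings."""
    segments, start = [], trace[0][0]
    for (prev_t, prev_v), (t, v) in zip(trace, trace[1:]):
        if v != prev_v:
            segments.append((start, prev_t))
            start = t
    segments.append((start, trace[-1][0]))
    return segments


def interval_sample(trace, interval):
    """Method (1): keep the readings taken at minutes divisible by `interval`."""
    return [(t, v) for t, v in trace if t % interval == 0]


def missed_states(trace, sampled):
    """Count runs (weather states) in which no sample was taken."""
    minutes = {t for t, _ in sampled}
    return sum(1 for s, e in runs(trace) if not any(m in minutes for m in range(s, e + 1)))


def best_interval(trace, candidates):
    """Prefer intervals that miss the fewest states, then those storing the fewest rows."""
    def score(interval):
        sampled = interval_sample(trace, interval)
        return (missed_states(trace, sampled), len(sampled))
    return min(candidates, key=score)


def record_on_change(trace):
    """Method (2): keep a reading only when it differs from the last stored one."""
    stored = []
    for t, v in trace:
        if not stored or v != stored[-1][1]:
            stored.append((t, v))
    return stored


def csv_size_bytes(records):
    """Size in bytes of the records written as UTF-8 CSV rows of (minute, *reading)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for t, reading in records:
        writer.writerow([t, *reading])
    return len(buf.getvalue().encode("utf-8"))
```

Running `best_interval` over a real day-long trace, then tabulating `len(...)` and `csv_size_bytes(...)` for each candidate interval and for `record_on_change`, gives the kind of table the comparison asks for. Space is only one axis: a fair comparison should also consider how quickly each method reflects a change on the website, and change detection still has to poll frequently to notice changes.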

Task 2. Data Integration

Use the optimal method you demonstrated above to collect weather report data from more than one website, integrate the data from the different sources (websites) and write the integrated data into a csv file. Please demonstrate:

(1) how to do schema alignment; and

(2) how to determine which value is correct when data from different sources do not agree with each other.
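Schema alignment can be sketched as an explicit mapping from each source's field names and units onto one global schema (`temperature_c`, `humidity_pct`, `status`). The two sources `site_a` and `site_b`, their field names and the Fahrenheit and percentage-string formats are hypothetical examples; build the real mappings by inspecting the sites you actually scrape.

```python
# Hypothetical per-source mappings: global attribute -> (source field name, converter).
SCHEMA_MAPPINGS = {
    "site_a": {
        "temperature_c": ("temp_c", float),
        "humidity_pct": ("rh", float),
        "status": ("conditions", str.lower),
    },
    "site_b": {
        # This source is assumed to report Fahrenheit; convert to Celsius.
        "temperature_c": ("Temperature (F)", lambda f: round((float(f) - 32) * 5 / 9, 1)),
        # Humidity assumed to arrive as a string like "62%".
        "humidity_pct": ("Humidity", lambda h: float(str(h).rstrip("%"))),
        "status": ("Summary", str.lower),
    },
}


def align(source, record):
    """Rename and convert one source record into the global schema."""
    mapping = SCHEMA_MAPPINGS[source]
    return {attr: convert(record[field]) for attr, (field, convert) in mapping.items()}
```

Value vocabularies also need aligning (for example, one site may say `overcast` where another says `cloudy`); a lookup table applied inside the `status` converter is one way to handle that.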

(Please survey the existing techniques and use one of them to resolve the problem. Lecture 2 provides some basic concepts, and you may do a broader search yourself.)
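One family of techniques for resolving conflicting values is truth discovery, which estimates source reliability and the true values together. The function below is a simplified iterative weighted vote in that spirit, not a faithful implementation of any specific published algorithm: it starts by trusting all sources equally, picks the highest-weighted value for each object (e.g., each timestamp), sets each source's weight to the fraction of its claims that agree with the current choices, and repeats. It assumes categorical values such as `status`; numeric fields such as temperature are more naturally resolved with a weighted mean or median.

```python
from collections import defaultdict


def resolve_conflicts(claims, iterations=5):
    """claims: {object_id: {source: value}}. Returns (chosen values, source weights)."""
    # Index claims by source so each source's accuracy is easy to recompute.
    by_source = defaultdict(list)
    for obj, per_obj in claims.items():
        for source, value in per_obj.items():
            by_source[source].append((obj, value))
    weights = {source: 1.0 for source in by_source}  # trust every source equally at first
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote for each object.
        for obj, per_obj in claims.items():
            votes = defaultdict(float)
            for source, value in per_obj.items():
                votes[value] += weights[source]
            truths[obj] = max(votes, key=votes.get)
        # Step 2: a source's weight becomes the share of its claims matching the current choices.
        for source, made in by_source.items():
            weights[source] = sum(truths[obj] == value for obj, value in made) / len(made)
    return truths, weights
```

The returned weights double as a per-source reliability estimate, which is useful evidence when explaining why one website was trusted over another.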

Task 3. Missing Data Prediction

Using the data you collected in Tasks 1 and 2, design a method to predict a missing data object, for example, between two consecutive data objects (time, temperature) in your csv file such as:

11:00AM, 15
12:00PM, 17

where the user wants to query the temperature at 11:30AM.
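As a baseline worth comparing against the techniques found in your survey, linear interpolation assumes the temperature changes at a constant rate between two neighbouring records $(t_1, T_1)$ and $(t_2, T_2)$:

$$
T(t) = T_1 + \frac{t - t_1}{t_2 - t_1}\,(T_2 - T_1)
$$

With the rows above, $t_1$ = 11:00AM, $T_1 = 15$, $t_2$ = 12:00PM, $T_2 = 17$ and $t$ = 11:30AM:

$$
T(\text{11:30AM}) = 15 + \frac{30\ \text{min}}{60\ \text{min}}\,(17 - 15) = 16
$$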

(Please survey the existing techniques and use one of them to resolve the problem. Lecture 3 provides some basic concepts, and you may do a broader search yourself.)
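A minimal sketch of the linear-interpolation baseline in Python. It assumes rows are `(time_text, value)` pairs sorted by time, with times written like `11:00AM` as in the example and all on the same day; `interpolate` is a hypothetical helper name. Rows read with `csv.reader` from a two-column file have this shape.

```python
from datetime import datetime


def parse_time(text):
    """Parse times written like '11:00AM' (the format used in the example rows)."""
    return datetime.strptime(text.strip(), "%I:%M%p")


def interpolate(rows, query):
    """Linearly interpolate the value at `query` from (time_text, value) rows sorted by time."""
    q = parse_time(query)
    points = [(parse_time(t), float(v)) for t, v in rows]
    for (t1, v1), (t2, v2) in zip(points, points[1:]):
        if t1 <= q <= t2:
            if t2 == t1:
                return v1
            frac = (q - t1) / (t2 - t1)  # timedelta / timedelta gives a float
            return v1 + frac * (v2 - v1)
    raise ValueError("query time is outside the recorded range")
```

Other options from the survey (for example, spline interpolation or a regression model fitted over more neighbouring points) can replace the body of `interpolate` while keeping the same interface.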

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd