Identifying inconsistent data encodings

Reference no: EM13998102

Data Preparation: clean up any issues in the data so that it can be analyzed with software tools such as Tableau. In a project, this phase can take 80 to 90% of the overall effort.

• Decide how to handle any blank values. If a blank means the value is unknown, you may want to leave it blank. On the other hand, if a blank means "not applicable", you may want to replace the blank cell with "NA".

• If feasible, merge tables that contain different information about the same objects. Joining two or more tables requires a common field that appears in each of them.

• Review the data manually (or with tools, if available) for unusual patterns or distributions that might call its validity into question. This requires examining the data with a critical eye.

Identifying inconsistent data encodings (e.g., different abbreviations used for the same state)

Identifying suspicious data responses (e.g., physically implausible numbers entered for a response, or a survey respondent giving the same answer to every question)

Are there outliers that don't make sense? For example, six-figure salaries for teenagers, or a store whose average traffic is typically in the thousands but shows some values in the tens or in the millions.

• Perform any other data preparation required. This is an open-ended step, and the specific details will depend on the changes needed and the software tools used. Make sure to

• Compare both the data provided and the data you have prepared against the questions to be analyzed from the Business Understanding phase. Does it appear possible to answer those questions from the data provided?

If needed data is missing, and the sponsor neither has the data nor can generate it, the project needs to be revised or cancelled. Make sure to document the data that is needed. If feasible, determine how this data can be collected or generated for future analysis.

• Keep track of the issues found during this phase. They might lead to recommendations back to the sponsor to capture the data in a different format or by a different method in the first place, reducing the effort needed to clean it. In some cases, this can be one of the more valuable contributions of your project, since data preparation can take 80 to 90% of a project's overall time and resources.

If such issues can be reduced going forward, this can save a great deal of time and money and allow further analysis to be performed more easily.
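The blank-value step in the list above can be sketched in Python. This is a minimal illustration, assuming each table has been loaded as a list of dictionaries and that you have already decided which fields (here the hypothetical `spouse_name`) use a blank to mean "not applicable"; blanks in every other field are treated as unknown and left blank.

```python
# Sketch only: which fields treat a blank as "not applicable" is a
# project-specific decision; the field names below are hypothetical.

def is_blank(value):
    """True for None or a string that is empty after trimming whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def handle_blanks(rows, not_applicable_fields):
    """Return cleaned copies of rows (a list of dicts).

    Blanks in fields listed in not_applicable_fields become "NA";
    blanks anywhere else are treated as unknown and left blank ("").
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for field, value in row.items():
            if not is_blank(value):
                new_row[field] = value
            elif field in not_applicable_fields:
                new_row[field] = "NA"  # the question does not apply
            else:
                new_row[field] = ""    # the answer is unknown: leave it blank
        cleaned.append(new_row)
    return cleaned


rows = [{"name": "Ann", "spouse_name": " ", "income": None}]
print(handle_blanks(rows, not_applicable_fields={"spouse_name"}))
# [{'name': 'Ann', 'spouse_name': 'NA', 'income': ''}]
```

Keeping "unknown" and "not applicable" distinct matters later: an "NA" can be excluded from an analysis on purpose, while a true blank may point to data the sponsor still needs to supply.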
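Merging tables on a common field, as described in the list above, amounts to a join. The sketch below shows an inner join in plain Python, assuming both tables are lists of dictionaries that share a key column (the `customer_id`, `name`, and `total` fields are hypothetical). Tools such as Tableau or SQL have their own join features; this only illustrates the idea. The helper that lists unmatched keys is an added suggestion: keys that fail to match are often a sign of inconsistent encodings.

```python
def join_tables(left, right, key):
    """Inner-join two lists of dict rows on a shared key field.

    Each left row is combined with every right row that has the same key
    value; left rows with no match are dropped. If both tables contain a
    column with the same name, the right-hand value wins.
    """
    # Index the right table by key so each lookup is a dict access.
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in by_key.get(left_row[key], []):
            merged = dict(left_row)
            merged.update(right_row)
            joined.append(merged)
    return joined


def unmatched_keys(left, right, key):
    """Key values in left that have no partner in right (worth investigating)."""
    right_keys = {row[key] for row in right}
    return [row[key] for row in left if row[key] not in right_keys]


customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bo"}]
orders = [{"customer_id": 1, "total": 50}, {"customer_id": 1, "total": 20},
          {"customer_id": 3, "total": 5}]
print(join_tables(customers, orders, "customer_id"))
print(unmatched_keys(customers, orders, "customer_id"))  # [2]
```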
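For the inconsistent-encoding check in the list above, one common approach is a lookup table that maps every observed variant to a single canonical code, plus a report of anything the table cannot map. The sketch assumes two-letter USPS codes as the target encoding; the alias table is deliberately incomplete and only illustrative.

```python
# Illustrative and deliberately incomplete: a real table would cover every
# state plus every variant actually observed in the data.
STATE_ALIASES = {
    "ca": "CA", "calif": "CA", "calif.": "CA", "california": "CA",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
    "tx": "TX", "tex": "TX", "tex.": "TX", "texas": "TX",
}


def normalize_state(value):
    """Map a raw state entry to its two-letter USPS code, or None if unknown."""
    if value is None:
        return None
    return STATE_ALIASES.get(value.strip().lower())


def unrecognized_states(values):
    """Distinct raw entries the alias table cannot map, for manual review."""
    return sorted({v for v in values if v is not None and normalize_state(v) is None})


raw = ["CA", "Calif.", " california", "N.Y.", "Texas", "Cali", "NYC"]
print([normalize_state(v) for v in raw])
# ['CA', 'CA', 'CA', 'NY', 'TX', None, None]
print(unrecognized_states(raw))
# ['Cali', 'NYC']
```

Returning None for unmapped values, rather than guessing, keeps the decision about entries such as "NYC" (a city, not a state) with a person reviewing the data.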
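The suspicious-response example in the list above (the same answer to every survey question, often called straight-lining) is easy to flag automatically. This sketch assumes responses are stored as a mapping from a respondent ID to a list of answers; the `min_questions=5` cutoff is an arbitrary assumption, not a standard.

```python
def flag_straight_lining(responses, min_questions=5):
    """Return IDs of respondents who gave the same answer to every question.

    responses maps a respondent ID to that respondent's list of answers.
    Surveys shorter than min_questions are skipped, since identical answers
    to two or three questions are not very suspicious.
    """
    flagged = []
    for respondent_id, answers in responses.items():
        if len(answers) >= min_questions and len(set(answers)) == 1:
            flagged.append(respondent_id)
    return flagged


survey = {
    "r1": [3, 3, 3, 3, 3, 3],
    "r2": [1, 4, 2, 5, 3, 4],
    "r3": [2, 2],
}
print(flag_straight_lining(survey))  # ['r1']
```

Flagged respondents are candidates for review, not automatic deletion; a respondent may genuinely hold the same opinion on every question.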
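The outlier questions in the list above can be approached with a general statistical rule and with domain-specific checks. The sketch below uses Tukey's fences (values more than 1.5 interquartile ranges beyond the quartiles), which is one common rule of thumb swapped in here for illustration, not a method the assignment prescribes. The teenage-salary check, its 13 to 19 age range, and the 100,000 threshold are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two values; k=1.5 is the conventional choice.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def implausible_teen_salaries(rows, max_salary=100_000):
    """Rows where someone aged 13-19 reports a salary of max_salary or more."""
    return [r for r in rows if 13 <= r["age"] <= 19 and r["salary"] >= max_salary]


daily_traffic = [1100, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 12, 1_000_000]
print(iqr_outliers(daily_traffic))  # [12, 1000000]

people = [{"age": 17, "salary": 250_000}, {"age": 45, "salary": 250_000}]
print(implausible_teen_salaries(people))  # [{'age': 17, 'salary': 250000}]
```

A statistical rule only says a value is unusual; whether it is wrong (a typo, a unit mix-up) or a real event (a holiday sale) still requires the critical eye the step calls for.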
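Comparing the data against the Business Understanding questions, as described in the list above, is largely a judgment call, but one mechanical part of it can be sketched: checking that every field each question depends on actually exists in the prepared data. The mapping from questions to required fields is an assumption you would write by hand, and the questions and column names below are hypothetical. The result also serves as a record of the data that is needed.

```python
def unanswerable_questions(questions, available_fields):
    """Report which business questions the prepared data cannot answer.

    questions maps each question to the set of fields it needs.
    Returns {question: sorted list of missing fields} for questions with gaps.
    """
    available = set(available_fields)
    gaps = {}
    for question, needed in questions.items():
        missing = sorted(set(needed) - available)
        if missing:
            gaps[question] = missing
    return gaps


questions = {
    "Which region has the highest sales?": {"region", "sales"},
    "Do promotions increase repeat visits?": {"promotion_id", "customer_id", "visit_date"},
}
prepared_columns = ["region", "sales", "customer_id", "visit_date"]
print(unanswerable_questions(questions, prepared_columns))
# {'Do promotions increase repeat visits?': ['promotion_id']}
```

Having a column is necessary but not sufficient: a field that exists but is mostly blank or inconsistently encoded may still leave a question unanswerable.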
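Keeping track of issues, as the list above recommends, can be as simple as a running log. The hypothetical `IssueLog` class below is a sketch, not any particular tool: it records each problem with an optional recommendation for the sponsor and counts which table columns cause the most trouble, which is a reasonable place to start when deciding what to recommend.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IssueLog:
    """Running record of data-quality issues found during data preparation."""
    entries: list = field(default_factory=list)

    def record(self, table, column, problem, recommendation=""):
        """Add one issue, with an optional recommendation for the sponsor."""
        self.entries.append({
            "table": table,
            "column": column,
            "problem": problem,
            "recommendation": recommendation,
        })

    def hotspots(self, n=3):
        """The n (table, column) pairs with the most issues, most frequent first."""
        counts = Counter((e["table"], e["column"]) for e in self.entries)
        return counts.most_common(n)


log = IssueLog()
log.record("customers", "state", "mixed abbreviations (CA, Calif., California)",
           "capture state with a drop-down of two-letter codes")
log.record("customers", "state", "blank entries")
log.record("survey", "q1-q10", "same answer to every question")
print(log.hotspots(1))  # [(('customers', 'state'), 2)]
```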

Copyright © 2019-2020 ExpertsMind IT Educational Pvt Ltd. All rights reserved.