Identifying inconsistent data encodings

Reference no: EM13998102

Data Preparation - Cleaning up any issues in the data to allow it to be analyzed using various software tools such as Tableau. In a project, this phase can take 80 to 90% of the overall effort.

• Decide how to handle any blank values. If a blank means the value is unknown, you may want to leave it blank. On the other hand, if a blank means "not applicable", you may want to replace the blank cell with "NA".
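
As a rough illustration of the blank-handling decision, the sketch below uses plain Python rather than any particular tool. The rows, the column names, and the choice of which column treats a blank as "not applicable" are all hypothetical assumptions made for the example.

```python
# Hypothetical rows: "" stands for a blank cell as exported from a spreadsheet.
rows = [
    {"name": "Ann", "age": "34", "spouse_name": "Raj"},
    {"name": "Ben", "age": "", "spouse_name": ""},
]

# Assumption for this sketch: a blank age is unknown, while a blank
# spouse_name means the question did not apply.
NOT_APPLICABLE_COLUMNS = {"spouse_name"}


def handle_blanks(row, na_columns=NOT_APPLICABLE_COLUMNS):
    """Return a copy of row with blanks in na_columns replaced by "NA"."""
    cleaned = {}
    for column, value in row.items():
        is_blank = value is None or str(value).strip() == ""
        if is_blank and column in na_columns:
            cleaned[column] = "NA"  # blank means "not applicable"
        elif is_blank:
            cleaned[column] = ""  # blank means unknown: leave it blank
        else:
            cleaned[column] = value
    return cleaned


cleaned_rows = [handle_blanks(row) for row in rows]
```

Deciding per column keeps the two meanings of a blank separate, so later analysis can tell a missing answer from one that never applied.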

• If feasible, merge two or more tables that hold different information about the same objects. The tables need a common field to be joined on.
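
A minimal sketch of joining two tables on a common field, again in plain Python. The `stores` and `sales` tables and the `store_id` field are invented for illustration; a tool such as Tableau or a database would normally do this step.

```python
# Hypothetical tables sharing the common field "store_id".
stores = [
    {"store_id": 1, "city": "Austin"},
    {"store_id": 2, "city": "Boise"},
]
sales = [
    {"store_id": 1, "revenue": 1200},
    {"store_id": 2, "revenue": 950},
    {"store_id": 3, "revenue": 400},  # no matching store record
]


def inner_join(left, right, key):
    """Join two lists of dicts on key, keeping only rows present in both."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


merged = inner_join(stores, sales, "store_id")

# Rows that could not be joined are worth reviewing rather than silently dropping.
known_ids = {row["store_id"] for row in stores}
unmatched = [row for row in sales if row["store_id"] not in known_ids]
```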

• Manually (or using tools if available), review the data to look for unusual patterns or distributions in the data that might call into question the validity of the data. It involves using a critical eye to examine the data.

Identifying inconsistent data encodings (e.g., different abbreviations might be used for state)
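
One possible way to surface inconsistent state encodings is to count the distinct spellings and map each to a canonical code. The sample values and the `CANONICAL` lookup below are assumptions for the sketch; a real lookup would have to cover whatever variants appear in the actual data.

```python
from collections import Counter

# Hypothetical column of state values with mixed encodings.
states = ["TX", "Texas", "tx", "Tex.", "CA", "Calif.", "California", "N/A?"]

# Assumed lookup from lowercase variants to one canonical abbreviation.
CANONICAL = {
    "tx": "TX", "texas": "TX", "tex": "TX",
    "ca": "CA", "california": "CA", "calif": "CA",
}


def normalize_state(value):
    """Return the canonical code, or None if the variant is not recognized."""
    key = value.strip().lower().rstrip(".")
    return CANONICAL.get(key)


# Counting raw spellings shows how many encodings are in use.
variant_counts = Counter(states)

normalized = [normalize_state(s) for s in states]

# Values the lookup does not cover need a manual decision.
unrecognized = [s for s, n in zip(states, normalized) if n is None]
```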

Identifying suspicious data responses (e.g., physically implausible numbers entered for a response, or the same answer given to every question on a survey).
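
The "same answer for all the questions" case can be checked mechanically. The sketch below assumes hypothetical survey responses keyed by a respondent id; flagged respondents are candidates for review, not automatic removal.

```python
# Hypothetical survey responses: respondent id -> answers on a 1-5 scale.
responses = {
    "r1": [4, 2, 5, 3, 4],
    "r2": [3, 3, 3, 3, 3],  # same answer to every question
    "r3": [1, 1, 2, 1, 1],
}


def gave_same_answer_throughout(answers):
    """True when there is more than one answer and all answers are identical."""
    return len(answers) > 1 and len(set(answers)) == 1


suspicious = [
    respondent
    for respondent, answers in responses.items()
    if gave_same_answer_throughout(answers)
]
```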

Are there outliers that don't seem to make sense? For example, six-figure salaries for teenagers, or average traffic at a store that is typically in the thousands but with some values in the tens or in the millions.
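
A simple way to catch values that are off by orders of magnitude is to compare each one with the median. The traffic figures and the factor of 10 below are assumptions chosen to mirror the store-traffic example; the check also assumes all values are positive.

```python
from statistics import median

# Hypothetical daily traffic counts, typically in the thousands.
daily_traffic = [2100, 1850, 2400, 12, 1990, 3_200_000, 2250]


def flag_magnitude_outliers(values, factor=10):
    """Return values more than `factor` times above or below the median."""
    typical = median(values)
    return [v for v in values if v > typical * factor or v < typical / factor]


outliers = flag_magnitude_outliers(daily_traffic)
```

The median is used instead of the mean because a single value in the millions would drag the mean upward and hide the very outliers being looked for.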

• Perform any other needed data preparation required. This is an open-ended step and specific details will depend on the changes needed and software tools used. Make sure to

• Compare the data provided as well as the data that you have prepared to the questions to be analyzed from the Business Understanding phase. Does it appear that it is possible to answer the questions from the data provided?

If you are missing needed data, and the sponsor neither has the data nor can generate it, the project needs to be revised or cancelled. Make sure to document the data that is needed. If feasible, determine how this data can be collected or generated for future analysis.

• Keep track of issues found during this phase. This might lead to a recommendation back to the sponsor to capture the data using a different format or method, reducing the effort needed to clean it. In some cases, this can be one of the more valuable contributions of your project. Data preparation can take 80 to 90% of a project's overall time and resources.

If issues can be reduced going forward, this can save a great deal of time and money and make further analysis easier to perform.




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd