Analyze the class performances

Reference no: EM13536180

Question 1: As a highly application-driven discipline, data mining has been widely applied in many areas. We briefly presented two highly successful and popular application examples of data mining: business intelligence and Web search engines, in our textbooks. Do you think that data mining can also be applied to the following areas?

If yes, please provide a brief yet concrete example; if not, briefly state your reasons.

1) Software Engineering.

2) Transportation.

3) Sociology.

Question 2: Suppose a student collected the price and weight of 20 products in a shop, with the following results:

product   price ($)   weight
1         11.78       3.2
2         85.12       3.4
3         10.47       4.5
4         298.00      35.4
5         38.45       9.1
6         102.14      5.7
7         123.62      1.5
8         203.29      23.8
9         65.00       8.6
10        225.50      42.3
11        9.25        5.9
12        164.32      12.3
13        102.45      6.5
14        120.45      11.8
15        73.15       12.2
16        625.00      32.9
17        125.00      11.6
18        242.64      48.0
19        441.76      52.9
20        325.45      78.2

Q2.1. Calculate the mean, Q1, median, Q3, and standard deviation of price and weight.

Q2.2. Draw the boxplots for price and weight.

Q2.3. Draw a scatter plot and a Q-Q plot based on these two variables.

Q2.4. Normalize the two variables based on the min-max normalization (min = 1, max = 10).

Q2.5. Normalize the two variables based on the z-score normalization.

Q2.6. Calculate the Pearson correlation coefficient. Are these two variables positively or negatively correlated?

Q2.7. Take the prices of the above 20 products and partition them into four bins using each of the following methods:

1) equal-width partitioning

2) equal-frequency (equal-depth) partitioning
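
The computations named in Question 2 can be sketched in Python with only the standard library. This is a minimal sketch, not a model answer: the function names are hypothetical, the quartiles use the default convention of `statistics.quantiles` (other textbooks and tools use different conventions and give slightly different Q1 and Q3), `summary` reports the sample standard deviation while the z-score uses the population standard deviation (either choice should be stated in an answer), and the binning shows the two common unsupervised methods, equal-width and equal-frequency.

```python
import statistics as st

# Data copied from the question.
price = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
weight = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]

def summary(values):
    """Mean, quartiles, and sample standard deviation (Q2.1)."""
    q1, median, q3 = st.quantiles(values, n=4)  # default 'exclusive' method
    return {"mean": st.mean(values), "q1": q1, "median": median,
            "q3": q3, "stdev": st.stdev(values)}

def min_max(values, new_min=1.0, new_max=10.0):
    """Min-max normalization onto [new_min, new_max] (Q2.4)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score normalization (Q2.5); population std is an assumption."""
    mu, sigma = st.mean(values), st.pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient (Q2.6)."""
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def equal_width_bins(values, k=4):
    """Split the value range into k intervals of equal width (Q2.7)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # The maximum value would land in bin k, so clamp it into the last bin.
        bins[min(int((v - lo) / width), k - 1)].append(v)
    return bins

def equal_frequency_bins(values, k=4):
    """Split the sorted values into k bins of (nearly) equal size (Q2.7)."""
    ordered = sorted(values)
    n = len(ordered)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]

if __name__ == "__main__":
    print(summary(price))
    print(summary(weight))
    print(pearson(price, weight))
    print(equal_width_bins(price))
    print(equal_frequency_bins(price))
```

Equal-width binning is sensitive to outliers such as the $625.00 item, which stretches the range and leaves the upper bins nearly empty; equal-frequency binning avoids that at the cost of uneven interval widths.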

Question 3: Design a data warehouse for a university's gradebook data to analyze the class performances. Suppose the data warehouse consists of the following dimensions: department, semester, course, student, instructor, and gradebook; and a set of measures you would like to define.

1. Draw a star schema, based on your consideration of the power and convenience of analysis of the warehouse.
2. Is "top 10% in a class" a holistic or an algebraic measure? Discuss how to develop an efficient (maybe approximate) method to compute a query like: find those Engineering students whose final score is within the top 10% in class in at least 80% of the CS courses that he or she has taken.
3. Is it a good idea to merge this data warehouse and the university's current gradebook database system together into one big data management/analysis system? Why?
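
To make the "top 10% in at least 80% of courses" query concrete, here is a minimal exact (not approximate) sketch in Python. Under the usual textbook definitions, a rank-based measure such as "top 10%" is generally treated as holistic, because it needs the full score distribution of each class rather than a bounded set of sub-aggregates; that is why the code sorts every class. The function name, the record layout `(student, course, score)`, and the tie rule are assumptions made for illustration, and the input is assumed to be pre-filtered to Engineering students' rows joined with all scores of the CS courses concerned.

```python
import math
from collections import defaultdict

def frequent_top_students(records, top=0.10, min_share=0.80):
    """Return students who are in the top `top` fraction of a class
    in at least `min_share` of the courses they have taken.

    records: iterable of (student, course, score) tuples (hypothetical layout).
    """
    by_course = defaultdict(list)
    for student, course, score in records:
        by_course[course].append((score, student))

    taken = defaultdict(int)   # courses taken per student
    in_top = defaultdict(int)  # courses where the student made the cutoff
    for rows in by_course.values():
        rows.sort(reverse=True)                   # full sort: the holistic step
        k = max(1, math.ceil(top * len(rows)))    # size of the top group
        cutoff = rows[k - 1][0]                   # ties at the cutoff count as top
        for score, student in rows:
            taken[student] += 1
            if score >= cutoff:
                in_top[student] += 1

    return sorted(s for s in taken if in_top[s] / taken[s] >= min_share)

if __name__ == "__main__":
    # Two hypothetical classes of ten students each.
    demo = [(f"s{i}", "CS101", 100 - 10 * i) for i in range(10)]
    demo += [(f"s{i}", "CS102", 95 - 5 * i) for i in range(10)]
    print(frequent_top_students(demo))
```

An approximate variant could replace the per-class sort with a precomputed 90th-percentile cutoff (for example from a histogram or a sample of each class) stored alongside the cube, so that the query only compares each score against one stored value.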

Question 4:

A location-based social networking website which provides check-in services hires you to help them build a data warehouse.

Users of this service can "check-in" at venues using mobile device applications by running the applications and selecting from a list of venues that the application locates nearby. Also, users can "add" each other as "friends". The website also has sufficient information about venues, including address, GPS location, and category of the venue (e.g., a Japanese restaurant), and users tend to provide their personal information to the website when they register.

1. Design a data warehouse that may facilitate effective on-line analytical processing for this website (provide both schema and measures, also explain why).

2. Check-in data collected from the website and mobile applications are noisy. Besides network and device errors, are there any other reasons which might cause noise in this data set? For the reason you come up with, discuss a method that can clean up check-in data effectively in the data warehouse.

3. One may like to perform on-line analytical processing on the check-in data at different venues by month, by city, and by category (Italian, Japanese, etc.). How can this be done efficiently in the data warehouse?

4. Hackers create fake profiles on this website. They use bots to manipulate the fake profiles, generate fake check-in data, and try to add everyone as a friend (yes, this is a common problem for many social network websites, and no, I am not telling you how to write bots). Although bots try to mimic real users, they still behave differently: they check in at random places (Chicago this minute, Las Vegas the next), they check in far more often than real users, and their social network structures are usually very large but also very sparse (your friends on Facebook tend to form communities, but bots' friends do not). Discuss possible solutions for identifying fake profiles (bots) in your data warehouse.
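
Two of the behavioral signals described in item 4 (implausible travel between consecutive check-ins and an unusually high check-in rate) can be sketched as simple rules. This is only an illustrative sketch: the function names, the record layout `(user, unix_timestamp, latitude, longitude)`, and both thresholds are hypothetical and would need tuning on real data. The third signal, a large but sparse friend graph, is not covered here.

```python
import math
from collections import defaultdict

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_suspicious(checkins, max_kmh=1000.0, max_per_day=50):
    """Flag users whose check-ins imply impossible travel or an extreme rate.

    checkins: iterable of (user, unix_timestamp, lat, lon) tuples.
    max_kmh and max_per_day are hypothetical thresholds.
    """
    by_user = defaultdict(list)
    for user, ts, lat, lon in checkins:
        by_user[user].append((ts, lat, lon))

    flagged = set()
    for user, rows in by_user.items():
        rows.sort()  # chronological order

        # Rule 1: too many check-ins on any single (UTC) day.
        per_day = defaultdict(int)
        for ts, _, _ in rows:
            per_day[int(ts // 86400)] += 1
        if max(per_day.values()) > max_per_day:
            flagged.add(user)

        # Rule 2: implied speed between consecutive check-ins is too high.
        for (t1, la1, lo1), (t2, la2, lo2) in zip(rows, rows[1:]):
            hours = max(t2 - t1, 1) / 3600.0  # avoid division by zero
            if haversine_km(la1, lo1, la2, lo2) / hours > max_kmh:
                flagged.add(user)
                break
    return flagged

if __name__ == "__main__":
    demo = [
        ("bot", 0, 41.8781, -87.6298),      # Chicago
        ("bot", 60, 36.1699, -115.1398),    # Las Vegas one minute later
        ("alice", 0, 41.8781, -87.6298),
        ("alice", 3600, 41.8800, -87.6300), # a short walk, one hour later
    ]
    print(flag_suspicious(demo))
```

In a warehouse setting, rules like these would more likely run as a cleaning or scoring step during loading, with the resulting flag stored as an attribute of the user dimension so that flagged profiles can be excluded from aggregates.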






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd