Reference no: EM133019623
Research and Development: You need to simulate a distributed DBMS
Problem Scenario: A company "Data5408" has two branches, VM1 and VM2. Assume that the datasets you received from Kaggle belong to "Data5408". In this question, you need to perform two tasks:
Task 1: Build Distributed Database
• If the datasets are converted into database tables and database(s), how will they be placed across the sites? State your reasons (e.g., why you chose a specific fragmentation scheme, transparency level, etc.).
• You need to create two MySQL instances in two GCP Virtual Machines (VM1 and VM2). Your VM1 site is responsible for storing customer, geolocation, and user-related information. Your VM2 site is responsible for storing all remaining information, such as item, product, and payment data. [Note: If you experience issues in handling large datasets, consider a random, reasonably sized subset (<1000 data points) of the given data.]
• If required, perform data cleaning, dataset decomposition, etc. before creating the database, and record your logic in the PDF. Cleaning with a spreadsheet is sufficient.
• Since "Data5408" has implemented a distributed database, it should create and maintain a Global Data Catalogue or Global Data Dictionary. How will you create it? Where will it be placed? [Hint: The Global Data Dictionary (GDD) is an additional component, which does not eliminate the need for local data dictionaries. A GDD usually contains information on the databases and tables that are located at different sites and connected over the network.]
• You do not have to write an SQL script for this part; you can use an import statement to load your cleaned tables into the VM1 and VM2 databases.
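One way to think about the Global Data Dictionary described in the hint is as a lookup from table name to the site and database that hold it. The following is a minimal Java sketch under that assumption; the class name `GlobalDataDictionary`, the database names, and the table names are all hypothetical and are not taken from the assignment or the Kaggle datasets. It does not decide where the GDD should be placed, which remains part of the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class GlobalDataDictionary {

    /** One catalogue entry: the site and database that hold a given table. */
    public static final class Entry {
        public final String site;
        public final String database;
        public final String table;

        Entry(String site, String database, String table) {
            this.site = site;
            this.database = database;
            this.table = table;
        }
    }

    // Keyed by table name; assumes table names are unique across sites.
    private final Map<String, Entry> entries = new LinkedHashMap<>();

    /** Records that the given table lives in the given database at the given site. */
    public void register(String site, String database, String table) {
        entries.put(table, new Entry(site, database, table));
    }

    /** Returns the location of a table, or an empty Optional if it is not catalogued. */
    public Optional<Entry> locate(String table) {
        return Optional.ofNullable(entries.get(table));
    }

    public static void main(String[] args) {
        GlobalDataDictionary gdd = new GlobalDataDictionary();
        // Hypothetical database and table names, for illustration only.
        gdd.register("VM1", "data5408_vm1", "customers");
        gdd.register("VM1", "data5408_vm1", "geolocation");
        gdd.register("VM2", "data5408_vm2", "products");
        gdd.register("VM2", "data5408_vm2", "payments");

        gdd.locate("payments").ifPresent(e ->
                System.out.println(e.table + " is stored in " + e.database + " at " + e.site));
    }
}
```

In practice the same information could equally be kept as a small table inside one of the MySQL instances; the in-memory map here is only meant to show what the catalogue records.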
Problem #2 - Task 1 - Submission Requirements:
• A single PDF file with data cleaning, formatting logic or screenshots
• Screenshots of VM1, VM2 MySQL instances
• SQL dump (structure and values) taken from VM1 and VM2
Task 2: Perform Concurrent Remote Transactions (programming needed) on a single DBMS (VM1 MySQL)
• Write simple DBMS transaction processing logic as a Java program*, and run the program on your local machine (TP). This program will access the VM1 MySQL instance (DP) and execute concurrent remote transactions.
• Your program will execute three transactions, written in SQL, concurrently.
• Your program will also create a simple text file, which will act as a Transaction Log.
• The details of the transactions are given below:
You must follow the sequence. Write your observations on how MySQL handled this particular case.
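The overall shape of Task 2 (three transactions running concurrently against VM1, with every step written to a text file) could be sketched in Java as follows. This is only an outline under stated assumptions: the connection URL, user, password, log file name, and the `SELECT 1` statements are placeholders, since the actual transaction details and their required sequence are not reproduced here. It also assumes the MySQL Connector/J driver is on the classpath when run against a real instance.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentTransactions {
    // Placeholder connection details: replace with your own VM1 settings.
    static final String URL = "jdbc:mysql://<VM1_EXTERNAL_IP>:3306/<database>";
    static final String USER = "<user>";
    static final String PASSWORD = "<password>";

    /** Appends one timestamped line to the log; synchronized so threads write whole lines in turn. */
    static synchronized void log(Path logFile, String txName, String event) throws IOException {
        String line = Instant.now() + " " + txName + " " + event + System.lineSeparator();
        Files.write(logFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Runs one transaction on its own connection: commit if every statement succeeds, otherwise roll back. */
    static void runTransaction(String txName, List<String> statements, Path logFile) throws IOException {
        log(logFile, txName, "START");
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD)) {
            conn.setAutoCommit(false); // group the statements into a single transaction
            try (Statement st = conn.createStatement()) {
                for (String sql : statements) {
                    st.execute(sql);
                    log(logFile, txName, "EXECUTED " + sql);
                }
                conn.commit();
                log(logFile, txName, "COMMIT");
            } catch (SQLException e) {
                conn.rollback();
                log(logFile, txName, "ROLLBACK " + e.getMessage());
            }
        } catch (SQLException e) {
            log(logFile, txName, "CONNECTION_ERROR " + e.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        Path logFile = Paths.get("transaction_log.txt"); // hypothetical log file name
        // Placeholder SQL: substitute the statements from the assignment, in the required sequence.
        List<String> t1 = Arrays.asList("SELECT 1");
        List<String> t2 = Arrays.asList("SELECT 1");
        List<String> t3 = Arrays.asList("SELECT 1");

        ExecutorService pool = Executors.newFixedThreadPool(3); // one thread per transaction
        pool.submit(() -> { runTransaction("T1", t1, logFile); return null; });
        pool.submit(() -> { runTransaction("T2", t2, logFile); return null; });
        pool.submit(() -> { runTransaction("T3", t3, logFile); return null; });
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Each transaction uses its own connection because a JDBC connection carries a single transaction context; sharing one connection across threads would merge the three transactions into one. The timestamps in the log file give a record of the order in which statements, commits, and rollbacks actually occurred, which can support the written observations.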
Attachment:- Distributed DBMS.rar