DSC-550 Neural Networks and Deep Learning - Grand Canyon University
Using reinforcement learning concepts within deep learning theory, develop a model that solves the multi-armed bandit problem. This is done with the Thompson Sampling Model, which quickly identifies which of several unknown conversion rates is the highest. With deep learning and Q-learning as the foundation, deep Q-learning is also addressed.
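Thompson Sampling fits in a short decision rule. This sketch assumes each machine i has an unknown win probability θ_i with a uniform Beta(1, 1) prior, and that wins and losses are counted as N_i^1(n) and N_i^0(n), the counts this assignment defines for the β distributions (a loss is recorded as 0 rather than -1):

```latex
\begin{aligned}
\theta_i(n) &\sim \mathrm{Beta}\big(N_i^1(n) + 1,\; N_i^0(n) + 1\big), \qquad i = 1, \dots, 5,\\
a(n) &= \arg\max_{i}\, \theta_i(n),\\
N_{a(n)}^1(n+1) &= N_{a(n)}^1(n) + r(n), \qquad
N_{a(n)}^0(n+1) = N_{a(n)}^0(n) + 1 - r(n), \qquad r(n) \in \{0, 1\}.
\end{aligned}
```

Counts for the machines not played in round n stay unchanged. Because each θ_i(n) is a random draw, a machine with few plays can still be selected occasionally, which is how the method balances exploration against exploitation.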
Assume you are at your favorite casino, in a room containing five slot machines. The game is the same on each of them: you bet a certain amount of money, pull the arm, and the machine either takes your money or gives you back twice your money. When the machine takes your money, the reward is -1; when it returns twice your money, the reward is +1. As part of the problem assumptions, one of these machines has a higher probability of giving you a +1 reward than the others. Your goal is to obtain the highest accumulated reward during your time of play. If you bet 1,000 dollars in total, you will bet 1 dollar 1,000 times, each time pulling the arm of any one of the five machines. Your strategy must be to figure out, in the minimum number of plays, which of the five slot machines has the highest chance of giving you a +1 reward. The hard part is finding the best slot machine in the minimum number of trials.
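The environment described above can be simulated in a few lines. This is a minimal sketch, not the provided course code: the class `SlotMachineEnv`, the function `play_randomly`, and the win probabilities are all hypothetical, chosen only so that one machine pays +1 more often than the others. A purely random player gives a reference point for the accumulated reward that a smarter strategy should beat.

```python
import random


class SlotMachineEnv:
    """Hypothetical five-armed bandit: each pull pays +1 with probability p_i, else -1."""

    def __init__(self, win_probs, seed=None):
        self.win_probs = list(win_probs)
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.win_probs)

    def pull(self, arm):
        # The machine returns twice the bet (+1) with probability p_arm,
        # otherwise it keeps the bet (-1).
        return 1 if self.rng.random() < self.win_probs[arm] else -1


def play_randomly(env, n_bets, seed=None):
    """Baseline policy: pull a uniformly random arm on every bet; return the total reward."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_bets):
        total += env.pull(rng.randrange(env.n_arms))
    return total


if __name__ == "__main__":
    # Hypothetical win probabilities; machine index 3 is the best one by assumption.
    env = SlotMachineEnv([0.45, 0.48, 0.50, 0.60, 0.47], seed=0)
    print("Accumulated reward over 1,000 random bets:", play_randomly(env, 1000, seed=1))
```

In this framing the action is the index of the machine to pull, the environment is the set of hidden win probabilities, and the feedback after each bet is the ±1 reward.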
You are going to use the Thompson Sampling Model to find the slot machine with the highest winning chance. The code is provided in the file "Thompson-sampling.py". You must use a β distribution to take a random draw from each of the five distributions corresponding to the five slot machines. Consider the following:
Define the state (inputs), the actions (outputs), and the environment.
Copy the code and paste it into your IDE. Make sure it runs in your environment, and report any issues you encounter. Study the code, add comments to it, and submit the commented code.
Obtain the β distribution for each slot machine, collect the data, and take a screenshot of each machine's plot. The β distributions are built from the following counts:
N_i^1(n): the number of times slot machine i returned a 1 reward up to round n.
N_i^0(n): the number of times slot machine i returned a 0 reward (a loss) up to round n.
Using the code "comparison.py", compare Thompson Sampling against the standard model for 200, 1,000, and 5,000 samples, with the number of slot machines ranging from 3 to 20 and conversion rates in the ranges 0-0.1, 0-0.3, and 0-0.5.
Plot the comparison using Thompson Sampling's percentage of gain. Analyze the percentage gain and include the analysis in your document.
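The Thompson Sampling and comparison tasks can be sketched without the provided files. This sketch rests on several assumptions: it treats the "standard model" as uniform random selection, draws conversion rates uniformly from 0 to each range's upper bound, uses a reduced grid of 3, 10, and 20 machines, and computes the percentage gain as 100 × (Thompson Sampling reward - baseline reward) / baseline reward. The names `thompson_sampling`, `random_selection`, and `percentage_gain` are hypothetical and may not match "comparison.py". Rewards are 0 or 1 so that the counters `n1` and `n0` map directly onto N_i^1(n) and N_i^0(n).

```python
import random


def thompson_sampling(conversion_rates, n_rounds, seed=None):
    """Play n_rounds with Thompson Sampling; return (total reward, N1 counts, N0 counts)."""
    rng = random.Random(seed)
    d = len(conversion_rates)
    n1 = [0] * d  # N_i^1(n): times machine i returned a 1 reward so far
    n0 = [0] * d  # N_i^0(n): times machine i returned a 0 reward so far
    total = 0
    for _ in range(n_rounds):
        # One random draw from each machine's Beta(N1 + 1, N0 + 1) distribution;
        # play the machine with the largest draw.
        draws = [rng.betavariate(n1[i] + 1, n0[i] + 1) for i in range(d)]
        arm = max(range(d), key=draws.__getitem__)
        reward = 1 if rng.random() < conversion_rates[arm] else 0
        n1[arm] += reward
        n0[arm] += 1 - reward
        total += reward
    return total, n1, n0


def random_selection(conversion_rates, n_rounds, seed=None):
    """Assumed baseline: pick a machine uniformly at random each round; return total reward."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rounds):
        arm = rng.randrange(len(conversion_rates))
        total += 1 if rng.random() < conversion_rates[arm] else 0
    return total


def percentage_gain(ts_reward, baseline_reward):
    """Relative gain of Thompson Sampling over the baseline, in percent (assumed formula)."""
    if baseline_reward == 0:
        return float("nan")  # undefined when the baseline earned nothing
    return 100.0 * (ts_reward - baseline_reward) / baseline_reward


if __name__ == "__main__":
    # Hypothetical experiment grid loosely mirroring the assignment's parameters.
    rng = random.Random(42)
    for n_rounds in (200, 1000, 5000):
        for max_rate in (0.1, 0.3, 0.5):
            for d in (3, 10, 20):
                rates = [rng.uniform(0, max_rate) for _ in range(d)]
                ts, _, _ = thompson_sampling(rates, n_rounds, seed=1)
                rs = random_selection(rates, n_rounds, seed=2)
                print(f"rounds={n_rounds:5d} max_rate={max_rate:.1f} machines={d:2d} "
                      f"gain={percentage_gain(ts, rs):7.1f}%")
```

A single run is noisy, especially at 200 rounds, so averaging the gain over several seeds per configuration gives a steadier picture before plotting.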
How would you emphasize the idea of ethical design specifications? Consider how to verify them: what techniques are available to verify that a design complies with ethical principles? Please discuss why having ethical principles should be a moral responsibility from the Christian worldview.
Attachment: Neural Networks and Deep Learning.rar