Reference no: EM132249272
Assignment -
Overview and Assignment Goals: The objectives of this assignment are the following:
- Use/implement feature selection/reduction technique(s).
- Experiment with various classification models: Decision Tree, Naïve Bayes and Neural Network are the minimum requirements.
- Think about dealing with imbalanced data.
- Use the F1 score as the evaluation metric.
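As one illustration of the feature selection/reduction objective, the sketch below drops columns whose variance falls below a cutoff. The assignment does not prescribe any particular technique; the variance filter, the function names, the 0.01 threshold, and the toy rows are all assumptions made for this example. Note that a raw variance threshold only makes sense when comparing columns on the same scale, such as the binary indicators.

```python
def low_variance_columns(rows, threshold=0.01):
    """Return indices of columns whose (population) variance is below threshold."""
    n = len(rows)
    n_cols = len(rows[0])
    dropped = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var < threshold:
            dropped.append(j)
    return dropped


def drop_columns(rows, to_drop):
    """Return a copy of rows without the columns listed in to_drop."""
    drop = set(to_drop)
    return [[x for j, x in enumerate(row) if j not in drop] for row in rows]


# Toy data (hypothetical): one continuous column and two binary indicators.
rows = [
    [2596.0, 1, 0],
    [2590.0, 1, 0],
    [2804.0, 1, 1],
    [2785.0, 1, 0],
]
to_drop = low_variance_columns(rows, threshold=0.01)  # column 1 is constant
reduced = drop_columns(rows, to_drop)
```

Whatever filter is chosen, the column indices selected on the training set should be reused unchanged on the test set so that both files keep the same feature layout.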
Detailed Description: The goal of this competition is to allow you to develop predictive models that can determine, given 54 cartographic variables, the correct forest type. There are 7 forest types (1-7) in the dataset. Each observation (record) represents a 30x30 meter region. As such, the goal is to develop the best classification model that can predict the forest type given the observation.
Since the dataset is imbalanced, the scoring function will be the F1-score instead of Accuracy.
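To see why F1 is preferred over accuracy on imbalanced data, here is a minimal sketch of a per-class F1 computation and its unweighted (macro) average. The assignment does not state which averaging the scoring function uses, so macro averaging is an assumption here, as are the function names and the toy labels.

```python
def f1_per_class(y_true, y_pred, labels):
    """F1 for each class, using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Define F1 as 0 when the class never appears in truth or predictions.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy example (hypothetical): a model that always predicts the majority class.
y_true = [1, 1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 1, 1, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels=[1, 2, 3])
```

In this toy case the accuracy is 4/7, while the macro F1 is only 8/33, because the two minority classes each contribute an F1 of zero.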
Caveats:
- The dataset has an imbalanced class distribution. No information is provided for the test set regarding the distribution.
- Use the data mining knowledge you have gained so far wisely to optimize your results.
- Try at least the following three classification methods: Decision Tree, Naïve Bayes, Neural Network.
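One common way to handle the imbalanced class distribution noted in the caveats is to weight each class inversely to its frequency during training. The sketch below computes such weights with the n / (k * count) heuristic; it is only one option (resampling is another), and the function name and toy labels are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {c: n / (k * counts[c]) for c in counts}


# Toy labels (hypothetical): class 1 is common, class 3 is rare.
y = [1] * 6 + [2] * 3 + [3] * 1
weights = balanced_class_weights(y)
```

Rare classes receive larger weights, so misclassifying them costs more in any classifier that accepts per-class or per-sample weights. Since the test distribution is not disclosed, weights derived from the training set should be validated with F1 on held-out data rather than assumed to be optimal.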
Data Description: The dataset is split into training and test sets; both files are in CSV format. The training dataset consists of 14,528 records and the test dataset consists of 116,205 records. We provide you the class labels in the training set, and the test labels are held out. The training set has 55 attributes; the test set has the same first 54, with the class-label column withheld. Attributes 1-54 are numeric cartographic variables, some of which are binary indicators of the absence or presence of something, such as a particular soil type. Specifically, attributes #1, 8, 9, 20, 22, 31, 42, 47, 50, and 54 are non-binary numeric variables, and the rest are binary. In the training set, the last column (attribute 55) contains the class labels.
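Because the numeric and binary attributes usually call for different preprocessing (for example, scaling only the numeric ones), it can help to separate them by index up front. The sketch below uses the attribute numbers listed above, converted to 0-based positions; the helper name `split_row` is hypothetical, and the code assumes each row arrives as a list of strings (as from Python's `csv.reader`) with no header row, which the assignment does not confirm.

```python
# 1-based attribute numbers taken from the data description.
NUMERIC_ATTRS = [1, 8, 9, 20, 22, 31, 42, 47, 50, 54]
N_FEATURES = 54

numeric_idx = [a - 1 for a in NUMERIC_ATTRS]  # 0-based positions
_numeric_set = set(numeric_idx)
binary_idx = [j for j in range(N_FEATURES) if j not in _numeric_set]


def split_row(row):
    """Split one CSV row into (numeric values, binary values, label or None)."""
    features = row[:N_FEATURES]
    numeric = [float(features[j]) for j in numeric_idx]
    binary = [int(features[j]) for j in binary_idx]
    # Training rows carry a 55th column with the class label; test rows do not.
    label = int(row[N_FEATURES]) if len(row) > N_FEATURES else None
    return numeric, binary, label


# Synthetic rows (hypothetical values) to show the shapes involved.
train_row = [str(j) for j in range(N_FEATURES)] + ["3"]
test_row = [str(j) for j in range(N_FEATURES)]
num, bin_, label = split_row(train_row)
_, _, missing_label = split_row(test_row)
```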
- train.csv: Training set with 55 attributes. The last attribute is the class label (1-7).
- test.csv: Testing set with 54 attributes since the class labels are withheld.
- format.dat: A sample submission with 116,205 entries of randomly chosen numbers between 1 and 7.
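A small helper can guard against malformed submissions before one of the limited daily attempts is used. The sketch below assumes that format.dat holds one integer label per line, which the description implies but does not state outright; the function name `write_submission` and the demo path are hypothetical.

```python
import os
import tempfile


def write_submission(predictions, path, expected_count=116205):
    """Write one predicted label per line after basic sanity checks."""
    preds = [int(p) for p in predictions]
    if len(preds) != expected_count:
        raise ValueError(f"expected {expected_count} predictions, got {len(preds)}")
    if any(p < 1 or p > 7 for p in preds):
        raise ValueError("labels must be integers between 1 and 7")
    with open(path, "w", newline="\n") as f:
        for p in preds:
            f.write(f"{p}\n")


# Tiny demo with an overridden count; a real run would keep the default.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.dat")
write_submission([1, 7, 3], demo_path, expected_count=3)
```

Comparing the output against the provided format.dat (line count and value range) is a quick way to confirm the layout assumption before submitting.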
Rules:
Feel free to use the programming language of your choice for this assignment.
While you can use libraries and templates for dealing with this problem, remember that implementation is 50% of the grade. Some programming is still required even if you choose to use existing packages. You should be able to explain these methods and your reasons for choosing them in sufficient detail.
Implementation will be graded based on the quality of your code, the amount of effort put in for classifier/model selection, scalability, etc. You are required to try at least the following three classifiers: (1) Decision Tree, (2) Naïve Bayes, and (3) Neural Network. You can try more classifiers if you want to, but if it's something we have not covered in class, make sure you provide an explanation of the method(s) to demonstrate your understanding of it. Justify the choice of your method via experiments and report the results using tables. Submit your best predictions. Summarize your findings in the report.
Your results should be reproducible. If we find that we cannot reproduce your results, or if the description in your report does not match what your code does, you will receive a penalty on the assignment, and this may result in an honor code violation.
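Reproducible experiments generally require fixing every source of randomness, starting with how the labeled data is divided for model comparison. The sketch below is one possible approach: a stratified train/holdout split driven by an explicit seed, so each class keeps roughly its share in both parts and the same seed always yields the same split. The function name, the 20% holdout fraction, and the seed value are assumptions, not requirements of the assignment.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.2, seed=42):
    """Return (train_idx, holdout_idx), sampling each class separately."""
    rng = random.Random(seed)  # local generator: no hidden global state
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, holdout_idx = [], []
    for y in sorted(by_class):  # fixed class order keeps the result stable
        idx = by_class[y][:]
        rng.shuffle(idx)
        # At least one sample per class goes to the holdout set.
        n_holdout = max(1, round(len(idx) * holdout_fraction))
        holdout_idx.extend(idx[:n_holdout])
        train_idx.extend(idx[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


# Toy labels (hypothetical): 10 samples of class 1, 5 of class 2.
labels = [1] * 10 + [2] * 5
first = stratified_split(labels, seed=42)
second = stratified_split(labels, seed=42)
```

Recording the seed alongside each results table in the report makes it possible to regenerate exactly the same split later.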
You are allowed 5 submissions in a 24-hour cycle.
Attachment:- Assignment Files.rar