Reference no: EM133129674
The PlantVillage dataset consists of healthy and unhealthy leaf images divided into different categories by species and disease.
We will work with a subset of the dataset and classify leaf images by species (we will not look at diseases). We provide the functions needed to download and start working with the dataset.
In this CAE you will need to:
- Classify the images with a BoF + SVM approach
- Classify the images with a CNN algorithm using pytorch
- Use an AutoML system to train a CNN
Once the previous cell has finished, you should see the dataset in Colab. Please explore the left menu to see the dataset. In particular, you can see the color images that we will use in the notebook in:
Part 1:
Question 1 Analyze the data provided:
- How many images are in the test and train datasets?
- How many categories are in the data provided? Which ones?
- Do we have an equal number of samples for each category?
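The counts asked for above can be gathered by walking the dataset folders. The sketch below assumes a hypothetical layout of one folder per category inside each split folder; the function name `count_images`, the accepted file extensions, and the layout are all assumptions, so adjust them to whatever the provided download functions actually create.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored with one of these extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(split_dir):
    """Return a Counter mapping category folder name -> number of image files.

    Assumes a hypothetical layout split_dir/<category>/<image file>.
    """
    counts = Counter()
    for category_dir in sorted(Path(split_dir).iterdir()):
        if not category_dir.is_dir():
            continue
        counts[category_dir.name] = sum(
            1
            for f in category_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


# Example usage (the path is a placeholder, not the real dataset location):
# train_counts = count_images("path/to/train")
# print(sum(train_counts.values()), "images in", len(train_counts), "categories")
# print(train_counts.most_common())  # shows whether the classes are balanced
```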
Question 2 Please load and display one image using the cv2_imshow function:
Part 2:
Question 1 Compute the Bag of Features (BoF) from the **training** dataset. You can use any code you need from the reference notebook. Use a dictionary of **70 features** to train the BOWKMeansTrainer.
You can create multiple cells to ease the execution of the different parts involved.
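To make the encoding step concrete: once a dictionary of 70 visual words exists, each image can be described by a 70-bin histogram that counts which word each of its local descriptors is closest to. The NumPy sketch below shows only that step, on random stand-in descriptors. It is not the reference notebook's code, which builds the dictionary with OpenCV's BOWKMeansTrainer; the name `bof_histogram` and the 128-dimensional descriptor size are assumptions.

```python
import numpy as np


def bof_histogram(descriptors, vocabulary):
    """Encode one image as a normalized histogram of nearest visual words.

    descriptors: (n_descriptors, dim) array of local features for one image.
    vocabulary:  (n_words, dim) array of cluster centers (the dictionary).
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    distances = np.sum(diffs ** 2, axis=2)
    # Index of the closest word for each descriptor.
    nearest_word = np.argmin(distances, axis=1)
    # Count how often each word was chosen, then normalize to sum to 1.
    hist = np.bincount(nearest_word, minlength=len(vocabulary)).astype(np.float32)
    return hist / max(hist.sum(), 1.0)


rng = np.random.default_rng(0)
vocabulary = rng.normal(size=(70, 128))    # stand-in for the 70 learned words
descriptors = rng.normal(size=(500, 128))  # stand-in for one image's descriptors
hist = bof_histogram(descriptors, vocabulary)
print(hist.shape)
```

Stacking one such histogram per training image gives the feature matrix that a classifier can be trained on.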
Question 2 Use a Support Vector Machine (SVM) classifier from the scikit-learn library to train a classifier based on the BoF features computed in Question 1.
Question 3 Now load the test dataset and check the performance on the test set of the SVM trained on the previous training set.
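As a rough illustration of Questions 2 and 3, the sketch below trains a scikit-learn SVC on stand-in 70-bin histograms and reports train and test accuracy. The features are synthetic (generated by the hypothetical helper `fake_bof_features`), and the kernel and `C` value are arbitrary choices, not settings taken from the reference notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def fake_bof_features(n_per_class, n_classes=3, n_words=70):
    """Generate stand-in BoF histograms; each class favors different words."""
    features, labels = [], []
    for label in range(n_classes):
        weights = np.ones(n_words)
        weights[label * 10:(label + 1) * 10] = 5.0  # class-specific words
        probs = weights / weights.sum()
        counts = rng.multinomial(200, probs, size=n_per_class)
        features.append(counts / 200.0)  # normalize counts to frequencies
        labels.append(np.full(n_per_class, label))
    return np.vstack(features), np.concatenate(labels)


# Replace these with the real BoF histograms and species labels.
X_train, y_train = fake_bof_features(60)
X_test, y_test = fake_bof_features(20)

svm = SVC(kernel="rbf", C=10.0, gamma="scale")
svm.fit(X_train, y_train)

train_acc = accuracy_score(y_train, svm.predict(X_train))
test_acc = accuracy_score(y_test, svm.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Reporting both numbers side by side makes it easier to spot overfitting, which shows up as a train accuracy well above the test accuracy.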
Question 4 Analyze the results obtained. Which method obtains better results? Which is more important: the results obtained on the train dataset or on the test dataset?
Question 5 In the Machine Learning chapter we have seen alternatives to SVM to build different kinds of classifiers. Can you use the sklearn library to build a classifier that obtains similar results to the SVM trained before?
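One possible way to approach this comparison is to fit several scikit-learn classifiers on the same features and compare their test accuracy. The sketch below uses `make_classification` to create stand-in 70-dimensional features, so the printed numbers say nothing about the leaf dataset; the choice of candidate models and their hyperparameters is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for 70-dimensional BoF features; replace with the real histograms.
X, y = make_classification(
    n_samples=600, n_features=70, n_informative=20, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "SVM (RBF)": SVC(),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# Fit every candidate on the same split and record its test accuracy.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy {scores[name]:.3f}")
```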
Part 3:
Question 1 The reference notebook uses the MNIST dataset that is integrated with pytorch. The Plant Village Dataset that we are using doesn't have the Dataloader prepared.
To be able to build a deep learning classifier with pytorch we have to create the Dataloader. We want to use the same network as the one in the reference notebook. To do so, we need to design the Dataloader as follows:
- Use the CustomImageDataset structure from [pytorch tutorial]
- We want to reuse the network from the 5_2_imageClassification_CNN_MNIST. To do so, the Dataloader has to:
  - **resize** the images to 28x28
  - transform them to **grayscale**
  - convert them to **float32**
Question 2 Once we have the dataloader prepared we can train the network from
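Before training, the Dataloader described in Question 1 has to exist. A minimal sketch of a matching Dataset is shown below. It follows the same three-method structure (`__init__`, `__len__`, `__getitem__`) as a custom pytorch Dataset, but it takes labels from folder names, which is an assumption about the dataset layout. The class name `LeafImageDataset` and its parameters are hypothetical.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class LeafImageDataset(Dataset):
    """Hypothetical dataset reading images from root/<category>/<file>."""

    def __init__(self, root, size=(28, 28), grayscale=True):
        self.size = size
        self.grayscale = grayscale
        # One class per sub-folder, in alphabetical order.
        self.classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
        self.samples = [
            (path, label)
            for label, name in enumerate(self.classes)
            for path in sorted((Path(root) / name).iterdir())
            if path.suffix.lower() in {".jpg", ".jpeg", ".png"}
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        with Image.open(path) as img:
            image = img.convert("L" if self.grayscale else "RGB")
        image = image.resize(self.size)  # PIL expects (width, height)
        array = np.asarray(image, dtype=np.float32) / 255.0  # float32 in [0, 1]
        tensor = torch.from_numpy(array)
        if self.grayscale:
            tensor = tensor.unsqueeze(0)      # (H, W) -> (1, H, W)
        else:
            tensor = tensor.permute(2, 0, 1)  # (H, W, 3) -> (3, H, W)
        return tensor, label


# Example usage (the path is a placeholder):
# train_loader = DataLoader(LeafImageDataset("path/to/train"), batch_size=64, shuffle=True)
```

Passing `size=(256, 256)` and `grayscale=False` gives 3-channel full resolution tensors from the same class.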
Question 3 Analyze the results obtained up to this point. How do the train and test losses evolve? How do the CNN results of exercise 2 compare to the SVM results from exercise 1?
Question 4 Adapt the Dataloader to work with the full resolution images. We expect:
- Images with the full resolution of 256 x 256 pixels
- Work with RGB images (3 channels)
- Convert the images to float32 tensors
Question 5 Define a neural network to work with the full resolution images. Train it using the same train and test functions defined above. You can propose any network in this part, whether starting from a known architecture or adding layers manually until getting the correct network shape.
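As one possible starting point for adding layers manually, the sketch below stacks three convolution and pooling stages and checks the output shape with a dummy batch. The class name `LeafCNN`, the layer sizes, and the number of classes are all assumptions, not the architecture of the reference notebook.

```python
import torch
from torch import nn


class LeafCNN(nn.Module):
    """Small hypothetical CNN for 3-channel 256 x 256 inputs."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),          # 256 -> 128
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),          # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),          # 64 -> 32
            nn.AdaptiveAvgPool2d(4),  # 32 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),             # 64 channels * 4 * 4 = 1024 values
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = LeafCNN(num_classes=10)          # 10 is a placeholder class count
dummy = torch.zeros(2, 3, 256, 256)      # batch of two fake RGB images
print(model(dummy).shape)
```

The network returns raw scores (logits). Whether that matches the existing train and test functions depends on the loss they use, which is not shown here.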
Question 6 Compare the results obtained with the low and high resolution dataset. Does the high resolution dataset improve the results significantly?
Part 4:
Question 1 Study the tutorial on [image classification]. Build an AutoML model using this example as a reference.
Question 2 Analyze the results obtained with autokeras and compare them with the other ML solutions developed in this notebook. What structure does the model generated with autokeras have? Which obtains better results? Would you consider AutoML models for future use before training and customizing models in other libraries?
Attachment:- Image_classification.rar