Assignment - Computer Vision Concepts Implementation
The context of this assignment is the common computer vision task object detection and recognition. Specifically, the task is to implement and evaluate classifiers to automatically recognise different species of birds in images.
Dataset:
We will use the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset. This dataset contains 11,788 images of 200 bird species (roughly 40-60 images per bird species).
The dataset is available on the Nutanix SciTech Student Virtual Desktop with GPU (in folder U:\Faculty of SciTech\Units\un8890 - Computer Vision and Image Analysis PG\CUB_200_2011) or you can download a ZIP file from OneDrive. The size of the ZIP file is about 1.1GB.
The images can be found in the images folder, with a subfolder for each of the 200 classes. The images.txt file contains a list of the individual image file names including the subfolder path. The classes.txt file contains the class names (bird species). The image_class_labels.txt file contains the ground truth class labels (bird species label) for each image. The dataset is 'benign' in the sense that each image contains only one bird. The bounding_boxes.txt file contains the (x, y, width, height) parameters of a single bounding box for each image, describing the image area that contains the bird. More information about the structure of each of these files can be found in the README.txt file.
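To give an idea of how these files fit together, the metadata could be loaded into MATLAB tables along the following lines (a minimal sketch; the variable names and the assumption that the extracted dataset root folder CUB_200_2011 sits in the current directory are illustrative only):

    % Sketch: load the CUB_200_2011 metadata files into tables.
    root = 'CUB_200_2011';

    % images.txt: <image_id> <relative image path>
    imgs = readtable(fullfile(root, 'images.txt'), ...
        'Delimiter', ' ', 'ReadVariableNames', false);
    imgs.Properties.VariableNames = {'ImageID', 'FileName'};

    % image_class_labels.txt: <image_id> <class_id>
    labels = readtable(fullfile(root, 'image_class_labels.txt'), ...
        'Delimiter', ' ', 'ReadVariableNames', false);
    labels.Properties.VariableNames = {'ImageID', 'ClassID'};

    % bounding_boxes.txt: <image_id> <x> <y> <width> <height>
    boxes = readtable(fullfile(root, 'bounding_boxes.txt'), ...
        'Delimiter', ' ', 'ReadVariableNames', false);
    boxes.Properties.VariableNames = {'ImageID', 'X', 'Y', 'Width', 'Height'};

    % One row per image, keyed on ImageID.
    data = join(join(imgs, labels), boxes);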
We use a 60:20:20 split for training, validation, and test data. That is, for each class, the first 60% of images are to be used for training, the next 20% of images for validation, and the remaining 20% for testing. The details of which image files belong to these partitions can be found on the Assignment page (see the links to train200.txt, validate200.txt, and test200.txt).
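For instance, the partition lists could be turned into image datastores roughly as follows. This is a sketch only: it assumes each of train200.txt, validate200.txt and test200.txt contains one image path per line, relative to the images folder, and that the list files have been downloaded into the current directory; check this against the actual files.

    % Sketch: build train/validation/test datastores from the partition lists.
    root = 'CUB_200_2011';

    makeDS = @(listFile) imageDatastore( ...
        fullfile(root, 'images', readlines(listFile, 'EmptyLineRule', 'skip')), ...
        'LabelSource', 'foldernames');   % class label = bird-species subfolder name

    imdsTrain = makeDS('train200.txt');
    imdsVal   = makeDS('validate200.txt');
    imdsTest  = makeDS('test200.txt');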
We also provide a smaller subset containing only 20 classes that can be used for development purposes to reduce the time individual runs of experiments take. The final experiments and results reported in the assignment submission must be on the full 200-class dataset.
Task:
Your task is to implement and evaluate classifiers for the automatic classification of bird images from this dataset. Specifically, you are asked to implement (both!):
A handcrafted image feature method with a classic machine learning approach (e.g. SIFT features with a Support Vector Machine classifier).
A deep learning-based approach, that is, one where both the features and the classifier are learned together (e.g. a Convolutional Neural Network).
The choice of specific method/approach is up to you, as are the specific parameters (for example: will you resize the images first? (strongly recommended!) How many layers does your DL network have? Which model/network architecture do you use?), but your choices must be clearly documented in the Matlab code and described in the performance evaluation report (see below).
Both types of approaches need to be evaluated in two scenarios (both!):
The entire image is used as input.
Only the image area marked by the bounding box information is used as input.
Consequently, you will need to perform four experiments (two approaches × two scenarios) to evaluate the performance.
Performance evaluation here means the class-weighted overall average accuracy and the individual class correct and incorrect recognition rates for the test partition of the data (include a table with these performance measures for all 200 classes).
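As a guide, these measures can be derived from a confusion matrix over the test set, for example as sketched below (this assumes trueLabels and predLabels are categorical arrays of ground-truth and predicted classes; here 'class-weighted' is read as weighting each per-class rate by the number of test images in that class):

    % Sketch: per-class and class-weighted accuracy from test-set predictions.
    [C, order] = confusionmat(trueLabels, predLabels);   % 200-by-200 for the full dataset

    classCounts   = sum(C, 2);                 % test images per class
    correctRate   = diag(C) ./ classCounts;    % per-class correct recognition rate
    incorrectRate = 1 - correctRate;           % per-class incorrect recognition rate

    % Overall accuracy with each class weighted by its number of test images.
    weightedAcc = sum(classCounts .* correctRate) / sum(classCounts);

    % One row per class, suitable for the table required in the report.
    results = table(order, classCounts, correctRate, incorrectRate, ...
        'VariableNames', {'Class', 'NumTestImages', 'CorrectRate', 'IncorrectRate'});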
Finally, you are to write a report (max. four A4 pages, using the IEEE A4 conference paper template (DOC, LaTeX)) that:
Briefly describes the methods/approaches you chose to implement, including any specific parameter choices.
Presents and discusses the results of the performance evaluation of the four experiments.
Describes the lessons learnt: what could be done differently to improve the results?
A further five marks will be awarded for using good programming principles, such as clear program structure, good use of comments, use of functions (where appropriate), etc.
Bonus: up to 10 marks (but no more than the maximum of 50 in total, i.e. you can make up some lost marks but cannot get more than the maximum possible):
Run Experiment 4 as a fivefold cross-validation. Assume we partition the data in each class into five parts of 20% each. We can then run the experiment five times with different parts used for training, validation and test, while maintaining the 60:20:20 split. For example:
Run 1: Parts 1-3 for training, Part 4 for validation, Part 5 for test
Run 2: Parts 2-4 for training, Part 5 for validation, Part 1 for test
Run 3: Parts 3-5 for training, Part 1 for validation, Part 2 for test
Run 4: Parts 4, 5, and 1 for training, Part 2 for validation, Part 3 for test
Run 5: Parts 5, 1, and 2 for training, Part 3 for validation, Part 4 for test
Report the average overall accuracy and the average per-class correct and incorrect recognition rates, obtained by averaging across the five runs. This is what we mean by cross-validation.
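The rotation itself can be generated programmatically, for example (a sketch only; how each per-class 20% part is materialised as a file list is left open):

    % Sketch: the five-fold rotation of the 60:20:20 split.
    numParts = 5;
    for k = 1:numParts
        parts      = mod((k-1) + (0:numParts-1), numParts) + 1;  % parts rotated to start at k
        trainParts = parts(1:3);   % 60% for training
        valPart    = parts(4);     % 20% for validation
        testPart   = parts(5);     % 20% for test
        fprintf('Run %d: train = parts %s, validate = part %d, test = part %d\n', ...
            k, mat2str(trainParts), valPart, testPart);
        % ... build the datastores for these parts, train, evaluate, and store the
        %     per-class rates so they can be averaged over the five runs ...
    end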
Experiment 1: Basic Image Classification
In this experiment, we attempt to classify bird species using traditional image processing and machine learning techniques. It involves extracting features from the images using methods such as SIFT or SURF, followed by training a Support Vector Machine (SVM) classifier on these features. The process includes loading the image data, extracting the features, training the classifier, and then evaluating the model's performance on the validation and test datasets.
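One possible realisation of this pipeline is sketched below, using a bag of SURF visual words (rather than raw SIFT descriptors) and a multi-class SVM. All parameter values shown (image size, vocabulary size) are illustrative rather than prescribed, and imdsTrain/imdsTest are the datastores built from the partition lists.

    % Sketch: bag of SURF visual words + multi-class SVM (Experiment 1).
    inputSize  = [256 256];                              % resizing is strongly recommended
    resizeRead = @(filename) imresize(imread(filename), inputSize);
    imdsTrain.ReadFcn = resizeRead;
    imdsTest.ReadFcn  = resizeRead;

    % Build a visual vocabulary from SURF descriptors on the training images.
    bag = bagOfFeatures(imdsTrain, 'VocabularySize', 500);

    % Encode every image as a histogram of visual words and train the SVM.
    XTrain = encode(bag, imdsTrain);
    svm    = fitcecoc(XTrain, imdsTrain.Labels);         % one-vs-one linear SVMs

    % Evaluate on the test partition.
    XTest    = encode(bag, imdsTest);
    predTest = predict(svm, XTest);
    accuracy = mean(predTest == imdsTest.Labels);

The predicted labels can then be fed into the per-class evaluation code sketched in the Task section above.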
Experiment 2: CNN-based Image Classification
This experiment advances from traditional methods to deep learning, employing Convolutional Neural Networks (CNNs). We define a CNN architecture suitable for the task, train it using backpropagation with a set of labeled bird images, and then validate and test its performance. This method should automatically learn the necessary features for classification from the data, potentially outperforming the first experiment.
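A minimal starting point could look like the following sketch; the architecture, input size, and training options are illustrative only, and imdsTrain/imdsVal are the datastores built from the partition lists.

    % Sketch: small CNN trained from scratch (Experiment 2).
    inputSize  = [128 128 3];
    numClasses = numel(categories(imdsTrain.Labels));

    layers = [
        imageInputLayer(inputSize)
        convolution2dLayer(3, 32, 'Padding', 'same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2, 'Stride', 2)
        convolution2dLayer(3, 64, 'Padding', 'same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2, 'Stride', 2)
        convolution2dLayer(3, 128, 'Padding', 'same')
        batchNormalizationLayer
        reluLayer
        fullyConnectedLayer(numClasses)
        softmaxLayer
        classificationLayer];

    % Augmented datastores resize (and expand grayscale images) on the fly.
    augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain, 'ColorPreprocessing', 'gray2rgb');
    augVal   = augmentedImageDatastore(inputSize(1:2), imdsVal,   'ColorPreprocessing', 'gray2rgb');

    options = trainingOptions('sgdm', ...
        'InitialLearnRate', 0.01, ...
        'MaxEpochs', 30, ...
        'MiniBatchSize', 64, ...
        'Shuffle', 'every-epoch', ...
        'ValidationData', augVal, ...
        'Plots', 'training-progress');

    net = trainNetwork(augTrain, layers, options);

Test-set predictions can then be obtained with classify and evaluated with the metrics code sketched in the Task section.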
Experiment 3: Bounding Box Integration in Classification
Experiment 3 enhances the CNN approach by integrating the bounding box information to focus the network's attention on the part of the image where the bird is located. It involves modifying the data loading process to include the bounding box information and adjusting the input pipeline so that the CNN receives images cropped to these bounding boxes, before training and testing the classifier.
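One straightforward way to do this, sketched below, is to crop every image once to its bounding box and write the result into a parallel folder tree; the data table from the metadata sketch in the Dataset section is assumed to be available, and the output folder name images_cropped is illustrative.

    % Sketch: crop each image to its ground-truth bounding box (Experiment 3).
    root   = 'CUB_200_2011';
    srcDir = fullfile(root, 'images');
    dstDir = fullfile(root, 'images_cropped');    % illustrative output folder

    for i = 1:height(data)
        fname = char(data.FileName(i));           % relative path incl. class subfolder
        I     = imread(fullfile(srcDir, fname));
        box   = [data.X(i) data.Y(i) data.Width(i) data.Height(i)];  % [x y width height]
        C     = imcrop(I, box);                   % keep only the bird region

        outFile   = fullfile(dstDir, fname);
        outFolder = fileparts(outFile);
        if ~exist(outFolder, 'dir'), mkdir(outFolder); end
        imwrite(C, outFile);
    end

The training and evaluation code from the previous experiments can then simply be pointed at images_cropped instead of images, so that only the bird region is used as input.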
Experiment 4: Fine-tuning a Pre-trained CNN
The fourth experiment involves using a pre-trained network and fine-tuning it on the bird species dataset. This transfer learning approach leverages a network that has been trained on a large and diverse dataset to extract rich features, which are then fine-tuned for the specific task of bird classification. This often results in improved performance due to the pre-trained network's generalization capabilities.
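As an illustration, the fine-tuning could be set up along the following lines. ResNet-18 is just one possible choice of pre-trained network (it requires the corresponding Deep Learning Toolbox support package), and all training options shown are illustrative.

    % Sketch: transfer learning from a pre-trained CNN (Experiment 4).
    net        = resnet18;                                % pre-trained on ImageNet
    inputSize  = net.Layers(1).InputSize;                 % [224 224 3]
    numClasses = numel(categories(imdsTrain.Labels));

    % Swap the final fully connected and classification layers for new ones
    % sized to the 200 bird classes. ('fc1000' and 'ClassificationLayer_predictions'
    % are the final layers of resnet18; check with analyzeNetwork(net) if you
    % use a different network.)
    lgraph = layerGraph(net);
    lgraph = replaceLayer(lgraph, 'fc1000', fullyConnectedLayer(numClasses, ...
        'Name', 'fc_birds', 'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10));
    lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
        classificationLayer('Name', 'output_birds'));

    % Resize (and expand grayscale images) on the fly to match the network input.
    augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain, 'ColorPreprocessing', 'gray2rgb');
    augVal   = augmentedImageDatastore(inputSize(1:2), imdsVal,   'ColorPreprocessing', 'gray2rgb');

    options = trainingOptions('sgdm', ...
        'InitialLearnRate', 1e-3, ...   % small rate so the pre-trained weights change slowly
        'MaxEpochs', 10, ...
        'MiniBatchSize', 32, ...
        'Shuffle', 'every-epoch', ...
        'ValidationData', augVal);

    netFT = trainNetwork(augTrain, lgraph, options);

Increasing the learning-rate factors of the new final layer while keeping a small global learning rate is a common way to adapt the classifier quickly without disturbing the pre-trained features too much.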