Indian Institute of Information Technology, Allahabad

Computer Vision and Biometrics Lab (CVBL)

Deep Learning

Jan-June 2023 Semester

Course Information

Objective of the course: To expose students and researchers to state-of-the-art deep learning techniques and approaches, to show how to optimize results and improve efficiency, and to provide hands-on practice that helps them absorb the important concepts.

Outcome of the course: Deep learning has demonstrated a tremendous ability to solve real-world learning and recognition problems, and the software industry has accepted it as an effective tool. As a result, there has been a paradigm shift in how learning and recognition are approached. Students and researchers should acquire knowledge of this important area and learn how to approach a problem, including whether a deep learning solution is appropriate at all. After completing this course, they should be able to decide which algorithm to use for which kind of problem. Students will be able to find ways to regularize a solution better and optimize it according to the problem requirements. Students will be exposed to the background mathematics involved in deep learning solutions. They will be able to deal with real-time problems and problems being worked on in industry. Taking this course will substantially improve their standing in the machine learning community, both as intelligent software developers and as mature researchers.

Class schedule
Lecture: Wednesday (CC3-5206, 09.00 am - 11.00 am), Tute: Monday (CC3-5206, 05.00 pm - 07.00 pm), Practice: Wednesday (CC3-5241, 03.00 pm - 05.00 pm)

Computational Projects Added to Teaching Laboratories

Project ID Team Project Title Abstract
DEL23_P01 Mayank Bharati (IEC2020053), Abhishek Wani (IEC2020027), Shreyansh Sharma (IEC2020089), Chahit Kumar (IIT2020094) Text to Image Synthesis using Deep Learning Text-to-image synthesis is an emerging research area focused on creating realistic images from descriptive text. This report provides an overview of the various techniques and standards for text-to-image synthesis, along with their uses and limitations. We also present recent research results that showcase the latest technologies in this field.
DEL23_P02 Rupesh G (MML2022001), Raj Ahamed Shaik (MML2022004), Ashutosh Verma (MML2022016), Vikas Rajput (MHC2022012) Single Image Dehazing using Transformer Image dehazing is the process of recovering a clear, haze-free image from a hazy photograph. Although convolutional neural networks have been routinely used for this task, image dehazing has not yet benefited from the recent advances that vision Transformers have brought to high-level vision problems. The authors of this work examine the use of the Swin Transformer for image dehazing and propose DehazeFormer, which includes adjustments to the normalisation layer, activation function, and spatial information aggregation method. To show the efficiency of DehazeFormer, several variants were trained on distinct datasets. The large model achieved the highest PSNR on the SOTS indoor set among all prior state-of-the-art approaches, whereas the tiny model surpassed FFA-Net with a significantly smaller number of parameters and lower computational cost. The effectiveness of the approach on extremely non-homogeneous haze was further assessed using a sizable realistic remote sensing dehazing dataset that the researchers gathered.
DEL23_P03 Diya Srivastava (MHC2022004), Pragati (MML2022005), Dipankar Karmakar (MML2022003), Sayantan Chakraborty (MML2022007) Image Inpainting using Context Encoder Generative Adversarial Network Image inpainting is an important topic of research in the field of image processing. The prime goal of image inpainting is to recover missing details in an image, to demosaic the image, and so on. In this paper we discuss the progress of the image inpainting project using deep learning models, namely the Context Encoder Generative Adversarial Network. The project has been implemented in the Google Colaboratory environment.
DEL23_P04 Raj Jaiswal (MHC2022006), Akash Tyagi (MRM2022002), Arvind Kumar (MHC2022016), Behera Jyothikrishna (MRM2022006) Face Pose Correction Using Deep Learning Face pose correction is an important problem in computer vision, with applications in areas such as human-computer interaction, biometrics, and augmented reality. The goal of face pose correction is to correct the orientation of a face in an image or video so that it is upright or in a desired pose. Face pose correction using deep learning involves using neural networks to detect and correct the orientation of faces in an image. This technique can be used in various applications, such as in photography, video conferencing, and surveillance systems.
DEL23_P05 Mohd Faiz Ansari (MML2022006), Rakshit Sandilya (MML2022008), Nikhil Rajput (MML2022010), Himanshu Mishra (MRM2022005) Thermal to Visible Image Translation This review report details the implementation and assessment of a pix2pix GAN for thermal-to-visible image translation. Deep learning algorithms are used in the project to produce high-quality visible images from thermal images. An overview of the methodology, data collection, and pre-processing procedures is provided in the report. The SSIM and PSNR metrics were used to assess the model. The outcomes demonstrate that the suggested method is successful in converting thermal images into realistic, high-quality visible images. This method may be used in a variety of fields, including surveillance, search and rescue, and medical imaging. Overall, the report shows how well the pix2pix GAN works for image translation tasks and offers suggestions for further study in this field.
DEL23_P06 Aditya Biswakarma (IIT2020033), Harshit Kushwah (IIT2020039), Sanjeet (IIT2020052), Rohan Tirkey (IIT2020088) Image Caption Generator using Deep Learning techniques Image captioning is the process of writing a description of what is happening in an image. Virtual assistants, editing software, image indexing, and assistance for people with impairments can all benefit from image captioning, which makes it an essential task. It connects computer vision with natural language processing, two main branches of artificial intelligence. Encoder-decoder frameworks are frequently the foundation of more recent methods for image captioning. Encoders frequently employ large convolutional neural networks that have already been trained. However, different authors' image captioning models employ various encoder designs. Because of this, it is more challenging to ascertain how the encoder affects the performance of the model as a whole. In this paper, we have done a comparative study between two popular convolutional network architectures, VGG and ResNet, as encoders for the identical image captioning model to determine which approach is the most effective at representing images used to generate captions. Based on this data we can determine how big a role the encoder plays and how significantly it can improve the model without changing the decoder architecture.
DEL23_P07 Harsh (MML2022002), Sumit Bhimte (MHC2022003), Shyam Dongre (MHC2022008), Umesh Maurya (MML2022011) Detecting Tiny Faces using Deep Learning Using a generative adversarial network (GAN) methodology, this project proposes a novel technique for detecting small faces in photos. Small faces, which are frequent in real-world photos, are difficult to detect with current face detection techniques. To detect them, a two-stage method is suggested in which high-resolution pictures of tiny faces are first created using a GAN and a face detector is then trained. The GAN learns to produce high-quality pictures of tiny faces by conditioning on the input image and the target face size after being trained on a sizable dataset of face photos. The generated photos are then used to supplement the training data of the face detector, which is trained on a different set of photographs.
DEL23_P08 Trisha Mistry (IIT2020167), Anish Jain (IIT2020173), Jayesh Ginnare (IIT2020191), Neeraj Gupta (IEC2020083) Weapon Detection using CNN and YOLOV3 In our project we have aimed to detect the presence of weapons, and while doing so we have tried to compare two elaborate deep learning algorithms, namely YOLO and CNN. In YOLO, we use a single convolutional layer, and thus this algorithm is faster to reach the output than CNN, which has several convolutional layers and thus takes longer to reach the output. Hence, we can conclude that YOLO has outperformed CNN in terms of accuracy and processing speed in detecting weapons and has been the more efficient algorithm. However, the study also revealed some limitations of the models, such as the difficulty in detecting small or partially occluded weapons.
DEL23_P09 Amit Roy (MHC2022001), Harshit Gupta (MHC2022013), Bhargav Burman (MHC2022011), Hemant Singh (PMD2022001) Crop Disease Detection using Deep Learning There is a lack of raw materials and food supply due to the increase in world population. The major and most important source to get around this particular problem is now the agricultural industry. Nevertheless, pests and numerous crop diseases pose a problem for the sector as a whole. Plant diseases are a serious factor in crop losses in the world's agriculture. Due to a lack of specialised expertise, detecting diseases in plants is complex and difficult. The identification of plant diseases from leaf photos is made possible by deep learning-based models. Larger training set requirements, computational complexity, overfitting, and other concerns are the main problems with these algorithms that still need to be resolved. This research focuses on the more effective machine learning model that we have created, which is based on convolutional neural networks (CNN), and attempts to give a brief overview of the published solutions that already exist. Without actually taking extra pictures, some augmentation techniques including shift, shear, scaling, zooming, and flipping are used to generate more samples, expanding the training set.
DEL23_P10 Harsh Garg (IIT2020082), Anuj Chaturvedi (IIT2020019), Sameer Ahme (IIT2020053), Siddhant Agarwal (IIT2020228) Brain Tumor Segmentation Using Deep Learning The paper discusses deep learning models for segmenting MRI images, specifically the UNET model for brain tumor segmentation. In a simple UNET, local features are lost during encoding or downsampling, resulting in constant learning as the model goes deeper. The use of spatial attention helps preserve local features, and the modified architecture involves additional modules in the encoding path whose outputs are concatenated during upsampling or decoding.
DEL23_P11 Himanshu Mittal (MML2022012), Khyati Kavathiya (MML2022014), Harshit Gupta (MML2022017) Image Deraining using Deep Learning The goal of the present study is to solve the problem of image deraining, which entails removing raindrops and rain streaks from photographs taken in a wet environment. The suggested method uses a deep CNN architecture to learn an end-to-end mapping between the two domains using pairs of clean and rainy photos. By using both local and global contextual information, the network seamlessly removes rain while maintaining visual characteristics and patterns. Experimental findings on benchmark datasets show that the proposed technique outperforms state-of-the-art methods in terms of both quantitative measurements and visual quality. The resulting model has the potential to be applied to a variety of technologies, including driverless vehicles, security cameras, and outdoor photography.
DEL23_P12 Mohd Wasif (IIT2020227), Sanjay Ram (IIT2020247), Mohd Sarfaraz (IIT2020242), Yogiraj Chaudhari (IIT2020254) X-ray Images Abnormality Detection using Transformer Due to pulmonary fibrosis, virus-infected lungs differ structurally from healthy lungs, which can be seen using chest X-ray imaging. By separating chest X-ray images between those of normal lung tissue and those that have been infected by a virus, our project simulates the diagnostic procedure. On the Chest X-Ray dataset, we improved R50+ViT-B/16.
DEL23_P13 Pushkal Madaan (IIT2020005), Ritej Dhamala (IIT2020006), Jitesh Kumar (IIT2020224), Rahul (IIT2020244) Unsupervised Image Retrieval using Deep Learning In this project, we have implemented an unsupervised image retrieval system with the help of the CNN autoencoder, RESNET, and CBIR (Content-Based Image Retrieval) framework. The model design consists of an autoencoder that converts the input data into a new representation in the form of feature vectors from which we evaluate the similarity between images using the KNN (K-Nearest Neighbor) method. Images that are closer to one another in the latent space resemble one another more than those that are further apart. Images then selected are transformed into their original representations from the decoder part of the autoencoder.
DEL23_P14 Bhavesh Kumar Bohara (MML2022013), Harsh Kag (MHC2022015), Ankit Raj Ravi (MRM2022003), Abhijeet Pratap Singh (MDE2022003) Enhancing Image Quality with a Generative Adversarial Network Despite improvements in accuracy and speed, a significant problem still exists when employing faster and deeper convolutional neural networks for single image super-resolution. In this paper, we create a generative adversarial network (GAN) for image super-resolution (SR), known as SRGAN. The use of SRGAN results in highly significant improvements in perceptual quality, according to a thorough mean-opinion-score (MOS) test. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than those obtained with any state-of-the-art method.
DEL23_P15 Aditya (MRM2022004), Manish Kumar (MML2022009), Naveen Sharma (MHC2022009), Dasaroju Jagannadhachari (MHC2022005) Improving Generalization of Deep Learning Models Deep learning models have achieved state-of-the-art results on a variety of tasks, but they are often susceptible to overfitting. Data augmentation is a technique that can be used to improve the generalization of deep learning models by artificially increasing the size and diversity of the training set. In this paper, we present a comprehensive survey of data augmentation techniques for deep learning. We discuss the different types of data augmentation techniques, their benefits and drawbacks, and their applications to different deep learning tasks. We also present a number of case studies that demonstrate the effectiveness of data augmentation in improving the generalization of deep learning models.
DEL23_P16 Aditya Vaishy (MHC2022017), Hritu Raj (MHC2022007), Vivek Kumar Soni (MHC2022010), Aditya R Patil (MHC2022014) Micro Expression Recognition using Deep Learning Face recognition is an important part of human-computer interaction, which has important applications in many fields such as emotion, crime, behavior, and social engineering. The aim of this project is to classify human facial expressions into one of seven groups using a Convolutional Neural Network (CNN). We build the image recognition system using the Python programming language and the TensorFlow library, with a CNN architecture comprising many convolutional, pooling, and fully connected layers. The system is trained on a large image dataset using stochastic gradient descent optimization algorithms and data augmentation techniques to avoid overfitting. In this report, we discuss the process and results of our project. Furthermore, the paper addresses the challenges associated with real-world applications of micro expression recognition. These challenges include variations in lighting conditions, head poses, occlusions, and individual differences in facial expressions. The impact of these factors on the performance of micro expression recognition systems is analyzed, along with potential strategies for mitigating their effects. Finally, the paper highlights the diverse applications of micro expression recognition, ranging from psychological research on deception detection and emotion analysis to practical applications in security, human-computer interaction, and clinical diagnosis. The ethical implications and considerations surrounding the use of micro expression recognition technology are also discussed.
DEL23_P17 Mitul Varshney (IIT2020145), Divyansh Gupta (IIT2020207), Kavita (IIT2020252) Improving Data Augmentation for Deep Learning This study proposes a breast cancer image classification method using Convolutional Neural Networks (CNNs) with the Inceptionv3 architecture. The dataset used is the BreaKHis dataset, which contains 9,109 microscopic images of breast tumor tissue that are categorized into benign and malignant tumors. The images are preprocessed using the gryds.Interpolate() and gryds.BSplineTransformation() methods for image augmentation. The Inceptionv3 architecture is chosen as the backbone with pretrained weights from the ImageNet dataset and is fine-tuned by adding additional convolutional layers, with the Adam optimizer used during training. The model is trained on the augmented dataset, and its performance is evaluated by comparing the predicted labels with the true labels of the images. The aim of this study is to improve data augmentation so that the effectiveness and accuracy of the model improve, supporting better diagnosis and treatment for patients with breast cancer.
DEL23_P18 Anupam Dwivedi (IIT2020198), Perisetla Sri Satwik (IIT2020060), Marpina Srujana (IIT2020208), Savala Deepika (IIT2020164) Semantic Segmentation using Lightweight Transformer Our goal is to employ a compact transformer model. Many domains, including NLP and computer vision, rely on transformers. They can be made more cost-effective while maintaining their efficiency, which makes them better suited for use in devices with limited resources. We concentrated on semantic segmentation using a simple transformer model. Semantic segmentation aims to assign to each pixel of an image the class of what is being represented there, so a prediction is made for every pixel in the image.
DEL23_P19 Vibhu Garg (IIT2020028), Anubhav Rao (IEC2020103), Ekagra Sinha (IIT2020070), Anurag Patel (IIT2020253) Sentiment Analysis on Code-Mixed Data Sentiment analysis on code-mixed data, which refers to the phenomenon of using two or more languages in a single statement or text, is the focus of this study. Sentiment analysis is the process of extracting subjective information, such as views or feelings, from text data. However, due to the intricacy of language mixing and the dearth of annotated datasets, sentiment analysis on code-mixed data poses particular difficulties. The research will concentrate on creating efficient methods for performing sentiment analysis on code-mixed data. To do this, a dataset of code-mixed text data will be created, annotations for sentiment will be added, and machine learning models will be created to categorise the sentiment of the text. We will investigate various methods, including classical classification models and algorithms based on neural networks.
DEL23_P20 Sahil Pote (IIT2020240) Hyperspectral Image Classification Hyperspectral image classification is an important task in remote sensing, with applications in agriculture, forestry, and environmental monitoring. Recent research has explored the use of deep learning methods for hyperspectral image classification, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, these methods may still struggle to capture the full complexity of hyperspectral data. This work performs hyperspectral image classification using transformers, which may offer improved accuracy and efficiency compared to traditional methods.
DEL23_P21 Akash Verma (RSI2023001) Transformer for Small Datasets Vision Transformers have already caught up with the performance of convolutional neural networks in many tasks; however, they generally rely on pre-training with very large-scale datasets such as JFT-300M. Applying them to small datasets without any pre-trained model requires certain techniques, which can be combined and built upon further to improve performance.
DEL23_P22 Vatsala Upadhyay (RSI2022509), Nitu Kumari (RSI2022506), Seema Singh (RSI2022508) Medical Image Classification using Self-supervised Learning Self-supervised learning has become a popular choice for analyzing medical images because it annotates the given unstructured data and uses these self-generated labels as ground truth for subsequent model training rounds. Early diagnosis of Alzheimer’s disease (AD) is an important task that facilitates the development of treatment and preventive strategies and can improve patient outcomes. This project focuses on AD brain MRI classification using SimCLR self-supervised learning for training with the ResNet50 model. In the pretext task, random cropping is used for data augmentation, and for fine-tuning evaluation the data set is divided into splits of 1%, 10%, 20%, 50%, and 100%. Finally, the model performance is evaluated for accuracy on the different data splits.
DEL23_P23 Ravi Saroj (RWI2023003) Advances and Applications of Medical Image Segmentation Techniques Medical image segmentation using deep learning is an active area of research that has shown promising results in recent years. Deep learning algorithms, specifically convolutional neural networks (CNNs), have been widely used for medical image segmentation due to their ability to learn complex features and structures from medical images. This work covers the current state of medical image segmentation using deep learning. Additionally, this report covers the CVC-ClinicDB dataset and the benchmarks used for evaluating the performance of medical image segmentation using deep learning.
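
Illustrative sketch for DEL23_P13: the retrieval step in that abstract (feature vectors compared in latent space, nearest neighbours returned) can be illustrated roughly as below. This is only a minimal sketch and not the team's implementation. It assumes the autoencoder has already produced the feature vectors; the function name retrieve_nearest, the toy 2-D vectors, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def retrieve_nearest(query_vec, gallery_vecs, k=3):
    """Return indices and distances of the k gallery vectors closest to the query.

    Euclidean distance is an assumption here; the abstract only says KNN is used.
    """
    # Distance from the query to every gallery feature vector.
    dists = np.linalg.norm(gallery_vecs - query_vec, axis=1)
    # Sort from nearest to farthest and keep the first k indices.
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

# Toy 2-D "latent" vectors standing in for encoder outputs (hypothetical data).
gallery = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.8, 0.1]])
query = np.array([1.0, 0.1])

idx, dist = retrieve_nearest(query, gallery, k=2)
print(idx, dist)
```

With real data, each row of gallery would be the encoder output for one database image, and the returned indices would select the images to show as retrieval results.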


Grading
  • C1 (30%)
  • C2 (30%)
  • C3 (40%)


Prerequisites
  • Computer Programming
  • Data Structures and Algorithms
  • Machine Learning
  • Image and Video Processing
  • Ability to deal with abstract mathematical concepts


The content (text, images, and graphics) used in this slide is adopted from many sources for academic purposes. Broadly, the sources have been given due credit. However, some original primary sources may have been missed. The authors of this material do not claim any copyright over such material.