Indian Institute of Information Technology, Allahabad

Computer Vision and Biometrics Lab (CVBL)

Deep Learning

July-Dec 2021 Semester


Course Information

Objective of the course: To expose students and researchers to state-of-the-art deep learning techniques and approaches, to show how results can be optimized for greater efficiency, and to provide hands-on experience that reinforces the important concepts.

Outcome of the course: As deep learning has demonstrated its tremendous ability to solve real-world learning and recognition problems, the software industry has accepted it as an effective tool, and there has been a paradigm shift in the learning and recognition process. Students and researchers should acquire knowledge of this important area and learn how to approach a problem, including whether or not a deep learning solution is appropriate. After undergoing this course they should be able to decide which algorithm to use for which kind of problem. Students will learn ways to regularize a solution and optimize it as per the problem requirement, and will be exposed to the background mathematics involved in deep learning solutions. They will be able to deal with real-time problems and problems being worked on in industry. Taking this course will substantially improve their standing in the machine learning community, both as intelligent software developers and as mature researchers.



Class meets
Monday: 08.00 am - 10.00 am, Tuesday: 06.00 pm - 08.00 pm, Thursday: 10.00 am - 12.00 pm; Remote

Schedule - Lectures

Date Topic Optional Reading
L01: August 02: 08.00 AM - 09.00 AM Introduction Lecture: Linear Machines and Learning
Slide, Recorded Video
L02: August 02: 09.00 AM - 10.00 AM Supervised Learning Algorithms
Slide, Recorded Video
L03: August 05: 10.00 AM - 12.00 PM Linear Classifiers
Slide, Recorded Video
L04: August 09: 08.00 AM - 10.00 AM Support Vector Machines
Slide, Recorded Video
L05-06: August 16: 08.00 AM - 10.00 AM Neural Networks and Pre-Deep Learning Essentials
Slide, Recorded Video
L07-08: August 24: 06.00 PM - 08.00 PM Deep Learning: Introduction, Motivation and Status
Slide, Recorded Video1 Recorded Video2
L09: August 26: 10.00 AM - 11.00 AM Convolutional Neural Networks
Slide, Recorded Video
L10: August 30: 08.00 AM - 09.00 AM CNN Essentials
Slide, Recorded Video
L11: August 30: 09.00 AM - 10.00 AM CNN Performance
Slide, Recorded Video
L12: September 02: 10.00 AM - 12.00 PM Activation Functions
Slide, Recorded Video
L13-14: September 06: 08.00 AM - 10.00 AM Loss Functions and Regularization
Slide, Recorded Video1 Recorded Video2
L15: September 07: 06.00 PM - 07.00 PM Transfer Learning
Slide, Recorded Video
L16-17: September 23: 10.00 AM - 12.00 PM CNN Architectures for Image Classification
Slide, Recorded Video
L18-19: September 27: 08.00 AM - 10.00 AM CNN Architectures for Object Detection
Slide, Recorded Video
L20-21: September 30: 10.00 AM - 12.00 PM CNN Architectures for Image Segmentation and Dense Prediction
Slide, Recorded Video
L22: October 11: 08.00 AM - 09.00 AM Adversarial Attack: Fooling Deep Learning Models
Slide, Recorded Video
L23-25: October 11: 09.00 AM - 10.00 AM & October 12: 06.00 PM - 08.00 PM Generative Adversarial Networks
Slide, Recorded Video(Oct 11), Recorded Video(Oct 12)
L26-28: October 21: 11.00 AM - 12.00 PM & October 28: 10.00 AM - 12.00 PM Recurrent Neural Networks
Slide, Recorded Video(Oct 21), Recorded Video(Oct 28)
L29: October 24: 11.00 AM - 12.00 PM Special Lecture on Face Recognition under Surveillance by Dr. Satish Kumar Singh
Lecture Slide
L30: October 26: 06.00 PM - 07.00 PM Special Lecture on Biometric Security by Prof. Pritee Khanna (IIITDM Jabalpur)
Recorded Video
L31: October 26: 07.00 PM - 08.00 PM Special Lecture on DeepFakes by Dr. Kiran Raja (NTNU Norway)
Recorded Video
L32-33: October 27: 06.00 PM - 08.00 PM Special Lecture on Biometrics Recognition using Data Analytics and Predictive Technologies by Prof. Sanjay Kumar Singh (IIT BHU) and Deep Learning for 3D Biometric by Prof. Surya Prakash (IIT Indore)
Recorded Video

Schedule - Tutorials and Labs

Date Topic Optional Reading
TL01-02: July 27: 06.00 PM - 08.00 PM Introduction to Python
Recorded Video
TL03-04: July 29: 10.00 AM - 12.00 PM Introduction to Python
Recorded Video
TL05-06: August 03: 06.00 PM - 08.00 PM Introduction to Python
Recorded Video
TL07-08: August 10: 06.00 PM - 08.00 PM Project Discussions
TL09-10: August 12: 10.00 AM - 12.00 PM Project Discussions
TL11-12: August 17: 06.00 PM - 08.00 PM Project Discussions
TL13-14: August 19: 10.00 AM - 12.00 PM Project Discussions
TL15-16: Sept 28: 06.00 PM - 08.00 PM Project Discussions
TL17-18: Oct 04: 08.00 AM - 10.00 AM Project Discussions
TL19-20: Oct 05: 06.00 PM - 08.00 PM Project Discussions
TL21-22: Oct 07: 10.00 AM - 12.00 PM Project Work
TL23-24: Oct 18: 08.00 AM - 10.00 AM CRP2 Assessment
TL25-26: Oct 21: 10.00 AM - 12.00 PM CRP2 Assessment

Computational Projects Added to Teaching Laboratories

Project ID Team Project Title Abstract
DEL21_P01 Abhinav Batta (IIT2018010), Ashish Patel (IIT2018175), Shubham Soni (IIT2018177), Rohit Haolader (IIT2018008) Subjective Assessment of Handwritten Digits Our main goal in this project is to automate handwriting assessment using deep learning. The project aims to score the handwriting of toddlers based on how close it is to the ground truth; in effect, our model acts as a teacher scoring the handwriting of students. We take the ground truth to be the ideal form of each character and use the lower-case English alphabet as our starting point.
DEL21_P02 Sawant Mrigyen (IIT2018033), Anish Gir Gusai (IIT2018044), Avneesh Gautam (IIT2018050), Avinash Kumar (IIT2018054) Optimizing Resource Consumption of Capsule Networks on End Devices Capsule networks improve on CNNs for specific computer vision tasks, but their resource requirements are still too high for deployment on resource-constrained devices. To make capsule network architectures more viable on such devices, we introduce one optimization to the capsule network: the Tucker decomposition. The use of Tucker decomposition reduces inference time by 50% and reduces the number of network parameters by 20%, a significant improvement over the baseline capsule network implementation (see the illustrative sketch after the project list).
DEL21_P03 Ishan Agrawal (IIT2018081), Ankit Singh (IEC2018076), Shakti Majhwar (IEC2018006), Amit Yadav (IEC2018081) Personalized Recommendation System using ANNOY In this work, we implemented a real-time image recommendation system using ResNet and ANNOY. The aim of this project is to increase the efficiency of traditional recommendation by drastically decreasing prediction time in a real-time environment (see the illustrative sketch after the project list).
DEL21_P04 Nehal Singh (IIT2018119), Puja Kumari (IIT2018191), Prabha Kumari (IIT2018195), Akshat Solanki (IEC2018055) Theme based colorization We consider image transformation problems in which an input image is transformed into an output image that combines a content image and a style image. Recent approaches to such problems typically train feed-forward convolutional networks using a per-pixel loss between the output and ground-truth images. Concurrent work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pre-trained networks. In this project we use convolutional neural networks for image transformation tasks and train them with a perceptual loss function computed from a VGG19 model (see the illustrative sketch after the project list).
DEL21_P05 Harsh Kochar (IIT2018049), Tushar Singh Parte (IIT2018035), Md. Atif Ayaz (IIT2018047), Nikhil Gujrati (IIT2018048) Deep Reinforcement Learning Model: Asynchronous Advantage Actor Critic in the Domain of Algorithmic Trading The goal of this work is to construct a time-series neural network model that implements the A3C reinforcement learning algorithm to handle and optimize the profit margin from cryptocurrency trading. This is achieved by using previously observed patterns in the prices of two of the most well-known cryptocurrencies, Bitcoin and Ethereum, and trying to predict the future curve of their pricing graphs.
DEL21_P06 Rahul Kumar (IIT2018013), Krishna Kant Chaudhary (IIT2018019), Ankit Kumar Das (IIT2018020), Jeet Mandal (IIT2018039) Deep Learning Techniques for Player Performance Prediction in Cricket Finding the correct combination of players is one of the most important tasks in any team sport. The performance of a player can depend upon a variety of factors. In cricket, these factors include the current form, the venue, the opposition, the format, the average strike rate, etc. A batsman's performance is maximized by scoring as many runs as possible, whereas a bowler's is maximized by taking the maximum possible wickets while conceding the minimum possible runs. In our project, we try to predict the performance of a cricketer, i.e. runs scored and/or wickets taken, via classification models built from past statistics. The number of runs and the number of wickets are split into different ranges for the purpose of discretization. We use Multi-Layer Perceptron and Decision Tree classifiers to generate the classification models and compare their performance; the Multi-Layer Perceptron is found to be the more accurate of the two, beating the Decision Tree's accuracy by a significant margin.
DEL21_P07 Divyanshi Raisinghani (IIT2018022), Roshni Prajapati (IIT2018059), Manisha Kumari (IIT2018062), Trupti Pendharkar (IIT2018097) Sentiment Analysis on Twitter Data using Deep Learning With the progress of technology and the growth of social networking sites, a huge collection of product reviews and corresponding opinion polarities has been created. These data can be used effectively for market-related objectives such as product recommendation, market prediction, and understanding reviewer sentiment. However, it is very difficult to manage the unstructured data available on social networking websites, so we adopt a deep learning approach. In this report we describe a CNN-LSTM based deep learning method with a pre-trained embedding layer that learns to extract features automatically for sentiment analysis, classifying reviews or opinions into two polarities, positive and negative.
DEL21_P08 Tarun Phate (MIT2020080), Ranjith Kalingeri (MIT2020017), Amit Kushwaha (MIT2020031), Majithia Tejas Vinodbhai (MIT2020058) Lung Cancer Detection using 3D Convolutional Neural Networks Lung cancer has one of the highest morbidity and mortality rates in the world. Lung nodules are an early indicator of lung cancer; therefore, accurate detection and image segmentation of lung nodules is of great significance for early diagnosis. This paper proposes a CT (Computed Tomography) lung nodule segmentation method based on 3D-CNN and 3D-GoogleNet. 3D-GoogleNet enables the network to extract fine-grained and coarse-grained features of lung nodules to differentiate between cancerous and non-cancerous nodules. The method was trained and tested on the SPIE-AAM public dataset, where 3D-GoogleNet reached an accuracy of 78.2% and the 3D-CNN reached 57%. Considering that the models were trained on a small dataset, these results are reliable.
DEL21_P09 Aayushi Thakur (IIT2018009), Ankita Chandra (IIT2018053), Kumar Utkarsh (IIT2018007), Soumyadeep Basu (IIT2018001) Intrusion Detection System Using Deep Learning An intrusion detection system is an essential information security technology that uses deep learning and machine learning algorithms to detect host- and network-level anomalies through traffic information collected at a discrete point in the network. With recent advancements, malicious attacks over the network have also grown significantly, demanding a scalable solution that can detect and analyse these attacks dynamically. Our paper proposes an advanced intrusion detection system that uses deep learning to detect unpredictable, unforeseen, and continuously evolving cyber attacks, analyse their types, and enhance network performance, thereby improving the security of the network without compromising the user experience. The proposed solution is implemented using a deep neural network (DNN), a deep learning model, over real-time data to create an effective and flexible intrusion detection system (IDS).
DEL21_P10 Akhil Shukla (IIT2018112), Parag Goyal (IIT2018164), Akshit Aggarwal (IIT2018166), Yash Katiyar (IIT2018170) Human Pose Estimation using CNN We propose a method for human pose estimation based on Convolutional Neural Networks (CNNs). The pose estimation is formulated as a CNN-based regression problem towards body joints. We present a cascade of such CNN regressors which results in high-precision pose estimates. The approach has the advantage of reasoning about pose in a comprehensive manner and has a simple yet powerful formulation which capitalizes on recent advances in deep learning. We present a detailed empirical analysis with state-of-the-art or better performance on two datasets, COCO and MPII, consisting of diverse real-world images.
DEL21_P11 Shubham Kumar (IIT2018146), Kaustubh Chetan Parmar (IIT2017042), Abhinav Bansal (IIT2018155), Ankit Raj (IIT2018174) Text Sentiment Analysis using Deep Learning In this work, a systematic and well-structured method for text sentiment analysis is presented. We use the BERT (Bidirectional Encoder Representations from Transformers) model for sentiment analysis. BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. We use a pretrained transformer as our embedding layer and train only the remainder of the model, a multi-layer bidirectional GRU that learns from the representations produced by the transformer (see the illustrative sketch after the project list).
DEL21_P12 Kushagr Garg (IIT2018107), Aditya (IIT2018161), Sushant Singh (IIT2018171), Vijit Jain (IEC2018086) Handwritten Equation Verification We propose an auto-checking mechanism for mathematical equations that can save educators the precious time they spend checking students' submissions themselves. There have not been substantial strides in this domain; the only major advancement in auto-checking has been Optical Mark Recognition, which is old, restricted to multiple-choice questions, and tightly bound to a fixed format, and many candidates face problems with OMR sheets in exams. To address these issues we have built a deep learning model that can auto-check freehand handwritten documents containing mathematical equations. It allows learners to freely express their knowledge of a problem while considerably reducing the effort required of educators. Further details are given in the report.
DEL21_P13 Ashutosh Mishra (IIT2018133), Nandini Goyal (IIT2018173), Kartic Choubey (IIT2018181), Eishaan Singh (IIT2018183) Computer Aided Detection for Glaucoma using Machine Learning Glaucoma is a principal cause of visual impairment among working-age people worldwide. Diagnosing glaucoma from colour fundus images requires specialists to identify the presence and significance of many small features, which, together with a complex grading system, makes it a very tedious task. Many publicly available datasets contain very few images and do not cover other eye diseases. Here we attempt to develop a framework with a Convolutional Neural Network (CNN) architecture and data augmentation that can recognize the distinctive features relevant to the classification task, such as exudates, microaneurysms, and haemorrhages on the retina. We then train this framework using a state-of-the-art graphics processing unit (GPU). The open-source RIGA, G1020, and DRISHTI-GS datasets are used as input for glaucoma.
DEL21_P14 Deepak Katre (IIT2018116), Moksh Grover (IIT2018186), Tejas Mane (IIT2018135), Aditya Kamble (IIT2018126) Classification of COVID-19 patients using Chest X-ray Images COVID-19 is the illness caused by a novel coronavirus, now known as severe acute respiratory syndrome coronavirus, which was categorized as an outbreak of respiratory infection. The coronavirus pandemic is ongoing; it has spread quickly between people and has reached about 257 million people worldwide, with the second wave causing the utmost destruction in many countries. There is a need for a mechanism that is scalable, reliable, and fast, since currently available methods suffer from limited and low-quality samples. We propose an accurate and efficient deep learning model for the detection of COVID-19, and for better results we apply image super-resolution to the dataset. Image super-resolution is the technique of restoring high-resolution images from lower-resolution images. The analysis is performed on a publicly available dataset. The model classifies COVID-19 patients with an accuracy of 97.36%, and these results can help radiology specialists reduce the false detection rate.
DEL21_P15 Raktim Bijoypuri (IIT2018125), Ayush Raj (IIT2018188), Harsh Bajaj (IIT2018190), Raunak Rathour (IIT2018196) Multiclass classification of Brain MRI for tumor using CNN and Image Augmentation Brain tumour is one of the most fatal diseases, and early diagnosis and treatment are required for a cure. Manual classification of MRI scans is complex and time-consuming, so the problem here is to automate the classification of brain tumours using machine learning algorithms and image processing techniques. We use transfer learning with pre-built convolutional neural network architectures such as ResNet, VGGNet, and InceptionV3, which are pre-trained on the ImageNet dataset.
DEL21_P16 Rohit Rai (MIT2020098), Gaurav Kumar (MIT2020099), Anoop Kumar (MIT2020123) VGGNet - LSTM based model for VQA-RAD This work presents a VGGNet-LSTM based VQA model that answers questions about images. We discuss three major components: a CNN-based image model, an LSTM-based question model, and a dense neural network that concatenates the outputs of the two models, followed by a softmax activation function to obtain the final answer. The image model uses a CNN to obtain image representations; specifically, VGGNet is used to extract a feature map from the raw image. The question is passed through the LSTM, and the preprocessed image features are passed through a dense network with tanh activation. Experiments have been conducted on the VQA-RAD dataset, which consists of 3,063 radiological images of different parts of the human body, mainly the head, chest, and abdomen.
DEL21_P17 Mandeep Kumar (MIT2020019), Akbar Ansari (MIT2020086), Nikhil Jaiswal (MIT2020076), Apoorv Bhardwaj (MIT2020113) Object detection over Indian Driving Dataset Object detection, which combines object categorization and object localization within a scene, is one of the most challenging tasks in computer vision. It entails detecting the existence, position, and type of one or more objects in an image, and requires integrating methodologies for object localization (where the objects are) and object categorization (what they are). Deep learning algorithms have recently reached state-of-the-art object detection results on standard benchmark datasets and in computer vision competitions. The YOLO family of convolutional neural networks, for example, produces near state-of-the-art results with a single end-to-end model that can perform detection in real time.
DEL21_P18 Mudit Goyal (2018132), Abhishek Kumar Gupta (IIT2018187), Shiv Kumar (2018134), Karan Chatwani (IIT2018194) Handwritten word recognition in JPEG compressed domain using deep learning The ability of an automated system to receive and interpret messages in handwritten form has been a field of interest for the past few decades. The approach discussed in this paper uses a hybrid of RNN and CNN to achieve handwritten word recognition for images in the JPEG compressed domain.
DEL21_P19 Chinmay Sethi (IIT2018182), Gourav Yadav (IIT2018024), Himanshu Janbandhu (IIT2018025), Priyatam Reddy Somagattu (IIT2018093) Detection of Pneumonia by Chest Radiography with Combination of Deep Learning-SVM Based Model Chest radiography is one of the most commonly used clinical tests in medical imaging diagnosis. Pneumonia is a common infection that greatly affects the human lungs. The aim of this research is to provide a deep learning based approach for pneumonia detection using a combination of SVM and ResNet, and to compare the results with those obtained when only ResNet is used. In this article we propose the application of a pretrained Convolutional Neural Network (CNN) to extract feature vectors on which classifiers are then applied.
DEL21_P20 Anshul Agarwal (IWM2017008), Manav Vallecha (IRM2017007), Vardhan Malik (IWM2017007), Kumar Raju Bandi (IWM2017502) Adversarial Networks for Image-to-Image Translation In this work, we demonstrate a generalised solution for learning a mapping from one dataset of images to another. This is done using adversarial networks which are conditional in nature: unlike other generative adversarial networks, the generated data in this case is conditioned on the input images from the source dataset. Recent studies related to this problem are discussed, followed by a detailed report on the methodology and implementation. The performance of this solution is demonstrated on varied datasets, such as converting satellite images to map images and converting edges to photos. The corresponding results are included along with the conclusion and the future scope of this work.
DEL21_P21 Onkar Telange (IIT2018065), Shreyansh Sahu (IIT2018073), Sahma Anwar (IIT2018074), Harshit Kumar (IIT2018075) Deep Learning using Multi-Neural Networks Deep learning is an ever-growing domain of computer science that has ushered in a new era in which computers are capable of performing human-like tasks of detection, identification, and classification. Central to these capabilities is a special form of neural network, the Convolutional Neural Network (CNN), which uses convolution to learn low-level features from image samples. CNNs perform exceptionally well for simple classification problems, but with growing complexity they sometimes fail to capture the essence of the data due to problems such as overfitting and vanishing gradients. The vanishing gradient, in particular, is a problem of deep networks with many layers, which may be necessary to capture sophisticated patterns in complex scenarios. Had it not been for residual networks, CNNs might have been effective only up to a certain degree of complexity. Though we can use skip connections to construct very deep CNNs, the training time for such networks increases manifold. It has been observed that performing some form of aggregation (either layer-wise or at the output) of two or more networks results in increased efficiency and generalization, but then we need to train multiple such networks. In order to overcome this significant training time, we propose to train multiple neural networks simultaneously while connecting them in a common weight space, and we compare this arrangement with the other forms of aggregation mentioned above.
DEL21_P22 Rajat Sharma (2018141), Gowni Sai Pavan (IIT2018139), Mallampalli Maheswarnath (IIT2018137), Yatharth Parate (IIT2018157) Sketch-to-Image Synthesis using Deep Learning Our work is on generating RGB images from sketches. This has many real-world applications: for example, police and other agencies can use it to generate RGB images from sketches based on eyewitness information and match them against databases of potential criminals. It also has a wide range of applications in the animation industry, where sketch-to-RGB conversion can be used for cost-efficient animation.
DEL21_P23 Achyut Dubey (MIT2020082), Riya Shah (MIT2020083), Anshul Mishra (MIT2020084), Samiksha Gupta (MIT2020105) Leveraging LSTM and Bi-Directional LSTM Model to predict Stock Price Stock price forecasting is becoming very popular in the financial world. Stock price prediction is important for the growth of shareholders in a company's stock because it increases the interest of speculators in investing money in the company, and a successful prediction of a stock's future price can bring good returns. Various approaches have been used for forecasting stock trends in previous years. A new stock price prediction framework is proposed in this study, based on two popular Recurrent Neural Network (RNN) models: the Long Short Term Memory (LSTM) model and the Bi-Directional Long Short Term Memory (BI-LSTM) model. According to the simulation results, our proposed scheme can predict future stock trends with high accuracy using these RNN models with proper hyper-parameter tuning. The RMSE for both the LSTM and BI-LSTM models was calculated while varying the number of hidden layers, epochs, hidden-layer units, and dense layers, to find a better model that can accurately forecast future stock prices (see the illustrative sketch after the project list). The evaluations are carried out on a freely available stock-market dataset with open, high, low, and closing prices.
DEL21_P24 Nikhil Goyal (IRM2017005), Prabhakar Kumar (IRM2017008), Siddharth Gupta (IRM2017002), Ritesh Yadav (IRM2017001) Image Restoration using Generative Adversarial Network This report describes our deep learning project, titled "Image Restoration using Generative Adversarial Network". In this work, we restore images that have undergone several types of degradation over time using a deep learning approach. Compared with the assumptions of traditional restoration methods, which use supervised learning, real photos are degraded in a much more complex manner, and such traditional methods fail to generalize. Therefore, we use generative adversarial networks (GANs) to approach this problem.
DEL21_P25 Pankaj P (MIT2020114), Arun Kumar (MIT2020116), Pankaj Kumar Saini (MIT2020117) NextWord Prediction The next-word prediction feature in text generation helps users type without errors and at a faster pace, so a personalized text prediction system is valuable for any language. Recurrent models are well suited to this task because the sequential nature of their output (the current output depends on previous outputs) helps them cope with next-word prediction successfully.
DEL21_P26 Shubham Chandra Joshi (MIT2020047), Pranjit Das (MIT2020011), Kunwar Pratap (MIT2020048), Chandra Mani Rai (MIT2020112) A Contactless Palm Based Person Identification Using CNN Due to the COVID-19 pandemic, automated contactless person identification based on the human hand has become vital and an appealing biometric trait, since people are expected to cover their faces with masks and are advised to avoid touching surfaces. It is well known that contact-based hand biometrics usually suffer from issues such as deformation due to uneven pressure distribution or improper placement on the sensor, as well as hygiene concerns. To mitigate such problems, contactless imaging is expected to collect hand biometric information without any deformation, leading to higher person recognition accuracy while addressing hygiene and pandemic concerns. Towards this aim, this paper proposes an effective multi-biometric scheme for personal authentication based on contactless fingerprint and palmprint selfies. For simplicity and efficiency, three local methods are employed to extract salient features from contactless fingerprint and palmprint selfies: normalization, palm ROI extraction, and Local Tetra Pattern (LTrP) features (for image indexing). Experimental results on a publicly available database (the IIT-Delhi touchless palmprint dataset) show that the proposed contactless multi-biometric selfie system can easily outperform uni-biometrics.
DEL21_P27 Raju Yadav (MIT2020065), Soraj Kandari (MIT2020066), Maneesh Sagar (MIT2020062), Sharad Goyal (MIT2020119) Music Genre Classification using Deep Learning Music genre classification is a vital activity that involves categorizing music genres from audio data, and it is frequently utilized in the field of music information retrieval. The proposed framework involves three main steps: data pre-processing, feature extraction, and classification. A convolutional neural network (CNN) is used to tackle genre classification: the system feeds spectrograms generated from slices of songs into the CNN to classify the songs into their genres. A recommendation system is also implemented after the classification process, which aims to recommend songs based on each user's preferences and interests. Extensive experiments carried out on the GTZAN dataset show the effectiveness of the proposed system with respect to other methods.
DEL21_P28 Nikhil Ojha (MIT2020081), Priyank Makwana (MIT2020045), Vaibhav Sharma (MIT2020015), Priyush Kumar (MIT2020055) Incorrect Face Mask Detection The coronavirus disease (COVID-19) has spread rapidly across the world, causing a worldwide catastrophe whose impact is being felt not just economically but also socially and in terms of human lives. Of the many mechanisms to fight this disease, wearing a face mask was found to be among the most effective, but the effectiveness of face masks is diminished mostly by improper wearing. In this study we develop a face-mask-wearing identification system that classifies whether the person in a given 2D image is wearing the face mask correctly, wearing it incorrectly, or not wearing a mask at all. For this three-class classification problem we use MTCNN for face detection and a convolutional neural network (CNN) as the classification network. The proposed method consists of image pre-processing, face detection and cropping, and face-mask condition classification. The dataset used for this study is the publicly available Medical Mask Dataset, containing 3,835 images, each with multiple faces: 8,490 faces in total, of which 1,741 wear no mask, 6,520 wear a mask correctly, and 223 wear a mask incorrectly. The proposed model achieved 86% accuracy. The findings of our study indicate that the proposed model can identify face-mask-wearing conditions with high accuracy, and since it automatically detects faces it can be used with a video surveillance system, giving it potential applications in COVID-19 pandemic prevention.
DEL21_P29 Yashwant Kumar Chandra (MIT2020030), Kartik Singhal (MIT2020046), Pradeep Kumar (MIT2020053), Sunil Kumar Maurya (MIT2020073) Hate Speech Detection Using Deep Learning This work proposes an experimental approach to detecting hate speech using deep learning methods, and compares traditional approaches with deep learning based approaches.
DEL21_P30 Kapil Deshpande (MIT2020040), Prashant Sikarwar (MIT2020042), Ankit Kumar (MIT2020041), Jitendra Kumar Sahu (MIT2020038) Automated Essay Grading System Using Deep Learning Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. Essay evaluation is time-consuming: a teacher devotes a huge amount of time to evaluating essays because of their subjectivity. AES can therefore reduce cost and time enormously and can remove the unfairness that arises when many different teachers evaluate the papers. In this report we propose a deep LSTM based model for AES. To make our results more accurate and precise, we also introduce an attention layer in the deep LSTM model. It can perform much better than traditional feature-based approaches, which typically rely on hand-crafted features to predict essay quality and are limited by the cost of feature engineering.
DEL21_P31 Nagendra Tomar (IIT2018057), Ravi Kumar (IIT2018094), Giriraj Chandak (IIT2018095), Ankit Kumar (IIT2018100) Mask RCNN: Image Segmentation and classification One of the most significant achievements in the field of machine learning has been object detection, which seeks to identify all instances of a known class of objects in a photograph, such as people, cars, or faces. Deep learning algorithms have recently been used to detect objects, and previous systems have had issues with viewpoint changes and occlusion. This work provides a method for automatic object detection by incorporating instance segmentation at the pixel level, building on a number of R-CNN approaches. The proposed Mask R-CNN identifies objects in images and adds bounding boxes, class labels, and masks to them. The Mask R-CNN model was trained and tested on the COCO dataset (see the illustrative sketch after the project list). The future scope of this work is also discussed, so that the model's resilience and reliability can be increased further.
DEL21_P32 Nayan Agarwal (IWM2017501), Shubham (IWM2017004), Aman Gupta (IWM2017006), Anshul Kumar (IWM2017002), KL Rohit (IWM2017009) Detection of Diabetic Retinopathy Using Convolutional Deep Neural Networks This project deals with the detection of diabetic retinopathy and the determination of its stage. DR can be diagnosed by studying the interior surface of the eye, which includes the retina, optic disc, fovea, and macula. The project will then be deployed via a web app or website as per industrial requirements.
DEL21_P33 Madhushekhar (IRM2017004), Pratham Singh (IRM2017006) JPEG Artifact Removal using Deep Learning Complex compression artifacts, such as blocking artifacts, ringing effects, and blurring, are introduced by lossy compression. Some methods either remove blocking artifacts and create blurred output, or restore sharpened pictures with ringing effects. For seamless attenuation of various compression artifacts, we propose a compact and efficient network. We also employ layer decomposition and the combined usage of large-stride convolutional and deconvolutional layers to try to speed up the model.
DEL21_P34 Akhilesh Kumar (RWI2021001) Target Detection using DETR – a customization study We explored the DETR model proposed in the "End-to-End Object Detection with Transformers" paper by the team at Facebook AI. The authors demonstrated interesting object-detection results from the DETR model, which triggered our curiosity to use the model for detection of custom objects/targets. Here, we present a way to train the DETR model with pre-trained weight initialization over a custom dataset, and we examine the results obtained as the number of training iteration cycles increases. The results demonstrate significant improvement with respect to the number of training epochs, both visually and statistically.
DEL21_P35 Neeraj Baghel (RSI2021003) SRT: Generating Super-Resolution Images using Transformers Image super-resolution aims at recovering a high-resolution image from its low-resolution counterpart. The task has witnessed great strides with the development of deep learning; however, more complex neural networks bring high computational costs and memory requirements. It is still an active area, offering the promise of overcoming resolution limitations in many applications. In recent years, Transformers have made significant progress in computer vision tasks thanks to their robust self-attention mechanism, but because of their high computational cost there has been no complete transformer network designed for image super-resolution. To address this problem, we propose a complete transformer network, SRT (Super-Resolution Transformer). The SRT network consists of two transformer networks: a transformer generator network and a transformer discriminator network. We introduce a novel Transformer Encoder Generator module that takes image patches as input and generates the ×2 and ×4 resolution images progressively. This module can help to decrease GPU memory utilization.
DEL21_P36 Arindam Ghosh (RSI2021001) A Study of VGG16 Network on Dog Breed Classification Computer vision is among the most rapidly emerging fields of AI research in recent decades. Several new technologies have been developed for image classification, object detection, etc., and the convolutional neural network is a significant development in this field for image classification, detection, and segmentation. In this paper, we discuss an experiment on image classification using the VGG network. We take a dog dataset and classify the dog breed among 120 classes such as Afghan Hound, Dingo, and Doberman. We use two approaches: training the VGG network from scratch, and transfer learning with a pre-trained VGG network (see the illustrative sketch after the project list).
DEL21_P37 Pradeep K. Pant (RWI2020001) Object Detection in Scientific Plots Reasoning over plots is a task that requires multiple capabilities, e.g. object detection, OCR, extracting data into a semi-structured format, etc. The PlotQA dataset was introduced in 2020 with 28.9 million question-answer pairs over 224,377 plots of real-world data. PlotQA was introduced because the existing datasets for reasoning over plots (FigureQA, DVQA) do not contain variability in data labels, real-valued data, or complex reasoning questions, and models based on these datasets perform very poorly on reasoning over scientific plots. One of the main reasons these models do not perform well is that they work with a fixed-size vocabulary: they assume that the answer comes either from a small fixed-size vocabulary or from a bounding box within the image. In practice, this is an unrealistic assumption because many questions require reasoning and thus have real-valued answers which appear neither in a small fixed-size vocabulary nor in the image. PlotQA aims to bridge this gap between existing datasets and real-world plots; in particular, 80.76% of the out-of-vocabulary (OOV) questions in PlotQA have answers that are not in a fixed vocabulary. PlotQA uses a hybrid approach: specific questions are answered by choosing the answer from a fixed vocabulary or by extracting it from a predicted bounding box in the plot, while other questions are answered with a table question answering engine fed with a structured table generated by detecting visual elements in the image. On the existing DVQA dataset, our model has an accuracy of 58%, significantly improving on the highest previously reported accuracy of 46%. On PlotQA, our model has an accuracy of 22.52%, which is significantly better than state-of-the-art models. In this work we have tried some enhancements to the existing PlotQA model by introducing the DETR algorithm for end-to-end object detection during the visual element detection (VED) phase. Initial results show an improvement in accuracy on a sliced dataset.
DEL21_P38 Ashok Yadav (RSI2021002) Plant Leaf disease classification using MobilenetV2 Agriculture is a major contributor to the economies of developing countries, and as a developing nation we must raise production for GDP to grow. Plant diseases cause massive productivity losses in agriculture every year. The majority of farmers in our country lack proper knowledge of these diseases and are unable to detect them manually, yet the damage can be limited if a disease is correctly identified in its early stages. We are therefore working on a deep learning model that classifies leaf diseases from photographs using a convolutional neural network; the MobileNetV2 architecture was employed because it is lightweight and extremely useful for mobile devices. The model achieves a validation accuracy of 90.38 percent. With this technique, the agricultural sector can assist farmers in classifying diseases prior to harvest. The major purpose of our model is to minimize damage to infected plants, which can help production grow, and farmers can save money by tackling this problem on their own. Our goal is for them to be able to treat their crops at the appropriate moment.
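
Illustrative code sketches for selected projects

For project DEL21_P02, the following is a minimal illustrative sketch (not the team's actual implementation) of applying a Tucker decomposition to a 4-D convolution kernel with the tensorly library and counting the resulting parameters. The kernel shape and target ranks are assumptions chosen only for illustration.

```python
# Hypothetical sketch: Tucker-decompose a convolution kernel and compare parameter counts.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

kernel = np.random.randn(256, 128, 9, 9)          # (out_ch, in_ch, kH, kW) placeholder kernel
core, factors = tucker(tl.tensor(kernel), [64, 32, 9, 9])   # target Tucker ranks per mode (assumed)

original = kernel.size
compressed = core.size + sum(f.size for f in factors)
print(f"parameters: {original} -> {compressed} ({compressed / original:.2%} of the original)")
```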
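
For project DEL21_P03, a minimal sketch of the general ResNet-plus-ANNOY retrieval pattern: a pre-trained ResNet-50 with its classification head removed produces image embeddings, and an ANNOY index returns approximate nearest neighbours as recommendations. The ResNet variant, dummy images, and index parameters are illustrative assumptions, not the team's configuration.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from annoy import AnnoyIndex
from PIL import Image

# Pre-trained ResNet-50 with the classification head removed acts as a feature extractor.
resnet = models.resnet50(pretrained=True)
resnet.fc = torch.nn.Identity()                  # 2048-d embeddings instead of class scores
resnet.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(image):
    """Return a 2048-d feature vector for one PIL image."""
    x = preprocess(image.convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return resnet(x).squeeze(0).numpy()

# Build an approximate nearest-neighbour index over catalogue images
# (solid-colour dummy images stand in for a real catalogue here).
catalogue = [Image.new("RGB", (224, 224), color=(i * 8 % 256, 64, 128)) for i in range(32)]
index = AnnoyIndex(2048, "angular")
for i, img in enumerate(catalogue):
    index.add_item(i, embed(img))
index.build(50)                                  # 50 random-projection trees

# Recommend the 5 catalogue items closest to a query image.
query = Image.new("RGB", (224, 224), color=(120, 64, 128))
print(index.get_nns_by_vector(embed(query), 5))
```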
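
For project DEL21_P04, a minimal sketch of a perceptual (feature-reconstruction) loss computed from VGG19 activations, as commonly used for style and content transfer. The chosen layer and the absence of input normalization are simplifying assumptions; the team's exact loss formulation may differ.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PerceptualLoss(nn.Module):
    """MSE between VGG19 feature maps of a generated image and a target image."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(pretrained=True).features[:23]   # layers up to relu4_2 (illustrative choice)
        for p in vgg.parameters():
            p.requires_grad = False                          # the loss network stays frozen
        self.vgg = vgg.eval()
        self.mse = nn.MSELoss()

    def forward(self, generated, target):
        # Compare deep feature maps instead of raw pixels.
        return self.mse(self.vgg(generated), self.vgg(target))

# Toy usage with random tensors standing in for network output and content target.
loss_fn = PerceptualLoss()
out = torch.rand(1, 3, 256, 256)
content = torch.rand(1, 3, 256, 256)
print(loss_fn(out, content).item())
```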
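
For project DEL21_P11, a minimal sketch of the described architecture: a frozen pre-trained BERT encoder provides token representations, and only a bidirectional GRU head with a linear classifier is trained on top. The checkpoint name, hidden size, and two-class output are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertGRUSentiment(nn.Module):
    def __init__(self, hidden_dim=256, n_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():
            p.requires_grad = False                       # train only the GRU head
        self.gru = nn.GRU(self.bert.config.hidden_size, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            embeddings = self.bert(input_ids=input_ids,
                                   attention_mask=attention_mask).last_hidden_state
        _, hidden = self.gru(embeddings)                  # hidden: (2, batch, hidden_dim)
        hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.fc(hidden)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
model = BertGRUSentiment()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)   # (2, 2): one score pair (positive/negative) per sentence
```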
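
For project DEL21_P23, a minimal sketch of next-step price regression with an LSTM and a bidirectional LSTM over sliding windows of past closing prices, written with Keras for brevity. The window length, layer sizes, synthetic price series, and training settings are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

def make_windows(prices, window=60):
    """Turn a 1-D price series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(prices) - window):
        X.append(prices[i:i + window])
        y.append(prices[i + window])
    return np.array(X)[..., None], np.array(y)

prices = np.cumsum(np.random.randn(1000)) + 100.0   # placeholder for a real closing-price series
X, y = make_windows(prices)

lstm_model = Sequential([LSTM(50, input_shape=(X.shape[1], 1)), Dense(1)])
bilstm_model = Sequential([Bidirectional(LSTM(50), input_shape=(X.shape[1], 1)), Dense(1)])

for model in (lstm_model, bilstm_model):
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=32, verbose=0)
    rmse = float(np.sqrt(model.evaluate(X, y, verbose=0)))   # RMSE as in the project's evaluation
    print(type(model.layers[0]).__name__, "RMSE:", rmse)
```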
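
For project DEL21_P31, a minimal sketch of instance segmentation inference with torchvision's Mask R-CNN pre-trained on COCO, which returns bounding boxes, class labels, confidence scores, and per-instance masks. The random input tensor and the confidence threshold are placeholders.

```python
import torch
import torchvision

# Mask R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)          # placeholder for a real RGB image tensor in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]       # dict with boxes, labels, scores, masks

keep = prediction["scores"] > 0.5        # keep confident detections only
boxes = prediction["boxes"][keep]        # (N, 4) bounding boxes
labels = prediction["labels"][keep]      # COCO category indices
masks = prediction["masks"][keep] > 0.5  # per-instance binary masks
print(f"{len(boxes)} instances above the 0.5 confidence threshold")
```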
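
For project DEL21_P36, a minimal sketch of the transfer-learning variant: a VGG16 backbone pre-trained on ImageNet is frozen and its classifier head is replaced for 120 dog-breed classes. Layer sizes follow torchvision's VGG16 head; data loading and the training loop are omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models

vgg16 = models.vgg16(pretrained=True)           # ImageNet-pre-trained backbone
for p in vgg16.features.parameters():
    p.requires_grad = False                      # freeze the convolutional feature extractor

vgg16.classifier[6] = nn.Linear(4096, 120)       # new head for the 120 breed classes

# Only the classifier parameters would then be optimised, e.g.:
optimizer = torch.optim.Adam(vgg16.classifier.parameters(), lr=1e-4)
```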

Grading

• C1 (30%): 10% Written + 20% Practice
• C2 (30%): 10% Written + 20% Practice
• C3 (40%): 20% Written + 20% Practice

Prerequisites

• Computer Programming
• Data Structures and Algorithms
• Machine Learning
• Image and Video Processing
• Ability to deal with abstract mathematical concepts

Disclaimer

The content (text, images, and graphics) used in these slides has been adopted from many sources for academic purposes. Broadly, the sources have been given due credit where appropriate. However, some original primary sources may have been missed. The authors of this material do not claim any copyright over such material.