VLR23-P01 |
IIT2020011 ANKIT KUMAR |
Image Super-resolution |
SwinIR gives strong results on
the task of image super-resolution.
In this paper we explain the
architecture of SwinIR and compare
the performance of different
super-resolution techniques. |
VLR23-P02 |
IIB2020008 SAMRIDDHI V WALIA, IIB2020014 MOHAN LAL AGARWALA, IIB2020502 ANIRUDDH SHARMA, IIT2020166 SHANTANU CHAUDHARY |
Human Counting in Crowded Scenario using DETR |
In this report, we present a Human
Detection and Counting System developed using
YOLOv3, a state-of-the-art deep learning algorithm for
real-time object detection. The primary objective of this
system is to provide efficient and accurate human
detection in various surveillance scenarios, ranging
from retail space monitoring to crowd management in
public transportation systems. The importance of this
system is underscored by its potential applications in
public safety and health, particularly in contexts like
monitoring crowd sizes for disease control purposes.
The YOLOv3 algorithm is chosen for its balance
between speed and accuracy, making it suitable for
real-time application scenarios. Our system
demonstrates its capability to effectively detect and
count humans in diverse and dynamic environments,
highlighting its potential as a versatile tool in
surveillance and monitoring applications. |
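A minimal sketch of the counting step the abstract describes, assuming OpenCV's DNN module and the standard Darknet YOLOv3 release files (yolov3.cfg / yolov3.weights); the report does not give its implementation, so file names and thresholds here are illustrative. Detections are filtered to the COCO "person" class, de-duplicated with non-maximum suppression, and counted.

```python
# Hypothetical sketch: counting people with a pretrained YOLOv3 model.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # assumed files
layer_names = net.getUnconnectedOutLayersNames()

def count_people(image_path, conf_thresh=0.5, nms_thresh=0.4):
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences = [], []
    for output in outputs:
        for det in output:              # det = [cx, cy, bw, bh, obj, 80 class scores]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if class_id == 0 and conf > conf_thresh:   # COCO class 0 = "person"
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(conf)

    # Non-maximum suppression removes duplicate boxes on the same person.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return len(keep)
```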
VLR23-P03 |
MML2022001 RUPESH G, MML2022004 RAJ AHAMED SHAIK, MML2022016 ASHUTOSH VERMA |
Single Image Dehazing |
Image dehazing is the process of generating clear,
haze-free images from hazy photographs. Although
convolutional neural networks are commonly used for this task,
image dehazing has yet to benefit from recent breakthroughs
in high-level vision problems achieved with vision Transformers. This
paper's authors investigate the use of the Swin Transformer for
image dehazing and propose DehazeFormer, which comprises
changes to the normalisation layer, activation function, and
spatial information aggregation approach. Several variants of
DehazeFormer were trained on different datasets to demonstrate
its efficiency. The large model outperformed all previous
state-of-the-art approaches on the SOTS indoor set, whereas the small
model outperformed FFA-Net with a substantially smaller number
of parameters and lower computational cost. The approach's effectiveness
on severely non-homogeneous haze was further evaluated using
a large realistic remote sensing dehazing dataset collected by the
researchers. |
VLR23-P04 |
IIT2020018 BOTTE SHREYA, IIT2020040 KATAM BALA PRASANNA BABU, IIT2020199 VELPULA VAMSHI, IIT2020217 VELAGANA NAGENDRA, IIT2020255 DONTHOJU RAGHAVA |
Cross Day-Night Image Classification |
Image classification under cross day and night
scenarios is a challenging problem in computer vision. The
challenge of training a model on daytime photos from six distinct
classes and assessing its performance on nighttime images from
the same classes is covered in detail in this article. In addition to
reviewing pertinent literature, describing the dataset, outlining
the approach, and presenting experimental findings, we also
explore the problems that this endeavour presents. The purpose
of the study is to clarify the potential obstacles in this task and
their possible solutions. |
VLR23-P05 |
IIT2020173 ANISH JAIN, IIT2020181 JINIYA SINGAL, IIT2020182 DABERAO AKSHAY GAJANAN, IIT2020185 PATEL SAURABH, IIT2020188 SOLANKI TANMAY MOHANBHAI |
Hand Gesture Recognition using Deep CNN |
Hand gesture recognition is a critical component
of human-computer interaction, offering a natural and intuitive means of
communicating with machines. In this paper, we present a novel approach to
automated hand gesture recognition using a deep convolutional
neural network model. The model is designed to address
challenges such as variations in hand pose, complex backgrounds,
and lighting conditions, so that it can work in real-world
applications. |
VLR23-P06 |
IIT2020031 RAUNAK KRISHAN JAISWAL, IIT2020033 ADITYA BISWAKARMA, IIT2020055 SAURABH KUMAR, IIT2020106 NEEL PATEL, IIT2020243 AKULA ABHIRAM |
Facial Micro-Expression Recognition using Deep Learning Techniques |
In computer vision, micro-expression (ME) detection
refers to the process of detecting micro facial expressions in still
images and videos. Our work presents a novel CNN-based
method, built on ME datasets, to measure real emotions.
MEs are very brief and involuntary, hard to notice even for humans,
and reveal the hidden emotions of the inner mind, which makes them
a challenging and promising area of research. Our model aims at
improving accuracy by overcoming the limitations of present
methods. As final-year BTech students in India, we hope our research
contributes to understanding hidden emotions, paving the way
for future investigations across various applications. |
VLR23-P07 |
IIT2020005 PUSHKAL MADAAN, IIT2020006 RITEJ DHAMALA, IIT2020008 AVISHKAR SINGH, IIT2020077 ANUSHKA AJIT DANDAWATE, IIT2020252 KAVITA |
Self-Supervised Image Retrieval |
In this paper we propose a self-supervised image retrieval system that works effectively and efficiently on large, unlabeled datasets, specifically on the satellite image dataset UC Merced. The system leverages a pre-trained ResNet50 model, which helps our main Siamese network learn efficiently from the semantic similarities of the unlabeled data. In addition, we use KNN to retrieve the most similar images to a queried test image and measure top-5 accuracy. The study also looks into how deep neural architectures might improve self-supervised retrieval systems. Design decisions that might impact the effectiveness of self-supervised models are examined, including architectural options, model complexity, and transferability between datasets. |
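A hedged sketch of the retrieval stage described above: embed images with a pre-trained ResNet50 backbone and return the five nearest neighbours of a query. The Siamese fine-tuning step is omitted, and all function names are illustrative rather than the authors' code.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.neighbors import NearestNeighbors

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # drop the classifier, keep 2048-d features
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(batch).numpy()

def build_index(gallery_paths):
    index = NearestNeighbors(n_neighbors=5, metric="cosine")
    index.fit(embed(gallery_paths))
    return index

def top5(index, gallery_paths, query_path):
    # Indices of the five gallery images closest to the query embedding.
    _, idx = index.kneighbors(embed([query_path]))
    return [gallery_paths[i] for i in idx[0]]
```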
VLR23-P08 |
MML2022002 HARSH, MML2022011 UMESH MAURYA |
Tiny Face Detection |
This project proposes an innovative approach for
the detection of small faces in photographs by leveraging the
power of Generative Adversarial Networks (GANs). Recognizing
small faces in real-world images has proven to
be a challenging task for existing face identification methods.
To address this challenge, a two-stage methodology is introduced,
where a GAN is initially employed to generate high-resolution
images of small faces. The GAN is trained on an extensive dataset
of facial photos and learns to generate high-quality images of
small faces while considering both the input image and the
desired face size.
These artificially generated small face images serve as a
valuable resource for augmenting the training data of a face
detection model. This face detector is trained separately using a
distinct set of images. By incorporating the generated images into
the training dataset, the face detection model becomes more adept
at accurately identifying small faces in photographs, enhancing
its overall performance. |
VLR23-P09 |
IIB2020030 MANISH KUMAR, IIT2020021 HARSHITA VYAS, IIT2020037 SAKSHI, IIT2020095 AMBIKESH ARMAN, IIT2020134 SHAH KRISHNA DINESHKUMAR |
Image Denoising using Image-to-Image Translation |
Image denoising, a crucial capability in today's visual
technology, involves the elimination of unwanted noise
from images. Although modern cameras capture high-resolution
pictures, obtaining noise-free images remains a challenge.
This necessitates pre-processing or post-processing techniques that
diminish noise without compromising image quality. Our approach
leverages "autoencoder" technology for image
denoising. Autoencoders have the ability to learn directly
from the provided data, producing a model based on data rather
than predefined filters, and they are trained to reconstruct an output
close to the input, which helps preserve image quality.
Though computational time remains a potential
concern, the benefits of employing autoencoders in denoising
applications, from medical systems to smartphone image enhancement,
are substantial. This paper explores the use
of autoencoders, a deep learning technique employing down-sampling
and up-sampling, as a solution to the denoising
problem. Keywords: image, noise, autoencoder, denoising. |
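A minimal sketch of a convolutional denoising autoencoder of the kind described above, written in Keras; the layer sizes and input shape are assumptions, not the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_denoiser(shape=(128, 128, 1)):
    inp = layers.Input(shape=shape)
    # Down-sampling (encoder): compress the noisy image.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    # Up-sampling (decoder): reconstruct a clean image of the same size.
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(shape[-1], 3, activation="sigmoid", padding="same")(x)
    return models.Model(inp, out)

model = build_denoiser()
model.compile(optimizer="adam", loss="mse")
# Training pairs: noisy inputs against clean targets, e.g.
# model.fit(noisy_imgs, clean_imgs, epochs=20, batch_size=32)
```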
VLR23-P10 |
IIB2020036 MIRIYALA POOJITHA, IIT2020144 PRANAV RAJ, IIT2020151 SHIVAM KATIYAR, IIT2020163 SARTHAK DALMIA, IIT2020205 ADITYA RAJ |
Drowsiness Detection using Faces |
Drowsiness has become a focus of
researchers' attention in recent years because it is the cause
of many traffic accidents; detecting it can also help determine
the need for rest. Drivers experience fatigue when
they drive for long periods of time, which affects their
driving ability and can lead to deaths and injuries in car
accidents. Fatigue can be caused by long driving,
discomfort, headache, alcohol, drugs, and other factors. This
research can therefore play an important role in the
lives of drivers and could save their lives. In this article,
we introduce an Android application that can detect
sleep, activity, and blink count. The app sounds an
alarm when the driver falls asleep and could thereby save the
driver's life. The app reports five values: drowsiness-state
percentage, sleep events, blink count, yawn count, and the
number of frames captured by the app's camera. It
logs this information and sounds a warning if the driver is
drowsy. |
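The report counts blinks; one common way to do this, shown below purely as an illustrative assumption (the app's actual method is not described), is the eye-aspect-ratio (EAR) over six eye landmarks: the ratio drops sharply while the eye is closed, so blinks can be counted as short runs of low EAR.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: np.ndarray of shape (6, 2), landmarks in the standard 68-point order."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical eyelid distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal eye width
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, closed_thresh=0.21, min_frames=2):
    # Threshold and frame count are illustrative tuning values.
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < closed_thresh:
            run += 1
        else:
            if run >= min_frames:          # eye was closed long enough: one blink
                blinks += 1
            run = 0
    return blinks
```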
VLR23-P11 |
IIT2020227 MOHD WASIF, IIT2020242 MOHD SARFARAZ, IIT2020247 SANJAY RAM, IIT2020254 CHAUDHARI YOGIRAJ PRAKASH, IIT2020259 ANKIT KUMAR |
Photo ID Retrieval from Arbitrary Face Query |
The "Photo ID Retrieval from Arbitrary Face Query" project aims to develop a sophisticated face recognition system capable
of identifying individuals based on their facial features. The project uses a CNN model for feature extraction and evaluates the
system's performance on the LFW (Labeled Faces in the Wild) dataset, a widely recognized benchmark for face recognition. This
report provides a comprehensive overview of the system, including data collection, preprocessing, feature extraction, database
creation, the face query process, visualization, experimental results, and a thorough discussion. |
VLR23-P12 |
IIB2020016 ANURAG HARSH, IIB2020018 ABHISHEK KUMAR, IIB2020024 VAIDIK SHARMA, IIB2020027 AMAN UTKARSH, IIT2020140 AYUSHI |
Image Caption Generation |
We present a deep learning model for image
caption generation. The model takes images
as input and generates captions for them.
We have made use of transfer learning,
word embeddings, and custom data generators
in building this model. We evaluated the
relevance of the generated captions using
the BLEU score and computed both individual
and cumulative BLEU scores. We used Python,
Keras, and TensorFlow to develop this model. |
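A short illustration of the evaluation described above: individual and cumulative BLEU scores computed with NLTK. The example caption pair is invented for demonstration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "beach"]]   # ground-truth caption(s)
candidate = ["a", "dog", "is", "running", "on", "the", "beach"]
smooth = SmoothingFunction().method1

# Individual n-gram scores: weight only one n-gram order at a time.
bleu1 = sentence_bleu(reference, candidate, weights=(1, 0, 0, 0),
                      smoothing_function=smooth)
bleu2 = sentence_bleu(reference, candidate, weights=(0, 1, 0, 0),
                      smoothing_function=smooth)

# Cumulative BLEU-4: geometric mean of 1- to 4-gram precisions.
bleu4 = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(bleu1, bleu2, bleu4)
```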
VLR23-P13 |
IIT2020052 SANJEET, IIT2020053 SAMEER AHMED, IIT2020082 HARSH GARG, IIT2020218 JITU RAJAK, IIT2020244 RAHUL |
Selfie vs Non-selfie Classification |
In the era of ubiquitous smartphone usage and
social media platforms, the line between personal and non-personal
images has blurred significantly. Selfies, self-portraits
typically taken with a smartphone camera, have become
a ubiquitous form of self-expression. However, automatically
distinguishing between selfie and non-selfie images presents an
interesting and challenging problem in computer vision. This
semester project aims to address the problem by developing a
robust image classification system that can accurately differentiate
between selfie and non-selfie images. |
VLR23-P14 |
MML2022009 MANISH KUMAR, MML2022013 BHAVESH KUMAR BOHARA, MML2022014 KAVATHIYA KHYATI HARESHBHAI |
Image Deraining |
Removing rain streaks from a single photograph
is challenging because rainy images usually contain
streaks of different densities, sizes, shapes, and directions. Most
current deraining methods use a deep network that follows a
broad encoder-decoder design, in which low-level features are
captured by the first layers and high-level features
at deeper layers. The rain streaks that must be
removed for deraining are quite small, so
stressing global features is not always an effective strategy for
resolving the issue. In this paper, we therefore propose an
overcomplete convolutional network architecture that
emphasizes learning local structures by restricting the
filters' receptive field. To compute the derained image, we
combine it with a U-Net to ensure that the model concentrates more on low-level
features without overlooking global structures. The proposed
over-and-under complete deraining network (OUCD) is
split into two branches: an undercomplete branch with larger
receptive fields that focuses on global structures, and an overcomplete branch that
focuses on local structures. Numerous
experiments on synthetic and real-world datasets demonstrate
that the proposed strategy performs better than recent
state-of-the-art techniques. |
VLR23-P15 |
IIT2020158 S ANURAG REDDY, IIT2020164 SAVALA DEEPIKA, IIT2020213 ANKADALA JEEVAN, IIT2020250 PULUKURI JAGADEESH, IIT2020266 NENAVATH ABHIRAM NAIK |
Homography Matrix Computation between Images using Deep Learning |
We introduce a deep convolutional neural network
designed to estimate the relative homography between two images. Our
feed-forward network has ten layers, takes two stacked
grayscale images as input, and produces an eight-degree-of-freedom
homography that maps the pixels from the first image to the
second. We present two convolutional neural
network architectures for HomographyNet: a regression
network, which directly estimates the real-valued homography parameters,
and a classification network, which produces a distribution over
quantized homographies. We employ a 4-point homography
parameterization, which involves projecting the four corners of one
image onto the other. Our networks are trained end-to-end on warped
MS-COCO images. Our method functions
without requiring separate stages for local feature detection and
transformation estimation. We compare our deep models with a
conventional homography estimator based on ORB features, and we
show the situations in which HomographyNet performs better than the
conventional method. We further highlight the versatility of the deep
learning approach by describing a range of applications driven by deep
homography estimation. |
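A sketch of the 4-point parameterization and training-data generation described above, following the warped-MS-COCO recipe: perturb the corners of a random crop, recover the ground-truth homography with OpenCV, and use the eight corner offsets as the regression target. Patch size and perturbation range are illustrative.

```python
import cv2
import numpy as np

def make_training_pair(gray, patch=128, rho=32):
    """gray: single-channel image array; returns (stacked patches, 8 offsets)."""
    h, w = gray.shape
    x = np.random.randint(rho, w - patch - rho)
    y = np.random.randint(rho, h - patch - rho)
    corners = np.float32([[x, y], [x + patch, y],
                          [x + patch, y + patch], [x, y + patch]])
    # Perturb each corner by up to +/- rho pixels: these 8 offsets are the label.
    offsets = np.random.randint(-rho, rho, size=(4, 2)).astype(np.float32)
    warped_corners = corners + offsets

    H = cv2.getPerspectiveTransform(corners, warped_corners)
    warped = cv2.warpPerspective(gray, np.linalg.inv(H), (w, h))

    patch_a = gray[y:y + patch, x:x + patch]
    patch_b = warped[y:y + patch, x:x + patch]
    # Network input: the two stacked patches; regression target: the offsets.
    return np.stack([patch_a, patch_b], axis=-1), offsets.ravel()
```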
VLR23-P16 |
IIT2020044 PRIYA DEVI, IIT2020060 PERISETLA SRI SATWIK, IIT2020065 DASA AKSHITHA, IIT2020196 KALYANI BHUSHAN PHARKANDEKAR, IIT2020208 MARPINA SRUJANA |
Face Recognition from Partial Faces |
Partial face recognition, a crucial
branch of facial recognition technology, identifies people from
partially hidden or missing facial features. Due to its potential
uses in security, surveillance, and human-computer interaction,
this specialized topic has grown in popularity. Faces frequently
appear under a variety of circumstances, such as
partial occlusion by masks or accessories and poor lighting, hence
the capacity to identify and verify persons using constrained
facial information is crucial in practice. This article examines
the difficulties, approaches, and developments in partial face
recognition, providing insight into how it is evolving in the
context of biometric identification and surveillance systems. |
VLR23-P17 |
MHC2022001 AMIT ROY, MHC2022011 BHARGAV BURMAN, MHC2022013 HARSHIT GUPTA, MML2022003 DIPANKAR KARMAKAR |
Plant Disease Classification |
The global population growth has resulted in a
scarcity of essential resources such as raw materials and food.
The agricultural industry has emerged as the primary
source for addressing this issue. However, the agricultural
industry as a whole faces significant challenges due to the
presence of pests and various crop diseases.
Plant diseases pose a significant challenge to global
agriculture, leading to substantial crop losses. The complexity and
difficulty of recognising illnesses in plants can be attributed to a
dearth of specialised expertise. Deep learning-based
models enable the diagnosis of plant diseases through
the analysis of leaf photographs. The primary challenges that
remain to be addressed in these algorithms include the necessity
for larger training sets, considerations related to computational
complexity, the issue of overfitting, and other associated difficulties.
This study centres on a novel machine learning model,
derived from convolutional neural networks (CNN), with the aim
of enhancing effectiveness. Additionally, it provides a concise
summary of the existing published solutions in this domain. In
order to increase the size of the training set without the need for
additional photographs, several augmentation techniques such
as shift, shear, scaling, zooming, and flipping are employed.
These approaches produce additional samples, hence
enlarging the training set. The Convolutional Neural Network
(CNN) model has been trained on a publicly accessible dataset
called PlantVillage. The purpose of this training is to accurately
detect and classify the presence of Early Blight and Late Blight
in potato leaves. |
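The augmentation recipe named above (shift, shear, scaling/zooming, flipping), sketched with Keras' ImageDataGenerator; the parameter values and directory layout are assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    width_shift_range=0.1,    # horizontal shift
    height_shift_range=0.1,   # vertical shift
    shear_range=0.2,          # shear
    zoom_range=0.2,           # scaling / zooming
    horizontal_flip=True,     # flipping
)

# Stream augmented batches from a PlantVillage-style directory tree, where
# each sub-folder (e.g. Potato___Early_blight) is one class (path assumed).
train_gen = augmenter.flow_from_directory(
    "PlantVillage/train", target_size=(224, 224),
    batch_size=32, class_mode="categorical",
)
```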
VLR23-P18 |
MHC2022005 DASAROJU JAGANNADHACHARI, MML2022005 PRAGATI, MML2022007 SAYANTAN CHAKRABORTY, MRM2022006 BEHERA JYOTHIKRISHNA |
Image Inpainting Using GAN |
Image inpainting is an important topic of research
in the field of image processing. The prime goal of image
inpainting is to recover missing regions in an image, along with
related tasks such as demosaicing. In this paper we discuss the
progress of our image inpainting project using a deep learning
model, namely a Generative Adversarial Network (GAN). The
project has been implemented in the Google Colaboratory
environment. |
VLR23-P19 |
IIT2020025 MANPREET SINGH, IIT2020032 KARTIK GUPTA, IIT2020219 Tanu Shree Suthar, IIT2020221 TUSHAR AGGARWAL |
Visual Grounding using CNNs |
Computer vision algorithms are typically trained to predict a
limited number of object types, limiting their generality and
applicability. Learning directly from raw text about images is a
promising alternative that takes advantage of a much larger supply of
supervision. On a large dataset of ~400 million image and text pairs
acquired from the internet, the simple pre-training task of
predicting which caption goes with which image is an efficient
and scalable way to learn state-of-the-art image
representations from scratch. After pre-training, natural
language is used to refer to learned visual concepts or
describe new ones, enabling zero-shot transfer of the model to
downstream tasks. The model applies readily to most tasks and is
frequently competitive with a fully supervised baseline without
requiring dataset-specific training. |
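A minimal sketch of the pre-training objective described above, predicting which caption goes with which image: a symmetric cross-entropy over the image-text similarity matrix (the CLIP-style contrastive loss). The encoders are abstracted away and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (batch, dim) embeddings of matched image/text pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(len(logits))               # row i matches column i
    # Symmetric loss over the image->text and text->image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```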
VLR23-P20 |
IIT2020009 AASHISH AGRAWAL, IIT2020010 RAJ CHHARI, IIT2020183 LOKESH MEHTA, IIT2020209 AADITYA RATHOD, IIT2020505 AKSHAT GHARIYA |
Viewpoint Invariant Scene Recognition of IIITA Campus using Deep Learning |
In this paper, we use deep learning approaches
to handle the challenge of viewpoint-invariant scene recognition
on the campus of the Indian Institute of Information Technology
Allahabad (IIITA). The principal aim is to create an intelligent
system that can identify different sights and landmarks
on the IIITA campus while maintaining robustness to
changes in viewpoint. We investigate the design and use of a
Convolutional Neural Network (CNN) model specifically built
for image classification. Our main goal is to use this model to
categorize photos from the "Campus Images Dataset," a
set of ten different categories. These categories cover the
campus and include admin, adminback, audi, cafeteria, cc2,
cc3, cc3back, library, mandir, and rm. |
VLR23-P21 |
IIB2020021 GAGAN BANSAL, MML2022006 MOHD FAIZ ANSARI, MML2022008 RAKSHIT SANDILYA, MML2022010 NIKHIL RAJPUT, MML2022012 HIMANSHU MITTAL |
Thermal to Visible Image Translation |
This report details the implementation and
assessment of a pix2pix GAN for thermal-to-visible image
translation. The project uses deep learning
to produce high-quality visible images from thermal photos. An
overview of the methodology, data collection, and pre-processing
procedures is provided in the report. The SSIM and PSNR
metrics were used to assess the model. The outcomes
demonstrate that the suggested method is successful in converting
thermal images into realistic, high-quality visible images. The
method may be used in a variety of fields, including
surveillance, search and rescue, and medical imaging. Overall, the
report shows how well the pix2pix GAN works for image translation
tasks and offers suggestions for further study in this field. |
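A short sketch of the evaluation step described above: PSNR and SSIM between a generated visible image and its ground truth, using scikit-image (the report does not say which library was used, so this is an assumption).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(generated, target):
    """Both images: uint8 RGB arrays of identical shape."""
    psnr = peak_signal_noise_ratio(target, generated, data_range=255)
    ssim = structural_similarity(target, generated,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```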
VLR23-P22 |
IIT2020154 SHIVEK PAMNANI, IIT2020160 ANUSHKA ARUN KALWALE, IIT2020179 KARUS MANISHA, IIT2020189 ROUNAK DEV, IIT2020190 MALYALA MEGHAMSH |
Clothing Outfit Rating using CNNs |
The fashion industry is changing, and it’s all because
of the internet. The way we shop and see what everyone else
is wearing has shifted to online platforms. Now, we need a
system that tells us how good our outfit is. With this digital
transformation came a need for automated clothing outfit rating
systems. The one presented in this paper uses Convolutional
Neural Networks (CNNs). Using deep learning and computer
vision techniques, our system analyzes and evaluates clothing
outfits based on many visual features. Some of those include color
combinations, clothing styles, and overall aesthetics. To make it
work, we took a pre-trained CNN architecture and fine-tuned it
with a large dataset of labeled clothing outfits. The methodology
works like this: the system first takes each piece of clothing in an
outfit to extract feature representations from them. Then they’re
combined to give the entire outfit an overall rating. We tested the
system on a wide range of outfits with different styles, colors,
and more. The results showed that our approach was highly
effective at providing accurate, meaningful ratings. |
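A hedged sketch of the fine-tuning step described above: a pre-trained CNN with its classification head replaced by a single regression output for the outfit rating. The backbone choice, frozen layers, and rating scale are assumptions, not the authors' exact setup.

```python
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():        # freeze the pretrained feature extractor
    p.requires_grad = False
backbone.fc = nn.Sequential(           # new trainable head: one rating score
    nn.Linear(backbone.fc.in_features, 1),
    nn.Sigmoid(),                      # rating normalised to [0, 1]
)
# Train with MSE against labelled ratings, e.g.:
# loss = nn.functional.mse_loss(backbone(images).squeeze(1), ratings)
```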
VLR23-P23 |
MRM2022002 AKASH TYAGI, MRM2022003 ANKIT RAJ RAVI, MRM2022004 ADITYA, MRM2022005 HIMANSHU MISHRA |
Impact of Different Activation Functions on ViT Model |
The advent of Vision Transformer (ViT) models
has heralded a novel approach in handling image data, veering
from traditional Convolutional Neural Networks (CNNs) towards
leveraging transformer architectures. A significant factor influencing
the ViT model’s performance and training dynamics is the
choice of activation functions, which induce the requisite nonlinearity
for complex pattern recognition. This study embarks
on an in-depth examination of various activation functions to
discern their impact on the ViT model’s effectiveness, training
dynamics, and computational efficiency across multiple datasets.
The aim is to furnish a nuanced understanding of how activation
functions affect the learning, generalization, and robustness of
ViT models, and provide empirical guidelines for their optimal
selection in different computer vision applications. Our findings
elucidate the critical role of activation functions, offering valuable
insights for the enhanced tuning and optimization of ViT models
in computer vision tasks. |
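A minimal sketch of the experimental knob studied above: a ViT-style feed-forward (MLP) block whose activation is injectable, so GELU, ReLU, SiLU, and others can be compared under otherwise identical settings. Dimensions are illustrative.

```python
import torch.nn as nn

class MLPBlock(nn.Module):
    """Transformer feed-forward block with a configurable activation."""
    def __init__(self, dim=384, hidden=1536, activation=nn.GELU):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            activation(),          # the swappable non-linearity under study
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

# One block per candidate activation, everything else held fixed:
variants = {name: MLPBlock(activation=act)
            for name, act in [("gelu", nn.GELU), ("relu", nn.ReLU),
                              ("silu", nn.SiLU)]}
```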
VLR23-P24 |
IIT2020007 SHUBHAM KUMAR BHOKTA, IIT2020022 RAHUL MAHTO, IIT2020024 SHASHIKANT THAKUR, IIT2020043 ROHIT CHOWDHURY, IIT2020220 MOHIT KUMAR |
Identification of Artificially Generated Images |
In our proposed approach, we use deep convolutional
neural networks (CNNs), in particular the ResNet
architecture, to distinguish between real and fake images. We
concentrate on unique patterns and features found at the
pixel level and in the structural properties of generated images. |
VLR23-P25 |
IIT2020067 ADITYA SINGH, IIT2020070 EKAGRA SINHA, IIT2020089 DEVESH KUMAR PARTE, IIT2020101 LUKESH NITIN PATIL, IIT2020105 JAMBHULE SAHAS DEVIDAS |
Student Counting in Classroom |
In this paper we propose a model to count the
number of students in a classroom-like
environment. The estimated count is
helpful for measuring crowd density and
may support related decisions or
predictions. |
VLR23-P26 |
RSI2022502 AJAY KUMAR YADAV |
Analysis of Robustness in Deep Learning Models |
|
VLR23-P27 |
RSI2023001 AKASH VERMA |
Efficient ViT Models for Small-scale Datasets |
|