Projects

A mix of computer-vision, deep-learning, and classical machine-learning work — spanning research, course projects, and side experiments.

Interpretability in Image Classification Techniques

How do computer vision models work internally, and how can we interpret their inner workings? These are questions many researchers have been trying to answer. Although many CNN architectures exist (ResNet, GoogLeNet, VGGNet, DenseNet, and others), we often don't know how they work internally or what kind of intuition these models build in their intermediate layers before classifying an image. This work is an independent study I pursued under the guidance of Professor Nik Brown, in which I tried to interpret these models.

Gender and Age Prediction from Face Images

Gender and age prediction is a fairly challenging task. Determining whether a person is male or female is a simple binary classification problem, but estimating a person's age is much harder: it is difficult to judge age from appearance alone, and looks can be surprisingly deceiving. This project tackles the challenge of predicting human age and gender from face images.
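A natural way to frame this is as a multi-task network: shared convolutional features feeding a binary gender head and a scalar age-regression head. A toy sketch (layer sizes and names are hypothetical, not the project's actual architecture):

```python
import torch
import torch.nn as nn

class AgeGenderNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extractor (illustrative depth only)
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gender_head = nn.Linear(16, 1)  # binary classification logit
        self.age_head = nn.Linear(16, 1)     # age as a regression target

    def forward(self, x):
        h = self.features(x)
        return self.gender_head(h), self.age_head(h)

net = AgeGenderNet()
gender_logit, age = net(torch.randn(2, 3, 64, 64))
print(gender_logit.shape, age.shape)  # torch.Size([2, 1]) torch.Size([2, 1])
```

Training would combine a binary cross-entropy loss on the gender logit with an L1/L2 loss on the age output.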

Text-to-Image Metamorphosis using AttnGANs

Text-to-Image Metamorphosis is the translation of text into an image; essentially, it is the inverse of image captioning. In image captioning, given an image, we develop a model that generates a caption describing the underlying scene. Text-to-Image Metamorphosis instead generates an image from a piece of text by understanding its language semantics. In this project we generated images of birds from captions describing the bird's properties.
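The core of AttnGAN's conditioning is word-level attention: each image region attends over the caption's word features to build a per-region context vector. A NumPy sketch of that mechanism (region/word counts and dimensions are illustrative, and the features here are random rather than learned):

```python
import numpy as np

rng = np.random.default_rng(0)
regions = rng.normal(size=(17, 32))  # image-region features (toy sizes)
words = rng.normal(size=(5, 32))     # caption-word features, same dim

scores = regions @ words.T                      # region-word similarity
scores -= scores.max(axis=1, keepdims=True)     # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over words

context = weights @ words   # word-context vector for each region
print(context.shape)  # (17, 32)
```

In the full model these context vectors condition the generator so that, e.g., "red wings" influences the region where wings are drawn.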

Image Captioning using COCO Dataset

What does a particular scene in an image mean? Given an image, it is easy for humans to describe the scene and the elements in it, but doing so in a generalizable way is a fairly challenging problem for a computer. This project explores image captioning and how images are paired with descriptive text by training a hybrid CNN-LSTM network.
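The CNN-LSTM wiring can be sketched in a few lines: the CNN compresses the image into a feature vector that seeds the LSTM's hidden state, and the LSTM then decodes word embeddings into next-word scores. All sizes and the stand-in encoder below are toy choices, not the trained model:

```python
import torch
import torch.nn as nn

vocab, embed_dim, hidden = 1000, 64, 128

encoder = nn.Sequential(                 # stand-in for a pretrained CNN
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, hidden),
)
embedding = nn.Embedding(vocab, embed_dim)
lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
to_vocab = nn.Linear(hidden, vocab)

image = torch.randn(1, 3, 224, 224)
h0 = encoder(image).unsqueeze(0)         # image features seed the LSTM state
tokens = torch.tensor([[1, 5, 9]])       # partial caption (toy token ids)
out, _ = lstm(embedding(tokens), (h0, torch.zeros_like(h0)))
logits = to_vocab(out)                   # next-word scores at each step
print(logits.shape)  # torch.Size([1, 3, 1000])
```

At inference time the decoder is run autoregressively, feeding each sampled word back in until an end-of-caption token.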

Mouse Brain MRI Image Classification with Data Parallelism

Mouse brain MRI scans are used in medical research to understand brain activity when testing a new drug before public release. The scans are taken after slicing the mouse brain along three planes: horizontal, sagittal, and coronal. Given a set of mouse brain MRI scans cut along these three planes, we want to identify the plane class of a given scan. This project acquires high-resolution (5000x5000) histopathology images, chunked into smaller tiles across servers, and auto-stitches them into an image at a specified magnification level. These images are then scaled to 227x227 for inference using pretrained ResNet-50, Inception-ResNet, and ResNeXt-50 models, trained via transfer learning with ImageNet weights as initializers.

American Sign Language (ASL) detection using CNNs

As humans, we are privileged to be able to talk and converse with each other. But what about people who cannot speak? They often find it hard to convey their ideas. This project is an effort to develop a system that helps people who cannot speak communicate better with others. We identify all 26 English letters along with 3 special characters: space, delete, and blank. The goal is a real-time system that translates signs on the fly.
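The label space is 29 classes (26 letters plus the 3 special characters), with a CNN mapping each frame to one of them. A toy sketch (the tiny network and input size are illustrative only):

```python
import string
import torch
import torch.nn as nn

# 26 letters + the 3 special classes described above
classes = list(string.ascii_uppercase) + ["space", "delete", "blank"]

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, len(classes)),
)
logits = net(torch.randn(1, 3, 64, 64))   # one hand-sign frame
pred = classes[logits.argmax(dim=1).item()]
print(len(classes), pred in classes)  # 29 True
```

For the real-time goal, the same forward pass is applied per webcam frame, with the predicted letters accumulated into words.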

Text Generation using Long Short-Term Memory (LSTM) Networks

Traditional neural networks, and deep learning models in general, cannot store information across successive layers (i.e., these architectures lack a memory component over time). LSTMs, however, can remember relationships between instances, and we can exploit this property to generate new sequences from a model trained on large sequence-based data. In this project, given a large corpus of essays written by Paul Graham (a computer scientist), we generate new sequences by learning the underlying semantics with LSTMs.
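The generation loop boils down to: encode a seed string, run it through the LSTM, and sample the next character from the output distribution. A sketch with an untrained model and a toy vocabulary (the real project trains on the essay corpus first):

```python
import torch
import torch.nn as nn

chars = sorted(set("hello world"))          # toy character vocabulary
stoi = {c: i for i, c in enumerate(chars)}

embed = nn.Embedding(len(chars), 16)
lstm = nn.LSTM(16, 32, batch_first=True)
head = nn.Linear(32, len(chars))

seed = torch.tensor([[stoi[c] for c in "hello"]])
out, _ = lstm(embed(seed))
probs = torch.softmax(head(out[:, -1]), dim=-1)  # next-char distribution
next_char = chars[torch.multinomial(probs, 1).item()]
print(next_char in chars)  # True
```

Repeating this step, appending each sampled character to the input, yields arbitrarily long generated text in the style of the training corpus.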

Avazu Click-Through Rate (CTR) Prediction

In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance. As a result, click prediction systems are essential and widely used for sponsored search and real-time bidding. This project is an effort to predict CTR given data from Avazu. First, EDA is performed on the data, followed by a preprocessing pipeline of standardizing features, automatic outlier detection, Variance Inflation Factor (VIF) analysis, automatic feature selection, and balancing the imbalanced data using SMOTE. Models are then developed using XGBoost, Multi-Layer Perceptrons, and Logistic Regression. Finally, a 95% confidence interval is constructed for how likely an ad is to be clicked.
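A heavily condensed stand-in for that pipeline, on synthetic imbalanced data: feature scaling plus a class-weighted logistic regression producing per-ad click probabilities (the real project additionally applied SMOTE, VIF analysis, XGBoost, and MLPs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # synthetic ad features
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.2).astype(int)  # rare clicks

clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced"),   # cheap imbalance handling
)
clf.fit(X, y)
p_click = clf.predict_proba(X[:3])[:, 1]           # click probability per ad
print(p_click.shape)  # (3,)
```

`class_weight="balanced"` is used here only as a lightweight substitute for the SMOTE oversampling step described above.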

King County Housing Price Prediction

Real estate is a booming business, and with machine learning we can better understand housing price trends and estimate the price of a house from parameters such as number of bedrooms, geographic location, and square footage. This project models the housing price data of King County, USA using Generalized Linear Models (GLMs), Gradient Boosting Machines (GBMs), and Random Forests (RFs), with L2 regularization (ridge regression). The data was subjected to normality tests using Q-Q plots, and homoscedasticity was checked and maintained. The K best features were selected using ANOVA and variance-thresholding methods, and the final models were evaluated with R-squared and adjusted R-squared metrics.
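The feature-selection-then-ridge portion of the workflow fits naturally into a scikit-learn pipeline. A sketch on synthetic data (the real project used the King County dataset and several model families):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # stand-ins for bedrooms, sqft, ...
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)  # synthetic prices

# Keep the K best features by F-score, then fit an L2-regularized model
model = make_pipeline(SelectKBest(f_regression, k=4), Ridge(alpha=1.0))
model.fit(X, y)
print(model.score(X, y) > 0.8)  # R-squared on the training data; True
```

Adjusted R-squared can then be computed from this score together with the sample count and the number of selected features.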

Amazon Product Reviews Sentiment Prediction

Understanding user sentiment for a product is one of the key metrics in the e-commerce space for judging a product's success and driving future decisions about it. In this project, we used Selenium to scrape ~5000 reviews of the Amazon Kindle Paperwhite; each review came with a user rating. The textual reviews were analyzed and encoded using four techniques: word-level TF-IDF (Term Frequency-Inverse Document Frequency), character-level TF-IDF, n-gram-level TF-IDF, and count vectorizers. The transformed data was modeled with Naive Bayes to predict the user rating on a scale of 0 (bad) to 5 (great) from the review text.
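The word-level TF-IDF variant of that pipeline is a two-step scikit-learn chain. A sketch on a few hand-made reviews (the project used the ~5000 scraped reviews and the three other encodings as well):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus with ratings as labels
reviews = ["great screen, love it", "terrible battery, bad buy",
           "love the display", "bad device, awful"]
ratings = [5, 1, 5, 1]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(reviews, ratings)

pred = clf.predict(["love this great device"])[0]  # predicted rating
print(pred)
```

Swapping `TfidfVectorizer` for `CountVectorizer`, or setting its `analyzer`/`ngram_range` parameters, yields the character-level, n-gram, and count-vectorizer variants.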

Dysphonia Detection Using Gaussian Mixture Models

Dysphonia is a voice disorder that usually occurs in people who use their voice heavily, such as singers and teachers. It affects the larynx (the voice box, the organ forming an air passage to the lungs), introducing hoarseness into the voice. Subjective diagnostic methods exist but rely on the human sense of hearing; this work is an effort to create an objective solution to dysphonia detection. The speech signals are preprocessed to extract approximately 25 speech features, which include but are not limited to Mel-Frequency Cepstral Coefficients (MFCCs), jitter and shimmer, and short-time energy. This low-dimensional feature space is projected to a higher dimension using i-vectors, a processing technique built on a Universal Background Model (UBM) based on a Gaussian Mixture Model (GMM). The high-dimensional features serve as a speaker-recognition prior, after which the transformed speech signals are classified using Support Vector Machines (SVMs), K-Nearest Neighbors (KNN), and Naive Bayes.
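The GMM-then-classifier stage can be sketched with scikit-learn. Here synthetic Gaussian clusters stand in for the ~25-dimensional speech-feature vectors, a small GMM plays the role of the UBM, and its posteriors augment the features before SVM classification (the real pipeline's i-vector extraction is omitted):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(100, 25))     # ~25 features per sample
dysphonic = rng.normal(1.5, 1.0, size=(100, 25))   # shifted synthetic class
X = np.vstack([healthy, dysphonic])
y = np.array([0] * 100 + [1] * 100)

ubm = GaussianMixture(n_components=2, random_state=0).fit(X)  # toy UBM
post = ubm.predict_proba(X)              # GMM posteriors as extra features
X_aug = np.hstack([X, post])

clf = SVC().fit(X_aug, y)
acc = clf.score(X_aug, y)
print(acc > 0.9)  # True
```

In the full system, the KNN and Naive Bayes classifiers would be fit on the same augmented representation for comparison.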