

Introduction

In today's fast-paced world, road safety is a top concern for drivers and pedestrians alike. Shocking statistics reveal that nine people lose their lives every day in the United States in crashes involving distracted drivers. But what exactly constitutes distracted driving? According to the CDC, it encompasses three main types of distraction: visual (taking your eyes off the road), manual (taking your hands off the wheel), and cognitive (taking your mind off driving). These distractions range from using mobile phones for texting or calling to eating, drinking, personal grooming, or even adjusting the radio controls. In this blog post, we'll delve into how computer vision techniques aid our understanding of driver behavior to tackle this critical issue and enhance road safety.

Objective

Recognizing and addressing these distractions in real time can play a vital role in preventing accidents and saving lives. The goal of this project is to develop a system capable of accurately identifying various driver activities through computer vision techniques. The primary objective is to design and implement a deep learning model that can classify the different activities performed by drivers inside vehicles. Additionally, the system aims to detect and annotate objects such as bottles, cups/cans, and cell phones within the vehicle cabin using YOLO models, which provides additional context for driver activities and enhances distraction detection.

About the dataset

The dataset, sourced from Kaggle and provided by State Farm, comprises about 22,424 images captured inside vehicles, each depicting a driver engaged in one of 10 categories of activities: safe driving, texting, talking on the phone, operating the radio, drinking, reaching behind, doing hair and makeup, and conversing with a passenger. To discourage hand labeling, the test dataset contains about 79,726 unlabeled images (far more than the training set) and includes processed, resized images that are ignored during evaluation. The challenging part of this dataset is the vast number of test images the classification model must handle accurately. Additionally, detecting objects within the vehicle cabin introduces further complexity, as the model needs to localize and classify multiple objects simultaneously.

To get a better understanding of the input image data, here is a visual representation of the 10 classes of images with their labels:

class images

This bar chart shows the distribution of images in each category:

bar chart

Implementation and Approach

Classification of driver activity:

The first part involves implementing a custom CNN model using the Keras framework to classify the different activities performed by drivers. The model consists of multiple convolutional and pooling layers followed by fully connected layers for classification, trained with the Adam optimizer. To prevent overfitting, a dropout layer was included and early stopping was used to monitor the loss during training. Although pre-trained models are often convenient and may offer superior performance, I chose to design a custom CNN architecture to better understand its effectiveness for the task at hand. The performance metrics shown in Table 1 below suggest that the model performs effectively.
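
For illustration, here is a minimal sketch of this kind of Keras setup. The exact number of layers, filter sizes, dropout rate, and input size used in the project may differ; the values below are assumptions.

```python
# Minimal sketch of a custom CNN of the kind described above (layer sizes are illustrative).
from tensorflow.keras import layers, models, callbacks

def build_model(input_shape=(64, 64, 3), num_classes=10):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),                      # dropout to reduce overfitting
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Early stopping monitors the validation loss and restores the best weights seen.
early_stop = callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                     restore_best_weights=True)
```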

Table 1 - Performance scores of the classification model

To visually track the performance improvement over epochs, the graph below illustrates the accuracy and loss curves.

accuracy and loss plots

By iterating over the image metadata, the actual labels are retrieved from the class indices and compared against the labels predicted by the CNN model, with the results stored in a DataFrame. Evaluating the custom CNN on the training data yielded 22,215 correct predictions and 209 incorrect predictions, giving insight into the model's ability to identify different driver activities.
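
A hypothetical version of this evaluation loop could look like the sketch below; `model`, `train_generator`, and `eval_batches` are stand-ins for the project's own objects.

```python
# Sketch: compare predicted vs. actual labels and collect the results in a DataFrame.
import numpy as np
import pandas as pd

# Assumes a Keras-style generator exposing class_indices (label name -> index).
idx_to_class = {v: k for k, v in train_generator.class_indices.items()}

records = []
for batch_images, batch_labels in eval_batches:          # hypothetical (images, one-hot labels) batches
    probs = model.predict(batch_images, verbose=0)
    preds = probs.argmax(axis=1)
    actual = batch_labels.argmax(axis=1)
    for p, a in zip(preds, actual):
        records.append({'actual': idx_to_class[a],
                        'predicted': idx_to_class[p],
                        'correct': bool(p == a)})

results_df = pd.DataFrame(records)
print(results_df['correct'].value_counts())              # counts of correct vs. incorrect predictions
```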

Object Detection in the vehicle

YOLO (You Only Look Once) is a state-of-the-art family of computer vision models; the Ultralytics implementation supports object detection, classification, and segmentation tasks. The first step involves loading the pre-trained YOLOv5 model, trained on the COCO dataset, which serves as a robust foundation for detecting various objects in complex scenes. This model was used to detect specific objects within the vehicle cabin. Post-detection, I annotated the original images with bounding boxes and labels to indicate the detected objects and their confidence scores. While the model successfully identified and annotated many instances of these objects, it struggled in some cases to localize cell phones or cups held by drivers.
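
As a rough sketch, loading the COCO-pretrained YOLOv5 weights through torch.hub and annotating a single cabin image could look like this; the file names are hypothetical.

```python
# Sketch: load pre-trained YOLOv5 and write out an annotated copy of one image.
import torch
import cv2

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)  # COCO-pretrained

results = model('driver_frame.jpg')          # hypothetical image path
annotated = results.render()[0]              # draws boxes, labels, and confidences (RGB array)
cv2.imwrite('driver_frame_annotated.jpg', annotated[:, :, ::-1])  # convert RGB -> BGR for OpenCV
```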

Object detection results on user-provided images were logged, tallying the counts of each detected object type across multiple images and presenting the results in a structured DataFrame. Through the objectDetectionModel function, the pre-trained model was used to annotate the desired object classes, such as cups, cell phones, and bottles, with class labels and confidence scores. This approach provided a comprehensive summary of the detected objects and their frequencies, giving a deeper understanding of the model's performance in real-world scenarios.
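
The project's objectDetectionModel function is not reproduced here, but a helper in that spirit, counting the target COCO classes per image into a DataFrame, might be sketched as follows.

```python
# Sketch: count the target object classes detected by YOLOv5 across a list of images.
import pandas as pd

TARGET_CLASSES = ['cup', 'cell phone', 'bottle']   # COCO class names of interest

def object_detection_counts(model, image_paths):
    rows = []
    for path in image_paths:
        det = model(path).pandas().xyxy[0]          # per-image DataFrame: boxes, confidence, name
        det = det[det['name'].isin(TARGET_CLASSES)]
        counts = det['name'].value_counts().to_dict()
        rows.append({'image': path, **{c: counts.get(c, 0) for c in TARGET_CLASSES}})
    return pd.DataFrame(rows)

# summary = object_detection_counts(model, ['img_1.jpg', 'img_2.jpg'])  # hypothetical paths
```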

Here is an illustration of some examples where the model detected the specific objects we wanted and annotated them really well.

Taking a closer look at some of the test images, there were instances where the YOLOv5 model had difficulty detecting specific objects such as cups, cans, or cell phones. Some images lacked clarity, with objects partially obscured by hands or other elements, leading to inaccurate detections. Additionally, smaller objects often went unnoticed by the model. As a result, there were several cases where the model's performance fell short of expectations. These examples highlight the challenges inherent in object detection tasks and the need for ongoing refinement and improvement.

Challenges and Future scope

One drawback of this dataset is that it only contains images taken during the daytime, which could bias the activity classification and makes the model's performance on other unseen images questionable. The object detection model also struggled to accurately localize objects obscured by hands or positioned at unusual angles. Augmenting the dataset with annotated instances of challenging scenarios, such as partially obscured objects, would help the model generalize better to real-world conditions.

Looking at future enhancements, the YOLOv5 model could be fine-tuned on a custom dataset tailored to vehicle interiors to yield substantial improvements in object detection accuracy. Furthermore, integrating real-time object detection capabilities into onboard vehicle systems or mobile applications could provide immediate feedback to drivers about potential distractions, thereby enhancing overall road safety.

Conclusion

This project involved developing a custom CNN model and exploring YOLOv5 object detection, providing an in-depth examination of driver behavior analysis. The insights gained from this analysis can help reduce accidents caused by distracted driving. While there are challenges to overcome, such as dataset limitations and model generalization, the project's future scope holds promise for fine-tuning the models and integrating them into practical applications that give drivers immediate feedback.

Source code URL: https://github.com/Aparna003/CSCI_6527-Driver-Distraction-Detection


Introduction:

Driving distractions such as texting, eating, or using a phone contribute significantly to road accidents worldwide. Identifying and addressing these distractions in real time is crucial for enhancing road safety and reducing the risk of accidents. By accurately identifying driver activities in real time, this project aims to improve road safety by alerting drivers or triggering safety mechanisms when potential distractions are detected.

Objective:

Given a dataset of images captured inside vehicles, the system must classify activities such as texting, eating, talking on the phone, applying makeup, or reaching behind the driver's seat. The primary objective is to design and implement a deep learning model that uses computer vision techniques to analyze these images. Additionally, the idea is to detect and annotate objects within the vehicle cabin, such as smartphones, food items, or makeup accessories. This can provide additional context to driver activities and help identify potential distractions more accurately.

Dataset Description:

The dataset comprises about 22,424 images captured inside vehicles, each depicting a driver engaged in various activities. These activities include safe driving, texting, talking on the phone, operating the radio, drinking, reaching behind, hair and makeup tasks, and conversing with a passenger. To preserve the integrity of the computer vision problem, metadata such as creation dates has been removed, and the dataset has been split by driver. Additionally, to discourage hand labeling, the test dataset includes processed, resized images that are ignored during evaluation.

Kaggle competition URL: https://www.kaggle.com/competitions/state-farm-distracted-driver-detection/overview

In the realm of agriculture, the health and productivity of crops have become a major concern for farmers worldwide. Paddy, or rice, stands as one of the most essential staple crops, feeding billions of people globally. However, paddy cultivation comes with inherent challenges, as the crops are susceptible to various diseases and pests that can significantly reduce yield. The goal of this project is to use deep learning models to accurately classify paddy leaf images by disease.

This dataset contains 10,407 labeled paddy leaf images across ten classes (nine diseases and normal leaves), along with additional metadata for each image such as the paddy variety and age. The test set contains about 3,469 paddy leaf images, randomly shuffled for prediction.

The first bar plot indicates that the majority of the dataset comprises samples of the rice variety ADT45, suggesting its prominence or perhaps higher availability, while the varieties Surya and RR appear to have the least representation, indicating a lower prevalence compared to other varieties. The second plot provides an overview of the number of diseased paddy images across nine distinct disease categories, along with a category for normal images.

image description

The following image provides a glimpse of the different types of diseased paddy categories, namely - Hispa, Tungro, bacterial leaf blight, downy mildew, blast, bacterial leaf streak, brown spot, dead heart, bacterial panicle blight and normal paddy.

image preview

Next, a preprocessing function is used to normalize the pixel values, which helps stabilize the training process and improve convergence. The function casts the image data to float and divides pixel values by 255, scaling them to the range [0, 1] and making the optimization process more efficient.
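
A minimal sketch of such a normalization step, assuming a tf.data-style pipeline of (image, label) pairs:

```python
# Sketch: cast images to float32 and scale pixel values into [0, 1].
import tensorflow as tf

def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# e.g. train_ds = train_ds.map(preprocess)  # train_ds is a hypothetical tf.data.Dataset
```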

The model is trained with the DenseNet121 architecture as the base layer, using pre-trained weights from the ImageNet dataset. This architecture was chosen because it is widely used for computer vision tasks, including image classification, object detection, and segmentation. Leveraging transfer learning, the model uses DenseNet121 for feature extraction, followed by Dense layers for fine-tuning and classification.
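
A transfer-learning setup of this kind might look like the sketch below; the input size, head layer sizes, and the choice to freeze the base are assumptions.

```python
# Sketch: DenseNet121 (ImageNet weights) as a feature extractor with a small classification head.
import tensorflow as tf

base = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet',
                                         input_shape=(224, 224, 3), pooling='avg')
base.trainable = False                      # keep the pre-trained features fixed initially

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax'),   # 9 disease classes + normal
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```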

To further prevent overfitting and improve generalization, early stopping is used as a callback during training to halt the process when a monitored metric stops improving. In this instance, the validation loss is monitored, and the weights that produced the best performance on the validation set are restored. The compiled model is then fit with 80% of the images (8,326) for training and the remaining 20% (2,081 images) for validation.
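
Concretely, the training call could be sketched as below; `train_ds`, `val_ds`, the patience value, and the epoch count are assumptions.

```python
# Sketch: early stopping on validation loss with best-weight restoration.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                              restore_best_weights=True)

history = model.fit(train_ds,                 # hypothetical 80% training split (8,326 images)
                    validation_data=val_ds,   # hypothetical 20% validation split (2,081 images)
                    epochs=30,
                    callbacks=[early_stop])
```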

A visualization of the training and validation accuracies across epochs is shown below.

accuracy plots

This plot shows that, as the model trains over successive epochs, the training and validation accuracies become more closely aligned. Notably, the model achieves a validation accuracy of 93% with a validation loss of 0.2462.

Lastly, the trained model is used to predict labels for the paddy test images. To better illustrate the model's outputs, the image below shows the first 10 test images along with their predicted disease categories.
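
For reference, a hypothetical prediction step, assuming a batched `test_ds` and a `class_names` list in the training label order, could be:

```python
# Sketch: predict disease categories for the test images and map indices to class names.
import numpy as np

probs = model.predict(test_ds)                         # test_ds: hypothetical batched test dataset
pred_idx = np.argmax(probs, axis=1)
pred_labels = [class_names[i] for i in pred_idx]       # class_names: hypothetical list of the 10 categories
print(pred_labels[:10])                                # first 10 predictions, as visualized above
```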

prediction labels

Conclusion

This project aims to develop a deep learning-based solution for the automatic classification of diseases in paddy plants using computer vision techniques. Leveraging transfer learning with the DenseNet121 architecture pretrained on the ImageNet dataset, the model is trained to accurately identify various disease categories affecting paddy crops. By automating disease detection in paddy, this project seeks to empower farmers with timely insights to mitigate crop losses and ensure food security.

Introduction

Airports are bustling hubs, with planes arriving from and departing for many destinations. The dynamic nature of air traffic demands a comprehensive understanding of traffic patterns, operational efficiency, and the other factors influencing airport dynamics. The increasing complexity and volume of air traffic call for innovative solutions that enhance the precision and agility of air traffic control systems, and computer vision can aid in monitoring and managing the movement of aircraft. For this project, I employed object detection using the YOLO (You Only Look Once) algorithm to monitor and analyze aircraft movement at airports.

Methodology

YOLOv8 is a state-of-the-art computer vision model developed by Ultralytics that supports object detection, classification, and segmentation tasks, and it achieves strong accuracy on the COCO (Common Objects in Context) dataset. The initial step is loading the pre-trained YOLOv8 model; trained on the COCO dataset, it serves as a robust foundation for detecting various objects in complex scenes.

In this project, the pipeline accepts any time-lapse video as input. The video is processed to extract individual frames, converting the temporal sequence into a collection of static images, and object detection is then applied to identify and locate flights within each frame.
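
A minimal sketch of this frame-extraction and detection loop, assuming OpenCV for video decoding and an arbitrary sampling rate:

```python
# Sketch: sample frames from a time-lapse video and run YOLOv8 detection on each kept frame.
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')                       # COCO-pretrained weights

cap = cv2.VideoCapture('airport_timelapse.mp4')  # hypothetical input video
frame_results = []
frame_id = 0
step = 30                                        # keep every 30th frame (sampling rate is an assumption)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_id % step == 0:
        frame_results.append((frame_id, model(frame)[0]))  # one Results object per kept frame
    frame_id += 1
cap.release()
```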

The YOLOv8 model, trained on the COCO dataset, proves effective at detecting airplanes in diverse scenarios. The source video clip does not provide timestamps, so the results are mapped to frame labels for analysis.

The model predicts the presence and location of airplanes within each frame. I compiled the results of each detection iteration, including the count of detected flights and detailed information for each detection, into a structured DataFrame for analysis.
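
Building on the frame-extraction sketch above, tallying the per-frame airplane detections into a DataFrame might look like this; column names are illustrative.

```python
# Sketch: count detections of the COCO class 'airplane' per frame and collect them in a DataFrame.
import pandas as pd

rows = []
for frame_id, res in frame_results:                       # frame_results from the sketch above
    names = [res.names[int(c)] for c in res.boxes.cls]    # class name of each detection
    confs = [float(c) for c in res.boxes.conf]            # confidence of each detection
    rows.append({'frame': frame_id,
                 'airplanes': names.count('airplane'),
                 'mean_confidence': sum(confs) / len(confs) if confs else None})

flights_df = pd.DataFrame(rows)
```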

Results and analysis

For a sample time-lapse video of Kuala Lumpur airport over a single day, 24 frames were generated. The resulting bar plot visualizes the number of detected flights in each frame, providing a temporal view of air traffic patterns.

Bar plot

In the daytime frames, there is a higher concentration of flights, indicated by peaks in the bar plot. As the timeline progresses towards the night, the frequency slightly decreases, suggesting a potential correlation between air traffic and time of day. The declining trend in flight frequency during nighttime frames aligns with expectations of reduced air activity during these hours.

The histogram of detection probabilities gives insight into the distribution of confidence levels across all detected flights.

The concentration of detections in lower confidence intervals suggests that the YOLOv8 model may face challenges in accurately interpreting certain objects in the airport environment. This could be attributed to challenging scenarios such as poor lighting, occlusions, or complex backgrounds.

Here is an example output of an image frame annotated with bounding boxes highlighting the detected objects.

Analysis of Vision Algorithms

While the model performed well in well-lit scenarios, its effectiveness decreased at night or in low-light conditions. This is a common challenge for computer vision algorithms, and YOLOv8 is no exception. Moreover, the quality of the input images significantly influenced detection accuracy, as seen in the displayed result where some planes on the left remained unidentified. Despite these occasional limitations, the system was able to detect airplanes in clear conditions, providing useful insights for the analysis.

Conclusion

The implementation of YOLOv8 for airplane detection offers a powerful tool for understanding air traffic dynamics. The model performs strongly in well-lit conditions, providing valuable insights into daytime operations. While there is room for improvement, the project underscores the potential of YOLOv8-based object detection for analyzing various aspects of airport operations, safety, and efficiency.

Github link : https://github.com/Aparna003/Object_detection_system


In the evolving landscape of image classification, understanding how classifiers respond to image corruptions and potential adversarial attacks is paramount. Adversarial attacks seek to trick classifiers into misclassifying images through carefully crafted perturbations. These intentional distortions are subtle to the human eye but wield the potential to dismantle the accuracy of even the most robust models. This blog post delves into an exploration of image corruption and its impact on the classification results.

Using blur as a corruption method provides an interesting perspective on the impact of image degradation on classification accuracy. Why blur? Blur, in its various forms, mimics real-world distortions encountered in images due to factors like motion, environmental conditions, or lens imperfections. By progressively increasing the blur levels in a controlled manner, we sought to reveal the classifier's response to images that are gradually losing their clarity.

Now, for the classifier at hand, I used a pre-trained Inception-V3 model trained on a dataset featuring six distinct categories of nature images: Buildings, Streets, Forests, Mountains, Glaciers, and Seas. This model showed an impressive accuracy of 84%. To make the experiments smoother, the trained model was saved as a .h5 file, which lets us put the classifier through its paces on new image inputs. The intrigue, however, lies in our attempts to challenge and ultimately break this seemingly robust classifier using stack blur effects. The basic idea behind a stack blur is to average the color values of neighboring pixels in a way that simulates a blurring effect.
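
Loading the saved classifier and scoring a single image could be sketched as follows; the file names, 150x150 input size, preprocessing, and class order are all assumptions about the saved model.

```python
# Sketch: load the saved .h5 classifier and predict the class of one image.
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

CLASSES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']  # assumed label order

clf = load_model('nature_inceptionv3.h5')               # hypothetical file name
img = image.load_img('forest.jpg', target_size=(150, 150))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
probs = clf.predict(x)[0]
print(CLASSES[int(np.argmax(probs))], float(probs.max()))
```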

Experimenting with corruption levels, I applied the stack blur effect with radii increasing from 0 to 50 to an image depicting a forest-like area. To make things clearer, witness how the images below gracefully degrade across the different blur levels, mimicking the challenges posed by real-world scenarios.

The classifier's ability to make accurate predictions diminishes at higher blur levels, mirroring real-world scenarios where image clarity is compromised. As the blur radius increases, the model has difficulty extracting relevant features, which is why I think it becomes prone to misclassifying the image. As the image underwent escalating levels of corruption, the classifier's confidence that it belongs to the "forest" category fell to a mere 2.9%, and, caught in the web of these distortions, it ultimately misclassified the image. To provide a visual depiction, here is a plot of the amount of corruption against the classification accuracy, which shows a decreasing curve as the blur intensity increases.
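
A sketch of this corruption sweep is shown below, continuing from the loading sketch above (it reuses `clf` and `CLASSES`). PIL's BoxBlur is used as a stand-in for a stack blur, since both average neighboring pixel values; the radii step and input size are assumptions.

```python
# Sketch: apply increasing blur radii and track the predicted 'forest' probability.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageFilter

radii = list(range(0, 51, 5))
forest_idx = CLASSES.index('forest')        # CLASSES and clf come from the previous sketch
scores = []
for r in radii:
    img = Image.open('forest.jpg').convert('RGB').resize((150, 150))
    x = np.asarray(img.filter(ImageFilter.BoxBlur(r)), dtype=np.float32) / 255.0
    scores.append(float(clf.predict(x[None, ...], verbose=0)[0][forest_idx]))

plt.plot(radii, scores, marker='o')
plt.xlabel('Blur radius')
plt.ylabel("Predicted probability of 'forest'")
plt.title('Classification confidence vs. amount of corruption')
plt.show()
```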

accuracy Vs amount of corruption plot

Our exploration into the impact of blur underscores the vulnerabilities that classifiers face when confronted with manipulated visual input. The world of image classification keeps changing, urging us to explore further, learn more, and strengthen our defenses against various distortions. As we uncover the illusions and nuances, the need for strong and durable image classifiers becomes clearer.

Generative AI platforms, like the mighty GPT-4, have broken free from the confines of text, venturing boldly into the visual domain. We're set to delve into the diverse prompts provided to these cutting-edge generative AI platforms and explore how they decode the visual language. Image prompts open up creative possibilities, inviting users to explore a spectrum of visual stimuli, fostering imaginative and novel applications. While experimenting with various images, the results were intriguing.

Bird input image

This is a sample image of a bird perched in a tree; I prompted Copilot to sketch/draw the same image.

Generated sketch of bird

Copilot might not be an exact replica machine, but it sure knows how to doodle. Its ability to sketch a comparable version based on its interpretation of the described images seems pretty good from these results.

Impressive is an understatement for its ability to recognize embedded numbers and identify objects and patterns. However, a twist unfolds – it refrains from recognizing people in images, citing privacy concerns.

It hit a bump when trying to label objects with bounding boxes in an image filled with various soda brands. With its current capabilities, Copilot can provide text-based responses that offer a workaround in certain scenarios, despite its limitations in annotating objects with bounding boxes, as in this image of soft drinks.

soda bottles

The silver lining though - it managed to tally up the number of drinks for each brand. Yet here’s a catch - while it seamlessly provides answers relevant to the images, GPT-4 leans on online pals for the tough stuff. “Why struggle alone when the internet is full of geniuses?”  It gets the job done, but we secretly hope it becomes the genius all by itself someday.