
Driver Distraction Detection using Computer Vision


Introduction

In today's fast-paced world, road safety is a top concern for drivers and pedestrians alike. Shocking statistics reveal that nine people lose their lives every day in the United States in crashes involving distracted drivers. But what exactly constitutes distracted driving? According to the CDC, it encompasses three main types of distraction: visual (taking your eyes off the road), manual (taking your hands off the wheel), and cognitive (taking your mind off driving). These distractions range from using mobile phones for texting or calling to eating, drinking, personal grooming, or even adjusting the radio controls in the car. In this blog post, we'll delve into how computer vision techniques aid our understanding of driver behavior to tackle this critical issue and enhance road safety.

Objective

Recognizing and addressing these distractions in real time can play a vital role in preventing accidents and saving lives. The goal of this project is to develop a system capable of accurately identifying various driver activities through computer vision techniques. The primary objective is to design and implement a deep learning model that can classify the different activities performed by drivers inside vehicles. Additionally, the system aims to detect and annotate objects like bottles, cups/cans, and cell phones within the vehicle cabin using YOLO models, which provides additional context to driver activities and enhances distraction detection.

About the dataset

The dataset, sourced from Kaggle and provided by State Farm, comprises about 22,424 images captured inside vehicles, each showing a driver engaged in an activity. The images span 10 categories of activities, such as safe driving, texting, talking on the phone, operating the radio, drinking, reaching behind, doing hair and makeup, and conversing with passengers. To discourage hand labeling, the test set contains about 79,726 unlabeled images (far more than the training set), including resized and processed images that are ignored during evaluation. The challenging part of this dataset is the sheer number of test images on which the classification model must perform accurately. Additionally, the task of detecting objects within the vehicle cabin introduces further complexity, as the model needs to localize and classify multiple objects simultaneously.

To get a better understanding of the input image data, here is a visual representation of the 10 classes of images with their labels:

class images

This bar chart shows the distribution of images in each category:

bar chart
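For readers who want to reproduce this kind of summary, here is a minimal sketch that counts images per class and plots the distribution. The directory layout (train/c0 through train/c9, as in the Kaggle download) and the path are assumptions rather than the project's exact setup.

```python
# Sketch: count training images per class and plot the distribution.
# Assumes the Kaggle State Farm layout: imgs/train/c0 ... imgs/train/c9.
import os
import matplotlib.pyplot as plt

train_dir = "imgs/train"  # assumed path to the training images
class_names = sorted(os.listdir(train_dir))  # c0 ... c9
counts = [len(os.listdir(os.path.join(train_dir, c))) for c in class_names]

plt.bar(class_names, counts)
plt.xlabel("Activity class")
plt.ylabel("Number of images")
plt.title("Distribution of training images per class")
plt.show()
```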

Implementation and Approach

Classification of driver activity:

The first part involves implementing a custom CNN model using the Keras framework to classify the different activities performed by drivers. The model consists of multiple convolutional and pooling layers followed by fully connected layers for classification. The Adam optimizer was used for training. To prevent overfitting, a dropout layer was included, and early stopping was used to monitor the loss during training. Although pre-trained models are often convenient and may offer superior performance, I chose to design a custom CNN architecture to better understand its effectiveness for the task at hand. The performance metrics in Table 1 below suggest that the model performs effectively.
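As a rough illustration of this kind of architecture, here is a minimal Keras sketch with convolutional and pooling layers, a dropout layer, the Adam optimizer, and early stopping. The specific layer sizes, input shape, and hyperparameters are assumptions, not necessarily those used in the project.

```python
# Minimal sketch of a custom Keras CNN for 10-class driver activity classification.
from tensorflow.keras import layers, models, callbacks

def build_model(input_shape=(64, 64, 3), num_classes=10):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),  # dropout to reduce overfitting
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",               # Adam optimizer, as described above
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping halts training when the monitored loss stops improving.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)
# model = build_model()
# model.fit(train_data, validation_data=val_data, epochs=30, callbacks=[early_stop])
```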

Table 1 - Performance scores of the classification model

To visually track the performance improvement over epochs, the graph below illustrates the accuracy and loss curves.

accuracy and loss plots

The evaluation iterates over the image metadata, extracting the class index of each image to retrieve its actual label and predicting a label with the CNN model. Predicted and actual labels are then compared to assess accuracy, with the results stored in a DataFrame. Evaluating the custom CNN on the training data yielded 22,215 correct predictions and 209 incorrect predictions, giving insight into the model's ability to identify the different driver activities.
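A hedged sketch of this evaluation step is shown below. It assumes the images were loaded with a Keras directory generator created with shuffle=False, and the variable names are illustrative rather than taken from the project.

```python
# Sketch: compare predicted and actual labels and collect results in a DataFrame.
# Assumes train_generator was built with flow_from_directory(..., shuffle=False)
# so filenames and class indices stay aligned with predictions.
import numpy as np
import pandas as pd

probs = model.predict(train_generator)          # class probabilities per image
pred_indices = np.argmax(probs, axis=1)         # predicted class index
true_indices = train_generator.classes          # actual class index per image
index_to_label = {v: k for k, v in train_generator.class_indices.items()}

results = pd.DataFrame({
    "filename": train_generator.filenames,
    "actual": [index_to_label[i] for i in true_indices],
    "predicted": [index_to_label[i] for i in pred_indices],
})
results["correct"] = results["actual"] == results["predicted"]
print(results["correct"].value_counts())        # counts of correct vs incorrect predictions
```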

Object Detection in the vehicle

YOLO (You Only Look Once) is a state-of-the-art family of computer vision models, and the Ultralytics implementation supports object detection, classification, and segmentation tasks. The first step involves loading the pre-trained YOLOv5 model, which has been trained on the COCO dataset, to detect various objects in complex scenes. This model was used for detecting specific objects within the vehicle cabin. After detection, I annotated the original images with bounding boxes and labels showing the detected objects and their confidence scores. While the model successfully identified and annotated many instances of these objects, it struggled in some cases to localize cell phones or cups held by drivers.
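As a minimal sketch (not the project's exact code), the COCO-pretrained YOLOv5 model can be loaded from the Ultralytics hub and run on a cabin image roughly as follows; the image filename is a placeholder.

```python
# Sketch: load pre-trained YOLOv5 and annotate detections on one image.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)  # COCO-pretrained
results = model("driver_image.jpg")   # assumed path to a vehicle-cabin image
results.print()                        # detected classes and confidence scores
df = results.pandas().xyxy[0]          # detections as a DataFrame: boxes, confidence, class name
annotated = results.render()[0]        # image array with bounding boxes and labels drawn
```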

Object detection on user-provided images was also logged, specifically counting each detected object type across multiple images and presenting the results in a structured DataFrame. Through the objectDetectionModel function, the pre-trained model was used to annotate the desired object classes, such as cups, cell phones, and bottles, with class labels and confidence scores. This approach provided a comprehensive summary of the detected objects and their frequencies, giving a deeper understanding of the model's performance in real-world scenarios.
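The snippet below sketches how such a summary could be built. The count_objects helper is an illustrative stand-in for the project's objectDetectionModel function, and the target classes follow the ones mentioned above.

```python
# Sketch: tally detections of the target COCO classes across several images.
import pandas as pd
from collections import Counter

TARGET_CLASSES = {"cup", "cell phone", "bottle"}

def count_objects(image_paths, model):
    counts = Counter()
    for path in image_paths:
        detections = model(path).pandas().xyxy[0]   # one row per detected object
        for name in detections["name"]:
            if name in TARGET_CLASSES:
                counts[name] += 1
    return pd.DataFrame(counts.items(), columns=["object", "count"])

# summary = count_objects(["img1.jpg", "img2.jpg"], model)
# print(summary)
```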

Here are some examples where the model detected the specific objects we wanted and annotated them well.

Taking a closer look at some of the test images, there were instances where the YOLOv5 model had difficulty detecting specific objects such as cups, cans, or cell phones. Some images lacked clarity, with objects partially obscured by hands or other elements, leading to inaccurate detections. Smaller objects also often went unnoticed by the model. As a result, there were several cases where the model's performance fell short of expectations. These examples highlight the challenges inherent in object detection tasks and the need for ongoing refinement and improvement.

Challenges and Future scope

One drawback of this dataset is that it only contains images taken in the daytime, which could bias the activity classification and makes the model's performance on other unseen images questionable. The object detection model also had trouble accurately localizing objects obscured by hands or positioned at different angles. Augmenting the dataset with annotated instances of challenging scenarios, such as partially obscured objects, would help the model generalize better to real-world conditions.

Looking at future enhancements, the YOLOv5 model could be fine-tuned on a custom dataset specifically tailored to vehicle interiors, which should yield substantial improvements in object detection accuracy. Furthermore, integrating real-time object detection into onboard vehicle systems or mobile applications could provide immediate feedback to drivers about potential distractions, thereby enhancing overall road safety.

Conclusion

This project involved developing a custom CNN model and exploring YOLOv5 object detection, providing an in-depth examination of driver behavior analysis. The insights gained from this analysis could help reduce the accidents caused by distracted driving. While there are challenges to overcome, such as dataset limitations and model generalization, the project's future scope holds promise for fine-tuning the models and integrating them into practical applications that give drivers immediate feedback.

Source code URL: https://github.com/Aparna003/CSCI_6527-Driver-Distraction-Detection

