RoboNation's RoboSub is an international robotics competition in which multidisciplinary teams build autonomous underwater vehicles to perform a set of predetermined tasks. I was part of Cal State LA's computer science team. My role was to train a computer vision model so that our sub could detect underwater obstacles and interact with them accordingly.

The model used for object detection was YOLOv3 (You Only Look Once, version 3), a state-of-the-art convolutional neural network. Setting it up required a lot of configuration, including installing CUDA so the model could train on the GPU. To train the model, we gathered images and labeled the objects that needed to be detected. Some image and label datasets were provided by other teams that used a different computer vision method, so a custom script was necessary to translate their labels into the format YOLO expects.
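
As a rough illustration of that kind of translation, here is a minimal sketch. It assumes the other teams' labels stored each box as pixel corner coordinates (xmin, ymin, xmax, ymax), which may not match what they actually used; the function name `corners_to_yolo` is hypothetical. The target is the usual Darknet YOLO label line: a class index followed by the box center, width, and height, each normalized by the image size.

```python
def corners_to_yolo(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box into a YOLO label line.

    Assumption: the source label is (xmin, ymin, xmax, ymax) in pixels.
    YOLO expects: class x_center y_center width height, all in [0, 1].
    """
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: one box for class 0 in a 640x480 image
print(corners_to_yolo(0, 100, 50, 300, 250, 640, 480))
# → 0 0.312500 0.312500 0.312500 0.416667
```

In practice each image gets a text file with one such line per labeled object.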

A major challenge was the low visibility in the muddy water at the competition site. To address this, we applied image filtering techniques to improve the quality of the footage, making it easier for the model to detect the obstacles. One such technique was histogram equalization with OpenCV, which improves the contrast of an image by spreading its pixel intensities across the full available range.
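
To show what histogram equalization does to pixel values, here is a small pure-Python sketch of the idea. It is not the code we ran: in OpenCV the same operation on an 8-bit single-channel image is available as `cv2.equalizeHist`, and the helper name `equalize_histogram` below is only illustrative. The sketch works on a flat list of grayscale values and remaps each one through the image's cumulative histogram.

```python
def equalize_histogram(pixels):
    """Equalize a flat list of 8-bit grayscale values (0-255)."""
    if not pixels:
        return []

    # Count how often each intensity occurs
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    # Cumulative histogram: number of pixels at or below each intensity
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(pixels)  # single-intensity image: nothing to stretch

    # Lookup table that stretches the cumulative histogram onto 0..255
    scale = 255 / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[p] for p in pixels]


# Low-contrast values bunched between 50 and 80 get spread over 0..255
print(equalize_histogram([50, 60, 70, 80]))
# → [0, 85, 170, 255]
```

Murky footage tends to have its intensities bunched into a narrow band like this, which is why stretching them out can make object edges easier to pick up.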

One of our tasks was to train the model to identify four different dice, for which we used just over 900 images. The model reached a mean average precision (mAP) of 97.73%, a measure of how accurately objects were detected and classified. Its intersection over union (IoU) was 86.68%, a measure of how closely the predicted bounding boxes matched the labeled ones.
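
For reference, IoU for a single pair of boxes is the area where the predicted and labeled boxes overlap divided by the area they cover together. The sketch below is illustrative rather than the evaluation code we used; it assumes boxes are given as (x1, y1, x2, y2) corners, and the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap (zero if the boxes do not intersect)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = both areas minus the overlap counted twice
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# A predicted box shifted slightly from the labeled box
labeled = (0, 0, 10, 10)
predicted = (1, 1, 11, 11)
print(f"{iou(labeled, predicted):.4f}")
```

An IoU of 1.0 means the two boxes coincide exactly, and 0.0 means they do not overlap at all.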

Figure: Dice detection. The YOLO model labels multiple large dice detected in murky water.