Evaluation of YOLO Model

All three YOLO versions (YOLOv3, YOLOv4, and YOLOv5) were trained on a custom dataset of images of dairy and beef cattle taken on farms. The raw images were downsized to 512 × 512 pixels and split into a training set (80%) and a validation set (20%). See below for the comparison between the ground-truth image and the predictions made by the three YOLO algorithms.
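The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `split_dataset`, the fixed seed, and the file names are all assumptions made for the example.

```python
import random


def split_dataset(image_paths, train_fraction=0.8, seed=0):
    """Shuffle image paths reproducibly and split them into train/validation lists."""
    # Sort first so the result does not depend on the order of the input.
    paths = sorted(image_paths)
    # A seeded generator makes the shuffle repeatable between runs.
    random.Random(seed).shuffle(paths)
    n_train = int(round(train_fraction * len(paths)))
    return paths[:n_train], paths[n_train:]


# Hypothetical file names, used only for illustration.
images = [f"cattle_{i:03d}.jpg" for i in range(100)]
train, val = split_dataset(images)
print(len(train), len(val))  # 80 20
```

Fixing the seed keeps the validation set identical across the three YOLO versions, so their metrics are computed on the same images.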

For this image, all three YOLO algorithms locate the bounding boxes accurately and assign the correct labels: the left bounding box indicates an M0 DD classification, and the right bounding box indicates an M2 DD classification. However, the models do not generalize well to the validation data.
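How accurately a predicted box matches a ground-truth box is commonly measured with intersection over union (IoU). The sketch below assumes boxes given as pixel corners `(x_min, y_min, x_max, y_max)`; the coordinates are made up for illustration and do not come from the study.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted boxes on a 512 x 512 image.
truth = (100, 300, 160, 360)
pred = (110, 310, 170, 370)
print(round(iou(truth, pred), 3))
```

A detection is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold; 0.5 is a common choice, but the threshold used here is not stated.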

YOLOv4 and YOLOv5 have similar precision, recall, and mAP, indicating comparable performance between the two algorithms; YOLOv5 slightly outperformed YOLOv4, with a higher number of true positives. These results agree with the literature.
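For reference, precision and recall follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). The counts in this sketch are placeholders chosen for easy arithmetic, not results from the experiments.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts."""
    # Precision: fraction of predicted boxes that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth boxes that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Placeholder counts for illustration only.
precision, recall = precision_recall(tp=8, fp=2, fn=8)
print(precision, recall)  # 0.8 0.5
```

With the same number of ground-truth boxes, a model with more true positives has higher recall, which is how a small difference in true positives shows up in the metrics.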

In contrast, YOLOv3 has a much lower mAP, with only seven correct detections on the validation data. The low mAP values may be due to the difficulty of capturing fine-grained features in small detection areas. Predictions tended to be worse when the validation image was taken at a different farm or from a different angle than the training images.
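mAP is the mean, over classes, of the average precision (AP), which summarizes the precision-recall curve obtained by ranking detections by confidence. The sketch below uses all-point interpolation, which is one common convention; the YOLO toolchains used in this study may compute AP slightly differently, and the detections shown are invented for illustration.

```python
def average_precision(detections, n_ground_truth):
    """AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    n_ground_truth: number of ground-truth boxes of this class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)
    # Replace each precision with the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Invented detections for a single class with two ground-truth boxes.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], n_ground_truth=2)
print(round(ap, 3))
```

Because recall is divided by the number of ground-truth boxes, a model that finds only a handful of them cannot reach a high AP even when those few detections are confident.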

The YOLO algorithms were also compared for runtime. YOLOv3 was the slowest. YOLOv4 took approximately 20% less time than YOLOv3, and YOLOv5 was the fastest, taking approximately 50% less time than YOLOv4. These results agree with the literature.
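Because the two percentages are stated against different baselines, it can help to put all three runtimes on one scale. The short calculation below normalizes YOLOv3 to 1.0 and applies the approximate reductions quoted above; the function name is hypothetical and the outputs are only as precise as those rounded percentages.

```python
def relative_runtimes(v4_reduction_vs_v3=0.20, v5_reduction_vs_v4=0.50):
    """Express each runtime as a fraction of the YOLOv3 runtime."""
    v3 = 1.0
    v4 = v3 * (1 - v4_reduction_vs_v3)  # about 20% less time than YOLOv3
    v5 = v4 * (1 - v5_reduction_vs_v4)  # about 50% less time than YOLOv4
    return {"YOLOv3": v3, "YOLOv4": v4, "YOLOv5": v5}


runtimes = relative_runtimes()
for name, value in runtimes.items():
    print(f"{name}: {value:.2f}")
```

Under these figures, YOLOv5 takes roughly 40% of the YOLOv3 runtime, that is, about a 60% reduction relative to YOLOv3.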

The YOLO algorithms were then tested on a library of cleaned images of the hoof area of dairy and beef cattle. Since YOLOv3 performed poorly in the initial testing, only YOLOv4 and YOLOv5 were studied on the new dataset. The results improved significantly: both YOLOv4 and YOLOv5 achieved a validation mAP above 97%, with only a few incorrect detections.

Summary

YOLO algorithms trained on images of the hoof area performed very well, with the validation mAP approaching 100%. In contrast, YOLO algorithms trained on herd-level images of dairy and beef cattle currently perform poorly. This issue can be addressed by adding an image preprocessing step that takes the input image and returns the hoof area. Other methods, including augmentation or the use of higher-resolution images, can also be applied to build a rich, diverse library of labeled images and improve model performance. Other algorithms, including tinyYOLOv3 or SSD, can also be implemented to compare the efficiency and efficacy of computer vision models for the real-time detection of DD in dairy and beef cattle.
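One way the proposed preprocessing step could work is to run a first-stage detector that finds the hoof region, then crop around it before DD classification. The sketch below covers only the cropping arithmetic. It assumes the first stage returns a box in the normalized YOLO label format (center and size as fractions of the image), and the function name `hoof_crop_box` and the 10% margin are illustrative choices rather than part of the study.

```python
def hoof_crop_box(x_center, y_center, width, height, image_w, image_h, margin=0.1):
    """Convert a normalized YOLO box into a padded pixel crop.

    Returns (left, top, right, bottom), clamped to the image bounds.
    """
    # Box center in pixels.
    cx = x_center * image_w
    cy = y_center * image_h
    # Half-sizes in pixels, enlarged by the margin to keep some context.
    half_w = width * image_w * (1 + margin) / 2
    half_h = height * image_h * (1 + margin) / 2
    left = max(0, int(round(cx - half_w)))
    top = max(0, int(round(cy - half_h)))
    right = min(image_w, int(round(cx + half_w)))
    bottom = min(image_h, int(round(cy + half_h)))
    return left, top, right, bottom


# Hypothetical hoof detection near the bottom of a 512 x 512 image.
crop = hoof_crop_box(0.5, 0.8, 0.2, 0.2, image_w=512, image_h=512)
print(crop)  # (200, 353, 312, 466)
```

The resulting tuple uses the left, top, right, bottom ordering that most image libraries expect for cropping, so the cropped region can be passed to the second-stage model.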

References

[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” arXiv: 1506.02640v5, 2016.

[2] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” arXiv: 1612.08242, 2016.

[3] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv: 1804.02767, 2018.

[4] A. Bochkovskiy, C.Y. Wang, and H.Y.M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv: 2004.10934, 2020.