A PyTorch implementation, based on YOLOv4, of the paper Complex-YOLO: Real-time 3D Object Detection on Point Clouds.
- Realtime 3D object detection based on YOLOv4
- Support distributed data parallel training
- Tensorboard
- Mosaic/Cutout augmentation for training
- Use CIoU / GIoU loss for optimization.
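To make the GIoU term in the last feature concrete, here is a minimal sketch for two axis-aligned boxes. It is an illustration only: the function names `giou` and `giou_loss` are hypothetical, the boxes are assumed to have positive area, and the repository's own loss may differ, for example by handling rotated boxes.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box that encloses both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # GIoU subtracts the fraction of the hull not covered by the union.
    return iou - (hull - union) / hull


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value stays informative when the boxes do not overlap, which is why it is useful as an optimization target.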
pip install -U -r requirements.txt
For the mayavi and shapely libraries, please refer to the installation instructions on their official websites.
Download the 3D KITTI detection dataset from here.
The downloaded data includes:
- Velodyne point clouds (29 GB): input data to the Complex-YOLO model
- Training labels of object data set (5 MB): input label to the Complex-YOLO model
- Camera calibration matrices of object data set (16 MB): for visualization of predictions
- Left color images of object data set (12 GB): for visualization of predictions
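As a rough illustration of how the Velodyne point clouds become model input, the sketch below reads one KITTI `.bin` file (float32 rows of x, y, z, reflectance) and rasterizes it into a three-channel bird's-eye-view map of density, height and intensity, in the spirit of the Complex-YOLO encoding. The function names, the metric ranges, the channel order and the normalization constant are assumptions made for this example, not values taken from this repository.

```python
import numpy as np


def load_velodyne(path):
    """Read a KITTI Velodyne .bin file: float32 rows of (x, y, z, reflectance)."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)


def make_bev_map(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                 z_range=(-2.73, 1.27), size=608):
    """Rasterize points into a (3, size, size) map: density, height, intensity.

    The ranges (in meters) and the output size are assumed example values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    pts = points[keep]

    # Map metric coordinates to integer grid cells.
    rows = ((pts[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * size).astype(np.int64)
    cols = ((pts[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * size).astype(np.int64)
    rows = np.clip(rows, 0, size - 1)
    cols = np.clip(cols, 0, size - 1)

    # Height normalized to [0, 1) within the kept z range.
    height = (pts[:, 2] - z_range[0]) / (z_range[1] - z_range[0])

    bev = np.zeros((3, size, size), dtype=np.float32)
    counts = np.zeros((size, size), dtype=np.float32)
    np.add.at(counts, (rows, cols), 1.0)                      # points per cell
    bev[0] = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))
    np.maximum.at(bev[1], (rows, cols), height)               # max height per cell
    np.maximum.at(bev[2], (rows, cols), pts[:, 3])            # max reflectance per cell
    return bev
```

A typical call would be `make_bev_map(load_velodyne(path_to_bin))`, where `path_to_bin` points to a file in the `velodyne/` folder.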
Please make sure that the source code and dataset directories are structured as shown in the directory tree below.
For 3D point cloud preprocessing, please refer to the previous works:
This work is based on the paper YOLOv4: Optimal Speed and Accuracy of Object Detection.
Please refer to several implementations of YOLOv4 using the PyTorch DL framework:
- Tianxiaomo/pytorch-YOLOv4
- Ultralytics/yolov3_and_v4
- WongKinYiu/PyTorch_YOLOv4
- VCasecnikovs/Yet-Another-YOLOv4-Pytorch
cd src/data_process
- To visualize BEV maps and camera images (with 3D boxes), let's execute (the output-width param can be changed to show the images in a bigger/smaller window):
python kitti_dataloader.py --output-width 608
- To visualize mosaics that are composed of 4 BEV maps (used during training only), let's execute:
python kitti_dataloader.py --show-train-data --mosaic --output-width 608
By default, the output mosaics have no padding. Padding can be activated by executing:
python kitti_dataloader.py --show-train-data --mosaic --random-padding --output-width 608
- To visualize cutout augmentation, let's execute:
python kitti_dataloader.py --show-train-data --cutout_prob 1. --cutout_nholes 1 --cutout_fill_value 1. --cutout_ratio 0.3 --output-width 608
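The cutout flags used in the last command can be read as follows: with probability `cutout_prob`, `cutout_nholes` rectangles are overwritten with `cutout_fill_value`. The sketch below shows that idea on a `(C, H, W)` array. It is a hypothetical illustration: the function name `cutout` and the way the ratio bounds each hole side are assumptions, and the repository's implementation may interpret `cutout_ratio` differently.

```python
import numpy as np


def cutout(image, prob=1.0, n_holes=1, ratio=0.3, fill_value=0.0, rng=None):
    """Return a copy of a (C, H, W) image with up to n_holes rectangles overwritten."""
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()

    # Apply the augmentation only with probability `prob`.
    if rng.random() >= prob:
        return out

    _, h, w = out.shape
    for _ in range(n_holes):
        # Assumption: each hole side is at most `ratio` of the image side.
        hole_h = int(rng.integers(1, max(2, int(h * ratio) + 1)))
        hole_w = int(rng.integers(1, max(2, int(w * ratio) + 1)))
        top = int(rng.integers(0, h - hole_h + 1))
        left = int(rng.integers(0, w - hole_w + 1))
        # The same region is erased in every channel.
        out[:, top:top + hole_h, left:left + hole_w] = fill_value
    return out
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the augmentation repeatable while debugging.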
python test.py --gpu_idx 0 --pretrained_path <PATH>...
python evaluate.py --gpu_idx 0 --pretrained_path <PATH> --img_size <SIZE> --conf-thresh <THRESH> --nms-thresh <THRESH> --iou-thresh <THRESH>...
(The conf-thresh, nms-thresh, and iou-thresh params can be adjusted. By default, these params have been set to 0.5)
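To show what the first two thresholds control, here is a minimal sketch of confidence filtering followed by greedy non-maximum suppression. It uses axis-aligned boxes for brevity and hypothetical function names (`iou`, `filter_and_nms`); the repository works with rotated boxes in the BEV plane, so treat this as an illustration of the idea rather than its actual post-processing code.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2) with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def filter_and_nms(detections, conf_thresh=0.5, nms_thresh=0.5):
    """detections: list of (x1, y1, x2, y2, score). Returns the kept detections."""
    # Step 1: drop low-confidence detections, then sort by descending score.
    candidates = sorted((d for d in detections if d[4] >= conf_thresh),
                        key=lambda d: d[4], reverse=True)

    # Step 2: keep a box only if it does not overlap a better box too much.
    kept = []
    for det in candidates:
        if all(iou(det[:4], k[:4]) <= nms_thresh for k in kept):
            kept.append(det)
    return kept
```

Raising `conf_thresh` removes more weak detections, while lowering `nms_thresh` suppresses overlapping boxes more aggressively. The third parameter, `iou-thresh`, is described in the help text as the IoU threshold used for evaluation.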
- Evaluate the complex-YOLOv3 model on the validation set:
Download the trained model from here, then put it at ${ROOT}/checkpoints/complex_yolov3/complex_yolov3.pth and execute:
python evaluate.py --gpu_idx 0 --pretrained_path ../checkpoints/complex_yolov3/complex_yolov3.pth --cfgfile ./config/cfg/complex_yolov3.cfg
- (The Complex-YOLOv4 trained model will be released. Please watch the repo to get notifications for the next update.)
- The comparison of this implementation with Complex-YOLOv2 and Complex-YOLOv3 (will be updated ASAP).
python train.py --gpu_idx 0 --multiscale_training --batch_size <N> --num_workers <N>...
We should always use the nccl backend for multi-processing distributed training since it currently provides the best distributed training performance.
- Single machine (node), multiple GPUs
python train.py --dist-url 'tcp://127.0.0.1:29500' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0
- Two machines (two nodes), multiple GPUs
First machine
python train.py --dist-url 'tcp://IP_OF_NODE1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 0
Second machine
python train.py --dist-url 'tcp://IP_OF_NODE1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1
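In the commands above, `--world-size` and `--rank` count nodes, while `--multiprocessing-distributed` launches one process per GPU. The sketch below shows how node-level values are commonly expanded into per-process values. It assumes that `train.py` follows the convention of the official PyTorch ImageNet example, and the helper names are hypothetical.

```python
def total_processes(num_nodes, gpus_per_node):
    """Number of training processes across all nodes (one per GPU)."""
    return num_nodes * gpus_per_node


def global_rank(node_rank, gpus_per_node, local_gpu):
    """Unique rank of the process that drives `local_gpu` on node `node_rank`."""
    return node_rank * gpus_per_node + local_gpu


def launch_plan(num_nodes, gpus_per_node):
    """List (node_rank, local_gpu, global_rank) for every process."""
    return [(node, gpu, global_rank(node, gpus_per_node, gpu))
            for node in range(num_nodes)
            for gpu in range(gpus_per_node)]
```

For two machines with four GPUs each, `launch_plan(2, 4)` yields eight processes, with global ranks 0 to 3 on the first node and 4 to 7 on the second.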
To reproduce the results, you can run the bash shell script
./train.sh
- To track the training progress, go to the logs/ folder and run:
cd logs/<saved_fn>/tensorboard/
tensorboard --logdir=./
- Then open http://localhost:6006/ in your browser.
| | Backbone | Detector |
|---|---|---|
| BoF | [x] Dropblock; [x] Random rescale, rotation (global); [x] Mosaic/Cutout augmentation | [x] Cross mini-Batch Normalization; [x] Dropblock; [x] Random training shapes |
| BoS | [x] Mish activation; [x] Cross-stage partial connections (CSP); [x] Multi-input weighted residual connections (MiWRC) | [x] Mish activation; [x] SPP-block; [x] SAM-block; [x] PAN path-aggregation block; [ ] CIoU/GIoU loss |
If you think this work is useful, please give me a star!
If you find any errors or have any suggestions, please contact me (Email: nguyenmaudung93.kstn@gmail.com).
Thank you!
@article{Complex-YOLO,
author = {Martin Simon and Stefan Milz and Karl Amende and Horst-Michael Gross},
title = {Complex-YOLO: Real-time 3D Object Detection on Point Clouds},
year = {2018},
journal = {arXiv},
}
@article{YOLOv4,
author = {Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
title = {YOLOv4: Optimal Speed and Accuracy of Object Detection},
year = {2020},
journal = {arXiv},
}
${ROOT}
├── checkpoints/
│   ├── complex_yolov2/
│   ├── complex_yolov3/
│   └── complex_yolov4/
├── dataset/
│   └── kitti/
│       ├── ImageSets/
│       │   ├── train.txt
│       │   └── val.txt
│       ├── training/
│       │   ├── image_2/ <-- for visualization
│       │   ├── calib/
│       │   ├── label_2/
│       │   └── velodyne/
│       ├── testing/
│       │   ├── image_2/ <-- for visualization
│       │   ├── calib/
│       │   └── velodyne/
│       └── classes_names.txt
├── src/
│   ├── config/
│   │   ├── cfg/
│   │   │   ├── complex_yolov3.cfg
│   │   │   ├── complex_yolov3_tiny.cfg
│   │   │   ├── complex_yolov4.cfg
│   │   │   └── complex_yolov4_tiny.cfg
│   │   ├── train_config.py
│   │   └── kitti_config.py
│   ├── data_process/
│   │   ├── kitti_bev_utils.py
│   │   ├── kitti_dataloader.py
│   │   ├── kitti_dataset.py
│   │   ├── kitti_data_utils.py
│   │   ├── train_val_split.py
│   │   └── transformation.py
│   ├── models/
│   │   ├── darknet2pytorch.py
│   │   ├── darknet_utils.py
│   │   ├── model_utils.py
│   │   ├── region_loss.py
│   │   ├── yolo_layer.py
│   │   └── yolov4_model.py
│   ├── utils/
│   │   ├── detection_utils.py
│   │   ├── evaluation_utils.py
│   │   ├── iou_utils.py
│   │   ├── logger.py
│   │   ├── misc.py
│   │   ├── prediction_utils.py
│   │   ├── torch_utils.py
│   │   ├── train_utils.py
│   │   └── visualization_utils.py
│   ├── evaluate.py
│   ├── test.py
│   ├── test.sh
│   ├── train.py
│   └── train.sh
├── README.md
└── requirements.txt
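Before training, it can help to confirm that the dataset folders from the tree above are in place. The following is a small, hypothetical helper (it is not part of the repository) that reports which expected dataset directories are missing under a given root.

```python
from pathlib import Path

# Dataset directories taken from the tree above, relative to ${ROOT}.
EXPECTED_DIRS = [
    "dataset/kitti/ImageSets",
    "dataset/kitti/training/image_2",
    "dataset/kitti/training/calib",
    "dataset/kitti/training/label_2",
    "dataset/kitti/training/velodyne",
    "dataset/kitti/testing/image_2",
    "dataset/kitti/testing/calib",
    "dataset/kitti/testing/velodyne",
]


def missing_dirs(root):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs(".")` from the project root returns an empty list when the layout is complete.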
usage: train.py [-h] [--seed SEED] [--saved_fn FN] [--working-dir PATH]
[-a ARCH] [--cfgfile PATH] [--pretrained_path PATH]
[--img_size IMG_SIZE] [--hflip_prob HFLIP_PROB]
[--cutout_prob CUTOUT_PROB] [--cutout_nholes CUTOUT_NHOLES]
[--cutout_ratio CUTOUT_RATIO]
[--cutout_fill_value CUTOUT_FILL_VALUE]
[--multiscale_training] [--mosaic] [--random-padding]
[--no-val] [--num_samples NUM_SAMPLES]
[--num_workers NUM_WORKERS] [--batch_size BATCH_SIZE]
[--print_freq N] [--tensorboard_freq N] [--checkpoint_freq N]
[--start_epoch N] [--num_epochs N] [--lr_type LR_TYPE]
[--lr LR] [--minimum_lr MIN_LR] [--momentum M] [-wd WD]
[--optimizer_type OPTIMIZER] [--burn_in N]
[--steps [STEPS [STEPS ...]]] [--world-size N] [--rank N]
[--dist-url DIST_URL] [--dist-backend DIST_BACKEND]
[--gpu_idx GPU_IDX] [--no_cuda]
[--multiprocessing-distributed] [--evaluate]
[--resume_path PATH] [--conf-thresh CONF_THRESH]
[--nms-thresh NMS_THRESH] [--iou-thresh IOU_THRESH]
The Implementation of Complex YOLOv4
optional arguments:
-h, --help show this help message and exit
--seed SEED re-produce the results with seed random
--saved_fn FN The name using for saving logs, models,...
--working-dir PATH The ROOT working directory
-a ARCH, --arch ARCH The name of the model architecture
--cfgfile PATH The path for cfgfile (only for darknet)
--pretrained_path PATH
the path of the pretrained checkpoint
--img_size IMG_SIZE the size of input image
--hflip_prob HFLIP_PROB
The probability of horizontal flip
--cutout_prob CUTOUT_PROB
The probability of cutout augmentation
--cutout_nholes CUTOUT_NHOLES
The number of cutout area
--cutout_ratio CUTOUT_RATIO
The max ratio of the cutout area
--cutout_fill_value CUTOUT_FILL_VALUE
The fill value in the cut out area, default 0. (black)
--multiscale_training
If true, use scaling data for training
--mosaic If true, compose training samples as mosaics
--random-padding If true, random padding if using mosaic augmentation
--no-val If true, dont evaluate the model on the val set
--num_samples NUM_SAMPLES
Take a subset of the dataset to run and debug
--num_workers NUM_WORKERS
Number of threads for loading data
--batch_size BATCH_SIZE
mini-batch size (default: 4), this is the total batch
size of all GPUs on the current node when using Data
Parallel or Distributed Data Parallel
--print_freq N print frequency (default: 50)
--tensorboard_freq N frequency of saving tensorboard (default: 20)
--checkpoint_freq N frequency of saving checkpoints (default: 2)
--start_epoch N the starting epoch
--num_epochs N number of total epochs to run
--lr_type LR_TYPE the type of learning rate scheduler (cosin or
multi_step)
--lr LR initial learning rate
--minimum_lr MIN_LR minimum learning rate during training
--momentum M momentum
-wd WD, --weight_decay WD
weight decay (default: 1e-6)
--optimizer_type OPTIMIZER
the type of optimizer, it can be sgd or adam
--burn_in N number of burn in step
--steps [STEPS [STEPS ...]]
number of burn in step
--world-size N number of nodes for distributed training
--rank N node rank for distributed training
--dist-url DIST_URL url used to set up distributed training
--dist-backend DIST_BACKEND
distributed backend
--gpu_idx GPU_IDX GPU index to use.
--no_cuda If true, cuda is not used.
--multiprocessing-distributed
Use multi-processing distributed training to launch N
processes per node, which has N GPUs. This is the
fastest way to use PyTorch for either single node or
multi node data parallel training
--evaluate only evaluate the model, not training
--resume_path PATH the path of the resumed checkpoint
--conf-thresh CONF_THRESH
for evaluation - the threshold for class conf
--nms-thresh NMS_THRESH
for evaluation - the threshold for nms
--iou-thresh IOU_THRESH
for evaluation - the threshold for IoU
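The scheduler-related options above (`--lr`, `--minimum_lr`, `--burn_in`, and the cosine choice of `--lr_type`) suggest a warm-up phase followed by a decay towards a minimum learning rate. The sketch below shows one plausible reading: a linear warm-up, then cosine decay. The function name, the default values and the linear shape of the warm-up are assumptions made for illustration, and the schedule in `train.py` may be defined differently.

```python
import math


def learning_rate(step, total_steps, lr=1e-3, minimum_lr=1e-7, burn_in=50):
    """Linear warm-up for `burn_in` steps, then cosine decay from lr to minimum_lr.

    All default values here are hypothetical example numbers.
    """
    if step < burn_in:
        # Warm-up: ramp from lr / burn_in up to lr.
        return lr * (step + 1) / burn_in

    # Fraction of the post-warm-up schedule that has elapsed, in [0, 1].
    progress = (step - burn_in) / max(1, total_steps - burn_in)
    progress = min(1.0, progress)
    return minimum_lr + 0.5 * (lr - minimum_lr) * (1.0 + math.cos(math.pi * progress))
```

The warm-up keeps early updates small while the randomly initialized layers settle, and the cosine curve then lowers the rate smoothly instead of in discrete steps.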