diff --git a/LICENSE b/LICENSE index 663aaf6..f8c21e6 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2020 Tianwei Yin and Xingyi Zhou +Copyright (c) 2020-2021 Tianwei Yin and Xingyi Zhou Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md index d182eaa..8d39fc6 100644 --- a/README.md +++ b/README.md @@ -17,11 +17,12 @@ year={2020}, } -## Updates -[2020-12-11] **NEW:** 3 out of the top 4 entries in the recent NeurIPS 2020 [nuScenes 3D Detection challenge](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any) used CenterPoint. Congratualations to other participants and please stay tuned for more updates on nuScenes and Waymo soon. +## NEWS -[2020-08-10] We now support vehicle detection on [Waymo](docs/WAYMO.md) with SOTA performance. +[2021-01-06] CenterPoint v1.0 is released. Without bells and whistles, we rank first among all Lidar-only methods on the Waymo Open Dataset with a single model that runs at 11 FPS. Check out CenterPoint's model zoo for [Waymo](configs/waymo/README.md) and [nuScenes](configs/nusc/README.md). + +[2020-12-11] 3 out of the top 4 entries in the recent NeurIPS 2020 [nuScenes 3D Detection challenge](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any) used CenterPoint. Congratulations to the other participants, and please stay tuned for more updates on nuScenes and Waymo soon. ## Contact Any questions or discussion are welcome! @@ -30,48 +31,63 @@ Tianwei Yin [yintianwei@utexas.edu](mailto:yintianwei@utexas.edu) Xingyi Zhou [zhouxy@cs.utexas.edu](mailto:zhouxy@cs.utexas.edu) ## Abstract -Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection, but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objectsas points. We use a keypoint detector to find centers of objects, and simply regress to other attributes, including 3D size, 3D orientation, and velocity. In our center-based framework, 3D object tracking simplifies to greedy closest-point matching.The resulting detection and tracking algorithm is simple, efficient, and effective. On the nuScenes dataset, our point-based representations performs 3-4mAP higher than the box-based counterparts for 3D detection, and 6 AMOTA higher for 3D tracking. Our real-time model runs end-to-end 3D detection and tracking at 30 FPS with 54.2AMOTA and 48.3mAP while the best single model achieves 60.3mAP for 3D detection, and 63.8AMOTA for 3D tracking. +Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points.
Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieves state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single-model methods by a large margin and ranks first among all Lidar-only submissions. -# Highlights -- **Simple:** Two sentences method summary: We use standard 3D point cloud encoder with a few convolutional layers in the head to produce a bird-eye-view heatmap and other dense regression outputs including the offset to centers in the previous frame. Detection is a simple local peak extraction, and tracking is a closest-distance matching. +# Highlights -- **Fast:** Our [PointPillars model](configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms.py) runs at *30* FPS with *48.3* AP and *59.1* AMOTA for simultaneous 3D detection and tracking on the nuScenes dataset. +- **Simple:** Two-sentence method summary: We use a standard 3D point cloud encoder with a few convolutional layers in the head to produce a bird's-eye-view heatmap and other dense regression outputs, including the offset to centers in the previous frame. Detection is a simple local peak extraction with refinement, and tracking is closest-distance matching. -- **Accurate**: Our [best single model](configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_testset.py) achieves *60.3* mAP and *67.3* NDS on nuScenes detection testset. +- **Fast and Accurate**: Our best single model achieves *71.9* mAPH on Waymo and *65.5* NDS on nuScenes while running at 11 FPS+. -- **Extensible**: Simple baseline to switch in your backbone and novel algorithms. +- **Extensible**: A simple replacement for anchor-based detectors in your novel algorithms.
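The Simple highlight above describes tracking as greedy closest-distance matching of predicted centers. As a rough illustration only, the sketch below shows what such a matcher can look like; the function name `greedy_center_matching` and its arguments are hypothetical and do not correspond to the tracking code shipped in this repository.

```python
import numpy as np

def greedy_center_matching(track_centers, det_centers, max_dist=2.0):
    """Illustrative greedy closest-point matching in the bird's-eye view.

    track_centers: (N, 2) centers of existing tracks from the previous frame.
    det_centers:   (M, 2) centers of current detections, already shifted by the
                   predicted offset so they live in the previous frame's coordinates.
    Returns a list of (track_index, detection_index) pairs.
    """
    track_centers = np.asarray(track_centers, dtype=float)
    det_centers = np.asarray(det_centers, dtype=float)
    if track_centers.size == 0 or det_centers.size == 0:
        return []

    # Pairwise 2D distances between every track and every detection.
    dist = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)

    matches, used_tracks, used_dets = [], set(), set()
    # Visit candidate pairs from closest to farthest, keeping each index at most once.
    for flat_idx in np.argsort(dist, axis=None):
        t, d = np.unravel_index(flat_idx, dist.shape)
        if t in used_tracks or d in used_dets:
            continue
        if dist[t, d] > max_dist:
            break  # everything that remains is even farther away
        matches.append((int(t), int(d)))
        used_tracks.add(t)
        used_dets.add(d)
    return matches
```

Unmatched detections would start new tracks and unmatched tracks would be aged out; that bookkeeping is omitted from the sketch.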
## Main results -#### 3D detection +#### 3D detection on Waymo test set + +| | #Frame | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | FPS | +|---------|---------|--------|--------|---------|--------|-------| +|VoxelNet | 1 | 71.9 | 67.0 | 68.2 | 69.0 | 13 | +|VoxelNet | 2 | 73.0 | 71.5 | 71.3 | 71.9 | 11 | + +#### 3D detection on Waymo domain adaptation test set + +| | #Frame | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | FPS | +|---------|---------|--------|--------|---------|--------|-------| +|VoxelNet | 2 | 56.1 | 47.8 | 65.2 | 56.3 | 11 | + + +#### 3D detection on nuScenes test set + +| | MAP ↑ | NDS ↑ | PKL ↓ | FPS ↑| +|---------|---------|--------|--------|------| +|VoxelNet | 58.0 | 65.5 | 0.69 | 11 | -| | Split | MAP | NDS | FPS | -|---------|---------|---------|--------|-------| -| PointPillars-512 | Val | 48.3 | 59.1 | 30.3 | -| VoxelNet-1024 | Val | 55.4 | 63.8 | 14.5 | -| VoxelNet-1440_dcn_flip | Val | 59.1 | 67.1 | 2.2 | -| VoxelNet-1440_dcn_flip | Test | 60.3 | 67.3 | 2.2 | -#### 3D Tracking +#### 3D tracking on Waymo test set -| | Split | Tracking time | Total time | AMOTA ↑ | AMOTP ↓ | -|-----------------------|-----------|---------------|--------------|---------|---------| -| CenterPoint_pillar_512| val | 1ms | 34ms | 54.2 | 0.680 | -| CenterPoint_voxel_1024| val | 1ms | 70ms | 62.6 | 0.630 | -| CenterPoint_voxel_1440_dcn_flip | val | 1ms | 451ms | 65.9 | 0.567 | -| CenterPoint_voxel_1440_dcn_flip | test | 1ms | 451ms | 63.8 | 0.555 | +| | #Frame | Veh_L2 | Ped_L2 | Cyc_L2 | MOTA | FPS | +|---------|---------|--------|--------|---------|--------|-------| +| VoxelNet| 2 | 59.4 | 56.6 | 60.0 | 58.7 | 11 | -All results are tested on a Titan Xp GPU with batch size 1. More models and details can be found in [MODEL_ZOO.md](docs/MODEL_ZOO.md). + +#### 3D Tracking on nuScenes test set + +| | AMOTA ↑ | AMOTP ↓ | +|----------|---------|---------| +| VoxelNet (flip test) | 63.8 | 0.555 | + + +All results are tested on a Titan RTX GPU with batch size 1. ## Third-party resources - [AFDet](https://arxiv.org/abs/2006.12671): another work inspired by CenterNet achieves good performance on KITTI/Waymo dataset. +- [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/centerpoint): CenterPoint in mmdet framework. ## Use CenterPoint -We provide a demo with PointPillars model for 3D object detection on the nuScenes dataset. - ### Basic Installation ```bash @@ -100,13 +116,13 @@ For more advanced usage, please refer to [INSTALL](docs/INSTALL.md) to set up mo ## Benchmark Evaluation and Training -Please refer to [GETTING_START](docs/GETTING_START.md) to prepare the data. Then follow the instruction there to reproduce our detection and tracking results. All detection configurations are included in [configs](configs) and we provide the scripts for all tracking experiments in [tracking_scripts](tracking_scripts). The pretrained models, log, and each model's prediction files are provided in the [MODEL_ZOO.md](docs/MODEL_ZOO.md). +Please refer to [GETTING_START](docs/GETTING_START.md) to prepare the data. Then follow the instruction there to reproduce our detection and tracking results. All detection configurations are included in [configs](configs) and we provide the scripts for all tracking experiments in [tracking_scripts](tracking_scripts). ## License CenterPoint is release under MIT license (see [LICENSE](LICENSE)). It is developed based on a forked version of [det3d](https://github.com/poodarchu/Det3D/tree/56402d4761a5b73acd23080f537599b0888cce07). 
We also incorperate a large amount of code from [CenterNet](https://github.com/xingyizhou/CenterNet) -and [CenterTrack](https://github.com/xingyizhou/CenterTrack). See the [NOTICE](docs/NOTICE) for details. Note that the nuScenes dataset is free of charge for non-commercial activities. Please contact the [nuScenes team](https://www.nuscenes.org) for commercial usage. +and [CenterTrack](https://github.com/xingyizhou/CenterTrack). See the [NOTICE](docs/NOTICE) for details. Note that both nuScenes and Waymo datasets are under non-commercial licenses. ## Acknowlegement This project is not possible without multiple great opensourced codebases. We list some notable examples below. @@ -117,51 +133,4 @@ This project is not possible without multiple great opensourced codebases. We li * [CenterNet](https://github.com/xingyizhou/CenterNet) * [mmcv](https://github.com/open-mmlab/mmcv) * [mmdetection](https://github.com/open-mmlab/mmdetection) -* [maskrcnn_benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) -* [PCDet](https://github.com/sshaoshuai/PCDet) - -**CenterPoint is deeply influenced by the following projects. Please consider citing the relevant papers.** - -``` -@article{zhu2019classbalanced, - title={Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection}, - author={Zhu, Benjin and Jiang, Zhengkai and Zhou, Xiangxin and Li, Zeming and Yu, Gang}, - journal={arXiv:1908.09492}, - year={2019} -} - -@article{lang2019pillar, - title={PointPillars: Fast Encoders for Object Detection From Point Clouds}, - journal={CVPR}, - author={Lang, Alex H. and Vora, Sourabh and Caesar, Holger and Zhou, Lubing and Yang, Jiong and Beijbom, Oscar}, - year={2019}, -} - -@article{zhou2018voxelnet, - title={VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection}, - journal={CVPR}, - author={Zhou, Yin and Tuzel, Oncel}, - year={2018}, -} - -@article{yan2018second, - title={Second: Sparsely embedded convolutional detection}, - author={Yan, Yan and Mao, Yuxing and Li, Bo}, - journal={Sensors}, - year={2018}, -} - -@article{zhou2019objects, - title={Objects as Points}, - author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp}, - journal={arXiv:1904.07850}, - year={2019} -} - -@article{zhou2020tracking, - title={Tracking Objects as Points}, - author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp}, - journal={arXiv:2004.01177}, - year={2020} -} -``` +* [OpenPCDet](https://github.com/open-mmlab/OpenPCDet) \ No newline at end of file diff --git a/configs/cbgs/nusc_cbgs_01voxel.py b/configs/cbgs/nusc_cbgs_01voxel.py deleted file mode 100644 index 346378d..0000000 --- a/configs/cbgs/nusc_cbgs_01voxel.py +++ /dev/null @@ -1,366 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -# training and testing settings -target_assigner = dict( - type="iou", - anchor_generators=[ - dict( - type="anchor_generator_range", - sizes=[1.97, 4.63, 1.74], - anchor_ranges=[-51.2, -51.2, -0.95, 51.2, 51.2, -0.95], - rotations=[0, 
1.57], - velocities=[0, 0], - matched_threshold=0.6, - unmatched_threshold=0.45, - class_name="car", - ), - dict( - type="anchor_generator_range", - sizes=[2.51, 6.93, 2.84], - anchor_ranges=[-51.2, -51.2, -0.40, 51.2, 51.2, -0.40], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="truck", - ), - dict( - type="anchor_generator_range", - sizes=[2.85, 6.37, 3.19], - anchor_ranges=[-51.2, -51.2, -0.225, 51.2, 51.2, -0.225], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.35, - class_name="construction_vehicle", - ), - dict( - type="anchor_generator_range", - sizes=[2.94, 10.5, 3.47], - anchor_ranges=[-51.2, -51.2, -0.085, 51.2, 51.2, -0.085], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="bus", - ), - dict( - type="anchor_generator_range", - sizes=[2.90, 12.29, 3.87], - anchor_ranges=[-51.2, -51.2, 0.115, 51.2, 51.2, 0.115], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.35, - class_name="trailer", - ), - dict( - type="anchor_generator_range", - sizes=[2.53, 0.50, 0.98], - anchor_ranges=[-51.2, -51.2, -1.33, 51.2, 51.2, -1.33], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="barrier", - ), - dict( - type="anchor_generator_range", - sizes=[0.77, 2.11, 1.47], - anchor_ranges=[-51.2, -51.2, -1.085, 51.2, 51.2, -1.085], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.3, - class_name="motorcycle", - ), - dict( - type="anchor_generator_range", - sizes=[0.60, 1.70, 1.28], - anchor_ranges=[-51.2, -51.2, -1.18, 51.2, 51.2, -1.18], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.35, - class_name="bicycle", - ), - dict( - type="anchor_generator_range", - sizes=[0.67, 0.73, 1.77], - anchor_ranges=[-51.2, -51.2, -0.935, 51.2, 51.2, -0.935], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.6, - unmatched_threshold=0.4, - class_name="pedestrian", - ), - dict( - type="anchor_generator_range", - sizes=[0.41, 0.41, 1.07], - anchor_ranges=[-51.2, -51.2, -1.285, 51.2, 51.2, -1.285], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.6, - unmatched_threshold=0.4, - class_name="traffic_cone", - ), - ], - sample_positive_fraction=-1, - sample_size=512, - region_similarity_calculator=dict(type="nearest_iou_similarity",), - pos_area_threshold=-1, - tasks=tasks, -) - -box_coder = dict( - type="ground_box3d_coder", n_dim=9, linear_dim=False, encode_angle_vector=True, -) - -# model settings -model = dict( - type="VoxelNet", - pretrained=None, - reader=dict( - type="VoxelFeatureExtractorV3", - num_input_features=5, - norm_cfg=norm_cfg, - ), - backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, - ), - neck=dict( - type="RPN", - layer_nums=[5, 5], - ds_layer_strides=[1, 2], - ds_num_filters=[128, 256], - us_layer_strides=[1, 2], - us_num_filters=[256, 256], - num_input_features=256, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - type="MultiGroupHead", - mode="3d", - in_channels=sum([256, 256]), - norm_cfg=norm_cfg, - tasks=tasks, - weights=[1,], - box_coder=build_box_coder(box_coder), - encode_background_as_zeros=True, - loss_norm=dict( - type="NormByNumPositives", pos_cls_weight=1.0, neg_cls_weight=2.0, - ), - loss_cls=dict(type="SigmoidFocalLoss", 
alpha=0.25, gamma=2.0, loss_weight=1.0,), - use_sigmoid_score=True, - loss_bbox=dict( - type="WeightedL1Loss", - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - codewise=True, - loss_weight=0.25, - ), - encode_rad_error_by_sin=False, - loss_aux=None, - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - ), -) - -assigner = dict( - box_coder=box_coder, - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - debug=False, -) - - -train_cfg = dict(assigner=assigner) - -test_cfg = dict( - nms=dict( - use_rotate_nms=True, - use_multi_class_nms=False, - nms_pre_max_size=1000, - nms_post_max_size=83, - nms_iou_threshold=0.2, - ), - score_threshold=0.1, - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, -) - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" - -db_sampler = dict( - type="GT-AUG", - enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.1, 0.1, 0.2], - max_points_in_voxel=10, - max_voxel_num=60000, -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignTarget", cfg=train_cfg["assigner"]), - dict(type="Reformat"), - # dict(type='PointCloudCollect', keys=['points', 'voxels', 'annotations', 'calib']), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignTarget", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] - -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - 
type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), -) - -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) - -"""training hooks """ -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy in training hooks -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None -workflow = [('train', 1)] - diff --git a/configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms.py b/configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms.py deleted file mode 100644 index f66f990..0000000 --- a/configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms.py +++ /dev/null @@ -1,250 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -# training and testing settings -target_assigner = dict( - tasks=tasks, -) - - -# model settings -model = dict( - type="PointPillars", - pretrained=None, - reader=dict( - type="PillarFeatureNet", - num_filters=[64], - num_input_features=5, - with_distance=False, - voxel_size=(0.2, 0.2, 8), - pc_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), - norm_cfg=norm_cfg, - ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), - neck=dict( - type="RPN", - layer_nums=[3, 5, 5], - ds_layer_strides=[2, 2, 2], - ds_num_filters=[64, 128, 256], - us_layer_strides=[0.5, 1, 2], - us_num_filters=[128, 128, 128], - num_input_features=64, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - # type='RPNHead', - type="CenterHead", - mode="3d", - in_channels=sum([128, 128, 128]), - norm_cfg=norm_cfg, - tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - bn=True - ), -) - -assigner = dict( - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, -) - - -train_cfg = dict(assigner=assigner) - - -test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - min_radius=[4, 12, 10, 1, 0.85, 
0.175], - post_max_size=83, - score_threshold=0.1, - pc_range=[-51.2, -51.2], - out_size_factor=get_downsample_factor(model), - voxel_size=[0.2, 0.2], - circle_nms=True -) - - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" - -db_sampler = dict( - type="GT-AUG", - enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.2, 0.2, 8], - max_points_in_voxel=20, - max_voxel_num=30000, -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] - -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), -) - - -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - 
# dict(type='TensorboardLoggerHook') - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None -workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_pp_dcn_02voxel_circle_nms.py b/configs/centerpoint/nusc_centerpoint_pp_dcn_02voxel_circle_nms.py deleted file mode 100644 index d6ca87b..0000000 --- a/configs/centerpoint/nusc_centerpoint_pp_dcn_02voxel_circle_nms.py +++ /dev/null @@ -1,249 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -# training and testing settings -target_assigner = dict( - tasks=tasks, -) - - -# model settings -model = dict( - type="PointPillars", - pretrained=None, - reader=dict( - type="PillarFeatureNet", - num_filters=[64], - num_input_features=5, - with_distance=False, - voxel_size=(0.2, 0.2, 8), - pc_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), - norm_cfg=norm_cfg, - ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), - neck=dict( - type="RPN", - layer_nums=[3, 5, 5], - ds_layer_strides=[2, 2, 2], - ds_num_filters=[64, 128, 256], - us_layer_strides=[0.5, 1, 2], - us_num_filters=[128, 128, 128], - num_input_features=64, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - # type='RPNHead', - type="CenterHead", - mode="3d", - in_channels=sum([128, 128, 128]), - norm_cfg=norm_cfg, - tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - share_conv_channel=64, - dcn_head=True - ), -) - -assigner = dict( - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, -) - - -train_cfg = dict(assigner=assigner) - -test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - circle_nms=True, - max_pool_nms=False, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, - score_threshold=0.1, - pc_range=[-51.2, -51.2], - out_size_factor=get_downsample_factor(model), - voxel_size=[0.2, 0.2] -) - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" - -db_sampler = dict( - type="GT-AUG", - enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - 
traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.2, 0.2, 8], - max_points_in_voxel=20, - max_voxel_num=30000, -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] - -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), -) - - -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - # dict(type='TensorboardLoggerHook') - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None -workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_0075voxel_circle_nms.py b/configs/centerpoint/nusc_centerpoint_voxelnet_0075voxel_circle_nms.py deleted file mode 100644 index a0d3cea..0000000 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_0075voxel_circle_nms.py +++ /dev/null @@ -1,249 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from 
det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -# training and testing settings -target_assigner = dict( - tasks=tasks, -) - -# model settings -model = dict( - type="VoxelNet", - pretrained=None, - reader=dict( - type="VoxelFeatureExtractorV3", - # type='SimpleVoxel', - num_input_features=5, - norm_cfg=norm_cfg, - ), - backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, - ), - neck=dict( - type="RPN", - layer_nums=[5, 5], - ds_layer_strides=[1, 2], - ds_num_filters=[128, 256], - us_layer_strides=[1, 2], - us_num_filters=[256, 256], - num_input_features=256, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - type="CenterHead", - mode="3d", - in_channels=sum([256, 256]), - norm_cfg=norm_cfg, - tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, - encode_rad_error_by_sin=False, - direction_offset=0.0, - share_conv_channel=64, - dcn_head=False, - bn=True - ), -) - -assigner = dict( - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, -) - - -train_cfg = dict(assigner=assigner) - -test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - circle_nms=True, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, - score_threshold=0.1, - pc_range=[-54, -54], - out_size_factor=get_downsample_factor(model), - voxel_size=[0.075, 0.075] -) - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" - -db_sampler = dict( - type="GT-AUG", - enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-54, -54, -5.0, 54, 54, 3.0], - 
voxel_size=[0.075, 0.075, 0.2], - max_points_in_voxel=10, - max_voxel_num=90000, -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), - # dict(type='PointCloudCollect', keys=['points', 'voxels', 'annotations', 'calib']), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] - -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), -) - - - -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - # dict(type='TensorboardLoggerHook') - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None -workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_circle_nms.py b/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_circle_nms.py deleted file mode 100644 index 85ecd02..0000000 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_circle_nms.py +++ /dev/null @@ -1,252 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None -DOUBLE_FLIP = True - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -# training and testing settings -target_assigner = dict( - tasks=tasks, -) - -# model settings -model = dict( - type="VoxelNet", - pretrained=None, - reader=dict( - type="VoxelFeatureExtractorV3", - # type='SimpleVoxel', - num_input_features=5, - 
norm_cfg=norm_cfg, - ), - backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, - ), - neck=dict( - type="RPN", - layer_nums=[5, 5], - ds_layer_strides=[1, 2], - ds_num_filters=[128, 256], - us_layer_strides=[1, 2], - us_num_filters=[256, 256], - num_input_features=256, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - type="CenterHead", - mode="3d", - in_channels=sum([256, 256]), - norm_cfg=norm_cfg, - tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, - encode_rad_error_by_sin=False, - direction_offset=0.0, - share_conv_channel=64, - dcn_head=True, - bn=True - ), -) - -assigner = dict( - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, -) - - -train_cfg = dict(assigner=assigner) - -test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - circle_nms=True, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, - score_threshold=0.1, - pc_range=[-54, -54], - out_size_factor=get_downsample_factor(model), - voxel_size=[0.075, 0.075], - double_flip=DOUBLE_FLIP -) - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" - -db_sampler = dict( - type="GT-AUG", - enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-54, -54, -5.0, 54, 54, 3.0], - voxel_size=[0.075, 0.075, 0.2], - max_points_in_voxel=10, - max_voxel_num=90000, - double_flip=DOUBLE_FLIP -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), - # dict(type='PointCloudCollect', keys=['points', 'voxels', 'annotations', 'calib']), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="DoubleFlip") if 
DOUBLE_FLIP else dict(type="Empty"), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat", double_flip=DOUBLE_FLIP), -] - -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), -) - - - -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - # dict(type='TensorboardLoggerHook') - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None -workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_testset.py b/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_testset.py deleted file mode 100644 index 45502da..0000000 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_testset.py +++ /dev/null @@ -1,266 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None -DOUBLE_FLIP = True - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -# training and testing settings -target_assigner = dict( - tasks=tasks, -) - -# model settings -model = dict( - type="VoxelNet", - pretrained=None, - reader=dict( - type="VoxelFeatureExtractorV3", - # type='SimpleVoxel', - num_input_features=5, - norm_cfg=norm_cfg, - ), - backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, - ), - neck=dict( - type="RPN", - layer_nums=[5, 5], - ds_layer_strides=[1, 2], - ds_num_filters=[128, 256], - us_layer_strides=[1, 2], - us_num_filters=[256, 256], - num_input_features=256, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - type="CenterHead", - mode="3d", - in_channels=sum([256, 256]), - norm_cfg=norm_cfg, - tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 
'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, - encode_rad_error_by_sin=False, - direction_offset=0.0, - share_conv_channel=64, - dcn_head=True, - bn=True - ), -) - -assigner = dict( - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, -) - - -train_cfg = dict(assigner=assigner) - -test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - nms=dict( - use_rotate_nms=True, - use_multi_class_nms=False, - nms_pre_max_size=1000, - nms_post_max_size=83, - nms_iou_threshold=0.2, - ), - score_threshold=0.1, - pc_range=[-54, -54], - out_size_factor=get_downsample_factor(model), - voxel_size=[0.075, 0.075], - double_flip=DOUBLE_FLIP -) - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes/v1.0-test" - -db_sampler = dict( - type="GT-AUG", - enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-54, -54, -5.0, 54, 54, 3.0], - voxel_size=[0.075, 0.075, 0.2], - max_points_in_voxel=10, - max_voxel_num=90000, - double_flip=DOUBLE_FLIP -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat"), - # dict(type='PointCloudCollect', keys=['points', 'voxels', 'annotations', 'calib']), -] -val_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="DoubleFlip") if DOUBLE_FLIP else dict(type="Empty"), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat", double_flip=DOUBLE_FLIP), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="DoubleFlip") if DOUBLE_FLIP else dict(type="Empty"), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignLabel", cfg=train_cfg["assigner"]), - dict(type="Reformat", double_flip=DOUBLE_FLIP), -] - -train_anno = 
"data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = "data/nuScenes/v1.0-test/infos_test_10sweeps_withvelo.pkl" - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=val_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - test_mode=True, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - version="v1.0-test", - ), -) - - - -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - # dict(type='TensorboardLoggerHook') - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None -workflow = [('train', 1)] diff --git a/configs/nusc/README.md b/configs/nusc/README.md new file mode 100644 index 0000000..d4ae094 --- /dev/null +++ b/configs/nusc/README.md @@ -0,0 +1,57 @@ +# MODEL ZOO + +### Common settings and notes + +- The experiments are run with PyTorch 1.1, CUDA 10.0, and CUDNN 7.5. +- The training is conducted on 4 V100 GPUs in a DGX server. +- Testing times are measured on a TITAN RTX GPU with batch size 1. 
+ +## nuScenes 3D Detection + +**We provide training / validation configurations, logs, pretrained models, and prediction files for all models in the paper** + +### VoxelNet +| Model | FPS | Validation MAP | Validation NDS | Link | +|-----------------------|------------------|-----------------|-----------------|---------------| +| [centerpoint_voxel_1024](voxelnet/nusc_centerpoint_voxelnet_01voxel.py) | 16 | 56.4 | 64.8 | [URL](https://drive.google.com/drive/folders/1RyBD23GDfeU4AnRkea2BxlrosbKJmDKW?usp=sharing) | +| [centerpoint_voxel_1440_dcn](voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn.py) | 11 | 57.1 | 65.4 | [URL](https://drive.google.com/drive/folders/1R7Ny4ia6NksL-FoltQKUtqrCB6DhX3TP?usp=sharing) | +| [centerpoint_voxel_1440_dcn(flip)](voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn.py) | 3 | 59.5 | 67.4 | [URL](https://drive.google.com/drive/folders/1fAz0Hn8hLdmwYZh_JuMQj69O7uEHAjOh?usp=sharing) | + + +### PointPillars + +| Model | FPS | Validation MAP | Validation NDS | Link | +|-----------------------|-----------------|-----------------|-----------------|---------------| +| [centerpoint_pillar](voxelnet/nusc_centerpoint_pp_02voxel_two_pfn_10sweep.py) | 31 | 50.3 | 60.2 | [URL](https://drive.google.com/drive/folders/1K_wHrBo6yRSG7H7UUjKI4rPnyEA8HvOp?usp=sharing) | + + +## nuScenes 3D Tracking + +| Model | Tracking time | Total time | Validation AMOTA ↑ | Validation AMOTP ↓ | Link | +|-----------------------|-----------|------------------|------------------|-------------------|---------------| +| [centerpoint_voxel_1024](../../tracking_scripts/centerpoint_voxel_1024.sh) | 1ms | 64ms | 63.7 | 0.606 | [URL](https://drive.google.com/drive/folders/19pdribrqU5JyGSmrrvIKQ_ecYIG1QW0t?usp=sharing) | +| [centerpoint_voxel_1440_dcn](../../tracking_scripts/centerpoint_voxel_1440.sh) | 1ms | 95ms | 64.1 | 0.596 | [URL](https://drive.google.com/drive/folders/1o030ph0USc2GALIL5goiGsZtmJfzbi1T?usp=sharing) | +| [centerpoint_voxel_1440_dcn(flip test)](../../tracking_scripts/centerpoint_voxel_1440.sh) | 1ms | 343ms | 66.5 | 0.567 | [URL](https://drive.google.com/drive/folders/1uU_wXuNikmRorf_rPBbM0UTrW54ztvMs?usp=sharing) | + + +## nuScenes test set Detection/Tracking +### Detection + +| Model | Test MAP | Test NDS | Link | +|-----------------------|-----------|-----------|---------------| +| [centerpoint_voxel_1440_dcn](voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn.py) | 58.0 | 65.5 | [Detection](https://drive.google.com/file/d/10FxIthdrycFMlY8xQCuxzrPWTiDNy3-f/view?usp=sharing) | + +### Tracking +| Model | Test AMOTA | Test AMOTP | Link | +|-----------------------|------------|---------------|-------| +| [centerpoint_voxel_1440_dcn(flip test)](../../tracking_scripts/centerpoint_voxel_1440_dcn_flip_testset.sh) | 63.8* | 0.555* | [Tracking](https://drive.google.com/file/d/1evPKLwzlJB5QeECCjDWyla-CXzK0F255/view?usp=sharing)| + +*The numbers are from an old version of the codebase. Current model should perform slightly better. + +## Enhanced Results + +This section aims to keep track of follow-up works based on CenterPoint. We appreciate all contributions and please send us an [email](mailto:yintianwei@utexas.edu) if you want to be listed here. 
+ +| Method | Val mAP | Val NDS | Val AMOTA | Test mAP | Test NDS | Test AMOTA | +|--------|----------|----------|-----------|-----------|-----------|------------| + diff --git a/configs/centerpoint/nusc_centerpoint_pp_02voxel.py b/configs/nusc/pp/nusc_centerpoint_pp_02voxel_two_pfn_10sweep.py similarity index 87% rename from configs/centerpoint/nusc_centerpoint_pp_02voxel.py rename to configs/nusc/pp/nusc_centerpoint_pp_02voxel_two_pfn_10sweep.py index c93d891..1a89adf 100644 --- a/configs/centerpoint/nusc_centerpoint_pp_02voxel.py +++ b/configs/nusc/pp/nusc_centerpoint_pp_02voxel_two_pfn_10sweep.py @@ -1,11 +1,8 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ dict(num_class=1, class_names=["car"]), dict(num_class=2, class_names=["truck", "construction_vehicle"]), @@ -29,14 +26,13 @@ pretrained=None, reader=dict( type="PillarFeatureNet", - num_filters=[64], + num_filters=[64, 64], num_input_features=5, with_distance=False, voxel_size=(0.2, 0.2, 8), pc_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), - norm_cfg=norm_cfg, ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), + backbone=dict(type="PointPillarsScatter", ds_factor=1), neck=dict( type="RPN", layer_nums=[3, 5, 5], @@ -45,30 +41,23 @@ us_layer_strides=[0.5, 1, 2], us_num_filters=[128, 128, 128], num_input_features=64, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( # type='RPNHead', type="CenterHead", - mode="3d", in_channels=sum([128, 128, 128]), - norm_cfg=norm_cfg, tasks=tasks, dataset='nuscenes', weight=0.25, code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - bn=True ), ) assigner = dict( target_assigner=target_assigner, out_size_factor=get_downsample_factor(model), - dense_reg=1, gaussian_overlap=0.1, max_objs=500, min_radius=2, @@ -81,8 +70,6 @@ post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], max_per_img=500, nms=dict( - use_rotate_nms=True, - use_multi_class_nms=False, nms_pre_max_size=1000, nms_post_max_size=83, nms_iou_threshold=0.2, @@ -137,17 +124,8 @@ train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], global_rot_noise=[-0.3925, 0.3925], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) @@ -155,15 +133,13 @@ val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) voxel_generator = dict( range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], voxel_size=[0.2, 0.2, 8], max_points_in_voxel=20, - max_voxel_num=30000, + max_voxel_num=[30000, 60000], ) train_pipeline = [ diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_circle_nms.py b/configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn.py similarity index 88% rename from configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_circle_nms.py rename to configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn.py index c0f9256..e51f4c2 100644 --- 
a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_circle_nms.py +++ b/configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn.py @@ -1,11 +1,8 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ dict(num_class=1, class_names=["car"]), dict(num_class=2, class_names=["truck", "construction_vehicle"]), @@ -30,10 +27,9 @@ type="VoxelFeatureExtractorV3", # type='SimpleVoxel', num_input_features=5, - norm_cfg=norm_cfg, ), backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, + type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8 ), neck=dict( type="RPN", @@ -43,24 +39,18 @@ us_layer_strides=[1, 2], us_num_filters=[256, 256], num_input_features=256, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( type="CenterHead", - mode="3d", in_channels=sum([256, 256]), - norm_cfg=norm_cfg, tasks=tasks, dataset='nuscenes', weight=0.25, code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, - encode_rad_error_by_sin=False, - direction_offset=0.0, share_conv_channel=64, - dcn_head=True, - bn=True + dcn_head=True ), ) @@ -79,9 +69,13 @@ test_cfg = dict( post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], max_per_img=500, - circle_nms=True, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=1000, + nms_post_max_size=83, + nms_iou_threshold=0.2, + ), score_threshold=0.1, pc_range=[-54, -54], out_size_factor=get_downsample_factor(model), @@ -132,17 +126,8 @@ train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], global_rot_noise=[-0.3925, 0.3925], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) @@ -150,15 +135,13 @@ val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) voxel_generator = dict( range=[-54, -54, -5.0, 54, 54, 3.0], voxel_size=[0.075, 0.075, 0.2], max_points_in_voxel=10, - max_voxel_num=90000, + max_voxel_num=[120000, 160000], ) train_pipeline = [ diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip.py b/configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn_flip.py similarity index 90% rename from configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip.py rename to configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn_flip.py index dc5c882..d05379b 100644 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_0075voxel_flip.py +++ b/configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn_flip.py @@ -1,10 +1,7 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None DOUBLE_FLIP = True tasks = [ @@ -31,10 +28,9 @@ type="VoxelFeatureExtractorV3", # type='SimpleVoxel', num_input_features=5, - norm_cfg=norm_cfg, ), backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, + type="SpMiddleResNetFHD", num_input_features=5, 
ds_factor=8 ), neck=dict( type="RPN", @@ -44,24 +40,18 @@ us_layer_strides=[1, 2], us_num_filters=[256, 256], num_input_features=256, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( type="CenterHead", - mode="3d", in_channels=sum([256, 256]), - norm_cfg=norm_cfg, tasks=tasks, dataset='nuscenes', weight=0.25, code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, - encode_rad_error_by_sin=False, - direction_offset=0.0, share_conv_channel=64, - dcn_head=True, - bn=True + dcn_head=True ), ) @@ -138,17 +128,8 @@ train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], global_rot_noise=[-0.3925, 0.3925], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) @@ -156,15 +137,13 @@ val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) voxel_generator = dict( range=[-54, -54, -5.0, 54, 54, 3.0], voxel_size=[0.075, 0.075, 0.2], max_points_in_voxel=10, - max_voxel_num=90000, + max_voxel_num=[120000, 160000], double_flip=DOUBLE_FLIP ) @@ -189,7 +168,7 @@ train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None +test_anno = "data/nuScenes/infos_test_10sweeps_withvelo_filter_True.pkl" data = dict( samples_per_gpu=4, @@ -217,6 +196,7 @@ type=dataset_type, root_path=data_root, info_path=test_anno, + test_mode=True, ann_file=test_anno, nsweeps=nsweeps, class_names=class_names, diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_01voxel.py b/configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py similarity index 87% rename from configs/centerpoint/nusc_centerpoint_voxelnet_01voxel.py rename to configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py index e185554..c3c499c 100644 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_01voxel.py +++ b/configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py @@ -1,11 +1,8 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ dict(num_class=1, class_names=["car"]), dict(num_class=2, class_names=["truck", "construction_vehicle"]), @@ -29,10 +26,9 @@ reader=dict( type="VoxelFeatureExtractorV3", num_input_features=5, - norm_cfg=norm_cfg, ), backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, + type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8 ), neck=dict( type="RPN", @@ -42,30 +38,24 @@ us_layer_strides=[1, 2], us_num_filters=[256, 256], num_input_features=256, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( - # type='RPNHead', type="CenterHead", - mode="3d", in_channels=sum([256, 256]), - norm_cfg=norm_cfg, tasks=tasks, dataset='nuscenes', weight=0.25, code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, share_conv_channel=64, + dcn_head=False ), ) assigner = dict( 
target_assigner=target_assigner, out_size_factor=get_downsample_factor(model), - dense_reg=1, gaussian_overlap=0.1, max_objs=500, min_radius=2, @@ -76,10 +66,7 @@ test_cfg = dict( post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, nms=dict( - use_rotate_nms=True, - use_multi_class_nms=False, nms_pre_max_size=1000, nms_post_max_size=83, nms_iou_threshold=0.2, @@ -135,17 +122,8 @@ train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], global_rot_noise=[-0.3925, 0.3925], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) @@ -153,15 +131,13 @@ val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) voxel_generator = dict( range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], voxel_size=[0.1, 0.1, 0.2], max_points_in_voxel=10, - max_voxel_num=60000, + max_voxel_num=[90000, 120000], ) train_pipeline = [ @@ -236,7 +212,6 @@ interval=5, hooks=[ dict(type="TextLoggerHook"), - # dict(type='TensorboardLoggerHook') ], ) # yapf:enable diff --git a/configs/point_pillars/nusc_pp_02voxel.py b/configs/point_pillars/nusc_pp_02voxel.py deleted file mode 100644 index 413c8c3..0000000 --- a/configs/point_pillars/nusc_pp_02voxel.py +++ /dev/null @@ -1,365 +0,0 @@ -import itertools -import logging - -from det3d.builder import build_box_coder -from det3d.utils.config_tool import get_downsample_factor - -norm_cfg = None - -tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), -] - -class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) - -target_assigner = dict( - type="iou", - anchor_generators=[ - dict( - type="anchor_generator_range", - sizes=[1.97, 4.63, 1.74], - anchor_ranges=[-51.2, -51.2, -0.95, 51.2, 51.2, -0.95], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.6, - unmatched_threshold=0.45, - class_name="car", - ), - dict( - type="anchor_generator_range", - sizes=[2.51, 6.93, 2.84], - anchor_ranges=[-51.2, -51.2, -0.40, 51.2, 51.2, -0.40], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="truck", - ), - dict( - type="anchor_generator_range", - sizes=[2.85, 6.37, 3.19], - anchor_ranges=[-51.2, -51.2, -0.225, 51.2, 51.2, -0.225], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.35, - class_name="construction_vehicle", - ), - dict( - type="anchor_generator_range", - sizes=[2.94, 10.5, 3.47], - anchor_ranges=[-51.2, -51.2, -0.085, 51.2, 51.2, -0.085], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="bus", - ), - dict( - type="anchor_generator_range", - sizes=[2.90, 12.29, 3.87], - anchor_ranges=[-51.2, -51.2, 0.115, 51.2, 51.2, 0.115], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.35, - class_name="trailer", - ), - dict( - type="anchor_generator_range", - sizes=[2.53, 0.50, 
0.98], - anchor_ranges=[-51.2, -51.2, -1.33, 51.2, 51.2, -1.33], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="barrier", - ), - dict( - type="anchor_generator_range", - sizes=[0.77, 2.11, 1.47], - anchor_ranges=[-51.2, -51.2, -1.085, 51.2, 51.2, -1.085], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.3, - class_name="motorcycle", - ), - dict( - type="anchor_generator_range", - sizes=[0.60, 1.70, 1.28], - anchor_ranges=[-51.2, -51.2, -1.18, 51.2, 51.2, -1.18], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.5, - unmatched_threshold=0.35, - class_name="bicycle", - ), - dict( - type="anchor_generator_range", - sizes=[0.67, 0.73, 1.77], - anchor_ranges=[-51.2, -51.2, -0.935, 51.2, 51.2, -0.935], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.6, - unmatched_threshold=0.4, - class_name="pedestrian", - ), - dict( - type="anchor_generator_range", - sizes=[0.41, 0.41, 1.07], - anchor_ranges=[-51.2, -51.2, -1.285, 51.2, 51.2, -1.285], - rotations=[0, 1.57], - velocities=[0, 0], - matched_threshold=0.6, - unmatched_threshold=0.4, - class_name="traffic_cone", - ), - ], - sample_positive_fraction=-1, - sample_size=512, - region_similarity_calculator=dict(type="nearest_iou_similarity",), - pos_area_threshold=-1, - tasks=tasks, -) - -box_coder = dict( - type="ground_box3d_coder", n_dim=9, linear_dim=False, encode_angle_vector=True, -) - -# model settings -model = dict( - type="PointPillars", - pretrained=None, - reader=dict( - type="PillarFeatureNet", - num_filters=[64], - num_input_features=5, - with_distance=False, - voxel_size=(0.2, 0.2, 8), - pc_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), - norm_cfg=norm_cfg, - ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), - neck=dict( - type="RPN", - layer_nums=[3, 5, 5], - ds_layer_strides=[2, 2, 2], - ds_num_filters=[64, 128, 256], - us_layer_strides=[0.5, 1, 2], - us_num_filters=[128, 128, 128], - num_input_features=64, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - type="MultiGroupHead", - mode="3d", - in_channels=sum([128, 128, 128]), # this is linked to 'neck' us_num_filters - norm_cfg=norm_cfg, - tasks=tasks, - weights=[1,], - box_coder=build_box_coder(box_coder), - encode_background_as_zeros=True, - loss_norm=dict( - type="NormByNumPositives", pos_cls_weight=1.0, neg_cls_weight=2.0, - ), - loss_cls=dict(type="SigmoidFocalLoss", alpha=0.25, gamma=2.0, loss_weight=1.0,), - use_sigmoid_score=True, - loss_bbox=dict( - type="WeightedL1Loss", - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - codewise=True, - loss_weight=0.25, - ), - encode_rad_error_by_sin=False, - loss_aux=None, - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - ), -) - -assigner = dict( - box_coder=box_coder, - target_assigner=target_assigner, - out_size_factor=get_downsample_factor(model), - debug=False, -) - -train_cfg = dict(assigner=assigner) - -test_cfg = dict( - nms=dict( - use_rotate_nms=True, - use_multi_class_nms=False, - nms_pre_max_size=1000, - nms_post_max_size=83, - nms_iou_threshold=0.2, - ), - score_threshold=0.1, - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, -) - -# dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" - -db_sampler = dict( - type="GT-AUG", - enable=False, - 
db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", - sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), - ], - db_prep_steps=[ - dict( - filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, - ) - ), - dict(filter_by_difficulty=[-1],), - ], - global_random_rotation_range_per_object=[0, 0], - rate=1.0, -) -train_preprocessor = dict( - mode="train", - shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], - global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, - class_names=class_names, -) - -val_preprocessor = dict( - mode="val", - shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, -) - -voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.2, 0.2, 8], - max_points_in_voxel=20, - max_voxel_num=30000, -) - -train_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=train_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignTarget", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] -test_pipeline = [ - dict(type="LoadPointCloudFromFile", dataset=dataset_type), - dict(type="LoadPointCloudAnnotations", with_bbox=True), - dict(type="Preprocess", cfg=val_preprocessor), - dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignTarget", cfg=train_cfg["assigner"]), - dict(type="Reformat"), -] - -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" -test_anno = None - -data = dict( - samples_per_gpu=4, - workers_per_gpu=8, - train=dict( - type=dataset_type, - root_path=data_root, - info_path=train_anno, - ann_file=train_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=train_pipeline, - ), - val=dict( - type=dataset_type, - root_path=data_root, - info_path=val_anno, - test_mode=True, - ann_file=val_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), - test=dict( - type=dataset_type, - root_path=data_root, - info_path=test_anno, - ann_file=test_anno, - nsweeps=nsweeps, - class_names=class_names, - pipeline=test_pipeline, - ), -) - -# optimizer -optimizer = dict( - type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, -) - -"""training hooks """ -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy in training hooks -lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, -) - -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=5, - hooks=[ - dict(type="TextLoggerHook"), - # dict(type='TensorboardLoggerHook') - ], -) -# yapf:enable -# runtime settings -total_epochs = 20 -device_ids = range(8) -dist_params = dict(backend="nccl", init_method="env://") -log_level = "INFO" -work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 
1:-3])
-load_from = None
-resume_from = None
-workflow = [('train', 1)]
diff --git a/configs/waymo/README.md b/configs/waymo/README.md
new file mode 100644
index 0000000..ef07e3f
--- /dev/null
+++ b/configs/waymo/README.md
@@ -0,0 +1,77 @@
+# MODEL ZOO
+
+### Common settings and notes
+
+- The experiments are run with PyTorch 1.1, CUDA 10.0, and CUDNN 7.5.
+- The training is conducted on 4 V100 GPUs in a DGX server.
+- Testing times are measured on a TITAN RTX GPU with batch size 1.
+
+## Waymo 3D Detection
+
+We provide training / validation configurations, pretrained models, and prediction files for all models in the paper. To access these pretrained models, please send us an [email](mailto:yintianwei@utexas.edu) with your name, institute, and a screenshot of the Waymo dataset registration confirmation mail.
+
+### One-stage VoxelNet
+| Model | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | FPS |
+|---------|--------|--------|---------|--------|------------|
+| [VoxelNet](voxelnet/waymo_centerpoint_voxelnet_3x.py) | 66.2 | 62.6 | 67.6 | 65.5 | 13 |
+
+In the paper, our models only detect Vehicle and Pedestrian. Here, we provide the three-class config that also enables cyclist detection (and performs similarly). We encourage the community to also report three-class performance in the future.
+
+### Ablations for training schedule
+
+CenterPoint is fast to train and converges in as few as 3~6 epochs. We tried a few training schedules for CenterPoint-Voxel and list their performance below.
+
+| Schedule | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | Training Time |
+|------------|--------|--------|---------|--------|----------------|
+| [36 epoch](voxelnet/waymo_centerpoint_voxelnet_3x.py) | 66.2 | 62.6 | 67.6 | 65.5 | 84hr |
+| [12 epoch](voxelnet/waymo_centerpoint_voxelnet_1x.py) | 65.6 | 61.3 | 67.1 | 64.7 | 28hr |
+| [6 epoch](voxelnet/waymo_centerpoint_voxelnet_6epoch.py) | 65.5 | 59.5 | 66.4 | 63.4 | 14hr |
+| [3 epoch](voxelnet/waymo_centerpoint_voxelnet_3epoch.py) | 61.5 | 56.2 | 64.5 | 60.7 | 7hr |
+
+### Two-stage VoxelNet
+
+By default, we finetune a pretrained [one stage model](voxelnet/waymo_centerpoint_voxelnet_3x.py) for 6 epochs. To save GPU memory, we also freeze the backbone weights.
+
+| Model | Split | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | FPS |
+|------------|----|----|--------|---------|--------|----------------|
+| [VoxelNet](voxelnet/two_stage/waymo_centerpoint_voxelnet_two_stage_bev_5point_ft_6epoch_freeze.py) | Val | 67.9 | 65.6 | 68.6 | 67.4 | 13 |
+| [VoxelNet](voxelnet/two_stage/waymo_centerpoint_voxelnet_two_stage_bev_5point_ft_6epoch_freeze.py) | Test | 71.9 | 67.0 | 68.2 | 69.0 | 13 |
+
+
+### Two frame model
+
+To provide richer input information and enable a more reasonable velocity estimation, we transform and merge the Lidar points of the previous frame into the current frame; a sketch of this merging step follows the table below. This two frame model significantly boosts the detection performance.
+
+| Model | Split | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | FPS |
+|------------|----|----|--------|---------|--------|----------------|
+| [One-stage](voxelnet/waymo_centerpoint_voxelnet_two_sweeps_3x_with_velo.py) | Val | 67.3 | 67.5 | 69.9 | 68.2 | 11 |
+| [Two-stage](voxelnet/two_stage/waymo_centerpoint_voxelnet_two_sweep_two_stage_bev_5point_ft_6epoch_freeze_with_vel.py) | Val | 69.7 | 70.3 | 70.9 | 70.3 | 11 |
+| [Two-stage](voxelnet/two_stage/waymo_centerpoint_voxelnet_two_sweep_two_stage_bev_5point_ft_6epoch_freeze_with_vel.py) | Test | 73.0 | 71.5 | 71.3 | 71.9 | 11 |
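The two frame model above merges the previous Lidar sweep into the current one before voxelization. The sketch below is an editorial illustration of that merging step under simple assumptions (a known 4x4 ego-pose transform from the previous to the current frame and a fixed time lag); it is not code from this repository, and the function name and arguments are hypothetical.

```python
import numpy as np

def merge_two_frames(curr_pts, prev_pts, T_curr_from_prev, dt=0.1):
    """Hypothetical sketch: map previous-frame points into the current frame with
    the relative ego pose, tag every point with its time offset (0 for the current
    sweep, dt for the previous one, e.g. 0.1 s at 10 Hz), and stack them."""
    ones = np.ones((prev_pts.shape[0], 1))
    prev_h = np.hstack([prev_pts, ones])                    # homogeneous coordinates
    prev_in_curr = (T_curr_from_prev @ prev_h.T).T[:, :3]   # rigid transform into current frame
    curr = np.hstack([curr_pts, np.zeros((curr_pts.shape[0], 1))])
    prev = np.hstack([prev_in_curr, np.full((prev_pts.shape[0], 1), dt)])
    return np.vstack([curr, prev])                          # (N_curr + N_prev, 4) merged cloud
```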
+
+
+### PointPillars
+
+| Model | Veh_L2 | Ped_L2 | Cyc_L2 | MAPH | FPS |
+|---------|--------|--------|---------|--------|------------|
+| [centerpoint_pillar](pp/waymo_centerpoint_pp_two_pfn_stride1_3x.py) | 65.5 | 55.1 | 60.2 | 60.3 | 19 |
+| [centerpoint_pillar_two_stage](pp/two_stage/waymo_centerpoint_pp_two_pfn_stride1_3x.py) | 66.7 | 55.9 | 61.7 | 61.4 | 16 |
+
+For PointPillars, we notice a 1.5 mAPH drop when converting from the two-class model to the three-class model. You can refer to the [ONE_STAGE](pp/waymo_centerpoint_pp_two_cls_two_pfn_stride1_3x.py) and [TWO_STAGE](pp/two_stage/waymo_centerpoint_pp_two_cls_two_pfn_stride1_two_stage_bev_6epoch.py) configs to reproduce the two-class result.
+
+## Waymo 3D Tracking
+
+For 3D Tracking, we apply our center-based tracking on top of our two frame model's detection result.
+
+| | Split | Veh_L2 | Ped_L2 | Cyc_L2 | MOTA | FPS |
+|---------|---------|--------|--------|---------|--------|-------|
+| [centerpoint_voxel_two_sweep](../../tracking_scripts/centerpoint_voxel_two_sweep_val.sh) | Val | 55.0 | 55.0 | 57.4 | 55.8 | 11 |
+| [centerpoint_voxel_two_sweep](../../tracking_scripts/centerpoint_voxel_two_sweep_test.sh) | Test | 59.4 | 56.6 | 60.0 | 58.7 | 11 |
+
+## Enhanced Results
+
+This section aims to keep track of follow-up works based on CenterPoint. We appreciate all contributions; please send us an [email](mailto:yintianwei@utexas.edu) if you want to be listed here.
+ +| Method | Val mAPH | Val MOTA | Test mAPH | Test MOTA | Reference | +|--------|----------|----------|-----------|-----------|-----------| +| baseline | 70.3 | 55.8 | 71.9 | 58.7 | [CenterPoint](https://arxiv.org/abs/2006.11275) | + diff --git a/configs/waymo/pp/two_stage/waymo_centerpoint_pp_two_cls_two_pfn_stride1_two_stage_bev_6epoch.py b/configs/waymo/pp/two_stage/waymo_centerpoint_pp_two_cls_two_pfn_stride1_two_stage_bev_6epoch.py new file mode 100644 index 0000000..ef13caf --- /dev/null +++ b/configs/waymo/pp/two_stage/waymo_centerpoint_pp_two_cls_two_pfn_stride1_two_stage_bev_6epoch.py @@ -0,0 +1,238 @@ +import itertools +import logging + +from det3d.utils.config_tool import get_downsample_factor + +tasks = [ + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), +] + +class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) + +# training and testing settings +target_assigner = dict( + tasks=tasks, +) + +# model settings +model = dict( + type='TwoStageDetector', + first_stage_cfg=dict( + type="PointPillars", + pretrained='work_dirs/waymo_centerpoint_pp_two_cls_two_pfn_stride1_3x/epoch_36.pth', + reader=dict( + type="PillarFeatureNet", + num_filters=[64, 64], + num_input_features=5, + with_distance=False, + voxel_size=(0.32, 0.32, 6.0), + pc_range=(-74.88, -74.88, -2, 74.88, 74.88, 4.0), + ), + backbone=dict(type="PointPillarsScatter", ds_factor=1), + neck=dict( + type="RPN", + layer_nums=[3, 5, 5], + ds_layer_strides=[1, 2, 2], + ds_num_filters=[64, 128, 256], + us_layer_strides=[1, 2, 4], + us_num_filters=[128, 128, 128], + num_input_features=64, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=128*3, + tasks=tasks, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) + ), + ), + second_stage_modules=[ + dict( + type="BEVFeatureExtractor", + pc_start=[-74.88, -74.88], + voxel_size=[0.32, 0.32], + out_stride=1 + ) + ], + roi_head=dict( + type="RoIHead", + input_channels=128*3*5, + model_cfg=dict( + CLASS_AGNOSTIC=True, + SHARED_FC=[256, 256], + CLS_FC=[256, 256], + REG_FC=[256, 256], + DP_RATIO=0.3, + + TARGET_CONFIG=dict( + ROI_PER_IMAGE=128, + FG_RATIO=0.5, + SAMPLE_ROI_BY_EACH_CLASS=True, + CLS_SCORE_TYPE='roi_iou', + CLS_FG_THRESH=0.75, + CLS_BG_THRESH=0.25, + CLS_BG_THRESH_LO=0.1, + HARD_BG_RATIO=0.8, + REG_FG_THRESH=0.55 + ), + LOSS_CONFIG=dict( + CLS_LOSS='BinaryCrossEntropy', + REG_LOSS='L1', + LOSS_WEIGHTS={ + 'rcnn_cls_weight': 1.0, + 'rcnn_reg_weight': 1.0, + 'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] + } + ) + ), + code_size=7 + ), + NMS_POST_MAXSIZE=500, + num_point=5, + freeze=True +) + +assigner = dict( + target_assigner=target_assigner, + out_size_factor=get_downsample_factor(model), + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, +) + + +train_cfg = dict(assigner=assigner) + +test_cfg = dict( + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + max_per_img=4096, + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), + score_threshold=0.1, + pc_range=[-74.88, -74.88], + out_size_factor=get_downsample_factor(model), + voxel_size=[0.32, 0.32] +) + + +# dataset settings +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" + + +train_preprocessor = dict( + mode="train", + shuffle_points=True, + 
global_rot_noise=[-0.78539816, 0.78539816], + global_scale_noise=[0.95, 1.05], + db_sampler=None, + class_names=class_names, +) + +val_preprocessor = dict( + mode="val", + shuffle_points=False, +) + +voxel_generator = dict( + range=[-74.88, -74.88, -2, 74.88, 74.88, 4.0], + voxel_size=[0.32, 0.32, 6.0], + max_points_in_voxel=20, + max_voxel_num=[32000, 60000], # we only use non-empty voxels. this will be much smaller than max_voxel_num +) + +train_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=train_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] +test_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=val_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] + +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" +test_anno = None + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), + val=dict( + type=dataset_type, + root_path=data_root, + info_path=val_anno, + test_mode=True, + ann_file=val_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), +) + + + +optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + +# optimizer +optimizer = dict( + type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, +) +lr_config = dict( + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, +) + +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=5, + hooks=[ + dict(type="TextLoggerHook"), + # dict(type='TensorboardLoggerHook') + ], +) +# yapf:enable +# runtime settings +total_epochs = 6 +device_ids = range(8) +dist_params = dict(backend="nccl", init_method="env://") +log_level = "INFO" +work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/configs/waymo/pp/two_stage/waymo_centerpoint_pp_two_pfn_stride1_two_stage_bev_6epoch.py b/configs/waymo/pp/two_stage/waymo_centerpoint_pp_two_pfn_stride1_two_stage_bev_6epoch.py new file mode 100644 index 0000000..604898d --- /dev/null +++ b/configs/waymo/pp/two_stage/waymo_centerpoint_pp_two_pfn_stride1_two_stage_bev_6epoch.py @@ -0,0 +1,260 @@ +import itertools +import logging + +from det3d.utils.config_tool import get_downsample_factor + +tasks = [ + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), +] + +class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) + +# training and testing settings +target_assigner = dict( + tasks=tasks, +) + +# model settings +model = dict( + type='TwoStageDetector', + first_stage_cfg=dict( + type="PointPillars", + pretrained='work_dirs/waymo_centerpoint_pp_two_pfn_stride1_3x/epoch_36.pth', + reader=dict( + type="PillarFeatureNet", + 
num_filters=[64, 64], + num_input_features=5, + with_distance=False, + voxel_size=(0.32, 0.32, 6.0), + pc_range=(-74.88, -74.88, -2, 74.88, 74.88, 4.0), + ), + backbone=dict(type="PointPillarsScatter", ds_factor=1), + neck=dict( + type="RPN", + layer_nums=[3, 5, 5], + ds_layer_strides=[1, 2, 2], + ds_num_filters=[64, 128, 256], + us_layer_strides=[1, 2, 4], + us_num_filters=[128, 128, 128], + num_input_features=64, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=128*3, + tasks=tasks, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) + ), + ), + second_stage_modules=[ + dict( + type="BEVFeatureExtractor", + pc_start=[-74.88, -74.88], + voxel_size=[0.32, 0.32], + out_stride=1 + ) + ], + roi_head=dict( + type="RoIHead", + input_channels=128*3*5, + model_cfg=dict( + CLASS_AGNOSTIC=True, + SHARED_FC=[256, 256], + CLS_FC=[256, 256], + REG_FC=[256, 256], + DP_RATIO=0.3, + + TARGET_CONFIG=dict( + ROI_PER_IMAGE=128, + FG_RATIO=0.5, + SAMPLE_ROI_BY_EACH_CLASS=True, + CLS_SCORE_TYPE='roi_iou', + CLS_FG_THRESH=0.75, + CLS_BG_THRESH=0.25, + CLS_BG_THRESH_LO=0.1, + HARD_BG_RATIO=0.8, + REG_FG_THRESH=0.55 + ), + LOSS_CONFIG=dict( + CLS_LOSS='BinaryCrossEntropy', + REG_LOSS='L1', + LOSS_WEIGHTS={ + 'rcnn_cls_weight': 1.0, + 'rcnn_reg_weight': 1.0, + 'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] + } + ) + ), + code_size=7 + ), + NMS_POST_MAXSIZE=500, + num_point=5, + freeze=True +) + +assigner = dict( + target_assigner=target_assigner, + out_size_factor=get_downsample_factor(model), + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, +) + + +train_cfg = dict(assigner=assigner) + +test_cfg = dict( + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + max_per_img=4096, + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), + score_threshold=0.1, + pc_range=[-74.88, -74.88], + out_size_factor=get_downsample_factor(model), + voxel_size=[0.32, 0.32] +) + + +# dataset settings +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" + +db_sampler = dict( + type="GT-AUG", + enable=False, + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", + sample_groups=[ + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), + ], + db_prep_steps=[ + dict( + filter_by_min_num_points=dict( + VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, + ) + ), + dict(filter_by_difficulty=[-1],), + ], + global_random_rotation_range_per_object=[0, 0], + rate=1.0, +) + +train_preprocessor = dict( + mode="train", + shuffle_points=True, + global_rot_noise=[-0.78539816, 0.78539816], + global_scale_noise=[0.95, 1.05], + db_sampler=db_sampler, + class_names=class_names, +) + +val_preprocessor = dict( + mode="val", + shuffle_points=False, +) + +voxel_generator = dict( + range=[-74.88, -74.88, -2, 74.88, 74.88, 4.0], + voxel_size=[0.32, 0.32, 6.0], + max_points_in_voxel=20, + max_voxel_num=[32000, 60000], # we only use non-empty voxels. 
this will be much smaller than max_voxel_num +) + +train_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=train_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] +test_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=val_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] + +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" +test_anno = None + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), + val=dict( + type=dataset_type, + root_path=data_root, + info_path=val_anno, + test_mode=True, + ann_file=val_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), +) + + + +optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + +# optimizer +optimizer = dict( + type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, +) +lr_config = dict( + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, +) + +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=5, + hooks=[ + dict(type="TextLoggerHook"), + # dict(type='TensorboardLoggerHook') + ], +) +# yapf:enable +# runtime settings +total_epochs = 6 +device_ids = range(8) +dist_params = dict(backend="nccl", init_method="env://") +log_level = "INFO" +work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms_demo.py b/configs/waymo/pp/waymo_centerpoint_pp_two_cls_two_pfn_stride1_3x.py similarity index 58% rename from configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms_demo.py rename to configs/waymo/pp/waymo_centerpoint_pp_two_cls_two_pfn_stride1_3x.py index 0d75c3d..1fbd0e3 100644 --- a/configs/centerpoint/nusc_centerpoint_pp_02voxel_circle_nms_demo.py +++ b/configs/waymo/pp/waymo_centerpoint_pp_two_cls_two_pfn_stride1_3x.py @@ -1,18 +1,9 @@ import itertools import logging - -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), + dict(num_class=2, class_names=['VEHICLE', 'PEDESTRIAN']), ] class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) @@ -22,46 +13,37 @@ tasks=tasks, ) - # model settings model = dict( type="PointPillars", pretrained=None, reader=dict( type="PillarFeatureNet", - num_filters=[64], + 
num_filters=[64, 64], num_input_features=5, with_distance=False, - voxel_size=(0.2, 0.2, 8), - pc_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), - norm_cfg=norm_cfg, + voxel_size=(0.32, 0.32, 6.0), + pc_range=(-74.88, -74.88, -2, 74.88, 74.88, 4.0), ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), + backbone=dict(type="PointPillarsScatter", ds_factor=1), neck=dict( type="RPN", layer_nums=[3, 5, 5], - ds_layer_strides=[2, 2, 2], + ds_layer_strides=[1, 2, 2], ds_num_filters=[64, 128, 256], - us_layer_strides=[0.5, 1, 2], + us_layer_strides=[1, 2, 4], us_num_filters=[128, 128, 128], num_input_features=64, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( - # type='RPNHead', type="CenterHead", - mode="3d", - in_channels=sum([128, 128, 128]), - norm_cfg=norm_cfg, + in_channels=128*3, tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - bn=True + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) ), ) @@ -77,57 +59,45 @@ train_cfg = dict(assigner=assigner) - test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + nms=dict( + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), score_threshold=0.1, - pc_range=[-51.2, -51.2], + pc_range=[-74.88, -74.88], out_size_factor=get_downsample_factor(model), - voxel_size=[0.2, 0.2], - circle_nms=True + voxel_size=[0.32, 0.32] ) # dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "demo/nuScenes" +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" + -db_sampler = None train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], + global_rot_noise=[-0.78539816, 0.78539816], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, - db_sampler=db_sampler, + db_sampler=None, class_names=class_names, ) val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.2, 0.2, 8], + range=[-74.88, -74.88, -2, 74.88, 74.88, 4.0], + voxel_size=[0.32, 0.32, 6.0], max_points_in_voxel=20, - max_voxel_num=30000, + max_voxel_num=[32000, 60000], # we only use non-empty voxels. 
this will be much smaller than max_voxel_num ) train_pipeline = [ @@ -147,12 +117,22 @@ dict(type="Reformat"), ] -val_anno = "demo/nuScenes/demo_infos.pkl" +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" test_anno = None data = dict( - samples_per_gpu=1, + samples_per_gpu=4, workers_per_gpu=8, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), val=dict( type=dataset_type, root_path=data_root, @@ -163,16 +143,27 @@ class_names=class_names, pipeline=test_pipeline, ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), ) + optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + # optimizer optimizer = dict( type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, ) lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, ) checkpoint_config = dict(interval=1) @@ -186,11 +177,11 @@ ) # yapf:enable # runtime settings -total_epochs = 20 +total_epochs = 36 device_ids = range(8) dist_params = dict(backend="nccl", init_method="env://") log_level = "INFO" work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None -resume_from = None +load_from = None +resume_from = None workflow = [('train', 1)] diff --git a/configs/centerpoint/waymo_centerpoint_pp_car_large.py b/configs/waymo/pp/waymo_centerpoint_pp_two_pfn_stride1_3x.py similarity index 77% rename from configs/centerpoint/waymo_centerpoint_pp_car_large.py rename to configs/waymo/pp/waymo_centerpoint_pp_two_pfn_stride1_3x.py index 0ac42b2..539d513 100644 --- a/configs/centerpoint/waymo_centerpoint_pp_car_large.py +++ b/configs/waymo/pp/waymo_centerpoint_pp_two_pfn_stride1_3x.py @@ -1,13 +1,9 @@ import itertools import logging - -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ - dict(num_class=1, class_names=['VEHICLE']), + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), ] class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) @@ -23,14 +19,13 @@ pretrained=None, reader=dict( type="PillarFeatureNet", - num_filters=[64], + num_filters=[64, 64], num_input_features=5, with_distance=False, - voxel_size=(0.3, 0.3, 6), - pc_range=(-76.8, -76.8, -2, 76.8, 76.8, 4), - norm_cfg=norm_cfg, + voxel_size=(0.32, 0.32, 6.0), + pc_range=(-74.88, -74.88, -2, 74.88, 74.88, 4.0), ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), + backbone=dict(type="PointPillarsScatter", ds_factor=1), neck=dict( type="RPN", layer_nums=[3, 5, 5], @@ -39,23 +34,16 @@ us_layer_strides=[1, 2, 4], us_num_filters=[128, 128, 128], num_input_features=64, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( - # type='RPNHead', type="CenterHead", - mode="3d", - in_channels=sum([128, 128, 128]), - norm_cfg=norm_cfg, + in_channels=128*3, tasks=tasks, dataset='waymo', weight=2, code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - 
share_conv_channel=64, ), ) @@ -73,75 +61,65 @@ test_cfg = dict( post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], - max_per_img=4096, nms=dict( - use_rotate_nms=True, - use_multi_class_nms=False, nms_pre_max_size=4096, nms_post_max_size=500, - nms_iou_threshold=0.25, + nms_iou_threshold=0.7, ), score_threshold=0.1, - pc_range=[-76.8, -76.8], + pc_range=[-74.88, -74.88], out_size_factor=get_downsample_factor(model), - voxel_size=[0.3, 0.3] + voxel_size=[0.32, 0.32] ) # dataset settings dataset_type = "WaymoDataset" -nsweeps = 10 +nsweeps = 1 data_root = "data/Waymo" db_sampler = dict( type="GT-AUG", enable=False, - db_info_path="data/Waymo/dbinfos_train.pkl", + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", sample_groups=[ dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), ], db_prep_steps=[ dict( filter_by_min_num_points=dict( VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, ) ), dict(filter_by_difficulty=[-1],), ], global_random_rotation_range_per_object=[0, 0], rate=1.0, -) -db_sampler = None +) + train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], global_rot_noise=[-0.78539816, 0.78539816], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.0, 0.0, 0.0], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) val_preprocessor = dict( mode="val", - shuffle_points=True, - remove_environment=False, - remove_unknown_examples=False, + shuffle_points=False, ) voxel_generator = dict( - range=[-76.8, -76.8, -2.0, 76.8, 76.8, 4.0], - voxel_size=[0.3, 0.3, 6], + range=[-74.88, -74.88, -2, 74.88, 74.88, 4.0], + voxel_size=[0.32, 0.32, 6.0], max_points_in_voxel=20, - max_voxel_num=32000, + max_voxel_num=[32000, 60000], # we only use non-empty voxels. 
this will be much smaller than max_voxel_num ) train_pipeline = [ @@ -221,7 +199,7 @@ ) # yapf:enable # runtime settings -total_epochs = 30 +total_epochs = 36 device_ids = range(8) dist_params = dict(backend="nccl", init_method="env://") log_level = "INFO" diff --git a/configs/point_pillars/waymo_pp_car_large.py b/configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_stage_bev_5point_ft_6epoch_freeze.py similarity index 55% rename from configs/point_pillars/waymo_pp_car_large.py rename to configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_stage_bev_5point_ft_6epoch_freeze.py index 79a327b..744cc27 100644 --- a/configs/point_pillars/waymo_pp_car_large.py +++ b/configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_stage_bev_5point_ft_6epoch_freeze.py @@ -1,103 +1,111 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ - dict(num_class=1, class_names=['VEHICLE']), + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), ] class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) # training and testing settings target_assigner = dict( - type="iou", - anchor_generators=[ - dict( - type="anchor_generator_range", - sizes=[2.08, 4.73, 1.77], - anchor_ranges=[-76.8, -76.8, 0, 76.8, 76.8, 0], - rotations=[0, 1.57], - matched_threshold=0.55, - unmatched_threshold=0.4, - class_name="VEHICLE", - ), - ], - sample_positive_fraction=-1, - sample_size=512, - region_similarity_calculator=dict(type="nearest_iou_similarity",), - pos_area_threshold=-1, tasks=tasks, ) -box_coder = dict( - type="ground_box3d_coder", n_dim=7, linear_dim=False, encode_angle_vector=True, -) - # model settings model = dict( - type="PointPillars", - pretrained=None, - reader=dict( - type="PillarFeatureNet", - num_filters=[64], - num_input_features=5, - with_distance=False, - voxel_size=(0.3, 0.3, 6), - pc_range=(-76.8, -76.8, -2, 76.8, 76.8, 4), - norm_cfg=norm_cfg, - ), - backbone=dict(type="PointPillarsScatter", ds_factor=1, norm_cfg=norm_cfg,), - neck=dict( - type="RPN", - layer_nums=[3, 5, 5], - ds_layer_strides=[1, 2, 2], - ds_num_filters=[64, 128, 256], - us_layer_strides=[1, 2, 4], - us_num_filters=[128, 128, 128], - num_input_features=64, - norm_cfg=norm_cfg, - logger=logging.getLogger("RPN"), - ), - bbox_head=dict( - type="MultiGroupHead", - mode="3d", - in_channels=sum([128, 128, 128]), - norm_cfg=norm_cfg, - tasks=tasks, - weights=[1,], - box_coder=build_box_coder(box_coder), - encode_background_as_zeros=True, - loss_norm=dict( - type="NormByNumPositives", pos_cls_weight=1.0, neg_cls_weight=2.0, + type='TwoStageDetector', + first_stage_cfg=dict( + type="VoxelNet", + pretrained='work_dirs/waymo_centerpoint_voxelnet_3x/epoch_36.pth', + reader=dict( + type="VoxelFeatureExtractorV3", + num_input_features=5 + ), + backbone=dict( + type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8 ), - loss_cls=dict(type="SigmoidFocalLoss", alpha=0.25, gamma=2.0, loss_weight=1.0,), - use_sigmoid_score=True, - loss_bbox=dict( - type="WeightedL1Loss", + neck=dict( + type="RPN", + layer_nums=[5, 5], + ds_layer_strides=[1, 2], + ds_num_filters=[128, 256], + us_layer_strides=[1, 2], + us_num_filters=[256, 256], + num_input_features=256, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=sum([256, 256]), + tasks=tasks, + dataset='waymo', + weight=2, code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], - codewise=True, - 
loss_weight=2, + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) ), - encode_rad_error_by_sin=False, - loss_aux=None, - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) ), + second_stage_modules=[ + dict( + type="BEVFeatureExtractor", + pc_start=[-75.2, -75.2], + voxel_size=[0.1, 0.1], + out_stride=8 + ) + ], + roi_head=dict( + type="RoIHead", + input_channels=512*5, + model_cfg=dict( + CLASS_AGNOSTIC=True, + SHARED_FC=[256, 256], + CLS_FC=[256, 256], + REG_FC=[256, 256], + DP_RATIO=0.3, + + TARGET_CONFIG=dict( + ROI_PER_IMAGE=128, + FG_RATIO=0.5, + SAMPLE_ROI_BY_EACH_CLASS=True, + CLS_SCORE_TYPE='roi_iou', + CLS_FG_THRESH=0.75, + CLS_BG_THRESH=0.25, + CLS_BG_THRESH_LO=0.1, + HARD_BG_RATIO=0.8, + REG_FG_THRESH=0.55 + ), + LOSS_CONFIG=dict( + CLS_LOSS='BinaryCrossEntropy', + REG_LOSS='L1', + LOSS_WEIGHTS={ + 'rcnn_cls_weight': 1.0, + 'rcnn_reg_weight': 1.0, + 'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] + } + ) + ), + code_size=7 + ), + NMS_POST_MAXSIZE=500, + num_point=5, + freeze=True ) assigner = dict( - box_coder=box_coder, target_assigner=target_assigner, out_size_factor=get_downsample_factor(model), - debug=False, + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, ) train_cfg = dict(assigner=assigner) + test_cfg = dict( post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], max_per_img=4096, @@ -106,12 +114,12 @@ use_multi_class_nms=False, nms_pre_max_size=4096, nms_post_max_size=500, - nms_iou_threshold=0.25, + nms_iou_threshold=0.7, ), score_threshold=0.1, - pc_range=[-76.8, -76.8], + pc_range=[-75.2, -75.2], out_size_factor=get_downsample_factor(model), - voxel_size=[0.3, 0.3] + voxel_size=[0.1, 0.1] ) @@ -123,52 +131,45 @@ db_sampler = dict( type="GT-AUG", enable=False, - db_info_path="data/Waymo/dbinfos_train.pkl", + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", sample_groups=[ dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), ], db_prep_steps=[ dict( filter_by_min_num_points=dict( VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, ) ), dict(filter_by_difficulty=[-1],), ], global_random_rotation_range_per_object=[0, 0], rate=1.0, -) -db_sampler = None +) + train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], global_rot_noise=[-0.78539816, 0.78539816], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.0, 0.0, 0.0], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) val_preprocessor = dict( mode="val", - shuffle_points=True, - remove_environment=False, - remove_unknown_examples=False, + shuffle_points=False, ) voxel_generator = dict( - range=[-76.8, -76.8, -2.0, 76.8, 76.8, 4.0], - voxel_size=[0.3, 0.3, 6], - max_points_in_voxel=20, - max_voxel_num=32000, + range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=[150000, 200000] ) train_pipeline = [ @@ -176,7 +177,7 @@ dict(type="LoadPointCloudAnnotations", with_bbox=True), dict(type="Preprocess", cfg=train_preprocessor), dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignTarget", cfg=train_cfg["assigner"]), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), dict(type="Reformat"), ] test_pipeline = [ @@ -184,17 +185,17 @@ 
dict(type="LoadPointCloudAnnotations", with_bbox=True), dict(type="Preprocess", cfg=val_preprocessor), dict(type="Voxelization", cfg=voxel_generator), - dict(type="AssignTarget", cfg=train_cfg["assigner"]), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), dict(type="Reformat"), ] train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" -test_anno = None +test_anno = "data/Waymo/infos_test_01sweeps_filter_zero_gt.pkl" data = dict( samples_per_gpu=4, - workers_per_gpu=8, + workers_per_gpu=4, train=dict( type=dataset_type, root_path=data_root, @@ -220,6 +221,7 @@ info_path=test_anno, ann_file=test_anno, nsweeps=nsweeps, + test_mode=True, class_names=class_names, pipeline=test_pipeline, ), @@ -248,7 +250,7 @@ ) # yapf:enable # runtime settings -total_epochs = 30 +total_epochs = 6 device_ids = range(8) dist_params = dict(backend="nccl", init_method="env://") log_level = "INFO" diff --git a/configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_sweep_two_stage_bev_5point_ft_6epoch_freeze_with_vel.py b/configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_sweep_two_stage_bev_5point_ft_6epoch_freeze_with_vel.py new file mode 100644 index 0000000..eb16415 --- /dev/null +++ b/configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_sweep_two_stage_bev_5point_ft_6epoch_freeze_with_vel.py @@ -0,0 +1,260 @@ +import itertools +import logging + +from det3d.utils.config_tool import get_downsample_factor + +tasks = [ + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), +] + +class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) + +# training and testing settings +target_assigner = dict( + tasks=tasks, +) + +# model settings +model = dict( + type='TwoStageDetector', + first_stage_cfg=dict( + type="VoxelNet", + pretrained='work_dirs/waymo_centerpoint_voxelnet_two_sweeps_3x_with_velo/epoch_36.pth', + reader=dict( + type="VoxelFeatureExtractorV3", + num_input_features=6 + ), + backbone=dict( + type="SpMiddleResNetFHD", num_input_features=6, ds_factor=8 + ), + neck=dict( + type="RPN", + layer_nums=[5, 5], + ds_layer_strides=[1, 2], + ds_num_filters=[128, 256], + us_layer_strides=[1, 2], + us_num_filters=[256, 256], + num_input_features=256, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=sum([256, 256]), + tasks=tasks, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel':(2,2)}, # (output_channel, num_conv) + ), + ), + second_stage_modules=[ + dict( + type="BEVFeatureExtractor", + pc_start=[-75.2, -75.2], + voxel_size=[0.1, 0.1], + out_stride=8 + ) + ], + roi_head=dict( + type="RoIHead", + input_channels=512*5, + model_cfg=dict( + CLASS_AGNOSTIC=True, + SHARED_FC=[256, 256], + CLS_FC=[256, 256], + REG_FC=[256, 256], + DP_RATIO=0.3, + + TARGET_CONFIG=dict( + ROI_PER_IMAGE=128, + FG_RATIO=0.5, + SAMPLE_ROI_BY_EACH_CLASS=True, + CLS_SCORE_TYPE='roi_iou', + CLS_FG_THRESH=0.75, + CLS_BG_THRESH=0.25, + CLS_BG_THRESH_LO=0.1, + HARD_BG_RATIO=0.8, + REG_FG_THRESH=0.55 + ), + LOSS_CONFIG=dict( + CLS_LOSS='BinaryCrossEntropy', + REG_LOSS='L1', + LOSS_WEIGHTS={ + 'rcnn_cls_weight': 1.0, + 'rcnn_reg_weight': 1.0, + 'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2] + } + ) + ), + code_size=9 + ), + NMS_POST_MAXSIZE=500, + num_point=5, + freeze=True +) + +assigner = dict( + target_assigner=target_assigner, + 
out_size_factor=get_downsample_factor(model), + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, +) + + +train_cfg = dict(assigner=assigner) + + +test_cfg = dict( + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + max_per_img=4096, + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), + score_threshold=0.1, + pc_range=[-75.2, -75.2], + out_size_factor=get_downsample_factor(model), + voxel_size=[0.1, 0.1] +) + + +# dataset settings +dataset_type = "WaymoDataset" +nsweeps = 2 +data_root = "data/Waymo" + +db_sampler = dict( + type="GT-AUG", + enable=False, + db_info_path="data/Waymo/dbinfos_train_2sweeps_withvelo.pkl", + sample_groups=[ + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), + ], + db_prep_steps=[ + dict( + filter_by_min_num_points=dict( + VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, + ) + ), + dict(filter_by_difficulty=[-1],), + ], + global_random_rotation_range_per_object=[0, 0], + rate=1.0, +) + +train_preprocessor = dict( + mode="train", + shuffle_points=True, + global_rot_noise=[-0.78539816, 0.78539816], + global_scale_noise=[0.95, 1.05], + db_sampler=db_sampler, + class_names=class_names, +) + +val_preprocessor = dict( + mode="val", + shuffle_points=False, +) + +voxel_generator = dict( + range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=[180000, 400000], +) + +train_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=train_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] +test_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=val_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] + +train_anno = "data/Waymo/infos_train_02sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_02sweeps_filter_zero_gt.pkl" +test_anno = "data/Waymo/infos_test_02sweeps_filter_zero_gt.pkl" + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), + val=dict( + type=dataset_type, + root_path=data_root, + info_path=val_anno, + test_mode=True, + ann_file=val_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + test_mode=True, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), +) + + + +optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + +# optimizer +optimizer = dict( + type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, +) +lr_config = dict( + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, +) + +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=5, + hooks=[ + dict(type="TextLoggerHook"), + # dict(type='TensorboardLoggerHook') + ], +) +# yapf:enable +# runtime settings +total_epochs = 6 +device_ids = range(8) +dist_params = dict(backend="nccl", 
init_method="env://") +log_level = "INFO" +work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_01voxel_circle_nms.py b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_1x.py similarity index 60% rename from configs/centerpoint/nusc_centerpoint_voxelnet_dcn_01voxel_circle_nms.py rename to configs/waymo/voxelnet/waymo_centerpoint_voxelnet_1x.py index ecac40c..b9a83be 100644 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_dcn_01voxel_circle_nms.py +++ b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_1x.py @@ -1,18 +1,10 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", "traffic_cone"]), + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), ] class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) @@ -29,11 +21,9 @@ reader=dict( type="VoxelFeatureExtractorV3", num_input_features=5, - norm_cfg=norm_cfg, ), backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, - ), + type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8), neck=dict( type="RPN", layer_nums=[5, 5], @@ -42,22 +32,16 @@ us_layer_strides=[1, 2], us_num_filters=[256, 256], num_input_features=256, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( type="CenterHead", - mode="3d", in_channels=sum([256, 256]), - norm_cfg=norm_cfg, tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - dcn_head=True + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) ), ) @@ -73,75 +57,56 @@ train_cfg = dict(assigner=assigner) + test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - circle_nms=True, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), score_threshold=0.1, - pc_range=[-51.2, -51.2], + pc_range=[-75.2, -75.2], out_size_factor=get_downsample_factor(model), - voxel_size=[0.1, 0.1] + voxel_size=[0.1, 0.1], ) # dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" db_sampler = dict( type="GT-AUG", enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - 
dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), ], db_prep_steps=[ dict( filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, + VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, ) ), dict(filter_by_difficulty=[-1],), ], global_random_rotation_range_per_object=[0, 0], rate=1.0, -) +) + train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], + global_rot_noise=[-0.78539816, 0.78539816], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) @@ -149,15 +114,13 @@ val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.1, 0.1, 0.2], - max_points_in_voxel=10, - max_voxel_num=60000, + range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=150000, ) train_pipeline = [ @@ -177,13 +140,13 @@ dict(type="Reformat"), ] -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" test_anno = None data = dict( samples_per_gpu=4, - workers_per_gpu=8, + workers_per_gpu=4, train=dict( type=dataset_type, root_path=data_root, @@ -223,7 +186,7 @@ type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, ) lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, ) checkpoint_config = dict(interval=1) @@ -237,11 +200,11 @@ ) # yapf:enable # runtime settings -total_epochs = 20 +total_epochs = 12 device_ids = range(8) dist_params = dict(backend="nccl", init_method="env://") log_level = "INFO" work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) -load_from = None +load_from = None resume_from = None workflow = [('train', 1)] diff --git a/configs/centerpoint/nusc_centerpoint_voxelnet_01voxel_circle_nms.py b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_3epoch.py similarity index 60% rename from configs/centerpoint/nusc_centerpoint_voxelnet_01voxel_circle_nms.py rename to configs/waymo/voxelnet/waymo_centerpoint_voxelnet_3epoch.py index fa4a67b..02fa88e 100644 --- a/configs/centerpoint/nusc_centerpoint_voxelnet_01voxel_circle_nms.py +++ b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_3epoch.py @@ -1,18 +1,10 @@ import itertools import logging -from det3d.builder import build_box_coder from det3d.utils.config_tool import get_downsample_factor -norm_cfg = None - tasks = [ - dict(num_class=1, class_names=["car"]), - dict(num_class=2, class_names=["truck", "construction_vehicle"]), - dict(num_class=2, class_names=["bus", "trailer"]), - dict(num_class=1, class_names=["barrier"]), - dict(num_class=2, class_names=["motorcycle", "bicycle"]), - dict(num_class=2, class_names=["pedestrian", 
"traffic_cone"]), + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), ] class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) @@ -29,11 +21,9 @@ reader=dict( type="VoxelFeatureExtractorV3", num_input_features=5, - norm_cfg=norm_cfg, ), backbone=dict( - type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8, norm_cfg=norm_cfg, - ), + type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8), neck=dict( type="RPN", layer_nums=[5, 5], @@ -42,23 +32,16 @@ us_layer_strides=[1, 2], us_num_filters=[256, 256], num_input_features=256, - norm_cfg=norm_cfg, logger=logging.getLogger("RPN"), ), bbox_head=dict( - # type='RPNHead', type="CenterHead", - mode="3d", in_channels=sum([256, 256]), - norm_cfg=norm_cfg, tasks=tasks, - dataset='nuscenes', - weight=0.25, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], - common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel': (2, 2)}, # (output_channel, num_conv) - encode_rad_error_by_sin=False, - direction_offset=0.0, - share_conv_channel=64, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) ), ) @@ -74,75 +57,56 @@ train_cfg = dict(assigner=assigner) + test_cfg = dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - circle_nms=True, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - post_max_size=83, + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), score_threshold=0.1, - pc_range=[-51.2, -51.2], + pc_range=[-75.2, -75.2], out_size_factor=get_downsample_factor(model), - voxel_size=[0.1, 0.1] + voxel_size=[0.1, 0.1], ) # dataset settings -dataset_type = "NuScenesDataset" -nsweeps = 10 -data_root = "data/nuScenes" +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" db_sampler = dict( type="GT-AUG", enable=False, - db_info_path="data/nuScenes/dbinfos_train_10sweeps_withvelo.pkl", + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", sample_groups=[ - dict(car=2), - dict(truck=3), - dict(construction_vehicle=7), - dict(bus=4), - dict(trailer=6), - dict(barrier=2), - dict(motorcycle=6), - dict(bicycle=6), - dict(pedestrian=2), - dict(traffic_cone=2), + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), ], db_prep_steps=[ dict( filter_by_min_num_points=dict( - car=5, - truck=5, - bus=5, - trailer=5, - construction_vehicle=5, - traffic_cone=5, - barrier=5, - motorcycle=5, - bicycle=5, - pedestrian=5, + VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, ) ), dict(filter_by_difficulty=[-1],), ], global_random_rotation_range_per_object=[0, 0], rate=1.0, -) +) + train_preprocessor = dict( mode="train", shuffle_points=True, - gt_loc_noise=[0.0, 0.0, 0.0], - gt_rot_noise=[0.0, 0.0], - global_rot_noise=[-0.3925, 0.3925], + global_rot_noise=[-0.78539816, 0.78539816], global_scale_noise=[0.95, 1.05], - global_rot_per_obj_range=[0, 0], - global_trans_noise=[0.2, 0.2, 0.2], - remove_points_after_sample=False, - gt_drop_percentage=0.0, - gt_drop_max_keep_points=15, - remove_unknown_examples=False, - remove_environment=False, db_sampler=db_sampler, class_names=class_names, ) @@ -150,15 +114,13 @@ val_preprocessor = dict( mode="val", shuffle_points=False, - remove_environment=False, - remove_unknown_examples=False, ) 
voxel_generator = dict( - range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], - voxel_size=[0.1, 0.1, 0.2], - max_points_in_voxel=10, - max_voxel_num=60000, + range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=150000, ) train_pipeline = [ @@ -178,13 +140,13 @@ dict(type="Reformat"), ] -train_anno = "data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl" -val_anno = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl" +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" test_anno = None data = dict( samples_per_gpu=4, - workers_per_gpu=8, + workers_per_gpu=4, train=dict( type=dataset_type, root_path=data_root, @@ -224,7 +186,7 @@ type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, ) lr_config = dict( - type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, ) checkpoint_config = dict(interval=1) @@ -238,7 +200,7 @@ ) # yapf:enable # runtime settings -total_epochs = 20 +total_epochs = 3 device_ids = range(8) dist_params = dict(backend="nccl", init_method="env://") log_level = "INFO" diff --git a/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_3x.py b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_3x.py new file mode 100644 index 0000000..989cdde --- /dev/null +++ b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_3x.py @@ -0,0 +1,210 @@ +import itertools +import logging + +from det3d.utils.config_tool import get_downsample_factor + +tasks = [ + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), +] + +class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) + +# training and testing settings +target_assigner = dict( + tasks=tasks, +) + +# model settings +model = dict( + type="VoxelNet", + pretrained=None, + reader=dict( + type="VoxelFeatureExtractorV3", + num_input_features=5, + ), + backbone=dict( + type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8), + neck=dict( + type="RPN", + layer_nums=[5, 5], + ds_layer_strides=[1, 2], + ds_num_filters=[128, 256], + us_layer_strides=[1, 2], + us_num_filters=[256, 256], + num_input_features=256, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=sum([256, 256]), + tasks=tasks, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) + ), +) + +assigner = dict( + target_assigner=target_assigner, + out_size_factor=get_downsample_factor(model), + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, +) + + +train_cfg = dict(assigner=assigner) + + +test_cfg = dict( + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), + score_threshold=0.1, + pc_range=[-75.2, -75.2], + out_size_factor=get_downsample_factor(model), + voxel_size=[0.1, 0.1], +) + + +# dataset settings +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" + +db_sampler = dict( + type="GT-AUG", + enable=False, + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", + sample_groups=[ + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), + ], + db_prep_steps=[ + dict( + filter_by_min_num_points=dict( + 
VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, + ) + ), + dict(filter_by_difficulty=[-1],), + ], + global_random_rotation_range_per_object=[0, 0], + rate=1.0, +) + +train_preprocessor = dict( + mode="train", + shuffle_points=True, + global_rot_noise=[-0.78539816, 0.78539816], + global_scale_noise=[0.95, 1.05], + db_sampler=db_sampler, + class_names=class_names, +) + +val_preprocessor = dict( + mode="val", + shuffle_points=False, +) + +voxel_generator = dict( + range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=[150000, 200000], +) + +train_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=train_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] +test_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=val_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] + +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" +test_anno = None + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), + val=dict( + type=dataset_type, + root_path=data_root, + info_path=val_anno, + test_mode=True, + ann_file=val_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), +) + + + +optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + +# optimizer +optimizer = dict( + type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, +) +lr_config = dict( + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, +) + +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=5, + hooks=[ + dict(type="TextLoggerHook"), + # dict(type='TensorboardLoggerHook') + ], +) +# yapf:enable +# runtime settings +total_epochs = 36 +device_ids = range(8) +dist_params = dict(backend="nccl", init_method="env://") +log_level = "INFO" +work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_6epoch.py b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_6epoch.py new file mode 100644 index 0000000..ff8eccf --- /dev/null +++ b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_6epoch.py @@ -0,0 +1,210 @@ +import itertools +import logging + +from det3d.utils.config_tool import get_downsample_factor + +tasks = [ + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), +] + +class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) + +# training and testing settings +target_assigner = dict( + tasks=tasks, +) + +# model settings +model = dict( + type="VoxelNet", + pretrained=None, + reader=dict( + type="VoxelFeatureExtractorV3", + num_input_features=5, + ), + backbone=dict( + 
type="SpMiddleResNetFHD", num_input_features=5, ds_factor=8), + neck=dict( + type="RPN", + layer_nums=[5, 5], + ds_layer_strides=[1, 2], + ds_num_filters=[128, 256], + us_layer_strides=[1, 2], + us_num_filters=[256, 256], + num_input_features=256, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=sum([256, 256]), + tasks=tasks, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2)}, # (output_channel, num_conv) + ), +) + +assigner = dict( + target_assigner=target_assigner, + out_size_factor=get_downsample_factor(model), + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, +) + + +train_cfg = dict(assigner=assigner) + + +test_cfg = dict( + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), + score_threshold=0.1, + pc_range=[-75.2, -75.2], + out_size_factor=get_downsample_factor(model), + voxel_size=[0.1, 0.1], +) + + +# dataset settings +dataset_type = "WaymoDataset" +nsweeps = 1 +data_root = "data/Waymo" + +db_sampler = dict( + type="GT-AUG", + enable=False, + db_info_path="data/Waymo/dbinfos_train_1sweeps_withvelo.pkl", + sample_groups=[ + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), + ], + db_prep_steps=[ + dict( + filter_by_min_num_points=dict( + VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, + ) + ), + dict(filter_by_difficulty=[-1],), + ], + global_random_rotation_range_per_object=[0, 0], + rate=1.0, +) + +train_preprocessor = dict( + mode="train", + shuffle_points=True, + global_rot_noise=[-0.78539816, 0.78539816], + global_scale_noise=[0.95, 1.05], + db_sampler=db_sampler, + class_names=class_names, +) + +val_preprocessor = dict( + mode="val", + shuffle_points=False, +) + +voxel_generator = dict( + range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=150000, +) + +train_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=train_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] +test_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=val_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] + +train_anno = "data/Waymo/infos_train_01sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_01sweeps_filter_zero_gt.pkl" +test_anno = None + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), + val=dict( + type=dataset_type, + root_path=data_root, + info_path=val_anno, + test_mode=True, + ann_file=val_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), +) + + + +optimizer_config = 
dict(grad_clip=dict(max_norm=35, norm_type=2)) + +# optimizer +optimizer = dict( + type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, +) +lr_config = dict( + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, +) + +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=5, + hooks=[ + dict(type="TextLoggerHook"), + # dict(type='TensorboardLoggerHook') + ], +) +# yapf:enable +# runtime settings +total_epochs = 6 +device_ids = range(8) +dist_params = dict(backend="nccl", init_method="env://") +log_level = "INFO" +work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_two_sweeps_3x_with_velo.py b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_two_sweeps_3x_with_velo.py new file mode 100644 index 0000000..c7313d1 --- /dev/null +++ b/configs/waymo/voxelnet/waymo_centerpoint_voxelnet_two_sweeps_3x_with_velo.py @@ -0,0 +1,210 @@ +import itertools +import logging + +from det3d.utils.config_tool import get_downsample_factor + +tasks = [ + dict(num_class=3, class_names=['VEHICLE', 'PEDESTRIAN', 'CYCLIST']), +] + +class_names = list(itertools.chain(*[t["class_names"] for t in tasks])) + +# training and testing settings +target_assigner = dict( + tasks=tasks, +) + +# model settings +model = dict( + type="VoxelNet", + pretrained=None, + reader=dict( + type="VoxelFeatureExtractorV3", + num_input_features=6, + ), + backbone=dict( + type="SpMiddleResNetFHD", num_input_features=6, ds_factor=8), + neck=dict( + type="RPN", + layer_nums=[5, 5], + ds_layer_strides=[1, 2], + ds_num_filters=[128, 256], + us_layer_strides=[1, 2], + us_num_filters=[256, 256], + num_input_features=256, + logger=logging.getLogger("RPN"), + ), + bbox_head=dict( + type="CenterHead", + in_channels=sum([256, 256]), + tasks=tasks, + dataset='waymo', + weight=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0], + common_heads={'reg': (2, 2), 'height': (1, 2), 'dim':(3, 2), 'rot':(2, 2), 'vel':(2,2)}, # (output_channel, num_conv) + ), +) + +assigner = dict( + target_assigner=target_assigner, + out_size_factor=get_downsample_factor(model), + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, +) + + +train_cfg = dict(assigner=assigner) + + +test_cfg = dict( + post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0], + nms=dict( + use_rotate_nms=True, + use_multi_class_nms=False, + nms_pre_max_size=4096, + nms_post_max_size=500, + nms_iou_threshold=0.7, + ), + score_threshold=0.1, + pc_range=[-75.2, -75.2], + out_size_factor=get_downsample_factor(model), + voxel_size=[0.1, 0.1], +) + + +# dataset settings +dataset_type = "WaymoDataset" +nsweeps = 2 +data_root = "data/Waymo" + +db_sampler = dict( + type="GT-AUG", + enable=False, + db_info_path="data/Waymo/dbinfos_train_2sweeps_withvelo.pkl", + sample_groups=[ + dict(VEHICLE=15), + dict(PEDESTRIAN=10), + dict(CYCLIST=10), + ], + db_prep_steps=[ + dict( + filter_by_min_num_points=dict( + VEHICLE=5, + PEDESTRIAN=5, + CYCLIST=5, + ) + ), + dict(filter_by_difficulty=[-1],), + ], + global_random_rotation_range_per_object=[0, 0], + rate=1.0, +) + +train_preprocessor = dict( + mode="train", + shuffle_points=True, + global_rot_noise=[-0.78539816, 0.78539816], + global_scale_noise=[0.95, 1.05], + db_sampler=db_sampler, + class_names=class_names, +) + +val_preprocessor = dict( + mode="val", + shuffle_points=False, +) + +voxel_generator = dict( 
+ range=[-75.2, -75.2, -2, 75.2, 75.2, 4], + voxel_size=[0.1, 0.1, 0.15], + max_points_in_voxel=5, + max_voxel_num=[180000, 400000], +) + +train_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=train_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] +test_pipeline = [ + dict(type="LoadPointCloudFromFile", dataset=dataset_type), + dict(type="LoadPointCloudAnnotations", with_bbox=True), + dict(type="Preprocess", cfg=val_preprocessor), + dict(type="Voxelization", cfg=voxel_generator), + dict(type="AssignLabel", cfg=train_cfg["assigner"]), + dict(type="Reformat"), +] + +train_anno = "data/Waymo/infos_train_02sweeps_filter_zero_gt.pkl" +val_anno = "data/Waymo/infos_val_02sweeps_filter_zero_gt.pkl" +test_anno = None + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + root_path=data_root, + info_path=train_anno, + ann_file=train_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=train_pipeline, + ), + val=dict( + type=dataset_type, + root_path=data_root, + info_path=val_anno, + test_mode=True, + ann_file=val_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), + test=dict( + type=dataset_type, + root_path=data_root, + info_path=test_anno, + ann_file=test_anno, + nsweeps=nsweeps, + class_names=class_names, + pipeline=test_pipeline, + ), +) + + + +optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + +# optimizer +optimizer = dict( + type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False, +) +lr_config = dict( + type="one_cycle", lr_max=0.003, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4, +) + +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=5, + hooks=[ + dict(type="TextLoggerHook"), + # dict(type='TensorboardLoggerHook') + ], +) +# yapf:enable +# runtime settings +total_epochs = 36 +device_ids = range(8) +dist_params = dict(backend="nccl", init_method="env://") +log_level = "INFO" +work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3]) +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/det3d/__init__.py b/det3d/__init__.py index 779cc85..e69de29 100644 --- a/det3d/__init__.py +++ b/det3d/__init__.py @@ -1,3 +0,0 @@ -from .version import __version__, short_version - -__all__ = ["__version__", "short_version"] diff --git a/det3d/builder.py b/det3d/builder.py index ba90a09..f1bf49f 100644 --- a/det3d/builder.py +++ b/det3d/builder.py @@ -5,17 +5,9 @@ import det3d.core.sampler.preprocess as prep import numpy as np import torch -from det3d.core.anchor.anchor_generator import ( - AnchorGeneratorRange, - AnchorGeneratorStride, - BevAnchorGeneratorRange, -) -from det3d.core.bbox import region_similarity -from det3d.core.bbox.box_coders import BevBoxCoderTorch, GroundBox3dCoderTorch from det3d.core.input.voxel_generator import VoxelGenerator from det3d.core.sampler.preprocess import DataBasePreprocessor from det3d.core.sampler.sample_ops import DataBaseSamplerV2 -from det3d.models.losses import GHMCLoss, GHMRLoss, losses from det3d.solver import learning_schedules from det3d.solver import learning_schedules_fastai as lsf from det3d.solver import optim @@ -34,36 +26,6 @@ def build_voxel_generator(voxel_config): return voxel_generator - -def build_similarity_metric(similarity_config): - """Create 
optimizer based on config. - - Args: - optimizer_config: A Optimizer proto message. - - Returns: - An optimizer and a list of variables for summary. - - Raises: - ValueError: when using an unsupported input data type. - """ - similarity_type = similarity_config.type - - if similarity_type == "rotate_iou_similarity": - return region_similarity.RotateIouSimilarity() - elif similarity_type == "nearest_iou_similarity": - return region_similarity.NearestIouSimilarity() - elif similarity_type == "distance_similarity": - cfg = similarity_config.distance_similarity - return region_similarity.DistanceSimilarity( - distance_norm=cfg.distance_norm, - with_rotation=cfg.with_rotation, - rotation_alpha=cfg.rotation_alpha, - ) - else: - raise ValueError("unknown similarity type") - - def build_db_preprocess(db_prep_config, logger=None): logger = logging.getLogger("build_db_preprocess") cfg = db_prep_config @@ -239,142 +201,6 @@ def _create_learning_rate_scheduler(optimizer, learning_rate_config, total_step) return lr_scheduler -def build_loss(loss_config): - """Build losses based on the config. - - Builds classification, localization losses and optionally a hard example miner - based on the config. - - Args: - loss_config: A losses_pb2.Loss object. - - Returns: - classification_loss: Classification loss object. - localization_loss: Localization loss object. - classification_weight: Classification loss weight. - localization_weight: Localization loss weight. - hard_example_miner: Hard example miner object. - - Raises: - ValueError: If hard_example_miner is used with sigmoid_focal_loss. - """ - classification_loss = _build_classification_loss(loss_config.classification_loss) - localization_loss = _build_localization_loss(loss_config.localization_loss) - - classification_weight = loss_config.classification_weight - localization_weight = loss_config.localization_weight - - hard_example_miner = None # 'Pytorch don\'t support HardExampleMiner' - - return ( - classification_loss, - localization_loss, - classification_weight, - localization_weight, - hard_example_miner, - ) - - -def build_faster_rcnn_classification_loss(loss_config): - """Builds a classification loss for Faster RCNN based on the loss config. - - Args: - loss_config: A losses_pb2.ClassificationLoss object. - - Returns: - Loss based on the config. - - Raises: - ValueError: On invalid loss_config. - """ - loss_type = loss_config.TYPE - config = loss_config.VALUE - - # By default, Faster RCNN second stage classifier uses Softmax loss - # with anchor-wise outputs. - return losses.WeightedSoftmaxClassificationLoss(logit_scale=config.logit_scale) - - -def _build_localization_loss(loss_config): - """Builds a localization loss based on the loss config. - - Args: - loss_config: A losses_pb2.LocalizationLoss object. - - Returns: - Loss based on the config. - - Raises: - ValueError: On invalid loss_config. 
- """ - loss_type = loss_config.type - config = loss_config - - if loss_type == "weighted_l2": - if len(config.code_weight) == 0: - code_weight = None - else: - code_weight = config.code_weight - return losses.WeightedL2LocalizationLoss(code_weight) - - if loss_type == "weighted_smooth_l1": - if len(config.code_weight) == 0: - code_weight = None - else: - code_weight = config.code_weight - return losses.WeightedSmoothL1LocalizationLoss(config.sigma, code_weight) - if loss_type == "weighted_ghm": - if len(config.code_weight) == 0: - code_weight = None - else: - code_weight = config.code_weight - return GHMRLoss(config.mu, config.bins, config.momentum, code_weight) - - raise ValueError("Empty loss config.") - - -def _build_classification_loss(loss_config): - """Builds a classification loss based on the loss config. - - Args: - loss_config: A losses_pb2.ClassificationLoss object. - - Returns: - Loss based on the config. - - Raises: - ValueError: On invalid loss_config. - """ - loss_type = loss_config.TYPE - config = loss_config.VALUE - - if loss_type == "weighted_sigmoid": - return losses.WeightedSigmoidClassificationLoss() - elif loss_type == "weighted_sigmoid_focal": - if config.alpha > 0: - alpha = config.alpha - else: - alpha = None - return losses.SigmoidFocalClassificationLoss(gamma=config.gamma, alpha=alpha) - elif loss_type == "weighted_softmax_focal": - if config.alpha > 0: - alpha = config.alpha - else: - alpha = None - return losses.SoftmaxFocalClassificationLoss(gamma=config.gamma, alpha=alpha) - elif loss_type == "weighted_ghm": - return GHMCLoss(bins=config.bins, momentum=config.momentum) - elif loss_type == "weighted_softmax": - return losses.WeightedSoftmaxClassificationLoss(logit_scale=config.logit_scale) - elif loss_type == "bootstrapped_sigmoid": - return losses.BootstrappedSigmoidClassificationLoss( - alpha=config.alpha, - bootstrap_type=("hard" if config.hard_bootstrap else "soft"), - ) - - raise ValueError("Empty loss config.") - - def build_dbsampler(cfg, logger=None): logger = logging.getLogger("build_dbsampler") prepors = [build_db_preprocess(c, logger=logger) for c in cfg.db_prep_steps] @@ -394,98 +220,3 @@ def build_dbsampler(cfg, logger=None): ) return sampler - - -def build_box_coder(box_coder_config): - """Create optimizer based on config. - - Args: - optimizer_config: A Optimizer proto message. - - Returns: - An optimizer and a list of variables for summary. - - Raises: - ValueError: when using an unsupported input data type. - """ - box_coder_type = box_coder_config["type"] - cfg = box_coder_config - - n_dim = cfg.get("n_dim", 9) - norm_velo = cfg.get("norm_velo", False) - - if box_coder_type == "ground_box3d_coder": - return GroundBox3dCoderTorch( - cfg["linear_dim"], - cfg["encode_angle_vector"], - n_dim=n_dim, - norm_velo=norm_velo, - ) - elif box_coder_type == "bev_box_coder": - cfg = box_coder_config - return BevBoxCoderTorch( - cfg["linear_dim"], - cfg["encode_angle_vector"], - cfg["z_fixed"], - cfg["h_fixed"], - ) - else: - raise ValueError("unknown box_coder type") - - -def build_anchor_generator(anchor_config): - """Create optimizer based on config. - - Args: - optimizer_config: A Optimizer proto message. - - Returns: - An optimizer and a list of variables for summary. - - Raises: - ValueError: when using an unsupported input data type. 
- """ - ag_type = anchor_config.type - config = anchor_config - - if "velocities" not in config: - velocities = None - else: - velocities = config.velocities - - if ag_type == "anchor_generator_stride": - ag = AnchorGeneratorStride( - sizes=config.sizes, - anchor_strides=config.strides, - anchor_offsets=config.offsets, - rotations=config.rotations, - velocities=velocities, - match_threshold=config.matched_threshold, - unmatch_threshold=config.unmatched_threshold, - class_name=config.class_name, - ) - return ag - elif ag_type == "anchor_generator_range": - ag = AnchorGeneratorRange( - sizes=config.sizes, - anchor_ranges=config.anchor_ranges, - rotations=config.rotations, - velocities=velocities, - match_threshold=config.matched_threshold, - unmatch_threshold=config.unmatched_threshold, - class_name=config.class_name, - ) - return ag - elif ag_type == "bev_anchor_generator_range": - ag = BevAnchorGeneratorRange( - sizes=config.sizes, - anchor_ranges=config.anchor_ranges, - rotations=config.rotations, - velocities=velocities, - match_threshold=config.matched_threshold, - unmatch_threshold=config.unmatched_threshold, - class_name=config.class_name, - ) - return ag - else: - raise ValueError(" unknown anchor generator type") diff --git a/det3d/core/__init__.py b/det3d/core/__init__.py index b148b56..d05014b 100644 --- a/det3d/core/__init__.py +++ b/det3d/core/__init__.py @@ -1,7 +1,4 @@ -from .fp16 import * -from .evaluation import * from .utils import * -from .anchor import * from .bbox import * from .input import * from .sampler import * diff --git a/det3d/core/anchor/__init__.py b/det3d/core/anchor/__init__.py deleted file mode 100644 index dc6d2a8..0000000 --- a/det3d/core/anchor/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -from .anchor_generator import ( - AnchorGeneratorRange, - AnchorGeneratorStride, - BevAnchorGeneratorRange, -) -from .target_assigner import TargetAssigner -from .target_ops import create_target_np diff --git a/det3d/core/anchor/anchor_generator.py b/det3d/core/anchor/anchor_generator.py deleted file mode 100644 index 1bdd123..0000000 --- a/det3d/core/anchor/anchor_generator.py +++ /dev/null @@ -1,173 +0,0 @@ -import numpy as np -from det3d.core.bbox import box_np_ops - - -class AnchorGeneratorStride: - def __init__( - self, - sizes=[1.6, 3.9, 1.56], - anchor_strides=[0.4, 0.4, 1.0], - anchor_offsets=[0.2, -39.8, -1.78], - rotations=[0, np.pi / 2], - velocities=[0, 0], - class_name=None, - match_threshold=-1, - unmatch_threshold=-1, - dtype=np.float32, - ): - self._sizes = sizes - self._anchor_strides = anchor_strides - self._anchor_offsets = anchor_offsets - self._rotations = rotations - self._velocities = velocities - self._dtype = dtype - self._class_name = class_name - self._match_threshold = match_threshold - self._unmatch_threshold = unmatch_threshold - - @property - def class_name(self): - return self._class_name - - @property - def match_threshold(self): - return self._match_threshold - - @property - def unmatch_threshold(self): - return self._unmatch_threshold - - @property - def num_anchors_per_localization(self): - num_rot = len(self._rotations) - num_size = np.array(self._sizes).reshape([-1, 3]).shape[0] - return num_rot * num_size - - @property - def ndim(self): - # return 7 + len(self._custom_values) - return self._anchors.shape[-1] - - def generate(self, feature_map_size): - self._anchors = box_np_ops.create_anchors_3d_stride( - feature_map_size, - self._sizes, - self._anchor_strides, - self._anchor_offsets, - self._rotations, - self._velocities, - 
self._dtype, - ) - return self._anchors - - -class AnchorGeneratorRange: - def __init__( - self, - anchor_ranges, - sizes=[1.6, 3.9, 1.56], - rotations=[0, np.pi / 2], - velocities=[0, 0], - class_name=None, - match_threshold=-1, - unmatch_threshold=-1, - dtype=np.float32, - ): - self._sizes = sizes - self._anchor_ranges = anchor_ranges - self._rotations = rotations - self._velocities = velocities - self._dtype = dtype - self._class_name = class_name - self._match_threshold = match_threshold - self._unmatch_threshold = unmatch_threshold - - @property - def class_name(self): - return self._class_name - - @property - def match_threshold(self): - return self._match_threshold - - @property - def unmatch_threshold(self): - return self._unmatch_threshold - - @property - def num_anchors_per_localization(self): - num_rot = len(self._rotations) - num_size = np.array(self._sizes).reshape([-1, 3]).shape[0] - return num_rot * num_size - - @property - def ndim(self): - # return 7 + len(self._custom_values) - return self._anchors.shape[-1] - - def generate(self, feature_map_size): - self._anchors = box_np_ops.create_anchors_3d_range( - feature_map_size, - self._anchor_ranges, - self._sizes, - self._rotations, - self._velocities, - self._dtype, - ) - return self._anchors - - -class BevAnchorGeneratorRange: - def __init__( - self, - anchor_ranges, - sizes=[1.6, 3.9], - rotations=[0, np.pi / 2], - velocities=[0, 0], - class_name=None, - match_threshold=-1, - unmatch_threshold=-1, - dtype=np.float32, - ): - self._sizes = sizes - self._anchor_ranges = anchor_ranges - self._rotations = rotations - self._velocities = velocities - self._dtype = dtype - self._class_name = class_name - self._match_threshold = match_threshold - self._unmatch_threshold = unmatch_threshold - - @property - def class_name(self): - return self._class_name - - @property - def match_threshold(self): - return self._match_threshold - - @property - def unmatch_threshold(self): - return self._unmatch_threshold - - @property - def num_anchors_per_localization(self): - num_rot = len(self._rotations) - num_size = np.array(self._sizes).reshape([-1, 2]).shape[0] - return num_rot * num_size - - @property - def ndim(self): - # return 7 + len(self._custom_values) - return self._anchors.shape[-1] - - def generate(self, feature_map_size): - self._anchors = box_np_ops.create_anchors_bev_range( - feature_map_size, - self._anchor_ranges, - self._sizes, - self._rotations, - self._velocities, - self._dtype, - ) - return self._anchors diff --git a/det3d/core/anchor/target_assigner.py b/det3d/core/anchor/target_assigner.py deleted file mode 100644 index f4bdf0b..0000000 --- a/det3d/core/anchor/target_assigner.py +++ /dev/null @@ -1,194 +0,0 @@ -from collections import OrderedDict - -import numpy as np -from det3d.core.anchor.target_ops import create_target_np -from det3d.core.bbox import box_np_ops, region_similarity - - -class TargetAssigner: - def __init__( - self, - box_coder, - anchor_generators, - region_similarity_calculator=None, - positive_fraction=None, - sample_size=512, - ): - self._region_similarity_calculator = region_similarity_calculator - self._box_coder = box_coder - self._anchor_generators = anchor_generators - self._positive_fraction = positive_fraction - self._sample_size = sample_size - - @property - def box_coder(self): - return self._box_coder - - @property - def classes(self): - return [a.class_name for a in self._anchor_generators] - - def assign( - self, - anchors, - gt_boxes, - anchors_mask=None, - gt_classes=None, - 
matched_thresholds=None, - unmatched_thresholds=None, - ): - if anchors_mask is not None: - prune_anchor_fn = lambda _: np.where(anchors_mask)[0] - else: - prune_anchor_fn = None - - def similarity_fn(anchors, gt_boxes): - anchors_rbv = anchors[:, [0, 1, 3, 4, -1]] - gt_boxes_rbv = gt_boxes[:, [0, 1, 3, 4, -1]] - return self._region_similarity_calculator.compare(anchors_rbv, gt_boxes_rbv) - - def box_encoding_fn(boxes, anchors): - return self._box_coder.encode(boxes, anchors) - - return create_target_np( - anchors, - gt_boxes[gt_classes == class_name], - similarity_fn, - box_encoding_fn, - prune_anchor_fn=prune_anchor_fn, - gt_classes=gt_classes[gt_classes == class_name], - matched_threshold=matched_thresholds, - unmatched_threshold=unmatched_thresholds, - positive_fraction=self._positive_fraction, - rpn_batch_size=self._sample_size, - norm_by_num_examples=False, - box_code_size=self.box_coder.code_size, - ) - - def assign_v2( - self, anchors_dict, gt_boxes, anchors_mask=None, gt_classes=None, gt_names=None - ): - def similarity_fn(anchors, gt_boxes): - anchors_rbv = anchors[:, [0, 1, 3, 4, -1]] - gt_boxes_rbv = gt_boxes[:, [0, 1, 3, 4, -1]] - return self._region_similarity_calculator.compare(anchors_rbv, gt_boxes_rbv) - - def box_encoding_fn(boxes, anchors): - return self._box_coder.encode(boxes, anchors) - - targets_list = [] - anchor_loc_idx = 0 - for class_name, anchor_dict in anchors_dict.items(): - mask = np.array([c == class_name for c in gt_names], dtype=np.bool_) - feature_map_size = anchor_dict["anchors"].shape[:3] - num_loc = anchor_dict["anchors"].shape[-2] - - if anchors_mask is not None: - anchors_mask = anchors_mask.reshape(*feature_map_size, -1) - anchors_mask_class = anchors_mask[ - ..., anchor_loc_idx : anchor_loc_idx + num_loc - ].reshape(-1) - prune_anchor_fn = lambda _: np.where(anchors_mask_class)[0] - else: - prune_anchor_fn = None - - targets = create_target_np( - anchor_dict["anchors"].reshape(-1, self.box_coder.n_dim),#code_size), - gt_boxes[mask], - similarity_fn, - box_encoding_fn, - prune_anchor_fn=prune_anchor_fn, - gt_classes=gt_classes[mask], - matched_threshold=anchor_dict["matched_thresholds"], - unmatched_threshold=anchor_dict["unmatched_thresholds"], - positive_fraction=self._positive_fraction, - rpn_batch_size=self._sample_size, - norm_by_num_examples=False, - box_code_size=self.box_coder.code_size, - ) - anchor_loc_idx += num_loc - targets_list.append(targets) - - targets_dict = { - "labels": [t["labels"] for t in targets_list], - "bbox_targets": [t["bbox_targets"] for t in targets_list], - "bbox_outside_weights": [t["bbox_outside_weights"] for t in targets_list], - } - targets_dict["bbox_targets"] = np.concatenate( - [ - v.reshape(*feature_map_size, -1, self.box_coder.code_size) - for v in targets_dict["bbox_targets"] - ], - axis=-2, - ) - targets_dict["bbox_targets"] = targets_dict["bbox_targets"].reshape( - -1, self.box_coder.code_size - ) - targets_dict["labels"] = np.concatenate( - [v.reshape(*feature_map_size, -1) for v in targets_dict["labels"]], axis=-1 - ) - targets_dict["bbox_outside_weights"] = np.concatenate( - [ - v.reshape(*feature_map_size, -1) - for v in targets_dict["bbox_outside_weights"] - ], - axis=-1, - ) - targets_dict["labels"] = targets_dict["labels"].reshape(-1) - targets_dict["bbox_outside_weights"] = targets_dict[ - "bbox_outside_weights" - ].reshape(-1) - - return targets_dict - - def generate_anchors(self, feature_map_size): - anchors_list = [] - matched_thresholds = [a.match_threshold for a in self._anchor_generators] - 
unmatched_thresholds = [a.unmatch_threshold for a in self._anchor_generators] - match_list, unmatch_list = [], [] - for anchor_generator, match_thresh, unmatch_thresh in zip( - self._anchor_generators, matched_thresholds, unmatched_thresholds - ): - anchors = anchor_generator.generate(feature_map_size) - anchors = anchors.reshape([*anchors.shape[:3], -1, anchors.shape[-1]]) - anchors_list.append(anchors) - num_anchors = np.prod(anchors.shape[:-1]) - match_list.append(np.full([num_anchors], match_thresh, anchors.dtype)) - unmatch_list.append(np.full([num_anchors], unmatch_thresh, anchors.dtype)) - anchors = np.concatenate(anchors_list, axis=-2) - matched_thresholds = np.concatenate(match_list, axis=0) - unmatched_thresholds = np.concatenate(unmatch_list, axis=0) - return { - "anchors": anchors, - "matched_thresholds": matched_thresholds, - "unmatched_thresholds": unmatched_thresholds, - } - - def generate_anchors_dict(self, feature_map_size): - anchors_list = [] - matched_thresholds = [a.match_threshold for a in self._anchor_generators] - unmatched_thresholds = [a.unmatch_threshold for a in self._anchor_generators] - match_list, unmatch_list = [], [] - anchors_dict = {a.class_name: {} for a in self._anchor_generators} - anchors_dict = OrderedDict(anchors_dict) - for anchor_generator, match_thresh, unmatch_thresh in zip( - self._anchor_generators, matched_thresholds, unmatched_thresholds - ): - anchors = anchor_generator.generate(feature_map_size) - anchors = anchors.reshape([*anchors.shape[:3], -1, anchors.shape[-1]]) - anchors_list.append(anchors) - num_anchors = np.prod(anchors.shape[:-1]) - match_list.append(np.full([num_anchors], match_thresh, anchors.dtype)) - unmatch_list.append(np.full([num_anchors], unmatch_thresh, anchors.dtype)) - class_name = anchor_generator.class_name - anchors_dict[class_name]["anchors"] = anchors - anchors_dict[class_name]["matched_thresholds"] = match_list[-1] - anchors_dict[class_name]["unmatched_thresholds"] = unmatch_list[-1] - return anchors_dict - - @property - def num_anchors_per_location(self): - num = 0 - for a_generator in self._anchor_generators: - num += a_generator.num_anchors_per_localization - return num diff --git a/det3d/core/anchor/target_ops.py b/det3d/core/anchor/target_ops.py deleted file mode 100644 index 02709bd..0000000 --- a/det3d/core/anchor/target_ops.py +++ /dev/null @@ -1,222 +0,0 @@ -import logging - -import numba -import numpy as np -import numpy.random as npr -from det3d.core.bbox import box_np_ops - -logger = logging.getLogger(__name__) - - -def unmap(data, count, inds, fill=0): - """Unmap a subset of item (data) back to the original set of items (of - size count)""" - if count == len(inds): - return data - - if len(data.shape) == 1: - ret = np.empty((count,), dtype=data.dtype) - ret.fill(fill) - ret[inds] = data - else: - ret = np.empty((count,) + data.shape[1:], dtype=data.dtype) - ret.fill(fill) - ret[inds, :] = data - return ret - - -def create_target_np( - all_anchors, - gt_boxes, - similarity_fn, - box_encoding_fn, - prune_anchor_fn=None, - gt_classes=None, - matched_threshold=0.6, - unmatched_threshold=0.45, - bbox_inside_weight=None, - positive_fraction=None, - rpn_batch_size=300, - norm_by_num_examples=False, - box_code_size=7, -): - """Modified from FAIR detectron. - Args: - all_anchors: [num_of_anchors, box_ndim] float tensor. - gt_boxes: [num_gt_boxes, box_ndim] float tensor. - similarity_fn: a function, accept anchors and gt_boxes, return - similarity matrix(such as IoU). 
- box_encoding_fn: a function, accept gt_boxes and anchors, return - box encodings(offsets). - prune_anchor_fn: a function, accept anchors, return indices that - indicate valid anchors. - gt_classes: [num_gt_boxes] int tensor. indicate gt classes, must - start with 1. - matched_threshold: float, iou greater than matched_threshold will - be treated as positives. - unmatched_threshold: float, iou smaller than unmatched_threshold will - be treated as negatives. - bbox_inside_weight: unused - positive_fraction: [0-1] float or None. if not None, we will try to - keep ratio of pos/neg equal to positive_fraction when sample. - if there is not enough positives, it fills the rest with negatives - rpn_batch_size: int. sample size - norm_by_num_examples: bool. norm box_weight by number of examples, but - I recommend to do this outside. - Returns: - labels, bbox_targets, bbox_outside_weights - """ - - total_anchors = all_anchors.shape[0] - if prune_anchor_fn is not None: - inds_inside = prune_anchor_fn(all_anchors) - anchors = all_anchors[inds_inside, :] - if not isinstance(matched_threshold, float): - matched_threshold = matched_threshold[inds_inside] - if not isinstance(unmatched_threshold, float): - unmatched_threshold = unmatched_threshold[inds_inside] - else: - anchors = all_anchors - inds_inside = None - num_inside = len(inds_inside) if inds_inside is not None else total_anchors - box_ndim = all_anchors.shape[1] - logger.debug("total_anchors: {}".format(total_anchors)) - logger.debug("inds_inside: {}".format(num_inside)) - logger.debug("anchors.shape: {}".format(anchors.shape)) - if gt_classes is None: - gt_classes = np.ones([gt_boxes.shape[0]], dtype=np.int32) - # Compute anchor labels: - # label=1 is positive, 0 is negative, -1 is don't care (ignore) - labels = np.empty((num_inside,), dtype=np.int32) - gt_ids = np.empty((num_inside,), dtype=np.int32) - labels.fill(-1) - gt_ids.fill(-1) - if len(gt_boxes) > 0: - # Compute overlaps between the anchors and the gt boxes overlaps - anchor_by_gt_overlap = similarity_fn(anchors, gt_boxes) - # Map from anchor to gt box that has highest overlap - anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(axis=1) - # For each anchor, amount of overlap with most overlapping gt box - anchor_to_gt_max = anchor_by_gt_overlap[ - np.arange(num_inside), anchor_to_gt_argmax - ] # - # Map from gt box to an anchor that has highest overlap - gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(axis=0) - # For each gt box, amount of overlap with most overlapping anchor - gt_to_anchor_max = anchor_by_gt_overlap[ - gt_to_anchor_argmax, np.arange(anchor_by_gt_overlap.shape[1]) - ] - # must remove gt which doesn't match any anchor. 
- empty_gt_mask = gt_to_anchor_max == 0 - gt_to_anchor_max[empty_gt_mask] = -1 - """ - if not np.all(empty_gt_mask): - gt_to_anchor_max = gt_to_anchor_max[empty_gt_mask] - anchor_by_gt_overlap = anchor_by_gt_overlap[:, empty_gt_mask] - gt_classes = gt_classes[empty_gt_mask] - gt_boxes = gt_boxes[empty_gt_mask] - """ - # Find all anchors that share the max overlap amount - # (this includes many ties) - anchors_with_max_overlap = np.where(anchor_by_gt_overlap == gt_to_anchor_max)[0] - # Fg label: for each gt use anchors with highest overlap - # (including ties) - gt_inds_force = anchor_to_gt_argmax[anchors_with_max_overlap] - labels[anchors_with_max_overlap] = gt_classes[gt_inds_force] - gt_ids[anchors_with_max_overlap] = gt_inds_force - # Fg label: above threshold IOU - pos_inds = anchor_to_gt_max >= matched_threshold - gt_inds = anchor_to_gt_argmax[pos_inds] - labels[pos_inds] = gt_classes[gt_inds] - gt_ids[pos_inds] = gt_inds - bg_inds = np.where(anchor_to_gt_max < unmatched_threshold)[0] - else: - # labels[:] = 0 - bg_inds = np.arange(num_inside) - fg_inds = np.where(labels > 0)[0] - fg_max_overlap = None - if len(gt_boxes) > 0: - fg_max_overlap = anchor_to_gt_max[fg_inds] - gt_pos_ids = gt_ids[fg_inds] - # bg_inds = np.where(anchor_to_gt_max < unmatched_threshold)[0] - # bg_inds = np.where(labels == 0)[0] - # subsample positive labels if we have too many - if positive_fraction is not None: - num_fg = int(positive_fraction * rpn_batch_size) - if len(fg_inds) > num_fg: - disable_inds = npr.choice( - fg_inds, size=(len(fg_inds) - num_fg), replace=False - ) - labels[disable_inds] = -1 - fg_inds = np.where(labels > 0)[0] - - # subsample negative labels if we have too many - # (samples with replacement, but since the set of bg inds is large most - # samples will not have repeats) - num_bg = rpn_batch_size - np.sum(labels > 0) - # print(num_fg, num_bg, len(bg_inds) ) - if len(bg_inds) > num_bg: - enable_inds = bg_inds[npr.randint(len(bg_inds), size=num_bg)] - labels[enable_inds] = 0 - bg_inds = np.where(labels == 0)[0] - else: - if len(gt_boxes) == 0: - labels[:] = 0 - else: - labels[bg_inds] = 0 - # re-enable anchors_with_max_overlap - labels[anchors_with_max_overlap] = gt_classes[gt_inds_force] - bbox_targets = np.zeros((num_inside, box_code_size), dtype=all_anchors.dtype) - if len(gt_boxes) > 0: - # bbox_targets[fg_inds, :] = box_encoding_fn( - # anchors[fg_inds, :], gt_boxes[anchor_to_gt_argmax[fg_inds], :]) - bbox_targets[fg_inds, :] = box_encoding_fn( - gt_boxes[anchor_to_gt_argmax[fg_inds], :], anchors[fg_inds, :] - ) - # Bbox regression loss has the form: - # loss(x) = weight_outside * L(weight_inside * x) - # Inside weights allow us to set zero loss on an element-wise basis - # Bbox regression is only trained on positive examples so we set their - # weights to 1.0 (or otherwise if config is different) and 0 otherwise - # NOTE: we don't need bbox_inside_weights, remove it. 
- # bbox_inside_weights = np.zeros((num_inside, box_ndim), dtype=np.float32) - # bbox_inside_weights[labels == 1, :] = [1.0] * box_ndim - - # The bbox regression loss only averages by the number of images in the - # mini-batch, whereas we need to average by the total number of example - # anchors selected - # Outside weights are used to scale each element-wise loss so the final - # average over the mini-batch is correct - # bbox_outside_weights = np.zeros((num_inside, box_ndim), dtype=np.float32) - bbox_outside_weights = np.zeros((num_inside,), dtype=all_anchors.dtype) - # uniform weighting of examples (given non-uniform sampling) - if norm_by_num_examples: - num_examples = np.sum(labels >= 0) # neg + pos - num_examples = np.maximum(1.0, num_examples) - bbox_outside_weights[labels > 0] = 1.0 / num_examples - else: - bbox_outside_weights[labels > 0] = 1.0 - # bbox_outside_weights[labels == 0, :] = 1.0 / num_examples - - # Map up to original set of anchors - if inds_inside is not None: - labels = unmap(labels, total_anchors, inds_inside, fill=-1) - bbox_targets = unmap(bbox_targets, total_anchors, inds_inside, fill=0) - # bbox_inside_weights = unmap( - # bbox_inside_weights, total_anchors, inds_inside, fill=0) - bbox_outside_weights = unmap( - bbox_outside_weights, total_anchors, inds_inside, fill=0 - ) - # return labels, bbox_targets, bbox_outside_weights - ret = { - "labels": labels, - "bbox_targets": bbox_targets, - "bbox_outside_weights": bbox_outside_weights, - "assigned_anchors_overlap": fg_max_overlap, - "positive_gt_id": gt_pos_ids, - } - if inds_inside is not None: - ret["assigned_anchors_inds"] = inds_inside[fg_inds] - else: - ret["assigned_anchors_inds"] = fg_inds - return ret diff --git a/det3d/core/bbox/__init__.py b/det3d/core/bbox/__init__.py index 5f65eac..11c3613 100644 --- a/det3d/core/bbox/__init__.py +++ b/det3d/core/bbox/__init__.py @@ -1,54 +1 @@ -# from .box_torch_ops import ( -# torch_to_np_dtype, -# second_box_decode, -# bev_box_decode, -# corners_nd, -# corners_2d, -# corner_to_standup_nd, -# rotation_3d_in_axis, -# rotation_2d, -# center_to_corner_box3d, -# center_to_corner_box2d, -# project_to_image, -# camera_to_lidar, -# lidar_to_camera, -# box_camera_to_lidar, -# box_lidar_to_camera, -# multiclass_nms, -# nms, -# rotate_nms, -# ) -# from .box_np_ops import ( -# points_count_rbbox, riou_cc, rinter_cc, second_box_encode, bev_box_encode, -# corners_nd, corner_to_standup_nd, rbbox2d_to_near_bbox, -# rotation_3d_in_axis, rotation_points_single_angle, rotation_2d, -# rotation_box, center_to_corner_box3d, center_to_corner_box2d, -# rbbox3d_to_corners, rbbox3d_to_bev_corners, minmax_to_corner_2d, -# minmax_to_corner_2d_v2, minmax_to_corner_3d, minmax_to_center_2d, -# center_to_minmax_2d_0_5, center_to_minmax_2d, limit_period, -# projection_matrix_to_CRT_kitti, get_frustum, get_frustum_v2, -# create_anchors_3d_stride, create_anchors_bev_stride, -# create_anchors_3d_range, create_anchors_bev_range, add_rgb_to_points, -# project_to_image, camera_to_lidar, lidar_to_camera, box_camera_to_lidar, -# box_lidar_to_camera, remove_outside_points, iou_jit, iou_3d_jit, -# iou_nd_jit, points_in_rbbox, corner_to_surfaces_3d, -# corner_to_surfaces_3d_jit, assign_label_to_voxel, assign_label_to_voxel_v3, -# image_box_region_area, get_minimum_bounding_box_bv, -# get_anchor_bv_in_feature_jit, get_anchor_bv_in_feature, -# sparse_sum_for_anchors_mask, fused_get_anchors_area, distance_similarity, -# box3d_to_bbox, change_box3d_center_) -# from .box_coders import (GroundBox3dCoder, 
BevBoxCoder, GroundBox3dCoderTorch, -# BevBoxCoderTorch) -from . import box_coders, box_np_ops, box_torch_ops, geometry, region_similarity - -# from .region_similarity import (RegionSimilarityCalculator, -# RotateIouSimilarity, NearestIouSimilarity, -# DistanceSimilarity) -from .iou import bbox_overlaps - -# from .geometry import ( -# points_count_convex_polygon_3d_jit, is_line_segment_intersection_jit, -# line_segment_intersection, is_line_segment_cross, surface_equ_3d_jit, -# points_in_convex_polygon_3d_jit_v1, surface_equ_3d, surface_equ_3d_jitv2, -# points_in_convex_polygon_3d_jit, points_in_convex_polygon_jit, -# points_in_convex_polygon, points_in_convex_polygon_3d_jit_v2) +from . import box_np_ops, box_torch_ops, geometry diff --git a/det3d/core/bbox/box_coders.py b/det3d/core/bbox/box_coders.py deleted file mode 100644 index 75de0be..0000000 --- a/det3d/core/bbox/box_coders.py +++ /dev/null @@ -1,133 +0,0 @@ -from abc import ABCMeta, abstractmethod, abstractproperty - -import numpy as np - -from . import box_np_ops, box_torch_ops - - -class BoxCoder(object): - """Abstract base class for box coder.""" - - __metaclass__ = ABCMeta - - @abstractproperty - def code_size(self): - pass - - def encode(self, boxes, anchors): - return self._encode(boxes, anchors) - - def decode(self, rel_codes, anchors): - return self._decode(rel_codes, anchors) - - @abstractmethod - def _encode(self, boxes, anchors): - pass - - @abstractmethod - def _decode(self, rel_codes, anchors): - pass - - -class GroundBox3dCoder(BoxCoder): - def __init__(self, linear_dim=False, vec_encode=False, n_dim=7, norm_velo=False): - super().__init__() - self.linear_dim = linear_dim - self.vec_encode = vec_encode - self.norm_velo = norm_velo - self.n_dim = n_dim - - @property - def code_size(self): - # return 8 if self.vec_encode else 7 - # return 10 if self.vec_encode else 9 - return self.n_dim + 1 if self.vec_encode else self.n_dim - - def _encode(self, boxes, anchors): - return box_np_ops.second_box_encode( - boxes, - anchors, - encode_angle_to_vector=self.vec_encode, - smooth_dim=self.linear_dim, - norm_velo=self.norm_velo, - ) - - def _decode(self, encodings, anchors): - return box_np_ops.second_box_decode( - encodings, - anchors, - encode_angle_to_vector=self.vec_encode, - smooth_dim=self.linear_dim, - norm_velo=self.norm_velo, - ) - - -class BevBoxCoder(BoxCoder): - """WARNING: this coder will return encoding with size=5, but - takes size=7 boxes, anchors - """ - - def __init__(self, linear_dim=False, vec_encode=False, z_fixed=-1.0, h_fixed=2.0): - super().__init__() - self.linear_dim = linear_dim - self.z_fixed = z_fixed - self.h_fixed = h_fixed - self.vec_encode = vec_encode - - @property - def code_size(self): - return 6 if self.vec_encode else 5 - - def _encode(self, boxes, anchors): - anchors = anchors[..., [0, 1, 3, 4, 6]] - boxes = boxes[..., [0, 1, 3, 4, 6]] - return box_np_ops.bev_box_encode( - boxes, anchors, self.vec_encode, self.linear_dim - ) - - def _decode(self, encodings, anchors): - anchors = anchors[..., [0, 1, 3, 4, 6]] - ret = box_np_ops.bev_box_decode( - encodings, anchors, self.vec_encode, self.linear_dim - ) - z_fixed = np.full([*ret.shape[:-1], 1], self.z_fixed, dtype=ret.dtype) - h_fixed = np.full([*ret.shape[:-1], 1], self.h_fixed, dtype=ret.dtype) - return np.concatenate( - [ret[..., :2], z_fixed, ret[..., 2:4], h_fixed, ret[..., 4:]], axis=-1 - ) - - -class GroundBox3dCoderTorch(GroundBox3dCoder): - def encode_torch(self, boxes, anchors): - return box_torch_ops.second_box_encode( - boxes, 
anchors, self.vec_encode, self.linear_dim - ) - - def decode_torch(self, boxes, anchors): - return box_torch_ops.second_box_decode( - boxes, anchors, self.vec_encode, self.linear_dim - ) - - -class BevBoxCoderTorch(BevBoxCoder): - def encode_torch(self, boxes, anchors): - anchors = anchors[..., [0, 1, 3, 4, 6]] - boxes = boxes[..., [0, 1, 3, 4, 6]] - return box_torch_ops.bev_box_encode( - boxes, anchors, self.vec_encode, self.linear_dim - ) - - def decode_torch(self, encodings, anchors): - anchors = anchors[..., [0, 1, 3, 4, 6]] - ret = box_torch_ops.bev_box_decode( - encodings, anchors, self.vec_encode, self.linear_dim - ) - z_fixed = torch.full( - [*ret.shape[:-1], 1], self.z_fixed, dtype=ret.dtype, device=ret.device - ) - h_fixed = torch.full( - [*ret.shape[:-1], 1], self.h_fixed, dtype=ret.dtype, device=ret.device - ) - return torch.cat( - [ret[..., :2], z_fixed, ret[..., 2:4], h_fixed, ret[..., 4:]], dim=-1 - ) diff --git a/det3d/core/bbox/box_np_ops.py b/det3d/core/bbox/box_np_ops.py index bbd368a..5d7e5c8 100644 --- a/det3d/core/bbox/box_np_ops.py +++ b/det3d/core/bbox/box_np_ops.py @@ -52,221 +52,6 @@ def rinter_cc(rbboxes, qrbboxes, standup_thresh=0.0): ) -def second_box_encode( - boxes, - anchors, - encode_angle_to_vector=False, - smooth_dim=False, - cylindrical=False, - norm_velo=False, -): - """box encode for VoxelNet in lidar - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, w, l, h, r - anchors ([N, 7] Tensor): anchors - """ - # need to convert boxes to z-center format - box_ndim = anchors.shape[-1] - - if box_ndim == 7: - xa, ya, za, wa, la, ha, ra = np.split(anchors, box_ndim, axis=1) - xg, yg, zg, wg, lg, hg, rg = np.split(boxes, box_ndim, axis=1) - else: - xa, ya, za, wa, la, ha, vxa, vya, ra = np.split(anchors, box_ndim, axis=1) - xg, yg, zg, wg, lg, hg, vxg, vyg, rg = np.split(boxes, box_ndim, axis=1) - - diagonal = np.sqrt(la ** 2 + wa ** 2) # 4.3 - xt = (xg - xa) / diagonal - yt = (yg - ya) / diagonal - zt = (zg - za) / ha # 1.6 - - if smooth_dim: - lt = lg / la - 1 - wt = wg / wa - 1 - ht = hg / ha - 1 - else: - lt = np.log(lg / la) - wt = np.log(wg / wa) - ht = np.log(hg / ha) - - ret = [xt, yt, zt, wt, lt, ht] - - if box_ndim > 7: - if norm_velo: - vxt = (vxg - vxa) / diagonal - vyt = (vyg - vya) / diagonal - else: - vxt = vxg - vxa - vyt = vyg - vya - ret.extend([vxt, vyt]) - - if encode_angle_to_vector: - rgx = np.cos(rg) - rgy = np.sin(rg) - rax = np.cos(ra) - ray = np.sin(ra) - rtx = rgx - rax - rty = rgy - ray - - ret.extend([rtx, rty]) - else: - rt = rg - ra - ret.append(rt) - - return np.concatenate(ret, axis=1) - - -def second_box_decode( - box_encodings, - anchors, - encode_angle_to_vector=False, - smooth_dim=False, - cylindrical=False, - norm_velo=False, -): - """box decode for VoxelNet in lidar - Args: - boxes ([N, 9] Tensor): normal boxes: x, y, z, w, l, h, vx, vy, r - anchors ([N, 9] Tensor): anchors - """ - # need to convert box_encodings to z-bottom format - box_ndim = anchors.shape[-1] - - if box_ndim > 7: - xa, ya, za, wa, la, ha, vxa, vya, ra = np.split(anchors, box_ndim, axis=1) - if encode_angle_to_vector: - xt, yt, zt, wt, lt, ht, vxt, vyt, rtx, rty = np.split( - box_encodings, box_ndim + 1, axis=-1 - ) - else: - xt, yt, zt, wt, lt, ht, vxt, vyt, rt = np.split( - box_encodings, box_ndim, axis=-1 - ) - else: - xa, ya, za, wa, la, ha, ra = np.split(anchors, box_ndim, axis=-1) - if encode_angle_to_vector: - xt, yt, zt, wt, lt, ht, rtx, rty = np.split( - box_encodings, box_ndim + 1, axis=-1 - ) - else: - xt, yt, zt, wt, lt, ht, rt = 
np.split(box_encodings, box_ndim, axis=-1) - - # if cylindrical: - # diagonal = np.sqrt(la**2 + wa**2) - # xg = xt * diagonal + xa - # yg = yt * diagonal + ya - # else: - # diagonal = np.sqrt(la**2 + wa**2) - # xg = xt * diagonal + xa - # yg = yt * diagonal + ya - - diagonal = np.sqrt(la ** 2 + wa ** 2) - xg = xt * diagonal + xa - yg = yt * diagonal + ya - zg = zt * ha + za - - ret = [xg, yg, zg] - - if smooth_dim: - lg = (lt + 1) * la - wg = (wt + 1) * wa - hg = (ht + 1) * ha - else: - lg = np.exp(lt) * la - wg = np.exp(wt) * wa - hg = np.exp(ht) * ha - ret.extend([wg, lg, hg]) - - if encode_angle_to_vector: - rax = np.cos(ra) - ray = np.sin(ra) - rgx = rtx + rax - rgy = rty + ray - rg = np.arctan2(rgy, rgx) - else: - rg = rt + ra - - if box_ndim > 7: - if norm_velo: - vxg = vxt * diagonal + vxa - vyg = vyt * diagonal + vya - else: - vxg = vxt + vxa - vyg = vyt + vya - ret.extend([vxg, vyg]) - - ret.append(rg) - - return np.concatenate(ret, axis=1) - - -def bev_box_encode(boxes, anchors, encode_angle_to_vector=False, smooth_dim=False): - """box encode for VoxelNet in lidar - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, w, l, h, r - anchors ([N, 7] Tensor): anchors - encode_angle_to_vector: bool. increase aos performance, - decrease other performance. - """ - # need to convert boxes to z-center format - xa, ya, wa, la, ra = np.split(anchors, 5, axis=-1) - xg, yg, wg, lg, rg = np.split(boxes, 5, axis=-1) - diagonal = np.sqrt(la ** 2 + wa ** 2) # 4.3 - xt = (xg - xa) / diagonal - yt = (yg - ya) / diagonal - if smooth_dim: - lt = lg / la - 1 - wt = wg / wa - 1 - else: - lt = np.log(lg / la) - wt = np.log(wg / wa) - if encode_angle_to_vector: - rgx = np.cos(rg) - rgy = np.sin(rg) - rax = np.cos(ra) - ray = np.sin(ra) - rtx = rgx - rax - rty = rgy - ray - return np.concatenate([xt, yt, wt, lt, rtx, rty], axis=-1) - else: - rt = rg - ra - return np.concatenate([xt, yt, wt, lt, rt], axis=-1) - - -def bev_box_decode( - box_encodings, anchors, encode_angle_to_vector=False, smooth_dim=False -): - """box decode for VoxelNet in lidar - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, w, l, h, r - anchors ([N, 7] Tensor): anchors - """ - # need to convert box_encodings to z-bottom format - xa, ya, wa, la, ra = np.split(anchors, 5, axis=-1) - if encode_angle_to_vector: - xt, yt, wt, lt, rtx, rty = np.split(box_encodings, 6, axis=-1) - else: - xt, yt, wt, lt, rt = np.split(box_encodings, 5, axis=-1) - diagonal = np.sqrt(la ** 2 + wa ** 2) - xg = xt * diagonal + xa - yg = yt * diagonal + ya - if smooth_dim: - lg = (lt + 1) * la - wg = (wt + 1) * wa - else: - lg = np.exp(lt) * la - wg = np.exp(wt) * wa - if encode_angle_to_vector: - rax = np.cos(ra) - ray = np.sin(ra) - rgx = rtx + rax - rgy = rty + ray - rg = np.arctan2(rgy, rgx) - else: - rg = rt + ra - return np.concatenate([xg, yg, wg, lg, rg], axis=-1) - - def corners_nd(dims, origin=0.5): """generate relative box corners based on length per dim and origin point. @@ -631,253 +416,6 @@ def get_frustum_v2(bboxes, C, near_clip=0.001, far_clip=100): return ret_xyz -def create_anchors_3d_stride( - feature_size, - sizes=[1.6, 3.9, 1.56], - anchor_strides=[0.4, 0.4, 0.0], - anchor_offsets=[0.2, -39.8, -1.78], - rotations=[0, np.pi / 2], - velocities=[], - dtype=np.float32, -): - """ - Args: - feature_size: list [D, H, W](zyx) - sizes: [N, 3] list of list or array, size of anchors, xyz - - Returns: - anchors: [*feature_size, num_sizes, num_rots, 7] tensor. 
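The `second_box_encode` / `second_box_decode` pair removed a little above is the SECOND/VoxelNet residual box parameterization: center offsets normalized by the anchor's BEV diagonal (z by the anchor height), log-ratio dimensions, and a plain yaw difference. A compact numpy round-trip for the plain 7-dim case (x, y, z, w, l, h, r), following those same conventions, is sketched below:

```python
import numpy as np

def encode(boxes, anchors):
    """Residual-encode boxes against anchors, SECOND-style (7-dim, no velocity)."""
    xa, ya, za, wa, la, ha, ra = np.split(anchors, 7, axis=-1)
    xg, yg, zg, wg, lg, hg, rg = np.split(boxes, 7, axis=-1)
    diag = np.sqrt(la ** 2 + wa ** 2)            # BEV diagonal of the anchor
    return np.concatenate([
        (xg - xa) / diag, (yg - ya) / diag, (zg - za) / ha,
        np.log(wg / wa), np.log(lg / la), np.log(hg / ha),
        rg - ra,
    ], axis=-1)

def decode(deltas, anchors):
    """Invert encode(); decode(encode(b, a), a) reproduces b."""
    xa, ya, za, wa, la, ha, ra = np.split(anchors, 7, axis=-1)
    xt, yt, zt, wt, lt, ht, rt = np.split(deltas, 7, axis=-1)
    diag = np.sqrt(la ** 2 + wa ** 2)
    return np.concatenate([
        xt * diag + xa, yt * diag + ya, zt * ha + za,
        np.exp(wt) * wa, np.exp(lt) * la, np.exp(ht) * ha,
        rt + ra,
    ], axis=-1)

# Round trip on a single box/anchor pair (values are illustrative only).
anchor = np.array([[0.0, 0.0, -1.0, 1.6, 3.9, 1.56, 0.0]])
box = np.array([[1.0, 2.0, -0.8, 1.8, 4.2, 1.50, 0.3]])
assert np.allclose(decode(encode(box, anchor), anchor), box)
```

The deleted originals also handle 9-dim boxes with velocity, the smooth (non-log) dimension option, and the sine/cosine angle-vector encoding; the sketch keeps only the default path.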
- """ - # almost 2x faster than v1 - x_stride, y_stride, z_stride = anchor_strides - x_offset, y_offset, z_offset = anchor_offsets - z_centers = np.arange(feature_size[0], dtype=dtype) - y_centers = np.arange(feature_size[1], dtype=dtype) - x_centers = np.arange(feature_size[2], dtype=dtype) - z_centers = z_centers * z_stride + z_offset - y_centers = y_centers * y_stride + y_offset - x_centers = x_centers * x_stride + x_offset - sizes = np.reshape(np.array(sizes, dtype=dtype), [-1, 3]) - rotations = np.array(rotations, dtype=dtype) - velocities = np.array(velocities, dtype=dtype).reshape([-1, 2]) - - combines = np.hstack([sizes, velocities]).reshape([-1, 5]) - - rets = np.meshgrid(x_centers, y_centers, z_centers, rotations, indexing="ij") - tile_shape = [1] * 5 - tile_shape[-2] = int(sizes.shape[0]) - for i in range(len(rets)): - rets[i] = np.tile(rets[i][..., np.newaxis, :], tile_shape) - rets[i] = rets[i][..., np.newaxis] # for concat - # sizes = np.reshape(sizes, [1, 1, 1, -1, 1, 3]) - combines = np.reshape(combines, [1, 1, 1, -1, 1, 5]) - tile_size_shape = list(rets[0].shape) - tile_size_shape[3] = 1 - # sizes = np.tile(sizes, tile_size_shape) - combines = np.tile(combines, tile_size_shape) - - # rets.insert(3, sizes) - rets.insert(3, combines) - - ret = np.concatenate(rets, axis=-1) - return np.transpose(ret, [2, 1, 0, 3, 4, 5]) - - -def create_anchors_bev_stride( - feature_size, - sizes=[1.6, 3.9], - anchor_strides=[0.4, 0.4], - anchor_offsets=[0.2, -39.8], - rotations=[0, np.pi / 2], - velocities=[], - dtype=np.float32, -): - """ - Args: - feature_size: list [D, H, W](zyx) - sizes: [N, 3] list of list or array, size of anchors, xyz - - Returns: - anchors: [*feature_size, num_sizes, num_rots, 7] tensor. - """ - # almost 2x faster than v1 - x_stride, y_stride = anchor_strides - x_offset, y_offset = anchor_offsets - y_centers = np.arange(feature_size[0], dtype=dtype) - x_centers = np.arange(feature_size[1], dtype=dtype) - y_centers = y_centers * y_stride + y_offset - x_centers = x_centers * x_stride + x_offset - sizes = np.reshape(np.array(sizes, dtype=dtype), [-1, 2]) - rotations = np.array(rotations, dtype=dtype) - velocities = np.array(velocities, dtype=dtype).reshape([-1, 2]) - - combines = np.hstack([sizes, velocities]).reshape([-1, 4]) - - rets = np.meshgrid(x_centers, y_centers, rotations, indexing="ij") - tile_shape = [1] * 4 - tile_shape[-2] = int(sizes.shape[0]) - for i in range(len(rets)): - rets[i] = np.tile(rets[i][..., np.newaxis, :], tile_shape) - rets[i] = rets[i][..., np.newaxis] # for concat - # sizes = np.reshape(sizes, [1, 1, 1, -1, 1, 3]) - combines = np.reshape(combines, [1, 1, 1, -1, 1, 4]) - tile_size_shape = list(rets[0].shape) - tile_size_shape[3] = 1 - # sizes = np.tile(sizes, tile_size_shape) - combines = np.tile(combines, tile_size_shape) - - # rets.insert(3, sizes) - rets.insert(3, combines) - - ret = np.concatenate(rets, axis=-1) - return np.transpose(ret, [2, 1, 0, 3, 4, 5]) - - -def create_anchors_3d_range( - feature_size, - anchor_range, - sizes=[1.6, 3.9, 1.56], - rotations=[0, np.pi / 2], - velocities=None, - dtype=np.float32, -): - """ - Args: - feature_size: list [D, H, W](zyx) - sizes: [N, 3] list of list or array, size of anchors, xyz - rotations: len(stride) num Reference - velocities: ref velo along x and y axis. - - Returns: - anchors: [*feature_size, num_sizes, num_rots, 9] tensor. 
- """ - anchor_range = np.array(anchor_range, dtype) - stride = (anchor_range[3] - anchor_range[0]) / feature_size[2] - - z_centers = np.linspace( - anchor_range[2], anchor_range[5], feature_size[0], dtype=dtype - ) - y_centers = ( - np.linspace( - anchor_range[1], - anchor_range[4], - feature_size[1], - endpoint=False, - dtype=dtype, - ) - + stride / 2 - ) - x_centers = ( - np.linspace( - anchor_range[0], - anchor_range[3], - feature_size[2], - endpoint=False, - dtype=dtype, - ) - + stride / 2 - ) - rotations = np.array(rotations, dtype=dtype) - sizes = np.reshape(np.array(sizes, dtype=dtype), [-1, 3]) - - if velocities is not None: - velocities = np.array(velocities, dtype=dtype).reshape([-1, 2]) - combines = np.hstack([sizes, velocities]).reshape([-1, 5]) - else: - combines = sizes - - rets = np.meshgrid(x_centers, y_centers, z_centers, rotations, indexing="ij") - - tile_shape = [1] * 5 - tile_shape[-2] = int(sizes.shape[0]) - for i in range(len(rets)): - rets[i] = np.tile(rets[i][..., np.newaxis, :], tile_shape) - rets[i] = rets[i][..., np.newaxis] # for concat - # sizes = np.reshape(sizes, [1, 1, 1, -1, 1, 3]) - combines = np.reshape(combines, [1, 1, 1, -1, 1, combines.shape[-1]]) - tile_size_shape = list(rets[0].shape) - tile_size_shape[3] = 1 - # sizes = np.tile(sizes, tile_size_shape) - combines = np.tile(combines, tile_size_shape) - - # rets.insert(3, sizes) - rets.insert(3, combines) - - ret = np.concatenate(rets, axis=-1) - - return np.transpose(ret, [2, 1, 0, 3, 4, 5]) - - -def create_anchors_bev_range( - feature_size, - anchor_range, - sizes=[1.6, 3.9], - rotations=[0, np.pi / 2], - velocities=None, - dtype=np.float32, -): - """ - Args: - feature_size: list [D, H, W](zyx) - sizes: [N, 3] list of list or array, size of anchors, xyz - rotations: len(stride) num Reference - velocities: ref velo along x and y axis. - - Returns: - anchors: [*feature_size, num_sizes, num_rots, 9] tensor. 
- """ - anchor_range = np.array(anchor_range, dtype) - stride = (anchor_range[2] - anchor_range[0]) / feature_size[1] - - y_centers = ( - np.linspace( - anchor_range[1], - anchor_range[3], - feature_size[0], - endpoint=False, - dtype=dtype, - ) - + stride / 2 - ) - x_centers = ( - np.linspace( - anchor_range[0], - anchor_range[2], - feature_size[1], - endpoint=False, - dtype=dtype, - ) - + stride / 2 - ) - rotations = np.array(rotations, dtype=dtype) - sizes = np.reshape(np.array(sizes, dtype=dtype), [-1, 2]) - - if velocities is not None: - velocities = np.array(velocities, dtype=dtype).reshape([-1, 2]) - combines = np.hstack([sizes, velocities]).reshape([-1, 4]) - else: - combines = sizes - - rets = np.meshgrid(x_centers, y_centers, rotations, indexing="ij") - - tile_shape = [1] * 4 - # tile_shape[-2] = int(sizes.shape[0]) - for i in range(len(rets)): - rets[i] = np.tile(rets[i][..., np.newaxis, :], tile_shape) - rets[i] = rets[i][..., np.newaxis] # for concat - # sizes = np.reshape(sizes, [1, 1, 1, -1, 1, 3]) - combines = np.reshape(combines, [1, 1, -1, 1, combines.shape[-1]]) - tile_size_shape = list(rets[0].shape) - tile_size_shape[2] = 1 - # sizes = np.tile(sizes, tile_size_shape) - combines = np.tile(combines, tile_size_shape) - - rets.insert(2, combines) - - ret = np.concatenate(rets, axis=-1) - return np.transpose(ret, [1, 0, 2, 3, 4]) - - @numba.njit def _add_rgb_to_points_kernel(points_2d, image, points_rgb): num_points = points_2d.shape[0] @@ -1244,94 +782,7 @@ def get_minimum_bounding_box_bv(points, voxel_size, bound, downsample=8, margin= min_x = np.maximum(min_x - margin, bound[0]) min_y = np.maximum(min_y - margin, bound[1]) return np.array([min_x, min_y, max_x, max_y]) - - -@numba.jit(nopython=True) -def get_anchor_bv_in_feature_jit(anchors_bv, voxel_size, coors_range, grid_size): - anchors_bv_coors = np.zeros(anchors_bv.shape, dtype=np.int32) - anchor_coor = np.zeros(anchors_bv.shape[1:], dtype=np.int32) - grid_size_x = grid_size[0] - 1 - grid_size_y = grid_size[1] - 1 - for i in range(anchors_bv.shape[0]): - anchor_coor[0] = np.floor((anchors_bv[i, 0] - coors_range[0]) / voxel_size[0]) - anchor_coor[1] = np.floor((anchors_bv[i, 1] - coors_range[1]) / voxel_size[1]) - anchor_coor[2] = np.floor((anchors_bv[i, 2] - coors_range[0]) / voxel_size[0]) - anchor_coor[3] = np.floor((anchors_bv[i, 3] - coors_range[1]) / voxel_size[1]) - anchor_coor[0] = max(anchor_coor[0], 0) - anchor_coor[1] = max(anchor_coor[1], 0) - anchor_coor[2] = min(anchor_coor[2], grid_size_x) - anchor_coor[3] = min(anchor_coor[3], grid_size_y) - anchors_bv_coors[i] = anchor_coor - return anchors_bv_coors - - -def get_anchor_bv_in_feature(anchors_bv, voxel_size, coors_range, grid_size): - vsize_bv = np.tile(voxel_size[:2], 2) - anchors_bv[..., [1, 3]] -= coors_range[1] - anchors_bv_coors = np.floor(anchors_bv / vsize_bv).astype(np.int64) - anchors_bv_coors[..., [0, 2]] = np.clip( - anchors_bv_coors[..., [0, 2]], a_max=grid_size[0] - 1, a_min=0 - ) - anchors_bv_coors[..., [1, 3]] = np.clip( - anchors_bv_coors[..., [1, 3]], a_max=grid_size[1] - 1, a_min=0 - ) - anchors_bv_coors = anchors_bv_coors.reshape([-1, 4]) - return anchors_bv_coors - - -@numba.jit(nopython=True) -def sparse_sum_for_anchors_mask(coors, shape): - ret = np.zeros(shape, dtype=np.float32) - for i in range(coors.shape[0]): - ret[coors[i, 1], coors[i, 2]] += 1 - return ret - - -@numba.jit(nopython=True) -def fused_get_anchors_area(dense_map, anchors_bv, stride, offset, grid_size): - anchor_coor = np.zeros(anchors_bv.shape[1:], dtype=np.int32) 
- grid_size_x = grid_size[0] - 1 - grid_size_y = grid_size[1] - 1 - N = anchors_bv.shape[0] - ret = np.zeros((N), dtype=dense_map.dtype) - for i in range(N): - anchor_coor[0] = np.floor((anchors_bv[i, 0] - offset[0]) / stride[0]) - anchor_coor[1] = np.floor((anchors_bv[i, 1] - offset[1]) / stride[1]) - anchor_coor[2] = np.floor((anchors_bv[i, 2] - offset[0]) / stride[0]) - anchor_coor[3] = np.floor((anchors_bv[i, 3] - offset[1]) / stride[1]) - anchor_coor[0] = max(anchor_coor[0], 0) - anchor_coor[1] = max(anchor_coor[1], 0) - anchor_coor[2] = min(anchor_coor[2], grid_size_x) - anchor_coor[3] = min(anchor_coor[3], grid_size_y) - ID = dense_map[anchor_coor[3], anchor_coor[2]] - IA = dense_map[anchor_coor[1], anchor_coor[0]] - IB = dense_map[anchor_coor[3], anchor_coor[0]] - IC = dense_map[anchor_coor[1], anchor_coor[2]] - ret[i] = ID - IB - IC + IA - return ret - - -@numba.jit(nopython=True) -def distance_similarity(points, qpoints, dist_norm, with_rotation=False, rot_alpha=0.5): - N = points.shape[0] - K = qpoints.shape[0] - dists = np.zeros((N, K), dtype=points.dtype) - rot_alpha_1 = 1 - rot_alpha - for k in range(K): - for n in range(N): - if np.abs(points[n, 0] - qpoints[k, 0]) <= dist_norm: - if np.abs(points[n, 1] - qpoints[k, 1]) <= dist_norm: - dist = np.sum((points[n, :2] - qpoints[k, :2]) ** 2) - dist_normed = min(dist / dist_norm, dist_norm) - if with_rotation: - dist_rot = np.abs(np.sin(points[n, -1] - qpoints[k, -1])) - dists[n, k] = ( - 1 - rot_alpha_1 * dist_normed - rot_alpha * dist_rot - ) - else: - dists[n, k] = 1 - dist_normed - return dists - + def box3d_to_bbox(box3d, rect, Trv2c, P2): box3d_to_cam = box_lidar_to_camera(box3d, rect, Trv2c) diff --git a/det3d/core/bbox/box_torch_ops.py b/det3d/core/bbox/box_torch_ops.py index e609d9a..18a3b12 100644 --- a/det3d/core/bbox/box_torch_ops.py +++ b/det3d/core/bbox/box_torch_ops.py @@ -5,7 +5,7 @@ import torch from torch import stack as tstack try: - from det3d.ops.iou3d_nms import iou3d_nms_cuda + from det3d.ops.iou3d_nms import iou3d_nms_cuda, iou3d_nms_utils except: print("iou3d cuda not built. You don't need this if you use circle_nms. 
Otherwise, refer to the advanced installation part to build this cuda extension") @@ -21,203 +21,6 @@ def torch_to_np_dtype(ttype): return type_map[ttype] -def second_box_encode( - boxes, anchors, encode_angle_to_vector=False, smooth_dim=False, norm_velo=False -): - """box encode for VoxelNet - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, l, w, h, r - anchors ([N, 7] Tensor): anchors - """ - box_ndim = anchors.shape[-1] - - if box_ndim == 7: - xa, ya, za, wa, la, ha, ra = torch.split(anchors, 1, dim=-1) - xg, yg, zg, wg, lg, hg, rg = torch.split(boxes, 1, dim=-1) - else: - xa, ya, za, wa, la, ha, vxa, vya, ra = torch.split(anchors, 1, dim=-1) - xg, yg, zg, wg, lg, hg, vxg, vyg, rg = torch.split(boxes, 1, dim=-1) - - diagonal = torch.sqrt(la ** 2 + wa ** 2) - xt = (xg - xa) / diagonal - yt = (yg - ya) / diagonal - zt = (zg - za) / ha - - if smooth_dim: - lt = lg / la - 1 - wt = wg / wa - 1 - ht = hg / ha - 1 - else: - lt = torch.log(lg / la) - wt = torch.log(wg / wa) - ht = torch.log(hg / ha) - - ret = [xt, yt, zt, wt, lt, ht] - - if box_ndim > 7: - if norm_velo: - vxt = (vxg - vxa) / diagonal - vyt = (vyg - vya) / diagonal - else: - vxt = vxg - vxa - vyt = vyg - vya - ret.extend([vxt, vyt]) - - if encode_angle_to_vector: - rgx = torch.cos(rg) - rgy = torch.sin(rg) - rax = torch.cos(ra) - ray = torch.sin(ra) - rtx = rgx - rax - rty = rgy - ray - ret.extend([rtx, rty]) - else: - rt = rg - ra - ret.append(rt) - - return torch.cat(ret, dim=-1) - - -def second_box_decode( - box_encodings, - anchors, - encode_angle_to_vector=False, - bin_loss=False, - smooth_dim=False, - norm_velo=False, -): - """box decode for VoxelNet in lidar - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, w, l, h, r - anchors ([N, 7] Tensor): anchors - """ - box_ndim = anchors.shape[-1] - - if box_ndim == 9: - xa, ya, za, wa, la, ha, vxa, vya, ra = torch.split(anchors, 1, dim=-1) - if encode_angle_to_vector: - xt, yt, zt, wt, lt, ht, vxt, vyt, rtx, rty = torch.split( - box_encodings, 1, dim=-1 - ) - else: - xt, yt, zt, wt, lt, ht, vxt, vyt, rt = torch.split(box_encodings, 1, dim=-1) - elif box_ndim == 7: - xa, ya, za, wa, la, ha, ra = torch.split(anchors, 1, dim=-1) - if encode_angle_to_vector: - xt, yt, zt, wt, lt, ht, rtx, rty = torch.split(box_encodings, 1, dim=-1) - else: - xt, yt, zt, wt, lt, ht, rt = torch.split(box_encodings, 1, dim=-1) - - diagonal = torch.sqrt(la ** 2 + wa ** 2) - xg = xt * diagonal + xa - yg = yt * diagonal + ya - zg = zt * ha + za - - ret = [xg, yg, zg] - - if smooth_dim: - lg = (lt + 1) * la - wg = (wt + 1) * wa - hg = (ht + 1) * ha - else: - - lg = torch.exp(lt) * la - wg = torch.exp(wt) * wa - hg = torch.exp(ht) * ha - ret.extend([wg, lg, hg]) - - if encode_angle_to_vector: - rax = torch.cos(ra) - ray = torch.sin(ra) - rgx = rtx + rax - rgy = rty + ray - rg = torch.atan2(rgy, rgx) - else: - rg = rt + ra - - if box_ndim > 7: - if norm_velo: - vxg = vxt * diagonal + vxa - vyg = vyt * diagonal + vya - else: - vxg = vxt + vxa - vyg = vyt + vya - ret.extend([vxg, vyg]) - - ret.append(rg) - - return torch.cat(ret, dim=-1) - - -def bev_box_encode(boxes, anchors, encode_angle_to_vector=False, smooth_dim=False): - """box encode for VoxelNet - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, l, w, h, r - anchors ([N, 7] Tensor): anchors - """ - xa, ya, wa, la, ra = torch.split(anchors, 1, dim=-1) - xg, yg, wg, lg, rg = torch.split(boxes, 1, dim=-1) - diagonal = torch.sqrt(la ** 2 + wa ** 2) - xt = (xg - xa) / diagonal - yt = (yg - ya) / diagonal - if smooth_dim: - lt = lg / la - 1 - 
wt = wg / wa - 1 - else: - lt = torch.log(lg / la) - wt = torch.log(wg / wa) - if encode_angle_to_vector: - rgx = torch.cos(rg) - rgy = torch.sin(rg) - rax = torch.cos(ra) - ray = torch.sin(ra) - rtx = rgx - rax - rty = rgy - ray - return torch.cat([xt, yt, wt, lt, rtx, rty], dim=-1) - else: - rt = rg - ra - return torch.cat([xt, yt, wt, lt, rt], dim=-1) - - # rt = rg - ra - # return torch.cat([xt, yt, zt, wt, lt, ht, rt], dim=-1) - - -def bev_box_decode( - box_encodings, anchors, encode_angle_to_vector=False, smooth_dim=False -): - """box decode for VoxelNet in lidar - Args: - boxes ([N, 7] Tensor): normal boxes: x, y, z, w, l, h, r - anchors ([N, 7] Tensor): anchors - """ - xa, ya, wa, la, ra = torch.split(anchors, 1, dim=-1) - if encode_angle_to_vector: - xt, yt, wt, lt, rtx, rty = torch.split(box_encodings, 1, dim=-1) - - else: - xt, yt, wt, lt, rt = torch.split(box_encodings, 1, dim=-1) - - # xt, yt, zt, wt, lt, ht, rt = torch.split(box_encodings, 1, dim=-1) - diagonal = torch.sqrt(la ** 2 + wa ** 2) - xg = xt * diagonal + xa - yg = yt * diagonal + ya - if smooth_dim: - lg = (lt + 1) * la - wg = (wt + 1) * wa - else: - lg = torch.exp(lt) * la - wg = torch.exp(wt) * wa - if encode_angle_to_vector: - rax = torch.cos(ra) - ray = torch.sin(ra) - rgx = rtx + rax - rgy = rty + ray - rg = torch.atan2(rgy, rgx) - else: - rg = rt + ra - return torch.cat([xg, yg, wg, lg, rg], dim=-1) - - def corners_nd(dims, origin=0.5): """generate relative box corners based on length per dim and origin point. @@ -318,33 +121,25 @@ def rotation_3d_in_axis(points, angles, axis=0): # print(points.shape, rot_mat_T.shape) return torch.einsum("aij,jka->aik", points, rot_mat_T) - -# def rotation_points_single_angle(points, angle, axis=0): -# # points: [N, 3] -# rot_sin = math.sin(angle) -# rot_cos = math.cos(angle) -# point_type = torchplus.get_tensor_class(points) -# if axis == 1: -# rot_mat_T = torch.stack([ -# point_type([rot_cos, 0, -rot_sin]), -# point_type([0, 1, 0]), -# point_type([rot_sin, 0, rot_cos]) -# ]) -# elif axis == 2 or axis == -1: -# rot_mat_T = torch.stack([ -# point_type([rot_cos, -rot_sin, 0]), -# point_type([rot_sin, rot_cos, 0]), -# point_type([0, 0, 1]) -# ]) -# elif axis == 0: -# rot_mat_T = torch.stack([ -# point_type([1, 0, 0]), -# point_type([0, rot_cos, -rot_sin]), -# point_type([0, rot_sin, rot_cos]) -# ]) -# else: -# raise ValueError("axis should in range") -# return points @ rot_mat_T +def rotate_points_along_z(points, angle): + """ + Args: + points: (B, N, 3 + C) + angle: (B), angle along z-axis, angle increases x ==> y + Returns: + """ + cosa = torch.cos(angle) + sina = torch.sin(angle) + zeros = angle.new_zeros(points.shape[0]) + ones = angle.new_ones(points.shape[0]) + rot_matrix = torch.stack(( + cosa, -sina, zeros, + sina, cosa, zeros, + zeros, zeros, ones + ), dim=1).view(-1, 3, 3).float() + points_rot = torch.matmul(points[:, :, 0:3], rot_matrix) + points_rot = torch.cat((points_rot, points[:, :, 3:]), dim=-1) + return points_rot def rotation_2d(points, angles): @@ -450,114 +245,17 @@ def box_lidar_to_camera(data, r_rect, velo2cam): return torch.cat([xyz, l, h, w, r], dim=-1) -def multiclass_nms( - nms_func, - boxes, - scores, - num_class, - pre_max_size=None, - post_max_size=None, - score_thresh=0.0, - iou_threshold=0.5, -): - # only output [selected] * num_class, please slice by your self - selected_per_class = [] - assert len(boxes.shape) == 3, "bbox must have shape [N, num_cls, 7]" - assert len(scores.shape) == 2, "score must have shape [N, num_cls]" - num_class = 
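The newly added `rotate_points_along_z` (replacing the long-commented-out single-angle helper) rotates a batch of point sets about the z-axis, one yaw angle per batch element, and passes any extra per-point feature columns through untouched. A small usage sketch, assuming the function is importable from `det3d.core.bbox.box_torch_ops` as added in this hunk:

```python
import math
import torch
from det3d.core.bbox.box_torch_ops import rotate_points_along_z

# Two "frames" of three points each: xyz plus one extra feature column.
points = torch.tensor([[[1.0, 0.0, 0.0, 0.9],
                        [0.0, 1.0, 0.0, 0.9],
                        [1.0, 1.0, 2.0, 0.9]]]).repeat(2, 1, 1)    # (B=2, N=3, 3+C)
angles = torch.tensor([0.0, math.pi / 2])                          # one yaw per frame

rotated = rotate_points_along_z(points, angles)
assert rotated.shape == points.shape
# Frame 0 (zero rotation) is unchanged; only x/y of frame 1 move, and the
# trailing feature column is passed through unmodified for both frames.
assert torch.allclose(rotated[0], points[0], atol=1e-6)
assert torch.allclose(rotated[..., 3], points[..., 3])
```

Presumably this supports the new second-stage refinement, which needs per-box point features expressed in a yaw-aligned frame.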
scores.shape[1] - if not (boxes.shape[1] == scores.shape[1] or boxes.shape[1] == 1): - raise ValueError( - "second dimension of boxes must be either 1 or equal " - "to the second dimension of scores" - ) - num_boxes = boxes.shape[0] - num_scores = scores.shape[0] - num_classes = scores.shape[1] - boxes_ids = range(num_classes) if boxes.shape[1] > 1 else [0] * num_classes - for class_idx, boxes_idx in zip(range(num_classes), boxes_ids): - # for class_idx in range(1, num_class): - class_scores = scores[:, class_idx] - class_boxes = boxes[:, boxes_idx] - if score_thresh > 0.0: - class_scores_keep = torch.nonzero(class_scores >= score_thresh) - if class_scores_keep.shape[0] != 0: - class_scores_keep = class_scores_keep[:, 0] - else: - selected_per_class.append(None) - continue - class_scores = class_scores[class_scores_keep] - if class_scores.shape[0] != 0: - if score_thresh > 0.0: - class_boxes = class_boxes[class_scores_keep] - keep = nms_func( - class_boxes, class_scores, pre_max_size, post_max_size, iou_threshold - ) - if keep is not None: - if score_thresh > 0.0: - selected_per_class.append(class_scores_keep[keep]) - else: - selected_per_class.append(keep) - else: - selected_per_class.append(None) - else: - selected_per_class.append(None) - return selected_per_class - - -def nms(bboxes, scores, pre_max_size=None, post_max_size=None, iou_threshold=0.5): - if pre_max_size is not None: - num_keeped_scores = scores.shape[0] - pre_max_size = min(num_keeped_scores, pre_max_size) - scores, indices = torch.topk(scores, k=pre_max_size) - bboxes = bboxes[indices] - dets = torch.cat([bboxes, scores.unsqueeze(-1)], dim=1) - dets_np = dets.data.cpu().numpy() - if len(dets_np) == 0: - keep = np.array([], dtype=np.int64) - else: - ret = np.array(nms_gpu(dets_np, iou_threshold), dtype=np.int64) - keep = ret[:post_max_size] - if keep.shape[0] == 0: - return torch.zeros([0]).long().to(bboxes.device) - if pre_max_size is not None: - keep = torch.from_numpy(keep).long().to(bboxes.device) - return indices[keep] - else: - return torch.from_numpy(keep).long().to(bboxes.device) - - -def rotate_nms( - rbboxes, scores, pre_max_size=None, post_max_size=None, iou_threshold=0.5 -): - if pre_max_size is not None: - num_keeped_scores = scores.shape[0] - pre_max_size = min(num_keeped_scores, pre_max_size) - scores, indices = torch.topk(scores, k=pre_max_size) - rbboxes = rbboxes[indices] - dets = torch.cat([rbboxes, scores.unsqueeze(-1)], dim=1) - dets_np = dets.data.cpu().numpy() - if len(dets_np) == 0: - keep = np.array([], dtype=np.int64) - else: - ret = np.array(rotate_nms_cc(dets_np, iou_threshold), dtype=np.int64) - keep = ret[:post_max_size] - if keep.shape[0] == 0: - return torch.zeros([0]).long().to(rbboxes.device) - if pre_max_size is not None: - keep = torch.from_numpy(keep).long().to(rbboxes.device) - return indices[keep] - else: - return torch.from_numpy(keep).long().to(rbboxes.device) - - def rotate_nms_pcdet(boxes, scores, thresh, pre_maxsize=None, post_max_size=None): """ - :param boxes: (N, 5) [x1, y1, x2, y2, ry] + :param boxes: (N, 5) [x, y, z, l, w, h, theta] :param scores: (N) :param thresh: :return: """ - # areas = (x2 - x1) * (y2 - y1) + # transform back to pcdet's coordinate + boxes = boxes[:, [0, 1, 2, 4, 3, 5, -1]] + boxes[:, -1] = -boxes[:, -1] - np.pi /2 + order = scores.sort(0, descending=True)[1] if pre_maxsize is not None: order = order[:pre_maxsize] @@ -565,27 +263,15 @@ def rotate_nms_pcdet(boxes, scores, thresh, pre_maxsize=None, post_max_size=None boxes = boxes[order].contiguous() keep 
= torch.LongTensor(boxes.size(0)) - num_out = iou3d_nms_cuda.nms_gpu(boxes, keep, thresh) + + if len(boxes) == 0: + num_out =0 + else: + num_out = iou3d_nms_cuda.nms_gpu(boxes, keep, thresh) + selected = order[keep[:num_out].cuda()].contiguous() if post_max_size is not None: selected = selected[:post_max_size] - return selected - - -def boxes3d_to_bevboxes_lidar_torch(boxes3d): - """ - :param boxes3d: (N, 7) [x, y, z, w, l, h, ry] in LiDAR coords - :return: - boxes_bev: (N, 5) [x1, y1, x2, y2, ry] - """ - boxes_bev = boxes3d.new(torch.Size((boxes3d.shape[0], 5))) - - cu, cv = boxes3d[:, 0], boxes3d[:, 1] - - half_w, half_l = boxes3d[:, 3] / 2, boxes3d[:, 4] / 2 - boxes_bev[:, 0], boxes_bev[:, 1] = cu - half_w, cv - half_l - boxes_bev[:, 2], boxes_bev[:, 3] = cu + half_w, cv + half_l - boxes_bev[:, 4] = boxes3d[:, -1] - return boxes_bev + return selected \ No newline at end of file diff --git a/det3d/core/bbox/iou.py b/det3d/core/bbox/iou.py deleted file mode 100644 index 9a2c038..0000000 --- a/det3d/core/bbox/iou.py +++ /dev/null @@ -1,64 +0,0 @@ -import torch - - -def bbox_overlaps(bboxes1, bboxes2, mode="iou", is_aligned=False): - """Calculate overlap between two set of bboxes. - If ``is_aligned`` is ``False``, then calculate the ious between each bbox - of bboxes1 and bboxes2, otherwise the ious between each aligned pair of - bboxes1 and bboxes2. - Args: - bboxes1 (Tensor): shape (m, 4) - bboxes2 (Tensor): shape (n, 4), if is_aligned is ``True``, then m and n - must be equal. - mode (str): "iou" (intersection over union) or iof (intersection over - foreground). - Returns: - ious(Tensor): shape (m, n) if is_aligned == False else shape (m, 1) - """ - - assert mode in ["iou", "iof"] - - rows = bboxes1.size(0) - cols = bboxes2.size(0) - if is_aligned: - assert rows == cols - - if rows * cols == 0: - return bboxes1.new(rows, 1) if is_aligned else bboxes1.new(rows, cols) - - if is_aligned: - lt = torch.max(bboxes1[:, :2], bboxes2[:, :2]) # [rows, 2] - rb = torch.min(bboxes1[:, 2:], bboxes2[:, 2:]) # [rows, 2] - - wh = (rb - lt + 1).clamp(min=0) # [rows, 2] - overlap = wh[:, 0] * wh[:, 1] - area1 = (bboxes1[:, 2] - bboxes1[:, 0] + 1) * ( - bboxes1[:, 3] - bboxes1[:, 1] + 1 - ) - - if mode == "iou": - area2 = (bboxes2[:, 2] - bboxes2[:, 0] + 1) * ( - bboxes2[:, 3] - bboxes2[:, 1] + 1 - ) - ious = overlap / (area1 + area2 - overlap) - else: - ious = overlap / area1 - else: - lt = torch.max(bboxes1[:, None, :2], bboxes2[:, :2]) # [rows, cols, 2] - rb = torch.min(bboxes1[:, None, 2:], bboxes2[:, 2:]) # [rows, cols, 2] - - wh = (rb - lt + 1).clamp(min=0) # [rows, cols, 2] - overlap = wh[:, :, 0] * wh[:, :, 1] - area1 = (bboxes1[:, 2] - bboxes1[:, 0] + 1) * ( - bboxes1[:, 3] - bboxes1[:, 1] + 1 - ) - - if mode == "iou": - area2 = (bboxes2[:, 2] - bboxes2[:, 0] + 1) * ( - bboxes2[:, 3] - bboxes2[:, 1] + 1 - ) - ious = overlap / (area1[:, None] + area2 - overlap) - else: - ious = overlap / (area1[:, None]) - - return ious diff --git a/det3d/core/bbox/region_similarity.py b/det3d/core/bbox/region_similarity.py deleted file mode 100644 index eb40da9..0000000 --- a/det3d/core/bbox/region_similarity.py +++ /dev/null @@ -1,124 +0,0 @@ -# Copyright 2017 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
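The `rotate_nms_pcdet` change a little above does two things: it guards against an empty box tensor before calling the CUDA kernel (previously a zero-length input would still hit `iou3d_nms_cuda.nms_gpu`), and it swaps the width/length columns and remaps the yaw so the boxes match the convention the pcdet-derived kernel expects. A call-site sketch (box values are made up, and it assumes the `iou3d_nms` CUDA extension is built and a GPU is available):

```python
import torch
from det3d.core.bbox.box_torch_ops import rotate_nms_pcdet

# N x 7 rotated boxes plus a score per box, already on the GPU.
boxes = torch.tensor([[0.0, 0.0, 0.0, 1.8, 4.5, 1.6, 0.00],
                      [0.1, 0.1, 0.0, 1.8, 4.5, 1.6, 0.05],    # near-duplicate of box 0
                      [10.0, 0.0, 0.0, 1.8, 4.5, 1.6, 1.50]]).cuda()
scores = torch.tensor([0.9, 0.6, 0.8]).cuda()

# Suppress overlaps above IoU 0.2 and keep at most 83 boxes (numbers are
# illustrative only; the real thresholds live in the config files).
keep = rotate_nms_pcdet(boxes, scores, thresh=0.2, post_max_size=83)
print(keep)   # indices into `boxes`, best-scoring survivors first
```

Note that the updated docstring still says `(N, 5)` while listing seven fields; judging from the indexing on the line below it, the function actually expects full 7-dim boxes.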
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""Region Similarity Calculators for BoxLists. - -Region Similarity Calculators compare a pairwise measure of similarity -between the boxes in two BoxLists. -""" -from abc import ABCMeta, abstractmethod - -from det3d.core.bbox import box_np_ops - - -class RegionSimilarityCalculator(object): - """Abstract base class for 2d region similarity calculator.""" - - __metaclass__ = ABCMeta - - def compare(self, boxes1, boxes2): - """Computes matrix of pairwise similarity between BoxLists. - - This op (to be overriden) computes a measure of pairwise similarity between - the boxes in the given BoxLists. Higher values indicate more similarity. - - Note that this method simply measures similarity and does not explicitly - perform a matching. - - Args: - boxes1: [N, 5] [x,y,w,l,r] tensor. - boxes2: [M, 5] [x,y,w,l,r] tensor. - - Returns: - a (float32) tensor of shape [N, M] with pairwise similarity score. - """ - return self._compare(boxes1, boxes2) - - @abstractmethod - def _compare(self, boxes1, boxes2): - pass - - -class RotateIouSimilarity(RegionSimilarityCalculator): - """Class to compute similarity based on Intersection over Union (IOU) metric. - - This class computes pairwise similarity between two BoxLists based on IOU. - """ - - def _compare(self, boxes1, boxes2): - """Compute pairwise IOU similarity between the two BoxLists. - - Args: - boxlist1: BoxList holding N boxes. - boxlist2: BoxList holding M boxes. - - Returns: - A tensor with shape [N, M] representing pairwise iou scores. - """ - - return box_np_ops.riou_cc(boxes1, boxes2) - - -class NearestIouSimilarity(RegionSimilarityCalculator): - """Class to compute similarity based on the squared distance metric. - - This class computes pairwise similarity between two BoxLists based on the - negative squared distance metric. - """ - - def _compare(self, boxes1, boxes2): - """Compute matrix of (negated) sq distances. - - Args: - boxlist1: BoxList holding N boxes. - boxlist2: BoxList holding M boxes. - - Returns: - A tensor with shape [N, M] representing negated pairwise squared distance. - """ - boxes1_bv = box_np_ops.rbbox2d_to_near_bbox(boxes1) - boxes2_bv = box_np_ops.rbbox2d_to_near_bbox(boxes2) - ret = box_np_ops.iou_jit(boxes1_bv, boxes2_bv, eps=0.0) - return ret - - -class DistanceSimilarity(RegionSimilarityCalculator): - """Class to compute similarity based on Intersection over Area (IOA) metric. - - This class computes pairwise similarity between two BoxLists based on their - pairwise intersections divided by the areas of second BoxLists. - """ - - def __init__(self, distance_norm, with_rotation=False, rotation_alpha=0.5): - self._distance_norm = distance_norm - self._with_rotation = with_rotation - self._rotation_alpha = rotation_alpha - - def _compare(self, boxes1, boxes2): - """Compute matrix of (negated) sq distances. - - Args: - boxlist1: BoxList holding N boxes. - boxlist2: BoxList holding M boxes. - - Returns: - A tensor with shape [N, M] representing negated pairwise squared distance. 
- """ - return box_np_ops.distance_similarity( - boxes1[..., [0, 1, -1]], - boxes2[..., [0, 1, -1]], - dist_norm=self._distance_norm, - with_rotation=self._with_rotation, - rot_alpha=self._rotation_alpha, - ) diff --git a/det3d/core/evaluation/__init__.py b/det3d/core/evaluation/__init__.py deleted file mode 100644 index b9c7374..0000000 --- a/det3d/core/evaluation/__init__.py +++ /dev/null @@ -1,30 +0,0 @@ -from .class_names import ( - coco_classes, - dataset_aliases, - get_classes, - imagenet_det_classes, - imagenet_vid_classes, - voc_classes, -) -from .coco_utils import coco_eval, fast_eval_recall, results2json -from .mean_ap import average_precision, eval_map, print_map_summary -from .recall import eval_recalls, plot_iou_recall, plot_num_recall, print_recall_summary - -__all__ = [ - "voc_classes", - "imagenet_det_classes", - "imagenet_vid_classes", - "coco_classes", - "dataset_aliases", - "get_classes", - "coco_eval", - "fast_eval_recall", - "results2json", - "average_precision", - "eval_map", - "print_map_summary", - "eval_recalls", - "print_recall_summary", - "plot_num_recall", - "plot_iou_recall", -] diff --git a/det3d/core/evaluation/bbox_overlaps.py b/det3d/core/evaluation/bbox_overlaps.py deleted file mode 100644 index 8473c44..0000000 --- a/det3d/core/evaluation/bbox_overlaps.py +++ /dev/null @@ -1,48 +0,0 @@ -import numpy as np - - -def bbox_overlaps(bboxes1, bboxes2, mode="iou"): - """Calculate the ious between each bbox of bboxes1 and bboxes2. - - Args: - bboxes1(ndarray): shape (n, 4) - bboxes2(ndarray): shape (k, 4) - mode(str): iou (intersection over union) or iof (intersection - over foreground) - - Returns: - ious(ndarray): shape (n, k) - """ - - assert mode in ["iou", "iof"] - - bboxes1 = bboxes1.astype(np.float32) - bboxes2 = bboxes2.astype(np.float32) - rows = bboxes1.shape[0] - cols = bboxes2.shape[0] - ious = np.zeros((rows, cols), dtype=np.float32) - if rows * cols == 0: - return ious - exchange = False - if bboxes1.shape[0] > bboxes2.shape[0]: - bboxes1, bboxes2 = bboxes2, bboxes1 - ious = np.zeros((cols, rows), dtype=np.float32) - exchange = True - area1 = (bboxes1[:, 2] - bboxes1[:, 0] + 1) * (bboxes1[:, 3] - bboxes1[:, 1] + 1) - area2 = (bboxes2[:, 2] - bboxes2[:, 0] + 1) * (bboxes2[:, 3] - bboxes2[:, 1] + 1) - for i in range(bboxes1.shape[0]): - x_start = np.maximum(bboxes1[i, 0], bboxes2[:, 0]) - y_start = np.maximum(bboxes1[i, 1], bboxes2[:, 1]) - x_end = np.minimum(bboxes1[i, 2], bboxes2[:, 2]) - y_end = np.minimum(bboxes1[i, 3], bboxes2[:, 3]) - overlap = np.maximum(x_end - x_start + 1, 0) * np.maximum( - y_end - y_start + 1, 0 - ) - if mode == "iou": - union = area1[i] + area2 - overlap - else: - union = area1[i] if not exchange else area2 - ious[i, :] = overlap / union - if exchange: - ious = ious.T - return ious diff --git a/det3d/core/evaluation/class_names.py b/det3d/core/evaluation/class_names.py deleted file mode 100644 index 96dd90b..0000000 --- a/det3d/core/evaluation/class_names.py +++ /dev/null @@ -1,386 +0,0 @@ -from det3d import torchie - - -def wider_face_classes(): - return ["face"] - - -def voc_classes(): - return [ - "aeroplane", - "bicycle", - "bird", - "boat", - "bottle", - "bus", - "car", - "cat", - "chair", - "cow", - "diningtable", - "dog", - "horse", - "motorbike", - "person", - "pottedplant", - "sheep", - "sofa", - "train", - "tvmonitor", - ] - - -def imagenet_det_classes(): - return [ - "accordion", - "airplane", - "ant", - "antelope", - "apple", - "armadillo", - "artichoke", - "axe", - "baby_bed", - "backpack", - "bagel", - 
"balance_beam", - "banana", - "band_aid", - "banjo", - "baseball", - "basketball", - "bathing_cap", - "beaker", - "bear", - "bee", - "bell_pepper", - "bench", - "bicycle", - "binder", - "bird", - "bookshelf", - "bow_tie", - "bow", - "bowl", - "brassiere", - "burrito", - "bus", - "butterfly", - "camel", - "can_opener", - "car", - "cart", - "cattle", - "cello", - "centipede", - "chain_saw", - "chair", - "chime", - "cocktail_shaker", - "coffee_maker", - "computer_keyboard", - "computer_mouse", - "corkscrew", - "cream", - "croquet_ball", - "crutch", - "cucumber", - "cup_or_mug", - "diaper", - "digital_clock", - "dishwasher", - "dog", - "domestic_cat", - "dragonfly", - "drum", - "dumbbell", - "electric_fan", - "elephant", - "face_powder", - "fig", - "filing_cabinet", - "flower_pot", - "flute", - "fox", - "french_horn", - "frog", - "frying_pan", - "giant_panda", - "goldfish", - "golf_ball", - "golfcart", - "guacamole", - "guitar", - "hair_dryer", - "hair_spray", - "hamburger", - "hammer", - "hamster", - "harmonica", - "harp", - "hat_with_a_wide_brim", - "head_cabbage", - "helmet", - "hippopotamus", - "horizontal_bar", - "horse", - "hotdog", - "iPod", - "isopod", - "jellyfish", - "koala_bear", - "ladle", - "ladybug", - "lamp", - "laptop", - "lemon", - "lion", - "lipstick", - "lizard", - "lobster", - "maillot", - "maraca", - "microphone", - "microwave", - "milk_can", - "miniskirt", - "monkey", - "motorcycle", - "mushroom", - "nail", - "neck_brace", - "oboe", - "orange", - "otter", - "pencil_box", - "pencil_sharpener", - "perfume", - "person", - "piano", - "pineapple", - "ping-pong_ball", - "pitcher", - "pizza", - "plastic_bag", - "plate_rack", - "pomegranate", - "popsicle", - "porcupine", - "power_drill", - "pretzel", - "printer", - "puck", - "punching_bag", - "purse", - "rabbit", - "racket", - "ray", - "red_panda", - "refrigerator", - "remote_control", - "rubber_eraser", - "rugby_ball", - "ruler", - "salt_or_pepper_shaker", - "saxophone", - "scorpion", - "screwdriver", - "seal", - "sheep", - "ski", - "skunk", - "snail", - "snake", - "snowmobile", - "snowplow", - "soap_dispenser", - "soccer_ball", - "sofa", - "spatula", - "squirrel", - "starfish", - "stethoscope", - "stove", - "strainer", - "strawberry", - "stretcher", - "sunglasses", - "swimming_trunks", - "swine", - "syringe", - "table", - "tape_player", - "tennis_ball", - "tick", - "tie", - "tiger", - "toaster", - "traffic_light", - "train", - "trombone", - "trumpet", - "turtle", - "tv_or_monitor", - "unicycle", - "vacuum", - "violin", - "volleyball", - "waffle_iron", - "washer", - "water_bottle", - "watercraft", - "whale", - "wine_bottle", - "zebra", - ] - - -def imagenet_vid_classes(): - return [ - "airplane", - "antelope", - "bear", - "bicycle", - "bird", - "bus", - "car", - "cattle", - "dog", - "domestic_cat", - "elephant", - "fox", - "giant_panda", - "hamster", - "horse", - "lion", - "lizard", - "monkey", - "motorcycle", - "rabbit", - "red_panda", - "sheep", - "snake", - "squirrel", - "tiger", - "train", - "turtle", - "watercraft", - "whale", - "zebra", - ] - - -def coco_classes(): - return [ - "person", - "bicycle", - "car", - "motorcycle", - "airplane", - "bus", - "train", - "truck", - "boat", - "traffic_light", - "fire_hydrant", - "stop_sign", - "parking_meter", - "bench", - "bird", - "cat", - "dog", - "horse", - "sheep", - "cow", - "elephant", - "bear", - "zebra", - "giraffe", - "backpack", - "umbrella", - "handbag", - "tie", - "suitcase", - "frisbee", - "skis", - "snowboard", - "sports_ball", - "kite", - "baseball_bat", - 
"baseball_glove", - "skateboard", - "surfboard", - "tennis_racket", - "bottle", - "wine_glass", - "cup", - "fork", - "knife", - "spoon", - "bowl", - "banana", - "apple", - "sandwich", - "orange", - "broccoli", - "carrot", - "hot_dog", - "pizza", - "donut", - "cake", - "chair", - "couch", - "potted_plant", - "bed", - "dining_table", - "toilet", - "tv", - "laptop", - "mouse", - "remote", - "keyboard", - "cell_phone", - "microwave", - "oven", - "toaster", - "sink", - "refrigerator", - "book", - "clock", - "vase", - "scissors", - "teddy_bear", - "hair_drier", - "toothbrush", - ] - - -def cityscapes_classes(): - return ["person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"] - - -dataset_aliases = { - "voc": ["voc", "pascal_voc", "voc07", "voc12"], - "imagenet_det": ["det", "imagenet_det", "ilsvrc_det"], - "imagenet_vid": ["vid", "imagenet_vid", "ilsvrc_vid"], - "coco": ["coco", "mscoco", "ms_coco"], - "wider_face": ["WIDERFaceDataset", "wider_face", "WDIERFace"], - "cityscapes": ["cityscapes"], -} - - -def get_classes(dataset): - """Get class names of a dataset.""" - alias2name = {} - for name, aliases in dataset_aliases.items(): - for alias in aliases: - alias2name[alias] = name - - if torchie.is_str(dataset): - if dataset in alias2name: - labels = eval(alias2name[dataset] + "_classes()") - else: - raise ValueError("Unrecognized dataset: {}".format(dataset)) - else: - raise TypeError("dataset must a str, but got {}".format(type(dataset))) - return labels diff --git a/det3d/core/evaluation/coco_utils.py b/det3d/core/evaluation/coco_utils.py deleted file mode 100644 index 100856e..0000000 --- a/det3d/core/evaluation/coco_utils.py +++ /dev/null @@ -1,173 +0,0 @@ -import numpy as np -from det3d import torchie -from pycocotools.coco import COCO -from pycocotools.cocoeval import COCOeval - -from .recall import eval_recalls - - -def coco_eval(result_files, result_types, coco, max_dets=(100, 300, 1000)): - for res_type in result_types: - assert res_type in ["proposal", "proposal_fast", "bbox", "segm", "keypoints"] - - if mmcv.is_str(coco): - coco = COCO(coco) - assert isinstance(coco, COCO) - - if result_types == ["proposal_fast"]: - ar = fast_eval_recall(result_files, coco, np.array(max_dets)) - for i, num in enumerate(max_dets): - print("AR@{}\t= {:.4f}".format(num, ar[i])) - return - - for res_type in result_types: - result_file = result_files[res_type] - assert result_file.endswith(".json") - - coco_dets = coco.loadRes(result_file) - img_ids = coco.getImgIds() - iou_type = "bbox" if res_type == "proposal" else res_type - cocoEval = COCOeval(coco, coco_dets, iou_type) - cocoEval.params.imgIds = img_ids - if res_type == "proposal": - cocoEval.params.useCats = 0 - cocoEval.params.maxDets = list(max_dets) - cocoEval.evaluate() - cocoEval.accumulate() - cocoEval.summarize() - - -def fast_eval_recall(results, coco, max_dets, iou_thrs=np.arange(0.5, 0.96, 0.05)): - if mmcv.is_str(results): - assert results.endswith(".pkl") - results = mmcv.load(results) - elif not isinstance(results, list): - raise TypeError( - "results must be a list of numpy arrays or a filename, not {}".format( - type(results) - ) - ) - - gt_bboxes = [] - img_ids = coco.getImgIds() - for i in range(len(img_ids)): - ann_ids = coco.getAnnIds(imgIds=img_ids[i]) - ann_info = coco.loadAnns(ann_ids) - if len(ann_info) == 0: - gt_bboxes.append(np.zeros((0, 4))) - continue - bboxes = [] - for ann in ann_info: - if ann.get("ignore", False) or ann["iscrowd"]: - continue - x1, y1, w, h = ann["bbox"] - bboxes.append([x1, y1, 
x1 + w - 1, y1 + h - 1]) - bboxes = np.array(bboxes, dtype=np.float32) - if bboxes.shape[0] == 0: - bboxes = np.zeros((0, 4)) - gt_bboxes.append(bboxes) - - recalls = eval_recalls(gt_bboxes, results, max_dets, iou_thrs, print_summary=False) - ar = recalls.mean(axis=1) - return ar - - -def xyxy2xywh(bbox): - _bbox = bbox.tolist() - return [ - _bbox[0], - _bbox[1], - _bbox[2] - _bbox[0] + 1, - _bbox[3] - _bbox[1] + 1, - ] - - -def proposal2json(dataset, results): - json_results = [] - for idx in range(len(dataset)): - img_id = dataset.img_ids[idx] - bboxes = results[idx] - for i in range(bboxes.shape[0]): - data = dict() - data["image_id"] = img_id - data["bbox"] = xyxy2xywh(bboxes[i]) - data["score"] = float(bboxes[i][4]) - data["category_id"] = 1 - json_results.append(data) - return json_results - - -def det2json(dataset, results): - json_results = [] - for idx in range(len(dataset)): - img_id = dataset.img_ids[idx] - result = results[idx] - for label in range(len(result)): - bboxes = result[label] - for i in range(bboxes.shape[0]): - data = dict() - data["image_id"] = img_id - data["bbox"] = xyxy2xywh(bboxes[i]) - data["score"] = float(bboxes[i][4]) - data["category_id"] = dataset.cat_ids[label] - json_results.append(data) - return json_results - - -def segm2json(dataset, results): - bbox_json_results = [] - segm_json_results = [] - for idx in range(len(dataset)): - img_id = dataset.img_ids[idx] - det, seg = results[idx] - for label in range(len(det)): - # bbox results - bboxes = det[label] - for i in range(bboxes.shape[0]): - data = dict() - data["image_id"] = img_id - data["bbox"] = xyxy2xywh(bboxes[i]) - data["score"] = float(bboxes[i][4]) - data["category_id"] = dataset.cat_ids[label] - bbox_json_results.append(data) - - # segm results - # some detectors use different score for det and segm - if len(seg) == 2: - segms = seg[0][label] - mask_score = seg[1][label] - else: - segms = seg[label] - mask_score = [bbox[4] for bbox in bboxes] - for i in range(bboxes.shape[0]): - data = dict() - data["image_id"] = img_id - data["score"] = float(mask_score[i]) - data["category_id"] = dataset.cat_ids[label] - segms[i]["counts"] = segms[i]["counts"].decode() - data["segmentation"] = segms[i] - segm_json_results.append(data) - return bbox_json_results, segm_json_results - - -def results2json(dataset, results, out_file): - result_files = dict() - if isinstance(results[0], list): - json_results = det2json(dataset, results) - result_files["bbox"] = "{}.{}.json".format(out_file, "bbox") - result_files["proposal"] = "{}.{}.json".format(out_file, "bbox") - mmcv.dump(json_results, result_files["bbox"]) - elif isinstance(results[0], tuple): - json_results = segm2json(dataset, results) - result_files["bbox"] = "{}.{}.json".format(out_file, "bbox") - result_files["proposal"] = "{}.{}.json".format(out_file, "bbox") - result_files["segm"] = "{}.{}.json".format(out_file, "segm") - mmcv.dump(json_results[0], result_files["bbox"]) - mmcv.dump(json_results[1], result_files["segm"]) - elif isinstance(results[0], np.ndarray): - json_results = proposal2json(dataset, results) - result_files["proposal"] = "{}.{}.json".format(out_file, "proposal") - mmcv.dump(json_results, result_files["proposal"]) - else: - raise TypeError("invalid type of results") - return result_files diff --git a/det3d/core/evaluation/mean_ap.py b/det3d/core/evaluation/mean_ap.py deleted file mode 100644 index a7ade28..0000000 --- a/det3d/core/evaluation/mean_ap.py +++ /dev/null @@ -1,384 +0,0 @@ -import numpy as np -from det3d import torchie -from 
terminaltables import AsciiTable - -from .bbox_overlaps import bbox_overlaps -from .class_names import get_classes - - -def average_precision(recalls, precisions, mode="area"): - """Calculate average precision (for single or multiple scales). - - Args: - recalls (ndarray): shape (num_scales, num_dets) or (num_dets, ) - precisions (ndarray): shape (num_scales, num_dets) or (num_dets, ) - mode (str): 'area' or '11points', 'area' means calculating the area - under precision-recall curve, '11points' means calculating - the average precision of recalls at [0, 0.1, ..., 1] - - Returns: - float or ndarray: calculated average precision - """ - no_scale = False - if recalls.ndim == 1: - no_scale = True - recalls = recalls[np.newaxis, :] - precisions = precisions[np.newaxis, :] - assert recalls.shape == precisions.shape and recalls.ndim == 2 - num_scales = recalls.shape[0] - ap = np.zeros(num_scales, dtype=np.float32) - if mode == "area": - zeros = np.zeros((num_scales, 1), dtype=recalls.dtype) - ones = np.ones((num_scales, 1), dtype=recalls.dtype) - mrec = np.hstack((zeros, recalls, ones)) - mpre = np.hstack((zeros, precisions, zeros)) - for i in range(mpre.shape[1] - 1, 0, -1): - mpre[:, i - 1] = np.maximum(mpre[:, i - 1], mpre[:, i]) - for i in range(num_scales): - ind = np.where(mrec[i, 1:] != mrec[i, :-1])[0] - ap[i] = np.sum((mrec[i, ind + 1] - mrec[i, ind]) * mpre[i, ind + 1]) - elif mode == "11points": - for i in range(num_scales): - for thr in np.arange(0, 1 + 1e-3, 0.1): - precs = precisions[i, recalls[i, :] >= thr] - prec = precs.max() if precs.size > 0 else 0 - ap[i] += prec - ap /= 11 - else: - raise ValueError('Unrecognized mode, only "area" and "11points" are supported') - if no_scale: - ap = ap[0] - return ap - - -def tpfp_imagenet(det_bboxes, gt_bboxes, gt_ignore, default_iou_thr, area_ranges=None): - """Check if detected bboxes are true positive or false positive. - - Args: - det_bbox (ndarray): the detected bbox - gt_bboxes (ndarray): ground truth bboxes of this image - gt_ignore (ndarray): indicate if gts are ignored for evaluation or not - default_iou_thr (float): the iou thresholds for medium and large bboxes - area_ranges (list or None): gt bbox area ranges - - Returns: - tuple: two arrays (tp, fp) whose elements are 0 and 1 - """ - num_dets = det_bboxes.shape[0] - num_gts = gt_bboxes.shape[0] - if area_ranges is None: - area_ranges = [(None, None)] - num_scales = len(area_ranges) - # tp and fp are of shape (num_scales, num_gts), each row is tp or fp - # of a certain scale. - tp = np.zeros((num_scales, num_dets), dtype=np.float32) - fp = np.zeros((num_scales, num_dets), dtype=np.float32) - if gt_bboxes.shape[0] == 0: - if area_ranges == [(None, None)]: - fp[...] 
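`average_precision` in the deleted `mean_ap.py` computes AP (in its default "area" mode) by taking the monotone upper envelope of the precision curve and summing rectangle areas between distinct recall values. This 2D mAP code is being dropped here rather than changed, so the following worked example is purely for reference:

```python
import numpy as np

def average_precision_area(recalls, precisions):
    """AP in 'area' mode: precision envelope, then sum of recall-step rectangles."""
    mrec = np.concatenate(([0.0], recalls, [1.0]))
    mpre = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):        # right-to-left upper envelope
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]      # points where recall changes
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Three detections (sorted by score) against two ground-truth boxes:
# det 1 is a TP, det 2 a FP, det 3 a TP.
recalls = np.array([0.5, 0.5, 1.0])
precisions = np.array([1.0, 0.5, 2.0 / 3.0])
print(average_precision_area(recalls, precisions))   # 0.5 * 1.0 + 0.5 * (2/3) ≈ 0.833
```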
= 1 - else: - det_areas = (det_bboxes[:, 2] - det_bboxes[:, 0] + 1) * ( - det_bboxes[:, 3] - det_bboxes[:, 1] + 1 - ) - for i, (min_area, max_area) in enumerate(area_ranges): - fp[i, (det_areas >= min_area) & (det_areas < max_area)] = 1 - return tp, fp - ious = bbox_overlaps(det_bboxes, gt_bboxes - 1) - gt_w = gt_bboxes[:, 2] - gt_bboxes[:, 0] + 1 - gt_h = gt_bboxes[:, 3] - gt_bboxes[:, 1] + 1 - iou_thrs = np.minimum( - (gt_w * gt_h) / ((gt_w + 10.0) * (gt_h + 10.0)), default_iou_thr - ) - # sort all detections by scores in descending order - sort_inds = np.argsort(-det_bboxes[:, -1]) - for k, (min_area, max_area) in enumerate(area_ranges): - gt_covered = np.zeros(num_gts, dtype=bool) - # if no area range is specified, gt_area_ignore is all False - if min_area is None: - gt_area_ignore = np.zeros_like(gt_ignore, dtype=bool) - else: - gt_areas = gt_w * gt_h - gt_area_ignore = (gt_areas < min_area) | (gt_areas >= max_area) - for i in sort_inds: - max_iou = -1 - matched_gt = -1 - # find best overlapped available gt - for j in range(num_gts): - # different from PASCAL VOC: allow finding other gts if the - # best overlaped ones are already matched by other det bboxes - if gt_covered[j]: - continue - elif ious[i, j] >= iou_thrs[j] and ious[i, j] > max_iou: - max_iou = ious[i, j] - matched_gt = j - # there are 4 cases for a det bbox: - # 1. it matches a gt, tp = 1, fp = 0 - # 2. it matches an ignored gt, tp = 0, fp = 0 - # 3. it matches no gt and within area range, tp = 0, fp = 1 - # 4. it matches no gt but is beyond area range, tp = 0, fp = 0 - if matched_gt >= 0: - gt_covered[matched_gt] = 1 - if not (gt_ignore[matched_gt] or gt_area_ignore[matched_gt]): - tp[k, i] = 1 - elif min_area is None: - fp[k, i] = 1 - else: - bbox = det_bboxes[i, :4] - area = (bbox[2] - bbox[0] + 1) * (bbox[3] - bbox[1] + 1) - if area >= min_area and area < max_area: - fp[k, i] = 1 - return tp, fp - - -def tpfp_default(det_bboxes, gt_bboxes, gt_ignore, iou_thr, area_ranges=None): - """Check if detected bboxes are true positive or false positive. - - Args: - det_bbox (ndarray): the detected bbox - gt_bboxes (ndarray): ground truth bboxes of this image - gt_ignore (ndarray): indicate if gts are ignored for evaluation or not - iou_thr (float): the iou thresholds - - Returns: - tuple: (tp, fp), two arrays whose elements are 0 and 1 - """ - num_dets = det_bboxes.shape[0] - num_gts = gt_bboxes.shape[0] - if area_ranges is None: - area_ranges = [(None, None)] - num_scales = len(area_ranges) - # tp and fp are of shape (num_scales, num_gts), each row is tp or fp of - # a certain scale - tp = np.zeros((num_scales, num_dets), dtype=np.float32) - fp = np.zeros((num_scales, num_dets), dtype=np.float32) - # if there is no gt bboxes in this image, then all det bboxes - # within area range are false positives - if gt_bboxes.shape[0] == 0: - if area_ranges == [(None, None)]: - fp[...] 
= 1 - else: - det_areas = (det_bboxes[:, 2] - det_bboxes[:, 0] + 1) * ( - det_bboxes[:, 3] - det_bboxes[:, 1] + 1 - ) - for i, (min_area, max_area) in enumerate(area_ranges): - fp[i, (det_areas >= min_area) & (det_areas < max_area)] = 1 - return tp, fp - ious = bbox_overlaps(det_bboxes, gt_bboxes) - ious_max = ious.max(axis=1) - ious_argmax = ious.argmax(axis=1) - sort_inds = np.argsort(-det_bboxes[:, -1]) - for k, (min_area, max_area) in enumerate(area_ranges): - gt_covered = np.zeros(num_gts, dtype=bool) - # if no area range is specified, gt_area_ignore is all False - if min_area is None: - gt_area_ignore = np.zeros_like(gt_ignore, dtype=bool) - else: - gt_areas = (gt_bboxes[:, 2] - gt_bboxes[:, 0] + 1) * ( - gt_bboxes[:, 3] - gt_bboxes[:, 1] + 1 - ) - gt_area_ignore = (gt_areas < min_area) | (gt_areas >= max_area) - for i in sort_inds: - if ious_max[i] >= iou_thr: - matched_gt = ious_argmax[i] - if not (gt_ignore[matched_gt] or gt_area_ignore[matched_gt]): - if not gt_covered[matched_gt]: - gt_covered[matched_gt] = True - tp[k, i] = 1 - else: - fp[k, i] = 1 - # otherwise ignore this detected bbox, tp = 0, fp = 0 - elif min_area is None: - fp[k, i] = 1 - else: - bbox = det_bboxes[i, :4] - area = (bbox[2] - bbox[0] + 1) * (bbox[3] - bbox[1] + 1) - if area >= min_area and area < max_area: - fp[k, i] = 1 - return tp, fp - - -def get_cls_results(det_results, gt_bboxes, gt_labels, gt_ignore, class_id): - """Get det results and gt information of a certain class.""" - cls_dets = [det[class_id] for det in det_results] # det bboxes of this class - cls_gts = [] # gt bboxes of this class - cls_gt_ignore = [] - for j in range(len(gt_bboxes)): - gt_bbox = gt_bboxes[j] - cls_inds = gt_labels[j] == class_id + 1 - cls_gt = gt_bbox[cls_inds, :] if gt_bbox.shape[0] > 0 else gt_bbox - cls_gts.append(cls_gt) - if gt_ignore is None: - cls_gt_ignore.append(np.zeros(cls_gt.shape[0], dtype=np.int32)) - else: - cls_gt_ignore.append(gt_ignore[j][cls_inds]) - return cls_dets, cls_gts, cls_gt_ignore - - -def eval_map( - det_results, - gt_bboxes, - gt_labels, - gt_ignore=None, - scale_ranges=None, - iou_thr=0.5, - dataset=None, - print_summary=True, -): - """Evaluate mAP of a dataset. - - Args: - det_results (list): a list of list, [[cls1_det, cls2_det, ...], ...] - gt_bboxes (list): ground truth bboxes of each image, a list of K*4 - array. - gt_labels (list): ground truth labels of each image, a list of K array - gt_ignore (list): gt ignore indicators of each image, a list of K array - scale_ranges (list, optional): [(min1, max1), (min2, max2), ...] - iou_thr (float): IoU threshold - dataset (None or str or list): dataset name or dataset classes, there - are minor differences in metrics for different datsets, e.g. - "voc07", "imagenet_det", etc. 
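Reviewer note: for anyone sanity-checking what `mean_ap.py` did before signing off on its removal, the heart of the deleted `average_precision` is the standard area-under-the-PR-curve computation: pad the curve, make precision monotonically non-increasing from right to left, then sum precision over the recall increments. A minimal single-scale NumPy sketch of that logic (trimmed of the multi-scale handling in the original):

```python
import numpy as np

def average_precision_area(recalls, precisions):
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    mrec = np.concatenate(([0.0], recalls, [1.0]))
    mpre = np.concatenate(([0.0], precisions, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum precision over the segments where recall increases.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])

# toy check: a detector that is perfect up to 50% recall, then useless
print(average_precision_area(np.array([0.25, 0.5, 0.5]),
                             np.array([1.0, 1.0, 0.5])))  # -> 0.5
```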
- print_summary (bool): whether to print the mAP summary - - Returns: - tuple: (mAP, [dict, dict, ...]) - """ - assert len(det_results) == len(gt_bboxes) == len(gt_labels) - if gt_ignore is not None: - assert len(gt_ignore) == len(gt_labels) - for i in range(len(gt_ignore)): - assert len(gt_labels[i]) == len(gt_ignore[i]) - area_ranges = ( - [(rg[0] ** 2, rg[1] ** 2) for rg in scale_ranges] - if scale_ranges is not None - else None - ) - num_scales = len(scale_ranges) if scale_ranges is not None else 1 - eval_results = [] - num_classes = len(det_results[0]) # positive class num - gt_labels = [label if label.ndim == 1 else label[:, 0] for label in gt_labels] - for i in range(num_classes): - # get gt and det bboxes of this class - cls_dets, cls_gts, cls_gt_ignore = get_cls_results( - det_results, gt_bboxes, gt_labels, gt_ignore, i - ) - # calculate tp and fp for each image - tpfp_func = tpfp_imagenet if dataset in ["det", "vid"] else tpfp_default - tpfp = [ - tpfp_func(cls_dets[j], cls_gts[j], cls_gt_ignore[j], iou_thr, area_ranges) - for j in range(len(cls_dets)) - ] - tp, fp = tuple(zip(*tpfp)) - # calculate gt number of each scale, gts ignored or beyond scale - # are not counted - num_gts = np.zeros(num_scales, dtype=int) - for j, bbox in enumerate(cls_gts): - if area_ranges is None: - num_gts[0] += np.sum(np.logical_not(cls_gt_ignore[j])) - else: - gt_areas = (bbox[:, 2] - bbox[:, 0] + 1) * (bbox[:, 3] - bbox[:, 1] + 1) - for k, (min_area, max_area) in enumerate(area_ranges): - num_gts[k] += np.sum( - np.logical_not(cls_gt_ignore[j]) - & (gt_areas >= min_area) - & (gt_areas < max_area) - ) - # sort all det bboxes by score, also sort tp and fp - cls_dets = np.vstack(cls_dets) - num_dets = cls_dets.shape[0] - sort_inds = np.argsort(-cls_dets[:, -1]) - tp = np.hstack(tp)[:, sort_inds] - fp = np.hstack(fp)[:, sort_inds] - # calculate recall and precision with tp and fp - tp = np.cumsum(tp, axis=1) - fp = np.cumsum(fp, axis=1) - eps = np.finfo(np.float32).eps - recalls = tp / np.maximum(num_gts[:, np.newaxis], eps) - precisions = tp / np.maximum((tp + fp), eps) - # calculate AP - if scale_ranges is None: - recalls = recalls[0, :] - precisions = precisions[0, :] - num_gts = num_gts.item() - mode = "area" if dataset != "voc07" else "11points" - ap = average_precision(recalls, precisions, mode) - eval_results.append( - { - "num_gts": num_gts, - "num_dets": num_dets, - "recall": recalls, - "precision": precisions, - "ap": ap, - } - ) - if scale_ranges is not None: - # shape (num_classes, num_scales) - all_ap = np.vstack([cls_result["ap"] for cls_result in eval_results]) - all_num_gts = np.vstack([cls_result["num_gts"] for cls_result in eval_results]) - mean_ap = [] - for i in range(num_scales): - if np.any(all_num_gts[:, i] > 0): - mean_ap.append(all_ap[all_num_gts[:, i] > 0, i].mean()) - else: - mean_ap.append(0.0) - else: - aps = [] - for cls_result in eval_results: - if cls_result["num_gts"] > 0: - aps.append(cls_result["ap"]) - mean_ap = np.array(aps).mean().item() if aps else 0.0 - if print_summary: - print_map_summary(mean_ap, eval_results, dataset) - - return mean_ap, eval_results - - -def print_map_summary(mean_ap, results, dataset=None): - """Print mAP and results of each class. - - Args: - mean_ap(float): calculated from `eval_map` - results(list): calculated from `eval_map` - dataset(None or str or list): dataset name or dataset classes. 
- """ - num_scales = ( - len(results[0]["ap"]) if isinstance(results[0]["ap"], np.ndarray) else 1 - ) - num_classes = len(results) - - recalls = np.zeros((num_scales, num_classes), dtype=np.float32) - precisions = np.zeros((num_scales, num_classes), dtype=np.float32) - aps = np.zeros((num_scales, num_classes), dtype=np.float32) - num_gts = np.zeros((num_scales, num_classes), dtype=int) - for i, cls_result in enumerate(results): - if cls_result["recall"].size > 0: - recalls[:, i] = np.array(cls_result["recall"], ndmin=2)[:, -1] - precisions[:, i] = np.array(cls_result["precision"], ndmin=2)[:, -1] - aps[:, i] = cls_result["ap"] - num_gts[:, i] = cls_result["num_gts"] - - if dataset is None: - label_names = [str(i) for i in range(1, num_classes + 1)] - elif torchie.is_str(dataset): - label_names = get_classes(dataset) - else: - label_names = dataset - - if not isinstance(mean_ap, list): - mean_ap = [mean_ap] - header = ["class", "gts", "dets", "recall", "precision", "ap"] - for i in range(num_scales): - table_data = [header] - for j in range(num_classes): - row_data = [ - label_names[j], - num_gts[i, j], - results[j]["num_dets"], - "{:.3f}".format(recalls[i, j]), - "{:.3f}".format(precisions[i, j]), - "{:.3f}".format(aps[i, j]), - ] - table_data.append(row_data) - table_data.append(["mAP", "", "", "", "", "{:.3f}".format(mean_ap[i])]) - table = AsciiTable(table_data) - table.inner_footing_row_border = True - print(table.table) diff --git a/det3d/core/evaluation/recall.py b/det3d/core/evaluation/recall.py deleted file mode 100644 index 4b17eea..0000000 --- a/det3d/core/evaluation/recall.py +++ /dev/null @@ -1,178 +0,0 @@ -import numpy as np -from terminaltables import AsciiTable - -from .bbox_overlaps import bbox_overlaps - - -def _recalls(all_ious, proposal_nums, thrs): - - img_num = all_ious.shape[0] - total_gt_num = sum([ious.shape[0] for ious in all_ious]) - - _ious = np.zeros((proposal_nums.size, total_gt_num), dtype=np.float32) - for k, proposal_num in enumerate(proposal_nums): - tmp_ious = np.zeros(0) - for i in range(img_num): - ious = all_ious[i][:, :proposal_num].copy() - gt_ious = np.zeros((ious.shape[0])) - if ious.size == 0: - tmp_ious = np.hstack((tmp_ious, gt_ious)) - continue - for j in range(ious.shape[0]): - gt_max_overlaps = ious.argmax(axis=1) - max_ious = ious[np.arange(0, ious.shape[0]), gt_max_overlaps] - gt_idx = max_ious.argmax() - gt_ious[j] = max_ious[gt_idx] - box_idx = gt_max_overlaps[gt_idx] - ious[gt_idx, :] = -1 - ious[:, box_idx] = -1 - tmp_ious = np.hstack((tmp_ious, gt_ious)) - _ious[k, :] = tmp_ious - - _ious = np.fliplr(np.sort(_ious, axis=1)) - recalls = np.zeros((proposal_nums.size, thrs.size)) - for i, thr in enumerate(thrs): - recalls[:, i] = (_ious >= thr).sum(axis=1) / float(total_gt_num) - - return recalls - - -def set_recall_param(proposal_nums, iou_thrs): - """Check proposal_nums and iou_thrs and set correct format. - """ - if isinstance(proposal_nums, list): - _proposal_nums = np.array(proposal_nums) - elif isinstance(proposal_nums, int): - _proposal_nums = np.array([proposal_nums]) - else: - _proposal_nums = proposal_nums - - if iou_thrs is None: - _iou_thrs = np.array([0.5]) - elif isinstance(iou_thrs, list): - _iou_thrs = np.array(iou_thrs) - elif isinstance(iou_thrs, float): - _iou_thrs = np.array([iou_thrs]) - else: - _iou_thrs = iou_thrs - - return _proposal_nums, _iou_thrs - - -def eval_recalls(gts, proposals, proposal_nums=None, iou_thrs=None, print_summary=True): - """Calculate recalls. 
- - Args: - gts(list or ndarray): a list of arrays of shape (n, 4) - proposals(list or ndarray): a list of arrays of shape (k, 4) or (k, 5) - proposal_nums(int or list of int or ndarray): top N proposals - thrs(float or list or ndarray): iou thresholds - - Returns: - ndarray: recalls of different ious and proposal nums - """ - - img_num = len(gts) - assert img_num == len(proposals) - - proposal_nums, iou_thrs = set_recall_param(proposal_nums, iou_thrs) - - all_ious = [] - for i in range(img_num): - if proposals[i].ndim == 2 and proposals[i].shape[1] == 5: - scores = proposals[i][:, 4] - sort_idx = np.argsort(scores)[::-1] - img_proposal = proposals[i][sort_idx, :] - else: - img_proposal = proposals[i] - prop_num = min(img_proposal.shape[0], proposal_nums[-1]) - if gts[i] is None or gts[i].shape[0] == 0: - ious = np.zeros((0, img_proposal.shape[0]), dtype=np.float32) - else: - ious = bbox_overlaps(gts[i], img_proposal[:prop_num, :4]) - all_ious.append(ious) - all_ious = np.array(all_ious) - recalls = _recalls(all_ious, proposal_nums, iou_thrs) - if print_summary: - print_recall_summary(recalls, proposal_nums, iou_thrs) - return recalls - - -def print_recall_summary( - recalls, proposal_nums, iou_thrs, row_idxs=None, col_idxs=None -): - """Print recalls in a table. - - Args: - recalls(ndarray): calculated from `bbox_recalls` - proposal_nums(ndarray or list): top N proposals - iou_thrs(ndarray or list): iou thresholds - row_idxs(ndarray): which rows(proposal nums) to print - col_idxs(ndarray): which cols(iou thresholds) to print - """ - proposal_nums = np.array(proposal_nums, dtype=np.int32) - iou_thrs = np.array(iou_thrs) - if row_idxs is None: - row_idxs = np.arange(proposal_nums.size) - if col_idxs is None: - col_idxs = np.arange(iou_thrs.size) - row_header = [""] + iou_thrs[col_idxs].tolist() - table_data = [row_header] - for i, num in enumerate(proposal_nums[row_idxs]): - row = ["{:.3f}".format(val) for val in recalls[row_idxs[i], col_idxs].tolist()] - row.insert(0, num) - table_data.append(row) - table = AsciiTable(table_data) - print(table.table) - - -def plot_num_recall(recalls, proposal_nums): - """Plot Proposal_num-Recalls curve. - - Args: - recalls(ndarray or list): shape (k,) - proposal_nums(ndarray or list): same shape as `recalls` - """ - if isinstance(proposal_nums, np.ndarray): - _proposal_nums = proposal_nums.tolist() - else: - _proposal_nums = proposal_nums - if isinstance(recalls, np.ndarray): - _recalls = recalls.tolist() - else: - _recalls = recalls - - import matplotlib.pyplot as plt - - f = plt.figure() - plt.plot([0] + _proposal_nums, [0] + _recalls) - plt.xlabel("Proposal num") - plt.ylabel("Recall") - plt.axis([0, proposal_nums.max(), 0, 1]) - f.show() - - -def plot_iou_recall(recalls, iou_thrs): - """Plot IoU-Recalls curve. 
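Reviewer note: the removed `eval_recalls`/`_recalls` pair measured proposal recall, i.e. for each top-N proposal budget and IoU threshold, the fraction of ground-truth boxes covered. The sketch below keeps only the core idea and uses a per-GT max IoU instead of the greedy one-to-one matching the original performed, so treat it as an approximation:

```python
import numpy as np

def recall_at(ious_per_image, proposal_num, iou_thr):
    """ious_per_image: list of (num_gt, num_proposals) IoU matrices with
    proposals assumed sorted by score. Simplified: per-GT max IoU instead of
    the greedy one-to-one matching done in the removed _recalls()."""
    covered, total = 0, 0
    for ious in ious_per_image:
        total += ious.shape[0]
        if ious.size == 0:
            continue
        best = ious[:, :proposal_num].max(axis=1)
        covered += int((best >= iou_thr).sum())
    return covered / max(total, 1)

# toy: one image, 2 GT boxes, 3 proposals
ious = [np.array([[0.9, 0.2, 0.1],
                  [0.3, 0.6, 0.4]])]
print(recall_at(ious, proposal_num=3, iou_thr=0.5))  # -> 1.0
```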
- - Args: - recalls(ndarray or list): shape (k,) - iou_thrs(ndarray or list): same shape as `recalls` - """ - if isinstance(iou_thrs, np.ndarray): - _iou_thrs = iou_thrs.tolist() - else: - _iou_thrs = iou_thrs - if isinstance(recalls, np.ndarray): - _recalls = recalls.tolist() - else: - _recalls = recalls - - import matplotlib.pyplot as plt - - f = plt.figure() - plt.plot(_iou_thrs + [1.0], _recalls + [0.0]) - plt.xlabel("IoU") - plt.ylabel("Recall") - plt.axis([iou_thrs.min(), 1, 0, 1]) - f.show() diff --git a/det3d/core/fp16/__init__.py b/det3d/core/fp16/__init__.py deleted file mode 100644 index eb76323..0000000 --- a/det3d/core/fp16/__init__.py +++ /dev/null @@ -1,4 +0,0 @@ -from .decorators import auto_fp16, force_fp32 -from .hooks import Fp16OptimizerHook, wrap_fp16_model - -__all__ = ["auto_fp16", "force_fp32", "Fp16OptimizerHook", "wrap_fp16_model"] diff --git a/det3d/core/fp16/decorators.py b/det3d/core/fp16/decorators.py deleted file mode 100644 index e7915bc..0000000 --- a/det3d/core/fp16/decorators.py +++ /dev/null @@ -1,151 +0,0 @@ -import functools -from inspect import getfullargspec - -import torch - -from .utils import cast_tensor_type - - -def auto_fp16(apply_to=None, out_fp32=False): - """Decorator to enable fp16 training automatically. - This decorator is useful when you write custom modules and want to support - mixed precision training. If inputs arguments are fp32 tensors, they will - be converted to fp16 automatically. Arguments other than fp32 tensors are - ignored. - Args: - apply_to (Iterable, optional): The argument names to be converted. - `None` indicates all arguments. - out_fp32 (bool): Whether to convert the output back to fp32. - :Example: - class MyModule1(nn.Module) - # Convert x and y to fp16 - @auto_fp16() - def forward(self, x, y): - pass - class MyModule2(nn.Module): - # convert pred to fp16 - @auto_fp16(apply_to=('pred', )) - def do_something(self, pred, others): - pass - """ - - def auto_fp16_wrapper(old_func): - @functools.wraps(old_func) - def new_func(*args, **kwargs): - # check if the module has set the attribute `fp16_enabled`, if not, - # just fallback to the original method. 
- if not isinstance(args[0], torch.nn.Module): - raise TypeError( - "@auto_fp16 can only be used to decorate the " "method of nn.Module" - ) - if not (hasattr(args[0], "fp16_enabled") and args[0].fp16_enabled): - return old_func(*args, **kwargs) - # get the arg spec of the decorated method - args_info = getfullargspec(old_func) - # get the argument names to be casted - args_to_cast = args_info.args if apply_to is None else apply_to - # convert the args that need to be processed - new_args = [] - # NOTE: default args are not taken into consideration - if args: - arg_names = args_info.args[: len(args)] - for i, arg_name in enumerate(arg_names): - if arg_name in args_to_cast: - new_args.append( - cast_tensor_type(args[i], torch.float, torch.half) - ) - else: - new_args.append(args[i]) - # convert the kwargs that need to be processed - new_kwargs = {} - if kwargs: - for arg_name, arg_value in kwargs.items(): - if arg_name in args_to_cast: - new_kwargs[arg_name] = cast_tensor_type( - arg_value, torch.float, torch.half - ) - else: - new_kwargs[arg_name] = arg_value - # apply converted arguments to the decorated method - output = old_func(*new_args, **new_kwargs) - # cast the results back to fp32 if necessary - if out_fp32: - output = cast_tensor_type(output, torch.half, torch.float) - return output - - return new_func - - return auto_fp16_wrapper - - -def force_fp32(apply_to=None, out_fp16=False): - """Decorator to convert input arguments to fp32 in force. - This decorator is useful when you write custom modules and want to support - mixed precision training. If there are some inputs that must be processed - in fp32 mode, then this decorator can handle it. If inputs arguments are - fp16 tensors, they will be converted to fp32 automatically. Arguments other - than fp16 tensors are ignored. - Args: - apply_to (Iterable, optional): The argument names to be converted. - `None` indicates all arguments. - out_fp16 (bool): Whether to convert the output back to fp16. - :Example: - class MyModule1(nn.Module) - # Convert x and y to fp32 - @force_fp32() - def loss(self, x, y): - pass - class MyModule2(nn.Module): - # convert pred to fp32 - @force_fp32(apply_to=('pred', )) - def post_process(self, pred, others): - pass - """ - - def force_fp32_wrapper(old_func): - @functools.wraps(old_func) - def new_func(*args, **kwargs): - # check if the module has set the attribute `fp16_enabled`, if not, - # just fallback to the original method. 
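Reviewer note: the fp16 decorators being dropped follow the usual mixed-precision convention: a module opts in via an `fp16_enabled` attribute, `@auto_fp16` casts fp32 tensor arguments to fp16 on the way into `forward`, and `@force_fp32` casts fp16 tensors back to fp32 for numerically sensitive methods such as losses. A usage sketch in the spirit of the docstring examples above (it imports from `det3d.core.fp16`, which this very diff deletes, so it only runs against the pre-PR tree):

```python
import torch
from torch import nn
from det3d.core.fp16 import auto_fp16, force_fp32  # removed by this PR

class TinyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)
        self.fp16_enabled = False  # wrap_fp16_model() flips this to True

    @auto_fp16()                      # cast fp32 tensor args to fp16 when enabled
    def forward(self, x):
        return self.fc(x)

    @force_fp32(apply_to=("pred",))   # keep the loss computation in fp32
    def loss(self, pred, target):
        return torch.nn.functional.mse_loss(pred, target)
```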
- if not isinstance(args[0], torch.nn.Module): - raise TypeError( - "@force_fp32 can only be used to decorate the " - "method of nn.Module" - ) - if not (hasattr(args[0], "fp16_enabled") and args[0].fp16_enabled): - return old_func(*args, **kwargs) - # get the arg spec of the decorated method - args_info = getfullargspec(old_func) - # get the argument names to be casted - args_to_cast = args_info.args if apply_to is None else apply_to - # convert the args that need to be processed - new_args = [] - if args: - arg_names = args_info.args[: len(args)] - for i, arg_name in enumerate(arg_names): - if arg_name in args_to_cast: - new_args.append( - cast_tensor_type(args[i], torch.half, torch.float) - ) - else: - new_args.append(args[i]) - # convert the kwargs that need to be processed - new_kwargs = dict() - if kwargs: - for arg_name, arg_value in kwargs.items(): - if arg_name in args_to_cast: - new_kwargs[arg_name] = cast_tensor_type( - arg_value, torch.half, torch.float - ) - else: - new_kwargs[arg_name] = arg_value - # apply converted arguments to the decorated method - output = old_func(*new_args, **new_kwargs) - # cast the results back to fp32 if necessary - if out_fp16: - output = cast_tensor_type(output, torch.float, torch.half) - return output - - return new_func - - return force_fp32_wrapper diff --git a/det3d/core/fp16/hooks.py b/det3d/core/fp16/hooks.py deleted file mode 100644 index 12f3a60..0000000 --- a/det3d/core/fp16/hooks.py +++ /dev/null @@ -1,124 +0,0 @@ -import copy - -import torch -import torch.nn as nn -from det3d.torchie.trainer import OptimizerHook - -from ..utils.dist_utils import allreduce_grads -from .utils import cast_tensor_type - - -class Fp16OptimizerHook(OptimizerHook): - """FP16 optimizer hook. - The steps of fp16 optimizer is as follows. - 1. Scale the loss value. - 2. BP in the fp16 model. - 2. Copy gradients from fp16 model to fp32 weights. - 3. Update fp32 weights. - 4. Copy updated parameters from fp32 weights to fp16 model. - Refer to https://arxiv.org/abs/1710.03740 for more details. - Args: - loss_scale (float): Scale factor multiplied with loss. 
- """ - - def __init__( - self, - grad_clip=None, - coalesce=True, - bucket_size_mb=-1, - loss_scale=512.0, - distributed=True, - ): - self.grad_clip = grad_clip - self.coalesce = coalesce - self.bucket_size_mb = bucket_size_mb - self.loss_scale = loss_scale - self.distributed = distributed - - def before_run(self, runner): - # keep a copy of fp32 weights - runner.optimizer.param_groups = copy.deepcopy(runner.optimizer.param_groups) - # convert model to fp16 - wrap_fp16_model(runner.model) - - def copy_grads_to_fp32(self, fp16_net, fp32_weights): - """Copy gradients from fp16 model to fp32 weight copy.""" - for fp32_param, fp16_param in zip(fp32_weights, fp16_net.parameters()): - if fp16_param.grad is not None: - if fp32_param.grad is None: - fp32_param.grad = fp32_param.data.new(fp32_param.size()) - fp32_param.grad.copy_(fp16_param.grad) - - def copy_params_to_fp16(self, fp16_net, fp32_weights): - """Copy updated params from fp32 weight copy to fp16 model.""" - for fp16_param, fp32_param in zip(fp16_net.parameters(), fp32_weights): - fp16_param.data.copy_(fp32_param.data) - - def after_train_iter(self, runner): - # clear grads of last iteration - runner.model.zero_grad() - runner.optimizer.zero_grad() - # scale the loss value - scaled_loss = runner.outputs["loss"] * self.loss_scale - scaled_loss.backward() - # copy fp16 grads in the model to fp32 params in the optimizer - fp32_weights = [] - for param_group in runner.optimizer.param_groups: - fp32_weights += param_group["params"] - self.copy_grads_to_fp32(runner.model, fp32_weights) - # allreduce grads - if self.distributed: - allreduce_grads(fp32_weights, self.coalesce, self.bucket_size_mb) - # scale the gradients back - for param in fp32_weights: - if param.grad is not None: - param.grad.div_(self.loss_scale) - if self.grad_clip is not None: - self.clip_grads(fp32_weights) - # update fp32 params - runner.optimizer.step() - # copy fp32 params to the fp16 model - self.copy_params_to_fp16(runner.model, fp32_weights) - - -def wrap_fp16_model(model): - # convert model to fp16 - model.half() - # patch the normalization layers to make it work in fp32 mode - patch_norm_fp32(model) - # set `fp16_enabled` flag - for m in model.modules(): - if hasattr(m, "fp16_enabled"): - m.fp16_enabled = True - - -def patch_norm_fp32(module): - if isinstance(module, (nn.modules.batchnorm._BatchNorm, nn.GroupNorm)): - module.float() - module.forward = patch_forward_method(module.forward, torch.half, torch.float) - for child in module.children(): - patch_norm_fp32(child) - return module - - -def patch_forward_method(func, src_type, dst_type, convert_output=True): - """Patch the forward method of a module. - Args: - func (callable): The original forward method. - src_type (torch.dtype): Type of input arguments to be converted from. - dst_type (torch.dtype): Type of input arguments to be converted to. - convert_output (bool): Whether to convert the output back to src_type. - Returns: - callable: The patched forward method. 
- """ - - def new_forward(*args, **kwargs): - output = func( - *cast_tensor_type(args, src_type, dst_type), - **cast_tensor_type(kwargs, src_type, dst_type) - ) - if convert_output: - output = cast_tensor_type(output, dst_type, src_type) - return output - - return new_forward diff --git a/det3d/core/fp16/utils.py b/det3d/core/fp16/utils.py deleted file mode 100644 index ff370c4..0000000 --- a/det3d/core/fp16/utils.py +++ /dev/null @@ -1,23 +0,0 @@ -from collections import abc - -import numpy as np -import torch - - -def cast_tensor_type(inputs, src_type, dst_type): - if isinstance(inputs, torch.Tensor): - return inputs.to(dst_type) - elif isinstance(inputs, str): - return inputs - elif isinstance(inputs, np.ndarray): - return inputs - elif isinstance(inputs, abc.Mapping): - return type(inputs)( - {k: cast_tensor_type(v, src_type, dst_type) for k, v in inputs.items()} - ) - elif isinstance(inputs, abc.Iterable): - return type(inputs)( - cast_tensor_type(item, src_type, dst_type) for item in inputs - ) - else: - return inputs diff --git a/det3d/core/input/voxel_generator.py b/det3d/core/input/voxel_generator.py index cef088e..4164469 100644 --- a/det3d/core/input/voxel_generator.py +++ b/det3d/core/input/voxel_generator.py @@ -16,14 +16,17 @@ def __init__(self, voxel_size, point_cloud_range, max_num_points, max_voxels=200 self._max_voxels = max_voxels self._grid_size = grid_size - def generate(self, points, max_voxels=20000): + def generate(self, points, max_voxels=-1): + if max_voxels == -1: + max_voxels=self._max_voxels + return points_to_voxel( points, self._voxel_size, self._point_cloud_range, self._max_num_points, True, - self._max_voxels, + max_voxels, ) @property diff --git a/det3d/core/sampler/preprocess.py b/det3d/core/sampler/preprocess.py index 4e5da2b..071c3cd 100644 --- a/det3d/core/sampler/preprocess.py +++ b/det3d/core/sampler/preprocess.py @@ -105,31 +105,6 @@ def __call__(self, db_infos): return db_infos -def random_crop_frustum( - bboxes, rect, Trv2c, P2, max_crop_height=1.0, max_crop_width=0.9 -): - num_gt = bboxes.shape[0] - crop_minxy = np.random.uniform( - [1 - max_crop_width, 1 - max_crop_height], [0.3, 0.3], size=[num_gt, 2] - ) - crop_maxxy = np.ones([num_gt, 2], dtype=bboxes.dtype) - crop_bboxes = np.concatenate([crop_minxy, crop_maxxy], axis=1) - left = np.random.choice([False, True], replace=False, p=[0.5, 0.5]) - if left: - crop_bboxes[:, [0, 2]] -= crop_bboxes[:, 0:1] - # crop_relative_bboxes to real bboxes - crop_bboxes *= np.tile(bboxes[:, 2:] - bboxes[:, :2], [1, 2]) - crop_bboxes += np.tile(bboxes[:, :2], [1, 2]) - C, R, T = box_np_ops.projection_matrix_to_CRT_kitti(P2) - frustums = box_np_ops.get_frustum_v2(crop_bboxes, C) - frustums -= T - # frustums = np.linalg.inv(R) @ frustums.T - frustums = np.einsum("ij, akj->aki", np.linalg.inv(R), frustums) - frustums = box_np_ops.camera_to_lidar(frustums, rect, Trv2c) - - return frustums - - def filter_gt_box_outside_range(gt_boxes, limit_range): """remove gtbox outside training range. this function should be applied after other prep functions diff --git a/det3d/core/sampler/sample_ops.py b/det3d/core/sampler/sample_ops.py index 20f918c..d507460 100644 --- a/det3d/core/sampler/sample_ops.py +++ b/det3d/core/sampler/sample_ops.py @@ -173,38 +173,14 @@ def sample_all( if len(sampled) > 0: sampled_gt_boxes = np.concatenate(sampled_gt_boxes, axis=0) - if road_planes is not None: - # Only support KITTI - # image plane - assert False, "Not correct yet!" 
- a, b, c, d = road_planes - - center = sampled_gt_boxes[:, :3] - center[:, 2] -= sampled_gt_boxes[:, 5] / 2 - center_cam = box_np_ops.lidar_to_camera(center, calib["rect"], calib["Trv2c"]) - - cur_height_cam = (-d - a * center_cam[:, 0] - c * center_cam[:, 2]) / b - center_cam[:, 1] = cur_height_cam - lidar_tmp_point = box_np_ops.camera_to_lidar(center_cam, calib["rect"], calib["Trv2c"]) - cur_lidar_height = lidar_tmp_point[:, 2] - - # botom to middle center - # kitti [0.5, 0.5, 0] center to [0.5, 0.5, 0.5] - sampled_gt_boxes[:, 2] = cur_lidar_height + sampled_gt_boxes[:, 5] / 2 - - # mv_height = sampled_gt_boxes[:, 2] - cur_lidar_height - # sampled_gt_boxes[:, 2] -= mv_height - num_sampled = len(sampled) s_points_list = [] for info in sampled: try: - # TODO fix point read error s_points = np.fromfile( str(pathlib.Path(root_path) / info["path"]), dtype=np.float32 ).reshape(-1, num_point_features) - # if not add_rgb_to_points: - # s_points = s_points[:, :4] + if "rot_transform" in info: rot = info["rot_transform"] s_points[:, :3] = box_np_ops.rotation_points_single_angle( @@ -216,9 +192,6 @@ def sample_all( except Exception: print(str(pathlib.Path(root_path) / info["path"])) continue - # gt_bboxes = np.stack([s["bbox"] for s in sampled], axis=0) - # if np.random.choice([False, True], replace=False, p=[0.3, 0.7]): - # do random crop. if random_crop: s_points_list_new = [] assert calib is not None diff --git a/det3d/core/utils/center_utils.py b/det3d/core/utils/center_utils.py index ce8f3a7..8edc421 100644 --- a/det3d/core/utils/center_utils.py +++ b/det3d/core/utils/center_utils.py @@ -2,7 +2,7 @@ # Copyright (c) Microsoft # Licensed under the MIT License. # Written by Bin Xiao (Bin.Xiao@microsoft.com) -# Modified by Xingyi Zhou +# Modified by Xingyi Zhou and Tianwei Yin # ------------------------------------------------------------------------------ from __future__ import absolute_import @@ -10,118 +10,31 @@ from __future__ import print_function import numpy as np -import cv2 -import random import torch from torch import nn from .circle_nms_jit import circle_nms -def flip(img): - return img[:, :, ::-1].copy() - -def transform_preds(coords, center, scale, output_size): - target_coords = np.zeros(coords.shape) - trans = get_affine_transform(center, scale, 0, output_size, inv=1) - for p in range(coords.shape[0]): - target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) - return target_coords - - -def get_affine_transform(center, - scale, - rot, - output_size, - shift=np.array([0, 0], dtype=np.float32), - inv=0): - if not isinstance(scale, np.ndarray) and not isinstance(scale, list): - scale = np.array([scale, scale], dtype=np.float32) - - scale_tmp = scale - src_w = scale_tmp[0] - dst_w = output_size[0] - dst_h = output_size[1] - - rot_rad = np.pi * rot / 180 - src_dir = get_dir([0, src_w * -0.5], rot_rad) - dst_dir = np.array([0, dst_w * -0.5], np.float32) - - src = np.zeros((3, 2), dtype=np.float32) - dst = np.zeros((3, 2), dtype=np.float32) - src[0, :] = center + scale_tmp * shift - src[1, :] = center + src_dir + scale_tmp * shift - dst[0, :] = [dst_w * 0.5, dst_h * 0.5] - dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5], np.float32) + dst_dir - - src[2:, :] = get_3rd_point(src[0, :], src[1, :]) - dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) - - if inv: - trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) - else: - trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) - - return trans - - -def affine_transform(pt, t): - new_pt = np.array([pt[0], pt[1], 
1.], dtype=np.float32).T - new_pt = np.dot(t, new_pt) - return new_pt[:2] - - -def get_3rd_point(a, b): - direct = a - b - return b + np.array([-direct[1], direct[0]], dtype=np.float32) - - -def get_dir(src_point, rot_rad): - sn, cs = np.sin(rot_rad), np.cos(rot_rad) - - src_result = [0, 0] - src_result[0] = src_point[0] * cs - src_point[1] * sn - src_result[1] = src_point[0] * sn + src_point[1] * cs - - return src_result - - -def crop(img, center, scale, output_size, rot=0): - trans = get_affine_transform(center, scale, rot, output_size) - - dst_img = cv2.warpAffine(img, - trans, - (int(output_size[0]), int(output_size[1])), - flags=cv2.INTER_LINEAR) - - return dst_img - - def gaussian_radius(det_size, min_overlap=0.5): - height, width = det_size - - a1 = 1 - b1 = (height + width) - c1 = width * height * (1 - min_overlap) / (1 + min_overlap) - sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1) - r1 = (b1 + sq1) / 2 - - a2 = 4 - b2 = 2 * (height + width) - c2 = (1 - min_overlap) * width * height - sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2) - r2 = (b2 + sq2) / 2 - - a3 = 4 * min_overlap - b3 = -2 * min_overlap * (height + width) - c3 = (min_overlap - 1) * width * height - sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3) - r3 = (b3 + sq3) / 2 - return min(r1, r2, r3) - - -def simple_radius(det_size): - height, width = det_size - return np.sqrt(height * width) - + height, width = det_size + + a1 = 1 + b1 = (height + width) + c1 = width * height * (1 - min_overlap) / (1 + min_overlap) + sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1) + r1 = (b1 + sq1) / 2 + + a2 = 4 + b2 = 2 * (height + width) + c2 = (1 - min_overlap) * width * height + sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2) + r2 = (b2 + sq2) / 2 + + a3 = 4 * min_overlap + b3 = -2 * min_overlap * (height + width) + c3 = (min_overlap - 1) * width * height + sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3) + r3 = (b3 + sq3) / 2 + return min(r1, r2, r3) def gaussian2D(shape, sigma=1): m, n = [(ss - 1.) / 2. 
for ss in shape] @@ -133,111 +46,21 @@ def gaussian2D(shape, sigma=1): def draw_umich_gaussian(heatmap, center, radius, k=1): - diameter = 2 * radius + 1 - gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) - - x, y = int(center[0]), int(center[1]) - - height, width = heatmap.shape[0:2] - - left, right = min(x, radius), min(width - x, radius + 1) - top, bottom = min(y, radius), min(height - y, radius + 1) - - masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] - masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right] - if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: # TODO debug - np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) - return heatmap - + diameter = 2 * radius + 1 + gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) -def draw_dense_reg(regmap, heatmap, center, value, radius, is_offset=False): - diameter = 2 * radius + 1 - gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) - value = np.array(value, dtype=np.float32).reshape(-1, 1, 1) - dim = value.shape[0] - reg = np.ones((dim, diameter*2+1, diameter*2+1), dtype=np.float32) * value - if is_offset and dim == 2: - delta = np.arange(diameter*2+1) - radius - reg[0] = reg[0] - delta.reshape(1, -1) - reg[1] = reg[1] - delta.reshape(-1, 1) - - x, y = int(center[0]), int(center[1]) + x, y = int(center[0]), int(center[1]) - height, width = heatmap.shape[0:2] - - left, right = min(x, radius), min(width - x, radius + 1) - top, bottom = min(y, radius), min(height - y, radius + 1) + height, width = heatmap.shape[0:2] - masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] - masked_regmap = regmap[:, y - top:y + bottom, x - left:x + right] - masked_gaussian = gaussian[radius - top:radius + bottom, - radius - left:radius + right] - masked_reg = reg[:, radius - top:radius + bottom, - radius - left:radius + right] - if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: # TODO debug - idx = (masked_gaussian >= masked_heatmap).reshape( - 1, masked_gaussian.shape[0], masked_gaussian.shape[1]) - masked_regmap = (1-idx) * masked_regmap + idx * masked_reg - regmap[:, y - top:y + bottom, x - left:x + right] = masked_regmap - return regmap + left, right = min(x, radius), min(width - x, radius + 1) + top, bottom = min(y, radius), min(height - y, radius + 1) - -def draw_msra_gaussian(heatmap, center, sigma): - tmp_size = sigma * 3 - mu_x = int(center[0] + 0.5) - mu_y = int(center[1] + 0.5) - w, h = heatmap.shape[0], heatmap.shape[1] - ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] - br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] - if ul[0] >= h or ul[1] >= w or br[0] < 0 or br[1] < 0: + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right] + if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: # TODO debug + np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) return heatmap - size = 2 * tmp_size + 1 - x = np.arange(0, size, 1, np.float32) - y = x[:, np.newaxis] - x0 = y0 = size // 2 - g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2)) - g_x = max(0, -ul[0]), min(br[0], h) - ul[0] - g_y = max(0, -ul[1]), min(br[1], w) - ul[1] - img_x = max(0, ul[0]), min(br[0], h) - img_y = max(0, ul[1]), min(br[1], w) - heatmap[img_y[0]:img_y[1], img_x[0]:img_x[1]] = np.maximum( - heatmap[img_y[0]:img_y[1], img_x[0]:img_x[1]], - g[g_y[0]:g_y[1], g_x[0]:g_x[1]]) - return 
heatmap - -def grayscale(image): - return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) - -def lighting_(data_rng, image, alphastd, eigval, eigvec): - alpha = data_rng.normal(scale=alphastd, size=(3, )) - image += np.dot(eigvec, eigval * alpha) - -def blend_(alpha, image1, image2): - image1 *= alpha - image2 *= (1 - alpha) - image1 += image2 - -def saturation_(data_rng, image, gs, gs_mean, var): - alpha = 1. + data_rng.uniform(low=-var, high=var) - blend_(alpha, image, gs[:, :, None]) - -def brightness_(data_rng, image, gs, gs_mean, var): - alpha = 1. + data_rng.uniform(low=-var, high=var) - image *= alpha - -def contrast_(data_rng, image, gs, gs_mean, var): - alpha = 1. + data_rng.uniform(low=-var, high=var) - blend_(alpha, image, gs_mean) - -def color_aug(data_rng, image, eig_val, eig_vec): - functions = [brightness_, contrast_, saturation_] - random.shuffle(functions) - - gs = grayscale(image) - gs_mean = gs.mean() - for f in functions: - f(data_rng, image, gs, gs_mean, 0.4) - lighting_(data_rng, image, 0.1, eig_val, eig_vec) def _gather_feat(feat, ind, mask=None): dim = feat.size(2) @@ -255,158 +78,44 @@ def _transpose_and_gather_feat(feat, ind): feat = _gather_feat(feat, ind) return feat -def _nms(heat, kernel=3): - pad = (kernel - 1) // 2 - - hmax = nn.functional.max_pool2d( - heat, (kernel, kernel), stride=1, padding=pad) - keep = (hmax == heat).float() - return heat * keep - - def _circle_nms(boxes, min_radius, post_max_size=83): - """ - NMS according to center distance - """ - keep = np.array(circle_nms(boxes.cpu().numpy(), thresh=min_radius))[:post_max_size] - - keep = torch.from_numpy(keep).long().to(boxes.device) - - return keep - -def _topk(scores, K=40): - batch, cat, height, width = scores.size() - - topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K) - - topk_inds = topk_inds % (height * width) - topk_ys = (topk_inds / width).int().float() - topk_xs = (topk_inds % width).int().float() - - topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K) - topk_clses = (topk_ind / K).int() - topk_inds = _gather_feat( - topk_inds.view(batch, -1, 1), topk_ind).view(batch, K) - topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K) - topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K) - - return topk_score, topk_inds, topk_clses, topk_ys, topk_xs - -def ddd_decode(heat, rots, rotc, hei, dim, dir_preds, vel, direction_offset=0, reg=None, \ - post_center_range=None, K=100, score_threshold=None, cfg=None, raw_rot=False, task_id=-1): - - # print(heat.shape, rots.shape, hei.shape, dim.shape) - batch, cat, _, _ = heat.size() - # heat = torch.sigmoid(heat) - # perform nms on heatmaps - - # TODO: Add Comments to explain this - maxpool = cfg.get('max_pool_nms', False) or (cfg.get('circle_nms', False) and (cfg.min_radius[task_id] == -1)) - use_circle_nms = cfg.get('circle_nms', False) and (cfg.min_radius[task_id] != -1) - - if maxpool: - heat = _nms(heat) - - scores, inds, clses, ys, xs = _topk(heat, K=K) - - # scores, inds, clses, ys, xs, K = _circle_nms(scores, inds, clses, ys, xs, cfg, task_id) - - if reg is not None: - reg = _transpose_and_gather_feat(reg, inds) - reg = reg.view(batch, K, 2) - xs = xs.view(batch, K, 1) + reg[:, :, 0:1] - ys = ys.view(batch, K, 1) + reg[:, :, 1:2] - else: - xs = xs.view(batch, K, 1) + 0.5 - ys = ys.view(batch, K, 1) + 0.5 - - # rotation value and direction label - if not raw_rot: - rots = _transpose_and_gather_feat(rots, inds) - rots = rots.view(batch, K, 1) - - rotc = 
_transpose_and_gather_feat(rotc, inds) - rotc = rotc.view(batch, K, 1) - rot = torch.atan2(rots, rotc) - else: - rot = _transpose_and_gather_feat(rots, inds) - - dir_preds = _transpose_and_gather_feat(dir_preds, inds).view(batch, K, 2) - - # height in the bev - hei = _transpose_and_gather_feat(hei, inds) - hei = hei.view(batch, K, 1) - - # dim of the box - dim = _transpose_and_gather_feat(dim, inds) - dim = dim.view(batch, K, 3) - - # class label - clses = clses.view(batch, K).float() - scores = scores.view(batch, K) - - # center location - pc_range = cfg.pc_range - - xs = xs.view(batch, K, 1) * cfg.out_size_factor * cfg.voxel_size[0] + pc_range[0] - ys = ys.view(batch, K, 1) * cfg.out_size_factor * cfg.voxel_size[1] + pc_range[1] - - if vel is None: # KITTI FORMAT - final_box_preds = torch.cat( - [xs, ys, hei, dim, rot], dim=2 - ) - else: # exist velocity, nuscene format - vel = _transpose_and_gather_feat(vel, inds) - vel = vel.view(batch, K, 2) - final_box_preds = torch.cat( - [xs, ys, hei, dim, vel, rot], dim=2 - ) - - final_scores = scores - final_preds = clses - - # use score threshold - if score_threshold is not None: - thresh_mask = final_scores > score_threshold - - if post_center_range is not None: - mask = (final_box_preds[..., :3] >= post_center_range[:3]).all(2) - mask &= (final_box_preds[..., :3] <= post_center_range[3:]).all(2) - - predictions_dicts = [] - for i in range(batch): - cmask = mask[i, :] - if score_threshold: - cmask &= thresh_mask[i] - - boxes3d = final_box_preds[i, cmask] - scores = final_scores[i, cmask] - labels = final_preds[i, cmask] - - if use_circle_nms: - centers = boxes3d[:, [0, 1]] - boxes = torch.cat([centers, scores.view(-1, 1)], dim=1) - keep = _circle_nms(boxes, min_radius=cfg.min_radius[task_id], post_max_size=cfg.post_max_size) - - boxes3d = boxes3d[keep] - scores = scores[keep] - labels = labels[keep] - - predictions_dict = { - "box3d_lidar": boxes3d, - "scores": scores, - "label_preds": labels - } - if raw_rot: - predictions_dict['dir_preds'] = dir_preds[i, cmask] - - predictions_dicts.append(predictions_dict) - else: - raise NotImplementedError("Need to reorganize output as a batch so only the first if part is supported for now!") - predictions_dict = { - "box3d_lidar": final_box_preds, - "scores": final_scores, - "label_preds": final_preds - } - - return predictions_dicts + """ + NMS according to center distance + """ + keep = np.array(circle_nms(boxes.cpu().numpy(), thresh=min_radius))[:post_max_size] + + keep = torch.from_numpy(keep).long().to(boxes.device) + + return keep + + +def bilinear_interpolate_torch(im, x, y): + """ + Args: + im: (H, W, C) [y, x] + x: (N) + y: (N) + Returns: + """ + x0 = torch.floor(x).long() + x1 = x0 + 1 + + y0 = torch.floor(y).long() + y1 = y0 + 1 + + x0 = torch.clamp(x0, 0, im.shape[1] - 1) + x1 = torch.clamp(x1, 0, im.shape[1] - 1) + y0 = torch.clamp(y0, 0, im.shape[0] - 1) + y1 = torch.clamp(y1, 0, im.shape[0] - 1) + + Ia = im[y0, x0] + Ib = im[y1, x0] + Ic = im[y0, x1] + Id = im[y1, x1] + + wa = (x1.type_as(x) - x) * (y1.type_as(y) - y) + wb = (x1.type_as(x) - x) * (y - y0.type_as(y)) + wc = (x - x0.type_as(x)) * (y1.type_as(y) - y) + wd = (x - x0.type_as(x)) * (y - y0.type_as(y)) + ans = torch.t((torch.t(Ia) * wa)) + torch.t(torch.t(Ib) * wb) + torch.t(torch.t(Ic) * wc) + torch.t(torch.t(Id) * wd) + return ans diff --git a/det3d/datasets/nuscenes/nusc_common.py b/det3d/datasets/nuscenes/nusc_common.py index 2af2c1e..84e07c4 100644 --- a/det3d/datasets/nuscenes/nusc_common.py +++ 
b/det3d/datasets/nuscenes/nusc_common.py @@ -1,11 +1,9 @@ -import os.path as osp import numpy as np import pickle -import random from pathlib import Path from functools import reduce -from typing import Tuple, List +from typing import List from tqdm import tqdm from pyquaternion import Quaternion @@ -13,8 +11,6 @@ try: from nuscenes import NuScenes from nuscenes.utils import splits - from nuscenes.utils.data_classes import LidarPointCloud - from nuscenes.utils.geometry_utils import transform_matrix from nuscenes.utils.data_classes import Box from nuscenes.eval.detection.config import config_factory from nuscenes.eval.detection.evaluate import NuScenesEval @@ -160,69 +156,6 @@ }, } - -def box_velocity( - nusc, sample_annotation_token: str, max_time_diff: float = 1.5 -) -> np.ndarray: - """ - Estimate the velocity for an annotation. - If possible, we compute the centered difference between the previous and next frame. - Otherwise we use the difference between the current and previous/next frame. - If the velocity cannot be estimated, values are set to np.nan. - :param sample_annotation_token: Unique sample_annotation identifier. - :param max_time_diff: Max allowed time diff between consecutive samples that are used to estimate velocities. - :return: . Velocity in x/y/z direction in m/s. - """ - - current = nusc.get("sample_annotation", sample_annotation_token) - has_prev = current["prev"] != "" - has_next = current["next"] != "" - - # Cannot estimate velocity for a single annotation. - if not has_prev and not has_next: - return np.array([np.nan, np.nan, np.nan]) - - if has_prev: - first = nusc.get("sample_annotation", current["prev"]) - else: - first = current - - if has_next: - last = nusc.get("sample_annotation", current["next"]) - else: - last = current - - pos_last = np.array(last["translation"]) - pos_first = np.array(first["translation"]) - pos_diff = pos_last - pos_first - - time_last = 1e-6 * nusc.get("sample", last["sample_token"])["timestamp"] - time_first = 1e-6 * nusc.get("sample", first["sample_token"])["timestamp"] - time_diff = time_last - time_first - - if has_next and has_prev: - # If doing centered difference, allow for up to double the max_time_diff. - max_time_diff *= 2 - - if time_diff > max_time_diff: - # If time_diff is too big, don't return an estimate. - return np.array([np.nan, np.nan, np.nan]) - else: - return pos_diff / time_diff - - -def remove_close(points, radius: float) -> None: - """ - Removes point too close within a certain radius from origin. - :param radius: Radius below which points are removed. 
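Reviewer note: circling back to the one piece of genuinely new code in the `center_utils.py` hunk above, `bilinear_interpolate_torch` samples an (H, W, C) map at fractional (x, y) locations, which is how point features can be read off a BEV feature map. A quick usage check on a toy map whose value at integer coordinates is `10*y + x` (expected outputs follow from plain bilinear interpolation; note the helper clamps at the map border):

```python
import torch
from det3d.core.utils.center_utils import bilinear_interpolate_torch  # added in this diff

# 2x2 single-channel map: value at (x, y) is 10*y + x
im = torch.tensor([[[0.0], [1.0]],
                   [[10.0], [11.0]]])        # shape (H=2, W=2, C=1)
x = torch.tensor([0.5, 0.25])
y = torch.tensor([0.5, 0.0])
print(bilinear_interpolate_torch(im, x, y))  # -> tensor([[5.5000], [0.2500]])
```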
- """ - x_filt = np.abs(points[0, :]) < radius - y_filt = np.abs(points[1, :]) < radius - not_close = np.logical_not(np.logical_and(x_filt, y_filt)) - points = points[:, not_close] - return points - - def _second_det_to_nusc_box(detection): box3d = detection["box3d_lidar"].detach().cpu().numpy() scores = detection["scores"].detach().cpu().numpy() @@ -253,10 +186,8 @@ def _lidar_nusc_box_to_global(nusc, boxes, sample_token): sd_record = nusc.get("sample_data", sample_data_token) cs_record = nusc.get("calibrated_sensor", sd_record["calibrated_sensor_token"]) - sensor_record = nusc.get("sensor", cs_record["sensor_token"]) pose_record = nusc.get("ego_pose", sd_record["ego_pose_token"]) - data_path = nusc.get_sample_data_path(sample_data_token) box_list = [] for box in boxes: # Move box to ego vehicle coord system @@ -286,10 +217,6 @@ def _get_available_scenes(nusc): break else: break - if not sd_rec["next"] == "": - sd_rec = nusc.get("sample_data", sd_rec["next"]) - else: - has_more_frames = False if scene_not_exist: continue available_scenes.append(scene) @@ -318,10 +245,8 @@ def get_sample_data( if sensor_record["modality"] == "camera": cam_intrinsic = np.array(cs_record["camera_intrinsic"]) - imsize = (sd_record["width"], sd_record["height"]) else: cam_intrinsic = None - imsize = None # Retrieve all sample annotations and map to sensor coordinate system. if selected_anntokens is not None: @@ -332,7 +257,7 @@ def get_sample_data( # Make list of Box objects including coord system transforms. box_list = [] for box in boxes: - + box.velocity = nusc.box_velocity(box.token) # Move box to ego vehicle coord system box.translate(-np.array(pose_record["translation"])) box.rotate(Quaternion(pose_record["rotation"]).inverse) @@ -346,32 +271,6 @@ def get_sample_data( return data_path, box_list, cam_intrinsic -def get_sample_ground_plane(root_path, version): - nusc = NuScenes(version=version, dataroot=root_path, verbose=True) - rets = {} - - for sample in tqdm(nusc.sample): - chan = "LIDAR_TOP" - sd_token = sample["data"][chan] - sd_rec = nusc.get("sample_data", sd_token) - - lidar_path, _, _ = get_sample_data(nusc, sd_token) - points = read_file(lidar_path) - points = np.concatenate((points[:, :3], np.ones((points.shape[0], 1))), axis=1) - - plane, inliers, outliers = fit_plane_LSE_RANSAC( - points, return_outlier_list=True - ) - - xx = points[:, 0] - yy = points[:, 1] - zz = (-plane[0] * xx - plane[1] * yy - plane[3]) / plane[2] - - rets.update({sd_token: {"plane": plane, "height": zz,}}) - - with open(nusc.root_path / "infos_trainval_ground_plane.pkl", "wb") as f: - pickle.dump(rets, f) - def _fill_trainval_infos(nusc, train_scenes, val_scenes, test=False, nsweeps=10, filter_zero=True): from nuscenes.utils.geometry_utils import transform_matrix @@ -484,30 +383,6 @@ def _fill_trainval_infos(nusc, train_scenes, val_scenes, test=False, nsweeps=10, len(info["sweeps"]) == nsweeps - 1 ), f"sweep {curr_sd_rec['token']} only has {len(info['sweeps'])} sweeps, you should duplicate to sweep num {nsweeps-1}" """ read from api """ - # sd_record = nusc.get('sample_data', sample['data']['LIDAR_TOP']) - # - # # Get boxes in lidar frame. - # lidar_path, boxes, cam_intrinsic = nusc.get_sample_data( - # sample['data']['LIDAR_TOP']) - # - # # Get aggregated point cloud in lidar frame. 
- # sample_rec = nusc.get('sample', sd_record['sample_token']) - # chan = sd_record['channel'] - # ref_chan = 'LIDAR_TOP' - # pc, times = LidarPointCloud.from_file_multisweep(nusc, - # sample_rec, - # chan, - # ref_chan, - # nsweeps=nsweeps) - # lidar_path = osp.join(nusc.dataroot, "sample_10sweeps/LIDAR_TOP", - # sample['data']['LIDAR_TOP'] + ".bin") - # pc.points.astype('float32').tofile(open(lidar_path, "wb")) - # - # info = { - # "lidar_path": lidar_path, - # "token": sample["token"], - # # "timestamp": times, - # } if not test: annotations = [ @@ -631,18 +506,6 @@ def create_nuscenes_infos(root_path, version="v1.0-trainval", nsweeps=10, filter pickle.dump(val_nusc_infos, f) -def get_box_mean(info_path, class_name="vehicle.car"): - with open(info_path, "rb") as f: - nusc_infos = pickle.load(f) - - gt_boxes_list = [] - for info in nusc_infos: - mask = np.array([s == class_name for s in info["gt_names"]], dtype=np.bool_) - gt_boxes_list.append(info["gt_boxes"][mask].reshape(-1, 7)) - gt_boxes_list = np.concatenate(gt_boxes_list, axis=0) - print(gt_boxes_list.mean(0)) - - def eval_main(nusc, eval_version, res_path, eval_set, output_dir): # nusc = NuScenes(version=version, dataroot=str(root_path), verbose=True) cfg = config_factory(eval_version) diff --git a/det3d/datasets/nuscenes/nuscenes.py b/det3d/datasets/nuscenes/nuscenes.py index 9859b15..8afeae8 100644 --- a/det3d/datasets/nuscenes/nuscenes.py +++ b/det3d/datasets/nuscenes/nuscenes.py @@ -16,7 +16,6 @@ print("nuScenes devkit not found!") from det3d.datasets.custom import PointCloudDataset -from det3d.datasets.utils.ground_plane_detection import fit_plane_LSE_RANSAC from det3d.datasets.nuscenes.nusc_common import ( general_to_detection, cls_attr_dist, @@ -48,9 +47,7 @@ def __init__( ) self.nsweeps = nsweeps - # print('self.nsweeps', self.nsweeps) assert self.nsweeps > 0, "At least input one sweep please!" - # assert self.nsweeps > 0, "At least input one sweep please!" 
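Reviewer note: `get_box_mean`, removed above, was a one-off analysis helper: load an infos pickle, mask the ground-truth boxes of one class, and average their 7 box parameters. If someone still needs that statistic, the equivalent standalone snippet is below (the info pickle is whatever `create_nuscenes_infos` wrote out):

```python
import pickle
import numpy as np

def box_mean(info_path, class_name="vehicle.car"):
    """Mean of the 7 gt-box parameters for one class, as the removed helper printed."""
    with open(info_path, "rb") as f:
        infos = pickle.load(f)
    boxes = []
    for info in infos:
        mask = np.array([n == class_name for n in info["gt_names"]], dtype=bool)
        boxes.append(info["gt_boxes"][mask].reshape(-1, 7))
    return np.concatenate(boxes, axis=0).mean(axis=0)

# e.g. box_mean("infos_train.pkl")  # any infos pickle produced by create_nuscenes_infos
```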
print(self.nsweeps) self._info_path = info_path @@ -62,6 +59,10 @@ def __init__( self._num_point_features = NuScenesDataset.NumPointFeatures self._name_mapping = general_to_detection + self.painted = kwargs.get('painted', False) + if self.painted: + self._num_point_features += 10 + self.version = version self.eval_version = "detection_cvpr_2019" @@ -174,6 +175,7 @@ def get_sensor_data(self, idx): "calib": None, "cam": {}, "mode": "val" if self.test_mode else "train", + "painted": self.painted } data, _ = self.pipeline(res, info) diff --git a/det3d/datasets/nuscenes/plot_results.py b/det3d/datasets/nuscenes/plot_results.py deleted file mode 100644 index 668587b..0000000 --- a/det3d/datasets/nuscenes/plot_results.py +++ /dev/null @@ -1,211 +0,0 @@ -import os -import os.path as osp -import nuscenes -from nuscenes.eval.detection.data_classes import EvalBoxes, EvalBox -from nuscenes.utils.data_classes import LidarPointCloud -import json -import random -from det3d.deps.nuscenes.eval.detection.render import visualize_sample -import numpy as np -from matplotlib import pyplot as plt -from nuscenes import NuScenes -from nuscenes.eval.detection.constants import ( - TP_METRICS, - DETECTION_NAMES, - DETECTION_COLORS, - TP_METRICS_UNITS, - PRETTY_DETECTION_NAMES, - PRETTY_TP_METRICS, -) -from nuscenes.eval.detection.data_classes import EvalBoxes -from nuscenes.eval.detection.data_classes import MetricDataList, DetectionMetrics -from nuscenes.eval.detection.utils import boxes_to_sensor -from nuscenes.utils.data_classes import LidarPointCloud -from nuscenes.utils.geometry_utils import view_points -from collections import defaultdict - - -def from_file_multisweep( - nusc: "NuScenes", sample_data_token: str, -): - """ - Return a point cloud that aggregates multiple sweeps. - As every sweep is in a different coordinate frame, we need to map the coordinates to a single reference frame. - As every sweep has a different timestamp, we need to account for that in the transformations and timestamps. - :param nusc: A NuScenes instance. - :param sample_rec: The current sample. - :param chan: The radar channel from which we track back n sweeps to aggregate the point cloud. - :param ref_chan: The reference channel of the current sample_rec that the point clouds are mapped to. - :param nsweeps: Number of sweeps to aggregated. - :param min_distance: Distance below which points are discarded. - :return: (all_pc, all_times). The aggregated point cloud and timestamps. - """ - - # Init - # Aggregate current and previous sweeps. - current_sd_rec = nusc.get("sample_data", sample_data_token) - current_pc = LidarPointCloud.from_file( - osp.join(nusc.dataroot, current_sd_rec["filename"]) - ) - - return current_pc - - -def visualize_sample_data( - nusc: NuScenes, - sample_data_token: str, - pred_boxes: EvalBoxes, - nsweeps: int = 1, - conf_th: float = 0.15, - eval_range: float = 50, - verbose: bool = True, - savepath: str = None, -) -> None: - """ - Visualizes a sample from BEV with annotations and detection results. - :param nusc: NuScenes object. - :param sample_token: The nuScenes sample token. - :param gt_boxes: Ground truth boxes grouped by sample. - :param pred_boxes: Prediction grouped by sample. - :param nsweeps: Number of sweeps used for lidar visualization. - :param conf_th: The confidence threshold used to filter negatives. - :param eval_range: Range in meters beyond which boxes are ignored. - :param verbose: Whether to print to stdout. - :param savepath: If given, saves the the rendering here instead of displaying. 
- """ - # Retrieve sensor & pose records. - - sd_record = nusc.get("sample_data", sample_data_token) - cs_record = nusc.get("calibrated_sensor", sd_record["calibrated_sensor_token"]) - pose_record = nusc.get("ego_pose", sd_record["ego_pose_token"]) - - # Get boxes. - boxes_est_global = pred_boxes[sample_data_token] - - # Map EST boxes to lidar. - boxes_est = boxes_to_sensor(boxes_est_global, pose_record, cs_record) - - # Add scores to EST boxes. - for box_est, box_est_global in zip(boxes_est, boxes_est_global): - box_est.score = box_est_global.detection_score - - # Get point cloud in lidar frame. - pc = from_file_multisweep(nusc, sample_data_token) - # Init axes. - _, ax = plt.subplots(1, 1, figsize=(9, 9)) - - # Show point cloud. - points = view_points(pc.points[:3, :], np.eye(4), normalize=False) - dists = np.sqrt(np.sum(pc.points[:2, :] ** 2, axis=0)) - colors = np.minimum(1, dists / eval_range) - ax.scatter(points[0, :], points[1, :], c=colors, s=0.2) - - # Show ego vehicle. - ax.plot(0, 0, "x", color="black") - - # Show EST boxes. - for box in boxes_est: - # Show only predictions with a high score. - assert not np.isnan(box.score), "Error: Box score cannot be NaN!" - if box.score >= conf_th: - box.render(ax, view=np.eye(4), colors=("b", "b", "b"), linewidth=1) - - # Limit visible range. - axes_limit = eval_range + 3 - ax.set_xlim(-axes_limit, axes_limit) - ax.set_ylim(-axes_limit, axes_limit) - - # Show / save plot. - if verbose: - print("Rendering sample token %s" % sample_data_token) - plt.title(sample_data_token) - if savepath is not None: - plt.savefig(savepath) - plt.close() - else: - plt.show() - - -nusc = nuscenes.NuScenes( - version="v1.0-trainval", dataroot="/data/Datasets/nuScenes", verbose=True -) - -with open( - "/data/NUSC_SECOND__20190531-193048/results_2/9088db17416043e5880a53178bfa461c.json" -) as f: - nusc_annos = json.load(f) - -print("Plot some examples") -results = nusc_annos["results"] - -pred_boxes = defaultdict(list) -for sample_data_token, boxes in results.items(): - pred_boxes[sample_data_token].extend( - [ - EvalBox( - sample_token=box["sample_token"], - translation=tuple(box["translation"]), - size=tuple(box["size"]), - rotation=tuple(box["rotation"]), - velocity=tuple(box["velocity"]), - detection_name=box["detection_name"], - attribute_name=box["attribute_name"], - ego_dist=0.0 if "ego_dist" not in box else float(box["ego_dist"]), - detection_score=-1.0 - if "detection_score" not in box - else float(box["detection_score"]), - num_pts=-1 if "num_pts" not in box else int(box["num_pts"]), - ) - for box in boxes - ] - ) - - -def add_center_dist(nusc, eval_boxes: EvalBoxes): - """ Adds the cylindrical (xy) center distance from ego vehicle to each box. """ - - for sample_token in eval_boxes.keys(): - sd_record = nusc.get("sample_data", sample_token) - pose_record = nusc.get("ego_pose", sd_record["ego_pose_token"]) - - for box in eval_boxes[sample_token]: - # Both boxes and ego pose are given in global coord system, so distance can be calculated directly. 
- diff = np.array(pose_record["translation"][:2]) - np.array( - box.translation[:2] - ) - box.ego_dist = np.sqrt(np.sum(diff ** 2)) - - return eval_boxes - - -pred_boxes = add_center_dist(nusc, pred_boxes) - -tokens = list(pred_boxes.keys()) - -scene_token = "9088db17416043e5880a53178bfa461c" -scene = nusc.get("scene", scene_token) - -token2id = {} - -frame_id = 1 -first_sample = nusc.get("sample", scene["first_sample_token"]) -first_sample_data = nusc.get("sample_data", first_sample["data"]["LIDAR_TOP"]) -token2id[first_sample_data["token"]] = frame_id - -nxt = first_sample_data["next"] -while nxt != "": - frame_id += 1 - token2id[nxt] = frame_id - nxt = nusc.get("sample_data", nxt)["next"] - -# random.shuffle(pred_boxes.sample_tokens) -sample_data_tokens = list(token2id.keys()) - -for sample_data_token in sample_data_tokens: - visualize_sample_data( - nusc, - sample_data_token, - pred_boxes, - eval_range=50, - savepath=os.path.join(".", "{}.png".format(token2id[sample_data_token])), - ) diff --git a/det3d/datasets/pipelines/__init__.py b/det3d/datasets/pipelines/__init__.py index 55936d3..4cef197 100644 --- a/det3d/datasets/pipelines/__init__.py +++ b/det3d/datasets/pipelines/__init__.py @@ -3,19 +3,8 @@ # from .loading import LoadAnnotations, LoadImageFromFile, LoadProposals from .loading import * -from .test_aug import MultiScaleFlipAug -from .transforms import ( - Expand, - MinIoURandomCrop, - Normalize, - Pad, - PhotoMetricDistortion, - RandomCrop, - RandomFlip, - Resize, - SegResizeFlipPadRescale, -) -from .preprocess import Preprocess, Voxelization, AssignTarget +from .test_aug import DoubleFlip +from .preprocess import Preprocess, Voxelization __all__ = [ "Compose", @@ -28,15 +17,6 @@ "LoadImageAnnotations", "LoadImageFromFile", "LoadProposals", - "MultiScaleFlipAug", - "Resize", - "RandomFlip", - "Pad", - "RandomCrop", - "Normalize", - "SegResizeFlipPadRescale", - "MinIoURandomCrop", - "Expand", "PhotoMetricDistortion", "Preprocess", "Voxelization", diff --git a/det3d/datasets/pipelines/formating.py b/det3d/datasets/pipelines/formating.py index 5e1016c..49523ba 100644 --- a/det3d/datasets/pipelines/formating.py +++ b/det3d/datasets/pipelines/formating.py @@ -31,126 +31,54 @@ def __call__(self, res, info): coordinates=voxels["coordinates"] ) - if "anchors" in res["lidar"]["targets"]: - anchors = res["lidar"]["targets"]["anchors"] - data_bundle.update(dict(anchors=anchors)) - - if res["mode"] == "val": + if res["mode"] == "train": + data_bundle.update(res["lidar"]["targets"]) + elif res["mode"] == "val": data_bundle.update(dict(metadata=meta, )) - calib = res.get("calib", None) - if calib: - data_bundle["calib"] = calib - - if res["mode"] != "test": - annos = res["lidar"]["annotations"] - data_bundle.update(annos=annos, ) + if self.double_flip: + # y axis + yflip_points = res["lidar"]["yflip_points"] + yflip_voxels = res["lidar"]["yflip_voxels"] + yflip_data_bundle = dict( + metadata=meta, + points=yflip_points, + voxels=yflip_voxels["voxels"], + shape=yflip_voxels["shape"], + num_points=yflip_voxels["num_points"], + num_voxels=yflip_voxels["num_voxels"], + coordinates=yflip_voxels["coordinates"], + ) - if res["mode"] == "train": - # ground_plane = res["lidar"].get("ground_plane", None) - #if ground_plane: - # data_bundle["ground_plane"] = ground_plane + # x axis + xflip_points = res["lidar"]["xflip_points"] + xflip_voxels = res["lidar"]["xflip_voxels"] + xflip_data_bundle = dict( + metadata=meta, + points=xflip_points, + voxels=xflip_voxels["voxels"], + 
shape=xflip_voxels["shape"], + num_points=xflip_voxels["num_points"], + num_voxels=xflip_voxels["num_voxels"], + coordinates=xflip_voxels["coordinates"], + ) + # double axis flip + double_flip_points = res["lidar"]["double_flip_points"] + double_flip_voxels = res["lidar"]["double_flip_voxels"] + double_flip_data_bundle = dict( + metadata=meta, + points=double_flip_points, + voxels=double_flip_voxels["voxels"], + shape=double_flip_voxels["shape"], + num_points=double_flip_voxels["num_points"], + num_voxels=double_flip_voxels["num_voxels"], + coordinates=double_flip_voxels["coordinates"], + ) - if "reg_targets" in res["lidar"]["targets"]: # anchor based - labels = res["lidar"]["targets"]["labels"] - reg_targets = res["lidar"]["targets"]["reg_targets"] - reg_weights = res["lidar"]["targets"]["reg_weights"] + return [data_bundle, yflip_data_bundle, xflip_data_bundle, double_flip_data_bundle], info - data_bundle.update( - dict(labels=labels, reg_targets=reg_targets, reg_weights=reg_weights) - ) - else: # anchor free - data_bundle.update(res["lidar"]["targets"]) - - elif self.double_flip: - # y axis - yflip_points = res["lidar"]["yflip_points"] - yflip_voxels = res["lidar"]["yflip_voxels"] - yflip_data_bundle = dict( - metadata=meta, - points=yflip_points, - voxels=yflip_voxels["voxels"], - shape=yflip_voxels["shape"], - num_points=yflip_voxels["num_points"], - num_voxels=yflip_voxels["num_voxels"], - coordinates=yflip_voxels["coordinates"], - annos=annos, - ) - if calib: - yflip_data_bundle["calib"] = calib - - # x axis - xflip_points = res["lidar"]["xflip_points"] - xflip_voxels = res["lidar"]["xflip_voxels"] - xflip_data_bundle = dict( - metadata=meta, - points=xflip_points, - voxels=xflip_voxels["voxels"], - shape=xflip_voxels["shape"], - num_points=xflip_voxels["num_points"], - num_voxels=xflip_voxels["num_voxels"], - coordinates=xflip_voxels["coordinates"], - annos=annos, - ) - if calib: - xflip_data_bundle["calib"] = calib - - # double axis flip - double_flip_points = res["lidar"]["double_flip_points"] - double_flip_voxels = res["lidar"]["double_flip_voxels"] - double_flip_data_bundle = dict( - metadata=meta, - points=double_flip_points, - voxels=double_flip_voxels["voxels"], - shape=double_flip_voxels["shape"], - num_points=double_flip_voxels["num_points"], - num_voxels=double_flip_voxels["num_voxels"], - coordinates=double_flip_voxels["coordinates"], - annos=annos, - ) - if calib: - double_flip_data_bundle["calib"] = calib - - return [data_bundle, yflip_data_bundle, xflip_data_bundle, double_flip_data_bundle], info return data_bundle, info -@PIPELINES.register_module -class PointCloudCollect(object): - def __init__( - self, - keys, - meta_keys=( - "filename", - "ori_shape", - "img_shape", - "pad_shape", - "scale_factor", - "flip", - "img_norm_cfg", - ), - ): - self.keys = keys - self.meta_keys = meta_keys - - def __call__(self, info): - - results = info["res"] - - data = {} - img_meta = {} - - for key in self.meta_keys: - img_meta[key] = results[key] - data["img_meta"] = DC(img_meta, cpu_only=True) - - for key in self.keys: - data[key] = results[key] - return data - - def __repr__(self): - return self.__class__.__name__ + "(keys={}, meta_keys={})".format( - self.keys, self.meta_keys - ) \ No newline at end of file diff --git a/det3d/datasets/pipelines/loading.py b/det3d/datasets/pipelines/loading.py index 4792474..5d0e0a6 100644 --- a/det3d/datasets/pipelines/loading.py +++ b/det3d/datasets/pipelines/loading.py @@ -10,7 +10,7 @@ from det3d import torchie from det3d.core import 
box_np_ops import pickle - +import os from ..registry import PIPELINES def _dict_select(dict_, inds): @@ -20,19 +20,14 @@ def _dict_select(dict_, inds): else: dict_[k] = v[inds] -def read_file(path, tries=2, num_point_feature=4): - points = None - try_cnt = 0 - while points is None and try_cnt < tries: - try_cnt += 1 - try: - points = np.fromfile(path, dtype=np.float32) - s = points.shape[0] - if s % 5 != 0: - points = points[: s - (s % 5)] - points = points.reshape(-1, 5)[:, :num_point_feature] - except Exception: - points = None +def read_file(path, tries=2, num_point_feature=4, painted=False): + if painted: + dir_path = os.path.join(*path.split('/')[:-2], 'painted_'+path.split('/')[-2]) + painted_path = os.path.join(dir_path, path.split('/')[-1]+'.npy') + points = np.load(painted_path) + points = points[:, [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]] # remove ring_index from features + else: + points = np.fromfile(path, dtype=np.float32).reshape(-1, 5)[:, :num_point_feature] return points @@ -49,9 +44,9 @@ def remove_close(points, radius: float) -> None: return points -def read_sweep(sweep): +def read_sweep(sweep, painted=False): min_distance = 1.0 - points_sweep = read_file(str(sweep["lidar_path"])).T + points_sweep = read_file(str(sweep["lidar_path"]), painted=painted).T points_sweep = remove_close(points_sweep, min_distance) nbr_points = points_sweep.shape[1] @@ -63,25 +58,43 @@ def read_sweep(sweep): return points_sweep.T, curr_times.T -def get_obj(path): - with open(path, 'rb') as f: - obj = pickle.load(f) - return obj +def read_single_waymo(obj): + points_xyz = obj["lidars"]["points_xyz"] + points_feature = obj["lidars"]["points_feature"] + + # normalize intensity + points_feature[:, 0] = np.tanh(points_feature[:, 0]) + + points = np.concatenate([points_xyz, points_feature], axis=-1) + + return points -def veh_pos_to_transform(veh_pos): - "convert vehicle pose to two transformation matrix" - rotation = veh_pos[:3, :3] - tran = veh_pos[:3, 3] +def read_single_waymo_sweep(sweep): + obj = get_obj(sweep['path']) - global_from_car = transform_matrix( - tran, Quaternion(matrix=rotation), inverse=False - ) + points_xyz = obj["lidars"]["points_xyz"] + points_feature = obj["lidars"]["points_feature"] - car_from_global = transform_matrix( - tran, Quaternion(matrix=rotation), inverse=True - ) + # normalize intensity + points_feature[:, 0] = np.tanh(points_feature[:, 0]) + points_sweep = np.concatenate([points_xyz, points_feature], axis=-1).T # 5 x N - return global_from_car, car_from_global + nbr_points = points_sweep.shape[1] + + if sweep["transform_matrix"] is not None: + points_sweep[:3, :] = sweep["transform_matrix"].dot( + np.vstack((points_sweep[:3, :], np.ones(nbr_points))) + )[:3, :] + + curr_times = sweep["time_lag"] * np.ones((1, points_sweep.shape[1])) + + return points_sweep.T, curr_times.T + + +def get_obj(path): + with open(path, 'rb') as f: + obj = pickle.load(f) + return obj @PIPELINES.register_module @@ -100,20 +113,20 @@ def __call__(self, res, info): nsweeps = res["lidar"]["nsweeps"] lidar_path = Path(info["lidar_path"]) - points = read_file(str(lidar_path)) + points = read_file(str(lidar_path), painted=res["painted"]) sweep_points_list = [points] sweep_times_list = [np.zeros((points.shape[0], 1))] - assert (nsweeps - 1) <= len( + assert (nsweeps - 1) == len( info["sweeps"] - ), "nsweeps {} should not greater than list length {}.".format( + ), "nsweeps {} should equal to list length {}.".format( nsweeps, len(info["sweeps"]) ) for i in 
np.random.choice(len(info["sweeps"]), nsweeps - 1, replace=False): sweep = info["sweeps"][i] - points_sweep, times_sweep = read_sweep(sweep) + points_sweep, times_sweep = read_sweep(sweep, painted=res["painted"]) sweep_points_list.append(points_sweep) sweep_times_list.append(times_sweep) @@ -126,31 +139,33 @@ def __call__(self, res, info): elif self.type == "WaymoDataset": path = info['path'] + nsweeps = res["lidar"]["nsweeps"] obj = get_obj(path) + points = read_single_waymo(obj) + res["lidar"]["points"] = points - points_xyz = obj["lidars"]["points_xyz"] - points_feature = obj["lidars"]["points_feature"] + if nsweeps > 1: + sweep_points_list = [points] + sweep_times_list = [np.zeros((points.shape[0], 1))] - # normalize intensity - points_feature[:, 0] = np.tanh(points_feature[:, 0]) + assert (nsweeps - 1) == len( + info["sweeps"] + ), "nsweeps {} should be equal to the list length {}.".format( + nsweeps, len(info["sweeps"]) + ) - res["lidar"]["points"] = np.concatenate([points_xyz, points_feature], axis=-1) + for i in range(nsweeps - 1): + sweep = info["sweeps"][i] + points_sweep, times_sweep = read_single_waymo_sweep(sweep) + sweep_points_list.append(points_sweep) + sweep_times_list.append(times_sweep) - # read boxes - TYPE_LIST = ['UNKNOWN', 'VEHICLE', 'PEDESTRIAN', 'SIGN', 'CYCLIST'] - annos = obj['objects'] - num_points_in_gt = np.array([ann['num_points'] for ann in annos]) - gt_boxes = np.array([ann['box'] for ann in annos]).reshape(-1, 7) - if len(gt_boxes) != 0: - gt_boxes[:, -1] = -np.pi / 2 - gt_boxes[:, -1] - - gt_names = np.array([TYPE_LIST[ann['label']] for ann in annos]) - mask_not_zero = (num_points_in_gt > 0).reshape(-1) + points = np.concatenate(sweep_points_list, axis=0) + times = np.concatenate(sweep_times_list, axis=0).astype(points.dtype) - res["lidar"]["annotations"] = { - "boxes": gt_boxes[mask_not_zero, :].astype(np.float32), - "names": gt_names[mask_not_zero], - } + res["lidar"]["points"] = points + res["lidar"]["times"] = times + res["lidar"]["combined"] = np.hstack([points, times]) else: raise NotImplementedError @@ -171,13 +186,12 @@ def __call__(self, res, info): "tokens": info["gt_boxes_token"], "velocities": info["gt_boxes_velocity"].astype(np.float32), } - elif res["type"] == 'WaymoDataset': - """res["lidar"]["annotations"] = { + elif res["type"] == 'WaymoDataset' and "gt_boxes" in info: + res["lidar"]["annotations"] = { "boxes": info["gt_boxes"].astype(np.float32), "names": info["gt_names"], - }""" - pass # already load in the above function + } else: - return NotImplementedError + pass return res, info diff --git a/det3d/datasets/pipelines/preprocess.py b/det3d/datasets/pipelines/preprocess.py index 5a7c764..0461642 100644 --- a/det3d/datasets/pipelines/preprocess.py +++ b/det3d/datasets/pipelines/preprocess.py @@ -1,20 +1,13 @@ import numpy as np -from det3d import torchie -from det3d.core.evaluation.bbox_overlaps import bbox_overlaps from det3d.core.bbox import box_np_ops from det3d.core.sampler import preprocess as prep -from det3d.builder import ( - build_dbsampler, - build_anchor_generator, - build_similarity_metric, - build_box_coder, -) -from det3d.core.input.voxel_generator import VoxelGenerator -from det3d.core.anchor.target_assigner import TargetAssigner -from det3d.core.utils.center_utils import draw_umich_gaussian, gaussian_radius -from collections import defaultdict +from det3d.builder import build_dbsampler +from det3d.core.input.voxel_generator import VoxelGenerator +from det3d.core.utils.center_utils import ( + draw_umich_gaussian, 
gaussian_radius +) from ..registry import PIPELINES @@ -31,56 +24,39 @@ def drop_arrays_by_name(gt_names, used_classes): inds = np.array(inds, dtype=np.int64) return inds -def keep_arrays_by_name(gt_names, used_classes): - inds = [i for i, x in enumerate(gt_names) if x in used_classes] - inds = np.array(inds, dtype=np.int64) - return inds - @PIPELINES.register_module class Preprocess(object): def __init__(self, cfg=None, **kwargs): - self.remove_environment = cfg.remove_environment self.shuffle_points = cfg.shuffle_points - self.remove_unknown = cfg.remove_unknown_examples self.min_points_in_gt = cfg.get("min_points_in_gt", -1) - self.add_rgb_to_points = cfg.get("add_rgb_to_points", False) - self.reference_detections = cfg.get("reference_detections", None) - self.remove_outside_points = cfg.get("remove_outside_points", False) - self.random_crop = cfg.get("random_crop", False) - - self.normalize_intensity = cfg.get("normalize_intensity", False) self.mode = cfg.mode if self.mode == "train": - self.gt_rotation_noise = cfg.gt_rot_noise - self.gt_loc_noise_std = cfg.gt_loc_noise self.global_rotation_noise = cfg.global_rot_noise self.global_scaling_noise = cfg.global_scale_noise - self.global_random_rot_range = cfg.global_rot_per_obj_range - self.global_translate_noise_std = cfg.global_trans_noise - self.gt_points_drop = (cfg.gt_drop_percentage,) - self.remove_points_after_sample = cfg.remove_points_after_sample self.class_names = cfg.class_names if cfg.db_sampler != None: self.db_sampler = build_dbsampler(cfg.db_sampler) else: self.db_sampler = None - self.flip_single = cfg.get("flip_single", False) self.npoints = cfg.get("npoints", -1) - self.random_select = cfg.get("random_select", False) - self.symmetry_intensity = cfg.get("symmetry_intensity", False) - self.kitti_double = cfg.get("kitti_double", False) + self.no_augmentation = cfg.get('no_augmentation', False) def __call__(self, res, info): res["mode"] = self.mode if res["type"] in ["WaymoDataset"]: - points = res["lidar"]["points"] + if "combined" in res["lidar"]: + points = res["lidar"]["combined"] + else: + points = res["lidar"]["points"] elif res["type"] in ["NuScenesDataset"]: points = res["lidar"]["combined"] + else: + raise NotImplementedError if self.mode == "train": anno_dict = res["lidar"]["annotations"] @@ -90,74 +66,14 @@ def __call__(self, res, info): "gt_names": np.array(anno_dict["names"]).reshape(-1), } - if "difficulty" not in anno_dict: - difficulty = np.zeros([anno_dict["boxes"].shape[0]], dtype=np.int32) - gt_dict["difficulty"] = difficulty - else: - gt_dict["difficulty"] = anno_dict["difficulty"] - - if "calib" in res: - calib = res["calib"] - else: - calib = None - - if self.add_rgb_to_points: - assert calib is not None and "image" in res - image_path = res["image"]["image_path"] - image = ( - imgio.imread(str(pathlib.Path(root_path) / image_path)).astype( - np.float32 - ) - / 255 - ) - points_rgb = box_np_ops.add_rgb_to_points( - points, image, calib["rect"], calib["Trv2c"], calib["P2"] - ) - points = np.concatenate([points, points_rgb], axis=1) - num_point_features += 3 - - if self.reference_detections is not None: - assert calib is not None and "image" in res - C, R, T = box_np_ops.projection_matrix_to_CRT_kitti(P2) - frustums = box_np_ops.get_frustum_v2(reference_detections, C) - frustums -= T - frustums = np.einsum("ij, akj->aki", np.linalg.inv(R), frustums) - frustums = box_np_ops.camera_to_lidar(frustums, rect, Trv2c) - surfaces = box_np_ops.corner_to_surfaces_3d_jit(frustums) - masks = 
points_in_convex_polygon_3d_jit(points, surfaces) - points = points[masks.any(-1)] - - if self.remove_outside_points: - assert calib is not None - image_shape = res["metadata"]["image_shape"] - points = box_np_ops.remove_outside_points( - points, calib["rect"], calib["Trv2c"], calib["P2"], image_shape - ) - if self.remove_environment is True and self.mode == "train": - selected = keep_arrays_by_name(gt_names, target_assigner.classes) - _dict_select(gt_dict, selected) - masks = box_np_ops.points_in_rbbox(points, gt_dict["gt_boxes"]) - points = points[masks.any(-1)] - - if self.mode == "train": + if self.mode == "train" and not self.no_augmentation: selected = drop_arrays_by_name( gt_dict["gt_names"], ["DontCare", "ignore", "UNKNOWN"] ) _dict_select(gt_dict, selected) - if self.remove_unknown: - remove_mask = gt_dict["difficulty"] == -1 - """ - gt_boxes_remove = gt_boxes[remove_mask] - gt_boxes_remove[:, 3:6] += 0.25 - points = prep.remove_points_in_boxes(points, gt_boxes_remove) - """ - keep_mask = np.logical_not(remove_mask) - _dict_select(gt_dict, keep_mask) - gt_dict.pop("difficulty") if self.min_points_in_gt > 0: - # points_count_rbbox takes 10ms with 10 sweeps nuscenes data point_counts = box_np_ops.points_count_rbbox( points, gt_dict["gt_boxes"] ) @@ -174,10 +90,10 @@ def __call__(self, res, info): gt_dict["gt_boxes"], gt_dict["gt_names"], res["metadata"]["num_point_features"], - self.random_crop, + False, gt_group_ids=None, - calib=calib, - road_planes=None # res["lidar"]["ground_plane"] + calib=None, + road_planes=None ) if sampled_dict is not None: @@ -195,21 +111,8 @@ def __call__(self, res, info): [gt_boxes_mask, sampled_gt_masks], axis=0 ) - if self.remove_points_after_sample: - masks = box_np_ops.points_in_rbbox(points, sampled_gt_boxes) - points = points[np.logical_not(masks.any(-1))] points = np.concatenate([sampled_points, points], axis=0) - prep.noise_per_object_v3_( - gt_dict["gt_boxes"], - points, - gt_boxes_mask, - rotation_perturb=self.gt_rotation_noise, - center_noise_std=self.gt_loc_noise_std, - global_random_rot_range=self.global_random_rot_range, - group_ids=None, - num_try=100, - ) _dict_select(gt_dict, gt_boxes_mask) @@ -219,16 +122,7 @@ def __call__(self, res, info): ) gt_dict["gt_classes"] = gt_classes - iskitti = res["type"] in ["KittiDataset"] - - if self.kitti_double: - assert False, "No more KITTI" - gt_dict["gt_boxes"], points = prep.random_flip_both(gt_dict["gt_boxes"], points, flip_coor=70.4/2) - elif self.flip_single or iskitti: - assert False, "nuscenes double flip is better" - gt_dict["gt_boxes"], points = prep.random_flip(gt_dict["gt_boxes"], points) - else: - gt_dict["gt_boxes"], points = prep.random_flip_both(gt_dict["gt_boxes"], points) + gt_dict["gt_boxes"], points = prep.random_flip_both(gt_dict["gt_boxes"], points) gt_dict["gt_boxes"], points = prep.global_rotation( gt_dict["gt_boxes"], points, rotation=self.global_rotation_noise @@ -236,46 +130,21 @@ def __call__(self, res, info): gt_dict["gt_boxes"], points = prep.global_scaling_v2( gt_dict["gt_boxes"], points, *self.global_scaling_noise ) + elif self.no_augmentation: + gt_boxes_mask = np.array( + [n in self.class_names for n in gt_dict["gt_names"]], dtype=np.bool_ + ) + _dict_select(gt_dict, gt_boxes_mask) - if self.shuffle_points: - # shuffle is a little slow. 
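With the KITTI-era options removed, the train-time path of `Preprocess` reduces to GT-database sampling followed by a random double-axis flip, a global rotation, and a global scaling (or, with `no_augmentation`, just a class filter). Below is a self-contained numpy sketch of the last two transforms; it is written from scratch rather than taken from `det3d.core.sampler.preprocess`, so the column conventions (yaw in the last box column, sizes in columns 3:6) and the rotation sign are assumptions:

import numpy as np

def global_rotation_sketch(points, boxes, rot_range=(-0.785, 0.785)):
    """Rotate the whole scene (points and box centers) around the z axis.

    Conceptual stand-in for prep.global_rotation; the det3d version also
    rotates box velocities, which is omitted here.
    """
    angle = np.random.uniform(*rot_range)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]], dtype=points.dtype)
    points[:, :2] = points[:, :2] @ rot.T
    boxes[:, :2] = boxes[:, :2] @ rot.T
    boxes[:, -1] += angle          # assumes yaw is stored in the last column
    return points, boxes

def global_scaling_sketch(points, boxes, scale_range=(0.95, 1.05)):
    """Scale coordinates and box sizes by one random factor (cf. global_scaling_v2)."""
    factor = np.random.uniform(*scale_range)
    points[:, :3] *= factor
    boxes[:, :6] *= factor         # center (0:3) and size (3:6) columns
    return points, boxes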
- np.random.shuffle(points) - - if self.mode == "train" and self.random_select: - if self.npoints < points.shape[0]: - pts_depth = points[:, 2] - pts_near_flag = pts_depth < 40.0 - far_idxs_choice = np.where(pts_near_flag == 0)[0] - near_idxs = np.where(pts_near_flag == 1)[0] - near_idxs_choice = np.random.choice( - near_idxs, self.npoints - len(far_idxs_choice), replace=False - ) - - choice = ( - np.concatenate((near_idxs_choice, far_idxs_choice), axis=0) - if len(far_idxs_choice) > 0 - else near_idxs_choice - ) - np.random.shuffle(choice) - else: - choice = np.arange(0, len(points), dtype=np.int32) - if self.npoints > len(points): - extra_choice = np.random.choice( - choice, self.npoints - len(points), replace=False - ) - choice = np.concatenate((choice, extra_choice), axis=0) - np.random.shuffle(choice) - - points = points[choice] + gt_classes = np.array( + [self.class_names.index(n) + 1 for n in gt_dict["gt_names"]], + dtype=np.int32, + ) + gt_dict["gt_classes"] = gt_classes - if self.symmetry_intensity: - points[:, -1] -= 0.5 # translate intensity to [-0.5, 0.5] - # points[:, -1] *= 2 - if self.normalize_intensity and res["type"] in ["NuScenesDataset"]: - # print(points[:20, 3]) - assert 0, "Velocity Accuracy drops 3 percent with normalization.." - points[:, 3] /= 255 + if self.shuffle_points: + np.random.shuffle(points) res["lidar"]["points"] = points @@ -292,7 +161,7 @@ def __init__(self, **kwargs): self.range = cfg.range self.voxel_size = cfg.voxel_size self.max_points_in_voxel = cfg.max_points_in_voxel - self.max_voxel_num = cfg.max_voxel_num + self.max_voxel_num = [cfg.max_voxel_num, cfg.max_voxel_num] if isinstance(cfg.max_voxel_num, int) else cfg.max_voxel_num self.double_flip = cfg.get('double_flip', False) @@ -300,17 +169,13 @@ def __init__(self, **kwargs): voxel_size=self.voxel_size, point_cloud_range=self.range, max_num_points=self.max_points_in_voxel, - max_voxels=self.max_voxel_num, + max_voxels=self.max_voxel_num[0], ) def __call__(self, res, info): - # [0, -40, -3, 70.4, 40, 1] voxel_size = self.voxel_generator.voxel_size pc_range = self.voxel_generator.point_cloud_range grid_size = self.voxel_generator.grid_size - # [352, 400] - - double_flip = self.double_flip and (res["mode"] != 'train') if res["mode"] == "train": gt_dict = res["lidar"]["annotations"] @@ -319,10 +184,12 @@ def __call__(self, res, info): _dict_select(gt_dict, mask) res["lidar"]["annotations"] = gt_dict + max_voxels = self.max_voxel_num[0] + else: + max_voxels = self.max_voxel_num[1] - # points = points[:int(points.shape[0] * 0.1), :] voxels, coordinates, num_points = self.voxel_generator.generate( - res["lidar"]["points"] + res["lidar"]["points"], max_voxels=max_voxels ) num_voxels = np.array([voxels.shape[0]], dtype=np.int64) @@ -336,6 +203,8 @@ def __call__(self, res, info): size=voxel_size ) + double_flip = self.double_flip and (res["mode"] != 'train') + if double_flip: flip_voxels, flip_coordinates, flip_num_points = self.voxel_generator.generate( res["lidar"]["yflip_points"] @@ -384,185 +253,18 @@ def __call__(self, res, info): return res, info +def flatten(box): + return np.concatenate(box, axis=0) -@PIPELINES.register_module -class AssignTarget(object): - def __init__(self, **kwargs): - assigner_cfg = kwargs["cfg"] - target_assigner_config = assigner_cfg.target_assigner - tasks = target_assigner_config.tasks - box_coder_cfg = assigner_cfg.box_coder - - anchor_cfg = target_assigner_config.anchor_generators - anchor_generators = [] - for a_cfg in anchor_cfg: - anchor_generator = 
build_anchor_generator(a_cfg) - anchor_generators.append(anchor_generator) - similarity_calc = build_similarity_metric( - target_assigner_config.region_similarity_calculator - ) - positive_fraction = target_assigner_config.sample_positive_fraction - if positive_fraction < 0: - positive_fraction = None - target_assigners = [] - flag = 0 - - box_coder = build_box_coder(box_coder_cfg) - - for task in tasks: - target_assigner = TargetAssigner( - box_coder=box_coder, - anchor_generators=anchor_generators[flag: flag + task.num_class], - region_similarity_calculator=similarity_calc, - positive_fraction=positive_fraction, - sample_size=target_assigner_config.sample_size, - ) - flag += task.num_class - target_assigners.append(target_assigner) - - self.target_assigners = target_assigners - self.out_size_factor = assigner_cfg.out_size_factor - self.anchor_area_threshold = target_assigner_config.pos_area_threshold - - def __call__(self, res, info): - - class_names_by_task = [t.classes for t in self.target_assigners] - - # Calculate output featuremap size - grid_size = res["lidar"]["voxels"]["shape"] - feature_map_size = grid_size[:2] // self.out_size_factor - feature_map_size = [*feature_map_size, 1][::-1] - - anchors_by_task = [ - t.generate_anchors(feature_map_size) for t in self.target_assigners - ] - anchor_dicts_by_task = [ - t.generate_anchors_dict(feature_map_size) for t in self.target_assigners - ] - reshaped_anchors_by_task = [ - t["anchors"].reshape([-1, t["anchors"].shape[-1]]) for t in anchors_by_task - ] - matched_by_task = [t["matched_thresholds"] for t in anchors_by_task] - unmatched_by_task = [t["unmatched_thresholds"] for t in anchors_by_task] - - bv_anchors_by_task = [ - box_np_ops.rbbox2d_to_near_bbox(anchors[:, [0, 1, 3, 4, -1]]) - for anchors in reshaped_anchors_by_task - ] - - anchor_caches_by_task = dict( - anchors=reshaped_anchors_by_task, - anchors_bv=bv_anchors_by_task, - matched_thresholds=matched_by_task, - unmatched_thresholds=unmatched_by_task, - anchors_dict=anchor_dicts_by_task, - ) - - if res["mode"] == "train": - gt_dict = res["lidar"]["annotations"] - - task_masks = [] - flag = 0 - for class_name in class_names_by_task: - task_masks.append( - [ - np.where( - gt_dict["gt_classes"] == class_name.index(i) + 1 + flag - ) - for i in class_name - ] - ) - flag += len(class_name) - - task_boxes = [] - task_classes = [] - task_names = [] - flag2 = 0 - for idx, mask in enumerate(task_masks): - task_box = [] - task_class = [] - task_name = [] - for m in mask: - task_box.append(gt_dict["gt_boxes"][m]) - task_class.append(gt_dict["gt_classes"][m] - flag2) - task_name.append(gt_dict["gt_names"][m]) - task_boxes.append(np.concatenate(task_box, axis=0)) - task_classes.append(np.concatenate(task_class)) - task_names.append(np.concatenate(task_name)) - flag2 += len(mask) +def merge_multi_group_label(gt_classes, num_classes_by_task): + num_task = len(gt_classes) + flag = 0 - for task_box in task_boxes: - # limit rad to [-pi, pi] - task_box[:, -1] = box_np_ops.limit_period( - task_box[:, -1], offset=0.5, period=np.pi * 2 - ) - - # print(gt_dict.keys()) - gt_dict["gt_classes"] = task_classes - gt_dict["gt_names"] = task_names - gt_dict["gt_boxes"] = task_boxes - - res["lidar"]["annotations"] = gt_dict - - anchorss = anchor_caches_by_task["anchors"] - anchors_bvs = anchor_caches_by_task["anchors_bv"] - anchors_dicts = anchor_caches_by_task["anchors_dict"] - - example = {} - example["anchors"] = anchorss - - if self.anchor_area_threshold >= 0: - example["anchors_mask"] = [] - for idx, 
anchors_bv in enumerate(anchors_bvs): - anchors_mask = None - # slow with high resolution. recommend disable this forever. - coors = coordinates - dense_voxel_map = box_np_ops.sparse_sum_for_anchors_mask( - coors, tuple(grid_size[::-1][1:]) - ) - dense_voxel_map = dense_voxel_map.cumsum(0) - dense_voxel_map = dense_voxel_map.cumsum(1) - anchors_area = box_np_ops.fused_get_anchors_area( - dense_voxel_map, anchors_bv, voxel_size, pc_range, grid_size - ) - anchors_mask = anchors_area > anchor_area_threshold - example["anchors_mask"].append(anchors_mask) - - if res["mode"] == "train": - targets_dicts = [] - for idx, target_assigner in enumerate(self.target_assigners): - if "anchors_mask" in example: - anchors_mask = example["anchors_mask"][idx] - else: - anchors_mask = None - targets_dict = target_assigner.assign_v2( - anchors_dicts[idx], - gt_dict["gt_boxes"][idx], - anchors_mask, - gt_classes=gt_dict["gt_classes"][idx], - gt_names=gt_dict["gt_names"][idx], - ) - targets_dicts.append(targets_dict) - - example.update( - { - "labels": [ - targets_dict["labels"] for targets_dict in targets_dicts - ], - "reg_targets": [ - targets_dict["bbox_targets"] for targets_dict in targets_dicts - ], - "reg_weights": [ - targets_dict["bbox_outside_weights"] - for targets_dict in targets_dicts - ], - } - ) - - res["lidar"]["targets"] = example - - return res, info + for i in range(num_task): + gt_classes[i] += flag + flag += num_classes_by_task[i] + return flatten(gt_classes) @PIPELINES.register_module class AssignLabel(object): @@ -571,21 +273,17 @@ def __init__(self, **kwargs): assigner_cfg = kwargs["cfg"] self.out_size_factor = assigner_cfg.out_size_factor self.tasks = assigner_cfg.target_assigner.tasks - self.dense_reg = assigner_cfg.dense_reg self.gaussian_overlap = assigner_cfg.gaussian_overlap self._max_objs = assigner_cfg.max_objs self._min_radius = assigner_cfg.min_radius - self.no_log = assigner_cfg.get("no_log", False) def __call__(self, res, info): - max_objs = self._max_objs * self.dense_reg + max_objs = self._max_objs class_names_by_task = [t.class_names for t in self.tasks] - - - dxy = [(0, 0)] + num_classes_by_task = [t.num_class for t in self.tasks] # Calculate output featuremap size - grid_size = res["lidar"]["voxels"]["shape"] # 448 x 512 + grid_size = res["lidar"]["voxels"]["shape"] pc_range = res["lidar"]["voxels"]["range"] voxel_size = res["lidar"]["voxels"]["size"] @@ -599,7 +297,6 @@ def __call__(self, res, info): task_masks = [] flag = 0 for class_name in class_names_by_task: - # print("classes: ", gt_dict["gt_classes"], "name", class_name) task_masks.append( [ np.where( @@ -652,14 +349,13 @@ def __call__(self, res, info): # [reg, hei, dim, vx, vy, rots, rotc] anno_box = np.zeros((max_objs, 10), dtype=np.float32) elif res['type'] == 'WaymoDataset': - anno_box = np.zeros((max_objs, 8), dtype=np.float32) + anno_box = np.zeros((max_objs, 10), dtype=np.float32) else: raise NotImplementedError("Only Support nuScene for Now!") ind = np.zeros((max_objs), dtype=np.int64) mask = np.zeros((max_objs), dtype=np.uint8) cat = np.zeros((max_objs), dtype=np.int64) - direction = np.zeros((max_objs), dtype=np.int64) num_objs = min(gt_dict['gt_boxes'][idx].shape[0], max_objs) @@ -693,11 +389,6 @@ def __call__(self, res, info): new_idx = k x, y = ct_int[0], ct_int[1] - if not (y * feature_map_size[0] + x < feature_map_size[0] * feature_map_size[1]): - # a double check, should never happen - print(x, y, y * feature_map_size[0] + x) - assert False - cat[new_idx] = cls_id ind[new_idx] = y * 
feature_map_size[0] + x mask[new_idx] = 1 @@ -705,22 +396,17 @@ def __call__(self, res, info): if res['type'] == 'NuScenesDataset': vx, vy = gt_dict['gt_boxes'][idx][k][6:8] rot = gt_dict['gt_boxes'][idx][k][8] - if not self.no_log: - anno_box[new_idx] = np.concatenate( - (ct - (x, y), z, np.log(gt_dict['gt_boxes'][idx][k][3:6]), - np.array(vx), np.array(vy), np.sin(rot), np.cos(rot)), axis=None) - else: - anno_box[new_idx] = np.concatenate( - (ct - (x, y), z, gt_dict['gt_boxes'][idx][k][3:6], - np.array(vx), np.array(vy), np.sin(rot), np.cos(rot)), axis=None) + anno_box[new_idx] = np.concatenate( + (ct - (x, y), z, np.log(gt_dict['gt_boxes'][idx][k][3:6]), + np.array(vx), np.array(vy), np.sin(rot), np.cos(rot)), axis=None) elif res['type'] == 'WaymoDataset': + vx, vy = gt_dict['gt_boxes'][idx][k][6:8] rot = gt_dict['gt_boxes'][idx][k][-1] anno_box[new_idx] = np.concatenate( - (ct - (x, y), z, np.log(gt_dict['gt_boxes'][idx][k][3:6]), - np.sin(rot), np.cos(rot)), axis=None) - + (ct - (x, y), z, np.log(gt_dict['gt_boxes'][idx][k][3:6]), + np.array(vx), np.array(vy), np.sin(rot), np.cos(rot)), axis=None) else: - raise NotImplementedError("Only Support KITTI and nuScene for Now!") + raise NotImplementedError("Only Support Waymo and nuScene for Now") hms.append(hm) anno_boxs.append(anno_box) @@ -728,6 +414,27 @@ def __call__(self, res, info): inds.append(ind) cats.append(cat) + # used for two stage code + boxes = flatten(gt_dict['gt_boxes']) + classes = merge_multi_group_label(gt_dict['gt_classes'], num_classes_by_task) + + if res["type"] == "NuScenesDataset": + gt_boxes_and_cls = np.zeros((max_objs, 10), dtype=np.float32) + elif res['type'] == "WaymoDataset": + gt_boxes_and_cls = np.zeros((max_objs, 10), dtype=np.float32) + else: + raise NotImplementedError() + + boxes_and_cls = np.concatenate((boxes, + classes.reshape(-1, 1).astype(np.float32)), axis=1) + num_obj = len(boxes_and_cls) + assert num_obj <= max_objs + # x, y, z, w, l, h, rotation_y, velocity_x, velocity_y, class_name + boxes_and_cls = boxes_and_cls[:, [0, 1, 2, 3, 4, 5, 8, 6, 7, 9]] + gt_boxes_and_cls[:num_obj] = boxes_and_cls + + example.update({'gt_boxes_and_cls': gt_boxes_and_cls}) + example.update({'hm': hms, 'anno_box': anno_boxs, 'ind': inds, 'mask': masks, 'cat': cats}) else: pass diff --git a/det3d/datasets/pipelines/test_aug.py b/det3d/datasets/pipelines/test_aug.py index 4191587..9a34bd0 100644 --- a/det3d/datasets/pipelines/test_aug.py +++ b/det3d/datasets/pipelines/test_aug.py @@ -4,51 +4,6 @@ from .compose import Compose -@PIPELINES.register_module -class MultiScaleFlipAug(object): - def __init__(self, transforms, img_scale, flip=False): - self.transforms = Compose(transforms) - self.img_scale = img_scale if isinstance(img_scale, list) else [img_scale] - assert torchie.is_list_of(self.img_scale, tuple) - self.flip = flip - - def __call__(self, results): - aug_data = [] - flip_aug = [False, True] if self.flip else [False] - for scale in self.img_scale: - for flip in flip_aug: - _results = results.copy() - _results["scale"] = scale - _results["flip"] = flip - data = self.transforms(_results) - aug_data.append(data) - # list of dict to dict of list - aug_data_dict = {key: [] for key in aug_data[0]} - for data in aug_data: - for key, val in data.items(): - aug_data_dict[key].append(val) - return aug_data_dict - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += "(transforms={}, img_scale={}, flip={})".format( - self.transforms, self.img_scale, self.flip - ) - return repr_str - 
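`DoubleFlip` takes over from the removed image-style `MultiScaleFlipAug`/`Flip` stages: at test time the pipeline keeps the original cloud plus y-flipped, x-flipped, and doubly-flipped copies, which the formating and voxelization steps above turn into four parallel input bundles. A rough stand-in for that stage, assuming each flip simply negates the corresponding coordinate (the sign convention follows the removed `Flip`, which negated `points[:, 1]`):

import numpy as np

class DoubleFlipSketch:
    """Test-time augmentation: produce y-, x- and xy-flipped copies of the cloud.

    Illustrative stand-in for the registered DoubleFlip pipeline stage; the
    real class writes the copies into res['lidar'] for later voxelization.
    """

    def __call__(self, res, info):
        points = res["lidar"]["points"]

        yflip = points.copy()
        yflip[:, 1] = -yflip[:, 1]        # mirror across the x axis

        xflip = points.copy()
        xflip[:, 0] = -xflip[:, 0]        # mirror across the y axis

        double = points.copy()
        double[:, :2] = -double[:, :2]    # both flips combined

        res["lidar"]["yflip_points"] = yflip
        res["lidar"]["xflip_points"] = xflip
        res["lidar"]["double_flip_points"] = double
        return res, info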
-@PIPELINES.register_module -class Flip(object): - def __init__(self): - pass - - def __call__(self, res, info): - points = res["lidar"]["points"].copy() - points[:, 1] = -points[:, 1] - - res["lidar"]['yflip_points'] = points - - return res, info - @PIPELINES.register_module class DoubleFlip(object): def __init__(self): diff --git a/det3d/datasets/pipelines/transforms.py b/det3d/datasets/pipelines/transforms.py deleted file mode 100644 index 4544051..0000000 --- a/det3d/datasets/pipelines/transforms.py +++ /dev/null @@ -1,641 +0,0 @@ -from det3d import torchie -import numpy as np -from imagecorruptions import corrupt - -from det3d.core.evaluation.bbox_overlaps import bbox_overlaps -from ..registry import PIPELINES - - -@PIPELINES.register_module -class Resize(object): - """Resize images & bbox & mask. - - This transform resizes the input image to some scale. Bboxes and masks are - then resized with the same scale factor. If the input dict contains the key - "scale", then the scale in the input dict is used, otherwise the specified - scale in the init method is used. - - `img_scale` can either be a tuple (single-scale) or a list of tuple - (multi-scale). There are 3 multiscale modes: - - `ratio_range` is not None: randomly sample a ratio from the ratio range - and multiply it with the image scale. - - `ratio_range` is None and `multiscale_mode` == "range": randomly sample a - scale from the a range. - - `ratio_range` is None and `multiscale_mode` == "value": randomly sample a - scale from multiple scales. - - Args: - img_scale (tuple or list[tuple]): Images scales for resizing. - multiscale_mode (str): Either "range" or "value". - ratio_range (tuple[float]): (min_ratio, max_ratio) - keep_ratio (bool): Whether to keep the aspect ratio when resizing the - image. 
- """ - - def __init__( - self, img_scale=None, multiscale_mode="range", ratio_range=None, keep_ratio=True - ): - if img_scale is None: - self.img_scale = None - else: - if isinstance(img_scale, list): - self.img_scale = img_scale - else: - self.img_scale = [img_scale] - assert torchie.is_list_of(self.img_scale, tuple) - - if ratio_range is not None: - # mode 1: given a scale and a range of image ratio - assert len(self.img_scale) == 1 - else: - # mode 2: given multiple scales or a range of scales - assert multiscale_mode in ["value", "range"] - - self.multiscale_mode = multiscale_mode - self.ratio_range = ratio_range - self.keep_ratio = keep_ratio - - @staticmethod - def random_select(img_scales): - assert torchie.is_list_of(img_scales, tuple) - scale_idx = np.random.randint(len(img_scales)) - img_scale = img_scales[scale_idx] - return img_scale, scale_idx - - @staticmethod - def random_sample(img_scales): - assert torchie.is_list_of(img_scales, tuple) and len(img_scales) == 2 - img_scale_long = [max(s) for s in img_scales] - img_scale_short = [min(s) for s in img_scales] - long_edge = np.random.randint(min(img_scale_long), max(img_scale_long) + 1) - short_edge = np.random.randint(min(img_scale_short), max(img_scale_short) + 1) - img_scale = (long_edge, short_edge) - return img_scale, None - - @staticmethod - def random_sample_ratio(img_scale, ratio_range): - assert isinstance(img_scale, tuple) and len(img_scale) == 2 - min_ratio, max_ratio = ratio_range - assert min_ratio <= max_ratio - ratio = np.random.random_sample() * (max_ratio - min_ratio) + min_ratio - scale = int(img_scale[0] * ratio), int(img_scale[1] * ratio) - return scale, None - - def _random_scale(self, results): - if self.ratio_range is not None: - scale, scale_idx = self.random_sample_ratio( - self.img_scale[0], self.ratio_range - ) - elif len(self.img_scale) == 1: - scale, scale_idx = self.img_scale[0], 0 - elif self.multiscale_mode == "range": - scale, scale_idx = self.random_sample(self.img_scale) - elif self.multiscale_mode == "value": - scale, scale_idx = self.random_select(self.img_scale) - else: - raise NotImplementedError - - results["scale"] = scale - results["scale_idx"] = scale_idx - - def _resize_img(self, results): - if self.keep_ratio: - img, scale_factor = torchie.imrescale( - results["img"], results["scale"], return_scale=True - ) - else: - img, w_scale, h_scale = torchie.imresize( - results["img"], results["scale"], return_scale=True - ) - scale_factor = np.array( - [w_scale, h_scale, w_scale, h_scale], dtype=np.float32 - ) - results["img"] = img - results["img_shape"] = img.shape - results["pad_shape"] = img.shape # in case that there is no padding - results["scale_factor"] = scale_factor - results["keep_ratio"] = self.keep_ratio - - def _resize_bboxes(self, results): - img_shape = results["img_shape"] - for key in results.get("bbox_fields", []): - bboxes = results[key] * results["scale_factor"] - bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1] - 1) - bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0] - 1) - results[key] = bboxes - - def _resize_masks(self, results): - for key in results.get("mask_fields", []): - if results[key] is None: - continue - if self.keep_ratio: - masks = [ - torchie.imrescale( - mask, results["scale_factor"], interpolation="nearest" - ) - for mask in results[key] - ] - else: - mask_size = (results["img_shape"][1], results["img_shape"][0]) - masks = [ - torchie.imresize(mask, mask_size, interpolation="nearest") - for mask in results[key] - ] - results[key] = 
masks - - def __call__(self, results): - if "scale" not in results: - self._random_scale(results) - self._resize_img(results) - self._resize_bboxes(results) - self._resize_masks(results) - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += ( - "(img_scale={}, multiscale_mode={}, ratio_range={}, " "keep_ratio={})" - ).format( - self.img_scale, self.multiscale_mode, self.ratio_range, self.keep_ratio - ) - return repr_str - - -@PIPELINES.register_module -class RandomFlip(object): - """Flip the image & bbox & mask. - - If the input dict contains the key "flip", then the flag will be used, - otherwise it will be randomly decided by a ratio specified in the init - method. - - Args: - flip_ratio (float, optional): The flipping probability. - """ - - def __init__(self, flip_ratio=None): - self.flip_ratio = flip_ratio - if flip_ratio is not None: - assert flip_ratio >= 0 and flip_ratio <= 1 - - def bbox_flip(self, bboxes, img_shape): - """Flip bboxes horizontally. - - Args: - bboxes(ndarray): shape (..., 4*k) - img_shape(tuple): (height, width) - """ - assert bboxes.shape[-1] % 4 == 0 - w = img_shape[1] - flipped = bboxes.copy() - flipped[..., 0::4] = w - bboxes[..., 2::4] - 1 - flipped[..., 2::4] = w - bboxes[..., 0::4] - 1 - return flipped - - def __call__(self, results): - if "flip" not in results: - flip = True if np.random.rand() < self.flip_ratio else False - results["flip"] = flip - if results["flip"]: - # flip image - results["img"] = torchie.imflip(results["img"]) - # flip bboxes - for key in results.get("bbox_fields", []): - results[key] = self.bbox_flip(results[key], results["img_shape"]) - # flip masks - for key in results.get("mask_fields", []): - results[key] = [mask[:, ::-1] for mask in results[key]] - return results - - def __repr__(self): - return self.__class__.__name__ + "(flip_ratio={})".format(self.flip_ratio) - - -@PIPELINES.register_module -class Pad(object): - """Pad the image & mask. - - There are two padding modes: (1) pad to a fixed size and (2) pad to the - minimum size that is divisible by some number. - - Args: - size (tuple, optional): Fixed padding size. - size_divisor (int, optional): The divisor of padded size. - pad_val (float, optional): Padding value, 0 by default. 
- """ - - def __init__(self, size=None, size_divisor=None, pad_val=0): - self.size = size - self.size_divisor = size_divisor - self.pad_val = pad_val - # only one of size and size_divisor should be valid - assert size is not None or size_divisor is not None - assert size is None or size_divisor is None - - def _pad_img(self, results): - if self.size is not None: - padded_img = torchie.impad(results["img"], self.size) - elif self.size_divisor is not None: - padded_img = torchie.impad_to_multiple( - results["img"], self.size_divisor, pad_val=self.pad_val - ) - results["img"] = padded_img - results["pad_shape"] = padded_img.shape - results["pad_fixed_size"] = self.size - results["pad_size_divisor"] = self.size_divisor - - def _pad_masks(self, results): - pad_shape = results["pad_shape"][:2] - for key in results.get("mask_fields", []): - padded_masks = [ - torchie.impad(mask, pad_shape, pad_val=self.pad_val) - for mask in results[key] - ] - results[key] = np.stack(padded_masks, axis=0) - - def __call__(self, results): - self._pad_img(results) - self._pad_masks(results) - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += "(size={}, size_divisor={}, pad_val={})".format( - self.size, self.size_divisor, self.pad_val - ) - return repr_str - - -@PIPELINES.register_module -class Normalize(object): - """Normalize the image. - - Args: - mean (sequence): Mean values of 3 channels. - std (sequence): Std values of 3 channels. - to_rgb (bool): Whether to convert the image from BGR to RGB, - default is true. - """ - - def __init__(self, mean, std, to_rgb=True): - self.mean = np.array(mean, dtype=np.float32) - self.std = np.array(std, dtype=np.float32) - self.to_rgb = to_rgb - - def __call__(self, results): - results["img"] = torchie.imnormalize( - results["img"], self.mean, self.std, self.to_rgb - ) - results["img_norm_cfg"] = dict(mean=self.mean, std=self.std, to_rgb=self.to_rgb) - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += "(mean={}, std={}, to_rgb={})".format( - self.mean, self.std, self.to_rgb - ) - return repr_str - - -@PIPELINES.register_module -class RandomCrop(object): - """Random crop the image & bboxes. - - Args: - crop_size (tuple): Expected size after cropping, (h, w). 
- """ - - def __init__(self, crop_size): - self.crop_size = crop_size - - def __call__(self, results): - img = results["img"] - margin_h = max(img.shape[0] - self.crop_size[0], 0) - margin_w = max(img.shape[1] - self.crop_size[1], 0) - offset_h = np.random.randint(0, margin_h + 1) - offset_w = np.random.randint(0, margin_w + 1) - crop_y1, crop_y2 = offset_h, offset_h + self.crop_size[0] - crop_x1, crop_x2 = offset_w, offset_w + self.crop_size[1] - - # crop the image - img = img[crop_y1:crop_y2, crop_x1:crop_x2, :] - img_shape = img.shape - results["img"] = img - results["img_shape"] = img_shape - - # crop bboxes accordingly and clip to the image boundary - for key in results.get("bbox_fields", []): - bbox_offset = np.array( - [offset_w, offset_h, offset_w, offset_h], dtype=np.float32 - ) - bboxes = results[key] - bbox_offset - bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1] - 1) - bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0] - 1) - results[key] = bboxes - - # filter out the gt bboxes that are completely cropped - if "gt_bboxes" in results: - gt_bboxes = results["gt_bboxes"] - valid_inds = (gt_bboxes[:, 2] > gt_bboxes[:, 0]) & ( - gt_bboxes[:, 3] > gt_bboxes[:, 1] - ) - # if no gt bbox remains after cropping, just skip this image - if not np.any(valid_inds): - return None - results["gt_bboxes"] = gt_bboxes[valid_inds, :] - if "gt_labels" in results: - results["gt_labels"] = results["gt_labels"][valid_inds] - - # filter and crop the masks - if "gt_masks" in results: - valid_gt_masks = [] - for i in valid_inds: - gt_mask = results["gt_masks"][i][crop_y1:crop_y2, crop_x1:crop_x2] - valid_gt_masks.append(gt_mask) - results["gt_masks"] = valid_gt_masks - - return results - - def __repr__(self): - return self.__class__.__name__ + "(crop_size={})".format(self.crop_size) - - -@PIPELINES.register_module -class SegResizeFlipPadRescale(object): - """A sequential transforms to semantic segmentation maps. - - The same pipeline as input images is applied to the semantic segmentation - map, and finally rescale it by some scale factor. The transforms include: - 1. resize - 2. flip - 3. pad - 4. rescale (so that the final size can be different from the image size) - - Args: - scale_factor (float): The scale factor of the final output. - """ - - def __init__(self, scale_factor=1): - self.scale_factor = scale_factor - - def __call__(self, results): - if results["keep_ratio"]: - gt_seg = torchie.imrescale( - results["gt_semantic_seg"], results["scale"], interpolation="nearest" - ) - else: - gt_seg = torchie.imresize( - results["gt_semantic_seg"], results["scale"], interpolation="nearest" - ) - if results["flip"]: - gt_seg = torchie.imflip(gt_seg) - if gt_seg.shape != results["pad_shape"]: - gt_seg = torchie.impad(gt_seg, results["pad_shape"][:2]) - if self.scale_factor != 1: - gt_seg = torchie.imrescale( - gt_seg, self.scale_factor, interpolation="nearest" - ) - results["gt_semantic_seg"] = gt_seg - return results - - def __repr__(self): - return self.__class__.__name__ + "(scale_factor={})".format(self.scale_factor) - - -@PIPELINES.register_module -class PhotoMetricDistortion(object): - """Apply photometric distortion to image sequentially, every transformation - is applied with a probability of 0.5. The position of random contrast is in - second or second to last. - - 1. random brightness - 2. random contrast (mode 0) - 3. convert color from BGR to HSV - 4. random saturation - 5. random hue - 6. convert color from HSV to BGR - 7. random contrast (mode 1) - 8. 
randomly swap channels - - Args: - brightness_delta (int): delta of brightness. - contrast_range (tuple): range of contrast. - saturation_range (tuple): range of saturation. - hue_delta (int): delta of hue. - """ - - def __init__( - self, - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18, - ): - self.brightness_delta = brightness_delta - self.contrast_lower, self.contrast_upper = contrast_range - self.saturation_lower, self.saturation_upper = saturation_range - self.hue_delta = hue_delta - - def __call__(self, results): - img = results["img"] - # random brightness - if random.randint(2): - delta = random.uniform(-self.brightness_delta, self.brightness_delta) - img += delta - - # mode == 0 --> do random contrast first - # mode == 1 --> do random contrast last - mode = random.randint(2) - if mode == 1: - if random.randint(2): - alpha = random.uniform(self.contrast_lower, self.contrast_upper) - img *= alpha - - # convert color from BGR to HSV - img = torchie.bgr2hsv(img) - - # random saturation - if random.randint(2): - img[..., 1] *= random.uniform(self.saturation_lower, self.saturation_upper) - - # random hue - if random.randint(2): - img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta) - img[..., 0][img[..., 0] > 360] -= 360 - img[..., 0][img[..., 0] < 0] += 360 - - # convert color from HSV to BGR - img = torchie.hsv2bgr(img) - - # random contrast - if mode == 0: - if random.randint(2): - alpha = random.uniform(self.contrast_lower, self.contrast_upper) - img *= alpha - - # randomly swap channels - if random.randint(2): - img = img[..., random.permutation(3)] - - results["img"] = img - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += ( - "(brightness_delta={}, contrast_range={}, " - "saturation_range={}, hue_delta={})" - ).format( - self.brightness_delta, - self.contrast_range, - self.saturation_range, - self.hue_delta, - ) - return repr_str - - -@PIPELINES.register_module -class Expand(object): - """Random expand the image & bboxes. - - Randomly place the original image on a canvas of 'ratio' x original image - size filled with mean values. The ratio is in the range of ratio_range. - - Args: - mean (tuple): mean value of dataset. - to_rgb (bool): if need to convert the order of mean to align with RGB. - ratio_range (tuple): range of expand ratio. 
- """ - - def __init__(self, mean=(0, 0, 0), to_rgb=True, ratio_range=(1, 4)): - if to_rgb: - self.mean = mean[::-1] - else: - self.mean = mean - self.min_ratio, self.max_ratio = ratio_range - - def __call__(self, results): - if random.randint(2): - return results - - img, boxes = [results[k] for k in ("img", "gt_bboxes")] - - h, w, c = img.shape - ratio = random.uniform(self.min_ratio, self.max_ratio) - expand_img = np.full((int(h * ratio), int(w * ratio), c), self.mean).astype( - img.dtype - ) - left = int(random.uniform(0, w * ratio - w)) - top = int(random.uniform(0, h * ratio - h)) - expand_img[top : top + h, left : left + w] = img - boxes = boxes + np.tile((left, top), 2).astype(boxes.dtype) - - results["img"] = expand_img - results["gt_bboxes"] = boxes - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += "(mean={}, to_rgb={}, ratio_range={})".format( - self.mean, self.to_rgb, self.ratio_range - ) - return repr_str - - -@PIPELINES.register_module -class MinIoURandomCrop(object): - """Random crop the image & bboxes, the cropped patches have minimum IoU - requirement with original image & bboxes, the IoU threshold is randomly - selected from min_ious. - - Args: - min_ious (tuple): minimum IoU threshold - crop_size (tuple): Expected size after cropping, (h, w). - """ - - def __init__(self, min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3): - # 1: return ori img - self.sample_mode = (1, *min_ious, 0) - self.min_crop_size = min_crop_size - - def __call__(self, results): - img, boxes, labels = [results[k] for k in ("img", "gt_bboxes", "gt_labels")] - h, w, c = img.shape - while True: - mode = random.choice(self.sample_mode) - if mode == 1: - return results - - min_iou = mode - for i in range(50): - new_w = random.uniform(self.min_crop_size * w, w) - new_h = random.uniform(self.min_crop_size * h, h) - - # h / w in [0.5, 2] - if new_h / new_w < 0.5 or new_h / new_w > 2: - continue - - left = random.uniform(w - new_w) - top = random.uniform(h - new_h) - - patch = np.array( - (int(left), int(top), int(left + new_w), int(top + new_h)) - ) - overlaps = bbox_overlaps( - patch.reshape(-1, 4), boxes.reshape(-1, 4) - ).reshape(-1) - if overlaps.min() < min_iou: - continue - - # center of boxes should inside the crop img - center = (boxes[:, :2] + boxes[:, 2:]) / 2 - mask = ( - (center[:, 0] > patch[0]) - * (center[:, 1] > patch[1]) - * (center[:, 0] < patch[2]) - * (center[:, 1] < patch[3]) - ) - if not mask.any(): - continue - boxes = boxes[mask] - labels = labels[mask] - - # adjust boxes - img = img[patch[1] : patch[3], patch[0] : patch[2]] - boxes[:, 2:] = boxes[:, 2:].clip(max=patch[2:]) - boxes[:, :2] = boxes[:, :2].clip(min=patch[:2]) - boxes -= np.tile(patch[:2], 2) - - results["img"] = img - results["gt_bboxes"] = boxes - results["gt_labels"] = labels - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += "(min_ious={}, min_crop_size={})".format( - self.min_ious, self.min_crop_size - ) - return repr_str - - -@PIPELINES.register_module -class Corrupt(object): - def __init__(self, corruption, severity=1): - self.corruption = corruption - self.severity = severity - - def __call__(self, results): - results["img"] = corrupt( - results["img"].astype(np.uint8), - corruption_name=self.corruption, - severity=self.severity, - ) - return results - - def __repr__(self): - repr_str = self.__class__.__name__ - repr_str += "(corruption={}, severity={})".format( - self.corruption, self.severity - ) - return repr_str diff --git 
a/det3d/datasets/utils/create_gt_database.py b/det3d/datasets/utils/create_gt_database.py index 0054117..02f4f28 100644 --- a/det3d/datasets/utils/create_gt_database.py +++ b/det3d/datasets/utils/create_gt_database.py @@ -1,13 +1,10 @@ import pickle from pathlib import Path - +import os import numpy as np from det3d.core import box_np_ops from det3d.datasets.dataset_factory import get_dataset -from det3d.torchie import Config - -from joblib import Parallel, delayed from tqdm import tqdm dataset_name_map = { @@ -24,10 +21,6 @@ def create_groundtruth_database( db_path=None, dbinfo_path=None, relative_path=True, - add_rgb=False, - lidar_only=False, - bev_only=False, - coors_range=None, **kwargs, ): pipeline = [ @@ -55,22 +48,21 @@ def create_groundtruth_database( root_path = Path(data_path) - if dataset_class_name == "NUSC": + if dataset_class_name in ["WAYMO", "NUSC"]: if db_path is None: db_path = root_path / f"gt_database_{nsweeps}sweeps_withvelo" if dbinfo_path is None: dbinfo_path = root_path / f"dbinfos_train_{nsweeps}sweeps_withvelo.pkl" else: - if db_path is None: - db_path = root_path / "gt_database" - if dbinfo_path is None: - dbinfo_path = root_path / "dbinfos_train.pkl" - if dataset_class_name in ["NUSC", "WAYMO"]: + raise NotImplementedError() + + if dataset_class_name == "NUSC": point_features = 5 + elif dataset_class_name == "WAYMO": + point_features = 5 if nsweeps == 1 else 6 else: raise NotImplementedError() - db_path.mkdir(parents=True, exist_ok=True) all_db_infos = {} @@ -83,7 +75,7 @@ def create_groundtruth_database( if "image_idx" in sensor_data["metadata"]: image_idx = sensor_data["metadata"]["image_idx"] - if dataset_class_name == "NUSC": + if nsweeps > 1: points = sensor_data["lidar"]["combined"] else: points = sensor_data["lidar"]["points"] @@ -91,6 +83,25 @@ def create_groundtruth_database( annos = sensor_data["lidar"]["annotations"] gt_boxes = annos["boxes"] names = annos["names"] + + if dataset_class_name == 'WAYMO': + # waymo dataset contains millions of objects and it is not possible to store + # all of them into a single folder + # we randomly sample a few objects for gt augmentation + # We keep all cyclist as they are rare + if index % 4 != 0: + mask = (names == 'VEHICLE') + mask = np.logical_not(mask) + names = names[mask] + gt_boxes = gt_boxes[mask] + + if index % 2 != 0: + mask = (names == 'PEDESTRIAN') + mask = np.logical_not(mask) + names = names[mask] + gt_boxes = gt_boxes[mask] + + group_dict = {} group_ids = np.full([gt_boxes.shape[0]], -1, dtype=np.int64) if "group_ids" in annos: @@ -108,7 +119,7 @@ def create_groundtruth_database( for i in range(num_obj): if (used_classes is None) or names[i] in used_classes: filename = f"{image_idx}_{names[i]}_{i}.bin" - filepath = db_path / filename + filepath = os.path.join(str(db_path), names[i], filename) gt_points = points[point_indices[:, i]] gt_points[:, :3] -= gt_boxes[i, :3] with open(filepath, "w") as f: @@ -120,7 +131,7 @@ def create_groundtruth_database( if (used_classes is None) or names[i] in used_classes: if relative_path: - db_dump_path = str(db_path.stem + "/" + filename) + db_dump_path = os.path.join(db_path.stem, names[i], filename) else: db_dump_path = str(filepath) diff --git a/det3d/datasets/utils/ground_plane_detection.py b/det3d/datasets/utils/ground_plane_detection.py deleted file mode 100644 index 40ae39a..0000000 --- a/det3d/datasets/utils/ground_plane_detection.py +++ /dev/null @@ -1,180 +0,0 @@ -# from open3d import * -import numpy as np -import numpy.linalg as la - -eps = 0.00001 - - 
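The Waymo branch of `create_groundtruth_database` above thins the database because storing every object is impractical: vehicles are kept only from every 4th frame, pedestrians from every 2nd, cyclists always, and each class is written to its own sub-directory. A small self-contained sketch of that per-frame filter (the class strings follow the Waymo labels used in the patch; the helper name is illustrative):

import numpy as np

def subsample_waymo_gt(index, names, gt_boxes):
    """Frame-index based thinning used when building the Waymo GT database:
    keep vehicles from 1 in 4 frames, pedestrians from 1 in 2, all cyclists."""
    names = np.asarray(names)
    keep = np.ones(len(names), dtype=bool)
    if index % 4 != 0:
        keep &= names != "VEHICLE"
    if index % 2 != 0:
        keep &= names != "PEDESTRIAN"
    return names[keep], gt_boxes[keep]

# example: frame 3 keeps only the cyclist out of the three objects
names, boxes = subsample_waymo_gt(
    3, ["VEHICLE", "PEDESTRIAN", "CYCLIST"], np.zeros((3, 7), dtype=np.float32)
)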
-def svd(A): - u, s, vh = la.svd(A) - S = np.zeros(A.shape) - S[: s.shape[0], : s.shape[0]] = np.diag(s) - return u, S, vh - - -def inverse_sigma(S): - inv_S = S.copy().transpose() - for i in range(min(S.shape)): - if abs(inv_S[i, i]) > eps: - inv_S[i, i] = 1.0 / inv_S[i, i] - return inv_S - - -def svd_solve(A, b): - U, S, Vt = svd(A) - inv_S = inverse_sigma(S) - svd_solution = Vt.transpose() @ inv_S @ U.transpose() @ b - - print("U:") - print(U) - print("Sigma:") - print(S) - print("V_transpose:") - print(Vt) - print("--------------") - print("SVD solution:") - print(svd_solution) - print("A multiplies SVD solution:") - print(A @ svd_solution) - - return svd_solution - - -def fit_plane_LSE(points): - # points: Nx4 homogeneous 3d points - # return: 1d array of four elements [a, b, c, d] of - # ax+by+cz+d = 0 - assert points.shape[0] >= 3 # at least 3 points needed - U, S, Vt = svd(points) - null_space = Vt[-1, :] - return null_space - - -def get_point_dist(points, plane): - # return: 1d array of size N (number of points) - dists = np.abs(points @ plane) / np.sqrt( - plane[0] ** 2 + plane[1] ** 2 + plane[2] ** 2 - ) - return dists - - -def fit_plane_LSE_RANSAC( - points, iters=1000, inlier_thresh=0.05, return_outlier_list=False -): - # points: Nx4 homogeneous 3d points - # return: - # plane: 1d array of four elements [a, b, c, d] of ax+by+cz+d = 0 - # inlier_list: 1d array of size N of inlier points - max_inlier_num = -1 - max_inlier_list = None - - N = points.shape[0] - assert N >= 3 - - for i in range(iters): - chose_id = np.random.choice(N, 3, replace=False) - chose_points = points[chose_id, :] - tmp_plane = fit_plane_LSE(chose_points) - - dists = get_point_dist(points, tmp_plane) - tmp_inlier_list = np.where(dists < inlier_thresh)[0] - tmp_inliers = points[tmp_inlier_list, :] - num_inliers = tmp_inliers.shape[0] - if num_inliers > max_inlier_num: - max_inlier_num = num_inliers - max_inlier_list = tmp_inlier_list - - final_points = points[max_inlier_list, :] - plane = fit_plane_LSE(final_points) - - fit_variance = np.var(get_point_dist(final_points, plane)) - # print('RANSAC fit variance: %f' % fit_variance) - # print(plane) - - dists = get_point_dist(points, plane) - - select_thresh = inlier_thresh * 1 - - inlier_list = np.where(dists < select_thresh)[0] - if not return_outlier_list: - return plane, inlier_list - else: - outlier_list = np.where(dists >= select_thresh)[0] - return plane, inlier_list, outlier_list - - -def display_inlier_outlier(cloud, ind): - inlier_cloud = select_down_sample(cloud, ind) - outlier_cloud = select_down_sample(cloud, ind, invert=True) - - print("Showing outliers (red) and inliers (gray): ") - outlier_cloud.paint_uniform_color([1, 0, 0]) - inlier_cloud.paint_uniform_color([0.8, 0.8, 0.8]) - draw_geometries([inlier_cloud, outlier_cloud]) - - -def create_pcd(points): - pcd = PointCloud() - pcd.points = Vector3dVector(points[:, :3]) - return pcd - - -if __name__ == "__main__": - - print("Load a ply point cloud, print it, and render it") - points = np.fromfile("path/to/point", dtype=np.float32).reshape(-1, 5) - points = np.concatenate((points[:, :3], np.ones((points.shape[0], 1))), axis=1) - - gp_height = -1.78 - - p_set_1 = points - p1, inlier_list1, outlier_list1 = fit_plane_LSE_RANSAC( - p_set_1, return_outlier_list=True - ) - p_set_2 = p_set_1[outlier_list1, :] - p2, inlier_list2, outlier_list2 = fit_plane_LSE_RANSAC( - p_set_2, return_outlier_list=True - ) - p_set_3 = p_set_2[outlier_list2, :] - p3, inlier_list3, outlier_list3 = fit_plane_LSE_RANSAC( - 
p_set_3, return_outlier_list=True - ) - p_set_4 = p_set_3[outlier_list3, :] - p4, inlier_list4, outlier_list4 = fit_plane_LSE_RANSAC( - p_set_4, return_outlier_list=True - ) - - ps = [p1, p2, p3, p4] - - for p in [p1]: - xx = points[:, 0] - yy = points[:, 1] - - zz = (-p[0] * xx - p[1] * yy - p[3]) / p[2] - - print(f"Current point cloud's gp height is: {np.mean(zz)}") - points_up = points[np.where(points[:, 2] >= zz)] - points_down = points[np.where(points[:, 2] < zz)] - print(f"Current point cloud's gp height is: {np.mean(points_down[:, 2])}") - ptup = create_pcd(points_up) - ptdn = create_pcd(points_down) - ptup.paint_uniform_color([0.5, 0.5, 0.5]) - draw_geometries([ptup, ptdn]) - - # print("Downsample the point cloud with a voxel of 0.02") - # voxel_down_pcd = voxel_down_sample(pcd, voxel_size = 0.02) - # draw_geometries([voxel_down_pcd]) - # - # print("Every 5th points are selected") - # uni_down_pcd = uniform_down_sample(pcd, every_k_points = 5) - # draw_geometries([uni_down_pcd]) - # - # print("Statistical oulier removal") - # cl,ind = statistical_outlier_removal(voxel_down_pcd, - # nb_neighbors=20, std_ratio=2.0) - # display_inlier_outlier(voxel_down_pcd, ind) - # - # print("Radius oulier removal") - # cl,ind = radius_outlier_removal(voxel_down_pcd, - # nb_points=16, radius=0.05) - # display_inlier_outlier(voxel_down_pcd, ind) diff --git a/det3d/datasets/utils/preprocess.py b/det3d/datasets/utils/preprocess.py deleted file mode 100644 index a95480a..0000000 --- a/det3d/datasets/utils/preprocess.py +++ /dev/null @@ -1,1125 +0,0 @@ -import pathlib -import time -from collections import defaultdict -import torch -import cv2 -import numpy as np - -from det3d.core import box_np_ops -from det3d.core import preprocess as prep -from det3d.core.geometry import points_in_convex_polygon_3d_jit -from det3d.datasets import kitti -import itertools - - -def prcnn_rpn_collate_batch(batch_list): - - example_merged = defaultdict(list) - for example in batch_list: - for k, v in example.items(): - example_merged[k].append(v) - - batch_size = len(batch_list) - ret = {} - - for key, elems in example_merged.items(): - if key in ["gt_boxes3d"]: - task_max_gts = [] - for task_id in range(len(elems[0])): - max_gt = 0 - for k in range(batch_size): - max_gt = max(max_gt, len(elems[k][task_id])) - task_max_gts.append(max_gt) - res = [] - for idx, max_gt in enumerate(task_max_gts): - batch_task_gt_boxes3d = np.zeros((batch_size, max_gt, 7)) - for i in range(batch_size): - batch_task_gt_boxes3d[i, : len(elems[i][idx]), :] = elems[i][idx] - res.append(batch_task_gt_boxes3d) - ret[key] = res - elif key == "metadata": - ret[key] = elems - elif key == "calib": - ret[key] = {} - for elem in elems: - for k1, v1 in elem.items(): - if k1 not in ret[key]: - ret[key][k1] = [v1] - else: - ret[key][k1].append(v1) - for k1, v1 in ret[key].items(): - ret[key][k1] = np.stack(v1, axis=0) - elif key in ["pts_input"]: - ret[key] = np.concatenate( - [elems[k][np.newaxis, ...] for k in range(batch_size)] - ) - elif key in ["rpn_cls_label", "rpn_reg_label"]: - ret[key] = [] - for task_id in range(len(elems[0])): - branch_out = np.concatenate( - [elems[k][task_id][np.newaxis, ...] 
for k in range(batch_size)] - ) - ret[key].append(branch_out) - else: - ret[key] = np.stack(elems, axis=0) - - return ret - - -type_map = { - "voxels": torch.float32, - "bev_map": torch.float32, - "anchors": torch.float32, - "reg_targets": torch.float32, - "reg_weights": torch.float32, - "coordinates": torch.int32, - "num_points": torch.int32, - "labels": torch.int32, - "points": torch.float32, - "anchors_mask": torch.uint8, - "calib": torch.float32, - "num_voxels": torch.int64, -} - - -def collate_sequence_batch(batch_list): - example_current_frame_merged = defaultdict(list) - for example in batch_list: - for k, v in example["current_frame"].items(): - example_current_frame_merged[k].append(v) - batch_size = len(batch_list) - ret_current_frame = {} - for key, elems in example_current_frame_merged.items(): - if key in ["voxels", "num_points", "num_gt", "voxel_labels"]: - ret_current_frame[key] = np.concatenate(elems, axis=0) - elif key in ["gt_boxes"]: - task_max_gts = [] - for task_id in range(len(elems[0])): - max_gt = 0 - for k in range(batch_size): - max_gt = max(max_gt, len(elems[k][task_id])) - task_max_gts.append(max_gt) - res = [] - for idx, max_gt in enumerate(task_max_gts): - batch_task_gt_boxes3d = np.zeros((batch_size, max_gt, 9)) - for i in range(batch_size): - batch_task_gt_boxes3d[i, : len(elems[i][idx]), :] = elems[i][idx] - res.append(batch_task_gt_boxes3d) - ret_current_frame[key] = res - elif key == "metadata": - ret_current_frame[key] = elems - elif key == "calib": - ret_current_frame[key] = {} - for elem in elems: - for k1, v1 in elem.items(): - if k1 not in ret[key]: - ret_current_frame[key][k1] = [v1] - else: - ret_current_frame[key][k1].append(v1) - for k1, v1 in ret_current_frame[key].items(): - ret_current_frame[key][k1] = np.stack(v1, axis=0) - elif key in ["coordinates", "points"]: - coors = [] - for i, coor in enumerate(elems): - coor_pad = np.pad( - coor, ((0, 0), (1, 0)), mode="constant", constant_values=i - ) - coors.append(coor_pad) - ret_current_frame[key] = np.concatenate(coors, axis=0) - elif key in ["anchors", "anchors_mask", "reg_targets", "reg_weights", "labels"]: - ret_current_frame[key] = defaultdict(list) - for elem in elems: - for idx, ele in enumerate(elem): - ret_current_frame[key][str(idx)].append(ele) - else: - ret_current_frame[key] = np.stack(elems, axis=0) - - example_keyframe_merged = defaultdict(list) - for example in batch_list: - for k, v in example["keyframe"].items(): - example_keyframe_merged[k].append(v) - batch_size = len(batch_list) - ret_keyframe = {} - for key, elems in example_keyframe_merged.items(): - if key in ["voxels", "num_points", "num_gt", "voxel_labels"]: - ret_keyframe[key] = np.concatenate(elems, axis=0) - elif key == "calib": - ret_keyframe[key] = {} - for elem in elems: - for k1, v1 in elem.items(): - if k1 not in ret[key]: - ret_keyframe[key][k1] = [v1] - else: - ret_keyframe[key][k1].append(v1) - for k1, v1 in ret_keyframe[key].items(): - ret_keyframe[key][k1] = np.stack(v1, axis=0) - elif key in ["coordinates", "points"]: - coors = [] - for i, coor in enumerate(elems): - coor_pad = np.pad( - coor, ((0, 0), (1, 0)), mode="constant", constant_values=i - ) - coors.append(coor_pad) - ret_keyframe[key] = np.concatenate(coors, axis=0) - else: - ret_keyframe[key] = np.stack(elems, axis=0) - - rets = {} - rets["current_frame"] = ret_current_frame - rets["keyframe"] = ret_keyframe - return rets - - -def collate_batch(batch_list): - example_merged = defaultdict(list) - for example in batch_list: - for k, v in 
example.items(): - example_merged[k].append(v) - batch_size = len(batch_list) - ret = {} - # voxel_nums_list = example_merged["num_voxels"] - # example_merged.pop("num_voxels") - for key, elems in example_merged.items(): - if key in ["voxels", "num_points", "num_gt", "voxel_labels", "ground_plane"]: - ret[key] = np.concatenate(elems, axis=0) - elif key in [ - "gt_boxes", - ]: - task_max_gts = [] - for task_id in range(len(elems[0])): - max_gt = 0 - for k in range(batch_size): - max_gt = max(max_gt, len(elems[k][task_id])) - task_max_gts.append(max_gt) - res = [] - for idx, max_gt in enumerate(task_max_gts): - batch_task_gt_boxes3d = np.zeros((batch_size, max_gt, 9)) - for i in range(batch_size): - batch_task_gt_boxes3d[i, : len(elems[i][idx]), :] = elems[i][idx] - res.append(batch_task_gt_boxes3d) - ret[key] = res - elif key == "metadata": - ret[key] = elems - elif key == "calib": - ret[key] = {} - for elem in elems: - for k1, v1 in elem.items(): - if k1 not in ret[key]: - ret[key][k1] = [v1] - else: - ret[key][k1].append(v1) - for k1, v1 in ret[key].items(): - ret[key][k1] = np.stack(v1, axis=0) - elif key in ["coordinates", "points"]: - coors = [] - for i, coor in enumerate(elems): - coor_pad = np.pad( - coor, ((0, 0), (1, 0)), mode="constant", constant_values=i - ) - coors.append(coor_pad) - ret[key] = np.concatenate(coors, axis=0) - elif key in ["anchors", "anchors_mask", "reg_targets", "reg_weights", "labels"]: - ret[key] = defaultdict(list) - for elem in elems: - for idx, ele in enumerate(elem): - ret[key][str(idx)].append(ele) - else: - ret[key] = np.stack(elems, axis=0) - - return ret - - -def collate_batch_kitti(batch_list): - example_merged = defaultdict(list) - for example in batch_list: - for k, v in example.items(): - example_merged[k].append(v) - batch_size = len(batch_list) - ret = {} - # voxel_nums_list = example_merged["num_voxels"] - # example_merged.pop("num_voxels") - for key, elems in example_merged.items(): - if key in ["voxels", "num_points", "num_gt", "voxel_labels"]: - ret[key] = np.concatenate(elems, axis=0) - elif key in [ - "gt_boxes", - ]: - task_max_gts = [] - for task_id in range(len(elems[0])): - max_gt = 0 - for k in range(batch_size): - max_gt = max(max_gt, len(elems[k][task_id])) - task_max_gts.append(max_gt) - res = [] - for idx, max_gt in enumerate(task_max_gts): - batch_task_gt_boxes3d = np.zeros((batch_size, max_gt, 7)) - for i in range(batch_size): - batch_task_gt_boxes3d[i, : len(elems[i][idx]), :] = elems[i][idx] - res.append(batch_task_gt_boxes3d) - ret[key] = res - elif key == "metadata": - ret[key] = elems - elif key == "calib": - ret[key] = {} - for elem in elems: - for k1, v1 in elem.items(): - if k1 not in ret[key]: - ret[key][k1] = [v1] - else: - ret[key][k1].append(v1) - for k1, v1 in ret[key].items(): - ret[key][k1] = np.stack(v1, axis=0) - elif key in ["coordinates", "points"]: - coors = [] - for i, coor in enumerate(elems): - coor_pad = np.pad( - coor, ((0, 0), (1, 0)), mode="constant", constant_values=i - ) - coors.append(coor_pad) - ret[key] = np.concatenate(coors, axis=0) - elif key in ["anchors", "anchors_mask", "reg_targets", "reg_weights", "labels"]: - ret[key] = defaultdict(list) - for elem in elems: - for idx, ele in enumerate(elem): - ret[key][str(idx)].append(ele) - else: - ret[key] = np.stack(elems, axis=0) - - return ret - - -def collate_batch_torch(batch_list): - example_merged = defaultdict(list) - for example in batch_list: - for k, v in example.items(): - example_merged[k].append(v) - batch_size = len(batch_list) - ret = 
{} - # voxel_nums_list = example_merged["num_voxels"] - # example_merged.pop("num_voxels") - for key, elems in example_merged.items(): - if key in ["voxels", "num_points", "num_gt", "voxel_labels"]: - ret[key] = torch.tensor(np.concatenate(elems, axis=0), dtype=type_map[key]) - elif key in [ - "gt_boxes", - ]: - task_max_gts = [] - for task_id in range(len(elems[0])): - max_gt = 0 - for k in range(batch_size): - max_gt = max(max_gt, len(elems[k][task_id])) - task_max_gts.append(max_gt) - res = [] - for idx, max_gt in enumerate(task_max_gts): - batch_task_gt_boxes3d = np.zeros((batch_size, max_gt, 9)) - for i in range(batch_size): - batch_task_gt_boxes3d[i, : len(elems[i][idx]), :] = elems[i][idx] - res.append(batch_task_gt_boxes3d) - ret[key] = res - elif key == "metadata": - ret[key] = elems - elif key == "calib": - ret[key] = {} - for elem in elems: - for k1, v1 in elem.items(): - if k1 not in ret[key]: - ret[key][k1] = [v1] - else: - ret[key][k1].append(v1) - for k1, v1 in ret[key].items(): - ret[key][k1] = torch.tensor(np.stack(v1, axis=0), dtype=type_map[key]) - elif key in ["coordinates", "points"]: - coors = [] - for i, coor in enumerate(elems): - coor_pad = np.pad( - coor, ((0, 0), (1, 0)), mode="constant", constant_values=i - ) - coors.append(coor_pad) - ret[key] = torch.tensor(np.concatenate(coors, axis=0), dtype=type_map[key]) - elif key in ["anchors", "anchors_mask", "reg_targets", "reg_weights", "labels"]: - ret[key] = defaultdict(list) - for elem in elems: - for idx, ele in enumerate(elem): - ret[key][str(idx)].append(torch.tensor(ele, dtype=type_map[key])) - else: - ret[key] = torch.tensor(np.stack(elems, axis=0), dtype=type_map[key]) - - return ret - - -def _dict_select(dict_, inds): - for k, v in dict_.items(): - if isinstance(v, dict): - _dict_select(v, inds) - else: - dict_[k] = v[inds] - - -def prep_pointcloud( - input_dict, - root_path, - voxel_generator, - target_assigners, - prep_cfg=None, - db_sampler=None, - remove_outside_points=False, - training=True, - create_targets=True, - num_point_features=4, - anchor_cache=None, - random_crop=False, - reference_detections=None, - out_size_factor=2, - out_dtype=np.float32, - min_points_in_gt=-1, - logger=None, -): - """ - convert point cloud to voxels, create targets if ground truths exists. 
- input_dict format: dataset.get_sensor_data format - """ - assert prep_cfg is not None - - task_class_names = [target_assigner.classes for target_assigner in target_assigners] - class_names = list(itertools.chain(*task_class_names)) - - # res = voxel_generator.generate( - # points, max_voxels) - # voxels = res["voxels"] - # coordinates = res["coordinates"] - # num_points = res["num_points_per_voxel"] - - num_voxels = np.array([voxels.shape[0]], dtype=np.int64) - - example = { - "voxels": voxels, - "num_points": num_points, - "points": points, - "coordinates": coordinates, - # "num_voxels": np.array([voxels.shape[0]], dtype=np.int64), - "num_voxels": num_voxels, - # "ground_plane": input_dict["ground_plane"], - # "gt_dict": gt_dict, - } - - if training: - example["gt_boxes"] = gt_dict["gt_boxes"] - else: - example["gt_boxes"] = [input_dict["lidar"]["annotations"]["boxes"]] - - if calib is not None: - example["calib"] = calib - - feature_map_size = grid_size[:2] // out_size_factor - feature_map_size = [*feature_map_size, 1][::-1] - - if anchor_cache is not None: - anchorss = anchor_cache["anchors"] - anchors_bvs = anchor_cache["anchors_bv"] - anchors_dicts = anchor_cache["anchors_dict"] - else: - rets = [ - target_assigner.generate_anchors(feature_map_size) - for target_assigner in target_assigners - ] - anchorss = [ret["anchors"].reshape([-1, 7]) for ret in rets] - anchors_dicts = [ - target_assigner.generate_anchors_dict(feature_map_size) - for target_assigner in target_assigners - ] - anchors_bvs = [ - box_np_ops.rbbox2d_to_near_bbox(anchors[:, [0, 1, 3, 4, 6]]) - for anchors in anchorss - ] - - example["anchors"] = anchorss - - if anchor_area_threshold >= 0: - example["anchors_mask"] = [] - for idx, anchors_bv in enumerate(anchors_bvs): - anchors_mask = None - # slow with high resolution. recommend disable this forever. 
- coors = coordinates - dense_voxel_map = box_np_ops.sparse_sum_for_anchors_mask( - coors, tuple(grid_size[::-1][1:]) - ) - dense_voxel_map = dense_voxel_map.cumsum(0) - dense_voxel_map = dense_voxel_map.cumsum(1) - anchors_area = box_np_ops.fused_get_anchors_area( - dense_voxel_map, anchors_bv, voxel_size, pc_range, grid_size - ) - anchors_mask = anchors_area > anchor_area_threshold - # example['anchors_mask'] = anchors_mask.astype(np.uint8) - example["anchors_mask"].append(anchors_mask) - - if not training: - return example - - # voxel_labels = box_np_ops.assign_label_to_voxel(gt_boxes, coordinates, - # voxel_size, coors_range) - """ - example.update({ - 'gt_boxes': gt_boxes.astype(out_dtype), - 'num_gt': np.array([gt_boxes.shape[0]]), - # 'voxel_labels': voxel_labels, - }) - """ - if create_targets: - targets_dicts = [] - for idx, target_assigner in enumerate(target_assigners): - if "anchors_mask" in example: - anchors_mask = example["anchors_mask"][idx] - else: - anchors_mask = None - targets_dict = target_assigner.assign_v2( - anchors_dicts[idx], - gt_dict["gt_boxes"][idx], - anchors_mask, - gt_classes=gt_dict["gt_classes"][idx], - gt_names=gt_dict["gt_names"][idx], - ) - targets_dicts.append(targets_dict) - - example.update( - { - "labels": [targets_dict["labels"] for targets_dict in targets_dicts], - "reg_targets": [ - targets_dict["bbox_targets"] for targets_dict in targets_dicts - ], - "reg_weights": [ - targets_dict["bbox_outside_weights"] - for targets_dict in targets_dicts - ], - } - ) - - return example - - -def prep_sequence_pointcloud( - input_dict, - root_path, - voxel_generator, - target_assigners, - prep_cfg=None, - db_sampler=None, - remove_outside_points=False, - training=True, - create_targets=True, - num_point_features=4, - anchor_cache=None, - random_crop=False, - reference_detections=None, - out_size_factor=2, - out_dtype=np.float32, - min_points_in_gt=-1, - logger=None, -): - """ - convert point cloud to voxels, create targets if ground truths exists. 
- input_dict format: dataset.get_sensor_data format - """ - assert prep_cfg is not None - - remove_environment = prep_cfg.REMOVE_ENVIRONMENT - max_voxels = prep_cfg.MAX_VOXELS_NUM - shuffle_points = prep_cfg.SHUFFLE - anchor_area_threshold = prep_cfg.ANCHOR_AREA_THRES - - if training: - remove_unknown = prep_cfg.REMOVE_UNKOWN_EXAMPLES - gt_rotation_noise = prep_cfg.GT_ROT_NOISE - gt_loc_noise_std = prep_cfg.GT_LOC_NOISE - global_rotation_noise = prep_cfg.GLOBAL_ROT_NOISE - global_scaling_noise = prep_cfg.GLOBAL_SCALE_NOISE - global_random_rot_range = prep_cfg.GLOBAL_ROT_PER_OBJ_RANGE - global_translate_noise_std = prep_cfg.GLOBAL_TRANS_NOISE - gt_points_drop = prep_cfg.GT_DROP_PERCENTAGE - gt_drop_max_keep = prep_cfg.GT_DROP_MAX_KEEP_POINTS - remove_points_after_sample = prep_cfg.REMOVE_POINTS_AFTER_SAMPLE - min_points_in_gt = prep_cfg.get("MIN_POINTS_IN_GT", -1) - - task_class_names = [target_assigner.classes for target_assigner in target_assigners] - class_names = list(itertools.chain(*task_class_names)) - - # points_only = input_dict["lidar"]["points"] - # times = input_dict["lidar"]["times"] - # points = np.hstack([points_only, times]) - try: - points = input_dict["current_frame"]["lidar"]["combined"] - except Exception: - points = input_dict["current_frame"]["lidar"]["points"] - keyframe_points = input_dict["keyframe"]["lidar"]["combined"] - - if training: - anno_dict = input_dict["current_frame"]["lidar"]["annotations"] - gt_dict = { - "gt_boxes": anno_dict["boxes"], - "gt_names": np.array(anno_dict["names"]).reshape(-1), - } - - if "difficulty" not in anno_dict: - difficulty = np.zeros([anno_dict["boxes"].shape[0]], dtype=np.int32) - gt_dict["difficulty"] = difficulty - else: - gt_dict["difficulty"] = anno_dict["difficulty"] - # if use_group_id and "group_ids" in anno_dict: - # group_ids = anno_dict["group_ids"] - # gt_dict["group_ids"] = group_ids - - calib = None - if "calib" in input_dict: - calib = input_dict["current_frame"]["calib"] - - if reference_detections is not None: - assert calib is not None and "image" in input_dict["current_frame"] - C, R, T = box_np_ops.projection_matrix_to_CRT_kitti(P2) - frustums = box_np_ops.get_frustum_v2(reference_detections, C) - frustums -= T - frustums = np.einsum("ij, akj->aki", np.linalg.inv(R), frustums) - frustums = box_np_ops.camera_to_lidar(frustums, rect, Trv2c) - surfaces = box_np_ops.corner_to_surfaces_3d_jit(frustums) - masks = points_in_convex_polygon_3d_jit(points, surfaces) - points = points[masks.any(-1)] - - if remove_outside_points: - assert calib is not None - image_shape = input_dict["current_frame"]["image"]["image_shape"] - points = box_np_ops.remove_outside_points( - points, calib["rect"], calib["Trv2c"], calib["P2"], image_shape - ) - if remove_environment is True and training: - selected = kitti.keep_arrays_by_name(gt_names, target_assigner.classes) - _dict_select(gt_dict, selected) - masks = box_np_ops.points_in_rbbox(points, gt_dict["gt_boxes"]) - points = points[masks.any(-1)] - - if training: - # boxes_lidar = gt_dict["gt_boxes"] - # cv2.imshow('pre-noise', bev_map) - selected = kitti.drop_arrays_by_name( - gt_dict["gt_names"], ["DontCare", "ignore"] - ) - _dict_select(gt_dict, selected) - if remove_unknown: - remove_mask = gt_dict["difficulty"] == -1 - """ - gt_boxes_remove = gt_boxes[remove_mask] - gt_boxes_remove[:, 3:6] += 0.25 - points = prep.remove_points_in_boxes(points, gt_boxes_remove) - """ - keep_mask = np.logical_not(remove_mask) - _dict_select(gt_dict, keep_mask) - gt_dict.pop("difficulty") - - if 
min_points_in_gt > 0: - # points_count_rbbox takes 10ms with 10 sweeps nuscenes data - point_counts = box_np_ops.points_count_rbbox(points, gt_dict["gt_boxes"]) - mask = point_counts >= min_points_in_gt - _dict_select(gt_dict, mask) - - gt_boxes_mask = np.array( - [n in class_names for n in gt_dict["gt_names"]], dtype=np.bool_ - ) - - # db_sampler = None - if db_sampler is not None: - group_ids = None - # if "group_ids" in gt_dict: - # group_ids = gt_dict["group_ids"] - sampled_dict = db_sampler.sample_all( - root_path, - gt_dict["gt_boxes"], - gt_dict["gt_names"], - num_point_features, - random_crop, - gt_group_ids=group_ids, - calib=calib, - ) - - if sampled_dict is not None: - sampled_gt_names = sampled_dict["gt_names"] - sampled_gt_boxes = sampled_dict["gt_boxes"] - sampled_points = sampled_dict["points"] - sampled_gt_masks = sampled_dict["gt_masks"] - gt_dict["gt_names"] = np.concatenate( - [gt_dict["gt_names"], sampled_gt_names], axis=0 - ) - gt_dict["gt_boxes"] = np.concatenate( - [gt_dict["gt_boxes"], sampled_gt_boxes] - ) - gt_boxes_mask = np.concatenate( - [gt_boxes_mask, sampled_gt_masks], axis=0 - ) - - # if group_ids is not None: - # sampled_group_ids = sampled_dict["group_ids"] - # gt_dict["group_ids"] = np.concatenate( - # [gt_dict["group_ids"], sampled_group_ids]) - - if remove_points_after_sample: - masks = box_np_ops.points_in_rbbox(points, sampled_gt_boxes) - points = points[np.logical_not(masks.any(-1))] - - points = np.concatenate([sampled_points, points], axis=0) - - pc_range = voxel_generator.point_cloud_range - - # group_ids = None - # if "group_ids" in gt_dict: - # group_ids = gt_dict["group_ids"] - - # prep.noise_per_object_v3_( - # gt_dict["gt_boxes"], - # points, - # gt_boxes_mask, - # rotation_perturb=gt_rotation_noise, - # center_noise_std=gt_loc_noise_std, - # global_random_rot_range=global_random_rot_range, - # group_ids=group_ids, - # num_try=100) - - # should remove unrelated objects after noise per object - # for k, v in gt_dict.items(): - # print(k, v.shape) - - _dict_select(gt_dict, gt_boxes_mask) - - gt_classes = np.array( - [class_names.index(n) + 1 for n in gt_dict["gt_names"]], dtype=np.int32 - ) - gt_dict["gt_classes"] = gt_classes - - # concatenate - points_current = points.shape[0] - points_keyframe = keyframe_points.shape[0] - points = np.concatenate((points, keyframe_points), axis=0) - - # data aug - gt_dict["gt_boxes"], points = prep.random_flip(gt_dict["gt_boxes"], points) - gt_dict["gt_boxes"], points = prep.global_rotation( - gt_dict["gt_boxes"], points, rotation=global_rotation_noise - ) - gt_dict["gt_boxes"], points = prep.global_scaling_v2( - gt_dict["gt_boxes"], points, *global_scaling_noise - ) - prep.global_translate_(gt_dict["gt_boxes"], points, global_translate_noise_std) - - # slice - points_keyframe = points[points_current:, :] - points = points[:points_current, :] - - bv_range = voxel_generator.point_cloud_range[[0, 1, 3, 4]] - mask = prep.filter_gt_box_outside_range(gt_dict["gt_boxes"], bv_range) - _dict_select(gt_dict, mask) - - task_masks = [] - flag = 0 - for class_name in task_class_names: - task_masks.append( - [ - np.where(gt_dict["gt_classes"] == class_name.index(i) + 1 + flag) - for i in class_name - ] - ) - flag += len(class_name) - - task_boxes = [] - task_classes = [] - task_names = [] - flag2 = 0 - for idx, mask in enumerate(task_masks): - task_box = [] - task_class = [] - task_name = [] - for m in mask: - task_box.append(gt_dict["gt_boxes"][m]) - task_class.append(gt_dict["gt_classes"][m] - flag2) - 
task_name.append(gt_dict["gt_names"][m]) - task_boxes.append(np.concatenate(task_box, axis=0)) - task_classes.append(np.concatenate(task_class)) - task_names.append(np.concatenate(task_name)) - flag2 += len(mask) - - for task_box in task_boxes: - # limit rad to [-pi, pi] - task_box[:, -1] = box_np_ops.limit_period( - task_box[:, -1], offset=0.5, period=2 * np.pi - ) - - # print(gt_dict.keys()) - gt_dict["gt_classes"] = task_classes - gt_dict["gt_names"] = task_names - gt_dict["gt_boxes"] = task_boxes - - # if shuffle_points: - # # shuffle is a little slow. - # np.random.shuffle(points) - - # [0, -40, -3, 70.4, 40, 1] - voxel_size = voxel_generator.voxel_size - pc_range = voxel_generator.point_cloud_range - grid_size = voxel_generator.grid_size - # [352, 400] - - # points = points[:int(points.shape[0] * 0.1), :] - voxels, coordinates, num_points = voxel_generator.generate(points, max_voxels) - - # res = voxel_generator.generate( - # points, max_voxels) - # voxels = res["voxels"] - # coordinates = res["coordinates"] - # num_points = res["num_points_per_voxel"] - - num_voxels = np.array([voxels.shape[0]], dtype=np.int64) - - # key frame voxel - keyframe_info = voxel_generator.generate(keyframe_points, max_voxels) - keyframe_info = keyframe_voxels, keyframe_coordinates, keyframe_num_points - - keyframe_num_voxels = np.array([keyframe_voxels.shape[0]], dtype=np.int64) - - example = { - "voxels": voxels, - "num_points": num_points, - "points": points, - "coordinates": coordinates, - "num_voxels": num_voxels, - } - - example_keyframe = { - "voxels": keyframe_voxels, - "num_points": keyframe_num_points, - "points": keyframe_points, - "coordinates": keyframe_coordinates, - "num_voxels": keyframe_num_voxels, - } - - if training: - example["gt_boxes"] = gt_dict["gt_boxes"] - - if calib is not None: - example["calib"] = calib - - feature_map_size = grid_size[:2] // out_size_factor - feature_map_size = [*feature_map_size, 1][::-1] - - if anchor_cache is not None: - anchorss = anchor_cache["anchors"] - anchors_bvs = anchor_cache["anchors_bv"] - anchors_dicts = anchor_cache["anchors_dict"] - else: - rets = [ - target_assigner.generate_anchors(feature_map_size) - for target_assigner in target_assigners - ] - anchorss = [ret["anchors"].reshape([-1, 7]) for ret in rets] - anchors_dicts = [ - target_assigner.generate_anchors_dict(feature_map_size) - for target_assigner in target_assigners - ] - anchors_bvs = [ - box_np_ops.rbbox2d_to_near_bbox(anchors[:, [0, 1, 3, 4, 6]]) - for anchors in anchorss - ] - - example["anchors"] = anchorss - - if anchor_area_threshold >= 0: - example["anchors_mask"] = [] - for idx, anchors_bv in enumerate(anchors_bvs): - anchors_mask = None - # slow with high resolution. recommend disable this forever. 
- coors = coordinates - dense_voxel_map = box_np_ops.sparse_sum_for_anchors_mask( - coors, tuple(grid_size[::-1][1:]) - ) - dense_voxel_map = dense_voxel_map.cumsum(0) - dense_voxel_map = dense_voxel_map.cumsum(1) - anchors_area = box_np_ops.fused_get_anchors_area( - dense_voxel_map, anchors_bv, voxel_size, pc_range, grid_size - ) - anchors_mask = anchors_area > anchor_area_threshold - # example['anchors_mask'] = anchors_mask.astype(np.uint8) - example["anchors_mask"].append(anchors_mask) - - example_sequences = {} - example_sequences["current_frame"] = example - example_sequences["keyframe"] = example_keyframe - - if not training: - return example_sequences - - # voxel_labels = box_np_ops.assign_label_to_voxel(gt_boxes, coordinates, - # voxel_size, coors_range) - """ - example.update({ - 'gt_boxes': gt_boxes.astype(out_dtype), - 'num_gt': np.array([gt_boxes.shape[0]]), - # 'voxel_labels': voxel_labels, - }) - """ - if create_targets: - targets_dicts = [] - for idx, target_assigner in enumerate(target_assigners): - if "anchors_mask" in example: - anchors_mask = example["anchors_mask"][idx] - else: - anchors_mask = None - targets_dict = target_assigner.assign_v2( - anchors_dicts[idx], - gt_dict["gt_boxes"][idx], - anchors_mask, - gt_classes=gt_dict["gt_classes"][idx], - gt_names=gt_dict["gt_names"][idx], - ) - targets_dicts.append(targets_dict) - - example_sequences["current_frame"].update( - { - "labels": [targets_dict["labels"] for targets_dict in targets_dicts], - "reg_targets": [ - targets_dict["bbox_targets"] for targets_dict in targets_dicts - ], - "reg_weights": [ - targets_dict["bbox_outside_weights"] - for targets_dict in targets_dicts - ], - } - ) - return example_sequences - - -def prep_pointcloud_rpn( - input_dict, - root_path, - task_class_names=[], - prep_cfg=None, - db_sampler=None, - remove_outside_points=False, - training=True, - num_point_features=4, - random_crop=False, - reference_detections=None, - out_dtype=np.float32, - min_points_in_gt=-1, - logger=None, -): - """ - convert point cloud to voxels, create targets if ground truths exists. 
- input_dict format: dataset.get_sensor_data format - """ - assert prep_cfg is not None - - remove_environment = prep_cfg.REMOVE_UNKOWN_EXAMPLES - - if training: - remove_unknown = prep_cfg.REMOVE_UNKOWN_EXAMPLES - gt_rotation_noise = prep_cfg.GT_ROT_NOISE - gt_loc_noise_std = prep_cfg.GT_LOC_NOISE - global_rotation_noise = prep_cfg.GLOBAL_ROT_NOISE - global_scaling_noise = prep_cfg.GLOBAL_SCALE_NOISE - global_random_rot_range = prep_cfg.GLOBAL_ROT_PER_OBJ_RANGE - global_translate_noise_std = prep_cfg.GLOBAL_TRANS_NOISE - gt_points_drop = prep_cfg.GT_DROP_PERCENTAGE - gt_drop_max_keep = prep_cfg.GT_DROP_MAX_KEEP_POINTS - remove_points_after_sample = prep_cfg.REMOVE_POINTS_AFTER_SAMPLE - - class_names = list(itertools.chain(*task_class_names)) - - # points_only = input_dict["lidar"]["points"] - # times = input_dict["lidar"]["times"] - # points = np.hstack([points_only, times]) - points = input_dict["lidar"]["points"] - - if training: - anno_dict = input_dict["lidar"]["annotations"] - gt_dict = { - "gt_boxes": anno_dict["boxes"], - "gt_names": np.array(anno_dict["names"]).reshape(-1), - } - - if "difficulty" not in anno_dict: - difficulty = np.zeros([anno_dict["boxes"].shape[0]], dtype=np.int32) - gt_dict["difficulty"] = difficulty - else: - gt_dict["difficulty"] = anno_dict["difficulty"] - # if use_group_id and "group_ids" in anno_dict: - # group_ids = anno_dict["group_ids"] - # gt_dict["group_ids"] = group_ids - - calib = None - if "calib" in input_dict: - calib = input_dict["calib"] - - if reference_detections is not None: - assert calib is not None and "image" in input_dict - C, R, T = box_np_ops.projection_matrix_to_CRT_kitti(P2) - frustums = box_np_ops.get_frustum_v2(reference_detections, C) - frustums -= T - frustums = np.einsum("ij, akj->aki", np.linalg.inv(R), frustums) - frustums = box_np_ops.camera_to_lidar(frustums, rect, Trv2c) - surfaces = box_np_ops.corner_to_surfaces_3d_jit(frustums) - masks = points_in_convex_polygon_3d_jit(points, surfaces) - points = points[masks.any(-1)] - - if remove_outside_points: - assert calib is not None - image_shape = input_dict["image"]["image_shape"] - points = box_np_ops.remove_outside_points( - points, calib["rect"], calib["Trv2c"], calib["P2"], image_shape - ) - if remove_environment is True and training: - selected = kitti.keep_arrays_by_name(gt_names, target_assigner.classes) - _dict_select(gt_dict, selected) - masks = box_np_ops.points_in_rbbox(points, gt_dict["gt_boxes"]) - points = points[masks.any(-1)] - - if training: - selected = kitti.drop_arrays_by_name( - gt_dict["gt_names"], ["DontCare", "ignore"] - ) - _dict_select(gt_dict, selected) - if remove_unknown: - remove_mask = gt_dict["difficulty"] == -1 - """ - gt_boxes_remove = gt_boxes[remove_mask] - gt_boxes_remove[:, 3:6] += 0.25 - points = prep.remove_points_in_boxes(points, gt_boxes_remove) - """ - keep_mask = np.logical_not(remove_mask) - _dict_select(gt_dict, keep_mask) - gt_dict.pop("difficulty") - - gt_boxes_mask = np.array( - [n in class_names for n in gt_dict["gt_names"]], dtype=np.bool_ - ) - - # db_sampler = None - if db_sampler is not None: - group_ids = None - # if "group_ids" in gt_dict: - # group_ids = gt_dict["group_ids"] - sampled_dict = db_sampler.sample_all( - root_path, - gt_dict["gt_boxes"], - gt_dict["gt_names"], - num_point_features, - random_crop, - gt_group_ids=group_ids, - calib=calib, - ) - - if sampled_dict is not None: - sampled_gt_names = sampled_dict["gt_names"] - sampled_gt_boxes = sampled_dict["gt_boxes"] - sampled_points = sampled_dict["points"] - 
sampled_gt_masks = sampled_dict["gt_masks"] - gt_dict["gt_names"] = np.concatenate( - [gt_dict["gt_names"], sampled_gt_names], axis=0 - ) - gt_dict["gt_boxes"] = np.concatenate( - [gt_dict["gt_boxes"], sampled_gt_boxes] - ) - gt_boxes_mask = np.concatenate( - [gt_boxes_mask, sampled_gt_masks], axis=0 - ) - - # if group_ids is not None: - # sampled_group_ids = sampled_dict["group_ids"] - # gt_dict["group_ids"] = np.concatenate( - # [gt_dict["group_ids"], sampled_group_ids]) - - if remove_points_after_sample: - masks = box_np_ops.points_in_rbbox(points, sampled_gt_boxes) - points = points[np.logical_not(masks.any(-1))] - - points = np.concatenate([sampled_points, points], axis=0) - - # group_ids = None - # if "group_ids" in gt_dict: - # group_ids = gt_dict["group_ids"] - - prep.noise_per_object_v3_( - gt_dict["gt_boxes"], - points, - gt_boxes_mask, - rotation_perturb=gt_rotation_noise, - center_noise_std=gt_loc_noise_std, - global_random_rot_range=global_random_rot_range, - group_ids=None, - num_try=100, - ) - - # should remove unrelated objects after noise per object - # for k, v in gt_dict.items(): - # print(k, v.shape) - - _dict_select(gt_dict, gt_boxes_mask) - - gt_classes = np.array( - [class_names.index(n) + 1 for n in gt_dict["gt_names"]], dtype=np.int32 - ) - gt_dict["gt_classes"] = gt_classes - - gt_dict["gt_boxes"], points = prep.random_flip(gt_dict["gt_boxes"], points) - gt_dict["gt_boxes"], points = prep.global_rotation( - gt_dict["gt_boxes"], points, rotation=global_rotation_noise - ) - gt_dict["gt_boxes"], points = prep.global_scaling_v2( - gt_dict["gt_boxes"], points, *global_scaling_noise - ) - prep.global_translate_(gt_dict["gt_boxes"], points, global_translate_noise_std) - - task_masks = [] - flag = 0 - for class_name in task_class_names: - task_masks.append( - [ - np.where(gt_dict["gt_classes"] == class_name.index(i) + 1 + flag) - for i in class_name - ] - ) - flag += len(class_name) - - task_boxes = [] - task_classes = [] - task_names = [] - flag2 = 0 - for idx, mask in enumerate(task_masks): - task_box = [] - task_class = [] - task_name = [] - for m in mask: - task_box.append(gt_dict["gt_boxes"][m]) - task_class.append(gt_dict["gt_classes"][m] - flag2) - task_name.append(gt_dict["gt_names"][m]) - task_boxes.append(np.concatenate(task_box, axis=0)) - task_classes.append(np.concatenate(task_class)) - task_names.append(np.concatenate(task_name)) - flag2 += len(mask) - - for task_box in task_boxes: - # limit rad to [-pi, pi] - task_box[:, -1] = box_np_ops.limit_period( - task_box[:, -1], offset=0.5, period=2 * np.pi - ) - - # print(gt_dict.keys()) - gt_dict["gt_classes"] = task_classes - gt_dict["gt_names"] = task_names - gt_dict["gt_boxes"] = task_boxes - - example = { - "pts_input": points, - "pts_rect": None, - "pts_features": None, - "gt_boxes3d": gt_dict["gt_boxes"], - "rpn_cls_label": [], - "rpn_reg_label": [], - } - - if calib is not None: - example["calib"] = calib - - return example diff --git a/det3d/datasets/waymo/waymo.py b/det3d/datasets/waymo/waymo.py index 8aca521..f659dba 100644 --- a/det3d/datasets/waymo/waymo.py +++ b/det3d/datasets/waymo/waymo.py @@ -3,6 +3,7 @@ import json import random import operator +from numba.cuda.simulator.api import detect import numpy as np from functools import reduce @@ -27,16 +28,21 @@ def __init__( class_names=None, test_mode=False, sample=False, + nsweeps=1, + load_interval=1, **kwargs, ): + self.load_interval = load_interval self.sample = sample + self.nsweeps = nsweeps + print("Using {} sweeps".format(nsweeps)) 
super(WaymoDataset, self).__init__( root_path, info_path, pipeline, test_mode=test_mode, class_names=class_names ) self._info_path = info_path self._class_names = class_names - self._num_point_features = WaymoDataset.NumPointFeatures + self._num_point_features = WaymoDataset.NumPointFeatures if nsweeps == 1 else WaymoDataset.NumPointFeatures+1 def reset(self): assert False @@ -46,7 +52,9 @@ def load_infos(self, info_path): with open(self._info_path, "rb") as f: _waymo_infos_all = pickle.load(f) - self._waymo_infos = _waymo_infos_all + self._waymo_infos = _waymo_infos_all[::self.load_interval] + + print("Using {} Frames".format(len(self._waymo_infos))) def __len__(self): @@ -63,6 +71,7 @@ def get_sensor_data(self, idx): "type": "lidar", "points": None, "annotations": None, + "nsweeps": self.nsweeps, }, "metadata": { "image_prefix": self._root_path, @@ -72,7 +81,7 @@ def get_sensor_data(self, idx): "calib": None, "cam": {}, "mode": "val" if self.test_mode else "train", - "type": "WaymoDataset" + "type": "WaymoDataset", } data, _ = self.pipeline(res, info) @@ -87,6 +96,7 @@ def evaluation(self, detections, output_dir=None, testset=False): infos = self._waymo_infos infos = reorganize_info(infos) + _create_pd_detection(detections, infos, output_dir) print("use waymo devkit tool for evaluation") diff --git a/det3d/datasets/waymo/waymo_common.py b/det3d/datasets/waymo/waymo_common.py index 320df12..572d80f 100644 --- a/det3d/datasets/waymo/waymo_common.py +++ b/det3d/datasets/waymo/waymo_common.py @@ -18,15 +18,10 @@ except: print("No Tensorflow") +from nuscenes.utils.geometry_utils import transform_matrix +from pyquaternion import Quaternion + -INDEX_LENGTH = 15 -MAX_FRAME = 1000000 -CAT_NAME_MAP = { - 1: 'VEHICLE', - 2: 'PEDESTRIAN', - 3: 'SIGN', - 4: 'CYCLIST', -} CAT_NAME_TO_ID = { 'VEHICLE': 1, 'PEDESTRIAN': 2, @@ -40,19 +35,24 @@ def get_obj(path): obj = pickle.load(f) return obj -def label_to_type(label): - if label <= 1: - return int(label) + 1 - else: - return 4 +# ignore sign class +LABEL_TO_TYPE = {0: 1, 1:2, 2:4} + +import uuid + +class UUIDGeneration(): + def __init__(self): + self.mapping = {} + def get_uuid(self,seed): + if seed not in self.mapping: + self.mapping[seed] = uuid.uuid4().hex + return self.mapping[seed] +uuid_gen = UUIDGeneration() def _create_pd_detection(detections, infos, result_path, tracking=False): """Creates a prediction objects file.""" - assert tracking is False, "Not Supported Yet" - from waymo_open_dataset import dataset_pb2 from waymo_open_dataset import label_pb2 from waymo_open_dataset.protos import metrics_pb2 - from waymo_open_dataset.utils import box_utils objects = metrics_pb2.Objects() @@ -63,11 +63,16 @@ def _create_pd_detection(detections, infos, result_path, tracking=False): box3d = detection["box3d_lidar"].detach().cpu().numpy() scores = detection["scores"].detach().cpu().numpy() labels = detection["label_preds"].detach().cpu().numpy() + + # transform back to Waymo coordinate + # x,y,z,w,l,h,r2 + # x,y,z,l,w,h,r1 + # r2 = -pi/2 - r1 box3d[:, -1] = -box3d[:, -1] - np.pi / 2 + box3d = box3d[:, [0, 1, 2, 4, 3, 5, -1]] - if box3d.shape[1] > 7: - # drop velocity - box3d = box3d[:, [0, 1, 2, 3, 4, 5, -1]] + if tracking: + tracking_ids = detection['tracking_ids'] for i in range(box3d.shape[0]): det = box3d[i] @@ -91,23 +96,28 @@ def _create_pd_detection(detections, infos, result_path, tracking=False): o.object.box.CopyFrom(box) o.score = score # Use correct type. 
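# A minimal sketch (hypothetical helper) of the coordinate-convention change used in
# _create_pd_detection above: the heading follows r2 = -pi/2 - r1 and the two horizontal
# size columns (indices 3 and 4) are swapped. The formula is its own inverse, so applying
# it twice recovers the original boxes; a plain (N, 7) box layout is assumed here.
import numpy as np

def flip_box_convention(boxes):
    # boxes: (N, 7) array [x, y, z, dim_a, dim_b, h, heading]
    out = boxes.copy()
    out[:, -1] = -np.pi / 2 - out[:, -1]   # heading convention change
    out[:, [3, 4]] = out[:, [4, 3]]        # swap the two horizontal extents
    return out

b = np.array([[1.0, 2.0, 0.5, 1.8, 4.5, 1.6, 0.3]])
assert np.allclose(flip_box_convention(flip_box_convention(b)), b)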
- o.object.type = label_to_type(label) # int(label)+1 + o.object.type = LABEL_TO_TYPE[label] + + if tracking: + o.object.id = uuid_gen.get_uuid(int(tracking_ids[i])) objects.objects.append(o) # Write objects to a file. - f = open(os.path.join(result_path, 'my_preds.bin'), 'wb') + if tracking: + path = os.path.join(result_path, 'tracking_pred.bin') + else: + path = os.path.join(result_path, 'detection_pred.bin') + + print("results saved to {}".format(path)) + f = open(path, 'wb') f.write(objects.SerializeToString()) f.close() - -def _test_create_pd_detection(infos, tracking=False): - """Creates a prediction objects file.""" - assert tracking is False, "Not Supported Yet" - from waymo_open_dataset import dataset_pb2 +def _create_gt_detection(infos, tracking=True): + """Creates a gt prediction object file for local evaluation.""" from waymo_open_dataset import label_pb2 from waymo_open_dataset.protos import metrics_pb2 - from waymo_open_dataset.utils import box_utils objects = metrics_pb2.Objects() @@ -123,11 +133,8 @@ def _test_create_pd_detection(infos, tracking=False): continue names = np.array([TYPE_LIST[ann['label']] for ann in annos]) - - if box3d.shape[1] > 7: - # drop velocity - box3d = box3d[:, [0, 1, 2, 3, 4, 5, -1]] + box3d = box3d[:, [0, 1, 2, 3, 4, 5, -1]] for i in range(box3d.shape[0]): if num_points_in_gt[i] == 0: @@ -157,6 +164,7 @@ def _test_create_pd_detection(infos, tracking=False): # Use correct type. o.object.type = CAT_NAME_TO_ID[label] o.object.num_lidar_points_in_box = num_points_in_gt[i] + o.object.id = annos[i]['name'] objects.objects.append(o) @@ -164,34 +172,139 @@ def _test_create_pd_detection(infos, tracking=False): f = open(os.path.join(args.result_path, 'gt_preds.bin'), 'wb') f.write(objects.SerializeToString()) f.close() - +def veh_pos_to_transform(veh_pos): + "convert vehicle pose to two transformation matrix" + rotation = veh_pos[:3, :3] + tran = veh_pos[:3, 3] + + global_from_car = transform_matrix( + tran, Quaternion(matrix=rotation), inverse=False + ) + + car_from_global = transform_matrix( + tran, Quaternion(matrix=rotation), inverse=True + ) + + return global_from_car, car_from_global def _fill_infos(root_path, frames, split='train', nsweeps=1): # load all train infos infos = [] for frame_name in tqdm(frames): # global id - path = os.path.join(root_path, split, frame_name) + lidar_path = os.path.join(root_path, split, 'lidar', frame_name) + ref_path = os.path.join(root_path, split, 'annos', frame_name) + + ref_obj = get_obj(ref_path) + ref_time = 1e-6 * int(ref_obj['frame_name'].split("_")[-1]) + + ref_pose = np.reshape(ref_obj['veh_to_global'], [4, 4]) + _, ref_from_global = veh_pos_to_transform(ref_pose) info = { - "path": path, + "path": lidar_path, + "anno_path": ref_path, "token": frame_name, + "timestamp": ref_time, + "sweeps": [] } + sequence_id = int(frame_name.split("_")[1]) + frame_id = int(frame_name.split("_")[3][:-4]) # remove .pkl + + prev_id = frame_id + sweeps = [] + while len(sweeps) < nsweeps - 1: + if prev_id <= 0: + if len(sweeps) == 0: + sweep = { + "path": lidar_path, + "token": frame_name, + "transform_matrix": None, + "time_lag": 0 + } + sweeps.append(sweep) + else: + sweeps.append(sweeps[-1]) + else: + prev_id = prev_id - 1 + # global identifier + + curr_name = 'seq_{}_frame_{}.pkl'.format(sequence_id, prev_id) + curr_lidar_path = os.path.join(root_path, split, 'lidar', curr_name) + curr_label_path = os.path.join(root_path, split, 'annos', curr_name) + + curr_obj = get_obj(curr_label_path) + curr_pose = 
np.reshape(curr_obj['veh_to_global'], [4, 4]) + global_from_car, _ = veh_pos_to_transform(curr_pose) + + tm = reduce( + np.dot, + [ref_from_global, global_from_car], + ) + + curr_time = int(curr_obj['frame_name'].split("_")[-1]) + time_lag = ref_time - 1e-6 * curr_time + + sweep = { + "path": curr_lidar_path, + "transform_matrix": tm, + "time_lag": time_lag, + } + sweeps.append(sweep) + + info["sweeps"] = sweeps + + if split != 'test': + # read boxes + TYPE_LIST = ['UNKNOWN', 'VEHICLE', 'PEDESTRIAN', 'SIGN', 'CYCLIST'] + annos = ref_obj['objects'] + num_points_in_gt = np.array([ann['num_points'] for ann in annos]) + gt_boxes = np.array([ann['box'] for ann in annos]).reshape(-1, 9) + + if len(gt_boxes) != 0: + # transform from Waymo to KITTI coordinate + # Waymo: x, y, z, length, width, height, rotation from positive x axis clockwisely + # KITTI: x, y, z, width, length, height, rotation from negative y axis counterclockwisely + gt_boxes[:, -1] = -np.pi / 2 - gt_boxes[:, -1] + gt_boxes[:, [3, 4]] = gt_boxes[:, [4, 3]] + + gt_names = np.array([TYPE_LIST[ann['label']] for ann in annos]) + mask_not_zero = (num_points_in_gt > 0).reshape(-1) + + # filter boxes without lidar points + info['gt_boxes'] = gt_boxes[mask_not_zero, :].astype(np.float32) + info['gt_names'] = gt_names[mask_not_zero].astype(str) + infos.append(info) return infos +def sort_frame(frames): + indices = [] + + for f in frames: + seq_id = int(f.split("_")[1]) + frame_id= int(f.split("_")[3][:-4]) + + idx = seq_id * 1000 + frame_id + indices.append(idx) + + rank = list(np.argsort(np.array(indices))) + + frames = [frames[r] for r in rank] + return frames + def get_available_frames(root, split): - dir_path = os.path.join(root, split) + dir_path = os.path.join(root, split, 'lidar') available_frames = list(os.listdir(dir_path)) + sorted_frames = sort_frame(available_frames) + print(split, " split ", "exist frame num:", len(available_frames)) - return available_frames + return sorted_frames def create_waymo_infos(root_path, split='train', nsweeps=1): - assert split != 'test', "Not Supported Yet" - frames = get_available_frames(root_path, split) waymo_infos = _fill_infos( @@ -212,6 +325,7 @@ def parse_args(): parser.add_argument("--info_path", type=str) parser.add_argument("--result_path", type=str) parser.add_argument("--gt", action='store_true' ) + parser.add_argument("--tracking", action='store_true') args = parser.parse_args() return args @@ -232,10 +346,10 @@ def reorganize_info(infos): infos = pickle.load(f) if args.gt: - _test_create_pd_detection(infos) - assert 0 + _create_gt_detection(infos, tracking=args.tracking) + exit() infos = reorganize_info(infos) with open(args.path, 'rb') as f: preds = pickle.load(f) - _create_pd_detection(preds, infos, args.result_path) + _create_pd_detection(preds, infos, args.result_path, tracking=args.tracking) diff --git a/det3d/datasets/waymo/waymo_converter.py b/det3d/datasets/waymo/waymo_converter.py index 04d8486..06efaf1 100644 --- a/det3d/datasets/waymo/waymo_converter.py +++ b/det3d/datasets/waymo/waymo_converter.py @@ -1,5 +1,5 @@ """Tool to convert Waymo Open Dataset to pickle files. - Taken from https://github.com/WangYueFt/pillar-od + Adapted from https://github.com/WangYueFt/pillar-od # Copyright (c) Massachusetts Institute of Technology and its affiliates. 
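# The sweep records built in _fill_infos above store a 4x4 "transform_matrix"
# (ref_from_global @ global_from_car) and a "time_lag" relative to the reference frame.
# A minimal sketch of how such a record is typically consumed when stacking nsweeps point
# clouds; the actual loading code lives in the dataset pipeline, not in this hunk, and the
# points are assumed to be an (N, >=3) array with xyz first, given in the sweep's vehicle frame.
import numpy as np

def align_sweep_points(points, sweep):
    pts = points.copy()
    tm = sweep["transform_matrix"]
    if tm is not None:                       # None means the sweep is the reference frame itself
        hom = np.hstack([pts[:, :3], np.ones((pts.shape[0], 1))])
        pts[:, :3] = (hom @ tm.T)[:, :3]     # map xyz into the reference vehicle frame
    times = np.full((pts.shape[0], 1), sweep["time_lag"], dtype=pts.dtype)
    return np.hstack([pts, times])           # extra time channel -> NumPointFeatures + 1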
# Licensed under MIT License """ @@ -8,56 +8,64 @@ from __future__ import division from __future__ import print_function -from absl import app -from absl import flags -import glob +import glob, argparse, tqdm, pickle, os -import time +import waymo_decoder import tensorflow.compat.v2 as tf -import waymo_decoder from waymo_open_dataset import dataset_pb2 -import pickle + from multiprocessing import Pool -import tqdm tf.enable_v2_behavior() -flags.DEFINE_string('input_file_pattern', None, 'Path to read input') -flags.DEFINE_string('output_filebase', None, 'Path to write output') - -FLAGS = flags.FLAGS fnames = None +LIDAR_PATH = None +ANNO_PATH = None def convert(idx): - global fnames - fname = fnames[idx] - dataset = tf.data.TFRecordDataset(fname, compression_type='') - for frame_id, data in enumerate(dataset): - frame = dataset_pb2.Frame() - frame.ParseFromString(bytearray(data.numpy())) - decoded_frame = waymo_decoder.decode_frame(frame) - - with open(FLAGS.output_filebase+'seq_{}_frame_{}.pkl'.format(idx, frame_id), 'wb') as f: - pickle.dump(decoded_frame, f) + global fnames + fname = fnames[idx] + dataset = tf.data.TFRecordDataset(fname, compression_type='') + for frame_id, data in enumerate(dataset): + frame = dataset_pb2.Frame() + frame.ParseFromString(bytearray(data.numpy())) + decoded_frame = waymo_decoder.decode_frame(frame, frame_id) + decoded_annos = waymo_decoder.decode_annos(frame, frame_id) + with open(os.path.join(LIDAR_PATH, 'seq_{}_frame_{}.pkl'.format(idx, frame_id)), 'wb') as f: + pickle.dump(decoded_frame, f) + + with open(os.path.join(ANNO_PATH, 'seq_{}_frame_{}.pkl'.format(idx, frame_id)), 'wb') as f: + pickle.dump(decoded_annos, f) -def main(unused_argv): - gpus = tf.config.experimental.list_physical_devices('GPU') - for gpu in gpus: - tf.config.experimental.set_memory_growth(gpu, True) - assert FLAGS.input_file_pattern - assert FLAGS.output_filebase +def main(args): + global fnames + fnames = list(glob.glob(args.record_path)) - global fnames - fnames = list(glob.glob(FLAGS.input_file_pattern)) + print("Number of files {}".format(len(fnames))) - print("Number of files {}".format(len(fnames))) + with Pool(128) as p: # change according to your cpu + r = list(tqdm.tqdm(p.imap(convert, range(len(fnames))), total=len(fnames))) - with Pool(8) as p: - r = list(tqdm.tqdm(p.imap(convert, range(len(fnames))), total=len(fnames))) +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Waymo Data Converter') + parser.add_argument('--root_path', type=str, required=True) + parser.add_argument('--record_path', type=str, required=True) + args = parser.parse_args() -if __name__ == '__main__': - app.run(main) + if not os.path.isdir(args.root_path): + os.mkdir(args.root_path) + + LIDAR_PATH = os.path.join(args.root_path, 'lidar') + ANNO_PATH = os.path.join(args.root_path, 'annos') + + if not os.path.isdir(LIDAR_PATH): + os.mkdir(LIDAR_PATH) + + if not os.path.isdir(ANNO_PATH): + os.mkdir(ANNO_PATH) + + main(args) diff --git a/det3d/datasets/waymo/waymo_decoder.py b/det3d/datasets/waymo/waymo_decoder.py index 17098bd..3255546 100644 --- a/det3d/datasets/waymo/waymo_decoder.py +++ b/det3d/datasets/waymo/waymo_decoder.py @@ -19,13 +19,13 @@ from waymo_open_dataset.utils import transform_utils tf.enable_v2_behavior() -def decode_frame(frame): - """Decodes native waymo Frame proto to pickle dict.""" +def decode_frame(frame, frame_id): + """Decodes native waymo Frame proto to tf.Examples.""" lidars = extract_points(frame.lasers, frame.context.laser_calibrations, frame.pose) 
- objects = extract_objects(frame.laser_labels) + frame_name = '{scene_name}_{location}_{time_of_day}_{timestamp}'.format( scene_name=frame.context.name, location=frame.context.stats.location, @@ -35,13 +35,37 @@ def decode_frame(frame): example_data = { 'scene_name': frame.context.name, 'frame_name': frame_name, + 'frame_id': frame_id, 'lidars': lidars, - 'objects': objects, } return example_data # return encode_tf_example(example_data, FEATURE_SPEC) +def decode_annos(frame, frame_id): + """Decodes some meta data (e.g. calibration matrices, frame matrices).""" + + veh_to_global = np.array(frame.pose.transform) + + ref_pose = np.reshape(np.array(frame.pose.transform), [4, 4]) + global_from_ref_rotation = ref_pose[:3, :3] + objects = extract_objects(frame.laser_labels, global_from_ref_rotation) + + frame_name = '{scene_name}_{location}_{time_of_day}_{timestamp}'.format( + scene_name=frame.context.name, + location=frame.context.stats.location, + time_of_day=frame.context.stats.time_of_day, + timestamp=frame.timestamp_micros) + + annos = { + 'scene_name': frame.context.name, + 'frame_name': frame_name, + 'frame_id': frame_id, + 'veh_to_global': veh_to_global, + 'objects': objects, + } + + return annos def extract_points_from_range_image(laser, calibration, frame_pose): @@ -131,13 +155,13 @@ def extract_points(lasers, laser_calibrations, frame_pose): def global_vel_to_ref(vel, global_from_ref_rotation): # inverse means ref_from_global, rotation_matrix for normalization - # remove z axis velocity - vel = [vel[0], vel[1], 0.0] + vel = [vel[0], vel[1], 0] ref = np.dot(Quaternion(matrix=global_from_ref_rotation).inverse.rotation_matrix, vel) + ref = [ref[0], ref[1], 0.0] return ref -def extract_objects(laser_labels): +def extract_objects(laser_labels, global_from_ref_rotation): """Extract objects.""" objects = [] for object_id, label in enumerate(laser_labels): @@ -160,22 +184,24 @@ def extract_objects(laser_labels): else: combined_difficulty_level = label.detection_difficulty_level + ref_velocity = global_vel_to_ref(speed, global_from_ref_rotation) + objects.append({ 'id': object_id, 'name': label.id, 'label': category_label, 'box': np.array([box.center_x, box.center_y, box.center_z, - box.length, box.width, box.height, box.heading], - dtype=np.float32), + box.length, box.width, box.height, ref_velocity[0], + ref_velocity[1], box.heading], dtype=np.float32), 'num_points': num_lidar_points_in_box, 'detection_difficulty_level': label.detection_difficulty_level, 'combined_difficulty_level': combined_difficulty_level, - 'speed': + 'global_speed': np.array(speed, dtype=np.float32), - 'accel': + 'global_accel': np.array(accel, dtype=np.float32), }) return objects diff --git a/det3d/models/__init__.py b/det3d/models/__init__.py index db9a2e6..d24d502 100644 --- a/det3d/models/__init__.py +++ b/det3d/models/__init__.py @@ -1,4 +1,3 @@ -# from .anchor_heads import * # noqa: F401,F403 import importlib spconv_spec = importlib.util.find_spec("spconv") found = spconv_spec is not None @@ -13,8 +12,7 @@ build_head, build_loss, build_neck, - build_roi_extractor, - build_shared_head, + build_roi_head ) from .detectors import * # noqa: F401,F403 from .necks import * # noqa: F401,F403 @@ -26,26 +24,19 @@ LOSSES, NECKS, READERS, - ROI_EXTRACTORS, - SHARED_HEADS, ) - -# from .roi_extractors import * # noqa: F401,F403 -# from .shared_heads import * # noqa: F401,F403 +from .second_stage import * +from .roi_heads import * __all__ = [ "READERS", "BACKBONES", "NECKS", - "ROI_EXTRACTORS", - "SHARED_HEADS", "HEADS", 
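# After this change each object 'box' written by extract_objects is 9-dimensional
# ([x, y, z, length, width, height, vel_x, vel_y, heading], with the velocity already
# rotated into the reference frame by global_vel_to_ref), and _fill_infos reads it back
# with reshape(-1, 9). A hypothetical unpacking helper, for illustration only:
import numpy as np

def split_waymo_gt_box(box9):
    box9 = np.asarray(box9, dtype=np.float32)
    center, size = box9[0:3], box9[3:6]          # x, y, z / length, width, height
    velocity, heading = box9[6:8], box9[8]       # vel_x, vel_y / heading (rad)
    return center, size, velocity, heading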
"LOSSES", "DETECTORS", "build_backbone", "build_neck", - "build_roi_extractor", - "build_shared_head", "build_head", "build_loss", "build_detector", diff --git a/det3d/models/backbones/__init__.py b/det3d/models/backbones/__init__.py index b59b2e9..b50cbd9 100644 --- a/det3d/models/backbones/__init__.py +++ b/det3d/models/backbones/__init__.py @@ -1,8 +1,9 @@ import importlib spconv_spec = importlib.util.find_spec("spconv") found = spconv_spec is not None + if found: - from .scn import SpMiddleFHD + from .scn import SpMiddleResNetFHD else: print("No spconv, sparse convolution disabled!") diff --git a/det3d/models/backbones/resnet.py b/det3d/models/backbones/resnet.py deleted file mode 100644 index a3e62aa..0000000 --- a/det3d/models/backbones/resnet.py +++ /dev/null @@ -1,522 +0,0 @@ -import logging - -import torch.nn as nn -import torch.utils.checkpoint as cp -from det3d.torchie.cnn import constant_init, kaiming_init -from det3d.torchie.trainer import load_checkpoint -from torch.nn.modules.batchnorm import _BatchNorm - -from det3d.ops import DeformConv, ModulatedDeformConv -from ..registry import BACKBONES -from ..utils import build_conv_layer, build_norm_layer - - -class BasicBlock(nn.Module): - expansion = 1 - - def __init__( - self, - inplanes, - planes, - stride=1, - dilation=1, - downsample=None, - style="pytorch", - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type="BN"), - dcn=None, - gcb=None, - gen_attention=None, - ): - super(BasicBlock, self).__init__() - assert dcn is None, "Not implemented yet." - assert gen_attention is None, "Not implemented yet." - assert gcb is None, "Not implemented yet." - - self.norm1_name, norm1 = build_norm_layer(norm_cfg, planes, postfix=1) - self.norm2_name, norm2 = build_norm_layer(norm_cfg, planes, postfix=2) - - self.conv1 = build_conv_layer( - conv_cfg, - inplanes, - planes, - 3, - stride=stride, - padding=dilation, - dilation=dilation, - bias=False, - ) - self.add_module(self.norm1_name, norm1) - self.conv2 = build_conv_layer( - conv_cfg, planes, planes, 3, padding=1, bias=False - ) - self.add_module(self.norm2_name, norm2) - - self.relu = nn.ReLU(inplace=True) - self.downsample = downsample - self.stride = stride - self.dilation = dilation - assert not with_cp - - @property - def norm1(self): - return getattr(self, self.norm1_name) - - @property - def norm2(self): - return getattr(self, self.norm2_name) - - def forward(self, x): - identity = x - - out = self.conv1(x) - out = self.norm1(out) - out = self.relu(out) - - out = self.conv2(out) - out = self.norm2(out) - - if self.downsample is not None: - identity = self.downsample(x) - - out += identity - out = self.relu(out) - - return out - - -class Bottleneck(nn.Module): - expansion = 4 - - def __init__( - self, - inplanes, - planes, - stride=1, - dilation=1, - downsample=None, - style="pytorch", - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type="BN"), - dcn=None, - gcb=None, - gen_attention=None, - ): - """Bottleneck block for ResNet. - If style is "pytorch", the stride-two layer is the 3x3 conv layer, - if it is "caffe", the stride-two layer is the first 1x1 conv layer. 
- """ - super(Bottleneck, self).__init__() - assert style in ["pytorch", "caffe"] - assert dcn is None or isinstance(dcn, dict) - assert gcb is None or isinstance(gcb, dict) - assert gen_attention is None or isinstance(gen_attention, dict) - - self.inplanes = inplanes - self.planes = planes - self.stride = stride - self.dilation = dilation - self.style = style - self.with_cp = with_cp - self.conv_cfg = conv_cfg - self.norm_cfg = norm_cfg - self.dcn = dcn - self.with_dcn = dcn is not None - self.gcb = gcb - self.with_gcb = gcb is not None - self.gen_attention = gen_attention - self.with_gen_attention = gen_attention is not None - - if self.style == "pytorch": - self.conv1_stride = 1 - self.conv2_stride = stride - else: - self.conv1_stride = stride - self.conv2_stride = 1 - - self.norm1_name, norm1 = build_norm_layer(norm_cfg, planes, postfix=1) - self.norm2_name, norm2 = build_norm_layer(norm_cfg, planes, postfix=2) - self.norm3_name, norm3 = build_norm_layer( - norm_cfg, planes * self.expansion, postfix=3 - ) - - self.conv1 = build_conv_layer( - conv_cfg, - inplanes, - planes, - kernel_size=1, - stride=self.conv1_stride, - bias=False, - ) - self.add_module(self.norm1_name, norm1) - fallback_on_stride = False - self.with_modulated_dcn = False - if self.with_dcn: - fallback_on_stride = dcn.get("fallback_on_stride", False) - self.with_modulated_dcn = dcn.get("modulated", False) - if not self.with_dcn or fallback_on_stride: - self.conv2 = build_conv_layer( - conv_cfg, - planes, - planes, - kernel_size=3, - stride=self.conv2_stride, - padding=dilation, - dilation=dilation, - bias=False, - ) - else: - assert conv_cfg is None, "conv_cfg must be None for DCN" - deformable_groups = dcn.get("deformable_groups", 1) - if not self.with_modulated_dcn: - conv_op = DeformConv - offset_channels = 18 - else: - conv_op = ModulatedDeformConv - offset_channels = 27 - self.conv2_offset = nn.Conv2d( - planes, - deformable_groups * offset_channels, - kernel_size=3, - stride=self.conv2_stride, - padding=dilation, - dilation=dilation, - ) - self.conv2 = conv_op( - planes, - planes, - kernel_size=3, - stride=self.conv2_stride, - padding=dilation, - dilation=dilation, - deformable_groups=deformable_groups, - bias=False, - ) - self.add_module(self.norm2_name, norm2) - self.conv3 = build_conv_layer( - conv_cfg, planes, planes * self.expansion, kernel_size=1, bias=False - ) - self.add_module(self.norm3_name, norm3) - - self.relu = nn.ReLU(inplace=True) - self.downsample = downsample - - @property - def norm1(self): - return getattr(self, self.norm1_name) - - @property - def norm2(self): - return getattr(self, self.norm2_name) - - @property - def norm3(self): - return getattr(self, self.norm3_name) - - def forward(self, x): - def _inner_forward(x): - identity = x - - out = self.conv1(x) - out = self.norm1(out) - out = self.relu(out) - - if not self.with_dcn: - out = self.conv2(out) - elif self.with_modulated_dcn: - offset_mask = self.conv2_offset(out) - offset = offset_mask[:, :18, :, :] - mask = offset_mask[:, -9:, :, :].sigmoid() - out = self.conv2(out, offset, mask) - else: - offset = self.conv2_offset(out) - out = self.conv2(out, offset) - out = self.norm2(out) - out = self.relu(out) - - if self.with_gen_attention: - out = self.gen_attention_block(out) - - out = self.conv3(out) - out = self.norm3(out) - - if self.with_gcb: - out = self.context_block(out) - - if self.downsample is not None: - identity = self.downsample(x) - - out += identity - - return out - - if self.with_cp and x.requires_grad: - out = 
cp.checkpoint(_inner_forward, x) - else: - out = _inner_forward(x) - - out = self.relu(out) - - return out - - -def make_res_layer( - block, - inplanes, - planes, - blocks, - stride=1, - dilation=1, - style="pytorch", - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type="BN"), - dcn=None, - gcb=None, - gen_attention=None, - gen_attention_blocks=[], -): - downsample = None - if stride != 1 or inplanes != planes * block.expansion: - downsample = nn.Sequential( - build_conv_layer( - conv_cfg, - inplanes, - planes * block.expansion, - kernel_size=1, - stride=stride, - bias=False, - ), - build_norm_layer(norm_cfg, planes * block.expansion)[1], - ) - - layers = [] - layers.append( - block( - inplanes=inplanes, - planes=planes, - stride=stride, - dilation=dilation, - downsample=downsample, - style=style, - with_cp=with_cp, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - dcn=dcn, - gcb=gcb, - gen_attention=gen_attention if (0 in gen_attention_blocks) else None, - ) - ) - inplanes = planes * block.expansion - for i in range(1, blocks): - layers.append( - block( - inplanes=inplanes, - planes=planes, - stride=1, - dilation=dilation, - style=style, - with_cp=with_cp, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - dcn=dcn, - gcb=gcb, - gen_attention=gen_attention if (i in gen_attention_blocks) else None, - ) - ) - - return nn.Sequential(*layers) - - -@BACKBONES.register_module -class ResNet(nn.Module): - """ResNet backbone. - Args: - depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. - num_stages (int): Resnet stages, normally 4. - strides (Sequence[int]): Strides of the first block of each stage. - dilations (Sequence[int]): Dilation of each stage. - out_indices (Sequence[int]): Output from which stages. - style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two - layer is the 3x3 conv layer, otherwise the stride-two layer is - the first 1x1 conv layer. - frozen_stages (int): Stages to be frozen (stop grad and set eval mode). - -1 means not freezing any parameters. - norm_cfg (dict): dictionary to construct and config norm layer. - norm_eval (bool): Whether to set norm layers to eval mode, namely, - freeze running stats (mean and var). Note: Effect on Batch Norm - and its variants only. - with_cp (bool): Use checkpoint or not. Using checkpoint will save some - memory while slowing down the training speed. - zero_init_residual (bool): whether to use zero init for last norm layer - in resblocks to let them behave as identity. 
- """ - - arch_settings = { - 18: (BasicBlock, (2, 2, 2, 2)), - 34: (BasicBlock, (3, 4, 6, 3)), - 50: (Bottleneck, (3, 4, 6, 3)), - 101: (Bottleneck, (3, 4, 23, 3)), - 152: (Bottleneck, (3, 8, 36, 3)), - } - - def __init__( - self, - depth, - num_stages=4, - strides=(1, 2, 2, 2), - dilations=(1, 1, 1, 1), - out_indices=(0, 1, 2, 3), - style="pytorch", - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type="BN", requires_grad=True), - norm_eval=True, - dcn=None, - stage_with_dcn=(False, False, False, False), - gcb=None, - stage_with_gcb=(False, False, False, False), - gen_attention=None, - stage_with_gen_attention=((), (), (), ()), - with_cp=False, - zero_init_residual=True, - ): - super(ResNet, self).__init__() - if depth not in self.arch_settings: - raise KeyError("invalid depth {} for resnet".format(depth)) - self.depth = depth - self.num_stages = num_stages - assert num_stages >= 1 and num_stages <= 4 - self.strides = strides - self.dilations = dilations - assert len(strides) == len(dilations) == num_stages - self.out_indices = out_indices - assert max(out_indices) < num_stages - self.style = style - self.frozen_stages = frozen_stages - self.conv_cfg = conv_cfg - self.norm_cfg = norm_cfg - self.with_cp = with_cp - self.norm_eval = norm_eval - self.dcn = dcn - self.stage_with_dcn = stage_with_dcn - if dcn is not None: - assert len(stage_with_dcn) == num_stages - self.gen_attention = gen_attention - self.gcb = gcb - self.stage_with_gcb = stage_with_gcb - if gcb is not None: - assert len(stage_with_gcb) == num_stages - self.zero_init_residual = zero_init_residual - self.block, stage_blocks = self.arch_settings[depth] - self.stage_blocks = stage_blocks[:num_stages] - self.inplanes = 64 - - self._make_stem_layer() - - self.res_layers = [] - for i, num_blocks in enumerate(self.stage_blocks): - stride = strides[i] - dilation = dilations[i] - dcn = self.dcn if self.stage_with_dcn[i] else None - gcb = self.gcb if self.stage_with_gcb[i] else None - planes = 64 * 2 ** i - res_layer = make_res_layer( - self.block, - self.inplanes, - planes, - num_blocks, - stride=stride, - dilation=dilation, - style=self.style, - with_cp=with_cp, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - dcn=dcn, - gcb=gcb, - gen_attention=gen_attention, - gen_attention_blocks=stage_with_gen_attention[i], - ) - self.inplanes = planes * self.block.expansion - layer_name = "layer{}".format(i + 1) - self.add_module(layer_name, res_layer) - self.res_layers.append(layer_name) - - self._freeze_stages() - - self.feat_dim = self.block.expansion * 64 * 2 ** (len(self.stage_blocks) - 1) - - @property - def norm1(self): - return getattr(self, self.norm1_name) - - def _make_stem_layer(self): - self.conv1 = build_conv_layer( - self.conv_cfg, 3, 64, kernel_size=7, stride=2, padding=3, bias=False - ) - self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, 64, postfix=1) - self.add_module(self.norm1_name, norm1) - self.relu = nn.ReLU(inplace=True) - self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) - - def _freeze_stages(self): - if self.frozen_stages >= 0: - self.norm1.eval() - for m in [self.conv1, self.norm1]: - for param in m.parameters(): - param.requires_grad = False - - for i in range(1, self.frozen_stages + 1): - m = getattr(self, "layer{}".format(i)) - m.eval() - for param in m.parameters(): - param.requires_grad = False - - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is 
None: - for m in self.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, (_BatchNorm, nn.GroupNorm)): - constant_init(m, 1) - - if self.dcn is not None: - for m in self.modules(): - if isinstance(m, Bottleneck) and hasattr(m, "conv2_offset"): - constant_init(m.conv2_offset, 0) - - if self.zero_init_residual: - for m in self.modules(): - if isinstance(m, Bottleneck): - constant_init(m.norm3, 0) - elif isinstance(m, BasicBlock): - constant_init(m.norm2, 0) - else: - raise TypeError("pretrained must be a str or None") - - def forward(self, x): - x = self.conv1(x) - x = self.norm1(x) - x = self.relu(x) - x = self.maxpool(x) - outs = [] - for i, layer_name in enumerate(self.res_layers): - res_layer = getattr(self, layer_name) - x = res_layer(x) - if i in self.out_indices: - outs.append(x) - return tuple(outs) - - def train(self, mode=True): - super(ResNet, self).train(mode) - self._freeze_stages() - if mode and self.norm_eval: - for m in self.modules(): - # trick: eval have effect on BatchNorm only - if isinstance(m, _BatchNorm): - m.eval() diff --git a/det3d/models/backbones/scn.py b/det3d/models/backbones/scn.py index d20ed0b..017d9e7 100644 --- a/det3d/models/backbones/scn.py +++ b/det3d/models/backbones/scn.py @@ -1,20 +1,11 @@ -import time - import numpy as np import spconv -import torch -from det3d.models.utils import Empty, change_default_args -from det3d.torchie.cnn import constant_init, kaiming_init -from det3d.torchie.trainer import load_checkpoint from spconv import SparseConv3d, SubMConv3d from torch import nn -from torch.nn import BatchNorm1d from torch.nn import functional as F -from torch.nn.modules.batchnorm import _BatchNorm -from .. import builder from ..registry import BACKBONES -from ..utils import build_conv_layer, build_norm_layer +from ..utils import build_norm_layer def conv3x3(in_planes, out_planes, stride=1, indice_key=None, bias=True): @@ -89,222 +80,6 @@ def forward(self, x): return out -@BACKBONES.register_module -class SpMiddleFHD(nn.Module): - def __init__( - self, num_input_features=128, norm_cfg=None, name="SpMiddleFHD", **kwargs - ): - super(SpMiddleFHD, self).__init__() - self.name = name - - self.dcn = None - self.zero_init_residual = False - - if norm_cfg is None: - norm_cfg = dict(type="BN1d", eps=1e-3, momentum=0.01) - - self.middle_conv = spconv.SparseSequential( - SubMConv3d(num_input_features, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SubMConv3d(16, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SparseConv3d( - 16, 32, 3, 2, padding=1, bias=False - ), # [1600, 1200, 41] -> [800, 600, 21] - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, indice_key="subm1", bias=False), - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, indice_key="subm1", bias=False), - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SparseConv3d( - 32, 64, 3, 2, padding=1, bias=False - ), # [800, 600, 21] -> [400, 300, 11] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseConv3d( - 64, 64, 3, 2, padding=[0, 1, 1], bias=False - ), # [400, 300, 11] -> [200, 150, 5] - build_norm_layer(norm_cfg, 
64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseConv3d( - 64, 64, (3, 1, 1), (2, 1, 1), bias=False - ), # [200, 150, 5] -> [200, 150, 2] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - ) - - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is None: - for m in self.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, (_BatchNorm, nn.GroupNorm)): - constant_init(m, 1) - - if self.dcn is not None: - for m in self.modules(): - if isinstance(m, Bottleneck) and hasattr(m, "conv2_offset"): - constant_init(m.conv2_offset, 0) - - if self.zero_init_residual: - for m in self.modules(): - if isinstance(m, Bottleneck): - constant_init(m.norm3, 0) - elif isinstance(m, BasicBlock): - constant_init(m.norm2, 0) - else: - raise TypeError("pretrained must be a str or None") - - def forward(self, voxel_features, coors, batch_size, input_shape): - - # input: # [41, 1600, 1408] - sparse_shape = np.array(input_shape[::-1]) + [1, 0, 0] - coors = coors.int() - - ret = spconv.SparseConvTensor(voxel_features, coors, sparse_shape, batch_size) - ret = self.middle_conv(ret) - ret = ret.dense() - - N, C, D, H, W = ret.shape - ret = ret.view(N, C * D, H, W) - - return ret - - -@BACKBONES.register_module -class SpMiddleFHDNobn(nn.Module): - def __init__( - self, num_input_features=128, norm_cfg=None, name="SpMiddleFHD", **kwargs - ): - super(SpMiddleFHDNobn, self).__init__() - self.name = name - - self.dcn = None - self.zero_init_residual = False - - if norm_cfg is None: - norm_cfg = dict(type="BN1d", eps=1e-3, momentum=0.01) - - self.middle_conv = spconv.SparseSequential( - SubMConv3d(num_input_features, 16, 3, bias=True, indice_key="subm0"), - # build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SubMConv3d(16, 16, 3, bias=True, indice_key="subm0"), - # build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SparseConv3d( - 16, 32, 3, 2, padding=1, bias=True - ), # [1600, 1200, 41] -> [800, 600, 21] - # build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, indice_key="subm1", bias=True), - # build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, indice_key="subm1", bias=True), - # build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SparseConv3d( - 32, 64, 3, 2, padding=1, bias=True - ), # [800, 600, 21] -> [400, 300, 11] - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=True), - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=True), - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=True), - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseConv3d( - 64, 64, 3, 2, padding=[0, 1, 1], bias=True - ), # [400, 300, 11] -> [200, 150, 5] - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=True), - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=True), - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=True), - # 
build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseConv3d( - 64, 64, (3, 1, 1), (2, 1, 1), bias=True - ), # [200, 150, 5] -> [200, 150, 2] - # build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - ) - - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is None: - for m in self.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, (_BatchNorm, nn.GroupNorm)): - constant_init(m, 1) - - if self.dcn is not None: - for m in self.modules(): - if isinstance(m, Bottleneck) and hasattr(m, "conv2_offset"): - constant_init(m.conv2_offset, 0) - - if self.zero_init_residual: - for m in self.modules(): - if isinstance(m, Bottleneck): - constant_init(m.norm3, 0) - elif isinstance(m, BasicBlock): - constant_init(m.norm2, 0) - else: - raise TypeError("pretrained must be a str or None") - - def forward(self, voxel_features, coors, batch_size, input_shape): - - # input: # [41, 1600, 1408] - sparse_shape = np.array(input_shape[::-1]) + [1, 0, 0] - coors = coors.int() - - ret = spconv.SparseConvTensor(voxel_features, coors, sparse_shape, batch_size) - ret = self.middle_conv(ret) - ret = ret.dense() - - N, C, D, H, W = ret.shape - ret = ret.view(N, C * D, H, W) - - return ret - - @BACKBONES.register_module class SpMiddleResNetFHD(nn.Module): def __init__( @@ -320,337 +95,83 @@ def __init__( norm_cfg = dict(type="BN1d", eps=1e-3, momentum=0.01) # input: # [1600, 1200, 41] - self.middle_conv = spconv.SparseSequential( + self.conv_input = spconv.SparseSequential( SubMConv3d(num_input_features, 16, 3, bias=False, indice_key="res0"), build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), + nn.ReLU(inplace=True) + ) + + self.conv1 = spconv.SparseSequential( SparseBasicBlock(16, 16, norm_cfg=norm_cfg, indice_key="res0"), SparseBasicBlock(16, 16, norm_cfg=norm_cfg, indice_key="res0"), + ) + + self.conv2 = spconv.SparseSequential( SparseConv3d( 16, 32, 3, 2, padding=1, bias=False ), # [1600, 1200, 41] -> [800, 600, 21] build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), + nn.ReLU(inplace=True), SparseBasicBlock(32, 32, norm_cfg=norm_cfg, indice_key="res1"), SparseBasicBlock(32, 32, norm_cfg=norm_cfg, indice_key="res1"), + ) + + self.conv3 = spconv.SparseSequential( SparseConv3d( 32, 64, 3, 2, padding=1, bias=False ), # [800, 600, 21] -> [400, 300, 11] build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), + nn.ReLU(inplace=True), SparseBasicBlock(64, 64, norm_cfg=norm_cfg, indice_key="res2"), SparseBasicBlock(64, 64, norm_cfg=norm_cfg, indice_key="res2"), + ) + + self.conv4 = spconv.SparseSequential( SparseConv3d( 64, 128, 3, 2, padding=[0, 1, 1], bias=False ), # [400, 300, 11] -> [200, 150, 5] build_norm_layer(norm_cfg, 128)[1], - nn.ReLU(), + nn.ReLU(inplace=True), SparseBasicBlock(128, 128, norm_cfg=norm_cfg, indice_key="res3"), SparseBasicBlock(128, 128, norm_cfg=norm_cfg, indice_key="res3"), - SparseConv3d( - 128, 128, (3, 1, 1), (2, 1, 1), bias=False - ), # [200, 150, 5] -> [200, 150, 2] - build_norm_layer(norm_cfg, 128)[1], - nn.ReLU(), ) - def forward(self, voxel_features, coors, batch_size, input_shape): - - # input: # [41, 1600, 1408] - sparse_shape = np.array(input_shape[::-1]) + [1, 0, 0] - - coors = coors.int() - ret = spconv.SparseConvTensor(voxel_features, coors, sparse_shape, batch_size) - ret = self.middle_conv(ret) - ret = ret.dense() - - N, C, D, H, W = ret.shape - ret = ret.view(N, C * D, H, W) - - return ret - - -@BACKBONES.register_module -class 
SASSDFHD(nn.Module): - def __init__( - self, num_input_features=128, norm_cfg=None, name="SASSDFHD", **kwargs - ): - super(SASSDFHD, self).__init__() - self.name = name - - self.dcn = None - self.zero_init_residual = False - - if norm_cfg is None: - norm_cfg = dict(type="BN1d", eps=1e-3, momentum=0.01) - self.middle_conv = spconv.SparseSequential( - SubMConv3d(num_input_features, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SubMConv3d(16, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SparseConv3d( - 16, 32, 3, 2, padding=1, bias=False - ), # [1600, 1200, 41] -> [800, 600, 21] - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, indice_key="subm1", bias=False), - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, indice_key="subm1", bias=False), - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), + self.extra_conv = spconv.SparseSequential( SparseConv3d( - 32, 64, 3, 2, padding=1, bias=False - ), # [800, 600, 21] -> [400, 300, 11] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm2", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseConv3d( - 64, 64, 3, 2, padding=[0, 1, 1], bias=False - ), # [400, 300, 11] -> [200, 150, 5] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, indice_key="subm3", bias=False), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseConv3d( - 64, 64, (1, 1, 1), (1, 1, 1), bias=False + 128, 128, (3, 1, 1), (2, 1, 1), bias=False ), # [200, 150, 5] -> [200, 150, 2] - build_norm_layer(norm_cfg, 64)[1], + build_norm_layer(norm_cfg, 128)[1], nn.ReLU(), ) - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is None: - for m in self.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, (_BatchNorm, nn.GroupNorm)): - constant_init(m, 1) - - if self.dcn is not None: - for m in self.modules(): - if isinstance(m, Bottleneck) and hasattr(m, "conv2_offset"): - constant_init(m.conv2_offset, 0) - - if self.zero_init_residual: - for m in self.modules(): - if isinstance(m, Bottleneck): - constant_init(m.norm3, 0) - elif isinstance(m, BasicBlock): - constant_init(m.norm2, 0) - else: - raise TypeError("pretrained must be a str or None") - def forward(self, voxel_features, coors, batch_size, input_shape): # input: # [41, 1600, 1408] sparse_shape = np.array(input_shape[::-1]) + [1, 0, 0] - coors = coors.int() + coors = coors.int() ret = spconv.SparseConvTensor(voxel_features, coors, sparse_shape, batch_size) - ret = self.middle_conv(ret) - ret = ret.dense() - - N, C, D, H, W = ret.shape - ret = ret.view(N, C * D, H, W) - return ret + x = self.conv_input(ret) + x_conv1 = self.conv1(x) + x_conv2 = self.conv2(x_conv1) + x_conv3 = self.conv3(x_conv2) + x_conv4 = self.conv4(x_conv3) -@BACKBONES.register_module -class SASSDResNetFHD(nn.Module): - def __init__( - self, num_input_features=128, 
norm_cfg=None, name="SASSDResNetFHD", **kwargs - ): - super(SASSDResNetFHD, self).__init__() - self.name = name + ret = self.extra_conv(x_conv4) - self.dcn = None - self.zero_init_residual = False - - if norm_cfg is None: - norm_cfg = dict(type="BN1d", eps=1e-3, momentum=0.01) - - self.middle_conv = spconv.SparseSequential( - SubMConv3d(num_input_features, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SparseBasicBlock(16, 16, norm_cfg=norm_cfg, indice_key="res0"), - SparseBasicBlock(16, 16, norm_cfg=norm_cfg, indice_key="res0"), - SparseConv3d( - 16, 32, 3, 2, padding=1, bias=False - ), # [1600, 1200, 41] -> [800, 600, 21] - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SparseBasicBlock(32, 32, norm_cfg=norm_cfg, indice_key="res1"), - SparseBasicBlock(32, 32, norm_cfg=norm_cfg, indice_key="res1"), - SparseConv3d( - 32, 64, 3, 2, padding=1, bias=False - ), # [800, 600, 21] -> [400, 300, 11] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SparseBasicBlock(64, 64, norm_cfg=norm_cfg, indice_key="res2"), - SparseBasicBlock(64, 64, norm_cfg=norm_cfg, indice_key="res2"), - SparseConv3d( - 64, 64, 3, 2, padding=[0, 1, 1], bias=False - ), # [400, 300, 11] -> [200, 150, 5] - build_norm_layer(norm_cfg, 128)[1], - nn.ReLU(), - SparseBasicBlock(64, 64, norm_cfg=norm_cfg, indice_key="res3"), - SparseBasicBlock(64, 64, norm_cfg=norm_cfg, indice_key="res3"), - SparseConv3d( - 64, 64, (1, 1, 1), (1, 1, 1), bias=False - ), # [200, 150, 5] -> [200, 150, 2] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - ) - - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is None: - for m in self.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, (_BatchNorm, nn.GroupNorm)): - constant_init(m, 1) - - if self.dcn is not None: - for m in self.modules(): - if isinstance(m, Bottleneck) and hasattr(m, "conv2_offset"): - constant_init(m.conv2_offset, 0) - - if self.zero_init_residual: - for m in self.modules(): - if isinstance(m, Bottleneck): - constant_init(m.norm3, 0) - elif isinstance(m, BasicBlock): - constant_init(m.norm2, 0) - else: - raise TypeError("pretrained must be a str or None") - - def forward(self, voxel_features, coors, batch_size, input_shape): - - # input: # [41, 1600, 1408] - sparse_shape = np.array(input_shape[::-1]) + [1, 0, 0] - coors = coors.int() - - ret = spconv.SparseConvTensor(voxel_features, coors, sparse_shape, batch_size) - ret = self.middle_conv(ret) ret = ret.dense() N, C, D, H, W = ret.shape ret = ret.view(N, C * D, H, W) - return ret - - - - -@BACKBONES.register_module -class RCNNSpMiddleFHD(nn.Module): - def __init__( - self, num_input_features=128, norm_cfg=None, name="RCNNSpMiddleFHD", **kwargs - ): - super(RCNNSpMiddleFHD, self).__init__() - self.name = name - - self.dcn = None - self.zero_init_residual = False - - if norm_cfg is None: - norm_cfg = dict(type="BN1d", eps=1e-3, momentum=0.01) - - self.middle_conv = spconv.SparseSequential( - SubMConv3d(num_input_features, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SubMConv3d(16, 16, 3, bias=False, indice_key="subm0"), - build_norm_layer(norm_cfg, 16)[1], - nn.ReLU(), - SparseConv3d( - 16, 32, 3, 2, padding=1, bias=False - ), # [32, 80, 41] -> [16, 40, 21] - build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - SubMConv3d(32, 32, 3, bias=False, indice_key="subm1"), - 
build_norm_layer(norm_cfg, 32)[1], - nn.ReLU(), - # SubMConv3d(32, 32, 3, bias=False, indice_key="subm1"), - # build_norm_layer(norm_cfg, 32)[1], - # nn.ReLU(), - SparseConv3d( - 32, 64, 3, 2, bias=False, padding=1 - ), # [16, 40, 21] -> [8, 20, 11] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, bias=False, indice_key="subm2"), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - # SubMConv3d(64, 64, 3, bias=False, indice_key="subm2"), - # build_norm_layer(norm_cfg, 64)[1], - # nn.ReLU(), - # SubMConv3d(64, 64, 3, bias=False, indice_key="subm2"), - # build_norm_layer(norm_cfg, 64)[1], - # nn.ReLU(), - SparseConv3d( - 64, 64, 3, 2, bias=False, padding=[1, 1, 0] - ), # [8, 20, 11] -> [4, 10, 5] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - SubMConv3d(64, 64, 3, bias=False, indice_key="subm3"), - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - # SubMConv3d(64, 64, 3, bias=False, indice_key="subm3"), - # build_norm_layer(norm_cfg, 64)[1], - # nn.ReLU(), - # SubMConv3d(64, 64, 3, bias=False, indice_key="subm3"), - # build_norm_layer(norm_cfg, 64)[1], - # nn.ReLU(), - SparseConv3d( - 64, 64, (1, 1, 3), (1, 1, 2), bias=False - ), # [4, 10, 5] -> [4, 10, 2] - build_norm_layer(norm_cfg, 64)[1], - nn.ReLU(), - ) - - def forward(self, voxel_features, coors, batch_size, input_shape): - - # input: # [41, 1600, 1408] - sparse_shape = np.array(input_shape[::-1]) + [0, 0, 1] - - # coors[:, 1] += 1 - coors = coors.int() - ret = spconv.SparseConvTensor(voxel_features, coors, sparse_shape, batch_size) - - ret = self.middle_conv(ret) - - ret = ret.dense() - - ret = ret.permute(0, 1, 4, 2, 3).contiguous() - N, C, W, D, H = ret.shape - ret = ret.view(N, C * W, D, H) + multi_scale_voxel_features = { + 'conv1': x_conv1, + 'conv2': x_conv2, + 'conv3': x_conv3, + 'conv4': x_conv4, + } - return ret + return ret, multi_scale_voxel_features \ No newline at end of file diff --git a/det3d/models/backbones/senet.py b/det3d/models/backbones/senet.py deleted file mode 100644 index 678147e..0000000 --- a/det3d/models/backbones/senet.py +++ /dev/null @@ -1,542 +0,0 @@ -""" -ResNet code gently borrowed from -https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py -""" -from __future__ import absolute_import, division, print_function - -import math -from collections import OrderedDict - -import torch.nn as nn -from torch.utils import model_zoo - -__all__ = [ - "SENet", - "senet154", - "se_resnet50", - "se_resnet101", - "se_resnet152", - "se_resnext50_32x4d", - "se_resnext101_32x4d", -] - -pretrained_settings = { - "senet154": { - "imagenet": { - "url": "http://data.lip6.fr/cadene/pretrainedmodels/senet154-c7b49a05.pth", - "input_space": "RGB", - "input_size": [3, 224, 224], - "input_range": [0, 1], - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "num_classes": 1000, - } - }, - "se_resnet50": { - "imagenet": { - "url": "http://data.lip6.fr/cadene/pretrainedmodels/se_resnet50-ce0d4300.pth", - "input_space": "RGB", - "input_size": [3, 224, 224], - "input_range": [0, 1], - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "num_classes": 1000, - } - }, - "se_resnet101": { - "imagenet": { - "url": "http://data.lip6.fr/cadene/pretrainedmodels/se_resnet101-7e38fcc6.pth", - "input_space": "RGB", - "input_size": [3, 224, 224], - "input_range": [0, 1], - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "num_classes": 1000, - } - }, - "se_resnet152": { - "imagenet": { - "url": 
"http://data.lip6.fr/cadene/pretrainedmodels/se_resnet152-d17c99b7.pth", - "input_space": "RGB", - "input_size": [3, 224, 224], - "input_range": [0, 1], - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "num_classes": 1000, - } - }, - "se_resnext50_32x4d": { - "imagenet": { - "url": "http://data.lip6.fr/cadene/pretrainedmodels/se_resnext50_32x4d-a260b3a4.pth", - "input_space": "RGB", - "input_size": [3, 224, 224], - "input_range": [0, 1], - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "num_classes": 1000, - } - }, - "se_resnext101_32x4d": { - "imagenet": { - "url": "http://data.lip6.fr/cadene/pretrainedmodels/se_resnext101_32x4d-3b2fe3d8.pth", - "input_space": "RGB", - "input_size": [3, 224, 224], - "input_range": [0, 1], - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "num_classes": 1000, - } - }, -} - - -class SEModule(nn.Module): - def __init__(self, channels, reduction): - super(SEModule, self).__init__() - self.avg_pool = nn.AdaptiveAvgPool2d(1) - self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1, padding=0) - self.relu = nn.ReLU(inplace=False) - self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1, padding=0) - self.sigmoid = nn.Sigmoid() - - def forward(self, x): - module_input = x - x = self.avg_pool(x) - x = self.fc1(x) - x = self.relu(x) - x = self.fc2(x) - x = self.sigmoid(x) - return module_input * x - - -class Bottleneck(nn.Module): - """ - Base class for bottlenecks that implements `forward()` method. - """ - - def forward(self, x): - residual = x - - out = self.conv1(x) - out = self.bn1(out) - out = self.relu(out) - - out = self.conv2(out) - out = self.bn2(out) - out = self.relu(out) - - out = self.conv3(out) - out = self.bn3(out) - - if self.downsample is not None: - residual = self.downsample(x) - - out = self.se_module(out) + residual - out = self.relu(out) - - return out - - -class SEBottleneck(Bottleneck): - """ - Bottleneck for SENet154. - """ - - expansion = 4 - - def __init__(self, inplanes, planes, groups, reduction, stride=1, downsample=None): - super(SEBottleneck, self).__init__() - self.conv1 = nn.Conv2d(inplanes, planes * 2, kernel_size=1, bias=False) - self.bn1 = nn.BatchNorm2d(planes * 2) - self.conv2 = nn.Conv2d( - planes * 2, - planes * 4, - kernel_size=3, - stride=stride, - padding=1, - groups=groups, - bias=False, - ) - self.bn2 = nn.BatchNorm2d(planes * 4) - self.conv3 = nn.Conv2d(planes * 4, planes * 4, kernel_size=1, bias=False) - self.bn3 = nn.BatchNorm2d(planes * 4) - self.relu = nn.ReLU(inplace=False) - self.se_module = SEModule(planes * 4, reduction=reduction) - self.downsample = downsample - self.stride = stride - - -class SEResNetBottleneck(Bottleneck): - """ - ResNet bottleneck with a Squeeze-and-Excitation module. It follows Caffe - implementation and uses `stride=stride` in `conv1` and not in `conv2` - (the latter is used in the torchvision implementation of ResNet). 
- """ - - expansion = 4 - - def __init__(self, inplanes, planes, groups, reduction, stride=1, downsample=None): - super(SEResNetBottleneck, self).__init__() - self.conv1 = nn.Conv2d( - inplanes, planes, kernel_size=1, bias=False, stride=stride - ) - self.bn1 = nn.BatchNorm2d(planes) - self.conv2 = nn.Conv2d( - planes, planes, kernel_size=3, padding=1, groups=groups, bias=False - ) - self.bn2 = nn.BatchNorm2d(planes) - self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) - self.bn3 = nn.BatchNorm2d(planes * 4) - self.relu = nn.ReLU(inplace=True) - self.se_module = SEModule(planes * 4, reduction=reduction) - self.downsample = downsample - self.stride = stride - - -class SEResNeXtBottleneck(Bottleneck): - """ - ResNeXt bottleneck type C with a Squeeze-and-Excitation module. - """ - - expansion = 4 - - def __init__( - self, - inplanes, - planes, - groups, - reduction, - stride=1, - downsample=None, - base_width=4, - ): - super(SEResNeXtBottleneck, self).__init__() - width = math.floor(planes * (base_width / 64)) * groups - self.conv1 = nn.Conv2d(inplanes, width, kernel_size=1, bias=False, stride=1) - self.bn1 = nn.BatchNorm2d(width) - self.conv2 = nn.Conv2d( - width, - width, - kernel_size=3, - stride=stride, - padding=1, - groups=groups, - bias=False, - ) - self.bn2 = nn.BatchNorm2d(width) - self.conv3 = nn.Conv2d(width, planes * 4, kernel_size=1, bias=False) - self.bn3 = nn.BatchNorm2d(planes * 4) - self.relu = nn.ReLU(inplace=True) - self.se_module = SEModule(planes * 4, reduction=reduction) - self.downsample = downsample - self.stride = stride - - -class SENet(nn.Module): - def __init__( - self, - block, - layers, - groups, - reduction, - dropout_p=0.2, - inplanes=128, - input_3x3=True, - downsample_kernel_size=3, - downsample_padding=1, - num_classes=1000, - ): - """ - Parameters - ---------- - block (nn.Module): Bottleneck class. - - For SENet154: SEBottleneck - - For SE-ResNet models: SEResNetBottleneck - - For SE-ResNeXt models: SEResNeXtBottleneck - layers (list of ints): Number of residual blocks for 4 layers of the - network (layer1...layer4). - groups (int): Number of groups for the 3x3 convolution in each - bottleneck block. - - For SENet154: 64 - - For SE-ResNet models: 1 - - For SE-ResNeXt models: 32 - reduction (int): Reduction ratio for Squeeze-and-Excitation modules. - - For all models: 16 - dropout_p (float or None): Drop probability for the Dropout layer. - If `None` the Dropout layer is not used. - - For SENet154: 0.2 - - For SE-ResNet models: None - - For SE-ResNeXt models: None - inplanes (int): Number of input channels for layer1. - - For SENet154: 128 - - For SE-ResNet models: 64 - - For SE-ResNeXt models: 64 - input_3x3 (bool): If `True`, use three 3x3 convolutions instead of - a single 7x7 convolution in layer0. - - For SENet154: True - - For SE-ResNet models: False - - For SE-ResNeXt models: False - downsample_kernel_size (int): Kernel size for downsampling convolutions - in layer2, layer3 and layer4. - - For SENet154: 3 - - For SE-ResNet models: 1 - - For SE-ResNeXt models: 1 - downsample_padding (int): Padding for downsampling convolutions in - layer2, layer3 and layer4. - - For SENet154: 1 - - For SE-ResNet models: 0 - - For SE-ResNeXt models: 0 - num_classes (int): Number of outputs in `last_linear` layer. 
- - For all models: 1000 - """ - super(SENet, self).__init__() - self.inplanes = inplanes - if input_3x3: - layer0_modules = [ - ("conv1", nn.Conv2d(3, 64, 3, stride=2, padding=1, bias=False)), - ("bn1", nn.BatchNorm2d(64)), - ("relu1", nn.ReLU(inplace=True)), - ("conv2", nn.Conv2d(64, 64, 3, stride=1, padding=1, bias=False)), - ("bn2", nn.BatchNorm2d(64)), - ("relu2", nn.ReLU(inplace=True)), - ("conv3", nn.Conv2d(64, inplanes, 3, stride=1, padding=1, bias=False)), - ("bn3", nn.BatchNorm2d(inplanes)), - ("relu3", nn.ReLU(inplace=True)), - ] - else: - layer0_modules = [ - ( - "conv1", - nn.Conv2d( - 3, inplanes, kernel_size=7, stride=2, padding=3, bias=False - ), - ), - ("bn1", nn.BatchNorm2d(inplanes)), - ("relu1", nn.ReLU(inplace=True)), - ] - # To preserve compatibility with Caffe weights `ceil_mode=True` - # is used instead of `padding=1`. - layer0_modules.append(("pool", nn.MaxPool2d(3, stride=2, ceil_mode=True))) - self.layer0 = nn.Sequential(OrderedDict(layer0_modules)) - self.layer1 = self._make_layer( - block, - planes=64, - blocks=layers[0], - groups=groups, - reduction=reduction, - downsample_kernel_size=1, - downsample_padding=0, - ) - self.layer2 = self._make_layer( - block, - planes=128, - blocks=layers[1], - stride=2, - groups=groups, - reduction=reduction, - downsample_kernel_size=downsample_kernel_size, - downsample_padding=downsample_padding, - ) - self.layer3 = self._make_layer( - block, - planes=256, - blocks=layers[2], - stride=2, - groups=groups, - reduction=reduction, - downsample_kernel_size=downsample_kernel_size, - downsample_padding=downsample_padding, - ) - self.layer4 = self._make_layer( - block, - planes=512, - blocks=layers[3], - stride=2, - groups=groups, - reduction=reduction, - downsample_kernel_size=downsample_kernel_size, - downsample_padding=downsample_padding, - ) - self.avg_pool = nn.AvgPool2d(7, stride=1) - self.dropout = nn.Dropout(dropout_p) if dropout_p is not None else None - self.last_linear = nn.Linear(512 * block.expansion, num_classes) - - def _make_layer( - self, - block, - planes, - blocks, - groups, - reduction, - stride=1, - downsample_kernel_size=1, - downsample_padding=0, - ): - downsample = None - if stride != 1 or self.inplanes != planes * block.expansion: - downsample = nn.Sequential( - nn.Conv2d( - self.inplanes, - planes * block.expansion, - kernel_size=downsample_kernel_size, - stride=stride, - padding=downsample_padding, - bias=False, - ), - nn.BatchNorm2d(planes * block.expansion), - ) - - layers = [] - layers.append( - block(self.inplanes, planes, groups, reduction, stride, downsample) - ) - self.inplanes = planes * block.expansion - for i in range(1, blocks): - layers.append(block(self.inplanes, planes, groups, reduction)) - - return nn.Sequential(*layers) - - def features(self, x): - x = self.layer0(x) - x = self.layer1(x) - x = self.layer2(x) - x = self.layer3(x) - x = self.layer4(x) - return x - - def logits(self, x): - x = self.avg_pool(x) - if self.dropout is not None: - x = self.dropout(x) - x = x.view(x.size(0), -1) - x = self.last_linear(x) - return x - - def forward(self, x): - x = self.features(x) - x = self.logits(x) - return x - - -def initialize_pretrained_model(model, num_classes, settings): - assert ( - num_classes == settings["num_classes"] - ), "num_classes should be {}, but is {}".format( - settings["num_classes"], num_classes - ) - model.load_state_dict(model_zoo.load_url(settings["url"])) - model.input_space = settings["input_space"] - model.input_size = settings["input_size"] - model.input_range = 
settings["input_range"] - model.mean = settings["mean"] - model.std = settings["std"] - - -def senet154(num_classes=1000, pretrained="imagenet"): - model = SENet( - SEBottleneck, - [3, 8, 36, 3], - groups=64, - reduction=16, - dropout_p=0.2, - num_classes=num_classes, - ) - if pretrained is not None: - settings = pretrained_settings["senet154"][pretrained] - initialize_pretrained_model(model, num_classes, settings) - return model - - -def se_resnet50(num_classes=1000, pretrained="imagenet"): - model = SENet( - SEResNetBottleneck, - [3, 4, 6, 3], - groups=1, - reduction=16, - dropout_p=None, - inplanes=64, - input_3x3=False, - downsample_kernel_size=1, - downsample_padding=0, - num_classes=num_classes, - ) - if pretrained is not None: - settings = pretrained_settings["se_resnet50"][pretrained] - initialize_pretrained_model(model, num_classes, settings) - return model - - -def se_resnet101(num_classes=1000, pretrained="imagenet"): - model = SENet( - SEResNetBottleneck, - [3, 4, 23, 3], - groups=1, - reduction=16, - dropout_p=None, - inplanes=64, - input_3x3=False, - downsample_kernel_size=1, - downsample_padding=0, - num_classes=num_classes, - ) - if pretrained is not None: - settings = pretrained_settings["se_resnet101"][pretrained] - initialize_pretrained_model(model, num_classes, settings) - return model - - -def se_resnet152(num_classes=1000, pretrained="imagenet"): - model = SENet( - SEResNetBottleneck, - [3, 8, 36, 3], - groups=1, - reduction=16, - dropout_p=None, - inplanes=64, - input_3x3=False, - downsample_kernel_size=1, - downsample_padding=0, - num_classes=num_classes, - ) - if pretrained is not None: - settings = pretrained_settings["se_resnet152"][pretrained] - initialize_pretrained_model(model, num_classes, settings) - return model - - -def se_resnext50_32x4d(num_classes=1000, pretrained="imagenet"): - model = SENet( - SEResNeXtBottleneck, - [3, 4, 6, 3], - groups=32, - reduction=16, - dropout_p=None, - inplanes=64, - input_3x3=False, - downsample_kernel_size=1, - downsample_padding=0, - num_classes=num_classes, - ) - if pretrained is not None: - settings = pretrained_settings["se_resnext50_32x4d"][pretrained] - initialize_pretrained_model(model, num_classes, settings) - return model - - -def se_resnext101_32x4d(num_classes=1000, pretrained="imagenet"): - model = SENet( - SEResNeXtBottleneck, - [3, 4, 23, 3], - groups=32, - reduction=16, - dropout_p=None, - inplanes=64, - input_3x3=False, - downsample_kernel_size=1, - downsample_padding=0, - num_classes=num_classes, - ) - if pretrained is not None: - settings = pretrained_settings["se_resnext101_32x4d"][pretrained] - initialize_pretrained_model(model, num_classes, settings) - return model diff --git a/det3d/models/backbones/ssd_vgg.py b/det3d/models/backbones/ssd_vgg.py deleted file mode 100644 index 812fc9f..0000000 --- a/det3d/models/backbones/ssd_vgg.py +++ /dev/null @@ -1,134 +0,0 @@ -import logging - -import torch -import torch.nn as nn -import torch.nn.functional as F -from det3d.torchie.cnn import VGG, constant_init, kaiming_init, normal_init, xavier_init -from det3d.torchie.trainer import load_checkpoint - -from ..registry import BACKBONES - - -@BACKBONES.register_module -class SSDVGG(VGG): - extra_setting = { - 300: (256, "S", 512, 128, "S", 256, 128, 256, 128, 256), - 512: (256, "S", 512, 128, "S", 256, 128, "S", 256, 128, "S", 256, 128), - } - - def __init__( - self, - input_size, - depth, - with_last_pool=False, - ceil_mode=True, - out_indices=(3, 4), - out_feature_indices=(22, 34), - l2_norm_scale=20.0, - ): - 
super(SSDVGG, self).__init__( - depth, - with_last_pool=with_last_pool, - ceil_mode=ceil_mode, - out_indices=out_indices, - ) - assert input_size in (300, 512) - self.input_size = input_size - - self.features.add_module( - str(len(self.features)), nn.MaxPool2d(kernel_size=3, stride=1, padding=1) - ) - self.features.add_module( - str(len(self.features)), - nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6), - ) - self.features.add_module(str(len(self.features)), nn.ReLU(inplace=True)) - self.features.add_module( - str(len(self.features)), nn.Conv2d(1024, 1024, kernel_size=1) - ) - self.features.add_module(str(len(self.features)), nn.ReLU(inplace=True)) - self.out_feature_indices = out_feature_indices - - self.inplanes = 1024 - self.extra = self._make_extra_layers(self.extra_setting[input_size]) - self.l2_norm = L2Norm( - self.features[out_feature_indices[0] - 1].out_channels, l2_norm_scale - ) - - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is None: - for m in self.features.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, nn.BatchNorm2d): - constant_init(m, 1) - elif isinstance(m, nn.Linear): - normal_init(m, std=0.01) - else: - raise TypeError("pretrained must be a str or None") - - for m in self.extra.modules(): - if isinstance(m, nn.Conv2d): - xavier_init(m, distribution="uniform") - - constant_init(self.l2_norm, self.l2_norm.scale) - - def forward(self, x): - outs = [] - for i, layer in enumerate(self.features): - x = layer(x) - if i in self.out_feature_indices: - outs.append(x) - for i, layer in enumerate(self.extra): - x = F.relu(layer(x), inplace=True) - if i % 2 == 1: - outs.append(x) - outs[0] = self.l2_norm(outs[0]) - if len(outs) == 1: - return outs[0] - else: - return tuple(outs) - - def _make_extra_layers(self, outplanes): - layers = [] - kernel_sizes = (1, 3) - num_layers = 0 - outplane = None - for i in range(len(outplanes)): - if self.inplanes == "S": - self.inplanes = outplane - continue - k = kernel_sizes[num_layers % 2] - if outplanes[i] == "S": - outplane = outplanes[i + 1] - conv = nn.Conv2d(self.inplanes, outplane, k, stride=2, padding=1) - else: - outplane = outplanes[i] - conv = nn.Conv2d(self.inplanes, outplane, k, stride=1, padding=0) - layers.append(conv) - self.inplanes = outplanes[i] - num_layers += 1 - if self.input_size == 512: - layers.append(nn.Conv2d(self.inplanes, 256, 4, padding=1)) - - return nn.Sequential(*layers) - - -class L2Norm(nn.Module): - def __init__(self, n_dims, scale=20.0, eps=1e-10): - super(L2Norm, self).__init__() - self.n_dims = n_dims - self.weight = nn.Parameter(torch.Tensor(self.n_dims)) - self.eps = eps - self.scale = scale - - def forward(self, x): - # normalization layer convert to FP32 in FP16 training - x_float = x.float() - norm = x_float.pow(2).sum(1, keepdim=True).sqrt() + self.eps - return ( - self.weight[None, :, None, None].float().expand_as(x_float) * x_float / norm - ).type_as(x) diff --git a/det3d/models/bbox_heads/__init__.py b/det3d/models/bbox_heads/__init__.py index bcc506c..98b3464 100644 --- a/det3d/models/bbox_heads/__init__.py +++ b/det3d/models/bbox_heads/__init__.py @@ -1,3 +1,3 @@ -from .mg_head import Head, MultiGroupHead, CenterHead +from .center_head import CenterHead -__all__ = ["MultiGroupHead", "Head", "CenterHead"] +__all__ = ["CenterHead"] diff --git a/det3d/models/bbox_heads/center_head.py 
b/det3d/models/bbox_heads/center_head.py new file mode 100644 index 0000000..741648a --- /dev/null +++ b/det3d/models/bbox_heads/center_head.py @@ -0,0 +1,489 @@ +# ------------------------------------------------------------------------------ +# Portions of this code are from +# det3d (https://github.com/poodarchu/Det3D/tree/56402d4761a5b73acd23080f537599b0888cce07) +# Copyright (c) 2019 朱本金 +# Licensed under the MIT License +# ------------------------------------------------------------------------------ + +import logging +from collections import defaultdict +from det3d.core import box_torch_ops +import torch +from det3d.torchie.cnn import kaiming_init +from torch import double, nn +from det3d.models.losses.centernet_loss import FastFocalLoss, RegLoss +from det3d.models.utils import Sequential +from ..registry import HEADS +import copy +try: + from det3d.ops.dcn import DeformConv +except: + print("Deformable Convolution not built!") + + +class FeatureAdaption(nn.Module): + """Feature Adaption Module. + + Feature Adaption Module is implemented based on DCN v1. + It uses anchor shape prediction rather than feature map to + predict offsets of deformable conv layer. + + Args: + in_channels (int): Number of channels in the input feature map. + out_channels (int): Number of channels in the output feature map. + kernel_size (int): Deformable conv kernel size. + deformable_groups (int): Deformable conv group size. + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=3, + deformable_groups=4): + super(FeatureAdaption, self).__init__() + offset_channels = kernel_size * kernel_size * 2 + self.conv_offset = nn.Conv2d( + in_channels, deformable_groups * offset_channels, 1, bias=True) + self.conv_adaption = DeformConv( + in_channels, + out_channels, + kernel_size=kernel_size, + padding=(kernel_size - 1) // 2, + deformable_groups=deformable_groups) + self.relu = nn.ReLU(inplace=True) + self.init_offset() + + def init_offset(self): + self.conv_offset.weight.data.zero_() + + def forward(self, x,): + offset = self.conv_offset(x) + x = self.relu(self.conv_adaption(x, offset)) + return x + +class SepHead(nn.Module): + def __init__( + self, + in_channels, + heads, + head_conv=64, + final_kernel=1, + bn=False, + init_bias=-2.19, + **kwargs, + ): + super(SepHead, self).__init__(**kwargs) + + self.heads = heads + for head in self.heads: + classes, num_conv = self.heads[head] + + fc = Sequential() + for i in range(num_conv-1): + fc.add(nn.Conv2d(in_channels, head_conv, + kernel_size=final_kernel, stride=1, + padding=final_kernel // 2, bias=True)) + if bn: + fc.add(nn.BatchNorm2d(head_conv)) + fc.add(nn.ReLU()) + + fc.add(nn.Conv2d(head_conv, classes, + kernel_size=final_kernel, stride=1, + padding=final_kernel // 2, bias=True)) + + if 'hm' in head: + fc[-1].bias.data.fill_(init_bias) + else: + for m in fc.modules(): + if isinstance(m, nn.Conv2d): + kaiming_init(m) + + self.__setattr__(head, fc) + + + def forward(self, x): + ret_dict = dict() + for head in self.heads: + ret_dict[head] = self.__getattr__(head)(x) + + return ret_dict + +class DCNSepHead(nn.Module): + def __init__( + self, + in_channels, + num_cls, + heads, + head_conv=64, + final_kernel=1, + bn=False, + init_bias=-2.19, + **kwargs, + ): + super(DCNSepHead, self).__init__(**kwargs) + + # feature adaptation with dcn + # use separate features for classification / regression + self.feature_adapt_cls = FeatureAdaption( + in_channels, + in_channels, + kernel_size=3, + deformable_groups=4) + + self.feature_adapt_reg = FeatureAdaption( 
+ in_channels, + in_channels, + kernel_size=3, + deformable_groups=4) + + # heatmap prediction head + self.cls_head = Sequential( + nn.Conv2d(in_channels, head_conv, + kernel_size=3, padding=1, bias=True), + nn.BatchNorm2d(64), + nn.ReLU(inplace=True), + nn.Conv2d(head_conv, num_cls, + kernel_size=3, stride=1, + padding=1, bias=True) + ) + self.cls_head[-1].bias.data.fill_(init_bias) + + # other regression target + self.task_head = SepHead(in_channels, heads, head_conv=head_conv, bn=bn, final_kernel=final_kernel) + + + def forward(self, x): + center_feat = self.feature_adapt_cls(x) + reg_feat = self.feature_adapt_reg(x) + + cls_score = self.cls_head(center_feat) + ret = self.task_head(reg_feat) + ret['hm'] = cls_score + + return ret + + +@HEADS.register_module +class CenterHead(nn.Module): + def __init__( + self, + in_channels=[128,], + tasks=[], + dataset='nuscenes', + weight=0.25, + code_weights=[], + common_heads=dict(), + logger=None, + init_bias=-2.19, + share_conv_channel=64, + num_hm_conv=2, + dcn_head=False, + ): + super(CenterHead, self).__init__() + + num_classes = [len(t["class_names"]) for t in tasks] + self.class_names = [t["class_names"] for t in tasks] + self.code_weights = code_weights + self.weight = weight # weight between hm loss and loc loss + self.dataset = dataset + + self.in_channels = in_channels + self.num_classes = num_classes + + self.crit = FastFocalLoss() + self.crit_reg = RegLoss() + + self.box_n_dim = 9 if 'vel' in common_heads else 7 + self.use_direction_classifier = False + + if not logger: + logger = logging.getLogger("CenterHead") + self.logger = logger + + logger.info( + f"num_classes: {num_classes}" + ) + + # a shared convolution + self.shared_conv = nn.Sequential( + nn.Conv2d(in_channels, share_conv_channel, + kernel_size=3, padding=1, bias=True), + nn.BatchNorm2d(share_conv_channel), + nn.ReLU(inplace=True) + ) + + self.tasks = nn.ModuleList() + print("Use HM Bias: ", init_bias) + + if dcn_head: + print("Use Deformable Convolution in the CenterHead!") + + for num_cls in num_classes: + heads = copy.deepcopy(common_heads) + if not dcn_head: + heads.update(dict(hm=(num_cls, num_hm_conv))) + self.tasks.append( + SepHead(share_conv_channel, heads, bn=True, init_bias=init_bias, final_kernel=3) + ) + else: + self.tasks.append( + DCNSepHead(share_conv_channel, num_cls, heads, bn=True, init_bias=init_bias, final_kernel=3) + ) + + logger.info("Finish CenterHead Initialization") + + def forward(self, x, *kwargs): + ret_dicts = [] + + x = self.shared_conv(x) + + for task in self.tasks: + ret_dicts.append(task(x)) + + return ret_dicts + + def _sigmoid(self, x): + y = torch.clamp(x.sigmoid_(), min=1e-4, max=1-1e-4) + return y + + def loss(self, example, preds_dicts, **kwargs): + rets = [] + for task_id, preds_dict in enumerate(preds_dicts): + # heatmap focal loss + preds_dict['hm'] = self._sigmoid(preds_dict['hm']) + + hm_loss = self.crit(preds_dict['hm'], example['hm'][task_id], example['ind'][task_id], example['mask'][task_id], example['cat'][task_id]) + + target_box = example['anno_box'][task_id] + # reconstruct the anno_box from multiple reg heads + if self.dataset in ['waymo', 'nuscenes']: + if 'vel' in preds_dict: + preds_dict['anno_box'] = torch.cat((preds_dict['reg'], preds_dict['height'], preds_dict['dim'], + preds_dict['vel'], preds_dict['rot']), dim=1) + else: + preds_dict['anno_box'] = torch.cat((preds_dict['reg'], preds_dict['height'], preds_dict['dim'], + preds_dict['rot']), dim=1) + target_box = target_box[..., [0, 1, 2, 3, 4, 5, -2, -1]] # remove 
vel target
+            else:
+                raise NotImplementedError()
+
+            ret = {}
+
+            # Regression loss for dimension, offset, height, rotation
+            box_loss = self.crit_reg(preds_dict['anno_box'], example['mask'][task_id], example['ind'][task_id], target_box)
+
+            loc_loss = (box_loss*box_loss.new_tensor(self.code_weights)).sum()
+
+            loss = hm_loss + self.weight*loc_loss
+
+            ret.update({'loss': loss, 'hm_loss': hm_loss.detach().cpu(), 'loc_loss':loc_loss, 'loc_loss_elem': box_loss.detach().cpu(), 'num_positive': example['mask'][task_id].float().sum()})
+
+            rets.append(ret)
+
+        """convert batch-key to key-batch
+        """
+        rets_merged = defaultdict(list)
+        for ret in rets:
+            for k, v in ret.items():
+                rets_merged[k].append(v)
+
+        return rets_merged
+
+    @torch.no_grad()
+    def predict(self, example, preds_dicts, test_cfg, **kwargs):
+        """decode, nms, then return the detection result. Additionally supports double flip testing
+        """
+        # get loss info
+        rets = []
+        metas = []
+
+        double_flip = test_cfg.get('double_flip', False)
+
+        post_center_range = test_cfg.post_center_limit_range
+        if len(post_center_range) > 0:
+            post_center_range = torch.tensor(
+                post_center_range,
+                dtype=preds_dicts[0]['hm'].dtype,
+                device=preds_dicts[0]['hm'].device,
+            )
+
+        for task_id, preds_dict in enumerate(preds_dicts):
+            # convert N C H W to N H W C
+            for key, val in preds_dict.items():
+                preds_dict[key] = val.permute(0, 2, 3, 1).contiguous()
+
+            batch_size = preds_dict['hm'].shape[0]
+
+            if double_flip:
+                assert batch_size % 4 == 0, print(batch_size)
+                batch_size = int(batch_size / 4)
+                for k in preds_dict.keys():
+                    # transform the prediction maps back to their original coordinates before flipping.
+                    # the flipped predictions are ordered in a group of 4: the first one is the original point cloud,
+                    # the second one is the X-flip point cloud (y=-y), the third one is the Y-flip point cloud (x=-x), and the last one is
+                    # the X- and Y-flip point cloud (x=-x, y=-y).
+                    # Also note that pytorch's flip function is defined in higher-dimensional space, so dims=[2] means that
+                    # it is flipping along the axis with H length (which is normally the Y axis), however in our traditional world coordinates it is flipping along
+                    # the X axis. 
The below flip follows pytorch's definition yflip(y=-y) xflip(x=-x) + _, H, W, C = preds_dict[k].shape + preds_dict[k] = preds_dict[k].reshape(int(batch_size), 4, H, W, C) + preds_dict[k][:, 1] = torch.flip(preds_dict[k][:, 1], dims=[1]) + preds_dict[k][:, 2] = torch.flip(preds_dict[k][:, 2], dims=[2]) + preds_dict[k][:, 3] = torch.flip(preds_dict[k][:, 3], dims=[1, 2]) + + if "metadata" not in example or len(example["metadata"]) == 0: + meta_list = [None] * batch_size + else: + meta_list = example["metadata"] + if double_flip: + meta_list = meta_list[:4*int(batch_size):4] + + batch_hm = torch.sigmoid(preds_dict['hm']) + + batch_dim = torch.exp(preds_dict['dim']) + + batch_rots = preds_dict['rot'][..., 0:1] + batch_rotc = preds_dict['rot'][..., 1:2] + batch_reg = preds_dict['reg'] + batch_hei = preds_dict['height'] + + if double_flip: + batch_hm = batch_hm.mean(dim=1) + batch_hei = batch_hei.mean(dim=1) + batch_dim = batch_dim.mean(dim=1) + + # y = -y reg_y = 1-reg_y + batch_reg[:, 1, ..., 1] = 1 - batch_reg[:, 1, ..., 1] + batch_reg[:, 2, ..., 0] = 1 - batch_reg[:, 2, ..., 0] + + batch_reg[:, 3, ..., 0] = 1 - batch_reg[:, 3, ..., 0] + batch_reg[:, 3, ..., 1] = 1 - batch_reg[:, 3, ..., 1] + batch_reg = batch_reg.mean(dim=1) + + # first yflip + # y = -y theta = pi -theta + # sin(pi-theta) = sin(theta) cos(pi-theta) = -cos(theta) + # batch_rots[:, 1] the same + batch_rotc[:, 1] *= -1 + + # then xflip x = -x theta = 2pi - theta + # sin(2pi - theta) = -sin(theta) cos(2pi - theta) = cos(theta) + # batch_rots[:, 2] the same + batch_rots[:, 2] *= -1 + + # double flip + batch_rots[:, 3] *= -1 + batch_rotc[:, 3] *= -1 + + batch_rotc = batch_rotc.mean(dim=1) + batch_rots = batch_rots.mean(dim=1) + + batch_rot = torch.atan2(batch_rots, batch_rotc) + + batch, H, W, num_cls = batch_hm.size() + + batch_reg = batch_reg.reshape(batch, H*W, 2) + batch_hei = batch_hei.reshape(batch, H*W, 1) + + batch_rot = batch_rot.reshape(batch, H*W, 1) + batch_dim = batch_dim.reshape(batch, H*W, 3) + batch_hm = batch_hm.reshape(batch, H*W, num_cls) + + ys, xs = torch.meshgrid([torch.arange(0, H), torch.arange(0, W)]) + ys = ys.view(1, H, W).repeat(batch, 1, 1).to(batch_hm.device).float() + xs = xs.view(1, H, W).repeat(batch, 1, 1).to(batch_hm.device).float() + + xs = xs.view(batch, -1, 1) + batch_reg[:, :, 0:1] + ys = ys.view(batch, -1, 1) + batch_reg[:, :, 1:2] + + xs = xs * test_cfg.out_size_factor * test_cfg.voxel_size[0] + test_cfg.pc_range[0] + ys = ys * test_cfg.out_size_factor * test_cfg.voxel_size[1] + test_cfg.pc_range[1] + + if 'vel' in preds_dict: + batch_vel = preds_dict['vel'] + + if double_flip: + # flip vy + batch_vel[:, 1, ..., 1] *= -1 + # flip vx + batch_vel[:, 2, ..., 0] *= -1 + + batch_vel[:, 3] *= -1 + + batch_vel = batch_vel.mean(dim=1) + + batch_vel = batch_vel.reshape(batch, H*W, 2) + batch_box_preds = torch.cat([xs, ys, batch_hei, batch_dim, batch_vel, batch_rot], dim=2) + else: + batch_box_preds = torch.cat([xs, ys, batch_hei, batch_dim, batch_rot], dim=2) + + metas.append(meta_list) + + if test_cfg.get('per_class_nms', False): + pass + else: + rets.append(self.post_processing(batch_box_preds, batch_hm, test_cfg, post_center_range)) + + # Merge branches results + ret_list = [] + num_samples = len(rets[0]) + + ret_list = [] + for i in range(num_samples): + ret = {} + for k in rets[0][i].keys(): + if k in ["box3d_lidar", "scores"]: + ret[k] = torch.cat([ret[i][k] for ret in rets]) + elif k in ["label_preds"]: + flag = 0 + for j, num_class in enumerate(self.num_classes): + rets[j][i][k] += flag + 
flag += num_class + ret[k] = torch.cat([ret[i][k] for ret in rets]) + + ret['metadata'] = metas[0][i] + ret_list.append(ret) + + return ret_list + + @torch.no_grad() + def post_processing(self, batch_box_preds, batch_hm, test_cfg, post_center_range): + batch_size = len(batch_hm) + + prediction_dicts = [] + for i in range(batch_size): + box_preds = batch_box_preds[i] + hm_preds = batch_hm[i] + + scores, labels = torch.max(hm_preds, dim=-1) + + score_mask = scores > test_cfg.score_threshold + distance_mask = (box_preds[..., :3] >= post_center_range[:3]).all(1) \ + & (box_preds[..., :3] <= post_center_range[3:]).all(1) + + mask = distance_mask & score_mask + + box_preds = box_preds[mask] + scores = scores[mask] + labels = labels[mask] + + boxes_for_nms = box_preds[:, [0, 1, 2, 3, 4, 5, -1]] + + selected = box_torch_ops.rotate_nms_pcdet(boxes_for_nms, scores, + thresh=test_cfg.nms.nms_iou_threshold, + pre_maxsize=test_cfg.nms.nms_pre_max_size, + post_max_size=test_cfg.nms.nms_post_max_size) + + selected_boxes = box_preds[selected] + selected_scores = scores[selected] + selected_labels = labels[selected] + + prediction_dict = { + 'box3d_lidar': selected_boxes, + 'scores': selected_scores, + 'label_preds': selected_labels + } + + prediction_dicts.append(prediction_dict) + + return prediction_dicts diff --git a/det3d/models/bbox_heads/mg_head.py b/det3d/models/bbox_heads/mg_head.py deleted file mode 100644 index 2561c2c..0000000 --- a/det3d/models/bbox_heads/mg_head.py +++ /dev/null @@ -1,1800 +0,0 @@ -# ------------------------------------------------------------------------------ -# Portions of this code are from -# det3d (https://github.com/poodarchu/Det3D/tree/56402d4761a5b73acd23080f537599b0888cce07) -# Copyright (c) 2019 朱本金 -# Licensed under the MIT License -# ------------------------------------------------------------------------------ - -import logging -from collections import defaultdict -from enum import Enum - -import numpy as np -import torch -from det3d.core import box_torch_ops -from det3d.models.builder import build_loss -from det3d.models.losses import metrics -from det3d.torchie.cnn import constant_init, kaiming_init -from det3d.torchie.trainer import load_checkpoint -from torch import nn -from torch.nn import functional as F -from torch.nn.modules.batchnorm import _BatchNorm -from det3d.models.losses.centernet_loss import FocalLoss, SmoothRegLoss, RegLoss, RegClsLoss, FastFocalLoss -from det3d.core.utils.center_utils import ddd_decode -from det3d.models.utils import Sequential -from .. import builder -from ..losses import accuracy -from ..registry import HEADS -import copy -try: - from det3d.ops.dcn import DeformConv, ModulatedDeformConvPack -except: - print("Deformable Convolution not built!") -from det3d.core.utils.center_utils import _transpose_and_gather_feat - - -class FeatureAdaption(nn.Module): - """Feature Adaption Module. - - Feature Adaption Module is implemented based on DCN v1. - It uses anchor shape prediction rather than feature map to - predict offsets of deformable conv layer. - - Args: - in_channels (int): Number of channels in the input feature map. - out_channels (int): Number of channels in the output feature map. - kernel_size (int): Deformable conv kernel size. - deformable_groups (int): Deformable conv group size. 
- """ - - def __init__(self, - in_channels, - out_channels, - kernel_size=3, - deformable_groups=4): - super(FeatureAdaption, self).__init__() - offset_channels = kernel_size * kernel_size * 2 - self.conv_offset = nn.Conv2d( - in_channels, deformable_groups * offset_channels, 1, bias=True) - self.conv_adaption = DeformConv( - in_channels, - out_channels, - kernel_size=kernel_size, - padding=(kernel_size - 1) // 2, - deformable_groups=deformable_groups) - self.relu = nn.ReLU(inplace=True) - self.init_offset() - - def init_offset(self): - self.conv_offset.weight.data.zero_() - self.conv_offset.bias.data.zero_() - - def init_weights(self): - pass - """normal_init(self.conv_offset, std=0.1) - normal_init(self.conv_adaption, std=0.01) - """ - - def forward(self, x,): - offset = self.conv_offset(x) - x = self.relu(self.conv_adaption(x, offset)) - return x - -def one_hot_f(tensor, depth, dim=-1, on_value=1.0, dtype=torch.float32): - tensor_onehot = torch.zeros( - *list(tensor.shape), depth, dtype=dtype, device=tensor.device - ) - tensor_onehot.scatter_(dim, tensor.unsqueeze(dim).long(), on_value) - return tensor_onehot - - -def add_sin_difference(boxes1, boxes2): - rad_pred_encoding = torch.sin(boxes1[..., -1:]) * torch.cos(boxes2[..., -1:]) - rad_tg_encoding = torch.cos(boxes1[..., -1:]) * torch.sin(boxes2[..., -1:]) - boxes1 = torch.cat([boxes1[..., :-1], rad_pred_encoding], dim=-1) - boxes2 = torch.cat([boxes2[..., :-1], rad_tg_encoding], dim=-1) - return boxes1, boxes2 - - -def _get_pos_neg_loss(cls_loss, labels): - # cls_loss: [N, num_anchors, num_class] - # labels: [N, num_anchors] - batch_size = cls_loss.shape[0] - if cls_loss.shape[-1] == 1 or len(cls_loss.shape) == 2: - cls_pos_loss = (labels > 0).type_as(cls_loss) * cls_loss.view(batch_size, -1) - cls_neg_loss = (labels == 0).type_as(cls_loss) * cls_loss.view(batch_size, -1) - cls_pos_loss = cls_pos_loss.sum() / batch_size - cls_neg_loss = cls_neg_loss.sum() / batch_size - else: - cls_pos_loss = cls_loss[..., 1:].sum() / batch_size - cls_neg_loss = cls_loss[..., 0].sum() / batch_size - return cls_pos_loss, cls_neg_loss - -def limit_period(val, offset=0.5, period=np.pi): - return val - torch.floor(val / period + offset) * period - -def get_direction_target(anchors, reg_targets, one_hot=True, dir_offset=0.0): - batch_size = reg_targets.shape[0] - anchors = anchors.view(batch_size, -1, anchors.shape[-1]) - rot_gt = reg_targets[..., -1] + anchors[..., -1] - #dir_cls_targets = ((rot_gt - dir_offset) > 0).long() - dir_cls_targets = (limit_period(rot_gt - dir_offset, 0.5, np.pi*2) > - 0).long() - if one_hot: - dir_cls_targets = one_hot_f(dir_cls_targets, 2, dtype=anchors.dtype) - return dir_cls_targets - -def get_direction_target_center(reg_targets, one_hot=False, dir_offset=0.0): - rot_gt = reg_targets[..., -1] - dir_cls_targets = ((rot_gt - dir_offset) > 0).long() - if one_hot: - dir_cls_targets = one_hot_f(dir_cls_targets, 2, dtype=reg_targets.dtype) - return dir_cls_targets - - -def smooth_l1_loss(pred, gt, sigma): - def _smooth_l1_loss(pred, gt, sigma): - sigma2 = sigma ** 2 - cond_point = 1 / sigma2 - x = pred - gt - abs_x = torch.abs(x) - - in_mask = abs_x < cond_point - out_mask = 1 - in_mask - - in_value = 0.5 * (sigma * x) ** 2 - out_value = abs_x - 0.5 / sigma2 - - value = in_value * in_mask.type_as(in_value) + out_value * out_mask.type_as( - out_value - ) - return value - - value = _smooth_l1_loss(pred, gt, sigma) - loss = value.mean(dim=1).sum() - return loss - - -def smooth_l1_loss_detectron2(input, target, beta: float, 
reduction: str = "none"): - """ - Smooth L1 loss defined in the Fast R-CNN paper as: - | 0.5 * x ** 2 / beta if abs(x) < beta - smoothl1(x) = | - | abs(x) - 0.5 * beta otherwise, - where x = input - target. - Smooth L1 loss is related to Huber loss, which is defined as: - | 0.5 * x ** 2 if abs(x) < beta - huber(x) = | - | beta * (abs(x) - 0.5 * beta) otherwise - Smooth L1 loss is equal to huber(x) / beta. This leads to the following - differences: - - As beta -> 0, Smooth L1 loss converges to L1 loss, while Huber loss - converges to a constant 0 loss. - - As beta -> +inf, Smooth L1 converges to a constant 0 loss, while Huber loss - converges to L2 loss. - - For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant - slope of 1. For Huber loss, the slope of the L1 segment is beta. - Smooth L1 loss can be seen as exactly L1 loss, but with the abs(x) < beta - portion replaced with a quadratic function such that at abs(x) = beta, its - slope is 1. The quadratic segment smooths the L1 loss near x = 0. - Args: - input (Tensor): input tensor of any shape - target (Tensor): target value tensor with the same shape as input - beta (float): L1 to L2 change point. - For beta values < 1e-5, L1 loss is computed. - reduction: 'none' | 'mean' | 'sum' - 'none': No reduction will be applied to the output. - 'mean': The output will be averaged. - 'sum': The output will be summed. - Returns: - The loss with the reduction option applied. - Note: - PyTorch's builtin "Smooth L1 loss" implementation does not actually - implement Smooth L1 loss, nor does it implement Huber loss. It implements - the special case of both in which they are equal (beta=1). - See: https://pytorch.org/docs/stable/nn.html#torch.nn.SmoothL1Loss. - """ - if beta < 1e-5: - # if beta == 0, then torch.where will result in nan gradients when - # the chain rule is applied due to pytorch implementation details - # (the False branch "0.5 * n ** 2 / 0" has an incoming gradient of - # zeros, rather than "no gradient"). To avoid this issue, we define - # small values of beta to be exactly l1 loss. 
- loss = torch.abs(input - target) - else: - n = torch.abs(input - target) - cond = n < beta - loss = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta) - - if reduction == "mean": - loss = loss.mean() - elif reduction == "sum": - loss = loss.sum() - return loss - - - -def create_loss( - loc_loss_ftor, - cls_loss_ftor, - box_preds, - cls_preds, - cls_targets, - cls_weights, - reg_targets, - reg_weights, - num_class, - encode_background_as_zeros=True, - encode_rad_error_by_sin=True, - bev_only=False, - box_code_size=9, -): - batch_size = int(box_preds.shape[0]) - - if bev_only: - box_preds = box_preds.view(batch_size, -1, box_code_size - 2) - else: - box_preds = box_preds.view(batch_size, -1, box_code_size) - - if encode_background_as_zeros: - cls_preds = cls_preds.view(batch_size, -1, num_class) - else: - cls_preds = cls_preds.view(batch_size, -1, num_class + 1) - - cls_targets = cls_targets.squeeze(-1) - one_hot_targets = one_hot_f(cls_targets, depth=num_class + 1, dtype=box_preds.dtype) - if encode_background_as_zeros: - one_hot_targets = one_hot_targets[..., 1:] - - if encode_rad_error_by_sin: - # sin(a - b) = sinacosb-cosasinb - box_preds, reg_targets = add_sin_difference(box_preds, reg_targets) - - loc_losses = loc_loss_ftor(box_preds, reg_targets, weights=reg_weights) # [N, M] - cls_losses = cls_loss_ftor( - cls_preds, one_hot_targets, weights=cls_weights - ) # [N, M] - - return loc_losses, cls_losses - - -class LossNormType(Enum): - NormByNumPositives = "norm_by_num_positives" - NormByNumExamples = "norm_by_num_examples" - NormByNumPosNeg = "norm_by_num_pos_neg" - DontNorm = "dont_norm" - - -@HEADS.register_module -class Head(nn.Module): - def __init__( - self, - num_input, - num_pred, - num_cls, - use_dir=False, - num_dir=0, - header=True, - name="", - focal_loss_init=False, - init_bias=-2.19, - head_conv=64, - num_classes=None, - **kwargs, - ): - """ - A heavier head that contains two convolution for each branch. 
This head design matches the CenterHead below - """ - super(Head, self).__init__(**kwargs) - self.use_dir = use_dir - - self.pred_heads = num_pred - - for head in self.pred_heads: - classes, num_conv = self.pred_heads[head] - - fc = Sequential() - for i in range(num_conv-1): - fc.add(nn.Conv2d(num_input, head_conv, - kernel_size=3, stride=1, - padding=3 // 2, bias=True)) - fc.add(nn.BatchNorm2d(head_conv)) - fc.add(nn.ReLU()) - - fc.add(nn.Conv2d(64, num_classes * 2 *classes, - kernel_size=3, stride=1, - padding=3 // 2, bias=True)) - - for m in fc.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - - self.__setattr__(head, fc) - - self.conv_cls = Sequential( - nn.Conv2d(num_input, head_conv, - kernel_size=3, padding=1, bias=True), - nn.BatchNorm2d(64), - nn.ReLU(inplace=True), - nn.Conv2d(num_input, num_cls, - kernel_size=3, stride=1, - padding=1, bias=True) - ) - - # Focal loss paper points out that it is important to initialize the bias - self.conv_cls[-1].bias.data.fill_(init_bias) - - if self.use_dir: - self.conv_dir = nn.Conv2d(num_input, num_dir, 1) - - def forward(self, x): - ret_list = [] - - cls_preds = self.conv_cls(x).permute(0, 2, 3, 1).contiguous() - - ret_dict = dict() - for head in self.pred_heads: - ret_dict[head] = self.__getattr__(head)(x) - - if 'vel' in ret_dict: - box_preds = torch.cat((ret_dict['reg'], ret_dict['height'], ret_dict['dim'], - ret_dict['vel'], ret_dict['rot']), dim=1).permute(0, 2, 3, 1).contiguous() - else: - box_preds = torch.cat((ret_dict['reg'], ret_dict['height'], ret_dict['dim'], - ret_dict['rot']), dim=1).permute(0, 2, 3, 1).contiguous() - - - ret_dict = {"box_preds": box_preds, "cls_preds": cls_preds} - if self.use_dir: - dir_preds = self.conv_dir(x).permute(0, 2, 3, 1).contiguous() - ret_dict["dir_cls_preds"] = dir_preds - - return ret_dict - - - -@HEADS.register_module -class MultiGroupHead(nn.Module): - def __init__( - self, - mode="3d", - in_channels=[128,], - norm_cfg=None, - tasks=[], - weights=[], - num_classes=[1,], - box_coder=None, - with_cls=True, - with_reg=True, - reg_class_agnostic=False, - encode_background_as_zeros=True, - loss_norm=dict( - type="NormByNumPositives", pos_class_weight=1.0, neg_class_weight=1.0, - ), - loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0,), - use_sigmoid_score=True, - loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0,), - encode_rad_error_by_sin=True, - loss_aux=None, - direction_offset=0.0, - name="rpn", - common_heads=None, - init_bias=-2.19, - share_conv_channel=64, - logger=None, - ): - super(MultiGroupHead, self).__init__() - - assert with_cls or with_reg - - num_classes = [len(t["class_names"]) for t in tasks] - self.class_names = [t["class_names"] for t in tasks] - self.num_anchor_per_locs = [2 * n for n in num_classes] - - self.box_coder = box_coder - box_code_sizes = [box_coder.code_size] * len(num_classes) - - self.with_cls = with_cls - self.with_reg = with_reg - self.in_channels = in_channels - self.num_classes = num_classes - self.reg_class_agnostic = reg_class_agnostic - self.encode_rad_error_by_sin = encode_rad_error_by_sin - self.encode_background_as_zeros = encode_background_as_zeros - self.use_sigmoid_score = use_sigmoid_score - self.box_n_dim = self.box_coder.code_size - self.anchor_dim = self.box_coder.n_dim - - self.loss_cls = build_loss(loss_cls) - self.loss_reg = build_loss(loss_bbox) - if loss_aux is not None: - self.loss_aux = build_loss(loss_aux) - - self.loss_norm = loss_norm - - if not logger: - logger = 
logging.getLogger("MultiGroupHead") - self.logger = logger - - self.dcn = None - self.zero_init_residual = False - - self.use_direction_classifier = loss_aux is not None - if loss_aux: - self.direction_offset = direction_offset - - self.bev_only = True if mode == "bev" else False - - num_clss = [] - num_preds = [] - num_dirs = [] - - for num_c, num_a, box_cs in zip( - num_classes, self.num_anchor_per_locs, box_code_sizes - ): - if self.encode_background_as_zeros: - num_cls = num_a * num_c - else: - num_cls = num_a * (num_c + 1) - num_clss.append(num_cls) - - if self.use_direction_classifier: - num_dir = num_a * 2 - num_dirs.append(num_dir) - else: - num_dir = None - - # here like CenterHead, we regress to diffrent targets in separate heads - num_pred = copy.deepcopy(common_heads) - - num_preds.append(num_pred) - - logger.info( - f"num_classes: {num_classes}, num_dirs: {num_dirs}" - ) - - - self.shared_conv = nn.Sequential( - nn.Conv2d(in_channels, share_conv_channel, - kernel_size=3, padding=1, bias=True), - nn.BatchNorm2d(share_conv_channel), - nn.ReLU(inplace=True) - ) - - self.tasks = nn.ModuleList() - for task_id, (num_pred, num_cls) in enumerate(zip(num_preds, num_clss)): - self.tasks.append( - Head( - share_conv_channel, - num_pred, - num_cls, - use_dir=self.use_direction_classifier, - num_dir=num_dirs[task_id] - if self.use_direction_classifier - else None, - header=False, - init_bias=init_bias, - num_classes=num_classes[task_id], - ) - ) - - logger.info("Finish MultiGroupHead Initialization") - - def init_weights(self, pretrained=None): - if isinstance(pretrained, str): - logger = logging.getLogger() - load_checkpoint(self, pretrained, strict=False, logger=logger) - elif pretrained is None: - for m in self.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - elif isinstance(m, (_BatchNorm, nn.GroupNorm)): - constant_init(m, 1) - - if self.dcn is not None: - for m in self.modules(): - if isinstance(m, Bottleneck) and hasattr(m, "conv2_offset"): - constant_init(m.conv2_offset, 0) - - if self.zero_init_residual: - for m in self.modules(): - if isinstance(m, Bottleneck): - constant_init(m.norm3, 0) - elif isinstance(m, BasicBlock): - constant_init(m.norm2, 0) - else: - raise TypeError("pretrained must be a str or None") - - def forward(self, x): - x = self.shared_conv(x) - ret_dicts = [] - for task in self.tasks: - ret_dicts.append(task(x)) - - return ret_dicts - - def prepare_loss_weights( - self, - labels, - loss_norm=dict( - type="NormByNumPositives", pos_cls_weight=1.0, neg_cls_weight=1.0, - ), - dtype=torch.float32, - ): - loss_norm_type = getattr(LossNormType, loss_norm["type"]) - pos_cls_weight = loss_norm["pos_cls_weight"] - neg_cls_weight = loss_norm["neg_cls_weight"] - - cared = labels >= 0 - # cared: [N, num_anchors] - positives = labels > 0 - negatives = labels == 0 - negative_cls_weights = negatives.type(dtype) * neg_cls_weight - cls_weights = negative_cls_weights + pos_cls_weight * positives.type(dtype) - reg_weights = positives.type(dtype) - if loss_norm_type == LossNormType.NormByNumExamples: - num_examples = cared.type(dtype).sum(1, keepdim=True) - num_examples = torch.clamp(num_examples, min=1.0) - cls_weights /= num_examples - bbox_normalizer = positives.sum(1, keepdim=True).type(dtype) - reg_weights /= torch.clamp(bbox_normalizer, min=1.0) - elif loss_norm_type == LossNormType.NormByNumPositives: - pos_normalizer = positives.sum(1, keepdim=True).type(dtype) - reg_weights /= torch.clamp(pos_normalizer, min=1.0) - cls_weights /= torch.clamp(pos_normalizer, 
min=1.0) - elif loss_norm_type == LossNormType.NormByNumPosNeg: - pos_neg = torch.stack([positives, negatives], dim=-1).type(dtype) - normalizer = pos_neg.sum(1, keepdim=True) # [N, 1, 2] - cls_normalizer = (pos_neg * normalizer).sum(-1) # [N, M] - cls_normalizer = torch.clamp(cls_normalizer, min=1.0) - # cls_normalizer will be pos_or_neg_weight/num_pos_or_neg - normalizer = torch.clamp(normalizer, min=1.0) - reg_weights /= normalizer[:, 0:1, 0] - cls_weights /= cls_normalizer - elif loss_norm_type == LossNormType.DontNorm: # support ghm loss - pos_normalizer = positives.sum(1, keepdim=True).type(dtype) - reg_weights /= torch.clamp(pos_normalizer, min=1.0) - else: - raise ValueError(f"unknown loss norm type. available: {list(LossNormType)}") - return cls_weights, reg_weights, cared - - def loss(self, example, preds_dicts, **kwargs): - - voxels = example["voxels"] - num_points = example["num_points"] - coors = example["coordinates"] - batch_anchors = example["anchors"] - batch_size_device = batch_anchors[0].shape[0] - - rets = [] - for task_id, preds_dict in enumerate(preds_dicts): - losses = dict() - - num_class = self.num_classes[task_id] - - box_preds = preds_dict["box_preds"] - cls_preds = preds_dict["cls_preds"] - - labels = example["labels"][task_id] - if kwargs.get("mode", False): - reg_targets = example["reg_targets"][task_id][:, :, [0, 1, 3, 4, 6]] - reg_targets_left = example["reg_targets"][task_id][:, :, [2, 5]] - else: - reg_targets = example["reg_targets"][task_id] - - cls_weights, reg_weights, cared = self.prepare_loss_weights( - labels, loss_norm=self.loss_norm, dtype=torch.float32, - ) - cls_targets = labels * cared.type_as(labels) - cls_targets = cls_targets.unsqueeze(-1) - - loc_loss, cls_loss = create_loss( - self.loss_reg, - self.loss_cls, - box_preds, - cls_preds, - cls_targets, - cls_weights, - reg_targets, - reg_weights, - num_class, - self.encode_background_as_zeros, - self.encode_rad_error_by_sin, - bev_only=self.bev_only, - box_code_size=self.box_n_dim, - ) - - loc_loss_reduced = loc_loss.sum() / batch_size_device - loc_loss_reduced *= self.loss_reg._loss_weight - cls_pos_loss, cls_neg_loss = _get_pos_neg_loss(cls_loss, labels) - cls_pos_loss /= self.loss_norm["pos_cls_weight"] - cls_neg_loss /= self.loss_norm["neg_cls_weight"] - cls_loss_reduced = cls_loss.sum() / batch_size_device - cls_loss_reduced *= self.loss_cls._loss_weight - - loss = loc_loss_reduced + cls_loss_reduced - - if self.use_direction_classifier: - dir_targets = get_direction_target( - example["anchors"][task_id], - reg_targets, - dir_offset=self.direction_offset, - ) - dir_logits = preds_dict["dir_cls_preds"].view(batch_size_device, -1, 2) - weights = (labels > 0).type_as(dir_logits) - weights /= torch.clamp(weights.sum(-1, keepdim=True), min=1.0) - dir_loss = self.loss_aux(dir_logits, dir_targets, weights=weights) - dir_loss = dir_loss.sum() / batch_size_device - loss += dir_loss * self.loss_aux._loss_weight - - # losses['loss_aux'] = dir_loss - - loc_loss_elem = [ - loc_loss[:, :, i].sum() / batch_size_device - for i in range(loc_loss.shape[-1]) - ] - ret = { - "loss": loss, - "cls_pos_loss": cls_pos_loss.detach().cpu(), - "cls_neg_loss": cls_neg_loss.detach().cpu(), - "dir_loss_reduced": dir_loss.detach().cpu() - if self.use_direction_classifier - else torch.tensor(0), - "cls_loss_reduced": cls_loss_reduced.detach().cpu().mean(), - "loc_loss_reduced": loc_loss_reduced.detach().cpu().mean(), - "loc_loss_elem": [elem.detach().cpu() for elem in loc_loss_elem], - "num_pos": (labels > 0)[0].sum(), 
- "num_neg": (labels == 0)[0].sum(), - } - - # self.rpn_acc.clear() - # losses['acc'] = self.rpn_acc( - # example['labels'][task_id], - # cls_preds, - # cared, - # ) - - # losses['pr'] = {} - # self.rpn_pr.clear() - # prec, rec = self.rpn_pr( - # example['labels'][task_id], - # cls_preds, - # cared, - # ) - # for i, thresh in enumerate(self.rpn_pr.thresholds): - # losses["pr"][f"prec@{int(thresh*100)}"] = float(prec[i]) - # losses["pr"][f"rec@{int(thresh*100)}"] = float(rec[i]) - - rets.append(ret) - """convert batch-key to key-batch - """ - rets_merged = defaultdict(list) - for ret in rets: - for k, v in ret.items(): - rets_merged[k].append(v) - - return rets_merged - - def predict(self, example, preds_dicts, test_cfg, **kwargs): - """start with v1.6.0, this function don't contain any kitti-specific code. - Returns: - predict: list of pred_dict. - pred_dict: { - box3d_lidar: [N, 7] 3d box. - scores: [N] - label_preds: [N] - metadata: meta-data which contains dataset-specific information. - for kitti, it contains image idx (label idx), - for nuscenes, sample_token is saved in it. - } - """ - voxels = example["voxels"] - num_points = example["num_points"] - coors = example["coordinates"] - batch_anchors = example["anchors"] - batch_size_device = batch_anchors[0].shape[0] - rets = [] - for task_id, preds_dict in enumerate(preds_dicts): - batch_size = batch_anchors[task_id].shape[0] - - if "metadata" not in example or len(example["metadata"]) == 0: - meta_list = [None] * batch_size - else: - meta_list = example["metadata"] - - batch_task_anchors = example["anchors"][task_id].view( - batch_size, -1, self.anchor_dim - ) - - if "anchors_mask" not in example: - batch_anchors_mask = [None] * batch_size - else: - batch_anchors_mask = example["anchors_mask"][task_id].view( - batch_size, -1 - ) - - batch_box_preds = preds_dict["box_preds"] - batch_cls_preds = preds_dict["cls_preds"] - - if self.bev_only: - box_ndim = self.box_n_dim - 2 - else: - box_ndim = self.box_n_dim - - if kwargs.get("mode", False): - batch_box_preds_base = batch_box_preds.view(batch_size, -1, box_ndim) - batch_box_preds = batch_task_anchors.clone() - batch_box_preds[:, :, [0, 1, 3, 4, 6]] = batch_box_preds_base - else: - batch_box_preds = batch_box_preds.view(batch_size, -1, box_ndim) - - num_class_with_bg = self.num_classes[task_id] - - if not self.encode_background_as_zeros: - num_class_with_bg = self.num_classes[task_id] + 1 - - batch_cls_preds = batch_cls_preds.view(batch_size, -1, num_class_with_bg) - - batch_reg_preds = self.box_coder.decode_torch( - batch_box_preds[:, :, : self.box_coder.code_size], batch_task_anchors - ) - - if self.use_direction_classifier: - batch_dir_preds = preds_dict["dir_cls_preds"] - batch_dir_preds = batch_dir_preds.view(batch_size, -1, 2) - else: - batch_dir_preds = [None] * batch_size - rets.append( - self.get_task_detections( - task_id, - num_class_with_bg, - test_cfg, - batch_cls_preds, - batch_reg_preds, - batch_dir_preds, - batch_anchors_mask, - meta_list, - ) - ) - # Merge branches results - num_tasks = len(rets) - ret_list = [] - # len(rets) == task num - # len(rets[0]) == batch_size - num_preds = len(rets) - num_samples = len(rets[0]) - - ret_list = [] - for i in range(num_samples): - ret = {} - for k in rets[0][i].keys(): - if k in ["box3d_lidar", "scores"]: - ret[k] = torch.cat([ret[i][k] for ret in rets]) - elif k in ["label_preds"]: - flag = 0 - for j, num_class in enumerate(self.num_classes): - rets[j][i][k] += flag - flag += num_class - ret[k] = torch.cat([ret[i][k] for ret in 
rets]) - elif k == "metadata": - # metadata - ret[k] = rets[0][i][k] - ret_list.append(ret) - - return ret_list - - def get_task_detections( - self, - task_id, - num_class_with_bg, - test_cfg, - batch_cls_preds, - batch_reg_preds, - batch_dir_preds=None, - batch_anchors_mask=None, - meta_list=None, - ): - predictions_dicts = [] - post_center_range = test_cfg.post_center_limit_range - if len(post_center_range) > 0: - post_center_range = torch.tensor( - post_center_range, - dtype=batch_reg_preds.dtype, - device=batch_reg_preds.device, - ) - - for box_preds, cls_preds, dir_preds, a_mask, meta in zip( - batch_reg_preds, - batch_cls_preds, - batch_dir_preds, - batch_anchors_mask, - meta_list, - ): - if a_mask is not None: - box_preds = box_preds[a_mask] - cls_preds = cls_preds[a_mask] - - box_preds = box_preds.float() - cls_preds = cls_preds.float() - - if self.use_direction_classifier: - if a_mask is not None: - dir_preds = dir_preds[a_mask] - dir_labels = torch.max(dir_preds, dim=-1)[1] - - if self.encode_background_as_zeros: - # this don't support softmax - assert self.use_sigmoid_score is True - total_scores = torch.sigmoid(cls_preds) - else: - # encode background as first element in one-hot vector - if self.use_sigmoid_score: - total_scores = torch.sigmoid(cls_preds)[..., 1:] - else: - total_scores = F.softmax(cls_preds, dim=-1)[..., 1:] - - # Apply NMS in birdeye view - if test_cfg.nms.use_rotate_nms: - nms_func = box_torch_ops.rotate_nms - else: - nms_func = box_torch_ops.nms - - feature_map_size_prod = ( - batch_reg_preds.shape[1] // self.num_anchor_per_locs[task_id] - ) - - if test_cfg.nms.use_multi_class_nms: - assert self.encode_background_as_zeros is True - boxes_for_nms = box_preds[:, [0, 1, 3, 4, -1]] - if not test_cfg.nms.use_rotate_nms: - box_preds_corners = box_torch_ops.center_to_corner_box2d( - boxes_for_nms[:, :2], boxes_for_nms[:, 2:4], boxes_for_nms[:, 4] - ) - boxes_for_nms = box_torch_ops.corner_to_standup_nd( - box_preds_corners - ) - - selected_boxes, selected_labels, selected_scores = [], [], [] - selected_dir_labels = [] - - scores = total_scores - boxes = boxes_for_nms - selected_per_class = [] - score_threshs = [test_cfg.score_threshold] * self.num_classes[task_id] - pre_max_sizes = [test_cfg.nms.nms_pre_max_size] * self.num_classes[ - task_id - ] - post_max_sizes = [test_cfg.nms.nms_post_max_size] * self.num_classes[ - task_id - ] - iou_thresholds = [test_cfg.nms.nms_iou_threshold] * self.num_classes[ - task_id - ] - - for class_idx, score_thresh, pre_ms, post_ms, iou_th in zip( - range(self.num_classes[task_id]), - score_threshs, - pre_max_sizes, - post_max_sizes, - iou_thresholds, - ): - self._nms_class_agnostic = False - if self._nms_class_agnostic: - class_scores = total_scores.view( - feature_map_size_prod, -1, self.num_classes[task_id] - )[..., class_idx] - class_scores = class_scores.contiguous().view(-1) - class_boxes_nms = boxes.view(-1, boxes_for_nms.shape[-1]) - class_boxes = box_preds - class_dir_labels = dir_labels - else: - # anchors_range = self.target_assigner.anchors_range(class_idx) - anchors_range = self.target_assigners[task_id].anchors_range - class_scores = total_scores.view( - -1, self._num_classes[task_id] - )[anchors_range[0] : anchors_range[1], class_idx] - class_boxes_nms = boxes.view(-1, boxes_for_nms.shape[-1])[ - anchors_range[0] : anchors_range[1], : - ] - class_scores = class_scores.contiguous().view(-1) - class_boxes_nms = class_boxes_nms.contiguous().view( - -1, boxes_for_nms.shape[-1] - ) - class_boxes = box_preds.view(-1, 
box_preds.shape[-1])[ - anchors_range[0] : anchors_range[1], : - ] - class_boxes = class_boxes.contiguous().view( - -1, box_preds.shape[-1] - ) - if self.use_direction_classifier: - class_dir_labels = dir_labels.view(-1)[ - anchors_range[0] : anchors_range[1] - ] - class_dir_labels = class_dir_labels.contiguous().view(-1) - if score_thresh > 0.0: - class_scores_keep = class_scores >= score_thresh - if class_scores_keep.shape[0] == 0: - selected_per_class.append(None) - continue - class_scores = class_scores[class_scores_keep] - if class_scores.shape[0] != 0: - if score_thresh > 0.0: - class_boxes_nms = class_boxes_nms[class_scores_keep] - class_boxes = class_boxes[class_scores_keep] - class_dir_labels = class_dir_labels[class_scores_keep] - keep = nms_func( - class_boxes_nms, class_scores, pre_ms, post_ms, iou_th - ) - if keep.shape[0] != 0: - selected_per_class.append(keep) - else: - selected_per_class.append(None) - else: - selected_per_class.append(None) - selected = selected_per_class[-1] - - if selected is not None: - selected_boxes.append(class_boxes[selected]) - selected_labels.append( - torch.full( - [class_boxes[selected].shape[0]], - class_idx, - dtype=torch.int64, - device=box_preds.device, - ) - ) - if self.use_direction_classifier: - selected_dir_labels.append(class_dir_labels[selected]) - selected_scores.append(class_scores[selected]) - # else: - # selected_boxes.append(torch.Tensor([], device=class_boxes.device)) - # selected_labels.append(torch.Tensor([], device=box_preds.device)) - # selected_scores.append(torch.Tensor([], device=class_scores.device)) - # if self.use_direction_classifier: - # selected_dir_labels.append(torch.Tensor([], device=class_dir_labels.device)) - - selected_boxes = torch.cat(selected_boxes, dim=0) - selected_labels = torch.cat(selected_labels, dim=0) - selected_scores = torch.cat(selected_scores, dim=0) - if self.use_direction_classifier: - selected_dir_labels = torch.cat(selected_dir_labels, dim=0) - - else: - # get highest score per prediction, than apply nms - # to remove overlapped box. - if num_class_with_bg == 1: - top_scores = total_scores.squeeze(-1) - top_labels = torch.zeros( - total_scores.shape[0], - device=total_scores.device, - dtype=torch.long, - ) - - else: - top_scores, top_labels = torch.max(total_scores, dim=-1) - - if test_cfg.score_threshold > 0.0: - thresh = torch.tensor( - [test_cfg.score_threshold], device=total_scores.device - ).type_as(total_scores) - top_scores_keep = top_scores >= thresh - top_scores = top_scores.masked_select(top_scores_keep) - - if top_scores.shape[0] != 0: - if test_cfg.score_threshold > 0.0: - box_preds = box_preds[top_scores_keep] - if self.use_direction_classifier: - dir_labels = dir_labels[top_scores_keep] - top_labels = top_labels[top_scores_keep] - - """We change Det3D's cpu nms to pcdet's gpu nms which gives a big speed up""" - # # GPU NMS from PCDet(https://github.com/sshaoshuai/PCDet) - boxes_for_nms = box_torch_ops.boxes3d_to_bevboxes_lidar_torch(box_preds) - if not test_cfg.nms.use_rotate_nms: - box_preds_corners = box_torch_ops.center_to_corner_box2d( - boxes_for_nms[:, :2], - boxes_for_nms[:, 2:4], - boxes_for_nms[:, 4], - ) - boxes_for_nms = box_torch_ops.corner_to_standup_nd( - box_preds_corners - ) - # the nms in 3d detection just remove overlap boxes. 
- selected = box_torch_ops.rotate_nms_pcdet(boxes_for_nms, top_scores, - thresh=test_cfg.nms.nms_iou_threshold, - pre_maxsize=test_cfg.nms.nms_pre_max_size, - post_max_size=test_cfg.nms.nms_post_max_size) - - else: - selected = [] - # if selected is not None: - selected_boxes = box_preds[selected] - if self.use_direction_classifier: - selected_dir_labels = dir_labels[selected] - selected_labels = top_labels[selected] - selected_scores = top_scores[selected] - - # finally generate predictions. - # self.logger.info(f"selected boxes: {selected_boxes.shape}") - if selected_boxes.shape[0] != 0: - # self.logger.info(f"result not none~ Selected boxes: {selected_boxes.shape}") - box_preds = selected_boxes - scores = selected_scores - label_preds = selected_labels - if self.use_direction_classifier: - dir_labels = selected_dir_labels - opp_labels = ( - (box_preds[..., -1] - self.direction_offset) > 0 - ) ^ dir_labels.byte() - box_preds[..., -1] += torch.where( - opp_labels, - torch.tensor(np.pi).type_as(box_preds), - torch.tensor(0.0).type_as(box_preds), - ) - final_box_preds = box_preds - final_scores = scores - final_labels = label_preds - if post_center_range is not None: - mask = (final_box_preds[:, :3] >= post_center_range[:3]).all(1) - mask &= (final_box_preds[:, :3] <= post_center_range[3:]).all(1) - predictions_dict = { - "box3d_lidar": final_box_preds[mask], - "scores": final_scores[mask], - "label_preds": label_preds[mask], - "metadata": meta, - } - else: - predictions_dict = { - "box3d_lidar": final_box_preds, - "scores": final_scores, - "label_preds": label_preds, - "metadata": meta, - } - else: - dtype = batch_reg_preds.dtype - device = batch_reg_preds.device - predictions_dict = { - "box3d_lidar": torch.zeros([0, self.anchor_dim], dtype=dtype, device=device), - "scores": torch.zeros([0], dtype=dtype, device=device), - "label_preds": torch.zeros( - [0], dtype=top_labels.dtype, device=device - ), - "metadata": meta, - } - - predictions_dicts.append(predictions_dict) - - return predictions_dicts - - -class SepHead(nn.Module): - def __init__( - self, - in_channels, - heads, - head_conv=64, - name="", - final_kernel=1, - bn=False, - init_bias=-2.19, - directional_classifier=False, - **kwargs, - ): - super(SepHead, self).__init__(**kwargs) - - self.heads = heads - for head in self.heads: - classes, num_conv = self.heads[head] - - fc = Sequential() - for i in range(num_conv-1): - fc.add(nn.Conv2d(in_channels, head_conv, - kernel_size=final_kernel, stride=1, - padding=final_kernel // 2, bias=True)) - if bn: - fc.add(nn.BatchNorm2d(head_conv)) - fc.add(nn.ReLU()) - - fc.add(nn.Conv2d(head_conv, classes, - kernel_size=final_kernel, stride=1, - padding=final_kernel // 2, bias=True)) - - if 'hm' in head: - fc[-1].bias.data.fill_(init_bias) - else: - for m in fc.modules(): - if isinstance(m, nn.Conv2d): - kaiming_init(m) - - self.__setattr__(head, fc) - - assert directional_classifier is False, "Doesn't work well with nuScenes in my experiments, please open a pull request if you are able to get it work. We really appreciate contribution for this." 
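[Note, not part of this patch] For context, a minimal sketch of how the separate-head design above is typically driven: each entry in the `heads` dict maps a regression target to `(output_channels, num_conv)`, and CenterHead appends a per-task heatmap entry before building one SepHead per task. The `common_heads` values below are assumptions mirroring a typical nuScenes-style config, not taken from this diff.

import torch
# Illustrative only: assumed channel / conv counts for each branch.
common_heads = dict(reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2))
heads = dict(common_heads)
heads.update(dict(hm=(1, 2)))        # this task detects a single class; num_hm_conv=2
head = SepHead(64, heads, bn=True, init_bias=-2.19, final_kernel=3)
out = head(torch.zeros(2, 64, 128, 128))
# out is a dict of dense maps: out['hm'], out['reg'], out['height'], out['dim'], out['rot'], out['vel']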
- - - def forward(self, x): - ret_dict = dict() - for head in self.heads: - ret_dict[head] = self.__getattr__(head)(x) - - return ret_dict - -class DCNSepHead(nn.Module): - def __init__( - self, - in_channels, - num_cls, - heads, - head_conv=64, - name="", - final_kernel=1, - bn=False, - init_bias=-2.19, - directional_classifier=False, - **kwargs, - ): - super(DCNSepHead, self).__init__(**kwargs) - - # feature adaptation with dcn - # use separate features for classification / regression - self.feature_adapt_cls = FeatureAdaption( - in_channels, - in_channels, - kernel_size=3, - deformable_groups=4) - - self.feature_adapt_reg = FeatureAdaption( - in_channels, - in_channels, - kernel_size=3, - deformable_groups=4) - - - # heatmap prediction head - self.cls_head = Sequential( - nn.Conv2d(in_channels, head_conv, - kernel_size=3, padding=1, bias=True), - nn.BatchNorm2d(64), - nn.ReLU(inplace=True), - nn.Conv2d(head_conv, num_cls, - kernel_size=3, stride=1, - padding=1, bias=True) - ) - self.cls_head[-1].bias.data.fill_(init_bias) - - # other regression target - self.task_head = SepHead(in_channels, heads, head_conv=head_conv, bn=bn, final_kernel=final_kernel) - - - def forward(self, x): - center_feat = self.feature_adapt_cls(x) - reg_feat = self.feature_adapt_reg(x) - - cls_score = self.cls_head(center_feat) - ret = self.task_head(reg_feat) - ret['hm'] = cls_score - - return ret - - -@HEADS.register_module -class CenterHead(nn.Module): - def __init__( - self, - mode="3d", - in_channels=[128,], - norm_cfg=None, - tasks=[], - dataset='nuscenes', - weight=0.25, - code_weights=[], - common_heads=dict(), - encode_rad_error_by_sin=False, - loss_aux=None, - direction_offset=0.0, - direction_weight=0.0, - name="centerhead", - logger=None, - init_bias=-2.19, - share_conv_channel=64, - smooth_loss=False, - no_log=False, - num_hm_conv=2, - dcn_head=False, - bn=True - ): - super(CenterHead, self).__init__() - - num_classes = [len(t["class_names"]) for t in tasks] - self.class_names = [t["class_names"] for t in tasks] - self.code_weights = code_weights - self.weight = weight # weight between hm loss and loc loss - self.dataset = dataset - - self.encode_background_as_zeros = True - self.use_sigmoid_score = True - self.in_channels = in_channels - self.num_classes = num_classes - - self.crit = FocalLoss() - self.crit_reg = RegLoss() - self.loss_aux = None - - self.no_log = no_log - - self.box_n_dim = 9 if dataset == 'nuscenes' else 7 # change this if your box is different - self.num_anchor_per_locs = [n for n in num_classes] - self.use_direction_classifier = False - - if not logger: - logger = logging.getLogger("CenterHead") - self.logger = logger - - self.bev_only = True if mode == "bev" else False - - logger.info( - f"num_classes: {num_classes}" - ) - - # a shared convolution - self.shared_conv = nn.Sequential( - nn.Conv2d(in_channels, share_conv_channel, - kernel_size=3, padding=1, bias=True), - nn.BatchNorm2d(share_conv_channel), - nn.ReLU(inplace=True) - ) - - self.tasks = nn.ModuleList() - print("Use HM Bias: ", init_bias) - - self.smooth_loss = smooth_loss - if self.smooth_loss: - print("Use Smooth L1 Loss!!") - self.crit_reg = SmoothRegLoss() - - if dcn_head: - print("Use Deformable Convolution in the CenterHead!") - - for num_cls in num_classes: - heads = copy.deepcopy(common_heads) - if not dcn_head: - heads.update(dict(hm=(num_cls, num_hm_conv))) - self.tasks.append( - SepHead(share_conv_channel, heads, bn=True, init_bias=init_bias, final_kernel=3, directional_classifier=False) - ) - else: - 
self.tasks.append( - DCNSepHead(share_conv_channel, num_cls, heads, bn=True, init_bias=init_bias, final_kernel=3, directional_classifier=False) - ) - - logger.info("Finish CenterHead Initialization") - - def init_weights(self, pretrained=None): - pass - - def forward(self, x): - ret_dicts = [] - - x = self.shared_conv(x) - - for task in self.tasks: - ret_dicts.append(task(x)) - - return ret_dicts - - def _sigmoid(self, x): - y = torch.clamp(x.sigmoid_(), min=1e-4, max=1-1e-4) - return y - - def loss(self, example, preds_dicts, **kwargs): - rets = [] - for task_id, preds_dict in enumerate(preds_dicts): - # heatmap focal loss - preds_dict['hm'] = self._sigmoid(preds_dict['hm']) - - hm_loss = self.crit(preds_dict['hm'], example['hm'][task_id]) - - target_box = example['anno_box'][task_id] - # reconstruct the anno_box from multiple reg heads - if self.dataset == 'nuscenes': - preds_dict['anno_box'] = torch.cat((preds_dict['reg'], preds_dict['height'], preds_dict['dim'], - preds_dict['vel'], preds_dict['rot']), dim=1) - elif self.dataset == 'waymo': - preds_dict['anno_box'] = torch.cat((preds_dict['reg'], preds_dict['height'], preds_dict['dim'], - preds_dict['rot']), dim=1) - else: - raise NotImplementedError() - - loss = 0 - ret = {} - - # Regression loss for dimension, offset, height, rotation - box_loss = self.crit_reg(preds_dict['anno_box'], example['mask'][task_id], example['ind'][task_id], target_box) - - loc_loss = (box_loss*box_loss.new_tensor(self.code_weights)).sum() - - loss += hm_loss + self.weight*loc_loss - - ret.update({'loss': loss, 'hm_loss': hm_loss.detach().cpu(), 'loc_loss':loc_loss, 'loc_loss_elem': box_loss.detach().cpu(), 'num_positive': example['mask'][task_id].float().sum()}) - - rets.append(ret) - - """convert batch-key to key-batch - """ - rets_merged = defaultdict(list) - for ret in rets: - for k, v in ret.items(): - rets_merged[k].append(v) - - return rets_merged - - - def predict(self, example, preds_dicts, test_cfg, **kwargs): - """decode, nms, then return the detection result. Additionaly support double flip testing - """ - # get loss info - rets = [] - metas = [] - - double_flip = test_cfg.get('double_flip', False) - - post_center_range = test_cfg.post_center_limit_range - if len(post_center_range) > 0: - post_center_range = torch.tensor( - post_center_range, - dtype=preds_dicts[0]['hm'].dtype, - device=preds_dicts[0]['hm'].device, - ) - - for task_id, preds_dict in enumerate(preds_dicts): - batch_size = preds_dict['hm'].shape[0] - num_class_with_bg = self.num_classes[task_id] - - if double_flip: - assert batch_size % 4 == 0, print(batch_size) - batch_size = int(batch_size / 4) - for k in preds_dict.keys(): - # transform the prediction map back to their original coordinate befor flipping - # the flipped predictions are ordered in a group of 4. The first one is the original pointcloud - # the second one is X flip pointcloud(y=-y), the third one is Y flip pointcloud(x=-x), and the last one is - # X and Y flip pointcloud(x=-x, y=-y). - # Also please note that pytorch's flip function is defined on higher dimensional space, so dims=[2] means that - # it is flipping along the axis with H length(which is normaly the Y axis), however in our traditional word, it is flipping along - # the X axis. 
The below flip follows pytorch's definition yflip(y=-y) xflip(x=-x) - _, C, H, W = preds_dict[k].shape - preds_dict[k] = preds_dict[k].reshape(int(batch_size), 4, C, H, W) - preds_dict[k][:, 1] = torch.flip(preds_dict[k][:, 1], dims=[2]) - preds_dict[k][:, 2] = torch.flip(preds_dict[k][:, 2], dims=[3]) - preds_dict[k][:, 3] = torch.flip(preds_dict[k][:, 3], dims=[2, 3]) - - - if "metadata" not in example or len(example["metadata"]) == 0: - meta_list = [None] * batch_size - else: - meta_list = example["metadata"] - if double_flip: - meta_list = meta_list[:4*int(batch_size):4] - - if "anchors_mask" not in example: - batch_anchors_mask = [None] * batch_size - else: - assert False - - batch_hm = preds_dict['hm'].sigmoid_() - - batch_reg = preds_dict['reg'] - batch_hei = preds_dict['height'] - - if not self.no_log: - batch_dim = torch.exp(preds_dict['dim']) - else: - batch_dim = preds_dict['dim'] - - if double_flip: - batch_hm = batch_hm.mean(dim=1) - batch_hei = batch_hei.mean(dim=1) - batch_dim = batch_dim.mean(dim=1) - - # y = -y reg_y = 1-reg_y - batch_reg[:, 1, 1] = 1 - batch_reg[:, 1, 1] - batch_reg[:, 2, 0] = 1 - batch_reg[:, 2, 0] - - batch_reg[:, 3, 0] = 1 - batch_reg[:, 3, 0] - batch_reg[:, 3, 1] = 1 - batch_reg[:, 3, 1] - batch_reg = batch_reg.mean(dim=1) - - batch_rots = preds_dict['rot'][:, :, 0:1] - batch_rotc = preds_dict['rot'][:, :, 1:2] - - # first yflip - # y = -y theta = pi -theta - # sin(pi-theta) = sin(theta) cos(pi-theta) = -cos(theta) - # batch_rots[:, 1] the same - batch_rotc[:, 1] = -batch_rotc[:, 1] - - - # then xflip x = -x theta = 2pi - theta - # sin(2pi - theta) = -sin(theta) cos(2pi - theta) = cos(theta) - # batch_rots[:, 2] the same - batch_rots[:, 2] = -batch_rots[:, 2] - - # double flip - batch_rots[:, 3] = -batch_rots[:, 3] - batch_rotc[:, 3] = -batch_rotc[:, 3] - - batch_rotc = batch_rotc.mean(dim=1) - batch_rots = batch_rots.mean(dim=1) - - else: - batch_rots = preds_dict['rot'][:, 0].unsqueeze(1) - batch_rotc = preds_dict['rot'][:, 1].unsqueeze(1) - - - if 'vel' in preds_dict: - batch_vel = preds_dict['vel'] - if double_flip: - # flip vy - batch_vel[:, 1, 1] = - batch_vel[:, 1, 1] - # flip vx - batch_vel[:, 2, 0] = - batch_vel[:, 2, 0] - - batch_vel[:, 3] = - batch_vel[:, 3] - - batch_vel = batch_vel.mean(dim=1) - else: - batch_vel = None - - batch_dir_preds = [None] * batch_size - - temp = ddd_decode( - batch_hm, - batch_rots, - batch_rotc, - batch_hei, - batch_dim, - batch_dir_preds, - batch_vel, - None, - reg=batch_reg, - post_center_range=post_center_range, - K=test_cfg.max_per_img, - score_threshold=test_cfg.score_threshold, - cfg=test_cfg, - task_id=task_id - ) - - batch_reg_preds = [box['box3d_lidar'] for box in temp] - batch_cls_preds = [box['scores'] for box in temp] - batch_cls_labels = [box['label_preds'] for box in temp] - - metas.append(meta_list) - - if test_cfg.get('max_pool_nms', False) or test_cfg.get('circle_nms', False): - rets.append(temp) - continue - - rets.append( - self.get_task_detections( - task_id, - num_class_with_bg, - test_cfg, - batch_cls_preds, - batch_reg_preds, - batch_cls_labels, - batch_dir_preds, - batch_anchors_mask, - meta_list, - ) - ) - - # Merge branches results - num_tasks = len(rets) - ret_list = [] - num_preds = len(rets) - num_samples = len(rets[0]) - - ret_list = [] - for i in range(num_samples): - ret = {} - for k in rets[0][i].keys(): - if k in ["box3d_lidar", "scores"]: - ret[k] = torch.cat([ret[i][k] for ret in rets]) - elif k in ["label_preds"]: - flag = 0 - for j, num_class in enumerate(self.num_classes): 
- rets[j][i][k] += flag - flag += num_class - ret[k] = torch.cat([ret[i][k] for ret in rets]) - - ret['metadata'] = metas[0][i] - ret_list.append(ret) - - return ret_list - - def get_task_detections( - self, - task_id, - num_class_with_bg, - test_cfg, - batch_cls_preds, - batch_reg_preds, - batch_cls_labels, - batch_dir_preds=None, - batch_anchors_mask=None, - meta_list=None, - ): - predictions_dicts = [] - post_center_range = test_cfg.post_center_limit_range - if len(post_center_range) > 0: - post_center_range = torch.tensor( - post_center_range, - dtype=batch_reg_preds[0].dtype, - device=batch_reg_preds[0].device, - ) - - for box_preds, cls_preds, cls_labels, dir_preds, a_mask, meta in zip( - batch_reg_preds, - batch_cls_preds, - batch_cls_labels, - batch_dir_preds, - batch_anchors_mask, - meta_list, - ): - if a_mask is not None: - box_preds = box_preds[a_mask] - cls_preds = cls_preds[a_mask] - - box_preds = box_preds.float() - cls_preds = cls_preds.float() - - if self.use_direction_classifier: - if a_mask is not None: - dir_preds = dir_preds[a_mask] - dir_labels = torch.max(dir_preds, dim=-1)[1] - - if self.encode_background_as_zeros: - # this don't support softmax - assert self.use_sigmoid_score is True - # total_scores = torch.sigmoid(cls_preds) - total_scores = cls_preds - else: - # encode background as first element in one-hot vector - if self.use_sigmoid_score: - total_scores = torch.sigmoid(cls_preds)[..., 1:] - else: - total_scores = F.softmax(cls_preds, dim=-1)[..., 1:] - - # Apply NMS in birdeye view - if test_cfg.nms.use_rotate_nms: - nms_func = box_torch_ops.rotate_nms - else: - nms_func = box_torch_ops.nms - - assert test_cfg.nms.use_multi_class_nms is False - """feature_map_size_prod = ( - batch_reg_preds.shape[1] // self.num_anchor_per_locs[task_id] - )""" - if test_cfg.nms.use_multi_class_nms: - assert self.encode_background_as_zeros is True - boxes_for_nms = box_preds[:, [0, 1, 3, 4, -1]] - if not test_cfg.nms.use_rotate_nms: - box_preds_corners = box_torch_ops.center_to_corner_box2d( - boxes_for_nms[:, :2], boxes_for_nms[:, 2:4], boxes_for_nms[:, 4] - ) - boxes_for_nms = box_torch_ops.corner_to_standup_nd( - box_preds_corners - ) - - selected_boxes, selected_labels, selected_scores = [], [], [] - selected_dir_labels = [] - - scores = total_scores - boxes = boxes_for_nms - selected_per_class = [] - score_threshs = [test_cfg.score_threshold] * self.num_classes[task_id] - pre_max_sizes = [test_cfg.nms.nms_pre_max_size] * self.num_classes[ - task_id - ] - post_max_sizes = [test_cfg.nms.nms_post_max_size] * self.num_classes[ - task_id - ] - iou_thresholds = [test_cfg.nms.nms_iou_threshold] * self.num_classes[ - task_id - ] - - for class_idx, score_thresh, pre_ms, post_ms, iou_th in zip( - range(self.num_classes[task_id]), - score_threshs, - pre_max_sizes, - post_max_sizes, - iou_thresholds, - ): - self._nms_class_agnostic = False - if self._nms_class_agnostic: - class_scores = total_scores.view( - feature_map_size_prod, -1, self.num_classes[task_id] - )[..., class_idx] - class_scores = class_scores.contiguous().view(-1) - class_boxes_nms = boxes.view(-1, boxes_for_nms.shape[-1]) - class_boxes = box_preds - class_dir_labels = dir_labels - else: - # anchors_range = self.target_assigner.anchors_range(class_idx) - anchors_range = self.target_assigners[task_id].anchors_range - class_scores = total_scores.view( - -1, self._num_classes[task_id] - )[anchors_range[0] : anchors_range[1], class_idx] - class_boxes_nms = boxes.view(-1, boxes_for_nms.shape[-1])[ - anchors_range[0] : 
anchors_range[1], : - ] - class_scores = class_scores.contiguous().view(-1) - class_boxes_nms = class_boxes_nms.contiguous().view( - -1, boxes_for_nms.shape[-1] - ) - class_boxes = box_preds.view(-1, box_preds.shape[-1])[ - anchors_range[0] : anchors_range[1], : - ] - class_boxes = class_boxes.contiguous().view( - -1, box_preds.shape[-1] - ) - if self.use_direction_classifier: - class_dir_labels = dir_labels.view(-1)[ - anchors_range[0] : anchors_range[1] - ] - class_dir_labels = class_dir_labels.contiguous().view(-1) - if score_thresh > 0.0: - class_scores_keep = class_scores >= score_thresh - if class_scores_keep.shape[0] == 0: - selected_per_class.append(None) - continue - class_scores = class_scores[class_scores_keep] - if class_scores.shape[0] != 0: - if score_thresh > 0.0: - class_boxes_nms = class_boxes_nms[class_scores_keep] - class_boxes = class_boxes[class_scores_keep] - class_dir_labels = class_dir_labels[class_scores_keep] - keep = nms_func( - class_boxes_nms, class_scores, pre_ms, post_ms, iou_th - ) - if keep.shape[0] != 0: - selected_per_class.append(keep) - else: - selected_per_class.append(None) - else: - selected_per_class.append(None) - selected = selected_per_class[-1] - - if selected is not None: - selected_boxes.append(class_boxes[selected]) - selected_labels.append( - torch.full( - [class_boxes[selected].shape[0]], - class_idx, - dtype=torch.int64, - device=box_preds.device, - ) - ) - if self.use_direction_classifier: - selected_dir_labels.append(class_dir_labels[selected]) - selected_scores.append(class_scores[selected]) - # else: - # selected_boxes.append(torch.Tensor([], device=class_boxes.device)) - # selected_labels.append(torch.Tensor([], device=box_preds.device)) - # selected_scores.append(torch.Tensor([], device=class_scores.device)) - # if self.use_direction_classifier: - # selected_dir_labels.append(torch.Tensor([], device=class_dir_labels.device)) - - selected_boxes = torch.cat(selected_boxes, dim=0) - selected_labels = torch.cat(selected_labels, dim=0) - selected_scores = torch.cat(selected_scores, dim=0) - if self.use_direction_classifier: - selected_dir_labels = torch.cat(selected_dir_labels, dim=0) - - else: - # get highest score per prediction, than apply nms - # to remove overlapped box. 
- if num_class_with_bg == 1: - top_scores = total_scores.squeeze(-1) - top_labels = torch.zeros( - total_scores.shape[0], - device=total_scores.device, - dtype=torch.long, - ) - - else: - top_labels = cls_labels.long() - top_scores = total_scores.squeeze(-1) - # top_scores, top_labels = torch.max(total_scores, dim=-1) - - if test_cfg.score_threshold > 0.0: - thresh = torch.tensor( - [test_cfg.score_threshold], device=total_scores.device - ).type_as(total_scores) - top_scores_keep = top_scores >= thresh - top_scores = top_scores.masked_select(top_scores_keep) - - if top_scores.shape[0] != 0: - if test_cfg.score_threshold > 0.0: - box_preds = box_preds[top_scores_keep] - if self.use_direction_classifier: - dir_labels = dir_labels[top_scores_keep] - top_labels = top_labels[top_scores_keep] - # boxes_for_nms = box_preds[:, [0, 1, 3, 4, -1]] - - # GPU NMS from PCDet(https://github.com/sshaoshuai/PCDet) - boxes_for_nms = box_torch_ops.boxes3d_to_bevboxes_lidar_torch(box_preds) - if not test_cfg.nms.use_rotate_nms: - box_preds_corners = box_torch_ops.center_to_corner_box2d( - boxes_for_nms[:, :2], - boxes_for_nms[:, 2:4], - boxes_for_nms[:, 4], - ) - boxes_for_nms = box_torch_ops.corner_to_standup_nd( - box_preds_corners - ) - # the nms in 3d detection just remove overlap boxes. - - selected = box_torch_ops.rotate_nms_pcdet(boxes_for_nms, top_scores, - thresh=test_cfg.nms.nms_iou_threshold, - pre_maxsize=test_cfg.nms.nms_pre_max_size, - post_max_size=test_cfg.nms.nms_post_max_size) - else: - selected = [] - - # if selected is not None: - selected_boxes = box_preds[selected] - if self.use_direction_classifier: - selected_dir_labels = dir_labels[selected] - selected_labels = top_labels[selected] - selected_scores = top_scores[selected] - - # finally generate predictions. 
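[Note, not part of this patch] As an aside, a minimal, self-contained sketch of the post-center-range filter applied in the block below; the range values are illustrative (roughly a nuScenes-style post_center_limit_range) and the 9-dim layout stands for (x, y, z, dims, velocity, yaw).

import torch
# Hypothetical range and random boxes, for illustration only.
post_center_range = torch.tensor([-61.2, -61.2, -10.0, 61.2, 61.2, 10.0])
final_box_preds = torch.randn(128, 9) * 70.0     # stand-in for decoded boxes (x, y, z, dims, velocity, yaw)
centers = final_box_preds[:, :3]
mask = (centers >= post_center_range[:3]).all(1) & (centers <= post_center_range[3:]).all(1)
kept = final_box_preds[mask]                     # keep only boxes whose centers lie inside the valid volume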
- # self.logger.info(f"selected boxes: {selected_boxes.shape}") - if selected_boxes.shape[0] != 0: - # self.logger.info(f"result not none~ Selected boxes: {selected_boxes.shape}") - box_preds = selected_boxes - scores = selected_scores - label_preds = selected_labels - if self.use_direction_classifier: - dir_labels = selected_dir_labels - opp_labels = ( - (box_preds[..., -1] - self.direction_offset) > 0 - ) ^ dir_labels.byte() - box_preds[..., -1] += torch.where( - opp_labels, - torch.tensor(np.pi).type_as(box_preds), - torch.tensor(0.0).type_as(box_preds), - ) - final_box_preds = box_preds - final_scores = scores - final_labels = label_preds - if post_center_range is not None: - mask = (final_box_preds[:, :3] >= post_center_range[:3]).all(1) - mask &= (final_box_preds[:, :3] <= post_center_range[3:]).all(1) - predictions_dict = { - "box3d_lidar": final_box_preds[mask], - "scores": final_scores[mask], - "label_preds": label_preds[mask], - "metadata": meta, - } - else: - predictions_dict = { - "box3d_lidar": final_box_preds, - "scores": final_scores, - "label_preds": label_preds, - "metadata": meta, - } - else: - dtype = batch_reg_preds[0].dtype - device = batch_reg_preds[0].device - predictions_dict = { - "box3d_lidar": torch.zeros([0, self.box_n_dim], dtype=dtype, device=device), - "scores": torch.zeros([0], dtype=dtype, device=device), - "label_preds": torch.zeros( - [0], dtype=top_labels.dtype, device=device - ), - "metadata": meta, - } - - predictions_dicts.append(predictions_dict) - - return predictions_dicts diff --git a/det3d/models/builder.py b/det3d/models/builder.py index baba775..0b15789 100644 --- a/det3d/models/builder.py +++ b/det3d/models/builder.py @@ -8,8 +8,8 @@ LOSSES, NECKS, READERS, - ROI_EXTRACTORS, - SHARED_HEADS, + SECOND_STAGE, + ROI_HEAD ) @@ -20,6 +20,12 @@ def build(cfg, registry, default_args=None): else: return build_from_cfg(cfg, registry, default_args) +def build_second_stage_module(cfg): + return build(cfg, SECOND_STAGE) + +def build_roi_head(cfg): + return build(cfg, ROI_HEAD) + def build_reader(cfg): return build(cfg, READERS) @@ -32,15 +38,6 @@ def build_backbone(cfg): def build_neck(cfg): return build(cfg, NECKS) - -def build_roi_extractor(cfg): - return build(cfg, ROI_EXTRACTORS) - - -def build_shared_head(cfg): - return build(cfg, SHARED_HEADS) - - def build_head(cfg): return build(cfg, HEADS) diff --git a/det3d/models/detectors/__init__.py b/det3d/models/detectors/__init__.py index 4e1d767..888e458 100644 --- a/det3d/models/detectors/__init__.py +++ b/det3d/models/detectors/__init__.py @@ -2,6 +2,7 @@ from .point_pillars import PointPillars from .single_stage import SingleStageDetector from .voxelnet import VoxelNet +from .two_stage import TwoStageDetector __all__ = [ "BaseDetector", diff --git a/det3d/models/detectors/point_pillars.py b/det3d/models/detectors/point_pillars.py index cd2b0dd..a1e0c90 100644 --- a/det3d/models/detectors/point_pillars.py +++ b/det3d/models/detectors/point_pillars.py @@ -1,6 +1,6 @@ from ..registry import DETECTORS from .single_stage import SingleStageDetector - +from copy import deepcopy @DETECTORS.register_module class PointPillars(SingleStageDetector): @@ -53,7 +53,7 @@ def forward(self, example, return_loss=True, **kwargs): else: return self.bbox_head.predict(example, preds, self.test_cfg) - def pred_hm(self, example): + def forward_two_stage(self, example, return_loss=True, **kwargs): voxels = example["voxels"] coordinates = example["coordinates"] num_points_in_voxel = example["num_points"] @@ -70,9 +70,21 @@ def 
pred_hm(self, example): ) x = self.extract_feat(data) - preds_dicts = self.bbox_head(x) + bev_feature = x + preds = self.bbox_head(x) + + # manual deepcopy ... + new_preds = [] + for pred in preds: + new_pred = {} + for k, v in pred.items(): + new_pred[k] = v.detach() - return preds_dicts + new_preds.append(new_pred) - def pred_result(self, example, preds): - self.bbox_head.predict(example, preds, self.test_cfg) \ No newline at end of file + boxes = self.bbox_head.predict(example, new_preds, self.test_cfg) + + if return_loss: + return boxes, bev_feature, self.bbox_head.loss(example, preds) + else: + return boxes, bev_feature, None \ No newline at end of file diff --git a/det3d/models/detectors/single_stage.py b/det3d/models/detectors/single_stage.py index 2cb4dcf..6275b69 100644 --- a/det3d/models/detectors/single_stage.py +++ b/det3d/models/detectors/single_stage.py @@ -3,6 +3,8 @@ from .. import builder from ..registry import DETECTORS from .base import BaseDetector +from ..utils.finetune_utils import FrozenBatchNorm2d +from det3d.torchie.trainer import load_checkpoint @DETECTORS.register_module @@ -26,19 +28,17 @@ def __init__( self.train_cfg = train_cfg self.test_cfg = test_cfg - # self.init_weights(pretrained=pretrained) + self.init_weights(pretrained=pretrained) def init_weights(self, pretrained=None): - super(SingleStageDetector, self).init_weights(pretrained) - self.backbone.init_weights(pretrained=pretrained) - if self.with_neck: - if isinstance(self.neck, nn.Sequential): - for m in self.neck: - m.init_weights() - else: - self.neck.init_weights() - self.bbox_head.init_weights() - + if pretrained is None: + return + try: + load_checkpoint(self, pretrained, strict=False) + print("init weight from {}".format(pretrained)) + except: + print("no pretrained model at {}".format(pretrained)) + def extract_feat(self, data): input_features = self.reader(data) x = self.backbone(input_features) @@ -46,24 +46,6 @@ def extract_feat(self, data): x = self.neck(x) return x - def forward_dummy(self, example): - x = self.extract_feat(example) - outs = self.bbox_head(x) - return outs - - """ - def simple_test(self, example, example_meta, rescale=False): - x = self.extract_feat(example) - outs = self.bbox_head(x) - bbox_inputs = outs + (example_meta, self.test_cfg, rescale) - bbox_list = self.bbox_head.get_bboxes(*bbox_inputs) - bbox_results = [ - bbox2result(det_bboxes, det_labels, self.bbox_head.num_classes) - for det_bboxes, det_labels in bbox_list - ] - return bbox_results[0] - """ - def aug_test(self, example, rescale=False): raise NotImplementedError @@ -72,3 +54,9 @@ def forward(self, example, return_loss=True, **kwargs): def predict(self, example, preds_dicts): pass + + def freeze(self): + for p in self.parameters(): + p.requires_grad = False + FrozenBatchNorm2d.convert_frozen_batchnorm(self) + return self \ No newline at end of file diff --git a/det3d/models/detectors/two_stage.py b/det3d/models/detectors/two_stage.py new file mode 100644 index 0000000..322280a --- /dev/null +++ b/det3d/models/detectors/two_stage.py @@ -0,0 +1,193 @@ +from det3d.core.bbox import box_torch_ops +from ..registry import DETECTORS +from .base import BaseDetector +from .. 
import builder +import torch +from torch import nn + +@DETECTORS.register_module +class TwoStageDetector(BaseDetector): + def __init__( + self, + first_stage_cfg, + second_stage_modules, + roi_head, + NMS_POST_MAXSIZE, + num_point=1, + freeze=False, + **kwargs + ): + super(TwoStageDetector, self).__init__() + self.single_det = builder.build_detector(first_stage_cfg, **kwargs) + self.NMS_POST_MAXSIZE = NMS_POST_MAXSIZE + + if freeze: + print("Freeze First Stage Network") + # we train the model in two steps + self.single_det = self.single_det.freeze() + self.bbox_head = self.single_det.bbox_head + + self.second_stage = nn.ModuleList() + # can be any number of modules + # bird eye view, cylindrical view, image, multiple timesteps, etc.. + for module in second_stage_modules: + self.second_stage.append(builder.build_second_stage_module(module)) + + self.roi_head = builder.build_roi_head(roi_head) + + self.num_point = num_point + + def combine_loss(self, one_stage_loss, roi_loss, tb_dict): + one_stage_loss['loss'][0] += (roi_loss) + + for i in range(len(one_stage_loss['loss'])): + one_stage_loss['roi_reg_loss'].append(tb_dict['rcnn_loss_reg']) + one_stage_loss['roi_cls_loss'].append(tb_dict['rcnn_loss_cls']) + + return one_stage_loss + + def get_box_center(self, boxes): + # box [List] + centers = [] + for box in boxes: + if self.num_point == 1 or len(box['box3d_lidar']) == 0: + centers.append(box['box3d_lidar'][:, :3]) + + elif self.num_point == 5: + center2d = box['box3d_lidar'][:, :2] + height = box['box3d_lidar'][:, 2:3] + dim2d = box['box3d_lidar'][:, 3:5] + rotation_y = box['box3d_lidar'][:, -1] + + corners = box_torch_ops.center_to_corner_box2d(center2d, dim2d, rotation_y) + + front_middle = torch.cat([(corners[:, 0] + corners[:, 1])/2, height], dim=-1) + back_middle = torch.cat([(corners[:, 2] + corners[:, 3])/2, height], dim=-1) + left_middle = torch.cat([(corners[:, 0] + corners[:, 3])/2, height], dim=-1) + right_middle = torch.cat([(corners[:, 1] + corners[:, 2])/2, height], dim=-1) + + points = torch.cat([box['box3d_lidar'][:, :3], front_middle, back_middle, left_middle, \ + right_middle], dim=0) + + centers.append(points) + else: + raise NotImplementedError() + + return centers + + def reorder_first_stage_pred_and_feature(self, first_pred, example, features): + batch_size = len(first_pred) + box_length = first_pred[0]['box3d_lidar'].shape[1] + feature_vector_length = sum([feat[0].shape[-1] for feat in features]) + + rois = first_pred[0]['box3d_lidar'].new_zeros((batch_size, + self.NMS_POST_MAXSIZE, box_length + )) + roi_scores = first_pred[0]['scores'].new_zeros((batch_size, + self.NMS_POST_MAXSIZE + )) + roi_labels = first_pred[0]['label_preds'].new_zeros((batch_size, + self.NMS_POST_MAXSIZE), dtype=torch.long + ) + roi_features = features[0][0].new_zeros((batch_size, + self.NMS_POST_MAXSIZE, feature_vector_length + )) + + for i in range(batch_size): + num_obj = features[0][i].shape[0] + # basically move rotation to position 6, so now the box is 7 + C . 
C is 2 for nuscenes to + # include velocity target + + box_preds = first_pred[i]['box3d_lidar'] + + if self.roi_head.code_size == 9: + # x, y, z, w, l, h, rotation_y, velocity_x, velocity_y + box_preds = box_preds[:, [0, 1, 2, 3, 4, 5, 8, 6, 7]] + + rois[i, :num_obj] = box_preds + roi_labels[i, :num_obj] = first_pred[i]['label_preds'] + 1 + roi_scores[i, :num_obj] = first_pred[i]['scores'] + roi_features[i, :num_obj] = torch.cat([feat[i] for feat in features], dim=-1) + + example['rois'] = rois + example['roi_labels'] = roi_labels + example['roi_scores'] = roi_scores + example['roi_features'] = roi_features + + example['has_class_labels']= True + + return example + + def post_process(self, batch_dict): + batch_size = batch_dict['batch_size'] + pred_dicts = [] + + for index in range(batch_size): + box_preds = batch_dict['batch_box_preds'][index] + cls_preds = batch_dict['batch_cls_preds'][index] # this is the predicted iou + label_preds = batch_dict['roi_labels'][index] + + if box_preds.shape[-1] == 9: + # move rotation to the end (the create submission file will take elements from 0:6 and -1) + box_preds = box_preds[:, [0, 1, 2, 3, 4, 5, 7, 8, 6]] + + scores = torch.sqrt(torch.sigmoid(cls_preds).reshape(-1) * batch_dict['roi_scores'][index].reshape(-1)) + mask = (label_preds != 0).reshape(-1) + + box_preds = box_preds[mask, :] + scores = scores[mask] + labels = label_preds[mask]-1 + + # currently don't need nms + pred_dict = { + 'box3d_lidar': box_preds, + 'scores': scores, + 'label_preds': labels, + "metadata": batch_dict["metadata"][index] + } + + pred_dicts.append(pred_dict) + + return pred_dicts + + + def forward(self, example, return_loss=True, **kwargs): + out = self.single_det.forward_two_stage(example, + return_loss, **kwargs) + if len(out) == 4: + one_stage_pred, bev_feature, voxel_feature, one_stage_loss = out + example['voxel_feature'] = voxel_feature + elif len(out) == 3: + one_stage_pred, bev_feature, one_stage_loss = out + else: + raise NotImplementedError + + # N C H W -> N H W C + example['bev_feature'] = bev_feature.permute(0, 2, 3, 1).contiguous() + + centers_vehicle_frame = self.get_box_center(one_stage_pred) + + if self.roi_head.code_size == 7 and return_loss is True: + # drop velocity + example['gt_boxes_and_cls'] = example['gt_boxes_and_cls'][:, :, [0, 1, 2, 3, 4, 5, 6, -1]] + + features = [] + + for module in self.second_stage: + feature = module.forward(example, centers_vehicle_frame, self.num_point) + features.append(feature) + # feature is two level list + # first level is number of two stage information streams + # second level is batch + + example = self.reorder_first_stage_pred_and_feature(first_pred=one_stage_pred, example=example, features=features) + + # final classification / regression + batch_dict = self.roi_head(example, training=return_loss) + + if return_loss: + roi_loss, tb_dict = self.roi_head.get_loss() + + return self.combine_loss(one_stage_loss, roi_loss, tb_dict) + else: + return self.post_process(batch_dict) diff --git a/det3d/models/detectors/voxelnet.py b/det3d/models/detectors/voxelnet.py index 6248dbf..80b9011 100644 --- a/det3d/models/detectors/voxelnet.py +++ b/det3d/models/detectors/voxelnet.py @@ -1,6 +1,8 @@ from ..registry import DETECTORS from .single_stage import SingleStageDetector - +from det3d.torchie.trainer import load_checkpoint +import torch +from copy import deepcopy @DETECTORS.register_module class VoxelNet(SingleStageDetector): @@ -17,16 +19,16 @@ def __init__( super(VoxelNet, self).__init__( reader, backbone, neck, 
bbox_head, train_cfg, test_cfg, pretrained ) - + def extract_feat(self, data): input_features = self.reader(data["features"], data["num_voxels"]) - x = self.backbone( + x, voxel_feature = self.backbone( input_features, data["coors"], data["batch_size"], data["input_shape"] ) if self.with_neck: x = self.neck(x) - return x + return x, voxel_feature def forward(self, example, return_loss=True, **kwargs): voxels = example["voxels"] @@ -44,7 +46,7 @@ def forward(self, example, return_loss=True, **kwargs): input_shape=example["shape"][0], ) - x = self.extract_feat(data) + x, _ = self.extract_feat(data) preds = self.bbox_head(x) if return_loss: @@ -52,7 +54,7 @@ def forward(self, example, return_loss=True, **kwargs): else: return self.bbox_head.predict(example, preds, self.test_cfg) - def pred_hm(self, example): + def forward_two_stage(self, example, return_loss=True, **kwargs): voxels = example["voxels"] coordinates = example["coordinates"] num_points_in_voxel = example["num_points"] @@ -68,10 +70,22 @@ def pred_hm(self, example): input_shape=example["shape"][0], ) - x = self.extract_feat(data) - preds_dicts = self.bbox_head(x) + x, voxel_feature = self.extract_feat(data) + bev_feature = x + preds = self.bbox_head(x) + + # manual deepcopy ... + new_preds = [] + for pred in preds: + new_pred = {} + for k, v in pred.items(): + new_pred[k] = v.detach() - return preds_dicts + new_preds.append(new_pred) - def pred_result(self, example, preds): - return self.bbox_head.predict(example, preds, self.test_cfg) + boxes = self.bbox_head.predict(example, new_preds, self.test_cfg) + + if return_loss: + return boxes, bev_feature, voxel_feature, self.bbox_head.loss(example, preds) + else: + return boxes, bev_feature, voxel_feature, None diff --git a/det3d/models/losses/__init__.py b/det3d/models/losses/__init__.py index d18db85..e69de29 100644 --- a/det3d/models/losses/__init__.py +++ b/det3d/models/losses/__init__.py @@ -1,34 +0,0 @@ -from .balanced_l1_loss import BalancedL1Loss -from .cross_entropy_loss import CrossEntropyLoss -from .ghm_loss import GHMCLoss, GHMRLoss - -# from .iou_loss import IoULoss -from .mse_loss import MSELoss -from .accuracy import accuracy -from .smooth_l1_loss import SmoothL1Loss -from .losses import ( - WeightedL2LocalizationLoss, - WeightedSmoothL1Loss, - WeightedSigmoidClassificationLoss, - SigmoidFocalLoss, - SoftmaxFocalClassificationLoss, - WeightedSoftmaxClassificationLoss, - BootstrappedSigmoidClassificationLoss, -) - -__all__ = [ - "BalancedL1Loss", - "CrossEntropyLoss", - "FocalLoss", - "GHMCLoss", - "MSELoss", - "SmoothL1Loss", - "WeightedL2LocalizationLoss", - "WeightedSmoothL1Loss", - "WeightedL1Loss" - "WeightedSigmoidClassificationLoss", - "SigmoidFocalLoss", - "SoftmaxFocalClassificationLoss", - "WeightedSoftmaxClassificationLoss", - "BootstrappedSigmoidClassificationLoss", -] diff --git a/det3d/models/losses/accuracy.py b/det3d/models/losses/accuracy.py deleted file mode 100644 index c917a15..0000000 --- a/det3d/models/losses/accuracy.py +++ /dev/null @@ -1,30 +0,0 @@ -import torch.nn as nn - - -def accuracy(pred, target, topk=1): - assert isinstance(topk, (int, tuple)) - if isinstance(topk, int): - topk = (topk,) - return_single = True - else: - return_single = False - - maxk = max(topk) - _, pred_label = pred.topk(maxk, dim=1) - pred_label = pred_label.t() - correct = pred_label.eq(target.view(1, -1).expand_as(pred_label)) - - res = [] - for k in topk: - correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) - res.append(correct_k.mul_(100.0 / pred.size(0))) 
- return res[0] if return_single else res - - -class Accuracy(nn.Module): - def __init__(self, topk=(1,)): - super().__init__() - self.topk = topk - - def forward(self, pred, target): - return accuracy(pred, target, self.topk) diff --git a/det3d/models/losses/balanced_l1_loss.py b/det3d/models/losses/balanced_l1_loss.py deleted file mode 100644 index 7f94b17..0000000 --- a/det3d/models/losses/balanced_l1_loss.py +++ /dev/null @@ -1,63 +0,0 @@ -import numpy as np -import torch -import torch.nn as nn - -from ..registry import LOSSES -from .utils import weighted_loss - - -@weighted_loss -def balanced_l1_loss(pred, target, beta=1.0, alpha=0.5, gamma=1.5, reduction="mean"): - assert beta > 0 - assert pred.size() == target.size() and target.numel() > 0 - - diff = torch.abs(pred - target) - b = np.e ** (gamma / alpha) - 1 - loss = torch.where( - diff < beta, - alpha / b * (b * diff + 1) * torch.log(b * diff / beta + 1) - alpha * diff, - gamma * diff + gamma / b - alpha * beta, - ) - - return loss - - -@LOSSES.register_module -class BalancedL1Loss(nn.Module): - """Balanced L1 Loss - arXiv: https://arxiv.org/pdf/1904.02701.pdf (CVPR 2019) - """ - - def __init__( - self, alpha=0.5, gamma=1.5, beta=1.0, reduction="mean", loss_weight=1.0 - ): - super(BalancedL1Loss, self).__init__() - self.alpha = alpha - self.gamma = gamma - self.beta = beta - self.reduction = reduction - self.loss_weight = loss_weight - - def forward( - self, - pred, - target, - weight=None, - avg_factor=None, - reduction_override=None, - **kwargs - ): - assert reduction_override in (None, "none", "mean", "sum") - reduction = reduction_override if reduction_override else self.reduction - loss_bbox = self.loss_weight * balanced_l1_loss( - pred, - target, - weight, - alpha=self.alpha, - gamma=self.gamma, - beta=self.beta, - reduction=reduction, - avg_factor=avg_factor, - **kwargs - ) - return loss_bbox diff --git a/det3d/models/losses/centernet_loss.py b/det3d/models/losses/centernet_loss.py index 44033f7..4d65e42 100644 --- a/det3d/models/losses/centernet_loss.py +++ b/det3d/models/losses/centernet_loss.py @@ -1,118 +1,8 @@ -# ------------------------------------------------------------------------------ -# Portions of this code are from -# CornerNet (https://github.com/princeton-vl/CornerNet) -# Copyright (c) 2018, University of Michigan -# Licensed under the BSD 3-Clause License -# ------------------------------------------------------------------------------ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import torch import torch.nn as nn import torch.nn.functional as F from det3d.core.utils.center_utils import _transpose_and_gather_feat -def _neg_loss(pred, gt): - ''' Modified focal loss. Exactly the same as CornerNet. 
- Runs faster and costs a little bit more memory - Arguments: - pred (batch x c x h x w) - gt_regr (batch x c x h x w) - ''' - pos_inds = gt.eq(1).float() - neg_inds = gt.lt(1).float() - - neg_weights = torch.pow(1 - gt, 4) - - loss = 0 - - pos_loss = torch.log(pred) * torch.pow(1 - pred, 2) * pos_inds - neg_loss = torch.log(1 - pred) * torch.pow(pred, 2) * neg_weights * neg_inds - - num_pos = pos_inds.float().sum() - - pos_loss = pos_loss.sum() - neg_loss = neg_loss.sum() - - if num_pos == 0: - loss = loss - neg_loss - else: - loss = loss - (pos_loss + neg_loss) / num_pos - return loss - -def _reg_loss(regr, gt_regr, mask): - ''' L1 regression loss - Arguments: - regr (batch x max_objects x dim) - gt_regr (batch x max_objects x dim) - mask (batch x max_objects) - ''' - num = mask.float().sum() - mask = mask.unsqueeze(2).expand_as(gt_regr).float() - isnotnan = (~ torch.isnan(gt_regr)).float() - mask *= isnotnan - regr = regr * mask - gt_regr = gt_regr * mask - - loss = torch.abs(regr - gt_regr) - loss = loss.transpose(2, 0) - - loss = torch.sum(loss, dim=2) - loss = torch.sum(loss, dim=1) - # else: - # # D x M x B - # loss = loss.reshape(loss.shape[0], -1) - - loss = loss / (num + 1e-4) - # import pdb; pdb.set_trace() - return loss - - -def _smooth_reg_loss(regr, gt_regr, mask, sigma=3): - ''' L1 regression loss - Arguments: - regr (batch x max_objects x dim) - gt_regr (batch x max_objects x dim) - mask (batch x max_objects) - ''' - num = mask.float().sum() - mask = mask.unsqueeze(2).expand_as(gt_regr).float() - isnotnan = (~ torch.isnan(gt_regr)).float() - mask *= isnotnan - regr = regr * mask - gt_regr = gt_regr * mask - - abs_diff = torch.abs(regr - gt_regr) - - abs_diff_lt_1 = torch.le(abs_diff, 1 / (sigma ** 2)).type_as(abs_diff) - - loss = abs_diff_lt_1 * 0.5 * torch.pow(abs_diff * sigma, 2) + ( - abs_diff - 0.5 / (sigma ** 2) - ) * (1.0 - abs_diff_lt_1) - - loss = loss.transpose(2, 0) - - loss = torch.sum(loss, dim=2) - loss = torch.sum(loss, dim=1) - # else: - # # D x M x B - # loss = loss.reshape(loss.shape[0], -1) - - loss = loss / (num + 1e-4) - # import pdb; pdb.set_trace() - return loss - - -class FocalLoss(nn.Module): - '''nn.Module warpper for focal loss''' - def __init__(self): - super(FocalLoss, self).__init__() - self.neg_loss = _neg_loss - - def forward(self, out, target): - return self.neg_loss(out, target) - class RegLoss(nn.Module): '''Regression loss for an output tensor Arguments: @@ -126,80 +16,22 @@ def __init__(self): def forward(self, output, mask, ind, target): pred = _transpose_and_gather_feat(output, ind) - loss = _reg_loss(pred, target, mask) - return loss + mask = mask.float().unsqueeze(2) -class SmoothRegLoss(nn.Module): - '''Regression loss for an output tensor - Arguments: - output (batch x dim x h x w) - mask (batch x max_objects) - ind (batch x max_objects) - target (batch x max_objects x dim) - ''' - def __init__(self): - super(SmoothRegLoss, self).__init__() - - def forward(self, output, mask, ind, target, sin_loss): - assert sin_loss is False - pred = _transpose_and_gather_feat(output, ind) - loss = _smooth_reg_loss(pred, target, mask) - return loss - - - - -def _reg_cls_loss(regr, gt_regr, mask, is_reduce=True): - ''' L1 regression loss - Arguments: - regr (batch x max_objects x dim) - gt_regr (batch x max_objects x dim) - mask (batch x max_objects) - ''' - regr = regr.squeeze(-1) - gt_regr = gt_regr.float() - num = mask.float().sum() - - regr = regr[mask, :] - gt_regr = gt_regr[mask] - - if len(gt_regr) > 0: - loss = 
torch.nn.functional.cross_entropy(regr.reshape(-1, regr.shape[-1]), gt_regr.long().reshape(-1), reduction='sum') - else: - loss = (0 * mask).sum() - - loss = loss / (num + 1e-4) - # import pdb; pdb.set_trace() - return loss - - - -class RegClsLoss(nn.Module): - '''Regression CE loss for an output tensor - Arguments: - output (batch x dim x h x w) - mask (batch x max_objects) - ind (batch x max_objects) - target (batch x max_objects x dim) - ''' - def __init__(self): - super(RegClsLoss, self).__init__() - - def forward(self, output, mask, ind, target, is_reduce=True): - pred = _transpose_and_gather_feat(output, ind) - loss = _reg_cls_loss(pred, target, mask, is_reduce) + loss = F.l1_loss(pred*mask, target*mask, reduction='none') + loss = loss / (mask.sum() + 1e-4) + loss = loss.transpose(2 ,0).sum(dim=2).sum(dim=1) return loss - class FastFocalLoss(nn.Module): ''' Reimplemented focal loss, exactly the same as the CornerNet version. Faster and costs much less memory. ''' - def __init__(self, opt=None): + def __init__(self): super(FastFocalLoss, self).__init__() - def forward(self, out, target, ind, mask, cat, is_reduce=True): + def forward(self, out, target, ind, mask, cat): ''' Arguments: out, target: B x C x H x W @@ -209,20 +41,14 @@ def forward(self, out, target, ind, mask, cat, is_reduce=True): mask = mask.float() gt = torch.pow(1 - target, 4) neg_loss = torch.log(1 - out) * torch.pow(out, 2) * gt + neg_loss = neg_loss.sum() pos_pred_pix = _transpose_and_gather_feat(out, ind) # B x M x C pos_pred = pos_pred_pix.gather(2, cat.unsqueeze(2)) # B x M num_pos = mask.sum() pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2) * \ mask.unsqueeze(2) - - if is_reduce: - pos_loss = pos_loss.sum() - if num_pos == 0: - return - neg_loss - return - (pos_loss + neg_loss) / num_pos - else: - if num_pos == 0: - return -pos_loss * 0, - neg_loss.sum() - else: - return -pos_loss/num_pos, -neg_loss / num_pos + pos_loss = pos_loss.sum() + if num_pos == 0: + return - neg_loss + return - (pos_loss + neg_loss) / num_pos diff --git a/det3d/models/losses/cross_entropy_loss.py b/det3d/models/losses/cross_entropy_loss.py deleted file mode 100644 index 82ae48f..0000000 --- a/det3d/models/losses/cross_entropy_loss.py +++ /dev/null @@ -1,102 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F - -from ..registry import LOSSES -from .utils import weight_reduce_loss - - -def cross_entropy(pred, label, weight=None, reduction="mean", avg_factor=None): - # element-wise losses - loss = F.cross_entropy(pred, label, reduction="none") - - # apply weights and do the reduction - if weight is not None: - weight = weight.float() - loss = weight_reduce_loss( - loss, weight=weight, reduction=reduction, avg_factor=avg_factor - ) - - return loss - - -def _expand_binary_labels(labels, label_weights, label_channels): - bin_labels = labels.new_full((labels.size(0), label_channels), 0) - inds = torch.nonzero(labels >= 1).squeeze() - if inds.numel() > 0: - bin_labels[inds, labels[inds] - 1] = 1 - if label_weights is None: - bin_label_weights = None - else: - bin_label_weights = label_weights.view(-1, 1).expand( - label_weights.size(0), label_channels - ) - return bin_labels, bin_label_weights - - -def binary_cross_entropy(pred, label, weight=None, reduction="mean", avg_factor=None): - if pred.dim() != label.dim(): - label, weight = _expand_binary_labels(label, weight, pred.size(-1)) - - # weighted element-wise losses - if weight is not None: - weight = weight.float() - loss = 
F.binary_cross_entropy_with_logits( - pred, label.float(), weight, reduction="none" - ) - # do the reduction for the weighted loss - loss = weight_reduce_loss(loss, reduction=reduction, avg_factor=avg_factor) - - return loss - - -def mask_cross_entropy(pred, target, label, reduction="mean", avg_factor=None): - # TODO: handle these two reserved arguments - assert reduction == "mean" and avg_factor is None - num_rois = pred.size()[0] - inds = torch.arange(0, num_rois, dtype=torch.long, device=pred.device) - pred_slice = pred[inds, label].squeeze(1) - return F.binary_cross_entropy_with_logits(pred_slice, target, reduction="mean")[ - None - ] - - -@LOSSES.register_module -class CrossEntropyLoss(nn.Module): - def __init__( - self, use_sigmoid=False, use_mask=False, reduction="mean", loss_weight=1.0 - ): - super(CrossEntropyLoss, self).__init__() - assert (use_sigmoid is False) or (use_mask is False) - self.use_sigmoid = use_sigmoid - self.use_mask = use_mask - self.reduction = reduction - self.loss_weight = loss_weight - - if self.use_sigmoid: - self.cls_criterion = binary_cross_entropy - elif self.use_mask: - self.cls_criterion = mask_cross_entropy - else: - self.cls_criterion = cross_entropy - - def forward( - self, - cls_score, - label, - weight=None, - avg_factor=None, - reduction_override=None, - **kwargs - ): - assert reduction_override in (None, "none", "mean", "sum") - reduction = reduction_override if reduction_override else self.reduction - loss_cls = self.loss_weight * self.cls_criterion( - cls_score, - label, - weight, - reduction=reduction, - avg_factor=avg_factor, - **kwargs - ) - return loss_cls diff --git a/det3d/models/losses/focal_loss.py b/det3d/models/losses/focal_loss.py deleted file mode 100644 index c67499d..0000000 --- a/det3d/models/losses/focal_loss.py +++ /dev/null @@ -1,68 +0,0 @@ -import torch.nn as nn -import torch.nn.functional as F -from det3d.ops import sigmoid_focal_loss as _sigmoid_focal_loss - -from ..registry import LOSSES -from .utils import weight_reduce_loss - - -# This method is only for debugging -def py_sigmoid_focal_loss( - pred, target, weight=None, gamma=2.0, alpha=0.25, reduction="mean", avg_factor=None -): - pred_sigmoid = pred.sigmoid() - target = target.type_as(pred) - pt = (1 - pred_sigmoid) * target + pred_sigmoid * (1 - target) - focal_weight = (alpha * target + (1 - alpha) * (1 - target)) * pt.pow(gamma) - loss = ( - F.binary_cross_entropy_with_logits(pred, target, reduction="none") - * focal_weight - ) - loss = weight_reduce_loss(loss, weight, reduction, avg_factor) - return loss - - -def sigmoid_focal_loss( - pred, target, weight=None, gamma=2.0, alpha=0.25, reduction="mean", avg_factor=None -): - # Function.apply does not accept keyword arguments, so the decorator - # "weighted_loss" is not applicable - loss = _sigmoid_focal_loss(pred, target, gamma, alpha) - # TODO: find a proper way to handle the shape of weight - if weight is not None: - weight = weight.view(-1, 1) - loss = weight_reduce_loss(loss, weight, reduction, avg_factor) - return loss - - -@LOSSES.register_module -class FocalLoss(nn.Module): - def __init__( - self, use_sigmoid=True, gamma=2.0, alpha=0.25, reduction="mean", loss_weight=1.0 - ): - super(FocalLoss, self).__init__() - assert use_sigmoid is True, "Only sigmoid focal loss supported now." 
- self.use_sigmoid = use_sigmoid - self.gamma = gamma - self.alpha = alpha - self.reduction = reduction - self.loss_weight = loss_weight - - def forward( - self, pred, target, weight=None, avg_factor=None, reduction_override=None - ): - assert reduction_override in (None, "none", "mean", "sum") - reduction = reduction_override if reduction_override else self.reduction - if self.use_sigmoid: - loss_cls = self.loss_weight * sigmoid_focal_loss( - pred, - target, - weight, - gamma=self.gamma, - alpha=self.alpha, - reduction=reduction, - avg_factor=avg_factor, - ) - else: - raise NotImplementedError - return loss_cls diff --git a/det3d/models/losses/ghm_loss.py b/det3d/models/losses/ghm_loss.py deleted file mode 100644 index ee170ca..0000000 --- a/det3d/models/losses/ghm_loss.py +++ /dev/null @@ -1,140 +0,0 @@ -##################### -# THIS LOSS IS NOT WORKING!!!! -##################### -""" -The implementation of GHM-C and GHM-R losses. -Details can be found in the paper `Gradient Harmonized Single-stage Detector`: -https://arxiv.org/abs/1811.05181 -Copyright (c) 2018 Multimedia Laboratory, CUHK. -Licensed under the MIT License (see LICENSE for details) -Written by Buyu Li -""" - -import torch -from det3d.models.losses.losses import Loss, _sigmoid_cross_entropy_with_logits - - -class GHMCLoss(Loss): - def __init__(self, bins=10, momentum=0): - self.bins = bins - self.momentum = momentum - self.edges = [float(x) / bins for x in range(bins + 1)] - self.edges[-1] += 1e-6 - if momentum > 0: - self.acc_sum = [0.0 for _ in range(bins)] - self.count = 50 - - def _compute_loss( - self, prediction_tensor, target_tensor, weights, class_indices=None - ): - """ Args: - input [batch_num, class_num]: - The direct prediction of classification fc layer. - target [batch_num, class_num]: - Binary target (0 or 1) for each sample each class. The value is -1 - when the sample is ignored. 
- """ - input = prediction_tensor - target = target_tensor - batch_size = prediction_tensor.shape[0] - num_anchors = prediction_tensor.shape[1] - num_class = prediction_tensor.shape[2] - - edges = self.edges - weights_ghm = torch.zeros_like(input).view(-1, num_class) - per_entry_cross_ent = _sigmoid_cross_entropy_with_logits( - labels=target_tensor, logits=prediction_tensor - ) - - # gradient length - g = torch.abs(input.sigmoid().detach() - target).view(-1, num_class) - - valid = weights.view(-1, 1).expand(-1, num_class) >= 0 - num_examples = max(valid.float().sum().item(), 1.0) - num_valid_bins = 0 # n valid bins - - for i in range(self.bins): - inds = (g >= edges[i]) & (g < edges[i + 1]) & valid - num_in_bin = inds.sum().item() - if num_in_bin > 0: - if self.momentum > 0: - self.acc_sum[i] = ( - self.momentum * self.acc_sum[i] - + (1 - self.momentum) * num_in_bin - ) - weights_ghm[inds] = num_examples / self.acc_sum[i] - else: - weights_ghm[inds] = num_examples / num_in_bin - num_valid_bins += 1 - - if num_valid_bins > 0: - weights_ghm = weights_ghm / num_valid_bins - - loss = per_entry_cross_ent * weights_ghm.view( - batch_size, num_anchors, num_class - ) - - # loss = torch.nn.BCEWithLogitsLoss( - # weight=weights_ghm.view(batch_size, num_anchors, num_class), - # reduction='none', - # )(prediction_tensor, target_tensor) - - return loss - - -class GHMRLoss(Loss): - def __init__(self, mu=0.02, bins=10, momentum=0, code_weights=None): - self.mu = mu - self.bins = bins - self.edges = [float(x) / bins for x in range(bins + 1)] - self.edges[-1] = 1e3 - self.momentum = momentum - if momentum > 0: - self.acc_sum = [0.0 for _ in range(bins)] - self._codewise = True - - def _compute_loss(self, prediction_tensor, target_tensor, weights): - """ Args: - input [batch_num, class_num]: - The direct prediction of classification fc layer. - target [batch_num, class_num]: - Binary target (0 or 1) for each sample each class. The value is -1 - when the sample is ignored. 
- """ - # ASL1 loss - diff = prediction_tensor - target_tensor - loss = torch.sqrt(diff * diff + self.mu * self.mu) - self.mu - batch_size = prediction_tensor.shape[0] - num_anchors = prediction_tensor.shape[1] - num_codes = prediction_tensor.shape[2] - - # gradient length - g = ( - torch.abs(diff / torch.sqrt(self.mu * self.mu + diff * diff)) - .detach() - .view(-1, num_codes) - ) - weights_ghm = torch.zeros_like(g) - - valid = weights.view(-1, 1).expand(-1, num_codes) > 0 - # print(g.shape, prediction_tensor.shape, valid.shape) - num_examples = max(valid.float().sum().item() / num_codes, 1.0) - num_valid_bins = 0 # n: valid bins - for i in range(self.bins): - inds = (g >= self.edges[i]) & (g < self.edges[i + 1]) & valid - num_in_bin = inds.sum().item() - if num_in_bin > 0: - num_valid_bins += 1 - if self.momentum > 0: - self.acc_sum[i] = ( - self.momentum * self.acc_sum[i] - + (1 - self.momentum) * num_in_bin - ) - weights_ghm[inds] = num_examples / self.acc_sum[i] - else: - weights_ghm[inds] = num_examples / num_in_bin - if num_valid_bins > 0: - weights_ghm /= num_valid_bins - weights_ghm = weights_ghm.view(batch_size, num_anchors, num_codes) - loss = loss * weights_ghm / num_examples - return loss diff --git a/det3d/models/losses/iou_loss.py b/det3d/models/losses/iou_loss.py deleted file mode 100644 index e8d1930..0000000 --- a/det3d/models/losses/iou_loss.py +++ /dev/null @@ -1,100 +0,0 @@ -import torch -import torch.nn as nn -from det3d.core import bbox_overlaps - -from ..registry import LOSSES -from .utils import weighted_loss - - -@weighted_loss -def iou_loss(pred, target, eps=1e-6): - """IoU loss. - Computing the IoU loss between a set of predicted bboxes and target bboxes. - The loss is calculated as negative log of IoU. - Args: - pred (Tensor): Predicted bboxes of format (x1, y1, x2, y2), - shape (n, 4). - target (Tensor): Corresponding gt bboxes, shape (n, 4). - eps (float): Eps to avoid log(0). - Return: - Tensor: Loss tensor. - """ - ious = bbox_overlaps(pred, target, is_aligned=True).clamp(min=eps) - loss = -ious.log() - return loss - - -@weighted_loss -def bounded_iou_loss(pred, target, beta=0.2, eps=1e-3): - """Improving Object Localization with Fitness NMS and Bounded IoU Loss, - https://arxiv.org/abs/1711.00164. - Args: - pred (tensor): Predicted bboxes. - target (tensor): Target bboxes. - beta (float): beta parameter in smoothl1. - eps (float): eps to avoid NaN. 
- """ - pred_ctrx = (pred[:, 0] + pred[:, 2]) * 0.5 - pred_ctry = (pred[:, 1] + pred[:, 3]) * 0.5 - pred_w = pred[:, 2] - pred[:, 0] + 1 - pred_h = pred[:, 3] - pred[:, 1] + 1 - with torch.no_grad(): - target_ctrx = (target[:, 0] + target[:, 2]) * 0.5 - target_ctry = (target[:, 1] + target[:, 3]) * 0.5 - target_w = target[:, 2] - target[:, 0] + 1 - target_h = target[:, 3] - target[:, 1] + 1 - - dx = target_ctrx - pred_ctrx - dy = target_ctry - pred_ctry - - loss_dx = 1 - torch.max( - (target_w - 2 * dx.abs()) / (target_w + 2 * dx.abs() + eps), - torch.zeros_like(dx), - ) - loss_dy = 1 - torch.max( - (target_h - 2 * dy.abs()) / (target_h + 2 * dy.abs() + eps), - torch.zeros_like(dy), - ) - loss_dw = 1 - torch.min(target_w / (pred_w + eps), pred_w / (target_w + eps)) - loss_dh = 1 - torch.min(target_h / (pred_h + eps), pred_h / (target_h + eps)) - loss_comb = torch.stack([loss_dx, loss_dy, loss_dw, loss_dh], dim=-1).view( - loss_dx.size(0), -1 - ) - - loss = torch.where( - loss_comb < beta, 0.5 * loss_comb * loss_comb / beta, loss_comb - 0.5 * beta - ) - return loss - - -@LOSSES.register_module -class IoULoss(nn.Module): - def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): - super(IoULoss, self).__init__() - self.eps = eps - self.reduction = reduction - self.loss_weight = loss_weight - - def forward( - self, - pred, - target, - weight=None, - avg_factor=None, - reduction_override=None, - **kwargs - ): - if weight is not None and not torch.any(weight > 0): - return (pred * weight).sum() # 0 - assert reduction_override in (None, "none", "mean", "sum") - reduction = reduction_override if reduction_override else self.reduction - loss = self.loss_weight * iou_loss( - pred, - target, - weight, - eps=self.eps, - reduction=reduction, - avg_factor=avg_factor, - **kwargs - ) - return loss diff --git a/det3d/models/losses/losses.py b/det3d/models/losses/losses.py deleted file mode 100644 index 8adda51..0000000 --- a/det3d/models/losses/losses.py +++ /dev/null @@ -1,577 +0,0 @@ -"""Classification and regression loss functions for object detection. - -Localization losses: - * WeightedL2LocalizationLoss - * WeightedSmoothL1LocalizationLoss - -Classification losses: - * WeightedSigmoidClassificationLoss - * WeightedSoftmaxClassificationLoss - * BootstrappedSigmoidClassificationLoss -""" -from abc import ABCMeta, abstractmethod - -import numpy as np -import torch -import torch.nn as nn -import torch.nn.functional as F -from torch.autograd import Variable - -from ..registry import LOSSES -from .utils import weight_reduce_loss - - -def indices_to_dense_vector( - indices, size, indices_value=1.0, default_value=0, dtype=np.float32 -): - """Creates dense vector with indices set to specific value and rest to zeros. - - This function exists because it is unclear if it is safe to use - tf.sparse_to_dense(indices, [size], 1, validate_indices=False) - with indices which are not ordered. - This function accepts a dynamic size (e.g. tf.shape(tensor)[0]) - - Args: - indices: 1d Tensor with integer indices which are to be set to - indices_values. - size: scalar with size (integer) of output Tensor. - indices_value: values of elements specified by indices in the output vector - default_value: values of other elements in the output vector. - dtype: data type. - - Returns: - dense 1D Tensor of shape [size] with indices set to indices_values and the - rest set to default_value. 
- """ - dense = torch.zeros(size).fill_(default_value) - dense[indices] = indices_value - - return dense - - -class Loss(object): - """Abstract base class for loss functions.""" - - __metaclass__ = ABCMeta - - def __call__( - self, - prediction_tensor, - target_tensor, - ignore_nan_targets=False, - scope=None, - **params - ): - """Call the loss function. - - Args: - prediction_tensor: an N-d tensor of shape [batch, anchors, ...] - representing predicted quantities. - target_tensor: an N-d tensor of shape [batch, anchors, ...] representing - regression or classification targets. - ignore_nan_targets: whether to ignore nan targets in the loss computation. - E.g. can be used if the target tensor is missing groundtruth data that - shouldn't be factored into the loss. - scope: Op scope name. Defaults to 'Loss' if None. - **params: Additional keyword arguments for specific implementations of - the Loss. - - Returns: - loss: a tensor representing the value of the loss function. - """ - if ignore_nan_targets: - target_tensor = torch.where( - torch.isnan(target_tensor), prediction_tensor, target_tensor - ) - return self._compute_loss(prediction_tensor, target_tensor, **params) - - @abstractmethod - def _compute_loss(self, prediction_tensor, target_tensor, **params): - """Method to be overridden by implementations. - - Args: - prediction_tensor: a tensor representing predicted quantities - target_tensor: a tensor representing regression or classification targets - **params: Additional keyword arguments for specific implementations of - the Loss. - - Returns: - loss: an N-d tensor of shape [batch, anchors, ...] containing the loss per - anchor - """ - pass - - -@LOSSES.register_module -class WeightedL2LocalizationLoss(Loss): - """L2 localization loss function with anchorwise output support. - - Loss[b,a] = .5 * ||weights[b,a] * (prediction[b,a,:] - target[b,a,:])||^2 - """ - - def __init__(self, code_weights=None, device=None): - super().__init__() - if code_weights is not None: - self._code_weights = np.array(code_weights, dtype=np.float32) - self._code_weights = Variable( - torch.from_numpy(self._code_weights).to(device) - ) - else: - self._code_weights = None - - def _compute_loss(self, prediction_tensor, target_tensor, weights): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - code_size] representing the (encoded) predicted locations of objects. - target_tensor: A float tensor of shape [batch_size, num_anchors, - code_size] representing the regression targets - weights: a float tensor of shape [batch_size, num_anchors] - - Returns: - loss: a float tensor of shape [batch_size, num_anchors] tensor - representing the value of the loss function. - """ - diff = prediction_tensor - target_tensor - if self._code_weights is not None: - self._code_weights = self._code_weights.type_as(prediction_tensor) - self._code_weights = self._code_weights.view(1, 1, -1) - diff = self._code_weights * diff - weighted_diff = diff * weights.unsqueeze(-1) - square_diff = 0.5 * weighted_diff * weighted_diff - return square_diff.sum(2) - - -@LOSSES.register_module -class WeightedSmoothL1Loss(nn.Module): - """Smooth L1 localization loss function. - - The smooth L1_loss is defined elementwise as .5 x^2 if |x|<1 and |x|-.5 - otherwise, where x is the difference between predictions and target. 
- - See also Equation (3) in the Fast R-CNN paper by Ross Girshick (ICCV 2015) - """ - - def __init__( - self, - sigma=3.0, - reduction="mean", - code_weights=None, - codewise=True, - loss_weight=1.0, - raw_l1=False - ): - super(WeightedSmoothL1Loss, self).__init__() - - # just l1 loss - self.raw_l1 = raw_l1 - - print("Raw L1 Loss:", raw_l1) - - self._sigma = sigma - - if code_weights is not None: - self._code_weights = torch.tensor(code_weights, - dtype=torch.float32) - else: - self._code_weights = None - - # self._code_weights = None - - self._codewise = codewise - self._reduction = reduction - self._loss_weight = loss_weight - - def forward(self, prediction_tensor, target_tensor, weights=None): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - code_size] representing the (encoded) predicted locations of objects. - target_tensor: A float tensor of shape [batch_size, num_anchors, - code_size] representing the regression targets - weights: a float tensor of shape [batch_size, num_anchors] - - Returns: - loss: a float tensor of shape [batch_size, num_anchors] tensor - representing the value of the loss function. - """ - diff = prediction_tensor - target_tensor - if self._code_weights is not None: - # code_weights = self._code_weights.type_as(prediction_tensor).to(diff.device) - diff = self._code_weights.view(1, 1, -1).to(diff.device) * diff - abs_diff = torch.abs(diff) - - if not self.raw_l1: - abs_diff_lt_1 = torch.le(abs_diff, 1 / (self._sigma ** 2)).type_as(abs_diff) - loss = abs_diff_lt_1 * 0.5 * torch.pow(abs_diff * self._sigma, 2) + ( - abs_diff - 0.5 / (self._sigma ** 2) - ) * (1.0 - abs_diff_lt_1) - else: - loss = abs_diff - - if self._codewise: - anchorwise_smooth_l1norm = loss - if weights is not None: - anchorwise_smooth_l1norm *= weights.unsqueeze(-1) - else: - anchorwise_smooth_l1norm = torch.sum(loss, 2) # * weights - if weights is not None: - anchorwise_smooth_l1norm *= weights - - return anchorwise_smooth_l1norm - - -@LOSSES.register_module -class WeightedL1Loss(nn.Module): - """L1 localization loss function. - """ - - def __init__( - self, - reduction="mean", - code_weights=None, - codewise=True, - loss_weight=1.0, - ): - super(WeightedL1Loss, self).__init__() - - if code_weights is not None: - self._code_weights = torch.tensor(code_weights, - dtype=torch.float32) - else: - self._code_weights = None - - self._codewise = codewise - self._reduction = reduction - self._loss_weight = loss_weight - - def forward(self, prediction_tensor, target_tensor, weights=None): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - code_size] representing the (encoded) predicted locations of objects. - target_tensor: A float tensor of shape [batch_size, num_anchors, - code_size] representing the regression targets - weights: a float tensor of shape [batch_size, num_anchors] - - Returns: - loss: a float tensor of shape [batch_size, num_anchors] tensor - representing the value of the loss function. 
- """ - diff = prediction_tensor - target_tensor - if self._code_weights is not None: - diff = self._code_weights.view(1, 1, -1).to(diff.device) * diff - loss = torch.abs(diff) - - if self._codewise: - anchorwise_l1norm = loss - if weights is not None: - anchorwise_l1norm *= weights.unsqueeze(-1) - else: - anchorwise_l1norm = torch.sum(loss, 2) # * weights - if weights is not None: - anchorwise_l1norm *= weights - - return anchorwise_l1norm - -def _sigmoid_cross_entropy_with_logits(logits, labels): - # to be compatible with tensorflow, we don't use ignore_idx - loss = torch.clamp(logits, min=0) - logits * labels.type_as(logits) - loss += torch.log1p(torch.exp(-torch.abs(logits))) - # transpose_param = [0] + [param[-1]] + param[1:-1] - # logits = logits.permute(*transpose_param) - # loss_ftor = nn.NLLLoss(reduce=False) - # loss = loss_ftor(F.logsigmoid(logits), labels) - return loss - - -def _softmax_cross_entropy_with_logits(logits, labels): - param = list(range(len(logits.shape))) - transpose_param = [0] + [param[-1]] + param[1:-1] - logits = logits.permute(*transpose_param) # [N, ..., C] -> [N, C, ...] - loss_ftor = nn.CrossEntropyLoss(reduction="none") - loss = loss_ftor(logits, labels.max(dim=-1)[1]) - return loss - - -@LOSSES.register_module -class WeightedSigmoidClassificationLoss(Loss): - """Sigmoid cross entropy classification loss function.""" - - def _compute_loss( - self, prediction_tensor, target_tensor, weights, class_indices=None - ): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing the predicted logits for each class - target_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing one-hot encoded classification targets - weights: a float tensor of shape [batch_size, num_anchors] - class_indices: (Optional) A 1-D integer tensor of class indices. - If provided, computes loss only for the specified class indices. - - Returns: - loss: a float tensor of shape [batch_size, num_anchors, num_classes] - representing the value of the loss function. - """ - weights = weights.unsqueeze(-1) - if class_indices is not None: - weights *= ( - indices_to_dense_vector(class_indices, prediction_tensor.shape[2]) - .view(1, 1, -1) - .type_as(prediction_tensor) - ) - per_entry_cross_ent = _sigmoid_cross_entropy_with_logits( - labels=target_tensor, logits=prediction_tensor - ) - return per_entry_cross_ent * weights - - -@LOSSES.register_module -class SigmoidFocalLoss(nn.Module): - """Sigmoid focal cross entropy loss. - - Focal loss down-weights well classified examples and focusses on the hard - examples. See https://arxiv.org/pdf/1708.02002.pdf for the loss definition. - """ - - def __init__(self, gamma=2.0, alpha=0.25, reduction="mean", loss_weight=1.0): - """Constructor. - - Args: - gamma: exponent of the modulating factor (1 - p_t) ^ gamma. - alpha: optional alpha weighting factor to balance positives vs negatives. - all_zero_negative: bool. if True, will treat all zero as background. - else, will treat first label as background. only affect alpha. - """ - super(SigmoidFocalLoss, self).__init__() - self._alpha = alpha - self._gamma = gamma - self._reduction = reduction - self._loss_weight = loss_weight - - def forward( - self, prediction_tensor, target_tensor, weights=None, class_indices=None - ): - """Compute loss function. 
- - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing the predicted logits for each class - target_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing one-hot encoded classification targets - weights: a float tensor of shape [batch_size, num_anchors] - class_indices: (Optional) A 1-D integer tensor of class indices. - If provided, computes loss only for the specified class indices. - - Returns: - loss: a float tensor of shape [batch_size, num_anchors, num_classes] - representing the value of the loss function. - """ - weights = weights.unsqueeze(2) - if class_indices is not None: - weights *= ( - indices_to_dense_vector(class_indices, prediction_tensor.shape[2]) - .view(1, 1, -1) - .type_as(prediction_tensor) - ) - per_entry_cross_ent = _sigmoid_cross_entropy_with_logits( - labels=target_tensor, logits=prediction_tensor - ) - prediction_probabilities = torch.sigmoid(prediction_tensor) - p_t = (target_tensor * prediction_probabilities) + ( - (1 - target_tensor) * (1 - prediction_probabilities) - ) - modulating_factor = 1.0 - if self._gamma: - modulating_factor = torch.pow(1.0 - p_t, self._gamma) - alpha_weight_factor = 1.0 - if self._alpha is not None: - alpha_weight_factor = target_tensor * self._alpha + (1 - target_tensor) * ( - 1 - self._alpha - ) - - focal_cross_entropy_loss = ( - modulating_factor * alpha_weight_factor * per_entry_cross_ent - ) - return focal_cross_entropy_loss * weights - - -@LOSSES.register_module -class SoftmaxFocalClassificationLoss(Loss): - """Softmax focal cross entropy loss. - - Focal loss down-weights well classified examples and focusses on the hard - examples. See https://arxiv.org/pdf/1708.02002.pdf for the loss definition. - """ - - def __init__(self, gamma=2.0, alpha=0.25): - """Constructor. - - Args: - gamma: exponent of the modulating factor (1 - p_t) ^ gamma. - alpha: optional alpha weighting factor to balance positives vs negatives. - """ - self._alpha = alpha - self._gamma = gamma - - def _compute_loss( - self, prediction_tensor, target_tensor, weights, class_indices=None - ): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing the predicted logits for each class - target_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing one-hot encoded classification targets - weights: a float tensor of shape [batch_size, num_anchors] - class_indices: (Optional) A 1-D integer tensor of class indices. - If provided, computes loss only for the specified class indices. - - Returns: - loss: a float tensor of shape [batch_size, num_anchors, num_classes] - representing the value of the loss function. 
- """ - weights = weights.unsqueeze(2) - if class_indices is not None: - weights *= ( - indices_to_dense_vector(class_indices, prediction_tensor.shape[2]) - .view(1, 1, -1) - .type_as(prediction_tensor) - ) - per_entry_cross_ent = _softmax_cross_entropy_with_logits( - labels=target_tensor, logits=prediction_tensor - ) - # convert [N, num_anchors] to [N, num_anchors, num_classes] - per_entry_cross_ent = per_entry_cross_ent.unsqueeze(-1) * target_tensor - prediction_probabilities = F.softmax(prediction_tensor, dim=-1) - p_t = (target_tensor * prediction_probabilities) + ( - (1 - target_tensor) * (1 - prediction_probabilities) - ) - modulating_factor = 1.0 - if self._gamma: - modulating_factor = torch.pow(1.0 - p_t, self._gamma) - alpha_weight_factor = 1.0 - if self._alpha is not None: - alpha_weight_factor = torch.where( - target_tensor[..., 0] == 1, - torch.tensor(1 - self._alpha).type_as(per_entry_cross_ent), - torch.tensor(self._alpha).type_as(per_entry_cross_ent), - ) - focal_cross_entropy_loss = ( - modulating_factor * alpha_weight_factor * per_entry_cross_ent - ) - return focal_cross_entropy_loss * weights - - -@LOSSES.register_module -class WeightedSoftmaxClassificationLoss(nn.Module): - """Softmax loss function.""" - - def __init__(self, logit_scale=1.0, loss_weight=1.0, name=""): - """Constructor. - - Args: - logit_scale: When this value is high, the prediction is "diffused" and - when this value is low, the prediction is made peakier. - (default 1.0) - - """ - super(WeightedSoftmaxClassificationLoss, self).__init__() - self.name = name - self._loss_weight = loss_weight - self._logit_scale = logit_scale - - def forward(self, prediction_tensor, target_tensor, weights): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing the predicted logits for each class - target_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing one-hot encoded classification targets - weights: a float tensor of shape [batch_size, num_anchors] - - Returns: - loss: a float tensor of shape [batch_size, num_anchors] - representing the value of the loss function. - """ - num_classes = prediction_tensor.shape[-1] - prediction_tensor = torch.div(prediction_tensor, self._logit_scale) - per_row_cross_ent = _softmax_cross_entropy_with_logits( - labels=target_tensor.view(-1, num_classes), - logits=prediction_tensor.view(-1, num_classes), - ) - - return per_row_cross_ent.view(weights.shape) * weights - - -@LOSSES.register_module -class BootstrappedSigmoidClassificationLoss(Loss): - """Bootstrapped sigmoid cross entropy classification loss function. - - This loss uses a convex combination of training labels and the current model's - predictions as training targets in the classification loss. The idea is that - as the model improves over time, its predictions can be trusted more and we - can use these predictions to mitigate the damage of noisy/incorrect labels, - because incorrect labels are likely to be eventually highly inconsistent with - other stimuli predicted to have the same label by the model. - - In "soft" bootstrapping, we use all predicted class probabilities, whereas in - "hard" bootstrapping, we use the single class favored by the model. - - See also Training Deep Neural Networks On Noisy Labels with Bootstrapping by - Reed et al. (ICLR 2015). - """ - - def __init__(self, alpha, bootstrap_type="soft"): - """Constructor. 
- - Args: - alpha: a float32 scalar tensor between 0 and 1 representing interpolation - weight - bootstrap_type: set to either 'hard' or 'soft' (default) - - Raises: - ValueError: if bootstrap_type is not either 'hard' or 'soft' - """ - if bootstrap_type != "hard" and bootstrap_type != "soft": - raise ValueError( - "Unrecognized bootstrap_type: must be one of " "'hard' or 'soft.'" - ) - self._alpha = alpha - self._bootstrap_type = bootstrap_type - - def _compute_loss(self, prediction_tensor, target_tensor, weights): - """Compute loss function. - - Args: - prediction_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing the predicted logits for each class - target_tensor: A float tensor of shape [batch_size, num_anchors, - num_classes] representing one-hot encoded classification targets - weights: a float tensor of shape [batch_size, num_anchors] - - Returns: - loss: a float tensor of shape [batch_size, num_anchors, num_classes] - representing the value of the loss function. - """ - if self._bootstrap_type == "soft": - bootstrap_target_tensor = self._alpha * target_tensor + ( - 1.0 - self._alpha - ) * torch.sigmoid(prediction_tensor) - else: - bootstrap_target_tensor = ( - self._alpha * target_tensor - + (1.0 - self._alpha) * (torch.sigmoid(prediction_tensor) > 0.5).float() - ) - per_entry_cross_ent = _sigmoid_cross_entropy_with_logits( - labels=bootstrap_target_tensor, logits=prediction_tensor - ) - return per_entry_cross_ent * weights.unsqueeze(2) diff --git a/det3d/models/losses/metrics.py b/det3d/models/losses/metrics.py deleted file mode 100644 index ea002fb..0000000 --- a/det3d/models/losses/metrics.py +++ /dev/null @@ -1,284 +0,0 @@ -import numpy as np -import torch -import torch.nn.functional as F -from torch import nn - - -class Scalar(nn.Module): - def __init__(self): - super().__init__() - self.register_buffer("total", torch.FloatTensor([0.0])) - self.register_buffer("count", torch.FloatTensor([0.0])) - - def forward(self, scalar): - if not scalar.eq(0.0): - self.count += 1 - self.total += scalar.data.float() - return self.value.cpu() - - @property - def value(self): - return self.total / self.count - - def clear(self): - self.total.zero_() - self.count.zero_() - - -class Accuracy(nn.Module): - def __init__( - self, dim=1, ignore_idx=-1, threshold=0.5, encode_background_as_zeros=True - ): - super().__init__() - self.register_buffer("total", torch.FloatTensor([0.0])) - self.register_buffer("count", torch.FloatTensor([0.0])) - self._ignore_idx = ignore_idx - self._dim = dim - self._threshold = threshold - self._encode_background_as_zeros = encode_background_as_zeros - - def forward(self, labels, preds, weights=None): - # labels: [N, ...] - # preds: [N, C, ...] 
- if self._encode_background_as_zeros: - scores = torch.sigmoid(preds) - labels_pred = torch.max(preds, dim=self._dim)[1] + 1 - pred_labels = torch.where( - (scores > self._threshold).any(self._dim), - labels_pred, - torch.tensor(0).type_as(labels_pred), - ) - else: - pred_labels = torch.max(preds, dim=self._dim)[1] - N, *Ds = labels.shape - labels = labels.view(N, int(np.prod(Ds))) - pred_labels = pred_labels.view(N, int(np.prod(Ds))) - if weights is None: - weights = (labels != self._ignore_idx).float() - else: - weights = weights.float() - - num_examples = torch.sum(weights) - num_examples = torch.clamp(num_examples, min=1.0).float() - total = torch.sum((pred_labels == labels.long()).float()) - self.count += num_examples - self.total += total - return self.value.cpu() - # return (total / num_examples.data).cpu() - - @property - def value(self): - return self.total / self.count - - def clear(self): - self.total.zero_() - self.count.zero_() - - -class Precision(nn.Module): - def __init__(self, dim=1, ignore_idx=-1, threshold=0.5): - super().__init__() - self.register_buffer("total", torch.FloatTensor([0.0])) - self.register_buffer("count", torch.FloatTensor([0.0])) - self._ignore_idx = ignore_idx - self._dim = dim - self._threshold = threshold - - def forward(self, labels, preds, weights=None): - # labels: [N, ...] - # preds: [N, C, ...] - if preds.shape[self._dim] == 1: # BCE - pred_labels = ( - (torch.sigmoid(preds) > self._threshold).long().squeeze(self._dim) - ) - else: - assert preds.shape[self._dim] == 2, "precision only support 2 class" - pred_labels = torch.max(preds, dim=self._dim)[1] - N, *Ds = labels.shape - labels = labels.view(N, int(np.prod(Ds))) - pred_labels = pred_labels.view(N, int(np.prod(Ds))) - if weights is None: - weights = (labels != self._ignore_idx).float() - else: - weights = weights.float() - - pred_trues = pred_labels > 0 - pred_falses = pred_labels == 0 - trues = labels > 0 - falses = labels == 0 - true_positives = (weights * (trues & pred_trues).float()).sum() - true_negatives = (weights * (falses & pred_falses).float()).sum() - false_positives = (weights * (falses & pred_trues).float()).sum() - false_negatives = (weights * (trues & pred_falses).float()).sum() - count = true_positives + false_positives - # print(count, true_positives) - if count > 0: - self.count += count - self.total += true_positives - return self.value.cpu() - # return (total / num_examples.data).cpu() - - @property - def value(self): - return self.total / self.count - - def clear(self): - self.total.zero_() - self.count.zero_() - - -class Recall(nn.Module): - def __init__(self, dim=1, ignore_idx=-1, threshold=0.5): - super().__init__() - self.register_buffer("total", torch.FloatTensor([0.0])) - self.register_buffer("count", torch.FloatTensor([0.0])) - self._ignore_idx = ignore_idx - self._dim = dim - self._threshold = threshold - - def forward(self, labels, preds, weights=None): - # labels: [N, ...] - # preds: [N, C, ...] 
- if preds.shape[self._dim] == 1: # BCE - pred_labels = ( - (torch.sigmoid(preds) > self._threshold).long().squeeze(self._dim) - ) - else: - assert preds.shape[self._dim] == 2, "precision only support 2 class" - pred_labels = torch.max(preds, dim=self._dim)[1] - N, *Ds = labels.shape - labels = labels.view(N, int(np.prod(Ds))) - pred_labels = pred_labels.view(N, int(np.prod(Ds))) - if weights is None: - weights = (labels != self._ignore_idx).float() - else: - weights = weights.float() - pred_trues = pred_labels == 1 - pred_falses = pred_labels == 0 - trues = labels == 1 - falses = labels == 0 - true_positives = (weights * (trues & pred_trues).float()).sum() - true_negatives = (weights * (falses & pred_falses).float()).sum() - false_positives = (weights * (falses & pred_trues).float()).sum() - false_negatives = (weights * (trues & pred_falses).float()).sum() - count = true_positives + false_negatives - if count > 0: - self.count += count - self.total += true_positives - return self.value.cpu() - # return (total / num_examples.data).cpu() - - @property - def value(self): - return self.total / self.count - - def clear(self): - self.total.zero_() - self.count.zero_() - - -def _calc_binary_metrics(labels, scores, weights=None, ignore_idx=-1, threshold=0.5): - - pred_labels = (scores > threshold).long() - N, *Ds = labels.shape - labels = labels.view(N, int(np.prod(Ds))) - pred_labels = pred_labels.view(N, int(np.prod(Ds))) - pred_trues = pred_labels > 0 - pred_falses = pred_labels == 0 - trues = labels > 0 - falses = labels == 0 - true_positives = (weights * (trues & pred_trues).float()).sum() - true_negatives = (weights * (falses & pred_falses).float()).sum() - false_positives = (weights * (falses & pred_trues).float()).sum() - false_negatives = (weights * (trues & pred_falses).float()).sum() - return true_positives, true_negatives, false_positives, false_negatives - - -class PrecisionRecall(nn.Module): - def __init__( - self, - dim=1, - ignore_idx=-1, - thresholds=0.5, - use_sigmoid_score=False, - encode_background_as_zeros=True, - ): - super().__init__() - if not isinstance(thresholds, (list, tuple)): - thresholds = [thresholds] - - self.register_buffer("prec_total", torch.FloatTensor(len(thresholds)).zero_()) - self.register_buffer("prec_count", torch.FloatTensor(len(thresholds)).zero_()) - self.register_buffer("rec_total", torch.FloatTensor(len(thresholds)).zero_()) - self.register_buffer("rec_count", torch.FloatTensor(len(thresholds)).zero_()) - - self._ignore_idx = ignore_idx - self._dim = dim - self._thresholds = thresholds - self._use_sigmoid_score = use_sigmoid_score - self._encode_background_as_zeros = encode_background_as_zeros - - def forward(self, labels, preds, weights=None): - # labels: [N, ...] - # preds: [N, ..., C] - if self._encode_background_as_zeros: - # this don't support softmax - assert self._use_sigmoid_score is True - total_scores = torch.sigmoid(preds) - # scores, label_preds = torch.max(total_scores, dim=1) - else: - if self._use_sigmoid_score: - total_scores = torch.sigmoid(preds)[..., 1:] - else: - total_scores = F.softmax(preds, dim=-1)[..., 1:] - """ - if preds.shape[self._dim] == 1: # BCE - scores = torch.sigmoid(preds) - else: - # assert preds.shape[ - # self._dim] == 2, "precision only support 2 class" - # TODO: add support for [N, C, ...] format. 
- # TODO: add multiclass support - if self._use_sigmoid_score: - scores = torch.sigmoid(preds)[:, ..., 1:].sum(-1) - else: - scores = F.softmax(preds, dim=self._dim)[:, ..., 1:].sum(-1) - """ - scores = torch.max(total_scores, dim=-1)[0] - if weights is None: - weights = (labels != self._ignore_idx).float() - else: - weights = weights.float() - for i, thresh in enumerate(self._thresholds): - tp, tn, fp, fn = _calc_binary_metrics( - labels, scores, weights, self._ignore_idx, thresh - ) - rec_count = tp + fn - prec_count = tp + fp - if rec_count > 0: - self.rec_count[i] += rec_count - self.rec_total[i] += tp - if prec_count > 0: - self.prec_count[i] += prec_count - self.prec_total[i] += tp - - return self.value - - @property - def value(self): - prec_count = torch.clamp(self.prec_count, min=1.0) - rec_count = torch.clamp(self.rec_count, min=1.0) - return ( - (self.prec_total / prec_count).cpu(), - (self.rec_total / rec_count).cpu(), - ) - - @property - def thresholds(self): - return self._thresholds - - def clear(self): - self.rec_count.zero_() - self.prec_count.zero_() - self.prec_total.zero_() - self.rec_total.zero_() diff --git a/det3d/models/losses/mse_loss.py b/det3d/models/losses/mse_loss.py deleted file mode 100644 index c772d6e..0000000 --- a/det3d/models/losses/mse_loss.py +++ /dev/null @@ -1,21 +0,0 @@ -import torch.nn as nn -import torch.nn.functional as F - -from ..registry import LOSSES -from .utils import weighted_loss - -mse_loss = weighted_loss(F.mse_loss) - - -@LOSSES.register_module -class MSELoss(nn.Module): - def __init__(self, reduction="mean", loss_weight=1.0): - super().__init__() - self.reduction = reduction - self.loss_weight = loss_weight - - def forward(self, pred, target, weight=None, avg_factor=None): - loss = self.loss_weight * mse_loss( - pred, target, weight, reduction=self.reduction, avg_factor=avg_factor - ) - return loss diff --git a/det3d/models/losses/smooth_l1_loss.py b/det3d/models/losses/smooth_l1_loss.py deleted file mode 100644 index 13c94d3..0000000 --- a/det3d/models/losses/smooth_l1_loss.py +++ /dev/null @@ -1,45 +0,0 @@ -import torch -import torch.nn as nn - -from ..registry import LOSSES -from .utils import weighted_loss - - -@weighted_loss -def smooth_l1_loss(pred, target, beta=1.0): - assert beta > 0 - assert pred.size() == target.size() and target.numel() > 0 - diff = torch.abs(pred - target) - loss = torch.where(diff < beta, 0.5 * diff * diff / beta, diff - 0.5 * beta) - return loss - - -@LOSSES.register_module -class SmoothL1Loss(nn.Module): - def __init__(self, beta=1.0, reduction="mean", loss_weight=1.0): - super(SmoothL1Loss, self).__init__() - self.beta = beta - self.reduction = reduction - self.loss_weight = loss_weight - - def forward( - self, - pred, - target, - weight=None, - avg_factor=None, - reduction_override=None, - **kwargs - ): - assert reduction_override in (None, "none", "mean", "sum") - reduction = reduction_override if reduction_override else self.reduction - loss_bbox = self.loss_weight * smooth_l1_loss( - pred, - target, - weight, - beta=self.beta, - reduction=reduction, - avg_factor=avg_factor, - **kwargs - ) - return loss_bbox diff --git a/det3d/models/losses/utils.py b/det3d/models/losses/utils.py deleted file mode 100644 index dca5b7f..0000000 --- a/det3d/models/losses/utils.py +++ /dev/null @@ -1,83 +0,0 @@ -import functools - -import torch.nn.functional as F - - -def reduce_loss(loss, reduction): - """Reduce loss as specified. - Args: - loss (Tensor): Elementwise loss tensor. 
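
For reference, the removed `SmoothL1Loss` follows the usual piecewise definition, quadratic below `beta` and linear above it. A minimal standalone sketch of the element-wise term:

```python
import torch

def smooth_l1(pred, target, beta=1.0):
    # quadratic for |diff| < beta, linear beyond, continuous at the joint
    diff = torch.abs(pred - target)
    return torch.where(diff < beta, 0.5 * diff * diff / beta, diff - 0.5 * beta)

pred = torch.tensor([0.0, 2.0, 3.0])
target = torch.tensor([1.0, 1.0, 1.0])
print(smooth_l1(pred, target))  # tensor([0.5000, 0.5000, 1.5000])
```
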
- reduction (str): Options are "none", "mean" and "sum". - Return: - Tensor: Reduced loss tensor. - """ - reduction_enum = F._Reduction.get_enum(reduction) - # none: 0, elementwise_mean:1, sum: 2 - if reduction_enum == 0: - return loss - elif reduction_enum == 1: - return loss.mean() - elif reduction_enum == 2: - return loss.sum() - - -def weight_reduce_loss(loss, weight=None, reduction="mean", avg_factor=None): - """Apply element-wise weight and reduce loss. - Args: - loss (Tensor): Element-wise loss. - weight (Tensor): Element-wise weights. - reduction (str): Same as built-in losses of PyTorch. - avg_factor (float): Avarage factor when computing the mean of losses. - Returns: - Tensor: Processed loss values. - """ - # if weight is specified, apply element-wise weight - if weight is not None: - loss = loss * weight - - # if avg_factor is not specified, just reduce the loss - if avg_factor is None: - loss = reduce_loss(loss, reduction) - else: - # if reduction is mean, then average the loss by avg_factor - if reduction == "mean": - loss = loss.sum() / avg_factor - # if reduction is 'none', then do nothing, otherwise raise an error - elif reduction != "none": - raise ValueError('avg_factor can not be used with reduction="sum"') - return loss - - -def weighted_loss(loss_func): - """Create a weighted version of a given loss function. - To use this decorator, the loss function must have the signature like - `loss_func(pred, target, **kwargs)`. The function only needs to compute - element-wise loss without any reduction. This decorator will add weight - and reduction arguments to the function. The decorated function will have - the signature like `loss_func(pred, target, weight=None, reduction='mean', - avg_factor=None, **kwargs)`. - :Example: - >>> @weighted_loss - >>> def l1_loss(pred, target): - >>> return (pred - target).abs() - >>> pred = torch.Tensor([0, 2, 3]) - >>> target = torch.Tensor([1, 1, 1]) - >>> weight = torch.Tensor([1, 0, 1]) - >>> l1_loss(pred, target) - tensor(1.3333) - >>> l1_loss(pred, target, weight) - tensor(1.) 
- >>> l1_loss(pred, target, reduction='none') - tensor([1., 1., 2.]) - >>> l1_loss(pred, target, weight, avg_factor=2) - tensor(1.5000) - """ - - @functools.wraps(loss_func) - def wrapper(pred, target, weight=None, reduction="mean", avg_factor=None, **kwargs): - # get element-wise loss - loss = loss_func(pred, target, **kwargs) - loss = weight_reduce_loss(loss, weight, reduction, avg_factor) - return loss - - return wrapper diff --git a/det3d/models/necks/__init__.py b/det3d/models/necks/__init__.py index dade221..1a1db7e 100644 --- a/det3d/models/necks/__init__.py +++ b/det3d/models/necks/__init__.py @@ -1,4 +1,3 @@ -from .fpn import FPN -from .rpn import RPN, PointModule +from .rpn import RPN -__all__ = ["RPN", "PointModule", "FPN"] \ No newline at end of file +__all__ = ["RPN"] diff --git a/det3d/models/necks/fpn.py b/det3d/models/necks/fpn.py deleted file mode 100644 index 88357ef..0000000 --- a/det3d/models/necks/fpn.py +++ /dev/null @@ -1,144 +0,0 @@ -import torch.nn as nn -import torch.nn.functional as F -from det3d.core import auto_fp16 -from det3d.torchie.cnn import xavier_init - -from ..registry import NECKS -from ..utils import ConvModule - - -@NECKS.register_module -class FPN(nn.Module): - def __init__( - self, - in_channels, - out_channels, - num_outs, - start_level=0, - end_level=-1, - add_extra_convs=False, - extra_convs_on_inputs=True, - relu_before_extra_convs=False, - no_norm_on_lateral=False, - conv_cfg=None, - norm_cfg=None, - activation=None, - ): - super(FPN, self).__init__() - assert isinstance(in_channels, list) - self.in_channels = in_channels - self.out_channels = out_channels - self.num_ins = len(in_channels) - self.num_outs = num_outs - self.activation = activation - self.relu_before_extra_convs = relu_before_extra_convs - self.no_norm_on_lateral = no_norm_on_lateral - self.fp16_enabled = False - - if end_level == -1: - self.backbone_end_level = self.num_ins - assert num_outs >= self.num_ins - start_level - else: - # if end_level < inputs, no extra level is allowed - self.backbone_end_level = end_level - assert end_level <= len(in_channels) - assert num_outs == end_level - start_level - self.start_level = start_level - self.end_level = end_level - self.add_extra_convs = add_extra_convs - self.extra_convs_on_inputs = extra_convs_on_inputs - - self.lateral_convs = nn.ModuleList() - self.fpn_convs = nn.ModuleList() - - for i in range(self.start_level, self.backbone_end_level): - l_conv = ConvModule( - in_channels[i], - out_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg if not self.no_norm_on_lateral else None, - activation=self.activation, - inplace=False, - ) - fpn_conv = ConvModule( - out_channels, - out_channels, - 3, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - activation=self.activation, - inplace=False, - ) - - self.lateral_convs.append(l_conv) - self.fpn_convs.append(fpn_conv) - - # add extra conv layers (e.g., RetinaNet) - extra_levels = num_outs - self.backbone_end_level + self.start_level - if add_extra_convs and extra_levels >= 1: - for i in range(extra_levels): - if i == 0 and self.extra_convs_on_inputs: - in_channels = self.in_channels[self.backbone_end_level - 1] - else: - in_channels = out_channels - extra_fpn_conv = ConvModule( - in_channels, - out_channels, - 3, - stride=2, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - activation=self.activation, - inplace=False, - ) - self.fpn_convs.append(extra_fpn_conv) - - # default init_weights for conv(msra) and norm in ConvModule - def init_weights(self): - for m in 
self.modules(): - if isinstance(m, nn.Conv2d): - xavier_init(m, distribution="uniform") - - @auto_fp16() - def forward(self, inputs): - assert len(inputs) == len(self.in_channels) - - # build laterals - laterals = [ - lateral_conv(inputs[i + self.start_level]) - for i, lateral_conv in enumerate(self.lateral_convs) - ] - - # build top-down path - used_backbone_levels = len(laterals) - for i in range(used_backbone_levels - 1, 0, -1): - laterals[i - 1] += F.interpolate( - laterals[i], scale_factor=2, mode="nearest" - ) - - # build outputs - # part 1: from original levels - outs = [self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)] - # part 2: add extra levels - if self.num_outs > len(outs): - # use max pool to get more levels on top of outputs - # (e.g., Faster R-CNN, Mask R-CNN) - if not self.add_extra_convs: - for i in range(self.num_outs - used_backbone_levels): - outs.append(F.max_pool2d(outs[-1], 1, stride=2)) - # add conv layers on top of original feature maps (RetinaNet) - else: - if self.extra_convs_on_inputs: - orig = inputs[self.backbone_end_level - 1] - outs.append(self.fpn_convs[used_backbone_levels](orig)) - else: - outs.append(self.fpn_convs[used_backbone_levels](outs[-1])) - for i in range(used_backbone_levels + 1, self.num_outs): - if self.relu_before_extra_convs: - outs.append(self.fpn_convs[i](F.relu(outs[-1]))) - else: - outs.append(self.fpn_convs[i](outs[-1])) - return tuple(outs) diff --git a/det3d/models/necks/rpn.py b/det3d/models/necks/rpn.py index 9dd7511..8d48939 100644 --- a/det3d/models/necks/rpn.py +++ b/det3d/models/necks/rpn.py @@ -158,43 +158,3 @@ def forward(self, x): return x - -@NECKS.register_module -class PointModule(nn.Module): - def __init__( - self, - num_input_features, - layers=[1024, 128,], - norm_cfg=None, - name="rpn", - logger=None, - **kwargs - ): - super(PointModule, self).__init__() - - if norm_cfg is None: - norm_cfg = dict(type="BN", eps=1e-3, momentum=0.01) - self._norm_cfg = norm_cfg - - blocks = [ - nn.Conv2d(num_input_features, layers[0], 1, bias=False), - build_norm_layer(self._norm_cfg, layers[0])[1], - nn.ReLU(), - nn.Conv2d(layers[0], layers[1], 1, bias=False), - build_norm_layer(self._norm_cfg, layers[1])[1], - nn.ReLU(), - ] - self.pn = nn.ModuleList(blocks) - self.out = nn.MaxPool1d(3, stride=1, padding=1) - - def forward(self, x): - x = x.flatten(1, -1) - x = x.view(x.shape[0], x.shape[1], 1, 1) - - for l in self.pn: - x = l(x) - - x = x.view(x.shape[0], 1, x.shape[1]) - x = self.out(x).view(x.shape[0], x.shape[2], 1, 1) - - return x diff --git a/det3d/models/readers/__init__.py b/det3d/models/readers/__init__.py index e3a8ce8..b1187f3 100644 --- a/det3d/models/readers/__init__.py +++ b/det3d/models/readers/__init__.py @@ -1,10 +1,8 @@ from .pillar_encoder import PillarFeatureNet, PointPillarsScatter -from .voxel_encoder import SimpleVoxel, VFEV3_ablation, VoxelFeatureExtractorV3 +from .voxel_encoder import VoxelFeatureExtractorV3 __all__ = [ "VoxelFeatureExtractorV3", - "SimpleVoxel", "PillarFeatureNet", "PointPillarsScatter", - "VFEV3_ablation", ] diff --git a/det3d/models/readers/cropped_voxel_encoder.py b/det3d/models/readers/cropped_voxel_encoder.py deleted file mode 100644 index 4edbd05..0000000 --- a/det3d/models/readers/cropped_voxel_encoder.py +++ /dev/null @@ -1,204 +0,0 @@ -from collections import defaultdict - -import numpy as np -import torch -from det3d.core.bbox.box_np_ops import points_in_rbbox, riou_cc, rotation_3d_in_axis -from det3d.core.input.voxel_generator import VoxelGenerator -from 
det3d.torchie.trainer.utils import get_dist_info - - -def crop2assign( - example, predictions, cfg=None, device=torch.device("cpu"), training=True -): - rank, world_size = get_dist_info() - # STEP 2: crop enlarged point clouds; organise as new batch' - # 2.1 Prepare batch targets, filter invalid target - batch_targets = [] - # 2.1 Prepare batch data - batch_data = defaultdict(list) - - if training: - gt_boxes3d = defaultdict(list) - - for idx, sample in enumerate(example["annos"]): - for branch in sample["gt_boxes"]: - mask = np.ma.masked_equal(branch, 0).mask - if np.any(mask): - gt_boxes3d[idx].extend(branch[np.where(mask.sum(axis=1) == 0)]) - else: - gt_boxes3d[idx].extend(branch) - - # Batch size - cnt = 0 - for idx, dt_boxes3d in enumerate(predictions): - sample_points = ( - example["points"][example["points"][:, 0] == idx][:, 1:].cpu().numpy() - ) - if sample_points[:, -1].min() < 0: - sample_points[:, -1] += 0.5 - - if training: - gp = example["ground_plane"][idx] - - sample_gt_boxes3d = np.array(gt_boxes3d[idx]) - if not sample_gt_boxes3d.shape[0] == 0: - gt_bevs = sample_gt_boxes3d[:, [0, 1, 3, 4, -1]] - else: - gt_bevs = np.zeros((0, 5)) - - sample_dt_boxed3d = dt_boxes3d["box3d_lidar"].cpu().numpy() - - if training: - dt_bevs = sample_dt_boxed3d[:, [0, 1, 3, 4, -1]] - - gp_height = cfg.anchor.center - cfg.anchor.height / 2 - - # Find max match gt - ious = riou_cc(gt_bevs, dt_bevs) - dt_max_matched_gt = ious.argmax(axis=0) - max_ious = ious.max(axis=0) - - selected = np.where(max_ious >= 0.7) - - max_ious = max_ious[selected] - dt_max_matched_gt = dt_max_matched_gt[selected] - # remove fp - sample_dt_boxed3d = sample_dt_boxed3d[selected] - - # enlarge box - sample_dt_boxed3d[:, [3, 4]] += cfg.roi_context - - indices = points_in_rbbox(sample_points, sample_dt_boxed3d) - num_points_in_gt = indices.sum(0) - # remove empty dt boxes - selected_by_points = np.where(num_points_in_gt > 0) - - boxes = sample_dt_boxed3d[selected_by_points] - indices = indices.transpose()[selected_by_points].transpose() - - if training: - dt_max_matched_gt = dt_max_matched_gt[selected_by_points] - cnt += len(boxes) - - # voxel_generators to form fixed_size batch input - fixed_size = cfg.dense_shape # w, l, h - - for i, box in enumerate(boxes): - - if training: - batch_data["ground_plane"].append(gp) - - # 1. generate regression targets - matched_gt = sample_gt_boxes3d[dt_max_matched_gt[i]] - - height_a = cfg.anchor.height - z_center_a = cfg.anchor.center - # z_g = matched_gt[2] + matched_gt[5]/2. - (-0.14) # z top - z_g = matched_gt[2] - z_center_a - h_g = matched_gt[5] - height_a - g_g = (matched_gt[2] - matched_gt[5] / 2) - (float(gp)) - dt_target = np.array([idx, i, z_g, h_g, g_g]) - - batch_targets.append(dt_target) - - if not training: - batch_data["stage_one_output_boxes"].append(box) - - # 2. 
prepare data - - box_points = sample_points[indices[:, i]].copy() - # img = kitti_vis(box_points, np.array(box).reshape(1, -1)) - # img.tofile(open("./bev.bin", "wb")) - - # move to center - box_center = box[:2] - box_points[:, :2] -= box_center - # rotate to canonical - box_yaw = box[-1:] - box_points_canonical = rotation_3d_in_axis( - box_points[:, :3][np.newaxis, ...], -box_yaw, axis=2 - )[0] - box_points_canonical = np.hstack((box_points_canonical, box_points[:, -1:])) - - if box_points_canonical.shape[0] > 0: - point_cloud_range = [ - -box[3] / 2, - -box[4] / 2, - -3.5, - box[3] / 2, - box[4] / 2, - 1.5, - ] - else: - import pdb - - pdb.set_trace() - - voxel_size = [ - (point_cloud_range[3] - point_cloud_range[0]) / fixed_size[0], - (point_cloud_range[4] - point_cloud_range[1]) / fixed_size[1], - (point_cloud_range[5] - point_cloud_range[2]) / fixed_size[2], - ] - for vs in voxel_size: - if not vs > 0: - import pdb - - pdb.set_trace() - - vg = VoxelGenerator( - voxel_size=voxel_size, - point_cloud_range=point_cloud_range, - max_num_points=20, - max_voxels=2000, - ) - - voxels, coordinates, num_points = vg.generate( - box_points_canonical, max_voxels=2000 - ) - batch_data["voxels"].append(voxels) - batch_data["coordinates"].append(coordinates) - batch_data["num_points"].append(num_points) - - if training: - batch_targets = torch.tensor( - np.array(batch_targets), dtype=torch.float32, device=device - ) - batch_data["targets"] = batch_targets - - batch_data["ground_plane"] = torch.tensor( - batch_data["ground_plane"], dtype=torch.float32, device=device - ) - - for k, v in batch_data.items(): - if k in ["voxels", "num_points"]: - batch_data[k] = np.concatenate(v, axis=0) - elif k in ["stage_one_output_boxes"]: - batch_data[k] = np.array(v) - elif k in ["coordinates"]: - coors = [] - for i, coor in enumerate(v): - coor_pad = np.pad( - coor, ((0, 0), (1, 0)), mode="constant", constant_values=i - ) - coors.append(coor_pad) - batch_data[k] = np.concatenate(coors, axis=0) - - for k, v in batch_data.items(): - if k in ["coordinates", "num_points"]: - batch_data[k] = torch.tensor(v, dtype=torch.int32, device=device) - elif k in ["voxels", "stage_one_output_boxes"]: - batch_data[k] = torch.tensor(v, dtype=torch.float32, device=device) - - if training: - if ( - not cnt - == (batch_data["coordinates"][:, 0].max() + 1) - == batch_data["targets"].shape[0] - ): - import pdb - - pdb.set_trace() - - batch_data["shape"] = fixed_size - - return batch_data diff --git a/det3d/models/readers/pillar_encoder.py b/det3d/models/readers/pillar_encoder.py index 9ef676a..d155035 100644 --- a/det3d/models/readers/pillar_encoder.py +++ b/det3d/models/readers/pillar_encoder.py @@ -5,11 +5,9 @@ """ import torch -from det3d.models.utils import Empty, change_default_args, get_paddings_indicator +from det3d.models.utils import get_paddings_indicator from torch import nn from torch.nn import functional as F - -from .. import builder from ..registry import BACKBONES, READERS from ..utils import build_norm_layer diff --git a/det3d/models/readers/voxel_encoder.py b/det3d/models/readers/voxel_encoder.py index 94b5dd3..b889314 100644 --- a/det3d/models/readers/voxel_encoder.py +++ b/det3d/models/readers/voxel_encoder.py @@ -1,198 +1,9 @@ -import time - -import numpy as np -import torch -from det3d.models.utils import Empty, change_default_args, get_paddings_indicator from torch import nn from torch.nn import functional as F -from .. 
import builder from ..registry import READERS -@READERS.register_module -class VFELayer(nn.Module): - def __init__(self, in_channels, out_channels, use_norm=True, name="vfe"): - super(VFELayer, self).__init__() - self.name = name - self.units = int(out_channels / 2) - if use_norm: - BatchNorm1d = change_default_args(eps=1e-3, momentum=0.01)(nn.BatchNorm1d) - Linear = change_default_args(bias=False)(nn.Linear) - else: - BatchNorm1d = Empty - Linear = change_default_args(bias=True)(nn.Linear) - self.linear = Linear(in_channels, self.units) - self.norm = BatchNorm1d(self.units) - - def forward(self, inputs): - # [K, T, 7] tensordot [7, units] = [K, T, units] - voxel_count = inputs.shape[1] - x = self.linear(inputs) - x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2, 1).contiguous() - pointwise = F.relu(x) - # [K, T, units] - - aggregated = torch.max(pointwise, dim=1, keepdim=True)[0] - # [K, 1, units] - repeated = aggregated.repeat(1, voxel_count, 1) - - concatenated = torch.cat([pointwise, repeated], dim=2) - # [K, T, 2 * units] - return concatenated - - -@READERS.register_module -class VoxelFeatureExtractor(nn.Module): - def __init__( - self, - num_input_features=4, - use_norm=True, - num_filters=[32, 128], - with_distance=False, - voxel_size=(0.2, 0.2, 4), - name="VoxelFeatureExtractor", - ): - super(VoxelFeatureExtractor, self).__init__() - self.name = name - if use_norm: - BatchNorm1d = change_default_args(eps=1e-3, momentum=0.01)(nn.BatchNorm1d) - Linear = change_default_args(bias=False)(nn.Linear) - else: - BatchNorm1d = Empty - Linear = change_default_args(bias=True)(nn.Linear) - assert len(num_filters) == 2 - num_input_features += 3 # add mean features - if with_distance: - num_input_features += 1 - self._with_distance = with_distance - self.vfe1 = VFELayer(num_input_features, num_filters[0], use_norm) - self.vfe2 = VFELayer(num_filters[0], num_filters[1], use_norm) - self.linear = Linear(num_filters[1], num_filters[1]) - # var_torch_init(self.linear.weight) - # var_torch_init(self.linear.bias) - self.norm = BatchNorm1d(num_filters[1]) - - def forward(self, features, num_voxels, coors): - # features: [concated_num_points, num_voxel_size, 3(4)] - # num_voxels: [concated_num_points] - # t = time.time() - # torch.cuda.synchronize() - - points_mean = features[:, :, :3].sum(dim=1, keepdim=True) / num_voxels.type_as( - features - ).view(-1, 1, 1) - features_relative = features[:, :, :3] - points_mean - if self._with_distance: - points_dist = torch.norm(features[:, :, :3], 2, 2, keepdim=True) - features = torch.cat([features, features_relative, points_dist], dim=-1) - else: - features = torch.cat([features, features_relative], dim=-1) - voxel_count = features.shape[1] - mask = get_paddings_indicator(num_voxels, voxel_count, axis=0) - mask = torch.unsqueeze(mask, -1).type_as(features) - # mask = features.max(dim=2, keepdim=True)[0] != 0 - - # torch.cuda.synchronize() - # print("vfe prep forward time", time.time() - t) - x = self.vfe1(features) - x *= mask - x = self.vfe2(x) - x *= mask - x = self.linear(x) - x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2, 1).contiguous() - x = F.relu(x) - x *= mask - # x: [concated_num_points, num_voxel_size, 128] - voxelwise = torch.max(x, dim=1)[0] - return voxelwise - - -@READERS.register_module -class VoxelFeatureExtractorV2(nn.Module): - def __init__( - self, - num_input_features=4, - use_norm=True, - num_filters=[32, 128], - with_distance=False, - voxel_size=(0.2, 0.2, 4), - name="VoxelFeatureExtractor", - ): - 
super(VoxelFeatureExtractorV2, self).__init__() - self.name = name - if use_norm: - BatchNorm1d = change_default_args(eps=1e-3, momentum=0.01)(nn.BatchNorm1d) - Linear = change_default_args(bias=False)(nn.Linear) - else: - BatchNorm1d = Empty - Linear = change_default_args(bias=True)(nn.Linear) - assert len(num_filters) > 0 - num_input_features += 3 - if with_distance: - num_input_features += 1 - self._with_distance = with_distance - - num_filters = [num_input_features] + num_filters - filters_pairs = [ - [num_filters[i], num_filters[i + 1]] for i in range(len(num_filters) - 1) - ] - self.vfe_layers = nn.ModuleList( - [VFELayer(i, o, use_norm) for i, o in filters_pairs] - ) - self.linear = Linear(num_filters[-1], num_filters[-1]) - # var_torch_init(self.linear.weight) - # var_torch_init(self.linear.bias) - self.norm = BatchNorm1d(num_filters[-1]) - - def forward(self, features, num_voxels, coors): - # features: [concated_num_points, num_voxel_size, 3(4)] - # num_voxels: [concated_num_points] - points_mean = features[:, :, :3].sum(dim=1, keepdim=True) / num_voxels.type_as( - features - ).view(-1, 1, 1) - features_relative = features[:, :, :3] - points_mean - if self._with_distance: - points_dist = torch.norm(features[:, :, :3], 2, 2, keepdim=True) - features = torch.cat([features, features_relative, points_dist], dim=-1) - else: - features = torch.cat([features, features_relative], dim=-1) - voxel_count = features.shape[1] - mask = get_paddings_indicator(num_voxels, voxel_count, axis=0) - mask = torch.unsqueeze(mask, -1).type_as(features) - for vfe in self.vfe_layers: - features = vfe(features) - features *= mask - features = self.linear(features) - features = ( - self.norm(features.permute(0, 2, 1).contiguous()) - .permute(0, 2, 1) - .contiguous() - ) - features = F.relu(features) - features *= mask - # x: [concated_num_points, num_voxel_size, 128] - voxelwise = torch.max(features, dim=1)[0] - return voxelwise - - -@READERS.register_module -class VFEV3_ablation(nn.Module): - def __init__(self, num_input_features=4, norm_cfg=None, name="VFEV3_ablation"): - super(VFEV3_ablation, self).__init__() - self.name = name - self.num_input_features = num_input_features - - def forward(self, features, num_voxels, coors=None): - points_mean = features[:, :, [0, 1, 3]].sum( - dim=1, keepdim=False - ) / num_voxels.type_as(features).view(-1, 1) - points_mean = torch.cat( - [points_mean, 1.0 / num_voxels.to(torch.float32).view(-1, 1)], dim=1 - ) - - return points_mean.contiguous() - @READERS.register_module class VoxelFeatureExtractorV3(nn.Module): @@ -204,32 +15,10 @@ def __init__( self.num_input_features = num_input_features def forward(self, features, num_voxels, coors=None): + assert self.num_input_features == features.shape[-1] + points_mean = features[:, :, : self.num_input_features].sum( dim=1, keepdim=False ) / num_voxels.type_as(features).view(-1, 1) return points_mean.contiguous() - - -@READERS.register_module -class SimpleVoxel(nn.Module): - """Simple voxel encoder. only keep r, z and reflection feature. 
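
The surviving `VoxelFeatureExtractorV3` (now guarded by the added shape assert) simply averages the points inside each voxel. A minimal sketch of that reduction, assuming `features` is `(num_voxels, max_points_per_voxel, C)` with zero padding and a per-voxel point count:

```python
import torch

def mean_vfe(features, num_points_per_voxel):
    # features: (num_voxels, max_points_per_voxel, C), zero-padded
    # num_points_per_voxel: (num_voxels,) count of real points in each voxel
    points_sum = features.sum(dim=1, keepdim=False)  # (num_voxels, C)
    return (points_sum / num_points_per_voxel.type_as(features).view(-1, 1)).contiguous()

feats = torch.zeros(2, 3, 4)
feats[0, :2] = torch.tensor([[1., 1., 0., 0.5], [3., 1., 2., 0.5]])
feats[1, :1] = torch.tensor([[2., 2., 2., 1.0]])
counts = torch.tensor([2, 1])
print(mean_vfe(feats, counts))  # [[2., 1., 1., 0.5], [2., 2., 2., 1.0]]
```
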
- """ - - def __init__(self, num_input_features=4, norm_cfg=None, name="SimpleVoxel"): - - super(SimpleVoxel, self).__init__() - - self.num_input_features = num_input_features - self.name = name - - def forward(self, features, num_voxels, coors=None): - # features: [concated_num_points, num_voxel_size, 3(4)] - # num_voxels: [concated_num_points] - points_mean = features[:, :, : self.num_input_features].sum( - dim=1, keepdim=False - ) / num_voxels.type_as(features).view(-1, 1) - feature = torch.norm(points_mean[:, :2], p=2, dim=1, keepdim=True) - # z is important for z position regression, but x, y is not. - res = torch.cat([feature, points_mean[:, 2 : self.num_input_features]], dim=1) - return res diff --git a/det3d/models/registry.py b/det3d/models/registry.py index 3423046..de7c71e 100644 --- a/det3d/models/registry.py +++ b/det3d/models/registry.py @@ -3,8 +3,8 @@ READERS = Registry("reader") BACKBONES = Registry("backbone") NECKS = Registry("neck") -ROI_EXTRACTORS = Registry("roi_extractor") -SHARED_HEADS = Registry("shared_head") HEADS = Registry("head") LOSSES = Registry("loss") DETECTORS = Registry("detector") +SECOND_STAGE = Registry("second_stage") +ROI_HEAD = Registry("roi_head") \ No newline at end of file diff --git a/det3d/models/roi_heads/__init__.py b/det3d/models/roi_heads/__init__.py new file mode 100644 index 0000000..31856f0 --- /dev/null +++ b/det3d/models/roi_heads/__init__.py @@ -0,0 +1,7 @@ +from .roi_head_template import RoIHeadTemplate +from .roi_head import RoIHead + +__all__ = [ + 'RoIHeadTemplate', + 'RoIHead' +] diff --git a/det3d/models/roi_heads/roi_head.py b/det3d/models/roi_heads/roi_head.py new file mode 100644 index 0000000..9ed6bc7 --- /dev/null +++ b/det3d/models/roi_heads/roi_head.py @@ -0,0 +1,106 @@ +# ------------------------------------------------------------------------------ +# Portions of this code are from +# OpenPCDet (https://github.com/open-mmlab/OpenPCDet) +# Licensed under the Apache License. 
+# ------------------------------------------------------------------------------ + +from torch import batch_norm +import torch.nn as nn + +from .roi_head_template import RoIHeadTemplate + +from det3d.core import box_torch_ops + +from ..registry import ROI_HEAD + +@ROI_HEAD.register_module +class RoIHead(RoIHeadTemplate): + def __init__(self, input_channels, model_cfg, num_class=1, code_size=7, test_cfg=None): + super().__init__(num_class=num_class, model_cfg=model_cfg) + self.model_cfg = model_cfg + self.test_cfg = test_cfg + self.code_size = code_size + + pre_channel = input_channels + + shared_fc_list = [] + for k in range(0, self.model_cfg.SHARED_FC.__len__()): + shared_fc_list.extend([ + nn.Conv1d(pre_channel, self.model_cfg.SHARED_FC[k], kernel_size=1, bias=False), + nn.BatchNorm1d(self.model_cfg.SHARED_FC[k]), + nn.ReLU() + ]) + pre_channel = self.model_cfg.SHARED_FC[k] + + if k != self.model_cfg.SHARED_FC.__len__() - 1 and self.model_cfg.DP_RATIO > 0: + shared_fc_list.append(nn.Dropout(self.model_cfg.DP_RATIO)) + + self.shared_fc_layer = nn.Sequential(*shared_fc_list) + + self.cls_layers = self.make_fc_layers( + input_channels=pre_channel, output_channels=self.num_class, fc_list=self.model_cfg.CLS_FC + ) + self.reg_layers = self.make_fc_layers( + input_channels=pre_channel, + output_channels=code_size, + fc_list=self.model_cfg.REG_FC + ) + self.init_weights(weight_init='xavier') + + def init_weights(self, weight_init='xavier'): + if weight_init == 'kaiming': + init_func = nn.init.kaiming_normal_ + elif weight_init == 'xavier': + init_func = nn.init.xavier_normal_ + elif weight_init == 'normal': + init_func = nn.init.normal_ + else: + raise NotImplementedError + + for m in self.modules(): + if isinstance(m, nn.Conv2d) or isinstance(m, nn.Conv1d): + if weight_init == 'normal': + init_func(m.weight, mean=0, std=0.001) + else: + init_func(m.weight) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + nn.init.normal_(self.reg_layers[-1].weight, mean=0, std=0.001) + + def forward(self, batch_dict, training=True): + """ + :param input_data: input dict + :return: + """ + batch_dict['batch_size'] = len(batch_dict['rois']) + if training: + targets_dict = self.assign_targets(batch_dict) + batch_dict['rois'] = targets_dict['rois'] + batch_dict['roi_labels'] = targets_dict['roi_labels'] + batch_dict['roi_features'] = targets_dict['roi_features'] + + # RoI aware pooling + pooled_features = batch_dict['roi_features'].reshape(-1, 1, + batch_dict['roi_features'].shape[-1]).contiguous() # (BxN, 1, C) + + batch_size_rcnn = pooled_features.shape[0] + pooled_features = pooled_features.permute(0, 2, 1).contiguous() # (BxN, C, 1) + + shared_features = self.shared_fc_layer(pooled_features.view(batch_size_rcnn, -1, 1)) + rcnn_cls = self.cls_layers(shared_features).transpose(1, 2).contiguous().squeeze(dim=1) # (B, 1 or 2) + rcnn_reg = self.reg_layers(shared_features).transpose(1, 2).contiguous().squeeze(dim=1) # (B, C) + + if not training: + batch_cls_preds, batch_box_preds = self.generate_predicted_boxes( + batch_size=batch_dict['batch_size'], rois=batch_dict['rois'], cls_preds=rcnn_cls, box_preds=rcnn_reg + ) + batch_dict['batch_cls_preds'] = batch_cls_preds + batch_dict['batch_box_preds'] = batch_box_preds + batch_dict['cls_preds_normalized'] = False + else: + targets_dict['rcnn_cls'] = rcnn_cls + targets_dict['rcnn_reg'] = rcnn_reg + + self.forward_ret_dict = targets_dict + + return batch_dict \ No newline at end of file diff --git a/det3d/models/roi_heads/roi_head_template.py 
b/det3d/models/roi_heads/roi_head_template.py new file mode 100644 index 0000000..a9ea185 --- /dev/null +++ b/det3d/models/roi_heads/roi_head_template.py @@ -0,0 +1,183 @@ +# ------------------------------------------------------------------------------ +# Portions of this code are from +# OpenPCDet (https://github.com/open-mmlab/OpenPCDet) +# Licensed under the Apache License. +# ------------------------------------------------------------------------------ + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from det3d.core.bbox import box_torch_ops +from .target_assigner.proposal_target_layer import ProposalTargetLayer + +def limit_period(val, offset=0.5, period=np.pi): + return val - torch.floor(val / period + offset) * period + + +class RoIHeadTemplate(nn.Module): + def __init__(self, num_class, model_cfg): + super().__init__() + self.model_cfg = model_cfg + self.num_class = num_class + self.proposal_target_layer = ProposalTargetLayer(roi_sampler_cfg=self.model_cfg.TARGET_CONFIG) + + self.forward_ret_dict = None + + def make_fc_layers(self, input_channels, output_channels, fc_list): + fc_layers = [] + pre_channel = input_channels + for k in range(0, fc_list.__len__()): + fc_layers.extend([ + nn.Conv1d(pre_channel, fc_list[k], kernel_size=1, bias=False), + nn.BatchNorm1d(fc_list[k]), + nn.ReLU() + ]) + pre_channel = fc_list[k] + if self.model_cfg.DP_RATIO >= 0 and k == 0: + fc_layers.append(nn.Dropout(self.model_cfg.DP_RATIO)) + fc_layers.append(nn.Conv1d(pre_channel, output_channels, kernel_size=1, bias=True)) + fc_layers = nn.Sequential(*fc_layers) + return fc_layers + + def assign_targets(self, batch_dict): + batch_size = batch_dict['batch_size'] + with torch.no_grad(): + targets_dict = self.proposal_target_layer.forward(batch_dict) + + rois = targets_dict['rois'] # (B, N, 7 + C) + gt_of_rois = targets_dict['gt_of_rois'] # (B, N, 7 + C + 1) + targets_dict['gt_of_rois_src'] = gt_of_rois.clone().detach() + + roi_ry = limit_period(rois[:, :, 6], offset=0.5, period=np.pi*2) + + gt_of_rois[:, :, :6] = gt_of_rois[:, :, :6] - rois[:, :, :6] + gt_of_rois[:, :, 6] = gt_of_rois[:, :, 6] - roi_ry + + gt_of_rois = box_torch_ops.rotate_points_along_z( + points=gt_of_rois.view(-1, 1, gt_of_rois.shape[-1]), angle=-roi_ry.view(-1) + ).view(batch_size, -1, gt_of_rois.shape[-1]) + + if rois.shape[-1] == 9: + # rotate velocity + gt_of_rois[:, :, 7:-1] = gt_of_rois[:, :, 7:-1] - rois[:, :, 7:] + + """ + roi_vel = gt_of_rois[:, :, 7:-1] + roi_vel = torch.cat([roi_vel, torch.zeros([roi_vel.shape[0], roi_vel.shape[1], 1]).to(roi_vel)], dim=-1) + + gt_of_rois[:, :, 7:-1] = box_torch_ops.rotate_points_along_z( + points=roi_vel.view(-1, 1, 3), angle=-roi_ry.view(-1) + ).view(batch_size, -1, 3)[..., :2] + """ + + # flip orientation if rois have opposite orientation + heading_label = gt_of_rois[:, :, 6] % (2 * np.pi) # 0 ~ 2pi + opposite_flag = (heading_label > np.pi * 0.5) & (heading_label < np.pi * 1.5) + heading_label[opposite_flag] = (heading_label[opposite_flag] + np.pi) % (2 * np.pi) # (0 ~ pi/2, 3pi/2 ~ 2pi) + flag = heading_label > np.pi + heading_label[flag] = heading_label[flag] - np.pi * 2 # (-pi/2, pi/2) + heading_label = torch.clamp(heading_label, min=-np.pi / 2, max=np.pi / 2) + + gt_of_rois[:, :, 6] = heading_label + + + targets_dict['gt_of_rois'] = gt_of_rois + return targets_dict + + def get_box_reg_layer_loss(self, forward_ret_dict): + loss_cfgs = self.model_cfg.LOSS_CONFIG + code_size = forward_ret_dict['rcnn_reg'].shape[-1] + reg_valid_mask = 
forward_ret_dict['reg_valid_mask'].view(-1) + gt_boxes3d_ct = forward_ret_dict['gt_of_rois'][..., 0:code_size] + rcnn_reg = forward_ret_dict['rcnn_reg'] # (rcnn_batch_size, C) + rcnn_batch_size = gt_boxes3d_ct.view(-1, code_size).shape[0] + + fg_mask = (reg_valid_mask > 0) + fg_sum = fg_mask.long().sum().item() + + tb_dict = {} + + if loss_cfgs.REG_LOSS == 'L1': + reg_targets = gt_boxes3d_ct.view(rcnn_batch_size, -1) + rcnn_loss_reg = F.l1_loss( + rcnn_reg.view(rcnn_batch_size, -1), + reg_targets, + reduction='none' + ) # [B, M, 7] + + rcnn_loss_reg = rcnn_loss_reg * rcnn_loss_reg.new_tensor(\ + loss_cfgs.LOSS_WEIGHTS['code_weights']) + + rcnn_loss_reg = (rcnn_loss_reg.view(rcnn_batch_size, -1) * fg_mask.unsqueeze(dim=-1).float()).sum() / max(fg_sum, 1) + rcnn_loss_reg = rcnn_loss_reg * loss_cfgs.LOSS_WEIGHTS['rcnn_reg_weight'] + tb_dict['rcnn_loss_reg'] = rcnn_loss_reg.detach() + else: + raise NotImplementedError + + return rcnn_loss_reg, tb_dict + + def get_box_cls_layer_loss(self, forward_ret_dict): + loss_cfgs = self.model_cfg.LOSS_CONFIG + rcnn_cls = forward_ret_dict['rcnn_cls'] + rcnn_cls_labels = forward_ret_dict['rcnn_cls_labels'].view(-1) + if loss_cfgs.CLS_LOSS == 'BinaryCrossEntropy': + rcnn_cls_flat = rcnn_cls.view(-1) + batch_loss_cls = F.binary_cross_entropy(torch.sigmoid(rcnn_cls_flat), rcnn_cls_labels.float(), reduction='none') + cls_valid_mask = (rcnn_cls_labels >= 0).float() + rcnn_loss_cls = (batch_loss_cls * cls_valid_mask).sum() / torch.clamp(cls_valid_mask.sum(), min=1.0) + elif loss_cfgs.CLS_LOSS == 'CrossEntropy': + batch_loss_cls = F.cross_entropy(rcnn_cls, rcnn_cls_labels, reduction='none', ignore_index=-1) + cls_valid_mask = (rcnn_cls_labels >= 0).float() + rcnn_loss_cls = (batch_loss_cls * cls_valid_mask).sum() / torch.clamp(cls_valid_mask.sum(), min=1.0) + else: + raise NotImplementedError + + rcnn_loss_cls = rcnn_loss_cls * loss_cfgs.LOSS_WEIGHTS['rcnn_cls_weight'] + tb_dict = {'rcnn_loss_cls': rcnn_loss_cls.detach()} + return rcnn_loss_cls, tb_dict + + def get_loss(self, tb_dict=None): + tb_dict = {} if tb_dict is None else tb_dict + rcnn_loss = 0 + rcnn_loss_cls, cls_tb_dict = self.get_box_cls_layer_loss(self.forward_ret_dict) + rcnn_loss += rcnn_loss_cls + tb_dict.update(cls_tb_dict) + + rcnn_loss_reg, reg_tb_dict = self.get_box_reg_layer_loss(self.forward_ret_dict) + rcnn_loss += rcnn_loss_reg + tb_dict.update(reg_tb_dict) + tb_dict['rcnn_loss'] = rcnn_loss.item() + return rcnn_loss, tb_dict + + def generate_predicted_boxes(self, batch_size, rois, cls_preds, box_preds): + """ + Args: + batch_size: + rois: (B, N, 7) + cls_preds: (BN, num_class) + box_preds: (BN, code_size) + + Returns: + + """ + code_size = box_preds.shape[-1] + # batch_cls_preds: (B, N, num_class or 1) + batch_cls_preds = cls_preds.view(batch_size, -1, cls_preds.shape[-1]) + batch_box_preds = box_preds.view(batch_size, -1, code_size) + + roi_ry = rois[:, :, 6].view(-1) + roi_xyz = rois[:, :, 0:3].view(-1, 3) + + local_rois = rois.clone().detach() + local_rois[:, :, 0:3] = 0 + + batch_box_preds = (batch_box_preds + local_rois).view(-1, code_size) + batch_box_preds = box_torch_ops.rotate_points_along_z( + batch_box_preds.unsqueeze(dim=1), roi_ry + ).squeeze(dim=1) + + batch_box_preds[:, 0:3] += roi_xyz + batch_box_preds = batch_box_preds.view(batch_size, -1, code_size) + + return batch_cls_preds, batch_box_preds diff --git a/det3d/models/roi_heads/target_assigner/proposal_target_layer.py b/det3d/models/roi_heads/target_assigner/proposal_target_layer.py new file mode 100644 index 
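
In `assign_targets` above, the ground-truth yaw is expressed relative to the ROI and then folded into roughly (-pi/2, pi/2), flipping boxes that point the opposite way. A standalone sketch of that normalization step:

```python
import numpy as np
import torch

def normalize_relative_heading(rel_yaw):
    # rel_yaw: ground-truth yaw minus ROI yaw, any range
    h = rel_yaw % (2 * np.pi)                          # -> [0, 2*pi)
    opposite = (h > np.pi * 0.5) & (h < np.pi * 1.5)
    h[opposite] = (h[opposite] + np.pi) % (2 * np.pi)  # flip boxes facing the other way
    h[h > np.pi] -= 2 * np.pi                          # map back around zero
    return torch.clamp(h, min=-np.pi / 2, max=np.pi / 2)

print(normalize_relative_heading(torch.tensor([0.2, 3.0, -3.0])))
# tensor([ 0.2000, -0.1416,  0.1416])
```
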
0000000..460f3ed --- /dev/null +++ b/det3d/models/roi_heads/target_assigner/proposal_target_layer.py @@ -0,0 +1,244 @@ +# ------------------------------------------------------------------------------ +# Portions of this code are from +# OpenPCDet (https://github.com/open-mmlab/OpenPCDet) +# Licensed under the Apache License. +# ------------------------------------------------------------------------------ + +import numpy as np +import torch +import torch.nn as nn + +from ....ops.iou3d_nms.iou3d_nms_utils import boxes_iou3d_gpu + + +class ProposalTargetLayer(nn.Module): + def __init__(self, roi_sampler_cfg): + super().__init__() + self.roi_sampler_cfg = roi_sampler_cfg + + def forward(self, batch_dict): + """ + Args: + batch_dict: + batch_size: + rois: (B, num_rois, 7 + C) + roi_scores: (B, num_rois) + gt_boxes: (B, N, 7 + C + 1) + roi_labels: (B, num_rois) + Returns: + batch_dict: + rois: (B, M, 7 + C) + gt_of_rois: (B, M, 7 + C) + gt_iou_of_rois: (B, M) + roi_scores: (B, M) + roi_labels: (B, M) + reg_valid_mask: (B, M) + rcnn_cls_labels: (B, M) + """ + batch_rois, batch_gt_of_rois, batch_roi_ious, batch_roi_scores, batch_roi_labels, \ + batch_roi_features = self.sample_rois_for_rcnn( + batch_dict=batch_dict + ) + # regression valid mask + reg_valid_mask = (batch_roi_ious > self.roi_sampler_cfg.REG_FG_THRESH).long() + + # classification label + if self.roi_sampler_cfg.CLS_SCORE_TYPE == 'cls': + batch_cls_labels = (batch_roi_ious > self.roi_sampler_cfg.CLS_FG_THRESH).long() + ignore_mask = (batch_roi_ious > self.roi_sampler_cfg.CLS_BG_THRESH) & \ + (batch_roi_ious < self.roi_sampler_cfg.CLS_FG_THRESH) + batch_cls_labels[ignore_mask > 0] = -1 + elif self.roi_sampler_cfg.CLS_SCORE_TYPE == 'roi_iou': + # padding_mask = (torch.isclose(batch_rois.sum(dim=-1), batch_rois.new_zeros(1))) + + iou_bg_thresh = self.roi_sampler_cfg.CLS_BG_THRESH + iou_fg_thresh = self.roi_sampler_cfg.CLS_FG_THRESH + fg_mask = batch_roi_ious > iou_fg_thresh + bg_mask = batch_roi_ious < iou_bg_thresh + interval_mask = (fg_mask == 0) & (bg_mask == 0) + + batch_cls_labels = (fg_mask > 0).float() + batch_cls_labels[interval_mask] = \ + (batch_roi_ious[interval_mask] - iou_bg_thresh) / (iou_fg_thresh - iou_bg_thresh) + # batch_cls_labels[padding_mask > 0] = -1 + else: + raise NotImplementedError + + targets_dict = {'rois': batch_rois, 'gt_of_rois': batch_gt_of_rois, 'gt_iou_of_rois': batch_roi_ious, + 'roi_scores': batch_roi_scores, 'roi_labels': batch_roi_labels, + 'roi_features': batch_roi_features, 'reg_valid_mask': reg_valid_mask, + 'rcnn_cls_labels': batch_cls_labels} + + return targets_dict + + def sample_rois_for_rcnn(self, batch_dict): + """ + Args: + batch_dict: + batch_size: + rois: (B, num_rois, 7 + C) + roi_scores: (B, num_rois) + gt_boxes: (B, N, 7 + C + 1) + roi_labels: (B, num_rois) + Returns: + + """ + batch_size = batch_dict['batch_size'] + rois = batch_dict['rois'] + roi_scores = batch_dict['roi_scores'] + roi_labels = batch_dict['roi_labels'] + gt_boxes = batch_dict['gt_boxes_and_cls'] + roi_features = batch_dict['roi_features'] + + code_size = rois.shape[-1] + batch_rois = rois.new_zeros(batch_size, self.roi_sampler_cfg.ROI_PER_IMAGE, code_size) + batch_gt_of_rois = rois.new_zeros(batch_size, self.roi_sampler_cfg.ROI_PER_IMAGE, code_size + 1) + batch_roi_ious = rois.new_zeros(batch_size, self.roi_sampler_cfg.ROI_PER_IMAGE) + batch_roi_scores = rois.new_zeros(batch_size, self.roi_sampler_cfg.ROI_PER_IMAGE) + batch_roi_labels = rois.new_zeros((batch_size, self.roi_sampler_cfg.ROI_PER_IMAGE), 
dtype=torch.long) + batch_roi_features = roi_features.new_zeros(batch_size, self.roi_sampler_cfg.ROI_PER_IMAGE, + roi_features.shape[-1]) + + for index in range(batch_size): + cur_roi, cur_gt, cur_roi_labels, cur_roi_scores, cur_roi_features = \ + rois[index], gt_boxes[index], roi_labels[index], roi_scores[index], \ + roi_features[index] + + k = cur_gt.__len__() - 1 + while k > 0 and cur_gt[k].sum() == 0: + k -= 1 + cur_gt = cur_gt[:k + 1] + cur_gt = cur_gt.new_zeros((1, cur_gt.shape[1])) if len(cur_gt) == 0 else cur_gt + + if self.roi_sampler_cfg.get('SAMPLE_ROI_BY_EACH_CLASS', False): + max_overlaps, gt_assignment = self.get_max_iou_with_same_class( + rois=cur_roi[:, :7], roi_labels=cur_roi_labels, + gt_boxes=cur_gt[:, 0:7], gt_labels=cur_gt[:, -1].long() + ) + else: + iou3d = boxes_iou3d_gpu(cur_roi, cur_gt[:, 0:7]) # (M, N) + max_overlaps, gt_assignment = torch.max(iou3d, dim=1) + + sampled_inds = self.subsample_rois(max_overlaps=max_overlaps) + + batch_rois[index] = cur_roi[sampled_inds] + batch_roi_labels[index] = cur_roi_labels[sampled_inds] + batch_roi_ious[index] = max_overlaps[sampled_inds] + batch_roi_scores[index] = cur_roi_scores[sampled_inds] + batch_gt_of_rois[index] = cur_gt[gt_assignment[sampled_inds]] + batch_roi_features[index] = cur_roi_features[sampled_inds] + + return batch_rois, batch_gt_of_rois, batch_roi_ious, batch_roi_scores, batch_roi_labels, batch_roi_features + + def subsample_rois(self, max_overlaps): + # sample fg, easy_bg, hard_bg + fg_rois_per_image = int(np.round(self.roi_sampler_cfg.FG_RATIO * self.roi_sampler_cfg.ROI_PER_IMAGE)) + fg_thresh = min(self.roi_sampler_cfg.REG_FG_THRESH, self.roi_sampler_cfg.CLS_FG_THRESH) + + fg_inds = ((max_overlaps >= fg_thresh)).nonzero().view(-1) + easy_bg_inds = ((max_overlaps < self.roi_sampler_cfg.CLS_BG_THRESH_LO)).nonzero().view(-1) + hard_bg_inds = ((max_overlaps < self.roi_sampler_cfg.REG_FG_THRESH) & + (max_overlaps >= self.roi_sampler_cfg.CLS_BG_THRESH_LO)).nonzero().view(-1) + + fg_num_rois = fg_inds.numel() + bg_num_rois = hard_bg_inds.numel() + easy_bg_inds.numel() + + if fg_num_rois > 0 and bg_num_rois > 0: + # sampling fg + fg_rois_per_this_image = min(fg_rois_per_image, fg_num_rois) + + rand_num = torch.from_numpy(np.random.permutation(fg_num_rois)).type_as(max_overlaps).long() + fg_inds = fg_inds[rand_num[:fg_rois_per_this_image]] + + # sampling bg + bg_rois_per_this_image = self.roi_sampler_cfg.ROI_PER_IMAGE - fg_rois_per_this_image + bg_inds = self.sample_bg_inds( + hard_bg_inds, easy_bg_inds, bg_rois_per_this_image, self.roi_sampler_cfg.HARD_BG_RATIO + ) + + elif fg_num_rois > 0 and bg_num_rois == 0: + # sampling fg + rand_num = np.floor(np.random.rand(self.roi_sampler_cfg.ROI_PER_IMAGE) * fg_num_rois) + rand_num = torch.from_numpy(rand_num).type_as(max_overlaps).long() + fg_inds = fg_inds[rand_num] + bg_inds = [] + + elif bg_num_rois > 0 and fg_num_rois == 0: + # sampling bg + bg_rois_per_this_image = self.roi_sampler_cfg.ROI_PER_IMAGE + bg_inds = self.sample_bg_inds( + hard_bg_inds, easy_bg_inds, bg_rois_per_this_image, self.roi_sampler_cfg.HARD_BG_RATIO + ) + else: + print('maxoverlaps:(min=%f, max=%f)' % (max_overlaps.min().item(), max_overlaps.max().item())) + print('ERROR: FG=%d, BG=%d' % (fg_num_rois, bg_num_rois)) + raise NotImplementedError + + sampled_inds = torch.cat((fg_inds, bg_inds), dim=0) + return sampled_inds + + @staticmethod + def sample_bg_inds(hard_bg_inds, easy_bg_inds, bg_rois_per_this_image, hard_bg_ratio): + if hard_bg_inds.numel() > 0 and easy_bg_inds.numel() > 0: + 
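
`subsample_rois` above keeps a fixed per-frame budget of ROIs, filling roughly `FG_RATIO` of it with foreground proposals (IoU above the foreground threshold) and the remainder with background; the real layer additionally splits background into hard and easy bins, which this toy sketch omits. Config values below are illustrative only:

```python
import numpy as np
import torch

ROI_PER_IMAGE, FG_RATIO, FG_THRESH = 128, 0.5, 0.55   # illustrative values

max_overlaps = torch.rand(500)                  # best IoU of each proposal with any GT
fg_inds = (max_overlaps >= FG_THRESH).nonzero().view(-1)
bg_inds = (max_overlaps < FG_THRESH).nonzero().view(-1)

fg_budget = int(np.round(FG_RATIO * ROI_PER_IMAGE))
fg_take = min(fg_budget, fg_inds.numel())       # cannot take more foreground than exists
bg_take = ROI_PER_IMAGE - fg_take               # background fills the remainder

fg_pick = fg_inds[torch.randperm(fg_inds.numel())[:fg_take]]
bg_pick = bg_inds[torch.randperm(bg_inds.numel())[:bg_take]]
sampled = torch.cat([fg_pick, bg_pick])
print(fg_take, bg_take, sampled.shape)          # e.g. 64 64 torch.Size([128])
```
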
hard_bg_rois_num = min(int(bg_rois_per_this_image * hard_bg_ratio), len(hard_bg_inds)) + easy_bg_rois_num = bg_rois_per_this_image - hard_bg_rois_num + + # sampling hard bg + rand_idx = torch.randint(low=0, high=hard_bg_inds.numel(), size=(hard_bg_rois_num,)).long() + hard_bg_inds = hard_bg_inds[rand_idx] + + # sampling easy bg + rand_idx = torch.randint(low=0, high=easy_bg_inds.numel(), size=(easy_bg_rois_num,)).long() + easy_bg_inds = easy_bg_inds[rand_idx] + + bg_inds = torch.cat([hard_bg_inds, easy_bg_inds], dim=0) + elif hard_bg_inds.numel() > 0 and easy_bg_inds.numel() == 0: + hard_bg_rois_num = bg_rois_per_this_image + # sampling hard bg + rand_idx = torch.randint(low=0, high=hard_bg_inds.numel(), size=(hard_bg_rois_num,)).long() + bg_inds = hard_bg_inds[rand_idx] + elif hard_bg_inds.numel() == 0 and easy_bg_inds.numel() > 0: + easy_bg_rois_num = bg_rois_per_this_image + # sampling easy bg + rand_idx = torch.randint(low=0, high=easy_bg_inds.numel(), size=(easy_bg_rois_num,)).long() + bg_inds = easy_bg_inds[rand_idx] + else: + raise NotImplementedError + + return bg_inds + + @staticmethod + def get_max_iou_with_same_class(rois, roi_labels, gt_boxes, gt_labels): + """ + Args: + rois: (N, 7) + roi_labels: (N) + gt_boxes: (N, ) + gt_labels: + + Returns: + + """ + """ + :param rois: (N, 7) + :param roi_labels: (N) + :param gt_boxes: (N, 8) + :return: + """ + max_overlaps = rois.new_zeros(rois.shape[0]) + gt_assignment = roi_labels.new_zeros(roi_labels.shape[0]) + + for k in range(gt_labels.min().item(), gt_labels.max().item() + 1): + roi_mask = (roi_labels == k) + gt_mask = (gt_labels == k) + if roi_mask.sum() > 0 and gt_mask.sum() > 0: + cur_roi = rois[roi_mask] + cur_gt = gt_boxes[gt_mask] + original_gt_assignment = gt_mask.nonzero().view(-1) + + iou3d = boxes_iou3d_gpu(cur_roi, cur_gt) # (M, N) + cur_max_overlaps, cur_gt_assignment = torch.max(iou3d, dim=1) + max_overlaps[roi_mask] = cur_max_overlaps + gt_assignment[roi_mask] = original_gt_assignment[cur_gt_assignment] + + return max_overlaps, gt_assignment diff --git a/det3d/models/second_stage/__init__.py b/det3d/models/second_stage/__init__.py new file mode 100644 index 0000000..d5db279 --- /dev/null +++ b/det3d/models/second_stage/__init__.py @@ -0,0 +1 @@ +from .bird_eye_view import BEVFeatureExtractor diff --git a/det3d/models/second_stage/bird_eye_view.py b/det3d/models/second_stage/bird_eye_view.py new file mode 100644 index 0000000..3cbff6d --- /dev/null +++ b/det3d/models/second_stage/bird_eye_view.py @@ -0,0 +1,41 @@ +import torch +from torch import nn + +from ..registry import SECOND_STAGE +from det3d.core.utils.center_utils import ( + bilinear_interpolate_torch, +) + +@SECOND_STAGE.register_module +class BEVFeatureExtractor(nn.Module): + def __init__(self, pc_start, + voxel_size, out_stride): + super().__init__() + self.pc_start = pc_start + self.voxel_size = voxel_size + self.out_stride = out_stride + + def absl_to_relative(self, absolute): + a1 = (absolute[..., 0] - self.pc_start[0]) / self.voxel_size[0] / self.out_stride + a2 = (absolute[..., 1] - self.pc_start[1]) / self.voxel_size[1] / self.out_stride + + return a1, a2 + + def forward(self, example, batch_centers, num_point): + batch_size = len(example['bev_feature']) + ret_maps = [] + + for batch_idx in range(batch_size): + xs, ys = self.absl_to_relative(batch_centers[batch_idx]) + + # N x C + feature_map = bilinear_interpolate_torch(example['bev_feature'][batch_idx], + xs, ys) + + if num_point > 1: + section_size = len(feature_map) // num_point + feature_map = 
torch.cat([feature_map[i*section_size: (i+1)*section_size] for i in range(num_point)], dim=1) + + ret_maps.append(feature_map) + + return ret_maps \ No newline at end of file diff --git a/det3d/models/utils/finetune_utils.py b/det3d/models/utils/finetune_utils.py new file mode 100644 index 0000000..e06cad8 --- /dev/null +++ b/det3d/models/utils/finetune_utils.py @@ -0,0 +1,111 @@ +import torch +import torch.distributed as dist +from torch import nn +from torch.autograd.function import Function +from torch.nn import functional as F +import logging + +class FrozenBatchNorm2d(nn.Module): + """ + BatchNorm2d where the batch statistics and the affine parameters are fixed. + It contains non-trainable buffers called + "weight" and "bias", "running_mean", "running_var", + initialized to perform identity transformation. + The pre-trained backbone models from Caffe2 only contain "weight" and "bias", + which are computed from the original four parameters of BN. + The affine transform `x * weight + bias` will perform the equivalent + computation of `(x - running_mean) / sqrt(running_var) * weight + bias`. + When loading a backbone model from Caffe2, "running_mean" and "running_var" + will be left unchanged as identity transformation. + Other pre-trained backbone models may contain all 4 parameters. + The forward is implemented by `F.batch_norm(..., training=False)`. + """ + + _version = 3 + + def __init__(self, num_features, eps=1e-5): + super().__init__() + self.num_features = num_features + self.eps = eps + self.register_buffer("weight", torch.ones(num_features)) + self.register_buffer("bias", torch.zeros(num_features)) + self.register_buffer("running_mean", torch.zeros(num_features)) + self.register_buffer("running_var", torch.ones(num_features) - eps) + + def forward(self, x): + if x.requires_grad: + # When gradients are needed, F.batch_norm will use extra memory + # because its backward op computes gradients for weight/bias as well. + scale = self.weight * (self.running_var + self.eps).rsqrt() + bias = self.bias - self.running_mean * scale + scale = scale.reshape(1, -1, 1, 1) + bias = bias.reshape(1, -1, 1, 1) + return x * scale + bias + else: + # When gradients are not needed, F.batch_norm is a single fused op + # and provide more optimization opportunities. + return F.batch_norm( + x, + self.running_mean, + self.running_var, + self.weight, + self.bias, + training=False, + eps=self.eps, + ) + + def _load_from_state_dict( + self, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs + ): + version = local_metadata.get("version", None) + + if version is None or version < 2: + # No running_mean/var in early versions + # This will silent the warnings + if prefix + "running_mean" not in state_dict: + state_dict[prefix + "running_mean"] = torch.zeros_like(self.running_mean) + if prefix + "running_var" not in state_dict: + state_dict[prefix + "running_var"] = torch.ones_like(self.running_var) + + if version is not None and version < 3: + logger = logging.getLogger(__name__) + logger.info("FrozenBatchNorm {} is upgraded to version 3.".format(prefix.rstrip("."))) + # In version < 3, running_var are used without +eps. 
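
`BEVFeatureExtractor` above converts an absolute box center into fractional BEV-map coordinates before bilinear interpolation: subtract the point-cloud range origin, divide by the voxel size, then divide by the network's output stride. A quick arithmetic check with illustrative values (a +-54 m range, 0.075 m voxels, stride 8; not necessarily the shipped configs):

```python
def absl_to_relative(x, y, pc_start=(-54.0, -54.0), voxel_size=(0.075, 0.075), out_stride=8):
    # absolute metres -> (possibly fractional) position on the stride-8 BEV feature map
    a1 = (x - pc_start[0]) / voxel_size[0] / out_stride
    a2 = (y - pc_start[1]) / voxel_size[1] / out_stride
    return a1, a2

print(absl_to_relative(0.0, 0.0))    # (90.0, 90.0): the origin sits mid-map (1440 / 8 = 180 cells)
print(absl_to_relative(10.5, -3.0))  # (107.5, 85.0)
```
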
+ state_dict[prefix + "running_var"] -= self.eps + + super()._load_from_state_dict( + state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs + ) + + def __repr__(self): + return "FrozenBatchNorm2d(num_features={}, eps={})".format(self.num_features, self.eps) + + @classmethod + def convert_frozen_batchnorm(cls, module): + """ + Convert BatchNorm/SyncBatchNorm in module into FrozenBatchNorm. + Args: + module (torch.nn.Module): + Returns: + If module is BatchNorm/SyncBatchNorm, returns a new module. + Otherwise, in-place convert module and return it. + Similar to convert_sync_batchnorm in + https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py + """ + bn_module = nn.modules.batchnorm + bn_module = (bn_module.BatchNorm2d, bn_module.SyncBatchNorm) + res = module + if isinstance(module, bn_module): + res = cls(module.num_features) + if module.affine: + res.weight.data = module.weight.data.clone().detach() + res.bias.data = module.bias.data.clone().detach() + res.running_mean.data = module.running_mean.data + res.running_var.data = module.running_var.data + res.eps = module.eps + else: + for name, child in module.named_children(): + new_child = cls.convert_frozen_batchnorm(child) + if new_child is not child: + res.add_module(name, new_child) + return res \ No newline at end of file diff --git a/det3d/ops/iou3d_nms/__init__.py b/det3d/ops/iou3d_nms/__init__.py index 8e0ce48..c267f07 100644 --- a/det3d/ops/iou3d_nms/__init__.py +++ b/det3d/ops/iou3d_nms/__init__.py @@ -1 +1 @@ -from det3d.ops.iou3d_nms import iou3d_nms_cuda +from det3d.ops.iou3d_nms import iou3d_nms_cuda, iou3d_nms_utils diff --git a/det3d/ops/iou3d_nms/iou3d_nms_utils.py b/det3d/ops/iou3d_nms/iou3d_nms_utils.py new file mode 100644 index 0000000..4d71e33 --- /dev/null +++ b/det3d/ops/iou3d_nms/iou3d_nms_utils.py @@ -0,0 +1,107 @@ +""" +3D IoU Calculation and Rotated NMS +Written by Shaoshuai Shi +All Rights Reserved 2019-2020. +""" +import torch + +from . 
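
The `FrozenBatchNorm2d.convert_frozen_batchnorm` helper added above swaps every `BatchNorm2d`/`SyncBatchNorm` in a module for a frozen copy, the usual trick when fine-tuning a second stage without perturbing first-stage statistics. A small usage sketch against a throwaway model (the import path assumes the file location added above):

```python
import torch
from torch import nn
from det3d.models.utils.finetune_utils import FrozenBatchNorm2d  # path per the diff above

backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
frozen = FrozenBatchNorm2d.convert_frozen_batchnorm(backbone)
print(frozen[1])                       # FrozenBatchNorm2d(num_features=16, eps=1e-05)

x = torch.randn(2, 3, 8, 8)
assert frozen(x).shape == (2, 16, 8, 8)
```
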
import iou3d_nms_cuda +import numpy as np + + + +def boxes_iou_bev(boxes_a, boxes_b): + """ + Args: + boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + boxes_b: (N, 7) [x, y, z, dx, dy, dz, heading] + + Returns: + ans_iou: (N, M) + """ + assert boxes_a.shape[1] == boxes_b.shape[1] == 7 + ans_iou = torch.cuda.FloatTensor(torch.Size((boxes_a.shape[0], boxes_b.shape[0]))).zero_() + + iou3d_nms_cuda.boxes_iou_bev_gpu(boxes_a.contiguous(), boxes_b.contiguous(), ans_iou) + + return ans_iou + +def to_pcdet(boxes): + # transform back to pcdet's coordinate + boxes = boxes[:, [0, 1, 2, 4, 3, 5, -1]] + boxes[:, -1] = -boxes[:, -1] - np.pi/2 + return boxes + +def boxes_iou3d_gpu(boxes_a, boxes_b): + """ + Args: + boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + boxes_b: (N, 7) [x, y, z, dx, dy, dz, heading] + + Returns: + ans_iou: (N, M) + """ + assert boxes_a.shape[1] == boxes_b.shape[1] == 7 + + # transform back to pcdet's coordinate + boxes_a = to_pcdet(boxes_a) + boxes_b = to_pcdet(boxes_b) + + # height overlap + boxes_a_height_max = (boxes_a[:, 2] + boxes_a[:, 5] / 2).view(-1, 1) + boxes_a_height_min = (boxes_a[:, 2] - boxes_a[:, 5] / 2).view(-1, 1) + boxes_b_height_max = (boxes_b[:, 2] + boxes_b[:, 5] / 2).view(1, -1) + boxes_b_height_min = (boxes_b[:, 2] - boxes_b[:, 5] / 2).view(1, -1) + + # bev overlap + overlaps_bev = torch.cuda.FloatTensor(torch.Size((boxes_a.shape[0], boxes_b.shape[0]))).zero_() # (N, M) + iou3d_nms_cuda.boxes_overlap_bev_gpu(boxes_a.contiguous(), boxes_b.contiguous(), overlaps_bev) + + max_of_min = torch.max(boxes_a_height_min, boxes_b_height_min) + min_of_max = torch.min(boxes_a_height_max, boxes_b_height_max) + overlaps_h = torch.clamp(min_of_max - max_of_min, min=0) + + # 3d iou + overlaps_3d = overlaps_bev * overlaps_h + + vol_a = (boxes_a[:, 3] * boxes_a[:, 4] * boxes_a[:, 5]).view(-1, 1) + vol_b = (boxes_b[:, 3] * boxes_b[:, 4] * boxes_b[:, 5]).view(1, -1) + + iou3d = overlaps_3d / torch.clamp(vol_a + vol_b - overlaps_3d, min=1e-6) + + return iou3d + + +def nms_gpu(boxes, scores, thresh, pre_maxsize=None, **kwargs): + """ + :param boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + :param scores: (N) + :param thresh: + :return: + """ + assert boxes.shape[1] == 7 + order = scores.sort(0, descending=True)[1] + if pre_maxsize is not None: + order = order[:pre_maxsize] + + boxes = boxes[order].contiguous() + keep = torch.LongTensor(boxes.size(0)) + num_out = iou3d_nms_cuda.nms_gpu(boxes, keep, thresh) + return order[keep[:num_out].cuda()].contiguous(), None + + +def nms_normal_gpu(boxes, scores, thresh, **kwargs): + """ + :param boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + :param scores: (N) + :param thresh: + :return: + """ + assert boxes.shape[1] == 7 + order = scores.sort(0, descending=True)[1] + + boxes = boxes[order].contiguous() + + keep = torch.LongTensor(boxes.size(0)) + num_out = iou3d_nms_cuda.nms_normal_gpu(boxes, keep, thresh) + return order[keep[:num_out].cuda()].contiguous(), None \ No newline at end of file diff --git a/det3d/ops/iou3d_nms/setup.py b/det3d/ops/iou3d_nms/setup.py index 0b3bdb5..74b89a8 100644 --- a/det3d/ops/iou3d_nms/setup.py +++ b/det3d/ops/iou3d_nms/setup.py @@ -5,6 +5,8 @@ name='iou3d_nms', ext_modules=[ CUDAExtension('iou3d_nms_cuda', [ + 'src/iou3d_cpu.cpp', + 'src/iou3d_nms_api.cpp', 'src/iou3d_nms.cpp', 'src/iou3d_nms_kernel.cu', ], diff --git a/det3d/ops/iou3d_nms/src/iou3d_cpu.cpp b/det3d/ops/iou3d_nms/src/iou3d_cpu.cpp new file mode 100644 index 0000000..d528ad9 --- /dev/null +++ b/det3d/ops/iou3d_nms/src/iou3d_cpu.cpp @@ -0,0 
+1,252 @@ +/* +3D Rotated IoU Calculation (CPU) +Written by Shaoshuai Shi +All Rights Reserved 2020. +*/ + +#include +#include +#include +#include +#include +#include +#include +#include "iou3d_cpu.h" + +#define CHECK_CUDA(x) do { \ + if (!x.type().is_cuda()) { \ + fprintf(stderr, "%s must be CUDA tensor at %s:%d\n", #x, __FILE__, __LINE__); \ + exit(-1); \ + } \ +} while (0) +#define CHECK_CONTIGUOUS(x) do { \ + if (!x.is_contiguous()) { \ + fprintf(stderr, "%s must be contiguous tensor at %s:%d\n", #x, __FILE__, __LINE__); \ + exit(-1); \ + } \ +} while (0) +#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x) + +inline float min(float a, float b){ + return a > b ? b : a; +} + +inline float max(float a, float b){ + return a > b ? a : b; +} + +const float EPS = 1e-8; +struct Point { + float x, y; + __device__ Point() {} + __device__ Point(double _x, double _y){ + x = _x, y = _y; + } + + __device__ void set(float _x, float _y){ + x = _x; y = _y; + } + + __device__ Point operator +(const Point &b)const{ + return Point(x + b.x, y + b.y); + } + + __device__ Point operator -(const Point &b)const{ + return Point(x - b.x, y - b.y); + } +}; + +inline float cross(const Point &a, const Point &b){ + return a.x * b.y - a.y * b.x; +} + +inline float cross(const Point &p1, const Point &p2, const Point &p0){ + return (p1.x - p0.x) * (p2.y - p0.y) - (p2.x - p0.x) * (p1.y - p0.y); +} + +inline int check_rect_cross(const Point &p1, const Point &p2, const Point &q1, const Point &q2){ + int ret = min(p1.x,p2.x) <= max(q1.x,q2.x) && + min(q1.x,q2.x) <= max(p1.x,p2.x) && + min(p1.y,p2.y) <= max(q1.y,q2.y) && + min(q1.y,q2.y) <= max(p1.y,p2.y); + return ret; +} + +inline int check_in_box2d(const float *box, const Point &p){ + //params: (7) [x, y, z, dx, dy, dz, heading] + const float MARGIN = 1e-2; + + float center_x = box[0], center_y = box[1]; + float angle_cos = cos(-box[6]), angle_sin = sin(-box[6]); // rotate the point in the opposite direction of box + float rot_x = (p.x - center_x) * angle_cos + (p.y - center_y) * (-angle_sin); + float rot_y = (p.x - center_x) * angle_sin + (p.y - center_y) * angle_cos; + + return (fabs(rot_x) < box[3] / 2 + MARGIN && fabs(rot_y) < box[4] / 2 + MARGIN); +} + +inline int intersection(const Point &p1, const Point &p0, const Point &q1, const Point &q0, Point &ans){ + // fast exclusion + if (check_rect_cross(p0, p1, q0, q1) == 0) return 0; + + // check cross standing + float s1 = cross(q0, p1, p0); + float s2 = cross(p1, q1, p0); + float s3 = cross(p0, q1, q0); + float s4 = cross(q1, p1, q0); + + if (!(s1 * s2 > 0 && s3 * s4 > 0)) return 0; + + // calculate intersection of two lines + float s5 = cross(q1, p1, p0); + if(fabs(s5 - s1) > EPS){ + ans.x = (s5 * q0.x - s1 * q1.x) / (s5 - s1); + ans.y = (s5 * q0.y - s1 * q1.y) / (s5 - s1); + + } + else{ + float a0 = p0.y - p1.y, b0 = p1.x - p0.x, c0 = p0.x * p1.y - p1.x * p0.y; + float a1 = q0.y - q1.y, b1 = q1.x - q0.x, c1 = q0.x * q1.y - q1.x * q0.y; + float D = a0 * b1 - a1 * b0; + + ans.x = (b0 * c1 - b1 * c0) / D; + ans.y = (a1 * c0 - a0 * c1) / D; + } + + return 1; +} + +inline void rotate_around_center(const Point ¢er, const float angle_cos, const float angle_sin, Point &p){ + float new_x = (p.x - center.x) * angle_cos + (p.y - center.y) * (-angle_sin) + center.x; + float new_y = (p.x - center.x) * angle_sin + (p.y - center.y) * angle_cos + center.y; + p.set(new_x, new_y); +} + +inline int point_cmp(const Point &a, const Point &b, const Point ¢er){ + return atan2(a.y - center.y, a.x - center.x) > atan2(b.y - center.y, 
b.x - center.x); +} + +inline float box_overlap(const float *box_a, const float *box_b){ + // params: box_a (7) [x, y, z, dx, dy, dz, heading] + // params: box_b (7) [x, y, z, dx, dy, dz, heading] + +// float a_x1 = box_a[0], a_y1 = box_a[1], a_x2 = box_a[2], a_y2 = box_a[3], a_angle = box_a[4]; +// float b_x1 = box_b[0], b_y1 = box_b[1], b_x2 = box_b[2], b_y2 = box_b[3], b_angle = box_b[4]; + float a_angle = box_a[6], b_angle = box_b[6]; + float a_dx_half = box_a[3] / 2, b_dx_half = box_b[3] / 2, a_dy_half = box_a[4] / 2, b_dy_half = box_b[4] / 2; + float a_x1 = box_a[0] - a_dx_half, a_y1 = box_a[1] - a_dy_half; + float a_x2 = box_a[0] + a_dx_half, a_y2 = box_a[1] + a_dy_half; + float b_x1 = box_b[0] - b_dx_half, b_y1 = box_b[1] - b_dy_half; + float b_x2 = box_b[0] + b_dx_half, b_y2 = box_b[1] + b_dy_half; + + Point center_a(box_a[0], box_a[1]); + Point center_b(box_b[0], box_b[1]); + + Point box_a_corners[5]; + box_a_corners[0].set(a_x1, a_y1); + box_a_corners[1].set(a_x2, a_y1); + box_a_corners[2].set(a_x2, a_y2); + box_a_corners[3].set(a_x1, a_y2); + + Point box_b_corners[5]; + box_b_corners[0].set(b_x1, b_y1); + box_b_corners[1].set(b_x2, b_y1); + box_b_corners[2].set(b_x2, b_y2); + box_b_corners[3].set(b_x1, b_y2); + + // get oriented corners + float a_angle_cos = cos(a_angle), a_angle_sin = sin(a_angle); + float b_angle_cos = cos(b_angle), b_angle_sin = sin(b_angle); + + for (int k = 0; k < 4; k++){ + rotate_around_center(center_a, a_angle_cos, a_angle_sin, box_a_corners[k]); + rotate_around_center(center_b, b_angle_cos, b_angle_sin, box_b_corners[k]); + } + + box_a_corners[4] = box_a_corners[0]; + box_b_corners[4] = box_b_corners[0]; + + // get intersection of lines + Point cross_points[16]; + Point poly_center; + int cnt = 0, flag = 0; + + poly_center.set(0, 0); + for (int i = 0; i < 4; i++){ + for (int j = 0; j < 4; j++){ + flag = intersection(box_a_corners[i + 1], box_a_corners[i], box_b_corners[j + 1], box_b_corners[j], cross_points[cnt]); + if (flag){ + poly_center = poly_center + cross_points[cnt]; + cnt++; + } + } + } + + // check corners + for (int k = 0; k < 4; k++){ + if (check_in_box2d(box_a, box_b_corners[k])){ + poly_center = poly_center + box_b_corners[k]; + cross_points[cnt] = box_b_corners[k]; + cnt++; + } + if (check_in_box2d(box_b, box_a_corners[k])){ + poly_center = poly_center + box_a_corners[k]; + cross_points[cnt] = box_a_corners[k]; + cnt++; + } + } + + poly_center.x /= cnt; + poly_center.y /= cnt; + + // sort the points of polygon + Point temp; + for (int j = 0; j < cnt - 1; j++){ + for (int i = 0; i < cnt - j - 1; i++){ + if (point_cmp(cross_points[i], cross_points[i + 1], poly_center)){ + temp = cross_points[i]; + cross_points[i] = cross_points[i + 1]; + cross_points[i + 1] = temp; + } + } + } + + // get the overlap areas + float area = 0; + for (int k = 0; k < cnt - 1; k++){ + area += cross(cross_points[k] - cross_points[0], cross_points[k + 1] - cross_points[0]); + } + + return fabs(area) / 2.0; +} + +inline float iou_bev(const float *box_a, const float *box_b){ + // params: box_a (7) [x, y, z, dx, dy, dz, heading] + // params: box_b (7) [x, y, z, dx, dy, dz, heading] + float sa = box_a[3] * box_a[4]; + float sb = box_b[3] * box_b[4]; + float s_overlap = box_overlap(box_a, box_b); + return s_overlap / fmaxf(sa + sb - s_overlap, EPS); +} + + +int boxes_iou_bev_cpu(at::Tensor boxes_a_tensor, at::Tensor boxes_b_tensor, at::Tensor ans_iou_tensor){ + // params boxes_a_tensor: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b_tensor: (M, 7) [x, y, z, 
dx, dy, dz, heading] + // params ans_iou_tensor: (N, M) + + CHECK_CONTIGUOUS(boxes_a_tensor); + CHECK_CONTIGUOUS(boxes_b_tensor); + + int num_boxes_a = boxes_a_tensor.size(0); + int num_boxes_b = boxes_b_tensor.size(0); + const float *boxes_a = boxes_a_tensor.data(); + const float *boxes_b = boxes_b_tensor.data(); + float *ans_iou = ans_iou_tensor.data(); + + for (int i = 0; i < num_boxes_a; i++){ + for (int j = 0; j < num_boxes_b; j++){ + ans_iou[i * num_boxes_b + j] = iou_bev(boxes_a + i * 7, boxes_b + j * 7); + } + } + return 1; +} diff --git a/det3d/ops/iou3d_nms/src/iou3d_cpu.h b/det3d/ops/iou3d_nms/src/iou3d_cpu.h new file mode 100644 index 0000000..8835ee7 --- /dev/null +++ b/det3d/ops/iou3d_nms/src/iou3d_cpu.h @@ -0,0 +1,11 @@ +#ifndef IOU3D_CPU_H +#define IOU3D_CPU_H + +#include +#include +#include +#include + +int boxes_iou_bev_cpu(at::Tensor boxes_a_tensor, at::Tensor boxes_b_tensor, at::Tensor ans_iou_tensor); + +#endif diff --git a/det3d/ops/iou3d_nms/src/iou3d_nms.cpp b/det3d/ops/iou3d_nms/src/iou3d_nms.cpp index 9eb5978..d41da8a 100644 --- a/det3d/ops/iou3d_nms/src/iou3d_nms.cpp +++ b/det3d/ops/iou3d_nms/src/iou3d_nms.cpp @@ -1,7 +1,7 @@ /* 3D IoU Calculation and Rotated NMS(modified from 2D NMS written by others) Written by Shaoshuai Shi -All Rights Reserved 2019. +All Rights Reserved 2019-2020. */ #include @@ -9,9 +9,20 @@ All Rights Reserved 2019. #include #include #include - -#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ") -#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, " must be contiguous ") +#include "iou3d_nms.h" + +#define CHECK_CUDA(x) do { \ + if (!x.type().is_cuda()) { \ + fprintf(stderr, "%s must be CUDA tensor at %s:%d\n", #x, __FILE__, __LINE__); \ + exit(-1); \ + } \ +} while (0) +#define CHECK_CONTIGUOUS(x) do { \ + if (!x.is_contiguous()) { \ + fprintf(stderr, "%s must be contiguous tensor at %s:%d\n", #x, __FILE__, __LINE__); \ + exit(-1); \ + } \ +} while (0) #define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x) #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) @@ -34,11 +45,12 @@ void boxesioubevLauncher(const int num_a, const float *boxes_a, const int num_b, void nmsLauncher(const float *boxes, unsigned long long * mask, int boxes_num, float nms_overlap_thresh); void nmsNormalLauncher(const float *boxes, unsigned long long * mask, int boxes_num, float nms_overlap_thresh); + int boxes_overlap_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_overlap){ - // params boxes_a: (N, 5) [x1, y1, x2, y2, ry] - // params boxes_b: (M, 5) + // params boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 7) [x, y, z, dx, dy, dz, heading] // params ans_overlap: (N, M) - + CHECK_INPUT(boxes_a); CHECK_INPUT(boxes_b); CHECK_INPUT(ans_overlap); @@ -56,10 +68,9 @@ int boxes_overlap_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans } int boxes_iou_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_iou){ - // params boxes_a: (N, 5) [x1, y1, x2, y2, ry] - // params boxes_b: (M, 5) + // params boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 7) [x, y, z, dx, dy, dz, heading] // params ans_overlap: (N, M) - CHECK_INPUT(boxes_a); CHECK_INPUT(boxes_b); CHECK_INPUT(ans_iou); @@ -77,9 +88,8 @@ int boxes_iou_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_iou } int nms_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh){ - // params boxes: (N, 5) [x1, y1, x2, y2, ry] + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] // 
params keep: (N) - CHECK_INPUT(boxes); CHECK_CONTIGUOUS(keep); @@ -127,7 +137,7 @@ int nms_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh){ int nms_normal_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh){ - // params boxes: (N, 5) [x1, y1, x2, y2, ry] + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] // params keep: (N) CHECK_INPUT(boxes); @@ -176,11 +186,3 @@ int nms_normal_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh){ } - -PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { - m.def("boxes_overlap_bev_gpu", &boxes_overlap_bev_gpu, "oriented boxes overlap"); - m.def("boxes_iou_bev_gpu", &boxes_iou_bev_gpu, "oriented boxes iou"); - m.def("nms_gpu", &nms_gpu, "oriented nms gpu"); - m.def("nms_normal_gpu", &nms_normal_gpu, "nms gpu"); -} - diff --git a/det3d/ops/iou3d_nms/src/iou3d_nms.h b/det3d/ops/iou3d_nms/src/iou3d_nms.h new file mode 100644 index 0000000..aa7ae0e --- /dev/null +++ b/det3d/ops/iou3d_nms/src/iou3d_nms.h @@ -0,0 +1,14 @@ +#ifndef IOU3D_NMS_H +#define IOU3D_NMS_H + +#include +#include +#include +#include + +int boxes_overlap_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_overlap); +int boxes_iou_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_iou); +int nms_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh); +int nms_normal_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh); + +#endif diff --git a/det3d/ops/iou3d_nms/src/iou3d_nms_api.cpp b/det3d/ops/iou3d_nms/src/iou3d_nms_api.cpp new file mode 100644 index 0000000..5a2d3a3 --- /dev/null +++ b/det3d/ops/iou3d_nms/src/iou3d_nms_api.cpp @@ -0,0 +1,17 @@ +#include +#include +#include +#include +#include + +#include "iou3d_cpu.h" +#include "iou3d_nms.h" + + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("boxes_overlap_bev_gpu", &boxes_overlap_bev_gpu, "oriented boxes overlap"); + m.def("boxes_iou_bev_gpu", &boxes_iou_bev_gpu, "oriented boxes iou"); + m.def("nms_gpu", &nms_gpu, "oriented nms gpu"); + m.def("nms_normal_gpu", &nms_normal_gpu, "nms gpu"); + m.def("boxes_iou_bev_cpu", &boxes_iou_bev_cpu, "oriented boxes iou"); +} diff --git a/det3d/ops/iou3d_nms/src/iou3d_nms_kernel.cu b/det3d/ops/iou3d_nms/src/iou3d_nms_kernel.cu index faf9de9..e5e305c 100644 --- a/det3d/ops/iou3d_nms/src/iou3d_nms_kernel.cu +++ b/det3d/ops/iou3d_nms/src/iou3d_nms_kernel.cu @@ -1,7 +1,7 @@ -/* +/* 3D IoU Calculation and Rotated NMS(modified from 2D NMS written by others) Written by Shaoshuai Shi -All Rights Reserved 2019. +All Rights Reserved 2019-2020. */ @@ -9,8 +9,8 @@ All Rights Reserved 2019. 
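After rebuilding the extension with the sources now listed in setup.py, the Python wrappers added in iou3d_nms_utils.py can be exercised as below. This is a sketch, not a test from the repo; it assumes a CUDA device and uses randomly generated boxes in the [x, y, z, dx, dy, dz, heading] convention:

```python
import torch
from det3d.ops.iou3d_nms import iou3d_nms_utils

boxes_a = torch.rand(32, 7).cuda()
boxes_b = torch.rand(16, 7).cuda()
boxes_a[:, 3:6] += 0.5  # keep box sizes positive
boxes_b[:, 3:6] += 0.5

iou_bev = iou3d_nms_utils.boxes_iou_bev(boxes_a, boxes_b)    # (32, 16) BEV IoU
iou_3d = iou3d_nms_utils.boxes_iou3d_gpu(boxes_a, boxes_b)   # (32, 16) 3D IoU

scores = torch.rand(32).cuda()
keep, _ = iou3d_nms_utils.nms_gpu(boxes_a, scores, thresh=0.1, pre_maxsize=100)
print(iou_bev.shape, iou_3d.shape, keep.shape)
```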
#define THREADS_PER_BLOCK 16 #define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0)) -//#define DEBUG -const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; +// #define DEBUG +const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; const float EPS = 1e-8; struct Point { float x, y; @@ -41,7 +41,7 @@ __device__ inline float cross(const Point &p1, const Point &p2, const Point &p0) } __device__ int check_rect_cross(const Point &p1, const Point &p2, const Point &q1, const Point &q2){ - int ret = min(p1.x,p2.x) <= max(q1.x,q2.x) && + int ret = min(p1.x,p2.x) <= max(q1.x,q2.x) && min(q1.x,q2.x) <= max(p1.x,p2.x) && min(p1.y,p2.y) <= max(q1.y,q2.y) && min(q1.y,q2.y) <= max(p1.y,p2.y); @@ -49,24 +49,19 @@ __device__ int check_rect_cross(const Point &p1, const Point &p2, const Point &q } __device__ inline int check_in_box2d(const float *box, const Point &p){ - //params: box (5) [x1, y1, x2, y2, angle] - const float MARGIN = 1e-5; - - float center_x = (box[0] + box[2]) / 2; - float center_y = (box[1] + box[3]) / 2; - float angle_cos = cos(-box[4]), angle_sin = sin(-box[4]); // rotate the point in the opposite direction of box - float rot_x = (p.x - center_x) * angle_cos + (p.y - center_y) * angle_sin + center_x; - float rot_y = -(p.x - center_x) * angle_sin + (p.y - center_y) * angle_cos + center_y; -#ifdef DEBUG - printf("box: (%.3f, %.3f, %.3f, %.3f, %.3f)\n", box[0], box[1], box[2], box[3], box[4]); - printf("center: (%.3f, %.3f), cossin(%.3f, %.3f), src(%.3f, %.3f), rot(%.3f, %.3f)\n", center_x, center_y, - angle_cos, angle_sin, p.x, p.y, rot_x, rot_y); -#endif - return (rot_x > box[0] - MARGIN && rot_x < box[2] + MARGIN && rot_y > box[1] - MARGIN && rot_y < box[3] + MARGIN); + //params: (7) [x, y, z, dx, dy, dz, heading] + const float MARGIN = 1e-2; + + float center_x = box[0], center_y = box[1]; + float angle_cos = cos(-box[6]), angle_sin = sin(-box[6]); // rotate the point in the opposite direction of box + float rot_x = (p.x - center_x) * angle_cos + (p.y - center_y) * (-angle_sin); + float rot_y = (p.x - center_x) * angle_sin + (p.y - center_y) * angle_cos; + + return (fabs(rot_x) < box[3] / 2 + MARGIN && fabs(rot_y) < box[4] / 2 + MARGIN); } __device__ inline int intersection(const Point &p1, const Point &p0, const Point &q1, const Point &q0, Point &ans){ - // fast exclusion + // fast exclusion if (check_rect_cross(p0, p1, q0, q1) == 0) return 0; // check cross standing @@ -82,7 +77,7 @@ __device__ inline int intersection(const Point &p1, const Point &p0, const Point if(fabs(s5 - s1) > EPS){ ans.x = (s5 * q0.x - s1 * q1.x) / (s5 - s1); ans.y = (s5 * q0.y - s1 * q1.y) / (s5 - s1); - + } else{ float a0 = p0.y - p1.y, b0 = p1.x - p0.x, c0 = p0.x * p1.y - p1.x * p0.y; @@ -92,13 +87,13 @@ __device__ inline int intersection(const Point &p1, const Point &p0, const Point ans.x = (b0 * c1 - b1 * c0) / D; ans.y = (a1 * c0 - a0 * c1) / D; } - + return 1; } __device__ inline void rotate_around_center(const Point ¢er, const float angle_cos, const float angle_sin, Point &p){ - float new_x = (p.x - center.x) * angle_cos + (p.y - center.y) * angle_sin + center.x; - float new_y = -(p.x - center.x) * angle_sin + (p.y - center.y) * angle_cos + center.y; + float new_x = (p.x - center.x) * angle_cos + (p.y - center.y) * (-angle_sin) + center.x; + float new_y = (p.x - center.x) * angle_sin + (p.y - center.y) * angle_cos + center.y; p.set(new_x, new_y); } @@ -107,14 +102,19 @@ __device__ inline int point_cmp(const Point &a, const Point &b, const Point &cen } __device__ inline float 
box_overlap(const float *box_a, const float *box_b){ - // params: box_a (5) [x1, y1, x2, y2, angle] - // params: box_b (5) [x1, y1, x2, y2, angle] - - float a_x1 = box_a[0], a_y1 = box_a[1], a_x2 = box_a[2], a_y2 = box_a[3], a_angle = box_a[4]; - float b_x1 = box_b[0], b_y1 = box_b[1], b_x2 = box_b[2], b_y2 = box_b[3], b_angle = box_b[4]; - - Point center_a((a_x1 + a_x2) / 2, (a_y1 + a_y2) / 2); - Point center_b((b_x1 + b_x2) / 2, (b_y1 + b_y2) / 2); + // params box_a: [x, y, z, dx, dy, dz, heading] + // params box_b: [x, y, z, dx, dy, dz, heading] + + float a_angle = box_a[6], b_angle = box_b[6]; + float a_dx_half = box_a[3] / 2, b_dx_half = box_b[3] / 2, a_dy_half = box_a[4] / 2, b_dy_half = box_b[4] / 2; + float a_x1 = box_a[0] - a_dx_half, a_y1 = box_a[1] - a_dy_half; + float a_x2 = box_a[0] + a_dx_half, a_y2 = box_a[1] + a_dy_half; + float b_x1 = box_b[0] - b_dx_half, b_y1 = box_b[1] - b_dy_half; + float b_x2 = box_b[0] + b_dx_half, b_y2 = box_b[1] + b_dy_half; + + Point center_a(box_a[0], box_a[1]); + Point center_b(box_b[0], box_b[1]); + #ifdef DEBUG printf("a: (%.3f, %.3f, %.3f, %.3f, %.3f), b: (%.3f, %.3f, %.3f, %.3f, %.3f)\n", a_x1, a_y1, a_x2, a_y2, a_angle, b_x1, b_y1, b_x2, b_y2, b_angle); @@ -133,7 +133,7 @@ __device__ inline float box_overlap(const float *box_a, const float *box_b){ box_b_corners[2].set(b_x2, b_y2); box_b_corners[3].set(b_x1, b_y2); - // get oriented corners + // get oriented corners float a_angle_cos = cos(a_angle), a_angle_sin = sin(a_angle); float b_angle_cos = cos(b_angle), b_angle_sin = sin(b_angle); @@ -163,6 +163,12 @@ __device__ inline float box_overlap(const float *box_a, const float *box_b){ if (flag){ poly_center = poly_center + cross_points[cnt]; cnt++; +#ifdef DEBUG + printf("Cross points (%.3f, %.3f): a(%.3f, %.3f)->(%.3f, %.3f), b(%.3f, %.3f)->(%.3f, %.3f) \n", + cross_points[cnt - 1].x, cross_points[cnt - 1].y, + box_a_corners[i].x, box_a_corners[i].y, box_a_corners[i + 1].x, box_a_corners[i + 1].y, + box_b_corners[i].x, box_b_corners[i].y, box_b_corners[i + 1].x, box_b_corners[i + 1].y); +#endif } } } @@ -173,11 +179,17 @@ __device__ inline float box_overlap(const float *box_a, const float *box_b){ poly_center = poly_center + box_b_corners[k]; cross_points[cnt] = box_b_corners[k]; cnt++; +#ifdef DEBUG + printf("b corners in a: corner_b(%.3f, %.3f)", cross_points[cnt - 1].x, cross_points[cnt - 1].y); +#endif } if (check_in_box2d(box_b, box_a_corners[k])){ poly_center = poly_center + box_a_corners[k]; cross_points[cnt] = box_a_corners[k]; cnt++; +#ifdef DEBUG + printf("a corners in b: corner_a(%.3f, %.3f)", cross_points[cnt - 1].x, cross_points[cnt - 1].y); +#endif } } @@ -189,8 +201,8 @@ __device__ inline float box_overlap(const float *box_a, const float *box_b){ for (int j = 0; j < cnt - 1; j++){ for (int i = 0; i < cnt - j - 1; i++){ if (point_cmp(cross_points[i], cross_points[i + 1], poly_center)){ - temp = cross_points[i]; - cross_points[i] = cross_points[i + 1]; + temp = cross_points[i]; + cross_points[i] = cross_points[i + 1]; cross_points[i + 1] = temp; } } @@ -213,44 +225,48 @@ __device__ inline float box_overlap(const float *box_a, const float *box_b){ } __device__ inline float iou_bev(const float *box_a, const float *box_b){ - // params: box_a (5) [x1, y1, x2, y2, angle] - // params: box_b (5) [x1, y1, x2, y2, angle] - float sa = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1]); - float sb = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]); + // params box_a: [x, y, z, dx, dy, dz, heading] + // params box_b: [x, y, z, dx, dy, dz, heading] 
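box_overlap above clips one oriented rectangle against the other, sorts the intersection polygon's vertices around their centroid, and sums the triangle areas. A CPU reference for the same BEV IoU can be written with shapely; shapely is an assumption here (it is not a dependency of this repo) and the snippet is only a sanity check for the CPU/CUDA kernels:

```python
import numpy as np
from shapely.geometry import Polygon  # assumption: shapely is installed

def bev_corners(box):
    """BEV corners of a [x, y, z, dx, dy, dz, heading] box."""
    x, y, _, dx, dy, _, heading = box
    corners = np.array([[dx, dy], [dx, -dy], [-dx, -dy], [-dx, dy]]) / 2.0
    rot = np.array([[np.cos(heading), -np.sin(heading)],
                    [np.sin(heading),  np.cos(heading)]])
    return corners @ rot.T + np.array([x, y])

def iou_bev_reference(box_a, box_b):
    poly_a, poly_b = Polygon(bev_corners(box_a)), Polygon(bev_corners(box_b))
    inter = poly_a.intersection(poly_b).area
    return inter / max(poly_a.area + poly_b.area - inter, 1e-8)

print(iou_bev_reference([0, 0, 0, 4, 2, 1.5, 0.0],
                        [1, 0, 0, 4, 2, 1.5, np.pi / 6]))
```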
+ float sa = box_a[3] * box_a[4]; + float sb = box_b[3] * box_b[4]; float s_overlap = box_overlap(box_a, box_b); return s_overlap / fmaxf(sa + sb - s_overlap, EPS); } __global__ void boxes_overlap_kernel(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_overlap){ + // params boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 7) [x, y, z, dx, dy, dz, heading] const int a_idx = blockIdx.y * THREADS_PER_BLOCK + threadIdx.y; const int b_idx = blockIdx.x * THREADS_PER_BLOCK + threadIdx.x; - + if (a_idx >= num_a || b_idx >= num_b){ return; } - const float * cur_box_a = boxes_a + a_idx * 5; - const float * cur_box_b = boxes_b + b_idx * 5; + const float * cur_box_a = boxes_a + a_idx * 7; + const float * cur_box_b = boxes_b + b_idx * 7; float s_overlap = box_overlap(cur_box_a, cur_box_b); ans_overlap[a_idx * num_b + b_idx] = s_overlap; } __global__ void boxes_iou_bev_kernel(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_iou){ + // params boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 7) [x, y, z, dx, dy, dz, heading] const int a_idx = blockIdx.y * THREADS_PER_BLOCK + threadIdx.y; const int b_idx = blockIdx.x * THREADS_PER_BLOCK + threadIdx.x; - + if (a_idx >= num_a || b_idx >= num_b){ return; } - const float * cur_box_a = boxes_a + a_idx * 5; - const float * cur_box_b = boxes_b + b_idx * 5; + const float * cur_box_a = boxes_a + a_idx * 7; + const float * cur_box_b = boxes_b + b_idx * 7; float cur_iou_bev = iou_bev(cur_box_a, cur_box_b); ans_iou[a_idx * num_b + b_idx] = cur_iou_bev; } __global__ void nms_kernel(const int boxes_num, const float nms_overlap_thresh, const float *boxes, unsigned long long *mask){ - //params: boxes (N, 5) [x1, y1, x2, y2, ry] + //params: boxes (N, 7) [x, y, z, dx, dy, dz, heading] //params: mask (N, N/THREADS_PER_BLOCK_NMS) const int row_start = blockIdx.y; @@ -261,20 +277,22 @@ __global__ void nms_kernel(const int boxes_num, const float nms_overlap_thresh, const int row_size = fminf(boxes_num - row_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); const int col_size = fminf(boxes_num - col_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); - __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 5]; + __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 7]; if (threadIdx.x < col_size) { - block_boxes[threadIdx.x * 5 + 0] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 0]; - block_boxes[threadIdx.x * 5 + 1] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 1]; - block_boxes[threadIdx.x * 5 + 2] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 2]; - block_boxes[threadIdx.x * 5 + 3] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 3]; - block_boxes[threadIdx.x * 5 + 4] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 4]; + block_boxes[threadIdx.x * 7 + 0] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 0]; + block_boxes[threadIdx.x * 7 + 1] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 1]; + block_boxes[threadIdx.x * 7 + 2] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 2]; + block_boxes[threadIdx.x * 7 + 3] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 3]; + block_boxes[threadIdx.x * 7 + 4] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 4]; + block_boxes[threadIdx.x * 7 + 5] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 5]; + block_boxes[threadIdx.x * 7 + 6] = 
boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 6]; } __syncthreads(); if (threadIdx.x < row_size) { const int cur_box_idx = THREADS_PER_BLOCK_NMS * row_start + threadIdx.x; - const float *cur_box = boxes + cur_box_idx * 5; + const float *cur_box = boxes + cur_box_idx * 7; int i = 0; unsigned long long t = 0; @@ -283,7 +301,7 @@ __global__ void nms_kernel(const int boxes_num, const float nms_overlap_thresh, start = threadIdx.x + 1; } for (i = start; i < col_size; i++) { - if (iou_bev(cur_box, block_boxes + i * 5) > nms_overlap_thresh){ + if (iou_bev(cur_box, block_boxes + i * 7) > nms_overlap_thresh){ t |= 1ULL << i; } } @@ -294,19 +312,22 @@ __global__ void nms_kernel(const int boxes_num, const float nms_overlap_thresh, __device__ inline float iou_normal(float const * const a, float const * const b) { - float left = fmaxf(a[0], b[0]), right = fminf(a[2], b[2]); - float top = fmaxf(a[1], b[1]), bottom = fminf(a[3], b[3]); + //params: a: [x, y, z, dx, dy, dz, heading] + //params: b: [x, y, z, dx, dy, dz, heading] + + float left = fmaxf(a[0] - a[3] / 2, b[0] - b[3] / 2), right = fminf(a[0] + a[3] / 2, b[0] + b[3] / 2); + float top = fmaxf(a[1] - a[4] / 2, b[1] - b[4] / 2), bottom = fminf(a[1] + a[4] / 2, b[1] + b[4] / 2); float width = fmaxf(right - left, 0.f), height = fmaxf(bottom - top, 0.f); float interS = width * height; - float Sa = (a[2] - a[0]) * (a[3] - a[1]); - float Sb = (b[2] - b[0]) * (b[3] - b[1]); + float Sa = a[3] * a[4]; + float Sb = b[3] * b[4]; return interS / fmaxf(Sa + Sb - interS, EPS); } __global__ void nms_normal_kernel(const int boxes_num, const float nms_overlap_thresh, const float *boxes, unsigned long long *mask){ - //params: boxes (N, 5) [x1, y1, x2, y2, ry] + //params: boxes (N, 7) [x, y, z, dx, dy, dz, heading] //params: mask (N, N/THREADS_PER_BLOCK_NMS) const int row_start = blockIdx.y; @@ -317,20 +338,22 @@ __global__ void nms_normal_kernel(const int boxes_num, const float nms_overlap_t const int row_size = fminf(boxes_num - row_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); const int col_size = fminf(boxes_num - col_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); - __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 5]; + __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 7]; if (threadIdx.x < col_size) { - block_boxes[threadIdx.x * 5 + 0] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 0]; - block_boxes[threadIdx.x * 5 + 1] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 1]; - block_boxes[threadIdx.x * 5 + 2] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 2]; - block_boxes[threadIdx.x * 5 + 3] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 3]; - block_boxes[threadIdx.x * 5 + 4] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 4]; + block_boxes[threadIdx.x * 7 + 0] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 0]; + block_boxes[threadIdx.x * 7 + 1] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 1]; + block_boxes[threadIdx.x * 7 + 2] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 2]; + block_boxes[threadIdx.x * 7 + 3] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 3]; + block_boxes[threadIdx.x * 7 + 4] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 4]; + block_boxes[threadIdx.x * 7 + 5] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 5]; + block_boxes[threadIdx.x * 7 + 6] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 6]; } 
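nms_kernel and nms_normal_kernel only fill a per-block 64-bit suppression bitmask; the host side of nms_gpu in iou3d_nms.cpp (not reproduced in full here) then walks the score-sorted boxes and keeps a box unless a previously kept box has flagged it. A plain-NumPy sketch of that greedy pass, using a precomputed pairwise IoU matrix instead of bitmasks and illustrative names:

```python
import numpy as np

def greedy_nms(ious, scores, thresh):
    """Greedy NMS over a precomputed (N, N) pairwise IoU matrix."""
    order = np.argsort(-scores)
    keep, suppressed = [], np.zeros(len(scores), dtype=bool)
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)
        suppressed |= ious[i] > thresh
        suppressed[i] = False  # never suppress the box we just kept
    return np.array(keep)

# Tiny example: two heavily overlapping boxes and one separate box.
ious = np.array([[1.0, 0.8, 0.0],
                 [0.8, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
scores = np.array([0.9, 0.6, 0.7])
print(greedy_nms(ious, scores, thresh=0.5))  # -> [0 2]
```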
__syncthreads(); if (threadIdx.x < row_size) { const int cur_box_idx = THREADS_PER_BLOCK_NMS * row_start + threadIdx.x; - const float *cur_box = boxes + cur_box_idx * 5; + const float *cur_box = boxes + cur_box_idx * 7; int i = 0; unsigned long long t = 0; @@ -339,7 +362,7 @@ __global__ void nms_normal_kernel(const int boxes_num, const float nms_overlap_t start = threadIdx.x + 1; } for (i = start; i < col_size; i++) { - if (iou_normal(cur_box, block_boxes + i * 5) > nms_overlap_thresh){ + if (iou_normal(cur_box, block_boxes + i * 7) > nms_overlap_thresh){ t |= 1ULL << i; } } @@ -356,7 +379,7 @@ void boxesoverlapLauncher(const int num_a, const float *boxes_a, const int num_b dim3 blocks(DIVUP(num_b, THREADS_PER_BLOCK), DIVUP(num_a, THREADS_PER_BLOCK)); // blockIdx.x(col), blockIdx.y(row) dim3 threads(THREADS_PER_BLOCK, THREADS_PER_BLOCK); - + boxes_overlap_kernel<<>>(num_a, boxes_a, num_b, boxes_b, ans_overlap); #ifdef DEBUG cudaDeviceSynchronize(); // for using printf in kernel function @@ -367,8 +390,11 @@ void boxesioubevLauncher(const int num_a, const float *boxes_a, const int num_b, dim3 blocks(DIVUP(num_b, THREADS_PER_BLOCK), DIVUP(num_a, THREADS_PER_BLOCK)); // blockIdx.x(col), blockIdx.y(row) dim3 threads(THREADS_PER_BLOCK, THREADS_PER_BLOCK); - + boxes_iou_bev_kernel<<>>(num_a, boxes_a, num_b, boxes_b, ans_iou); +#ifdef DEBUG + cudaDeviceSynchronize(); // for using printf in kernel function +#endif } diff --git a/det3d/ops/point_cloud/point_cloud_ops.py b/det3d/ops/point_cloud/point_cloud_ops.py index e7eb7e9..3583508 100644 --- a/det3d/ops/point_cloud/point_cloud_ops.py +++ b/det3d/ops/point_cloud/point_cloud_ops.py @@ -44,7 +44,7 @@ def _points_to_voxel_reverse_kernel( if voxelidx == -1: voxelidx = voxel_num if voxel_num >= max_voxels: - continue + continue voxel_num += 1 coor_to_voxelidx[coor[0], coor[1], coor[2]] = voxelidx coors[voxelidx] = coor @@ -98,7 +98,7 @@ def _points_to_voxel_kernel( if voxelidx == -1: voxelidx = voxel_num if voxel_num >= max_voxels: - continue + continue voxel_num += 1 coor_to_voxelidx[coor[0], coor[1], coor[2]] = voxelidx coors[voxelidx] = coor diff --git a/det3d/torchie/apis/train.py b/det3d/torchie/apis/train.py index fe6d8cb..ad48ef0 100644 --- a/det3d/torchie/apis/train.py +++ b/det3d/torchie/apis/train.py @@ -25,64 +25,6 @@ from .env import get_root_logger -def example_convert_to_torch(example, dtype=torch.float32, device=None) -> dict: - assert device is not None - - example_torch = {} - float_names = ["voxels", "bev_map"] - for k, v in example.items(): - if k in ["anchors", "reg_targets", "reg_weights"]: - res = [] - for kk, vv in v.items(): - res.append(torch.tensor(vv).to(device, non_blocking=True)) - # vv = np.array(vv) - # res.append(torch.tensor(vv, dtype=torch.float32, - # device=device)) - example_torch[k] = res - elif k in float_names: - # slow when directly provide fp32 data with dtype=torch.half - example_torch[k] = v.to(device, non_blocking=True) - # example_torch[k] = torch.tensor(v, - # dtype=torch.float32, - # device=device) - elif k in ["coordinates", "num_points"]: - example_torch[k] = v.to(device, non_blocking=True) - # example_torch[k] = torch.tensor(v, - # dtype=torch.int32, - # device=device) - elif k == "labels": - res = [] - for kk, vv in v.items(): - # vv = np.array(vv) - res.append(torch.tensor(vv).to(device, non_blocking=True)) - example_torch[k] = res - elif k == "points": - example_torch[k] = v.to(device, non_blocking=True) - # example_torch[k] = torch.tensor(v, - # dtype=torch.float, - # device=device) - elif k in 
["anchors_mask"]: - res = [] - for kk, vv in v.items(): - res.append(torch.tensor(vv).to(device, non_blocking=True)) - example_torch[k] = res - elif k == "calib": - calib = {} - for k1, v1 in v.items(): - # calib[k1] = torch.tensor(v1, dtype=dtype, device=device) - calib[k1] = torch.tensor(v1).to(device, non_blocking=True) - example_torch[k] = calib - elif k == "num_voxels": - example_torch[k] = v.to(device, non_blocking=True) - # example_torch[k] = torch.tensor(v, - # dtype=torch.int64, - # device=device) - else: - example_torch[k] = v - - return example_torch - - def example_to_device(example, device=None, non_blocking=False) -> dict: assert device is not None @@ -98,6 +40,10 @@ def example_to_device(example, device=None, non_blocking=False) -> dict: "num_points", "points", "num_voxels", + "cyv_voxels", + "cyv_num_voxels", + "cyv_coordinates", + "cyv_num_points" ]: example_torch[k] = v.to(device, non_blocking=non_blocking) elif k == "calib": diff --git a/det3d/torchie/parallel/collate.py b/det3d/torchie/parallel/collate.py index f7b3fa8..90eeb9f 100644 --- a/det3d/torchie/parallel/collate.py +++ b/det3d/torchie/parallel/collate.py @@ -103,7 +103,8 @@ def collate_kitti(batch_list, samples_per_gpu=1): # voxel_nums_list = example_merged["num_voxels"] # example_merged.pop("num_voxels") for key, elems in example_merged.items(): - if key in ["voxels", "num_points", "num_gt", "voxel_labels", "num_voxels"]: + if key in ["voxels", "num_points", "num_gt", "voxel_labels", "num_voxels", + "cyv_voxels", "cyv_num_points", "cyv_num_voxels"]: ret[key] = torch.tensor(np.concatenate(elems, axis=0)) elif key in [ "gt_boxes", @@ -133,7 +134,7 @@ def collate_kitti(batch_list, samples_per_gpu=1): ret[key][k1].append(v1) for k1, v1 in ret[key].items(): ret[key][k1] = torch.tensor(np.stack(v1, axis=0)) - elif key in ["coordinates", "points"]: + elif key in ["coordinates", "points", "cyv_coordinates"]: coors = [] for i, coor in enumerate(elems): coor_pad = np.pad( @@ -152,6 +153,8 @@ def collate_kitti(batch_list, samples_per_gpu=1): for kk, vv in ret[key].items(): res.append(torch.stack(vv)) ret[key] = res + elif key == 'gt_boxes_and_cls': + ret[key] = torch.tensor(np.stack(elems, axis=0)) else: ret[key] = np.stack(elems, axis=0) diff --git a/det3d/torchie/trainer/checkpoint.py b/det3d/torchie/trainer/checkpoint.py index 68c25fa..728d730 100644 --- a/det3d/torchie/trainer/checkpoint.py +++ b/det3d/torchie/trainer/checkpoint.py @@ -47,6 +47,7 @@ def load_state_dict(module, state_dict, strict=False, logger=None): own_state = module.state_dict() for name, param in state_dict.items(): + # a hacky fixed to load a new voxelnet if name not in own_state: unexpected_keys.append(name) continue diff --git a/det3d/torchie/trainer/trainer.py b/det3d/torchie/trainer/trainer.py index 748236b..32d6eda 100644 --- a/det3d/torchie/trainer/trainer.py +++ b/det3d/torchie/trainer/trainer.py @@ -44,7 +44,12 @@ def example_to_device(example, device, non_blocking=False) -> dict: "coordinates", "num_points", "points", - "num_voxels" + "num_voxels", + "cyv_voxels", + "cyv_num_voxels", + "cyv_coordinates", + "cyv_num_points", + "gt_boxes_and_cls" ]: example_torch[k] = v.to(device, non_blocking=non_blocking) elif k == "calib": diff --git a/det3d/utils/config_tool.py b/det3d/utils/config_tool.py index e681946..96c6d1f 100644 --- a/det3d/utils/config_tool.py +++ b/det3d/utils/config_tool.py @@ -37,11 +37,16 @@ def change_detection_range(model_config, new_range): def get_downsample_factor(model_config): - neck_cfg = model_config["neck"] + try: 
+ neck_cfg = model_config["neck"] + except: + model_config = model_config['first_stage_cfg'] + neck_cfg = model_config['neck'] downsample_factor = np.prod(neck_cfg.get("ds_layer_strides", [1])) if len(neck_cfg.get("us_layer_strides", [])) > 0: downsample_factor /= neck_cfg.get("us_layer_strides", [])[-1] - backbone_cfg = model_config["backbone"] + + backbone_cfg = model_config['backbone'] downsample_factor *= backbone_cfg["ds_factor"] downsample_factor = int(downsample_factor) assert downsample_factor > 0 diff --git a/det3d/version.py b/det3d/version.py deleted file mode 100644 index c06801d..0000000 --- a/det3d/version.py +++ /dev/null @@ -1,4 +0,0 @@ -# GENERATED VERSION FILE -# TIME: Tue Feb 18 13:03:07 2020 -__version__ = '1.0.rc0+6c2b891' -short_version = '1.0.rc0' diff --git a/det3d/visualization/__init__.py b/det3d/visualization/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/det3d/visualization/kitti.py b/det3d/visualization/kitti.py deleted file mode 100644 index a2fcb44..0000000 --- a/det3d/visualization/kitti.py +++ /dev/null @@ -1,483 +0,0 @@ -import os -import numpy as np -from OpenGL.GL import glLineWidth -import pyqtgraph as pg -import pyqtgraph.opengl as gl -import argparse - - -class Object3d(object): - """ 3d object label """ - - def __init__(self, label_file_line): - data = label_file_line.split(" ") - data[1:] = [float(x) for x in data[1:]] - - # extract label, truncation, occlusion - self.type = data[0] # 'Car', 'Pedestrian', ... - self.truncation = data[1] # truncated pixel ratio [0..1] - self.occlusion = int( - data[2] - ) # 0=visible, 1=partly occluded, 2=fully occluded, 3=unknown - self.alpha = data[3] # object observation angle [-pi..pi] - - # extract 2d bounding box in 0-based coordinates - self.xmin = data[4] # left - self.ymin = data[5] # top - self.xmax = data[6] # right - self.ymax = data[7] # bottom - self.box2d = np.array([self.xmin, self.ymin, self.xmax, self.ymax]) - - # extract 3d bounding box information - self.h = data[8] # box height - self.w = data[9] # box width - self.l = data[10] # box length (in meters) - # location (x,y,z) in camera coord. - self.t = (data[11], data[12], data[13]) - self.ry = data[14] # yaw angle (around Y-axis in camera coordinates) [-pi..pi] - - def print_object(self): - print( - "Type, truncation, occlusion, alpha: %s, %d, %d, %f" - % (self.type, self.truncation, self.occlusion, self.alpha) - ) - print( - "2d bbox (x0,y0,x1,y1): %f, %f, %f, %f" - % (self.xmin, self.ymin, self.xmax, self.ymax) - ) - print("3d bbox h,w,l: %f, %f, %f" % (self.h, self.w, self.l)) - print( - "3d bbox location, ry: (%f, %f, %f), %f" - % (self.t[0], self.t[1], self.t[2], self.ry) - ) - - -# ----------------------------------------------------------------------------------------- - - -def inverse_rigid_trans(Tr): - """ Inverse a rigid body transform matrix (3x4 as [R|t]) - [R'|-R't; 0|1] - """ - inv_Tr = np.zeros_like(Tr) # 3x4 - inv_Tr[0:3, 0:3] = np.transpose(Tr[0:3, 0:3]) - inv_Tr[0:3, 3] = np.dot(-np.transpose(Tr[0:3, 0:3]), Tr[0:3, 3]) - return inv_Tr - - -class Calibration(object): - """ Calibration matrices and utils - 3d XYZ in
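The get_downsample_factor change above lets two-stage configs nest the backbone and neck under first_stage_cfg. A small illustration with made-up strides (the dict layout mirrors the configs; the numbers are arbitrary):

```python
from det3d.utils.config_tool import get_downsample_factor

single_stage = dict(
    backbone=dict(ds_factor=8),
    neck=dict(ds_layer_strides=[1, 2, 2], us_layer_strides=[1, 2, 4]),
)
two_stage = dict(first_stage_cfg=single_stage)

# prod(ds_layer_strides) / us_layer_strides[-1] * ds_factor = 4 / 4 * 8 = 8
print(get_downsample_factor(single_stage))  # 8
print(get_downsample_factor(two_stage))     # 8
```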