
Why is the L2 error in the V3 version (your latest version) higher than in the V2 version? #6

Open
frkmac3 opened this issue Jan 26, 2024 · 1 comment

Comments

@frkmac3

frkmac3 commented Jan 26, 2024

No description provided.

frkmac3 changed the title from "Why L2 error in V3 version (which is your latest version) is lower than V2 version?" to "Why L2 error in V3 version (which is your latest version) is higher than V2 version?" on Jan 26, 2024
@PointsCoder
Owner

@frkmac3

The V2 version was tested under a setting similar to ST-P3 (average over average, but the GT occupancy map is different; the code is mainly from here). In V3 we adopted a more standard evaluation that is tested and aligned with the paper-reported scores in UniAD and ST-P3. Therefore, please refer to the latest version, V3, for results.

For more context, UniAD and ST-P3 use different evaluation protocols:

  • Validation samples are different: UniAD evaluates 6019 frames with masking, while ST-P3 drops the first and last several frames of each log (around 4800 test frames). This has only a marginal effect on performance.
  • GT occupancy: UniAD considers only vehicles in the collision calculation, while ST-P3 includes both vehicles and pedestrians, and the vehicle counts also differ between the two implementations. This has a non-negligible impact on the collision rate.
  • Average over average: UniAD reports L2 and collision at each time-step, while ST-P3 reports the average over all previous time-steps (see the sketch after this list). This causes a significant difference in all results.
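
To make the last point concrete, here is a minimal sketch of the two aggregation conventions. The array values are hypothetical, and this is not the released evaluation code, only an illustration of why the same predictions yield lower numbers under the cumulative-average convention:

```python
import numpy as np

# Toy per-time-step L2 errors (meters) at the 1s, 2s, and 3s horizons.
# These values are made up purely for illustration.
l2_per_step = np.array([0.40, 0.85, 1.45])

# Per-time-step reporting (UniAD-style): each horizon is reported as-is.
per_step = l2_per_step

# "Average over average" reporting (ST-P3-style): the value at horizon t
# is the mean over all horizons up to and including t.
avg_over_avg = np.cumsum(l2_per_step) / np.arange(1, len(l2_per_step) + 1)

print("per time-step:       ", per_step)      # [0.4  0.85 1.45]
print("average over average:", avg_over_avg)  # [0.4   0.625 0.9 ]
```

Because the cumulative average mixes in the easier short-horizon errors, the reported values at 2s and 3s come out substantially lower under the ST-P3 convention, so scores computed under the two conventions are not directly comparable.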

We have also released the gt_occ and the evaluation code, and the reported results should be reproducible with them. If you'd like to compare, please use our code for evaluation.
