aggregate reports - why the complicated pipeline? #250

Open

Description

@Keeper-of-the-Keys

Hey,
I hope I am not reopening something that has been discussed ad nauseam already, but I didn't see any discussion in the bug tracker here.

What is the reason that the pipeline for generating aggregate reports is so long?
By long I mean:

  1. OpenDMARC writes a HistoryFile
  2. opendmarc-importstats imports said history file into a db
  3. opendmarc-reports generates a report based on the db and sends it
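
For concreteness, step 1 runs continuously inside the milter, while steps 2 and 3 are usually driven from cron. A minimal sketch, assuming default install paths and a dedicated `opendmarc` user (the paths, schedule, and flags below are my assumptions, not anything stated in this issue; check your build's man pages):

```
# Step 1 is continuous: the opendmarc milter appends one record per message
# to the file named by the HistoryFile directive in opendmarc.conf, e.g.:
#   HistoryFile /var/spool/opendmarc/opendmarc.dat

# Step 2, hourly: rotate the history file and import it into the database.
0 * * * *   opendmarc   /usr/sbin/opendmarc-importstats

# Step 3, daily: build aggregate reports from the database and mail them.
30 0 * * *  opendmarc   /usr/sbin/opendmarc-reports --verbose --interval=86400 \
                --report-email=postmaster@example.com --report-org=example.com
```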

Superficially, it would seem that OpenDMARC could write directly to the DB instead of to a file. I assume people smarter than me have thought about this a lot and concluded that the above pipeline is better; I would like to understand those reasons.

The reasons I could think of are that writing to a file is "easier"/cheaper in compute and less prone to lockups and failures than writing to a DB, and that importstats may be quite intensive for larger setups, so you may not want to run it on the same machine.

Activity

@dgeo commented on Feb 19, 2024

Here we have 4 different machines, and one to import all the files… using a single DB would add a SPOF, and using a DB cluster would add complexity (to an already-not-that-simple setup)… And adding I/O and locks per mail seems like a bad idea (there are sometimes many mails per second…)
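
One way to read that setup (my guess at the topology, not something spelled out above): each of the four machines keeps its own local HistoryFile, and the import host collects and loads them on a schedule, so the central DB never sits in the per-mail path. A hypothetical sketch (hostnames, paths, and DB options are illustrative; opendmarc-import reads history records from standard input):

```
# On the import host: pull each machine's rotated history file,
# then load them all into the central DB in one pass.
for h in mx1 mx2 mx3 mx4; do
    scp "$h:/var/spool/opendmarc/opendmarc.dat" "/var/tmp/history.$h"
done
cat /var/tmp/history.* | opendmarc-import --dbhost=localhost --dbname=opendmarc
```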
