-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API changes #112
API changes #112
Conversation
Split ParseEmail into three functions with the function signature io.Reader (Email, error) The three functions are ParseEmail (the existing main entry point) ParseEmailHeaders ParseEmailWithoutAttachments ParseEmailHeadings replaces ParseHeadings that takes an Email parameter instead of an io.Reader. The existing ParseEmail function is renamed to parseEmail and the processing determined by the new processType type.
…chments not wanted
The test comparison function has also been moved from reflect.DeepEqual to go-cmp.Equal.
I've run a quick benchmark to compare The results are as follows on my ancient thinkpad t470s:
I'm surprised that |
Apologies, after looking at my work from the weekend I realise I hadn't put types into the I can provide a new PR after receiving comments, if any. |
I've been playing with some better
Again, I'm happy to rework this patch set based on your thoughts and advice. |
This simplifies the approach to skipping both inline and attached files by continuing to the next call to multipartReader.NextPart.
Thank you for taking the time to open this PR. I had a chance to read through the code and description carefully. I now see that this PR closely follows a discussion that we have already had in that it exports My previous concerns regarding an explicit
You can find more details on this in my previous comments: #85 (comment) and #85 (comment). I appreciate the desire to process emails without attachments. However, I am concerned that introducing functions for every possible use case (e.g., Instead of multiple functions, I am exploring alternative solutions to provide the desired functionality and performance. These could include:
Both approaches have trade-offs, such as memory usage for the original reader versus performance benefits of lazy loading. I will carefully consider these options and try and work on that direction soon. |
Hi @mnako Thanks for considering this PR and your thoughtful comments. I guess there are two major considerations: the API of the package, and the implementation. Excuse the long essay which follows, which is meant to be helpful! APIRegarding the API I think it would be helpful for users of the package to skip elements of processing such as bodies or attachments. Users may also wish to specify non-strict options such as looser or custom date and address parsing. We have discussed a single entry point at A set of options that spring to mind include:
Having a single entry point suggests that a "sparse" A module user using full email parsing could switch from
to the following if they only wanted headers:
or to the following if they wanted to omit attachments and have custom date parsing:
etc. (I'd still argue that I wonder also if there is scope to reduce the number of public package identifiers shown by ImplementationI suggest that the implementation of the mechanics of It may be worth considering forming submodules to help make Regarding processing email without including reading all the attachments into memory, I suggest that (with reference to 14), that For example, AttachedFile which presently is defined as follows: type AttachedFile struct {
ContentType ContentTypeHeader
ContentDisposition ContentDispositionHeader
Data []byte
} Could be altered to: type AttachedFile struct {
ContentType ContentTypeHeader
ContentDisposition ContentDispositionHeader
Reader io.Reader // decoder wrapped io.Reader
Size int // number of bytes
}
Package users could choose whether to process the file, and how. If it is possible to wrap the Thanks again for your excellent work on this module, and for the chance for a discussion. |
thank you for the comments. Apologies, but I only have just enough time to send two comments now:
I completely agree with the need. I just think that we need to figure out how best to do it, before calling letters 1.0-mature.
If I am understanding this correctly, this suggests a number of helper functions to construct functional options that sets a feature on the email parser's config. On the first read, I do find this idea elegant and composable. Please allow me some time to read all of your comments and think about composability (are the |
Hi @mnako Thanks for your further response. I've made a quick sketch of how functional options might be implemented here. If you pull the
Further note, 14 Jan 2025 A further note about the possible benefits of separating initialising the parser from running it is that the initialisation could only be needed once but used to process many emails. An example use case is the code here. |
@rorycl ,
Thank you for sharing that code. I mostly agree with the suggested approach for options. One place that I still find not flexible enough are the emailParser := letters.NewEmailParser(
WithFileFilter(AllFiles)
)
email, err := emailParser.ParseEmail(rawEmail) none: noFilesEmailParser := letters.NewEmailParser(
WithFileFilter(NoFiles)
)
email, err := noFilesEmailParser.ParseEmail(rawEmail) or (most interesingly) some files: customJPGOnlyEmailParser := letters.NewEmailParser(
WithFileFilter(
func(cth ContentTypeHeader) bool {
return strings.HasSuffix(strings.ToLower(cth.Params["name"]), ".jpg")
},
),
)
email, err := customJPGOnlyEmailParser.ParseEmail(rawEmail) This avoids the code smell of I included more information in https://github.com/mnako/letters/pull/119/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R178. Please let me know what you think and if this covers your use case. This PR is still very far from being production ready, but I can commit to seeing it through if we agree on this direction. |
cc @ValleyCrisps for your opinion on #112 (comment) and #119 |
Hi @mnako Your suggestions for functional options/filters sound great. I particularly like the idea of the options allowing the library user to extract (for example) jpegs. Allowing Additionally users could supply a func to decide what to do with files at the io.Reader level, as set out in #118 and my Therefore an opt such as I hope to have time later in the week to do a refactoring along the lines I've mentioned in #118 if you have time to make some comments beforehand, that would be great. I'd like to make that work as useful as possible. -- I suggest we close this PR and move the discussion to #118 if that works for you. |
It is very nice to see the active discussion, and the thought going into the parse exensibility! Let me weigh in with a few comments as well. In general, I think we should prioritise extensible and maintainable solutions over hyper-customisation. I think the functional filters @mnako proposed cover the most common use-cases, and offer additional flexibility on top of it (I really like the option to parse only some file types, I would not have thougth of it myself, but I can see its utility). For the more specialised use-cases, I wonder if it would not be better keep the email parser simple (single-responsibility), and instead address them with post-parsing logic. |
I haven't seen any suggestions for hyper-customisation. Certainly my own suggestions have centred (even if poorly phrased) on allowing part-processing of an email or the supply of user-defined functions. So far these suggestions have centred on:
These suggestions all hinge on whether the parser is defined by a struct and if its main funcs hang off that struct. Having the parser defined by a struct opens up customisation possibilities. As it happens in the 10s of thousands of emails I've used to parse with So what is a user to do when they have to parse an email but letters won't accept the date or address (for example). Get another library? Might it not be better to just let them supply their own func if that is going to let them continue work? The main power of a Allowing users to override default funcs is, in my view, the way to go. The other tweaks will likely make a big improvement to memory usage and processing speed too. |
Thank you for your comments, @ValleyCrisps. Since we agree on all points, there is no need for me to come back to the |
Hi @mnako
I've had a chance to look at how to potentially implement new package level funcs, as set out in this PR. Let me know what you think.
Cheers,
Rory
api: split main package ParseEmail func into several variants
Split ParseEmail into three functions with the function signature
The three functions are
ParseEmailHeadings replaces ParseHeadings that takes an Email parameter
instead of an io.Reader.
The existing ParseEmail function is renamed to parseEmail and the
processing options determined by the new processType type.