-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running preprocessing pipeline on Tiles object #332
Comments
Hi @PaulScemama , thanks for the suggestion! |
@jacob-rosenthal definitely! That makes a lot of sense and I will give it a go. This will be my first opensource contribution so bear with me :). Btw just came upon this package a week ago and really enjoy the documentation, effort put into readability, etc. so thank you! I'll start working on it shortly. |
@jacob-rosenthal just ran
Do you have any idea how to resolve this; or where I may have done something wrong? Edit: additionally, this is what I get when I run
|
@jacob-rosenthal new question (and let me know if there is a better place to pose these questions for you).
Any thoughts? |
Maybe consider just writing init from scratch? Since your case is pretty specific, you probably don't need all the logic used when initializing For the Java VM issue, did the tests pass and you just got a message, or was it causing the testing to fail? If tests pass, probably ok to ignore other messages. Not sure what could be causing it, but first thought would be to try again in a fresh environment. Javabridge can be finicky |
To the first part - that sounds good! To the second - I will have to double check when I can get to my computer. But pretty sure a singular test failed and that was the message. It hasn't halted development on this specific task, but I'll look into it a bit more and try a fresh env. |
@jacob-rosenthal two-parter (again).
Thoughts? |
I guess the idea is that you could initialize SlideData with tiles, and then save it to h5path. Agree that that that could be a good way to support your usecase without even changing the API. Maybe, instead of making a new class, you could just add a new method to SlideData, something like One thing to avoid is rewriting all the logic for distributed and non-distributed processing. We want to keep that in one place, and it should be the same regardless of wherever the tiles are coming from. So might need to pull it out into new methods and call them via I would say don't worry about melding tiles, throwing an informative exception should be good What do you think? Thanks! |
I agree full heartedly. I'll get on it asap and keep you posted. Thanks! |
I had a similar use case of running further processing on existing h5path files and took a stab at a solution here #337 It's not exactly the approach described in this issue of running a pipeline directly on a Tiles object, but instead refactors SlideData to allow reading and updating tiles from an h5path file. More generally, it separates the tile generation from running pipelines, so that run can be called repeatedly to tack on transforms or can be called on tiles in an existing h5path file. This did make breaking changes by moving tile size, stride, padding, etc. from run to the SlideData constructor, but I feel like this abstraction does make sense. |
Ah I should have read through that beforehand. I looked through the code once through and I like the ideas. I am still learning the codebase so I do have a question:
Then you have one method - run() which runs a pipeline on self.tiles() Then in the case that you've run a pipeline and want to run it again on the same tiles, you can either call run() again which modifies the tiles in-place, or you can keep the first set of transformed tiles by (writing to h5path, etc.) and then calling run(). Let me know if there are any flaws in my thinking? |
I think the issue with providing a directory with a bag of tile files is that it doesn't have meaningful structure as a single slide, so |
Edit: If I have a bunch of .pngs that are from a single slide (which I do), I could just save them as an h5 and then use that as input to SlideData(), then run() and I'm all set? That is what you're proposing and I think that's definitely worthwhile. So I should probably create TileData() that doesn't care if the tiles are from the same slide or not? Because Tiles() seems to only be instantiated provided the tiles are from the same slide (through passing in an h5 manager). Am I correct in thinking that? |
I guess we need to decide the best way to handle this. It could go a couple ways for your usecase: Option 1:
This mechanism should also support running preprocessing on any existing tiles (e.g. #337) because at that point they are already in h5 so it doesn't matter where the tiles came from (i.e. whether they were loaded from disk or created by running a pipeline previously) Option 2:
This would be more similar to the mechanism for generating tiles from a backend, because it would be loading the images from disk. I think the main difference is that the tiles wouldn't be read from disk and put into h5 at the time of initializing, since it would use the file paths to read the tiles at the time of running a pipeline. Thoughts on the best/most logical way to implement? Need to think about performance and also try to make the API as consistent as possible. @PaulScemama @tddough98 |
Edit: With my current understanding, I think Option 2 may be more logical. The reasons I think this are
My question: Do you think we should create separate backend classes that has as input a directory instead of a filename? Or should we modify the current backends to accommodate a directory input? My feeling is the latter is a better option, but I could be wrong. Thoughts? @tddough98 @jacob-rosenthal |
I begin at the Tile level. That is I am beginning with regions extracted from a WSI where these regions are in the form of .pngs. I want to run preprocessing on these Tiles in a parallel fashion. From what I've seen there is only this parallel preprocessing done for a
slide
object (slide.run()
).Describe the solution you'd like
I think an alternative to running a preprocessing pipeline on the slide level would be to have a way to run a preprocessing pipeline on the Tiles level. That is, I have a bunch of Tiles in a Tiles instance. I then do something like
Tiles.run()
and it runs a parallel preprocessing.Describe alternatives you've considered
Of course, I could do a for loop across all the tiles and sequentially apply transforms/preprocessing/etc.
Additional context
I think this would be nice functionality to add for user flexibility. I am willing to try and implement it myself but wanted to get the thoughts of other contributors. Is this worthwhile? Is this pointless? Is there a better way?
Thanks in advance!
The text was updated successfully, but these errors were encountered: