Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topic/gc fixes post 3.10 #4192

Merged
merged 12 commits into from
Dec 18, 2018
Merged

Topic/gc fixes post 3.10 #4192

merged 12 commits into from
Dec 18, 2018

Conversation

muellmusik
Copy link
Contributor

@muellmusik muellmusik commented Dec 1, 2018

Purpose and Motivation

This PR primarily addresses a class of GC issues previously not widely noted. The pattern is (within one primitive):

  1. get the receiver from the stack
  2. create a new result object
  3. set the result object on the stack
  4. create new objects while continuing to access the receiver

As the receiver is not on the stack anymore, it may be collected when new objects are created in 4., before we are finished using it, if it is not otherwise reachable.

This fixes #3454, fixes #4179.

In addition:

  • I did a code review of and fixed two other primitives that could have been subject to this error
  • I updated the Writing Primitives help file to explain this issue and how to avoid it
  • I added comments in appropriate places in other primitives explaining why this was not a problem in those cases, in the hopes that this will avoid people reusing that code without understanding (this has been a problem in the past)
  • I fixed one other GC related issue.

Types of changes

  • Documentation (non-code change which corrects or adds documentation for existing features)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • All previous tests are passing
  • Tests were updated or created to address changes in this PR, and tests are passing
  • Updated documentation, if necessary
  • This PR is ready for review

@mtmccrea
Copy link
Member

mtmccrea commented Dec 1, 2018

This is great, thank you @muellmusik!! It's been an outstanding issue for a while and will really help some underlying issues with the ATK (flopping large matrices).

You mentioned that this might fix #4179, did you have a chance to test the example posted there (case 1)?

@muellmusik
Copy link
Contributor Author

You mentioned that this might fix #4179, did you have a chance to test the example posted there (case 1)?

Yes, I get "Made it! Lucky you."

But you should test as well.

Copy link
Contributor

@mossheim mossheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @muellmusik ! The logic on this makes sense to me for the most part, and it fixes both the #3454 and #4179 examples for me.

lang/LangPrimSource/PyrListPrim.cpp Outdated Show resolved Hide resolved
lang/LangPrimSource/PyrStringPrim.cpp Outdated Show resolved Hide resolved
lang/LangPrimSource/SC_CoreAudioPrim.cpp Outdated Show resolved Hide resolved
lang/LangPrimSource/PyrStringPrim.cpp Outdated Show resolved Hide resolved
lang/LangPrimSource/PyrListPrim.cpp Outdated Show resolved Hide resolved
HelpSource/Guides/WritingPrimitives.schelp Show resolved Hide resolved
HelpSource/Guides/WritingPrimitives.schelp Outdated Show resolved Hide resolved
HelpSource/Guides/WritingPrimitives.schelp Outdated Show resolved Hide resolved
HelpSource/Guides/WritingPrimitives.schelp Outdated Show resolved Hide resolved
lang/LangPrimSource/PyrStringPrim.cpp Show resolved Hide resolved
@mossheim
Copy link
Contributor

What do you think of adding a sclang-side test for this? Both of the examples from the two issues linked seem decent candidates, they don't take too long to run either.

@muellmusik
Copy link
Contributor Author

What do you think of adding a sclang-side test for this? Both of the examples from the two issues linked seem decent candidates, they don't take too long to run either.

Good idea. I'll add that and do the changes above.

Longer term (i.e in #3384) I think a better approach would be to just ensure all created objects are reachable at least until the primitive returns. We've already got NewPyrObjectPtr there for new objects which catches cases where new objects are not consumed, but we could use a similar smart pointer to handle objects in primitives accessed from the stack or objects' slots. I think (well hope) the performance hit from this could be kept very small, and other than that all it would mean would be that in a tiny number of cases objects would be GC'd later than otherwise. In exchange it would remove a whole class of nasty errors and pitfalls, and allow a more familiar and intuitive style (objects remain valid least until they go out of scope) for primitive writing.

@muellmusik
Copy link
Contributor Author

Changes done.

@mossheim mossheim added the comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" label Dec 15, 2018
++g->sp; SetObject(g->sp, result); // push the result array on the stack, so both it and rec are reachable
... // further allocations which make use of the receiver to populate result
--g->sp; // pop the stack back to the receiver slot since we stored result there above
SetObject(rec, result); // now set the receiver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this supposed to be "model code":

  • make the spacing of * in types consistent -- generally codebase seems to prefer 1 space to the left, 0 to the right, i.e. how result is declared
  • avoid unnecessary abbreviation - receiver instead of rec
  • place SetObject(g->sp, result); on a new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Happy btw when you or anyone spots cosmetic things like this for you to just fix them in the online editor in cases where that would be faster than the discussion/agreement/commit/push cycle. There're enough big things to spend time on. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple reasons why i don't do that:

  • to be respectful i should make sure it's ok to push to your remote branch first in case you have changes locally. that takes nearly as much time as waiting for you to make the commit yourself.
  • some people (myself included) have an amend/rebase-oriented workflow, and adding commits via the web UI would complicate that.
  • i believe the incentive should be for contributors to review their own code for style. it saves the most time overall if a reviewer doesn't have to make a new commit or comment in the first place. i realize not everyone may be as attentive or interested in doing that, but it's also no fun if things fall into a pattern where some maintainers are on 'cleanup duty'.
  • while it has sometimes resulted in unproductive churn, i think these conversations are actually on the whole quite good at helping to develop and communicate informal community standards and expectations. at least for me, a lot of interesting ideas have come out of that kind of interaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see all that. But I also worry about the balance between strict and consistent style and the ways in which this can be off-putting to casual contributors, especially if they're more relaxed about such things. As we really do want more people to be resolving issues I think we have to bear that in mind. Sometimes just doing things if they're trivial cosmetic fixes is a way of showing good faith on that and keeping people on board.

For myself, I'll never object to someone pushing something like that. I assume PRs are for (amongst other things) shared work on an issue. If it's sort of git manners thing that's stopping us from doing that, I'd say let's not worry about that. If it's big you can always post a warning and discuss, but these are not big.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bottleneck lately actually seems to be from a lack of reviews, not a lack of PR submissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bottleneck lately actually seems to be from a lack of reviews, not a lack of PR submissions.

There remain 517 open issues, despite pruning. The issue this PR addresses was open for almost a year, and it's really serious. The oldest issues are going on 6 years old.

lang/LangPrimSource/PyrStringPrim.cpp Outdated Show resolved Hide resolved
});
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if this should really go in this file.. it is not testing Array so much as gc behavior. but we also don't have a convention yet (under the new unit testing guidelines) for naming files that don't test a class. my first thought would be to place it in a file/class Test_GC, underscore to indicate what follows is not a class name. not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it here for now. It's actually a bit of a weird test, as it's really targeting a particular regression. (Could happen to catch other things, but...)

What I think we should really do is make a functionality test for every primitive that does object creation (and maybe just most or all primitives), and then make some specific very targeted tests that check bits of GC functionality, perhaps with test primitives for that purpose. But I think that's more for something like #3384.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it here for now.

ok, could you leave a FIXME/TODO so the technical debt is noted then?

It's actually a bit of a weird test, as it's really targeting a particular regression

we have lots of these tests, i like them a lot actually, they're kind of like battle scars :)

What I think we should really do is make a functionality test for every primitive that does object creation (and maybe just most or all primitives), and then make some specific very targeted tests that check bits of GC functionality, perhaps with test primitives for that purpose.

sgtm, some of these tests should be at the C++ level if possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also if you are going to leave it here i have some code style comments:

  • tighten it up by declaring a and b inside the do loop (unless that change eliminates the failure prior to this change? that seems unlikely though) and removing extra empty lines (in general no need to have 2 empty lines in a row)
  • 1000.do {|i| to conform to our style guide
  • .collect { |lm| lm[0] } to conform to our style guide
  • since there is only 1 semantic thing tested here (that gc "works" in a non-regressing way) there should only be 1 assert. details of any failures can be added in the details: parameter of assertEquals. passing an onFailure: argument will cause test running to end immediately on a failure, which is probably not what you want based on the message. i would accumulate details on any failure during the loop (in string format) and then have one assert outside the loop at the end of the test, with the information you're posting in onFailure instead passed as details:.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • tighten it up by declaring a and b inside the do loop (unless that change eliminates the failure prior to this change? that seems unlikely though) and removing extra empty lines (in general no need to have 2 empty lines in a row)

Sorry, but no way. It is not totally clear to me that this won't affect the test, and given how quirky GC issues can be and how particular this is, if we're doing this at all I'd like to be sure we're testing what we thought we were. Providing you can prove conclusively that that won't make a difference, and the outer declaration really bugs you, go ahead, but for me that's a step too far into time wasting finickiness, sorry.

  • i would accumulate details on any failure during the loop (in string format) and then have one assert outside the loop at the end of the test, with the information you're posting in onFailure instead passed as details:.

I considered that, but the number of fails isn't relevant, so the end on first failure is actually intentional. The number of fails will vary a bit based on the state of the GC, but tells you nothing of use. Any fail means you've got the problem (or at least a problem).

The rest I'll do shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not trying to waste your time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there cannot be 1000 asserts made by this test. it will flood test runner output. please change it so only 1 assert is made. and please use pipes instead of arg on lines 175 and 176 as i noted in the earlier review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not trying to waste your time.

Sorry. Written in haste. I meant it would be a waste of my time, and frankly yours, to go to the trouble of proving that the test was not affected by this, not that your request was intended to waste my time.

please use pipes instead of arg on lines 175 and 176 as i noted in the earlier review.

Ah okay. This is a perfect example of why it would be easier just to make those small changes. You weren't explicit about what you wanted changed, and as I didn't compare it character by character, I noticed the bracket and changed that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github used to have a "suggest diff" feature that i hope they bring out of beta, because it's exactly what you're looking for. but i will not make commits directly to someone else's branch unless they request or OK that specific diff, personal preference.

@muellmusik
Copy link
Contributor Author

Thanks for the further review @brianlheim !

@mossheim
Copy link
Contributor

Thanks for your responsiveness and attentiveness to this PR @muellmusik . I had put off reviewing the test code until we decided where to put it but beyond that I think this PR is ready after those changes are made!

@muellmusik
Copy link
Contributor Author

I think this is all done now?

});
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there cannot be 1000 asserts made by this test. it will flood test runner output. please change it so only 1 assert is made. and please use pipes instead of arg on lines 175 and 176 as i noted in the earlier review.

Copy link
Contributor

@mossheim mossheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@muellmusik
Copy link
Contributor Author

Since we discussed it on Slack @brianlheim, would it make sense to force push a branch with the commits collapsed so as to avoid muddying blame with a bunch of minor cosmetic changes?

@mossheim
Copy link
Contributor

if you want; that's what i do these days but it's by no means necessary. btw if you use git commit --fixup or --squash you get a lot of that 'for free' with rebase

@muellmusik
Copy link
Contributor Author

Ah, thanks @brianlheim. I use squash, but didn't know about fixup and autosquash. Does github cope well with this sort of thing?

- fixes #3454 and I think #4179
- the problem was that setting the result on the stack allowed the receiver to be collected if it wasn't otherwise reachable. Solution is to push on the stack (allocating with runGC = false could also work)

Signed-off-by: Scott Wilson <i@scottwilson.ca>
…ceiver if it is still needed when storing on the stack

Signed-off-by: Scott Wilson <i@scottwilson.ca>
- avoid the possibility of the receiver being collected while we're still using it

Signed-off-by: Scott Wilson <i@scottwilson.ca>
…on the stack is okay

Signed-off-by: Scott Wilson <i@scottwilson.ca>
…stack is okay

Signed-off-by: Scott Wilson <i@scottwilson.ca>
… on the stack is okay

Signed-off-by: Scott Wilson <i@scottwilson.ca>
…er on the stack is okay

Signed-off-by: Scott Wilson <i@scottwilson.ca>
…er on the stack is okay

Signed-off-by: Scott Wilson <i@scottwilson.ca>
… receiver on the stack is okay

Signed-off-by: Scott Wilson <i@scottwilson.ca>
Signed-off-by: Scott Wilson <i@scottwilson.ca>
Signed-off-by: Scott Wilson <i@scottwilson.ca>
Signed-off-by: Scott Wilson <i@scottwilson.ca>
@muellmusik muellmusik force-pushed the topic/GC-fixes-post-3.10 branch from a29d9be to 8709c0b Compare December 17, 2018 18:07
@mossheim mossheim merged commit c6e6531 into develop Dec 18, 2018
@mossheim mossheim deleted the topic/GC-fixes-post-3.10 branch December 18, 2018 00:29
mtmccrea added a commit to ambisonictoolkit/atk-sc3 that referenced this pull request Jul 30, 2019
The workaround is no longer necessary as of supercollider/supercollider#4192
@mtmccrea
Copy link
Member

mtmccrea commented Jul 30, 2019

@brianlheim -
sorry if I'm missing something obvious, but did this make it into the 3.10 branch?
I see this hasn't made it to 3.10 branch, has this been overlooked or is there a reason it hasn't been cherry picked?

@nhthn
Copy link
Contributor

nhthn commented Jul 30, 2019

@mtmccrea it was deemed too risky to go into a 3.10.x patch release. i vaguely recall at least one bug having been discovered in it after it was merged, but i cant find it anymore

@mtmccrea
Copy link
Member

ok, thanks for the clarification @snappizz !

@nhthn
Copy link
Contributor

nhthn commented Jul 30, 2019

no problem. i apologize if the develop/3.10 decisions may seem a bit arbitrary -- i'll try to be better at communicating the exact reasons for those decisions when it isn't so obvious.

@mtmccrea
Copy link
Member

Thanks, that's helpful. Is it safe to assume it's now slated for 3.11?

i'll try to be better at communicating the exact reasons for those decisions when it isn't so obvious.

The new Release boards you've created might be a good place for this, if not just in the PR conversation itself.

I'll do some digging to see if I can find what the outstanding issue is with this...

@mtmccrea
Copy link
Member

Good memory :)
A bug cropped up here: #4239, but the fix is in: #4241, which has a 3.11 milestone, so I suppose this one should too.
Hope I'm not being overly persistent with this, but the outstanding bug has bitten me numerous times since I first spotted it in Jan '18 ;) But good that it's being fleshed out in the meantime!

@mtmccrea
Copy link
Member

Hi @brianlheim, can you confirm whether this will get a 3.11 milestone? There is some other work referencing this bug that would be good to have a clear idea of the timeline. Thanks!

joslloand added a commit to joslloand/MathLib that referenced this pull request Dec 23, 2019
Refactor to avoid .flop.flop, which if called on Array, will expose it to a bug that could fail on large arrays. Also, this is more efficient.

See: supercollider/supercollider#4192

Authors: @mtmccrea, with suggestions by @telephon
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library"
Projects
None yet
4 participants