Topic/gc fixes post 3.10 #4192

muellmusik · 2018-12-01T16:45:05Z

Purpose and Motivation

This PR primarily addresses a class of GC issues previously not widely noted. The pattern is (within one primitive):

get the receiver from the stack
create a new result object
set the result object on the stack
create new objects while continuing to access the receiver

As the receiver is not on the stack anymore, it may be collected when new objects are created in 4., before we are finished using it, if it is not otherwise reachable.

This fixes #3454, fixes #4179.

In addition:

I did a code review of and fixed two other primitives that could have been subject to this error
I updated the Writing Primitives help file to explain this issue and how to avoid it
I added comments in appropriate places in other primitives explaining why this was not a problem in those cases, in the hopes that this will avoid people reusing that code without understanding (this has been a problem in the past)
I fixed one other GC related issue.

Types of changes

Documentation (non-code change which corrects or adds documentation for existing features)
Bug fix (non-breaking change which fixes an issue)

Checklist

All previous tests are passing
Tests were updated or created to address changes in this PR, and tests are passing
Updated documentation, if necessary
This PR is ready for review

mtmccrea · 2018-12-01T22:19:13Z

This is great, thank you @muellmusik!! It's been an outstanding issue for a while and will really help some underlying issues with the ATK (flopping large matrices).

You mentioned that this might fix #4179, did you have a chance to test the example posted there (case 1)?

muellmusik · 2018-12-02T13:39:23Z

You mentioned that this might fix #4179, did you have a chance to test the example posted there (case 1)?

Yes, I get "Made it! Lucky you."

But you should test as well.

mossheim

Thanks @muellmusik ! The logic on this makes sense to me for the most part, and it fixes both the #3454 and #4179 examples for me.

lang/LangPrimSource/PyrListPrim.cpp

lang/LangPrimSource/PyrStringPrim.cpp

lang/LangPrimSource/SC_CoreAudioPrim.cpp

lang/LangPrimSource/PyrStringPrim.cpp

lang/LangPrimSource/PyrListPrim.cpp

HelpSource/Guides/WritingPrimitives.schelp

lang/LangPrimSource/PyrStringPrim.cpp

mossheim · 2018-12-14T00:26:31Z

What do you think of adding a sclang-side test for this? Both of the examples from the two issues linked seem decent candidates, they don't take too long to run either.

muellmusik · 2018-12-14T14:28:48Z

What do you think of adding a sclang-side test for this? Both of the examples from the two issues linked seem decent candidates, they don't take too long to run either.

Good idea. I'll add that and do the changes above.

Longer term (i.e in #3384) I think a better approach would be to just ensure all created objects are reachable at least until the primitive returns. We've already got NewPyrObjectPtr there for new objects which catches cases where new objects are not consumed, but we could use a similar smart pointer to handle objects in primitives accessed from the stack or objects' slots. I think (well hope) the performance hit from this could be kept very small, and other than that all it would mean would be that in a tiny number of cases objects would be GC'd later than otherwise. In exchange it would remove a whole class of nasty errors and pitfalls, and allow a more familiar and intuitive style (objects remain valid least until they go out of scope) for primitive writing.

muellmusik · 2018-12-14T15:17:43Z

Changes done.

mossheim · 2018-12-15T05:42:53Z

HelpSource/Guides/WritingPrimitives.schelp

++g->sp; SetObject(g->sp, result); // push the result array on the stack, so both it and rec are reachable
+... // further allocations which make use of the receiver to populate result
+--g->sp; // pop the stack back to the receiver slot since we stored result there above
+SetObject(rec, result); // now set the receiver


since this supposed to be "model code":

make the spacing of * in types consistent -- generally codebase seems to prefer 1 space to the left, 0 to the right, i.e. how result is declared

avoid unnecessary abbreviation - receiver instead of rec

place SetObject(g->sp, result); on a new line

Done. Happy btw when you or anyone spots cosmetic things like this for you to just fix them in the online editor in cases where that would be faster than the discussion/agreement/commit/push cycle. There're enough big things to spend time on. :-)

a couple reasons why i don't do that:

to be respectful i should make sure it's ok to push to your remote branch first in case you have changes locally. that takes nearly as much time as waiting for you to make the commit yourself.

some people (myself included) have an amend/rebase-oriented workflow, and adding commits via the web UI would complicate that.

i believe the incentive should be for contributors to review their own code for style. it saves the most time overall if a reviewer doesn't have to make a new commit or comment in the first place. i realize not everyone may be as attentive or interested in doing that, but it's also no fun if things fall into a pattern where some maintainers are on 'cleanup duty'.

while it has sometimes resulted in unproductive churn, i think these conversations are actually on the whole quite good at helping to develop and communicate informal community standards and expectations. at least for me, a lot of interesting ideas have come out of that kind of interaction.

I can see all that. But I also worry about the balance between strict and consistent style and the ways in which this can be off-putting to casual contributors, especially if they're more relaxed about such things. As we really do want more people to be resolving issues I think we have to bear that in mind. Sometimes just doing things if they're trivial cosmetic fixes is a way of showing good faith on that and keeping people on board.

For myself, I'll never object to someone pushing something like that. I assume PRs are for (amongst other things) shared work on an issue. If it's sort of git manners thing that's stopping us from doing that, I'd say let's not worry about that. If it's big you can always post a warning and discuss, but these are not big.

the bottleneck lately actually seems to be from a lack of reviews, not a lack of PR submissions.

the bottleneck lately actually seems to be from a lack of reviews, not a lack of PR submissions.

There remain 517 open issues, despite pruning. The issue this PR addresses was open for almost a year, and it's really serious. The oldest issues are going on 6 years old.

lang/LangPrimSource/PyrStringPrim.cpp

mossheim · 2018-12-15T05:51:47Z

testsuite/classlibrary/TestArray.sc

+			});
+		}
+	}
+


i wonder if this should really go in this file.. it is not testing Array so much as gc behavior. but we also don't have a convention yet (under the new unit testing guidelines) for naming files that don't test a class. my first thought would be to place it in a file/class Test_GC, underscore to indicate what follows is not a class name. not sure.

Let's leave it here for now. It's actually a bit of a weird test, as it's really targeting a particular regression. (Could happen to catch other things, but...)

What I think we should really do is make a functionality test for every primitive that does object creation (and maybe just most or all primitives), and then make some specific very targeted tests that check bits of GC functionality, perhaps with test primitives for that purpose. But I think that's more for something like #3384.

Let's leave it here for now.

ok, could you leave a FIXME/TODO so the technical debt is noted then?

It's actually a bit of a weird test, as it's really targeting a particular regression

we have lots of these tests, i like them a lot actually, they're kind of like battle scars :)

What I think we should really do is make a functionality test for every primitive that does object creation (and maybe just most or all primitives), and then make some specific very targeted tests that check bits of GC functionality, perhaps with test primitives for that purpose.

sgtm, some of these tests should be at the C++ level if possible

also if you are going to leave it here i have some code style comments:

tighten it up by declaring a and b inside the do loop (unless that change eliminates the failure prior to this change? that seems unlikely though) and removing extra empty lines (in general no need to have 2 empty lines in a row)

1000.do {|i| to conform to our style guide

.collect { |lm| lm[0] } to conform to our style guide

since there is only 1 semantic thing tested here (that gc "works" in a non-regressing way) there should only be 1 assert. details of any failures can be added in the details: parameter of assertEquals. passing an onFailure: argument will cause test running to end immediately on a failure, which is probably not what you want based on the message. i would accumulate details on any failure during the loop (in string format) and then have one assert outside the loop at the end of the test, with the information you're posting in onFailure instead passed as details:.

tighten it up by declaring a and b inside the do loop (unless that change eliminates the failure prior to this change? that seems unlikely though) and removing extra empty lines (in general no need to have 2 empty lines in a row)

Sorry, but no way. It is not totally clear to me that this won't affect the test, and given how quirky GC issues can be and how particular this is, if we're doing this at all I'd like to be sure we're testing what we thought we were. Providing you can prove conclusively that that won't make a difference, and the outer declaration really bugs you, go ahead, but for me that's a step too far into time wasting finickiness, sorry.

i would accumulate details on any failure during the loop (in string format) and then have one assert outside the loop at the end of the test, with the information you're posting in onFailure instead passed as details:.

I considered that, but the number of fails isn't relevant, so the end on first failure is actually intentional. The number of fails will vary a bit based on the state of the GC, but tells you nothing of use. Any fail means you've got the problem (or at least a problem).

The rest I'll do shortly.

i'm not trying to waste your time.

there cannot be 1000 asserts made by this test. it will flood test runner output. please change it so only 1 assert is made. and please use pipes instead of arg on lines 175 and 176 as i noted in the earlier review.

i'm not trying to waste your time.

Sorry. Written in haste. I meant it would be a waste of my time, and frankly yours, to go to the trouble of proving that the test was not affected by this, not that your request was intended to waste my time.

please use pipes instead of arg on lines 175 and 176 as i noted in the earlier review.

Ah okay. This is a perfect example of why it would be easier just to make those small changes. You weren't explicit about what you wanted changed, and as I didn't compare it character by character, I noticed the bracket and changed that.

github used to have a "suggest diff" feature that i hope they bring out of beta, because it's exactly what you're looking for. but i will not make commits directly to someone else's branch unless they request or OK that specific diff, personal preference.

muellmusik · 2018-12-15T11:10:58Z

Thanks for the further review @brianlheim !

mossheim · 2018-12-16T05:13:48Z

Thanks for your responsiveness and attentiveness to this PR @muellmusik . I had put off reviewing the test code until we decided where to put it but beyond that I think this PR is ready after those changes are made!

muellmusik · 2018-12-16T14:00:49Z

I think this is all done now?

mossheim · 2018-12-16T15:09:35Z

testsuite/classlibrary/TestArray.sc

+			});
+		}
+	}
+


there cannot be 1000 asserts made by this test. it will flood test runner output. please change it so only 1 assert is made. and please use pipes instead of arg on lines 175 and 176 as i noted in the earlier review.

mossheim

thanks!

muellmusik · 2018-12-16T20:14:04Z

Since we discussed it on Slack @brianlheim, would it make sense to force push a branch with the commits collapsed so as to avoid muddying blame with a bunch of minor cosmetic changes?

mossheim · 2018-12-16T22:40:25Z

if you want; that's what i do these days but it's by no means necessary. btw if you use git commit --fixup or --squash you get a lot of that 'for free' with rebase

muellmusik · 2018-12-17T10:04:05Z

Ah, thanks @brianlheim. I use squash, but didn't know about fixup and autosquash. Does github cope well with this sort of thing?

- fixes #3454 and I think #4179 - the problem was that setting the result on the stack allowed the receiver to be collected if it wasn't otherwise reachable. Solution is to push on the stack (allocating with runGC = false could also work) Signed-off-by: Scott Wilson <i@scottwilson.ca>

…ceiver if it is still needed when storing on the stack Signed-off-by: Scott Wilson <i@scottwilson.ca>

- avoid the possibility of the receiver being collected while we're still using it Signed-off-by: Scott Wilson <i@scottwilson.ca>

…on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

…stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

… on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

…er on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

… receiver on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Signed-off-by: Scott Wilson <i@scottwilson.ca>

The workaround is no longer necessary as of supercollider/supercollider#4192

mtmccrea · 2019-07-30T02:03:37Z

@brianlheim -
~~sorry if I'm missing something obvious, but did this make it into the 3.10 branch?~~
I see this hasn't made it to 3.10 branch, has this been overlooked or is there a reason it hasn't been cherry picked?

nhthn · 2019-07-30T02:37:58Z

@mtmccrea it was deemed too risky to go into a 3.10.x patch release. i vaguely recall at least one bug having been discovered in it after it was merged, but i cant find it anymore

mtmccrea · 2019-07-30T03:53:48Z

ok, thanks for the clarification @snappizz !

nhthn · 2019-07-30T04:38:36Z

no problem. i apologize if the develop/3.10 decisions may seem a bit arbitrary -- i'll try to be better at communicating the exact reasons for those decisions when it isn't so obvious.

mtmccrea · 2019-07-30T04:54:47Z

Thanks, that's helpful. Is it safe to assume it's now slated for 3.11?

i'll try to be better at communicating the exact reasons for those decisions when it isn't so obvious.

The new Release boards you've created might be a good place for this, if not just in the PR conversation itself.

I'll do some digging to see if I can find what the outstanding issue is with this...

mtmccrea · 2019-07-30T05:15:07Z

Good memory :)
A bug cropped up here: #4239, but the fix is in: #4241, which has a 3.11 milestone, so I suppose this one should too.
Hope I'm not being overly persistent with this, but the outstanding bug has bitten me numerous times since I first spotted it in Jan '18 ;) But good that it's being fleshed out in the meantime!

mtmccrea · 2019-12-23T13:30:18Z

Hi @brianlheim, can you confirm whether this will get a 3.11 milestone? There is some other work referencing this bug that would be good to have a clear idea of the timeline. Thanks!

@mtmccrea

Refactor to avoid .flop.flop, which if called on Array, will expose it to a bug that could fail on large arrays. Also, this is more efficient. See: supercollider/supercollider#4192 Authors: @mtmccrea, with suggestions by @telephon

muellmusik mentioned this pull request Dec 1, 2018

Refactor new object allocation #3384

Closed

mossheim suggested changes Dec 14, 2018

View reviewed changes

mossheim added the comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" label Dec 15, 2018

mossheim reviewed Dec 15, 2018

View reviewed changes

mossheim suggested changes Dec 16, 2018

View reviewed changes

mossheim approved these changes Dec 16, 2018

View reviewed changes

muellmusik added 12 commits December 17, 2018 18:02

Update Writing Primitives help with info about not overwriting the re…

2b70184

…ceiver if it is still needed when storing on the stack Signed-off-by: Scott Wilson <i@scottwilson.ca>

GC fixes to prString_FindRegexpAt and prString_FindRegexpAt

4a72983

- avoid the possibility of the receiver being collected while we're still using it Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add comment to prString_PathMatch about why overwriting the receiver …

5e1e26e

…on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add comment to listDevices about why overwriting the receiver on the …

caf84f2

…stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add comment to prListMIDIEndpoints about why overwriting the receiver…

8a226f3

… on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add comment to prHID_API_BuildDeviceList about why overwriting receiv…

2184a68

…er on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add comment to SC_TerminalClient::prArgv about why overwriting receiv…

2fe57b9

…er on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add comment to prLanguageConfig_getLibraryPaths about why overwriting…

b988855

… receiver on the stack is okay Signed-off-by: Scott Wilson <i@scottwilson.ca>

Fix comment in prString_PathMatch to match elsewhere in code base

5a746db

Signed-off-by: Scott Wilson <i@scottwilson.ca>

Fix comment in Core Audio listDevices to matche elsewhere in code base

049fdd5

Signed-off-by: Scott Wilson <i@scottwilson.ca>

Add Unit test for Array:flop GC issue

8709c0b

Signed-off-by: Scott Wilson <i@scottwilson.ca>

muellmusik force-pushed the topic/GC-fixes-post-3.10 branch from a29d9be to 8709c0b Compare December 17, 2018 18:07

mossheim merged commit c6e6531 into develop Dec 18, 2018

mossheim deleted the topic/GC-fixes-post-3.10 branch December 18, 2018 00:29

mossheim mentioned this pull request Jan 9, 2019

[develop] String:findRegexp handles empty results incorrectly #4239

Closed

joshpar mentioned this pull request Jul 16, 2019

Fix build with boost-1.70 #4491

Closed

mtmccrea added a commit to ambisonictoolkit/atk-sc3 that referenced this pull request Jul 30, 2019

[HoaDegree:-lm] Restore array flop

f47e324

The workaround is no longer necessary as of supercollider/supercollider#4192

mtmccrea mentioned this pull request Dec 23, 2019

Refactor: Matrix: *with supercollider-quarks/MathLib#25

Merged

Topic/gc fixes post 3.10 #4192

Topic/gc fixes post 3.10 #4192

Conversation

muellmusik commented Dec 1, 2018 • edited by mossheim Loading

Purpose and Motivation

Types of changes

Checklist

mtmccrea commented Dec 1, 2018

muellmusik commented Dec 2, 2018

mossheim left a comment

Choose a reason for hiding this comment

mossheim commented Dec 14, 2018

muellmusik commented Dec 14, 2018

muellmusik commented Dec 14, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

muellmusik commented Dec 15, 2018

mossheim commented Dec 16, 2018

muellmusik commented Dec 16, 2018

Choose a reason for hiding this comment

mossheim left a comment

Choose a reason for hiding this comment

muellmusik commented Dec 16, 2018

mossheim commented Dec 16, 2018

muellmusik commented Dec 17, 2018

mtmccrea commented Jul 30, 2019 • edited Loading

nhthn commented Jul 30, 2019

mtmccrea commented Jul 30, 2019

nhthn commented Jul 30, 2019

mtmccrea commented Jul 30, 2019

mtmccrea commented Jul 30, 2019

mtmccrea commented Dec 23, 2019

muellmusik commented Dec 1, 2018 •

edited by mossheim

Loading

mtmccrea commented Jul 30, 2019 •

edited

Loading