Add internal use only virtual machine blocks #2147
Closed
Resolves
Performance switching between blocks in Sequencer.stepThread and execute.
Proposed Changes
Depends on #2145.
Related to #2148.
Reason for Changes
This is where this whole change set is heading, so let's start here. Back in June/July we changed reporter execution: we build a set of all reporters and their final command block (optional for stack clicks) and execute that set in a for loop instead of walking the reporters recursively. This change applies the same idea to command blocks.
With this, a command block and each following block build a set of _ops that we can use for promise thawing, and an _allOps set inlining all of the _ops of the block and each next block. When the conditions are met, this set of block operations can be performed in one for loop. If the conditions fail, we safely fall back to the do-while loop containing the for loop. If we still fail there, we can fall back to Sequencer.stepThread and ultimately Sequencer.stepThreads. In most cases we should be able to stay in execute and quickly execute a thread.
To make this possible, some of the virtual machine behaviour needs to be blocks. We can determine the block functions to perform ahead of time and execute them without having to passively test for when to do that virtual machine work.
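As a rough illustration of the flattening described above, here is a minimal sketch. Only the _ops and _allOps names come from this description; the block records, makeOp, and compileStack are hypothetical stand-ins, not the code in this change set.

```javascript
// Hypothetical block records; only the _ops/_allOps names come from the text above.
function makeOp(name) {
    return {name, run: log => log.push(name)};
}

// Build the cached sets from the last block to the first, so each block's
// _allOps is its own _ops followed by the _allOps of the block after it.
function compileStack(blocks) {
    let following = [];
    for (let i = blocks.length - 1; i >= 0; i--) {
        const block = blocks[i];
        // Reporters feeding the command run first, then the command itself.
        block._ops = [...block.reporters.map(makeOp), makeOp(block.opcode)];
        block._allOps = block._ops.concat(following);
        following = block._allOps;
    }
    return blocks;
}

const stack = compileStack([
    {opcode: 'motion_movesteps', reporters: ['operator_add']},
    {opcode: 'looks_say', reporters: []}
]);

// One flat loop runs the whole stack starting from the first block.
const log = [];
const allOps = stack[0]._allOps;
for (let i = 0; i < allOps.length; i++) {
    allOps[i].run(log);
}
console.log(log.join(' -> '));
```

Keeping the per-block _ops alongside the flattened _allOps means a thread resuming at a later block can still start from that block's own set.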
Effectively at this point execute uses this and the block operation sets to compile the current block state for a target on demand (or just-in-time). And with the compiled sets in the blocks cache, they will continue to be disposed of when a block is changed on the target.
To support fall back from the inner execution loops, we make use of Thread.status and a new value Thread.STATUS_INTERRUPT. We already need to test status in case it becomes YIELD, YIELD_TICK, PROMISE_WAIT, or DONE. Using that fact we can use INTERRUPT in the internal use only blocks to signal the inner loop to break and return to outer code paths.
Using INTERRUPT, internal blocks can tell execute to reconsider which block operation set to run, for example when another block starts a new branch. The branch blocks are not in the current execution set, so execute should break out of the inner loop and retrieve the new block operation set to execute.
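The interplay between the inner for loop, the outer do-while loop, and INTERRUPT could look roughly like the sketch below. The numeric status values, the thread shape (opSets, current), and executeSketch are assumptions made for illustration; only the status names come from this description.

```javascript
// Assumed status values; STATUS_INTERRUPT is the new value this change describes.
const STATUS_RUNNING = 0;
const STATUS_YIELD = 2;
const STATUS_INTERRUPT = 5;

function executeSketch(thread) {
    do {
        // Clear a previous interrupt and fetch the set for the current position.
        thread.status = STATUS_RUNNING;
        const ops = thread.opSets[thread.current];
        for (let i = 0; i < ops.length; i++) {
            ops[i](thread);
            // The same test that catches YIELD or DONE also catches INTERRUPT.
            if (thread.status !== STATUS_RUNNING) break;
        }
    } while (thread.status === STATUS_INTERRUPT);
}

const trace = [];
const thread = {
    status: STATUS_RUNNING,
    current: 'top',
    opSets: {
        top: [
            t => { trace.push('if'); },
            // Hypothetical internal block: step into the branch and interrupt.
            t => { t.current = 'branch'; t.status = STATUS_INTERRUPT; },
            t => { trace.push('skipped'); }
        ],
        branch: [
            t => { trace.push('say'); },
            t => { t.status = STATUS_YIELD; }
        ]
    }
};

executeSketch(thread);
console.log(trace.join(' -> '));
```

Because the inner loop already has to check the status after every operation, INTERRUPT adds no extra test to the hot path in this sketch.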
One more new member and value works with Thread.status to control the execute loops. Thread.continuous indicates whether a thread should continue to execute after a command block or operate one block at a time. The default is false, for one at a time. Sequencer will temporarily set it to true to enable the fast execute loop.
A bound function takes two function calls: the produced bound function calls the original function. A non-bound function called with Function.call is one function call. Saving one function call can improve performance in the inner execution loop, since many reporters (operator_*) are fairly cheap.
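The two calling styles being compared are shown below in a small sketch. The block object and its multiply function are made up for illustration; the example only shows that both styles produce the same result, and makes no measurement of the cost difference, which depends on the JavaScript engine.

```javascript
const block = {
    factor: 3,
    multiply(args) {
        return args.VALUE * this.factor;
    }
};

// Bound style: calling `bound` goes through the wrapper produced by bind,
// which in turn calls multiply with `this` set to block.
const bound = block.multiply.bind(block);

// Unbound style: keep the function and its receiver separately and
// supply the receiver at the call site with Function.call.
const fn = block.multiply;
const receiver = block;

const viaBind = bound({VALUE: 2});
const viaCall = fn.call(receiver, {VALUE: 2});
console.log(viaBind, viaCall);
```

The unbound style requires the caller to store the receiver next to the function, which is a small bookkeeping cost in exchange for the saved call.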
We can move the branching for popping stacks from Sequencer and execute into virtual machine blocks. We can determine the blocks to be used ahead of time when a stack frame is pushed.
This lets us reduce the passive branch tests related to stack popping. I think popping with virtual machine blocks will follow easily as it mirrors the blocks that push the stack frames.
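One way to picture choosing the pop behaviour when the frame is pushed is sketched below. The frame shape, the popOps table, and the function names are all hypothetical; they are not the virtual machine blocks added in this change set.

```javascript
// Hypothetical pop behaviours, for illustration only.
const popOps = {
    loop: thread => {
        thread.stack.pop();
        thread.log.push('re-run loop block');
    },
    procedure: thread => {
        thread.stack.pop();
        thread.log.push('return to caller');
    }
};

function pushFrame(thread, blockId, kind) {
    // The pop behaviour is chosen here, once, instead of being
    // re-tested every time a stack ends.
    thread.stack.push({blockId, endOp: popOps[kind]});
}

function endOfStack(thread) {
    // No branching on frame kind: just run whatever was planned at push time.
    const frame = thread.stack[thread.stack.length - 1];
    frame.endOp(thread);
}

const thread = {stack: [], log: []};
pushFrame(thread, 'callBlock', 'procedure');
pushFrame(thread, 'repeatBlock', 'loop');
endOfStack(thread);
endOfStack(thread);
console.log(thread.log.join(' -> '));
```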
Profiling itself takes time away from executing more blocks. For internal behaviour, I think skipping profiling of these blocks will help us make normal blocks perform better.
BROADCAST_INPUT block inputs behave differently when they store their value. This is currently supported by testing, for every block, whether its output will be stored as BROADCAST_INPUT. By making the needed behaviour (casting to a string) a block behaviour, we can decide ahead of time to run it after the block that produces the value. As such we'll remove a branching point from our inner loop.
This also lets us remove BROADCAST_INPUT special case logic when freezing and thawing thread execution for promises.
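A minimal sketch of turning the per-block test into a planned operation follows. compileInput and runInput are hypothetical helpers, and the cast is reduced to a plain String() call; the real change may store the value differently.

```javascript
// Hypothetical compile step: names here are illustrative only.
function compileInput(reporterFn, inputName) {
    const ops = [reporterFn];
    if (inputName === 'BROADCAST_INPUT') {
        // Decided once at compile time: follow the reporter with a cast op.
        ops.push(value => String(value));
    }
    return ops;
}

// The run loop has no BROADCAST_INPUT test; it just runs the planned ops.
function runInput(ops) {
    let value;
    for (let i = 0; i < ops.length; i++) {
        value = ops[i](value);
    }
    return value;
}

const broadcastValue = runInput(compileInput(() => 42, 'BROADCAST_INPUT'));
const numberValue = runInput(compileInput(() => 42, 'NUM'));
console.log(typeof broadcastValue, typeof numberValue);
```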
Like BROADCAST_INPUT, we currently test every reporter and command block to see if it is the lastOperation, which may need to report to one of these behaviours. Then, for command blocks (for which lastOperation is always true) or some reporters (for which lastOperation may be true if they are returning a stack click or monitor value), we test the possible ways they need to report.
We can determine when these behaviours are needed ahead of time and turn the passive branching into planned actions. Hats can be determined when the operation is cached. Stack click reports and monitors can be determined when the thread is created and be set as the end block for the thread.
Stack click is a bit weird in this, though. This version tries to report the topBlock whether or not the last block executed in the thread was the topBlock. This likely isn't an issue, as the case where we report is stack clicking a reporter. That'll still function here. The weird part is stack clicking on command blocks or a series of command blocks. We'll currently try to report the topBlock's value (which could be wrong if there was any non-running status), but this probably won't be an issue either, as command blocks only return promises that resolve to undefined, or undefined directly. And hats, which do return values, are explicitly excluded in this change set.
Most of the thread from-a-promise-reentry logic is moved into a block. Most of this is the existing thread thawing behaviour with an added twist that temporarily modifies the operations set and sets continuous to false.
If the promise was for a command block we'll have the same thread popping behaviour. Just later when the thread would normally be executed. This is thanks to the vm_end_of_ virtual blocks. Those let us represent the popping behaviour in one place and promise reentry doesn't need to know it.
This is a small change. It's the same block we already have, but in handlePromise itself.
The inner execute loop at this point is very small. Which lets us execute blocks faster.
The remaining bit is profiling ... I want to move that, but the best case would be to split the execute function and call the right one from a newly exported execute function, or call the right inner function from sequencer.