Update performance.md (#94)

Add documentation on shorter atomic pauses Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>
2021-11-01 12:08:01 -07:00 · 2021-11-01 12:08:01 -07:00 · 33cb9d5991
parent e562596bb8
commit 33cb9d5991
1 changed files with 8 additions and 0 deletions
--- a/docs/_pages/performance.md
+++ b/docs/_pages/performance.md
@ -153,3 +153,11 @@ For example, functions like `insert`, `remove` and `move` from the `table` libra
 Luau uses an incremental garbage collector which does a little bit of work every so often, and at no point does it stop the world to traverse the entire heap. The runtime will make sure that the collector runs interspersed with the program execution as the program allocates additional memory, which is known as "garbage collection assists", and can also run in response to explicit garbage collection invocation via `lua_gc`. In interactive environments such as video game engines it's possible, and even desirable, to request garbage collection every frame to make sure assists are minimized, since that allows scheduling the garbage collection to run concurrently with other engine processing that doesn't involve script execution.

 Inspired by excellent work by Austin Clements on Go's garbage collector pacer, we've implemented a pacing algorithm that uses a proportional–integral–derivative controller to estimate internal garbage collector tunables to reach a target heap size, defined as a percentage of the live heap data (which is more intuitive and actionable than Lua 5.x "GC pause" setting). Luau runtime also estimates the allocation rate making it easy (given uniform allocation rates) to adjust the per-frame garbage collection requests to do most of the required GC work outside of script execution.
+
+## Reduced garbage collector pauses
+
+While Luau uses an incremental garbage collector, once per each collector cycle it runs a so-called "atomic" step. While all other GC steps can do very little work by only looking at a few objects at a given time, which means that the collector can have arbitrarily short pauses, the "atomic" step needs to traverse some amount of data that, in some cases, may scale with the application heap. Since atomic step is indivisible, it can result in occasional pauses on the order of tens of milliseconds, which is problematic for interactive applications. We've implemented a series of optimizations to help reduce the atomic step.
+
+Normally objects that have been modified after the GC marked them in an incremental mark phase need to be rescanned during atomic phase, so frequent modifications of existing tables may result in a slow atomic step. To address this, we run a "remark" step where we traverse objects that have been modified after being marked once more (incrementally); additionally, the write barrier that triggers for object modifications changes the transition logic during remark phase to reduce the probability that the object will need to be rescanned.
+
+Another source of scalability challenges is coroutines. Writes to coroutine stacks don't use a write barrier, since that's prohibitively expensive as they are too frequent. This means that coroutine stacks need to be traversed during atomic step, so applications with many coroutines suffer large atomic pauses. To address this, we implement incremental marking of coroutines: marking a coroutine makes it "inactive" and resuming a coroutine (or pushing extra objects on the coroutine stack via C API) makes it "active". Atomic step only needs to traverse active coroutines again, which reduces the cost of atomic step by effectively making coroutine collection incremental as well.