From 7c76a5a70a3ad05cbc5000676e250d3df52690f0 Mon Sep 17 00:00:00 2001 From: Arseny Kapoulkine Date: Thu, 4 Nov 2021 17:27:56 -0700 Subject: [PATCH] Create profile.md Add profiler documentation. --- docs/_pages/profile.md | 61 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 docs/_pages/profile.md diff --git a/docs/_pages/profile.md b/docs/_pages/profile.md new file mode 100644 index 0000000..76f3605 --- /dev/null +++ b/docs/_pages/profile.md @@ -0,0 +1,61 @@ +--- +permalink: /profile +title: Profiling +toc: true +--- + +One of main goals of Luau is to enable high performance code. To help with that goal, we are relentlessly optimizing the compiler and runtime - but ultimately, performance of their +code is in developers' hands, and is a combination of good algorithm design and implementation that adheres to the strengths of the language. To help write efficient code, Luau +provides a built-in profiler that samples the execution of the program and outputs a profiler dump that can be converted to an interactive flamegraph. + +To run the profiler, make sure you have an optimized build of the intepreter (otherwise profiling results are going to be very skewed) and run it with `--profile` argument: + +``` +$ luau --profile tests/chess.lua +OK 8902 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 +OK 2039 r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 0 +OK 2812 8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 0 +OK 9467 r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1 +OK 1486 rnbq1k1r/pp1Pbppp/2p5/8/2B5/8/PPP1NnPP/RNBQK2R w KQ - 1 8 +OK 2079 r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10 +Profiler dump written to profile.out (total runtime 2.034 seconds, 20344 samples, 374 stacks) +GC: 0.378 seconds (18.58%), mark 46.80%, remark 3.33%, atomic 1.93%, sweepstring 6.77%, sweep 41.16% +``` + +The resulting `profile.out` file can be converted to an SVG file by running `perfgraph.py` script that is part of Luau repository: + +``` +$ python tools/perfgraph.py profile.out >profile.svg +``` + +This produces an SVG file that can be opened in a browser (the image below is clickable): + +[![profile.svg](/assets/images/chess-profile.svg)](/assets/images/chess-profile.svg) + +In a flame graph visualization, the individual bars represent function calls, the width represents how much of the total program runtime they execute, and the deepness of the +red color represents how expensive they are based on their speed and how many times they are called: Dark Red bars are often the best targets for optimization, followed by +Bright Red, Dark Orange, Light Orange, Dark Yellow, Light Yellow. This is a fantastic visualization technique that allows you to hone in on the specific bottlenecks affecting +your program performance, optimize those exact bottlenecks, and then re-generate the profile data and visualizer, and look for the next set of true bottlenecks (if any). + +Hovering your mouse cursor over individual sections will display detailed function information in the status bar and in a tooltip. If you want to Search for a specific named +function, use the Search field in the upper right, or press Ctrl+F. + +Notice that some of the bars in the screenshot don't have any text. In some cases, there isn't enough room in the size of the bar to display the name. +You can hover your mouse over those bars to see the name and source location of the function in the tool tip, or double-click to zoom in on that part of the flame graph. + +Some tooltips will have a source location for the function you're hovering over, but no name. Those are anonymous functions, or functions that were not declared in a way that +allows Luau compiler to track the name. To fill in more names, you may want to make these changes to your code: + +`local myFunc = function() --[[ work ]] end` -> `local function myFunc() --[[ work ]] end` + +Even without these changes, you can hover over a given bar with no visible name and see it's source location. + +As any sampling profiler, this profiler relies on gathering enough information for the resulting output to be statistically meaningful. It may miss short functions if they +aren't called often enough. By default the profiler runs at 10 kHz, this can be customized by passing a different parameter to `--profile=`. Note that higher +frequencies result in higher profiling overhead and longer program execution, potentially skewing the results. + +This profiler doesn't track leaf C functions and instead attributes the time spent there to calling Luau functions. As a result, when thinking about why a given function is +slow, consider not just the work it does immediately but also the library functions it calls. + +This profiler tracks time consumed by Luau thread stacks; when a thread calls another thread via `coroutine.resume`, the time spent is not attributed to the parent thread that's +waiting for resume results. This limitation will be removed in the future in the future.