Merge branch 'master' into merge

This commit is contained in:
Arseny Kapoulkine 2022-07-28 20:45:57 -07:00
commit efcaf05010
4 changed files with 157 additions and 5 deletions

View File

@ -3640,6 +3640,9 @@ void TypeChecker::checkFunctionBody(const ScopePtr& scope, TypeId ty, const AstE
reportError(getEndLocation(function), FunctionExitsWithoutReturning{funTy->retTypes});
}
}
if (!currentModule->astTypes.find(&function))
currentModule->astTypes[&function] = ty;
}
else
ice("Checking non functional type");

View File

@ -27,9 +27,9 @@ Of course the interpreter isn't typical C code - it uses many tricks to achieve
Unlike Lua and LuaJIT, Luau uses a multi-pass compiler with a frontend that parses source into an AST and a backend that generates bytecode from it. This carries a small penalty in terms of compilation time, but results in more flexible code and, crucially, makes it easier to optimize the generated bytecode.
> Note: Compilation throughput isn't the main focus in Luau, but our compiler is reasonably fast; with all currently implemented optimizations enabled, it compiles 400K lines of Luau code in 0.5 seconds on a single core of a desktop Core i7 CPU, producing bytecode and debug information.
> Note: Compilation throughput isn't the main focus in Luau, but our compiler is reasonably fast; with all currently implemented optimizations enabled, it compiles 950K lines of Luau code in 1 second on a single core of a desktop Ryzen 5900X CPU, producing bytecode and debug information.
While bytecode optimizations are limited due to the flexibility of Luau code (e.g. `a * 1` may not be equivalent to `a` if `*` is overloaded through metatables), even in absence of type information Luau compiler can perform some optimizations such as "deep" constant folding across functions and local variables, perform upvalue optimizations for upvalues that aren't mutated, do analysis of builtin function usage, and some peephole optimizations on the resulting bytecode. In the future we plan to do bytecode-level inlining and possibly other code transformation.
While bytecode optimizations are limited due to the flexibility of Luau code (e.g. `a * 1` may not be equivalent to `a` if `*` is overloaded through metatables), even in absence of type information Luau compiler can perform some optimizations such as "deep" constant folding across functions and local variables, perform upvalue optimizations for upvalues that aren't mutated, do analysis of builtin function usage, and some peephole optimizations on the resulting bytecode. The compiler can also be instructed to use more aggressive optimizations by enabling optimization level 2 (`-O2` in CLI tools), some of which are documented further on this page.
Luau compiler currently doesn't use type information to do further optimizations, however early experiments suggest that we can extract further wins. Because we control the entire stack (unlike e.g. TypeScript where the type information is discarded completely before reaching the VM), we have more flexibility there and can make some tradeoffs during codegen even if the type system isn't completely sound. For example, it might be reasonable to assume that in presence of known types, we can infer absence of side effects for arithmetic operations and builtins - if the runtime types mismatch due to intentional violation of the type safety through global injection, the code will still be safely sandboxed; this may unlock optimizations such as common subexpression elimination and allocation hoisting without a JIT. This is speculative pending further research.
@ -90,6 +90,8 @@ As a result, builtin calls are very fast in Luau - they are still slightly slowe
> Note: The partial specialization mechanism is cute in that for `assert`, it only specializes on truthy conditions; hopefully performance of `assert(false)` isn't crucial for most code!
In addition to runtime optimizations for builtin calls, many builtin calls can also be constant-folded by the bytecode compiler when using aggressive optimizations (level 2); this currently applies to most builtin calls with constant arguments and a single return value.
## Optimized table iteration
Luau implements a fully generic iteration protocol; however, for iteration through tables in addition to generalized iteration (`for .. in t`) it recognizes three common idioms (`for .. in ipairs(t)`, `for .. in pairs(t)` and `for .. in next, t`) and emits specialized bytecode that is carefully optimized using custom internal iterators.

View File

@ -0,0 +1,129 @@
# Disallow `name T` and `name(T)` in future syntactic extensions for type annotations
## Summary
We propose to disallow the syntax ``<name> `('`` as well as `<name> <type>` in future syntax extensions for type annotations to ensure that all existing programs continue to parse correctly. This still keeps the door open for future syntax extensions of different forms such as ``<name> `<' <type> `>'``.
## Motivation
Lua and by extension Luau's syntax is very free form, which means that when the parser finishes parsing a node, it doesn't try to look for a semi-colon or any termination token e.g. a `{` to start a block, or `;` to end a statement, or a newline, etc. It just immediately invokes the next parser to figure out how to parse the next node based on the remainder's starting token.
That feature is sometimes quite troublesome when we want to add new syntax.
We have had cases where we talked about using syntax like `setmetatable(T, MT)` and `keyof T`. They all look innocent, but when you look beyond that, and try to apply it onto Luau's grammar, things break down really fast.
### `F(T)`?
An example that _will_ cause a change in semantics:
```
local t: F
(u):m()
```
where today, `local t: F` is one statement, and `(u):m()` is another. If we had the syntax for `F(T)` here, it becomes invalid input because it gets parsed as
```
local t: F(u)
:m()
```
This is important because of the `setmetatable(T, MT)` case:
```
type Foo = setmetatable({ x: number }, { ... })
```
For `setmetatable`, the parser isn't sure whether `{}` is actually a type or an expression, because _today_ `setmetatable` is parsed as a type reference, and `({}, {})` is the remainder that we'll attempt to parse as a statement. This means `{ x: number }` is invalid table _literal_. Recovery by backtracking is technically possible here, but this means performance loss on invalid input + may introduce false positives wrt how things are parsed. We'd much rather take a very strict stance about how things get parsed.
### `F T`?
An example that _will_ cause a change in semantics:
```
local function f(t): F T
(t or u):m()
end
```
where today, the return type annotation `F T` is simply parsed as just `F`, followed by a ambiguous parse error from the statement `T(t or u)` because its `(` is on the next line. If at some point in the future we were to allow `T` followed by `(` on the next line, then there's yet another semantic change. `F T` could be parsed as a type annotation and the first statement is `(t or u):m()` instead of `F` followed by `T(t or u):m()`.
For `keyof`, here's a practical example of the above issue:
```
type Vec2 = {x: number, y: number}
local function f(t, u): keyof Vec2
(t or u):m()
end
```
There's three possible outcomes:
1. Return type of `f` is `keyof`, statement throws a parse error because `(` is on the next line after `Vec2`,
2. Return type of `f` is `keyof Vec2` and next statement is `(t or u):m()`, or
3. Return type of `f` is `keyof` and next statement is `Vec2(t or u):m()` (if we allow `(` on the next line to be part of previous line).
This particular case is even worse when we keep going:
```
local function f(t): F
T(t or u):m()
end
```
```
local function f(t): F T
{1, 2, 3}
end
```
where today, `F` is the return type annotation of `f`, and `T(t or u):m()`/`T{1, 2, 3}` is the first statement, respectively.
Adding some syntax for `F T` **will** cause the parser to change the semantics of the above three examples.
### But what about `typeof(...)`?
This syntax is grandfathered in because the parser supported `typeof(...)` before we stabilized our syntax, and especially before type annotations were released to the public, so we didn't need to worry about compatibility here. We are very glad that we used parentheses in this case, because it's natural for expressions to belong within parentheses `()`, and types to belong within angles `<>`.
## The One Exception with a caveat
This is a strict requirement!
`function() -> ()` has been talked about in the past, and this one is different despite falling under the same category as ``<name> `('``. The token `function` is in actual fact a "hard keyword," meaning that it cannot be parsed as a type annotation because it is not an identifier, just a keyword.
Likewise, we also have talked about adding standalone `function` as a type annotation (semantics of it is irrelevant for this RFC)
It's possible that we may end up adding both, but the requirements are as such:
1. `function() -> ()` must be added first before standalone `function`, OR
2. `function` can be added first, but with a future-proofing parse error if `<` or `(` follows after it
If #1 is what ends up happening, there's not much to worry about because the type annotation parser will parse greedily already, so any new valid input will remain valid and have same semantics, except it also allows omitting of `(` and `<`.
If #2 is what ends up happening, there could be a problem if we didn't future-proof against `<` and `(` to follow `function`:
```
return f :: function(T) -> U
```
which would be a parse error because at the point of `(` we expect one of `until`, `end`, or `EOF`, and
```
return f :: function<a>(a) -> a
```
which would also be a parse error by the time we reach `->`, that is the production of the above is semantically equivalent to `(f < a) > (a)` which would compare whether the value of `f` is less than the value of `a`, then whether the result of that value is greater than `a`.
## Alternatives
Only allow these syntax when used inside parentheses e.g. `(F T)` or `(F(T))`. This makes it inconsistent with the existing `typeof(...)` type annotation, and changing that over is also breaking change.
Support backtracking in the parser, so if `: MyType(t or u):m()` is invalid syntax, revert and parse `MyType` as a type, and `(t or u):m()` as an expression statement. Even so, this option is terrible for:
1. parsing performance (backtracking means losing progress on invalid input),
2. user experience (why was this annotation parsed as `X(...)` instead of `X` followed by a statement `(...)`),
3. has false positives (`foo(bar)(baz)` may be parsed as `foo(bar)` as the type annotation and `(baz)` is the remainder to parse)
## Drawbacks
To be able to expose some kind of type-level operations using `F<T>` syntax, means one of the following must be chosen:
1. introduce the concept of "magic type functions" into type inference, or
2. introduce them into the prelude as `export type F<T> = ...` (where `...` is to be read as "we haven't decided")

View File

@ -31,9 +31,9 @@ Because we care about backward compatibility, we need some new syntax in order t
1. A string chunk (`` `...{ ``, `}...{`, and `` }...` ``) where `...` is a range of 0 to many characters.
* `\` escapes `` ` ``, `{`, and itself `\`.
* Restriction: the string interpolation literal must have at least one value to interpolate. We do not need 3 ways to express a single line string literal.
* The pairs must be on the same line (unless a `\` escapes the newline) but expressions needn't be on the same line.
2. An expression between the braces. This is the value that will be interpolated into the string.
* Restriction: we explicitly reject `{{` as it is considered an attempt to escape and get a single `{` character at runtime.
3. Formatting specification may follow after the expression, delimited by an unambiguous character.
* Restriction: the formatting specification must be constant at parse time.
* In the absence of an explicit formatting specification, the `%*` token will be used.
@ -61,7 +61,6 @@ local set2 = Set.new({0, 5, 4})
print(`{set1} {set2} = {Set.union(set1, set2)}`)
--> {0, 1, 3} {0, 5, 4} = {0, 1, 3, 4, 5}
-- For illustrative purposes. These are illegal specifically because they don't interpolate anything.
print(`Some example escaping the braces \{like so}`)
print(`backslash \ that escapes the space is not a part of the string...`)
print(`backslash \\ will escape the second backslash...`)
@ -88,13 +87,25 @@ print(`Welcome to \
-- Luau!
```
This expression will not be allowed to come after a `prefixexp`. I believe this is fully additive, so a future RFC may allow this. So for now, we explicitly reject the following:
This expression can also come after a `prefixexp`:
```
local name = "world"
print`Hello {name}`
```
The restriction on `{{` exists solely for the people coming from languages e.g. C#, Rust, or Python which uses `{{` to escape and get the character `{` at runtime. We're also rejecting this at parse time too, since the proper way to escape it is `\{`, so:
```lua
print(`{{1, 2, 3}} = {myCoolSet}`) -- parse error
```
If we did not apply this as a parse error, then the above would wind up printing as the following, which is obviously a gotcha we can and should avoid.
```
--> table: 0xSOMEADDRESS = {1, 2, 3}
```
Since the string interpolation expression is going to be lowered into a `string.format` call, we'll also need to extend `string.format`. The bare minimum to support the lowering is to add a new token whose definition is to perform a `tostring` call. `%*` is currently an invalid token, so this is a backward compatible extension. This RFC shall define `%*` to have the same behavior as if `tostring` was called.
```lua
@ -121,6 +132,13 @@ print(string.format("%* %* %*", return_two_nils()))
--> error: value #3 is missing, got 2
```
It must be said that we are not allowing this style of string literals in type annotations at this time, regardless of zero or many interpolating expressions, so the following two type annotations below are illegal syntax:
```lua
local foo: `foo`
local bar: `bar{baz}`
```
## Drawbacks
If we want to use backticks for other purposes, it may introduce some potential ambiguity. One option to solve that is to only ever produce string interpolation tokens from the context of an expression. This is messy but doable because the parser and the lexer are already implemented to work in tandem. The other option is to pick a different delimiter syntax to keep backticks available for use in the future.