mlua/src/function.rs

172 lines
5.3 KiB
Rust
Raw Normal View History

use std::ptr;
use std::os::raw::c_int;
use ffi;
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
use error::{Error, Result};
use util::{check_stack, check_stack_err, error_traceback, pop_error, protect_lua_closure,
stack_guard};
use types::LuaRef;
use value::{FromLuaMulti, MultiValue, ToLuaMulti};
/// Handle to an internal Lua function.
#[derive(Clone, Debug)]
pub struct Function<'lua>(pub(crate) LuaRef<'lua>);
impl<'lua> Function<'lua> {
/// Calls the function, passing `args` as function arguments.
///
/// The function's return values are converted to the generic type `R`.
///
/// # Examples
///
/// Call Lua's built-in `tostring` function:
///
/// ```
/// # extern crate rlua;
/// # use rlua::{Lua, Function, Result};
/// # fn try_main() -> Result<()> {
/// let lua = Lua::new();
/// let globals = lua.globals();
///
/// let tostring: Function = globals.get("tostring")?;
///
/// assert_eq!(tostring.call::<_, String>(123)?, "123");
///
/// # Ok(())
/// # }
/// # fn main() {
/// # try_main().unwrap();
/// # }
/// ```
///
/// Call a function with multiple arguments:
///
/// ```
/// # extern crate rlua;
/// # use rlua::{Lua, Function, Result};
/// # fn try_main() -> Result<()> {
/// let lua = Lua::new();
///
/// let sum: Function = lua.eval(r#"
/// function(a, b)
/// return a + b
/// end
/// "#, None)?;
///
/// assert_eq!(sum.call::<_, u32>((3, 4))?, 3 + 4);
///
/// # Ok(())
/// # }
/// # fn main() {
/// # try_main().unwrap();
/// # }
/// ```
pub fn call<A: ToLuaMulti<'lua>, R: FromLuaMulti<'lua>>(&self, args: A) -> Result<R> {
let lua = self.0.lua;
unsafe {
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
stack_guard(lua.state, || {
let args = args.to_lua_multi(lua)?;
let nargs = args.len() as c_int;
check_stack_err(lua.state, nargs + 3)?;
ffi::lua_pushcfunction(lua.state, error_traceback);
let stack_start = ffi::lua_gettop(lua.state);
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
lua.push_ref(&self.0);
for arg in args {
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
lua.push_value(arg);
}
let ret = ffi::lua_pcall(lua.state, nargs, ffi::LUA_MULTRET, stack_start);
if ret != ffi::LUA_OK {
return Err(pop_error(lua.state, ret));
}
let nresults = ffi::lua_gettop(lua.state) - stack_start;
let mut results = MultiValue::new();
check_stack(lua.state, 2);
for _ in 0..nresults {
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
results.push_front(lua.pop_value());
}
ffi::lua_pop(lua.state, 1);
R::from_lua_multi(results, lua)
})
}
}
/// Returns a function that, when called, calls `self`, passing `args` as the first set of
/// arguments.
///
/// If any arguments are passed to the returned function, they will be passed after `args`.
///
/// # Examples
///
/// ```
/// # extern crate rlua;
/// # use rlua::{Lua, Function, Result};
/// # fn try_main() -> Result<()> {
/// let lua = Lua::new();
///
/// let sum: Function = lua.eval(r#"
/// function(a, b)
/// return a + b
/// end
/// "#, None)?;
///
/// let bound_a = sum.bind(1)?;
/// assert_eq!(bound_a.call::<_, u32>(2)?, 1 + 2);
///
/// let bound_a_and_b = sum.bind(13)?.bind(57)?;
/// assert_eq!(bound_a_and_b.call::<_, u32>(())?, 13 + 57);
///
/// # Ok(())
/// # }
/// # fn main() {
/// # try_main().unwrap();
/// # }
/// ```
pub fn bind<A: ToLuaMulti<'lua>>(&self, args: A) -> Result<Function<'lua>> {
unsafe extern "C" fn bind_call_impl(state: *mut ffi::lua_State) -> c_int {
let nargs = ffi::lua_gettop(state);
let nbinds = ffi::lua_tointeger(state, ffi::lua_upvalueindex(2)) as c_int;
ffi::luaL_checkstack(state, nbinds + 2, ptr::null());
ffi::lua_settop(state, nargs + nbinds + 1);
ffi::lua_rotate(state, -(nargs + nbinds + 1), nbinds + 1);
ffi::lua_pushvalue(state, ffi::lua_upvalueindex(1));
ffi::lua_replace(state, 1);
for i in 0..nbinds {
ffi::lua_pushvalue(state, ffi::lua_upvalueindex(i + 3));
ffi::lua_replace(state, i + 2);
}
ffi::lua_call(state, nargs + nbinds, ffi::LUA_MULTRET);
ffi::lua_gettop(state)
}
let lua = self.0.lua;
unsafe {
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
stack_guard(lua.state, || {
let args = args.to_lua_multi(lua)?;
let nargs = args.len() as c_int;
if nargs + 2 > ffi::LUA_MAX_UPVALUES {
return Err(Error::BindError);
}
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
check_stack_err(lua.state, nargs + 5)?;
lua.push_ref(&self.0);
ffi::lua_pushinteger(lua.state, nargs as ffi::lua_Integer);
for arg in args {
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
lua.push_value(arg);
}
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
protect_lua_closure(lua.state, nargs + 2, 1, |state| {
ffi::lua_pushcclosure(state, bind_call_impl, nargs + 2);
})?;
A lot of performance changes. Okay, so this is kind of a mega-commit of a lot of performance related changes to rlua, some of which are pretty complicated. There are some small improvements here and there, but most of the benefits of this change are from a few big changes. The simplest big change is that there is now `protect_lua` as well as `protect_lua_call`, which allows skipping a lightuserdata parameter and some stack manipulation in some cases. Second simplest is the change to use Vec instead of VecDeque for MultiValue, and to have MultiValue be used as a sort of "backwards-only" Vec so that ToLuaMulti / FromLuaMulti still work correctly. The most complex change, though, is a change to the way LuaRef works, so that LuaRef can optionally point into the Lua stack instead of only registry values. At state creation a set number of stack slots is reserved for the first N LuaRef types (currently 16), and space for these are also allocated separately allocated at callback time. There is a huge breaking change here, which is that now any LuaRef types MUST only be used with the Lua on which they were created, and CANNOT be used with any other Lua callback instance. This mostly will affect people using LuaRef types from inside a scope callback, but hopefully in those cases `Function::bind` will be a suitable replacement. On the plus side, the rules for LuaRef types are easier to state now. There is probably more easy-ish perf on the table here, but here's the preliminary results, based on my very limited benchmarks: create table time: [314.13 ns 315.71 ns 317.44 ns] change: [-36.154% -35.670% -35.205%] (p = 0.00 < 0.05) create array 10 time: [2.9731 us 2.9816 us 2.9901 us] change: [-16.996% -16.600% -16.196%] (p = 0.00 < 0.05) Performance has improved. create string table 10 time: [5.6904 us 5.7164 us 5.7411 us] change: [-53.536% -53.309% -53.079%] (p = 0.00 < 0.05) Performance has improved. call add function 3 10 time: [5.1134 us 5.1222 us 5.1320 us] change: [-4.1095% -3.6910% -3.1781%] (p = 0.00 < 0.05) Performance has improved. call callback add 2 10 time: [5.4408 us 5.4480 us 5.4560 us] change: [-6.4203% -5.7780% -5.0013%] (p = 0.00 < 0.05) Performance has improved. call callback append 10 time: [9.8243 us 9.8410 us 9.8586 us] change: [-26.937% -26.702% -26.469%] (p = 0.00 < 0.05) Performance has improved. create registry 10 time: [3.7005 us 3.7089 us 3.7174 us] change: [-8.4965% -8.1042% -7.6926%] (p = 0.00 < 0.05) Performance has improved. I think that a lot of these benchmarks are too "easy", and most API usage is going to be more like the 'create string table 10' benchmark, where there are a lot of handles and tables and strings, so I think that 25%-50% improvement is a good guess for most use cases.
2018-03-11 23:20:10 -04:00
Ok(Function(lua.pop_ref()))
})
}
}
}