This isn't faster, just easier to understand.
LLVM optimizes division with remainder on u128 poorly. Copying the routine into our crate allows it to be inlined and optimized properly.
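
For illustration only, here is a minimal sketch of what an in-crate, inlinable u128 division with remainder could look like. The name `div_rem_u128` is hypothetical, and the body is a plain shift-subtract long division with a 64-bit fast path. It is not the routine this change actually copies, and it is not tuned for speed.

```rust
/// Hypothetical helper (not the crate's real routine): returns
/// `(quotient, remainder)` of `n / d`. Panics if `d == 0`.
#[inline]
pub fn div_rem_u128(n: u128, d: u128) -> (u128, u128) {
    assert!(d != 0, "division by zero");

    // Fast path: both operands fit in 64 bits, so native u64 division works.
    if n <= u64::MAX as u128 && d <= u64::MAX as u128 {
        let (n, d) = (n as u64, d as u64);
        return ((n / d) as u128, (n % d) as u128);
    }

    // A divisor larger than the dividend gives quotient 0.
    if d > n {
        return (0, n);
    }

    // Here 1 <= d <= n, so d has at least as many leading zeros as n.
    // Align the top set bit of d with the top set bit of n.
    let mut shift = d.leading_zeros() - n.leading_zeros();
    let mut d_shifted = d << shift;
    let mut q = 0u128;
    let mut r = n;

    // Binary long division: one quotient bit per iteration, high to low.
    loop {
        if r >= d_shifted {
            r -= d_shifted;
            q |= 1u128 << shift;
        }
        if shift == 0 {
            break;
        }
        shift -= 1;
        d_shifted >>= 1;
    }
    (q, r)
}
```

Because the function lives in the calling crate and is marked `#[inline]`, the compiler is free to inline it at call sites such as `let (q, r) = div_rem_u128(a, b);`. Whether that beats the built-in `/` and `%` depends on the target and compiler version, so it should be benchmarked rather than assumed.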