asm.js: a highly optimizable compilation target

Published by marco on

The article Surprise! Mozilla can produce near-native performance on the Web by Peter Bright (Ars Technica) takes a (very) early look at asm.js, a compilation target that Mozilla is pushing as a way to bring high-performance C++/C applications (read: games) to browsers.

The tool chain is impressive. The Clang front-end and the LLVM infrastructure behind it have come a long way and established themselves as the new, more flexible compiler stack to use (Apple’s Xcode has shipped Clang since version 3.2 and it’s been the default since Xcode 4.2). Basically, Mozilla hooked up a JavaScript code generator to the Clang/LLVM tool chain. This way, they get compilation, error handling and a lot of optimizations for free. From the article,

“[The input] language is typically C or C++, and the compiler used to produce asm.js programs is another Mozilla project: Emscripten. Emscripten is a compiler based on the LLVM compiler infrastructure and the Clang C/C++ front-end. The Clang compiler reads C and C++ source code and produces an intermediate platform-independent assembler-like output called LLVM Intermediate Representation. LLVM optimizes the LLVM IR. LLVM IR is then fed into a backend code generator—the part that actually produces executable code. Traditionally, this code generator would emit x86 code. With Emscripten, it’s used to produce JavaScript.”

Mozilla has had a certain amount of success with it but, as becomes clear if you read all the way through the article, the project is still very much a work in progress. The benchmarks run by Ars Technica, however, bear out Mozilla’s claim of coming within striking distance of native performance, at least for some workloads. Native multi-threaded applications still blow it away because JavaScript lacks support for multi-threading and shared memory structures.

Just compiling C++/C code to JavaScript is only part of the solution: that wouldn’t necessarily generate code that’s any faster than hand-tuned JavaScript. The trick is to optimize the compilation target—that is, if the code is going to be generated by a compiler, that compiler can avoid using JavaScript language features and patterns that are hard or impossible to optimize (read the latest spec to find out more). Not only that, but if the JavaScript engine is asm.js-aware, it will also be able to apply even more optimizations because the input code will be guaranteed not to make use of any dynamic features that require much more stringent checking and handling. From the article,

“An engine that knows about asm.js also knows that asm.js programs are forbidden from using many JavaScript features. As a result, it can produce much more efficient code. Regular JavaScript JITs must have guards to detect this kind of dynamic behavior. asm.js JITs do not; asm.js forbids this kind of dynamic behavior, so the JITs do not need to handle it. This simpler model—no dynamic behavior, no memory allocation or deallocation, just a narrow set of well-defined integer and floating point operations—enables much greater optimization.”
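To make the "narrow set of well-defined integer and floating point operations" more concrete, here is a minimal hand-written sketch in the style of asm.js. The module name AsmSketch and the functions add and average are hypothetical, and the code has not been run through an asm.js validator; the point is only to show the type-coercion idiom (x | 0 for 32-bit integers, +x for doubles) that lets an engine know every type ahead of time.

```javascript
// Hypothetical module, written by hand for illustration; real asm.js is
// normally generated by a compiler such as Emscripten.
function AsmSketch(stdlib, foreign, heap) {
  "use asm";

  // "x | 0" declares and enforces a 32-bit integer.
  function add(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }

  // Unary "+" declares and enforces a double.
  function average(x, y) {
    x = +x;
    y = +y;
    return +((x + y) / 2.0);
  }

  return { add: add, average: average };
}

// The module is still ordinary JavaScript, so any engine can run it.
const sketch = AsmSketch(globalThis, {}, new ArrayBuffer(0x10000));

console.log(sketch.add(2, 3));          // 5
console.log(sketch.add(2147483647, 1)); // -2147483648 (wraps like a C int)
console.log(sketch.average(1, 2));      // 1.5
```

Because the coercions are plain JavaScript operators, an engine that knows nothing about asm.js simply executes them as usual, while an asm.js-aware engine can treat them as static type annotations.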

While the results so far are quite positive, there are still a few issues to address:

The generated asm.js scripts are currently quite large; Chrome would barely run them at all and even Firefox needed to be restarted every once in a while. Guess which browser handled the scripts with aplomb? That’s right: IE10.
asm.js also preallocates a large amount of memory, managing its own heap and memory layout (using custom-built virtual method tables, or VMTs, to emulate objects rather than using the slower dynamic typing native to JavaScript). This preallocation means that a script’s base footprint is much larger than that of a normal JavaScript application.
Browsers that haven’t optimized the asm.js codepath run it more slowly than regular JavaScript that does the same thing.
Source-level debugging is not available and debugging the generated JavaScript is a fool’s errand.
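The point about preallocated memory and hand-rolled method tables can be sketched roughly as follows. Everything here is an assumption made for illustration: the 64 KB heap size, the object layout, the bump allocator called malloc and the AREA_TABLE function table are all hypothetical and are not what Emscripten actually emits. The sketch only shows why the footprint is paid up front and how "objects" become integer offsets into one typed array.

```javascript
// Hypothetical sketch: every name and the object layout are made up for
// illustration; this is not output from Emscripten.

// The whole heap is allocated up front, whether or not it is ever used.
const HEAP_BYTES = 64 * 1024;
const heapBuffer = new ArrayBuffer(HEAP_BYTES);
const HEAP32 = new Int32Array(heapBuffer);

// Bump allocator: address 0 is reserved so it can act as a null pointer.
let nextFree = 8;

function malloc(bytes) {
  const aligned = (bytes + 7) & ~7; // round up to a multiple of 8
  if (nextFree + aligned > HEAP_BYTES) {
    throw new RangeError("sketch heap exhausted");
  }
  const ptr = nextFree;
  nextFree += aligned;
  return ptr;
}

// Assumed layout: word 0 holds a method-table slot, later words hold fields.
function squareArea(ptr) {
  const side = HEAP32[(ptr + 4) >> 2];
  return Math.imul(side, side);
}

function rectArea(ptr) {
  const width = HEAP32[(ptr + 4) >> 2];
  const height = HEAP32[(ptr + 8) >> 2];
  return Math.imul(width, height);
}

// A plain function table stands in for a virtual method table.
const AREA_TABLE = [squareArea, rectArea];

function newSquare(side) {
  const ptr = malloc(8);
  HEAP32[ptr >> 2] = 0; // slot of squareArea
  HEAP32[(ptr + 4) >> 2] = side;
  return ptr;
}

function newRect(width, height) {
  const ptr = malloc(12);
  HEAP32[ptr >> 2] = 1; // slot of rectArea
  HEAP32[(ptr + 4) >> 2] = width;
  HEAP32[(ptr + 8) >> 2] = height;
  return ptr;
}

// "Virtual" dispatch: read the slot from the heap, then call through the table.
function area(ptr) {
  const slot = HEAP32[ptr >> 2] & 1; // mask keeps the index inside the table
  return AREA_TABLE[slot](ptr);
}
```

With this layout, area(newSquare(3)) yields 9 and area(newRect(2, 5)) yields 10, and no JavaScript object is created per shape. The cost is the one the list above describes: the full buffer is reserved from the start, however little of it the program ends up using.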