Thursday, October 5, 2017

Optimizing ES2015 proxies in V8

Introduction

Proxies have been an integral part of JavaScript since ES2015. They allow intercepting fundamental operations on objects and customizing their behavior. Proxies form a core part of projects like jsdom and the Comlink RPC library. Recently, we put a lot of effort into improving the performance of proxies in V8. This article sheds some light on general performance improvement patterns in V8 and for proxies in particular.

Proxies are “objects used to define custom behavior for fundamental operations (e.g. property lookup, assignment, enumeration, function invocation, etc.)” (definition by MDN). More info can be found in the full specification. For example, the following code snippet adds logging to every property access on the object:

const target = {};
const callTracer = new Proxy(target, {
  get: (target, name, receiver) => {
    console.log(`get was called for: ${name}`);
    return target[name];
  }
});

callTracer.property = 'value';
console.log(callTracer.property);
// get was called for: property
// value

Constructing proxies

The first feature we'll focus on is the construction of proxies. Our original C++ implementation here followed the EcmaScript specification step-by-step, resulting in at least 4 jumps between the C++ and JS runtimes as shown in the following figure. We wanted to port this implementation into the platform-agnostic CodeStubAssembler (CSA), which is executed in the JS runtime as opposed to the C++ runtime.This porting minimizes that number of jumps between the language runtimes. CEntryStub and JSEntryStub represent the runtimes in the figure below. The dotted lines represent the borders between the JS and C++ runtimes. Luckily, lots of helper predicates were already implemented in the assembler, which made the initial version concise and readable.

The figure below shows the execution flow for calling a Proxy with any proxy trap (in this example apply, which is being called when the proxy is used as a function) generated by the following sample code:

function foo(...) {...}
g = new Proxy({...}, {
  apply: foo
});
g(1, 2);

After porting the trap execution to CSA all of the execution happens in the JS runtime, reducing the number of jumps between languages from 4 to 0.

This change resulted in the following performance improvements::

Our JS performance score shows an improvement between 49% and 74%. This score roughly measures how many times the given microbenchmark can be executed in 1000ms. For some tests the code is run multiple times in order to get an accurate enough measurement given the timer resolution. The code for all of the following benchmarks can be found in our js-perf-test directory.

Call and construct traps

The next section shows the results from optimizing call and construct traps (a.k.a. "apply" and "construct").

The performance improvements when calling proxies are significant — up to 500% faster! Still, the improvement for proxy construction is quite modest, especially in cases where no actual trap is defined — only about 25% gain. We investigated this by running the following command with the d8 shell:

$ out/x64.release/d8 --runtime-call-stats test.js
> run: 120.104000

                      Runtime Function/C++ Builtin        Time             Count
========================================================================================
                                         NewObject     59.16ms  48.47%    100000  24.94%
                                      JS_Execution     23.83ms  19.53%         1   0.00%
                              RecompileSynchronous     11.68ms   9.57%        20   0.00%
                        AccessorNameGetterCallback     10.86ms   8.90%    100000  24.94%
      AccessorNameGetterCallback_FunctionPrototype      5.79ms   4.74%    100000  24.94%
                                  Map_SetPrototype      4.46ms   3.65%    100203  25.00%

… SNIPPET …

Where test.js's source is:

function MyClass() {}
MyClass.prototype = {};
const P = new Proxy(MyClass, {});
function run() {
  return new P();
}
const N = 1e5;
console.time('run');
for (let i = 0; i < N; ++i) {
  run();
}
console.timeEnd('run');

It turned out most of the time is spent in NewObject and the functions called by it, so we started planning how to speed this up in future releases.

Get trap

The next section describes how we optimized the other most common operations — getting and setting properties through proxies. It turned out the get trap is more involved than the previous cases, due to the specific behavior of V8's inline cache. For a detailed explanation of inline caches, you can watch this talk.

Eventually we managed to get a working port to CSA with the following results:

After landing the change, we noticed the size of the Android .apk for Chrome had grown by ~160KB, which is more than expected for a helper function of roughly 20 lines, but fortunately we track such statistics. It turned out this function is called twice from another function, which is called 3 times, from another called 4 times. The cause of the problem turned out to be the aggressive inlining. Eventually we solved the issue by turning the inline function into a separate code stub, thus saving precious KBs - the end version had only ~19KB increase in .apk size.

Has trap

The next section shows the results from optimizing the has trap. Although at first we thought it would be easier (and reuse most of the code of the get trap), it turned out to have its own peculiarities. A particularly hard-to-track-down problem was the prototype chain walking when calling the in operator. The improvement results achieved vary between 71% and 428%. Again the gain is more prominent in cases where the trap is present.

Set trap

The next section talks about porting the set trap. This time we had to differentiate between named and indexed properties (elements). These two main types are not part of the JS language, but are essential for V8's efficient property storage. The initial implementation still bailed out to the runtime for elements, which causes crossing the language boundaries again. Nevertheless we achieved improvements between 27% and 438% for cases when the trap is set, at the cost of a decrease of up to 23% when it's not. This performance regression is due to the overhead of additional check for differentiating between indexed and named properties. For indexed properties, there is no improvement yet. Here are the complete results:

Real-world usage

Results from jsdom-proxy-benchmark

The jsdom-proxy-benchmark project compiles the ECMAScript specification using the Ecmarkup tool. As of v11.2.0, the jsdom project (which underlies Ecmarkup) uses proxies to implement the common data structures NodeList and HTMLCollection. We used this benchmark to get an overview of some more realistic usage than the synthetic micro-benchmarks, and achieved the following results, average of 100 runs:

  • Node v8.4.0 (without Proxy optimizations): 14277 ± 159 ms
  • Node v9.0.0-v8-canary-20170924 (with only half of the traps ported): 11789 ± 308 ms
  • Gain in speed around 2.4 seconds which is ~17% better

(thanks for the results provided by TimothyGu)

Results from Chai.js

Chai.js is a popular assertion library which makes heavy use of proxies. We've created a kind of real-world benchmark by running its tests with different versions of V8 an improvement of roughly 1s out of more than 4s, average of 100 runs:

Optimization approach

We often tackle performance issues using a generic optimization scheme. The main approach that we followed for this particular work included the following steps:

  • Implement performance tests for the particular sub-feature
  • Add more specification conformance tests (or write them from scratch)
  • Investigate the original C++ implementation
  • Port the sub-feature to the platform-agnostic CodeStubAssembler
  • Optimize the code even further by hand-crafting a TurboFan implementation
  • Measure the performance improvement.

This approach can be applied to any general optimization task that you may have.

Written by Maya Lekova (@MayaLekova), Optimizer of Proxies

Wednesday, October 4, 2017

An Internship on Laziness: Lazy Unlinking of Deoptimized Functions

Roughly three months ago, I joined the V8 team (Google Munich) as an intern and since then I’ve been working on the VM’s Deoptimizer — something completely new to me which proved to be an interesting and challenging project. The first part of my internship focused on improving the VM security-wise. The second part focused on performance improvements. Namely, on the removal of a data-structure used for the unlinking of previously deoptimized functions, which was a performance bottleneck during garbage collection. This blog post describes this second part of my internship. I’ll explain how V8 used to unlink deoptimized functions, how we changed this, and what performance improvements were obtained.

Let’s (very) briefly recap the V8 pipeline for a JavaScript function: V8’s interpreter, Ignition, collects profiling information about that function while interpreting it. Once the function becomes hot, this information is passed to V8’s compiler, TurboFan, which generates optimized machine code. When the profiling information is no longer valid — for example because one of the profiled objects gets a different type during runtime — the optimized machine code might become invalid. In that case, V8 needs to deoptimize it.

Source: JavaScript Start-up Performance

Upon optimization, TurboFan generates a code object, i.e. the optimized machine code, for the function under optimization. When this function is invoked the next time, V8 follows the link to optimized code for that function and executes it. Upon deoptimization of this function, we need to unlink the code object in order to make sure that it won’t be executed again. How does that happen?

For example, in the following code, the function f1 will be invoked many times (always passing an integer as argument). TurboFan then generates machine code for that specific case.

function g() {
  return (i) => i;
}

// Create a closure.
const f1 = g();
// Optimize f1.
for (var i = 0; i < 1000; i++) f1(0);

Each function also has a trampoline to the interpreter — more details in these slides — and will keep a pointer to this trampoline in its SharedFunctionInfo (SFI). This trampoline will be used whenever V8 needs to go back to unoptimized code. Thus, upon deoptimization, triggered by passing an argument of a different type, for example, the Deoptimizer can simply set the code field of the JavaScript function to this trampoline.

Although this seems simple, it forces V8 to keep weak lists of optimized JavaScript functions. This is because it is possible to have different functions pointing to the same optimized code object. We can extend our example as follows, and the functions f1 and f2 both point to the same optimized code.

const f2 = g();
f2(0);

If the function f1 is deoptimized (for example by invoking it with an object of different type {x: 0}) we need to make sure that the invalidated code will not be executed again by invoking f2.

Thus, upon deoptimization, V8 used to iterate over all the optimized JavaScript functions, and would unlink those that pointed to the code object being deoptimized. This iteration in applications with many optimized JavaScript functions became a performance bottleneck. Moreover, other than slowing down deoptimization, V8 used to iterate over these lists upon stop-the-world cycles of garbage collection, making it even worse.

In order to have an idea of the impact of such data-structure in the performance of V8, we wrote a micro-benchmark that stresses its usage, by triggering many scavenge cycles after creating many JavaScript functions.

function g() {
  return (i) => i + 1;
}

// Create an initial closure and optimize.
var f = g();

f(0);
f(0);
%OptimizeFunctionOnNextCall(f);
f(0);

// Create 2M closures, those will get the previously optimized code.
var a = [];
for (var i = 0; i < 2000000; i++) {
  var h = g();
  h();
  a.push(h);
}

// Now cause scavenges, all of them are slow.
for (var i = 0; i < 1000; i++) {
  new Array(50000);
}

When running this benchmark, we could observe that V8 spent around 98% of its execution time on garbage collection. We then removed this data structure, and instead used an approach for lazy unlinking, and this was what we observed on x64:

Although this is just a micro-benchmark that creates many JavaScript functions and triggers many garbage collection cycles, it gives us an idea of the overhead introduced by this data structure. Other more realistic applications where we saw some overhead, and which motivated this work, were the router benchmark implemented in Node.js and ARES-6 benchmark suite.

Lazy unlinking

Rather than unlinking optimized code from JavaScript functions upon deoptimization, V8 postpones it for the next invocation of such functions. When such functions are invoked, V8 checks whether they have been deoptimized, unlinks them and then continues with their lazy compilation. If these functions are never invoked again, then they will never be unlinked and the deoptimized code objects will not be collected. However, given that during deoptimization, we invalidate all the embedded fields of the code object, we only keep that code object alive.

The commit that removed this list of optimized JavaScript functions required changes in several parts of the VM, but the basic idea is as follows. When assembling the optimized code object, we check if this is the code of a JavaScript function. If so, in its prologue, we assemble machine code to bail out if the code object has been deoptimized. Upon deoptimization we don’t modify the deoptimized code — code patching is gone. Thus, its bit marked_for_deoptimization is still set when invoking the function again. TurboFan generates code to check it, and if it is set, then V8 jumps to a new builtin, CompileLazyDeoptimizedCode, that unlinks the deoptimized code from the JavaScript function and then continues with lazy compilation.

In more detail, the first step is to generate instructions that load the address of the code being currently assembled. We can do that in x64, with the following code:

Label current;
// Load effective address of current instruction into rcx.
__ leaq(rcx, Operand(&current));
__ bind(&current);

After that we need to obtain where in the code object the marked_for_deoptimization bit lives.

int pc = __ pc_offset();
int offset = Code::kKindSpecificFlags1Offset - (Code::kHeaderSize + pc);

We can then test the bit and if it is set, we jump to the CompileLazyDeoptimizedCode built in.

// Test if the bit is set, that is, if the code is marked for deoptimization.
__ testl(Operand(rcx, offset),
         Immediate(1 << Code::kMarkedForDeoptimizationBit));
// Jump to builtin if it is.
__ j(not_zero, /* handle to builtin code here */, RelocInfo::CODE_TARGET);

On the side of this CompileLazyDeoptimizedCode builtin, all that’s left to do is to unlink the code field from the JavaScript function and set it to the trampoline to the Interpreter entry. So, considering that the address of the JavaScript function is in the register rdi, we can obtain the pointer to the SharedFunctionInfo with:

// Field read to obtain the SharedFunctionInfo.
__ movq(rcx, FieldOperand(rdi, JSFunction::kSharedFunctionInfoOffset));

…and similarly the trampoline with:

// Field read to obtain the code object.
__ movq(rcx, FieldOperand(rcx, SharedFunctionInfo::kCodeOffset));

Then we can use it to update the function slot for the code pointer:

// Update the code field of the function with the trampoline.
__ movq(FieldOperand(rdi, JSFunction::kCodeOffset), rcx);
// Write barrier to protect the field.
__ RecordWriteField(rdi, JSFunction::kCodeOffset, rcx, r15,
                    kDontSaveFPRegs, OMIT_REMEMBERED_SET, OMIT_SMI_CHECK);

This produces the same result as before. However, rather than taking care of the unlinking in the Deoptimizer, we need to worry about it during code generation. Hence the handwritten assembly.

The above is how it works in the x64 architecture. We have implemented it for ia32, arm, arm64, mips, and mips64 as well.

This new technique is already integrated in V8 and, as we’ll discuss later, allows for performance improvements. However, it comes with a minor disadvantage: Before, V8 would consider unlinking only upon deoptimization. Now, it has to do so in the activation of all optimized functions. Moreover, the approach to check the marked_for_deoptimization bit is not as efficient as it could be, given that we need to do some work to obtain the address of the code object. Note that this happens when entering every optimized function. A possible solution for this issue is to keep in a code object a pointer to itself. Rather than doing work to find the address of the code object whenever the function is invoked, V8 would do it only once, after its construction.

Results

We now look at the performance gains and regressions obtained with this project.

General Improvements on x64

The following plot shows us some improvements and regressions, relative to the previous commit. Note that the higher, the better.

The promises benchmarks are the ones where we see greater improvements, observing almost 33% gain for the bluebird-parallel benchmark, and 22.40% for wikipedia. We also observed a few regressions in some benchmarks. This is related to the issue explained above, on checking whether the code is marked for deoptimization.

We also see improvements in the ARES-6 benchmark suite. Note that in this chart too, the higher the better. These programs used to spend considerable amount of time in GC-related activities. With lazy unlinking we improve performance by 1.9% overall. The most notable case is the Air steadyState where we get an improvement of around 5.36%.


AreWeFastYet results

The performance results for the Octane and ARES-6 benchmark suites also showed up on the AreWeFastYet tracker. We looked at these performance results on September 5th, 2017, using the provided default machine (macOS 10.10 64-bit, Mac Pro, shell).


Impact on Node.js

We can also see performance improvements in the router-benchmark. The following two plots show the number of operations per second of each tested router. Thus the higher the better. We have performed two kinds of experiments with this benchmark suite. Firstly, we ran each test in isolation, so that we could see the performance improvement, independently from the remaining tests. Secondly, we ran all tests at once, without switching of the VM, thus simulating an environment where each test is integrated with other functionalities.

For the first experiment, we saw that the router and express tests perform about twice as many operations than before, in the same amount of time. For the second experiment, we saw even greater improvement. In some of the cases, such as routr, server-router and router, the benchmark performs approximately 3.80×, 3× and 2× more operations, respectively. This happens because V8 accumulates more optimized JavaScript functions, test after test. Thus, whenever executing a given test, if a garbage collection cycle is triggered, V8 has to visit the optimized functions from the current test and from the previous ones.


Further Optimization

Now that V8 does not keep the linked-list of JavaScript functions in the context, we can remove the field next from the JSFunction class. Although this is a simple modification, it allows us to save the size of a pointer per function, which represent significant savings in several web pages:

Benchmark Kind Memory savings (absolute) Memory savings (relative)
facebook.com Average effective size 170KB 3.7%
twitter.com Average size of allocated objects 284KB 1.2%
cnn.com Average size of allocated objects 788KB 1.53%
youtube.com Average size of allocated objects 129KB 0.79%

Acknowledgments

Throughout my internship, I had lots of help from several people, who were always available to answer my many questions. Thus I would like to thank the following people: Benedikt Meurer, Jaroslav Sevcik, and Michael Starzinger for discussions on how the Compiler and the Deoptimizer work, Ulan Degenbaev for helping with the Garbage Collector whenever I broke it, and Mathias Bynens, Peter Marshall, Camillo Bruni, and Maya Lekova for proofreading this article.

Finally, this article is my last contribution as a Google intern and I would like to take the opportunity to thank everyone in the V8 team, and especially my host, Benedikt Meurer, for hosting me and for giving me the opportunity to work on such an interesting project — I definitely learned a lot and enjoyed my time at Google!

Juliana Franco, @jupvfranco, Laziness Expert

Friday, September 22, 2017

Temporarily disabling escape analysis

In JavaScript, an allocated object escapes if it is accessible from outside the current function. Normally V8 allocates new objects on the JavaScript heap, but using escape analysis, an optimizing compiler can figure out when an object can be treated specially because its lifetime is provably bound to the function’s activation. When the reference to a newly allocated object does not escape the function that creates it, JavaScript engines don’t need to explicitly allocate that object on the heap. They can instead effectively treat the values of the object as local variables to the function. That in turn enables all kinds of optimizations like storing these values on the stack or in registers, or in some cases, optimizing the values away completely. Objects that escape (more accurately, objects that can’t be proven to not escape) must be heap-allocated.

For example, escape analysis enables V8 to effectively rewrite the following code:

function foo(a, b) {
  const object = { a, b };
  return object.a + object.b;
  // Note: `object` does not escape.
}

…into this code, which enables several under-the-hood optimizations:

function foo(a, b) {
  const object_a = a;
  const object_b = b;
  return object_a + object_b;
}

V8 v6.1 and older used an escape analysis implementation that was complex and generated many bugs since its introduction. This implementation has since been removed and a brand new escape analysis codebase is available in V8 v6.2.

However, a Chrome security vulnerability involving the old escape analysis implementation in V8 v6.1 has been discovered and responsibly disclosed to us. To protect our users, we’ve turned off escape analysis in Chrome 61. Node.js should not be affected as the exploit depends on execution of untrusted JavaScript.

Turning off escape analysis negatively impacts performance because it disables the abovementioned optimizations. Specifically, the following ES2015 features might suffer temporary slowdowns:

  • destructuring
  • for-of iteration
  • array spread
  • rest parameters

Note that disabling escape analysis is only a temporary measure. With Chrome 62, we’ll ship the brand new — and most importantly, enabled — implementation of escape analysis as seen in V8 v6.2.

Tuesday, September 12, 2017

“Elements kinds” in V8

JavaScript objects can have arbitrary properties associated with them. The names of object properties can contain any character. One of the interesting cases that a JavaScript engine can choose to optimize for are properties whose names are purely numeric, most specifically array indices.

In V8, properties with integer names — the most common form of which are objects generated by the Array constructor — are handled specially. Although in many circumstances these numerically-indexed properties behave just like other properties, V8 chooses to store them separately from non-numeric properties for optimization purposes. Internally, V8 even gives these properties a special name: elements. Objects have properties that map to values, whereas arrays have indices that map to elements.

Although these internals are never directly exposed to JavaScript developers, they explain why certain code patterns are faster than others.

Common elements kinds

While running JavaScript code, V8 keeps track of what kind of elements each array contains. This information allows V8 to optimize any operations on the array specifically for this type of element. For example, when you call reduce, map, or forEach on an array, V8 can optimize those operations based on what kind of elements the array contains.

Take this array, for example:

const array = [1, 2, 3];

What kinds of elements does it contain? If you’d ask the typeof operator, it would tell you the array contains numbers. At the language-level, that’s all you get: JavaScript doesn’t distinguish between integers, floats, and doubles — they’re all just numbers. However, at the engine level, we can make more precise distinctions. The elements kind for this array is PACKED_SMI_ELEMENTS. In V8, the term Smi refers to the particular format used to store small integers. (We’ll get to the PACKED part in a minute.)

Later adding a floating-point number to the same array transitions it to a more generic elements kind:

const array = [1, 2, 3];
// elements kind: PACKED_SMI_ELEMENTS
array.push(4.56);
// elements kind: PACKED_DOUBLE_ELEMENTS

Adding a string literal to the array changes its elements kind once again.

const array = [1, 2, 3];
// elements kind: PACKED_SMI_ELEMENTS
array.push(4.56);
// elements kind: PACKED_DOUBLE_ELEMENTS
array.push('x');
// elements kind: PACKED_ELEMENTS

We’ve seen three distinct elements kinds so far, with the following basic types:

  • Small integers, also known as Smi.
  • Doubles, for floating-point numbers and integers that cannot be represented as a Smi.
  • Regular elements, for values that cannot be represented as Smi or doubles.

Note that doubles form a more general variant of Smi, and regular elements are another generalization on top of doubles. The set of numbers that can be represented as a Smi is a subset of the numbers that can be represented as a double.

What’s important here is that elements kind transitions only go in one direction: from specific (e.g. PACKED_SMI_ELEMENTS) to more general (e.g. PACKED_ELEMENTS). Once an array is marked as PACKED_ELEMENTS, it cannot go back to PACKED_DOUBLE_ELEMENTS, for example.

So far, we’ve learned the following:

  • V8 assigns an elements kind to each array.
  • The elements kind of an array is not set in stone — it can change at runtime. In the earlier example, we transitioned from PACKED_SMI_ELEMENTS to PACKED_ELEMENTS.
  • Elements kind transitions can only go from specific kinds to more general kinds.

PACKED vs. HOLEY kinds

So far, we’ve only been dealing with dense or packed arrays. Creating holes in the array (i.e. making the array sparse) downgrades the elements kind to its “holey” variant:

const array = [1, 2, 3, 4.56, 'x'];
// elements kind: PACKED_ELEMENTS
array.length; // 5
array[9] = 1; // array[5] until array[8] are now holes
// elements kind: HOLEY_ELEMENTS

V8 makes this distinction because operations on packed arrays can be optimized more aggressively than operations on holey arrays. For packed arrays, most operations can be performed efficiently. In comparison, operations on holey arrays require additional checks and expensive lookups on the prototype chain.

Each of the basic elements kinds we’ve seen so far (i.e. Smis, doubles, and regular elements) comes in two flavors: the packed and the holey version. Not only can we transition from, say, PACKED_SMI_ELEMENTS to PACKED_DOUBLE_ELEMENTS, we can also transition from any PACKED kind to its HOLEY counterpart.

To recap:

  • The most common elements kinds come in PACKED and HOLEY flavors.
  • Operations on packed arrays are more efficient than operations on holey arrays.
  • Elements kinds can transition from PACKED to HOLEY flavors.

The elements kind lattice

V8 implements this tag transitioning system as a lattice. Here’s a simplified visualization of that featuring only the most common elements kinds:

It’s only possible to transition downwards through the lattice. Once a single floating-point number is added to an array of Smis, it is marked as DOUBLE, even if you later overwrite the float with a Smi. Similarly, once a hole is created in an array, it’s marked as holey forever, even when you fill it later.

V8 currently distinguishes 21 different elements kinds, each of which comes with its own set of possible optimizations.

In general, more specific elements kinds enable more fine-grained optimizations. The further down the elements kind is in the lattice, the slower manipulations of that object might be. For optimal performance, avoid needlessly transitioning to less specific types — stick to the most specific one that’s applicable to your situation.

Performance tips

In most cases, elements kind tracking works invisibly under the hood and you don’t need to worry about it. But here are a few things you can do to get the greatest possible benefit from the system.

Avoid creating holes

Let’s say we’re trying to create an array, for example:

const array = new Array(3);
// The array is sparse at this point, so it gets marked as
// `HOLEY_SMI_ELEMENTS`, i.e. the most specific possibility given
// the current information.
array[0] = 'a';
// Hold up, that’s a string instead of a small integer… So the kind
// transitions to `HOLEY_ELEMENTS`.
array[1] = 'b';
array[2] = 'c';
// At this point, all three positions in the array are filled, so
// the array is packed (i.e. no longer sparse). However, we cannot
// transition to a more specific kind such as `PACKED_ELEMENTS`. The
// elements kind remains `HOLEY_ELEMENTS`.

Once the array is marked as holey, it’s holey forever — even if it’s packed later! Any operation on the array from then on is potentially slower than it could be. If you plan on performing lots of operations on the array, and you’d like to optimize those operations, avoid creating holes in the array. V8 can deal with packed arrays more efficiently.

A better way of creating an array is to use a literal instead:

const array = ['a', 'b', 'c'];
// elements kind: PACKED_ELEMENTS

If you don’t know all the values ahead of time, create an array, and later push the values to it.

const array = [];
// …
array.push(someValue);
// …
array.push(someOtherValue);

This approach ensures that the array never transitions to a holey elements kind. As a result, V8 can optimize any future operations on the array more efficiently.

Avoid reading beyond the length of the array

A similar situation to hitting a hole occurs when reading beyond the length of the array, e.g. reading array[42] when array.length === 5. In this case, the array index 42 is out of bounds, the property is not present on the array itself, and so the JavaScript engine has to perform the same expensive prototype chain lookups.

Don’t write your loops like this:

// Don’t do this!
for (let i = 0, item; (item = items[i]) != null; i++) {
  doSomething(item);
}

This code reads all the elements in the array, and then one more. It only ends once it finds an undefined or null element. (jQuery uses this pattern in a few places.)

Instead, write your loops the old-fashioned way, and just keep iterating until you hit the last element.

for (let index = 0; index < items.length; index++) {
  const item = items[index];
  doSomething(item);
}

When the collection you’re looping over is iterable (as is the case for arrays and NodeLists), that’s even better: just use for-of.

for (const item of items) {
  doSomething(item);
}

For arrays specifically, you could use the forEach built-in:

items.forEach((item) => {
  doSomething(item);
});

Nowadays, the performance of both for-of and forEach is on par with the old-fashioned for loop.

Avoid reading beyond the array’s length! Doing so is just as bad as hitting a hole in an array. In this case, V8’s bounds check fails, the check to see if the property is present fails, and then we need to look up the prototype chain.

Avoid elements kind transitions

In general, if you need to perform lots of operations on an array, try sticking to an elements kind that’s as specific as possible, so that V8 can optimize those operations as much as possible.

This is harder than it seems. For example, just adding -0 to an array of small integers is enough to transition it to PACKED_DOUBLE_ELEMENTS.

const array = [3, 2, 1, +0];
// PACKED_SMI_ELEMENTS
array.push(-0);
// PACKED_DOUBLE_ELEMENTS

As a result, any future operations on this array are optimized in a completely different way than they would be for Smis.

Avoid -0, unless you explicitly need to differentiate -0 and +0 in your code. (You probably don’t.)

The same thing goes for NaN and Infinity. They are represented as doubles, so adding a single NaN or Infinity to an array of SMI_ELEMENTS transitions it to DOUBLE_ELEMENTS.

const array = [3, 2, 1];
// PACKED_SMI_ELEMENTS
array.push(NaN, Infinity);
// PACKED_DOUBLE_ELEMENTS

If you’re planning on performing lots of operations on an array of integers, consider normalizing -0 and blocking NaN and Infinity when initializing the values. That way, the array sticks to the PACKED_SMI_ELEMENTS kind. This one-time normalization cost can be worth the later optimizations.

In fact, if you’re doing mathematical operations on an array of numbers, consider using a TypedArray. We have specialized elements kinds for those, too.

Prefer arrays over array-like objects

Some objects in JavaScript — especially in the DOM — look like arrays although they aren’t proper arrays. It’s possible to create array-like objects yourself:

const arrayLike = {};
arrayLike[0] = 'a';
arrayLike[1] = 'b';
arrayLike[2] = 'c';
arrayLike.length = 3;

This object has a length and supports indexed element access (just like an array!) but it lacks array methods such as forEach on its prototype. It’s still possible to call array generics on it, though:

Array.prototype.forEach.call(arrayLike, (value, index) => {
  console.log(`${ index }: ${ value }`);
});
// This logs '0: a', then '1: b', and finally '2: c'.

This code calls the Array.prototype.forEach built-in on the array-like object, and it works as expected. However, this is slower than calling forEach on a proper array, which is highly optimized in V8. If you plan on using array built-ins on this object more than once, consider turning it into an actual array beforehand:

const actualArray = Array.prototype.slice.call(arrayLike, 0);
actualArray.forEach((value, index) => {
  console.log(`${ index }: ${ value }`);
});
// This logs '0: a', then '1: b', and finally '2: c'.

The one-time conversion cost can be worth the later optimizations, especially if you plan on performing lots of operations on the array.

The arguments object, for example, is an array-like object. It’s possible to call array builtins on it, but such operations won’t be fully optimized the way they could be for a proper array.

const logArgs = function() {
  Array.prototype.forEach.call(arguments, (value, index) => {
    console.log(`${ index }: ${ value }`);
  });
};
logArgs('a', 'b', 'c');
// This logs '0: a', then '1: b', and finally '2: c'.

ES2015 rest parameters can help here. They produce proper arrays that can be used instead of the array-like arguments objects in an elegant way.

const logArgs = (...args) => {
  args.forEach((value, index) => {
    console.log(`${ index }: ${ value }`);
  });
};
logArgs('a', 'b', 'c');
// This logs '0: a', then '1: b', and finally '2: c'.

Nowadays, there’s no good reason to use the arguments object directly.

In general, avoid array-like objects whenever possible and use proper arrays instead.

Avoid polymorphism

If you have code that handles arrays of many different elements kinds, it can lead to polymorphic operations that are slower than a version of the code that only operates on a single elements kind.

Consider the following example, where a library function is called with various elements kinds. (Note that this is not the native Array.prototype.forEach, which has its own set of optimizations on top of the elements kinds-specific optimizations discussed in this article.)

const each = (array, callback) => {
  for (let index = 0; index < array.length; ++index) {
    const item = array[index];
    callback(item);
  }
};
const doSomething = (item) => console.log(item);

each([], () => {});

each(['a', 'b', 'c'], doSomething);
// `each` is called with `PACKED_ELEMENTS`. V8 uses an inline cache
// (or “IC”) to remember that `each` is called with this particular
// elements kind. V8 is optimistic and assumes that the
// `array.length` and `array[index]` accesses inside the `each`
// function are monomorphic (i.e. only ever receive a single kind
// of elements) until proven otherwise. For every future call to
// `each`, V8 checks if the elements kind is `PACKED_ELEMENTS`. If
// so, V8 can re-use the previously-generated code. If not, more work
// is needed.

each([1.1, 2.2, 3.3], doSomething);
// `each` is called with `PACKED_DOUBLE_ELEMENTS`. Because V8 has
// now seen different elements kinds passed to `each` in its IC, the
// `array.length` and `array[index]` accesses inside the `each`
// function get marked as polymorphic. V8 now needs an additional
// check every time `each` gets called: one for `PACKED_ELEMENTS`
// (like before), a new one for `PACKED_DOUBLE_ELEMENTS`, and one for
// any other elements kinds (like before). This incurs a performance
// hit.

each([1, 2, 3], doSomething);
// `each` is called with `PACKED_SMI_ELEMENTS`. This triggers another
// degree of polymorphism. There are now three different elements
// kinds in the IC for `each`. For every `each` call from now on, yet
// another elements kind check is needed to re-use the generated code
// for `PACKED_SMI_ELEMENTS`. This comes at a performance cost.

Built-in methods (such as Array.prototype.forEach) can deal with this kind of polymorphism much more efficiently, so consider using them instead of userland library functions in performance-sensitive situations.

Another example of monomorphism vs. polymorphism in V8 involves object shapes, also known as the hidden class of an object. To learn about that case, check out Vyacheslav’s article.

Debugging elements kinds

To figure out a given object’s “elements kind”, get a debug build of d8 (see “Building from source”), and run:

$ out.gn/x64.debug/d8 --allow-natives-syntax

This opens a d8 REPL in which special functions such as %DebugPrint(object) are available. The “elements” field in its output reveals the “elements kind” of any object you pass to it.

d8> const array = [1, 2, 3]; %DebugPrint(array);
DebugPrint: 0x1fbbad30fd71: [JSArray]
 - map = 0x10a6f8a038b1 [FastProperties]
 - prototype = 0x1212bb687ec1
 - elements = 0x1fbbad30fd19 <FixedArray[3]> [PACKED_SMI_ELEMENTS (COW)]
 - length = 3
 - properties = 0x219eb0702241 <FixedArray[0]> {
    #length: 0x219eb0764ac9 <AccessorInfo> (const accessor descriptor)
 }
 - elements= 0x1fbbad30fd19 <FixedArray[3]> {
           0: 1
           1: 2
           2: 3
 }
[…]

Note that “COW” stands for copy-on-write, which is yet another internal optimization. Don’t worry about that for now — that’s a topic for another blog post!

Another useful flag that’s available in debug builds is --trace-elements-transitions. Enable it to let V8 inform you whenever any elements kind transition takes place.

$ cat my-script.js
const array = [1, 2, 3];
array[3] = 4.56;

$ out.gn/x64.debug/d8 --trace-elements-transitions my-script.js
elements transition [PACKED_SMI_ELEMENTS -> PACKED_DOUBLE_ELEMENTS] in ~+34 at x.js:2 for 0x1df87228c911 <JSArray[3]> from 0x1df87228c889 <FixedArray[3]> to 0x1df87228c941 <FixedDoubleArray[22]>

Monday, September 11, 2017

V8 Release 6.2

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 6.2, which is in beta until its release in coordination with Chrome 62 Stable in several weeks. V8 v6.2 is filled with all sorts of developer-facing goodies. This post provides a preview of some of the highlights in anticipation of the release.

Performance improvements


The performance of Object#toString() was previously already identified as a potential bottleneck, since it’s often used by popular libraries like lodash and underscore.js, and frameworks like AngularJS. Various helper functions like _.isPlainObject, _.isDate, angular.isArrayBuffer or angular.isRegExp are often used throughout application and library code to perform runtime type checks.

With the advent of ES2015, Object#toString() became monkey-patchable via the new Symbol.toStringTag symbol, which also made Object#toString() more heavy-weight and more challenging to speed up. In this release we ported an optimization initially implemented in the SpiderMonkey JavaScript engine to V8, speeding up throughput of Object#toString() by a factor of 6.5×.



It also impacts the Speedometer browser benchmark, specifically the AngularJS subtest, where we measured a solid 3% improvement. Read the detailed blog post for additional information.



We’ve also significantly improved the performance of ES2015 proxies, speeding up calling a proxy object via someProxy(params) or new SomeOtherProxy(params) by up to :



And similarly, the performance of accessing a property on a proxy object via someProxy.property improved by almost 6.5×:



This is part of an ongoing internship. Stay tuned for a more detailed blog post and final results.

We’re also excited to announce that thanks to contributions from Peter Wong, the performance of the String#includes() built-in improved by more than since the previous release.

Hashcode lookups for internal hash tables got much faster, resulting in improved performance for Map, Set, WeakMap, and WeakSet. An upcoming blog post will explain this optimization in detail.



The garbage collector now uses a Parallel Scavenger for collecting the so-called young generation of the heap.

Enhanced low-memory mode


Over the last few releases V8’s low-memory mode was enhanced (e.g. by setting initial semi-space size to 512K). Low-memory devices now hit fewer out-of-memory situations. This low-memory behavior might have a negative impact on runtime performance though.

More regular expressions features


Support for the dotAll mode for regular expressions, enabled through the s flag, is now enabled by default. In dotAll mode, the . atom in regular expressions matches any character, including line terminators.

/foo.bar/su.test('foo\nbar'); // true

Lookbehind assertions, another new regular expression feature, are now available by default. The name already describes its meaning pretty well. Lookbehind assertions offer a way to restrict a pattern to only match if preceded by the pattern in the lookbehind group. It comes in both matching and non-matching flavors:

/(?<=\$)\d+/.exec('$1 is worth about ¥123'); // ['1']
/(?<!\$)\d+/.exec('$1 is worth about ¥123'); // ['123']


More details about these features are available in our blog post titled Upcoming regular expression features.

Template literal revision


The restriction on escape sequences in template literals has been loosened per the relevant proposal. This enables new use cases for template tags, such as writing a LaTeX processor.

const latex = (strings) => {
  // …
};

const document = latex`
\newcommand{\fun}{\textbf{Fun!}}
\newcommand{\unicode}{\textbf{Unicode!}}
\newcommand{\xerxes}{\textbf{King!}}
Breve over the h goes \u{h}ere // Illegal token!
`;

Increased max string length


The maximum string length on 64-bit platforms increased from 2**28 - 16 to 2**30 - 25 characters.

FullCodeGen is gone


In 6.2 the final major pieces of the old pipeline are gone. More than 30k lines of code were deleted in this release — a clear win for reducing code complexity.

V8 API


Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use git checkout -b 6.2 -t branch-heads/6.2 to experiment with the new features in V8 6.2. Alternatively you can subscribe to Chrome’s Beta channel and try the new features out yourself soon.

Posted by the V8 team

Wednesday, August 30, 2017

Fast Properties in V8

In this blog post we would like to explain how V8 handles JavaScript properties internally. From a JavaScript point of view there are only a few distinctions necessary for properties. JavaScript objects mostly behave like dictionaries, with string keys and arbitrary objects as values. The specification does however treat integer-indexed properties and other properties differently during iteration. Other than that, the different properties behave mostly the same, independent of whether they are integer indexed or not.

However, under the hood V8 does rely on several different representations of properties for performance and memory reasons. In this blog post we are going to explain how V8 can provide fast property access while handling dynamically-added properties. Understanding how properties work is essential for explaining how optimizations such as inline caches work in V8.

This post explains the difference in handling integer-indexed and named properties. After that we show how V8 maintains HiddenClasses when adding named properties in order to provide a fast way to identify the shape of an object. We'll then continue giving insights into how named properties are optimized for fast accesses or fast modification depending on the usage. In the final section we provide details on how V8 handles integer-indexed properties or array indices.

Named Properties vs. Elements

Let's start by analysing a very simple object such as {a: "foo", b: "bar"}. This object has two named properties, "a" and "b". It does not have any integer indices for property names. array-indexed properties, more commonly known as elements, are most prominent on arrays. For instance the array ["foo", "bar"] has two array-indexed properties: 0, with the value "foo", and 1, with the value "bar". This is the first major distinction on how V8 handles properties in general.

The following diagram shows what a basic JavaScript object looks like in memory.



Elements and properties are stored in two separate data structures which makes adding and accessing properties or elements more efficient for different usage patterns.

Elements are mainly used for the various Array.prototype methods such as pop or slice. Given that these functions access properties in consecutive ranges, V8 also represents them as simple arrays internally — most of the time. Later in this post we will explain how we sometimes switch to a sparse dictionary-based representation to save memory.

Named properties are stored in a similar way in a separate array. However, unlike elements, we cannot simply use the key to deduce their position within the properties array; we need some additional metadata. In V8 every JavaScript object has a HiddenClass associated. The HiddenClass stores information about the shape of an object, and among other things, a mapping from property names to indices into the properties. To complicate things we sometimes use a dictionary for the properties instead of a simple array. We will explain this in more detail in a dedicated section.

Takeaway from this section:
  • Array-indexed properties are stored in a separate elements store.
  • Named properties are stored in the properties store.
  • Elements and properties can either be arrays or dictionaries.
  • Each JavaScript object has a HiddenClass associated that keeps information about the object shape.

HiddenClasses and DescriptorArrays

After explaining the general distinction of elements and named properties we need to have a look at how HiddenClasses work in V8. This HiddenClass stores meta information about an object, including the number of properties on the object and a reference to the object’s prototype. HiddenClasses are conceptually similar to classes in typical object-oriented programming languages. However, in a prototype-based language such as JavaScript it is generally not possible to know classes upfront. Hence, in this case V8, HiddenClasses are created on the fly and updated dynamically as objects change. HiddenClasses serve as an identifier for the shape of an object and as such a very important ingredient for V8's optimizing compiler and inline caches. The optimizing compiler for instance can directly inline property accesses if it can ensure a compatible objects structure through the HiddenClass.

Let's have a look at the important parts of a HiddenClass.


In V8 the first field of a JavaScript object points to a HiddenClass. (In fact, this is the case for any object that is on the V8 heap and managed by the garbage collector.) In terms of properties, the most important information is the third bit field, which stores the number of properties, and a pointer to the descriptor array. The descriptor array contains information about named properties like the name itself and the position where the value is stored. Note that we do not keep track of integer indexed properties here, hence there is no entry in the descriptor array.

The basic assumption about HiddenClasses is that objects with the same structure — e.g. the same named properties in the same order — share the same HiddenClass. To achieve that we use a different HiddenClass when a property gets added to an object. In the following example we start from an empty object and add three named properties.


Every time a new property is added, the object's HiddenClass is changed. In the background V8 creates a transition tree that links the HiddenClasses together. V8 knows which HiddenClass to take when you add, for instance, the property "a" to an empty object. This transition tree makes sure you end up with the same final HiddenClass if you add the same properties in the same order. The following example shows that we would follow the same transition tree even if we add simple indexed properties in between.


However, if we create a new object that gets a different property added, in this case property "d," V8 creates a separate branch for the new HiddenClasses.


Takeway from this section:
  • Objects with the same structure (same properties in the same order) have the same HiddenClass
  • By default every new named property added causes a new HiddenClass to be created.
  • Adding array-indexed properties does not create new HiddenClasses.

The Three Different Kinds of Named Properties

After giving an overview on how V8 uses HiddenClasses to track the shape of objects let’s dive into how these properties are actually stored. As explained in the introduction above, there are two fundamental kind of properties: named and indexed. The following section covers named properties.

A simple object such as {a: 1, b: 2} can have various internal representations in V8. While JavaScript objects behave more or less like simple dictionaries from the outside, V8 tries to avoid dictionaries because they hamper certain optimizations such as inline caches which we will explain in a separate post. 

In-object vs. Normal Properties: V8 supports so-called in-object properties which are stored directly on the object themselves. These are the fastest properties available in V8 as they are accessible without any indirection. The number of in-object properties is predetermined by the initial size of the object. If more properties get added than there is space in the object, they are stored in the properties store. The properties store adds one level of indirection but can be grown independently.



Fast vs. Slow Properties: The next important distinction is between fast and slow properties. Typically we define the properties stored in the linear properties store as "fast".  Fast properties are simply accessed by index in the properties store. To get from the name of the property to the actual position in the properties store, we have to consult the descriptor array on the HiddenClass, as we've outlined before.


However, if many properties get added and deleted from an object, it can generate a lot of time and memory overhead to maintain the descriptor array and HiddenClasses. Hence, V8 also supports so-called slow properties. An object with slow properties has a self-contained dictionary as a properties store. All the properties meta information is no longer stored in the descriptor array on the HiddenClass but directly in the properties dictionary. Hence, properties can be added and removed without updating the HiddenClass. Since inline caches don’t work with dictionary properties, the latter, are typically slower than fast properties.

Takeaway from this section:
  • There are three different named property types: in-object, fast and slow/dictionary.
    1. In-object properties are stored directly on the object itself and provide the fastest access.
    2. Fast properties live in the properties store, all the meta information is stored in the descriptor array on the HiddenClass.
    3. Slow properties live in a self-contained properties dictionary, meta information is no longer shared through the HiddenClass.
  • Slow properties allow for efficient property removal and addition but are slower to access than the other two types.

Elements or array-indexed Properties

So far we have looked at named properties and ignored integer indexed properties commonly used with arrays. Handling of integer indexed properties is no less complex than named properties. Even though all indexed properties are always kept separately in the elements store, there are 20 different types of elements!

Packed or Holey Elements: The first major distinction V8 makes is whether the elements backing store is packed or has holes in it. You get holes in a backing store if you delete an indexed element, or for instance, you don't define it. A simple example is [1,,3] where the second entry is a hole. The following example illustrates this issue:
const o = ["a", "b", "c"];
console.log(o[1]);          // Prints "b".

delete o[1];                // Introduces a hole in the elements store.
console.log(o[1]);          // Prints "undefined"; property 1 does not exist.
o.__proto__ = {1: "B"};     // Define property 1 on the prototype.

console.log(o[0]);          // Prints "a".
console.log(o[1]);          // Prints "B".
console.log(o[2]);          // Prints "c".
console.log(o[3]);          // Prints undefined

In short, if a property is not present on the receiver we have to keep on looking on the prototype chain. Given that elements are self-contained, e.g. we don't store information about present indexed properties on the HiddenClass, we need a special value, called the_hole, to mark properties that are not present. This is crucial for the performance of Array functions. If we know that there are no holes, i.e. the elements store is packed, we can perform local operations without expensive lookups on the prototype chain.

Fast or Dictionary Elements: The second major distinction made on elements is whether they are fast or dictionary-mode. Fast elements are simple VM-internal arrays where the property index maps to the index in the elements store. However, this simple representation is rather wasteful for very large sparse/holey arrays where only few entries are occupied. In this case we used a dictionary-based representation to save memory at the cost of slightly slower access:
const sparseArray = [];
sparseArray[9999] = "foo"; // Creates an array with dictionary elements.
In this example, allocating a full array with 10k entries would be rather wasteful. What happens instead is that V8 creates a dictionary where we store a key-value-descriptor triplets. The key in this case would be 9999 and the value "foo" and the default descriptor is used. Given that we don't have a way to store descriptor details on the HiddenClass, V8 resorts to slow elements whenever you define an indexed properties with a custom descriptor:
const array = [];
Object.defineProperty(array, 0, {value: "fixed", configurable: false});
console.log(array[0]);      // Prints "fixed".
array[0] = "other value";   // Cannot override index 0.
console.log(array[0]);      // Still prints "fixed".
In this example we added a non-configurable property on the array. This information is stored in the descriptor part of a slow elements dictionary triplet. It is important to note that Array functions perform considerably slower on objects with slow elements.

Smi and Double Elements: For fast elements there is another important distinction made in V8. For instance if you only store integers in an Array, a common use-case, the GC does not have to look at the array, as integers are directly encoded as so called small integers (Smis) in place. Another special case are Arrays that only contain doubles. Unlike Smis, floating point numbers are usually represented as full objects occupying several words. However, V8 stores raw doubles for pure double arrays to avoid memory and performance overhead. The following example lists 4 examples of Smi and double elements:
const a1 = [1,   2, 3];  // Smi Packed
const a2 = [1,    , 3];  // Smi Holey, a2[1] reads from the prototype
const b1 = [1.1, 2, 3];  // Double Packed
const b2 = [1.1,  , 3];  // Double Holey, b2[1] reads from the prototype

Special Elements: With the information so far we covered 7 out of the 20 different element kinds. For simplicity we excluded 9 element kinds for TypedArrays, two more for String wrappers and last but not least, two more special element kinds for arguments objects.

The ElementsAccessor: As you can imagine we are not exactly keen on writing Array functions 20 times in C++, once for every elements kind. That's where some C++ magic comes into play. Instead of implementing Array functions over and over again, we built the ElementsAccessor where we mostly have to implement only simple functions that access elements from the backing store. The ElementsAccessor relies on CRTP to create specialized versions of each Array function. So if you call something like slice on a array, V8 internally calls a builtin written in C++ and dispatches through the ElementsAccessor to the specialized version of the function:


Takeaway from this section:
  • There are fast and dictionary-mode indexed properties and elements.
  • Fast properties can be packed or they can can contain holes which indicate that an indexed property has been deleted.
  • Elements are specialized on their content to speed up Array functions and reduce GC overhead.


Understanding how properties work is key to many optimizations in V8. For JavaScript developers many of these internal decisions are not visible directly, but they explain why certain code patterns are faster than others. Changing the property or element type typically causes V8 to create a different HiddenClass which can lead to type pollution which prevents V8 from generating optimal code. Stay tuned for further posts on how the VM-internals of V8 work.

Posted by Camillo Bruni (@camillobruni), also author of “Fast for-in