
In V8 / Node.js in particular, when you push a primitive value (string, number, boolean) into an array, does it clone the value or store a reference?

I know that reassigning the variable afterwards does not change the string already stored in the array:

let array = []
let x = 'foo'
array.push(x)
x = 'bar'
console.log(array) //=> ['foo']

But if I do this, is it copying the string multiple times (thereby increasing the memory footprint)?

let array = []
let x = 'foo'
array.push(x)
array.push(x)
array.push(x)
...

Same question for object property values: if I do this, will it clone the string?

let object = {}
let x = 'foo'
object.a = x
object.b = x
object.c = x

I searched around a bit but didn’t find a direct answer to this question.

Do objects pushed into an array in javascript deep or shallow copy?

This blog post says:

Objects and arrays are pushed as a pointer to the original object. Built-in primitive types like numbers or booleans are pushed as a copy.

But I’m not sure whether that is correct (no source is given for it). I would have to run a bunch of thorough tests to check whether memory grows when I push into an array. I’m not sure of the easiest way to do that, so perhaps a V8 engineer or someone else well versed in compiler theory knows how this is implemented.

I want to use Buffer.byteLength(text, 'utf8') to calculate the size of each string I add to a trie, and then keep track of the rough size of the trie (summing the sizes of the strings used in it, and roughly estimating the bytes used to store n object properties and arrays of length x). So the first step is understanding whether my string gets copied when I push it into multiple places, or whether the same reference is carried around in each place.

I would hope that the blog is incorrect and that a reference is pushed; it’s just that you can’t modify the original through a variable once it has been passed to another function. In other words, the string is still a reference until you try to change the variable, or something like that.

3 Answers


  1. I’ve built a small test.

    When you run this, you will see that a and c are passed by reference (no copy is made, so changes made inside the function are visible to the caller), whereas b is passed by value.

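    var a = {'a' : 0}
    var b = 0;
    var c = [0];

    // Mutates a property of the object it receives.
    function foo(x) {
      return ++x.a;
    }

    // Increments only its own local copy of the number.
    function foo2(x) {
      return ++x;
    }

    // Mutates the array it receives.
    function foo3(x) {
      x.push(1);
      return x;
    }

    console.log(a); // { a: 0 }
    foo(a);
    console.log(a); // { a: 1 }, the caller sees the change

    console.log(b); // 0
    foo2(b);
    console.log(b); // 0, the caller's number is unchanged

    console.log(c); // [ 0 ]
    foo3(c);
    console.log(c); // [ 0, 1 ], the caller sees the change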
  2. JavaScript VMs will never copy strings if they can avoid it. In this case, it’s trivial to not copy the string.

    If you truly wish to copy strings, you need to go through shenanigans, such as converting them to other encodings and back or splitting them and concatenating them back. If my memory serves correctly, last time I checked, once strings were copied, VMs didn’t try to deduplicate them.
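As a rough illustration of the kind of round trip described above, the sketch below re-encodes a string through a Node.js Buffer. Whether the result really occupies a separate allocation is an engine detail that the program itself cannot observe, and the variable names are only illustrative.

```javascript
// Sketch: round-trip a string through UTF-8 bytes to (probably) get a fresh copy.
// Assumption: the engine allocates a new string for the decoded result;
// nothing in the language guarantees or exposes this.
const original = 'foo'.repeat(1000);

// Encode to bytes, then decode the bytes into a string value again.
const roundTripped = Buffer.from(original, 'utf8').toString('utf8');

// The two values compare equal because strings have no identity,
// so equality cannot tell a copy from a shared reference.
console.log(roundTripped === original); // true
console.log(roundTripped.length);       // 3000
```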

    Source: Used to work on SpiderMonkey.

  3. (V8 developer here.)

    When storing a string in an array (or anywhere, really), the string is not copied. Neither are booleans.

    For numbers, it depends: they are usually also stored as references, except for certain optimized cases where there are more efficient alternatives.

    The reason is twofold:
    (1) There is no need to clone strings.
    (2) It is simpler and faster not to clone strings.
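One rough way to check this from Node.js is to compare heap growth when the same string is pushed many times against heap growth when many separately built strings are pushed. The sketch below is only an indication: `heapGrowth`, `N`, and `LENGTH` are illustrative names, the numbers depend on the Node.js version and on garbage collection timing, and decoding from a Buffer is assumed (not guaranteed) to produce a separately allocated string each time.

```javascript
// Rough heap measurement; treat the printed numbers as an indication only.
const N = 10000;     // illustrative element count
const LENGTH = 1000; // illustrative string length

// Runs `fill` on a fresh array and reports how much heapUsed changed.
function heapGrowth(fill) {
  const before = process.memoryUsage().heapUsed;
  const array = [];
  fill(array);
  const after = process.memoryUsage().heapUsed;
  return { array, growth: after - before };
}

// Case 1: push the same string N times.
const shared = heapGrowth((array) => {
  const x = 'a'.repeat(LENGTH);
  for (let i = 0; i < N; i++) array.push(x);
});

// Case 2: push N strings with distinct contents, each decoded from its own
// Buffer (assumed to give a separately allocated string every time).
const distinct = heapGrowth((array) => {
  for (let i = 0; i < N; i++) {
    const text = String(i).padStart(LENGTH, 'a');
    array.push(Buffer.from(text, 'latin1').toString('latin1'));
  }
});

console.log('same string pushed N times:', shared.growth, 'bytes');
console.log('N distinct strings pushed: ', distinct.growth, 'bytes');
```

If strings were cloned on every push, the two cases would be expected to grow by a similar amount; if only references are stored, the first case should grow by roughly the size of the array's backing store.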

    The snippet you quoted is plain wrong as far as implementation details are concerned. One could argue that it is not entirely incorrect as far as observable semantics are concerned: your program’s behavior cannot tell whether a copy of the string was stored, or just another reference to it. (But of course that just makes the whole statement meaningless: if objects are stored as references, and for primitives we cannot tell the difference, why not simply assume that everything is stored as reference?)
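A small sketch of why the difference is unobservable: strings are immutable, so no operation on one stored occurrence can ever affect another, and equality compares contents only. The variable names follow the question's example.

```javascript
const x = 'foo';
const array = [];
array.push(x);
array.push(x);

// String methods return new values; the stored string is never modified.
const upper = array[0].toUpperCase();

console.log(upper);                   // FOO
console.log(array);                   // [ 'foo', 'foo' ]
// Equality holds whether or not both slots point at the same allocation.
console.log(array[0] === array[1]);   // true
console.log(array[0] === 'f' + 'oo'); // true
```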

    As a rule of thumb: VMs for dynamic languages like JavaScript treat everything as a reference, except for whichever special cases they choose to optimize (typically some definition of number; search for the terms "smi-tagging" and "nan-boxing" if you want to dig deeper).
    Whether a value is a "primitive" or not only affects whether it has object identity:

    {foo: 42} === {foo: 42}  // false, objects have identity
    42 === 42                // true, numbers have no identity
    "foo" === "foo"          // true, strings have no identity
    

    Being a primitive does not affect how a value is stored in arrays/objects/variables/whatever, nor where it is allocated (a related myth I sometimes see is "primitives are allocated on the stack" — nope, they are not).
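For the size bookkeeping described in the question, this suggests counting each distinct string once and adding a fixed cost per slot that refers to it. The sketch below is one possible way to do that, not an engine-accurate measurement: `StringSizeTracker` and `POINTER_BYTES` are hypothetical names, the 8-byte pointer size is an assumption, deduplication is by string value (the engine may still hold equal strings in separate allocations), and the UTF-8 byte length is only a proxy for the in-memory size.

```javascript
// Hypothetical helper for a rough size estimate; not a V8 or Node.js API.
const POINTER_BYTES = 8; // assumed cost of one reference slot

class StringSizeTracker {
  constructor() {
    this.seen = new Set(); // distinct string values counted so far
    this.stringBytes = 0;  // summed UTF-8 size of the distinct strings
    this.slots = 0;        // number of places a string reference is stored
  }

  // Record that `text` is stored in one more array slot or property.
  add(text) {
    this.slots += 1;
    if (!this.seen.has(text)) {
      this.seen.add(text);
      this.stringBytes += Buffer.byteLength(text, 'utf8');
    }
  }

  // Distinct string bytes plus one assumed pointer per slot.
  estimate() {
    return this.stringBytes + this.slots * POINTER_BYTES;
  }
}

const tracker = new StringSizeTracker();
const word = 'foo';
tracker.add(word);
tracker.add(word);
tracker.add(word);

console.log(tracker.estimate()); // 27 (3 string bytes + 3 slots * 8)
```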
