OpenSSL and Breaking UTF-8 Change (fixed in Node v0.8.27 and v0.10.29)
Today we are releasing new versions of Node: v0.8.27 and v0.10.29.
First and foremost, these releases address the current OpenSSL vulnerability CVE-2014-0224. For both 0.8 and 0.10, we've upgraded the bundled OpenSSL to the fixed versions, v1.0.0m and v1.0.1h respectively.
Additionally, these releases address the fact that V8's UTF-8 encoding would
allow unmatched surrogates. That is to say, previously you could construct a
Buffer as UTF-8 from such a string, send it to another process, and that
process would fail to interpret it because the UTF-8 was invalid.
Note, the results encoded by V8 in this case are exactly what was passed into the encoding routine. There is no overflow, underflow, or the inclusion of other arbitrary memory, merely an unmatched UTF-8 surrogate resulting in invalid UTF-8.
As of these releases, if you try to pass a string with an unmatched surrogate,
Node will replace that character with the Unicode replacement character
(U+FFFD). To preserve the old behavior, set the environment variable
NODE_INVALID_UTF8 to anything (even nothing). If the environment variable is
present at all, Node reverts to the old behavior.
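If you would rather sanitize strings yourself before they reach the encoder, the replacement described above can be approximated in plain JavaScript. The sketch below is illustrative only: `replaceLoneSurrogates` is a hypothetical helper, not part of Node's API, and it uses `Buffer.from`, which assumes a Node version newer than the releases discussed here (on v0.8 and v0.10 you would write `new Buffer(str, 'utf8')` instead).

```javascript
// Hypothetical helper (not a Node API): replaces every unmatched
// surrogate in a string with U+FFFD, leaving valid pairs untouched.
function replaceLoneSurrogates(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: valid only when a low surrogate follows.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += str[i] + str[i + 1];
        i++; // skip the low half of the pair we just copied
      } else {
        out += '\ufffd';
      }
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      out += '\ufffd';
    } else {
      out += str[i];
    }
  }
  return out;
}

const cleaned = replaceLoneSurrogates('ab\ud800cd');
// Assumes Buffer.from is available (newer Node versions).
console.log(Buffer.from(cleaned, 'utf8')); // <Buffer 61 62 ef bf bd 63 64>
```

Sanitizing at the application boundary like this gives the same bytes on both old and new releases, regardless of whether NODE_INVALID_UTF8 is set.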
This breaks backward compatibility for the specific reason that unsanitized
strings sent as a text payload to an RFC-compliant WebSocket implementation
should result in the disconnection of the client. If the client attempts to
reconnect and receives another invalid payload, it must disconnect again. If
there is no logic to handle the reconnection attempts, this may lead to a
denial-of-service attack. For instance
socket.io attempts to reconnect by
    // Prior to these releases:
    new Buffer('ab\ud800cd', 'utf8');
    // <Buffer 61 62 ed a0 80 63 64>

    // After this release:
    new Buffer('ab\ud800cd', 'utf8');
    // <Buffer 61 62 ef bf bd 63 64>

    // This is an explicit conversion to a Buffer, but the implicit
    // .write('ab\ud800cd') also results in the same pattern
    websocket.write(new Buffer('ab\ud800cd', 'utf8'));
    // This would result in the client disconnecting.
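To see why a strict receiver rejects the old output, it can help to run both byte sequences through a decoder that refuses malformed UTF-8. The following is a minimal sketch, not how any particular WebSocket library validates payloads: `isValidUtf8` is a hypothetical helper, and it assumes a newer Node build with ICU support, where the global `TextDecoder` accepts the `fatal` option.

```javascript
// Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
// Assumes a Node build where TextDecoder is global and supports `fatal`.
function isValidUtf8(bytes) {
  const decoder = new TextDecoder('utf-8', { fatal: true });
  try {
    decoder.decode(bytes);
    return true;
  } catch (err) {
    // With `fatal: true`, malformed input throws instead of being replaced.
    return false;
  }
}

// Bytes produced before these releases: an encoded lone surrogate (ed a0 80).
const oldBytes = Buffer.from([0x61, 0x62, 0xed, 0xa0, 0x80, 0x63, 0x64]);
// Bytes produced after these releases: U+FFFD (ef bf bd) in its place.
const newBytes = Buffer.from([0x61, 0x62, 0xef, 0xbf, 0xbd, 0x63, 0x64]);

console.log(isValidUtf8(oldBytes)); // false
console.log(isValidUtf8(newBytes)); // true
```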
Node's default encoding for strings is UTF-8, so even if you're not creating
Buffers out of strings yourself, Node may be doing so under the hood. If what
you're passing is not actually UTF-8, then when you call .write(str) you can be
specific and say .write(str, 'binary'), which signals Node to pass the string
through without interpreting it.
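The difference between the two encodings is easiest to see on a string whose characters each stand for one raw byte. This is a small sketch under the assumption that `Buffer.from` is available (newer Node versions; the releases discussed here would use `new Buffer(str, encoding)`), and the variable names are purely illustrative.

```javascript
// A string whose characters each represent one raw byte (0xff, 0xfe).
const raw = '\xff\xfe';

// Default handling: each character is encoded as UTF-8, two bytes apiece.
const asUtf8 = Buffer.from(raw, 'utf8');

// 'binary': each character is written as a single byte, uninterpreted.
const asBinary = Buffer.from(raw, 'binary');

console.log(asUtf8);   // <Buffer c3 bf c3 be>
console.log(asBinary); // <Buffer ff fe>
```

Note that 'binary' only makes sense when every character fits in a single byte; it is not a general substitute for UTF-8 when the string contains real text.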
Thanks to Node.js alum Felix Geisendörfer for finding the issue, getting the fixes upstreamed, and helping with the testing and mitigation, as well as for helping to inform and improve the process for Node.js security issues.
To float these fixes in your own builds you can apply the following patch with