Base64 encoding and decoding in JavaScript

Base64 encoding and decoding in JavaScript

better implementation of base64 encode & decode

JavaScript Notes

2019.11.30

0 👣 #base64

Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remain intact without modification during transport. Base64 is commonly used in a number of applications including email via MIME, and storing complex data in XML.

In JavaScript there are two functions respectively for decoding and encoding base64 strings:

  • atob() - decodes a string of data which has been encoded using base-64 encoding
  • btoa() - creates a base-64 encoded ASCII string from a “string” of binary data

RFC 4648 specifies the base64 algorithm in section 4, and also defines an alternate base64url algorithm for URLs in section 5 (which is not the one used by atob/btoa).

Encoded size increase

Each Base64 digit represents exactly 6 bits of data. So, three 8-bits bytes of the input string/binary file (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).

This means that the Base64 version of a string or file will be at most 133% the size of its source (a ~33% increase). The increase may be larger if the encoded data is small. For example, the string "a" with length === 1 gets encoded to "YQ==" with length === 4 — a 300% increase.

The “Unicode Problem”

Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF). Here follow five possible methods to solve this problem.

JavaScript’s UTF-16 => base64

A very fast and widely useable way to solve the unicode problem is by encoding JavaScript native UTF-16 strings directly into base64. Please visit the URL:

data:text/plain;charset=utf-16;base64,OCY5JjomOyY8Jj4mPyY=

for a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page).

This method is particularly efficient because it does not require any type of conversion, except mapping a string into an array. The following code is also useful to get an ArrayBuffer from a Base64 string and/or viceversa.

"use strict";

/* Array of bytes to base64 string decoding */

function b64ToUint6 (nChr) {

  return nChr > 64 && nChr < 91 ?
      nChr - 65
    : nChr > 96 && nChr < 123 ?
      nChr - 71
    : nChr > 47 && nChr < 58 ?
      nChr + 4
    : nChr === 43 ?
      62
    : nChr === 47 ?
      63
    :
      0;

}

function base64DecToArr (sBase64, nBlockSize) {

  var sB64Enc = sBase64.replace(/[^A-Za-z0-9\+\/]/g, ""), nInLen = sB64Enc.length,
      nOutLen = nBlockSize ? Math.ceil((nInLen * 3 + 1 >>> 2) / nBlockSize) * nBlockSize : nInLen * 3 + 1 >>> 2,
      aBytes = new Uint8Array(nOutLen);

  for (var nMod3, nMod4, nUint24 = 0, nOutIdx = 0, nInIdx = 0; nInIdx < nInLen; nInIdx++) {
    nMod4 = nInIdx & 3;
    nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << 18 - 6 * nMod4;
    if (nMod4 === 3 || nInLen - nInIdx === 1) {
      for (nMod3 = 0; nMod3 < 3 && nOutIdx < nOutLen; nMod3++, nOutIdx++) {
        aBytes[nOutIdx] = nUint24 >>> (16 >>> nMod3 & 24) & 255;
      }
      nUint24 = 0;
    }
  }

  return aBytes;
}

/* Base64 string to array encoding */

function uint6ToB64 (nUint6) {

  return nUint6 < 26 ?
      nUint6 + 65
    : nUint6 < 52 ?
      nUint6 + 71
    : nUint6 < 62 ?
      nUint6 - 4
    : nUint6 === 62 ?
      43
    : nUint6 === 63 ?
      47
    :
      65;

}

function base64EncArr (aBytes) {

  var eqLen = (3 - (aBytes.length % 3)) % 3, sB64Enc = "";

  for (var nMod3, nLen = aBytes.length, nUint24 = 0, nIdx = 0; nIdx < nLen; nIdx++) {
    nMod3 = nIdx % 3;

    /* Uncomment the following line in order to split the output in lines 76-character long: */
    //if (nIdx > 0 && (nIdx * 4 / 3) % 76 === 0) { sB64Enc += "\r\n"; }

    nUint24 |= aBytes[nIdx] << (16 >>> nMod3 & 24);
    if (nMod3 === 2 || aBytes.length - nIdx === 1) {
      sB64Enc += String.fromCharCode(uint6ToB64(nUint24 >>> 18 & 63), uint6ToB64(nUint24 >>> 12 & 63), uint6ToB64(nUint24 >>> 6 & 63), uint6ToB64(nUint24 & 63));
      nUint24 = 0;
    }
  }

  return  eqLen === 0 ? sB64Enc : sB64Enc.substring(0, sB64Enc.length - eqLen) + (eqLen === 1 ? "=" : "==");

}

Tests:

var myString = "☸☹☺☻☼☾☿";

/* Part 1: Encode `myString` to base64 using native UTF-16 */

var aUTF16CodeUnits = new Uint16Array(myString.length);
Array.prototype.forEach.call(aUTF16CodeUnits, function (el, idx, arr) { arr[idx] = myString.charCodeAt(idx); });
var sUTF16Base64 = base64EncArr(new Uint8Array(aUTF16CodeUnits.buffer));

/* Show output */

alert(sUTF16Base64); // "OCY5JjomOyY8Jj4mPyY="

/* Part 2: Decode `sUTF16Base64` to UTF-16 */

var sDecodedString = String.fromCharCode.apply(null, new Uint16Array(base64DecToArr(sUTF16Base64, 2).buffer));

/* Show output */

alert(sDecodedString); // "☸☹☺☻☼☾☿"

The produced string is fully portable, although represented as UTF-16.

Decode a Base64 string to Uint8Array or ArrayBuffer

The functions above let us also create uint8Arrays or arrayBuffers from base64-encoded strings:

var myArray = base64DecToArr("QmFzZSA2NCDigJQgTW96aWxsYSBEZXZlbG9wZXIgTmV0d29yaw=="); // "Base 64 \u2014 Mozilla Developer Network" (as UTF-8)

var myBuffer = base64DecToArr("QmFzZSA2NCDigJQgTW96aWxsYSBEZXZlbG9wZXIgTmV0d29yaw==").buffer; // "Base 64 \u2014 Mozilla Developer Network" (as UTF-8)

alert(myBuffer.byteLength);

Note: The function base64DecToArr(sBase64[, *nBlockSize*]) returns an uint8Array of bytes. If your aim is to build a buffer of 16-bit / 32-bit / 64-bit raw data, use the nBlockSize argument, which is the number of bytes which the uint8Array.buffer.bytesLength property must result to be a multiple of (1 or omitted for ASCII, binary content, binary strings, UTF-8-encoded strings; 2 for UTF-16 strings; 4 for UTF-32 strings).

For a more complete library, see StringView – a C-like representation of strings based on typed arrays (source code available on GitHub).

JavaScript’s UTF-16 => UTF-8 => base64

This solution consists in converting a JavaScript’s native UTF-16 string into a UTF-8 string and then encoding the latter into base64. This also grants that converting a pure ASCII string to base64 always produces the same output as the native btoa().

"use strict";

/* Array of bytes to base64 string decoding */

function b64ToUint6 (nChr) {

  return nChr > 64 && nChr < 91 ?
      nChr - 65
    : nChr > 96 && nChr < 123 ?
      nChr - 71
    : nChr > 47 && nChr < 58 ?
      nChr + 4
    : nChr === 43 ?
      62
    : nChr === 47 ?
      63
    :
      0;

}

function base64DecToArr (sBase64, nBlockSize) {

  var
    sB64Enc = sBase64.replace(/[^A-Za-z0-9\+\/]/g, ""), nInLen = sB64Enc.length,
    nOutLen = nBlockSize ? Math.ceil((nInLen * 3 + 1 >>> 2) / nBlockSize) * nBlockSize : nInLen * 3 + 1 >>> 2, aBytes = new Uint8Array(nOutLen);

  for (var nMod3, nMod4, nUint24 = 0, nOutIdx = 0, nInIdx = 0; nInIdx < nInLen; nInIdx++) {
    nMod4 = nInIdx & 3;
    nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << 18 - 6 * nMod4;
    if (nMod4 === 3 || nInLen - nInIdx === 1) {
      for (nMod3 = 0; nMod3 < 3 && nOutIdx < nOutLen; nMod3++, nOutIdx++) {
        aBytes[nOutIdx] = nUint24 >>> (16 >>> nMod3 & 24) & 255;
      }
      nUint24 = 0;
    }
  }

  return aBytes;
}

/* Base64 string to array encoding */

function uint6ToB64 (nUint6) {

  return nUint6 < 26 ?
      nUint6 + 65
    : nUint6 < 52 ?
      nUint6 + 71
    : nUint6 < 62 ?
      nUint6 - 4
    : nUint6 === 62 ?
      43
    : nUint6 === 63 ?
      47
    :
      65;

}

function base64EncArr (aBytes) {

  var eqLen = (3 - (aBytes.length % 3)) % 3, sB64Enc = "";

  for (var nMod3, nLen = aBytes.length, nUint24 = 0, nIdx = 0; nIdx < nLen; nIdx++) {
    nMod3 = nIdx % 3;
    /* Uncomment the following line in order to split the output in lines 76-character long: */
    /*
    if (nIdx > 0 && (nIdx * 4 / 3) % 76 === 0) { sB64Enc += "\r\n"; }
    */
    nUint24 |= aBytes[nIdx] << (16 >>> nMod3 & 24);
    if (nMod3 === 2 || aBytes.length - nIdx === 1) {
      sB64Enc += String.fromCharCode(uint6ToB64(nUint24 >>> 18 & 63), uint6ToB64(nUint24 >>> 12 & 63), uint6ToB64(nUint24 >>> 6 & 63), uint6ToB64(nUint24 & 63));
      nUint24 = 0;
    }
  }

  return  eqLen === 0 ?
      sB64Enc
    :
      sB64Enc.substring(0, sB64Enc.length - eqLen) + (eqLen === 1 ? "=" : "==");

}

/* UTF-8 array to DOMString and vice versa */

function UTF8ArrToStr (aBytes) {

  var sView = "";

  for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
    nPart = aBytes[nIdx];
    sView += String.fromCharCode(
      nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
        /* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
        (nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
      : nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
        (nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
      : nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
        (nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
      : nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
        (nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
      : nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
        (nPart - 192 << 6) + aBytes[++nIdx] - 128
      : /* nPart < 127 ? */ /* one byte */
        nPart
    );
  }

  return sView;

}

function strToUTF8Arr (sDOMStr) {

  var aBytes, nChr, nStrLen = sDOMStr.length, nArrLen = 0;

  /* mapping... */

  for (var nMapIdx = 0; nMapIdx < nStrLen; nMapIdx++) {
    nChr = sDOMStr.charCodeAt(nMapIdx);
    nArrLen += nChr < 0x80 ? 1 : nChr < 0x800 ? 2 : nChr < 0x10000 ? 3 : nChr < 0x200000 ? 4 : nChr < 0x4000000 ? 5 : 6;
  }

  aBytes = new Uint8Array(nArrLen);

  /* transcription... */

  for (var nIdx = 0, nChrIdx = 0; nIdx < nArrLen; nChrIdx++) {
    nChr = sDOMStr.charCodeAt(nChrIdx);
    if (nChr < 128) {
      /* one byte */
      aBytes[nIdx++] = nChr;
    } else if (nChr < 0x800) {
      /* two bytes */
      aBytes[nIdx++] = 192 + (nChr >>> 6);
      aBytes[nIdx++] = 128 + (nChr & 63);
    } else if (nChr < 0x10000) {
      /* three bytes */
      aBytes[nIdx++] = 224 + (nChr >>> 12);
      aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
    } else if (nChr < 0x200000) {
      /* four bytes */
      aBytes[nIdx++] = 240 + (nChr >>> 18);
      aBytes[nIdx++] = 128 + (nChr >>> 12 & 63);
      aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
    } else if (nChr < 0x4000000) {
      /* five bytes */
      aBytes[nIdx++] = 248 + (nChr >>> 24);
      aBytes[nIdx++] = 128 + (nChr >>> 18 & 63);
      aBytes[nIdx++] = 128 + (nChr >>> 12 & 63);
      aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
    } else /* if (nChr <= 0x7fffffff) */ {
      /* six bytes */
      aBytes[nIdx++] = 252 + (nChr >>> 30);
      aBytes[nIdx++] = 128 + (nChr >>> 24 & 63);
      aBytes[nIdx++] = 128 + (nChr >>> 18 & 63);
      aBytes[nIdx++] = 128 + (nChr >>> 12 & 63);
      aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
    }
  }

  return aBytes;

}

Tests:

/* Tests */

var sMyInput = "Base 64 \u2014 Mozilla Developer Network";

var aMyUTF8Input = strToUTF8Arr(sMyInput);

var sMyBase64 = base64EncArr(aMyUTF8Input);

alert(sMyBase64); // "QmFzZSA2NCDigJQgTW96aWxsYSBEZXZlbG9wZXIgTmV0d29yaw=="

var aMyUTF8Output = base64DecToArr(sMyBase64);

var sMyOutput = UTF8ArrToStr(aMyUTF8Output);

alert(sMyOutput); // "Base 64 — Mozilla Developer Network"

JavaScript’s UTF-16 => binary string => base64

The following is the fastest and most compact possible approach. The output is exactly the same produced by the first solution (JavaScript’s UTF-16 => base64), but instead of rewriting atob() and btoa() it uses the native ones. This is made possible by the fact that instead of using typed arrays as encoding/decoding inputs this solution uses binary strings as an intermediate format. It is a “dirty” workaround in comparison to JavaScript’s UTF-16 => base64 (binary strings are a grey area), however it works pretty well and requires only a few lines of code.

"use strict";

function btoaUTF16 (sString) {

  var aUTF16CodeUnits = new Uint16Array(sString.length);
  Array.prototype.forEach.call(aUTF16CodeUnits, function (el, idx, arr) { arr[idx] = sString.charCodeAt(idx); });
  return btoa(String.fromCharCode.apply(null, new Uint8Array(aUTF16CodeUnits.buffer)));

}

function atobUTF16 (sBase64) {

  var sBinaryString = atob(sBase64), aBinaryView = new Uint8Array(sBinaryString.length);
  Array.prototype.forEach.call(aBinaryView, function (el, idx, arr) { arr[idx] = sBinaryString.charCodeAt(idx); });
  return String.fromCharCode.apply(null, new Uint16Array(aBinaryView.buffer));

}

Tests:

var myString = "☸☹☺☻☼☾☿";

/* Part 1: Encode `myString` to base64 using native UTF-16 */

var sUTF16Base64 = btoaUTF16(myString);

/* Show output */

alert(sUTF16Base64); // "OCY5JjomOyY8Jj4mPyY="

/* Part 2: Decode `sUTF16Base64` to UTF-16 */

var sDecodedString = atobUTF16(sUTF16Base64);

/* Show output */

alert(sDecodedString); // "☸☹☺☻☼☾☿"

Escaping the string before encoding it

function b64EncodeUnicode(str) {
  // first we use encodeURIComponent to get percent-encoded UTF-8,
  // then we convert the percent encodings into raw bytes which
  // can be fed into btoa.
  return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
      function toSolidBytes(match, p1) {
          return String.fromCharCode('0x' + p1);
  }));
}

b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="

To decode the Base64-encoded value back into a String:

function b64DecodeUnicode(str) {
  // Going backwards: from bytestream, to percent-encoding, to original string.
  return decodeURIComponent(atob(str).split('').map(function(c) {
      return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
  }).join(''));
}

b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"

Unibabel implements common conversions using this strategy.

Rewrite the DOMs atob() and btoa() using JavaScript’s TypedArrays and UTF-8

Use a TextEncoder polyfill such as TextEncoding (also includes legacy Windows, Mac, and ISO encodings), TextEncoderLite, combined with a Buffer and a Base64 implementation such as base64-js or TypeScript version of base64-js for both modern browsers and Node.js.

When a native TextEncoder implementation is not available, the most light-weight solution would be to use JavaScript’s UTF-16 => binary string => base64 because in addition to being much faster, this solution also works in IE9 “out of the box.” Alternatively, use TextEncoderLite with base64-js. Use the browser implementation when you can.

The following function implements such a strategy. It assumes base64-js imported as <script type="text/javascript" src="base64js.min.js"/>. Note that TextEncoderLite only works with UTF-8.

function Base64Encode(str, encoding = 'utf-8') {
  var bytes = new (typeof TextEncoder === "undefined" ? TextEncoderLite : TextEncoder)(encoding).encode(str);
  return base64js.fromByteArray(bytes);
}

function Base64Decode(str, encoding = 'utf-8') {
  var bytes = base64js.toByteArray(str);
  return new (typeof TextDecoder === "undefined" ? TextDecoderLite : TextDecoder)(encoding).decode(bytes);
}

Note: TextEncoderLite parse 4 bytes UTF-8 character incorrectly, such as \uD842\uDFB7 or \u{20BB7} in short, see issue.

Alternatively, use text-encoding instead.

In some cases, the above conversion to UTF-8 and then to Base64 will not be very space efficient. UTF-8 produces longer output than UTF-16 when the text contains a large percentage of characters in the range U+0800-U+FFFF, which are encoded with three bytes in UTF-8 but two in UTF-16. In the case where the JavaScript string contains evenly-distributed UTF-16 code points, one might consider encoding to UTF-16 instead of UTF-8 before the conversion to Base64, for a 40% reduction in size.

THE END
Ads by Google

林宏

Frank Lin, PhD

Hey, there! This is Frank Lin (@flinhong), one of the 1.41 billion . This 'inDev. Journal' site holds the exploration of my quirky thoughts and random adventures through life. Hope you enjoy reading and perusing my posts.

YOU MAY ALSO LIKE

Using Liquid in Jekyll - Live with Demos

Web Notes

2016.08.20

Using Liquid in Jekyll - Live with Demos

Liquid is a simple template language that Jekyll uses to process pages for your site. With Liquid you can output complex contents without additional plugins.

Setup an IKEv2 server with strongSwan

Tutorials

2020.01.09

Setup an IKEv2 server with strongSwan

IKEv2, or Internet Key Exchange v2, is a protocol that allows for direct IPSec tunnelling between two points. In IKEv2 implementations, IPSec provides encryption for the network traffic. IKEv2 is natively supported on some platforms (OS X 10.11+, iOS 9.1+, and Windows 10) with no additional applications necessary, and it handles client hiccups quite smoothly.

Practising closures in JavaScript

JavaScript Notes

2018.12.17

Practising closures in JavaScript

JavaScript is a very function-oriented language. As we know, functions are first class objects and can be easily assigned to variables, passed as arguments, returned from another function invocation, or stored into data structures. A function can access variable outside of it. But what happens when an outer variable changes? Does a function get the most recent value or the one that existed when the function was created? Also, what happens when a function invoked in another place - does it get access to the outer variables of the new place?