
I found some code online that decodes Base64, and I am trying to work through it. I know Python has base64.urlsafe_b64decode(), but I would like to learn a bit more about what is going on.

The JS atob looks like:

function atob (input) {
  var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
  var str = String(input).replace(/=+$/, '');
  if (str.length % 4 == 1) {
    throw new InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.");
  }
  for (
    // initialize result and counters
    var bc = 0, bs, buffer, idx = 0, output = '';
    // get next character
    buffer = str.charAt(idx++);
    // character found in table? initialize bit storage and add its ascii value;
    ~buffer && (bs = bc % 4 ? bs * 64 + buffer : buffer,
      // and if not first of each 4 characters,
      // convert the first 8 bits to one ascii character
      bc++ % 4) ? output += String.fromCharCode(255 & bs >> (-2 * bc & 6)) : 0
  ) {
    // try to find character in table (0-63, not found => -1)
    buffer = chars.indexOf(buffer);
  }
  return output;
}

My goal is to port this to Python, but I am trying to understand what the for loop is doing in JavaScript.

It checks if the value is located in the chars table and then initializes some variables using a ternary like: bs = bc % 4 ? bs*64+buffer: buffer, bc++ %4

I am not quite sure I understand what the buffer, bc++ % 4 part of the ternary is doing. The comma confuses me a bit. Plus the String.fromCharCode(255 & (bs >> (-2 * bc & 6))) is a bit esoteric to me.

I’ve been trying something like this in Python, which produces some results, albeit different than what the javascript implementation is doing

import re

# Test subject
b64_str: str = "fwHzODWqgMH+NjBq02yeyQ=="
    
# Lookup table for characters
chars: str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

# Replace right padding with empty string
replaced = re.sub("=+$", '', b64_str)
if len(replaced) % 4 == 1:
    raise ValueError("atob failed. The string to be decoded is not valid base64")

# Bit storage and counters
bc = 0
out: str = ''
for i in replaced:

    # Get ascii value of character
    buffer = ord(i)

    # If counter is evenly divisible by 4, return buffer as is, else add the ascii value
    bs = bc * 64 + buffer if bc % 4 else buffer
    bc += 1 % 4 # Not sure I understand this part
    
    # Check if character is in the chars table
    if i in chars:

        # Check if the bit storage and bit counter are non-zero
        if bs and bc:
            # If so, convert the first 8 bits to an ascii character
            out += chr(255 & bs >> (-2 * bc & 6))
        else:
            out = 0
            
    # Set buffer to the index of where the first instance of the character is in the b64 string
    print(f"before: {chr(buffer)}")
    buffer = chars.index(chr(buffer))
    print(f"after: {buffer}")
    
print(out)

JS gives ó85ªÁþ60jÓlÉ

Python gives 2:u1(²ë:ð1G>%Y

2 Answers


  1. I’ve been trying something like this in Python, which produces some
    results, albeit different than what the javascript implementation is
    doing

    The first step is to determine whether either implementation works correctly. RFC 4648 provides test vectors for that purpose:

    BASE64("") = ""
    BASE64("f") = "Zg=="
    BASE64("fo") = "Zm8="
    BASE64("foo") = "Zm9v"
    BASE64("foob") = "Zm9vYg=="
    BASE64("fooba") = "Zm9vYmE="
    BASE64("foobar") = "Zm9vYmFy"
    

    If one implementation works correctly, work out what causes the difference in the other; otherwise, you might implement Base64 decoding from the description in RFC 4648.
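
The vectors above can be turned into a small check that any candidate decoder is run against. This is only a sketch: the names RFC4648_VECTORS, failing_vectors and reference_decode are made up for illustration, and the decoder is assumed to take a str and return a str.

```python
import base64

# Test vectors from RFC 4648: (plain text, Base64 encoding).
RFC4648_VECTORS = [
    ("", ""),
    ("f", "Zg=="),
    ("fo", "Zm8="),
    ("foo", "Zm9v"),
    ("foob", "Zm9vYg=="),
    ("fooba", "Zm9vYmE="),
    ("foobar", "Zm9vYmFy"),
]


def failing_vectors(decode):
    """Return the (plain, encoded) pairs that `decode` gets wrong."""
    return [
        (plain, encoded)
        for plain, encoded in RFC4648_VECTORS
        if decode(encoded) != plain
    ]


def reference_decode(encoded: str) -> str:
    """Standard-library decoder, used here as the known-good baseline."""
    return base64.b64decode(encoded).decode("ascii")


# An empty list means every vector decoded correctly.
print(failing_vectors(reference_decode))
```

Passing your own decoder in place of reference_decode shows which inputs it gets wrong.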

  2. How the JavaScript loop works:

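    • The loop processes the input in groups of four characters. Each Base64 character carries 6 bits, so four characters form one 24-bit group, which decodes to three bytes.
    • bc counts the characters processed so far, so bc % 4 gives the position within the current 24-bit group.
    • bs accumulates the 6-bit values of the characters in the current group, and output builds the decoded string by converting 8-bit chunks of bs to characters.
    • The ternary resets bs at the first character of each group. The shift amount (-2 * bc & 6) evaluates to 4, 2, or 0, which moves the next undecoded byte into the low 8 bits of bs, and the mask with 255 extracts it.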
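
To make the points above concrete, the JavaScript loop can be transliterated into Python almost statement by statement. This is a sketch rather than a definitive port: the names atob_port and not_first are invented here, and characters missing from the table are silently skipped, as in the JavaScript. The comma operator evaluates both operands and yields the second one, so the condition tested is the old value of bc % 4, and bc is incremented afterwards. Compared with the attempt in the question, this version uses the table index rather than ord(), multiplies bs (not bc) by 64, and emits a character only when the current character is not the first of its group.

```python
import base64
import re

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def atob_port(data: str) -> str:
    # Strip trailing padding, as the JavaScript does with /=+$/.
    s = re.sub(r"=+$", "", data)
    if len(s) % 4 == 1:
        raise ValueError("'atob' failed: input is not correctly encoded")

    bc = 0       # number of table characters consumed so far
    bs = 0       # bit storage for the current group of four characters
    output = []
    for ch in s:
        # Loop body in JS: look the character up (0-64, not found => -1).
        buffer = CHARS.find(ch)
        # `~buffer &&` in JS: ~(-1) is 0, so unknown characters are skipped.
        if buffer == -1:
            continue
        # First operand of the comma: start a new group or append 6 bits.
        bs = bs * 64 + buffer if bc % 4 else buffer
        # Second operand of the comma: `bc++ % 4` tests the OLD bc,
        # then increments it.
        not_first = bc % 4 != 0
        bc += 1
        if not_first:
            # (-2 * bc & 6) is 4, 2, 0 for the 2nd, 3rd, 4th character,
            # which brings the next complete byte into the low 8 bits.
            output.append(chr(255 & bs >> (-2 * bc & 6)))
    return "".join(output)


sample = "fwHzODWqgMH+NjBq02yeyQ=="
# Each decoded byte becomes one character, so compare using latin-1.
print(atob_port(sample) == base64.b64decode(sample).decode("latin-1"))
```

The result is a string with one character per decoded byte, which is why the comparison decodes the standard-library output as latin-1 instead of UTF-8.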

    Here is a tested version; note that bc here counts buffered bits rather than characters:
    https://www.online-python.com/PiseKNFuaO

    import base64
    
    class InvalidCharacterError(Exception):
        pass
    
    def atob(input_str):
        # '=' is left out of the table so that stray padding inside the data is rejected
        chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
        input_str = str(input_str).rstrip('=')
        
        if len(input_str) % 4 == 1:
            raise InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.")
        
        output = []
        bc = 0
        bs = 0
        buffer = 0
        
        for char in input_str:
            buffer = chars.find(char)
            
            if buffer == -1:
                raise InvalidCharacterError("'atob' failed: The string to be decoded contains an invalid character.")
            
            bs = (bs << 6) + buffer
            bc += 6
            
            if bc >= 8:
                bc -= 8
                output.append(chr((bs >> bc) & 255))
        
        return ''.join(output)
    
    # Compare with Python's built-in Base64 decoding
    def test_atob():
        test_strings = [
            "SGVsbG8gd29ybGQ=",  # "Hello world"
            "U29mdHdhcmUgRW5naW5lZXJpbmc=", # "Software Engineering"
            "VGVzdGluZyAxMjM=", # "Testing 123"
            "SGVsbG8gd29ybGQ==",  # "Hello world" with extra padding
            "SGVsbG8gd29ybGQ= ",  # "Hello world" with trailing space (invalid)
            "SGVsbG8gd29ybGQ\r\n",  # "Hello world" with newline characters (invalid)
            "Invalid!!==",  # Invalid characters
            "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZyE", # "This is an encoded string!" without padding
            "U29tZVNwZWNpYWwgQ2hhcnM6ICsgLyA=", # "SomeSpecial Chars: + / " with padding
        ]
        
        for encoded in test_strings:
            try:
                expected = base64.b64decode(encoded).decode('utf-8')
                result = atob(encoded)
                print(result == expected, "Custom:", result, "Expected:", expected)
            except Exception as e:
                print(f"Error for string: {encoded} - {e}")
    
    test_atob()
    