5 min read

Base64 basics and “bse64” CTF challenge writeup

Base64 basics and “bse64” CTF challenge writeup

Last year I took part in a RedHat internal CTF challenge run by Keith Swagler and the Red Hat Information Security Team. Many awesome and interesting challenges, but I’d like to write about the “bse64” challenge. Along with that I’ll take a closer look at the base64 encoding itself and explain some basics.

Challenge description

Developer Paul said removing one specific letter from a Base64 encoded string was good enough to stop attackers for a few years. Can you prove him wrong? ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=

First try: decoding the line

The first step for me was to try to decode the line and see if it works somehow. So, I ran this command in my Linux terminal and took a look at the decoded output:

$ base64 -d <<< ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=

As expected, not much help here, no patterns, no clues, nothing. It means that to solve this challenge we need to get closer to base64 encoding itself and learn some foundations.

What is Base64

BTW, did you know that you can encode image in base64, so the server will send the image without the need to read it from another source like the file system?

Base64 is a binary-to-text encoding schema used to convert binary files and send them over the Internet.

Base64 string can only contain A-Z, a-z, 0-9, +, / characters (64 characters). These characters can be represented by 6 bits. 1 byte contains 8 bits to fit this, so the left side is filled by 0s. That means there are 8 bits, but only 6 right bits have a value for us because the left 2 bits are always 0. The encoding rule for base64 encodes three 8-bit bytes to four 6-bit bytes and then adds 0s to each 6-bit byte to fit the 8-bit byte.

Probably it sounds a bit complicated, but it really isn’t. Let’s take a look at the examples and everything will become more clear.

Base64 encoding example

Let’s take the string OIL as an example and try to encode it.

The first step is to convert each letter to its ASCII code representation:

O I L -> 79 73 76

Then convert all numbers to binary format (8 digits == 8 bits):

79 73 76 -> 01001111 01001001 01001100

Now we need to create four 6-bit groups. For this let’s merge all together first:

01001111 01001001 01001100 -> 010011110100100101001100

At this moment we can break down our line with four 6-bit groups (6 digits == 6 bits):

010011110100100101001100 -> 010011 110100 100101 001100

Since each group should be 8-bits long, we have to add 0s to the left side of each group to fit 8-bit format:

010011 110100 100101 001100 -> 00010011 00110100 00100101 00001100

Now the final part to convert all the binary digits to letters. For this let’s convert binary digits back to decimal first:

00010011 00110100 00100101 00001100 -> 19 52 37 12

The last step is to convert decimals to letters using Base64 encoding table (see below):

19 52 37 12 -> T 0 l M

Encoding is completed. We’ve just encoded the string OIL to the T0lM base64 string.

BTW, did you know that the e-mail protocol makes massive use of base64 to attach files to messages ?

Base64 encoding table

Index Character Index Character Index Character
0 A 26 a 52 0
1 B 27 b 53 1
2 C 28 c 54 2
3 D 29 d 55 3
4 E 30 e 56 4
5 F 31 f 57 5
6 G 32 g 58 6
7 H 33 h 59 7
8 I 34 i 60 8
9 J 35 j 61 9
10 K 36 k 62 +
11 L 37 l 63 /
12 M 38 m
13 N 39 n
14 O 40 o
15 P 41 p
16 Q 42 q
17 R 43 r
18 S 44 s
19 T 45 t
20 U 46 u
21 V 47 v
22 W 48 w
23 X 49 x
24 Y 50 y
25 Z 51 z

“Missing” bytes

To fit the three 8-bit groups we need to encode 3 letters like in the example above.

But what if we want to encode 2 letters or even 1 letter? In this case, we can’t get four 6-bits groups and we have to paste the “=” character in the place of missing groups.

It means, whenever the length of our string we can have only one or two “=” characters at the end of our string (or zero if our string matches the base64 encoding pattern).

Different implementations, however, may use other values for the latest two characters and the one used for padding (“=“).

Back to the challenge

Based on the information above we need to find the missing letter (or letters) and recover the string from the description. We don’t need to recover the whole string, instead, we can break down the string by 4 characters and try to recover small groups.

Finding the missing letter

The description says that there is a missing letter in the string. And finding the missing letter is the simplest part of this challenge. We know the format of the answer which is flag{}. Let’s take the flag{ part and try to encode it with base64 in the Linux terminal:

$ base64 <<< flag{

And then compare with the string from the description:

ZmxhZ3sK vs ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=

Here we can see that our line contains additional character m in the first 4 letters of the string. It means, we’ve found the missing letter and now we can try to recover the whole string.

Recovering the string: the algorithm

I used small python automation to brute force 4-letters groups in the line and tried to figure out if a specific group of 4 letters needs the additional m letter or not.

Note: The description tells us about a missing letter. It means it can be lowercase or uppercase (m or M)

To do that I took 3 random letters and encoded them to a base64 string. Then I compared the result base64 string with the group of 4 letters from the original base64 string.

Possible results here are:

  • The full match with the 4 letters in the original string
  • Partial match with the first 3 letters from the original base64 string with m or M somewhere between these letters. In this case, we need to move the last letter from the 4-letter group of the original base64 string to the next round.

Recovering the string: the process

I ran the script and got these results:

Current 4-letter group from the original base64 string Matching result ASCII result Result Current checking part of the original base64 string
ZxhZ Zmxh fla Z goes to the next group ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
Z3tO Z3tO g{N Full match ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
HRf2 MHRf 0t_ 2 goes to the next group ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
24wd M24w 3n0 d goes to the next group ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
dWdo dWdo ugh Full match ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
XzBi XzBi _0b Full match ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
ZnUk ZnUk fu$ Full match ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
PEB0 PEB0 <@t Full match ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
TBuf MTBu 10n f goes to the next group ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
fQo= fQo= }\n Full match ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=

Challenge results

In the end I got this full base64 string with all missing letters in place:


And also the flag: