Base64 basics and “bse64” CTF challenge writeup
Last year I took part in a RedHat internal CTF challenge run by Keith Swagler and the Red Hat Information Security Team. Many awesome and interesting challenges, but I’d like to write about the “bse64” challenge. Along with that I’ll take a closer look at the base64 encoding itself and explain some basics.
Challenge description
Developer Paul said removing one specific letter from a Base64 encoded string was good enough to stop attackers for a few years. Can you prove him wrong? ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
First try: decoding the line
The first step for me was to try to decode the line and see if it works somehow. So, I ran this command in my Linux terminal and took a look at the decoded output:
$ base64 -d <<< ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
gYE՝|Ԑ0n}
As expected, not much help here, no patterns, no clues, nothing. It means that to solve this challenge we need to get closer to base64 encoding itself and learn some foundations.
What is Base64
Base64 is a binary-to-text encoding schema used to convert binary files and send them over the Internet.
Base64 string can only contain A-Z
, a-z
, 0-9
, +
, /
characters (64 characters). These characters can be represented by 6 bits. 1 byte contains 8 bits to fit this, so the left side is filled by 0s. That means there are 8 bits, but only 6 right bits have a value for us because the left 2 bits are always 0. The encoding rule for base64 encodes three 8-bit bytes to four 6-bit bytes and then adds 0s to each 6-bit byte to fit the 8-bit byte.
Probably it sounds a bit complicated, but it really isn’t. Let’s take a look at the examples and everything will become more clear.
Base64 encoding example
Let’s take the string OIL as an example and try to encode it.
The first step is to convert each letter to its ASCII code representation:
O I L -> 79 73 76
Then convert all numbers to binary format (8 digits == 8 bits):
79 73 76 -> 01001111 01001001 01001100
Now we need to create four 6-bit groups. For this let’s merge all together first:
01001111 01001001 01001100 -> 010011110100100101001100
At this moment we can break down our line with four 6-bit groups (6 digits == 6 bits):
010011110100100101001100 -> 010011 110100 100101 001100
Since each group should be 8-bits long, we have to add 0s to the left side of each group to fit 8-bit format:
010011 110100 100101 001100 -> 00010011 00110100 00100101 00001100
Now the final part to convert all the binary digits to letters. For this let’s convert binary digits back to decimal first:
00010011 00110100 00100101 00001100 -> 19 52 37 12
The last step is to convert decimals to letters using Base64 encoding table (see below):
19 52 37 12 -> T 0 l M
Encoding is completed. We’ve just encoded the string OIL to the T0lM base64 string.
Base64 encoding table
Index | Character | Index | Character | Index | Character |
---|---|---|---|---|---|
0 | A | 26 | a | 52 | 0 |
1 | B | 27 | b | 53 | 1 |
2 | C | 28 | c | 54 | 2 |
3 | D | 29 | d | 55 | 3 |
4 | E | 30 | e | 56 | 4 |
5 | F | 31 | f | 57 | 5 |
6 | G | 32 | g | 58 | 6 |
7 | H | 33 | h | 59 | 7 |
8 | I | 34 | i | 60 | 8 |
9 | J | 35 | j | 61 | 9 |
10 | K | 36 | k | 62 | + |
11 | L | 37 | l | 63 | / |
12 | M | 38 | m | ||
13 | N | 39 | n | ||
14 | O | 40 | o | ||
15 | P | 41 | p | ||
16 | Q | 42 | q | ||
17 | R | 43 | r | ||
18 | S | 44 | s | ||
19 | T | 45 | t | ||
20 | U | 46 | u | ||
21 | V | 47 | v | ||
22 | W | 48 | w | ||
23 | X | 49 | x | ||
24 | Y | 50 | y | ||
25 | Z | 51 | z |
“Missing” bytes
To fit the three 8-bit groups we need to encode 3 letters like in the example above.
But what if we want to encode 2 letters or even 1 letter? In this case, we can’t get four 6-bits groups and we have to paste the “=” character in the place of missing groups.
It means, whenever the length of our string we can have only one or two “=” characters at the end of our string (or zero if our string matches the base64 encoding pattern).
Different implementations, however, may use other values for the latest two characters and the one used for padding (“=“).
Back to the challenge
Based on the information above we need to find the missing letter (or letters) and recover the string from the description. We don’t need to recover the whole string, instead, we can break down the string by 4 characters and try to recover small groups.
Finding the missing letter
The description says that there is a missing letter in the string. And finding the missing letter is the simplest part of this challenge. We know the format of the answer which is flag{}
. Let’s take the flag{
part and try to encode it with base64 in the Linux terminal:
$ base64 <<< flag{
ZmxhZ3sK
And then compare with the string from the description:
ZmxhZ3sK vs ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo=
Here we can see that our line contains additional character m
in the first 4 letters of the string. It means, we’ve found the missing letter and now we can try to recover the whole string.
Recovering the string: the algorithm
I used small python automation to brute force 4-letters groups in the line and tried to figure out if a specific group of 4 letters needs the additional m letter or not.
Note: The description tells us about a missing letter. It means it can be lowercase or uppercase (m
orM
)
To do that I took 3 random letters and encoded them to a base64 string. Then I compared the result base64 string with the group of 4 letters from the original base64 string.
Possible results here are:
- The full match with the 4 letters in the original string
- Partial match with the first 3 letters from the original base64 string with
m
orM
somewhere between these letters. In this case, we need to move the last letter from the 4-letter group of the original base64 string to the next round.
Recovering the string: the process
I ran the script and got these results:
Current 4-letter group from the original base64 string | Matching result | ASCII result | Result | Current checking part of the original base64 string |
---|---|---|---|---|
ZxhZ | Zmxh | fla | Z goes to the next group | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
Z3tO | Z3tO | g{N | Full match | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
HRf2 | MHRf | 0t_ | 2 goes to the next group | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
24wd | M24w | 3n0 | d goes to the next group | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
dWdo | dWdo | ugh | Full match | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
XzBi | XzBi | _0b | Full match | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
ZnUk | ZnUk | fu$ | Full match | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
PEB0 | PEB0 | <@t | Full match | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
TBuf | MTBu | 10n | f goes to the next group | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
fQo= | fQo= | }\n | Full match | ZxhZ3tOHRf24wdWdoXzBiZnUkPEB0TBufQo= |
Challenge results
In the end I got this full base64 string with all missing letters in place:
ZmxhZ3tOMHRfM24wdWdoXzBiZnUkPEB0MTBufQo=
And also the flag:
flag{N0t_3n0ugh_0bfu$<@t10n}
Member discussion