I have participated in various algorithmic contests (such as TopCoder or ACM ICPC) and I know what fun they are. But I have never tried taking part in a CTF competition before, so when two weeks ago my colleagues shared a link to an entry-level competition of this kind, I decided to give it a try. I had a really good time solving these computer-oriented puzzles and I recommend this to anyone who enjoys hacking. To start competing in CTFs you need general knowledge about practical computer science, have to know tools available in your system (I use Ubuntu) and have some programmers intuition. Also it was worth to assume that problems are really solvable and to use all hints the organizers give. And last but not least, you need some luck also.
There were 10 tasks and for each of them we were given one file. Files are available on the contest page. I will present the problems in the order I have solved them. Although organizers proposed some order of tasks starting (in their opinion) from the easiest to the hardest, but difficulty is a very subjective matter and usually such an ordering is based on how much technical knowledge you should know beforehand to solve a task, and not how tricky it is. So I began by opening all the tasks first, and then started solving the ones I find most promising. In later stages I ended up on thinking about 2–3 task simultaneously.
Task 5. 01
We have one file with a long string of 0s and 1s. If we open this file in a text editor which wraps rows, and resize the window, changing the number of characters in a row, we can get the impression that in fact this string encodes a two-dimensional picture. But what are its dimensions? The string has length 97343, so we must find two integers x, y such that x·y = 97343. Since this number is semiprime, we have only two possibilities (its not a novel idea, the Arecibo message broadcasted 40 years ago into space also had semiprime length).
Setting image size to 313 characters in a row, we get a picture that should be recognized by anyone who saw a QR code before:
The next thing to do is convert this string of 0s and 1s into an image file. This step is fairly easy if we are familiar with portable bitmap format (PBM). We simply add two rows (P1 and 313 311) before data in the file and then use some image-processing program (like GIMP) to open it and save as an image in some popular format. The last step is to find some online QR code converter and use it to get encoded text: PGS_PRIMENUMBERS. Not only this is in format of flag, but also organizers share the key idea used in solving the task, so we can be pretty sure that our first task is solved.
Task 4. deepspace
In this task we get an SVG file which depicts two humans and some symbols (the bottom ones resembling our solar system). If you are not familiar with fact that it's actually a Pioneer plaque, you can deduce from the name of the task and bottom symbols to google for graphics with "astronomy drawing man woman".
After opening our SVG and Wikipedia version, it's not hard to spot the difference: the sequence on the ray which touches Jupiter is different. On our SVG it is
|---- --||| |--|| ||||| |---- -|--| -|||| -|||- --|-| --|-| |--|-
That (treated as binary numbers where pipe is 1 and dash is 0) encodes the number sequence 80, 71, 83, 95, 80, 73, 79, 78, 69, 69, 82. This interpreted as ASCII codes gives us the flag: PGS_PIONEER.
Task 1. whereami
We get the following file:
51184646 17031207 (1) 51663598 16084613 (1) 53717124 19335107 (1) _ 50984155 23171832 (10) 50264888 19023722 (4) 50972438 18218128 (8) 51218633 18568923 (4) 50728694 18444701 (6) 52598551 15892477 (2) 54068038 14968824 (3) 51842555 18086852 (3) 53066035 19909063 (5) 52767355 17755542 (6) 51768751 15813327 (5) 54675984 18420687 (3)
Since we know that all flags are in format PGS_[a-zA-Z0-9_]+ and are human-readable, we conclude that every row encodes one letter (and the first three encode characters P, G and S, respectively). It probably is not a coincidence that all the numbers in the first and second columns are around 52·106 and 18·106. The name of the task suggest some location and I remember from school that geographical coordinates of Warsaw are around 52°N 21°E. So we can add dots after second digits and try to find "51.184646 17.031207", "51.663598 16.084613" and "53.717124 19.335107" in Google Maps. It look like we have luck, since each time near the cursor we find a street which name starts with needed letter (Parkowa, Generała Władysława Sikorskiego and Słowiańska). This also suggests that the numbers in the third column specifies which letter from the name we are interested in.
But next coordinates are not so helpful, for some even we don't have street names on the map. The problem is that we are missing the most important information on the map, which is the name of the city or village our cursor is in. There are two ways to spot it: either remove the cursor from the map (but stay on the coordinates) or look for coordinates in Google and not in Google Maps. The names for rows are now Psary, Głogów, Susz, Krasnystaw, Katowice, Kluczbork, Wieluń, Dobrodzień, Międzychód, Pustkowo, Russów, Żuromin, Gąsawa, Siedlisko and Żelistrzewo. Combining these letters into flag get us: PGS_WORLDISSMALL.
Task 9. tweet(y)
The name of file data.raw suggests some file format without headers or any additional metadata; just plain data. Viewing it in a hex editor we see that it starts with 16-bit codes close to 0000 and FFFF, and later it becomes more random. That actually suggests raw audio data (with 16-bit signed precision codes close to 0000 and FFFF encode positive and negative numbers close to 0, which result in silence). Audacity program has an option to Import raw data, we only have to be sure to select Signed 16 bit PCM encoding. The data imports correctly, and we can hear Merrily We Roll Along song:
But where is our flag? So this time it is worth knowing one trick with audio (or with digital signals in general). The above image shows signal value in time, but also signal can be described by frequencies which it composes of, in a form of spectrogram. So let's see how frequencies of our signal look, by selecting Spectrogram log(f) option in Audacity. On the following image we show the interesting part of the sound file (starting approximately at offset of 2 seconds):
We clearly see that together with the main signal in frequencies up to 5 kHz there is also additional signal of frequency around 16 kHz (almost inaudible to human ear) which looks like it encoding some bit sequence. The sequence has length of 1.6 seconds and the smallest interval of continuous sound (or rest) has length of 0.01 seconds. We can treat it as a 160-bit sequence:
Now it's not so hard to see that after partitioning it into 20 8-bit chunks and discarding two sentinels (the first and the last chunk), we got a message encoded in ASCII: PGS_soundofsilence.
Task 6. matrioshka
In this task we are given a DLL file and a strong hint, since the task name (matrioshka) suggest that in this file is contained another file which itself can have more layers of files inside. One of the most important command which can help us working with files of different types is file command which tries to determine file type. In our case it acknowledges that the file is in fact PE32 executable (DLL) (GUI) Intel 80386, for MS Windows.
So let's examine the file closer and see what information we get from strings command which extracts the strings of printable characters in files. Most of extracted strings looks like garbage, but we also get JFIF and Created with GIMP which suggest that inside DLL there is a JPEG image (possibly as a resource). But what is more interesting is the 8 kB string which ends on ...ANgAAAOQYAAAAAA==. That should ring a bell to anyone familiar with Base64 encoding. So let's try to decode it using base64 -d command.
The file command says that decoded data is a Zip archive, and after unzipping it we get a data.bin file. This file is Microsoft Disk Image, Virtual Server or Virtual PC. We don't want to install software to actually run this image, and we already cut some corners with skipping the JPEG part, so let's try luck once more. Strings in this file suggest that it contains image of MSDOS 5.0 system using FAT16 file system. If we open the file in a hex editor we'll see that these strings appear in block at address 0x10a00, which looks like the boot sector of the file system:
00010A00 EB 3C 90 4D │ 53 44 4F 53 │ 35 2E 30 00 │ 02 01 02 00 < MSDOS5.0..... 00010A10 02 00 02 00 │ E8 F8 E7 00 │ 3F 00 FF 00 │ 80 00 00 00 .... .?. . ... 00010A20 00 00 00 00 │ 80 00 29 93 │ 92 EF 62 4E │ 4F 20 4E 41 .... .) bNO NA 00010A30 4D 45 20 20 │ 20 20 46 41 │ 54 31 36 20 │ 20 20 33 C9 ME FAT16 3Ɏ
From the data in the boot sector we get some characteristics of the file system: 512 bytes per sector, 1 sector per cluster, 2 reserved clusters, 2 copies of FAT, 512 directory entries and 231 sectors per FAT. Since the order is: reserved sectors, FATs, root directory and then actual data, thus the address of the first FAT is 0x10a00 + 512·2 = 0x0e00 and the root directory starts at address 0x0e00 + 521·2·231 = 0x4aa00. Directory entry has size 32 bytes and first two clusters are not present, thus the address of the i-th data cluster is 0x4aa00 + 512·(32-2+i). Let's take a look at the root directory:
0004AA00 43 54 46 20 │ 20 20 20 20 │ 20 20 20 08 │ 00 00 00 00 CTF ..... 0004AA10 00 00 00 00 │ 00 00 A8 A0 │ 62 45 00 00 │ 00 00 00 00 ...... bE...... 0004AA20 24 52 45 43 │ 59 43 4C 45 │ 42 49 4E 16 │ 00 AF DD A0 $RECYCLEBIN.. ?? 0004AA30 62 45 62 45 │ 00 00 E0 A0 │ 62 45 02 00 │ 00 00 00 00 bEbE.. bE...... 0004AA40 46 4C 41 47 │ 20 20 20 20 │ 42 49 4E 20 │ 18 6A 65 A1 FLAG BIN .je 0004AA50 62 45 62 45 │ 00 00 61 A1 │ 62 45 04 00 │ 8A 00 04 00 bEbE..a bE.. ...
The most promising is file FLAG.BIN which has size of 0x4008a bytes and starts in cluster 4, so at address 0x4aa00 + 512·(32-2+4) = 0x4ee00:
0004EE00 42 4D 8A 00 │ 04 00 00 00 │ 00 00 8A 00 │ 00 00 7C 00 BM ....... ...|. 0004EE10 00 00 00 02 │ 00 00 80 00 │ 00 00 01 00 │ 20 00 03 00 ...... ..... ... 0004EE20 00 00 00 00 │ 04 00 13 0B │ 00 00 13 0B │ 00 00 00 00 ................ 0004EE30 00 00 00 00 │ 00 00 00 00 │ 00 FF 00 00 │ FF 00 00 FF ......... .. .. 0004EE40 00 00 FF 00 │ 00 00 42 47 │ 52 73 00 00 │ 00 00 00 00 .. ...BGRs......
00010E00 F8 FF FF FF │ FF FF FF FF │ 05 00 06 00 │ 07 00 08 00 ........ 00010E10 09 00 0A 00 │ 0B 00 0C 00 │ 0D 00 0E 00 │ 0F 00 10 00 ................ 00010E20 11 00 12 00 │ 13 00 14 00 │ 15 00 16 00 │ 17 00 18 00 ................ 00010E30 19 00 1A 00 │ 1B 00 1C 00 │ 1D 00 1E 00 │ 1F 00 20 00 .............. . 00010E40 21 00 22 00 │ 23 00 24 00 │ 25 00 26 00 │ 27 00 28 00 !.".#.$.%.&.'.(. 00010E50 29 00 2A 00 │ 2B 00 2C 00 │ 2D 00 2E 00 │ 2F 00 30 00 ).*.+.,.-.../.0. 00010E60 31 00 32 00 │ 33 00 34 00 │ 35 00 36 00 │ 37 00 38 00 22.214.171.124.126.96.36.199. 00010E70 39 00 3A 00 │ 3B 00 3C 00 │ 3D 00 3E 00 │ 3F 00 40 00 9.:.;.<.=.>.?.@. 00010E80 41 00 42 00 │ 43 00 44 00 │ 45 00 46 00 │ 47 00 48 00 A.B.C.D.E.F.G.H. 00010E90 49 00 4A 00 │ 4B 00 4C 00 │ 4D 00 4E 00 │ 4F 00 50 00 I.J.K.L.M.N.O.P. 00010EA0 51 00 52 00 │ 53 00 54 00 │ 55 00 56 00 │ 57 00 58 00 Q.R.S.T.U.V.W.X. 00010EB0 59 00 5A 00 │ 5B 00 5C 00 │ 5D 00 5E 00 │ 5F 00 60 00 Y.Z.[.\.].^._.`.
It looks like we again have luck: our file is stored in consecutive clusters 4, 5, 6 etc., so to extract it we simply copy 0x4008a bytes starting from address 0x4ee00. After opening this file with GIMP, we see that most of the file is black background with small white Almost done! message in the corner. But GIMP's Histogram tool shows that the image uses three shades of gray: pure white (255), pure black (0) and almost pure black (1). Replacing this almost pure black with some contrasting color reveals that the more important message is written using this color and it says PGS_formatmaster.
Task 10. noneshallpass
During the contest it looked like I had better luck with harder tasks (up to this point I had tried to solve tasks 2 and 3 with several approaches, but without success). So why not try with the hardest task now? We get an encrypted Zip file (the command unzip asks for a password), but since in Zip only the data is encrypted and not metadata (information about zipped files), we can see what was encrypted using unzip -v command:
Archive: task10.zip Length Method Size Cmpr Date Time CRC-32 Name -------- ------ ------- ---- ---------- ----- -------- ---- 653218 Defl:X 371475 43% 2014-11-12 02:23 fdb41d5f flag 2703 Defl:X 2218 18% 2014-11-12 02:06 c0b89d1b pgs-logo.png -------- ------- --- ------- 655921 373693 43% 2 files
So we have two files, one presumably with flag, and the second one with pgs-logo.png image of size 2703 bytes. Googling for "pgs-logo.png" reveals that the logotype in the header of PGS Software website has exactly the same name and size. What if we assume that it's exactly this file? Well, it will allow us to perform a known-plaintext attack on the Zip file. There is already software that can help us in doing it, during the contest I used PkCrack (yes, I remember to send a postcard to its author). The program can do all dirty work for us, but there is one tricky bit. Since the program requires that the known plaintext should be compressed with the same compression method used for the encrypted file, we have to tweak compression level in order to get file of size exactly 2218 after compression.
After cracking we are informed by file command that that the type of unpacked file is Targa image data - RGB - RLE 500 x 375, and opening it with GIMP reveals that the flag is PGS_KNOWNTEXT.
Task 2. math...
In this task we are presented with the following arithmetic expression:
I had several attempts to solve this task, mostly trying to find out some mathematical properties of the expression. Since the goal is to extract a sequence of characters, I also tried to partition the expression into parts which correspond to subsequent characters. I had several candidates (like slashes or number 12 which occurs suspiciously often).
I was a little bit frustrated after several unsuccessful attempts, but then I thought to myself that since this task is supposed to be second easiest, there must be something simple I miss. I remembered that organizers published a sample task, so I read that for some inspiration. In this task one of key points was using Morse code. The characters PGS_ in this code are .--. --. ... ..--.- and in fact that is exactly what we get if we erase from our mathematical expression every character which is not dot or dash (and treat slashes as delimiters). That gives us
Which turns out to be our flag PGS_MORSEFOREVER. By the way, I think that three dots in the name of this task is also a slight hint for Morse code.
Task 3. matrix
Here we are given GIF file with PGS Software logo. During the contest I thought that maybe there is some hidden data stored in the image using steganography. Since GIF format uses indexed color mode I also made sure to check the palette of the picture using Color palette option in GIMP. The most suspecting thing is that many randomly placed entries of the palette has black color. All these black entries are used in the image, so I spent good deal of time trying to see what parts of image each entry corresponds to, but it led me nowhere.
Finally I focused on the palette entirely but it was not until I arranged it in a 16×16 square (why GIMP doesn't do it by itself is beyond me), when things started to click:
It is definitely not coincidence that two edges of this square are black, and another two have black and non-black entries alternating. I am not familiar with all two-dimensional codes, but ultimately I found the suspect: it looked like a Data Matrix barcode. Now it is only a question of finding online decoder to get the flag: PGS_palFun.
Task 7. haxor
This time we are given an Android application package. I haven't got access to Android phone, so I decided to solve it like a real hacker. Since I have never developed an Android application, I had to get all the necessary information from the Internet. I disassembled the program into Dalvik bytecode using dexdump -d classes.dex command from Android SDK. I won't go into details here, the code itself is not long and with a help of list of opcodes it can be understood even by people who (like me) saw Dalvik bytecode for the first time. It turned out that this program performs xor operation on three arrays of length 17 (one stored directly in classes.dex file at offset 0x6c8, one stored in assets/b1.bin file and one encoded as string in resources.arsc file at offset 0xfb). The answer is PGS_LOVES_ANDROID.
Task 8. bigmem
In the final task we got an executable file (file command says ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped). Unfortunately the file was compiled for Haiku operating system, so running it on my Ubuntu results in segmentation fault. (I will admit that on some point during the contest I gave up and actually installed Haiku just to be able to debug the executable using gdb. It hadn't lead me anywhere, but at least I played with a new system.)
So let's try to disassemble the contents of the program using objdump command (with parameters objdump -d -M intel flag). The most interesting part is the main function of the program:
00000878 <main>: 878: 55 push ebp 879: 89 e5 mov ebp,esp 87b: 83 ec 14 sub esp,0x14 87e: 53 push ebx 87f: e8 00 00 00 00 call 884 <main+0xc> 884: 5b pop ebx 885: 81 c3 40 12 00 00 add ebx,0x1240 88b: 83 c4 f4 add esp,0xfffffff4 88e: 68 69 7a 00 00 push 0x7a69 893: e8 80 ff ff ff call 818 <ctfdelay> 898: 83 c4 10 add esp,0x10 89b: 83 c4 f8 add esp,0xfffffff8 89e: 68 c9 00 00 00 push 0xc9 8a3: 68 c9 00 00 00 push 0xc9 8a8: 68 c3 00 00 00 push 0xc3 8ad: 68 e2 00 00 00 push 0xe2 8b2: 68 c1 00 00 00 push 0xc1 8b7: 6a 6d push 0x6d 8b9: 68 e2 00 00 00 push 0xe2 8be: 68 c7 00 00 00 push 0xc7 8c3: 68 d7 00 00 00 push 0xd7 8c8: 8d 93 d1 ee ff ff lea edx,[ebx-0x112f] 8ce: 89 d0 mov eax,edx 8d0: 50 push eax 8d1: e8 6e fc ff ff call 544 <printf@plt> 8d6: 83 c4 30 add esp,0x30 8d9: 31 c0 xor eax,eax 8db: eb 03 jmp 8e0 <main+0x68> 8dd: 8d 76 00 lea esi,[esi+0x0] 8e0: 8b 5d e8 mov ebx,DWORD PTR [ebp-0x18] 8e3: 89 ec mov esp,ebp 8e5: 5d pop ebp 8e6: c3 ret 8e7: 90 nop
After glancing at the contents of the function ctfdelay (not shown here), it looks like it does nothing except recursively invoke malloc to allocate big number of chunks of memory (what is actually suggested by the task name), so its sole purpose is not to allow to execute next instructions when program is simply run from the command line.
The more interesting thing is printf function. Before its invocation ten values are pushed on the stack, so we assume that the function has ten arguments: one string and nine integer values. Using strings command we can see that one string is particularly interesting: %c%c%c%c%c%c%c%c%c. It looks like a string format, so we can assume that the invocation is as follows:
But this prints some garbage on the screen, since it does not contain any ASCII characters (except 0x6d which is letter m). But let's assume that in fact this invocation corresponds to printing flag on the screen, using some unknown character encoding. Then we have the following mapping:
0xd7 → P 0xc7 → G 0xe2 → S 0x6d → _ 0xc1 → x 0xe2 → S 0xc3 → y 0xc9 → z 0xc9 → z
Is there any English word that has five letters, the second letter is S and two last letters are the same? You will not find it in every English dictionary, but it exists and actually we have used it in this write-up. So a pretty strong guess for flag is PGS_ASCII. But how can we be sure? If the task is sensible, the unknown encoding shouldn't be a totally arbitrary one. An in fact it isn't, since it is IBM's encoding called EBCDIC.
I solved almost all tasks in about 10 straight hours of hacking, and later (when I got some sleep) I finished remaining ones. The whole event was a nice mixture of frustration and enjoyment. For additional information about CTF contests, I recommend to read what more competent people have to say in this subject. Following these two links may be a good beginning: How to get started in CTF, CTF write-ups repository, CTF write-ups from DragonSector team.