Content

Content

Projects

XOR-Encoding in Cryptography



Introduction


The XOR code is a quite strange one. If used correctly (One-Time Pad) it is unbreakable. If used wrong it is very weak on the other hand. For a seminar I did some work on that topic.

If someone who did never do cryptography before is told to do a cryptographic system it will usually end in one of the following two solutions:
  • XOR code or if he is more sophisticated a
  • PRNG coding.
The basic XOR code works quit easy. You pick a letter and XOR it with the whole text. Better implementations use more then one letter and always repeat that password. For example if the pass is "car" the first, fourth, seventh and so on letter would be xored with 'c', the second, 5th, 8th,... with 'a' and the 3rd, 6th, 9th,... letter is encoded with r.

Here are some XOR implementations of mine: The PRNG solution works quite similar. Here however the password does determine the start of a PRNG (pseudo random number generator). Then the next number is created 'random'. In most cases these random key is the again XORed onto the source text.

Both are quite simple to implement solutions and they look like they would be quite strong. Often also the very same people tell you that their systems are unable to break, but they won't tell you any details about them.

I think it is needless to say this proves wrong. Both of these cyphers are extremely easy to break. Why else would stuff like DES, IDEA, RSA or others exist? The main problem is just that those who do a DES know that it can be broken. Those who do XOR or PRNG often think they are safe. In the following I will explain how to break the XOR code. I did not do more work into the PRNG one, but trust me it is almost as simple broken.

When I took this project I knew that XOR code is breakable, but I did think that it was still quite strong. I also knew that it is quite easily broken if you know the key length, but I though that this is almost impossible to find out... well - it wasn't. Not at all.


Mathematical Background


Kappa (T',T'') := Sum from u=1 to M of d(t'u,t''u)/M
where
d (x,y) := 1 if x=y and 0 if x!=y

Great I what do we do with that mathematic stuff? That Kappa function is a quite interesting thing. It obviously counts the percentages of matching of two strings.

For example these two give 1/22 or 0.045:
"theprecedingchapterhas"
"wouldseemthatonewaytoo"
       *
Now the interesting stuff is that if you take long texts from different language you will get different values for Kappa. Amazingly you can even determine which language you have in front of you without even reading a word. German for example will give a 0.0762 - 0.082 while English will give a 0.0661 to 0.0675. Russia will give 0.047 to 0.0529.
Cool what that little function can do, isn't it?
You can download a DOS implementation of Kappa (24k) here.
But it is even more amazing what it will do with XOR encrypted stuff.


Getting the Keylength of XOR


Take a XOR encoded text of sufficient length. Now place it over itself. No, not exactly. Obviously comparing a text to itself will always give 1. Rather add one letter to the start of it like this:
akjhifhsiuzdfiasdfnasdflasdfjlasf
 akjhifhsiuzdfiasdfnasdflasdfjlasf
When using an unencoded text the Kappa value will once again be the one typical for that language. Not very surprisingly in this case however we get a value close to 0.038, the value for comparing two completely random texts. However the text isn't completely random as you can see with an easy experiment: Encode two different text with the same key. Now place them over each other.
Once again you get the correct Kappa for your language. That should give the first hint that something might be wrong with that encryption. And we are only one step away from getting the key length:
Once you know the trick it is quite easy. First take the one encrypted text you have and compare it with itself shifted by one. You will get a value close to 0.038. Then shift the text by 2 and again compare it to itself and so forth. The numbers will stay close over 0.038 until you hit the key length. At that moment you will get a value of 0.05-0.07. This indicates that we have the same language suddenly. Or in other words we have to have caught the period of the cyper. There is no random disturbance anymore at this point because the disturbance is the same one in both cases. So 1 done 1 to go...


Breaking XOR


The rest is really kids work if you got a sufficiently long and normal text (shorter/more complicated stuff is breakable as well, but needs lexica to find words). You might already have broken a text where each letter was replaced with a different one as a kid. How did you do that? Right. You looked for the most common letter and said it was an e. Then you looked for the second most common letter and said it was an i and so on. Then when you had a couple of letters you guessed the rest. Well that is pretty much what I will do here as well, just a bit modified:

Let's say we have a key length of 3. Now imagine we had 3 text as well. The first text does have the 1st, 4th, 7th,... letter of the encrypted text, the second text got the 2nd, 5th, 8th,... letter of the encrypted text and finally the third text got the 3rd, 6th, 9th,... encoded letters.
We already are back to that toy problem, just that it is even easier as we only need to guess one letter per imaginary text and all others can be calculated by that one. So all one has to do is for the most often occurring letter in virtual text one. That has to be the space. Now XOR that letter and ' ' - you get the first letter of the key. Do the same for the second and third virtual text and you know the complete key that was used to encrypt.
Here are some XOR breaking implementations of mine

Conclusion


Well that's it. And even worse we did not only break the text, but we found the key as well. Most users tend to use the same key over and over again. They just told us their key and we can maybe break into more secure systems... Scaring, isn't it?

To cite a famous quote: An XOR might keep your kid sister from reading your files, but it won't stop a cryptanalyst for more than a few minutes...

Minutes? I wonder when this was said. We are talking about seconds at most. At that is a quick and dirty implementation from me.

So what is the difference to the One-Time Pads you might want to ask. I claimed it was the same function and it was unbreakable. Indeed I did and it is true. So what is the difference?
First think how we broke this code:
  • Find the key length
  • Use statistics
So one has to prevent at least one of this. We cannot do anything about the key length, so we have to prevent statistics. Well the only way to do this is to not hand out enough data to analyze. Translated into the XOR cipher that means the key has to be waaaaaaaaaaaaay longer. A key with is one tenth of the text usually still leaves enough room for statistic attacks. So the bad truth is that the key has to be as long as the text for real security. And of course you may only use the key once (thus the name "One-Time"). If you do this you will get the only 100% secure system however.