Statistical techniques for cryptanalysis


Cryptography is the art of writing information in code or cipher in order to disguise, and thereby secure, the content of a message. Once encrypted, a plaintext can be recovered only by using the key that was used to encrypt it. Cryptography does not mask the existence of the message, but it does disguise its content [1]. In contrast, cryptanalysis is the art of recovering the plaintext of a message without access to the key. Successful cryptanalysis may recover the plaintext or the key for a particular ciphertext [2].

There are five standard types of cryptanalytic attacks:

1. Ciphertext-only attack: In this type of attack, the cryptanalyst has a set of ciphertext messages encrypted using the same encryption algorithm. The cryptanalyst then deduces the plaintext of each of the ciphertext messages or identifies the key used to encrypt them.

2. Known-plaintext attack: In this type of attack, the cryptanalyst has a series of ciphertexts and their corresponding plaintexts, encrypted using a specific key. The cryptanalyst then tries to deduce the key by establishing a relationship between the ciphertext and plaintext entries.

3. Chosen-plaintext attack: In this type of attack, the cryptanalyst not only has access to the ciphertext and associated plaintext for several messages, but also chooses the plaintext that gets encrypted. The task is to deduce the key used to encrypt the messages, or an algorithm to decrypt any new messages encrypted with the same key.

4. Frequency analysis: This is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers. Frequency analysis is based on the fact that, in any given stretch of written language, certain letters and combinations of letters occur with varying frequencies.

5. Rubber-hose cryptanalysis: The cryptanalyst threatens, tortures or blackmails the person who holds the key until they give it up.

Among the many cryptanalytic techniques, frequency analysis, or frequency counting, is the most basic technique applied to break algorithms based on substitution ciphers. The basic use of frequency analysis is to first count the frequency of ciphertext letters and then associate guessed plaintext letters with them. More sophisticated use of statistics can be conceived, such as considering counts of pairs of letters (digrams), triples of letters (trigrams), and so on. This is done to provide more information to the cryptanalyst.

Frequency analysis exploits a weakness of the substitution cipher: identical plaintext letters are always encrypted to identical ciphertext letters. Cryptanalysis techniques based on frequency analysis were used to break ciphers based on the traditional cryptographic algorithms, but they do not work well against modern cryptographic algorithms based on block ciphers.

Statistical properties of English:

Cryptanalysis based on frequency analysis uses the fact that natural language is not random in nature and that substitution over a single alphabet does not hide the statistical properties of the natural language. In the case of encryption using monoalphabetic substitution, it is useful to begin deciphering by obtaining a frequency count of all letters. The most frequent ciphertext letter may represent the most common letter in English, E, followed by T, A, O and I, whereas the least frequent are Q, Z and X [7]. Statistical patterns in a language can be detected by tracing the redundancy of the text in that language. It has been observed that various universal regularities characterize text from different domains and languages. The best known is Zipf's law on the distribution of word frequencies [5], according to which the frequency of a term in a collection is inversely proportional to its rank. Zipf's law has been found to apply to collections of written documents in virtually all languages [5].

English text has a high redundancy rate when it is encrypted with a substitution cipher. If we have a message encrypted using a substitution cipher that needs to be cracked, we can use frequency analysis. "In other words, if the sender has used an encryption scheme that replaces one letter in the English alphabet with another letter in English, we can still recognize the original plain text, as the frequency characteristics of the original plain text will be passed on to the new cipher text characters" [4]. To apply frequency analysis, we need to know the frequency of every letter in the English alphabet, or the frequency characteristics of the language used by the sender to encrypt the text.

Below is a list of average frequencies for letters in the English language. For example, the letter E accounts for 12.7% of all letters in English, whereas Z accounts for about 0.1%. All of the frequencies are tabulated and plotted below:

For example, let us consider the following sentence: "We study Cryptography as part of our course". Using a simple substitution cipher, let us consider the following mapping:

a->c, b->d, c->e, ..., w->y, x->z, y->a, z->b

So, the ciphertext becomes: "yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug". A simple frequency analysis of the ciphertext can be carried out, and the results are as given below:

The above data can be used by a cryptanalyst to identify the key or the plaintext by applying trial substitutions to the ciphertext until the correct plaintext is identified.
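The counting step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `shift_text` is an assumption made for this example, and the only data used is the ciphertext quoted in the text.

```python
from collections import Counter

# Ciphertext quoted in the example above.
CIPHERTEXT = "yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug"


def shift_text(text, shift):
    """Shift each lowercase letter by `shift` positions (mod 26); keep other characters."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


# Count only the letters, ignoring spaces.
letter_counts = Counter(ch for ch in CIPHERTEXT if ch.isalpha())
for letter, count in letter_counts.most_common():
    print(f"{letter}: {count}")

# The mapping a->c, b->d, ... is a shift of +2, so decryption shifts by -2.
recovered = shift_text(CIPHERTEXT, -2)
print(recovered)
```

In this short sample the most frequent ciphertext letter, t, stands for plaintext r rather than e, which illustrates why short texts usually need more than single-letter counts.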

Apart from single-letter frequency analysis, cryptanalysts also examine the frequency of paired letters, better known as digram frequency, and that of three-letter sequences, known as trigram frequency. These help the cryptanalyst to exploit the redundant features of the English language to break the cipher.

The most common digrams (in order)

th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.

The most common trigrams (in order)

the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men

Table 1: Digram and Trigram Frequencies [6]

These help in identifying the most commonly used words in English in order to break a cipher. The digram frequencies are used to break two-letter words such as an, to and of, and the trigram frequencies are used to break three-letter words such as the, are and for. After breaking a significant number of two-letter and three-letter words, it is fairly easy to identify the key from the recovered plaintext values by matching them against the corresponding values in the ciphertext. This weakness of the English language can be used to break ciphertexts encrypted using simple algorithms that operate on the English alphabet.

In practice, the use of frequency analysis consists of first counting the frequency of ciphertext letters and then assigning "guessed" plaintext letters to them. "Many letters will occur with roughly the same frequency, so a cipher with X's may indeed map X onto R, but could also map X onto G or M. But some letters in every language using letters will occur more frequently; if there are more X's in the ciphertext than anything else, it's a good guess for English plaintext that X is a substitution for E. But T and A are also very common in English text, so X might be either of them also" [4]. Thus the cryptanalyst may need to try several combinations of mappings between ciphertext and plaintext letters. Once the common single-letter frequencies have been resolved, paired patterns and other patterns are resolved. Finally, when enough characters have been recovered, the remaining text can be recovered by simple substitution. Frequency analysis is extremely effective against the simpler substitution ciphers and can break surprisingly short ciphertexts with ease.
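Digram and trigram counting can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `ngram_counts` is an assumption made for this example, and it counts sequences inside each word only, using the ciphertext from the earlier example as input.

```python
from collections import Counter

# Ciphertext from the earlier example in this section.
CIPHERTEXT = "yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug"


def ngram_counts(text, n):
    """Count n-letter sequences inside each word of `text` (case-insensitive)."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


digrams = ngram_counts(CIPHERTEXT, 2)
trigrams = ngram_counts(CIPHERTEXT, 3)
print(digrams.most_common(3))
print(trigrams.most_common(3))
```

In this sample the only repeated trigram is "qwt", which appears once as a whole word and once inside a longer word. A repeat of that kind is the sort of pattern a cryptanalyst would test against common English trigrams.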

Attacks on Traditional algorithms

Encryption using traditional algorithms has been defenseless against cryptanalytic attacks, because these algorithms encrypt character by character, which can easily be broken using attacks based on frequency analysis.

1. Caesar Cipher:

Consider one of the oldest ciphers, the Caesar cipher. This cipher replaces each letter of the plaintext with another to produce the ciphertext, and any particular letter in the plaintext always turns into the same letter in the ciphertext for every occurrence of that plaintext character. For example, all B's will become F's. "Frequency analysis is based on the fact that certain letters, and combinations of letters, appear with characteristic frequency in essentially all texts in a particular language" [9]. For instance, in the English language, E is very common, while X is not. Likewise, ST, NG, TH, and QU are common combinations, while XT, NZ, and QJ are very uncommon, or even "impossible", in English. This clearly shows how the Caesar cipher can be broken with ease simply by identifying the frequency of each letter in the ciphertext. A message encrypted using the Caesar cipher is extremely insecure, as an exhaustive search over the keys easily breaks the code.
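The exhaustive search mentioned above is small enough to write out directly. The sketch below assumes a lowercase alphabet of 26 letters and reuses the ciphertext from the earlier worked example; `caesar_shift` is a hypothetical helper name.

```python
def caesar_shift(text, shift):
    """Shift each lowercase letter by `shift` positions (mod 26); keep other characters."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


# Ciphertext from the earlier example in this section.
CIPHERTEXT = "yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug"

# Try every possible key: candidates[k] is the text decrypted with shift k.
candidates = [caesar_shift(CIPHERTEXT, -k) for k in range(26)]
for k, candidate in enumerate(candidates):
    print(f"{k:2d}: {candidate}")
```

Only one of the 26 printed lines reads as English, so the key can be picked out by eye. A frequency count would serve the same purpose when the output has to be ranked automatically.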

2. Substitution Ciphers:

The Caesar cipher is one member of the larger family of substitution ciphers. Here, the key of the encryption process is a permutation of all twenty-six letters of the English alphabet. Rather than shifting every letter by the same amount, the key may map each plaintext letter to any ciphertext letter. This increases the number of possible keys to 26!, which is about $4 \times 10^{26}$, and rules out an exhaustive search of the keyspace [7]. To break the cipher, the statistical frequency distribution of single letters in English is first examined. Then the digram and trigram frequencies of standard English are compared with the frequencies of the digrams and trigrams in the ciphertext to finally reconstruct the key and, in turn, decipher the text. This is an efficient method of breaking the substitution cipher because each plaintext letter is represented by the same ciphertext letter throughout the message, so all the statistical properties of the plaintext are carried over to the ciphertext.
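Both points in this paragraph, the size of the keyspace and the frequency-based first guess at the key, can be sketched briefly. The letter ordering in `ENGLISH_BY_FREQUENCY` is an assumption: it is one commonly quoted approximate ranking, not a value taken from the references here, and `first_guess_key` is a hypothetical helper name.

```python
from collections import Counter
from math import factorial

# Number of possible keys: every permutation of the 26 letters.
KEYSPACE = factorial(26)
print(f"{KEYSPACE:.2e}")

# ASSUMPTION: an approximate ranking of English letters from most to least
# frequent. Published rankings differ slightly in the middle of the list.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def first_guess_key(ciphertext):
    """Pair ciphertext letters, most frequent first, with English letters by rank."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


print(first_guess_key("xxxyyz"))
```

The mapping returned is only a starting point. As the text notes, digram and trigram comparisons are then used to correct the letters that the single-letter ranking gets wrong.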

3. Vigenere Cipher:

In a Vigenere cipher, there is better security, as a given plaintext letter is not always represented by the same ciphertext letter. This is achieved by using a sequence of n different substitution ciphers to encrypt a message. This technique increases the number of possible keys from 26! to $(26!)^n$. Although this cipher was long considered unbreakable, Kasiski's method of attacking a Vigenere cipher succeeded in decrypting messages. According to the method, the first step is to find the key length (n).

Identical segments of plaintext are encrypted to the same ciphertext when they are b positions apart, where b = 0 mod n. According to Kasiski, the next step is to find all the identical ciphertext segments of length greater than 3 and record the distances between them [7].

These distances can then be used to predict the length of the key (n). Once the key length is known, the key is found by an exhaustive search of the keyspace over all possible combinations. This is done by trying each candidate value of n to form substrings. Once the substrings are formed, the plaintext message can be recovered by back-substituting the key into the cipher [7]. This is repeated for all candidate values of n until the actual key is reached, which reveals the plaintext that was encrypted. This method can take a long time when the key is very long, as the keyspace is larger for longer keys.
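The first stage of Kasiski's method, finding repeated segments and the distances between them, can be sketched as below. Several things here are assumptions made for illustration: the Vigenere cipher is modelled in its usual form as a repeating sequence of letter shifts, the sample plaintext and the key "key" are invented for the demonstration, and the helper names are hypothetical. The minimum segment length of 4 follows the "greater than 3" rule in the text.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def vigenere_encrypt(plaintext, key):
    """Encrypt lowercase a-z text, shifting letter i by key[i mod len(key)]."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
    return "".join(out)


def repeat_distances(ciphertext, seg_len=4):
    """Distances between consecutive occurrences of each repeated segment."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - seg_len + 1):
        positions[ciphertext[i:i + seg_len]].append(i)
    distances = []
    for pos in positions.values():
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances


def candidate_key_lengths(distances):
    """Divisors (greater than 1) of the gcd of all recorded distances."""
    if not distances:
        return []
    common = reduce(gcd, distances)
    return [n for n in range(2, common + 1) if common % n == 0]


# Invented sample: the repeated phrase starts 12 positions after its first use.
ciphertext = vigenere_encrypt("attackatdawnattackatdusk", "key")
distances = repeat_distances(ciphertext)
print(ciphertext)
print(distances)
print(candidate_key_lengths(distances))
```

Every recorded distance is a multiple of the true key length, so the key length appears among the candidates. Deciding between the candidates still requires the trial decryptions described in the text.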

Defeating frequency based attacks:

Frequency-based attacks have long been used to break traditional encryption algorithms. They exploit the fact that traditional encryption algorithms do not eliminate the statistical properties of the language upon encryption.

The first way to defeat frequency-based attacks is to encrypt blocks of characters at a time rather than single letters [7]. This ensures that the same letter in the plaintext is not always encrypted to the same letter in the ciphertext. For example, if we use the Caesar cipher encryption scheme, the word "ADDITIONAL" is encrypted to "CFFKVKQPCN". We can see that the letters A, D and I are repeated more than once, and at each occurrence the encryption scheme always encrypts A to C, D to F and I to K. This can clearly be used during frequency analysis to analyze the redundancy of the characters and, in turn, map them back to the original plaintext characters. With a block encryption scheme this does not happen, because the whole plaintext is broken into chunks or blocks of data, which are fed as input to the encryption algorithm. The algorithm then reads the input block along with the key and encrypts the entire block of plaintext, rather than individual characters, so there is a smaller chance that two blocks will produce the same chunk of ciphertext.

The second way of defeating frequency analysis is to use synonyms of words [7], rather than repeating the same word again and again in a sentence. Many words in English have more than one synonym, which provides a set of words to choose from as appropriate in context. To help in the selection of a synonym, grammar checking would have to be used to ensure that the meaning expressed in the sentence is not changed by replacing the word. Attacks against this approach could include building a list of the most common synonyms, but this would not help the attacker, as a different word could be used each time the same meaning needs to be expressed, defeating the purpose of the attack. This technique of using different words to represent common words to defeat cryptanalysis attacks is referred to as "homophones" [7] in cryptography.

A third technique that can effectively defeat cryptanalysis is polyalphabetic substitution, that is, the use of "several alphabets to encrypt the message" [3], rather than using the same substitution over and over. The Vigenere cipher is a form of polyalphabetic cipher. This means that the same plaintext character is not always encrypted to the same ciphertext character within a message, so direct frequency analysis of the ciphertext cannot recover the original message. However, if other techniques can be used to identify the key length, then a frequency analysis attack can again be used to recover the original plaintext message.
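The effect of polyalphabetic substitution can be seen on the word "ADDITIONAL" used earlier in this section. The sketch below models the Vigenere cipher as a repeating sequence of letter shifts; the key "KEY" and the helper name `vigenere_encrypt` are assumptions chosen only for illustration.

```python
def vigenere_encrypt(plaintext, key):
    """Encrypt uppercase A-Z text, shifting letter i by key[i mod len(key)]."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


plaintext = "ADDITIONAL"
ciphertext = vigenere_encrypt(plaintext, "KEY")  # hypothetical key
print(ciphertext)

# Collect every ciphertext letter that each plaintext letter was mapped to.
images = {}
for p, c in zip(plaintext, ciphertext):
    images.setdefault(p, set()).add(c)

for p in "ADI":
    print(p, "->", sorted(images[p]))
```

Unlike the Caesar example, each of the repeated letters A, D and I now maps to two different ciphertext letters, so a single-letter count of the ciphertext no longer mirrors the plaintext frequencies.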

Finally, a possible technique that could be used to defeat frequency analysis is to "encrypt a single character of plaintext with two ciphertext characters" [3]. When the same character is encountered twice, different characters should be used to encrypt it. This is achieved by using a key twice the size of the plaintext message, encrypting each plaintext character with two values from the key, and storing the two results together for that plaintext character. This would ensure that no two plaintext characters have the same ciphertext representation, defeating the frequency analysis method of breaking the cipher.

Modern encryption algorithms and cryptanalysis:

Modern cryptographic algorithms take a much better approach to defeating attacks based on frequency analysis. Cryptographic algorithms nowadays use block encryption, rather than encrypting characters one by one, thus eliminating the repetition of ciphertext characters for identical plaintext characters. "Block ciphers are the central tool in the design of protocols for shared-key cryptography. A block cipher is a function $E: \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$. This notation means that E takes two inputs, one being a k-bit string and the other an n-bit string, and returns an n-bit string" [2]. The first input is the key, which is used to encrypt the secret message. The second string is called the plaintext, and the output is called the ciphertext. The key length k and the block length n are parameters associated with a specific block cipher. They vary from block cipher to block cipher, and depend on the design of the algorithm itself. Some of the most trusted symmetric ciphers include AES, Triple-DES, Blowfish, CAST and IDEA. In public-key cryptography, the most commonly used cryptosystems are RSA and the Diffie-Hellman systems, which have not been found to have any vulnerabilities to date.

Typically, the block cipher E is a publicly specified algorithm. "In typical use, a random key K is chosen and kept secret between a pair of users. The function $E_K$ is used by the sender to encrypt the message, for a given key, before sending it to the intended receiver, who decrypts the message using the same key" [2]. Security depends on the secrecy of the key. So, at first, one might think of the cryptanalyst's goal as recovering the key K given some ciphertext intercepted during transmission. The block cipher should be designed to make this task computationally difficult. To achieve this, the algorithms used to encrypt the message must be designed with a high degree of mathematical complexity, so that they cannot be reversed to obtain the plaintext from a known ciphertext.

The length of the key used during encryption of a message plays an important role in determining the strength of an algorithm. Key length is conventionally measured in bits, and most of the popular strong ciphers have key lengths between 128 and 256 bits. A cipher is considered strong if, after years of attempts to find a weakness in the algorithm, there is no known effective cryptanalytic attack against it. This means that the most efficient way of breaking an encrypted message without knowing the key used to encrypt it is to "brute force" it, i.e. to try all possible keys. "The effort required to break an encrypted message is determined by the number of possible keys, known as the keyspace. Knowing the speed of the computer used to break the key, it is easy to calculate how long it would take to search the keyspace to break a particular cipher" [2].

For example, consider a cipher that uses 128-bit keys. Each bit can be either 0 or 1, so there are $2^{128}$, or approximately $3 \times 10^{38}$, keys. Suppose ten billion computers are assigned the task of breaking the code, each capable of testing ten billion keys per second. The task of running through the entire keyspace would then take around $3 \times 10^{18}$ seconds, which is approximately 100 billion years. "But, in reality, it would be necessary to run through only half the keyspace to hit upon the correct key, which would take around 50 billion years. That is longer than the estimated age of the universe according to modern cosmology, which is about 15 billion years" [2]. This shows that it is practically infeasible to break modern cryptographic algorithms using brute-force attacks, and it gives an idea of the strength of modern cryptographic algorithms and their resistance to cryptanalytic attacks.
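The arithmetic in this example can be checked with a short calculation. The constant names below are assumptions introduced for the sketch, and a year is taken as 365.25 days.

```python
KEY_BITS = 128
COMPUTERS = 10**10              # ten billion machines
KEYS_PER_SECOND_EACH = 10**10   # ten billion keys per second per machine
SECONDS_PER_YEAR = 365.25 * 24 * 3600

keyspace = 2 ** KEY_BITS
seconds_full = keyspace / (COMPUTERS * KEYS_PER_SECOND_EACH)
years_full = seconds_full / SECONDS_PER_YEAR
years_expected = years_full / 2  # on average the key is found halfway through

print(f"keys:           {keyspace:.2e}")
print(f"seconds (full): {seconds_full:.2e}")
print(f"years (full):   {years_full:.2e}")
print(f"years (half):   {years_expected:.2e}")
```

The results agree with the rounded figures quoted above: roughly 100 billion years for the full keyspace and roughly 50 billion years on average.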


Cryptography has advanced in recent years, and modern cryptographic algorithms have proved successful in defending against most forms of cryptanalytic attack. Attacks based on frequency analysis have been shown to exploit the weaknesses in traditional encryption algorithms to reveal the plaintext message that was encrypted using them. The natural language of the information being encrypted is not random in nature, and this is exploited by attacks based on frequency counting. Based on the frequency of letters that occur in the ciphertext, one can guess the plaintext characters from their redundancy rate and the specific combinations of letters in a word. This weakness can be countered by using stream ciphers, which do not carry the redundancy in the plaintext over to the ciphertext. Modern block ciphers encrypt a chunk of plaintext into ciphertext and vice versa, eliminating the redundancy of the language used in encryption.

Although the algorithm plays an important part, it is the key length used in block ciphers that helps in repelling cryptanalysis. Modern ciphers use key lengths starting from 128 bits, eliminating the possibility of a brute-force attack to decrypt the message. The greater the key length, the more time it takes to break these ciphers. These advantages have made modern cryptographic algorithms more popular among the security community. No weaknesses are known in these algorithms to date that could allow someone to identify the plaintext message.


[1] Stallings, W., Cryptography and Network Security, Chapter 1, Third Edition, Prentice Hall, 2003

[2] Schneier, B., Applied Cryptography, Chapter 1, Second Edition, John Wiley & Sons, New York, NY, USA, 1996

[3] Hart, G. W., To Decode Short Cryptograms, Communications of the ACM 37(9), 1994, pp. 102-108

[4] Lee, K. W., Teh, C. E., Tan, Y. L., Decrypting English Text Using Enhanced Frequency Analysis, National Seminar on Science, Technology and Social Sciences (STSS 2006), Kuantan, Pahang, Malaysia

[5] Zipf, G. K., Human Behaviour and the Principle of Least Effort, 1949, Cambridge: Addison Wesley Publications

[6] Lewand, R. E., Cryptological Mathematics, The Mathematical Association of America, 2000, pp. 345-346

[7] Stamp, M. and Low, R. M., Applied Cryptanalysis, 2007, Chapters 1 and 2, John Wiley & Sons, New York, NY, USA

[8] http://www.simonsingh.net, online frequency analysis tools

[9] http://www.textalyser.net, online text analysis and frequency analysis information
