Art by Shoaib Pasha.

Summary of Symmetric Cryptography (Complete)


These are my complete notes for Symmetric Cryptography, covering such topics as Keystreams, Modular Arithmetic, Integer Rings, Shift Ciphers, Affine Ciphers, Stream Ciphers, Block Ciphers, the Data Encryption Standard (DES), the Advanced Encryption Standard (AES), and more.

I color-coded my notes according to their meaning - for a complete reference for each type of note, see here (also available in the sidebar). All of the knowledge present in these notes has been filtered through my personal explanations for them, the result of my attempts to understand and study them from my classes and online courses. In the unlikely event there are any egregious errors, contact me at jdlacabe@gmail.com.

Table Of Contents






I. Introductions

I.I Classification & Terminology.


Fig. 1. - Cybersecurity Classification Hierarchy:
A Classification Hierarchy of the major components of Cryptology. For a further classification of Cryptanalysis, see Rule 5, and III.I for Symmetric Cryptography. The work of the mastermind Jonathan Lacabe himself.

# Cryptology/Cybersecurity: IT Security; The protection of digital information against misuse, incorporating technical, organizational, and implementation-specific aspects. This is the modern, digital theater of the much greater field of 'Security'.

All IT Security follows the CIA Triad: Confidentiality, Integrity and Availability of information. Security is largely focused on attackers, while system safety & reliability is focused on protecting against random technical failures.


# Cryptography: A sub-field of Cryptology;

The science of securing communication through encryption, especially against the cryptanalysis efforts of an adversary. Cryptographic algorithms are the bedrock of all cybersecurity systems - if cybersecurity is a car, then cryptography is the engine.


# Cryptanalysis: A sub-field of Cryptology;

The reductive counterpart to Cryptography, the science of 'breaking' encryption and bypassing the cryptographic security. Though it is the medium of hackers and cybercriminals, it is also a serious scientific field for researchers to test the security of cryptosystems, e.g. systems established using Cryptography. This is the only to absolutely ensure the security of the cryptosystem - remember Schneier's Law.


# Symmetric Algorithms: The classic form of Cryptography - two parties hold a secret key, enabling one party to encrypt a message and the other party to decrypt it. Thus, the same key is used for both encryption and decryption.

Until 1976, this was the only form of cryptography in existence, and describes the basic nature of all historical ciphers (such as the shift/caesar cipher and affine cipher) and Stream Ciphers, Block Ciphers, and DES & AES. Hash functions are somewhat similar to symmetric algorithms, though they can be considered an independent, third type of Cryptographic algorithm.


# Asymmetric/Public-Key Algorithms: The new cryptography, which all modern cryptosystems and what much of the infrastructure of the Internet, is based on. Invented by the Diffie-Hellman team in 1976, Public-Key Cryptography functions by having each party have TWO keys, a private key and a public key.

For more detail on how these functions work, see the "Public-Key Cryptography" section ([[[[[), but know they are used in a variety of fashions: digital signatures, key establishment/management, and more!


# Protocols: Collective cryptographic algorithms that serve a complex security function, like a library of algorithms that work together for a common goal. For example - the Transport Layer Security scheme (TLS) and the Hypertext Transfer Protocol (HTTP) are used in every web browser.

In the words of Edward Snowden, "[The Internet of Things and countless] protocols have given us the means to digitize and put online damn near everything in the world that we don’t eat, drink, wear, or dwell in."


# C. Rule 1. Cryptographic 'Primitives', 'Ciphers', and 'Algorithms' all refer to the same thing in slightly different ways: schemes used to represent data. Primitives are the basic cryptographic functions (building blocks of an algorithm), ciphers are the implementation of an encryption scheme, and algorithms are the combined procedures of primitives and ciphers to create the greater cryptosystem.


# Channel: A transmission medium for data to pass through. Examples include the Internet, airways, and Wi-Fi. Securing transmission from interception is the goal of cryptography.


# Hybrid Schemes: The combined usage of both asymmetric and symmetric algorithms in a cryptosystem, since both type have their own strengths and weaknesses.



I.II Symmetric Crypto. Basics.

# C. Rule 2. Symmetric Cryptography

The ancient problem presents itself: How can two parties communicate over an insecure channel (an 'open channel') without having their communications be intercepted by a third-party?

Say Alice and Bob connect through the internet (which is an open channel due to the potential of package rerouting/interception) and transfer data. An opponent, Oscar, can read their communications by intercepting the data before it reaches Bob.
A diagram showcasing how communications between Alice and Bob, when passing through an insecure channel unencrypted, can be intercepted by the attacker Oscar.

To stop Oscar from reading the message, an encryption algorithm can be established, converting the plaintext message (x) into a ciphertext message (y), and then sending it through the insecure channel. Oscar would only see a stream of random characters/bits as a result of the encryption. Bob, using a decryption function, would convert the ciphertext back into plaintext, thus completing the data transfer without interception.

In order for Bob to not simply decrypt the ciphertext immediately using the decryption function (which should be public for reasons outlined in Rule 3), a Key only known to Alice and Bob (sent through a secure, impenetrable channel) must be used as a parameter in both the encryption and decryption functions.

The key must be inputted into the encryption function to influence the manner in which the plaintext is encrypted into cyphertext. Thus, only with the key will Bob be able to decrypt the Ciphertext. Oscar, who doesn't know the key, will be unable to decrypt the message, even if the decryption algorithm is public.
A diagram showcasing a symmetric cryptosystem, complete with a key sent through a secure channel and encryption/decryption functions.

x = Plaintext message - the unadulterated, original content of the message.

y = Ciphertext, which looks like scrambled characters to an interceptor like Oscar.

e = Encryption function, a mathematical formula that converts x into y.

d = Decryption function, a mathematical formula that converts y into x.

K = The Key fed into the encryption and decryption functions.

|K|, 𝒦 = Keyspace, the total number of possible keys in a cryptosystem. This matters significantly with regard to brute-force Cryptanalysis (see Rule 5). The keyspace corresponds to the key size/key length of the algorithm: If it produces a 168-bit key (see Rule [[[[), then there are 2^168 possible keys.


# C. Rule 3. Keeping the encryption/decryption algorithms 'e' and 'd' secret, preventing their cryptanalysis by the enemy (known as Security by Obscurity), was standard procedure for the ~4000 year history of Cryptography prior to the discovery of public-key Cryptography. However, the only way of ensuring the security of the algorithms is to make them public so it can be analyzed by cryptanalysts.

This is the central principle of a foundational law in Cryptography, postulated in 1883 by Auguste Kerckhoffs:


Kerckhoffs' Principle:

"A cryptosystem should be secure even if the attacker (Oscar) knows all the details of the system, with the exception of the secret key."


In Practice: Never use an untested Crypto algorithm! Furthermore, never roll your own crypto!


# End-Eo-End Encryption (E2EE): A system of encryption widely used in modern communication services (like internet messaging sites), in which all data sent from Alice to Bob (party #1 to party #2) is never decrypted at any point along its route until it reaches Bob.

Thus, all parties intercepting/eavesdropping will be unable to read or manipulate the message, even if they control any of the base stations the message goes through.

An example of how communications between Alice and Bob would occur under a end-to-end encryption scheme.



I.III Cryptanalysis Basics.

# C. Rule 4. There are multitudes of cryptanalysis techniques, and the crucial law is that if just one attack works, then the entire cryptosystem crumbles and fails. While a substitution cipher (Rule 7) may be impenetrable to a brute-force attack (with its 2^88 keyspace), it collapses against letter frequency analysis.

In order for a cryptosystem to be considered secure, it must be resistant against every single type of attack. An attacker always looks for the weakest link in the cryptosystem. Thus, in addition to strong algorithms, safeguards against social engineering and implementation attacks must be instituted.


Fig. 2. - Cybersecurity Classification Hierarchy:
A Classification hierarchy of Cryptanalysis. There are two main branches of Classical Cryptanalysis, Brute-force and Analytical Cryptanalysis. See section I.I for the full classification and III.I for Symmetric Cryptography.


# Classical Cryptanalysis: The combined usage of both asymmetric and symmetric algorithms in a cryptosystem, since both type have their own strengths and weaknesses.


# Social Engineering: Bypassing the protections of a cryptosystem by going directly after the humans that have access to the secret key. Obtaining the key can range from kidnapping and forcing them to tell the key/password, to a simple phishing scheme over the phone or through email.


# Implementation Attacks: Extraction of the key through 'side-channel analysis'. By observing the behavior of the implementation of the cryptosystem in an IC or software, it is possible to deduce important relation relating to the key.

For example: By looking at the electrical power consumption or electromagnetic radiation of a CPU running a cryptographic algorithm, a signal processing technique can be used to recover the key. The run time behavior can also indicate information regarding the key, which is why many cryptographers ensure constant run times in embedded cryptosystems.

These attacks are most relevant when the attacker has physical access to a piece of hardware running the cryptosystem, such as a credit card.


# Attack Vectors: The many possible ways to attack a cryptosystem.


# C. Rule 5. Classical Attack #1: Brute-Force Attack/Exhaustive Key search:

Mathematical Definition:

Let (x, y) denote the pair of plaintext and ciphertext, and let K = {k1 ,..., kκ } be the key space of all possible keys ki. A brute-force attack now checks for every ki ∈ K whether dki == x.

If the equality holds, a possible correct key is found; if not, proceed with the next key.



Explanation:

The testing of all possible keys in a given keyspace with the decryption function, to see if they produce the plaintext. This is akin to thinking of the cryptosystem as a black box, in which the only significant factor to decoding an encrypted message is the number of possible keys the message could have been created with.

As of 2010 (very recent), the largest keyspace that can be searched in a relatively reasonable amount of time is 2^60. However, not all key bits are created equally: a 128-bit symmetric key provides roughly the same security as a 3072-bit RSA (asymmetric algorithm) key.

A chart of the time it would take to crack keyspaces of various lengths using a brute-force attack.


# Moore’s Law: The computing power of the strongest computers (e.g., the number of transistors in an integrated circuit) will double every 18-24 months while the price will remain constant. This means that computing power, growing exponentially over time, will continually pose more and more of a threat to modern cryptographic systems and to antiquated ones still in use, as it becomes cheaper and faster to break them.


# C. Rule 6. Classical Attack #2: Analytical Attacks:

The intelligent counterpart to 'brute'-force, Analytical Attacks examine the internal characteristics of the encryption function and pick apart weaknesses in the function that could enable reconstruction of the plaintext message, such as Letter Frequency Analysis (for the Substitution Cipher) and Differential Cryptanalysis (for Block and Stream Ciphers).

There is no 'equation' for Analytical Attacks, because they are specific and unique to every cryptosystem (though some common characteristics can emerge between similar cryptosystems).



I.IV Generic Analytical Attacks.

At a minimum, an attacker will always know the ciphertext. There are four generic types of Analytic Attacks that specify what information the attacker possesses in addition (if anything):


# Ciphertext-only Attack: An attack in which the adversary only has knows the ciphertext.


# Known-plaintext Attack: In addition to the ciphertext, the adversary also knows some pieces of the plaintext (e.g., header information of an encrypted file or email).


# Chosen-plaintext Attack: An attack in which the adversary can choose the plaintext that is being encrypted and also has access to the corresponding ciphertext, found through possession of the decryption function (whether aware of it's internal workings or not).


# Chosen-ciphertext Attack: An attack in which the adversary can choose ciphertexts and obtain the corresponding plaintexts, the goal being (typically) to recover the secret key.




II. Historical Ciphers & Modular Arithmetic.

II.I Substitution Cipher.

# Historical Cipher: A cipher that was used before the development of modern cryptography, operating on letters instead of bits/bytes. Every one of these ciphers are easily crackable, from the Enigma Code to the Caesar Cipher.


# C. Rule 7. Substitution Cipher

A historical cipher, operating purely on letters, in which every plaintext letter is replaced by a fixed ciphertext letter. It is just scrambling the letters, nothing more.

The Keyspace is 26!, or 2^88, as each plaintext letter that maps to a ciphertext letter decreases the remaining number of possible letters. Thus, a brute-force attack would be out of the question for modern computers.

Since the different letters of the English alphabet have different frequencies, and these frequencies are preserved even after scrambling (just different letters will have the frequencies), Letter Frequency Analysis can be conducted as an attack on the substitution cipher, making it rather simple.

This is an example of a cipher/algorithm in which an Analytical Attack is most effective.



II.II Modular Arithmetic Basics.

# C. Rule 8. Modular Arithmetic:

Mathematical Definition:

(a, r, m) ∈
m > 0

If m|(a − r), then a ≡ r mod m



= The set of all integers.
a = The chosen number.
r = The remainder.
m = The modulus, the highest possible number in the set.
'mod m' = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to m.


Explanation:

Since the bulk of cryptographic computations are performed in finite sets, special mathematical tools are needed. In particular, Modular Arithmetic is the basis for all modern cryptosystems, and many historical ones as well.

Modular arithmetic is a system of simplifying a number in terms of a given modulus, the highest number in a set. Any number greater than the modulus is rewritten using the modulus operator (or remainder operator, see below), in effect 'wrapping' the number around the modulus, with respect to the difference between the number and any multiple of the modulus, known as the remainder.

The preferred multiple for finding the remainder (indeed, the exact one that would be found by division (see two paragraphs below)) is the greatest one lower than the number, but any will work - see Rule 10, Equivalence Classes.

For example, with a modulus of 12, 61 would be represented as 1 mod 12.

The process of finding the modular equivalent of a number is simple: divide the chosen number by the modulus, and find the remainder. The remainder will then be the coefficient in front of the modulus operator. All numbers smaller than the modulus are rendered ordinarily, without the mod operator.


# C. Rule 9. Modular Arithmetic: Computation of the Remainder.

To find the remainder when m and a are large, burdensome numbers, there is an exact, algorithmic definition:


Mathematical Definition:

(a, m) ∈
0 ≤ r < m

a = q × m + r
Thus, a - r = q × m

m|(a − r)



= The set of all integers.
a = The chosen number.
q = The quotient, which produces a multiple of the modulus.
r = The remainder.
m = The modulus, the highest possible number in the set.

The final equation is precisely the same as what was discovered before, and q is not involved whatsoever - all that matters is the remainder.


# C. Rule 10. Equivalence Classes:

While 12 ≡ 3 mod 9, it is equally true that 12 ≡ 21 mod 9, and that 12 ≡ -6 mod 9. This is because, while the remainders may not exactly be the difference between 'a' (12) and the highest multiple of m lesser than a (9), they all still return true in the formula of Rule 8:

12 ≡ 3 mod 9, 3 is a valid remainder since 9|(12−3)
12 ≡ 21 mod 9, 21 is a valid remainder since 9|(12−21)
12 ≡ −6 mod 9, −6 is a valid remainder since 9|(12−(−6))

The set of all values that behave 'equivalently' for a particular modulus, in that they can be simplified down to the smallest possible positive number for r, form what is known as an Equivalence Class, an infinite set/series of values.

For 3 mod 9, the equivalence class of r values is an infinite set, a partial series of which is as follows:
{..., -24, -15, -6, 3, 12, 21, 30, ...}

If any of these values were to be substituted in for r (3) in 3 mod 9, it would be equivalent, the same value, because of the mod operator.

Note: There are 'm' equivalence classes for each possible remainder (0 through m-1), which collectively contain all possible integers. The difference between any two neighboring values in an equivalence class will be the modulus, as shown above.


# C. Rule 11. Modular Reduction:

Because any value in an equivalence class acts the same with regard to the mod operator of a given modulus, then substitutions in mathematical operations that use the modulus operator can be performed, simplifying them.

Take (13 × 16) - 8, with modulus 5. While the full number can be computed (200) and an answer derived from there (0 mod 5), that is lame and time-consuming. A much cooler, faster, 21st century means of solving the problem is to substitute in for each term the simplest member of their respective equivalence classes:

13 mod 5 has a remainder of 3, and so 3 is the simplest term.
16 mod 5 has a remainder of 1, and so 1 is the simplest term.
8 mod 5 has a remainder of 3, and so 3 is the simplest term.

Therefore, the same calculation can be performed with simply (3 × 1) - 3, which returns 0 mod 5, the same answer as before, but found much easier-ly.


Note that this substitution trick does not equally extend to all mathematical operations. Exponentials, for example, cannot be simplified with the modulus:

For the problem 3⁸ mod 7, the exponent '8' cannot be simplfied down to 1 mod 7, and the entire problem to 3 mod 7. Exponents simply cannot have the substitution performed.

For all exponentials, simplification into smaller components (using the 2nd Holy Property of Exponents, Math Rule 66 ([[[[[)) must be performed to simplify. For ex.: 3⁸ = (3²)⁴ = (9)⁴ = (2)⁴ = 16 mod 7 = 2 mod 7, which is the answer.


Operations where equivalence class substitutions are allowed:

Multiplication
Addition
Substraction

Operations where equivalence class substitutions are banned:

Exponentials
Division


# C. Rule 12. When getting a result in modular arithmetic, always simplify down to the smallest positive member of the equivalence class for your final answer.



II.III Integer Rings.

# Integer Ring: A set of elements which can have basic arithmetic operations performed upon them, such as addition and multiplication, so that the natural properties expected of addition and multiplication all hold. There are ring definitions/interpretations of polynomials, matrixes, and modular arithmetic (Rule 13). Rings are a fundamental part of Ring Theory, a field of Abstract algebra. ([[[[[[)


# C. Rule 13. Algebraic Modular Arithmetic:

Modular Arithmetic can be defined in the form of an integer ring:


The integer ring m consists of the following characteristics:

1. The set m = {0, 1, ..., m-1}

2. Two arithmetic operators, "+" and "×", hold for all (a, b, c, d) ∈ m that:
  i)  a + b ≡ c mod m
  ii) a × b ≡ d mod m



There are many overshadowing rules that every ring has to fulfill, seen in the section "RULES OF THE RING" below.


# RULES OF THE RING:

These following rules hold true for all integer rings in general, not just modular arithmetic.

  1. The ring should be closed: if any two numbers from the set are multiplied or added together, the result must be in the ring. This is ensured by the modulus operator.

  2. Addition and multiplication are associative properties: For all (a, b, c) ∈ m, a + (b + c) = (a + b) + c, and a × (b × c) = (a × b) × c.


  3. Addition:


  4. Addition is commutative for all (a, b) ∈ m: a + b = b + a.

  5. There is an additive neutral element, 0. For all elements a ∈ m, it holds that a + 0 ≡ a mod m.

  6. The additive inverse always exists: for any element a in the ring, there is always a negative element '-a' such that a + (-a) ≡ 0 mod m, the neutral element.


  7. Multiplication:


  8. There is a multiplicative neutral element, 1: For all elements a ∈ m, it holds that a × 1 ≡ a mod m.

  9. The Multiplicative Inverse exists only for some elements: For a ∈ , the inverse a⁻¹ is defined such that a × a⁻¹ ≡ 1 mod m. The nature of most inverses and the reasons for their occasional nonexistence are outlined in Rules 14 & 15.

    If there exists an inverse for a, then, a rare usage of division in the modular ring becomes possible: the element a⁻¹ can be divided by, as a divisor, since b/a ≡ b × a⁻¹ mod m.

  10. The distributive property holds for all ring operations. For all (a, b, c) ∈ m, a × (b + c) = (a × b) + (a × c).


# C. Rule 14. To determine if an inverse exists for a specific 'a' of a specific modulus, simply find the gcd (Greatest Common Divisor) of the two values:

If gcd(a, m) = 1, then there exists a⁻¹, and a & m are said to be "relatively prime", or "coprime".

If gcd(a, m) ≠ 1, then there does NOT exist a⁻¹.

If m is prime, then gcd(a, m) will always equal to 1, for all nonzero 'a' values, and thus every 'a' value will have an inverse in those rings. The inverse of an integer completely depends on the ring, and thus the modulus.


# C. Rule 15. Finding the Multiplicative Inverse of a has the following formula (as described in Rule of the Ring #7):

a × a⁻¹ ≡ 1 mod m

Though the multiplicative inverse of a is literally shown as a⁻¹, it is almost never the actual reciprocal of a. The reason for this is simple: All reciprocals other than -1 and 1 are not integers, and are thus not included in the set.

Take a = 2, for example. The supposed inverse '1/2' is incorrect, because it is not included in the set. A different inverse must be found specific to the modulus; if the modulus were 3, then an inverse of '2' would work:

2 × 2 ≡ 1 mod 3



II.IV Shift/Caesar Cipher.

# C. Rule 16. Shift Cipher:

Mathematical Definition:

(x, y, k) ∈ 26

Encryption: ek(x) ≡ (x + k) mod 26

Decryption: dk(y) ≡ (y - k) mod 26



x = Plaintext message.
y = Ciphertext.
k = The Key fed into the encryption and decryption functions.
26 = A ring of the first 26 integers (including 0), matching the 26 characters of the alphabet.
ek(x) = Encryption function, a mathematical formula that converts x into y.
dk(y) = Decryption function, a mathematical formula that converts y into x.
mod 26 = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to 26.


Explanation:

A simplified form or special case of the Substitution cipher, the Shift Cipher (or Caesar Cipher) merely shifts every plaintext letter however many so positions rightward in the alphabet, wrapping around when it reaches the end (modular arithmetic!). This is occasionally known as the "ROT" Cipher, and the most well-known example is ROT13, which shifts rightward precisely halfway through the alphabet.

As a generic example, with a key of 1, some shifts would be as follows:

a -> b
b -> c
...
z -> a

As shown, once the alphabet reaches letter (26 - k) + 1, or the position in the ring 26 (26 - k) (since the set begins at 0), the shift returns to the front of the alphabet.

It should be obvious to all this is an extremely unsecure cryptosystem, with a keyspace of only 26. It is vulnerable to both letter frequency analysis (explained in Rule 7) and brute-force attacks (Rule 5).


# C. Rule 17. Vigenère Cipher

Mathematical Definition:

(x, y, k, l) ∈ 26

Encryption: ek(x) ≡ (xi + ki mod l) mod 26

Decryption: dk(y) ≡ (yi - ki mod l) mod 26



x = Plaintext message.
y = Ciphertext.
k = The Key/codeword fed into the encryption and decryption functions, which, like the plaintext, is a series of letters (represented by their numerical value, 0-25) that are used by their i position.
i = The position in the string of both the key and the plaintext.
l = The length of the codeword.
26 = A ring of the first 26 integers (including 0), matching the 26 characters of the alphabet.
ek(x) = Encryption function, a mathematical formula that converts x into y.
dk(y) = Decryption function, a mathematical formula that converts y into x.
mod 26 = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to 26.


Explanation:

A mere variation of the Shift cipher, in which the shift for each letter is determined by a secret code word 'c', which has 'l' letters. Each character in the code word c represents a number of shifts rightward according to its cipher.

The letters of the code word are denoted as (c0, c1, ..., cl - 1), and as the keys, they are also represented as (k0, k1, ..., kl - 1), respectively.

The encryption function works as follows: Each character in the plaintext string 'x' is applied a rightward shift according to it's respective letter (identical position) in the key text (a cyclic shift, if the addition of the key letter produces a number greater than 26).

This dictates that the length of the code word must be the same or larger than that of the actual code itself, which is somewhat inefficient for longer plaintexts. E.g., |x| ≤ l.

The decryption function, of course, is just the opposite of the encryption function, moving each character of the ciphertext backward by the value in the positional counterpart of the code word 'c'.



II.V Affine Cipher.

# C. Rule 18. Affine Cipher:

Mathematical Definition:

(x, y, a, b) ∈ 26
K = (a, b)

Encryption: ek(x) ≡ ((a × x) + b) mod 26

Decryption: dk(y) ≡ (a⁻¹ × (y - b)) mod 26



x = Plaintext message.
y = Ciphertext.
K = (symbol not depicted) The key, which has been split into two parts: a and b.
a = The multiplied value of the key. The number of possible 'a' values is limited by the necessity of the inverse (for the decryption function), as explained below.
a⁻¹ = The inverse of a, necessary for the decryption function.
b = The "shift parameter" value of the key, only in addition and subtraction operations. There are 26 possible values for b, explained below.
26 = A ring of the first 26 integers (including 0), matching the 26 characters of the alphabet.
ek(x) = Encryption function, a mathematical formula that converts x into y.
dk(y) = Decryption function, a mathematical formula that converts y into x.
mod 26 = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to 26.


Explanation:

The Affine Cipher is an attempted improvement of the shift cipher, done by complicating the encryption function through the addition of multiple parameters, but is still extremely easy to break. The affine cipher is performed by splitting the key into two parts:

K = (a, b)

The number of possible b values in the system is 26, as it is merely the shift parameter (as shown in the definition equations).

The number of possible a values is limited by the condition explicated in Rule 14, with m = 26: only when gcd(a, 26) = 1 is 'a' an inverse, and thus contributive to the number of possible values. Counting all possible values of a (1, 3, 5, 7, 9, ...), all odd numbers 1-26 other than 13, produces an end result of 12 possible 'a' values.

The keyspace, being (#a) × (#b), is therefore 312. This remains very easy to break using a brute-force attack (Rule 5).

Furthermore, letter frequency also remains preserved (since any of a particular character (plaintext) will always be converted to another specific character (ciphertext), despite the minute differences in the encryption function), so a letter frequency analysis attack is also applicable.

Though multiple encryption will not increase the security of an affine cipher, it will for other ciphers, such as the Data Encryption Standard (see Rule 43).


# C. Rule 19. Always, as a rule of life, write denominators as (x)⁻¹ instead of as below a line. With the importance of the inverse operation, it always critical to recognize all divisors as being inverses of their variable. 1/a = a⁻¹.


# C. Rule 20. Note: You can ALWAYS derive the decryption function from the encryption function in mathematical definitions of basic ciphers/algorithms. Just know that the encryption function is equal to y and the decryption function is equal to x, and you can simply isolate the appropriate variable to find whichever function you wish.

Also recall that the mod operator will simply remain in place - it can be practically ignored until the end of the isolation, with it not playing a part in moving variables from side to side and whatnot.




III. Stream Ciphers.

III.I Introduction to Stream Ciphers.


Fig. 3. - Symmetric Cryptography Hierarchy:
A Classification hierarchy of Symmetric Cryptography. There are two major branches of Symmetric Cryptography, Block Ciphers and Stream Ciphers. See section I.I for the full tree, and Rule 5 for Cryptanalysis.


# C. Rule 21. Stream Ciphers

Mathematical Definition:

(xi, yi, si) ∈ 2

Encryption: esi(xi) ≡ (xi + si) mod 2

Decryption: dsi(yi) ≡ (yi + si) mod 2



xi = The Plaintext bit at index i.
yi = The Ciphertext bit at index i.
si = The "Key Stream" at index i, equivalent to 'k' (as in past ciphers). Here, it is the Key (at index i) fed into the encryption and decryption functions.
2 = A ring of the first 2 integers: {0, 1}.
esi(xi) = Encryption function, a mathematical formula that converts each x bit into a y bit.
dsi(yi) = Decryption function, a mathematical formula that converts each y bit into an x bit.
mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.


Explanation:

A cipher that encrypts bits individually, processing one bit after another in a sequence. This is opposed to a Block Cipher (see Section IV), which takes in large amounts of bits all at once.

The modular 2 operation (see Rule 22) has its own symbol in cryptographic diagrams: a circle with a cross. In electrical engineering, the mod 2 function is known as the XOR gate, a type of binary gate ([[[[[).

Using this symbol, the encryption and decryption functions of the basic stream cipher can be depicted as shown:

A simple diagram displaying how a simple Stream Cipher (abiding by the encryption and decryption functions defined) would work. Note the usage of the XOR gate, or modulus 2 function.

If the key bit is zero, then the plaintext bit will pass through unchanged, but if the key bit is 1, then the plaintext bit will flip (like binary). Thus, if the keystream (by divine intervention) is straight 0's, then the plaintext will go unencrypted.

The central complexity of the Stream Cipher is the generation of the "Key Stream", e.g. the stream of bits with which the plaintext is encrypted. This generation must be performed through Random Numbers - see Rule 23.


# Key Stream: A stream of characters that is combined with the plaintext to produce the ciphertext in a Stream Cipher, or, in the case of asynchronous stream ciphers, combined with a previous ciphertext to produce a new ciphertext.

The key stream is derived from the key and the seed. Generally, the key stream is denoted as "si", while the seed itself is s0, the initial position (which may have been created using an actual TRNG) from which all recursive stream ciphers evolve from in their encryption functions, at least all ciphers that use the PRNG algorithm, like the LCG (see Rule 25).


# Synchronous Stream Cipher: A Stream Cipher in which the key stream depends solely on the key, generated independently of the plaintext or ciphertext. Examples include Salsa20 & ChaCha.


# Asynchronous Stream Cipher: A Stream Cipher in which the key stream depends on the ciphertext in addition to the key - the key generation is reliant on previous ciphertext, making it a risky operation. An example is the Cipher Feedback (CFB) mode.


# C. Rule 22. Mod 2 addition and subtraction are the same operation (an XOR gate). Multiplication complicates things, but in all instances where the mod 2 function is limited to addition or subtraction, the results will be identical no matter which operation is applied.

Note that this does NOT extend to the modulus of any other number - only for two. See the Vigenère Cipher (Rule 17), which works akin to a stream cipher, for proof.



III.II Random Number Generation.

# C. Rule 23. Random Number Generators (RNG):

The means with which a system can generate random numbers, for the purposes of constructing/filling a keystream with random bits, are split into three main mechanisms:

  1. True Random Number Generators (TRNGs): "True" random numbers stem from random physical processes that cannot be reproduced.

    Such natural sources of entropy include flipping a coin, rolling a die, thermal noise, mouse movement, and keystroke timing.

    Hardware and cryptographic systems have tried a lot of different techniques over the years to create a TRNG by analyzing/recording internal characteristics of the computer. Most modern CPU's have a TRNG within a TPM (trusted platform module) on the motherboard.


  2. Pseudo Random Number Generators (PRNGs): An RNG system that is random, but reproducable. Because PRNG's are computed, they are inherently deterministic processes. Since they are deterministic, Alice and Bob can generate the same key stream independently of one another, only needing to know the initial "seed", which is like the key for the key.

    Often, PRNG's are computed with the following function:

    si+1 = f(si)

    s0 = The initial seed value, which may be an actual TRNG.
    f(si) = The 'randomizer' function, which performs whatever upon the seed in order to computationally randomize it.

    This recursive generation is the form taken by the Linear Congruential Generator (see Rule 25).

    The rand() function in C is a PRNG and uses a seed of 12345, with the following function as the randomizer:
    Si+1 = ((1103515245 × si) + 12345) mod 2³¹.


  3. Cryptographically Secure Pseudorandom Number Generators (CSPRNGs): PRNG's that have an additional property: the numbers must be unpredictable.

    Given n output bits (si, si+1, ..., si+(n-1)), it is computationally infeasible to construct si+n.

    While in most applications, PRNG systems are perfectly fine and suited to create 'random' numbers for whatever purpose, they form very unsecure cryptosystems, and thus in cryptography, the additional property of unpredictability is required.


# Unconditional Security: A cipher is "unconditionally secure" (or "information-theoretically secure") if it cannot be broken even with infinite computing resources.

Even if a key is 2^1000 bits, it would not be considered "unconditionally secure", because 2^1000 computers (a patently not infinite amount of computers) would be able to break the key in one second. This is despite the fact that such levels of computing power will never be realized in human history.

Thus, though there are very many ciphers that are "practically" secure (forming the backbone of our modern cybersecurity systems), they are not unconditionally secure.


# Computational Security: A cryptosystem is computationally secure if the best known algorithm for breaking it requires at least t operations, where 't' is some ridiculous number far beyond the reach of modern brute-force computational abilities.


# C. Rule 24. One Time Pad (OTP)

The One Time Pad (also referred to as the "Vernam cipher"), as it were, is the unconditionally secure, perfect encryption algorithm, unbreakable in any meaningful way. The OTP is a stream cipher with the following properties:

1) The Keystream bits si must originate from a TRNG.

2) Each Key stream bit must be used only once.

Knowledge of the Key Stream must be exclusively limited to the involved parties, of course. At the end of the data transfer, the party that received the data must destroy the key so that it cannot be used by another party to decrypt the message after the fact. This is the same reason why reusing keys is extremely risky and nullifies the security benefits of the OTP.

The security of cryptosystem draws from the fact that, without knowledge of the keystream, even a brute-force attack would be futile, because any two text strings are equally as probable to be a particular word in the plaintext: if a four letter word were to be encrypted, it is equally probable for it to be "when", "stop", or "abcd".

There is one fundamental drawback of the OTP, however, that prevents its universal application: since keystream bits cannot be reused, the key has to be as long as the plaintext. A 400MB file, 8 bits to the byte, would entail a 3.2GB key.


Fig. 4. - Practical Stream Ciphers:
A diagram of how all practical stream ciphers should function, with the modular 2 addition highlighted and the key stream generator (which varies from cipher to cipher) displayed.


# Bit-length: The number of bits required to represent an integer as a binary number. The capital letter 'A', for example, has an ASCII value of 65, which is 1000001, meaning a bit-length of 7. It is often calculated using log2, since binary positions are dictated by powers of two.


# C. Rule 25. Linear Congruential Generator (LCG):

Mathematical Definition:

(A, B, si) ∈ m
Key K = (A, B)

Si + 1 ≡ ((A × Si) + B) mod m



Si + 1 = The following value in the keystream, which is recursively using the value of the previous bit in the keystream (along with the key) to calculate the new value. It is exactly ⌈log2 m⌉ (bit-length of m) bits long.
S0 = The initial seed value, which may have been created using a TRNG. It is exactly ⌈log2 m⌉ (bit-length of m) bits long.
Si = The Key Stream at index i. It is exactly ⌈log2 m⌉ (bit-length of m) bits long.
m = A ring of m integers.
A = The multiplied value of the key. It is exactly ⌈log2 m⌉ (bit-length of m) bits long.
B = The "shift parameter" value of the key, only used for the addition operation. It is exactly ⌈log2 m⌉ (bit-length of m) bits long.
mod m = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to m.


Explanation:

A means of generating a unique key stream using the PRNG system, based on the attributes of the key and the seed. This makes it more practical a system than the One Time Pad, but much easier to break (see Rule 26).

Of course, separate from the nature of the function/equation itself, note that there are no "Encryption" or "Decryption" functions, because the given equation is merely for composing the keystream (in the Rule 23 PRNG-style), while the equation for the Stream Cipher itself remains the same (Rule 21).


# C. Rule 26. HOW TO CRACK A LINEAR CONGRUENTIAL GENERATOR (LCG):

In order to crack a rudimentary LCG system and discover the full keystream to decrypt the system, the attacker only needs to know three characters of the plaintext that were encrypted (along with all of the ciphertext). Generally, in encrypting a document (such as Microsoft Word or Excel), there will be file or protocol header information at the start (like the version number), which will always be the same initial symbols being encrypted. Therefore, knowledge of the attacker of these plaintext characters, in addition to the ciphertext, can be assumed.

With knowledge of the LCG key-stream equation, the attacker would know S0, S1, and S2 only. There would thus be three unknowns, A, B, and m. However, the set of commonly used m values is relatively small (see list) could possibly be, and so iterating through them using brute-force (once A and B have been discovered) would be very simple: calculate the associated key stream for each probable m value, then check each key stream using the stream cipher decryption function (see Rule 21) and the known ciphertext, until the correct key stream (and thus m value) is found. Thus, m is considered one of the known variables.

The attacker only has to create a system of equations-style problem (using the three known plaintext characters and the ciphertext) in order to isolate A and B, respectively.

S1 ≡ ((A × S0) + B) mod m
S2 ≡ ((A × S1) + B) mod m

A = ((S1 - S2) × (S0 - S1)⁻¹) mod m
B = (S1 - (S0 × (S1 - S2) × (S0 - S1)⁻¹)) mod m

After finding A and B, all that would be left to be done is to brute-force m as previously described, and the system would be cracked.



III.III Linear Feedback Shift Registers.

# Clocked Storage Element (CSE): A storage container that captures information at a specific moment in time, storing it until needed. In electrical engineering, CSE's are part of clocking subsystems and respond to Clock signals (E.E. Rule [[[[[).


# C. Rule 27. A "Flip-flop" is a clocked storage element that stores a single bit. There is an input, an output, and a Clock Input that determines whether any inputted bit is to be stored or not.

The general symbology used in diagrams for a flip-flop is as follows:

A diagram of the basic functions of the flip-flop electrical component. D is the input, Q is the output. Ignore the Q with the line for now. Courtesy of buildcircuitelectronics.

And here is a diagram showing how the output relies on clock signals to accept an input change, if any:

A diagram detailing how the clock signal will influence what data from the input is stored and produced in the output. As shown, the flip-flop will only store the first event captured in the input when the clock signal becomes active, regardless of whether the input changes while the clock signal is still active. Courtesy of Wikipedia.


# Shift Register: A "shift register" is akin to a system, a concatentation of several individual flip-flops, each sending a signal to the next until the final flip-flop, which will produce each bit of the output. It is one half of an LFSR.

An example of a shift register, with three flip-flops, with every new value sent receiving a clock signal:

A shift register, complete with three flip-flops, and a starting value of 1-0-0. Data moves over once, and then each change is moved over until there are no more changes to be moved.


# C. Rule 28. Linear Feedback Shift Registers:

Mathematical Definition:

The equation depends on the specific characteristics of the LFSR. Specifically, the number of flip-flops, the number of mod 2 functions, and which flip-flops have mod 2 functions. The following formula applies to all LFSR's and is fully explained/derived in Rule 29:

(Si, Pj) ∈ 2
(i, j) ∈ m


$$S_{m+1} \equiv \left( \sum_{j=0}^{m-1} S_{i+j} \times P_{j} \right) \mod 2$$
m = The number of flip-flops in the LFSR. This number determines both the length of the state register and the period (see Rule 31) of the output sequence.

i = The offset in the shift register, e.g., the number bit of the key stream that is being found. This is why it is ∈ m - it can go no greater than value m.

j = The index variable for each individual flip-flop, which have the potential to contribute to the feedback path, as determined by the feedback coefficient.

Sm + 1 = The next output bit into the keystream produced by the LFSR after shifting.

Si + j = The bit held by a particular flip-flop j, at a specific offset in the shift register i. This is the value being inputted into the multiplier function from the flip-flop.

Pj = The feedback coefficient of flip-flop j. Either 0 or 1.

mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.


Explanation:

There is an inherent relationship between electrical engineering and Cryptography, specifically in how Cryptography developed: in the early days of the development of the Stream Cipher, the goal was to create a 'small' stream cipher, e.g. a stream cipher that could be implemented in hardware without using too much power.

Today, all stream ciphers fit one of two categories: ciphers optimized for software implementation, and ciphers optimized for hardware implementation. A prime example of the latter are "Shift Register-Based Stream Ciphers", in particular shift registers with feedback, better known as Linear Feedback Shift Registers.

Occasionally referred to as cryptographic "linear recurrences" by various demented individuals, Linear Feedback Shift Registers are algorithms of key stream generation that utilize yet another PRNG, and are thus crackable with some effort (see Rule 35).

Many ciphers, such as the A51 cipher, consist of multiple LFSRs (in the case of A51, 3), which greatly enhance the security of the cipher and validate the usage of LFSR's for legitimate cryptographic purposes ([[[[[[[[[ how?).

LFSR's incorporate a standard Shift Register with a feedback mechanism/path, generating fresh input for the first flip-flop using some output data. This is done by a modular 2 addition function (an XOR gate), which can be performed between the value from the final flip-flop and any other located in the shift register. Multiple mod2 functions can be added in fact, complicating the structure of the LFSR.

The mod 2 calculation is first performed in the initial state of the shift register, before the first clock cycle, pre-computing the value that will be used as input for the first flip-flop on the first clock cycle.

Remember: The feedback mechanism that the outputs take uses a mod2 function, but the actual input into the first flip-flop itself is without any special system - it is directly taking in the given number.

Very important to notice in the progress of an LFSR is that there will be a distinct pattern that holds in each calculated value (placed below the flip-flops on a diagram), known as the Diagonal Rule: each bit value will be translated diagonally to the right, for as many flip-flops as there are to the right, continually. Note this pattern in the diagram below:

A Linear Feedback Shift Register, complete with three flip-flops and a full period of values, which compose one complete key stream.


The specific mathematical formula of the key produced by the LFSR depicted is:
Si+3 ≡ (si+1 + si) mod 2.


# Feedback Path/Feedback Mechanism: The path in which all the XOR gates (or other operations) are located, streaming back to the first flip-flop for the new input. This mechanism, when combined with a shift register, helps form an LFSR.


# State (in regard to LFSR's): Also referred as the "initialization vector" of the LFSR in this context. The bits of the flip-flops at a particular shift register offset. Given knowledge of an LFSR state and the XOR-positions/feedback path, the following state, produced through another shift register offset, can be determined.

Even without knowledge of the XOR positions, by pure knowledge of the first state, every flip-flop bit in the following state (except for the first F.F., the recipient of the input of the feedback state) can be determined, as a result of the diagonal rule described in Rule 28.


# C. Rule 29. LFSR Generalization:

To generalize the LFSR so that a general equation can be applied, several new mathematical operations must be introduced.

First, the number of flip-flops in the register can be generalized to an arbitrary 'm' number of times, which is the simplest generalization to be made.

It is possible for any of the flip-flops to have an XOR gate, and for there to be as many XOR gates as flip-flops, minus one. Since all of the flip-flops have a chance at being part of the feedback path, a multiplier function must be added after each flip-flop, between the path after the flip-flop and any possible XOR gate above on the feedback path. Thus, the multiplier controls the way to the XOR gate, and essentially acts as an on-or-off switch.

A generalized LFSR, with every possible link between a flip-flop and the feedback path having a multiplier function, serving as the 'switch' determining whether that flip-flop is included as a part of the XOR operation.


The multiplier function works by multiplying the value of the flip-flop (the Input value I) by the internal P-value, which is determined by whether the flip-flop actually has an XOR gate or not (see below). The p-value can only be a binary value: pi ∈ ℤ2.

The multipliers (labeled P0 to Pm-1) have their own values, known as "feedback coefficients". These values are dictated by the "switch", e.g. if there is or isn't a connection to an XOR gate from that flip-flop. Furthermore, there is an input I and output O that travels from the multiplier onto the XOR gate path.

If the switch is closed (e.g. there IS an XOR gate connected to that flip-flop, and thus a connection to the feedback path), then the p-value will equal 1, since the input is equal to the output: O = pi × I = I.

If the switch is open (no XOR gate), then the p-value will equal 0. The output O is equal to 0, and since O = pi × I, and I is not necessarily 0, then pi must equal 0.

For the following formula, you must assume that sm is the sum of all of the XOR gates, e.g. the preloaded value that is ready to be inputted into the first flip-flop after the initial state has finished being processed. Thus, the second complete XOR-ing of the values (the second value inputted into the first flip-flop) will be Sm+1, and so on. The value of the input into the first flip-flop is the result of the summation of all of the possible XOR operations that could be present in the LFSR.


(Si, Pj) ∈ 2
(i, j) ∈ m

$$S_{m+1} \equiv \left( \sum_{j=0}^{m-1} S_{i+j} \times P_{j} \right) \mod 2$$
m = The number of flip-flops in the LFSR. This number determines both the length of the state register and the period (see Rule 31) of the output sequence.

i = The offset in the shift register, e.g., the number bit of the key stream that is being found. This is why it is ∈ m - it can go no greater than value m.

j = The index variable for each individual flip-flop, which have the potential to contribute to the feedback path, as determined by the feedback coefficient.

Sm + 1 = The next output bit into the keystream produced by the LFSR after shifting.

Si + j = The bit held by a particular flip-flop j, at a specific offset in the shift register i. This is the value being inputted into the multiplier function from the flip-flop.

Pj = The feedback coefficient of flip-flop j. Either 0 or 1.

mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.


# Primitive Polynomials: A form of irreducible polynomials, akin to a "prime" polynomial, with the only "factors" being 1 and the polynomial itself. [[[[[[[EXPLAIN IN A MORE COHESIVE AND FULLY ELABORATED MANNER[[[[[[[[.


# C. Rule 30. Types of LFSRs:

In practice, there are three types of LFSRs, the properties of which are the result of the feedback coefficients and degree.
  1. Primitive Polynomial-based: These LFSRs are composed of primitive polynomials (see definition), and produce a maximum-length sequence, the only LFSRs to do so.

  2. Non-Primitive Irreducible Polynomial-based: Using irreducible polynomials that are not "primitive", these LFSRs produce a period that is totally independent of the initial value/"state" of the register. These algorithms do not produce maximum-length sequences.

  3. Reducible Polynomial-based: These LFSRs have a sequence/period dependent on the initial state of the register. Of course, these algorithms can never produce a maximum-length sequence.


# C. Rule 31. The Period is the number of different strings of numbers that can occur in a specific LFSR before repeating. In the LFSR given in Rule 28 (with 3 flip-flops), this number is 7, e.g. all possible states.

A simple principle can thus be derived: The maximum possible period (or sequence length) generated by an LFSR of degree m is 2m - 1.

As soon as a previous bit pattern/sequence (a 'state') is reached, then the cycle will repeat itself. Only certain feedback configurations (Pm-1, ..., P0) yield maximum length sequences, regardless of the initial bit pattern/state (!). For any LFSR with a set number of flip-flops, m, and a given positioning/number of XOR gates, no matter what the initial state is, the period will always be the same.

This is why the given value is the maximum possible period of an LFSR - only some reach it.

There are patterns within the LFSR, specifically that of the positioning of the XOR gates with respect to the degree m, that can be used to determine if an LFSR will produce a maximum-length period or not - see Rule 34 for more information. These patterns require knowledge of the Polynomial Representation of LFSR's, described in Rule 32.


# C. Rule 32. The only string of bits that can never appear in an LFSR is straight 0's. Such a thing is only possible if the initial setting is all straight 0's, which is a ridiculous thought not to be entertained.


# C. Rule 33. Polynomial Representation of LFSR's:

All that is needed to represent the nature of an LFSR (devoid of the initial state) is the degree m (the number of flip-flops), and the feedback coefficients. These characteristics can be represented in simple polynomial form:

P(x) = xm + Pm-1 × xm-1 + ... + P1 × x + P0

m = The number of flip-flops.
x = The simple polynomial variable, nothing to be plugged in.
Pm-1 = The feedback coefficient of the very first multiplier function (and thus, the flip-flop it represents) in the sequence. Pm would be to the left of the first flip-flop, along the very end of the feedback path, and thus is always 1 and is unnotated. The sequence Pm to P0 represents all multiplier functions, left to right.

For example: The LFSR depicted in Rule 28 would be represented as x³ + x + 1.

Even if there were no XOR gates in the entire LFSR, then the equation would still be xm + 1, representing the ending and beginning points of feedback path.

What you are really looking at when you see a polynomial expression of an LSFR is the degree of the first term, and all of the terms in between the first term and the last term, because there is always going to be a +1 at the end. Otherwise, the output would not be incorporated into the feedback path, which is utterly preposterous and is never seen in the real world.

There are patterns in LFSR polynomials that can be used to determine whether the LFSR has a maximum-length period or not, described in Rule 34.


# C. Rule 34. Deriving Maximum-Length Periods from LFSR Polynomials:

Only LFSR's with primitive polynomials (see definition) yield maximum-length sequences.

Below is a table showcasing an example primitive polynomial for each m value 2-128. Note that there are multitudes of primitive polynomials for any given m value, too many to produce a complete list. There are 6.92 × 10⁷ primitive polynomials for an LFSR of m 31.

In the table below, take the values as representing the degrees in reverse order: x^0 + x^1 + x^2, for example. This ensures the +1, a necessary component of any LFSR polynomial as explained in Rule 33.

A table showcasing an example primitive polynomial for each m value 2-128.


# C. Rule 35. HOW TO CRACK A LINEAR FEEDBACK SHIFT REGISTER (LFSR):

While Linear Feedback Shift Register's have the properties of relative randomness and are exceedingly efficient (with 100 flip-flops, a miniscule amount of space in hardware, producing a period of 2¹⁰⁰ - 1), it fails on the last hurdle: unpredictability.

By its very nature, an LFSR is a deterministic means of producing a key stream, thus opening it up to vulnerabilities that can be exploited. The main principle that must be acknowledged in attempting to 'break' or 'crack' any stream cipher that utilizes an LFSR is as follows:

If an attacker knows (at least) 2m output values of an LFSR, he can recover the entire LFSR configuration.

Oscar, as always knows the complete ciphertext string. Furthermore, he knows the degree m of the LFSR, and, most damningly, he knows the first few plaintext characters: x0, ..., x2m - 1. With the header information, it is still realistic for that amount of plaintext to be known to the attacker.

Given this knowledge, Oscar simply has to follow a 3-step process in order to recover the entire key:


Step 1:
Utilizing the definition of the encryption function of a stream cipher, the key stream at position i can be isolated:

i ∈ 2m

Si ≡ (xi + yi) mod 2.

Thus, Oscar is able to recover the first 2m output bits (the output/keystream bits) of the LFSR, which is not enough: the maximum possible period is 2m - 1, so there remain a considerable number of output bits that need to be recovered.


Step 2:
In order to discover the remaining bits, the feedback coefficients of the multiplier functions must be discovered. This is done using the generalized summation equation of the LFSR. Expanding out the summation will produce the following equation:

Sm ≡ (Sm-1 × Pm-1 + ... + S0 × P0) mod 2

The only unknowns in this equation are every single feedback coefficient. Thus, there are a total of m unknowns that need to be solved for. In order to solve for the total of m unknowns, a system-of-equations type problem, with m equations, must be created:

Sm ≡ (Sm-1 × Pm-1 + ... + S0 × P0) mod 2

Sm+1 ≡ (Sm × Pm-1 + ... + S1 × P0) mod 2

.
.
.

S2m-1 ≡ (S2m-2 × Pm-1 + ... + Sm-1 × P0) mod 2

As was done for Linear Congruential Generators, each unknown can easily be solve for by Gaussian Elimination ([[[[[) or Matrix Inversion ([[[[[[[[).


Step 3:
Now that Oscar knows all of the Feedback Coefficients, he can build the LFSR himself.

He can determine the initial state using the output bits that are already known (using the Diagonal Rule, see Rule 28), and from there he will be able to compute the full keystream, S2m+1 and beyond.

With the keystream, Oscar will now, of course, be able to recover the full plaintext:

xi ≡ (yi + si) mod 2



III.IV Alternate Secure Stream Ciphers.

# The Development of Modern Stream Ciphers:

During the 1990s, stream ciphers were a center of attention for many cryptanalysts, who ruthlessly revealed vulnerabilites and weaknesses that opened particular popular ciphers of the time - this would fuel the predominance of block ciphers in modern cryptographic communications.

As a result, the eSTREAM project was established in 2004 by the "European Network of Excellence in Cryptology" (an E.U. front), which called for the development (and expert selection) of stream ciphers.

By 2008, several new, highly secure stream ciphers were developed, divided into two main categories/profiles:

1. Stream ciphers designed for implementation in software with high-volume data transfer requirements. Examples: Salsa20 (Rule 36) & ChaCha (Rule [[[[).

2. Stream ciphers designed for hardware implementation, with low power consumption, storage capabilities, and gate counts. Example: Trivium (Rule [[[[).


# Add-Rotate-XOR (ARX) Ciphers: An algorithm in a cryptosystem that only uses additions, rotations, and XOR operations to generate its keystream.


# "Nonce": A number used only once.


# C. Rule 36. Salsa20:

Mathematical Definition:

[[[[[To be completed at a later date. I have determined this topic: "Salsa20", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[

Explanation:

"Salsa20" is a family of profile-1 stream ciphers, developed in 2005. The PRNG utilizes a 32-bit ARX framework, with additional layers of security added through increased 'rounds' of encryption: the base cipher (Salsa20/20) uses 20 ciphers, while reduced-round, faster varients use 12 rounds (Salsa20/12) and 8 rounds (Salsa20/8), respectively.

Salsa20/20 (hereafter referred to as "Salsa20") supports key lengths of either 128 or 256 bits, though 256 is recommended. [[[[[To be completed at a later date. I have determined this topic: "Salsa20", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[




# C. Rule 37. [[[[[Reserved for Future Usage. If additional rules are needed, then the WoO system will be used.[[[[[ [[[[[To be completed at a later date. I have determined this topic: "Salsa20, ChaCha, and Trivium", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[




# C. Rule 38. [[[[[Reserved for Future Usage. If additional rules are needed, then the WoO system will be used.[[[[[ [[[[[To be completed at a later date. I have determined this topic: "Salsa20, ChaCha, and Trivium", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[




# C. Rule 39. [[[[[Reserved for Future Usage. If additional rules are needed, then the WoO system will be used.[[[[[ [[[[[To be completed at a later date. I have determined this topic: "Salsa20, ChaCha, and Trivium", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[






IV. Block Ciphers.

IV.I Historical Background.


# C. Rule 40. Block Ciphers:

A cipher that encrypts a block of 'b' plaintext bits all at the same time, where b is greater than one (since one bit would be a stream cipher). Thus, the encryption of any plaintext bit in the block depends on the nature of every other plaintext bit in the block, like the positions of celestial bodies in a galaxy.

Most block ciphers have a block length of 128 bits (16 bytes), such as the AES (see Rule [[[[[), or 64 bits (8 bytes), such as the DES and 3DES (see Rules 43 & [[[[, respectively). For each input, a block cipher will produce an equivalent amount of output data: in the case of DES, this will be another 64 bits.

Block Ciphers are more commonly used than Stream Ciphers in modern computer encryption schemes, due to the stream cipher mass extinction event caused by relentless attacks by cryptanalysts in the '90s, and the mass adoption of block ciphers that followed.


# Look-up Table: A Substitution table - an algorithm in which every input is paired with a particular value/address, and has that value outputted. It is not a requirement that the number of output bits match the number of inputted bits - in the DES Feistel cipher, for example, table S1 produces 4 output bits from 6 input bits.


# Product Cipher: A generic term used to describe any cipher that utilizes/concatenates the methods of permutation and repeated encryption (in "rounds") to create the element of "diffusion" (see Rule 41) necessary for any block cipher.


# C. Rule 41. Confusion & Diffusion:

During the postwar Cryptographic research boom, stemming from the rush of funding for military science, it was postulated in a classified paper that there were two atomic operations that all good block ciphers should perform:

  1. Confusion:
    The relationship between the plaintext and ciphertext must be obscured.
    Any cipher that scrambles its characters thoroughly enough that the plaintext and ciphertext bare no resemblance to one another, passes this requirement.

    However, the sole utilization of this requirement and the abandonment of the second, as was done with the Substitution and Shift Ciphers, will produce a very unsecure cipher.


  2. Diffusion:
    The influence of each plaintext bit is spread over many ciphertext bits.
    In systems making good use of diffusion, changing one bit of plaintext results in an average change of half of the total output bits, known as the Avalanche effect and akin to chaos theory. Example: Permutations of an encryption method to make a cryptosystem more secure.

    This element is impossible under stream ciphers, since each bit is processed and encrypted individually.
In order to make a truly "secure" cryptosystem using block ciphers, you need to combine both confusion & diffusion elements into the algorithm many times over, in the form of a product cipher.



IV.II Feistel Networks.

# C. Rule 42. Feistel Ciphers:

Mathematical Definition:

i ∈ {1, ..., 16}

Li = Ri-1
Ri = (Li-1 + f(Ri-1, ki)) mod 2



i = The particular round, minus 1, that the left and right tranches of the data are at. L0, for example, would be in the first round of the cipher. Since i has a minimum value of 1, the first round that can have the equations utilized on is round 2, since there is no past round data to feed into round 1.

Li = The left tranche as it is in round i-1. If not in the first round, it will inherit the unencrypted form of the previous right tranche as its starting value in the round.

Ri = The right tranche as it is in round i-1. If not in the first round, it will inherit the XOR-ing of the left tranche of the previous round with the encrypted right tranche of the previous round as its starting position.

Ri-1 = The right tranche of the round prior to the current one, used by the f function (see below).

f = The mysterious encryption function, the central part of the entire Feistel scheme. Uses the right tranche and the subkey of a particular round to make an encryption. Explained in Rule 46.

ki = The specific subkey used in the round. Each subkey is derived from the main key, explained (for DES) in Rule [[[[[.

mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.


Explanation:

A Feistel Cipher/Feistel Network/Feistel Structure is a round-based algorithm used to encrypt data. Widely used among block ciphers (though not in the AES), several ciphers developed more optimized versions of the network, such as the DES. The original, standard, unadorned version of the Feistel Network is depicted below, while the DES-enhanced version can be seen in Rule 43.

A diagram of an individual round of the Feistel Network, almost totally unadorned of any additional characteristics specific to the DES system - see Rule 43 for that. The given values and lengths, such as the 48-bit subkey, are all specific to the DES varient.

The bulk of the modern ciphers currently in use were built using feistel ciphers and their varients. A particular advantage of the Feistel Network is that decryption and encryption are almost the same operation, simply using a reversed key schedule (see Rule [[[[[[[[). In terms of importance/usage in the cipher structure, Feistel Networks are the block cipher equivalent of the LFSR, although it actually encrypts the data and requires a key, instead of merely producing the key (like the LFSR).

Devoid of the more specialized characteristics (particular those relating to DES, described in section IV.III), the system by which the "standard", unadorned Feistel Cipher encrypts data is simple:
  1. The data path will be split into two parts: a Left and a Right tranche, L0 and R0 respectively. For a 64 bit cipher (like DES), each part will have 32 bits.

  2. The Right tranche will have its bits fed into the f function, which is the actual encryption function of the entire operation.

    Additionally, a subkey will also be fed into the f function: In each round of encryption, a new subkey derived from the main key is used, thus keeping the process deterministic and replicable, but complicating the process so much that discovery of the plaintext from knowledge of the ciphertext is made (most likely) impossible.

    In the DES cipher, which differs from the more generalized standard feistel network described here and shown in the graphic, the subkey is created using a key schedule. See Rule [[[[[ for an explanation of how this is carried out.

    The length of the subkey does not have to be exactly as long as tranche - in the DES, the subkey is 48 bits long, the purpose of which is explained in Rule [[[[[[[.

  3. The output of the encryption of the right tranche will then be XOR'd with the left tranche (which had gone untouched until now).

  4. For each round, the output of the previous round is fed in as input to the next round. The trick at this juncture is that the tranches will switch their positions in the new round: the right tranche will become the new left tranche (L1), and the left tranche will become the new right tranche (R1).

    IMPORTANT NOTE: The Right tranche is sent to the left tranche of the next round in its unencrypted form. The Right tranche passes through unaltered to the left tranche; since the Left tranche of the first round got XOR'd with the encrypted version of the Right tranche, IT is the one being "encrypted" or simply changed in any meaningful way, not the right tranche.


IV.III Data Encryption Standard.

# Bijection: The mapping of every possible input uniquely to exactly one output, and vice versa. The DES (Rule 43), for example, bijectively maps a 64-bit input block to a 64-bit output block using the principle of diffusion (Rule 41): a difference in a single input bit will change the entire output block, thus making the blocks themselves bijectively mapped, but not the bits.


# C. Rule 43. Data Encryption Standard (DES):

Mathematical Definition:

Explanation:

The Data Encryption Standard is a block cipher that encrypts 64 bits at a time, outputting 64 bits of ciphertext in return, through bijective mapping (see definition). As a product cipher, DES uses 16 encryption rounds of an optimized form of the Feistel Network (depicted below) to thoroughly obfuscate and diffuse the impact of each bit, under the known elements of best-practice block ciphers (Rule 41).

A diagram of an individual round of the DES-specific optimized version of the Feistel Network, complete with a representation of how the subkey is formed from a transform of the main key.

The keysize of DES is 56 bits (and thus has a keyspace of only 2^56), highly crackable, and so it is not in popular use today - it has been largely replaced by other ciphers with longer keys, such as 3DES ([[[[[) and AES ([[[[[). Still, it remains useful and neccesary to study the design of DES, which is modern cryptography in its embryonic state. The subkeys used in DES for the Feistel Network are only 48 bits - see Rule [[[[ for why that is.


# C. Rule 44. History of the Data Encryption Standard (DES):

Proposed in 1974 by IBM (in coordination with the NSA), DES was the cryptographic standard for the U.S. government between 1977 and 1998, making it the most studied cipher in the world.

The DES was the first ever "secure" cryptosystem made public by a national government under the postulate of Kerckhoffs's principle (Rule 3).

Today, there are still some varients of DES that see application: notably, 3DES, which was used in Firefox until 2021. While the original DES, as developed in 1974, is unsecure from Brute-Force attacks due to the short length of its key (56 bits), 3DES has a very long key, and as such is immune to such attacks.


# C. Rule 45. DES Internals #1 - Bit Permutation:

The first characteristic one may note about the DES Feistel Cipher (as depicted in the graphic of Rule 43) is that the operation undergone by data travelling through the cipher is IP(x), the Initial Permutation function. Undepicted is the very last operation in the encryption process, at the bottom of the cipher (which is 16 rounds deep), where the IP(x)⁻¹, the Final Permutation, would be located.

IP(x) and IP(x)⁻¹ are simple bit permutations, or bitwise permutations: the bits are scrambled and assigned to different placements in the data than they were before. This just the algorithmic term for a crosswiring. IP(x)⁻¹, being the inverse of IP(x), perfectly undoes the scrambling of IP(x), cancelling eachother out.

The purpose of the DES bit permutation functions is simple: while serving no cryptographic purpose (since the attacker can simply take the permutation into account, and because it is undone anyway), when DES was designed in 1974, the application of the cipher into hardcare necessitated cross-wiring (physical bit permutation) just to allow the data to enter the chip. The specific bit permutations used for the initial and final permutations (both depicted below) are done in order to, according to leading experts on the issue, "arrange the plaintext and ciphertext bits in a bytewise manner to make data fetches easier for 8-bit data buses", widely used in the '70s.

The Initial and Final Permutations in the DES cipher.

Thus, the bit permutations were merely placed in the DES for practical, electrical engineering purposes, devoid of cryptographic meaning. DES was initially designed purely for hardware, and the electrical engineering-devised permutation (no gates or special operations) made it very easy to be fabricated in hardware form, but somewhat tedious and more time-consuming in software.


# C. Rule 46. DES Internals #2 - The 'f' Function:

The f function, the heart of the Feistel Network (and thus the DES), acts similarly to the a PRNG, using the two input parameters Ri-1 (32 bits) and ki (48 bits) to "encrypt" the right tranche. Below are the complete contents of the f function:

The complete F function of the DES-specific Feistel Network.

  1. The first operation that will be performed will solely involve the first parameter: Ri-1. In order to account for the differences in length between the right tranche and the subkey, the right tranche will be expanded from 32 bits to 48 bits, using the special "Expansion function" or "Expansion box", also referred to as simply the "E-box". Notably, this function will provide diffusion to the cipher, as its effects (as described below) will increase the significance & influence of each individual bit of the plaintext with every additional round of encryption.

    While the E-box also performs a basic permutation on the inputted bits, it differs from a standard bit permutation in that some of the inputted bits are connected to two bit positions, creating values for an expanded table. With the initial 32-bit string of the right tranche, 16 bits of data will be connected only once, while the other 16 bits will be connected twice. An example of how this can be done is seen below.

    The complete F function of the DES-specific Feistel Network.

  2. With the newly 48-bit right tranche, an XOR operation can be performed between the tranche and the subkey (which has its own special means of derivation, explained in Rule [[[[[[). This produces a 48-bit string that incorporates data from both parameters, and now all that is left to be done is to shrink the string back down to 32 bits so that it can be XOR'd later on with the data from the left tranche (outside of the f function).

  3. After the XOR, the most important part, the heart of the f function, is performed: the S boxes. Complementary to the diffusion of the E-box, the S-box will provide the confusion.

    The pattern the S-box follows is simple, and is shown in the complete diagram above: The newly-created 48-bit string will be split into 8 tranches of 6 bits. Each tranche will be sent to a Substitution Table, S1 through S8.

    These tables will be akin to compression functions/compression boxes, as they will produce less output than input: Each table will output 4 bits from an input of 6, thus resulting in a net total of 32 bits, exactly what is needed.

    The Compression Algorithm works as follows: Since each table accepts 6 bits, which are read in their combined, binary form, the table is simply taking the given, 6-bit value (0-63) and replacing it with a 4-bit value (0-15). The chosen 4-bit replacement value for each possible 6-bit value is assigned according to a set, predetermined substitution table, the purposeful and exact design of which was done to ensure protection against Differential Cryptanalysis ([[[[[[[[), which researchers at IBM discovered decades before the academics of the late '80s. For information as to what exact criteria the S-boxes were designed with in mind, see the relevant blue section below.

    Note that EACH Substitution Table, S1 through S8, has its own, individual, substitution table. [[[[[[ The S1 table is shown below; the rest of the tables can be found on page 3 of this here document.

    The specific, exact table used in the first S-Box to replace the 6 bit string with a (max) 4 bit one. Chosen by IBM & the NSA to be differential cryptanalysis-proof.

    In order to read the table, you must take the 6-bit value and examine it as follows:

    For the Column Select, take the middle 4 values (the internal, non-edge values) and find their individual value in binary, separate from the rest of the string. This will give you a value 0-15.

    For the Row Select, simply combine the left and right edge values of the 6-bit value in a single binary number, 0-3. This will serve as the row value.

    All that needs to be done to find the 4-bit output (represented in decimal in the given table image, and all others) is to locate it on the table using the known column and row values. The value returned from properly finding the value from the specific column and row values.

  4. All that remains within the f function is a simple bit permutation, in the same fashion (though different permutation patterns) as the one at the top and bottom of the DES cipher (described in Rule [[[[[).


# Most Significant Bit (MSB): The leftmost bit in a binary number - the largest value representation in the binary place value system. This varies from system to system.


# Least Significant Bit (LSB): The rightmost bit in a binary number (1, from 2^0) - the smallest value representation in the binary place value system.


# Design of DES S-Boxes:

The design of the S-boxes in the DES cipher, both the mechanism for determining the output and the tables of outputs themselves, was done by the NSA and IBM under the following specific criteria, ranging from the obvious to the esoteric:

  1. Each S-box must have six input bits and four output bits, as to have an end total of 32 bits (to allow for XOR-ing with the left tranche).

  2. No single output bit should be too close to a linear combination (Linear Algebra - [[[[[[) of the input bits.

  3. If the lowest and the highest bits of the input are "fixed" (e.g., constant while the four middle values cycle through all possible incarnations) and the four middle bits are varied, each of the possible 4-bit output values must occur exactly once.

  4. "For any nonzero 6-bit difference between inputs, no more than 8 of the 32 pairs of inputs exhibiting that difference may result in the same output difference."

    If you know what the above words mean, since they seem (atleast to me) to be complete and utter nonsense, please contact me at jdlacabe@gmail.com. You will be credited (if you wish) on this page.

  5. A collision (identical output values, known as a "Zero Difference" (since XOR-ing the values = 0)) of the outputs of the eight S-boxes must only be possible for a maximum of three adjacent S-boxes. Two adjacent colliding S-boxes would thus be fair game.

    If it were possible for all 8 boxes to have a zero difference, then the S-box design would have a highly predictable structure.


  6. Diffusion Requirements:

  7. DIFFUSION: If two inputs to an S-box differ by exactly one bit, their outputs must differ by atleast two bits.

  8. DIFFUSION: If two inputs to an S-box differ in two middle bits, their outputs must differ by atleast two bits.

  9. DIFFUSION: If two inputs to an S-box differ in their first two bits, but are identical in their last two bits, the two outputs must be different.