Symmetric Cryptography Summary (Full)

Summary of Symmetric Cryptography (Complete)

These are my complete notes for Symmetric Cryptography, covering such topics as Keystreams, Modular Arithmetic, Integer Rings, Shift Ciphers, Affine Ciphers, Stream Ciphers, Block Ciphers, the Data Encryption Standard (DES), the Advanced Encryption Standard (AES), and more.

I color-coded my notes according to their meaning - for a complete reference for each type of note, see here (also available in the sidebar). All of the knowledge present in these notes has been filtered through my personal explanations for them, the result of my attempts to understand and study them from my classes and online courses. In the unlikely event there are any egregious errors, contact me at jdlacabe@gmail.com.

Table Of Contents

I. Introductions

I.I Classification & Terminology.

I.IV Cryptanalysis Basics.

I.II Symmetric Crypto. Basics.

I.V Generic Analytical Attacks.

I.III Key Terminology.

II. Historical Ciphers & Modular Arithmetic.

II.I Substitution Cipher.

II.IV Shift/Caesar Cipher.

II.II Modular Arithmetic Basics.

II.V Affine Cipher.

II.III Integer Rings.

III. Stream Ciphers.

III.I Intro to Stream Ciphers.

III.III L.F.S.R.'s.

III.II Random Number Generation.

III.IV Alt. Secure Stream Ciphers.

IV. Block Ciphers.

IV.I Historical Background.

IV.III Data Encryption Standard.

IV.II Feistel Networks.

I. Introductions

I.I Classification & Terminology.

# Fig. 1. - Cybersecurity Classification Hierarchy:

A Classification Hierarchy of the major components of Cryptology. For a further classification of Cryptanalysis, see Rule 6, and III.I for Symmetric Cryptography. The work of the mastermind Jonathan Lacabe himself.

# Cryptology/Cybersecurity: IT Security - the protection of digital information against misuse, incorporating technical, organizational, and implementation-specific aspects. This is the modern, digital theater of the much greater field of 'Security'.

All IT Security follows the CIA Triad: Confidentiality, Integrity and Availability of information. Security is largely focused on protecting against attackers, while system safety & reliability is focused on protecting against random technical failures.

# Cryptography: A sub-field of Cryptology;

The science of securing communication through encryption, especially against the cryptanalysis efforts of an adversary. Cryptographic algorithms are the bedrock of all cybersecurity systems - if cybersecurity is a car, then Cryptography is the engine.

# Cryptanalysis: A sub-field of Cryptology;

The reductive counterpart to Cryptography, the science of 'breaking' encryption and bypassing the cryptographic security. Though it is the medium of hackers and cybercriminals, it is also a serious scientific field for researchers to test the security of cryptosystems, i.e. systems established using Cryptography (see definition). This is the only to absolutely ensure the security of the cryptosystem - remember Schneier's Law.

# Cryptosystem: An application/implementation of a set of cryptographic algorithms (incorporating encryption, decryption, and other mechanisms).

# Symmetric Algorithms: A sub-field of Cryptography;

The classic form of Cryptography - two parties hold a secret key, enabling one party to encrypt a message and the other party to decrypt it. Thus, the same key is used for both encryption and decryption.

Until 1976, this was the only form of Cryptography in existence, and describes the basic nature of all historical ciphers (such as the shift/caesar cipher and affine cipher) and Stream Ciphers, Block Ciphers, and DES & AES. Although it is indeed the classic form of Cryptography, it is still highly relevant today, as AES remains an industry standard even into the quantum age.

Hash functions are somewhat similar to symmetric algorithms, though they can be considered an independent, third type of Cryptographic algorithm.

# Asymmetric/Public-Key Algorithms: A sub-field of Cryptography;

The great cryptographic breakthrough of the 20th century, Asymmetric Cryptography (a.k.a. "Public Key Cryptography") is the backbone of most modern cryptosystems and is what much of the infrastructure of the Internet (protocols, in particular) is based on.

Invented by the Diffie-Hellman team in 1976, Public-Key Cryptography functions by having each party have TWO keys, a private key and a public key.

For more detail on how these algorithms work, see the "Public-Key Cryptography" Subcategory ([[[[[), but know they are used in a variety of fashions: digital signatures, key establishment/management, and more!

# Protocols: A sub-field of Cryptography;

Collective cryptographic algorithms that serve a complex security function, like a library of algorithms that work together for a common goal. For example - the Transport Layer Security scheme (TLS) and the Hypertext Transfer Protocol (HTTP) are used in every web browser.

In the words of Edward Snowden, "[The Internet of Things and countless] protocols have given us the means to digitize and put online damn near everything in the world that we don’t eat, drink, wear, or dwell in."

# C. Rule . In Cryptography, 'Primitives', 'Ciphers', and 'Algorithms' all refer to different parts of the same process: the components of a cryptosystem at different scales, algorithms being the largest scale component and primitives being the smallest.

Primitives are the basic cryptographic functions (building blocks of an algorithm), Ciphers are the implementation of an encryption scheme, and Algorithms are the combined procedures of primitives and ciphers to create the greater cryptosystem.

For example, a well-formed algorithm may incorporate the AES cipher (see Rule [[[), which uses the key schedule primitive (see Rule [[[) to assist in both encrypting and decrypting the data.

# Channel: A transmission medium for data to pass through. Examples include the Internet, airways, and Wi-Fi. Securing transmitted data from being intercepted, and from being intercepted in any meaningful form, is the goal of Cryptography.

# Hybrid Schemes: The combined usage of both asymmetric and symmetric algorithms in a cryptosystem, since both types have their own strengths and weaknesses (See [[[).

I.II Symmetric Crypto. Basics.

# C. Rule . Symmetric Cryptosystem:

The ancient problem presents itself: How can two parties communicate over an insecure channel (an 'open channel') without having their communications intercepted by a third-party?

Say Alice and Bob connect through the internet (which is an open channel due to the potential of package rerouting/interception) and transfer data. An opponent, Oscar, can read their communications by intercepting the data before it reaches Bob.

A diagram showcasing how communications between Alice and Bob, when passing through an insecure channel unencrypted, can be intercepted by the attacker Oscar.

To stop Oscar from reading the message, an encryption algorithm can be established, converting the plaintext message (x) into a ciphertext message (y), and then sending it through the insecure channel. Oscar would only see a stream of random characters/bits as a result of the encryption. Bob, using a decryption function, would convert the ciphertext back into plaintext, thus completing the data transfer without interception.

In order for Bob to not simply decrypt the ciphertext immediately using the decryption function (which, counterintuitively, should actually be made publicly available - see Rule 3), a Key only known to Alice and Bob (sent through a secure, impenetrable channel) must be used as a parameter in both the encryption and decryption functions.

The key must be inputted into the encryption function to influence the manner in which the plaintext is encrypted into cyphertext. Thus, only with the key will Bob be able to decrypt the Ciphertext. Oscar, who doesn't know the key, will be unable to decrypt the message, even if the decryption algorithm is public.

A diagram showcasing a symmetric cryptosystem, complete with a key sent through a secure channel and encryption/decryption functions.

x = Plaintext message - the unadulterated, original content of the message.

y = Ciphertext, which looks like scrambled characters to an interceptor like Oscar.

e = Encryption function, a mathematical formula that converts x into y.

d = Decryption function, a mathematical formula that converts y into x.

K = The Key fed into the encryption and decryption functions.

|K|, 𝒦 = Keyspace, the total number of possible keys in a cryptosystem. This matters significantly with regard to brute-force Cryptanalysis (see Rule 6), and entails other characteristics detailed in Rule 4.

# C. Rule . Keeping the encryption/decryption algorithms 'e' and 'd' secret, preventing their cryptanalysis by the enemy (known as Security by Obscurity), was standard procedure for the ~4000 year history of Cryptography prior to the discovery of public-key Cryptography. However, the only way of ensuring the security of the algorithms is to make them public so that they can be analyzed by cryptanalysts.

This is the central principle of a foundational law in Cryptography, postulated in 1883 by Auguste Kerckhoffs:

Kerckhoffs' Principle:

"A cryptosystem should be secure even if the attacker (Oscar) knows all the details of the system, with the exception of the secret key."

In Practice: Never use an untested Crypto algorithm! Furthermore, never roll your own crypto!

# End-To-End Encryption (E2EE): A system of encryption widely used in modern communication services (like internet messaging sites), in which all data sent from Alice to Bob (party #1 to party #2) is never decrypted at any point along its route until it reaches Bob.

Thus, all parties intercepting/eavesdropping will be unable to read or manipulate the message, even if they control any of the base stations the message goes through.

An example of how communications between Alice and Bob would occur under a end-to-end encryption scheme.

I.III Key Terminology.

# Binary:

Binary is a crucial element of how most (and practically all modern) cryptosystems work, and several phrases commonly used in cryptographic discussions directly reference this inherent nature. As such, let it be known: all references to 'bits' of any kind are made in regard to 1s and 0s.

# Bit-length/Key-length: Bit-length is a remarkably simple concept that hinges on comprehension of binary. Literally, the bit-length is how many bits are required to represent a given number.

The bit-length of the key of a cryptosystem, which is the context under which the term 'bit-length' is most oftentimes used, is the number of bits necessary to represent the number of values within the keyspace. This specific use-case of bit-length is known as the key-length of a cryptosystem. Note how key-length is merely a more specific/narrowed application of bit-length, just like how keybits (see definition) are a more specific type of bit. However, they can NOT be used interchangeably, for reasons outlined below.

For example, a keyspace of 2^3 (8 values) would have a bit-length of 4, since 8 is 1000 in binary, and would have a key-length of 3, since exactly 3 bits would be required to account for every value within the keyspace: 000 through 111 represent a total of 8 values, reaching a highest value of 𝒦-1. This is essentially just the bit-length of the number, minus 1, as a result of the counting of the number of values instead of the number itself.

This does not mean that each individual value requires three bits to represent it; the value of 1 only "requires" 1 bit to represent it, for example. The key-length only really means the total number of bits necessary to represent 𝒦-1, which thus accounts for all of the values before it (through the nature of binary representation).

The key-length is only relevant, however, if the cryptosystem uses binary representation, as detailed in Rule 4. The number of bits necessary to represent each value of the keyspace is meaningless if there are no 'bits' involved in the cryptosystem, as is the case for the Shift Cipher (Rule 18), which has a keyspace of 26 but no meaningful key-length. Somewhat ironically, bit-length does not require a bit-based system to have a meaningful result, as it merely denotes the process of converting a value to its binary form.

Mathematically, this means that to encode each possible key in a keyspace with bits, you need a key-length of ATLEAST ⌈log₂ 𝒦⌉. See Rule 4 for more information.

# Keybits: The binary bits in the key used in a cryptosystem. A key-length, for example, is composed of keybits. In Cryptography, a "256-keybit key-length" is regarded as secure against brute-force attacks (see Rule 6).

# Bits of Entropy/Key Entropy: A term frequently used in cryptography, essentially just meaning a number that denotes the degree of security a cryptosystem has with respect to its keyspace. It has the equation ⌈log₂ 𝒦⌉, where 𝒦 is the keyspace of a cryptosystem (the number of possible keys). For example, if a cryptosystem has 2^100 keys, then that cryptosystem has 100 bits of entropy.

It really is just a term used to flex security against brute-force attacks (Rule 6), not in reference to anything tangible other than the size of the keyspace itself - it is a self-referential term, actively disguising useful information behind additional artificial buzzwords.

It happens to share the same equation as the key-length (as detailed in Rule 4 below), but note that "bits of entropy" is not a term chained to any particular requirement and can thus be applied to any cryptosystem, while the concept of a key-length has several prerequisites that limit its usage. This is in spite of the fact that "bits of entropy" uses the word "bit" - it, like bit-length (but unlike key-length, as distinguished in the definition), can be reasonably applied in non-bit-based cryptosystems.

# C. Rule . Keyspace Mathematics:

In Symmetric Cryptography, there is an inherent relation between the keyspace of a cryptosystem (|K| or 𝒦), and the key-length of the key (L) that the cryptosystem uses. It is fairly simple and intuitive, but has some key caveats.

If the algorithm uses a 168-bit key, then there are 2^168 possible keys. This is because the keyspace is the number of possible values that could be represented by the bits of the key-length - a key-length of 5 would have 32 possible values, i.e. every value between 00000 and 11111. Mathematically, this is represented using either of the following equations:

𝒦 = 2ᴸ
L = ⌈log₂ 𝒦⌉

These two equations are equivalent, with the second one derived by taking the log of both sides of the first. The ceiling function (which rounds to the closest greater number) is applied to the logarithm to ensure that the key-length is an integer. Of course, the relation only holds insofar that all keys of length L are valid and equally likely to occur, which rules out a small number of symmetric cryptosystems with selective application of their keys/keybits, like DES - see Rule [[[.

There are some limitations that narrow the applicability of this relation - in effect, it only really applies to modern symmetric cryptosystems (unless specified otherwise, like the aforementioned DES). First, all ciphers that do not explicitly have a key-length, such as those that operate on letters instead of bits (e.g., every historical cipher in Section II), do not qualify. A cipher must structure its keys in the form of binary strings with fixed lengths in order for the relation to be applicable.

As stated in its definition, the very concept of a "keybit" is dependent on the fact that the cipher is using a key made from binary bits - any ciphers that do not meet that requirement are instantly null in regards to the given relation.

For example, the Shift Cipher (see Rule 18) has a keyspace of 26, but that in no way means that the "key-length" of the cipher is log₂ 26 (though it will indeed have log₂ 26 "bits of entropy"). For non-bit cryptosystems, there is no real key-length, but these ciphers are outdated nonsense cryptography that are not to be taken seriously anyways.

Furthermore, this relation does not hold for asymmetric cryptography: There is a significant difference between symmetric keys and asymmetric keys: a 128-bit symmetric key provides roughly the same security as a 3072-bit RSA (asymmetric algorithm) key. For information as to why this is, see Rule [[[.

I.IV Cryptanalysis Basics.

# Fig. 2. - Cybersecurity Classification Hierarchy:

A Classification hierarchy of Cryptanalysis. There are two main branches of Classical Cryptanalysis, Brute-force and Analytical Cryptanalysis. See Subsection I.I for the full classification and III.I for Symmetric Cryptography.

# C. Rule . There are multitudes of cryptanalysis techniques, and the crucial law is that if just one attack works, then the entire cryptosystem crumbles and fails, regardless of how secure it is against other attack types.

For example, while a substitution cipher (see Rule 8) may be impenetrable to a brute-force attack (Rule 6) with its 2^88 keyspace, it collapses against letter frequency analysis (also described in Rule 8).

In order for a cryptosystem to be considered secure, it must be resistant against every single type of attack. An attacker always looks for the weakest link in the cryptosystem. Thus, in addition to strong algorithms, safeguards against social engineering and implementation attacks (see definitions below) must be instituted.

# Classical Cryptanalysis: A sub-field of Cryptanalysis (see Fig. 2);

The basic, algorithm-focused approach to cryptanalysis in which you analyze the inputs and outputs to probe a viable attack vector. This is the "real" cryptanalysis in the eyes of the cryptography snobs, in which you try to poke holes in the design of the cryptosystem itself.

# Social Engineering: A sub-field of Cryptanalysis (see Fig. 2);

The process of bypassing the protections of a cryptosystem by going directly after the humans that have access to the secret key. Obtaining the key can range from kidnapping and forcing them to tell the key/password, to a simple phishing scheme over the phone or through email.

# Implementation Attacks: A sub-field of Cryptanalysis (see Fig. 2);

Extraction of the key through 'side-channel analysis'. By observing the behavior of the implementation of the cryptosystem in an IC or software, it is possible to deduce important information relating to the key.

For example: By looking at the electrical power consumption or electromagnetic radiation of a CPU running a cryptographic algorithm, a signal processing technique can be used to recover the key (see E.E. [[[[[). The runtime behavior can also indicate information regarding the key, which is why many cryptographers ensure constant runtimes in embedded cryptosystems.

These attacks are most relevant when the attacker has physical access to a piece of hardware running the cryptosystem, such as a credit card.

# Attack Vectors: The many possible ways to attack a cryptosystem; the types of cryptanalysis that can be used, including (but not limited to) those shown in Fig. 2.

# Moore’s Law: The computing power of the strongest computers (e.g., the number of transistors in an integrated circuit) will double every 18-24 months while the price will remain constant. This means that computing power, growing exponentially over time, will continually pose more and more of a threat to modern cryptographic systems and to antiquated ones still in use, as it becomes cheaper and faster to break them.

This is especially relevant for cryptanalysis requiring intensive computing power, such as brute-force attacks (see Rule 6 below).

# C. Rule . Classical Attack #1: Brute-Force Attack/Exhaustive Key search:

Mathematical Definition:

Let (x, y) denote the pair of plaintext and ciphertext, and let K = {k₁ ,..., k_κ } be the keyspace of all possible keys k_i. A brute-force attack now checks for every k_i ∈ K whether d_{k_i} == x.

If the equality holds, a possible correct key is found; if not, proceed with the next key.

Explanation:

The testing of all possible keys in a given keyspace with the decryption function, to find the key that will produce the plaintext. This is akin to thinking of the cryptosystem as a black box, in which the only significant factor to decoding an encrypted message is the number of possible keys the message could have been created with.

As shown in the illustration below, different keyspaces take different amounts of time to crack using brute-force. As such, only algorithms with a sufficiently large keyspace can be considered secure against brute-force, though not necessarily against any other form of attack.

A chart of the time it would take to crack keyspaces of various lengths using a brute-force attack.

As of 2010 (very recent), the largest keyspace that can be searched in a relatively reasonable amount of time is 2^60. However, not all keybits are created equally: a 128-bit symmetric key provides roughly the same security as a 3072-bit RSA (asymmetric algorithm) key, as noted in Rule 4. For why this is, see Rule [[[.

# C. Rule . Classical Attack #2: Analytical Attacks:

The intelligent counterpart to 'brute'-force, Analytical Attacks examine the internal characteristics of the encryption function and pick apart weaknesses that could enable reconstruction of the plaintext message, such as Letter Frequency Analysis (for the Substitution Cipher, see Rule 8) and Differential Cryptanalysis (for Block and Stream Ciphers, see Rule [[[[).

There is no 'equation' for Analytical Attacks, because they are specific and unique to every cryptosystem (though some common characteristics can emerge between similar cryptosystems).

I.V Generic Analytical Attacks.

At a minimum, an attacker will always know the ciphertext. There are four generic types of Analytic Attacks that specify what information the attacker possesses in addition (if anything):

# Ciphertext-only Attack: An attack in which the adversary only has knows the ciphertext.

# Known-plaintext Attack: In addition to the ciphertext, the adversary also knows some pieces of the plaintext (e.g., header information of an encrypted file or email).

# Chosen-plaintext Attack: An attack in which the adversary can choose the plaintext that is being encrypted and also has access to the corresponding ciphertext, found through possession of the decryption function (whether aware of its internal workings or not).

# Chosen-ciphertext Attack: An attack in which the adversary can choose ciphertexts and obtain the corresponding plaintexts, the goal being (typically) to recover the secret key.

II. Historical Ciphers & Modular Arithmetic.

II.I Substitution Cipher.

# Historical Cipher: A cipher that was used before the development of modern Cryptography, operating on letters instead of bits/bytes. Every one of these ciphers is easily crackable, from the Enigma Code to the Caesar Cipher.

Because these ciphers do not operate on bits (and thus do not have keys structured in the form of binary strings with fixed lengths), the relation between keyspace and keybit-length as detailed in Rule 4 does not apply to these ciphers does not apply to these ciphers.

# C. Rule . Substitution Cipher

A historical cipher, operating purely on letters, in which every plaintext letter is replaced by a fixed ciphertext letter. It is just scrambling the letters, nothing more.

The Keyspace is 26!, or 2^88, as each plaintext letter that maps to a ciphertext letter decreases the remaining number of possible letters. Thus, a brute-force attack would be out of the question for modern computers.

Since the different letters of the English alphabet have different frequencies of appearance (see the image below), and these frequencies are preserved even after scrambling (just different letters will have the frequencies), Letter Frequency Analysis can be conducted as an attack on the substitution cipher, making cracking the cipher rather simple.

This is an example of a cipher/algorithm in which an Analytical Attack (see Rule 7) is most effective.

A generalized distribution of letters in the english language. In substitution ciphers, the frequencies remain the same, with only the letters that represent them changing. Courtesy of Wikimedia.

II.II Modular Arithmetic Basics.

# Preferred: The value that allows for the maximal amount of simplification in a given representation.

# C. Rule . Modular Arithmetic:

Mathematical Definition:

(a, r, m) ∈ ℤ
m > 0

If m|(a − r), then a ≡ r mod m

ℤ = The set of all integers.
a = The chosen number.
r = The remainder.
m = The modulus, the highest possible number in the set.
'mod m' = The modulus operator / remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to m.

Explanation:

Since the bulk of cryptographic computations are performed in finite sets, special mathematical tools are needed. In particular, Modular Arithmetic is the basis for all modern cryptosystems, and many historical ones as well.

Modular arithmetic is a system of simplifying a number in terms of a given modulus, the highest number in a set. As shown in the mathematical definition, any chosen number greater than the modulus is rewritten using the modulus operator (a.k.a. the remainder operator) with respect to the difference between the number and any multiple (see Rule 10) of the modulus, known as the remainder.

The best example for understanding modular arithmetic is something most are already familiar with: time on a clock. On clocks, the highest number is 12, and going any higher just loops back to the beginning. In effect, a clock is a modular arithmetic system using a set of {1, 2, ... 12} with a modulus of 12. 13 hours, for example, would be represented as 1 mod 12, using the modulus operator as specified in the mathematical definition.

The process of finding the modular equivalent of a number (in relation to an already known modulus/set) is simple: divide the chosen number by the modulus, and find the remainder (see Rule 10). The remainder will then be placed as the coefficient in front of the modulus operator. All numbers smaller than the modulus are rendered ordinarily, without the mod operator. For example, the modular representation of 4 in the clock example (mod 12) is just 4.

Using modular representation is a bit jarring at first, as it effectively 'wraps' the chosen number around the modulus, but frequent use (as it is used in practically every cryptosystem) will make it second nature to the beginning cryptographer.

# Cyclic Shift/Cyclical Shift: A fancy term essentially just denoting the usage of modular arithmetic in a particular context, with sets wrapping around once they reach an overflow value and whatnot.

# C. Rule . Modular Arithmetic: Computation of the Remainder.

Mathematical Definition:

(a, m) ∈ ℤ

a = (q × m) + r
Thus, a - (q × m) = r

ℤ = The set of all integers.
a = The chosen number.
q = The quotient, which produces a multiple of the modulus.
r = The remainder.
m = The modulus, the highest possible number in the set.

Explanation:

To find the remainder in a modular arithmetic system (see Rule 9) when only the chosen number and the modulus (a and m) are known, there is an exact, algorithmic definition. Note that there are two unknown variables in this equation - the quotient and the remainder (q & r). As such, the quotient will have to be determined first in order to find the remainder.

Any integer will work for the quotient (as a result of equivalence classes, which will be discussed later in Rule 11), but the preferred quotient will be the one that produces the preferred multiple (once multiplied with m), which will then produce the smallest positive remainder (known as the preferred remainder).

This is just a complicated way of saying that there is a particular quotient value that, when multiplied by the modulus, will produce a remainder with the chosen value that will allow for the modular representation (the Rule 9 equation with the modulus operator) to be in its most simplified form. The specificities of how this "preferred muliple" (formed by the quotient and the modulus) functions are elaborated upon below.

The preferred multiple of the modulus for finding the remainder is the greatest multiple lower than the chosen number, allowing for the remainder to be as small as possible: 0 ≤ r < m. Thus, the preferred multiple is mathematically the greatest (q × m) lower than a, which means finding the correct q given that m is a constant.

The difference between the preferred multiple and the chosen number is the smallest positive remainder, which is thus the preferred remainder.

For example, with a modulus of 12, 61 would be represented as 1 mod 12, since 61 has a remainder of 1 in relation to 60, the highest multiple of 12 below 61 (meaning that 60 is the preferred multiple). With respect to a clock, this means that 61 hours after 12 A.M. would be either 1 A.M. or 1 P.M., exactly which can be determined by mod 2 of the remainder (0 for A.M.; 1 for P.M.).

As stated before, any quotient/multiple of the modulus will work for the equation (if with a larger remainder) - see Rule 11, Equivalence Classes. The preferred multiple just allows for the most simplified version of a modular representation (see Rule 13), which will be the one used in practical applications of modular arithmetic. This preferred multiple is also the one that would be most easily determined by division, as described in Rule 9.

# C. Rule . Equivalence Classes:

While 12 ≡ 3 mod 9, it is equally true that 12 ≡ 21 mod 9, and that 12 ≡ -6 mod 9. Although these remainders may not be the preferred remainder of 3 (see Rule 10), they all still return true in the Modular Arithmetic formula (see Rule 9):

12 ≡ 3 mod 9, 3 is a valid remainder since 9|(12−3)
12 ≡ 21 mod 9, 21 is a valid remainder since 9|(12−21)
12 ≡ −6 mod 9, −6 is a valid remainder since 9|(12−(−6))

The given remainder values can all be simplified down to their smallest possible positive number using the preferred multiple of the modulus, as detailed in Rule 10. Thus, all of these modular representations are inherently equal as they all have the same 'root' preferred remainder. The set of all remainder values that behave 'equivalently' for a particular modulus form what is known as an Equivalence Class, an infinite set/series of values.

For 3 mod 9, the equivalence class of r values is an infinite set, a partial series of which is as follows:
{..., -24, -15, -6, 3, 12, 21, 30, ...}

If any of these values were to be substituted in for r (3) in 3 mod 9, it would be equivalent, the same value, because of the mod operator.

Note: There are 'm' equivalence classes for each possible remainder (0 through m-1), which collectively contain all possible integers. The difference between any two neighboring values in an equivalence class will be the modulus, as shown above.

# C. Rule . Modular Reduction:

Because any value in an equivalence class acts the same with regard to the mod operator of a given modulus, then substitutions can be made in mathematical operations that use the modulus operator, simplifying them.

Take (13 × 16) - 8, with modulus 5. While the full number can be computed (200) and an answer derived from there (0 mod 5), that is lame and time-consuming. A much cooler, faster, 21st century means of solving the problem is to substitute in for each term the simplest member of their respective equivalence classes:

13 mod 5 has a remainder of 3, and so 3 is the simplest term.
16 mod 5 has a remainder of 1, and so 1 is the simplest term.
8 mod 5 has a remainder of 3, and so 3 is the simplest term.

Therefore, the same calculation can be performed with simply (3 × 1) - 3, which returns 0 mod 5, the same answer as before, but found much easier-ly.

Note that this substitution trick does not extend equally to all mathematical operations. Exponentials, for example, cannot be simplified with the modulus:

For the problem 3⁸ mod 7, the exponent '8' cannot be simplified down to 1 mod 7. Exponents simply cannot have the substitution performed, while the base can (as shown in the example below).

For all exponentials, simplification into smaller components (using the 2nd Holy Property of Exponents - see Math Rule 66) must be performed to simplify. For ex.: 3⁸ = (3²)⁴ = (9)⁴ = (2)⁴ = 16 mod 7 = 2 mod 7, which is the answer.

Operations where equivalence class substitutions are allowed:

Multiplication
Addition
Substraction

Operations where equivalence class substitutions are banned:

Exponentials (for the exponent itself, not including the base)
Division

# C. Rule . When getting a result in modular arithmetic, always simplify down to the smallest positive member of the equivalence class (see Rule 11) for your final answer. Whether for a homework/test answer or for 'in-the-field' cryptography work, it is convention to use the most simplified version of the class.

II.III Integer Rings.

# Integer Ring: A set of elements which can have basic arithmetic operations performed upon them, such as addition and multiplication, so that the natural properties expected of addition and multiplication all hold. There are ring definitions/interpretations of polynomials, matrixes, and modular arithmetic (see Rule 14). Rings are a fundamental part of Ring Theory, a field of Abstract Algebra (see Math section [[[[[).

# C. Rule . Algebraic Modular Arithmetic:

Modular Arithmetic can be defined in the form of an integer ring:

The integer ring ℤ_m consists of the following characteristics:

1. The set ℤ_m = {0, 1, ..., m-1}

2. Two arithmetic operators, "+" and "×", hold for all (a, b, c, d) ∈ ℤ_m such that:
i) a + b ≡ c mod m
ii) a × b ≡ d mod m

There are many overshadowing rules that every ring has to fulfill, seen in the section "RULES OF THE RING" below.

# RULES OF THE RING:

These following rules hold true for all integer rings in general, not just those pertaining to modular arithmetic.

The ring should be closed: if any two numbers from the set are multiplied or added together, the result must be in the ring. This is ensured by the modulus operator, which always pulls the number back into the set to a value below m.

Addition and multiplication are associative properties: For all (a, b, c) ∈ ℤ_m, a + (b + c) = (a + b) + c, and a × (b × c) = (a × b) × c.

Addition:

Addition is commutative for all (a, b) ∈ ℤ_m: a + b = b + a.

There is an additive neutral element, 0. For all elements a ∈ ℤ_m, it holds that a + 0 ≡ a mod m. It is used to define the additive inverse.

The additive inverse always exists: for any element a in the ring, there is always a negative element '-a' such that a + (-a) ≡ 0 mod m, the neutral element.

Multiplication

There is a multiplicative neutral element, 1: For all elements a ∈ ℤ_m, it holds that a × 1 ≡ a mod m. It is used to define the multiplicative inverse.

The Multiplicative Inverse exists only for some elements: For a ∈ ℤ, the inverse a⁻¹ is defined such that a × a⁻¹ ≡ 1 mod m. The nature of most inverses and the reasons for their occasional nonexistence are outlined in the Inverse Rules - see Rule 15 & Rule 16.

If there exists an inverse for a, then, a rare usage of division in the modular ring becomes possible: the element a⁻¹ can be divided by, as a divisor, since b/a ≡ b × a⁻¹ mod m.

The distributive property holds for all ring operations. For all (a, b, c) ∈ ℤ_m, a × (b + c) = (a × b) + (a × c).

# C. Rule . Multiplicative Inverse Eligibility Test:

To determine if an inverse exists for a specific 'a' of a particular modulus, simply find the gcd (Greatest Common Divisor) of the two values:

If gcd(a, m) = 1, then there exists a⁻¹, and a & m are said to be "relatively prime", or "coprime".

If gcd(a, m) ≠ 1, then there does NOT exist a⁻¹.

If m is prime, then gcd(a, m) will always equal to 1, for all nonzero 'a' values, and thus every 'a' value will have an inverse in that ring. The inverse of an integer completely depends on the ring, and thus the modulus.

# C. Rule . Application of Multiplicative Inverses:

Finding the Multiplicative Inverse of 'a' has the following formula (as described in RULE OF THE RING #7):

a × a⁻¹ ≡ 1 mod m

Though the multiplicative inverse of a is literally shown as a⁻¹, it is almost never the actual reciprocal of a. The reason for this is simple: All reciprocals other than -1 and 1 are not integers, and are thus not included in the set (under part 1 of the ring definition given in Rule 14).

Take a = 5, for example. The supposed inverse '1/5' is incorrect in any context, because it is not included in the set. A different inverse must be found specific to the modulus. For example, if the modulus were 7, then (as the given values pass the multiplicative eligibility test of Rule 15) the true multiplicative inverse can be found:

5 × 5⁻¹ ≡ 1 mod 7

5 × 3 ≡ 1 mod 7

5⁻¹ = 3

As shown above, when used in regards to algebraic rings, the concept of an 'inverse' bears no relation to its common usage in representing reciprocals.

# C. Rule . Always, as a rule of life, write denominators as (x)⁻¹ (instead of as below a line). With the importance of the inverse operation, it is always critical to recognize all denominators as just being inverses of their values. 1/a = a⁻¹.

II.IV Shift/Caesar Cipher.

# C. Rule . Shift Cipher:

Mathematical Definition:

(x, y, k) ∈ ℤ₂₆

Encryption: e_k(x) ≡ (x + k) mod 26

Decryption: d_k(y) ≡ (y - k) mod 26

x = Plaintext message.
y = Ciphertext.
k = The Key fed into the encryption and decryption functions. Here, it represents the number of times that the characters of the plaintext are shifted rightward, using the alphabet as the set. Variations may use an expanded set, including numbers and special characters.
ℤ₂₆ = A ring of the first 26 integers (including 0), matching the 26 characters of the alphabet. See Subsection II.III for an explanation of rings.
e_k(x) = Encryption function, a mathematical formula that converts x into y.
d_k(y) = Decryption function, a mathematical formula that converts y into x.
mod 26 = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to 26. Variations may use an expanded set, including numbers or special characters, in which case the '26' would change accordingly.

Explanation:

A simplified form or special case of the Substitution Cipher (Rule 8), the Shift Cipher (or Caesar Cipher) merely shifts every plaintext letter however many so positions rightward in the alphabet, wrapping around when it reaches the end (modular arithmetic!). This is occasionally known as the "ROT" Cipher (short for "rotate"), and the most well-known example is ROT13, which shifts rightward precisely halfway through the alphabet.

As a generic example, with a key of 1, some shifts would be as follows:

a -> b
b -> c
...
z -> a

As shown, once the alphabet reaches letter (26 - k) + 1, or the position in the ring ℤ₂₆ (26 - k) (since the set begins at 0), the shift returns to the front of the alphabet.

It should be obvious to all this is an extremely unsecure cryptosystem, with a keyspace of only 26. It is vulnerable to both letter frequency analysis (previously explained in relation to Substitution Ciphers - see Rule 8) and brute-force attacks (see Rule 6).

# C. Rule . Vigenère Cipher

Mathematical Definition:

(x, y, k, l) ∈ ℤ₂₆

Encryption: e_k(x) ≡ (x_i + k_{i mod l}) mod 26

Decryption: d_k(y) ≡ (y_i - k_{i mod l}) mod 26

x = Plaintext message.
y = Ciphertext.

k = The Key/codeword fed into the encryption and decryption functions, which, like the plaintext, is a series of letters (represented by their numerical value, 0-25) that are referred to by their i position. The key may have less, more, or the same number of characters as the plaintext (see 'l').

i = The position in the string of the plaintext, and regardless of the length of l (as a result of the mod operator), the key.

mod l = 'l' represents the length of the codeword. The mod operator functions such that if there are less characters in the codeword than in the plaintext, then the codeword characters will wrap around so that they can continue being used to produce ciphertext indefinitely.

ℤ₂₆ = A ring of the first 26 integers (including 0), matching the 26 characters of the alphabet. See Subsection II.III for an explanation of rings.

e_k(x) = Encryption function, a mathematical formula that converts x into y.
d_k(y) = Decryption function, a mathematical formula that converts y into x.
mod 26 = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to 26.

Explanation:

A mere variation of the Shift cipher, in which the shift for each letter is determined by a secret codeword 'c', which has 'l' characters. Each character in the code word c represents a number of shifts rightward according to the numerical value. For example, 'a' would represent a rightward shift of zero places, while 'z' would represent a shift of 25 places.

The encryption function works as follows: Each character in the plaintext string 'x' is applied a rightward shift according to its respective letter (identical position) in the key text. This is a cyclic shift (see definition), for if the addition of the key-character value produces a number greater than 26, then the alphabet will wrap around to the beginning. For example, the plaintext character 'z' plus the codeword character 'b' will produce the ciphertext character 'a'. An interesting side-effect of this is that if the keytext is just a string of 'a' ("aaaaaaaa"), then the ciphertext will be identical to the plaintext.

The decryption function, of course, is just the opposite of the encryption function, moving each character of the ciphertext backward by the value in the positional counterpart of the code word 'c'.

Because of the way the equation is written, the key does not necessarily need to have the same number of characters as the plaintext. The mod operator on the 'l', the number of characters in the codeword, causes any short codeword to wrap around to account for any longer plaintext strings. For example, for a plaintext of "lingonberry" and a codeword of "sup", the mod operator would cause the codeword to effectively become "supsupsupsu", accounting for each character of the plaintext. Likewise, any codeward longer than the plaintext has any characters beyond the length of the plaintext ignored.

For each value of i within l, the characters of the codeword are mathematically denoted as {c₀, c₁, ..., c_{l - 1}}, and the keys are similarly represented as {k₀, k₁, ..., k_{l - 1}}.

II.V Affine Cipher.

# C. Rule . Affine Cipher:

Mathematical Definition:

(x, y, a, b) ∈ ℤ₂₆
K = (a, b)

Encryption: e_k(x) ≡ ((a × x) + b) mod 26

Decryption: d_k(y) ≡ (a⁻¹ × (y - b)) mod 26

x = Plaintext message.
y = Ciphertext.
K = The key, which has been split into two parts: a and b.
a = The multiplied value of the key. The number of possible 'a' values is limited by the necessity of the inverse (for the decryption function), as explained below.
a⁻¹ = The inverse of a, necessary for the decryption function.
b = The "shift parameter" value of the key, used only in addition and subtraction operations. There are 26 possible values for b, explained below.
ℤ₂₆ = A ring of the first 26 integers (including 0), matching the 26 characters of the alphabet. See Subsection II.III for an explanation of rings.
e_k(x) = Encryption function, a mathematical formula that converts x into y.
d_k(y) = Decryption function, a mathematical formula that converts y into x.
mod 26 = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to 26.

Explanation:

The Affine Cipher is an attempted improvement of the shift cipher, done by complicating the encryption function through the addition of multiple parameters. However, it is still extremely easy to break. The affine cipher is performed by splitting the key into two parts:

K = (a, b)

The number of possible b values in the system is 26, as it is merely the shift parameter (as shown in the definition equations).

The number of possible a values, as a result of the inverse in the decryption function, is limited by the condition explicated in the multiplicative inverse eligibility test of Rule 15, with m being 26: only when gcd(a, 26) = 1 is 'a' an inverse, and thus contributive to the number of possible values. Counting all possible values of a (1, 3, 5, 7, 9, ...), all odd numbers 1-26 other than 13, produces an end result of 12 possible 'a' values.

The keyspace, being (#a) × (#b), is therefore 312. This remains very easy to break using a brute-force attack (see Rule 6).

Furthermore, letter frequency also remains preserved (since every instance of a particular plaintext character will be converted into the same corresponding ciphertext character, despite the minute differences in the encryption function), so a letter frequency analysis attack is also applicable.

Though multiple encryption (i.e., running the ciphertext produced by encryption function back into the encryption function to encrypt it again) will not increase the security of an affine cipher, it will for other ciphers, such as the Data Encryption Standard (see Rule 44).

# C. Rule . Note: You can ALWAYS derive the decryption function from the encryption function of basic ciphers/algorithms. Just know that the encryption function is equal to y and the decryption function is equal to x, and you can simply isolate the appropriate variable to find whichever function you wish.

In doing so, always recall that dividing by a variable (as can occur when dividing both sides as a part of the isolation process) makes that variable into its inverse, as foretold in the lesson of Rule 17 and as demonstrated in the mathematical definition of Rule 20.

Know that when doing any algebra with an encryption/decryption equation, the mod operator will simply remain in place on whichever side it was on originally. The mod can be effectively ignored until the end of the isolation process, with it not playing a part in the moving of variables from side to side and whatnot. It will continue to act upon whichever variables are on its side of the equation.

III. Stream Ciphers.

III.I Introduction to Stream Ciphers.

# Fig. 3. - Symmetric Cryptography Hierarchy:

A Classification hierarchy of Symmetric Cryptography. There are two major branches of Symmetric Cryptography, Block Ciphers and Stream Ciphers. See Subsection I.I for the full tree, and see Rule 6 for Cryptanalysis.

# C. Rule . Modular 2 Operations:

In cryptography, when considering all of the mathematical operations commonly applied in primitives and ciphers, the modular 2 operation carries with it a set of properties and a reputation that makes it unique even within modular arithmetic.

Mod 2 addition and subtraction are the same operation. Multiplication complicates things, but in all instances where the mod 2 function is limited to addition or subtraction, the results will be identical no matter which operation is applied.

With the smallest possible modulus that allows room for variation in the remainder, mod 2 operations are by and large the most popular application of modular arithmetic in cryptography. This is because their restriction of all numbers to be either 0s or 1s (by the very nature of a modular operation, see Rule 9) successfully emulates binary representation, making them extremely useful for bit-based computation (and in particular ciphers that intake streams of data, like stream ciphers (see Rule 23)).

The modular 2 operation is used so often, it has its own symbol in cryptographic diagrams: a circle with a cross. See Rule 23 for an example. In electrical engineering, the mod 2 function is known as the XOR gate, a type of binary gate (see E.E. [[[[[), further showcasing the broader importance of modular 2 operations in science as a whole.

Note that the listed properties do NOT extend to the modulus of any other number - only for two. See the Vigenère Cipher (see Rule 19), which works akin to a stream cipher, for proof using mod 26 (hint: the addition and subtraction operations produce different values).

# C. Rule . Stream Ciphers

Mathematical Definition:

(x_i, y_i, s_i) ∈ ℤ₂

Encryption: e_{s_i}(x_i) ≡ (x_i + s_i) mod 2

Decryption: d_{s_i}(y_i) ≡ (y_i + s_i) mod 2

x_i = The Plaintext bit at index i.
y_i = The Ciphertext bit at index i.
s_i = The "Keystream" at index i, equivalent to 'k' (as in past ciphers). Here, it is the Key (at index i) fed into the encryption and decryption functions.
ℤ₂ = A ring of the first 2 integers: {0, 1}. See Subsection II.III for an explanation of rings.
e_{s_i}(x_i) = Encryption function, a mathematical formula that converts each x bit into a y bit.
d_{s_i}(y_i) = Decryption function, a mathematical formula that converts each y bit into an x bit.
mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality - see Rule 22 for a full treatise.

Explanation:

A cipher that encrypts bits individually, processing one bit after another in a sequence. This is opposed to a Block Cipher (see Section IV), which takes in large numbers of bits all at once.

Using the modular 2 operation (see Rule 22), the encryption and decryption functions of the basic stream cipher can be depicted as shown:

A simple diagram displaying how a simple Stream Cipher (abiding by the encryption and decryption functions defined) would work. Note the usage of the XOR gate, or modulus 2 function.

The central complexity of the Stream Cipher is the generation of the "Keystream", e.g. the stream of bits with which the plaintext is encrypted. This generation must be performed through Random Numbers - see Rule 24.

If the keybit (s_i) is zero, then the plaintext bit will pass through unchanged, but if the keybit is 1, then the plaintext bit will flip (like binary). Thus, if the keystream (by divine intervention) is straight 0's, then the plaintext will go unencrypted.

# Keystream: A stream of characters that is combined with the plaintext to produce the ciphertext in a Stream Cipher, or, in the case of asynchronous stream ciphers, combined with a previous ciphertext to produce a new ciphertext.

The keystream is derived from the key and the seed. Generally, the keystream is denoted as "s_i", while the seed itself is s₀, the initial position/keybit of the stream from which all recursive stream ciphers evolve from in their encryption functions.

The creation of a keystream (both key and seed) involves random number generators, which are detailed in Subsection III.II. The seed, for example, may be created using an actual True Random Number Generator (see Rule 24), while the rest of the key will be deterministically generated from that seed using a PRNG (also Rule 24).

# Synchronous Stream Cipher: A Stream Cipher in which the keystream depends solely on the key, generated independently of the plaintext or ciphertext. Examples include Salsa20 & ChaCha.

# Asynchronous Stream Cipher: A Stream Cipher in which the keystream depends on the ciphertext in addition to the key - the key generation is reliant on previous ciphertext, making it a risky operation. An example is the Cipher Feedback (CFB) mode, described in Rule [[[.

III.II Random Number Generation.

# C. Rule . Random Number Generators (RNG):

The means with which a system can generate random numbers, for the purposes of constructing/filling a keystream with random bits, are split into three main mechanisms:

True Random Number Generators (TRNGs): "True" random numbers stem from random physical processes that cannot be reproduced.

Such natural sources of entropy include flipping a coin, rolling a die, thermal noise, mouse movement, and keystroke timing.

Hardware and cryptographic systems have tried a lot of different techniques over the years to create a TRNG by analyzing/recording internal characteristics of the computer. Most modern CPU's have a TRNG within a TPM (trusted platform module) on the motherboard.

Pseudo Random Number Generators (PRNGs): An RNG system that is random, but reproducable. Because PRNG's are computed, they are inherently deterministic processes. Since they are deterministic, Alice and Bob can generate the same keystream independently of one another, only needing to know the initial "seed", which is like the key for the key (see the "Keystream" definition).

Often, PRNG's are computed with the following function:

s_i+1 = f(s_i)

s₀ = The initial seed value, which may be an actual TRNG.
f(s_i) = The 'randomizer' function, which performs whatever upon the seed in order to computationally randomize it.

This recursive generation is the form taken by the Linear Congruential Generator (see Rule 26).

The rand() function in C is a PRNG and uses a seed of 12345, with the following function as the randomizer:
S_i+1 = ((1103515245 × s_i) + 12345) mod 2³¹.

Cryptographically Secure Pseudorandom Number Generators (CSPRNGs/CPRNGs): PRNG's that have an additional property: the numbers must be unpredictable.

Given n output keybits (s_i, s_i+1, ..., s_i+(n-1)), it is computationally infeasible to construct s_i+n.

While in most applications (read: the world outside cryptography), PRNG systems are perfectly fine and suited to create 'random' numbers for whatever purpose, they form very unsecure cryptosystems due to their inherently deterministic nature, and thus in Cryptography, the additional property of unpredictability is required.

# Unconditional Security: A cipher is "unconditionally secure" (or "information-theoretically secure") if it cannot be broken even with infinite computing resources.

Even if a key is 2^1000 bits, it would not be considered "unconditionally secure", because 2^1000 computers (a patently not infinite amount of computers) would be able to break the key in one second. This is despite the fact that such levels of computing power will never be realized in human history.

Thus, though there are very many ciphers that are "practically" secure (forming the backbone of our modern cybersecurity systems), they are not unconditionally secure.

# Computational Security: A cryptosystem is computationally secure if the best known algorithm for breaking it requires at least t operations, where 't' is some ridiculous number far beyond the reach of modern brute-force computational abilities.

# C. Rule . One Time Pad (OTP)

The One Time Pad (also referred to as the "Vernam cipher") is the unconditionally secure, perfect encryption algorithm, unbreakable in any meaningful way. The OTP is a stream cipher with the following properties:

1) The Keystream bits s_i must originate from a TRNG.

2) Each Keystream bit must be used only once.

Knowledge of the Keystream must be exclusively limited to the involved parties, of course. At the end of the data transfer, the party that received the data must destroy the key so that it cannot be used by another party to decrypt the message after the fact. This is the same reason why reusing keys is extremely risky and nullifies the security benefits of the OTP.

The security of cryptosystem draws from the fact that, without knowledge of the keystream, even a brute-force attack would be futile, because any two text strings are equally as probable to be a particular word in the plaintext: if a four letter word were to be encrypted, it is equally probable for it to be "when", "stop", or "abcd".

There is one fundamental drawback of the OTP, however, that prevents its universal application: since keystream bits cannot be reused, the key has to be as long as the plaintext. A 400MB file, 8 bits to the byte, would entail a 3.2GB key.

# Fig. 4. - Practical Stream Ciphers:

A diagram of how all practical stream ciphers should function, with the modular 2 addition highlighted and the keystream generator (which varies from cipher to cipher) displayed.

# C. Rule . Linear Congruential Generator (LCG):

Mathematical Definition:

(A, B, i, s_i) ∈ ℤ_m
i > 0
Key K = (A, B)

S_{i + 1} ≡ ((A × S_i) + B) mod m

S₀ = The initial seed value, which may have been created using a TRNG. Of course, it is not depicted in the equation as the equation only refers to i > 0. The seed value is exactly ⌈log₂ m⌉ bits long.
S_i = The Keystream at index i, where i > 0. It is exactly ⌈log₂ m⌉ bits long.
S_{i + 1} = The following value in the keystream, which is recursively calculated using the value of the previous bit in the keystream (along with the key). It is exactly ⌈log₂ m⌉ (bit-length of m) bits long.

ℤ_m = A ring of m integers. See Subsection II.III for an explanation of rings.

A = The multiplied value of the key. It is exactly ⌈log₂ m⌉ (bit-length of m) bits long.
B = The "shift parameter" value of the key, only used for the addition operation. It is exactly ⌈log₂ m⌉ (bit-length of m) bits long.

mod m = The modulus/remainder operator, which represents the usage of modular arithmetic, with all numbers represented by their remainder with respect to m.

Explanation:

# C. Rule . HOW TO CRACK A LINEAR CONGRUENTIAL GENERATOR (LCG):

In order to crack a rudimentary LCG system and discover the full keystream to decrypt the system, the attacker only needs to know three characters of the plaintext that were encrypted (along with all of the ciphertext). Generally, in encrypting a document (such as Microsoft Word or Excel), there will be file or protocol header information at the start (like the version number), which will always be the same initial symbols being encrypted. Therefore, knowledge of the attacker of these plaintext characters, in addition to the ciphertext, can be assumed.

With knowledge of the LCG key-stream equation, the attacker would know S₀, S₁, and S₂ only. There would thus be three unknowns: A, B, and m. However, the set of commonly used m values is relatively small (see list), and so iterating through them using brute-force (once A and B have been discovered) would be very simple: simply calculate the associated keystream for each possible m value, then check each keystream using the stream cipher decryption function (see Rule 23) and the known ciphertext, until the correct keystream (and thus m value) is found. Thus, m is considered one of the known variables.

The attacker only has to create a system of equations-style problem (using the three known plaintext characters and the ciphertext) in order to isolate A and B, respectively.

S₁ ≡ ((A × S₀) + B) mod m
S₂ ≡ ((A × S₁) + B) mod m

A = ((S₁ - S₂) × (S₀ - S₁)⁻¹) mod m
B = (S₁ - (S₀ × (S₁ - S₂) × (S₀ - S₁)⁻¹)) mod m

After finding A and B, all that would be left to be done is to brute-force m as previously described, and the system would be cracked.

III.III Linear Feedback Shift Registers.

# Clocked Storage Element (CSE): A storage container that captures information at a specific moment in time, storing it until needed. In electrical engineering, CSE's are part of clocking subsystems and respond to Clock signals (see E.E. Rule [[[[[).

# C. Rule . A "Flip-flop" is a clocked storage element that stores a single bit. There is an input, an output, and a Clock Input that determines whether any inputted bit is to be stored or not.

The general symbology used in diagrams for a flip-flop is as follows:

A diagram of the basic functions of the flip-flop electrical component. D is the input, Q is the output. Ignore the Q with the line for now. Courtesy of buildcircuitelectronics.

And here is a diagram showing how the output relies on clock signals to accept an input change, if any:

A diagram detailing how the clock signal will influence what data from the input is stored and produced in the output. As shown, the flip-flop will only store the first event captured in the input when the clock signal becomes active, regardless of whether the input changes while the clock signal is still active. Courtesy of Wikipedia.

# Shift Register: A "shift register" is akin to a system, a concatentation of several individual flip-flops, each sending a signal to the next until the final flip-flop, which will produce each bit of the output. It is one half of an LFSR.

An example of a shift register, with three flip-flops, with every new value sent receiving a clock signal:

A shift register, complete with three flip-flops, and a starting value of 1-0-0. Data moves over once, and then each change is moved over until there are no more changes to be moved.

# C. Rule . Linear Feedback Shift Registers:

Mathematical Definition:

The equation depends on the specific characteristics of the LFSR. Specifically, the number of flip-flops, the number of mod 2 functions, and which flip-flops have mod 2 functions. The following formula applies to all LFSR's and is fully explained/derived in the LFSR Generalization (see Rule 30):

(S_i, P_j) ∈ ℤ₂
(i, j) ∈ ℤ_m

$$S_{m+1} \equiv \left( \sum_{j=0}^{m-1} S_{i+j} \times P_{j} \right) \mod 2$$
m = The number of flip-flops in the LFSR. This number determines both the length of the state register and the period (see Rule 32) of the output sequence.

i = The offset in the shift register, e.g., the number bit of the keystream that is being found. This is why it is ∈ ℤ_m - it can go no greater than value m.

j = The index variable for each individual flip-flop, which have the potential to contribute to the feedback path, as determined by the feedback coefficient.

S_{m + 1} = The next output bit into the keystream produced by the LFSR after shifting.

S_{i + j} = The bit held by a particular flip-flop j, at a specific offset in the shift register i. This is the value being inputted into the multiplier function from the flip-flop.

P_j = The feedback coefficient of flip-flop j. Either 0 or 1.

mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.

Explanation:

There is an inherent relationship between electrical engineering and Cryptography, specifically in how Cryptography developed: in the early days of the development of the Stream Cipher, the goal was to create a 'small' stream cipher, e.g. a stream cipher that could be implemented in hardware without using too much power.

Today, all stream ciphers fit one of two categories: ciphers optimized for software implementation, and ciphers optimized for hardware implementation. A prime example of the latter are "Shift Register-Based Stream Ciphers", in particular shift registers with feedback, better known as Linear Feedback Shift Registers.

Occasionally referred to as cryptographic "linear recurrences" by various demented individuals, Linear Feedback Shift Registers are algorithms of keystream generation that utilize yet another PRNG, and are thus crackable with some effort (see Rule 36).

Many ciphers, such as the A51 cipher, consist of multiple LFSRs (in the case of A51, 3), which greatly enhance the security of the cipher and validate the usage of LFSR's for legitimate cryptographic purposes ([[[[[[[[[ how?).

LFSR's incorporate a standard Shift Register with a feedback mechanism/path, generating fresh input for the first flip-flop using some output data. This is done by a modular 2 addition function (an XOR gate), which can be performed between the value from the final flip-flop and any other located in the shift register. Multiple mod2 functions can be added in fact, complicating the structure of the LFSR.

The mod 2 calculation is first performed in the initial state of the shift register, before the first clock cycle, pre-computing the value that will be used as input for the first flip-flop on the first clock cycle.

Remember: The feedback mechanism that the outputs take uses a mod2 function, but the actual input into the first flip-flop itself is without any special system - it is directly taking in the given number.

Very important to notice in the progress of an LFSR is that there will be a distinct pattern that holds in each calculated value (placed below the flip-flops on a diagram), known as the Diagonal Rule: each bit value will be translated diagonally to the right, for as many flip-flops as there are to the right, continually. Note this pattern in the diagram below:

A Linear Feedback Shift Register, complete with three flip-flops and a full period of values, which compose one complete keystream.

The specific mathematical formula of the key produced by the LFSR depicted is:
S_i+3 ≡ (s_i+1 + s_i) mod 2.

# Feedback Path/Feedback Mechanism: The path in which all the XOR gates (or other operations) are located, streaming back to the first flip-flop for the new input. This mechanism, when combined with a shift register, helps form an LFSR.

# State (in regard to LFSR's): Also referred as the "initialization vector" of the LFSR in this context. The bits of the flip-flops at a particular shift register offset. Given knowledge of an LFSR state and the XOR-positions/feedback path, the following state, produced through another shift register offset, can be determined.

Even without knowledge of the XOR positions, by pure knowledge of the first state, every flip-flop bit in the following state (except for the first F.F., the recipient of the input of the feedback state) can be determined, as a result of the diagonal rule of LFSRs (see Rule 29).

# C. Rule . LFSR Generalization:

To generalize the LFSR so that a general equation can be applied, several new mathematical operations must be introduced.

First, the number of flip-flops in the register can be generalized to an arbitrary 'm' number of times, which is the simplest generalization to be made.

It is possible for any of the flip-flops to have an XOR gate, and for there to be as many XOR gates as flip-flops, minus one. Since all of the flip-flops have a chance at being part of the feedback path, a multiplier function must be added after each flip-flop, between the path after the flip-flop and any possible XOR gate above on the feedback path. Thus, the multiplier controls the way to the XOR gate, and essentially acts as an on-or-off switch.

A generalized LFSR, with every possible link between a flip-flop and the feedback path having a multiplier function, serving as the 'switch' determining whether that flip-flop is included as a part of the XOR operation.

The multiplier function works by multiplying the value of the flip-flop (the Input value I) by the internal P-value, which is determined by whether the flip-flop actually has an XOR gate or not (see below). The p-value can only be a binary value: p_i ∈ ℤ₂.

The multipliers (labeled P₀ to P_m-1) have their own values, known as "feedback coefficients". These values are dictated by the "switch", e.g. if there is or isn't a connection to an XOR gate from that flip-flop. Furthermore, there is an input I and output O that travels from the multiplier onto the XOR gate path.

If the switch is closed (e.g. there IS an XOR gate connected to that flip-flop, and thus a connection to the feedback path), then the p-value will equal 1, since the input is equal to the output: O = p_i × I = I.

If the switch is open (no XOR gate), then the p-value will equal 0. The output O is equal to 0, and since O = p_i × I, and I is not necessarily 0, then p_i must equal 0.

For the following formula, you must assume that s_m is the sum of all of the XOR gates, e.g. the preloaded value that is ready to be inputted into the first flip-flop after the initial state has finished being processed. Thus, the second complete XOR-ing of the values (the second value inputted into the first flip-flop) will be S_m+1, and so on. The value of the input into the first flip-flop is the result of the summation of all of the possible XOR operations that could be present in the LFSR.

(S_i, P_j) ∈ ℤ₂
(i, j) ∈ ℤ_m
$$S_{m+1} \equiv \left( \sum_{j=0}^{m-1} S_{i+j} \times P_{j} \right) \mod 2$$
m = The number of flip-flops in the LFSR. This number determines both the length of the state register and the period (see Rule 32) of the output sequence.

i = The offset in the shift register, e.g., the number bit of the keystream that is being found. This is why it is ∈ ℤ_m - it can go no greater than value m.

j = The index variable for each individual flip-flop, which have the potential to contribute to the feedback path, as determined by the feedback coefficient.

S_{m + 1} = The next output bit into the keystream produced by the LFSR after shifting.

S_{i + j} = The bit held by a particular flip-flop j, at a specific offset in the shift register i. This is the value being inputted into the multiplier function from the flip-flop.

P_j = The feedback coefficient of flip-flop j. Either 0 or 1.

mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.

# Primitive Polynomials: A form of irreducible polynomials, akin to a "prime" polynomial, with the only "factors" being 1 and the polynomial itself. [[[[[[[EXPLAIN IN A MORE COHESIVE AND FULLY ELABORATED MANNER[[[[[[[[.

# C. Rule . Types of LFSRs:

In practice, there are three types of LFSRs, the properties of which are the result of the feedback coefficients and degree.

Primitive Polynomial-based: These LFSRs are composed of primitive polynomials (see definition), and produce a maximum-length sequence, the only LFSRs to do so.

Non-Primitive Irreducible Polynomial-based: Using irreducible polynomials that are not "primitive", these LFSRs produce a period that is totally independent of the initial value/"state" of the register. These algorithms do not produce maximum-length sequences.

Reducible Polynomial-based: These LFSRs have a sequence/period dependent on the initial state of the register. Of course, these algorithms can never produce a maximum-length sequence.

# C. Rule . The Period is the number of different strings of numbers that can occur in a specific LFSR before repeating. In a generic LFSR with 3 flip-flops, (such as that given in the LFSR Exposition - see Rule 29), this number is 7, e.g. all possible states.

A simple principle can thus be derived: The maximum possible period (or sequence length) generated by an LFSR of degree m is 2^m - 1.

As soon as a previous bit pattern/sequence (a 'state') is reached, then the cycle will repeat itself. Only certain feedback configurations (P_m-1, ..., P₀) yield maximum length sequences, regardless of the initial bit pattern/state (!). For any LFSR with a set number of flip-flops, m, and a given positioning/number of XOR gates, no matter what the initial state is, the period will always be the same.

This is why the given value is the maximum possible period of an LFSR - only some reach it.

There are patterns within the LFSR, specifically that of the positioning of the XOR gates with respect to the degree m, that can be used to determine if an LFSR will produce a maximum-length period or not - see Rule 35 for more information. These patterns require knowledge of the Polynomial Representation of LFSR's - see Rule 33.

# C. Rule . The only string of bits that can never appear in an LFSR is straight 0's. Such a thing is only possible if the initial setting is all straight 0's, which is a ridiculous thought not to be entertained.

# C. Rule . Polynomial Representation of LFSR's:

All that is needed to represent the nature of an LFSR (devoid of the initial state) is the degree m (the number of flip-flops), and the feedback coefficients. These characteristics can be represented in simple polynomial form:

P(x) = x^m + P_m-1 × x^m-1 + ... + P₁ × x + P₀

m = The number of flip-flops.
x = The simple polynomial variable, nothing to be plugged in.
P_m-1 = The feedback coefficient of the very first multiplier function (and thus, the flip-flop it represents) in the sequence. P_m would be to the left of the first flip-flop, along the very end of the feedback path, and thus is always 1 and is unnotated. The sequence P_m to P₀ represents all multiplier functions, left to right.

For example: The LFSR depicted in Rule 29 would be represented as x³ + x + 1.

Even if there were no XOR gates in the entire LFSR, then the equation would still be x^m + 1, representing the ending and beginning points of feedback path.

What you are really looking at when you see a polynomial expression of an LSFR is the degree of the first term, and all of the terms in between the first term and the last term, because there is always going to be a +1 at the end. Otherwise, the output would not be incorporated into the feedback path, which is utterly preposterous and is never seen in the real world.

There are patterns in LFSR polynomials that can be used to determine whether the LFSR has a maximum-length period or not: see Rule 35.

# C. Rule . Deriving Maximum-Length Periods from LFSR Polynomials:

Only LFSR's with primitive polynomials (see definition) yield maximum-length sequences.

Below is a table showcasing an example primitive polynomial for each m value 2-128. Note that there are multitudes of primitive polynomials for any given m value, too many to produce a complete list. There are 6.92 × 10⁷ primitive polynomials for an LFSR of m 31.

In the table below, take the values as representing the degrees in reverse order: x^0 + x^1 + x^2, for example. This ensures the +1, a necessary component of any LFSR polynomial as explained in Rule 34.

A table showcasing an example primitive polynomial for each m value 2-128.

# C. Rule . HOW TO CRACK A LINEAR FEEDBACK SHIFT REGISTER (LFSR):

While Linear Feedback Shift Register's have the properties of relative randomness and are exceedingly efficient (with 100 flip-flops, a miniscule amount of space in hardware, producing a period of 2¹⁰⁰ - 1), it fails on the last hurdle: unpredictability.

By its very nature, an LFSR is a deterministic means of producing a keystream, thus opening it up to vulnerabilities that can be exploited. The main principle that must be acknowledged in attempting to 'break' or 'crack' any stream cipher that utilizes an LFSR is as follows:

If an attacker knows (at least) 2m output values of an LFSR, he can recover the entire LFSR configuration.

Oscar, as always knows the complete ciphertext string. Furthermore, he knows the degree m of the LFSR, and, most damningly, he knows the first few plaintext characters: x₀, ..., x_{2m - 1}. With the header information, it is still realistic for that amount of plaintext to be known to the attacker.

Given this knowledge, Oscar simply has to follow a 3-step process in order to recover the entire key:

Step 1:
Utilizing the definition of the encryption function of a stream cipher, the keystream at position i can be isolated:

i ∈ ℤ_2m

S_i ≡ (x_i + y_i) mod 2.

Thus, Oscar is able to recover the first 2m output bits (the output/keystream bits) of the LFSR, which is not enough: the maximum possible period is 2^m - 1, so there remain a considerable number of output bits that need to be recovered.

Step 2:
In order to discover the remaining bits, the feedback coefficients of the multiplier functions must be discovered. This is done using the generalized summation equation of the LFSR. Expanding out the summation will produce the following equation:

S_m ≡ (S_m-1 × P_m-1 + ... + S₀ × P₀) mod 2

The only unknowns in this equation are every single feedback coefficient. Thus, there are a total of m unknowns that need to be solved for. In order to solve for the total of m unknowns, a system-of-equations type problem, with m equations, must be created:

S_m ≡ (S_m-1 × P_m-1 + ... + S₀ × P₀) mod 2

S_m+1 ≡ (S_m × P_m-1 + ... + S₁ × P₀) mod 2

.
.
.

S_2m-1 ≡ (S_2m-2 × P_m-1 + ... + S_m-1 × P₀) mod 2

As was done for Linear Congruential Generators, each unknown can easily be solve for by Gaussian Elimination ([[[[[) or Matrix Inversion ([[[[[[[[).

Step 3:
Now that Oscar knows all of the Feedback Coefficients, he can build the LFSR himself.

He can determine the initial state using the output bits that are already known (using the Diagonal Rule, see Rule 29), and from there he will be able to compute the full keystream, S_2m+1 and beyond.

With the keystream, Oscar will now, of course, be able to recover the full plaintext:

x_i ≡ (y_i + s_i) mod 2

III.IV Alternate Secure Stream Ciphers.

# The Development of Modern Stream Ciphers:

During the 1990s, stream ciphers were a center of attention for many cryptanalysts, who ruthlessly revealed vulnerabilites and weaknesses that opened particular popular ciphers of the time - this would fuel the predominance of block ciphers in modern cryptographic communications.

As a result, the eSTREAM project was established in 2004 by the "European Network of Excellence in Cryptology" (an E.U. front), which called for the development (and expert selection) of stream ciphers.

By 2008, several new, highly secure stream ciphers were developed, divided into two main categories/profiles:

1. Stream ciphers designed for implementation in software with high-volume data transfer requirements. Examples: Salsa20 (see Rule 37) & ChaCha (see Rule [[[[).

2. Stream ciphers designed for hardware implementation, with low power consumption, storage capabilities, and gate counts. Example: Trivium (see Rule [[[[).

# Add-Rotate-XOR (ARX) Ciphers: An algorithm in a cryptosystem that only uses additions, rotations, and XOR operations to generate its keystream.

# "Nonce": A number used only once.

# C. Rule . Salsa20:

Mathematical Definition:

[[[[[To be completed at a later date. I have determined this topic: "Salsa20", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[

Explanation:

"Salsa20" is a family of profile-1 stream ciphers, developed in 2005. The PRNG utilizes a 32-bit ARX framework, with additional layers of security added through increased 'rounds' of encryption: the base cipher (Salsa20/20) uses 20 ciphers, while reduced-round, faster varients use 12 rounds (Salsa20/12) and 8 rounds (Salsa20/8), respectively.

Salsa20/20 (hereafter referred to as "Salsa20") supports key-lengths of either 128 or 256 bits, though 256 is recommended. [[[[[To be completed at a later date. I have determined this topic: "Salsa20", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[

# C. Rule . [[[[[Reserved for Future Usage. If additional rules are needed, then the WoO system will be used.[[[[[ [[[[[To be completed at a later date. I have determined this topic: "Salsa20, ChaCha, and Trivium", to be non-essential to the learning and understanding of the major fundamentals of Cryptography, and so it has been placed in the low-priority learning register.[[[[[

IV. Block Ciphers.

IV.I Historical Background.

# C. Rule . Block Ciphers:

A cipher that encrypts a block of 'b' plaintext bits all at the same time, where b is greater than one (since one bit would be a stream cipher). Thus, the encryption of any plaintext bit in the block depends on the nature of every other plaintext bit in the block, like the positions of celestial bodies in a galaxy.

Most block ciphers have a block length of 128 bits (16 bytes), such as the AES (see Rule [[[[[), or 64 bits (8 bytes), such as the DES and 3DES (see Rule 44 & Rule [[[[, respectively). For each input, a block cipher will produce an equivalent amount of output data: in the case of DES, this will be another 64 bits.

Block Ciphers are more commonly used than Stream Ciphers in modern computer encryption schemes, due to the stream cipher mass extinction event caused by relentless attacks by cryptanalysts in the '90s, and the mass adoption of block ciphers that followed.

# Look-up Table: A Substitution table - an algorithm in which every input is paired with a particular value/address, and has that value outputted. It is not a requirement that the number of output bits match the number of inputted bits - in the DES Feistel cipher, for example, table S₁ produces 4 output bits from 6 input bits.

# Product Cipher: A generic term used to describe any cipher that utilizes/concatenates the methods of permutation and repeated encryption (in "rounds") to create the element of "diffusion" (see Rule 42) necessary for any block cipher.

# C. Rule . Confusion & Diffusion:

During the postwar Cryptographic research boom, stemming from the rush of funding for military science, it was postulated in a classified paper that there were two atomic operations that all good block ciphers should perform:

Confusion:
The relationship between the plaintext and ciphertext must be obscured. Any cipher that scrambles its characters thoroughly enough that the plaintext and ciphertext bare no resemblance to one another, passes this requirement.

However, the sole utilization of this requirement and the abandonment of the second, as was done with the Substitution and Shift Ciphers, will produce a very unsecure cipher.

Diffusion:
The influence of each plaintext bit is spread over many ciphertext bits. In systems making good use of diffusion, changing one bit of plaintext results in an average change of half of the total output bits, known as the Avalanche effect and akin to chaos theory. Example: Permutations of an encryption method to make a cryptosystem more secure.

This element is impossible under stream ciphers, since each bit is processed and encrypted individually.

In order to make a truly "secure" cryptosystem using block ciphers, you need to combine both confusion & diffusion elements into the algorithm many times over, in the form of a product cipher.

IV.II Feistel Networks.

# C. Rule . Feistel Ciphers:

Mathematical Definition:

i ∈ {1, ..., 16}

L_i = R_i-1
R_i = (L_i-1 + f(R_i-1, k_i)) mod 2

i = The particular round, minus 1, that the left and right tranches of the data are at. L₀, for example, would be in the first round of the cipher. Since i has a minimum value of 1, the first round that can have the equations utilized on is round 2, since there is no past round data to feed into round 1.

L_i = The left tranche as it is in round i-1. If not in the first round, it will inherit the unencrypted form of the previous right tranche as its starting value in the round.

R_i = The right tranche as it is in round i-1. If not in the first round, it will inherit the XOR-ing of the left tranche of the previous round with the encrypted right tranche of the previous round as its starting position.

R_i-1 = The right tranche of the round prior to the current one, used by the f function (see below).

f = The mysterious encryption function, the central part of the entire Feistel scheme. Uses the right tranche and the subkey of a particular round to make an encryption. Explained in Rule 47.

k_i = The specific subkey used in the round. Each subkey is derived from the main key, explained (for DES) in Rule [[[[[.

mod 2 = The modulus/remainder operator with respect to 2. Numbers can only be 0 or 1 in this reality.

Explanation:

A Feistel Cipher/Feistel Network/Feistel Structure is a round-based algorithm used to encrypt data. Widely used among block ciphers (though not in the AES), several ciphers developed more optimized versions of the network, such as the DES. The original, standard, unadorned version of the Feistel Network is depicted below, while the DES-enhanced version can be seen in Rule 44.

A diagram of an individual round of the Feistel Network, almost totally unadorned of any additional characteristics specific to the DES system - see Rule 44 for that. The given values and lengths, such as the 48-bit subkey, are all specific to the DES varient.

The bulk of the modern ciphers currently in use were built using feistel ciphers and their varients. A particular advantage of the Feistel Network is that decryption and encryption are almost the same operation, simply using a reversed key schedule (see Rule [[[[[[[[). In terms of importance/usage in the cipher structure, Feistel Networks are the block cipher equivalent of the LFSR, although it actually encrypts the data and requires a key, instead of merely producing the key (like the LFSR).

Devoid of the more specialized characteristics (particular those relating to DES, described in Subsection IV.III), the system by which the "standard", unadorned Feistel Cipher encrypts data is simple:

The data path will be split into two parts: a Left and a Right tranche, L₀ and R₀ respectively. For a 64 bit cipher (like DES), each part will have 32 bits.

The Right tranche will have its bits fed into the f function, which is the actual encryption function of the entire operation.

Additionally, a subkey will also be fed into the f function: In each round of encryption, a new subkey derived from the main key is used, thus keeping the process deterministic and replicable, but complicating the process so much that discovery of the plaintext from knowledge of the ciphertext is made (most likely) impossible.

In the DES cipher, which differs from the more generalized standard feistel network described here and shown in the graphic, the subkey is created using a key schedule. See Rule [[[[[ for an explanation of how this is carried out.

The length of the subkey does not have to be exactly as long as tranche - in the DES, the subkey is 48 bits long, the purpose of which is explained in Rule [[[[[[[.

The output of the encryption of the right tranche will then be XOR'd with the left tranche (which had gone untouched until now).

For each round, the output of the previous round is fed in as input to the next round. The trick at this juncture is that the tranches will switch their positions in the new round: the right tranche will become the new left tranche (L₁), and the left tranche will become the new right tranche (R₁).

IMPORTANT NOTE: The Right tranche is sent to the left tranche of the next round in its unencrypted form. The Right tranche passes through unaltered to the left tranche; since the Left tranche of the first round got XOR'd with the encrypted version of the Right tranche, IT is the one being "encrypted" or simply changed in any meaningful way, not the right tranche.

IV.III Data Encryption Standard.

# Bijection: The mapping of every possible input uniquely to exactly one output, and vice versa. The DES (see Rule 44), for example, bijectively maps a 64-bit input block to a 64-bit output block using the principle of diffusion (see Rule 42): a difference in a single input bit will change the entire output block, thus making the blocks themselves bijectively mapped, but not the bits.

# C. Rule . Data Encryption Standard (DES):

Mathematical Definition:

[[[[[

Explanation:

The Data Encryption Standard is a block cipher that encrypts 64 bits at a time, outputting 64 bits of ciphertext in return, through bijective mapping (see definition). As a product cipher, DES uses 16 encryption rounds of an optimized form of the Feistel Network (depicted below) to thoroughly obfuscate and diffuse the impact of each bit, under the known elements of best-practice block ciphers (see Rule 42).

A diagram of an individual round of the DES-specific optimized version of the Feistel Network, complete with a representation of how the subkey is formed from a transform of the main key.

The key-length of DES is 56 bits (and thus has a keyspace of only 2^56 under the relation given in Rule 4), highly crackable, and so it is not in popular use today - it has been largely replaced by other ciphers with longer keys, such as 3DES ([[[[[) and AES ([[[[[). Still, it remains useful and neccesary to study the design of DES, which is modern Cryptography in its embryonic state. The subkeys used in DES for the Feistel Network are only 48 bits - see Rule [[[[ for why that is.

# C. Rule . History of the Data Encryption Standard (DES):

Proposed in 1974 by IBM (in coordination with the NSA), DES was the cryptographic standard for the U.S. government between 1977 and 1998, making it the most studied cipher in the world.

The DES was the first ever "secure" cryptosystem made public by a national government under the postulate of Kerckhoffs's principle (see Rule 3).

Today, there are still some varients of DES that see application: notably, 3DES, which was used in Firefox until 2021. While the original DES, as developed in 1974, is unsecure from Brute-Force attacks due to the short length of its key (56 bits), 3DES has a very long key, and as such is immune to such attacks.

# C. Rule . DES Internals #1 - Bit Permutation:

The first characteristic one may note about the DES Feistel Cipher (as depicted in the graphic of Rule 44) is that the operation undergone by data travelling through the cipher is IP(x), the Initial Permutation function. Undepicted is the very last operation in the encryption process, at the bottom of the cipher (which is 16 rounds deep), where the IP(x)⁻¹, the Final Permutation, would be located.

IP(x) and IP(x)⁻¹ are simple bit permutations, or bitwise permutations: the bits are scrambled and assigned to different placements in the data than they were before. This just the algorithmic term for a crosswiring. IP(x)⁻¹, being the inverse of IP(x), perfectly undoes the scrambling of IP(x), cancelling eachother out.

The purpose of the DES bit permutation functions is simple: while serving no cryptographic purpose (since the attacker can simply take the permutation into account, and because it is undone anyway), when DES was designed in 1974, the application of the cipher into hardcare necessitated cross-wiring (physical bit permutation) just to allow the data to enter the chip. The specific bit permutations used for the initial and final permutations (both depicted below) are done in order to, according to leading experts on the issue, "arrange the plaintext and ciphertext bits in a bytewise manner to make data fetches easier for 8-bit data buses", widely used in the '70s.

The Initial and Final Permutations in the DES cipher.

Thus, the bit permutations were merely placed in the DES for practical, electrical engineering purposes, devoid of cryptographic meaning. DES was initially designed purely for hardware, and the electrical engineering-devised permutation (no gates or special operations) made it very easy to be fabricated in hardware form, but somewhat tedious and more time-consuming in software.

# C. Rule . DES Internals #2 - The 'f' Function:

The f function, the heart of the Feistel Network (and thus the DES), acts similarly to the a PRNG, using the two input parameters R_i-1 (32 bits) and k_i (48 bits) to "encrypt" the right tranche. Below are the complete contents of the f function:

The complete F function of the DES-specific Feistel Network.

The first operation that will be performed will solely involve the first parameter: R_i-1. In order to account for the differences in length between the right tranche and the subkey, the right tranche will be expanded from 32 bits to 48 bits, using the special "Expansion function" or "Expansion box", also referred to as simply the "E-box". Notably, this function will provide diffusion to the cipher, as its effects (as described below) will increase the significance & influence of each individual bit of the plaintext with every additional round of encryption.

While the E-box also performs a basic permutation on the inputted bits, it differs from a standard bit permutation in that some of the inputted bits are connected to two bit positions, creating values for an expanded table. With the initial 32-bit string of the right tranche, 16 bits of data will be connected only once, while the other 16 bits will be connected twice. An example of how this can be done is seen below.

The complete F function of the DES-specific Feistel Network.

With the newly 48-bit right tranche, an XOR operation can be performed between the tranche and the subkey (which has its own special means of derivation, explained in Rule [[[[[[). This produces a 48-bit string that incorporates data from both parameters, and now all that is left to be done is to shrink the string back down to 32 bits so that it can be XOR'd later on with the data from the left tranche (outside of the f function).

After the XOR, the most important part, the heart of the f function, is performed: the S boxes. Complementary to the diffusion of the E-box, the S-box will provide the confusion.

The pattern the S-box follows is simple, and is shown in the complete diagram above: The newly-created 48-bit string will be split into 8 tranches of 6 bits. Each tranche will be sent to a Substitution Table, S₁ through S₈.

These tables will be akin to compression functions/compression boxes, as they will produce less output than input: Each table will output 4 bits from an input of 6, thus resulting in a net total of 32 bits, exactly what is needed.

The Compression Algorithm works as follows: Since each table accepts 6 bits, which are read in their combined, binary form, the table is simply taking the given, 6-bit value (0-63) and replacing it with a 4-bit value (0-15). The chosen 4-bit replacement value for each possible 6-bit value is assigned according to a set, predetermined substitution table, the purposeful and exact design of which was done to ensure protection against Differential Cryptanalysis ([[[[[[[[), which researchers at IBM discovered decades before the academics of the late '80s. For information as to what exact criteria the S-boxes were designed with in mind, see the relevant blue section below.

Note that EACH Substitution Table, S₁ through S₈, has its own, individual, substitution table. [[[[[[ The S₁ table is shown below; the rest of the tables can be found on page 3 of this here document.

The specific, exact table used in the first S-Box to replace the 6 bit string with a (max) 4 bit one. Chosen by IBM & the NSA to be differential cryptanalysis-proof.

In order to read the table, you must take the 6-bit value and examine it as follows:

For the Column Select, take the middle 4 values (the internal, non-edge values) and find their individual value in binary, separate from the rest of the string. This will give you a value 0-15.

For the Row Select, simply combine the left and right edge values of the 6-bit value in a single binary number, 0-3. This will serve as the row value.

All that needs to be done to find the 4-bit output (represented in decimal in the given table image, and all others) is to locate it on the table using the known column and row values. The value returned from properly finding the value from the specific column and row values.

All that remains within the f function is a simple bit permutation, in the same fashion (though different permutation patterns) as the one at the top and bottom of the DES cipher (described in Rule [[[[[).

# Most Significant Bit (MSB): The leftmost bit in a binary number - the largest value representation in the binary place value system. This varies from system to system.

# Least Significant Bit (LSB): The rightmost bit in a binary number (1, from 2^0) - the smallest value representation in the binary place value system.

# Design of DES S-Boxes:

The design of the S-boxes in the DES cipher, both the mechanism for determining the output and the tables of outputs themselves, was done by the NSA and IBM under the following specific criteria, ranging from the obvious to the esoteric:

Each S-box must have six input bits and four output bits, as to have an end total of 32 bits (to allow for XOR-ing with the left tranche).

No single output bit should be too close to a linear combination (Linear Algebra - [[[[[[) of the input bits.

If the lowest and the highest bits of the input are "fixed" (e.g., constant while the four middle values cycle through all possible incarnations) and the four middle bits are varied, each of the possible 4-bit output values must occur exactly once.

"For any nonzero 6-bit difference between inputs, no more than 8 of the 32 pairs of inputs exhibiting that difference may result in the same output difference."

If you know what the above words mean, since they seem (atleast to me) to be complete and utter nonsense, please contact me at jdlacabe@gmail.com. You will be credited (if you wish) on this page.

A collision (identical output values, known as a "Zero Difference" (since XOR-ing the values = 0)) of the outputs of the eight S-boxes must only be possible for a maximum of three adjacent S-boxes. Two adjacent colliding S-boxes would thus be fair game.

If it were possible for all 8 boxes to have a zero difference, then the S-box design would have a highly predictable structure.

Diffusion Requirements:

DIFFUSION: If two inputs to an S-box differ by exactly one bit, their outputs must differ by atleast two bits.

DIFFUSION: If two inputs to an S-box differ in two middle bits, their outputs must differ by atleast two bits.

DIFFUSION: If two inputs to an S-box differ in their first two bits, but are identical in their last two bits, the two outputs must be different.