Similar to Security Contest/2019/babyRSA/babyRSA_sol.ipynb


There are 4 options to choose:

  • Encrypt demo: The encryption that use a random generated AES key
  • Decrypt demo: The decryption of demo version
  • Encrypt pro: The encryption that add other information to the cipher encrypted by demo version
  • Decrypt pro: The decryption of pro version

Since the ciphertext of pro version using the base 64 of ciphertext of demo version as a substring, we can find the key by xoring the substring to the ciphertext at any position.

Firstly, we obtain the ciphertext of 'Plaintext is : ' + 'A' * 666 + hash('Plaintext is : ' + 'A' * 666) from the demo version, let say s1.
Secondly, we obtain the ciphertext of 'A' * 666 from the pro version, let say s2. Since the base64 has 3 characters as a block of 4 encoded ones, b64decode(s2) has either b64encode(s1) or b64decode('A' + s1) or b64decode('A' + s1) as a substring. By xoring s2 with any possible cyclic shifted string of these three, we could find the key which is the subarray that occurs many times in the xored string.

def findkey(s1, s2):
    KEYLEN = 80
    st2, keyc = b64decode(s2), None
    for l in range(3):
        st1 = b64encode(('A' * l + s1).encode())
        for j in range(len(st1)):
            k = [st1[(i + j) % len(st1)] ^ st2[i] for i in range(len(st2))]
            h = len([i for i in range(len(k) - KEYLEN) if k[i] == k[i + KEYLEN]])
            keyc = k if (h > m) else keyc
    for i in range(len(keyc) - KEYLEN):
        if keyc[i:i+KEYLEN] == keyc[i+KEYLEN:i+2*KEYLEN]:
    return keyc[i+KEYLEN-i%KEYLEN:i+2*KEYLEN-i%KEYLEN]

Now we could find the key in the encryption pro version. This process can be done locally because this key would not change. Next, we obtain the ciphertext of 'Plaintext is : Bae give me flag' + hash('Plaintext is : Bae give me flag') from the demo version, let's say s3. Then we obtain the ciphertext of 'something' from the pro version, let's say s4. Using the key, we can decrypt s4 to take the plaintext. It is a json type data. We just need to change the enc_data field of this plaintext to s3 and encrypt it again. Then the plaintext will have the string 'Bae give me flag'.


First of all, we have a class F3737:

class F3737(object):
    def __init__(self, components):
        self.components = list(map(reduce_modulo_37, components))
        assert len(self.components) == 37
    def __add__(self, other):
        return F3737(
            map(lambda x, y: x + y, self.components, other.components))
    def __mul__(self, scalar):
        return F3737(map(lambda x: x * scalar, self.components))
    def __eq__(self, other):
        return self.components == other.components
    def __str__(self):
        return str(self.components)
    def __repr__(self):
        return repr(self.components)

This class is a vector belonging to the Field ${\mathbb F}^{37}_{37}$.
For example: Let $v_1 = (0, 1, 2, ..., 36)$, $v_2 = (1, 1, 1, ..., 1)$
and $c = 3$ ($c$ is a scalar) then we have
The addition is: $v_1 + v_2 = (1, 2, 3, ..., 36, 0)$
The multiplication is: $c * v1 = (0, 3, 6, ..., 34)$

Next, we have a f3737 hash function:

def f3737_hash(input):
    if input == b'EVALUATE_TO_ZERO':
        return F3737([0] * 37)
    h = int(sha512(input).hexdigest(), 16)
    result = [0] * 37
    for i in range(37):
        result[i] = h % 37
        h = h // 37
    return F3737(result)

This function takes the sha512 value of the input. Then, it assigns $37$ last digist of the hash value written in base $37$. We can see that :

  • (1) It is hard to reverse the input with the given output.
  • (2) If the input is EVALUATE_TO_ZERO, then the output is $0$.

Let's denote this function by $f$ and go to another hash function:

def myhash(innput):
    i = 0
    result = F3737([0] * 37)
    while True:
        block = innput[BLOCKSIZE * i: BLOCKSIZE * (i + 1)]
        if not block:
        result += f3737_hash(block) * i
        i += 1
    return result

If $g$ is the function belonging to myhash and message $m$ can be divided into $m_0||m_1||...||m_k$ with 16-character string $m_i$, then
$$g(m)=\sum_{i=0}^{k}i * f(m_i)$$
Obviously, we can see that

  • (3) The 0-th block (37-th, 74-th, ... also) is evaluated to $0$
  • (3) The 1-st block (38-th, 75-th, ... also) is evaluated to $f(m_i)$
  • (3) The 2-nd block (39-th, 76-th, ... also) is evaluated to $2*f(m_i)$
  • ...

Coming to the challenge, we first see the generation and verification:

ALLOWED_LETTERS = string.ascii_letters + string.digits + '_'
def generate_password(size):
    rand = random.SystemRandom()
    password = ''.join(rand.choice(ALLOWED_LETTERS) for _ in range(size))
    password_hash_value = myhash(password.encode())
    return password, password_hash_value

def verify_password(password, hash_value):
    if all(c in ALLOWED_LETTERS for c in password):
        if myhash(password.encode()) == hash_value:
            return True
    return False

What we could see from these two functions is:

  • (4) The generation function takes a random string password (containing ascii letters, digits and underscore) with specified length size as input and returns it together with its hash value password_hash_value computed from myhash function.
  • (5) The verification function takes the string password and a hash value hash_value, then returns True if the hash_value is the value computed from myhash function with password as input and False otherwise.

Okay, now it's time to see what we need to get the flag:

def main():
    import signal
    for level in [1, 2, 3]:
        _, hash_value = generate_password(32 * level)
        print("Hash value: {}".format(hash_value))
            user_input_password = input("Your guess: ")
            if not verify_password(user_input_password, hash_value):
                raise Exception()
            print("Wrong password!")
    print("Congratulations! Here's the FLAG:")
    from secret import flag

There are three challenges and $10$ seconds for each. In i-th level, a hash_value is pre-computed for a $32*i$-character string. Then we have to find a string user_input_password that have the hash value equals to hash_value.

Let's come from the naive approach. Obviously, we needs to pass the first level. This level contains only two 16-character strings, let's call $str_1$ and $str_2$. From (3), we can see that the hash_value of this level is $f(str_2)$. It is only a block!! Then what we think next is to find $str_2$ from $z=f(str_2)$. From the beginning, may be you can not notice the characteristic (1) of the function $f$. But going by this way, you have to reverse it to obtain $str_2$. Assuming that if we could find $str_2$ by this way, then an input of SHA512 function could be found from the output which has $37$ known last digits in base $37$. This would make SHA512 not secure anymore so the characteristic (1) said that we should go another way to find flag.

It is also the time to ask whether we need to find the original input. If so then SHA512 would be not secure anymore as we discuss before. Since the input need not to be the original one, its length also could be different. This leads to the question that what if the input has really long length. There would be $37$ types of block (like (3)) and the sum of these block is $\sum_{36}^{i=0}i.a_{i}$. We want to have this value equals to the hash value:
Where we could have $a_i$? We can use the f3737_hash function to compute it. You can explore this way more, but with me this seems to be a hard problem the output of f3737_hash is considered as unpredictable. The only thing we got here is to have as many pre-computed $f(x)$ as we want.

Thinking a little deeper, what is the role of (2) in this problem? It could make any block immediately evaluated to zero. Using this way, we can put any pre-computed $f(x)$ at any block and EVALUATE_TO_ZERO (let's say null) to others. In other words, we can control the scalar multiplying the vector in the output. So what we need is to find the suitable scalar for pre-computed $f(x)$.

For example, if $x_i = f('ii...ii')$, then we have to find $c_i$ such that:
$$\sum_{i=0}^{k}c_i.x_i = z$$.
This can be written by:
x_{0,0} & x_{1,0} & ... & x_{k,0}\\
x_{0,1} & x_{1,1} & ... & x_{k,1}\\
... & ... & ... & ...\\
x_{0,36} & x_{1,36} & ... & x_{k,36}
Let's say $M.c=z$. Well, it is more clear now. We would choose $k=36$ and find $37$ pairwise independent pre-computed vector to solve these equations.
Taking $37$ first letter a..z, A..K, computing $f(a...a), ..., f(K...K)$ and arranging into a $37\times 37$ matrix by row, we will have $M$. $z$ is the hash_value taken from server. Using sage we can compute $c=M^{-1}.z$.
Finally, we need to map $c$ to the string.
This can be done by:

def findkey(v):
    str = ["EVALUATE_TO_ZERO"] * 37
    for i in range(len(v)):
        if v[i] == 0:
        k = v[i]
        while k < len(str) and str[k] != "EVALUATE_TO_ZERO":
            k += 37
        if k > len(str):    
            str += ["EVALUATE_TO_ZERO"] * 37
        str[k] = s[i]
    return str

For example:

  • $c = (1, 2, 3, 0, ..., 0) \rightarrow m = 'a...a' + 'b...b' + 'c...c' + null * 34$
  • $c = (1, 1, 0, 0, ..., 0) \rightarrow m = 'a...a' + null * 36 + 'b...b' + null * 36$
  • $c = (1, 3, 1, 0, ..., 0) \rightarrow m = 'a...a' + null + 'b...b' + null * 34 + 'c...c' + null * 36$

Then the challenge is solved.