Python Basics for Hackers
Review some Python basics to make hacking more convenient. Debugger giving you numbers (84) but you want letters ('T')?
This article covers Python basics that support hacking. They can help your static analysis with Ghidra, debugging with GDB and getting started with crypto hacking with Cryptopals. I do have many articles on hacking, too.
Feedback is the Breakfast of Champions
Try the smallest thing possible. Then add another small thing.
A fast feedback loop makes coding not just faster but also more pleasant. Imagine driving a car or a bike with your eyes closed, only getting a peek every five seconds.
Working in small increments is helped by
- REPL: 'python' or 'ipython'
- tries/ - Having a tries/ directory where you try every new library or feature separately from the main program
- F5 compile - Press F5 in your editor to compile and show result
REPL - Read-Eval-Print Loop
Write a command, press enter, see the result. Here, we talk about Python REPL, but so many things have a nice REPL:
- Python: 'python3', 'ipython3'
- JavaScript: Firefox ESR: F12 Console - for JavaScript. F12 Style Editor and Inspector - for CSS.
- SQL: sqlite3, psql, mysql, mariadb
- Many other languages: 'lua' "2+2", 'php --interactive' "print(2+2);"
Python built-in REPL
Run Python REPL
$ python3
>>> 2+2
4
>>> print("See you at TeroKarvinen.com!")
See you at TeroKarvinen.com!
Python REPL is a great calculator. Add a few parentheses, and on a physical pocket calculator (or a simulation of one, included with a lesser operating system) it's very hard to be sure of any answer. In the REPL, you see the whole expression. If your calculations result in something valuable, you can just package it as a program. If you need to bring in the heavy guns, Python has it all, from stats to AI. In fact, Python is the most popular language in artificial intelligence research.
IPython advanced Python REPL
IPython gives you additional features for Python REPL. It's the CLI (command line interface) version of Jupyter, which you might know from AI coding.
History
Give some commands to fill history
>>> 2+2
>>> print("See you at TeroKarvinen.com")
>>> 2**3
Up arrow moves earlier in history, down arrow moves back to now.
Ctrl-R reverse searches history
I-search backward: Tero
Shows
>>> print("See you at TeroKarvinen.com")
Press Ctrl-R again to search earlier uses. Just like in Bash shell.
Tab completion
Press tab to complete, double tab to show possible completions.
>>> import ba[tab][tab]
Shows menu of possible completions: do you want babel, backcall or base64.
>>> base64.[tab][tab]
Shows list of symbols provided by the library: functions, classes...
Help and Sources
Add a question mark "?" at the end of a symbol for reference documentation.
>>> import base64
>>> base64.b64decode?
Signature: base64.b64decode(s, altchars=None, validate=False)
Docstring:
Decode the Base64 encoded bytes-like object or ASCII string s.
...
Add two question marks "??" for source code.
>>> base64.b64decode??
Signature: base64.b64decode(s, altchars=None, validate=False)
Source:
def b64decode(s, altchars=None, validate=False):
# ...
s = _bytes_from_decode_data(s)
if altchars is not None:
altchars = _bytes_from_decode_data(altchars)
# ...
You can even get the source code of a whole library if you want. Thanks to docstrings, this includes the reference documentation.
>>> base64??
Quit
You can quickly quit with double Ctrl-D. Then you don't have to answer the question "Do you really want to quit?".
F5 compile
Sometimes your program does not conveniently fit on a single line. The next bigger environment from REPL is F5 to compile.
Write a short program that prints the result. Press F5 to immediately see the result.
F5 compile is just a convenience. It's faster than
$ micro tero.py
$ python3 tero.py
Hello, Tero
This tactic is especially convenient when working with small files, like those in tries/. But for a big program, too, you can write a small program to test a specific aspect, and simply import the libraries you have written.
Many programmers' text editors and IDEs (integrated development environments) have a feature to run your code with a click of a button. But here is how to do it in 'micro':
$ sudo apt-get update
$ sudo apt-get -y install micro
$ micro --plugin install runit
Now you can try it out
$ micro hello.py
Write some code
print("Hello!")
Press F5. See the output of your code
Hello!
Do you just love micro? I have a whole article on how to Get Started Micro Editor
Convert and Calculate in Python
Python is a great calculator. When your calculator has parentheses and powers (a**b), it's convenient to see the whole thing.
>>> 2*(6-9)**2
18
Modulus means what's left over after division. Often needed when something rolls over, like a clock: 13:00 equals 1 pm. (In Finnish: jakojäännös).
>>> 13 % 12
1
Modulus is very common in encryption and obfuscation. If you have a 3-character password, you roll over to the first character after the last one.
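Here is a small sketch of that rollover. The three-character key "abc" and the helper name key_char are made up for illustration. The index keeps growing, but the modulus keeps it inside the key.

```python
key = "abc"  # hypothetical 3-character password, just for illustration

def key_char(i):
    """Return the key character for position i, rolling over with modulus."""
    return key[i % len(key)]

# Position, position modulo key length, and the key character used
for i in range(7):
    print(i, i % len(key), key_char(i))
```

After position 2 ("c"), position 3 rolls over to "a" again.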
You can convert numbers to different representations
>>> ord("T")
84
>>> chr(84)
'T'
Did you know that the character zero "0" is different from the number zero 0? Ask Python if you don't believe me!
This is often faster than looking at the whole ASCII table
$ man ascii
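If you want to check the claim about the character "0" and the number 0, a quick sketch like this should do. It only uses the built-ins ord(), chr() and int().

```python
# The character "0" has its own code point, which is not zero
print(ord("0"))

# A str is never equal to an int, even if they look the same
print("0" == 0)

# Convert first, then compare
print(int("0") == 0)

# chr(0) is the unprintable NUL character, not the digit "0"
print(chr(0) == "0")
```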
Numbers can also be shown in different bases, from our everyday base ten to hex.
Hex is base 16, so individual digits vary from zero 0 to fifteen F. And sixteen is 0x10. The prefix "0x" is just to show it's hex, it's not part of the number.
>>> hex(84)
'0x54'
Binary is ones and zeros, base two. Prefixed "0b".
>>> bin(84)
'0b1010100'
Octal is base 8, so numbers are zero 0 to seven 7. Used in Linux numeric permissions, see 'man chmod' and 'stat tero.txt'.
>>> oct(84)
'0o124'
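Going the other way is also handy, for example when a debugger shows you hex and you want a plain number. A minimal sketch using the built-in int() with a base argument:

```python
# Parse text in a given base back to a number
print(int("54", 16))        # hex digits without prefix
print(int("0x54", 16))      # the prefix is accepted when it matches the base
print(int("0b1010100", 2))  # binary
print(int("0o124", 8))      # octal

# Base 0 asks Python to pick the base from the prefix
print(int("0x54", 0))

# Literals in any base are the same number
print(0x54 == 0b1010100 == 0o124 == 84)
```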
Printing and Strings
>>> print("See you at TeroKarvinen.com")
See you at TeroKarvinen.com
What if we have a variable
>>> url="TeroKarvinen.com"
Print can print multiple strings together
>>> print("See you at", url)
See you at TeroKarvinen.com
We can use concatenation to create a new string
>>> print("See you at "+url)
See you at TeroKarvinen.com
F-strings are so fancy, everyone loves them. You put f before your string, then it will print the result of anything you write inside whiskers / handlebars "{}"
>>> print(f"See you at {url}")
See you at TeroKarvinen.com
>>> print(f"Result is {2+2}")
Result is 4
F-strings also allow you to align numbers, a feature that we'll use in bitwise operations.
Looping shorter and longer
$ micro looping.py
for planet in ['Mercury', 'Venus', 'Earth', 'Mars']:
    print(planet)
Then just press F5 to print the rocky planets
Practical example: Obfuscation
Let's consider an obfuscation example. After playing with Ghidra or GDB, you have the hypothesis that every letter in this string has been moved by two positions backwards:
RcpmI_ptglcl,amk
To get the original string, we can
for c in "RcpmI_ptglcl,amk":
    print(chr(ord(c)+2), end="")
Which prints
TeroKarvinen.com
But you don't have to come up with the code in one go!
First try each of the things in IPython, like I did
>>> ord("R")
82
>>> ord("R")+1
83
>>> chr(ord("R")+1)
'S'
If you want to go all fancy, we can use list comprehension instead of a fully written out loop.
>>> [c for c in "RcpmI_ptglcl,amk"]
['R',
'c',
Well that was boring. So for each letter of the string, it printed that letter.
Let's take the previous letter
>>> [c-1 for c in "RcpmI_ptglcl,amk"]
TypeError: unsupported operand type(s) for -: 'str' and 'int'
Oh!
>>> [ord(c)-1 for c in "RcpmI_ptglcl,amk"]
[81, 98, 111, 108, 72, 94, 111, 115, 102, 107, 98, 107, 43, 96, 108, 106]
Of course, we can only subtract numbers from numbers.
>>> [chr(ord(c)+2) for c in "RcpmI_ptglcl,amk"]
['T',
'e',
'r',
'o',
Looks good.
Now it returns a list.
We could join the items of a list with a separator string
>>> "-".join(["one", "two", "three"])
'one-two-three'
Combine all items of the list, putting nothing in between (the string "").
>>> "".join([chr(ord(c)+2) for c in "RcpmI_ptglcl,amk"])
'TeroKarvinen.com'
List comprehensions are nice for playing with REPL. For more difficult cases, a fully written out loop in a text file is more convenient.
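If you are not sure about the "two positions" hypothesis, one option is to wrap the one-liner in a function and try a few shifts. This is just a sketch; the function name shift and the range of shifts tried are my own choices for illustration.

```python
def shift(text, n):
    """Move every character of text by n code points."""
    return "".join(chr(ord(c) + n) for c in text)

obfuscated = "RcpmI_ptglcl,amk"

# Try a handful of shifts and eyeball which one looks like real text
for n in range(-3, 4):
    print(n, shift(obfuscated, n))
```

The line for shift 2 should be the readable one.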
Assert your beliefs
assert: Please crash my program if my expectations don't hold.
Assert is for programmers. In many languages, compilers skip "assert" in final production builds. So you'll still need to use if and raise exceptions in production.
But it's often pointless to go on if your beliefs don't hold. As they say: "No use beating a dead horse."
assert False
Results in
File "assertive.py", line 1, in <module>
assert False
AssertionError
Python does not have strict type checking. Sometimes it's convenient to check that you even know what types you're playing with.
url = "TeroKarvinen.com"
assert type(url) == int
Results in
assert type(url) == int
^^^^^^^^^^^^^^^^
AssertionError
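For comparison, here is a sketch of the two styles side by side: assert for your own beliefs while developing, and if with raise for checks that must survive in production. Python drops assert statements when run with 'python3 -O'. The helper name check_url is hypothetical.

```python
def check_url(url):
    """Hypothetical helper: validate input without relying on assert."""
    if not isinstance(url, str):
        raise TypeError(f"url must be str, got {type(url).__name__}")
    return url

url = "TeroKarvinen.com"

# Developer check: passes silently when the belief holds
assert isinstance(url, str)
print(check_url(url))

# Production style check: still works even if asserts are stripped
try:
    check_url(84)
except TypeError as e:
    print("Rejected:", e)
```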
Debugging
Print debugging
Print debugging is easy and fast. It works very nicely with F5 compile (described above).
Taking our previous example, if url had been provided by some library, the type might not be obvious.
url = "TeroKarvinen.com"
print(type(url)) ## print debug type
print(url) ## print debug contents
Prints
<class 'str'>
TeroKarvinen.com
So it's a string.
Breakpoints
For complicated cases, you might want a real debugger.
Your friend gave you this
ret=""
for i in range(0, 4):
    for j, c in enumerate("foobar"):
        ret += chr(ord(c)+i)
print(ret)
So you might want to play with it in the debugger. Just call breakpoint() wherever you want.
Python has the debugger built in. For all the fancy IPython features like tab completion, you need to set it up. 'sudo apt-get install python3-ipdb ipython3', 'export PYTHONBREAKPOINT=ipdb.set_trace'.
ret=""
for i in range(0, 4):
    for j, c in enumerate("foobar"):
        ret += chr(ord(c)+i)
        breakpoint() # debugger will be called here
print(ret)
Debugger starts. Things you can do
- Run any Python. Your environment is exactly that point in code. In complicated environments, such as Django framework, this is very convenient.
- Access local variables. You can just type variable name. Here, my variables overlap debugger commands, so I had to prefix them with an exclamation mark "!".
- See the code you're running.
A brief debugger session
> /home/tee/Roinaa/pytry/loopy.py(3)<module>()
2 for i in range(0, 4):
----> 3 for j, c in enumerate("foobar"):
4 ret += chr(ord(c)+i)
ipdb> !c
'o'
ipdb> !i
0
ipdb> !j
1
ipdb> s
> /home/tee/Roinaa/pytry/loopy.py(4)<module>()
3 for j, c in enumerate("foobar"):
----> 4 ret += chr(ord(c)+i)
5 breakpoint()
ipdb> help
Other options helping debugging are
- logging, a full featured logging module in standard library
- PySnooper, trace Python. https://github.com/cool-RR/pysnooper
My advice? Use print debugging if your program is short and your problem is simple.
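When print debugging starts to clutter a longer program, the logging module mentioned above is the next step. A minimal sketch, assuming you want debug messages on the terminal. The function rotate is a made-up example, not from any library.

```python
import logging

# Show everything from DEBUG level up; messages go to stderr by default
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")

def rotate(text, n):
    """Hypothetical example function: shift every character by n."""
    logging.debug("rotate(%r, %r)", text, n)
    result = "".join(chr(ord(c) + n) for c in text)
    logging.debug("result=%r", result)
    return result

print(rotate("foobar", 1))
```

Raise the level to logging.WARNING and the debug lines go quiet without deleting them from the code.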
Loopier loops
You can basically sort people into 10 groups. Those who understand binary, those who don't, and those who make off-by-one errors.
Luckily, many languages have the concept of for-each aka for-in loop. Python is one of them. As it automatically does something for each item in the list, you can't easily go off by one.
for x in [1, 2, 3]:
    print(x)
1
2
3
You can quickly turn a string (str) to list with split:
>>> "foo bar".split()
['foo', 'bar']
You might need an index (i) in a loop. The index tells how many times the loop has been completed this far. enumerate() gives you index with for-in loop.
>>> list(enumerate(['foo', 'bar']))
[(0, 'foo'), (1, 'bar')]
For example
for i, s in enumerate(["Tero", "Karvinen", "com"]):
    print(i, s)
0 Tero
1 Karvinen
2 com
Data types
Python has nice data types. Python does not enforce data types. You can even change the data type of the same variable. This is different from Go, Rust and C.
Simple data types in Python include int and float. Compound data types include list, dict, str and bytes. Read Python documentation for an overview of the data types. Here, we'll look at some data types that could be a source of confusion when starting out with crypto challenges.
Str is the fancy datatype. It supports unicode, so you can have Finnish and Chinese letters in your string.
>>> myStr = "Päivää"
>>> type(myStr)
str
How many bytes does "P" take? What about the Finnish "ä", the a with umlaut. Who knows? Are there even any promises, or is it just an implementation detail?
Bytes literally stores bytes. You know, bytes: one byte == 1 B == 1 byte == 8 bits. In Finnish, byte is "tavu".
>>> myBytes = b"five"
>>> type(myBytes)
bytes
>>> len(myBytes)
4
>>> myBytes = b"Päivää"
Cell In [52], line 1
myBytes = b"Päivää"
^
SyntaxError: bytes can only contain ASCII literal characters
When learning to break encryption or obfuscation, you often need bytes.
You can encode strings (str) to bytes, but then you have to decide what to do with the funny characters not available in ASCII.
>>> "Päivää".encode("ascii", "ignore")
b'Piv'
The solution to all and any problems with national chars is UTF-8.
>>> "Päivää".encode("utf8")
b'P\xc3\xa4iv\xc3\xa4\xc3\xa4'
You can also decode bytes to strings (str)
>>> b"Tero's bytes".decode("ascii")
"Tero's bytes"
>>> type(b"Tero's bytes".decode("ascii"))
str
Let's play with encoding to bytes, and decoding to strings
>>> "Päivää".encode("utf8").decode("utf8")
'Päivää'
>>> type("Päivää".encode("utf8").decode("utf8"))
str
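Back to the question of how many bytes a letter takes. Once you pick an encoding, you can simply measure it. The sketch below assumes UTF-8. It also shows a detail that is useful in crypto challenges: indexing or looping over bytes gives you integers.

```python
text = "Päivää"
data = text.encode("utf8")

print(len(text))  # number of characters in the str
print(len(data))  # number of bytes after UTF-8 encoding

# "P" fits in one byte, "ä" needs two in UTF-8
print(len("P".encode("utf8")), len("ä".encode("utf8")))

# Bytes are a sequence of integers 0..255
print(data[0])
print(list(b"five"))
```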
Bitwise operations
Encryption uses a lot of XOR, exclusive OR. It lets you encrypt something with a key, and then get your original back by XORing with the same key again.
XOR means exactly one of the inputs is true. The truth table is
a | b | a XOR b | comment |
---|---|---|---|
0 | 0 | 0 | |
1 | 0 | 1 | |
1 | 1 | 0 | Here XOR is different from OR |
0 | 1 | 1 | |
Let's try it
>>> 0b1 ^ 0b1
0
>>> 0b0 ^ 0b1
1
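You can also let Python go through the whole truth table. This small sketch additionally checks the "get your original back" property: XOR the result with b again and you should be back at a.

```python
for a in (0, 1):
    for b in (0, 1):
        encrypted = a ^ b          # a is the "message", b is the "key"
        decrypted = encrypted ^ b  # XOR with the same key again
        print(a, b, encrypted, decrypted)
        assert decrypted == a      # crash if my belief does not hold
```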
Obviously, XOR itself only works on boolean values, True or False, 1 or 0. But my message says "Hugs and kisses"!
We can use bitwise operations. That means doing the operation on each bit of the input.
>>> "T" ^ "X"
TypeError: unsupported operand type(s) for ^: 'str' and 'str'
>>> ord("T")
84
>>> ord("T") ^ ord("X")
12
>>> bin(ord("T") ^ ord("X"))
'0b1100'
Seeing the bits aligned would help. So let's try it
a = ord("T")
b = ord("X")
print(f"{bin(a)} a")
print(f"{bin(b)} b")
print(f"{bin(a)} a bitwise XOR b".replace("a bitwise XOR b", "a") if False else f"{bin(a^b)} a bitwise XOR b")
Prints
0b1010100 a
0b1011000 b
0b1100 a bitwise XOR b
What? The stupid bits are not aligned at all!
Do you still remember f-strings? We can align the numbers, so we can check if XORing one digit of a and one of b gives the result we see in the last line.
a = ord("T")
b = ord("X")
print(f"{bin(a):>10} a")
print(f"{bin(b):>10} b")
print(f"{bin(a^b):>10} a bitwise XOR b")
Prints
 0b1010100 a
 0b1011000 b
    0b1100 a bitwise XOR b
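To XOR a whole message, you can work on bytes, one byte at a time. This is only a sketch: the single-byte key "X" is picked for illustration, and it's not a secure way to encrypt anything. The last lines show another way to align bits, zero-padding with the format spec ":08b".

```python
message = b"Hugs and kisses"
key = ord("X")  # hypothetical single-byte key

# XOR every byte of the message with the key
encrypted = bytes(byte ^ key for byte in message)

# XOR with the same key again to get the original back
decrypted = bytes(byte ^ key for byte in encrypted)

print(encrypted)
print(decrypted)

# Zero-padded binary keeps the bits aligned without the "0b" prefix
first = message[0]
print(f"{first:08b} first byte of message")
print(f"{key:08b} key")
print(f"{first ^ key:08b} first byte XOR key")
```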
Sort a list
When breaking encryptions, you might need to try different keys and different parameters.
A computer can do this very fast, but who will decide if the answer is any good? You should score the answers.
hobbies = [ (-5, "filling paperwork"), (100, "coding"), (90, "jogging"), (80, "kayaking") ]
hobbies.sort(reverse=True)
print(hobbies)
print("Top 3 hobbies are", hobbies[:3])
print("The best hobby is", hobbies[0])
Prints
[(100, 'coding'), (90, 'jogging'), (80, 'kayaking'), (-5, 'filling paperwork')]
Top 3 hobbies are [(100, 'coding'), (90, 'jogging'), (80, 'kayaking')]
The best hobby is (100, 'coding')
Some letters are common in human language, like "A" or "E". Some are very rare, like unprintable ASCII characters '\0' or 0x2.
Frequency tables in your favourite languages are easy to remember as words. You might have used frequency analysis to break very simple substitution ciphers. So simple that they are found in children's books. You might be surprised to find out ETAOIN can help you break some harder encryptions, too.
"ETAOIN SHRDLU" are the most common letters in English. "E" is the most common, "T" is the second most common and so on.
- ETAOIN SHRDLU (English)
- AINTE SLOUK (Finnish)
More frequency tables can be found in the Wikipedia article Letter frequency. Near the end of the page, search for French "esaitn".
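Putting scoring and sorting together could look something like this. It's a rough sketch under my own assumptions: the score simply counts characters that appear in "etaoin shrdlu", with no weighting, and the candidates are the obfuscated string from earlier shifted by 0 to 4 positions.

```python
COMMON = "etaoin shrdlu"  # common English letters and space

def score(text):
    """Rough score: how many characters are among the common ones."""
    return sum(1 for c in text.lower() if c in COMMON)

obfuscated = "RcpmI_ptglcl,amk"

# Candidate answers: the same string shifted by 0..4 positions
candidates = ["".join(chr(ord(c) + n) for c in obfuscated) for n in range(5)]

# Pair each candidate with its score, best first
scored = [(score(c), c) for c in candidates]
scored.sort(reverse=True)

print("The best candidate is", scored[0])
```

A better scorer would give more points to "E" than to "U", but even this crude count puts the readable candidate on top here.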
Libraries
Some useful libraries to consider:
- requests - download web pages
- binascii - convert hex text (b2a_hex)
- base64 - put binary to ASCII armor, so that it can be copy-pasted. "VGVyb0thcnZpbmVuLmNvbQ=="
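A quick taste of base64 and binascii from the standard library. This sketch sticks to functions I'm sure exist: base64.b64encode, base64.b64decode, binascii.b2a_hex, binascii.a2b_hex and the built-in bytes.fromhex. I left out requests here, as it's a separate install and needs a network.

```python
import base64
import binascii

# Binary to ASCII armor and back
encoded = base64.b64encode(b"TeroKarvinen.com")
print(encoded)
print(base64.b64decode("VGVyb0thcnZpbmVuLmNvbQ=="))

# Bytes to hex text and back
hex_text = binascii.b2a_hex(b"Tero")
print(hex_text)
print(binascii.a2b_hex(hex_text))

# bytes.fromhex() does the same conversion without an import
print(bytes.fromhex("5465726f"))
```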
Some useful commands
$ echo -n "VGVyb0thcnZpbmVuLmNvbQ=="|base64 -d
$ echo -n "TeroKarvinen.com"|base64
$ man ascii
I want to try it out!
I recommend CryptoPals.
Start with Set 1, Challenge 1. Tip: You can use Python libraries for base64 and hex strings, you don't need to handle them bit by bit. These tools are used so you can use binary data in text format.