MIT Missing Semester 2020 Course

Missing Semester 2020

Notes from the MIT Missing Semester 2020 Course covering computing ecosystem literacy.

  1. Course overview + the shell
  2. Shell Tools and Scripting
  3. Editors (Vim)
  4. Data Wrangling
  5. Command Line Environment
  6. Version Control (Git)
  7. Debugging and Profiling
  8. Metaprogramming
  9. Security & Cryptography
  10. Random Topics

1. Course overview + the shell

The basics

2. Shell Tools and Scripting

The basics

STDOUT, STDERR and Return Code

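Commands write their output to STDOUT and their errors to STDERR, and report success or failure through a return code (0 means success, anything else means failure). The && and || operators short-circuit on that return code, while ; runs the next command regardless.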

false || echo "Oops, fail"
# Oops, fail

true || echo "Will not be printed"
#

true && echo "Things went well"
# Things went well

false && echo "Will not be printed"
#

false ; echo "This will always run"
# This will always run
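
The same short-circuit logic can be sketched in Python, where the return code of a child process is exposed as `returncode`. This is only an illustration: the helper name `run` is hypothetical, and `sys.executable` stands in for the `true` and `false` commands so the sketch does not depend on them being installed.

```python
import subprocess
import sys

def run(exit_code):
    """Run a child process that exits with exit_code and return its return code."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({exit_code})"])
    return child.returncode

messages = []

# false || echo "Oops, fail": the right side runs only when the left side fails
if run(1) != 0:
    messages.append("Oops, fail")

# true && echo "Things went well": the right side runs only when the left side succeeds
if run(0) == 0:
    messages.append("Things went well")

print(messages)
```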

Command and Process Substitution

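To capture the output of a command in a variable, use command substitution, i.e. $(CMD). Process substitution, <(CMD), instead runs CMD, writes its output to a temporary file and substitutes the name of that file, which is useful for commands that expect file names rather than STDIN.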

Special Variables

[bash special variables](https://www.tldp.org/LDP/abs/html/special-chars.html)

e.g. iterate through the arguments we provide, grep for the string foobar and append it to the file as a comment if it’s not found

#!/bin/bash

# command substitution
echo "Starting program at $(date)" # Date will be substituted

# special variables: $0 program name, $$ pid of script
echo "Running program $0 with $# arguments with pid $$"

# for file in all arguments (quoted so that paths with spaces stay intact)
for file in "$@"; do

    # grep for foobar in file and redirect STDOUT and STDERR to /dev/null
    grep foobar "$file" > /dev/null 2> /dev/null
    # if pattern is not found, grep has exit status 1 ($? is previous code)
    if [[ $? -ne 0 ]]; then
        echo "File $file does not have any foobar, adding one"
        echo "# foobar" >> "$file"
    fi
done
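
For comparison, the loop above could be written roughly as follows in Python. This is a sketch, not part of the course material: the function name `ensure_foobar` is made up, it does a plain substring test rather than a regular-expression match, and, unlike the shell script, it raises an error for a file that does not exist instead of creating it.

```python
from pathlib import Path

def ensure_foobar(paths, pattern="foobar"):
    """Append a '# foobar' comment to every file that does not contain pattern.

    Returns the list of files that were modified.
    """
    modified = []
    for path in map(Path, paths):
        if pattern not in path.read_text():
            # Same effect as: echo "# foobar" >> "$file"
            with path.open("a") as handle:
                handle.write(f"# {pattern}\n")
            modified.append(str(path))
    return modified
```

Calling `ensure_foobar(sys.argv[1:])` after `import sys` would mirror the `for file in "$@"` loop.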

Other Commands

3. Editors (Vim)

Modes:

Terminology

[map Caps Lock to Escape](https://vim.fandom.com/wiki/Map_caps_lock_to_escape_in_macOS)

Command Mode

- :q quit (close window)
- :w save (“write”)
- :wq save and quit
- :e {name of file} open file for editing
- :ls show open buffers
- :help {topic} open help
    - :help :w opens help for the :w command
    - :help w opens help for the w movement

Normal Mode

- Basic movement: hjkl (left, down, up, right)
- Words: w (next word), b (beginning of word), e (end of word)
- Lines: 0 (beginning of line), ^ (first non-blank character), $ (end of line)
- Screen: H (top of screen), M (middle of screen), L (bottom of screen)
- Scroll: Ctrl-u (up), Ctrl-d (down)
- File: gg (beginning of file), G (end of file)
- Line numbers: :{number}<CR> or {number}G (line {number})
- Misc: % (corresponding item)
- Find: f{character}, t{character}, F{character}, T{character}
    - find/to forward/backward {character} on the current line
    - , / ; for navigating matches
- Search: /{regex}, n / N for navigating matches

Selection Mode

Edits (normal mode)

Modifiers

Customize and Extend vim

4. Data Wrangling

sed (stream editor)

Regex Recap:

Additional CLI commands:

awk

Another stream editor for processing text streams. For each row that matches a pattern, awk can process the entire row ($0) or individual fields ($1 to $n). Fields are separated by whitespace by default; use -F to change the separator.

e.g. print the second field of each row where the first field is 1 and the second starts with h and ends in d, then count the number of lines:

| awk '$1 == 1 && $2 ~ /^h[^ ]*d$/ { print $2 }' | wc -l

Note: awk is a full programming language, e.g.

BEGIN { rows = 0 }
$1 == 1 && $2 ~ /^c[^ ]*e$/ { rows += $1 }
END { print rows }

Also useful with mathematical operations
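
To make the field logic concrete, here is a rough Python equivalent of the awk one-liner that counts rows whose first field is 1 and whose second field starts with h and ends in d. The function name and sample rows are invented for illustration, and the first field is compared as a string, which is a simplification of awk's numeric comparison.

```python
import re

def count_matching_rows(lines):
    """Count rows whose first field is 1 and whose second field matches ^h[^ ]*d$."""
    pattern = re.compile(r"^h[^ ]*d$")
    count = 0
    for line in lines:
        fields = line.split()  # like awk, split on runs of whitespace
        # Simplification: awk compares $1 numerically, this compares the text "1"
        if len(fields) >= 2 and fields[0] == "1" and pattern.match(fields[1]):
            count += 1
    return count

sample = ["1 hand", "1 hello", "2 head", "1 hd extra"]
print(count_matching_rows(sample))
```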

5. Command Line Environment

Job Control

Your shell uses a UNIX communication mechanism called a signal to communicate information to a process. When a process receives a signal it stops its execution, deals with the signal and potentially changes the flow of execution based on the information that the signal delivered. For this reason, signals are software interrupts. e.g. ^C sends a SIGINT to the process.

A process can also install a handler for SIGINT so that ^C no longer stops it:

#!/usr/bin/env python3
import signal, time

# signal handlers receive the signal number and the current stack frame
def handler(signum, frame):
    print("\nI got a SIGINT, but I am not stopping")

signal.signal(signal.SIGINT, handler)
i = 0
while True:
    time.sleep(.1)
    print("\r{}".format(i), end="")
    i += 1

To stop this program you now need to send SIGQUIT instead, by typing ^\ (Ctrl-\).

NOTE: SIGINT and SIGQUIT are usually associated with terminal-related requests, but a more generic signal for asking a process to exit gracefully is SIGTERM. To send this signal we can use the kill command, with the syntax kill -TERM <PID>.

     1       HUP (hang up)
     2       INT (interrupt)
     3       QUIT (quit)
     6       ABRT (abort)
     9       KILL (non-catchable, non-ignorable kill)
     14      ALRM (alarm clock)
     15      TERM (software termination signal)
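
As a small illustration of sending SIGTERM from a program rather than from the shell, the sketch below starts a throwaway child process and terminates it. It assumes a POSIX system; the sleeping child is just a stand-in for a real long-running job.

```python
import signal
import subprocess
import sys

# Start a child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Equivalent to running `kill -TERM` on the child's pid from the shell.
child.send_signal(signal.SIGTERM)
child.wait(timeout=10)

# On POSIX systems a negative return code -N means "terminated by signal N".
print(child.returncode == -signal.SIGTERM)
```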

Pausing and backgrounding processes

^Z sends SIGTSTP (the terminal's version of SIGSTOP), which pauses a process. The paused process can then be continued in the foreground or in the background using fg or bg, respectively.

Note: all processes created are associated with your terminal and will die when the terminal is closed i.e. process receives a SIGHUP signal.
To prevent that from happening you can run the program with nohup (a wrapper to ignore SIGHUP), or use disown if the process has already been started.

Finally, SIGKILL (which cannot be caught by the process) will always terminate it immediately. However, it can have bad side effects such as leaving orphaned child processes.

Terminal Multiplexers

Terminal multiplexers let you detach from a terminal session and reattach to it later. They are ideal for remote machines, since they avoid the need for nohup and similar tricks.

tmux

tmux tutorial

Aliases

Dotfiles and Portability

Remote Machines

Port Forwarding

Credit to this stackoverflow post.

Local Port Forwarding

e.g. software on the remote server is listening on port 5000 and we want to reach it through port 8888 on our local machine:

ssh -L 8888:localhost:5000 foobar@remote_server

-L specifies that the given port on the local (client) host is to be forwarded to the given host and port on the remote side.

ssh -L sourcePort:forwardToHost:onPort connectToHost means: connect with ssh to connectToHost, and forward all connection attempts to the local sourcePort to port onPort on the machine called forwardToHost, which can be reached from the connectToHost machine.

Remote Port Forwarding

-R specifies that the given port on the remote (server) host is to be forwarded to the given host and port on the local side.

ssh -R sourcePort:forwardToHost:onPort connectToHost means: connect with ssh to connectToHost, and forward all connection attempts to the remote sourcePort to port onPort on the machine called forwardToHost, which can be reached from your local machine.
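
The argument order is easy to get backwards, so the following sketch spells it out as a function that builds the command line. The helper `forward_command`, the host name and the port numbers are all hypothetical; only the `sourcePort:forwardToHost:onPort connectToHost` layout comes from the description above.

```python
def forward_command(flag, source_port, forward_to_host, on_port, connect_to_host):
    """Build `ssh <flag> sourcePort:forwardToHost:onPort connectToHost` as an argument list."""
    if flag not in ("-L", "-R"):
        raise ValueError("flag must be -L (local) or -R (remote)")
    return ["ssh", flag, f"{source_port}:{forward_to_host}:{on_port}", connect_to_host]

# Hypothetical example: connections to local port 9000 are forwarded to
# port 8000 on the server itself (localhost as seen from the server).
print(" ".join(forward_command("-L", 9000, "localhost", 8000, "user@example-host")))
```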

SSH Configurations

~/.ssh/config

Host vm
    User foobar
    HostName 172.16.174.141
    Port 2222
    IdentityFile ~/.ssh/id_ed25519
    RemoteForward 9999 localhost:8888

# Configs can also take wildcards
Host *.mit.edu
    User foobaz

Server-side configuration is usually specified in /etc/ssh/sshd_config. Here you can make changes like disabling password authentication, changing ssh ports, enabling X11 forwarding, etc. You can also specify config settings on a per-user basis.

6. Version Control (Git)

. (tree)
├── foo (tree)
│   └── bar.txt (blob, contents = "hello world")
└── baz.txt (blob, contents = "git is wonderful")

// a file is a bunch of bytes
type blob = array<byte>

// a directory contains named files (blobs) and directories (trees)
type tree = map<string, tree | blob>

// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}

NOTE: Commits are immutable i.e. “edits” to the commit history are actually creating entirely new commits, and references are updated to point to the new ones.

Objects and content-addressing

type object = blob | tree | commit

All objects are content-addressed by their SHA-1 hash i.e. 40 hexadecimal characters.

objects = map<string, object>

def store(object):
    id = sha1(object)
    objects[id] = object

def load(id):
    return objects[id]

Blobs, trees, and commits are unified in this way: they are all objects. When they reference other objects, they contain the other objects’ reference.
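
The pseudocode above can be turned into a small runnable sketch with Python's `hashlib`. It assumes git's convention of hashing a `<type> <size>` header followed by a NUL byte and then the content; real git also compresses objects on disk, which is left out here, and the in-memory `objects` dictionary is only a stand-in for the object database.

```python
import hashlib

objects = {}  # maps object id (40 hex characters) -> raw bytes

def store(kind, content):
    """Store content under the SHA-1 of a git-style header plus the content."""
    data = f"{kind} {len(content)}".encode() + b"\0" + content
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = data
    return object_id

def load(object_id):
    """Return the raw bytes stored under object_id."""
    return objects[object_id]

blob_id = store("blob", b"git is wonderful\n")
print(blob_id, len(blob_id))
```

Storing the same content twice yields the same id, which is what lets identical files share a single object.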

e.g. observe a tree reference git cat-file -p 698281bc680d1995c5f4caaf3359721a5a58d48d which contains a blob (baz.txt) and a dir (foo)

100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85    baz.txt
040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87    foo

Then observe the contents of the blob with the hash git cat-file -p 4448adbf7ecd394f42ae135bbeed9676e894af85

git is wonderful

References

All snapshots can be identified by their SHA-1 hash. That’s inconvenient, because humans aren’t good at remembering strings of 40 hexadecimal characters. Git’s solution to this problem is human-readable names for SHA-1 hashes, called references.

References are pointers to commits. Unlike objects, which are immutable, references are mutable (can be updated to point to a new commit). For example, the master reference usually points to the latest commit in the main branch of development.

references = map<string, string>

def update_reference(name, id):
    references[name] = id

def read_reference(name):
    return references[name]

def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)

One detail is that we often want a notion of “where we currently are” in the history, so that when we take a new snapshot, we know what it is relative to. This is a special reference called “HEAD”.
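
A minimal sketch of how mutable references and HEAD interact, assuming HEAD is stored as a pointer to a branch name (the usual case in git) rather than directly to a commit. The commit ids are shortened, made-up values, and `resolve` is a hypothetical helper.

```python
references = {}  # maps a human-readable name -> commit id (or another reference name)

def update_reference(name, target):
    references[name] = target

def resolve(name_or_id):
    """Follow references until something that is not a reference name is reached."""
    seen = set()
    while name_or_id in references and name_or_id not in seen:
        seen.add(name_or_id)
        name_or_id = references[name_or_id]
    return name_or_id

# Hypothetical, shortened commit ids for illustration only.
update_reference("master", "a1b2c3")
update_reference("HEAD", "master")    # HEAD points at the current branch
update_reference("master", "d4e5f6")  # a new commit moves master; HEAD follows
print(resolve("HEAD"))
```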

Advanced Git

7. Debugging and Profiling

system log

Recently, systems have started using a system log, which is increasingly where all of your log messages go. systemd: a system daemon that controls many things in your system such as which services are enabled and running.

logger "Hello Logs"
# On macOS
log show --last 1m | grep Hello
# On Linux
journalctl --since "1m ago" | grep Hello

Debugger

ipdb commands:

Static Analysis Tools (preliminary check if function will fail)

(CPU) Profiling (Tracing vs Sampling)

A note on Time

e.g.

$ time curl https://missing.csail.mit.edu &> /dev/null
real    0m2.561s
user    0m0.015s
sys     0m0.012s
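
In the output above, real (wall-clock time) is far larger than user plus sys (CPU time), because curl spends most of its time waiting on the network. The same distinction can be observed from inside a Python program; the sketch below uses a short sleep as a stand-in for waiting on I/O.

```python
import time

wall_start = time.perf_counter()  # wall-clock time, comparable to "real"
cpu_start = time.process_time()   # CPU time of this process, comparable to "user" + "sys"

time.sleep(0.2)                     # waiting: wall-clock time passes, almost no CPU is used
sum(i * i for i in range(200_000))  # computing: both clocks advance

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall {wall_elapsed:.3f}s, cpu {cpu_elapsed:.3f}s")
```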

Flame Graph

Call Graph

Resource Monitoring

8. Metaprogramming

Processes surrounding the work involved in programming, i.e. how systems are built and tested and how they handle dependencies.

Build Systems

Encoding a list of commands to build your program into a tool e.g. make

paper.pdf: paper.tex plot-data.png
	pdflatex paper.tex

plot-%.png: %.dat plot.py
	./plot.py -i $*.dat -o $@

NOTE: make only rebuilds a target when one of its dependencies has changed since the target was last built.
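
The rebuild decision can be approximated in a few lines of Python by comparing file modification times, which is the criterion make uses by default. The function name `needs_rebuild` is invented for this sketch, and it ignores everything else make does (pattern rules, transitive dependencies, phony targets).

```python
import os

def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For the rule above, `needs_rebuild("paper.pdf", ["paper.tex", "plot-data.png"])` would be the check that decides whether pdflatex runs.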

Dependency Management

Dependencies are available through a repository that hosts a large number of such dependencies in a single place, and provides a convenient mechanism for installing them.

Versioning

Semantic Versioning: major.minor.patch
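
As a sketch of why the three-part format is useful, versions can be parsed into integer tuples and compared numerically, and a change in the major number can be treated as a compatibility break. The helper names are hypothetical, and pre-release tags and the special rules for 0.x versions are ignored.

```python
def parse_version(text):
    """Split 'major.minor.patch' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

def is_compatible(required, available):
    """Same major version, and available is at least as new as required."""
    return available[0] == required[0] and available >= required

# Tuples compare numerically, so 1.10.0 is correctly newer than 1.9.3.
print(parse_version("1.10.0") > parse_version("1.9.3"))
```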

lock file

A file that lists the exact version of each dependency you currently depend on.

vendoring

Extreme version of dependency locking where you copy all the code of your dependencies into your own project.

Continuous integration systems

For additional tasks required when changes are made e.g.

Broad CI Platforms e.g. Travis CI, Github Actions etc.
Narrow CI Platforms e.g. CodeCov, Dependabot etc.

By far the most common one is a rule like “when someone pushes code, run the test suite”. When the event triggers, the CI provider spins up a virtual machine (or more), runs the commands in your recipe, and then usually notes down the results somewhere. You might set it up so that you are notified if the test suite stops passing, or so that a little badge appears on your repository as long as the tests pass.

Testing

9. Security & Cryptography

Entropy

Hash Functions

A cryptographic hash function maps data of arbitrary size to a fixed size, and has some special properties. A rough specification of a hash function is as follows

hash(value: array<byte>) -> vector<byte, N>  (for some fixed N)

e.g. the SHA-1 hash function (used in git) maps arbitrary-sized inputs to 160-bit outputs (40 hexadecimal characters)
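
The fixed output size is easy to check with Python's `hashlib`. This sketch hashes two arbitrary example inputs and shows that the digest is always 40 hexadecimal characters, however similar the inputs are.

```python
import hashlib

digest = hashlib.sha1(b"hello").hexdigest()
print(digest, len(digest))

# A one-character change in the input gives an unrelated-looking digest.
other = hashlib.sha1(b"hellp").hexdigest()
print(other)
```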

Hash Properties

NOTE: while it may work for certain purposes, SHA-1 is no longer considered a strong cryptographic hash function.

Applications

Key Derivation Functions (KDF)

Similar to hash functions but deliberately slow to compute in order to prevent brute-force attacks e.g. PBKDF2
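
A minimal sketch of key derivation with PBKDF2 from Python's standard library. The passphrase, the iteration count and the 32-byte key length are illustrative assumptions, not recommendations from the course.

```python
import hashlib
import os

def derive_key(passphrase, salt, iterations=200_000):
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, iterations, dklen=32)

salt = os.urandom(16)  # random salt, stored alongside whatever the key protects
key = derive_key("correct horse battery staple", salt)
print(len(key))
```

The high iteration count is what makes each guess expensive for an attacker, while the cost for a legitimate user is paid only once per login or decryption.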

Applications

Symmetric Cryptography

Properties

Applications

passphrase –> KDF –> key (don't need to remember the key)
key + plain text –> encrypt –> ciphertext

Asymmetric Cryptography

Two keys: a private key and a public key.

Applications

Key Distribution

Case Studies

10. Random Topics

Daemons

Filesystems in User Space (FUSE)

Hammerspoon (Automation for MacOS)

Where are programs located?

Filesystem Hierarchy Standard

Daniel Diamond
