diff --git a/readme.rst b/readme.rst
index 6f53f66..dbb1440 100644
--- a/readme.rst
+++ b/readme.rst
@@ -8,7 +8,7 @@ traffic to disk in `WARC `_ format.
 
 Warcprox captures encrypted https traffic by using the
 `"man-in-the-middle" `_
-technique (see the `Man-In-The_Middle`_ section for more info).
+technique (see the `Man-in-the-middle`_ section for more info).
 
 The web pages that warcprox stores in WARC files can be played back using
 software like `OpenWayback `_ or `pywb
@@ -41,21 +41,21 @@ To start warcprox run::
 
 Try ``warcprox --help`` for documentation on command line options.
 
-Man-In-The-Middle?
-==================
-Traffic to and from https sites is encrypted. Normally http proxies can't read
-that traffic. The web client uses the http ``CONNECT`` method to establish a
-tunnel through the proxy, and the proxy merely routes raw bytes between the
-client and server. Since the bytes are encrypted, the proxy can't make sense of
-the information it's proxying. Nonsensical encrypted bytes would not be very
-useful to archive.
+Man-in-the-middle
+=================
+Normally, http proxies can't read https traffic, because it's encrypted. The
+browser uses the http ``CONNECT`` method to establish a tunnel through the
+proxy, and the proxy merely routes raw bytes between the client and server.
+Since the bytes are encrypted, the proxy can't make sense of the information
+it's proxying. This nonsensical encrypted data would not be very useful to
+archive.
 
 In order to capture https traffic, warcprox acts as a "man-in-the-middle"
 (MITM). When it receives a ``CONNECT`` directive from a client, it generates a
 public key certificate for the requested site, presents to the client, and
-proceeds to establish an encrypted connection. Then it makes a separate, normal
-https connection to the remote site. It decrypts, archives, and re-encrypts
-traffic in both directions.
+proceeds to establish an encrypted connection with the client. Then it makes a
+separate, normal https connection to the remote site. It decrypts, archives,
+and re-encrypts traffic in both directions.
 
 Although "man-in-the-middle" is often paired with "attack", there is nothing
 malicious about what warcprox is doing. If you configure an instance of