In addition to URL-rewriting prefix mode (the default), pywb can also act as a full-fledged HTTP and HTTPS proxy, allowing
any browser or client that supports HTTP and HTTPS proxies to access web archives through the proxy.

Proxy mode can provide access to a single collection at a time, e.g. instead of accessing ``http://localhost:8080/my-coll/2017/http://example.com/``,
the user enters ``http://example.com/`` and is served content from the ``my-coll`` collection.
Because the requested URL no longer carries the collection name or timestamp, both must be specified separately.

Configuring HTTP Proxy
^^^^^^^^^^^^^^^^^^^^^^
At this time, pywb requires the collection to be configured at setup time (though collection switching will be added soon).
The collection can be specified by running: ``wayback --proxy my-coll`` or by adding to the config::

    proxy:
      coll: my-coll

For HTTP proxy access, this is all that is needed to use the proxy. If pywb is running on port 8080 on localhost, the following curl command should provide proxy access: ``curl -x "localhost:8080" http://example.com/``
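
The same request can also be made from a script. The following is a minimal sketch using only the Python standard library, assuming pywb is listening on ``localhost:8080`` as in the curl example. The helper names ``proxy_settings`` and ``build_proxy_opener`` are illustrative and are not part of pywb.

.. code-block:: python

    import urllib.request


    def proxy_settings(host="localhost", port=8080):
        """Return a urllib-style proxy mapping that points at pywb."""
        return {"http": "http://{0}:{1}".format(host, port)}


    def build_proxy_opener(host="localhost", port=8080):
        """Build an opener that sends plain-HTTP requests through the proxy."""
        handler = urllib.request.ProxyHandler(proxy_settings(host, port))
        return urllib.request.build_opener(handler)


    # Usage (requires a running pywb instance started with ``--proxy my-coll``):
    #
    #   opener = build_proxy_opener()
    #   with opener.open("http://example.com/") as resp:
    #       print(resp.status, len(resp.read()))

Only the ``http`` scheme is mapped here, since HTTPS access additionally requires trusting the pywb certificate authority.
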
Proxy Recording
^^^^^^^^^^^^^^^
The proxy can additionally be set to recording mode, equivalent to accessing the ``/<my-coll>/record/`` path,
by adding ``recording: true``, as follows::

    proxy:
      coll: my-coll
      recording: true

Unless otherwise configured, proxy recording uses the ``live`` collection.

See :ref:`recording-mode` for the full set of configurable recording options.

HTTPS Proxy and pywb Certificate Authority
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For HTTPS proxy access, pywb provides its own certificate authority (CA), dynamically generates a certificate for each host, and signs the responses
with these certificates. By design, this allows pywb to act as a "man-in-the-middle", serving archived copies of a given site.

However, the pywb CA must be accepted by the browser. The CA certificate can be downloaded from pywb directly
using the special download paths. The recommended setup for using the proxy is as follows:

1. Configure the browser's proxy host and port, for example ``localhost`` and ``8080`` (if running locally).

2. Download the CA:

   * For most browsers, use the PEM format: ``http://wsgiprox/download/pem``

   * For Windows, use the PKCS12 format: ``http://wsgiprox/download/p12``

3. You may need to agree to "Trust this CA" to identify websites.

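
For scripted clients, the steps above can be approximated in Python. This is a sketch under stated assumptions: pywb is running as a proxy on ``localhost:8080``, the helper names (``ca_download_url``, ``download_ca``, ``build_https_opener``) are hypothetical, and the downloaded PEM file is assumed to be loadable as a trust anchor by Python's ``ssl`` module.

.. code-block:: python

    import ssl
    import urllib.request

    # Download paths listed above; they only resolve when requested
    # through the proxy.
    CA_DOWNLOAD_URLS = {
        "pem": "http://wsgiprox/download/pem",
        "p12": "http://wsgiprox/download/p12",
    }


    def ca_download_url(fmt="pem"):
        """Return the CA download URL for the given format ("pem" or "p12")."""
        if fmt not in CA_DOWNLOAD_URLS:
            raise ValueError("unsupported CA format: {0!r}".format(fmt))
        return CA_DOWNLOAD_URLS[fmt]


    def download_ca(dest, fmt="pem", proxy="localhost:8080"):
        """Fetch the CA file through the proxy and write it to ``dest``."""
        handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
        opener = urllib.request.build_opener(handler)
        with opener.open(ca_download_url(fmt)) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        return dest


    def build_https_opener(ca_pem_path, proxy="localhost:8080"):
        """Build an opener that trusts the downloaded CA for proxied HTTPS."""
        context = ssl.create_default_context(cafile=ca_pem_path)
        return urllib.request.build_opener(
            urllib.request.ProxyHandler({"https": "http://" + proxy}),
            urllib.request.HTTPSHandler(context=context),
        )

A client would call ``download_ca("pywb-ca.pem")`` once and then pass that path to ``build_https_opener``. Both calls need a running pywb proxy.
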
The pywb CA file is automatically generated if it does not exist, and may be added to the key store directly.
Additional proxy options ``ca_name`` and ``ca_file_cache`` allow configuring the location and name of the CA file.
The following are all the available proxy options (only ``coll`` is required)::

    proxy:
      coll: my-coll
      ca_name: pywb HTTPS Proxy CA
      ca_file_cache: ./proxy-certs/pywb-ca.pem
      recording: false

The HTTP/S proxy functionality is provided by the separate :mod:`wsgiprox` utility, which provides an HTTP/S proxy
for any WSGI application.
See the `wsgiprox README <https://github.com/webrecorder/wsgiprox/blob/master/README.rst>`_ for additional details on how it works.
For more information on installing a custom certificate authority (CA) on different platforms, see the `mitmproxy certificate page <http://docs.mitmproxy.org/en/stable/certinstall.html>`_.

pywb supports UI customizations, either for an entire archive,
or per-collection.
Static Files
^^^^^^^^^^^^
The replay server will automatically support static files placed under the following directories:
* Files under the root ``static`` directory can be accessed via ``http://my-archive.example.com/static/<filename>``
* Files under the per-collection ``./collections/<coll name>/static`` directory can be accessed via ``http://my-archive.example.com/static/_/<coll name>/<filename>``
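
The mapping from directory to URL described above can be sketched as a small Python helper. The function name ``static_url`` is illustrative and not part of pywb, and the base URL is the example host used in this section.

.. code-block:: python

    from urllib.parse import quote


    def static_url(base_url, filename, coll=None):
        """Return the URL under which a static file is served.

        ``coll=None`` means the file lives in the root ``static`` directory;
        otherwise it lives in ``./collections/<coll>/static``.
        """
        parts = ["static"]
        if coll is not None:
            # Per-collection files are served under the ``_`` path segment.
            parts += ["_", coll]
        parts.append(filename)
        path = "/".join(quote(p) for p in parts)
        return base_url.rstrip("/") + "/" + path

For example, ``static_url("http://my-archive.example.com", "logo.png", coll="my-coll")`` yields the per-collection form of the URL.
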
Templates
^^^^^^^^^
pywb uses Jinja2 templates to render the HTML for all aspects of the application.
A template with the same name placed in the ``templates`` directory, either in the root or per collection, will override the default template.

To copy a default pywb template to the templates directory, run:

``wb-manager template --add search_html``
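
The override behaviour can be pictured as a simple file lookup. The sketch below is not pywb's actual loader. It assumes that per-collection templates live in ``./collections/<coll name>/templates`` (by analogy with the static directory layout) and that a per-collection file takes precedence over a root one; neither detail is stated here. The function name ``find_template_override`` is hypothetical.

.. code-block:: python

    from pathlib import Path


    def find_template_override(name, root_dir=".", coll=None):
        """Return the path of a user-supplied override for ``name``, or None.

        Assumed search order (an assumption, not confirmed by the docs):
        1. ``<root_dir>/collections/<coll>/templates/<name>`` if ``coll`` is given
        2. ``<root_dir>/templates/<name>``
        None means no override exists and the default template applies.
        """
        root = Path(root_dir)
        candidates = []
        if coll is not None:
            candidates.append(root / "collections" / coll / "templates" / name)
        candidates.append(root / "templates" / name)
        for candidate in candidates:
            if candidate.is_file():
                return candidate
        return None
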
The following templates are available:

* ``home.html`` -- Home Page Template, used for ``http://my-archive.example.com/``
* ``search.html`` -- Collection Template, used for each collection page ``http://my-archive.example.com/<coll name>/``
* ``query.html`` -- Capture Query Page for a given url, used for ``http://my-archive.example.com/<coll name>/*/<url>``

Error Pages:

* ``not_found.html`` -- Page to show when a url is not found in the archive
* ``error.html`` -- Generic Error Page for any error (except not found)

Replay and Banner templates:

* ``frame_insert.html`` -- Top-frame for framed replay mode (not used with frameless mode)
* ``head_insert.html`` -- Rewriting code injected into ``<head>`` of each replayed page.
  This template includes the banner template and itself should generally not need to be modified.
* ``banner.html`` -- The banner used for frameless replay. Can be set to blank to disable the banner.

Custom Outer Replay Frame
^^^^^^^^^^^^^^^^^^^^^^^^^
The top-frame used for framed replay can be replaced or augmented
by modifying the ``frame_insert.html`` template.

To start from the default outer page, you can add it to the current
templates directory by running ``wb-manager template --add frame_insert_html``.

To initialize the replay, the outer page should include ``wb_frame.js``,
create an ``<iframe>`` element and pass the id (or element itself) to the ``ContentFrame`` constructor: