pywb supports UI customizations, either for an entire archive,
or per-collection.
Static Files
^^^^^^^^^^^^
The replay server will automatically support static files placed under the following directories:
* Files under the root ``static`` directory can be accessed via ``http://my-archive.example.com/static/<filename>``
* Files under the per-collection ``./collections/<coll name>/static`` directory can be accessed via ``http://my-archive.example.com/static/_/<coll name>/<filename>``
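
As a rough illustration of the two URL patterns above, the following Python sketch builds the URL for a static file. The helper name ``static_url`` and its parameters are hypothetical and not part of pywb; the sketch only mirrors the patterns listed above.

.. code-block:: python

    from urllib.parse import quote


    def static_url(base_url, filename, coll=None):
        """Return the URL at which a static file would be served.

        coll=None means the file is in the root ``static`` directory;
        otherwise it is in ``./collections/<coll>/static``.
        (Hypothetical helper, not a pywb API.)
        """
        base = base_url.rstrip("/")
        if coll is None:
            # Root-level static file
            return f"{base}/static/{quote(filename)}"
        # Per-collection static file, served under the ``_`` path segment
        return f"{base}/static/_/{quote(coll)}/{quote(filename)}"

For example, ``static_url("http://my-archive.example.com", "logo.png", coll="my-coll")`` gives the per-collection form of the URL.
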
Templates
^^^^^^^^^
pywb uses Jinja2 templates to render the HTML for all aspects of the application.
A version of any template placed in the ``templates`` directory, either in the root or per collection, overrides the default template of the same name.
To copy the default pywb template to the template directory run:
``wb-manager template --add search_html``
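
The override lookup can be pictured with the short Python sketch below. It is not pywb's actual loader. It assumes that the per-collection directory is ``./collections/<coll name>/templates`` (by analogy with the static directory) and that a per-collection override takes precedence over a root one. The function name ``find_template_override`` is hypothetical.

.. code-block:: python

    from pathlib import Path


    def find_template_override(root, name, coll=None):
        """Return the path of an overriding template, or None if the
        default template would be used. (Hypothetical helper.)"""
        root = Path(root)
        candidates = []
        if coll is not None:
            # Assumption: a per-collection override is checked first
            candidates.append(root / "collections" / coll / "templates" / name)
        # Root-level override applies to the entire archive
        candidates.append(root / "templates" / name)
        for path in candidates:
            if path.is_file():
                return path
        return None
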
The following templates are available:

* ``home.html`` -- Home Page Template, used for ``http://my-archive.example.com/``
* ``search.html`` -- Collection Template, used for each collection page ``http://my-archive.example.com/<coll name>/``
* ``query.html`` -- Capture Query Page for a given url, used for ``http://my-archive.example.com/<coll name>/*/<url>``

Error Pages:

* ``not_found.html`` -- Page to show when a url is not found in the archive
* ``error.html`` -- Generic Error Page for any error (except not found)

Replay and Banner templates:

* ``frame_insert.html`` -- Top-frame for framed replay mode (not used with frameless mode)
* ``head_insert.html`` -- Rewriting code injected into ``<head>`` of each replayed page.
  This template includes the banner template and itself should generally not need to be modified.
* ``banner.html`` -- The banner used for frameless replay. Can be set to blank to disable the banner.

Custom Outer Replay Frame
^^^^^^^^^^^^^^^^^^^^^^^^^
The top-frame used for framed replay can be replaced or augmented
by modifying the ``frame_insert.html`` template.
To start with modifying the default outer page, you can add it to the current
In addition to "url rewriting prefix mode" (the default), pywb can also act as a full-fledged HTTP and HTTPS proxy, allowing
any browser or client supporting HTTP and HTTPS proxy to access web archives through the proxy.
Proxy mode can provide access to a single collection at a time, e.g. instead of accessing ``http://localhost:8080/my-coll/2017/http://example.com/``,
the user enters ``http://example.com/`` and is served content from the ``my-coll`` collection.
As a result, the collection and timestamp must be specified separately.
Configuring HTTP Proxy
^^^^^^^^^^^^^^^^^^^^^^
At this time, pywb requires the collection to be configured at setup time (though collection switching will be added soon).
The collection can be specified by running: ``wayback --proxy my-coll`` or by adding to the config::

    proxy:
      coll: my-coll

For HTTP proxy access, this is all that is needed to use the proxy. If pywb is running on port 8080 on localhost, the following curl command should provide proxy access: ``curl -x "localhost:8080" http://example.com/``
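
The same request can be made from Python using only the standard library. This is a minimal sketch that assumes pywb is running in proxy mode on ``localhost:8080`` as in the curl example. The function names are illustrative and not part of pywb.

.. code-block:: python

    import urllib.request


    def make_proxy_opener(proxy_host="localhost", proxy_port=8080):
        """Build a urllib opener that sends plain HTTP requests through the proxy."""
        proxy = urllib.request.ProxyHandler(
            {"http": f"http://{proxy_host}:{proxy_port}"}
        )
        return urllib.request.build_opener(proxy)


    def fetch_via_proxy(url, opener):
        """Fetch a url through the proxy and return (status, body bytes)."""
        with opener.open(url, timeout=30) as resp:
            return resp.status, resp.read()


    if __name__ == "__main__":
        # Requires a running pywb instance configured as a proxy
        status, body = fetch_via_proxy("http://example.com/", make_proxy_opener())
        print(status, len(body))
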
Proxy Recording
^^^^^^^^^^^^^^^
The proxy can additionally be set to recording mode, equivalent to accessing the ``/<my-coll>/record/`` path,
by adding ``recording: true``, as follows::

    proxy:
      coll: my-coll
      recording: true

By default, proxy recording will use the ``live`` collection if not otherwise configured.
See :ref:`recording-mode` for full set of configurable recording options.
HTTPS Proxy and pywb Certificate Authority
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For HTTPS proxy access, pywb provides its own Certificate Authority (CA), dynamically generates a certificate for each host, and signs the responses
with these certificates. By design, this allows pywb to act as a "man-in-the-middle", serving archived copies of a given site.
However, the pywb CA will need to be accepted by the browser. The CA cert can be downloaded from pywb directly
using the special download paths. The recommended setup for using the proxy is as follows:

1. Configure the browser proxy settings with the host and port, for example ``localhost`` and ``8080`` (if running locally)
2. Download the CA:

   * For most browsers, use the PEM format: ``http://wsgiprox/download/pem``
   * For Windows, use the PKCS12 format: ``http://wsgiprox/download/p12``

3. You may need to agree to "Trust this CA" to identify websites.

The pywb CA file is automatically generated if it does not exist, and may be added to the key store directly.
Additional proxy options ``ca_name`` and ``ca_file_cache`` allow configuring the location and name of the CA file.
The following are all the available proxy options (only ``coll`` is required)::

    proxy:
      coll: my-coll
      ca_name: pywb HTTPS Proxy CA
      ca_file_cache: ./proxy-certs/pywb-ca.pem
      recording: false

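
For non-browser clients, trusting the pywb CA usually means pointing the client at the CA file explicitly. The sketch below shows one way this might look with the Python standard library. It assumes the PEM-format CA certificate has been saved locally (the path passed as ``ca_file`` is up to the user) and that pywb is running as a proxy on ``localhost:8080``. The function name is hypothetical.

.. code-block:: python

    import ssl
    import urllib.request


    def make_https_proxy_opener(ca_file, proxy_host="localhost", proxy_port=8080):
        """Build an opener that proxies HTTP and HTTPS through pywb and
        verifies HTTPS responses against the pywb CA only."""
        proxy_url = f"http://{proxy_host}:{proxy_port}"
        # Passing cafile loads only that CA; the system trust store is not used
        context = ssl.create_default_context(cafile=ca_file)
        return urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
            urllib.request.HTTPSHandler(context=context),
        )

An opener created this way can then be used as ``opener.open("https://example.com/")``. The call fails with an ``OSError`` if the CA file does not exist.
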
The HTTP/S functionality is provided by the separate :mod:`wsgiprox` utility which provides HTTP/S proxy
for any WSGI application.
See the `wsgiprox README <https://github.com/webrecorder/wsgiprox/blob/master/README.rst>`_ for additional details on how it works.
For more information on custom certificate authority (CA) installation, the `mitmproxy certificate page <http://docs.mitmproxy.org/en/stable/certinstall.html>`_ provides a good overview for installing a custom CA on different platforms.
By default, pywb does not redirect urls to the 'canonical' representation of a url with the exact timestamp.
For example, when requesting ``/my-coll/2017js_/http://example.com/example.js`` but the actual timestamp of the resource is ``2017010203000400``,
there is not a redirect to ``/my-coll/2017010203000400js_/http://example.com/example.js``. Instead, this 'canonical' url is returned in
the ``Content-Location`` value. This behavior is recommended for performance reasons as it avoids an extra roundtrip to the server for a redirect.
However, if the classic redirect behavior is desired, it can be enabled by adding::

    redirect_to_exact: true

to the config. This will force any url to be redirected to the exact url, and is consistent with previous behavior and other wayback machine implementations,
at the expense of additional network traffic.
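
A client that needs the exact url can handle both behaviors in the same way. The following sketch is illustrative only (the function name ``exact_url`` is hypothetical). It reads ``Location`` when a redirect is returned and ``Content-Location`` otherwise, resolving either value against the requested url.

.. code-block:: python

    from urllib.parse import urljoin


    def exact_url(request_url, status, headers):
        """Return the 'canonical' url for a response, given its status
        code and a mapping of response headers."""
        lowered = {key.lower(): value for key, value in headers.items()}
        if 300 <= status < 400 and "location" in lowered:
            # Classic behavior: the server redirects to the exact url
            return urljoin(request_url, lowered["location"])
        if "content-location" in lowered:
            # Default behavior: the exact url is reported without a redirect
            return urljoin(request_url, lowered["content-location"])
        # Neither header present: fall back to the requested url
        return request_url
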
Memento Protocol
^^^^^^^^^^^^^^^^
:ref:`memento-api` support is enabled by default, and works with no-timestamp-redirect and classic redirect behaviors.
However, Memento API support can be disabled by adding::

    enable_memento: false

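
Memento clients request a capture near a given time by sending an ``Accept-Datetime`` header in HTTP date format. As a small sketch, the hypothetical helper below converts a 14-digit wayback-style timestamp (assumed to be in UTC) into such a header value. It does not depend on pywb.

.. code-block:: python

    from datetime import datetime, timezone
    from email.utils import format_datetime


    def accept_datetime(timestamp):
        """Convert a 14-digit timestamp (YYYYMMDDhhmmss, UTC) into an
        Accept-Datetime header value."""
        parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        # Treat the timestamp as UTC, then render it as an HTTP date
        return format_datetime(parsed.replace(tzinfo=timezone.utc), usegmt=True)
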
Flash Video Override
^^^^^^^^^^^^^^^^^^^^
A custom system to override Flash video with a custom download via ``youtube-dl`` and replay with a custom player was enabled in previous versions of pywb.
However, this system was not widely used and is in need of maintenance. The system is less needed now that most video is HTML5 based.
For these reasons, this system, previously enabled by including the script ``/static/vidrw.js``, is disabled by default.
To enable previous behavior, add to config::

    enable_flash_video_rewrite: true

The system may be revamped in the future and enabled by default, but for now, it is provided for compatibility reasons.