**pywb** allows high-quality replay (browsing) of archived web data stored in standardized `ARC <http://en.wikipedia.org/wiki/ARC_(file_format)>`_ and `WARC <http://en.wikipedia.org/wiki/Web_ARChive>`_,
The replay system is designed to accurately replay complex dynamic sites, including `video and audio content <https://github.com/ikreymer/pywb/wiki/Video-Replay-and-Recording>`_ and sites
Additionally, **pywb** includes an extensive `index query API <https://github.com/ikreymer/pywb/wiki/CDX-Server-API>`_ for querying information about archived content.
**pywb** is fully compliant with the `Memento <http://mementoweb.org/>`_ protocol (`RFC-7089 <http://tools.ietf.org/html/rfc7089>`_), offering aggregate searches over many web archives.
Point your browser to ``http://localhost:8080/my_coll/<url>/``, where ``<url>`` is a URL you previously recorded into your WARC/ARC file. (If you just recorded ``http://example.com/``, you should be able to view ``http://localhost:8080/my_coll/http://example.com/``.)
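
The replay address described above is simply the recorded URL appended after the collection name. As a minimal sketch, the helper below (``replay_url`` is a hypothetical name for illustration, not part of pywb) builds that address, assuming the default ``localhost:8080`` host and port and the ``my_coll`` collection used in this example:

```python
def replay_url(original_url, collection="my_coll", host="localhost", port=8080):
    """Return the local replay address for a previously recorded URL.

    Hypothetical helper for illustration only; pywb does not ship it.
    """
    # The recorded URL is appended verbatim after the collection name.
    return "http://{0}:{1}/{2}/{3}".format(host, port, collection, original_url)


if __name__ == "__main__":
    print(replay_url("http://example.com/"))
```

Adjust the host, port and collection name if your setup differs from the one shown here.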
3. Copy any existing cdx indexes to ``collections/<coll name>/indexes/``
4. Run ``wb-manager cdx-convert collections/<coll name>/indexes/``. This step is optional but strongly recommended, as it ensures that the CDX indexes are in a consistent format.
This fully migrates your archive and its indexes into the collection. Any new WARCs added with ``wb-manager add`` will be indexed and added to the existing collection.
Additionally, you may use the auto-indexing features (explained below) to add new content to the existing collection.
**pywb** makes it easy to customize most aspects of the UI around archived content, including a custom banner insert, query calendar, search and home pages,
via HTML Jinja2 templates.
You can see a list of all available UI templates by running: ``wayback-manager template --list``
To copy a default template to the file system (for modification), you can run ``wayback-manager template <collection> --add <template_name>``
**pywb** now supports custom user metadata for each collection. The metadata may be specified in the ``metadata.yaml`` in each collection's directory.
The metadata is accessible to all UI templates and may be displayed to the user as needed.
See the `Wayback Manager Tutorial <https://github.com/ikreymer/pywb/wiki/Auto-Configuration-and-Wayback-Collections-Manager>`_ and the
`UI Customization <https://github.com/ikreymer/pywb/wiki/UI-Customization>`_ page for more details.
Automatic Indexing
------------------
**pywb** now also includes support for automatic indexing of any web archive files (WARC or ARC).
Whenever a WARC/ARC file is added or changed, pywb updates the internal index automatically and makes the archived content
available for replay without manual intervention or a restart. (Indexing will take some time when many gigabytes of data
are added at once, but the feature is quite useful for smaller archive updates.)
To enable auto-indexing, either pass the ``-a`` flag when starting pywb from the command line (``wayback -a``), or run
``wb-manager autoindex <path/to/coll>`` as a separate program.
For additional reference on how pywb is being used, you may check some of the `public projects using pywb <https://github.com/ikreymer/pywb/wiki/Public-Projects-using-pywb>`_.
You can use this tool to quickly check the contents of any WARC or ARC file through a simple point-and-click GUI; no command-line tools are needed.
* ``live-rewrite-server`` -- a demo live rewriting web server which accepts requests using the Wayback Machine URL format at the ``/rewrite/`` path, e.g. ``/rewrite/http://example.com/``, and applies the same URL rewriting rules as are used for archived content.
This is useful for checking how live content will appear when archived before actually creating any archive files, or for recording data.
The `webrecorder.io <https://webrecorder.io>`_ service is built using this tool.
Includes most of the features of the `original cdx server implementation <https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server>`_,
**pywb** can also be used as an actual HTTP and/or HTTPS proxy server. See `pywb Proxy Mode Usage <https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage>`_ for more details.
To run as an HTTPS proxy server, pywb provides a facility for generating a custom self-signed root certificate, which can be used to replay HTTPS content from the archive.
The command-line ``wayback`` utility starts pywb using the `waitress <http://waitress.readthedocs.org/en/latest/>`_ server. This should be sufficient for basic usage and testing.
However, since pywb conforms to the Python `WSGI <http://wsgi.readthedocs.org/en/latest/>`_ specification, it can be run with any standard WSGI container/server.
For production deployments, `uWSGI <https://uwsgi-docs.readthedocs.org/en/latest/>`_ with gevent is the recommended container, and the ``uwsgi.ini`` and ``run-uwsgi.sh``
files in this repo provide examples of running pywb with uWSGI.
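
To illustrate what conforming to WSGI means in practice, here is a minimal sketch of a generic WSGI callable being invoked the way any WSGI container would invoke it. The ``application`` function below is a hypothetical stand-in and is not pywb's actual application object; only the standard-library ``wsgiref`` helper is assumed.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hypothetical stand-in for a WSGI application (not pywb's own)."""
    body = b"placeholder response\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # A WSGI app reports status and headers, then returns an iterable of bytes.
    start_response("200 OK", headers)
    return [body]


# Call the application directly, as a WSGI container would.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ

captured = {}


def start_response(status, headers, exc_info=None):
    # Record what the application reports instead of sending it over a socket.
    captured["status"] = status
    captured["headers"] = headers


result = b"".join(application(environ, start_response))
```

Because every WSGI server uses this same calling convention, an application written against it can be moved between servers such as waitress and uWSGI without code changes.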
**pywb** is compatible with the standard `Wayback Machine <http://en.wikipedia.org/wiki/Wayback_Machine>`_ url format, which was developed by the Internet Archive: