At its core, pywb includes a fully featured web archive replay system, sometimes known as 'wayback machine', to provide the ability to replay, or view, archived web content in the browser.
Point your browser to ``http://localhost:8080/my-web-archive/<url>/`` where ``<url>`` is a url you recorded before into your WARC/ARC file.
If all worked well, you should see your archived version of ``<url>``. Congrats, you are now running your own web archive!
Using Existing Web Archive Collections
--------------------------------------
Existing archives of WARCs/ARCs files can be used with pywb with minimal amount of setup. By using ``wb-manager add``,
WARC/ARC files will automatically be placed in the collection archive directory and indexed.
By default ``wb-manager``, places new collections in ``collections/<coll name>`` subdirectory in the current working directory. To specify a different root directory, the ``wb-manager -d <dir>``. Other options can be set in the config file.
If you have a large number of existing CDX index files, pywb will be able to read them as well after running through a simple conversion process.
It is recommended that any index files be converted to the latest CDXJ format, which can be done by running:
``wb-manager cdx-convert <path/to/cdx>``
To setup a collection with existing ARC/WARCs and CDX index files, you can:
1. Run ``wb-manager init <coll name>``. This will initialize all the required collection directories.
2. Copy any archive files (WARCs and ARCs) to ``collections/<coll name>/archive/``
3. Copy any existing cdx indexes to ``collections/<coll name>/indexes/``
4. Run ``wb-manager cdx-convert collections/<coll name>/indexes/``. This strongly recommended, as it will
ensure that any legacy indexes are updated to the latest CDXJ format.
This will fully migrate your archive and indexes the collection.
Any new WARCs added with ``wb-manager add`` will be indexed and added to the existing collection.
Dynamic Collections and Automatic Indexing
------------------------------------------
Collections created via ``wb-manager init`` are fully dynamic, and new collections can be added without restarting pywb.
When adding WARCs with ``wb-manager add``, the indexes are also updated automatically. No restart is required, and the
content is instantly available for replay.
For more complex use cases, mod:`pywb` also includes a background indexer that checks the archives directory and automatically
updates the indexes, if any files have changed or were added.
(Of course, indexing will take some time if adding a large amount of data all at once, but is quite useful for smaller archive updates).
To enable auto-indexing, run with ``wayback -a`` or ``wayback -a --auto-interval 30`` to adjust the frequency of auto-indexing (default is 30 seconds).
.._creating-warc:
Creating a Web Archive
----------------------
Using Webrecorder
^^^^^^^^^^^^^^^^^
If you do not have a web archive to test, one easy way to create one is to use `Webrecorder <https://webrecorder.io>`_
For testing, development and small production loads, the default ``wayback`` command line may be sufficient.
pywb uses the gevent coroutine library, and the default app will support many concurrent connections in a single process.
For larger scale production deployments, running with `uwsgi <http://uwsgi-docs.readthedocs.io/>`_ server application is recommended. The ``uwsgi.ini`` script provided can be used to launch pywb with uwsgi. uwsgi can be scaled to multiple processes to support the necessary workload, and pywb must be run with the `Gevent Loop Engine <http://uwsgi-docs.readthedocs.io/en/latest/Gevent.html>`_. Nginx or Apache can be used as an additional frontend for uwsgi.
Although uwsgi does not provide a way to specify command line, all command line options can alternatively be configured via ``config.yaml``. See :ref:`configuring-pywb` for more info on available configuration options.
The following Apache configuration snippet can be used to deploy pywb *without* uwsgi. A configuration with uwsgi is also probably possible but this covers the simplest case of launching the `wayback` binary directly.
The configuration assumes pywb is running on port 8080 on localhost, but it could be on a different machine as well.
..code:: apache
<VirtualHost *:80>
ServerName proxy.example.com
Redirect / https://proxy.example.com/
DocumentRoot /var/www/html/
</VirtualHost>
<VirtualHost *:443>
ServerName proxy.example.com
SSLEngine on
DocumentRoot /var/www/html/
ErrorDocument 404 /404.html
ProxyPreserveHost On
ProxyPass /.well-known/ !
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}