mirror of https://github.com/webrecorder/pywb.git synced 2025-03-14 15:53:28 +01:00

Docs Update: OpenWayback -> pywb Transition Guide (#588)

* docs work on OpenWayback -> pywb transition, part 1

* docs: add config change examples, exclusions and deploy recommendations

* update with path index example

* update terms with collection info

* docs update:
- add zipnum examples to owb-to-pywb config transition
- add working docker compose examples for nginx subdirectory, apache subdirectory and outback cdx deployment in ./sample-deploy
- update usage and owb-to-pywb deployment docs with updated subdirectory deployment info + sample-deploy links

* tweak exclusion info, deploy title

* add missing file uwsgi_subdir.ini

* Docs: fix typos and clarifications from review (thanks @ldko!)

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>

* docs: explain that existing cdx can be added to outbackcdx, explain reindexing is optional

* docs: elaborate on docker-compose examples

* minor tweaks

* update to latest wombat 3.0.2
* update CHANGES.rst

* bump version to 2.5.0 for release

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
This commit is contained in:
Ilya Kreymer 2020-12-04 18:40:58 -08:00 committed by GitHub
parent 7b51101b04
commit 9e09bcd2a7
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
24 changed files with 1418 additions and 20 deletions

View File

@ -4,6 +4,8 @@ karma-tests/
tests_disabled/
venv/
collections/
wombat/
docs/
.cache/
.eggs/

View File

@ -1,3 +1,13 @@
pywb 2.5.0 changelist
~~~~~~~~~~~~~~~~~~~~~
* New OpenWayback->pywb Transition Guide: ``https://pywb.readthedocs.io/en/latest/manual/owb-transition.html``
* Sample deployments with Docker Compose for running with Apache, Nginx and OutbackCDX in ``sample-deploy`` directory.
* Update to latest gevent to fix issues with latest python `#583 <https://github.com/webrecorder/pywb/pull/583>`_
pywb 2.4.2 changelist
~~~~~~~~~~~~~~~~~~~~~

View File

@ -20,6 +20,7 @@ A subset of features provides the basic functionality of a "Wayback Machine".
manual/ui-customization
manual/architecture
manual/apis
manual/owb-transition
code/pywb

View File

@ -34,6 +34,8 @@ To disable framed replay add:
Note: pywb also supports HTTP/S **proxy mode** which requires additional setup. See :ref:`https-proxy` for more details.
.. _dir_structure:
Directory Structure
-------------------

View File

@ -0,0 +1,31 @@
.. _migrating-cdx:
Migrating CDX
=============
If you are not using OutbackCDX, you may need to check the format of the CDX files that you are using.
Over the years, there have been many variations on the CDX (capture index) format which is used by OpenWayback and pywb to look up captures in WARC/ARC files.
When migrating CDX from OpenWayback, there are a few options.
pywb currently supports:
- 9 field CDX (surt-ordered)
- 11 field CDX (surt-ordered)
- CDXJ (surt-ordered)
pywb supports the 11-field and 9-field `CDX format <http://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/>`_ that is also used in OpenWayback.
Non-SURT ordered CDXs are not currently supported, though they may be supported in the future (see this `pending pull request <https://github.com/webrecorder/pywb/pull/586>`_).
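To illustrate what SURT ordering means, the following is a deliberately simplified sketch of turning a URL into a SURT-style key. The function name ``simple_surt`` is hypothetical, and the real canonicalization used by pywb and OpenWayback applies additional rules (for example around ``www`` prefixes, ports and query arguments) that are not reproduced here.

```python
from urllib.parse import urlsplit


def simple_surt(url):
    """Return a simplified SURT-style key for a URL (illustration only)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Reverse the host labels: example.com -> com,example
    reversed_host = ",".join(reversed(host.split(".")))
    key = reversed_host + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


# Sorting by the key groups captures from the same domain together.
urls = [
    "http://b.example.com/",
    "http://example.org/",
    "http://a.example.com/page",
]
for key in sorted(simple_surt(u) for u in urls):
    print(key)
```

Because the host is reversed, all subdomains of ``example.com`` sort next to each other, which is why an index sorted by plain URL cannot be used interchangeably with a SURT-ordered one.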
CDXJ Conversion
---------------
The native format used by pywb is the SURT-ordered :ref:`cdxj-index`, which stores most of the index fields as JSON. This makes the format more flexible and allows optional fields to be added as needed.
If your CDX files are not SURT-ordered, are not 11- or 9-field CDX, or are a mix of formats, pywb also offers a conversion utility which converts all CDX to the pywb native CDXJ: ::
wb-manager cdx-convert <dir-of-cdx-files>
The converter will read the CDX files and create a corresponding .cdxj file for every .cdx file. Since the conversion operates on the CDX files themselves, it does not require reindexing the source WARC/ARC files and can happen fairly quickly. The converted CDXJ are guaranteed to be in the right format to work with pywb.
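Before deciding whether a conversion is needed, it can help to check which format an existing index uses. The following is a rough sketch, not part of pywb, that guesses the format from a single index line by counting space-separated fields. The function name ``detect_cdx_format`` is hypothetical, and the check does not verify SURT ordering.

```python
import json


def detect_cdx_format(line):
    """Guess the index format of one line: header, cdxj, cdx9, cdx11 or unknown."""
    line = line.rstrip("\n")
    if line.startswith(" CDX"):
        # Classic CDX files may begin with a field legend line
        return "header"
    parts = line.split(" ", 2)
    if len(parts) == 3 and parts[2].startswith("{"):
        try:
            json.loads(parts[2])
            return "cdxj"
        except ValueError:
            pass
    count = len(line.split(" "))
    if count == 9:
        return "cdx9"
    if count == 11:
        return "cdx11"
    return "unknown"
```

Running this over the first few lines of each index file gives a quick indication of whether a directory contains a mix of formats.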

View File

@ -0,0 +1,74 @@
.. _using-outback:
Using OutbackCDX with pywb
==========================
The recommended setup is to run `OutbackCDX <https://github.com/nla/outbackcdx>`_ alongside pywb.
OutbackCDX provides an index (CDX) server and can efficiently store and look up web archive data by URL.
Adding CDX to OutbackCDX
------------------------
To set up OutbackCDX, please follow the instructions on the `OutbackCDX README <https://github.com/nla/outbackcdx>`_.
Since pywb also uses the default port 8080, be sure to use a different port for OutbackCDX, e.g. ``java -jar outbackcdx*.jar -p 8084``.
OutbackCDX can generally ingest existing CDX used in OpenWayback simply by POSTing to OutbackCDX at a new index endpoint.
For example, assuming OutbackCDX is running on port 8084, to add CDX for ``index1.cdx``, ``index2.cdx``, run:
.. code:: console
curl -X POST --data-binary @index1.cdx http://localhost:8084/mycoll
curl -X POST --data-binary @index2.cdx http://localhost:8084/mycoll
The contents of each CDX file are added to the ``mycoll`` OutbackCDX index, which can correspond to the web archive collection ``mycoll``.
The index is created automatically if it does not exist.
See the `OutbackCDX Docs <https://github.com/nla/outbackcdx#loading-records>`_ for more info on ingesting CDX.
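The same ingest can be scripted. The following is a minimal sketch using only the Python standard library, assuming OutbackCDX is reachable at the URL used above. The helper names ``build_ingest_request`` and ``ingest_cdx_file`` are hypothetical and not part of pywb or OutbackCDX.

```python
import urllib.request


def build_ingest_request(cdx_bytes, index_url):
    """Build a POST request carrying raw CDX lines for an OutbackCDX index."""
    return urllib.request.Request(index_url, data=cdx_bytes, method="POST")


def ingest_cdx_file(path, index_url):
    """POST the contents of one CDX file and return the HTTP status code."""
    with open(path, "rb") as fh:
        request = build_ingest_request(fh.read(), index_url)
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (requires a running OutbackCDX instance):
# ingest_cdx_file("index1.cdx", "http://localhost:8084/mycoll")
```

This sketch reads the whole file into memory, so very large CDX files are better split up or sent with ``curl`` as shown above.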
(Re)generating CDX from WARCs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are some exceptions where it may be useful to re-generate the CDX with pywb for existing WARCs:
- If your CDX is 9-field and does not include the compressed length, regenerating the CDX will result in more efficient HTTP range requests
- If you want to replay pages with POST requests, pywb generated CDX will soon be supported in OutbackCDX (see: `Issue #585 <https://github.com/webrecorder/pywb/issues/585>`_, `PR #91 <https://github.com/nla/outbackcdx/pull/91>`_ )
To generate the CDX, run the ``cdx-indexer`` command (with ``-p`` flag for POST request handling) for each WARC or set of WARCs you wish to index:
.. code:: console
cdx-indexer /path/to/mywarcs/my.warc.gz > ./index1.cdx
cdx-indexer /path/to/all_warcs/*warc.gz > ./index2.cdx
Then, run the POST command as shown above to ingest to OutbackCDX.
The above can be repeated for each WARC file, or for a set of WARCs using the ``*.warc.gz`` wildcard.
If a CDX index is too big, OutbackCDX may fail to ingest it. In that case, generating and ingesting one index per WARC may be needed.
Configure pywb with OutbackCDX
------------------------------
The ``config.yaml`` should be configured to point to OutbackCDX.
Assuming a collection named ``mycoll``, the ``config.yaml`` can be configured as follows to use OutbackCDX:
.. code:: yaml
collections:
mycoll:
index_paths: cdx+http://localhost:8084/mycoll
archive_paths: /path/to/mywarcs/
The ``archive_paths`` can be configured to point to a directory of WARCs or a path index.
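When reviewing a larger configuration, it may be useful to see at a glance which index sources point to a remote server and which are local paths. The sketch below assumes only the ``cdx+http://`` and ``cdx+https://`` prefix convention shown in these examples. The function name ``describe_index_path`` is hypothetical.

```python
def describe_index_path(value):
    """Classify an index_paths entry as ('remote', url) or ('local', path)."""
    if value.startswith(("cdx+http://", "cdx+https://")):
        # Strip the 'cdx+' marker to recover the plain server URL
        return ("remote", value[len("cdx+"):])
    return ("local", value)


for entry in ["cdx+http://localhost:8084/mycoll", "/archive/cdx/"]:
    print(entry, "->", describe_index_path(entry))
```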

View File

@ -0,0 +1,42 @@
OpenWayback vs pywb Terms
=========================
pywb and OpenWayback use slightly different terms to describe the configuration options, as explained below.
Some differences are:
- The ``wayback.xml`` config file in OpenWayback is replaced with the ``config.yaml`` YAML file in pywb
- The terms ``Access Point`` and ``Wayback Collection`` are replaced with ``Collection`` in pywb. The collection configuration represents a unique path (access point) and the data that is accessed at that path.
- The ``Resource Store`` in OpenWayback is known in pywb as the archive paths, configured under ``archive_paths``
- The ``Resource Index`` in OpenWayback is known in pywb as the index paths, configurable under ``index_paths``
- The ``Exclusions`` in OpenWayback are replaced with general :ref:`access-control`
Pywb Collection Basics
----------------------
A pywb collection must consist of a minimum of three parts: the collection name, the ``index_paths`` (where to read the index), and the ``archive_paths`` (where to read the WARC files).
The collection is accessed by name, so there is no distinct access point.
The collections are configured in the ``config.yaml`` under the ``collections`` key.
For example, a basic collection definition can be specified via:
.. code:: yaml
collections:
wayback:
index_paths: /archive/cdx/
archive_paths: /archive/storage/warcs/
Pywb also supports a convention-based directory structure. Collections created in this structure can be detected automatically
and need not be specified in the ``config.yaml``. This structure is designed for smaller collections that are all stored locally in a subdirectory.
See the :ref:`dir_structure` for the default pywb directory structure.
However, for importing existing collections from OpenWayback, it is probably easier to specify the existing paths as shown above.

View File

@ -0,0 +1,308 @@
Converting OpenWayback Config to pywb Config
============================================
OpenWayback includes many different types of configurations.
For most use cases, using OutbackCDX with pywb is the recommended approach, as explained in :ref:`using-outback`.
The following are a few specific examples of WaybackCollections gathered from active OpenWayback configurations
and how they can be configured for use with pywb.
Remote Collection / Access Point
--------------------------------
A collection configured with a remote index and WARC access can be converted to use OutbackCDX
for the remote index, while pywb can load WARCs directly from an HTTP endpoint.
For example, a configuration similar to:
.. code:: xml
<bean name="standardaccesspoint" class="org.archive.wayback.webapp.AccessPoint">
<property name="accessPointPath" value="/wayback/"/>
<property name="collection" ref="remotecollection" />
...
</bean>
<bean id="remotecollection" class="org.archive.wayback.webapp.WaybackCollection">
<property name="resourceStore">
<bean class="org.archive.wayback.resourcestore.SimpleResourceStore">
<property name="prefix" value="http://myarchive.example.com/RemoteStore/" />
</bean>
</property>
<property name="resourceIndex">
<bean class="org.archive.wayback.resourceindex.RemoteResourceIndex">
<property name="searchUrlBase" value="http://myarchive.example.com/RemoteIndex" />
</bean>
</property>
</bean>
can be converted to the following config, with OutbackCDX assumed to be running
at: ``http://myarchive.example.com/RemoteIndex``
.. code:: yaml
collections:
wayback:
index_paths: cdx+http://myarchive.example.com/RemoteIndex
archive_paths: http://myarchive.example.com/RemoteStore/
Local Collection / Access Point
-------------------------------
An OpenWayback configuration with a local collection and local CDX, for example:
.. code:: xml
<bean id="collection" class="org.archive.wayback.webapp.WaybackCollection">
<property name="resourceIndex">
<bean class="org.archive.wayback.resourceindex.cdxserver.EmbeddedCDXServerIndex">
...
<property name="cdxServer">
<bean class="org.archive.cdxserver.CDXServer">
<property name="cdxSource">
<bean class="org.archive.format.cdx.MultiCDXInputSource">
<property name="cdxUris">
<list>
<value>/wayback/cdx/mycdx1.cdx</value>
<value>/wayback/cdx/mycdx2.cdx</value>
</list>
</property>
</bean>
</property>
<property name="cdxFormat" value="cdx11"/>
<property name="surtMode" value="true"/>
</bean>
</property>
...
</bean>
</property>
</bean>
can be configured in pywb using the ``index_paths`` key.
Note that the CDX files should all be in the same format. See :ref:`migrating-cdx` for more info on converting
CDX to pywb native CDXJ format.
.. code:: yaml
collections:
wayback:
index_paths: /wayback/cdx/
archive_paths: ...
It's also possible to combine directories, individual CDX files, and even a remote index from OutbackCDX in a single collection
(as long as all CDX are in the same format).
pywb will query all the sources simultaneously to find the best match.
.. code:: yaml
collections:
wayback:
index_group:
cdx1: /wayback/cdx1/
cdx2: /wayback/cdx2/mycdx.cdx
remote: cdx+https://myarchive.example.com/outbackcdx
archive_paths: ...
However, OutbackCDX is still recommended to avoid more complex CDX configurations.
WatchedCDXSource
^^^^^^^^^^^^^^^^
OpenWayback includes a 'Watched CDX Source' option which watches a directory for new CDX indexes.
This is the default behavior in pywb when a directory is specified as the index path.
For example, the config:
.. code:: xml
<property name="source">
<bean class="org.archive.wayback.resourceindex.WatchedCDXSource">
<property name="recursive" value="false" />
<property name="filters">
<list>
<value>^.+\.cdx$</value>
</list>
</property>
<property name="path" value="/wayback/cdx-index/" />
</bean>
</property>
can be replaced with:
.. code:: yaml
collections:
wayback:
index_paths: /wayback/cdx-index/
archive_paths: ...
pywb will load all CDX from that directory.
ZipNum Cluster Index
--------------------
pywb also supports using a compressed :ref:`zipnum` instead of a plain text CDX. For example, the following OpenWayback configuration:
.. code:: xml
<bean id="collection" class="org.archive.wayback.webapp.WaybackCollection">
<property name="resourceIndex">
<bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
...
<property name="source">
<bean class="org.archive.wayback.resourceindex.ZipNumClusterSearchResultSource">
<property name="cluster">
<bean class="org.archive.format.gzip.zipnum.ZipNumCluster">
<property name="summaryFile" value="/webarchive/zipnum-cdx/all.summary"></property>
<property name="locFile" value="/webarchive/zipnum-cdx/all.loc"></property>
</bean>
</property>
...
</bean>
</property>
</bean>
can simply be converted to the pywb config:
.. code:: yaml
collections:
wayback:
index_paths: /webarchive/zipnum-cdx
# if the index is not surt ordered
surt_ordered: false
pywb will automatically determine the ``.summary`` and use the ``.loc`` files for the ZipNum Cluster if they are present in the directory.
Note that if the ZipNum index is **not** SURT ordered, the ``surt_ordered: false`` flag must be added to support this format.
Path Index Configuration
------------------------
OpenWayback supports a 'path index' that can be used to look up a WARC by filename and map to an exact path.
For compatibility, pywb supports the same path index lookup, as well as loading WARC files by path or URL prefix.
For example, an OpenWayback configuration that includes a path index:
.. code:: xml
<bean id="resourcefilelocationdb" class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB">
<property name="path" value="/archive/warc-paths.txt"/>
</bean>
<bean id="resourceStore" class="org.archive.wayback.resourcestore.LocationDBResourceStore">
<property name="db" ref="resourcefilelocationdb" />
</bean>
can be configured in the ``archive_paths`` field of pywb collection configuration:
.. code:: yaml
collections:
wayback:
index_paths: ...
archive_paths: /archive/warc-paths.txt
The path index is a tab-delimited text file for mapping WARC filenames to full file paths or URLs, eg:
.. code::
example.warc.gz<tab>/some/path/to/example.warc.gz
another.warc.gz<tab>/some-other/path/another.warc.gz
remote.warc.gz<tab>http://warcstore.example.com/serve/remote.warc.gz
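The tab-delimited format above is simple to read programmatically. The following sketch parses path index lines into a dictionary. The function name ``parse_path_index`` is hypothetical, and the sketch assumes each filename appears once with a single location (a later duplicate would overwrite an earlier one).

```python
def parse_path_index(lines):
    """Map WARC filenames to their full path or URL from tab-delimited lines."""
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, sep, location = line.partition("\t")
        if not sep:
            continue  # skip lines that are not tab-delimited
        index[name] = location
    return index


sample = [
    "example.warc.gz\t/some/path/to/example.warc.gz\n",
    "remote.warc.gz\thttp://warcstore.example.com/serve/remote.warc.gz\n",
]
print(parse_path_index(sample))
```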
However, if all WARC files are stored in the same directory, or in a few directories, a path index is not needed and pywb will try loading the WARC by prefix.
The ``archive_paths`` can accept a list of entries. For example, given the config:
.. code:: yaml
collections:
wayback:
index_paths: ...
archive_paths:
- /archive/warcs1/
- /archive/warcs2/
- https://myarchive.example.com/warcs/
- /archive/warc-paths.txt
And the WARC file: ``example.warc.gz``, pywb will try to find the WARC in order from:
.. code::
1. /archive/warcs1/example.warc.gz
2. /archive/warcs2/example.warc.gz
3. https://myarchive.example.com/warcs/example.warc.gz
4. Looking up example.warc.gz in /archive/warc-paths.txt
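The lookup order above can be sketched as follows. This is an illustration of the ordering only, not pywb's actual resolver: the function name ``candidate_locations`` is hypothetical, entries ending in ``/`` are assumed to be directory or URL prefixes, and any other entry is assumed to be a path index that has already been parsed into a dictionary.

```python
def candidate_locations(filename, archive_paths, path_indexes):
    """List the locations to try for a WARC filename, in configured order."""
    candidates = []
    for entry in archive_paths:
        if entry.endswith("/"):
            # Directory or URL prefix: append the filename
            candidates.append(entry + filename)
        else:
            # Path index: look up the filename in the parsed mapping
            location = path_indexes.get(entry, {}).get(filename)
            if location:
                candidates.append(location)
    return candidates


archive_paths = [
    "/archive/warcs1/",
    "/archive/warcs2/",
    "https://myarchive.example.com/warcs/",
    "/archive/warc-paths.txt",
]
path_indexes = {
    "/archive/warc-paths.txt": {"example.warc.gz": "/some/path/to/example.warc.gz"},
}
for location in candidate_locations("example.warc.gz", archive_paths, path_indexes):
    print(location)
```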
Proxy Mode Access
-----------------
An OpenWayback configuration may include many beans to support proxy mode, eg:
.. code:: xml
<bean id="proxyreplaydispatcher" class="org.archive.wayback.replay.SelectorReplayDispatcher">
...
<property name="renderer">
<bean class="org.archive.wayback.proxy.HttpsRedirectAndLinksRewriteProxyHTMLMarkupReplayRenderer">
...
<property name="uriConverter">
<bean class="org.archive.wayback.proxy.ProxyHttpsResultURIConverter"/>
</property>
</bean>
</property>
</bean>
<bean name="proxy" class="org.archive.wayback.webapp.AccessPoint">
<property name="internalPort" value="${proxy.port}"/>
<property name="accessPointPath" value="${proxy.port}" />
<property name="collection" ref="localcdxcollection" />
...
</bean>
In pywb, the proxy mode can be enabled by adding to the main ``config.yaml`` the name of the collection
that should be served in proxy mode:
.. code:: yaml
proxy:
source_coll: wayback
There are some differences between OpenWayback and pywb proxy mode support.
In OpenWayback, proxy mode is configured using separate access points for different collections on different ports.
OpenWayback only supports HTTP proxy and attempts to rewrite HTTPS URLs to HTTP.
In pywb, proxy mode is enabled on the same port as regular access, and pywb supports HTTP and HTTPS proxy.
pywb does not attempt to rewrite HTTPS to HTTP, as most browsers disallow HTTP access as insecure for many sites.
pywb supports a default collection that is enabled for proxy mode, and a default timestamp accessed by the proxy mode.
(Switching the collection and date accessed is possible but not currently supported without extensions to pywb).
To support HTTPS access, pywb provides a certificate authority that can be trusted by a browser to rewrite HTTPS content.
See :ref:`https-proxy` for all of the options of pywb proxy mode configuration.

View File

@ -0,0 +1,80 @@
Deploying pywb: Collection Paths and Routing with Nginx/Apache
==============================================================
In pywb, the collection name is also the access point, and each of the collections in ``config.yaml``
can be accessed by their name as the subpath:
.. code:: yaml
collections:
wayback:
...
another-collection:
...
If pywb is deployed on port 8080, each collection will be available under:
``http://<hostname>/wayback/*/https://example.com/`` and ``http://<hostname>/another-collection/*/https://example.com/``
To make a collection available under the root, simply set its name to: ``$root``
.. code:: yaml
collections:
$root:
...
another-collection:
...
Now, the first collection is available at: ``http://<hostname>/*/https://example.com/``.
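The mapping from collection name to URL can be summarized in a short sketch. The function names ``collection_prefix`` and ``query_url`` are hypothetical and only illustrate the path rules described above, including the special ``$root`` name.

```python
def collection_prefix(host, name):
    """Return the base URL for a collection; '$root' is served from the root."""
    if name == "$root":
        return "http://%s/" % host
    return "http://%s/%s/" % (host, name)


def query_url(host, name, target):
    """Return the '*' (all captures) query URL for a target URL in a collection."""
    return collection_prefix(host, name) + "*/" + target


print(query_url("localhost:8080", "another-collection", "https://example.com/"))
print(query_url("localhost:8080", "$root", "https://example.com/"))
```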
To deploy pywb on a subdirectory, eg. ``http://<hostname>/pywb/another-collection/*/https://example.com/``,
and in general, for production use, it is recommended to deploy pywb behind an Nginx or Apache reverse proxy.
Nginx and Apache Reverse Proxy
------------------------------
The recommended deployment for pywb is with uWSGI and behind an Nginx or Apache frontend.
This configuration allows for a more robust deployment and lets these servers handle static files.
See the :ref:`nginx-deploy` and :ref:`apache-deploy` sections for more info on deploying with Nginx and Apache.
Working Docker Compose Examples
-------------------------------
The pywb `Deployment Examples <https://github.com/webrecorder/pywb/blob/docs/sample-deploy/>`_ include working examples of deploying pywb with Nginx, Apache and OutbackCDX
in Docker using Docker Compose, widely available container orchestration tools.
See `Installing Docker <https://docs.docker.com/get-docker/>`_ and `Installing Docker Compose <https://docs.docker.com/compose/install/>`_ for instructions on how to install these tools.
The examples are available in the ``sample-deploy`` directory of the pywb repo. The examples include:
- ``docker-compose-outback.yaml`` -- Docker Compose config to start OutbackCDX and pywb, and ingest sample data into OutbackCDX
- ``docker-compose-nginx.yaml`` -- Docker Compose config to launch pywb and latest Nginx, with pywb running on subdirectory ``/wayback`` and Nginx serving static files from pywb.
- ``docker-compose-apache.yaml`` -- Docker Compose config to launch pywb and latest Apache, with pywb running on subdirectory ``/wayback`` and Apache serving static files from pywb.
The examples are designed to be run one at a time, and assume port 8080 is available.
After installing Docker and Docker Compose, run one of:
- ``docker-compose -f docker-compose-outback.yaml up``
- ``docker-compose -f docker-compose-nginx.yaml up``
- ``docker-compose -f docker-compose-apache.yaml up``
This will download the standard Docker images and start all of the components in Docker.
If everything works correctly, you should be able to access: ``http://localhost:8080/pywb/https://example.com/`` to view the sample pywb collection.
Press CTRL+C to interrupt and stop the example in the console.

View File

@ -0,0 +1,68 @@
Migrating Exclusion Rules
=========================
pywb includes a new :ref:`access-control` system, which allows granular allow/block/exclude access control rules on paths and subpaths.
The rules are configured in .aclj files, and a command-line utility exists to import OpenWayback exclusions
into the pywb ACLJ format.
For example, given an OpenWayback exclusion list configuration for a static file:
.. code:: xml
<bean id="excluder-factory-static" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
<property name="file" value="/archive/exclusions.txt"/>
<property name="checkInterval" value="600000" />
</bean>
The exclusions file can be converted to an .aclj file by running: ::
wb-manager acl importtxt /archive/exclusions.aclj /archive/exclusions.txt exclude
Then, in the pywb config, specify:
.. code:: yaml
collections:
wayback:
index_paths: ...
archive_paths: ...
acl_paths: /archive/exclusions.aclj
It is possible to specify multiple access control files, which will all be applied.
Using ``block`` instead of ``exclude`` will result in pywb returning a 451 error, indicating that URLs are in the index but blocked.
CLI Tool
--------
After exclusions have been imported, it is recommended to use the ``wb-manager acl`` command-line tool for managing exclusions.
To add an exclusion, run: ::
wb-manager acl add /archive/exclusions.aclj http://httpbin.org/anything/something exclude
To remove an exclusion, run: ::
wb-manager acl remove /archive/exclusions.aclj http://httpbin.org/anything/something
For more options, see the full :ref:`access-control` documentation or run ``wb-manager acl --help``.
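When many URLs need to be excluded at once, the ``wb-manager acl add`` command shown above can be driven from a script. The following is a minimal sketch that assumes ``wb-manager`` is on the ``PATH``. The helper names ``acl_add_command`` and ``add_exclusions`` are hypothetical, and only the argument order shown above is used.

```python
import subprocess


def acl_add_command(aclj_path, url, access="exclude"):
    """Build the argument list for 'wb-manager acl add <file> <url> <access>'."""
    return ["wb-manager", "acl", "add", aclj_path, url, access]


def add_exclusions(aclj_path, urls, run=subprocess.run):
    """Run one 'wb-manager acl add' command per URL, stopping on the first error."""
    for url in urls:
        run(acl_add_command(aclj_path, url), check=True)


# Example call (requires pywb to be installed):
# add_exclusions("/archive/exclusions.aclj", ["http://httpbin.org/anything/something"])
```

The ``run`` parameter exists only so the commands can be inspected or tested without executing them.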
Not Yet Supported
-----------------
Some OpenWayback exclusion options are not yet supported in pywb.
The following is not yet supported in the access control system:
- Exclusions/Access Control by specific date range
- Regex based exclusions
- Date Range Embargo on All URLs
- Robots.txt-based exclusions

View File

@ -0,0 +1,21 @@
.. _transition-openwayback:
OpenWayback Transition Guide
============================
This guide provides guidelines for transitioning from OpenWayback to pywb,
with additional recommendations. The main recommendation is to run pywb along
with OutbackCDX and nginx, and this configuration is covered below, along with additional options.
.. toctree::
:maxdepth: 2
owb-pywb-terms
outbackcdx
migrating-cdx
owb-to-pywb-config
owb-to-pywb-exclusions
owb-to-pywb-deploy

View File

@ -7,7 +7,7 @@ pywb includes a sophisticated server and client-side rewriting systems, includin
configuration for domain and content-specific rewriting rules, fuzzy index matching for replay,
and a thorough client-side JS rewriting system.
With pywb 2.3.0, the client-side rewriting system exists in a separate module at `https://github.com/webrecorder/wombat``
With pywb 2.3.0, the client-side rewriting system exists in a separate module at ``https://github.com/webrecorder/wombat``
URL Rewriting

View File

@ -230,6 +230,8 @@ To run pywb in Docker behind a local nginx (as shown below), port 8081 should al
See :ref:`getting-started-docker` for more info on using pywb with Docker.
.. _nginx-deploy:
Sample Nginx Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -263,29 +265,55 @@ See the `Nginx Docs <https://nginx.org/en/docs/>`_ for a lot more details on how
}
}
.. _apache-deploy:
Sample Apache Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following Apache configuration snippet can be used to deploy pywb *without* uwsgi. A configuration with uwsgi is also probably possible but this covers the simplest case of launching the `wayback` binary directly.
The recommended Apache configuration is to use pywb with ``mod_proxy`` and ``mod_proxy_uwsgi``.
The configuration assumes pywb is running on port 8080 on localhost, but it could be on a different machine as well.
To enable these, ensure that your httpd.conf includes:
.. code:: apache
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_uwsgi_module modules/mod_proxy_uwsgi.so
Then, in your config, simply include:
.. code:: apache
<VirtualHost *:80>
ServerName proxy.example.com
Redirect / https://proxy.example.com/
DocumentRoot /var/www/html/
ProxyPass / uwsgi://pywb:8081/
</VirtualHost>
<VirtualHost *:443>
ServerName proxy.example.com
SSLEngine on
DocumentRoot /var/www/html/
ErrorDocument 404 /404.html
ProxyPreserveHost On
ProxyPass /.well-known/ !
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}
</VirtualHost>
The configuration assumes uwsgi is started with ``uwsgi uwsgi.ini``
Running on Subdirectory Path
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To run pywb on a subdirectory, rather than at the root of the web server, the recommended configuration is to adjust the ``uwsgi.ini`` to include the subdirectory.
For example, to deploy pywb under the ``/wayback`` subdirectory, the ``uwsgi.ini`` can be configured as follows:
.. code:: ini
mount = /wayback=./pywb/apps/wayback.py
manage-script-name = true
.. _example-deploy:
Deployment Examples
^^^^^^^^^^^^^^^^^^^
The ``sample-deploy`` directory includes working Docker Compose examples for deploying pywb with Nginx and Apache on the ``/wayback`` subdirectory.
See:
- `Docker Compose Nginx <https://github.com/webrecorder/pywb/blob/docs/sample-deploy/docker-compose-nginx.yaml>`_ for sample Nginx config.
- `Docker Compose Apache <https://github.com/webrecorder/pywb/blob/docs/sample-deploy/docker-compose-apache.yaml>`_ for sample Apache config.
- `uwsgi_subdir.ini <https://github.com/webrecorder/pywb/blob/docs/sample-deploy/uwsgi_subdir.ini>`_ for example subdirectory uwsgi config.

File diff suppressed because one or more lines are too long

View File

@ -1,4 +1,4 @@
__version__ = '2.4.2'
__version__ = '2.5.0'
if __name__ == '__main__':
print(__version__)

View File

@ -0,0 +1,34 @@
# This example demonstrates running pywb with apache frontend under a subpath /wayback
version: '3'
services:
# main pywb image
pywb:
image: webrecorder/pywb
volumes:
- ../config.yaml:/webarchive/config.yaml
- ../sample_archive/:/webarchive/sample_archive/
- ./uwsgi_subdir.ini:/uwsgi/uwsgi.ini
# optional volume to serve static assets from apache
- pywb-static:/pywb/pywb/static
apache:
image: httpd
ports:
- 8080:80
volumes:
#- ./nginx-default.conf:/etc/nginx/conf.d/default.conf
- ./httpd.conf:/usr/local/apache2/conf/httpd.conf
- ./pywb-apache.conf:/usr/local/apache2/conf/extra/pywb-apache.conf
# optional volume to serve static assets from apache
- pywb-static:/pywb/pywb/static
depends_on:
- pywb
volumes:
pywb-static:

View File

@ -0,0 +1,32 @@
# This example demonstrates running pywb with nginx frontend under a subpath /wayback
version: '3'
services:
# main pywb image
pywb:
image: webrecorder/pywb
volumes:
- ../config.yaml:/webarchive/config.yaml
- ../sample_archive/:/webarchive/sample_archive/
- ./uwsgi_subdir.ini:/uwsgi/uwsgi.ini
# optional volume to serve static assets from nginx
- pywb-static:/pywb/pywb/static
nginx:
image: nginx
ports:
- 8080:80
volumes:
- ./pywb-nginx.conf:/etc/nginx/conf.d/default.conf
# optional volume to serve static assets from nginx
- pywb-static:/pywb/pywb/static
depends_on:
- pywb
volumes:
pywb-static:

View File

@ -0,0 +1,39 @@
version: '3'
services:
# outbackcdx image
outbackcdx:
image: nlagovau/outbackcdx
ports:
- 8084:8080
# use cdx-indexer to index and ingest into outbackcdx
ingest:
image: webrecorder/pywb
entrypoint: ["bash", "-c"]
command: /tmp/run.sh
depends_on:
- outbackcdx
volumes:
- ../config.yaml:/webarchive/config.yaml
- ./run.sh:/tmp/run.sh
- ../sample_archive/:/webarchive/sample_archive/
# main pywb image
pywb:
image: webrecorder/pywb
volumes:
- ../config.yaml:/webarchive/config.yaml
- ../sample_archive/:/webarchive/sample_archive/
ports:
- 8080:8080
depends_on:
- ingest

555
sample-deploy/httpd.conf Normal file
View File

@ -0,0 +1,555 @@
#
# This is the main Apache HTTP server configuration file. It contains the
# configuration directives that give the server its instructions.
# See <URL:http://httpd.apache.org/docs/2.4/> for detailed information.
# In particular, see
# <URL:http://httpd.apache.org/docs/2.4/mod/directives.html>
# for a discussion of each configuration directive.
#
# Do NOT simply read the instructions in here without understanding
# what they do. They're here only as hints or reminders. If you are unsure
# consult the online docs. You have been warned.
#
# Configuration and logfile names: If the filenames you specify for many
# of the server's control files begin with "/" (or "drive:/" for Win32), the
# server will use that explicit path. If the filenames do *not* begin
# with "/", the value of ServerRoot is prepended -- so "logs/access_log"
# with ServerRoot set to "/usr/local/apache2" will be interpreted by the
# server as "/usr/local/apache2/logs/access_log", whereas "/logs/access_log"
# will be interpreted as '/logs/access_log'.
#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# Do not add a slash at the end of the directory path. If you point
# ServerRoot at a non-local disk, be sure to specify a local disk on the
# Mutex directive, if file-based mutexes are used. If you wish to share the
# same ServerRoot for multiple httpd daemons, you will need to change at
# least PidFile.
#
ServerRoot "/usr/local/apache2"
#
# Mutex: Allows you to set the mutex mechanism and mutex file directory
# for individual mutexes, or change the global defaults
#
# Uncomment and change the directory if mutexes are file-based and the default
# mutex file directory is not on a local disk or is not appropriate for some
# other reason.
#
# Mutex default:logs
#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, instead of the default. See also the <VirtualHost>
# directive.
#
# Change this to Listen on specific IP addresses as shown below to
# prevent Apache from glomming onto all bound IP addresses.
#
#Listen 12.34.56.78:80
Listen 80
#
# Dynamic Shared Object (DSO) Support
#
# To be able to use the functionality of a module which was built as a DSO you
# have to place corresponding `LoadModule' lines at this location so the
# directives contained in it are actually available _before_ they are used.
# Statically compiled modules (those listed by `httpd -l') do not need
# to be loaded here.
#
# Example:
# LoadModule foo_module modules/mod_foo.so
#
LoadModule mpm_event_module modules/mod_mpm_event.so
#LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
#LoadModule mpm_worker_module modules/mod_mpm_worker.so
LoadModule authn_file_module modules/mod_authn_file.so
#LoadModule authn_dbm_module modules/mod_authn_dbm.so
#LoadModule authn_anon_module modules/mod_authn_anon.so
#LoadModule authn_dbd_module modules/mod_authn_dbd.so
#LoadModule authn_socache_module modules/mod_authn_socache.so
LoadModule authn_core_module modules/mod_authn_core.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_groupfile_module modules/mod_authz_groupfile.so
LoadModule authz_user_module modules/mod_authz_user.so
#LoadModule authz_dbm_module modules/mod_authz_dbm.so
#LoadModule authz_owner_module modules/mod_authz_owner.so
#LoadModule authz_dbd_module modules/mod_authz_dbd.so
LoadModule authz_core_module modules/mod_authz_core.so
#LoadModule authnz_ldap_module modules/mod_authnz_ldap.so
#LoadModule authnz_fcgi_module modules/mod_authnz_fcgi.so
LoadModule access_compat_module modules/mod_access_compat.so
LoadModule auth_basic_module modules/mod_auth_basic.so
#LoadModule auth_form_module modules/mod_auth_form.so
#LoadModule auth_digest_module modules/mod_auth_digest.so
#LoadModule allowmethods_module modules/mod_allowmethods.so
#LoadModule isapi_module modules/mod_isapi.so
#LoadModule file_cache_module modules/mod_file_cache.so
#LoadModule cache_module modules/mod_cache.so
#LoadModule cache_disk_module modules/mod_cache_disk.so
#LoadModule cache_socache_module modules/mod_cache_socache.so
#LoadModule socache_shmcb_module modules/mod_socache_shmcb.so
#LoadModule socache_dbm_module modules/mod_socache_dbm.so
#LoadModule socache_memcache_module modules/mod_socache_memcache.so
#LoadModule socache_redis_module modules/mod_socache_redis.so
#LoadModule watchdog_module modules/mod_watchdog.so
#LoadModule macro_module modules/mod_macro.so
#LoadModule dbd_module modules/mod_dbd.so
#LoadModule bucketeer_module modules/mod_bucketeer.so
#LoadModule dumpio_module modules/mod_dumpio.so
#LoadModule echo_module modules/mod_echo.so
#LoadModule example_hooks_module modules/mod_example_hooks.so
#LoadModule case_filter_module modules/mod_case_filter.so
#LoadModule case_filter_in_module modules/mod_case_filter_in.so
#LoadModule example_ipc_module modules/mod_example_ipc.so
#LoadModule buffer_module modules/mod_buffer.so
#LoadModule data_module modules/mod_data.so
#LoadModule ratelimit_module modules/mod_ratelimit.so
LoadModule reqtimeout_module modules/mod_reqtimeout.so
#LoadModule ext_filter_module modules/mod_ext_filter.so
#LoadModule request_module modules/mod_request.so
#LoadModule include_module modules/mod_include.so
LoadModule filter_module modules/mod_filter.so
#LoadModule reflector_module modules/mod_reflector.so
#LoadModule substitute_module modules/mod_substitute.so
#LoadModule sed_module modules/mod_sed.so
#LoadModule charset_lite_module modules/mod_charset_lite.so
#LoadModule deflate_module modules/mod_deflate.so
#LoadModule xml2enc_module modules/mod_xml2enc.so
#LoadModule proxy_html_module modules/mod_proxy_html.so
#LoadModule brotli_module modules/mod_brotli.so
LoadModule mime_module modules/mod_mime.so
#LoadModule ldap_module modules/mod_ldap.so
LoadModule log_config_module modules/mod_log_config.so
#LoadModule log_debug_module modules/mod_log_debug.so
#LoadModule log_forensic_module modules/mod_log_forensic.so
#LoadModule logio_module modules/mod_logio.so
#LoadModule lua_module modules/mod_lua.so
LoadModule env_module modules/mod_env.so
#LoadModule mime_magic_module modules/mod_mime_magic.so
#LoadModule cern_meta_module modules/mod_cern_meta.so
#LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
#LoadModule ident_module modules/mod_ident.so
#LoadModule usertrack_module modules/mod_usertrack.so
#LoadModule unique_id_module modules/mod_unique_id.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule version_module modules/mod_version.so
#LoadModule remoteip_module modules/mod_remoteip.so
LoadModule proxy_module modules/mod_proxy.so
#LoadModule proxy_connect_module modules/mod_proxy_connect.so
#LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
#LoadModule proxy_http_module modules/mod_proxy_http.so
#LoadModule proxy_fcgi_module modules/mod_proxy_fcgi.so
#LoadModule proxy_scgi_module modules/mod_proxy_scgi.so
LoadModule proxy_uwsgi_module modules/mod_proxy_uwsgi.so
#LoadModule proxy_fdpass_module modules/mod_proxy_fdpass.so
#LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so
#LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
#LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
#LoadModule proxy_express_module modules/mod_proxy_express.so
#LoadModule proxy_hcheck_module modules/mod_proxy_hcheck.so
#LoadModule session_module modules/mod_session.so
#LoadModule session_cookie_module modules/mod_session_cookie.so
#LoadModule session_crypto_module modules/mod_session_crypto.so
#LoadModule session_dbd_module modules/mod_session_dbd.so
#LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
#LoadModule slotmem_plain_module modules/mod_slotmem_plain.so
#LoadModule ssl_module modules/mod_ssl.so
#LoadModule optional_hook_export_module modules/mod_optional_hook_export.so
#LoadModule optional_hook_import_module modules/mod_optional_hook_import.so
#LoadModule optional_fn_import_module modules/mod_optional_fn_import.so
#LoadModule optional_fn_export_module modules/mod_optional_fn_export.so
#LoadModule dialup_module modules/mod_dialup.so
#LoadModule http2_module modules/mod_http2.so
#LoadModule proxy_http2_module modules/mod_proxy_http2.so
#LoadModule md_module modules/mod_md.so
#LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so
#LoadModule lbmethod_bytraffic_module modules/mod_lbmethod_bytraffic.so
#LoadModule lbmethod_bybusyness_module modules/mod_lbmethod_bybusyness.so
#LoadModule lbmethod_heartbeat_module modules/mod_lbmethod_heartbeat.so
LoadModule unixd_module modules/mod_unixd.so
#LoadModule heartbeat_module modules/mod_heartbeat.so
#LoadModule heartmonitor_module modules/mod_heartmonitor.so
#LoadModule dav_module modules/mod_dav.so
LoadModule status_module modules/mod_status.so
LoadModule autoindex_module modules/mod_autoindex.so
#LoadModule asis_module modules/mod_asis.so
#LoadModule info_module modules/mod_info.so
#LoadModule suexec_module modules/mod_suexec.so
<IfModule !mpm_prefork_module>
#LoadModule cgid_module modules/mod_cgid.so
</IfModule>
<IfModule mpm_prefork_module>
#LoadModule cgi_module modules/mod_cgi.so
</IfModule>
#LoadModule dav_fs_module modules/mod_dav_fs.so
#LoadModule dav_lock_module modules/mod_dav_lock.so
#LoadModule vhost_alias_module modules/mod_vhost_alias.so
#LoadModule negotiation_module modules/mod_negotiation.so
LoadModule dir_module modules/mod_dir.so
#LoadModule imagemap_module modules/mod_imagemap.so
#LoadModule actions_module modules/mod_actions.so
#LoadModule speling_module modules/mod_speling.so
#LoadModule userdir_module modules/mod_userdir.so
LoadModule alias_module modules/mod_alias.so
#LoadModule rewrite_module modules/mod_rewrite.so
<IfModule unixd_module>
#
# If you wish httpd to run as a different user or group, you must run
# httpd as root initially and it will switch.
#
# User/Group: The name (or #number) of the user/group to run httpd as.
# It is usually good practice to create a dedicated user and group for
# running httpd, as with most system services.
#
User daemon
Group daemon
</IfModule>
# 'Main' server configuration
#
# The directives in this section set up the values used by the 'main'
# server, which responds to any requests that aren't handled by a
# <VirtualHost> definition. These values also provide defaults for
# any <VirtualHost> containers you may define later in the file.
#
# All of these directives may appear inside <VirtualHost> containers,
# in which case these default settings will be overridden for the
# virtual host being defined.
#
#
# ServerAdmin: Your address, where problems with the server should be
# e-mailed. This address appears on some server-generated pages, such
# as error documents. e.g. admin@your-domain.com
#
ServerAdmin you@example.com
#
# ServerName gives the name and port that the server uses to identify itself.
# This can often be determined automatically, but we recommend you specify
# it explicitly to prevent problems during startup.
#
# If your host doesn't have a registered DNS name, enter its IP address here.
#
#ServerName www.example.com:80
#
# Deny access to the entirety of your server's filesystem. You must
# explicitly permit access to web content directories in other
# <Directory> blocks below.
#
<Directory />
AllowOverride none
Require all denied
</Directory>
#
# Note that from this point forward you must specifically allow
# particular features to be enabled - so if something's not working as
# you might expect, make sure that you have specifically enabled it
# below.
#
#
# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/usr/local/apache2/htdocs"
<Directory "/usr/local/apache2/htdocs">
#
# Possible values for the Options directive are "None", "All",
# or any combination of:
# Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
#
# Note that "MultiViews" must be named *explicitly* --- "Options All"
# doesn't give it to you.
#
# The Options directive is both complicated and important. Please see
# http://httpd.apache.org/docs/2.4/mod/core.html#options
# for more information.
#
Options Indexes FollowSymLinks
#
# AllowOverride controls what directives may be placed in .htaccess files.
# It can be "All", "None", or any combination of the keywords:
# AllowOverride FileInfo AuthConfig Limit
#
AllowOverride None
#
# Controls who can get stuff from this server.
#
Require all granted
</Directory>
#
# DirectoryIndex: sets the file that Apache will serve if a directory
# is requested.
#
<IfModule dir_module>
DirectoryIndex index.html
</IfModule>
#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
<Files ".ht*">
Require all denied
</Files>
#
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here. If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog /proc/self/fd/2
#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
LogLevel warn
<IfModule log_config_module>
#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
#
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
<IfModule logio_module>
# You need to enable mod_logio.c to use %I and %O
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
</IfModule>
#
# The location and format of the access logfile (Common Logfile Format).
# If you do not define any access logfiles within a <VirtualHost>
# container, they will be logged here. Contrariwise, if you *do*
# define per-<VirtualHost> access logfiles, transactions will be
# logged therein and *not* in this file.
#
CustomLog /proc/self/fd/1 common
#
# If you prefer a logfile with access, agent, and referer information
# (Combined Logfile Format) you can use the following directive.
#
#CustomLog "logs/access_log" combined
</IfModule>
<IfModule alias_module>
#
# Redirect: Allows you to tell clients about documents that used to
# exist in your server's namespace, but do not anymore. The client
# will make a new request for the document at its new location.
# Example:
# Redirect permanent /foo http://www.example.com/bar
#
# Alias: Maps web paths into filesystem paths and is used to
# access content that does not live under the DocumentRoot.
# Example:
# Alias /webpath /full/filesystem/path
#
# If you include a trailing / on /webpath then the server will
# require it to be present in the URL. You will also likely
# need to provide a <Directory> section to allow access to
# the filesystem path.
#
# ScriptAlias: This controls which directories contain server scripts.
# ScriptAliases are essentially the same as Aliases, except that
# documents in the target directory are treated as applications and
# run by the server when requested rather than as documents sent to the
# client. The same rules about trailing "/" apply to ScriptAlias
# directives as to Alias.
#
ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"
</IfModule>
<IfModule cgid_module>
#
# ScriptSock: On threaded servers, designate the path to the UNIX
# socket used to communicate with the CGI daemon of mod_cgid.
#
#Scriptsock cgisock
</IfModule>
#
# "/usr/local/apache2/cgi-bin" should be changed to whatever your ScriptAliased
# CGI directory exists, if you have that configured.
#
<Directory "/usr/local/apache2/cgi-bin">
AllowOverride None
Options None
Require all granted
</Directory>
<IfModule headers_module>
#
# Avoid passing HTTP_PROXY environment to CGI's on this or any proxied
# backend servers which have lingering "httpoxy" defects.
# 'Proxy' request header is undefined by the IETF, not listed by IANA
#
RequestHeader unset Proxy early
</IfModule>
<IfModule mime_module>
#
# TypesConfig points to the file containing the list of mappings from
# filename extension to MIME-type.
#
TypesConfig conf/mime.types
#
# AddType allows you to add to or override the MIME configuration
# file specified in TypesConfig for specific file types.
#
#AddType application/x-gzip .tgz
#
# AddEncoding allows you to have certain browsers uncompress
# information on the fly. Note: Not all browsers support this.
#
#AddEncoding x-compress .Z
#AddEncoding x-gzip .gz .tgz
#
# If the AddEncoding directives above are commented-out, then you
# probably should define those extensions to indicate media types:
#
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
#
# AddHandler allows you to map certain file extensions to "handlers":
# actions unrelated to filetype. These can be either built into the server
# or added with the Action directive (see below)
#
# To use CGI scripts outside of ScriptAliased directories:
# (You will also need to add "ExecCGI" to the "Options" directive.)
#
#AddHandler cgi-script .cgi
# For type maps (negotiated resources):
#AddHandler type-map var
#
# Filters allow you to process content before it is sent to the client.
#
# To parse .shtml files for server-side includes (SSI):
# (You will also need to add "Includes" to the "Options" directive.)
#
#AddType text/html .shtml
#AddOutputFilter INCLUDES .shtml
</IfModule>
#
# The mod_mime_magic module allows the server to use various hints from the
# contents of the file itself to determine its type. The MIMEMagicFile
# directive tells the module where the hint definitions are located.
#
#MIMEMagicFile conf/magic
#
# Customizable error responses come in three flavors:
# 1) plain text 2) local redirects 3) external redirects
#
# Some examples:
#ErrorDocument 500 "The server made a boo boo."
#ErrorDocument 404 /missing.html
#ErrorDocument 404 "/cgi-bin/missing_handler.pl"
#ErrorDocument 402 http://www.example.com/subscription_info.html
#
#
# MaxRanges: Maximum number of Ranges in a request before
# returning the entire resource, or one of the special
# values 'default', 'none' or 'unlimited'.
# Default setting is to accept 200 Ranges.
#MaxRanges unlimited
#
# EnableMMAP and EnableSendfile: On systems that support it,
# memory-mapping or the sendfile syscall may be used to deliver
# files. This usually improves server performance, but must
# be turned off when serving from networked-mounted
# filesystems or if support for these functions is otherwise
# broken on your system.
# Defaults: EnableMMAP On, EnableSendfile Off
#
#EnableMMAP off
#EnableSendfile on
# Supplemental configuration
#
# The configuration files in the conf/extra/ directory can be
# included to add extra features or to modify the default configuration of
# the server, or you may simply copy their contents here and change as
# necessary.
# Server-pool management (MPM specific)
#Include conf/extra/httpd-mpm.conf
# Multi-language error messages
#Include conf/extra/httpd-multilang-errordoc.conf
# Fancy directory listings
#Include conf/extra/httpd-autoindex.conf
# Language settings
#Include conf/extra/httpd-languages.conf
# User home directories
#Include conf/extra/httpd-userdir.conf
# Real-time info on requests and configuration
#Include conf/extra/httpd-info.conf
# Virtual hosts
#Include conf/extra/httpd-vhosts.conf
# Local access to the Apache HTTP Server Manual
#Include conf/extra/httpd-manual.conf
# Distributed authoring and versioning (WebDAV)
#Include conf/extra/httpd-dav.conf
# Various default settings
#Include conf/extra/httpd-default.conf
# Configure mod_proxy_html to understand HTML4/XHTML1
<IfModule proxy_html_module>
Include conf/extra/proxy-html.conf
</IfModule>
# Secure (SSL/TLS) connections
#Include conf/extra/httpd-ssl.conf
#
# Note: The following must be present to support
# starting without SSL on platforms with no /dev/random equivalent
# but a statically compiled-in mod_ssl.
#
<IfModule ssl_module>
SSLRandomSeed startup builtin
SSLRandomSeed connect builtin
</IfModule>
Include conf/extra/pywb-apache.conf


@ -0,0 +1,17 @@
<VirtualHost *:80>
# optional: optimization to have apache serve static assets
Alias /wayback/static "/pywb/pywb/static"
ProxyPass /wayback/static !
<Directory "/pywb/pywb/static">
Options None
AllowOverride None
Order allow,deny
Allow from all
Require all granted
</Directory>
# required: proxy pass to pywb
ProxyPass /wayback uwsgi://pywb:8081/
</VirtualHost>
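
The order of the two ProxyPass lines above matters: Apache checks ProxyPass rules in configuration order and the first match wins, so the "!" exclusion for /wayback/static has to come before the general /wayback rule. A minimal Python sketch of that first-match behaviour follows. The names RULES and proxy_target are hypothetical and exist only for this illustration; the sketch models plain path-prefix rules and nothing else from mod_proxy.

```python
# Illustrative model only: first-match-wins lookup over ProxyPass-style rules.
# RULES mirrors the two ProxyPass lines in the virtual host above.
RULES = [
    ("/wayback/static", "!"),             # "!" means: do not proxy, serve locally
    ("/wayback", "uwsgi://pywb:8081/"),   # everything else under /wayback goes to pywb
]


def proxy_target(path, rules=RULES):
    """Return the target of the first rule whose prefix matches path, else None."""
    for prefix, target in rules:
        base = prefix.rstrip("/")
        # Match the prefix itself or anything below it on a path-segment boundary.
        if path == base or path.startswith(base + "/"):
            return target
    return None
```

With the rules reversed, the static path would match /wayback first and be proxied to pywb, which is what the exclusion is meant to avoid.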


@ -0,0 +1,21 @@
# nginx config for running under /wayback/ prefix
server {
listen 80;
# optional: optimization to have nginx serve static assets
location /wayback/static {
alias /pywb/pywb/static;
}
# required: pywb with prefix
location /wayback/ {
resolver 127.0.0.1;
uwsgi_pass pywb:8081;
include uwsgi_params;
uwsgi_param UWSGI_SCHEME $scheme;
}
}
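
Both location blocks above are plain prefix locations, and for those nginx selects the longest matching prefix regardless of the order in which they are written. The Python sketch below models only that rule, as a rough aid for reasoning about which block handles a given request. The function name match_location is hypothetical, and regex or exact-match locations are deliberately not modelled.

```python
# Illustrative model only: longest-prefix selection among nginx prefix locations.
def match_location(path, prefixes=("/wayback/static", "/wayback/")):
    """Return the longest prefix in prefixes that path starts with, else None."""
    matches = [p for p in prefixes if path.startswith(p)]
    return max(matches, key=len) if matches else None
```

Under this model a request for /wayback/static/css/base.css is served from the alias directory, while any other path under /wayback/ is passed to pywb over uwsgi.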

sample-deploy/run.sh Executable file

@ -0,0 +1,4 @@
#!/bin/bash
cdx-indexer /webarchive/sample_archive/warcs/example.warc.gz > /tmp/index.cdx
curl -X POST --data-binary @/tmp/index.cdx http://outbackcdx:8080/pywb
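
The curl step in this script POSTs the raw CDX index as the request body to an OutbackCDX collection URL. For readers who would rather script that step in Python, here is a rough standard-library sketch. The helper name build_outbackcdx_request is hypothetical; the default host, port and collection name are taken from the script above and are assumptions about the deployment, not fixed values.

```python
import urllib.request


def build_outbackcdx_request(cdx_bytes,
                             base_url="http://outbackcdx:8080",
                             collection="pywb"):
    """Build (but do not send) a POST request carrying CDX lines as the body.

    base_url and collection default to the values used in run.sh; adjust them
    for a different deployment.
    """
    url = base_url.rstrip("/") + "/" + collection
    return urllib.request.Request(url, data=cdx_bytes, method="POST")
```

To send it, read the index file in binary mode and pass the resulting request to urllib.request.urlopen, for example with the /tmp/index.cdx file that cdx-indexer writes above.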


@ -0,0 +1,29 @@
[uwsgi]
if-not-env = PORT
http-socket = :8080
socket = :8081
endif =
master = true
buffer-size = 65536
die-on-term = true
if-env = VIRTUAL_ENV
venv = $(VIRTUAL_ENV)
endif =
gevent = 100
#Not available until uwsgi 2.1
#monkey-patching manually in pywb.apps.wayback
#gevent-early-monkey-patch =
# for uwsgi<2.1, set env when using gevent
env = GEVENT_MONKEY_PATCH=1
# specify config file here
env = PYWB_CONFIG_FILE=config.yaml
#wsgi = pywb.apps.wayback
# config to run pywb from a prefix
mount = /wayback=/pywb/pywb/apps/wayback.py
manage-script-name = true
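
The last two settings are what let pywb run under the /wayback prefix: the app is mounted at /wayback, and manage-script-name makes uWSGI rewrite SCRIPT_NAME and PATH_INFO so the application sees paths relative to the mount point. The Python sketch below shows roughly how a request path is expected to be split. The function name split_mount is hypothetical, and this is a simplified model of the behaviour, not uWSGI's implementation.

```python
# Illustrative model only: how a mounted WSGI app sees SCRIPT_NAME and PATH_INFO.
def split_mount(request_path, mountpoint="/wayback"):
    """Return (SCRIPT_NAME, PATH_INFO) for a path under mountpoint, else None.

    Paths outside the mount point are not modelled here and yield None.
    """
    if request_path == mountpoint or request_path.startswith(mountpoint + "/"):
        return mountpoint, request_path[len(mountpoint):]
    return None
```

For example, a request for /wayback/pywb/2020/http://example.com/ would reach the app with SCRIPT_NAME set to /wayback and PATH_INFO set to /pywb/2020/http://example.com/, which is why the front-end web server should forward the full path including the prefix.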

wombat

@ -1 +1 @@
Subproject commit 3f04dcdcb071042d498c4912599454a15c11f0e4
Subproject commit 5ede99b6ffb3e0e3c240f2403a9f58189edda543