Migrating Exclusion Rules
=========================

pywb includes a new :ref:`access-control` system, which allows granular allow/block/exclude access control rules on paths and subpaths.

The rules are configured in .aclj files, and a command-line utility exists to import OpenWayback exclusions
into the pywb ACLJ format.

For example, given an OpenWayback exclusion list configuration for a static file:

.. code:: xml

    <bean id="excluder-factory-static" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
      <property name="file" value="/archive/exclusions.txt"/>
      <property name="checkInterval" value="600000" />
    </bean>

The exclusions file can be converted to an .aclj file by running: ::

    wb-manager acl importtxt /archive/exclusions.aclj /archive/exclusions.txt exclude

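For illustration, assuming ``/archive/exclusions.txt`` lists one URL per line and contains ``http://httpbin.org/anything/something``, the imported .aclj file would be expected to contain a line roughly like the one below: a SURT-form URL key, a ``-`` placeholder, and a JSON block holding the access type. This is a sketch of the format only, and the exact output may differ between pywb versions: ::

    org,httpbin)/anything/something - {"access": "exclude", "url": "http://httpbin.org/anything/something"}
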
Then, in the pywb config, specify:

.. code:: yaml

    collections:
      wayback:
        index_paths: ...
        archive_paths: ...
        acl_paths: /archive/exclusions.aclj

Multiple access control files can be specified, and all of them will be applied.

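As a minimal sketch of a multi-file setup, ``acl_paths`` can be written as a list. The second path, ``/archive/blocks.aclj``, is a hypothetical file name used only for illustration:

.. code:: yaml

    collections:
      wayback:
        index_paths: ...
        archive_paths: ...
        acl_paths:
          - /archive/exclusions.aclj
          # hypothetical second file, for illustration only
          - /archive/blocks.aclj
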
Using ``block`` instead of ``exclude`` causes pywb to return an HTTP 451 error, indicating that the URL is in the index but access to it is blocked.

CLI Tool
--------

After exclusions have been imported, it is recommended to use the ``wb-manager acl`` command-line tool for managing exclusions.

To add an exclusion, run: ::

    wb-manager acl add /archive/exclusions.aclj http://httpbin.org/anything/something exclude

To remove an exclusion, run: ::

    wb-manager acl remove /archive/exclusions.aclj http://httpbin.org/anything/something

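To review the imported rules, or to check which rule applies to a given URL, the ``list`` and ``match`` subcommands may help. This is a sketch that assumes both subcommands are present in the installed pywb version: ::

    # list all rules in the access control file
    wb-manager acl list /archive/exclusions.aclj

    # show which rule, if any, applies to a given URL
    wb-manager acl match /archive/exclusions.aclj http://httpbin.org/anything/something
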
For more options, see the full :ref:`access-control` documentation or run ``wb-manager acl --help``.

Not Yet Supported
-----------------

Some OpenWayback exclusion options are not yet supported in the pywb access control system:

- Exclusions/access control by specific date range
- Regex-based exclusions
- Date range embargo on all URLs
- Robots.txt-based exclusions