.. _access-control:

Access Control System
---------------------

The access control system allows for a flexible configuration of rules to allow,
block or exclude access to individual URLs by longest-prefix match.

Access Control Files (.aclj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Access controls are set in one or more access control JSON files (.aclj), sorted in reverse alphabetical order.
To determine the best match, a binary search lookup is performed (similar to a CDXJ index lookup), and the best match is then found by scanning forward from that position.

An .aclj file may look as follows::

  org,httpbin)/anything/something - {"access": "allow", "url": "http://httpbin.org/anything/something"}
  org,httpbin)/anything - {"access": "exclude", "url": "http://httpbin.org/anything"}
  org,httpbin)/ - {"access": "block", "url": "httpbin.org/"}
  com, - {"access": "allow", "url": "com,"}

Each JSON entry contains an ``access`` field and the original ``url`` from which the SURT key was derived (if any).

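
To illustrate how the ``url`` field relates to its SURT key, here is a minimal sketch of a hypothetical ``to_surt()`` helper. It is deliberately simplified and is not the conversion pywb performs: it only lowercases and reverses the host labels and keeps the path and query, ignoring ports, ``www.`` handling and other canonicalization steps::

    from urllib.parse import urlsplit


    def to_surt(url):
        """Hypothetical, simplified URL -> SURT key conversion (illustration only)."""
        # Accept scheme-less input such as "httpbin.org/" by assuming http://
        if '://' not in url:
            url = 'http://' + url
        parts = urlsplit(url)
        host = (parts.hostname or '').lower()
        # Reverse the host labels: "httpbin.org" -> "org,httpbin"
        key = ','.join(reversed(host.split('.'))) + ')'
        # Keep the path (defaulting to "/") and any query string as-is
        key += parts.path or '/'
        if parts.query:
            key += '?' + parts.query
        return key

For example, ``to_surt('http://httpbin.org/anything')`` returns ``'org,httpbin)/anything'``, the key used in the second line of the example file.
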
The prefix consists of a SURT key and a ``-`` (currently reserved for a timestamp/date range field to be added later).

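
Given this layout, a line can be split into the SURT key, the reserved ``-`` field and the JSON payload. The following sketch assumes, as in the example file above, that these parts are separated by single spaces and that the key itself contains no spaces; ``parse_aclj_line()`` is a hypothetical helper, not part of pywb's API::

    import json


    def parse_aclj_line(line):
        """Hypothetical helper: split an .aclj line into (key, reserved, rule)."""
        # Split at most twice so that spaces inside the JSON payload are kept
        key, reserved, payload = line.rstrip('\n').split(' ', 2)
        rule = json.loads(payload)
        # Reject anything other than the three documented access types
        if rule.get('access') not in ('allow', 'block', 'exclude'):
            raise ValueError('unknown access type: %r' % rule.get('access'))
        return key, reserved, rule
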
Given these rules, a user would:

* be allowed to visit ``http://httpbin.org/anything/something`` (allow)
* receive an 'access blocked' error message when viewing ``http://httpbin.org/`` (block)
* receive a 404 not found error when viewing ``http://httpbin.org/anything`` (exclude)

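
The lookup described earlier can be sketched in a few lines of Python. This is an illustrative sketch, not pywb's implementation: it assumes the rules are already loaded into memory as ``(surt_key, access)`` pairs in reverse-sorted order, and that unmatched keys fall back to ``allow``. Because every prefix of a key sorts at or before the key, and a longer prefix sorts after a shorter one, the first prefix found when scanning forward through the reverse-sorted list is the longest one::

    def find_start(keys_desc, key):
        """Binary search: index of the first entry <= key in a descending list."""
        lo, hi = 0, len(keys_desc)
        while lo < hi:
            mid = (lo + hi) // 2
            if keys_desc[mid] > key:
                lo = mid + 1  # entry sorts after the key, so skip it
            else:
                hi = mid
        return lo


    def best_match(rules, key, default='allow'):
        """Return the access type of the longest matching prefix rule."""
        keys = [k for k, _ in rules]
        # Prefixes of the key can only appear at or after the search position
        for rule_key, access in rules[find_start(keys, key):]:
            if key.startswith(rule_key):
                return access
        return default


    # The example rules from above, in reverse-sorted order
    RULES = [
        ('org,httpbin)/anything/something', 'allow'),
        ('org,httpbin)/anything', 'exclude'),
        ('org,httpbin)/', 'block'),
        ('com,', 'allow'),
    ]

For example, ``best_match(RULES, 'org,httpbin)/anything/other')`` returns ``'exclude'``, since ``org,httpbin)/anything`` is the longest matching prefix.
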
Access Types: allow, block, exclude
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The available access types are as follows:

- ``exclude`` - when matched, results are excluded from the index, as if they do not exist. The user will receive a 404.
- ``block`` - when matched, results are not excluded from the index but are marked with ``access: block``, and access to the actual resource is blocked. The user will see a 451.
- ``allow`` - full access to the index and the resource.

The difference between ``exclude`` and ``block`` is that when blocked, the user can be notified that access is blocked, while
with exclude, no trace of the resource is presented to the user.

``allow`` rules are useful for providing access to more specific resources within a broader ``block`` or ``exclude`` rule.

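
To make the difference between the access types concrete, the sketch below shows how index results and replay responses could be treated for each type. It is not pywb code: the ``urlkey`` field, the ``lookup`` callable and the 200 status for ``allow`` (which assumes the capture exists) are assumptions for illustration::

    # Replay status per access type (assumption: the capture itself exists)
    REPLAY_STATUS = {'allow': 200, 'block': 451, 'exclude': 404}


    def filter_index(results, lookup):
        """Apply access rules to index results (illustrative sketch).

        results: list of dicts with a 'urlkey' SURT field (assumed shape)
        lookup:  callable mapping a SURT key to 'allow', 'block' or 'exclude'
        """
        visible = []
        for entry in results:
            access = lookup(entry['urlkey'])
            if access == 'exclude':
                continue  # dropped entirely: behaves as if never indexed
            if access == 'block':
                entry = dict(entry, access='block')  # still listed, but marked
            visible.append(entry)
        return visible


    def replay_status(access):
        """HTTP status a user would see when replaying a matched resource."""
        return REPLAY_STATUS[access]
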
Access Error Messages
^^^^^^^^^^^^^^^^^^^^^

The special error code 451 is used to indicate that a resource has been blocked (access setting ``block``).

The `error.html <https://github.com/webrecorder/pywb/blob/master/pywb/templates/error.html>`_ template contains a special message for this case and can be customized further.

By design, resources that are ``exclude``-ed simply appear as 404 not found and no special error is provided.

Managing Access Lists via Command-Line
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The .aclj files need not be created or edited manually.

The pywb ``wb-manager`` utility has been extended to provide tools for adding, removing and checking access control rules.

For automatic collections, the access rules for a given collection ``<collection>`` are written to ``<collection>/acl/access-rules.aclj``.

For example, to add a rule that excludes ``http://httpbin.org/anything/something`` and everything under it, one could run::

  wb-manager acl add <collection> http://httpbin.org/anything/something exclude

The URL supplied can be a URL or a SURT prefix. If a SURT is supplied, it is used as is::

  wb-manager acl add <collection> com, allow

By default, an access control rule is a prefix rule: it applies to the given URL or SURT and to every URL that begins with it.

To have the rule apply only to the exact match, use::

  wb-manager acl add <collection> http://httpbin.org/anything/something allow --exact-match

Rules added with and without the ``--exact-match`` flag are considered distinct rules, and can be added
and removed separately.

With the above rules, ``http://httpbin.org/anything/something`` would be allowed, but
``http://httpbin.org/anything/something/subpath`` would be excluded for any ``subpath``.

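
One way to model prefix and exact-match rules as distinct rules is to key each rule on the pair ``(surt, exact)``. The toy ``RuleSet`` class below is a hypothetical illustration of the semantics described above, not pywb's internal representation. It uses a linear scan instead of a binary search and assumes an exact-match rule takes precedence over prefix rules for the identical key::

    class RuleSet:
        """Hypothetical in-memory rule set with prefix and exact-match rules."""

        def __init__(self, default='allow'):
            self.rules = {}  # (surt, exact) -> access
            self.default = default

        def add(self, surt, access, exact_match=False):
            self.rules[(surt, exact_match)] = access

        def remove(self, surt, exact_match=False):
            self.rules.pop((surt, exact_match), None)

        def check(self, key):
            # An exact-match rule applies only to the identical key
            if (key, True) in self.rules:
                return self.rules[(key, True)]
            # Otherwise use the longest prefix rule that matches
            prefixes = [s for (s, exact) in self.rules
                        if not exact and key.startswith(s)]
            if prefixes:
                return self.rules[(max(prefixes, key=len), False)]
            return self.default

Mirroring the two ``wb-manager acl add`` commands above, adding a prefix ``exclude`` rule and an exact-match ``allow`` rule for ``org,httpbin)/anything/something`` makes ``check()`` return ``'allow'`` for that key and ``'exclude'`` for ``org,httpbin)/anything/something/subpath``.
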
To remove a rule, one can run::

  wb-manager acl remove <collection> http://httpbin.org/anything/something

To import rules in bulk, such as from an OpenWayback-style ``excludes.txt``, and mark them as ``exclude``::

  wb-manager acl importtxt <collection> ./excludes.txt exclude

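
Conceptually, a bulk import turns each line of the text file into an .aclj rule with the given access type and writes the rules in reverse-sorted order. The sketch below is a hypothetical illustration, not the actual ``importtxt`` implementation. It assumes one URL or SURT prefix per line, skips blank lines, detects SURTs with a simple heuristic and uses a simplified URL-to-SURT conversion::

    import json
    from urllib.parse import urlsplit


    def to_surt(url):
        """Hypothetical, simplified URL -> SURT conversion (no canonicalization)."""
        if '://' not in url:
            url = 'http://' + url
        parts = urlsplit(url)
        host = (parts.hostname or '').lower()
        key = ','.join(reversed(host.split('.'))) + ')' + (parts.path or '/')
        return key + ('?' + parts.query if parts.query else '')


    def looks_like_surt(text):
        """Heuristic (assumption): a SURT host part has commas but no dots."""
        head = text.split('/', 1)[0]
        return '://' not in text and ',' in head and '.' not in head


    def import_txt(lines, access):
        """Turn excludes.txt-style lines into reverse-sorted .aclj lines."""
        rules = {}
        for line in lines:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            key = url if looks_like_surt(url) else to_surt(url)
            rules[key] = json.dumps({'access': access, 'url': url})
        # Sort whole lines in reverse order, as in the example .aclj file
        return sorted((k + ' - ' + v for k, v in rules.items()), reverse=True)
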
See ``wb-manager acl -h`` for a list of additional commands, such as for validating rules files and running a match against
an existing rule set.

Access Controls for Custom Collections
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For manually configured collections, there are additional options for configuring access controls.
The access control files can be specified explicitly using the ``acl_paths`` key, which allows specifying multiple ACL files
and sharing access control files between different collections.

Single ACLJ::

  collections:
    test:
      acl_paths: ./path/to/file.aclj
      default_access: block

Multiple ACLJ::

  collections:
    test:
      acl_paths:
        - ./path/to/allows.aclj
        - ./path/to/blocks.aclj
        - ./path/to/other.aclj
        - ./path/to/directory

      default_access: block

The ``acl_paths`` can be a single entry or a list, and can also include directories. If a directory is specified, all ``.aclj`` files
in the directory are checked.

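
Expanding ``acl_paths`` into a list of files could look like the following sketch. The function name is hypothetical, and sorting the ``.aclj`` files within a directory by name is an assumption made here for deterministic output::

    from pathlib import Path


    def resolve_acl_paths(acl_paths):
        """Hypothetical helper: expand an acl_paths setting into .aclj files."""
        # The setting may be a single string or a list of strings
        if isinstance(acl_paths, str):
            acl_paths = [acl_paths]
        files = []
        for path in map(Path, acl_paths):
            if path.is_dir():
                # A directory contributes every .aclj file it contains
                files.extend(sorted(path.glob('*.aclj')))
            else:
                files.append(path)
        return files
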
When finding the best rule from multiple ``.aclj`` files, each file is binary searched and the result
set merge-sorted to find the best match (very similar to the CDXJ index lookup).

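
The merge step can be sketched with ``heapq.merge``, which lazily combines already-sorted iterables; its ``reverse=True`` option handles descending input. For brevity, this sketch works on in-memory lists of ``(surt_key, access)`` pairs rather than files, and selects the candidates from each list with a linear filter instead of a binary search. Both are simplifications, not pywb's code::

    import heapq


    def merged_best_match(rule_lists, key, default='allow'):
        """Longest-prefix match across several reverse-sorted rule lists."""
        # From each list, keep only entries that sort at or before the key
        candidates = [[r for r in rules if r[0] <= key] for rules in rule_lists]
        # Merge the (still reverse-sorted) candidate lists into one stream
        for rule_key, access in heapq.merge(*candidates, reverse=True):
            if key.startswith(rule_key):
                return access  # the first prefix found is the longest one
        return default
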
Note: It might make sense to separate ``allows.aclj`` and ``blocks.aclj`` into individual files for organizational reasons,
but there is no specific need to keep more than one access control file.

Default Access
^^^^^^^^^^^^^^

For custom collections, an additional ``default_access`` setting can be added to specify the rule to apply if no other rule matches.
If omitted, this setting is ``default_access: allow``, which is usually the desired default.

Setting ``default_access: block`` and providing a list of ``allow`` rules provides a flexible way to allow access
to only a limited set of resources, and block access to anything out of scope by default.

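
The allow-list pattern can be sketched as follows, assuming the longest-prefix semantics described above. ``check_access()`` and the dict-based rule set are hypothetical simplifications::

    def check_access(key, rules, default_access='allow'):
        """Hypothetical longest-prefix lookup that falls back to default_access."""
        matches = [prefix for prefix in rules if key.startswith(prefix)]
        if not matches:
            return default_access  # no rule matched: use the default
        return rules[max(matches, key=len)]


    # Only example.com is in scope; everything else is blocked by default
    ALLOW_LIST = {'com,example)/': 'allow'}

With ``default_access='block'``, ``check_access('com,example)/page', ALLOW_LIST, 'block')`` returns ``'allow'``, while any key outside ``com,example)/`` returns ``'block'``.
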