mirror of https://github.com/webrecorder/pywb.git synced 2025-03-14 15:53:28 +01:00

Allow ACLJs to use *, SURT wildcard to match all URLs (#882)

Also adds tests and documentation
Tessa Walsh 2024-04-03 17:11:58 -04:00 committed by GitHub
parent d1e1636ae3
commit 86ee3bd752
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
5 changed files with 26 additions and 0 deletions


@@ -105,6 +105,12 @@ Given these rules, a user would:
* but would receive an 'access blocked' error message when viewing ``http://httpbin.org/`` (block)
* would receive a 404 not found error when viewing ``http://httpbin.org/anything`` (exclude)
To match any possible URL in an .aclj file, set ``*,`` as the leading SURT, for example::

    *, - {"access": "allow"}

Because .aclj files are kept in reverse alphabetical order, and ``*`` sorts before letters and digits, lines starting with ``*,`` should generally be placed at the end of the file.
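To see why ``*,`` ends up last, here is a minimal sketch that orders a few rule lines in reverse byte order. It assumes, for illustration only, that rules are ordered by a plain byte-wise comparison of the whole line; `sort_aclj_lines` is a hypothetical helper, not a pywb function.

```python
def sort_aclj_lines(lines):
    """Return .aclj rule lines in reverse byte order (hypothetical helper).

    Assumption: ordering is a plain byte-wise comparison of each line,
    so the leading SURT key decides the position.
    """
    return sorted(lines, key=lambda line: line.encode("utf-8"), reverse=True)


rules = [
    '*, - {"access": "allow"}',
    'org,httpbin)/anything - {"access": "exclude"}',
    'org,httpbin)/ - {"access": "block"}',
    'com,example)/ - {"access": "allow"}',
]

for line in sort_aclj_lines(rules):
    print(line)
# org,httpbin)/anything - {"access": "exclude"}
# org,httpbin)/ - {"access": "block"}
# com,example)/ - {"access": "allow"}
# *, - {"access": "allow"}
```

Since ``*`` (0x2A) is below every digit and letter, the wildcard line sorts to the bottom once the order is reversed.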
Access Types: allow, block, exclude, allow_ignore_embargo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -149,6 +155,10 @@ To make this work, pywb must be running behind an Apache or Nginx system that is
For example, this header may be set based on IP range, or based on password authentication.
To allow a user access to all URLs, overriding more specific rules and the ``default_access`` configuration setting, use the ``*,`` SURT::

    *, - {"access": "allow", "user": "staff"}
Further examples of how to set this header will be provided in the deployments section.

**Note: Do not use the user-based rules without configuring proper authentication on an Apache or Nginx frontend to set or remove this header; otherwise, the 'X-Pywb-ACL-User' header can easily be faked.**
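The note recommends letting an Apache or Nginx frontend own the ``X-Pywb-ACL-User`` header. The sketch below shows the same idea as a small Python WSGI wrapper, for illustration only. It discards any client-supplied value and sets the header only for requests the frontend trusts. Several pieces are hypothetical and not part of pywb: `acl_user_guard`, `staff_by_ip`, and the `10.0.0.0/8` staff range. The only non-hypothetical fact it relies on is the standard WSGI mapping of that header to `environ["HTTP_X_PYWB_ACL_USER"]`. A frontend proxy remains the documented approach.

```python
import ipaddress

# Standard WSGI mapping: request header "X-Pywb-ACL-User" -> this environ key.
ACL_USER_ENVIRON_KEY = "HTTP_X_PYWB_ACL_USER"

# Hypothetical staff IP range, used only for this illustration.
STAFF_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def staff_by_ip(environ):
    """Hypothetical authenticator: 'staff' for addresses in STAFF_NETWORK."""
    addr = ipaddress.ip_address(environ.get("REMOTE_ADDR", "0.0.0.0"))
    return "staff" if addr in STAFF_NETWORK else None


def acl_user_guard(app, authenticate):
    """Wrap a WSGI app so only `authenticate` decides the ACL user."""
    def wrapped(environ, start_response):
        # Never trust a value sent by the client.
        environ.pop(ACL_USER_ENVIRON_KEY, None)
        user = authenticate(environ)
        if user is not None:
            environ[ACL_USER_ENVIRON_KEY] = user
        return app(environ, start_response)
    return wrapped


def echo_app(environ, start_response):
    """Stand-in for the archive app: echoes the ACL user it receives."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ.get(ACL_USER_ENVIRON_KEY, "<none>").encode()]


app = acl_user_guard(echo_app, staff_by_ip)
```

With this wrapper, a request from outside the range that forges ``X-Pywb-ACL-User: staff`` reaches the app with no ACL user. A request from inside the range is tagged as ``staff`` whatever the client sent.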


@@ -260,6 +260,10 @@ class AccessChecker(object):
if key.startswith(acl_key):
    acl_obj = CDXObject(acl)

# Check for "*," in ACL, which matches any URL
if acl_key == b"*,":
    acl_obj = CDXObject(acl)

if acl_obj:
    user = acl_obj.get('user')
    if user == acl_user:
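The sketch below is a simplified, self-contained version of rule selection. It shows where the new ``*,`` branch fits next to the existing prefix check. A plain `startswith` can never match ``*,`` because no real SURT key begins with an asterisk, so the wildcard needs its own comparison. This is not pywb's `AccessChecker`, and several details are assumptions. It uses strings and dicts instead of `CDXObject`, and it scans the whole rule list in file order rather than doing whatever lookup pywb performs. It also falls back to the first matching user-less rule when no user-specific rule matches; the diff excerpt stops at ``if user == acl_user:``, so that fallback is assumed. `find_rule` is a hypothetical name.

```python
import json
from typing import Iterable, Optional

WILDCARD_KEY = "*,"  # leading SURT that matches any URL


def find_rule(surt_key: str, rules: Iterable[str],
              acl_user: Optional[str] = None) -> Optional[dict]:
    """Pick the access rule for `surt_key` (hypothetical helper).

    `rules` are .aclj lines, assumed to be in reverse-sorted file order.
    Returns None when nothing applies, so the caller uses default_access.
    """
    fallback = None
    for line in rules:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        acl_key, _placeholder, payload = line.split(" ", 2)
        # Prefix match on the SURT key, or the "*," wildcard.
        if not (surt_key.startswith(acl_key) or acl_key == WILDCARD_KEY):
            continue
        rule = json.loads(payload)
        user = rule.get("user")
        if user == acl_user:
            return rule  # user-specific rule, or user-less rule for anonymous
        if user is None and fallback is None:
            fallback = rule  # assumption: keep most specific user-less rule
    return fallback


rules = [
    'org,httpbin)/anything - {"access": "exclude"}',
    'org,httpbin)/ - {"access": "block"}',
    '*, - {"access": "allow", "user": "staff"}',
]

print(find_rule("org,httpbin)/", rules))                    # {'access': 'block'}
print(find_rule("org,httpbin)/", rules, acl_user="staff"))  # {'access': 'allow', 'user': 'staff'}
print(find_rule("com,example)/", rules))                    # None
```

The staff lookup for ``org,httpbin)/`` sees the ``block`` rule first but keeps scanning. It returns the ``*,`` staff rule, which matches the documented behaviour of overriding more specific rules.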


@@ -0,0 +1 @@
*, - {"access": "allow", "user": "staff"}
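This one-line file uses the same layout as the other rules: a SURT key, a ``-`` field, then a JSON object. The minimal sketch below builds and parses such a line. `make_acl_line` and `parse_acl_line` are hypothetical helpers, and the layout is inferred from the lines in this commit rather than from a format specification.

```python
import json


def make_acl_line(surt, rule):
    """Serialize one rule as '<SURT> - <JSON>' (hypothetical helper)."""
    return f"{surt} - {json.dumps(rule)}"


def parse_acl_line(line):
    """Split a rule line into (SURT, '-' field, rule dict) (hypothetical helper)."""
    surt, field, payload = line.rstrip("\n").split(" ", 2)
    return surt, field, json.loads(payload)


line = make_acl_line("*,", {"access": "allow", "user": "staff"})
print(line)  # *, - {"access": "allow", "user": "staff"}
```

Python's default `json.dumps` separators produce exactly the spacing seen in the file, so the round trip reproduces the committed line.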


@@ -62,6 +62,13 @@ collections:
        acl_paths:
            - ./sample_archive/access/pywb.aclj

    pywb-wildcard-surt:
        index_paths: ./sample_archive/cdx/
        archive_paths: ./sample_archive/warcs/
        default_access: block
        acl_paths:
            - ./sample_archive/access/allow_all.aclj
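With ``default_access: block`` and a ``*,`` rule that applies only to ``staff``, an anonymous request finds no applicable rule and falls back to ``block``. A staff request gets ``allow``. The sketch below spells out that fallback and maps access types to the status codes this commit relies on. The 451 for ``block`` comes from the test, the 404 for ``exclude`` from the docs, and 200 for ``allow``. `effective_access` and `STATUS_BY_ACCESS` are hypothetical names, not pywb API.

```python
from typing import Optional

# Status codes as used in this commit's docs and test (hypothetical mapping name).
STATUS_BY_ACCESS = {"allow": 200, "block": 451, "exclude": 404}


def effective_access(matched_rule: Optional[dict], default_access: str) -> str:
    """Use the matched rule's access, else the collection default (hypothetical)."""
    return matched_rule["access"] if matched_rule else default_access


DEFAULT_ACCESS = "block"  # from the pywb-wildcard-surt collection
staff_rule = {"access": "allow", "user": "staff"}  # the only rule in allow_all.aclj

anonymous = effective_access(None, DEFAULT_ACCESS)  # staff-only rule does not apply
staff = effective_access(staff_rule, DEFAULT_ACCESS)
print(anonymous, STATUS_BY_ACCESS[anonymous])  # block 451
print(staff, STATUS_BY_ACCESS[staff])          # allow 200
```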


@@ -96,5 +96,9 @@ class TestACLApp(BaseConfigTest):
        assert '"http://httpbin.org/anything/resource.json"' in resp.text

    def test_allow_all_acl_user_specific(self):
        resp = self.testapp.get('/pywb-wildcard-surt/mp_/http://example.com/', status=451)
        assert 'Access Blocked' in resp.text

        resp = self.testapp.get('/pywb-wildcard-surt/mp_/http://example.com/', headers={"X-Pywb-Acl-User": "staff"}, status=200)