* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option
* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching
* docs: add docs for new embargo system!
* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring X-pywb-ACL-user header based on IP for nginx and apache sample deployments
* docs: fix access control page header, text tweaks
* bump version to 2.6.0b0
* docs work on OpenWayback -> pywb transition, part 1
* docs: add config change examples, exclusions and deploy recommendations
* update with path index example
* update terms with collection info
* docs update:
- add zipnum examples to owb-to-pywb config transition
- add working docker compose examples for nginx subdirectory, apache subdirectory and outback cdx deployment in ./sample-deploy
- update usage and owb-to-pywb deployment docs with updated subdiretory deployment info + sample-deploy links
* tweak exclusion info, deploy title
* add missing filee uwsgi_subdir.ini
* Docs: fix typos and clarifications from review (thanks @ldko!)
Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
* docs: explain that existing cdx can be added to outbackcdx, explain reindexing is optional
* docs: elaborate on docker-compose examples
* minor tweaks
* update to latest wombat 3.0.2
* update CHANGES.rst
* bump version to 2.5.0 for release
Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
* ensure that the RemoteCDXIndexSource also adds a 'matchType=' param, fix for ukwa-pywb/ukwa#57
* 2.4.2 fixes:
- cdxindexer: don't treat first param as output, require '-o <output>' instead, update tests
- cleanup: move url-polyfill.min.js to correct static dir, addresses #571
- update to latest wombat
- move logo to ./pywb/static, fix README path
- tests: update indexing tests for cdx-indexer fix
- bump version to 2.4.2
- Fix link in access-control docs to use RST instead of MD syntax (#568) (by @machawk1)
* misc fixes for 2.4.0rc7:
- warcserver: when parsing headers to check for redirect, reserialized headers
may be of different length then original, causing warcserver->app response to hang
now adjusting the content-length on the warc record and also not including a fixed
length when serving warcserver->app, possible fix for ukwa/ukwa-pywb#53
- undo change in path resolvers to use os.path.join, just concatenate full_path + filename
- rewrite 'date' -> 'x-orig-archive-date' header to avoid confusion (eg. #548)
- bump version to rc7
* ci: attempt to fix travis build for 27, 35
* fixes for RC6:
- blockrecordloader: ensure record stream is closed after parsing one record
- wrap HttpLoader streams in StreamClosingReader() which should close the connection even if stream not fully consumed
- simplify no_except_close
may help with ukwa/ukwa-pywb#53
- iframe: add allow fullscreen, autoplay
- wombat: update to latest, filter out custom wombat props from getOwnPropertyNames
- rules: add rule for vimeo
* cdx formatting: fix output=text to return plain text / non-cdxj output
* auto fetch fix:
- update to latest wombat to fix auto-fetch in rewriting mode
- fix /proxy-fetch/ endpoint for proxy mode recording, switch proxy-fetch to run in recording mode
- don't use global to allow repeated checks
* rewriter html check: peek 1024 bytes to determine if page is html instead of 128
* fix jinja2 dependency for py2
* misc fixes (rc 5):
- banner: only auto init banner if not in top-frame (check for no-frame mode and replay url is set)
- index: 'cdx+' fix for use as internal index: if cdx has a warc filename and offset, don't attempt default live web load
- improved self-redirect: avoid www2 -> www redirect altogether, not just for second redirect
- tests: update tests for improved self-redirect checking
- bump version to pywb-2.4.0-rc5
* banner: fix banner display for non-framed and proxy mode replay, ensure new 'View All Captures' ancillary section is also shown
* bump version to 2.4.0rc4
- if preflight OPTIONS request, respond directly (don't attempt OPTIONS capture lookup)
- if preflight CORS request, ensure response has appropriate CORS headers, even if not captured
- wombat: update to latest wombat with updated Date() fixed timezone in proxy mode
- bump version to 2.4.0rc3
* metadata/coll_config: don't confuse user metadata with collection config, don't display collection config settings as metadata (ukwa/ukwa-pywb#47)
- for collection template, add separate 'coll_config' dict, keep user metadata only in 'metadata' dict (default to empty)
- for static collections, assume metadata is in the 'metadata' dict of collection config
- for dynamic collections, load metadata.yaml into 'metadata' dict
- ensure 'metadata' key is passed to frame_insert
- ensure 'metadata' added consistently in framed and non-framed mode
- tests: update tests to ensure metadata is added consistently
- fuzzymatch: don't match 204 OPTIONS responses, update fuzzymatcher test
* documentation
- add documentation for metadata in ui-customization, rebuild docs,
- add link to ui customization from configuring
- work on access control docs
* fixed small typo's in ui-customization.rst
* frontendapp: fix doc string
- misc: remove warning on urllib3 Retry init
- set version to pywb 2.4.0rc0
Co-Authored-By: John Berlin <n0tan3rd@gmail.com>
general auto-fetch improvements:
- Fixed issue that caused HTTP 404 errors to happen when parsing <link> stylesheet hrefs as sheets (webrecorder/wombat#11)
- Ensured that auto-fetch requests made are cached by the browser (webrecorder/wombat#13 & webrecorder/wombat#15)
- Ensured that the request made by the backing web worker when in proxy mode are not blocked by CORS (webrecorder/wombat#13 & webrecorder/wombat#15)
updated changelist and bumped version to 2.3.5
* auto-fetch page fetch support:
- check for X-Wombat-History-Page header to indicate page url
- set title from X-Wombat-History-Title header, and attempt to parse <title> from response
- update auto-fetch workers in wombat
- update changelist, bump to 2.3.4
- insert head-insert before first tag that is not <html> or <head> insert before
- addresses issue with rewriting pages that have no <head> tag (already handled in full rewriter)
- tests: add tests for HTMLInsertOnlyRewriter
- bump version to 2.3.3, update changelist
* misc fixes:
- ensure SCRIPT_NAME is never empty, fixes#466
- static: if ending in '/' look for '/index.html'
- tests: use local httpbin instead of iana.org tests
- docker: switch to $VOLUME_DIR before initing collection
- ensure static_prefix is set correctly after host prefix
- bump version to 2.3.2.dev0
* rules update: fix fuzzy matching, rewriting rules for soundcloud
versioning: switch back to semver for 2.3.0, manual version updates
- rename update-version.sh -> update-tag.sh to push tag for existing versions
- bump version to 2.3.0 for release
- Fix: a few broken tests due to iana.org requiring a user agent in its requests
rewrite:
- introduced a new JSWorkerRewriter class in order to support rewriting via wombat workers in the context of all supported worker variants via
- ensured rewriter app correctly sets the static prefix
wombat:
- add wombat as submodule!
* proxy: add option to set default timestamp for proxy mode, fixes#452
- set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp'
- can be iso date or all-digit timestamp
- overridable via accept-datetime header
* docs: update docs for proxy timestamp
- add docs on memento support in proxy mode
* update-version: script can update version only, commit with 'update-version.sh commit'
* indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!
* versioning: new MAJ.MIN.DATE versioning
move version to version.py for easier updates
add update-version.sh for autoupdating version in version.py, pushing new tag with current version