mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

26 Commits

Author SHA1 Message Date
Ilya Kreymer
f07d35709a
Access Control Improvements: Embargo + ACL User Support (#642)
* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option

* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching

* docs: add docs for new embargo system!

* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring X-pywb-ACL-user header based on IP for nginx and apache sample deployments

* docs: fix access control page header, text tweaks

* bump version to 2.6.0b0
2021-05-18 20:09:18 -07:00
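As a rough illustration of the embargo options listed in the commit above, the sketch below checks whether a capture timestamp falls inside an embargo. It is not pywb's implementation: the function names, the one-key rule dictionary, and the reading of each option ('before'/'after' compare against a fixed timestamp, 'newer'/'older' against an interval counted back from the current time) are assumptions made for this example.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Template used to pad short timestamps such as '2020' or '20201215'.
PAD = '00000101000000'


def parse_ts(ts):
    """Parse an all-digit timestamp of up to 14 digits (hypothetical helper)."""
    return datetime.strptime(ts + PAD[len(ts):], '%Y%m%d%H%M%S')


def subtract_interval(now, years=0, months=0, days=0):
    """Move `now` back by a calendar interval (hypothetical helper)."""
    total_months = now.year * 12 + (now.month - 1) - (years * 12 + months)
    year, month0 = divmod(total_months, 12)
    month = month0 + 1
    # Clamp the day so that e.g. March 31 minus one month is still valid.
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day) - timedelta(days=days)


def is_embargoed(capture_ts, embargo, now=None):
    """Return True if the capture is blocked under the assumed semantics."""
    now = now or datetime.now(timezone.utc).replace(tzinfo=None)
    capture = parse_ts(capture_ts)
    if 'before' in embargo:
        return capture < parse_ts(embargo['before'])
    if 'after' in embargo:
        return capture > parse_ts(embargo['after'])
    if 'newer' in embargo:
        return capture > subtract_interval(now, **embargo['newer'])
    if 'older' in embargo:
        return capture < subtract_interval(now, **embargo['older'])
    return False
```

Passing `now` explicitly, for example `is_embargoed('20210401000000', {'newer': {'months': 6}}, now=datetime(2021, 5, 18))`, keeps the check deterministic in tests.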
Ilya Kreymer
084be82550 bump version to 2.6.0.dev0 2021-04-26 20:04:26 -07:00
Ilya Kreymer
9e09bcd2a7
Docs Update: OpenWayback -> pywb Transition Guide (#588)
* docs work on OpenWayback -> pywb transition, part 1

* docs: add config change examples, exclusions and deploy recommendations

* update with path index example

* update terms with collection info

* docs update:
- add zipnum examples to owb-to-pywb config transition
- add working docker compose examples for nginx subdirectory, apache subdirectory and outback cdx deployment in ./sample-deploy
- update usage and owb-to-pywb deployment docs with updated subdirectory deployment info + sample-deploy links

* tweak exclusion info, deploy title

* add missing file uwsgi_subdir.ini

* Docs: fix typos and clarifications from review (thanks @ldko!)

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>

* docs: explain that existing cdx can be added to outbackcdx, explain reindexing is optional

* docs: elaborate on docker-compose examples

* minor tweaks

* update to latest wombat 3.0.2
* update CHANGES.rst

* bump version to 2.5.0 for release

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
2020-12-04 18:40:58 -08:00
Ilya Kreymer
9b8c187b3a
2.4.2 Develop->Master (#572)
* ensure that the RemoteCDXIndexSource also adds a 'matchType=' param, fix for ukwa-pywb/ukwa#57

* 2.4.2 fixes:
- cdxindexer: don't treat first param as output, require '-o <output>' instead, update tests
- cleanup: move url-polyfill.min.js to correct static dir, addresses #571
- update to latest wombat
- move logo to ./pywb/static, fix README path
- tests: update indexing tests for cdx-indexer fix
- bump version to 2.4.2
- Fix link in access-control docs to use RST instead of MD syntax (#568) (by @machawk1)
2020-07-10 20:22:58 -07:00
Ilya Kreymer
94b7fdcf97 minor fix: timegate check: allow timegate content check from #564 to be ignored if 'no_timegate_check' option is set (for use with derived classes)
bump version to 2.4.1
2020-06-08 17:12:18 -07:00
Ilya Kreymer
47e87ef387 CHANGES: bump version and update changelist for 2.4.0 2020-06-08 15:03:55 -07:00
Ilya Kreymer
7e56ca8ca2
RC7 Fixes (#561)
* misc fixes for 2.4.0rc7:
- warcserver: when parsing headers to check for redirect, reserialized headers
may be of a different length than the original, causing the warcserver->app response to hang;
now adjusting the content-length on the warc record and also not including a fixed
length when serving warcserver->app, possible fix for ukwa/ukwa-pywb#53
- undo change in path resolvers to use os.path.join, just concatenate full_path + filename
- rewrite 'date' -> 'x-orig-archive-date' header to avoid confusion (eg. #548)
- bump version to rc7

* ci: attempt to fix travis build for 27, 35
2020-04-30 22:39:47 -07:00
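The path resolver change in the commit above (plain concatenation instead of os.path.join) can be illustrated with a small comparison. The commit does not state the motivation, so the two cases shown here, a prefix that does not end in a separator and an absolute filename, are only plausible assumptions. The function names and paths are hypothetical, and posixpath is used in place of os.path so the result is the same on every platform.

```python
import posixpath


def resolve_by_join(full_path, filename):
    """Join with path semantics: may add a separator or drop the prefix."""
    return posixpath.join(full_path, filename)


def resolve_by_concat(full_path, filename):
    """Concatenate exactly what was configured, nothing added or dropped."""
    return full_path + filename


# A prefix that is not a directory: join inserts a '/' that was not asked for.
prefix = '/data/warcs/coll-'
name = 'rec-0001.warc.gz'
joined = resolve_by_join(prefix, name)
concatenated = resolve_by_concat(prefix, name)

# An absolute filename: join silently discards the configured prefix.
dropped = resolve_by_join('/data/warcs/', '/abs/rec-0001.warc.gz')
```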
Ilya Kreymer
92e459bda5
R6 - Various Fixes (#540)
* fixes for RC6:
- blockrecordloader: ensure record stream is closed after parsing one record 
- wrap HttpLoader streams in StreamClosingReader() which should close the connection even if stream not fully consumed
- simplify no_except_close
may help with ukwa/ukwa-pywb#53
- iframe: add allow fullscreen, autoplay
- wombat: update to latest, filter out custom wombat props from getOwnPropertyNames
- rules: add rule for vimeo

* cdx formatting: fix output=text to return plain text / non-cdxj output

* auto fetch fix:
- update to latest wombat to fix auto-fetch in rewriting mode
- fix /proxy-fetch/ endpoint for proxy mode recording, switch proxy-fetch to run in recording mode
- don't use global to allow repeated checks

* rewriter html check: peek 1024 bytes to determine if page is html instead of 128

* fix jinja2 dependency for py2
2020-02-20 21:53:00 -08:00
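The HTML check mentioned in the commit above (peeking 1024 bytes instead of 128) can be sketched as follows. This is an assumption-laden illustration, not pywb's rewriter code: the function name, the regular expression, and the use of io.BufferedReader.peek are choices made for this example only.

```python
import io
import re

# Hypothetical heuristic: look for a few tags that typically open an HTML page.
HTML_HINT_RX = re.compile(rb'<\s*(!doctype\s+html|html|head|body|title)\b', re.I)


def looks_like_html(stream, peek_size=1024):
    """Guess whether a buffered stream holds HTML without consuming it."""
    # peek() may return more or fewer bytes than requested, so slice it.
    head = stream.peek(peek_size)[:peek_size]
    return HTML_HINT_RX.search(head) is not None


# A page whose first tag appears only after a long leading comment.
page = b'<!-- ' + b'x' * 300 + b' -->\n<html><body>hi</body></html>'
stream = io.BufferedReader(io.BytesIO(page))
```

With this input a 128-byte window sees only the comment, while a 1024-byte window reaches the `<html>` tag, which is the kind of case a larger peek size helps with.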
Ilya Kreymer
fa021eebab
Misc Fixes for RC5 (#534)
* misc fixes (rc 5):
- banner: only auto init banner if not in top-frame (check for no-frame mode and replay url is set)
- index: 'cdx+' fix for use as internal index: if cdx has a warc filename and offset, don't attempt default live web load
- improved self-redirect: avoid www2 -> www redirect altogether, not just for second redirect
- tests: update tests for improved self-redirect checking
- bump version to pywb-2.4.0-rc5
2020-01-17 17:38:08 -08:00
Ilya Kreymer
93ce4f6f7a
Banner fix (#531)
* banner: fix banner display for non-framed and proxy mode replay, ensure new 'View All Captures' ancillary section is also shown

* bump version to 2.4.0rc4
2020-01-11 13:05:28 -08:00
Ilya Kreymer
30680803e8
proxy mode: replay improvements for content not captured via proxy mode (#520)
- if preflight OPTIONS request, respond directly (don't attempt OPTIONS capture lookup)
- if preflight CORS request, ensure response has appropriate CORS headers, even if not captured
- wombat: update to latest wombat with updated Date() fixed timezone in proxy mode
- bump version to 2.4.0rc3
2019-11-12 12:41:04 -08:00
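The preflight handling described in the commit above might look roughly like the sketch below: detect a CORS preflight request and build the response headers directly, without any capture lookup. The function names and the lowercase header dictionary are assumptions made for this example and are not taken from pywb.

```python
def is_cors_preflight(method, headers):
    """A CORS preflight is an OPTIONS request carrying these two headers."""
    return (method.upper() == 'OPTIONS'
            and 'origin' in headers
            and 'access-control-request-method' in headers)


def preflight_response_headers(headers):
    """Build CORS response headers that mirror what the browser asked for."""
    resp = {
        'Access-Control-Allow-Origin': headers['origin'],
        'Access-Control-Allow-Methods': headers['access-control-request-method'],
    }
    requested = headers.get('access-control-request-headers')
    if requested:
        resp['Access-Control-Allow-Headers'] = requested
    return resp
```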
Ilya Kreymer
6f79840b79
Docs, custom metadata improvements (#509)
* metadata/coll_config: don't confuse user metadata with collection config, don't display collection config settings as metadata (ukwa/ukwa-pywb#47)
- for collection template, add separate 'coll_config' dict, keep user metadata only in 'metadata' dict (default to empty)
- for static collections, assume metadata is in the 'metadata' dict of collection config
- for dynamic collections, load metadata.yaml into 'metadata' dict
- ensure 'metadata' key is passed to frame_insert
- ensure 'metadata' added consistently in framed and non-framed mode
- tests: update tests to ensure metadata is added consistently

- fuzzymatch: don't match 204 OPTIONS responses, update fuzzymatcher test

* documentation
- add documentation for metadata in ui-customization, rebuild docs
- add link to ui customization from configuring
- work on access control docs
* fixed small typos in ui-customization.rst
* frontendapp: fix doc string

- misc: remove warning on urllib3 Retry init

- set version to pywb 2.4.0rc0

Co-Authored-By: John Berlin <n0tan3rd@gmail.com>
2019-10-27 01:39:52 +01:00
Ilya Kreymer
2f6fb74ea1 bump version to 2.4.0 2019-09-11 09:17:41 -07:00
John Berlin
295f67e675 auto-fetch/wombat: updated wombat submodule to current master for 2.3.5 release (#503)
general auto-fetch improvements: 
- Fixed issue that caused HTTP 404 errors to happen when parsing <link> stylesheet hrefs as sheets (webrecorder/wombat#11)
- Ensured that auto-fetch requests made are cached by the browser (webrecorder/wombat#13 & webrecorder/wombat#15)
- Ensured that the requests made by the backing web worker when in proxy mode are not blocked by CORS (webrecorder/wombat#13 & webrecorder/wombat#15)

updated changelist and bumped version to 2.3.5
2019-08-28 11:35:18 -07:00
Ilya Kreymer
e79c657255
New Feature: support for autoFetch of urls deemed as pages by history api (pywb part) (#497)
* auto-fetch page fetch support:
- check for X-Wombat-History-Page header to indicate page url
- set title from X-Wombat-History-Title header, and attempt to parse <title> from response
- update auto-fetch workers in wombat
- update changelist, bump to 2.3.4
2019-08-12 13:34:33 -07:00
Ilya Kreymer
bf9284fec5
proxy mode HTMLInsertOnlyRewriter: (#496)
- insert head-insert before the first tag that is not <html> or <head>
- addresses issue with rewriting pages that have no <head> tag (already handled in full rewriter)
- tests: add tests for HTMLInsertOnlyRewriter
- bump version to 2.3.3, update changelist
2019-08-03 11:24:50 -07:00
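The insertion rule from the commit above (place the head-insert before the first tag that is not <html> or <head>) can be approximated with a regular expression. This is a simplified sketch rather than the HTMLInsertOnlyRewriter itself: the function name is hypothetical, tags inside comments are not handled, and appending the insert when no tag is found is an assumption.

```python
import re

# Matches opening tags only; '</...>' and '<!...>' are skipped because the
# character after '<' must be a letter.
OPEN_TAG_RX = re.compile(r'<\s*([a-zA-Z][\w:-]*)[^>]*>')


def insert_before_first_content_tag(html, insert):
    """Insert `insert` before the first opening tag other than html/head."""
    for match in OPEN_TAG_RX.finditer(html):
        if match.group(1).lower() not in ('html', 'head'):
            return html[:match.start()] + insert + html[match.start():]
    # Assumed fallback: no suitable tag found, so append at the end.
    return html + insert
```

For a page with no <head> tag, such as `<html><body>x</body></html>`, the insert lands directly before `<body>`.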
Ilya Kreymer
42089e237b update CHANGELIST and version for 2.3.2 release 2019-08-01 16:23:31 -07:00
Ilya Kreymer
837894a07f
Misc fixes for 2.3.2 release (#490)
* misc fixes:
- ensure SCRIPT_NAME is never empty, fixes #466
- static: if ending in '/' look for '/index.html'
- tests: use local httpbin instead of iana.org tests
- docker: switch to $VOLUME_DIR before initing collection
- ensure static_prefix is set correctly after host prefix
- bump version to 2.3.2.dev0

* rules update: fix fuzzy matching, rewriting rules for soundcloud
2019-07-24 10:47:17 -07:00
Ilya Kreymer
d4518ae557 update to latest wombat 3.0.0, fix issue with parent override (webrecorder/wombat#3)
bump version to 2.3.1
2019-07-10 18:09:22 -07:00
Ilya Kreymer
a4027c7904
Switch back to Semver for 2.3.0 (#488)
versioning: switch back to semver for 2.3.0, manual version updates
- rename update-version.sh -> update-tag.sh to push tag for existing versions
- bump version to 2.3.0 for release
2019-07-09 19:29:52 -07:00
John Berlin
22b4297fc5 pywb:
- Fix: a few broken tests due to iana.org requiring a user agent in its requests
rewrite:
  - introduced a new JSWorkerRewriter class in order to support rewriting via wombat workers in the context of all supported worker variants via
  - ensured rewriter app correctly sets the static prefix
wombat:
 - add wombat as submodule!
2019-07-02 19:24:11 -07:00
Ilya Kreymer
77f8bb6476 CHANGES: update changelist
bump version to 2.2.20190410
2019-04-10 11:17:33 -07:00
Ilya Kreymer
9448f4fe45 release: update changelist for 2.2.20190311
docs: fix typos
2019-03-11 16:40:53 -07:00
Ilya Kreymer
455efb17ad
Support for default timestamp/date for proxy mode (#454)
* proxy: add option to set default timestamp for proxy mode, fixes #452
- set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp'
- can be iso date or all-digit timestamp
- overridable via accept-datetime header

* docs: update docs for proxy timestamp
- add docs on memento support in proxy mode

* update-version: script can update version only, commit with 'update-version.sh commit'

* indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!
2019-03-11 16:28:09 -07:00
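The commit above notes that the proxy default timestamp can be an ISO date or an all-digit timestamp. A minimal sketch of normalizing both forms to a 14-digit timestamp follows. The function name and the padding rule for short timestamps are assumptions for illustration, not pywb's own parsing code.

```python
from datetime import datetime

# Template used to pad short all-digit timestamps, e.g. '2019' or '201903'.
PAD = '00000101000000'


def normalize_default_timestamp(value):
    """Return a 14-digit timestamp from an ISO date or all-digit timestamp."""
    value = value.strip()
    if value.isdigit():
        return (value + PAD[len(value):])[:14]
    # fromisoformat() accepts forms like '2019-03-11' or '2019-03-11T16:28:09';
    # a trailing 'Z' is only accepted from Python 3.11 onward.
    return datetime.fromisoformat(value).strftime('%Y%m%d%H%M%S')
```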
Ilya Kreymer
21b5cf36b1 version: update to 2.2.20190227 2019-02-27 15:51:31 -08:00
Ilya Kreymer
0fb1fa68a8
Versioning: Add script to set up MAJ.MIN.DATE version (#445)
* versioning: new MAJ.MIN.DATE versioning
move version to version.py for easier updates
add update-version.sh for autoupdating version in version.py, pushing new tag with current version
2019-02-25 11:46:37 -08:00
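The MAJ.MIN.DATE scheme introduced in the commit above produces versions such as 2.2.20190227. The sketch below shows one way such a string could be built. The function name is hypothetical and this is not the contents of update-version.sh or version.py.

```python
from datetime import date


def make_version(major, minor, day=None):
    """Build a MAJ.MIN.DATE version string, defaulting to today's date."""
    day = day or date.today()
    return '{0}.{1}.{2}'.format(major, minor, day.strftime('%Y%m%d'))
```

For example, `make_version(2, 2, date(2019, 2, 27))` gives the same form as the 2.2.20190227 version set a few commits later in this history.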