* post append query: fix json parsing of lists to be identical to cdxj-indexer
if json parsing errors occur, log to stderr
fixes#709 in a better way
* update CHANGES.rst
* rewrite: add missing wordbreak to eval regex to avoid false positives, eg. '_eval' from being rewritten!
* dependencies: bump gevent to 21.12.0
* inputrequest: remove unnecessary print
* bump version to 2.6.7, update CHANGES for 2.6.7
* js rewriting: default to moden js-proxy based rewriting by default, use legacy rewriting only if browsers are older than minimum, as suggested in #707
* user-agent detection: use ua_parser for user-agent detection instead of obsolete werkzeug.useragent, which also did not support browsers >=100
* tests: additional tests for rewriting with various user-agents, defaulting to new-style rewriting for unknown browsers
* dockerfile: Update Dockerfile to use py3.8
* tests: skip s3 tests dependent on commoncrawl data (for now, need better s3 tests).
* bump to 2.6.6, update CHANGES
- when 'redirect_to_exact' is enabled, the top-frame expects a redirect for top-frame, however, live mode does not result in redirect to top-frame, so render live top-frame same as before
- tests: ensure top-frame loads correctly for live mode with redirect_to_exact enabled
- tests: fix webenact index tests
There are small typos in:
- pywb/utils/test/test_binsearch.py
- pywb/warcserver/resource/responseloader.py
- pywb/warcserver/resource/test/test_pathresolvers.py
Fixes:
- Should read `length` rather than `lenghth`.
- Should read `equals` rather than `eqauls`.
- Should read `assume` rather than `asume`.
The top two access types end in a ".". The final two do not. This simply adds periods to the third and fourth list items to make punctuation consistent among the bullets.
CHANGES: update changes for 2.6.3
location rewrite: pass 'arguments' to rewrite func to guard against rewriting local 'location' in some circumstances, partial fix for #684
ci: add automated docker push on new v-* tag
* template/custom env var fix:
- ensure pywb.host_prefix, pywb.app_prefix and pywb.static_prefix set for all requests via prepare_env()
- ensure X-Forwarded-Proto is accounted for in pywb.host_prefix
- call prepare_env() in handle_request(), and also in rewriterapp (in case using a different front-end app).
* update wombat to 3.3.6 (includes partial fix for #684)
* bump version to 2.6.3
* rules: add custom twitter video rewriting to capture non-chunked twitter video (max bitrate of 5000000)
* autoescaping regression fix: don't escape URL in frame_insert.html, use as is
* html rewriting:
- don't rewrite 'data-' attributes, no longer necessary for best fidelity
- do rewrite <link rel='alternate'> as main page (mp_)
- update html rewriting test
* feature: support customizing the static path used in pywb via 'static_prefix' config option (defaults to 'static')
* update to latest wombat (3.3.4)
* bump to 2.6.1, update CHANGES for 2.6.1
* eval fix: instead of rewriting to 'WB_wombat_eval', rewrite to 'self.eval' for non-top-level eval
the wombat object will handle rewriting the eval arg on 'self.eval'
tighten rewriting for top-level 'eval', add additional tests
part of fix for #663
* rewrite wrap: add extra {, } to avoid collisions, as suggested in webrecorder/wombat#72
eval rewrite: exclude ',eval' as more likely than not causing a false positive, as per #643
* update to latest wombat 3.3.0 with corresponding fixes
* add support for custom data being added via 'PUT /<coll>/record' when in recording mode and 'enable_put_custom_record: true' set in 'recorder' config
- url specified via 'url' query arg and content type via request Content-Type
- update docs for put custom record options
* bump version to 2.6.0b4
update CHANGES
comment out default locales in config.yaml
only show warning for installing i18n extra when locales actually specified in config
bump to 2.6.0b3
* more locale fixes:
- fix running wb-manager w/o i18n dependencies
- dependencies: move babel to extra_requires, show warning if locale used or 'wb-manager i18n' called and i18n are not installed
- not found page: don't language switch header banner on nested content frame
- replace erroneous/outdated `/coll-cdx` API endpoint
by default API endpoint `/<coll>/cdx`
- if clear from preceding context: reduce examples
to params only `?url=...¶m1=...`
* localization / doc fixes:
- add missing header.html
- docs: support 'i18n' extra, mention in docs
- use 'default_locale' for html lang tag
- access control docs: fix documentation for adding user with acl command
* localization: add compile_catalog after extract as well to simplify updates for identity (en) locale
* ui:
- include locale in home page collection listing
- keep locale on error page home link
* autoescape:
- ensure jinja2 templates are autoescaped to prevent xss issues (thanks @sebastian-nagel for suggested fix)
- ensure banner inserts are not double-escaped
- update tests for template autoescaping
* update CHANGES.rst
* bump version to 2.6.0b1
* add localization utilities:
- add locmanager to support extract, update, remove, list using pybabel
- add po2csv/csv2po conversion with translate-utils
- docs: add localization.rst to manual!
* add language switch header (via header.html) to all pages if more than one locale is present.
* localization: wrap more text strings in templates in existing templates
* docs:
- document `wb-manager i18n` commands
- mention `<html lang>` setting
- include csv example
- add info about adding localizable text in templates
* add localization to CHANGES
* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option
* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching
* docs: add docs for new embargo system!
* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring X-pywb-ACL-user header based on IP for nginx and apache sample deployments
* docs: fix access control page header, text tweaks
* bump version to 2.6.0b0
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion
* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx
* ci: fixes
- tox: run full test suite!
- disable appveyor
* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
The field is unfortunately misnamed compressedendoffset in XML but OWB
actually uses this for the compressed length 'S' CDX field.
Without this field when WARC files are accessed over HTTP pywb will make
open byte range requests which results in a lot more data being read
from disk than necessary.