mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

2286 Commits

Author SHA1 Message Date
Ilya Kreymer
6e7a8b1e59 CHANGES: Update changelist for 2.6.8
README: Remove unused appveyor badge (fixes #757)
v-2.6.8
2022-08-31 19:01:35 -07:00
Ilya Kreymer
1fddec216d
Add ir_ modifier (#759)
* rewrite: add 'ir_' modifier to support header-only URL rewriting with no content rewriting
* tests: add tests for ir_ to test that content is identical to id_, but Location headers are rewritten with ir_ modifier.
2022-08-31 18:49:45 -07:00
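For context, pywb replay URLs take the form `/<coll>/<timestamp><modifier>/<url>`, so the new modifier is selected by placing `ir_` right after the timestamp, just as `id_` is. The sketch below builds such URLs. The helper name `replay_url`, the host and the collection name `my-coll` are hypothetical and chosen only for illustration; this is not part of pywb's API.

```python
def replay_url(host: str, coll: str, timestamp: str, url: str, modifier: str = "") -> str:
    """Build a pywb-style replay URL: <host>/<coll>/<timestamp><modifier>/<url>.

    Hypothetical helper for illustration only.
    """
    return f"{host.rstrip('/')}/{coll}/{timestamp}{modifier}/{url}"


# 'id_': original content, no URL rewriting.
raw = replay_url("http://localhost:8080", "my-coll", "20220831", "https://example.com/", "id_")
# 'ir_': same content as 'id_', but Location headers are rewritten.
headers_only = replay_url("http://localhost:8080", "my-coll", "20220831", "https://example.com/", "ir_")
print(raw)
print(headers_only)
```

Requesting both URLs for the same capture should give identical bodies and differing Location headers, which is what the tests in this commit check.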
Ilya Kreymer
8ef4ff102d
rewrite: tw: improve twitter rewrite to force mp4 for videos in embedded tweets (#761) 2022-08-31 18:48:11 -07:00
Ilya Kreymer
16135d956a
tests fix: add PYWB_NO_VERIFY_SSL env var for tests to avoid failing tests when connecting to external services (#760)
- if variable is set, RemoteIndexSource loading does not verify certs
2022-08-31 18:30:45 -07:00
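A minimal sketch of the switch described above, assuming that only the presence of `PYWB_NO_VERIFY_SSL` matters and not its value. The function name `should_verify_ssl` is hypothetical; pywb's actual check inside `RemoteIndexSource` may differ.

```python
import os
from typing import Mapping, Optional


def should_verify_ssl(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when PYWB_NO_VERIFY_SSL is set (any value), True otherwise.

    Assumption: presence of the variable disables verification, whatever its value.
    """
    env = os.environ if environ is None else environ
    return "PYWB_NO_VERIFY_SSL" not in env
```

The resulting boolean could then be passed to an HTTP client, for example as the `verify` argument of `requests.get`.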
Ilya Kreymer
1249b41dba
rewrite: detect the edge case where HTML starts with BOM characters followed by <!DOCTYPE html> as HTML (#758)
tests: add test that now results in correct html rewriting
fixes #756
2022-08-31 16:51:41 -07:00
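The edge case can be illustrated with a small content sniffer that skips leading UTF-8 byte-order marks before looking for a doctype. This is a sketch under the assumption that the BOMs are UTF-8 encoded; the name `looks_like_html` is hypothetical and not pywb's detection code.

```python
import codecs


def looks_like_html(prefix: bytes) -> bool:
    """Return True if the bytes start with '<!DOCTYPE html', ignoring leading BOMs."""
    # Strip any leading UTF-8 byte-order marks (there may be more than one).
    while prefix.startswith(codecs.BOM_UTF8):
        prefix = prefix[len(codecs.BOM_UTF8):]
    # Ignore leading ASCII whitespace and compare case-insensitively.
    return prefix.lstrip().lower().startswith(b"<!doctype html")
```

Without the BOM-stripping loop, the first check below would fail because the content would begin with the bytes `EF BB BF` rather than `<`.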
Ilya Kreymer
2ccd8eb2c3
tests run improvements: update from python setup.py test -> tox (#754)
* tests cleanup:
- move test requirements to test_requirements.txt to share between setup.py and tox.ini
- README: update to recommend using 'tox --current-env' for running tests locally
- replaces #741

* test tweaks:
- don't require i18n to import locmanager, instead set flag on load (to avoid breaking tox / pytest)
- don't add werkzeug to test requirements
2022-08-31 16:04:55 -07:00
Ilya Kreymer
f0340c6898 proxy: add COEP header for proxy mode to avoid errors 2022-08-20 22:59:08 -07:00
Ilya Kreymer
c121198183
revisit of redirect optimization: (#753)
- if a revisit is of a redirect (3xx response) and the revisit has HTTP headers, return the HTTP headers with an empty payload; don't bother loading the original record
builds on changes in #751
- cleanup redirect revisit tests from #751
2022-08-20 13:53:16 -07:00
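The optimization amounts to a short-circuit check before the original record is loaded. The sketch below assumes the status is a string such as `"302"` and headers are a list of name/value pairs; the function name `redirect_revisit_response` is hypothetical and not pywb's loader code.

```python
from typing import List, Optional, Tuple

Headers = List[Tuple[str, str]]


def redirect_revisit_response(status: Optional[str],
                              revisit_headers: Optional[Headers]) -> Optional[Tuple[Headers, bytes]]:
    """Return (headers, empty payload) for a revisit of a 3xx response that
    carries its own HTTP headers; return None when the original record must
    still be loaded."""
    if revisit_headers and status is not None and status.startswith("3"):
        return revisit_headers, b""
    return None
```

A redirect's body is rarely meaningful for replay, so returning the revisit's own headers (including `Location`) with an empty payload avoids a second record lookup.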
Jonas Linde
0cc912da95
Enable translation for the remaining strings on the search results page (#752)
* Enable translation for the remaining strings on the search results page

* Use toLocaleString() to format timestamps also for search results without matchType
2022-08-18 23:27:22 -07:00
Ilya Kreymer
f190190128
Revisit headers load fix (#751)
* revisit loading fix for revisit records with http headers:
- if revisit record has http headers, always use those headers
- otherwise, continue to use http headers from payload record
- parse headers of http and payload records on initial lookup, to simplify loading
- tests: add test for loading revisit records with different urls, different headers but same payload
- fix for sul-dlss/was-pywb#64
* also bump version to 2.6.8
2022-08-18 23:25:38 -07:00
Laura Wrubel
49393ce16a
Improve replay banner's accessibility (#742)
* Puts banner in header and nav landmark regions
* Adds landmark role of banner to header
2022-08-09 15:25:38 -07:00
Ed Summers
a97ad7ebbe
Ensure CDX status is a string (#739)
If a CDXJ entry has a status that is an int that can cause problems in
multiple places in pywb. This change ensures that int status lines are
converted to str.
2022-08-09 15:04:42 -07:00
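A CDXJ line consists of a SURT url key, a 14-digit timestamp and a JSON block, so an integer status can slip in when the JSON was written by another tool. A minimal sketch of the normalization, with the hypothetical name `normalize_cdxj_status` (pywb applies its fix inside its own CDX object handling, not through this function):

```python
import json


def normalize_cdxj_status(line: str) -> dict:
    """Parse '<urlkey> <timestamp> <json>' and coerce an int 'status' to str."""
    urlkey, timestamp, block = line.split(" ", 2)
    fields = json.loads(block)
    if isinstance(fields.get("status"), int):
        fields["status"] = str(fields["status"])
    return {"urlkey": urlkey, "timestamp": timestamp, **fields}
```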
Ed Summers
4f1a6303fa
Format error messages (#737)
Currently error messages display on a single line that can be difficult
to scroll. This updates the CSS slightly to allow the message to spread
over multiple lines if needed.
2022-08-09 15:03:00 -07:00
Victor "Vito" Gama
7432299079
Add missing org/image to docker run commands (#733) 2022-08-09 13:53:02 -07:00
Sebastian Gassner
7b00d0627e
describing installation using pip (#726) 2022-08-09 13:51:49 -07:00
Sebastian Nagel
510c9dc9f1
S3 loader to use boto3 built-in credential configuration (#723)
* S3Loader: allow authenticated S3 access using boto3 built-in
configuration methods without explicitly passing credentials, cf.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials

* S3Loader tests: re-enable tests reading from s3://commoncrawl/
in order to test authenticated reads. Tests are skipped
if no AWS credentials are configured.
2022-08-08 17:25:16 -07:00
Jonas Linde
fbed87aa46
Activate field validation when expanding the advanced options (#722) 2022-08-08 15:45:04 -07:00
Jonas Linde
4ac580e401
Add missing translation for the filter-expression field placeholder (#721) 2022-08-08 13:18:44 -07:00
Ed Summers
8e06c2f351
Increase uwsgi_buffer_size for nginx config (#716)
I was playing back a YouTube video and noticed that the playback worked
fine with using uwsgi/pywb directly but failed when using nginx. I think
a very long HTTP Link header was causing nginx to hang up. I increased
the uwsgi_buffer_size to 8k and the problem went away. Maybe this will
save someone else some time if it is increased?

https://nginx.org/en/docs/http/ngx_http_uwsgi_module.html#uwsgi_buffer_size
2022-08-08 13:18:01 -07:00
Tessa Walsh
12a9e32129
Prevent jinja2 from autoescaping markup in metadata (#747)
Connected to https://github.com/webrecorder/pywb/issues/727
2022-08-02 18:41:08 -07:00
Yasar
32e9020fd2
html_rewriter: fixed attribute 'srcset' rewriting (#712)
Co-authored-by: Yasar Kunduz <yasar.kunduz@nationaalarchief.nl>
2022-07-31 17:31:04 -07:00
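A `srcset` value is a comma-separated list of candidates, each a URL optionally followed by a descriptor such as `2x` or `480w`, so every URL must be rewritten individually while the descriptors are kept. The sketch below is not pywb's `html_rewriter`. It assumes URLs contain no commas (data URIs would need the full HTML-standard parsing algorithm), and `rewrite_srcset` plus the `rewrite` callable are hypothetical.

```python
from typing import Callable


def rewrite_srcset(value: str, rewrite: Callable[[str], str]) -> str:
    """Apply `rewrite` to each URL in a srcset value, keeping descriptors."""
    candidates = []
    for candidate in value.split(","):
        parts = candidate.strip().split(None, 1)
        if not parts:
            continue  # skip empty candidates, e.g. from a trailing comma
        url = rewrite(parts[0])
        candidates.append(url if len(parts) == 1 else f"{url} {parts[1]}")
    return ", ".join(candidates)


# Hypothetical replay prefix, for illustration only.
prefix = "http://localhost:8080/my-coll/20220731/"
print(rewrite_srcset("https://example.com/a.jpg 1x, https://example.com/b.jpg 2x",
                     lambda u: prefix + u))
```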
mark beasley
62633a48c4
Upgrade webassets to v2.0 (#730) 2022-06-29 18:02:59 -07:00
Ilya Kreymer
4f44c2ec98
Post query json parse fix (#711)
* post append query: fix JSON parsing of lists to be identical to cdxj-indexer;
if JSON parsing errors occur, log them to stderr
fixes #709 in a better way

* update CHANGES.rst
v-2.6.7
2022-04-14 21:30:52 -07:00
Ilya Kreymer
09f7084aa1
pywb 2.6.7 (#710)
* rewrite: add missing word boundary to eval regex to avoid false positives, e.g. to prevent '_eval' from being rewritten!

* dependencies: bump gevent to 21.12.0

* inputrequest: remove unnecessary print

* bump version to 2.6.7, update CHANGES for 2.6.7
2022-04-14 20:21:24 -07:00
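The effect of a missing word boundary is easy to reproduce with Python's `re` module. The patterns below are assumptions for illustration, not pywb's actual eval-rewriting regex. Because `_` is a word character, `\b` cannot match between `_` and `eval`, so the bounded pattern skips `_eval(`.

```python
import re

# Assumed patterns, for illustration only.
EVAL_NO_BOUNDARY = re.compile(r"eval\s*\(")
EVAL_WITH_BOUNDARY = re.compile(r"\beval\s*\(")

for sample in ("eval(code)", "_eval(code)", "myeval(code)"):
    print(sample,
          bool(EVAL_NO_BOUNDARY.search(sample)),
          bool(EVAL_WITH_BOUNDARY.search(sample)))
```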
Ilya Kreymer
403167fbe0
User-Agent Detection Fix + New-Style rewriting on by default + Dependency Update (2.6.6) (#708)
* js rewriting: use modern js-proxy-based rewriting by default, and use legacy rewriting only for browsers older than the minimum supported versions, as suggested in #707
* user-agent detection: use ua_parser for user-agent detection instead of the obsolete werkzeug.useragent, which also did not support browser versions >= 100
* tests: additional tests for rewriting with various user-agents, defaulting to new-style rewriting for unknown browsers
* dockerfile: Update Dockerfile to use py3.8
* tests: skip s3 tests dependent on commoncrawl data (for now, need better s3 tests).
* bump to 2.6.6, update CHANGES
v-2.6.6
2022-04-11 14:51:11 -07:00
Rhenan
63ac82ee6f
Pin werkzeug version to 1.0.1. Fixes #704 (#705) 2022-04-10 11:48:50 -07:00
Andy Jackson
0c3eb4ce94
Cope when SCRIPT_NAME is not defined (#701)
Making this one line consistent with the rest of the code.
2022-04-04 16:59:51 -07:00
Ilya Kreymer
42445562da
dependency fix (#697)
* add dependency bound (markupsafe<2.1.0)
* bump to 2.6.5
2022-02-20 16:36:28 -08:00
Ilya Kreymer
0f05dbde55 CHANGES: update changelist for 2.6.4 release v-2.6.4 2022-01-25 23:19:23 -08:00
Philip Clegg
825e4e54ab
rules: feat: remove fbclids (#691)
- fuzzy match 'fbclid=' query arguments (from facebook redirects)
2022-01-25 21:40:53 -08:00
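In pywb this is expressed as a fuzzy-match rule in its rules configuration, not as a Python function. The sketch below only shows the underlying idea: once `fbclid` arguments are dropped, two URLs that differ only in their `fbclid` value map to the same lookup key. The name `strip_fbclid` is hypothetical. Note that `urlencode` may re-encode some characters (e.g. spaces as `+`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_fbclid(url: str) -> str:
    """Drop 'fbclid' query arguments, keeping all other arguments in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != "fbclid"]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```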
Ilya Kreymer
38b1952d34
live route fix: (#692)
- when 'redirect_to_exact' is enabled, the top frame expects a redirect; however, live mode does not redirect to the top frame, so render the live top frame the same as before
- tests: ensure top-frame loads correctly for live mode with redirect_to_exact enabled
- tests: fix webenact index tests
2022-01-25 19:10:28 -08:00
Tim Gates
c42833d4ad
docs: Fix a few typos (#669)
There are small typos in:
- pywb/utils/test/test_binsearch.py
- pywb/warcserver/resource/responseloader.py
- pywb/warcserver/resource/test/test_pathresolvers.py

Fixes:
- Should read `length` rather than `lenghth`.
- Should read `equals` rather than `eqauls`.
- Should read `assume` rather than `asume`.
2022-01-25 18:21:01 -08:00
Mat Kelly
ddcbde573c
Documentation: Add periods to end of list of access types in docs to make consistent (#670)
The top two access types end in a ".". The final two do not. This simply adds periods to the third and fourth list items to make punctuation consistent among the bullets.
2022-01-25 18:19:51 -08:00
Ilya Kreymer
6bde8fd8c4 wombat.js: rebuild wombat.js to 3.3.6 (was not properly rebuilt previously), alternative fix to #690
update CHANGES
bump to 2.6.4
2022-01-19 18:35:39 -08:00
Ilya Kreymer
7ff789f1a8 CHANGES: fix typos in changelist v-2.6.3 2021-12-22 17:42:19 -08:00
Ilya Kreymer
c0519a53c3 ci release: update description 2021-12-22 17:38:29 -08:00
Ilya Kreymer
de9b9310d4
Additional fixes for 2.6.3 (#689)
CHANGES: update changes for 2.6.3

location rewrite: pass 'arguments' to rewrite func to guard against rewriting local 'location' in some circumstances, partial fix for #684

ci: add automated docker push on new v-* tag
v-2.6.3b0
2021-12-22 17:26:45 -08:00
Ilya Kreymer
0c4e406876 quickfix: localization: ensure placeholder text is also marked as localized, fixes #685 2021-12-22 16:51:02 -08:00
Ilya Kreymer
c97a66703b
More consistent env var setting / static path fix (#688)
* template/custom env var fix:
- ensure pywb.host_prefix, pywb.app_prefix and pywb.static_prefix set for all requests via prepare_env()
- ensure X-Forwarded-Proto is accounted for in pywb.host_prefix
- call prepare_env() in handle_request(), and also in rewriterapp (in case using a different front-end app).

* update wombat to 3.3.6 (includes partial fix for #684)
* bump version to 2.6.3
2021-12-22 16:15:27 -08:00
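The `X-Forwarded-Proto` handling can be sketched with the standard WSGI environ keys (`wsgi.url_scheme`, `HTTP_HOST`, `SCRIPT_NAME`). The functions `host_prefix` and `app_prefix` echo the `pywb.host_prefix` and `pywb.app_prefix` values named above, but they are hypothetical and not pywb's `prepare_env()`. As a simplification, `SERVER_PORT` is not appended when falling back to `SERVER_NAME`, and the forwarded header should only be trusted behind a known reverse proxy.

```python
def host_prefix(environ: dict) -> str:
    """Return '<scheme>://<host>' for a WSGI environ, honoring X-Forwarded-Proto."""
    forwarded = environ.get("HTTP_X_FORWARDED_PROTO", "")
    # With several proxies the header may hold a comma-separated list; take the first.
    scheme = forwarded.split(",")[0].strip() or environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
    return f"{scheme}://{host}"


def app_prefix(environ: dict) -> str:
    """host_prefix plus SCRIPT_NAME, defaulting to '' when SCRIPT_NAME is not defined."""
    return host_prefix(environ) + environ.get("SCRIPT_NAME", "")
```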
Lauren Ko
5c35a43dac
Modify examples in cdx-indexer help text to do as stated (#683) 2021-12-07 16:09:44 -08:00
Ilya Kreymer
e64e58f040
2.6.2 fix (#682)
2.6.2 release:
* fix for regression caused by 2.6.1, invalid static path #681
* add missing base.css
v-2.6.2
2021-11-12 17:51:34 -08:00
Ilya Kreymer
a6be76642a
2.6.1 Release Work (#679)
* rules: add custom twitter video rewriting to capture non-chunked twitter video (max bitrate of 5000000)

* autoescaping regression fix: don't escape URL in frame_insert.html, use as is

* html rewriting:
- don't rewrite 'data-' attributes, no longer necessary for best fidelity
- do rewrite <link rel='alternate'> as main page (mp_)
- update html rewriting test

* feature: support customizing the static path used in pywb via 'static_prefix' config option (defaults to 'static')

* update to latest wombat (3.3.4)

* bump to 2.6.1, update CHANGES for 2.6.1
v-2.6.1
2021-11-11 22:30:54 -08:00
Ilya Kreymer
96de80f83e update CHANGES for 2.6.0 release!
README: update for 2.6, add links to guides!
bump version to 2.6.0
v-2.6.0
2021-08-11 19:00:54 -07:00
Ilya Kreymer
b28c8f1748
Eval Rewriting + Scope Fix (#668)
* eval fix: instead of rewriting to 'WB_wombat_eval', rewrite to 'self.eval' for non-top-level eval;
the wombat object will handle rewriting the eval arg on 'self.eval'
tighten rewriting for top-level 'eval', add additional tests
part of fix for #663

* rewrite wrap: add extra {, } to avoid collisions, as suggested in webrecorder/wombat#72
eval rewrite: exclude ',eval' as more likely than not causing a false positive, as per #643

* update to latest wombat 3.3.0 with corresponding fixes
2021-08-11 18:45:54 -07:00
Lauren Ko
b2a460c33c
docs: fix broken links (#666) 2021-08-10 08:17:40 -07:00
Ilya Kreymer
342007244b update CHANGES.rst for 2.6.0b4
update wombat to latest
2021-07-18 17:13:12 -07:00
Ilya Kreymer
98c6fba44d
Support for custom data being added via 'PUT /<coll>/record' when… (#661)
* add support for custom data being added via 'PUT /<coll>/record' when in recording mode and 'enable_put_custom_record: true' set in 'recorder' config
- url specified via 'url' query arg and content type via request Content-Type
- update docs for put custom record options

* bump version to 2.6.0b4
2021-07-18 17:04:34 -07:00
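A client-side sketch of the request described above, using only the standard library: the target URL goes in the `url` query argument and the content type in the request `Content-Type` header. The base URL, the collection name `my-coll` and the helper `build_put_record_request` are hypothetical. The request is only prepared here; sending it assumes a pywb instance in recording mode with `enable_put_custom_record: true` set in the `recorder` config.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_put_record_request(base: str, coll: str, url: str,
                             body: bytes, content_type: str) -> Request:
    """Prepare (but do not send) a PUT /<coll>/record request for a custom record."""
    endpoint = f"{base.rstrip('/')}/{coll}/record?{urlencode({'url': url})}"
    return Request(endpoint, data=body, method="PUT",
                   headers={"Content-Type": content_type})


req = build_put_record_request("http://localhost:8080", "my-coll",
                               "https://example.com/note.txt",
                               b"hello", "text/plain")
# To send it: urllib.request.urlopen(req)
```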
Ilya Kreymer
a0faf904ef
rules: add rules for disabling dash for instagram (#662) 2021-07-18 16:40:54 -07:00
Marius Elsfjordstrand Beck
3e5d97f70b
Properly encode load_url (#659) 2021-07-18 13:50:56 -07:00
Marius Elsfjordstrand Beck
843fe28ed8
Encode url search parameter when performing query (#657) 2021-07-06 21:07:07 -07:00
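The reason for encoding is that a target URL with its own query string (`?q=a&page=2`) would otherwise be split into several outer parameters. The sketch below uses `urllib.parse.quote` with `safe=""` so that `/`, `:`, `?`, `&` and `=` are all percent-encoded. The `/cdx` path and the helper `cdx_query_path` are used only for illustration; the fixes in this log were made in pywb's own query code, not through this function.

```python
from urllib.parse import quote


def cdx_query_path(coll: str, url: str) -> str:
    """Build a query path with the target URL percent-encoded as one parameter value."""
    # safe="" also encodes '/', ':', '?', '&' and '=', so the target URL's own
    # query string cannot leak into the outer query string.
    return f"/{coll}/cdx?url={quote(url, safe='')}"


print(cdx_query_path("my-coll", "https://example.com/search?q=a&page=2"))
```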