mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

2269 Commits

Author SHA1 Message Date
Jonas Linde
4ac580e401
Add missing translation for the filter-expression field placeholder (#721) 2022-08-08 13:18:44 -07:00
Ed Summers
8e06c2f351
Increase uwsgi_buffer_size for nginx config (#716)
I was playing back a YouTube video and noticed that the playback worked
fine with using uwsgi/pywb directly but failed when using nginx. I think
a very long HTTP Link header was causing nginx to hang up. I increased
the uwsgi_buffer_size to 8k and the problem went away. Maybe this will
save someone else some time if it is increased?

https://nginx.org/en/docs/http/ngx_http_uwsgi_module.html#uwsgi_buffer_size
2022-08-08 13:18:01 -07:00
Tessa Walsh
12a9e32129
Prevent jinja2 from autoescaping markup in metadata (#747)
Connected to https://github.com/webrecorder/pywb/issues/727
2022-08-02 18:41:08 -07:00
Yasar
32e9020fd2
html_rewriter: fixed attribute 'srcset' rewriting (#712)
Co-authored-by: Yasar Kunduz <yasar.kunduz@nationaalarchief.nl>
2022-07-31 17:31:04 -07:00
mark beasley
62633a48c4
Upgrade webassets to v2.0 (#730) 2022-06-29 18:02:59 -07:00
Ilya Kreymer
4f44c2ec98
Post query json parse fix (#711)
* post append query: fix json parsing of lists to be identical to cdxj-indexer
if json parsing errors occur, log to stderr
fixes #709 in a better way

* update CHANGES.rst
v-2.6.7
2022-04-14 21:30:52 -07:00
Ilya Kreymer
09f7084aa1
pywb 2.6.7 (#710)
* rewrite: add missing wordbreak to eval regex to avoid false positives, e.g. '_eval' being rewritten!

* dependencies: bump gevent to 21.12.0

* inputrequest: remove unnecessary print

* bump version to 2.6.7, update CHANGES for 2.6.7
2022-04-14 20:21:24 -07:00
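The word-break fix described in the 2.6.7 commit above can be illustrated with a minimal sketch. The two patterns and the `WRAPPED_eval` placeholder are hypothetical and are not pywb's actual rewrite rules; they only show why a leading `\b` keeps an identifier such as `my_eval` from being rewritten.

```python
import re

# Hypothetical patterns for illustration only; not pywb's real regexes.
WITHOUT_BOUNDARY = re.compile(r"eval\(")
WITH_BOUNDARY = re.compile(r"\beval\(")


def rewrite_eval(pattern, source):
    """Replace each match of `pattern` with a placeholder wrapper call."""
    return pattern.sub("WRAPPED_eval(", source)


snippet = "my_eval(x); eval(y);"

# Without the boundary, the tail of 'my_eval(' is also matched.
loose = rewrite_eval(WITHOUT_BOUNDARY, snippet)

# '_' counts as a word character, so '\b' does not match between '_' and 'e'.
strict = rewrite_eval(WITH_BOUNDARY, snippet)

print(loose)
print(strict)
```

Only the standalone `eval(` call is rewritten in the `strict` result, while `loose` also alters `my_eval(`.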
Ilya Kreymer
403167fbe0
User-Agent Detection Fix + New-Style rewriting on by default + Dependency Update (2.6.6) (#708)
* js rewriting: default to modern js-proxy based rewriting, use legacy rewriting only if browsers are older than minimum, as suggested in #707
* user-agent detection: use ua_parser for user-agent detection instead of obsolete werkzeug.useragent, which also did not support browsers >=100
* tests: additional tests for rewriting with various user-agents, defaulting to new-style rewriting for unknown browsers
* dockerfile: Update Dockerfile to use py3.8
* tests: skip s3 tests dependent on commoncrawl data (for now, need better s3 tests).
* bump to 2.6.6, update CHANGES
v-2.6.6
2022-04-11 14:51:11 -07:00
Rhenan
63ac82ee6f
Pin werkzeug version to 1.0.1. Fix #704 (#705) 2022-04-10 11:48:50 -07:00
Andy Jackson
0c3eb4ce94
Cope when SCRIPT_NAME is not defined (#701)
Making this one line consistent with the rest of the code.
2022-04-04 16:59:51 -07:00
Ilya Kreymer
42445562da
dependency fix (#697)
* add dependency bound (markupsafe<2.1.0)
* bump to 2.6.5
2022-02-20 16:36:28 -08:00
Ilya Kreymer
0f05dbde55 CHANGES: update changelist for 2.6.4 release v-2.6.4 2022-01-25 23:19:23 -08:00
Philip Clegg
825e4e54ab
rules: feat: remove fbclids (#691)
- fuzzy match 'fbclid=' query arguments (from facebook redirects)
2022-01-25 21:40:53 -08:00
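The idea behind ignoring `fbclid=` arguments when matching URLs can be sketched as follows. This is an assumption-laden illustration using only the standard library: `strip_fbclid` is a hypothetical helper, and pywb's own fuzzy-match rule format is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_fbclid(url):
    """Return `url` with any 'fbclid' query argument removed.

    Hypothetical helper: two URLs that differ only in 'fbclid'
    normalize to the same string and can be matched against each other.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "fbclid"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


with_other_args = strip_fbclid("https://example.com/page?id=7&fbclid=AbC123")
only_fbclid = strip_fbclid("https://example.com/page?fbclid=AbC123")
print(with_other_args)
print(only_fbclid)
```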
Ilya Kreymer
38b1952d34
live route fix: (#692)
- when 'redirect_to_exact' is enabled, the top-frame expects a redirect for top-frame, however, live mode does not result in redirect to top-frame, so render live top-frame same as before
- tests: ensure top-frame loads correctly for live mode with redirect_to_exact enabled
- tests: fix webenact index tests
2022-01-25 19:10:28 -08:00
Tim Gates
c42833d4ad
docs: Fix a few typos (#669)
There are small typos in:
- pywb/utils/test/test_binsearch.py
- pywb/warcserver/resource/responseloader.py
- pywb/warcserver/resource/test/test_pathresolvers.py

Fixes:
- Should read `length` rather than `lenghth`.
- Should read `equals` rather than `eqauls`.
- Should read `assume` rather than `asume`.
2022-01-25 18:21:01 -08:00
Mat Kelly
ddcbde573c
Documentation: Add periods to end of list of access types in docs to make consistent (#670)
The top two access types end in a ".". The final two do not. This simply adds periods to the third and fourth list items to make punctuation consistent among the bullets.
2022-01-25 18:19:51 -08:00
Ilya Kreymer
6bde8fd8c4 wombat.js: rebuild wombat.js to 3.3.6 (was not properly rebuilt previously), alternative fix to #690
update CHANGES
bump to 2.6.4
2022-01-19 18:35:39 -08:00
Ilya Kreymer
7ff789f1a8 CHANGES: fix typos in changelist v-2.6.3 2021-12-22 17:42:19 -08:00
Ilya Kreymer
c0519a53c3 ci release: update description 2021-12-22 17:38:29 -08:00
Ilya Kreymer
de9b9310d4
Additional fixes for 2.6.3 (#689)
CHANGES: update changes for 2.6.3

location rewrite: pass 'arguments' to rewrite func to guard against rewriting local 'location' in some circumstances, partial fix for #684

ci: add automated docker push on new v-* tag
v-2.6.3b0
2021-12-22 17:26:45 -08:00
Ilya Kreymer
0c4e406876 quickfix: localization: ensure placeholder text also marked as localized, fixes #685 2021-12-22 16:51:02 -08:00
Ilya Kreymer
c97a66703b
More consistent env var setting / static path fix (#688)
* template/custom env var fix:
- ensure pywb.host_prefix, pywb.app_prefix and pywb.static_prefix set for all requests via prepare_env()
- ensure X-Forwarded-Proto is accounted for in pywb.host_prefix
- call prepare_env() in handle_request(), and also in rewriterapp (in case using a different front-end app).

* update wombat to 3.3.6 (includes partial fix for #684)
* bump version to 2.6.3
2021-12-22 16:15:27 -08:00
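As a rough sketch of what accounting for `X-Forwarded-Proto` in a host prefix can look like in a WSGI application: the `host_prefix` function below is hypothetical and is not pywb's `prepare_env()`. It relies only on standard WSGI environ keys.

```python
def host_prefix(environ):
    """Build 'scheme://host' from a WSGI environ dict.

    Hypothetical helper: prefers the scheme reported by a reverse proxy
    via X-Forwarded-Proto over the scheme seen by the WSGI server.
    """
    scheme = environ.get("HTTP_X_FORWARDED_PROTO") or environ.get(
        "wsgi.url_scheme", "http"
    )
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
    return scheme + "://" + host


# Behind a TLS-terminating proxy, the WSGI server itself sees plain http.
proxied = host_prefix(
    {
        "wsgi.url_scheme": "http",
        "HTTP_HOST": "archive.example.org",
        "HTTP_X_FORWARDED_PROTO": "https",
    }
)
direct = host_prefix({"wsgi.url_scheme": "http", "HTTP_HOST": "localhost:8080"})
print(proxied)
print(direct)
```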
Lauren Ko
5c35a43dac
Modify examples in cdx-indexer help text to do as stated (#683) 2021-12-07 16:09:44 -08:00
Ilya Kreymer
e64e58f040
2.6.2 fix (#682)
2.6.2 release:
* fix for regression caused by 2.6.1, invalid static path #681
* add missing base.css
v-2.6.2
2021-11-12 17:51:34 -08:00
Ilya Kreymer
a6be76642a
2.6.1 Release Work (#679)
* rules: add custom twitter video rewriting to capture non-chunked twitter video (max bitrate of 5000000)

* autoescaping regression fix: don't escape URL in frame_insert.html, use as is

* html rewriting:
- don't rewrite 'data-' attributes, no longer necessary for best fidelity
- do rewrite <link rel='alternate'> as main page (mp_)
- update html rewriting test

* feature: support customizing the static path used in pywb via 'static_prefix' config option (defaults to 'static')

* update to latest wombat (3.3.4)

* bump to 2.6.1, update CHANGES for 2.6.1
v-2.6.1
2021-11-11 22:30:54 -08:00
Ilya Kreymer
96de80f83e update CHANGES for 2.6.0 release!
README: update for 2.6, add links to guides!
bump version to 2.6.0
v-2.6.0
2021-08-11 19:00:54 -07:00
Ilya Kreymer
b28c8f1748
Eval Rewriting + Scope Fix (#668)
* eval fix: instead of rewriting to 'WB_wombat_eval', rewrite to 'self.eval' for non-top-level eval
the wombat object will handle rewriting the eval arg on 'self.eval'
tighten rewriting for top-level 'eval', add additional tests
part of fix for #663

* rewrite wrap: add extra {, } to avoid collisions, as suggested in webrecorder/wombat#72
eval rewrite: exclude ',eval' as more likely than not causing a false positive, as per #643

* update to latest wombat 3.3.0 with corresponding fixes
2021-08-11 18:45:54 -07:00
Lauren Ko
b2a460c33c
docs: fix broken links (#666) 2021-08-10 08:17:40 -07:00
Ilya Kreymer
342007244b update CHANGES.rst for 2.6.0b4
update wombat to latest
2021-07-18 17:13:12 -07:00
Ilya Kreymer
98c6fba44d
Support for custom data being added via 'PUT /<coll>/record' when… (#661)
* add support for custom data being added via 'PUT /<coll>/record' when in recording mode and 'enable_put_custom_record: true' set in 'recorder' config
- url specified via 'url' query arg and content type via request Content-Type
- update docs for put custom record options

* bump version to 2.6.0b4
2021-07-18 17:04:34 -07:00
Ilya Kreymer
a0faf904ef
rules: add rules for disabling dash for instagram (#662) 2021-07-18 16:40:54 -07:00
Marius Elsfjordstrand Beck
3e5d97f70b
Properly encode load_url (#659) 2021-07-18 13:50:56 -07:00
Marius Elsfjordstrand Beck
843fe28ed8
Encode url search parameter when performing query (#657) 2021-07-06 21:07:07 -07:00
Simon Chan
096850b41d
fix errors in docs/manual/rewriter.rst (#655)
* fix format error in docs/manual/rewriter.rst

* fix incorrect names in docs/manual/rewriter.rst
2021-07-06 21:01:38 -07:00
Ilya Kreymer
81308780ec
version display: add -V/--version flag to wb-manager and wayback/pywb commands to display version and exit (#654)
update CHANGES
comment out default locales in config.yaml
only show warning for installing i18n extra when locales actually specified in config

bump to 2.6.0b3
2021-06-24 11:28:48 -07:00
Ilya Kreymer
cff2a9efc5
more locale fixes: (#653)
* more locale fixes:
- fix running wb-manager w/o i18n dependencies
- dependencies: move babel to extra_requires, show warning if locale used or 'wb-manager i18n' called and i18n are not installed
- not found page: don't language switch header banner on nested content frame
2021-06-18 14:58:21 -07:00
Ilya Kreymer
3ca765f847 add autoescaping disable to banner.html
update CHANGES
bump version to 2.6.0b2
2021-06-17 17:40:15 -07:00
Sebastian Nagel
f9f5d2dc33
Improve docs about CDXJ Server API endpoint (#651)
- replace erroneous/outdated `/coll-cdx` API endpoint
  by default API endpoint `/<coll>/cdx`
- if clear from preceding context: reduce examples
  to params only `?url=...&param1=...`
2021-06-15 18:12:48 -07:00
Ilya Kreymer
f7bd84cdac
Localization / doc fixes (#650)
* localization / doc fixes:
- add missing header.html
- docs: support 'i18n' extra, mention in docs
- use 'default_locale' for html lang tag
- access control docs: fix documentation for adding user with acl command

* localization: add compile_catalog after extract as well to simplify updates for identity (en) locale

* ui: 
- include locale in home page collection listing
- keep locale on error page home link

* autoescape:
- ensure jinja2 templates are autoescaped to prevent xss issues (thanks @sebastian-nagel for suggested fix)
- ensure banner inserts are not double-escaped
- update tests for template autoescaping

* update CHANGES.rst

* bump version to 2.6.0b1
v-2.6.0b1
2021-06-14 17:09:00 -07:00
Lauren Ko
9587954856
Fix typos in localization and access-control docs (#649)
* Fix typos in localization doc

* Fix typos in access-control doc
2021-06-11 22:50:35 -07:00
Ilya Kreymer
12fcc87962
Localization Support (#647)
* add localization utilities:
- add locmanager to support extract, update, remove, list using pybabel
- add po2csv/csv2po conversion with translate-utils
- docs: add localization.rst to manual!

* add language switch header (via header.html) to all pages if more than one locale is present.

* localization: wrap more text strings in templates in existing templates

* docs:
- document `wb-manager i18n` commands
- mention `<html lang>` setting
- include csv example
- add info about adding localizable text in templates

* add localization to CHANGES
v-2.6.0b0
2021-06-09 13:12:53 -07:00
Ilya Kreymer
0eedd1502f remove fakeredis from tests_require, fixes #644 2021-06-09 12:41:08 -07:00
Ilya Kreymer
d95b79a8ab CHANGES: update changelist for 2.6.0b0
bump version to 2.6.0b0
2021-06-09 12:20:47 -07:00
Ilya Kreymer
f07d35709a
Access Control Improvements: Embargo + ACL User Support (#642)
* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option

* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching

* docs: add docs for new embargo system!

* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring X-pywb-ACL-user header based on IP for nginx and apache sample deployments

* docs: fix access control page header, text tweaks

* bump version to 2.6.0b0
2021-05-18 20:09:18 -07:00
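The four embargo options named in the commit above can be sketched as a single date check. Everything here is an assumption made for illustration: `is_embargoed` is a hypothetical function, timestamps are modelled as `datetime` objects, years and months are approximated as 365 and 30 days, and the direction of each option (which side of the cutoff is blocked) is a guess from the option names rather than something the commit message states.

```python
from datetime import datetime, timedelta


def _delta(spec):
    """Approximate a {'years', 'months', 'days'} dict as a timedelta."""
    days = (
        spec.get("years", 0) * 365
        + spec.get("months", 0) * 30
        + spec.get("days", 0)
    )
    return timedelta(days=days)


def is_embargoed(capture_time, embargo, now):
    """Return True if a capture falls inside the embargoed range (assumed semantics)."""
    if "before" in embargo:
        return capture_time < embargo["before"]
    if "after" in embargo:
        return capture_time > embargo["after"]
    if "newer" in embargo:
        # Block captures more recent than (now - delta).
        return capture_time > now - _delta(embargo["newer"])
    if "older" in embargo:
        # Block captures further back than (now - delta).
        return capture_time < now - _delta(embargo["older"])
    return False


now = datetime(2021, 5, 18)
recent = datetime(2021, 1, 1)
old = datetime(2019, 1, 1)

print(is_embargoed(recent, {"newer": {"months": 6}}, now))
print(is_embargoed(old, {"newer": {"months": 6}}, now))
```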
Ilya Kreymer
818b518765 update to latest wombat (3.1.6), includes more consistent post-to-get handling on client-side to match server side handling
fuzzymatcher: ensure fuzzy match enabled for non-get requests
2021-05-17 23:12:55 -07:00
Alex Osborne
551b8fe026
xmlquery: remove space after the "limit:" query field name (#640)
OutbackCDX can't handle a space here as it decodes fields by splitting
on space.
2021-05-12 18:33:58 -07:00
Ilya Kreymer
abb76911f5
Recorder Pending count (#637)
* recorder: add pending counter (in redis) when using redis based dedup system, supports webrecorder/browsertrix#44
2021-04-28 16:10:39 -07:00
Ilya Kreymer
626da99899
POST request handling and indexing improvements (#636)
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion

* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx

* ci: fixes
- tox: run full test suite!
- disable appveyor

* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
2021-04-27 20:52:24 -07:00
Sebastian Nagel
106a9e9200
IndexHandler: report BadRequestException as error while loading index (#625) 2021-04-27 12:47:13 -07:00
Ilya Kreymer
5d34018b9f
yt rules: more general yt rules (#635) 2021-04-26 21:10:30 -07:00