1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

34 Commits

Author SHA1 Message Date
Ed Summers
b4955cca66
Upgrade dependencies (#839)
- Update and pin dependencies to specific versions that support Python 3.7-3.11
- Replace deprecated werkzeug.pop_path_info with wsgiref.shift_path_info
- Use the latest httpbin from psf/httpbin
- Remove unused flask test dependency
- Drop Python 2 and Python <3.7 support
- Ensure greenlet 2 is used for now, as psf/httpbin doesn't yet work with greenlet 3

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-04-02 17:16:50 -04:00
Ilya Kreymer
1e9d8f44af
Title parse tweak (#498)
* proxy: update wombat history callback to fire immediately, update to latest wombat
* title parse: add html unescaping (use original unescaped method overridden in htmlrewriter)
tests: add tests for page fetch and title extraction
2019-08-13 16:12:37 -07:00
Ilya Kreymer
05cc593da6 tests: don't run video tests on ci due to rate limiting 2019-07-31 18:11:42 -07:00
Ilya Kreymer
ffca45c855
Support/Improvements to Domain Cookie Cache (#491)
* domain cookie fix:
- don't set cookies for service worker modifiers if response is not 200
- don't add existing cookies to Cookie or Set-Cookie headers
- add sw_/, wkrf_/ modifiers to generate paths
- enable domain cookie cacheing by default with fakeredis for live index and record mode, keyed by collection
- reqs: add fakeredis, tldextract, update warcio
- tests: add initial tests for domain cookie rewriting
2019-07-31 14:58:15 -07:00
Ilya Kreymer
e1e8917bc3
live rewriting/utf-8 headers: fix for sites that have utf-8 in headers despite standard (#402)
- attempt to encode headers as utf-8 first for live web, then latin-1 (similar to warcio http header parsing)
- only encode headers for py3 (in py2, headers are already bytestrings)
- tests: add tests for utf-8 in header
bump version to 2.1.1
2018-10-26 15:06:59 -07:00
Ilya Kreymer
671dd2c204
Rewriting fixes for http-only cookies, bad content-length, and document with base (#386)
* rewriting fixes:
server side: cookie rewriting: if httponly cookie with mp_/if_ modifier and path ends with '/', add set-cookie for all known modifiers
content length parsing: improve content-length parsing to support 'content-length: num,num', parse out the first number (occasionally seen with range requests when range is dropped for upstream)
wombat: rewrite_elem: use element.ownerDocument for resolving baseUri for parent paths
tests: add tests for cookie all modifier rewrite, bad content-length parsing (skip for py2.7)
2018-10-05 14:37:32 -07:00
Ilya Kreymer
5f3d37bb44
origin header improvement: if Referer header is available, compute Origin from the Referer, not from target url (#329)
(Origin header received will be the pywb host, using Referer will result in more accurate Origin, which may not be the target url)
tests: add tests to verify Origin header with and without Referer
2018-05-21 11:57:43 -07:00
Ilya Kreymer
bef63b4c6c
Local httpbin tests + LiveIndexSource improvement (#318)
tests and LiveIndexSource improvements:
- run local instance of httpbin in separate gevent server for any httpbin.org requests
- LiveIndexSource: has overridable get_load_url(), also use 'load_url' for HEAD check, remove unused proxy_url
- test update: add HttpBinLiveTests which patches LiveIndexSource.get_load_url() to redirect httpbin requests to local instance
- test update: just use httpbin.org/get instead of httpbin.org/anything, unsupported in older version (0.5.0) require for windows support
- setup: add 'httpbin==0.5.0' to test requires, remove jinja2 pin to old version
2018-04-28 18:20:37 -07:00
Ilya Kreymer
84723c9d7d tests: fix video tests not running, related to #270, fix typo importorskip('youtube-dl') -> importorskip('youtube_dl') 2018-02-27 17:49:36 -08:00
Ilya Kreymer
e2fa14bc2d tests: add 'importorskip' for tests that require 'extra' dependencies, (youtube-dl, socks), addresses #270
setup: remove 2.6 classifier, update repo path
bump to 2.0.1
2018-01-30 18:26:53 -08:00
Ilya Kreymer
a954a5470f HEAD requests: fix pywb recording & replay of HEAD requests (force payload of 0 instead of content-length if HEAD request from live web)
tests: fix socks-proxy test to fast-fail to a random unused port to detect proxy hook is enabled
2018-01-29 16:34:25 -08:00
Ilya Kreymer
af0f9c22cb server-side rewrite: fix '#' rewriting
- only encode from request, not in WbUrl in general
- tests: add live rewrite test to ensure encoded '#' is used
2017-10-24 12:52:15 -07:00
Ilya Kreymer
aa0a019567 Frame insert refactor (#246)
refactor frame/head insert templates:
ContentFrame:
- content iframe inited with new ContentFrame() which creates iframe
- wb_frame.js: contains ContentFrame system for initing, updating, closing content frame for replayed content.
- wb_frame.js: supports 'app_prefix' and 'content_prefix' or default 'prefix' for replay content
- window.location.hash passed added to init url.
- frame insert and head insert: simplify, remove 'wbrequest'
- frame insert: global wbinfo object no longer needed in top frame, each ContentFrame self-contained.
- wombat.js: next_parent() check does not assume wbinfo is present in top frame
- vidrw.js: only init if wbinfo is present

Banner:
- wb.js no longer needed, frame check/redirect folded into wombat.js
- default banner self-contained in default_banner.js/default_banner.css, handles both frame and frameless case
- rename wb.css -> default_banner.css
- banner html passed in as 'banner_html' variable to be optionally included, supports per collection banner html.
- templateview: BaseInsertView can accept an option 'banner view', used by HeadInsertView and TopFrameView

Tests:
- tests: test_auto_colls uses shared app to test dynamic changes, testing both frame and non-frame access, added per-collection banner html check.
2017-09-30 21:09:38 -07:00
Ilya Kreymer
94262546d5 integration tests: add fixture to run all relevant tests in framed and non-framed mode
rename test_framed_inverse -> test_memento, remove unneeded test config
2017-05-03 20:05:07 -07:00
Ilya Kreymer
f593b5f80f trailing slash fix: add trailing slash, preserving query, if no slash present after hostname (#211) 2017-04-04 18:10:49 -07:00
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
66ca8d8b26 http block loader: raise exception for 4xx, 5xx responses
tests: add tests for limitreader posting, fix charset for frame test
2016-07-31 12:56:00 -04:00
Ilya Kreymer
3a584a1ec3 py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
979fcaeda3 tests: fix mock YoutubeDLWrapper after refactor, #141 2015-10-23 12:19:15 -07:00
Ilya Kreymer
e249f300e3 tests refactor! init pywb once per module, instead of once per test
refactor common init pattern to server_mock for now (can add fixtures also)
2015-10-14 20:34:46 -07:00
Ilya Kreymer
080587516b youtube-dl tests: use mock youtube-dl info for tests 2015-06-27 20:46:55 -07:00
Ilya Kreymer
f0359877f0 youtube-dl: remove from dependency, installation is optional. Return 404 if attempting live
proxy of videos and youtube-dl is not available (the only use case).
HTTPParser wrapping logic no longer needed in latest versions
Modify tests to only run if youtube-dl is installed in cases where it is not available #118
2015-06-27 16:11:59 -07:00
Ilya Kreymer
06fcc89de6 readers: support 'content-encoding: deflate' using different zlib decompression options
support default and alt settings for attempting to decompress deflate stream
tests: add tests with httpbin.org/deflate Fixes #115
2015-06-24 13:11:33 -07:00
Ilya Kreymer
307809bbe9 live-rewrite-server: switch to 'inverse' frame mode by default,
switch from /rewrite/ to /live/ path, update tests
2015-04-13 13:00:06 -07:00
Ilya Kreymer
4a85869427 cli refactor: use classes in cli to allow custom options
get rid of custom init for live_rewrite_handler, just use create_wb_router()
with custom config for consistent init
2015-04-03 10:43:39 -07:00
Ilya Kreymer
a9892f531f proxy testing: refactored test server thread into ServerThreadRunner
class which runs a server in a seperate thread.. used by http/s proxies
as well, as mock live server proxy
add test for live rewrite with proxy, covering simple case as well as
video
2014-12-23 11:07:47 -08:00
Ilya Kreymer
1aac5a9f15 cache: move cache wrappers to seperate cache.py in framework from
proxy_resolvers
range cache: and buffering cache for serving range requests, intended
for videos but not only. full response cached in temp file and range
requests served from cache, still experimental
need to add deletion.
youtube_dl: wrap youtube-dl import due to youtube-dl HTMLParser regex
bug
tests: add test for vi_ handler
2014-11-01 15:41:01 -07:00
Ilya Kreymer
f14f37d5b1 tests: use httpbin for redirect tests 2014-10-29 09:47:32 -07:00
Ilya Kreymer
4a1cc46fa3 framed replay: invert framed replay paradigm, replay always uses
canonical, no-modifier archival url (instead of mp_).
When using frames, the page redirects to a 'tf_' page, which then uses
replaceHistory() to change url back to canonical form.
memento: support for framed replay, include memento headers in top frame
bump version to 0.6.2
2014-10-18 11:21:07 -07:00
Ilya Kreymer
e1e8f679b2 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 21:59:46 -07:00
Ilya Kreymer
0b8a8f0ae2 live rewrite: catch errors from live rewrite and raise a new LiveResourceError with a 400 error code,
indicating bad request for live resource. Add test for invalid live rewrite requests
2014-07-21 22:43:34 -07:00
Ilya Kreymer
96fcaab521 live-rewrite-server: add ability to specify http/https proxy for live fetching
(for example, for use with a recording proxy)
2014-07-19 14:43:28 -07:00
Ilya Kreymer
85593696fa remove rfc3987 validation, was rejecting valid urls
add extract_referer_wburl_str() to extract WbUrl str, if any,
from the referrer. Use that for live_rewrite_handler to override
default referrer
2014-04-15 16:38:53 -07:00
Ilya Kreymer
bfc2e63793 live rewriter: integrate handler with rewrite_live.py module,
clean up css, add unit and integration tests
clean up cli server now known as 'live-rewrite-server', which performs live rewrite using
iframe paradigm
2014-04-09 15:49:55 -07:00