mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

2282 Commits

Author SHA1 Message Date
Ilya Kreymer
a192932858 slash redirects: if a capture ends with '/' (with or without a query), and requested url does not end in '/', (#346)
redirect to '/' version, fixes #344
2018-06-14 18:01:14 -04:00
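
The slash-redirect rule described in the commit above can be illustrated with a minimal sketch. This is not pywb's actual code: the function name `slash_redirect_target` and its arguments are hypothetical, and it assumes both URLs are already the original (unrewritten) URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def slash_redirect_target(requested_url, capture_url):
    """Return the URL to redirect to, or None if no slash redirect applies.

    Hypothetical helper: redirect only when the capture path ends with '/'
    and the requested path is the same path without the trailing '/'.
    """
    req = urlsplit(requested_url)
    cap = urlsplit(capture_url)

    # Nothing to do if the request already ends in '/' or the capture does not
    if req.path.endswith('/') or not cap.path.endswith('/'):
        return None

    # The capture must differ from the request only by the trailing slash
    if req.path + '/' != cap.path:
        return None

    # Keep the requested query and fragment, switch to the '/' version of the path
    return urlunsplit((req.scheme, req.netloc, cap.path, req.query, req.fragment))
```

For example, a request for `http://example.com/dir?a=b` matched against a capture of `http://example.com/dir/?a=b` would yield a redirect to the latter, while a request that already ends in `/` yields `None`.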
John Berlin
9404f89e31 client-side rewrite: Add rewriting of SVG Filter attribute for http://fotopaulmartens.netcam.nl/vucht.php (#341) 2018-06-14 14:00:31 -04:00
John Berlin
bb5d46d19b Server-side rewriting of script[src='js/...'] and link rel='import' (#334)
* Updated html_rewriter.py to account for rewriting of script[src] values that are super relative (http://fotopaulmartens.netcam.nl/vucht.php) and added link rel='import' rewriting
Updated test_html_rewriter.py for super rel script[src] rewriting and link rel='import'
Updated wombat to account for the new rewriting of script[src]  (http://fotopaulmartens.netcam.nl/vucht.php)
Changed the postMessage override in wombat to use $wbwindow rather than window to fix google calendar replay / recording (http://qasrcc.org/events/calendar/)

* Updated tests for forcing absolute and fixed merge conflicts

* wombat: extracted removal and retrieval of __wb_original_src into own functions
2018-06-14 13:56:46 -04:00
Ilya Kreymer
ac5b4da9eb
Self-Redirect Fix (#345)
* self-redirect fix for multiple continuous 3xx responses: if after one self-redirect, next match is also a redirect where url canonicalizes to same as previously rejected, also treat as self-redirect
tests: add new test_self_redirect for generating example pattern where self-redirect could occur

* self-redirect: ensure warc records are closed when handling self-redirect exception!
2018-06-14 10:48:32 -04:00
Ilya Kreymer
a3476d8baa tests: also rewrite 'test.httpbin.org' to internal httpbin to allow subdomain testing 2018-06-08 16:20:43 -07:00
John Berlin
2825535ae2 Added FontFace to wombat overrides, https://drafts.csswg.org/css-font-loading/#FontFace-interface (#340) 2018-06-01 15:13:43 -07:00
Ilya Kreymer
1e9f457ef1 setup: bump min versions for wsgiprox, warcio
rewriterapp: add warc record param to _add_custom_params() to expose record to extensions
2018-05-31 17:29:37 -07:00
Ilya Kreymer
dc1982784e
ServiceWorker Rewrite Improvements (#339)
* service worker rewrite work:
- use sw_ modifier to add Service-Worker-Allowed: <domain root>
- force scope if none set to domain url
- resolve sw url to absolute url

* wombat: don't reinit wombat paths if already inited (eg. from imported documents)

* service-worker rewrite test: add test to verify sw rewrite is identity, Service-Worker-Allowed header is added
2018-05-31 08:57:51 -07:00
John Berlin
bd329aaa76 wombat postMessage improvements: (#338)
- renamed obj to this_obj to reflect that we are using the deproxied this
- use this_obj rather than window in the first if block that populates
  the from variable in order to match the logic in pm_origin and
  because proxy_to_obj returns raw this if not proxy
2018-05-30 18:08:07 -07:00
Ilya Kreymer
bb1dbc0080
html unescape: ensure escaped urls are rewritten (py2 and 3) (#337) 2018-05-29 09:17:04 -07:00
Ilya Kreymer
a138fca5e3
jsonp rewriter: expand jsonp matching: (#336)
- treat as jsonp if url query contains 'callback=jsonp',
- fuzzy match query containing 'callback=jsonp'
- tests: add test for additional jsonp matching
2018-05-29 08:57:50 -07:00
Ilya Kreymer
efb7b2db90
rules: add rule for yt dash rewriting for json watch page, update tests (#335) 2018-05-29 08:47:53 -07:00
John Berlin
ba998d95a7 Wombat client-side rewriting improvements + server-side rel='preload' updates (#332)
Updated rewrite modifiers for server-side rewriting of `link rel='preload' as='x'`
Added client-side rewriting of `link rel='[preload|import]' as='x'`
Added helper method for determining the correct rewrite modifier to be used in client-side rewriting and updated duplicate modifier logic in wombat
Added Element.insertAdjacentElement override and added special case rewriting of nested elements in insertAdjacentElement and Node.[appendChild|replaceChild|insertBefore]
Add MouseEvent override to account for the view argument which is windowProxy
Fixed implicit variable declaration that resulted in global pollution and possible variable collisions in rewriting logic
Updated wb_unrewrite_rx to now consider protocol and host as optional to fix imgur
Nit document.[write|writeln] override: rather than using Array.apply then Array.join we now use just Array.join as it works on array like objects
2018-05-25 16:06:44 -07:00
Ilya Kreymer
bf3e76d2be rewriting fixes (to avoid client-side infinite loops!):
- server-side: rewrite '}(this)' or '})(this)' with js object proxy override convert
- client-side: fix typo in 'onstorage' override, fix typo that prevented SameOriginListener() from being used -- ensure
custom 'onstorage' events only sent to original window
2018-05-22 19:52:17 -07:00
humberthardy
dc883ec708 Handle amf requests (#321)
* Add representation for Amf requests to index them correctly

* rewind the stream if an error happens during amf decoding. (pyamf seems to have a problem supporting multi-byte utf8)

* fix python 2.7 backward compatibility

* update inputrequest.py

* reorganize imports and force appveyor to retest
2018-05-21 19:29:33 -07:00
Ilya Kreymer
f65ac7068f
postMessage edge cases fixes: safer postmessage: (#328)
- if targetOrigin is the replay host, default to unrewritten from origin, not '*'
- don't set targetOrigin to 'null' or empty to avoid errors
- if target window's unrewritten origin is actually 'null' or '', don't pass message at all, and don't set to '*' -- represents actual behavior,
as postMessage to 'null' origin (about:blank page) will be received only if targetOrigin is already '*'.
2018-05-21 13:13:36 -07:00
Ilya Kreymer
1faa75a126
mod fix for cookies: set wbinfo.mod to replay_mod (mp_ or '') to avoid cookie issues caused by content loaded with wrong modifier (eg. with yt comments) (#330) 2018-05-21 11:58:25 -07:00
Ilya Kreymer
5f3d37bb44
origin header improvement: if Referer header is available, compute Origin from the Referer, not from target url (#329)
(Origin header received will be the pywb host, using Referer will result in more accurate Origin, which may not be the target url)
tests: add tests to verify Origin header with and without Referer
2018-05-21 11:57:43 -07:00
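
A minimal sketch of the Origin computation described in the commit above, not pywb's actual implementation. The function names are hypothetical, and the sketch assumes the Referer has already been unrewritten to the original (archived) URL rather than the pywb replay URL.

```python
from urllib.parse import urlsplit


def origin_from_url(url):
    """Return the scheme://host[:port] origin of a URL."""
    parts = urlsplit(url)
    return parts.scheme + '://' + parts.netloc


def compute_origin(target_url, referer=None):
    """Prefer the Referer's origin; fall back to the target URL's origin.

    Assumption: 'referer' is the original page URL, with any replay
    prefix already stripped.
    """
    return origin_from_url(referer or target_url)
```

With a Referer of `https://www.example.com/page` and a target of `https://api.example.com/data`, this yields `https://www.example.com`, which is the origin the archived page would have sent for a cross-origin request.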
Ilya Kreymer
a8bb3cfce6
default_banner fix: save last state for use with 'title' event changes -- use previous url, timestamp when changing title (#327) 2018-05-21 11:56:03 -07:00
John Berlin
18cc71af3b Fix wombat's overrides of document.[write, writeln] to account for the variadic case (#325)
* tweaked wombat's overrides of document.[write, writeln] to account for the variadic case (https://html.spec.whatwg.org/multipage/dom.html#the-document-object)
Fixes #324

* added handling for when arguments length is 0, per PR comment
2018-05-20 12:55:41 -07:00
Ilya Kreymer
9acad27801 indexing: py2 fix: if decoding error while writing utf-8 encoded url, try decoding as utf-8. avoids indexing error in py2 when warc has non-ascii urls, fix for #312
test: add test for decoding utf-8 url
2018-04-28 23:31:42 -07:00
Ilya Kreymer
bef63b4c6c
Local httpbin tests + LiveIndexSource improvement (#318)
tests and LiveIndexSource improvements:
- run local instance of httpbin in separate gevent server for any httpbin.org requests
- LiveIndexSource: has overridable get_load_url(), also use 'load_url' for HEAD check, remove unused proxy_url
- test update: add HttpBinLiveTests which patches LiveIndexSource.get_load_url() to redirect httpbin requests to local instance
- test update: just use httpbin.org/get instead of httpbin.org/anything, which is unsupported in the older version (0.5.0) required for windows support
- setup: add 'httpbin==0.5.0' to test requires, remove jinja2 pin to old version
2018-04-28 18:20:37 -07:00
Ilya Kreymer
de3ec0e1bc proxy: use FrontEndApp.proxy_route_request() to determine proxy route
Extensions can override this function to provide custom proxy routing
Update docs
2018-04-20 15:20:56 -07:00
Ilya Kreymer
5349d0518c
Proxy Options (#317)
* proxy mode options: #316
- add 'use_banner' option, if false, will disable standard banner.html from being added
- add 'use_head_insert' option, if false, will disable injecting head_insert.html in proxy mode
both options default to true

* docs: add docs for new proxy options

* also add 'override_route' option and docs for extending proxy routing
2018-04-20 10:04:34 -07:00
Ilya Kreymer
804734525c appveyor fix: use 'python -m pip' to upgrade pip (pypa/pip#5240) 2018-04-20 08:51:48 -07:00
Ilya Kreymer
b7bf693885
request-uri handling: use REQUEST_URI if available to maintain %-encoding when constructing WbUrl (#315)
geventserver: use custom handler to set raw 'REQUEST_URI' when running default gevent wsgi server. (uwsgi already sets REQUEST_URI)
testing: add REQUEST_URI check to proxy tests as real server is being used (webtest tests decodes %-encoding)
bump version to 2.0.4
2018-04-10 17:17:38 -07:00
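
The REQUEST_URI handling in the commit above can be sketched as follows. This is a simplified illustration rather than pywb's code: the helper name `get_request_uri` is hypothetical, and the fallback assumes a standard WSGI environ in which `PATH_INFO` has already been %-decoded by the server.

```python
from urllib.parse import quote


def get_request_uri(environ):
    """Return the request URI, preserving the original %-encoding if possible."""
    # Servers such as uwsgi set the raw, still-encoded REQUEST_URI
    req_uri = environ.get('REQUEST_URI')
    if req_uri:
        return req_uri

    # Fallback: PATH_INFO is already decoded, so the original encoding
    # (e.g. '%2F' vs '/') cannot be recovered; re-quote it as a best effort
    uri = quote(environ.get('PATH_INFO', ''))
    query = environ.get('QUERY_STRING')
    if query:
        uri += '?' + query
    return uri
```

The fallback shows why the raw value matters: once `%2F` has been decoded to `/` in `PATH_INFO`, re-quoting cannot tell the two apart.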
Ilya Kreymer
33cca0bc02
Update CHANGES for 2.0.3 2018-04-03 19:10:08 -07:00
Ilya Kreymer
3101e567f3 config: add support for forcing a scheme for url rewriting, eg: 'force_scheme: https', fixes #314 2018-04-03 19:05:01 -07:00
Ilya Kreymer
4f58111875 update changelist for 2.0.3 2018-04-02 18:04:44 -07:00
Ilya Kreymer
c71611e6b7 cookie rewriter: don't rewrite cookies if not rewriting urls, eg. banner only or proxy mode
tests: update content rewriter tests to test for cookie rewriting
2018-04-02 17:58:23 -07:00
Ilya Kreymer
d732cdd01f aggregator timeout fixes (#310):
- fix memento aggregation if timeout is 0.0
- use default timeout (5.0), instead of default to 0.0 and failing
- add 'timeout' property to warcserver aggregation tests
- docs: mention property in warcserver docs also
2018-04-02 17:52:13 -07:00
Ilya Kreymer
8f981743ae docs: add sample nginx config to deployment section, mention how https is handled, fixes #314 2018-04-02 17:23:04 -07:00
Ilya Kreymer
f32eb608f1 tests: make redis test fix py27 compatible also 2018-03-29 22:12:07 -07:00
Ilya Kreymer
8d9951bc7b misc test fixes: make record_replay tests more consistent, use different url to ensure consistent ordering
fakeredistests: fix for fakenewredis, clear fakeredis databases and pubsub list
2018-03-29 21:43:37 -07:00
Ilya Kreymer
9da5bd1083
Decoding and Recorder Fixes (#313)
* redisindex: use decode_responses=True for redisindex
* recorder: close_file(): return true if closed, close_key() return filename if closed
* logging: if debug=True, log warc load failures
* appveyor build fix: add pypiwin32 as dependency for windows build
2018-03-29 13:42:00 -07:00
humberthardy
a9cbdc1bd6 rewrite_amf.py: Fix bug introduced by recent refactoring (#308) 2018-03-05 13:20:37 -08:00
Ilya Kreymer
6d879cb8b8 docs: fix typos in memento docs (#307)
- URI-M instead of URL-M
- remove mention of vary: accept-datetime for URI-M
2018-03-05 13:12:12 -08:00
Ilya Kreymer
e812ed2d45 head request replay fix: treat HEAD requests as traditional GET requests w/o payload, instead of HEAD request replay, see #309, mentioned in #307 2018-03-05 13:10:53 -08:00
Ilya Kreymer
98cdf36626 bump version to 2.0.3 2018-03-05 13:10:53 -08:00
Ilya Kreymer
427dc3e00c update CHANGES for 2.0.2 2018-02-27 18:26:05 -08:00
Ilya Kreymer
e928f8a7e6 replay top-frame redirect: add fast redirect check to top-frame, instead of waiting for check in wombat.js, closes #305
tests: ensure redirect check only added in framed mode, ensure added for banner only mode, but not for proxy mode
2018-02-27 18:13:07 -08:00
Ilya Kreymer
84723c9d7d tests: fix video tests not running, related to #270, fix typo importorskip('youtube-dl') -> importorskip('youtube_dl') 2018-02-27 17:49:36 -08:00
Ilya Kreymer
61bf5e09ca
proxy-mode tweaks: (fixes #302): (#304)
- don't include wombat.js in banner only mode, including in proxy mode
  (instead, do set devicePixelRatio to fix certain fidelity issues)
- default_banner: set title to document.title on load when frameless, including in proxy mode
- improve docs for configuring proxy mode cert
- tests: update tests to ensure no wombat.js injected in proxy or banner-only mode
2018-02-27 15:52:19 -08:00
Ilya Kreymer
e2cbdbc27c cli: add -b/--bind cli option (defaulting to 0.0.0.0) to avoid default geventserver behavior, which doesn't bind to 0.0.0.0 and silently fails if other services are running on same port. related to issue mentioned in #298 2018-02-26 17:07:01 -08:00
Ilya Kreymer
0767bf80d5 client-side override improvements:
- override window.EventTarget.prototype.addEventListener instead of window.add/removeEventListener to work correctly with angular
- add 'document.title' override to detect title change event and propagate to top frame (history title often unused)
- add equivalent wrappers from addEventListener to window.onmessage and window.onstorage properties
2018-02-13 18:30:42 -08:00
Ilya Kreymer
7234bc51f0 client-side top-frame notification fix:
- send_top_message() to wrap all postMessage calls
- url change, hash change, history change notifications only sent if window is top replay frame, cookie notification sent for all windows
- don't send url change notification if 'about:blank' or 'javascript:' url
bump version to 2.0.2
2018-02-13 13:49:10 -08:00
Ilya Kreymer
fc48e23dae docs/README: fix typos, add changes for 2.0.1 2018-02-10 11:48:50 -08:00
Ilya Kreymer
448fb2cf1e client-side rewrite fixes:
- override Function.apply() to de-proxy thisArg and all params before calling native functions (may make per-function overrides unnecessary)
- ensure init_top_frame_notify() is called on $wbwindow object not window
2018-02-05 09:38:48 -08:00
Ilya Kreymer
728d9b3ca1 query ui: display captures with second precision included 2018-02-05 09:36:42 -08:00
Ilya Kreymer
e2fa14bc2d tests: add 'importorskip' for tests that require 'extra' dependencies, (youtube-dl, socks), addresses #270
setup: remove 2.6 classifier, update repo path
bump to 2.0.1
2018-01-30 18:26:53 -08:00