mirror of https://github.com/webrecorder/pywb.git synced 2025-03-28 08:32:29 +01:00

308 Commits

Author SHA1 Message Date
Ilya Kreymer
cd272013b8 client-side rewrite: fix override_func_this_proxy_to_obj() for unsupported/undefined objects (just ignore) 2017-09-14 21:46:48 -07:00
Ilya Kreymer
ba6d0245a5 client-side rewrite: add proxy->obj this for getComputedStyle() function 2017-09-14 21:05:39 -07:00
Ilya Kreymer
71a5853334 History Change Simplification (#240)
framed replay: history change simplifications
- simplify history changes for top frame, remove unused code
- only use 'replaceState' to replace top-frame url with current url, avoid adding new history entries
- use onpopstate to notify top frame, don't override go/back/forward
2017-09-13 13:19:41 -07:00
Ilya Kreymer
d1f8d8fdcb rewrite edge-case js proxy obj fixes:
server-side rewrite: rewrite '||this' but not '|||this'
client-side rewrite:
- check for null in rewrite_style()
- use proxy_to_obj() in postMessage(), open() rewrite overrides
2017-09-12 16:28:51 -07:00
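The '||this' rule in the commit above can be illustrated with a negative lookbehind. This is a minimal sketch, not pywb's actual rewriter: the function name `rewrite_or_this` and the replacement expression are assumptions made for illustration (only the `_WB_wombat_obj_proxy` name appears elsewhere in this log).

```python
import re

# Hypothetical replacement expression; the real rewriter may emit something different.
PROXY_THIS = "(this && this._WB_wombat_obj_proxy || this)"

# Match '||' followed by 'this', but only when the '||' is not preceded
# by another '|' (so '|||this' is left untouched).
OR_THIS = re.compile(r"(?<!\|)\|\|\s*this\b")


def rewrite_or_this(js_text):
    """Replace '||this' with '|| <proxy expression>' in a JS source string."""
    return OR_THIS.sub(lambda m: "|| " + PROXY_THIS, js_text)
```

The `\b` after `this` keeps identifiers such as `thisValue` from being rewritten.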
Ilya Kreymer
48b0b329d7 header rewriter improvements:
- enumerate standard headers, prefix only known headers, keep others (like Date)
- don't rewrite custom headers by default
typo fixes: fix typo in wombat.js, fix special case rewrite_dash() for fb
2017-09-11 18:49:41 -07:00
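The header policy described above (prefix only an enumerated set of headers, pass everything else through) might look roughly like the following. This is a sketch under assumptions: the `PREFIX_HEADERS` set, the `X-Archive-Orig-` prefix, and the function name `rewrite_headers` are illustrative and not taken from this commit.

```python
# Hypothetical subset of standard headers that should not be replayed verbatim.
PREFIX_HEADERS = {"content-security-policy", "strict-transport-security", "etag"}

# Assumed prefix for preserved original headers.
ORIG_PREFIX = "X-Archive-Orig-"


def rewrite_headers(headers):
    """Prefix enumerated headers; keep all others (e.g. Date, custom headers) as-is."""
    rewritten = []
    for name, value in headers:
        if name.lower() in PREFIX_HEADERS:
            rewritten.append((ORIG_PREFIX + name, value))
        else:
            rewritten.append((name, value))
    return rewritten
```

Because unknown names fall through to the `else` branch, custom headers are left untouched by default, matching the second bullet of the commit.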
Ilya Kreymer
5a0867fed9 LocalStorage/SessionStorage Overrides (#235)
* client-side rewrite: Custom LocalStorage/SessionStorage override:
- custom, in-mem only objects for localStorage and sessionStorage to avoid polluting browser storage, using Proxy if available to allow accessors
- storage event listeners tracked in addEventListener override, called directly with custom StorageEvent.
- storage event listener wrapped in SameOriginListener() to prevent notifying listeners from different origins

* addEventListener fix: prevents duplicate additions for wrapped listeners, for both message and storage
2017-09-06 23:14:48 -07:00
Ilya Kreymer
31dbbc4f05 client-side rewrite: add rewrite_script() to wrap generated script in proxy js obj wrapper, if Proxy exists 2017-09-06 22:58:25 -07:00
Ilya Kreymer
b22904e5f1 client-side rewrite fixes:
- don't rewrite already rewritten scheme-relative urls
- proxy obj wrapper: use getOwnPropertyDescriptor() from wrapped object, if exists, then from window
2017-09-06 22:29:18 -07:00
Ilya Kreymer
246940348f client-side rewrite: use element parser (instead of custom checks) to get absolute url for pushState/replaceState checks 2017-09-06 17:41:09 -07:00
Ilya Kreymer
fe55d7e895 client-side (wombat) fixes:
- anchor property override: don't set prop to "href"!
- frames override: catch exception (cross-origin access)
2017-09-02 12:53:52 -07:00
Ilya Kreymer
03b7cb4f28 client-side rewrite improvements:
- remove old createElement() override with non-standard param, which caused issues
- add HTMLFormElement.prototype.action override, now fully supported
2017-08-31 16:54:48 -07:00
Ilya Kreymer
9a47748296 Rewrite Fixes for JS Obj Proxy (#234)
js proxy obj server-side and client-side rewrite fixes:
server-side:
 - if rewriting '<newline>this', add ';' in case previous line has none
 - if peeking stream (to determine if html), ensure new wrapped content_stream used even if no rewriting
client-side (wombat js):
 - add object->proxy for EventTarget.target, proxy->object for Node.contains overrides
 - add missing return from overrides
 - override CSSStyleDeclaration.setProperty() to rewrite css property values which may be urls (getPropertyValue / property getters not unrewritten for now)
 - rewrite_style() convert with value.toString() if value is an object
2017-08-29 17:31:44 -07:00
Ilya Kreymer
da01d0b4e9 rewriting enhancements:
- server-side: if JS url contains 'callback=jQuery', treat as jsonp
- client-side: add full url if history change url starts with '#'
- client-side: override SVGImageElement setAttr / setAttrNS / getAttr / getAttrNS to rewrite setting "href" attribute (with or without namespace)
2017-08-25 16:53:52 -07:00
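The server-side jsonp heuristic in the first bullet amounts to a substring check on the request URL. A minimal sketch, assuming a hypothetical helper name `guess_text_type` and the type labels 'js' and 'jsonp':

```python
def guess_text_type(url, default="js"):
    """Treat a JS url as jsonp when it carries a jQuery-generated callback param."""
    if "callback=jQuery" in url:
        return "jsonp"
    return default
```

For example, `guess_text_type("http://example.com/api?callback=jQuery1234_5678")` returns `"jsonp"`, while a plain script url keeps the default type.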
Ilya Kreymer
ae703e6677 cleanup: content rewriter: don't try to resolve text type if already 'html' and 'mp_'/default mod
client-side rewrite: when checking history change, allow for relative urls also (convert to absolute)
2017-08-24 16:25:28 -07:00
Ilya Kreymer
f14bb7b6bf Wombat Improvements (#232)
* client-side rewrite (wombat) fixes:
- ensure make_parser() calls createElement() on associated document if rewriting within an element
- ensure host-relative urls are rewritten as host-relative, eg. a.href = "/path" stays host-relative when unrewritten

* head_insert: use request_ts instead of actual ts for client-side rewriting, consistent with server-side
2017-08-24 13:37:23 -07:00
Ilya Kreymer
b2f3a580c2 wombat work:
- for prototype override, ensure object exists
- for domain setter, ensure location exists, default to window
rules: expand facebook rule to match fbid also
2017-08-22 13:51:10 -07:00
Ilya Kreymer
7ddd3296ad client-side rewrite:
- override EventTarget.addEventListener/removeEventListener to ensure function called on actual object, not proxy
- add proxy_to_obj() to existing window.addEventListener/removeEventListener overrides
2017-08-22 12:22:02 -07:00
Ilya Kreymer
d0dafb268d client-side rewrite: add proxy-to-obj dereference for Document.createTreeWalker 2017-08-18 19:50:58 -07:00
Ilya Kreymer
bbe3cebd2f client side fixes for proxy obj:
- add general override_func_first_arg_proxy_to_obj() to dereference proxy->obj for first arg
- used for MutationObserver.observe() and Node.compareDocumentPosition() for now
2017-08-17 00:08:18 -07:00
Ilya Kreymer
9fdff8388e client-side override fix: first set window.devicePixelRatio to 1, also prevent from changing, if possible (catch exception) 2017-08-10 16:36:29 -07:00
Ilya Kreymer
ce3ba9e42e client-side rewrite: fix window.devicePixelRatio to 1 to ensure consistent replay (esp for video) 2017-08-10 16:13:17 -07:00
Ilya Kreymer
e9fa167564 wayback app: add support for root collection, specified as '$root' -- no other collections are supported if root collection is set
tests: add test_root_coll.py (move from unused tests)
wombat.js: proxy: fix typo in location access
2017-08-07 22:19:10 -07:00
Ilya Kreymer
33ba67646b JS proxy fix (#229)
* proxy access fixes:
- catch proxy access (in case cross-domain, eg. from service worker)
- document.location access falls back to defaultView._WB_wombat_location if not available
- use obj_to_proxy(), proxy_to_obj() wrappers access, catch exceptions
2017-08-07 20:00:30 -07:00
Ilya Kreymer
6db2a1161d client-side rewrite: improve rewrite_html(), use wrap html fragments … (#227)
client-side rewrite: improve rewrite_html(), wrap html fragments in <template> to avoid filtering out valid html, use existing system if full html starting with <html>/<body>/<head>. Addresses #138 in a better way
ensure WombatLocation.origin is always set using protocol/host, even if parser doesn't have it (ie and edge)
2017-08-07 16:46:27 -07:00
Ilya Kreymer
a6ab167dd3 JS Object Proxy Override System (#224)
* Init commit for Wombat JS Proxies off of https://github.com/ikreymer/pywb/tree/develop

Changes
- cli.py: add import os for os.chdir(self.r.directory)
- frontendapp.py: added initial support for cors requests.
- static_handler.py: add import for NotFoundException
- wbrequestresponse.py: added the initial implementation for cors requests, webrecorder needs this for recording!
- default_rewriter.py: added JSWombatProxyRewriter to default js rewriter class for internal testing
- html_rewriter.py: made JSWombatProxyRewriter to be default js rewriter class for internal testing
- regex_rewriters.py: implemented JSWombatProxyRewriter and JSWombatProxyRewriter to support wombat JS Proxy
- wombat.js: added JS Proxy support
- remove print

* wombat proxy: simplify mixin using 'first_buff'

* js local scope rewrite/proxy work:
- add DefaultHandlerWithJSProxy to enable new proxy rewrite (disabled by default)
- new proxy toggleable with 'js_local_scope_rewrite: true'
- work on integrating john's proxy work
- getAllOwnProps() to generate list of functions that need to be rebound
- remove non-proxy related changes for now, remove angular special cases (for now)

* local scope proxy work:
- add back __WB_pmw() prefix for postMessage
- don't override postMessage() in proxy obj
- MessageEvent resolve proxy to original window obj

* js obj proxy: use local_init() to load local vars from proxy obj

* wombat: js object proxy improvements:
- use same object '_WB_wombat_obj_proxy' on window and document objects
- reuse default_proxy_get() for get operation from window or document
- resolve any Window/Document object to the proxy, eg. if '_WB_wombat_obj_proxy' exists, return that
- override MessageEvent.source to return window proxy object

* obj proxy work:
- window proxy: defineProperty() override calls Reflect.defineProperty on dummy object as well as window to avoid exception
- window proxy: set() also sets on dummy object, and returns false if Reflect.set returns false (eg. altered by Reflect.defineProperty disabled writing)
- add override_prop_to_proxy() to add override to return proxy obj for attribute
- add override for Node.ownerDocument and HTMLElement.parentNode to return document proxy
server side rewrite: generalize local proxy insert, add list for local let overrides

* js obj proxy work:
- add default '__WB_pmw' to self if undefined (for service workers)
- document.origin override
- proxy obj: improved defineProperty override to work with safari
- proxy obj: catch any exception in dummy obj setter

* client-side rewriting:
- proxy obj: catch exception (such as cross-domain access) in own props init
- proxy obj: check for self reference '_WB_wombat_obj_proxy' access to avoid infinite recurse
- rewrite style: add 'cursor' attr for css url rewriting

* content rewriter: if is_ajax(), skip JS proxy obj rewriting also (html rewrite also skipped)

* client-side rewrite: rewrite 'data:text/css' as inline stylesheet when set via setAttribute() on 'href' in link

* client-side document override improvements:
- fix document.domain, document.referrer, forms add document.origin overrides to use only the document object
- init_doc_overrides() called as part of proxy init
- move non-document overrides to main init
rewrite: add rewrite for "Function('return this')" pattern to use proxy obj

* js obj proxy: now a per-collection (and even a per-request) setting 'use_js_obj_proxy' (defaults to False)
live-rewrite-server: defaults to enabled js obj proxy
metadata: get_metadata() loads metadata.yaml for config settings (for dynamic collections),
or collection config for static collections
warcserver: get_coll_config() returns config for static collection
tests: use custom test dir instead of default 'collections' dir
tests: add basic test for js obj proxy
update to warcio>=1.4.0

* karma tests: update to safari >10

* client-side rewrite:
- ensure wombat.js is ES5 compatible (don't use let)
- check if Proxy obj exists before attempting to init

* js proxy obj: RewriteWithProxyObj uses user-agent to determine if Proxy obj can be supported
content_rewriter: add overridable get_rewriter()
content_rewriter: fix elif -> if in should_rw_content()
tests: update js proxy obj test with different user agents (supported and unsupported)
karma: reset test to safari 9

* compatibility: remove shorthand notation from wombat.js

* js obj proxy: override MutationObserver.observe() to retrieve original object from proxy
wombat.js: cleanup, remove commented out code, label new proxy system functions, bump version to 2.40
2017-08-05 10:37:32 -07:00
Ilya Kreymer
d8b6ad3a31 client-side rewrite: rewrite_html() doesn't prefix/rewrite table tags (td/th/tr) for now, fixes issues caused by rewriting those tags 2017-07-24 21:50:43 +00:00
Ilya Kreymer
c88b843170 client rewrite: rewrite_html() ensure rewriting string! 2017-07-23 09:02:03 -07:00
Ilya Kreymer
9d86601aab client-side rewrite: for rewrite_html(), pre-rewrite problematic tags (FRAME/TD/TH/TR) that are filtered out if standalone, improves #138 2017-07-21 12:01:40 -07:00
Ilya Kreymer
64d05aca45 client-side (wombat): for now, fetch() always includes credentials (needed for WR, maybe should be optional?) 2017-07-21 11:49:28 -07:00
Ilya Kreymer
adab304f33 client-side rewrite: rewrite svg <image xlink:href> attr created via generated html 2017-07-11 18:24:35 -07:00
Ilya Kreymer
b3b843405a client-side (wombat) fix: postMessage() override was treating targetOrigin as hostname, instead of origin prefix.
Check if targetOrigin starts with the WB_wombat_location.origin in target window, prints via console.warn() otherwise.
2017-07-09 15:46:23 -07:00
Ilya Kreymer
1d7e5a73e5 client-side rewrite (wombat) improvements:
- <base> override applies for both set/get
- remove <base>-specific override, using generic 'href' rewriting for <base>
- add <meta> element 'content' rewriting (if url)
- refactor: remove REWRITE_ATTRS/equals_any, add should_rewrite_attr()
- should_rewrite_attr(tagName, attr) determines if attr should be rewritten for given tag
- bump version to 2.30
2017-07-08 12:44:22 -07:00
Ilya Kreymer
f0f274c0c9 wb_frame: allow "load" event to pushState() instead of replaceState() if window.pushStateOnLoad.
This is necessary to have working history when running in electron, which does not combine
iframe history into the top-frame history
2017-05-16 17:18:37 -07:00
Ilya Kreymer
d6cfb7cd2d wb_frame/wb.js: don't call push_state() if already on the current state,
eg. if two load events received for different readyState
add document.readyState to load event
2017-05-15 22:26:52 -07:00
Ilya Kreymer
296b4ed94d client-side rewrite: remove WB_wombat_ from any id/class= in document.write() 2017-05-03 15:31:06 -07:00
Ilya Kreymer
15a7b15d44 proxy mode support via rewriterapp!
- check for 'wsgiprox.fixed_host' and use that as host_prefix if set
- don't include Connection/Proxy-Connection headers in upstream request
- ensure proxy response has length or is chunk-encoded
2017-04-22 18:17:41 -07:00
Ilya Kreymer
4b055c9394 client-rewrite: support proper srcset= attr rewriting 2017-04-21 12:31:56 -07:00
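Rewriting a srcset= value means rewriting each candidate url while leaving its width/density descriptor alone. The actual change is in client-side JavaScript (wombat.js); the sketch below only illustrates the idea in Python. The function name `rewrite_srcset`, the replay prefix, and the simplified comma-plus-whitespace split are all assumptions, and urls that themselves contain ", " would need a more careful parser.

```python
import re

# Simplified separator: a comma followed by whitespace.
CANDIDATE_SPLIT = re.compile(r"\s*,\s+")


def rewrite_srcset(value, prefix):
    """Prefix every url in a srcset value, preserving descriptors such as '2x' or '480w'."""
    rewritten = []
    for candidate in CANDIDATE_SPLIT.split(value.strip()):
        if not candidate:
            continue
        # First token is the url; the optional remainder is the descriptor.
        parts = candidate.split(None, 1)
        rewritten.append(" ".join([prefix + parts[0]] + parts[1:]))
    return ", ".join(rewritten)
```

With a hypothetical prefix `/coll/2017mp_/`, the value `http://example.com/a.png 1x, http://example.com/b.png 2x` becomes `/coll/2017mp_/http://example.com/a.png 1x, /coll/2017mp_/http://example.com/b.png 2x`.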
Ilya Kreymer
3dd6c442ed client-side rewrite: unrewrite accessing Attr object value/nodeValue for href, src, poster attributes 2017-04-18 11:40:28 -07:00
Ilya Kreymer
8849eb494e client-side: init postMessage override on iframe access 2017-04-17 13:39:41 -07:00
Ilya Kreymer
0c833eb27e client-side rewrite fixes:
- rewrite-blob: more generic removal of postMessage override for worker scripts
- rewrite-style: wrap decodeURIComponent in exception handling
2017-04-15 23:37:07 -07:00
Ilya Kreymer
bae9a09671 client-side Date override: override 'constructor' property so 'new Date().constructor == Date' 2017-04-14 09:21:29 -07:00
Ilya Kreymer
a20480b9ab wombat rewrite: rewrite href="data:text/css" using rewrite_style()
rewrite_style fix: replace all 'WB_wombat_' in text, not just the first one
2017-03-21 11:17:15 -07:00
Ilya Kreymer
a82cfc1ab2 rewriter: add rewrite_dash for rewriting DASH and HLS manifests!
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
eg. allow more lenient fuzzy matching on DASH manifests than other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
2017-03-20 14:41:12 -07:00
Ilya Kreymer
1344907032 wombat fixes: message listener fixes for multiple listeners
- don't reject multiple listeners
- create new WrappedListener() obj for each listener
- extract_orig() add current scheme if url starts with '//'
2017-03-15 11:14:04 -07:00
Ilya Kreymer
93f26452e5 wombat fixes:
- add service worker rewrite
- add documentURI rewrite
- allow history change from "about:blank"
2017-03-14 18:28:18 -07:00
Ilya Kreymer
20e49c7391 karma fixes: avoid accessing undef var 2017-03-14 12:28:13 -07:00
Ilya Kreymer
e0878f0f67 wombat: reinit paths if inited via new window creation/iframe to reflect correct url!
refactor wombat into single _WBWombat object
2017-03-14 11:44:09 -07:00
Ilya Kreymer
57eba8fcde client side rewrite: add override for window.frames access 2017-03-12 09:47:29 -07:00
Ilya Kreymer
0784e4e5aa spin-off warcio!
update imports to point to warcio
warcio rename fixes:
- ArcWarcRecord.stream -> raw_stream
- ArcWarcRecord.status_headers -> http_headers
- ArchiveLoadFailed single param init
2017-03-07 10:58:00 -08:00
Ilya Kreymer
531422fc1b client-side rewrite improvements:
- add overrides for document.URL, xhr.responseURL, function for general single property override
- postMessage: add overrides for additional MessageEvent properties, target, srcElement, path, eventPhase
- postMessage: avoid duplicate event listeners registered
- check for duplicate postMessage override inits
2017-02-15 17:03:15 -08:00