* rewrite improvement: better srcset parsing for comma-separated urls
* extensive server-side tests for srcset rewriting (with and without spaces and extra srcset modifiers)
* compile regex once for improved performance
* same regex for server and client side rewriting
Work by @rebeccacremona
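
A minimal sketch of the srcset handling described above, assuming a single module-level compiled regex shared by all callers. The names `SRCSET_SPLIT` and `rewrite_srcset` and the `prefix` argument are hypothetical and are not taken from the actual rewriter:

```python
import re

# Compiled once at import time instead of on every call.
# Split on a comma that directly follows a width/density descriptor
# (e.g. "2x," or "480w,"), or on a comma followed by whitespace.
# A comma inside a url matches neither case and is left alone.
SRCSET_SPLIT = re.compile(r'(?<=[\d.][wx])\s*,\s*|\s*,\s+')


def rewrite_srcset(value, prefix):
    """Prefix every url in a srcset value, keeping its descriptors."""
    rewritten = []
    for candidate in SRCSET_SPLIT.split(value):
        parts = candidate.split(None, 1)
        if not parts:
            # empty piece, e.g. after a trailing comma
            continue
        url = prefix + parts[0]
        descriptor = parts[1].strip() if len(parts) > 1 else ''
        rewritten.append((url + ' ' + descriptor).strip())
    return ', '.join(rewritten)
```

This is only an approximation: a url that itself ends in something like "2x" right before a comma would still be split.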
rules system:
- 'mixin' class for adding custom rewrite mixin, initialized with optional 'mixin_params'
- 'force_type' to always force rewriting text type for rule match (eg. if application/octet-stream)
- fuzzy rewrite: 'find_all' mode for matching via regex.findall() instead of search()
- load_function moved to generic load_py_name
- new rules for fb!
- JSReplaceFuzzy mixin to replace content based on query (or POST) regex match
- tests: test JSReplaceFuzzy rewriting
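
To illustrate the difference the 'find_all' mode makes, here is a rough sketch. `fuzzy_match` and the example rule are hypothetical, and the sketch assumes the rule regex has exactly one capture group:

```python
import re


def fuzzy_match(pattern, url, find_all=False):
    """Return the pieces of `url` that a fuzzy rule would match on."""
    if find_all:
        # findall() collects every non-overlapping match in the url
        return pattern.findall(url)
    # search() stops at the first match
    match = pattern.search(url)
    return [match.group(1)] if match else []


# Hypothetical rule: match on 'id' query params only
RULE = re.compile(r'[?&](id=\d+)')
URL = 'http://example.com/page?id=1&x=2&id=3'
```

With the default mode only the first `id` param is picked up, while `find_all=True` returns both.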
query:
- append '?' for fuzzy matching if filters are set
- cdx['is_fuzzy'] set to '1' instead of True
client-side rewrite:
- add window.Request object rewrite
- improved rewrite of wb server + path, avoid double-slash
- fetch() rewrite proxy_to_obj()
- proxy_to_obj() null check
- WombatLocation prop change, skip if prop is the same
- add 'ww_rw' for injecting into web workers via importScripts(), added when loading web workers as blobs
- 'WB_wombat_location' override checks for defaultView more consistently if _WB_wombat_location is null/undefined
- custom overrides __WB_pmw, WB_wombat_frameElement just fail silently instead of raising exception on assignment
- client-side script: only rewrite if overridden objects are found in script text
- server-side inline js rewrite: only rewrite if overridden objects are found, don't insert before 'javascript:' marker
- tests: add improved tests for html js in attribute rewriting
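
The "only rewrite if overridden objects are found" check for inline js could look roughly like the sketch below. The list of names, `wrap_with_proxy()` and `rewrite_inline_js()` are assumptions made for illustration, not the actual rewriter code:

```python
import re

# Assumed list of overridden globals, for illustration only
OVERRIDE_NAMES = ('window', 'self', 'document', 'location',
                  'top', 'parent', 'frames', 'opener')
OVERRIDE_RX = re.compile(r'\b(?:' + '|'.join(OVERRIDE_NAMES) + r')\b')
JS_MARKER = 'javascript:'


def wrap_with_proxy(code):
    # Hypothetical stand-in for the real proxy/local-scope insert
    return '{ /*proxy*/ ' + code + ' }'


def rewrite_inline_js(value):
    """Rewrite js in an html attribute only when it is needed."""
    if not OVERRIDE_RX.search(value):
        # no overridden object is referenced: leave untouched
        return value
    if value.lower().startswith(JS_MARKER):
        # keep the 'javascript:' marker first, insert after it
        marker, code = value[:len(JS_MARKER)], value[len(JS_MARKER):]
        return marker + wrap_with_proxy(code)
    return wrap_with_proxy(value)
```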
refactor frame/head insert templates:
ContentFrame:
- content iframe inited with new ContentFrame() which creates iframe
- wb_frame.js: contains ContentFrame system for initing, updating, closing content frame for replayed content.
- wb_frame.js: supports 'app_prefix' and 'content_prefix' or default 'prefix' for replay content
- window.location.hash passed to init url.
- frame insert and head insert: simplify, remove 'wbrequest'
- frame insert: global wbinfo object no longer needed in top frame, each ContentFrame self-contained.
- wombat.js: next_parent() check does not assume wbinfo is present in top frame
- vidrw.js: only init if wbinfo is present
Banner:
- wb.js no longer needed, frame check/redirect folded into wombat.js
- default banner self-contained in default_banner.js/default_banner.css, handles both frame and frameless case
- rename wb.css -> default_banner.css
- banner html passed in as 'banner_html' variable to be optionally included, supports per collection banner html.
- templateview: BaseInsertView can accept an optional 'banner view', used by HeadInsertView and TopFrameView
Tests:
- tests: test_auto_colls uses shared app to test dynamic changes, testing both frame and non-frame access, added per-collection banner html check.
framed replay: history change simplifications
- simplify history changes for top frame, remove unused code
- only use 'replaceState' to replace top-frame url with current url, avoid adding new history entries
- use onpopstate to notify top frame, don't override go/back/forward
server-side rewrite: rewrite '||this' but not '|||this'
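
A sketch of how '||this' can be matched without also matching '|||this', using a negative lookbehind. `wrap_this(this)` is a hypothetical placeholder for whatever wrapper the rewriter actually inserts:

```python
import re

# Hypothetical replacement text, for illustration only
THIS_WRAPPER = 'wrap_this(this)'

# '||' not preceded by another '|', followed by the keyword 'this'
OR_THIS = re.compile(r'(?<!\|)\|\|\s*this\b')


def rewrite_or_this(js_text):
    """Wrap 'this' when it is the right-hand side of '||'."""
    return OR_THIS.sub('||' + THIS_WRAPPER, js_text)
```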
client-side rewrite:
- check for null in rewrite_style()
- use proxy_to_obj() in postMessage(), open() rewrite overrides
- enumerate standard headers, prefix only known headers, keep others (like Date)
- don't rewrite custom headers by default
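
One possible reading of the header handling above, as a sketch: only headers from an enumerated set get a prefix, everything else (Date, custom headers) passes through unchanged. The prefix string and the contents of `PREFIXED_HEADERS` are assumptions for illustration:

```python
# Assumed prefix and an illustrative subset of "known" headers
PREFIX = 'X-Archive-Orig-'
PREFIXED_HEADERS = {
    'content-length',
    'content-security-policy',
    'strict-transport-security',
    'transfer-encoding',
}


def rewrite_headers(headers):
    """Prefix known headers, keep all other headers as-is."""
    result = []
    for name, value in headers:
        if name.lower() in PREFIXED_HEADERS:
            result.append((PREFIX + name, value))
        else:
            # unknown or custom header: not rewritten by default
            result.append((name, value))
    return result
```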
typo fixes: fix typo in wombat.js, fix special case rewrite_dash() for fb
* client-side rewrite: Custom LocalStorage/SessionStorage override:
- custom, in-mem only objects for localStorage and sessionStorage to avoid polluting browser storage, using Proxy if available to allow accessors
- storage event listeners tracked in addEventListener override, called directly with custom StorageEvent.
- storage event listener wrapped in SameOriginListener() to prevent notifying listeners from different origins
* addEventListener fix: prevents duplicate additions for wrapped listeners, for both message and storage
- don't rewrite already rewritten scheme-relative urls
- proxy obj wrapper: use getOwnPropertyDescriptor() from wrapped object if it exists, otherwise from window
- remove old createElement() override with non-standard param, which caused issues
- add HTMLFormElement.prototype.action override, now fully supported
js proxy obj server-side and client-side rewrite fixes:
server-side:
- if rewriting '<newline>this', add ';' in case previous line has none
- if peeking stream (to determine if html), ensure new wrapped content_stream used even if no rewriting
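
The stream peeking fix can be pictured with the sketch below: once bytes have been read to sniff the content, the wrapped stream has to be used from then on, even when no rewriting happens, or the peeked bytes are lost. `PeekedStream` and `looks_like_html()` are hypothetical names:

```python
import io


class PeekedStream:
    """Wrap a stream so bytes consumed while peeking are replayed."""

    def __init__(self, stream, peek_size=128):
        self.peeked = stream.read(peek_size)  # kept for sniffing
        self._pending = self.peeked           # still owed to the reader
        self._stream = stream

    def read(self, size=-1):
        if size is None or size < 0:
            data = self._pending + self._stream.read()
            self._pending = b''
            return data
        data = self._pending[:size]
        self._pending = self._pending[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


def looks_like_html(head):
    # Crude sniffing on the peeked bytes, for illustration only
    head = head.lstrip().lower()
    return head.startswith((b'<!doctype html', b'<html', b'<head', b'<body'))
```

Passing the original stream downstream after peeking would silently drop the first `peek_size` bytes, which is the case this fix guards against.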
client-side (wombat js):
- add object->proxy for EventTarget.target, proxy->object for Node.contains overrides
- add missing return from overrides
- override CSSStyleDeclaration.setProperty() to rewrite css property values which may be urls (getPropertyValue / property getters not unrewritten for now)
- rewrite_style() convert with value.toString() if value is an object
- server-side: if JS url contains 'callback=jQuery', treat as jsonp
- client-side: add full url if history change url starts with '#'
- client-side: override SVGImageElement setAttr / setAttrNS / getAttr / getAttrNS to rewrite setting "href" attribute (with or without namespace)
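
For the jsonp detection mentioned above ('callback=jQuery' in a JS url), a minimal sketch might be the following. `jsonp_callback()` and `js_text_type()` are hypothetical helpers, and the returned type names are assumptions:

```python
def jsonp_callback(url):
    """Return the jQuery callback name from the url, or None."""
    marker = 'callback='
    idx = url.find(marker + 'jQuery')
    if idx < 0:
        return None
    # take everything after 'callback=' up to the next param
    return url[idx + len(marker):].split('&', 1)[0]


def js_text_type(url):
    # treat the response as jsonp only when a jQuery callback is present
    return 'jsonp' if jsonp_callback(url) else 'js'
```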
* client-side rewrite (wombat) fixes:
- ensure make_parser() calls createElement() on associated document if rewriting within an element
- ensure host-relative urls are rewritten as host-relative, eg. a.href = "/path" stays host-relative when unrewritten
* head_insert: use request_ts instead of actual ts for client-side rewriting, consistent with server-side
- for prototype override, ensure object exists
- for domain setter, ensure location exists, default to window
rules: expand facebook rule to match fbid also
- override EventTarget.addEventListener/removeEventListener to ensure function called on actual object, not proxy
- add proxy_to_obj() to existing window.addEventListener/removeEventListener overrides
- add general override_func_first_arg_proxy_to_obj() to dereference proxy->obj for first arg
- used for MutationObserver.observe() and Node.compareDocumentPosition() for now
* proxy access fixes:
- catch proxy access (in case cross-domain, eg. from service worker)
- document.location access falls back to defaultView._WB_wombat_location if not available
- use obj_to_proxy(), proxy_to_obj() wrappers access, catch exceptions
client-side rewrite: improve rewrite_html(): wrap html fragments in <template> to avoid filtering out valid html, use existing system if full html starting with <html>/<body>/<head>. Addresses #138 in a better way
ensure WombatLocation.origin is always set using protocol/host, even if parser doesn't have it (IE and Edge)
* Init commit for Wombat JS Proxies off of https://github.com/ikreymer/pywb/tree/develop
Changes
- cli.py: add import os for os.chdir(self.r.directory)
- frontendapp.py: added initial support for cors requests.
- static_handler.py: add import for NotFoundException
- wbrequestresponse.py: added the initial implementation for cors requests, webrecorder needs this for recording!
- default_rewriter.py: added JSWombatProxyRewriter to default js rewriter class for internal testing
- html_rewriter.py: made JSWombatProxyRewriter to be default js rewriter class for internal testing
- regex_rewriters.py: implemented JSWombatProxyRewriter and JSWombatProxyRewriter to support wombat JS Proxy
- wombat.js: added JS Proxy support
- remove print
* wombat proxy: simplify mixin using 'first_buff'
* js local scope rewrite/proxy work:
- add DefaultHandlerWithJSProxy to enable new proxy rewrite (disabled by default)
- new proxy toggleable with 'js_local_scope_rewrite: true'
- work on integrating john's proxy work
- getAllOwnProps() to generate list of functions that need to be rebound
- remove non-proxy related changes for now, remove angular special cases (for now)
* local scope proxy work:
- add back __WB_pmw() prefix for postMessage
- don't override postMessage() in proxy obj
- MessageEvent resolve proxy to original window obj
* js obj proxy: use local_init() to load local vars from proxy obj
* wombat: js object proxy improvements:
- use same object '_WB_wombat_obj_proxy' on window and document objects
- reuse default_proxy_get() for get operation from window or document
- resolve any Window/Document object to the proxy, eg. if '_WB_wombat_obj_proxy' exists, return that
- override MessageEvent.source to return window proxy object
* obj proxy work:
- window proxy: defineProperty() override calls Reflect.defineProperty on dummy object as well as window to avoid exception
- window proxy: set() also sets on dummy object, and returns false if Reflect.set returns false (eg. altered by Reflect.defineProperty disabled writing)
- add override_prop_to_proxy() to add override to return proxy obj for attribute
- add override for Node.ownerDocument and HTMLElement.parentNode to return document proxy
server side rewrite: generalize local proxy insert, add list for local let overrides
* js obj proxy work:
- add default '__WB_pmw' to self if undefined (for service workers)
- document.origin override
- proxy obj: improved defineProperty override to work with safari
- proxy obj: catch any exception in dummy obj setter
* client-side rewriting:
- proxy obj: catch exception (such as cross-domain access) in own props init
- proxy obj: check for self reference '_WB_wombat_obj_proxy' access to avoid infinite recurse
- rewrite style: add 'cursor' attr for css url rewriting
* content rewriter: if is_ajax(), skip JS proxy obj rewriting also (html rewrite also skipped)
* client-side rewrite: rewrite 'data:text/css' as inline stylesheet when set via setAttribute() on 'href' in link
* client-side document override improvements:
- fix document.domain, document.referrer, forms and document.origin overrides to use only the document object
- init_doc_overrides() called as part of proxy init
- move non-document overrides to main init
rewrite: add rewrite for "Function('return this')" pattern to use proxy obj
* js obj proxy: now a per-collection (and even a per-request) setting 'use_js_obj_proxy' (defaults to False)
live-rewrite-server: defaults to enabled js obj proxy
metadata: get_metadata() loads metadata.yaml for config settings for dynamic collections,
or collection config for static collections
warcserver: get_coll_config() returns config for static collection
tests: use custom test dir instead of default 'collections' dir
tests: add basic test for js obj proxy
update to warcio>=1.4.0
* karma tests: update to safari >10
* client-side rewrite:
- ensure wombat.js is ES5 compatible (don't use let)
- check if Proxy obj exists before attempting to init
* js proxy obj: RewriteWithProxyObj uses user-agent to determine if Proxy obj can be supported
content_rewriter: add overridable get_rewriter()
content_rewriter: fix elif -> if in should_rw_content()
tests: update js proxy obj test with different user agents (supported and unsupported)
karma: reset test to safari 9
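
A rough sketch of a user-agent based check for Proxy support, in the spirit of the RewriteWithProxyObj change above. The regexes, the version table and `supports_js_proxy()` are assumptions for illustration. Unknown or missing user agents are treated as unsupported here, which is a design choice of the sketch:

```python
import re

# Minimum major versions assumed to support the JS Proxy object
MIN_PROXY_VERSIONS = {'edge': 12, 'chrome': 49, 'firefox': 18, 'safari': 10}

# Order matters: Edge UAs also contain "Chrome/" and "Safari/",
# and Chrome UAs also contain "Safari/".
UA_PATTERNS = [
    ('edge', re.compile(r'Edge/(\d+)')),
    ('chrome', re.compile(r'Chrome/(\d+)')),
    ('firefox', re.compile(r'Firefox/(\d+)')),
    ('safari', re.compile(r'Version/(\d+).*Safari/')),
]


def supports_js_proxy(user_agent):
    """Guess from the user-agent whether JS Proxy is available."""
    for name, rx in UA_PATTERNS:
        match = rx.search(user_agent or '')
        if match:
            return int(match.group(1)) >= MIN_PROXY_VERSIONS[name]
    # unknown browser: fall back to the non-proxy rewrite
    return False
```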
* compatibility: remove shorthand notation from wombat.js
* js obj proxy: override MutationObserver.observe() to retrieve original object from proxy
wombat.js: cleanup, remove commented out code, label new proxy system functions, bump version to 2.40
- <base> override applies for both set/get
- remove <base>-specific override, using generic 'href' rewriting for <base>
- add <meta> element 'content' rewriting (if url)
- refactor: remove REWRITE_ATTRS/equals_any, add should_rewrite_attr()
- should_rewrite_attr(tagName, attr) to determine if attr should be rewritten for given tag
- bump version to 2.30
- check for 'wsgiprox.fixed_host' and use that as host_prefix if set
- don't include Connection/Proxy-Connection headers in upstream request
- ensure proxy response has length or is chunk-encoded
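
The two proxy-mode header changes above could be sketched like this. `upstream_headers()` and `ensure_framed()` are hypothetical helpers operating on plain (name, value) lists, not the actual proxy code:

```python
# Connection-level headers that should not be forwarded upstream
SKIP_UPSTREAM = ('connection', 'proxy-connection')


def upstream_headers(headers):
    """Drop Connection/Proxy-Connection before the upstream request."""
    return [(name, value) for name, value in headers
            if name.lower() not in SKIP_UPSTREAM]


def ensure_framed(headers):
    """Make sure a response has a length or is chunk-encoded."""
    lowered = {name.lower(): value for name, value in headers}
    has_length = 'content-length' in lowered
    is_chunked = 'chunked' in lowered.get('transfer-encoding', '').lower()
    if has_length or is_chunked:
        return list(headers)
    # The caller must then actually chunk-encode the body it sends
    return list(headers) + [('Transfer-Encoding', 'chunked')]
```

Without either a length or chunked encoding, a client on a keep-alive proxy connection cannot tell where the response body ends.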