- don't rewrite already rewritten scheme-relative urls
- proxy obj wrapper: use getOwnPropertyDescriptor() from wrapped object, if exists, than from window
- remove old createElement() override with non-standard param, which caused issues
- add HTMLFormElement.prototype.action override, now fully supported
js proxy obj server-side and client-side rewrite fixes:
server-side:
- if rewriting '<newline>this', add ';' in case previous line has none
- if peeking stream (to determine if html), ensure new wrapped content_stream used even if no rewriting
client-side (wombat js):
- add object->proxy for EventTarget.target, proxy->object for Node.contains overrides
- add missing return from overrides
- override CSSStyleDeclaration.setProperty() to rewrite css property values which may be urls (getPropertyValue / property getters not unrewritten for now)
- rewrite_style() convert with value.toString() if value is an object
- server-side: if JS url contains 'callback=jQuery', treat as jsonp
- client-side: add full url if history change url starts with '#'
- client-side: override SVGImageElement setAttr / setAttrNS / getAttr / getAttrNS to rewrite setting "href" attribute (with or without namespace)
- add special 'guess-none' and 'guess-bin' types for guessing content-type
- 'application/octet-stream' treated as 'guess-bin', treated as js or css if js_ or cs_
- tests: add tests for application/octet-stream detection, keeping charset
- guess-none applied for js_, cs_, as well as mp_ and default mod to guess html also
* client-side rewrite (wombat) fixes:
- ensure make_parser() calls createElement() on associated document if rewriting within an element
- ensure host-relative urls are rewritten as host-relative, eg.. a.href = "/path" stay host-relative when unrewritten
* head_insert: use request_ts instead of actual ts for client-side rewriting, consistent with server-side
- for prototype override, ensure object exists
- for domain setter, ensure location exists, default to window
rules: expand facebook rule to match fbid also
- override EventTarget.addEventListener/removeEventListener to ensure function called on actual object, not proxy
- add proxy_to_obj() to existing window.addEventListener/removeEventListener overrides
* separate default rules config for query matching: 'not_exts', 'mimes', and new 'url_normalize'
- regexes in 'url_normalize' applied on each cdx entry to see if there's a match with requested url
- jsonp: allow for '/* */' comments prefix in jsonp (experimental)
- fuzzy rule: add rule for '\w+=jquery[\d]+' collapsing, supports any callback name
- fuzzy rule: add rule for more generic 'cache busting' params, 'bust' in name, possible timestamp in value (experimental)
- fuzzy rule add: add ga utm_* rule & tests
tests: improve fuzzy matcher tests to use indexing system, test all new rules
tests: add jsonp_rewriter tests
config: use_js_obj_proxy=true in default config.yaml, setting added to each collection's metadata
- if content-type missing, resolve if text type by checking for html and modifier
- if text type has changed, set default JS and CSS text type
- if text type is html, ensure mime type is text/html (force xhtml mime type to text/html)
tests: add test_content_rewriter for direct header + content rewriting tests
- add general override_func_first_arg_proxy_to_obj() to dereference proxy->obj for first arg
- used for MutationObserver.observe() and Node.compareDocumentPosition() for now
- add tests for WBMementoIndexSource, member-list based RedisIndexSource
- convert redis aggregator and index source tests to use testutils BaseTestClass system
- rename configwarcserver -> warcserver
* proxy access fixes:
- catch proxy access (in case cross-domain, eg. from service worker)
- document.location access falls back to defaultView._WB_wombat_location if not available
- use obj_to_proxy(), proxy_to_obj() wrappers access, catch exceptions
- memento fixes, fully support memento pattern 2.2 api spec
- add timemap endpoints at /timemap/link/<url>, also /timemap/cdxj/<url>, /timemap/json/<url>
- include original and timemap links in Link header
- correct memento headers for timegate, timemap, memento
- support Accept-Datetime header for timegate
- Link rel="memento" includes canonical url, matches Content-Location url
- tests: update memento tests
client-side rewrite: improve rewrite_html(), use wrap html fragments in <template> to avoid filtering out valid html, use existing system if full html starting with <html>/<body>/<head>. Addresses #138 in a better way
ensure WombatLocation.origin is always set using protocol/host, even if parser doesn't have it (ie and edge)
* html rewriter: better rewrite of link preload, set wburl modifier to match preload type (js_ for js, cs_ for style, im_ for image, if_ for iframe, oe_ as default)
* tests: add tests for improved preload rewrite
windows build fixes: all tests should pass, ci with appveyor
- add appveyor.yml
- path fixes for windows, use os.path.join
- templates_dir: use '/' always for jinja2 paths
- auto colls: ensure chdir before deleting dir
- recorder: ensure warc writer is always closed
- recorder: disable locking in warcwriter on windows for now (read access not avail, shared
lock seems to not be working)
- zipnum: ensure block is closed after read!
- cached dir test: wait before adding file
- tests: adjust timeout tests to allow more leeway in timing
* Init commit for Wombat JS Proxies off of https://github.com/ikreymer/pywb/tree/develop
Changes
- cli.py: add import os for os.chdir(self.r.directory)
- frontendapp.py: added initial support for cors requests.
- static_handler.py: add import for NotFoundException
- wbrequestresponse.py: added the intital implementation for cors requests, webrecoder needs this for recording!
- default_rewriter.py: added JSWombatProxyRewriter to default js rewriter class for internal testing
- html_rewriter.py: made JSWombatProxyRewriter to be default js rewriter class for internal testing
- regex_rewriters.py: implemented JSWombatProxyRewriter and JSWombatProxyRewriter to support wombat JS Proxy
- wombat.js: added JS Proxy support
- remove print
* wombat proxy: simplify mixin using 'first_buff'
* js local scope rewrite/proxy work:
- add DefaultHandlerWithJSProxy to enable new proxy rewrite (disabled by default)
- new proxy toggleable with 'js_local_scope_rewrite: true'
- work on integrating john's proxy work
- getAllOwnProps() to generate list of functions that need to be rebound
- remove non-proxy related changes for now, remove angular special cases (for now)
* local scope proxy work:
- add back __WB_pmw() prefix for postMessage
- don't override postMessage() in proxy obj
- MessageEvent resolve proxy to original window obj
* js obj proxy: use local_init() to load local vars from proxy obj
* wombat: js object proxy improvements:
- use same object '_WB_wombat_obj_proxy' on window and document objects
- reuse default_proxy_get() for get operation from window or document
- resolve and Window/Document object to the proxy, eg. if '_WB_wombat_obj_proxy' exists, return that
- override MessageEvent.source to return window proxy object
* obj proxy work:
- window proxy: defineProperty() override calls Reflect.defineProperty on dummy object as well as window to avoid exception
- window proxy: set() also sets on dummy object, and returns false if Reflect.set returns false (eg. altered by Reflect.defineProperty disabled writing)
- add override_prop_to_proxy() to add override to return proxy obj for attribute
- add override for Node.ownerDocument and HTMLElement.parentNode to return document proxy
server side rewrite: generalize local proxy insert, add list for local let overrides
* js obj proxy work:
- add default '__WB_pmw' to self if undefined (for service workers)
- document.origin override
- proxy obj: improved defineProperty override to work with safari
- proxy obj: catch any exception in dummy obj setter
* client-side rewriting:
- proxy obj: catch exception (such as cross-domain access) in own props init
- proxy obj: check for self reference '_WB_wombat_obj_proxy' access to avoid infinite recurse
- rewrite style: add 'cursor' attr for css url rewriting
* content rewriter: if is_ajax(), skip JS proxy obj rewriting also (html rewrite also skipped)
* client-side rewrite: rewrite 'data:text/css' as inline stylesheet when set via setAttribute() on 'href' in link
* client-side document override improvements:
- fix document.domain, document.referrer, forms add document.origin overrides to use only the document object
- init_doc_overrides() called as part of proxy init
- move non-document overrides to main init
rewrite: add rewrite for "Function('return this')" pattern to use proxy obj
* js obj proxy: now a per-collection (and even a per-request) setting 'use_js_obj_prox' (defaults to False)
live-rewrite-server: defaults to enabled js obj proxy
metadata: get_metadata() loads metadata.yaml for config settings for dynamic collections),
or collection config for static collections
warcserver: get_coll_config() returns config for static collection
tests: use custom test dir instead of default 'collections' dir
tests: add basic test for js obj proxy
update to warcio>=1.4.0
* karma tests: update to safari >10
* client-side rewrite:
- ensure wombat.js is ES5 compatible (don't use let)
- check if Proxy obj exists before attempting to init
* js proxy obj: RewriteWithProxyObj uses user-agent to determine if Proxy obj can be supported
content_rewriter: add overridable get_rewriter()
content_rewriter: fix elif -> if in should_rw_content()
tests: update js proxy obj test with different user agents (supported and unsupported)
karma: reset test to safari 9
* compatibility: remove shorthand notation from wombat.js
* js obj proxy: override MutationObserver.observe() to retrieve original object from proxy
wombat.js: cleanup, remove commented out code, label new proxy system functions, bump version to 2.40
- add optional 'first_buff' defaulting to ''
- rename close() -> final_read()
- add rewrite_complete() for single-pass complete rewrite (including first buff and final_read()
- rewrite_text_stream_to_gen() uses first_buff, uses member funcs directly
- remove unused close() from other rewriters, only needed for HTMLParser interface
- <base> override applies for both set/get
- remove <base>-specific override, using generic 'href' rewriting for <base>
- add <meta> element 'content' rewriting (if url)
- refactor: remove REWRITE_ATTRS/equals_any, add should_rewrite_attr()
- should_rewrite_attr(tagName, attr) to determines if attr should be rewritten for given tag
- bump version to 2.30