mirror of https://github.com/webrecorder/pywb.git synced 2025-03-20 18:59:11 +01:00

471 Commits

Author SHA1 Message Date
John Berlin
9c5673968c wombat: improved the fetch override to ensure that a live leak does not occur when input is an instance of WombatLocation or URL, will also handle any object that has href (#276) 2018-01-08 16:08:21 -08:00
Rebecca Lynn Cremona
d3b379e788 Improved rewriting of srcset image urls; handle urls with commas (#269)
* rewrite improvement: better srcset parsing for comma-separated urls

* extensive server-side tests for srcset rewriting (with and without spaces and extra srcset modifiers)

* compile regex once for improved performance

* same regex for server and client side rewriting

Work by @rebeccacremona
2018-01-05 12:24:52 -08:00
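The srcset handling described above can be illustrated with a minimal Python sketch. It does not reproduce pywb's regex. Instead it uses a character-level tokenizer modeled on the HTML spec's srcset algorithm: a candidate ends at a comma that follows its descriptor, or at trailing commas on the URL itself, so a comma inside a URL such as `b,c.jpg` survives. The names `parse_srcset` and `rewrite_srcset` are hypothetical.

```python
from typing import Callable, List, Tuple


def parse_srcset(value: str) -> List[Tuple[str, str]]:
    """Split a srcset value into (url, descriptor) pairs (hypothetical helper)."""
    candidates = []
    pos, n = 0, len(value)
    while pos < n:
        # Skip whitespace and stray commas between candidates.
        while pos < n and (value[pos].isspace() or value[pos] == ','):
            pos += 1
        if pos >= n:
            break
        # The URL is a run of non-whitespace characters (commas allowed inside).
        start = pos
        while pos < n and not value[pos].isspace():
            pos += 1
        url, descriptor = value[start:pos], ''
        if url.endswith(','):
            # Trailing commas end the candidate; there is no descriptor.
            url = url.rstrip(',')
        else:
            # The descriptor runs to the next comma outside parentheses.
            start, depth = pos, 0
            while pos < n:
                c = value[pos]
                if c == '(':
                    depth += 1
                elif c == ')' and depth:
                    depth -= 1
                elif c == ',' and depth == 0:
                    break
                pos += 1
            descriptor = value[start:pos].strip()
            pos += 1  # step past the separating comma
        candidates.append((url, descriptor))
    return candidates


def rewrite_srcset(value: str, rewrite: Callable[[str], str]) -> str:
    """Rewrite each candidate URL, keeping its descriptor."""
    parts = [rewrite(url) + (' ' + desc if desc else '')
             for url, desc in parse_srcset(value)]
    return ', '.join(parts)
```

For example, `rewrite_srcset("a.jpg 1x, b,c.jpg 2x", lambda u: "/web/" + u)` yields `/web/a.jpg 1x, /web/b,c.jpg 2x`.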
Ilya Kreymer
efda3df640 client-side rewrite:
- rewrite style '@import' directives
- don't rewrite <input value> attributes
- cleanup, remove obsolete data
2017-11-21 17:58:06 -08:00
Ilya Kreymer
1bb1a32ee1 client-side rewrite:
- rewrite Audio() constructor
- unrewrite innerHTML, outerHTML accessors
- rewrite DocumentFragments
rules: add rules for readspeaker
2017-11-21 08:02:50 -08:00
Ilya Kreymer
5cc1e60048 client-side rewrite: add <img> srcset attribute override 2017-11-07 19:52:07 -08:00
Ilya Kreymer
7ed9275446 rewrite improvement: add custom rewrite for 'location =' with '__WB_check_loc(location).href' to check if actually changing location at runtime, replacing fixed 'WB_wombat_' prefix 2017-11-06 22:52:19 -08:00
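A rough Python sketch of the 'location =' rewrite, assuming a regex is enough for illustration. A bare `location` assignment (not a property such as `window.location`, and not a `==`/`===` comparison) becomes `__WB_check_loc(location).href =`. The pattern and the helper name `rewrite_location_assign` are assumptions. A regex like this cannot tell a real assignment from matching text inside a string literal.

```python
import re

# Hypothetical pattern: 'location' not preceded by an identifier char or '.',
# followed by an '=' that is not the start of '==' or '==='.
LOC_ASSIGN = re.compile(r'(?<![\w$.])location\s*=(?!=)')


def rewrite_location_assign(js: str) -> str:
    """Route bare 'location = ...' assignments through __WB_check_loc()."""
    return LOC_ASSIGN.sub('__WB_check_loc(location).href =', js)


print(rewrite_location_assign('location = "/next.html";'))
# prints: __WB_check_loc(location).href = "/next.html";
```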
Ilya Kreymer
f34970c5ec client-side rewrite fixes:
- frameElement override returns 'null' instead of 'undefined'
- remove unused WB_wombat_frameElement
- add deproxy wrapper for setTimeout, setInterval
- add 'outerHTML' rewrite
2017-11-03 18:08:35 -07:00
Ilya Kreymer
93b3b95664 client-side rewrite: add custom FuncMap() wrapper for func->func associative array, for handling message and storage event mapping, instead of using functions as keys, use function equality only to compare. fixes events not being fired due to different function objects treated as same object 2017-11-02 16:57:55 -07:00
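The FuncMap idea can be sketched in Python as a list of (function, wrapper) pairs searched by identity. In JavaScript, using functions as keys of a plain object coerces them to strings, so two distinct listeners with identical source text can collide; scanning a list with `===` presumably avoids that. A Python dict would not have this problem, so the sketch is only a structural illustration, and the class and method names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


class FuncMap:
    """Hypothetical list-backed map from a listener to its wrapper."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[Callable, object]] = []

    def find(self, fn: Callable) -> Optional[object]:
        # Compare by identity only, like '===' on JS function objects.
        for key, value in self._pairs:
            if key is fn:
                return value
        return None

    def add_or_get(self, fn: Callable, make_wrapper: Callable) -> object:
        # Reuse the existing wrapper so the same listener is not wrapped twice.
        existing = self.find(fn)
        if existing is not None:
            return existing
        wrapper = make_wrapper(fn)
        self._pairs.append((fn, wrapper))
        return wrapper

    def remove(self, fn: Callable) -> Optional[object]:
        # Return the wrapper so the caller can unregister it.
        for i, (key, value) in enumerate(self._pairs):
            if key is fn:
                del self._pairs[i]
                return value
        return None
```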
Ilya Kreymer
bcbc00a89b
Fuzzy Rewrite Improvements (#263)
rules system:
- 'mixin' class for adding custom rewrite mixin, initialized with optional 'mixin_params'
- 'force_type' to always force rewriting text type for rule match (eg. if application/octet-stream)
- fuzzy rewrite: 'find_all' mode for matching via regex.findall() instead of search()
- load_function moved to generic load_py_name
- new rules for fb!
- JSReplaceFuzzy mixin to replace content based on query (or POST) regex match
- tests: tests JSReplaceFuzzy rewriting

query:
- append '?' for fuzzy matching if filters are set
- cdx['is_fuzzy'] set to '1' instead of True

client-side: rewrite
- add window.Request object rewrite
- improved rewrite of wb server + path, avoid double-slash
- fetch() rewrite proxy_to_obj()
- proxy_to_obj() null check
- WombatLocation prop change, skip if prop is the same
2017-10-31 20:35:29 -07:00
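The difference between the default fuzzy-match mode and the new 'find_all' mode can be shown with Python's `re` module: `search()` stops at the first match, while `findall()` collects every match. The helper name `fuzzy_key` and the example pattern are hypothetical and do not follow pywb's rules format.

```python
import re
from typing import List


def fuzzy_key(url: str, pattern: str, find_all: bool = False) -> List:
    """Collect the parts of a URL used for fuzzy matching (hypothetical)."""
    regex = re.compile(pattern)
    if find_all:
        # Every match in the URL contributes to the key.
        return regex.findall(url)
    # Default mode: only the groups of the first match are used.
    match = regex.search(url)
    return list(match.groups()) if match else []
```

With the pattern `[?&](\w+)=`, `search()` sees only the first query parameter name, while `findall()` returns all of them.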
Ilya Kreymer
520ee35081
client-side rewrite: (#262)
- add 'ww_rw' for injecting into webworkers via importScript() added when loading web workers as blobs
- 'WB_wombat_location' override checks for defaultView more consistently if _WB_wombat_location is null/undefined
- custom overrides __WB_pmw, WB_wombat_frameElement just fail silently instead of raising exception on assignment
2017-10-30 18:54:13 -07:00
Ilya Kreymer
456ac09b62 rewriting fixes:
- wburl: escape any '#' -> '%23' (presumably unescaped by wsgi), add tests
- wombat: call proxy_to_obj() for overridden property accessors
2017-10-19 15:41:32 -07:00
Ilya Kreymer
70a09e2804 js insert rewrite improvements:
- client-side script: only rewrite if overridden objects are found in script text
- server-side inline js rewrite: only rewrite if overridden objects are found, don't insert before 'javascript:' marker
- tests: add improved tests for html js in attribute rewriting
2017-10-18 10:51:24 -07:00
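The "only rewrite if overridden objects are found" check can be sketched as one precompiled word-boundary regex run before the more expensive rewrite. The list of names below is an assumption chosen for illustration, not the exact set pywb or wombat.js checks. A plain regex also matches names inside strings or comments; such a false positive only costs an unnecessary rewrite.

```python
import re

# Hypothetical set of globals that the proxy rewrite cares about.
OVERRIDE_NAMES = ('window', 'self', 'document', 'location',
                  'top', 'parent', 'frames', 'opener')

NEEDS_REWRITE = re.compile(r'\b(?:' + '|'.join(OVERRIDE_NAMES) + r')\b')


def needs_rewrite(script_text: str) -> bool:
    """Cheap pre-check: skip rewriting scripts that never mention the names."""
    return NEEDS_REWRITE.search(script_text) is not None
```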
Ilya Kreymer
aa0a019567 Frame insert refactor (#246)
refactor frame/head insert templates:
ContentFrame:
- content iframe inited with new ContentFrame() which creates iframe
- wb_frame.js: contains ContentFrame system for initing, updating, closing content frame for replayed content.
- wb_frame.js: supports 'app_prefix' and 'content_prefix' or default 'prefix' for replay content
- window.location.hash added to init url.
- frame insert and head insert: simplify, remove 'wbrequest'
- frame insert: global wbinfo object no longer needed in top frame, each ContentFrame self-contained.
- wombat.js: next_parent() check does not assume wbinfo is present in top frame
- vidrw.js: only init if wbinfo is present

Banner:
- wb.js no longer needed, frame check/redirect folded into wombat.js
- default banner self-contained in default_banner.js/default_banner.css, handles both frame and frameless case
- rename wb.css -> default_banner.css
- banner html passed in as 'banner_html' variable to be optionally included, supports per collection banner html.
- templateview: BaseInsertView can accept an optional 'banner view', used by HeadInsertView and TopFrameView

Tests:
- tests: test_auto_colls uses shared app to test dynamic changes, testing both frame and non-frame access, added per-collection banner html check.
2017-09-30 21:09:38 -07:00
Ilya Kreymer
cd272013b8 client-side rewrite: fix override_func_this_proxy_to_obj() for unsupported/undefined objects (just ignore) 2017-09-14 21:46:48 -07:00
Ilya Kreymer
ba6d0245a5 client-side rewrite: add proxy->obj this for getComputedStyle() function 2017-09-14 21:05:39 -07:00
Ilya Kreymer
71a5853334 History Change Simplification (#240)
framed replay: history change simplifications
- simplify history changes for top frame, remove unused code
- only use 'replaceState' to replace top-frame url with current url, avoid adding new history entries
- use onpopstate to notify top frame, don't override go/back/forward
2017-09-13 13:19:41 -07:00
Ilya Kreymer
d1f8d8fdcb rewrite edge-case js proxy obj fixes:
server-side rewrite: rewrite '||this' but not '|||this'
client-side rewrite:
- check for null in rewrite_style()
- use proxy_to_obj() in postMessage(), open() rewrite overrides
2017-09-12 16:28:51 -07:00
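A minimal sketch of the '||this' rule: a negative lookbehind ensures the '||' is not preceded by a third '|', so '|||this' (for example inside a regex literal) is left alone. The wrapper name `__hypothetical_proxy` is a placeholder for whatever expression the real rewriter inserts, and `rewrite_or_this` is likewise hypothetical.

```python
import re

# '||', optional whitespace, then the keyword 'this', where the '||'
# is not itself preceded by another '|'.
OR_THIS = re.compile(r'(?<!\|)\|\|(\s*)this\b')


def rewrite_or_this(js: str, wrapper: str = '__hypothetical_proxy') -> str:
    """Wrap 'this' after a logical OR, skipping runs of three or more pipes."""
    return OR_THIS.sub(lambda m: '||' + m.group(1) + wrapper + '(this)', js)
```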
Ilya Kreymer
48b0b329d7 header rewriter improvements:
- enumerate standard headers, prefix only known headers, keep others (like Date)
- don't rewrite custom headers by default
typo fixes: fix typo in wombat.js, fix special case rewrite_dash() for fb
2017-09-11 18:49:41 -07:00
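A hedged Python sketch of table-driven header rewriting: a table of known headers decides whether each one is kept, prefixed, or URL-rewritten, and anything not in the table (such as custom headers) passes through untouched. Both the rule table and the `X-Archive-Orig-` prefix are assumptions for illustration, not pywb's actual configuration.

```python
from typing import Callable, List, Tuple

Headers = List[Tuple[str, str]]

# Hypothetical rules for a few standard headers (lowercased names).
HEADER_RULES = {
    'content-type': 'keep',
    'date': 'keep',
    'content-length': 'prefix',
    'content-encoding': 'prefix',
    'transfer-encoding': 'prefix',
    'location': 'url-rewrite',
    'content-location': 'url-rewrite',
}


def rewrite_headers(headers: Headers, rewrite_url: Callable[[str], str],
                    prefix: str = 'X-Archive-Orig-') -> Headers:
    out = []
    for name, value in headers:
        rule = HEADER_RULES.get(name.lower())
        if rule == 'prefix':
            # Preserve the original value under a prefixed name.
            out.append((prefix + name, value))
        elif rule == 'url-rewrite':
            out.append((name, rewrite_url(value)))
        else:
            # 'keep' rules and unknown/custom headers pass through unchanged.
            out.append((name, value))
    return out
```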
Ilya Kreymer
5a0867fed9 LocalStorage/SessionStorage Overrides (#235)
* client-side rewrite: Custom LocalStorage/SessionStorage override:
- custom, in-mem only objects for localStorage and sessionStorage to avoid polluting browser storage, using Proxy if available to allow accessors
- storage event listeners tracked in addEventListener override, called directly with custom StorageEvent.
- storage event listener wrapped in SameOriginListener() to prevent notifying listeners from different origins

* addEventListener fix: prevents duplicate additions for wrapped listeners, for both message and storage
2017-09-06 23:14:48 -07:00
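A Python sketch of the in-memory storage idea, under stated assumptions: values live only in a dict (never in real browser storage), each listener is wrapped so it only fires for events from its own origin, and re-adding the same listener is a no-op. `InMemoryStorage`, `StorageEvent`, and `same_origin_listener` are hypothetical names that mirror, but do not reproduce, the wombat.js objects. In a real browser the storage event fires in other windows; here listeners are called directly, as the commit message describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class StorageEvent:
    key: Optional[str]
    old_value: Optional[str]
    new_value: Optional[str]
    origin: str


def same_origin_listener(fn: Callable, listener_origin: str) -> Callable:
    """Only forward events whose origin matches the listener's origin."""
    def wrapped(event: StorageEvent) -> None:
        if event.origin == listener_origin:
            fn(event)
    return wrapped


class InMemoryStorage:
    def __init__(self, origin: str) -> None:
        self.origin = origin
        self._data: Dict[str, str] = {}
        self._listeners: Dict[Callable, Callable] = {}  # original -> wrapper

    def add_listener(self, fn: Callable, listener_origin: str) -> None:
        # Adding the same function twice does not create a second wrapper.
        if fn not in self._listeners:
            self._listeners[fn] = same_origin_listener(fn, listener_origin)

    def remove_listener(self, fn: Callable) -> None:
        self._listeners.pop(fn, None)

    def _fire(self, key, old, new) -> None:
        event = StorageEvent(key, old, new, self.origin)
        for wrapped in list(self._listeners.values()):
            wrapped(event)

    def get_item(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set_item(self, key: str, value) -> None:
        old, new = self._data.get(key), str(value)  # storage values are strings
        self._data[key] = new
        self._fire(key, old, new)

    def remove_item(self, key: str) -> None:
        self._fire(key, self._data.pop(key, None), None)

    def clear(self) -> None:
        self._data.clear()
        self._fire(None, None, None)

    def __len__(self) -> int:
        return len(self._data)
```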
Ilya Kreymer
31dbbc4f05 client-side rewrite: add rewrite_script() to wrap generated script in proxy js obj wrapper, if Proxy exists 2017-09-06 22:58:25 -07:00
Ilya Kreymer
b22904e5f1 client-side rewrite fixes:
- don't rewrite already rewritten scheme-relative urls
- proxy obj wrapper: use getOwnPropertyDescriptor() from wrapped object, if it exists, otherwise from window
2017-09-06 22:29:18 -07:00
Ilya Kreymer
246940348f client-side rewrite: use element parser (instead of custom checks) to get absolute url for pushState/replaceState checks 2017-09-06 17:41:09 -07:00
Ilya Kreymer
fe55d7e895 client-side (wombat) fixes:
- anchor property override: don't set prop to "href"!
- frames override: catch exception (cross-origin access)
2017-09-02 12:53:52 -07:00
Ilya Kreymer
03b7cb4f28 client-side rewrite improvements:
- remove old createElement() override with non-standard param, which caused issues
- add HTMLFormElement.prototype.action override, now fully supported
2017-08-31 16:54:48 -07:00
Ilya Kreymer
9a47748296 Rewrite Fixes for JS Obj Proxy (#234)
js proxy obj server-side and client-side rewrite fixes:
server-side:
 - if rewriting '<newline>this', add ';' in case previous line has none
 - if peeking stream (to determine if html), ensure new wrapped content_stream used even if no rewriting
client-side (wombat js):
 - add object->proxy for EventTarget.target, proxy->object for Node.contains overrides
 - add missing return from overrides
 - override CSSStyleDeclaration.setProperty() to rewrite css property values which may be urls (getPropertyValue / property getters not unrewritten for now)
 - rewrite_style() convert with value.toString() if value is an object
2017-08-29 17:31:44 -07:00
Ilya Kreymer
da01d0b4e9 rewriting enhancements:
- server-side: if JS url contains 'callback=jQuery', treat as jsonp
- client-side: add full url if history change url starts with '#'
- client-side: override SVGImageElement setAttr / setAttrNS / getAttr / getAttrNS to rewrite setting "href" attribute (with or without namespace)
2017-08-25 16:53:52 -07:00
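A hedged sketch of the 'callback=jQuery' heuristic. If the request URL matches, the response is treated as JSONP. This sketch assumes that JSONP rewriting means swapping in the callback name from the current request, since the archived response carries the callback name from capture time, and it replaces the leading function name accordingly. The function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Leading callback name in a JSONP body, e.g. 'jQuery123_456(...'
JSONP_START = re.compile(r'\s*([\w.$]+)\s*\(')


def is_jsonp_url(url: str) -> bool:
    """Heuristic from the commit message: jQuery-style JSONP callbacks."""
    return 'callback=jQuery' in url


def rewrite_jsonp(body: str, request_url: str) -> str:
    """Replace the archived callback name with the one requested now."""
    callbacks = parse_qs(urlsplit(request_url).query).get('callback')
    match = JSONP_START.match(body)
    if not callbacks or not match:
        return body
    return callbacks[0] + body[match.end(1):]
```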
Ilya Kreymer
ae703e6677 cleanup: content rewriter: don't try to resolve text type if already 'html' and 'mp_'/default mod
client-side rewrite: when checking history change, allow for relative urls also (convert to absolute)
2017-08-24 16:25:28 -07:00
Ilya Kreymer
f14bb7b6bf Wombat Improvements (#232)
* client-side rewrite (wombat) fixes:
- ensure make_parser() calls createElement() on associated document if rewriting within an element
- ensure host-relative urls are rewritten as host-relative, e.g. a.href = "/path" stays host-relative when unrewritten

* head_insert: use request_ts instead of actual ts for client-side rewriting, consistent with server-side
2017-08-24 13:37:23 -07:00
Ilya Kreymer
b2f3a580c2 wombat work:
- for prototype override, ensure object exists
- for domain setter, ensure location exists, default to window
rules: expand facebook rule to match fbid also
2017-08-22 13:51:10 -07:00
Ilya Kreymer
7ddd3296ad client-side rewrite:
- override EventTarget.addEventListener/removeEventListener to ensure function called on actual object, not proxy
- add proxy_to_obj() to existing window.addEventListener/removeEventListener overrides
2017-08-22 12:22:02 -07:00
Ilya Kreymer
d0dafb268d client-side rewrite: add proxy-to-obj dereference for Document.createTreeWalker 2017-08-18 19:50:58 -07:00
Ilya Kreymer
bbe3cebd2f client side fixes for proxy obj:
- add general override_func_first_arg_proxy_to_obj() to dereference proxy->obj for first arg
- used for MutationObserver.observe() and Node.compareDocumentPosition() for now
2017-08-17 00:08:18 -07:00
Ilya Kreymer
9fdff8388e client-side override fix: first set window.devicePixelRatio to 1, also prevent from changing, if possible (catch exception) 2017-08-10 16:36:29 -07:00
Ilya Kreymer
ce3ba9e42e client-side rewrite: fix window.devicePixelRatio to 1 to ensure consistent replay (esp for video) 2017-08-10 16:13:17 -07:00
Ilya Kreymer
e9fa167564 wayback app: add support for root collection, specified as '$root' -- no other collections supported if root collection is set
tests: add test_root_coll.py (move from unused tests)
wombat.js: proxy: fix typo in location access
2017-08-07 22:19:10 -07:00
Ilya Kreymer
33ba67646b JS proxy fix (#229)
* proxy access fixes:
- catch proxy access (in case cross-domain, eg. from service worker)
- document.location access falls back to defaultView._WB_wombat_location if not available
- use obj_to_proxy(), proxy_to_obj() wrappers access, catch exceptions
2017-08-07 20:00:30 -07:00
Ilya Kreymer
6db2a1161d client-side rewrite: improve rewrite_html(), use wrap html fragments … (#227)
client-side rewrite: improve rewrite_html(), use wrap html fragments in <template> to avoid filtering out valid html, use existing system if full html starting with <html>/<body>/<head>. Addresses #138 in a better way
ensure WombatLocation.origin is always set using protocol/host, even if the parser doesn't provide it (IE and Edge)
2017-08-07 16:46:27 -07:00
Ilya Kreymer
a6ab167dd3 JS Object Proxy Override System (#224)
* Init commit for Wombat JS Proxies off of https://github.com/ikreymer/pywb/tree/develop

Changes
- cli.py: add import os for os.chdir(self.r.directory)
- frontendapp.py: added initial support for cors requests.
- static_handler.py: add import for NotFoundException
- wbrequestresponse.py: added the initial implementation for cors requests, webrecorder needs this for recording!
- default_rewriter.py: added JSWombatProxyRewriter to default js rewriter class for internal testing
- html_rewriter.py: made JSWombatProxyRewriter to be default js rewriter class for internal testing
- regex_rewriters.py: implemented JSWombatProxyRewriter and JSWombatProxyRewriter to support wombat JS Proxy
- wombat.js: added JS Proxy support
- remove print

* wombat proxy: simplify mixin using 'first_buff'

* js local scope rewrite/proxy work:
- add DefaultHandlerWithJSProxy to enable new proxy rewrite (disabled by default)
- new proxy toggleable with 'js_local_scope_rewrite: true'
- work on integrating john's proxy work
- getAllOwnProps() to generate list of functions that need to be rebound
- remove non-proxy related changes for now, remove angular special cases (for now)

* local scope proxy work:
- add back __WB_pmw() prefix for postMessage
- don't override postMessage() in proxy obj
- MessageEvent resolve proxy to original window obj

* js obj proxy: use local_init() to load local vars from proxy obj

* wombat: js object proxy improvements:
- use same object '_WB_wombat_obj_proxy' on window and document objects
- reuse default_proxy_get() for get operation from window or document
- resolve any Window/Document object to the proxy, e.g. if '_WB_wombat_obj_proxy' exists, return that
- override MessageEvent.source to return window proxy object

* obj proxy work:
- window proxy: defineProperty() override calls Reflect.defineProperty on dummy object as well as window to avoid exception
- window proxy: set() also sets on dummy object, and returns false if Reflect.set returns false (eg. altered by Reflect.defineProperty disabled writing)
- add override_prop_to_proxy() to add override to return proxy obj for attribute
- add override for Node.ownerDocument and HTMLElement.parentNode to return document proxy
server side rewrite: generalize local proxy insert, add list for local let overrides

* js obj proxy work:
- add default '__WB_pmw' to self if undefined (for service workers)
- document.origin override
- proxy obj: improved defineProperty override to work with safari
- proxy obj: catch any exception in dummy obj setter

* client-side rewriting:
- proxy obj: catch exception (such as cross-domain access) in own props init
- proxy obj: check for self reference '_WB_wombat_obj_proxy' access to avoid infinite recurse
- rewrite style: add 'cursor' attr for css url rewriting

* content rewriter: if is_ajax(), skip JS proxy obj rewriting also (html rewrite also skipped)

* client-side rewrite: rewrite 'data:text/css' as inline stylesheet when set via setAttribute() on 'href' in link

* client-side document override improvements:
- fix document.domain, document.referrer, forms add document.origin overrides to use only the document object
- init_doc_overrides() called as part of proxy init
- move non-document overrides to main init
rewrite: add rewrite for "Function('return this')" pattern to use proxy obj

* js obj proxy: now a per-collection (and even a per-request) setting 'use_js_obj_prox' (defaults to False)
live-rewrite-server: defaults to enabled js obj proxy
metadata: get_metadata() loads metadata.yaml for config settings (for dynamic collections),
or collection config for static collections
warcserver: get_coll_config() returns config for static collection
tests: use custom test dir instead of default 'collections' dir
tests: add basic test for js obj proxy
update to warcio>=1.4.0

* karma tests: update to safari >10

* client-side rewrite:
- ensure wombat.js is ES5 compatible (don't use let)
- check if Proxy obj exists before attempting to init

* js proxy obj: RewriteWithProxyObj uses user-agent to determine if Proxy obj can be supported
content_rewriter: add overridable get_rewriter()
content_rewriter: fix elif -> if in should_rw_content()
tests: update js proxy obj test with different user agents (supported and unsupported)
karma: reset test to safari 9

* compatibility: remove shorthand notation from wombat.js

* js obj proxy: override MutationObserver.observe() to retrieve original object from proxy
wombat.js: cleanup, remove commented out code, label new proxy system functions, bump version to 2.40
2017-08-05 10:37:32 -07:00
Ilya Kreymer
d8b6ad3a31 client-side rewrite: rewrite_html() doesn't prefix/rewrite table tags (td/th/tr) for now, fixes issues caused by rewriting those tags 2017-07-24 21:50:43 +00:00
Ilya Kreymer
c88b843170 client rewrite: rewrite_html() ensure rewriting string! 2017-07-23 09:02:03 -07:00
Ilya Kreymer
9d86601aab client-side rewrite: for rewrite_html(), pre-rewrite problematic tags (FRAME/TD/TH/TR) that are filtered out if standalone, improves #138 2017-07-21 12:01:40 -07:00
Ilya Kreymer
64d05aca45 client-side (wombat): for now, fetch() always includes credentials (needed for WR, maybe should be optional?) 2017-07-21 11:49:28 -07:00
Ilya Kreymer
adab304f33 client-side rewrite: rewrite svg <image xlink:href> attr created via generated html 2017-07-11 18:24:35 -07:00
Ilya Kreymer
b3b843405a client-side (wombat) fix: postMessage() override was treating targetOrigin as hostname, instead of origin prefix.
Checks if targetOrigin starts with the WB_wombat_location.origin of the target window, and prints a console.warn() otherwise.
2017-07-09 15:46:23 -07:00
Ilya Kreymer
1d7e5a73e5 client-side rewrite (wombat) improvements:
- <base> override applies for both set/get
- remove <base>-specific override, using generic 'href' rewriting for <base>
- add <meta> element 'content' rewriting (if url)
- refactor: remove REWRITE_ATTRS/equals_any, add should_rewrite_attr()
- should_rewrite_attr(tagName, attr) determines if attr should be rewritten for given tag
- bump version to 2.30
2017-07-08 12:44:22 -07:00
Ilya Kreymer
f0f274c0c9 wb_frame: allow "load" event to pushState() instead of replaceState() if window.pushStateOnLoad.
This is necessary to have working history when running in electron, which does not combine
iframe history into the top-frame history
2017-05-16 17:18:37 -07:00
Ilya Kreymer
d6cfb7cd2d wb_frame/wb.js: don't call push_state() if already on the current state,
eg. if two load events received for different readyState
add document.readyState to load event
2017-05-15 22:26:52 -07:00
Ilya Kreymer
296b4ed94d client-side rewrite: remove WB_wombat_ from any id/class= in document.write() 2017-05-03 15:31:06 -07:00
Ilya Kreymer
15a7b15d44 proxy mode support via rewriterapp!
- check for 'wsgiprox.fixed_host' and use that as host_prefix if set
- don't include Connection/Proxy-Connection headers in upstream request
- ensure proxy response has length or is chunk-encoded
2017-04-22 18:17:41 -07:00
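Two of the proxy-mode points above can be sketched in Python: dropping Connection/Proxy-Connection before forwarding a request upstream, and framing a response that has no Content-Length with HTTP/1.1 chunked transfer encoding. The function names are hypothetical; this is not pywb's or wsgiprox's code.

```python
from typing import Iterable, Iterator, List, Tuple

Headers = List[Tuple[str, str]]

# Per-connection headers that should not be forwarded upstream.
DROP_UPSTREAM = {'connection', 'proxy-connection'}


def upstream_headers(headers: Headers) -> Headers:
    return [(k, v) for k, v in headers if k.lower() not in DROP_UPSTREAM]


def ensure_framing(headers: Headers) -> Headers:
    """Add chunked encoding if the response has no explicit length."""
    names = {k.lower() for k, _ in headers}
    if 'content-length' not in names and 'transfer-encoding' not in names:
        return headers + [('Transfer-Encoding', 'chunked')]
    return headers


def chunk_encode(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield body pieces in chunked framing: hex size, CRLF, data, CRLF."""
    for chunk in chunks:
        if chunk:  # an empty chunk would prematurely signal end of body
            yield b'%X\r\n' % len(chunk) + chunk + b'\r\n'
    yield b'0\r\n\r\n'  # terminating zero-length chunk
```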
Ilya Kreymer
4b055c9394 client-rewrite: support proper srcset= attr rewriting 2017-04-21 12:31:56 -07:00