1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

6 Commits

Author SHA1 Message Date
John Berlin
d869f7ef48 auto-fetch: (#484)
- reworked both proxy and non-proxy mode backing workers to no-longer fetch in burst mode but as sent with a maximum of 20 fetches running at a time
 - added just-fetch to non-proxy mode backing worker
 - updated the auto fetch worker abstraction in non-proxy mode used by wombat to exposed like in proxy mode and ensured that value property for the srcset object is used when sending rewritten srcset values to the backing worker
  - combined the backing worker proxy & non-proxy mode into a single file
  - added rollup config for back auto fetch worker
2019-06-27 22:01:45 -07:00
Ilya Kreymer
7cf5828cdb
Wombat Improvements for extractOriginalUrl and hash change (#481)
* wombat fixes:
- extractOriginalUrl() now takes into account the abs or rel prefix, if available, in determining the original url
- receive_hash_change() for 'outer_hashchange' message handling is now before MessageEvent override, fix to get the original message and check .from_top directly to determine if it originated with top frame.
2019-06-22 09:47:55 -07:00
John Berlin
6437dab1f4 server side rewriting: (#475)
- fixed edge case in jsonP rewriting where no callback name is supplied only ? but body has normal jsonP callback (url = https://geolocation.onetrust.com/cookieconsentpub/v1/geo/countries/EU?callback=?)
  - made the `!self.__WB_pmw` server side inject match the client side one done via wombat
  - added regex's for eval override to JSWombatProxyRules

wombat:
  - added eval override in order to ensure AD network JS does not lockup the browser (cnn.com)
  - added returning a proxied value for window.parent from the window proxy object in order to ensure AD network JS does not lock up browser (cnn.com, boston.com, et. al.)
  - ensured that the read only storageArea property of 'StorageEvents' fired by the custom storage class is present (spec)
  - ensured that userland cannot create new instances of Storage and that localStorage instanceof Storage works with our override (spec)
  - ensured that assignments to SomeElement.[innerHTML|outerHTML], iframe.srcdoc, or style.textContent are more correctly handled in particular doing script.[innerHTML|outerHTML] = <JS>
  - ensured that the test used in the isArgumentsObj function does not throw errors IFF the pages JS has made it so when toString is called (linkedin.com)
  - ensured that Web Worker construction when using the optional options object, the options get supplied to the create worker (spec)
  - ensured that the expected TypeError exception thrown from DomConstructors of overridden interfaces are thrown when their pre-conditions are not met (spec)
  - ensured that wombat worker rewriting skipes data urls which should not be rewritten
  - ensured the bound function returned by the window function is the same on subsequent retrievals fixes #474

  tests:
   - added direct testing of wombat's parts
2019-06-20 19:06:41 -07:00
John Berlin
85435433e8 wombat postMessage override tweaking (#473)
* removed the definition of `__WB_pmw` from `ensureServerSideInjectsExistOnWindow` in order to allow more proper handling of that definition t occur from `initNewWindowWombat` or `wombatInit`.

`initNewWindowWombat` now initializes wombat for (i)frame's with src values prefixed with about: as about:srcdoc is commonly used

tweaked postMessage and event listener overrides to be more like the previous wombat revision

* rebased on develop and rebuilt bundle
2019-06-04 09:10:49 -07:00
John Berlin
4a4e6c2b33 made the rewrite modifier wombat's rewriting of js workers init'd as a blob is wkrf_ not wkr_ to match the python JSWorkerRewriter (#470) 2019-06-03 09:05:53 -07:00
John Berlin
94784d6e5d wombat overhaul! fixes #449 (#451)
wombat:
 - I: function overrides applied by wombat now better appear to be the original new function name same as originals when possible
 - I: WombatLocation now looks and behaves more like the original Location interface
 - I: The custom storage class now looks and behaves more like the original Storage
 - I: SVG image rewriting has been improved: both the href and xlink:href deprecated since SVG2 now rewritten always
 - I: document.open now handles the case of creation of a new window
 - I: Request object rewriting of the readonly href property is now correctly handled
 - I: EventTarget.addEventListener, removeEventListener overrides now preserve the original this argument of the wrapped listener
 - A: document.close override to ensure wombat is initialized after write or writeln usage
 - A: reconstruction of <doctype...> in rewriteHTMLComplete IFF it was included in the original string of HTML
 - A: document.body setter override to ensure rewriting of the new body or frameset
 - A: Attr.[value, nodeValue, textContent] added setter override to perform URL rewrites
 - A: SVGElements rewriting of the filter, style, xlink:href, href, and src attributes
 - A: HTMLTrackElement rewriting of the src attribute of the
 - A: HTMLQuoteElement and HTMLModElement rewriting of the cite attribute
 - A: Worklet.addModule: Loads JS module specified by a URL.
 - A: HTMLHyperlinkElementUtils overrides to the areaelement
 - A: ShadowRootoverrides to: innerHTML even though inherites from DocumentFragement and Node it still has innerHTML getter setter.
 - A: ShadowRoot, Element, DocumentFragment append, prepend: adds strings of HTML or a new Node inherited from ParentNode
 - A: StylePropertyMap override: New way to access and set CSS properties.
 - A: Response.redirecthttps rewriting of the URL argument.
 - A:  UIEvent, MouseEvent, TouchEvent, KeyboardEvent, WheelEvent, InputEvent, and CompositionEven constructor and init{even-name} overrides in order to ensure that wombats JS Proxy usage does not affect their defined behaviors
 - A: XSLTProcessor override to ensure its usage is not affected by wombats JS Proxy usage.
 - A: navigator.unregisterProtocolHandler: Same override as existing navigator.registerProtocolHandler but from the inverse operation
 - A: PresentationRequest: Constructor takes a URL or an array of URLs.
 - A: EventSource and WebSocket override in order to ensure that they do not cause live leaks
 - A: overrides for the child node interface
 - Fix: autofetch worker creatation of the backing worker when it is operating within an execution context with a null origin
tests:
  - A: 559 tests specific to wombat and client side rewritting
pywb:
  - Fix: a few broken tests due to iana.org requiring a user agent in its requests
rewrite:
  - introduced a new JSWorkerRewriter class in order to support rewriting via wombat workers in the context of all supported worker variants via
  - ensured rewriter app correctly sets the static prefix
ci:
 - Modified travis.yml to specifically enumerate jobs
documentation:
  - Documented new wombat, wombat proxy moded, wombat workers
auto-fetch:
 - switched to mutation observer when in proxy mode so that the behaviors can operate in tandem with the autofetcher
2019-05-15 11:42:51 -07:00