- tweaked the JSWombatProxyRules regex for = this to be = this and , this
- added comments to the more complicated regex's used by JSWombatProxyRules
- added test case for tweaked regex
- reworked both proxy and non-proxy mode backing workers to no-longer fetch in burst mode but as sent with a maximum of 20 fetches running at a time
- added just-fetch to non-proxy mode backing worker
- updated the auto fetch worker abstraction in non-proxy mode used by wombat to exposed like in proxy mode and ensured that value property for the srcset object is used when sending rewritten srcset values to the backing worker
- combined the backing worker proxy & non-proxy mode into a single file
- added rollup config for back auto fetch worker
- fixed edge case in jsonP rewriting where no callback name is supplied only ? but body has normal jsonP callback (url = https://geolocation.onetrust.com/cookieconsentpub/v1/geo/countries/EU?callback=?)
- made the `!self.__WB_pmw` server side inject match the client side one done via wombat
- added regex's for eval override to JSWombatProxyRules
* removed the definition of `__WB_pmw` from `ensureServerSideInjectsExistOnWindow` in order to allow more proper handling of that definition t occur from `initNewWindowWombat` or `wombatInit`.
`initNewWindowWombat` now initializes wombat for (i)frame's with src values prefixed with about: as about:srcdoc is commonly used
tweaked postMessage and event listener overrides to be more like the previous wombat revision
* rebased on develop and rebuilt bundle
- Fix: a few broken tests due to iana.org requiring a user agent in its requests
rewrite:
- introduced a new JSWorkerRewriter class in order to support rewriting via wombat workers in the context of all supported worker variants via
- ensured rewriter app correctly sets the static prefix
wombat:
- add wombat as submodule!
* Reworked query.js to know the difference between date search and advanced searching.
Exposed cdx api's through the query html page
- from, to
- matchType
- filter
Added more appealing styling to the error, index, not-found, query, and search templates
Updated the included jquery and boostrap static files to jQuery v3.3.1, Bootstrap v4.1.3
Implemented optionally using a web worker for making the cdx api request and processing the results
Documented the code
* ensure the display count str function uses the correct "first" value
* added view all captures for an result displayed in the advanced results view
query worker now sends over the recordCount as an integer and as a formatted string
moved the search button to the right after advanced options
* tests: fixed test_intergration.py:test_static_nested_dir failing due to updates
Added methods to AutoFetchWorker in proxy mode that allow external JS to initiate checks
Updated the actual proxy mode worker implementation to match the functionality added
Changes:
Reworked ContentFrame and the default banner to be ES5 classes.
Introduced an optional relationship between ContentFrame and banners.
If a banner is exposed then ContentFrame controls the initialization of the banner and routes any messages received from the replay iframe to the banner.
When the replay iframe is navigated to a page and the replay iframe loads, the ContentFrame waits 2 seconds before checking to see if the banner still indicates it a loading state and if so updates the displayed information using the URL and timestamp the replay iframe was navigated to.
- added to the auto-fetch worker of both wombat and wombatProxymode
- added utility function isImageSrcset to wombat for determining if the srcset values being rewritten are from either a image tag or a source tag within a picture tag
- added utility function isImageDataSrcset to wombat to check for img/source data-srcset attributes
- reworked the backing auto-fetch worker to now queue all URLs and perform fetch batching with maximum batch size of 60. A delay of 2 seconds is applied after each batch.
Ensured that the srcset values sent to the auto-fetch worker can be resolved in non-proxy mode fixes#413
Renamed the auto-fetch class named used in proxy mode from AutoFetchWorker to AutoFetchWorkerProxyMode
Added checking of script tage types application/json and text/template to rewrite_script
* client-side rewrite: fix extract_orig() to unrewrite relative urls using current page scheme, don't default to http
* wombat tests: fix karma tests by adding 'wombat_scheme' to test setup
- FullHTMLRegex: performs a case insensitive check for <html, <body, <head and <!doctype html>
updated rewrite_elem to:
- rewrite meta tags that deliever csp policies
- check for additional attributes that could contain un-rewritten URLs (form.style, iframe.style)
Made check for full html into regex
- ensured that autoFetchWorker uses full srcset URLs
- resolves the URL against the img.src or document.baseURI if not rewritten
- otherwise ensures the rewritten URL is not relative or schemeless
wombat.js:
- AutoFetchWorker updated extractFromLocalDoc to send URL resolution information to the worker
- defer extractFromLocalDoc and preserveSrcset postMessages to ensure page viewer can see the images first
Rename:
- rename auto-fetch config to 'enable_auto_fetch' and '--enable-auto-fetch' cli param
- rename 'use_head_insert' -> 'enable_content_rewrite'
- rename 'use_banner' -> 'enable_banner'
- rename 'use_wombat' -> 'enable_wombat'
Misc Cleanup:
- enable_auto_fetch applies to both proxy and non-proxy mode
- remove setting 'wbinfo.use_wombat', implied if wombatProxyMode.js is included
- docs: add docs for auto-fetch system, improved docs for proxy rewrite options
- tests: test with enable_auto_fetch, update tests for renames
- bump version to 2.1.0 due to breaking changes
- changelist: updates to changelist
- requirements: use bounded version for gevent
* rewriting fixes:
server side: cookie rewriting: if httponly cookie with mp_/if_ modifier and path ends with '/', add set-cookie for all known modifiers
content length parsing: improve content-length parsing to support 'content-length: num,num', parse out the first number (occasionally seen with range requests when range is dropped for upstream)
wombat: rewrite_elem: use element.ownerDocument for resolving baseUri for parent paths
tests: add tests for cookie all modifier rewrite, bad content-length parsing (skip for py2.7)
- rename override_func_first_arg_proxy_to_obj -> override_func_arg_proxy_to_obj to support resolving object proxy not just from first param
- add document.evaluate() 'de-proxy' to 2nd param
- optimize override_func_arg_proxy_to_obj() to call original apply, avoid modifying arguments array in place
* wombat init fix:
- fix change from #339 which removed reiniting of wombat
- allow reiniting of wombat if inited via init_new_window_wombat()
- don't allow if reinited directly from <head>, as happened in document import
* tests: fix tests for 'new _WBWombat -> WombatInit' change
* wombat: window.frames optimization:
- since window.frames === window, no need for separate override!
- ensure init_new_window_wombat() is called on any returned window from object proxy
frameworks that like to append a single text node as a child to a style
node modifying and then only modify that text node to add/remove css
dynamically via:
- initTextNodeOverrides (entry point)
- overrideTextProtoFunction (overrides the appendData, insertData, and replaceData functions of inherited by Text)
- overrideTextProtoGetSet (overrides property getters and setters of data and wholeText)
Added window.CSSStyleSheet.insertRule override
- dynamically adds a raw css rule (text) to an existing stylesheet
- Split wombat and auto-fetch worker into two files (proxy mode and non-proxy mode)
- Renamed preservationWorker to autoFetchWorker in order to better convey what it does
- Root config file control over including wombat and auto-fetch worker in proxy or non-proxy mode
- Added additional proxy mode + auto-fetch worker only route for fetching the auto-fetch worker code nicely for CORS
- templateview: add 'tobool' formatter to more cleanly format python bools to JS 'true'/'false'
- proxy options: config and command line:
'use_auto_fetch_worker' and '--proxy-with-auto-fetch'
'use_wombat' and '--proxy-with-wombat'
- head_insert.html: only include wombat in proxy mode when use_wombat or use_auto_fetch_worker are set.
- wombatProxyMode.js: slimmed down wombat for proxy mode only including auto-fetch support.
- more consistent naming: rename 'preserveWorker' and 'autoArchive' to 'auto-fetch'
Updated tests:
- test_wbrequestresponse.py: added tests covering constructor defaults, _init_derived, options_response, json_response, encode_stream, text_stream
- test_auto_colls.py: fixed broken test test_more_custom_templates, reason using ujson now not json so spacing was off
- test_proxy.py: updated existing tests to reflect splitting wombat into proxy and non-proxy mode, added tests covering auto-fetch worker specific endpoints in proxy mode
removed duplicate addons key in .travis.yml
- test_cli.py: updated to properly test the cli with these changes
added ultrajon dep to tests_require in setup.py to reflect its usage by wbrequestresponse.py
Fully documented:
- cli.py
- frontendapp.py
- templateview.py
- wbrequestresponse.py
Removed duplicate addons key in .travis.yml
Added ultrajson dependency to tests_require in setup.py to reflect its usage by wbrequestresponse.py
Fixes#371
wombat.js:
- Finalized PreserveWorker that preserves srcset values and Media Query values
- Defered extraction and preservation of the values to be preserved so that the UI thread is not clobered
- Hooked into places where wombat rewrites the values we are interested in
wombatPreservationWorker.js:
- Updated handling of srcset extraction now that we are sending wombat srcset rewrites
- Added check to see if we have seen a URL to be fetched
- Added light polyfill of Promise and fetch if they are not defined in wombatPreservationWorker.js, for safari
wombat.spec.js
- Updated to include values necessary to work with PWorker changes.
- only add icons if in top frame, fix indent
- favicon: move icon and title logic to default_banner to allow overriding default behavior (eg. Webrecorder uses its own favicon)
- title: prepend original page title with 'pywb Live: ' or 'pywb Archived: ' in default banner to avoid confusion with actual site, also works for frameless mode.
Set favicon and title from top-most replay frame to the top frame (work from @Devhercule):
Favicon display in no-proxy mode with framed_replay: true.
When "iframe": "#replay_iframe", the icon of the tab in question is not visible (or a wrong icon is displayed provided from cache memor ) because of the presence of an added frame (#replay_iframe).
The modification allows to get the replay_iframe favicon and pass it to the main frame to be correctly displayed in the tab.
(see Issue #342)
- improved worker rewriting: updated worker rewriting handles non-blob urls, added SharedWorker override
ww_rw.js:
- updated to be a much more complete rewriting system: overrides for importScripts, and fetch
content_rewriter.py:
- added wkr_ mod for handling Worker/SharedWorker, follows convention of service worker
test_content_rewriter.py
- added test for content rewriting of Worker/SharedWorker
* Updated html_rewriter.py to account for rewriting of script[src] values that are super relative (http://fotopaulmartens.netcam.nl/vucht.php) and added link rel='import' rewriting
Updated test_html_rewriter.py for super rel script[src] rewriting and link rel='import'
Updated wombat to account for the new rewriting of script[src] (http://fotopaulmartens.netcam.nl/vucht.php)
Changed the postMessage override in wombat to use $wbwindow rather than window to fix google calendar replay / recording (http://qasrcc.org/events/calendar/)
* Updated tests for forcing absolute and fixed merge conflicts
* wombat: extracted removal and retrieval of __wb_original_src into own functions
* service worker rewrite work:
- use sw_ modifier to add Server-Worker-Allowed: <domain root>
- force scope if none set to domain url
- resolve sw url to absolute url
* wombat: don't reinit wombat paths if already inited (eg. from imported documents)
* service-worker rewrite test: add test to verify sw rewrite is identity, Service-Worker-Allowed header is added
- renamed obj to this_obj to reflect that we using the deproxied this
- use this_obj rather than window in the first if block that populates
the from variable in order to match the logic in pm_origin and
because proxy_to_obj returns raw this if not proxy
Updated rewrite modifiers for server-side rewriting of `link rel='preload' as='x'`
Added client-side rewriting of `link rel='[preload|import]' as='x'`
Added helper method for determining the correct rewrite modifier to be used in client-side rewriting and updated duplicate modifier logic in wombat
Added Element.insertAdjacentElement override and added special case rewriting of nested elements in insertAdjacentElement and Node.[appendChild|replaceChild|insertBefore]
Add MouseEvent override to account for the view argument which is windowProxy
Fixed implicit variable declaration that resulted in global pollution and possible variable collisions in rewriting logic
Updated wb_unrewrite_rx to now consider protocol and host as optional to fix imgur
Nit document.[write|writeln] override: rather than using Array.apply then Array.join we now use just Array.join as it works on array like objects
- server-side: rewrite '}(this)' or '})(this)' with js object proxy override convert
- client-side: fix typo in 'onstorage' override, fix typo that prevented SameOriginListener() from being used -- ensure
custom 'onstorage' events only sent to original window
- if targetOrigin is the replay host, default to unrewritten from origin, not '*'
- don't set targetOrigin to 'null' or empty to avoid errors
- if target window's unrewritten origin is actually 'null' or '', don't pass message at all, and don't set to '*' -- represents actual behavior,
as postMessage to 'null' origin (about:blank page) will be received only if targetOrigin is already '*'.
- don't include wombat.js in banner only mode, including in proxy mode
(instead, do set devicePixelRatio to fix certain fidelity issues)
- default_banner: set title to document.title on load when frameless, including in proxy mode
- improve docs for configuring proxy mode cert
- tests: update tests to ensure no wombat.js injected in proxy or banner-only mode
- override window.EventTarget.prototype.addEventListener instead of window.add/removeEventListener to work correctly with angular
- add 'document.title' override to detect title change event and propagate to top frame (history title often unused)
- add equivalent wrappers from addEventListener to window.onmessage and window.onstorage properties
- send_top_message() to wrap all postMessage calls
- url change, hash change, history change notifications only sent if window is top replay frame, cookie notification sent for all windows
- don't send url change notification if 'about:blank' or 'javascript:' url
bump version to 2.0.2
- override Function.apply() to de-proxy thisArg and all params before calling native functions (may make per-function overrides unnecessary)
- ensure init_top_frame_notify() is called on $wbwindow object not window
README: update features list, contributing section, fix typos
docs: update features list, fix wording, add more links to other sections, fix typos
renaming: change 'ikreymer/pywb' -> 'webrecorder/pywb', add Rhizome to copyright statement
Dockerfile: remove deprecated MAINTAINER, add 'ARG PYTHON' to support custom base python image
* query fix:
setup: ensure all static files included in package_data recursively to add new query assets
test: add test for nested static asset
query: correctly display 0 captures, 'capture' and 'captures' text moved to Text block
- move scripts to query.js, fix formatting
- init ui from cdx list, refactor into single script
- use cdx api to retrieve query via ajax
- tests: update query tests to use cdx lookup instead
- remove server-side cdx lookup for /*/ endpoint