refactor frame/head insert templates:
ContentFrame:
- content iframe inited with new ContentFrame() which creates iframe
- wb_frame.js: contains ContentFrame system for initing, updating, closing content frame for replayed content.
- wb_frame.js: supports 'app_prefix' and 'content_prefix' or default 'prefix' for replay content
- window.location.hash passed added to init url.
- frame insert and head insert: simplify, remove 'wbrequest'
- frame insert: global wbinfo object no longer needed in top frame, each ContentFrame self-contained.
- wombat.js: next_parent() check does not assume wbinfo is present in top frame
- vidrw.js: only init if wbinfo is present
Banner:
- wb.js no longer needed, frame check/redirect folded into wombat.js
- default banner self-contained in default_banner.js/default_banner.css, handles both frame and frameless case
- rename wb.css -> default_banner.css
- banner html passed in as 'banner_html' variable to be optionally included, supports per collection banner html.
- templateview: BaseInsertView can accept an option 'banner view', used by HeadInsertView and TopFrameView
Tests:
- tests: test_auto_colls uses shared app to test dynamic changes, testing both frame and non-frame access, added per-collection banner html check.
support dynamic collections, all collection with remote archives (eg. s3:// paths)
- warcserver: allow custom dynamic collections index and archive path templates via 'dyn_index_path' and 'dyn_archive_path'
- pathresolver: allow resolving wildcard path prefixes with collection, to support remote paths and avoid globbing
- warcserver: don't add fixed collections dir to source to support resolving wildcard
- pathresolver: add wildcard resolving s3 path test
- referrer unrewrite: ensure referrer not empty
- windows: fix paths for pathresolver test on windows
- timemap: add tests for all collection timemap, add cdxj timemap test
- timemap: only add original, timegate links for 'link' timemap
- enabled with 'all_coll' in config or --all-coll cli option, eg. --all-coll all to enable
- supported for replay, timemap and cdx endpoints, uses wildcard '*' for coll name with directory aggregator
- tests: record/replay tests updated to replay via all collection, check all collection cdxj
proxy mode support readded!
- use wsgiprox wrapper in FrontEndApp.init_proxy() with fixed collection prefix, ca options
- cli --proxy <coll> flag added to specify proxy collection
- cleanup: remove cookie rw (already disabled), fix post handling paths
- headers: ensure request headers are not rewritten when in proxy mode, response headers marked with 'url-rewrite' also no rewritten if no url rewrite/proxy mode
- urlrewriter: add IdentityRewriter with no rewriting as default, instead of SchemeOnlyUrlRewriter
- memento support: for now, only include rel="original" and Memento-Datetime in for proxy replay response
- responseloader: disable urllib3 unsecure response warnings
- tests: add test for proxy replay and proxy record/replay of new collection
recording support: now available for dynamic collections via config
- config.yaml 'recorder: live' entry enables /record/ subpath which records to any dynamic collections (can record from any collection, though usually live)
- autoindex refactor: simplified, standalone AutoIndexer() -- indexes any changed warc files to autoindex.cdxj
- windows autoindex support: also check for changed file size, as last modified time may not be changing
- manager: remove autoindex, now part of main cli
- tests: updated test_auto_colls with autoindex changes
- tests: add record/replay tests for recording and replay
frontendapp/warcserver improvements:
- support '/cdx' endpoint for every collection, exposing standard cdx-server api
- remove '-cdx' endpoint in warcserver, redundant with index and frontend /cdx endpoint
- warcserver: simplify paths! support static paths (/A, /B) + dynamic paths (/<path>) on same endpoint
- memento fixes, fully support memento pattern 2.2 api spec
- add timemap endpoints at /timemap/link/<url>, also /timemap/cdxj/<url>, /timemap/json/<url>
- include original and timemap links in Link header
- correct memento headers for timegate, timemap, memento
- support Accept-Datetime header for timegate
- Link rel="memento" includes canonical url, matches Content-Location url
- tests: update memento tests
windows build fixes: all tests should pass, ci with appveyor
- add appveyor.yml
- path fixes for windows, use os.path.join
- templates_dir: use '/' always for jinja2 paths
- auto colls: ensure chdir before deleting dir
- recorder: ensure warc writer is always closed
- recorder: disable locking in warcwriter on windows for now (read access not avail, shared
lock seems to not be working)
- zipnum: ensure block is closed after read!
- cached dir test: wait before adding file
- tests: adjust timeout tests to allow more leeway in timing
* Init commit for Wombat JS Proxies off of https://github.com/ikreymer/pywb/tree/develop
Changes
- cli.py: add import os for os.chdir(self.r.directory)
- frontendapp.py: added initial support for cors requests.
- static_handler.py: add import for NotFoundException
- wbrequestresponse.py: added the intital implementation for cors requests, webrecoder needs this for recording!
- default_rewriter.py: added JSWombatProxyRewriter to default js rewriter class for internal testing
- html_rewriter.py: made JSWombatProxyRewriter to be default js rewriter class for internal testing
- regex_rewriters.py: implemented JSWombatProxyRewriter and JSWombatProxyRewriter to support wombat JS Proxy
- wombat.js: added JS Proxy support
- remove print
* wombat proxy: simplify mixin using 'first_buff'
* js local scope rewrite/proxy work:
- add DefaultHandlerWithJSProxy to enable new proxy rewrite (disabled by default)
- new proxy toggleable with 'js_local_scope_rewrite: true'
- work on integrating john's proxy work
- getAllOwnProps() to generate list of functions that need to be rebound
- remove non-proxy related changes for now, remove angular special cases (for now)
* local scope proxy work:
- add back __WB_pmw() prefix for postMessage
- don't override postMessage() in proxy obj
- MessageEvent resolve proxy to original window obj
* js obj proxy: use local_init() to load local vars from proxy obj
* wombat: js object proxy improvements:
- use same object '_WB_wombat_obj_proxy' on window and document objects
- reuse default_proxy_get() for get operation from window or document
- resolve and Window/Document object to the proxy, eg. if '_WB_wombat_obj_proxy' exists, return that
- override MessageEvent.source to return window proxy object
* obj proxy work:
- window proxy: defineProperty() override calls Reflect.defineProperty on dummy object as well as window to avoid exception
- window proxy: set() also sets on dummy object, and returns false if Reflect.set returns false (eg. altered by Reflect.defineProperty disabled writing)
- add override_prop_to_proxy() to add override to return proxy obj for attribute
- add override for Node.ownerDocument and HTMLElement.parentNode to return document proxy
server side rewrite: generalize local proxy insert, add list for local let overrides
* js obj proxy work:
- add default '__WB_pmw' to self if undefined (for service workers)
- document.origin override
- proxy obj: improved defineProperty override to work with safari
- proxy obj: catch any exception in dummy obj setter
* client-side rewriting:
- proxy obj: catch exception (such as cross-domain access) in own props init
- proxy obj: check for self reference '_WB_wombat_obj_proxy' access to avoid infinite recurse
- rewrite style: add 'cursor' attr for css url rewriting
* content rewriter: if is_ajax(), skip JS proxy obj rewriting also (html rewrite also skipped)
* client-side rewrite: rewrite 'data:text/css' as inline stylesheet when set via setAttribute() on 'href' in link
* client-side document override improvements:
- fix document.domain, document.referrer, forms add document.origin overrides to use only the document object
- init_doc_overrides() called as part of proxy init
- move non-document overrides to main init
rewrite: add rewrite for "Function('return this')" pattern to use proxy obj
* js obj proxy: now a per-collection (and even a per-request) setting 'use_js_obj_prox' (defaults to False)
live-rewrite-server: defaults to enabled js obj proxy
metadata: get_metadata() loads metadata.yaml for config settings for dynamic collections),
or collection config for static collections
warcserver: get_coll_config() returns config for static collection
tests: use custom test dir instead of default 'collections' dir
tests: add basic test for js obj proxy
update to warcio>=1.4.0
* karma tests: update to safari >10
* client-side rewrite:
- ensure wombat.js is ES5 compatible (don't use let)
- check if Proxy obj exists before attempting to init
* js proxy obj: RewriteWithProxyObj uses user-agent to determine if Proxy obj can be supported
content_rewriter: add overridable get_rewriter()
content_rewriter: fix elif -> if in should_rw_content()
tests: update js proxy obj test with different user agents (supported and unsupported)
karma: reset test to safari 9
* compatibility: remove shorthand notation from wombat.js
* js obj proxy: override MutationObserver.observe() to retrieve original object from proxy
wombat.js: cleanup, remove commented out code, label new proxy system functions, bump version to 2.40
- rewrite headers after content to ensure content-length/content-encoding rewritten if content modified
- header rewriter: remove proxyrewriter, set default rule to 'prefix' or 'keep' if url rewriting or not
- set is_content_rw if record.content_stream(), assume content is modified
- add BufferedRewriter as base for dash, hls, amf rewriting which processes the full stream
- should_rw_content() determines if should attempt content rewriting
- support banner-only insert mode: added HTMLInsertOnlyRewriter, enable if no custom JS rules
- test: enable banner-only test mode
- rewriter interface accepts RewriteInfo instance
- add StreamingRewriter adapter wraps html, regex rewriters to support rewriting streaming text from general rewriter interface
- add RewriteDASH, RewriteHLS as (non-streaming) rewriters. Need to read contents into buffer (for now)
- add RewriteAMF experimental AMF rewriter
- general rewriting system in BaseContentRewriter, default rewriters configured in DefaultRewriter
- tests: disable banner-only test as not currently support banner only (for now)
- wb-manager added metadata now loaded dynamically, cached, for search and index pages (#196)
- metadata updated w/o restart (#87)
- per-collection template overrides and per-template static file support
tests: test_auto_colls.py fully ported to new system
(per-collection config.yaml no longer supported)
rewriterapp: add 'Content-Location' if fuzzy match, or if using memento
tests: fix test to check for Content-Location for fuzzy match instead of redirect
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
eg. allow more lenient fuzzy matching on DASH manifests than other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
- webagg-server
- wayback
- live-rewrite-server
support adding custom settings to AutoApp
support for --live flag that automatically adds live-web source at '/live'
tests: disable cdx_server tests as old cdx_server removed
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
- don't remove max-age and expires if in 'live' rewrite mode (flag set on urlrewriter)
- remove secure only if replay prefix is not https
- fix expires UTC->GMT as cookie parsing chokes on UTC
- other rewriting: don't append rewrite prefix to x- headers
tests: add more cookie rewriting tests
proxy options: 'use_default_coll' must specify exact default coll
(otherwise a random coll is chosen, as ordering is not defined)
travis: add py3.4, py3.5!
the value fo collinfo.json template. Default template returns an entry for each handler route,
including the route path (id), title (name) and memento timegate and timemap paths, to be used with
an aggregator. Using a custom 'info_json' template can specify a different collinfo template, alternative to #69 (local aggregation)
Closes#146
proxy_ip_resolver: add option to use RedisCache if redis_cache_key set in config
proxy_ip_resolver: add 'delete' option to delete ip from cache, closes#145