* New client-side test system for Wombat.js in place using Karma and SauceLabs with initial set of tests and travis integration.
* Wombat Improvements:
- Better Safari/IE support: accessors overridden only when actually supported in browser, override gracefully skipped otherwise
- Use getOwnPropertyDescriptor() to get properties in addition to __lookupGetter__, __lookupSetter__
- baseURI overridden on correct prototype
- CSSStyleSheet.href override
- HTMLAnchorElement.toString() override
- Avoid making <base>.href read-only
* Proxy Mode Improvements:
- To avoid breaking HTTPS envelope, if no content-length provided, chunked encoding is used (HTTP/1.1) or response is buffered and content-length is computed (HTTP/1.0)
- Rewriter: Scheme-only rewriter converts embedded urls to http or https to match the scheme of containing page.
- IP Resolver: Supports IP cache in Redis
- Default resolver set to cookie resolver, collection/datetime switching options removed from UI in auth or ip resolvers.
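The response framing rule described above (chunked encoding for HTTP/1.1, buffering for HTTP/1.0) can be sketched roughly as follows. This is a minimal illustration, not pywb's actual code: ``frame_body`` and ``chunk_encode`` are hypothetical names, and headers are assumed to be a list of ``(name, value)`` tuples.

```python
def chunk_encode(body_iter):
    """Wrap each non-empty block in HTTP/1.1 chunked transfer framing."""
    for block in body_iter:
        if block:
            # chunk size in hex, CRLF, chunk data, CRLF
            yield b'%X\r\n' % len(block) + block + b'\r\n'
    # zero-length chunk terminates the body
    yield b'0\r\n\r\n'


def frame_body(http_version, headers, body_iter):
    """Hypothetical helper: make sure the end of the body is unambiguous."""
    names = {name.lower() for name, _ in headers}
    if 'content-length' in names:
        # length already known, pass through unchanged
        return list(headers), body_iter

    if http_version == 'HTTP/1.1':
        # stream the body using chunked encoding
        return (list(headers) + [('Transfer-Encoding', 'chunked')],
                chunk_encode(body_iter))

    # HTTP/1.0 has no chunked encoding: buffer and compute the length
    body = b''.join(body_iter)
    return list(headers) + [('Content-Length', str(len(body)))], [body]
```

Either way the client can tell where the body ends without the connection being closed, which is what keeps a tunnelled HTTPS exchange intact.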
* Encoding: Use ``webencodings`` library to better encode head-insert to match page encoding
* Live Proxy: Support for explicit 'recording' mode, decoupled from using http/https proxy. (Previously, using a proxy implied recording.)
* Rewriting: Convert relative urls for ``rel=canonical`` to absolute urls, even if not rewriting, to ensure correct url.
* UI: Use custom webkit scrollbars to minimize scrollbar-in-iframe issues that sometimes occur in Chrome.
* Memento Improvements:
- Add /collinfo.json endpoint which by default returns a JSON spec for all collections as Memento endpoints, in a format compatible with MemGator.
- /collinfo.json endpoint customizable via ``templates/collinfo.json`` and must be enabled with ``enable_coll_info: true``
- 'Not Found' error for timemap query returns empty timemap instead of standard HTML 404.
* WARC Indexing:
- Better detection of content-length < payload, skip to next record boundary and warn, if possible.
- Use ujson if proper version (without forward-slash escaping) is available when writing CDXJ
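One way to detect a suitable ``ujson`` version is to test for the ``escape_forward_slashes`` option and fall back to the standard ``json`` module otherwise. The sketch below is an illustration under that assumption, not the indexer's actual code; ``pick_json_dumps`` is a hypothetical name.

```python
import json


def pick_json_dumps():
    """Return a dumps() callable that leaves '/' unescaped."""
    try:
        import ujson
    except ImportError:
        # ujson not installed: use the standard library
        return json.dumps

    try:
        # older ujson releases do not accept this keyword
        if ujson.dumps('/', escape_forward_slashes=False) == '"/"':
            return lambda obj: ujson.dumps(obj, escape_forward_slashes=False)
    except TypeError:
        pass

    return json.dumps


dumps = pick_json_dumps()
line = dumps({'url': 'http://example.com/'})
```

Leaving forward slashes unescaped keeps urls in the CDXJ output readable and identical regardless of which serializer was used.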
* IPProxyResolver: Support new simple proxy resolver where collection and timestamp are stored in a server-side cache by IP and set via a REST API through ``pywb.proxy``, eg: ``curl -x "localhost:8080" http://pywb.proxy/set?ts=2015&coll=all``. No cookies or proxy auth needed in this mode. Useful for Docker-based deployments where virtual IP is fixed. Enabled with ``cookie_resolver: ip`` in ``proxy_options``.
* CDX Server: Add support for timestamp-bounded CDX queries via ``from=`` and ``to=``, also support calendar query with (inclusive) ranges, eg. ``/2010-2015/example.com``, ``/2010-/example.com/``, ``/-2015/example.com/``.
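As a rough illustration of how a calendar range relates to the ``from=`` and ``to=`` parameters, the sketch below splits a range segment into its bounds and builds a CDX query string. The helper names are hypothetical, and the mapping is an assumption about the behaviour described here, not the server's actual routing code.

```python
from urllib.parse import urlencode


def parse_calendar_range(segment):
    """Split 'FROM-TO' into CDX bounds; either side may be empty."""
    start, sep, end = segment.partition('-')
    if not sep:
        raise ValueError('not a range: %r' % segment)

    bounds = {}
    if start:
        bounds['from'] = start
    if end:
        bounds['to'] = end
    return bounds


def cdx_query(url, segment):
    """Build a CDX query string for a url and a calendar range segment."""
    params = {'url': url}
    params.update(parse_calendar_range(segment))
    return urlencode(params)
```

For example, ``cdx_query('example.com', '2010-2015')`` yields ``url=example.com&from=2010&to=2015``, while an open-ended range such as ``2010-`` sets only ``from``.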
* Rewrite: ensure ``<base>`` tag has trailing slash, or add ``<base>`` with trailing slash for host-name only urls, eg: ``http://localhost:8080/example.com``
* Disable url rewriting in JS by default! No longer needed due to improved client side rewriting of all urls.
* wombat 2.7 more rewriting improvements:
- ``document.write`` override rewrites all elements, not just top-level elements.
- iframe ``srcdoc`` also rewritten.
- support for custom modifiers, such as ``js_`` for ``SCRIPT`` tag rewriting, otherwise for element overrides.
- improved css rewriting, override standard css attributes on ``CSSStyleDeclaration`` to avoid mutation observers, rewrite ``STYLE`` text content.
- ``postMessage``: original ``source`` window now also preserved along with origin.
- cookie rewrite: don't remove expires, but adjust by date offset. Allow cookies to be deleted by setting to expired date.
* Embed mode, pywb framed replay can now be embedded in an iframe when ``embeddable: True`` option is set. ``postMessage`` on framed replay proxies between replay frame and embedded frame, and ``window.parent`` is not set to top replay frame, allowing access to containing frame.
* vidrw: don't replace video with generic swf, find better match.
* path index loader: ensure each request handled by own file reader.
- Override JS prototype getters and setters on ``href`` and ``src`` attributes of standard HTML elements, so that JavaScript access receives and sets the original url, but the element actually contains the rewritten url internally.
- Improved ``postMessage`` emulation: Ensure the original ``origin`` of the caller is saved, by wrapping ``X.postMessage`` in a special ``X.__WB_pmw(window).postMessage()`` call which will save origin of current window in X. Store origin and destination hosts.
- Improved ``message`` listener emulation: Add filtering to skip messages that were not intended for destination host.
Can be disabled with ``no_match_rel=True`` in ``rewrite_opts``.
* Optional ``force_html_decl`` option to add a ``<!DOCTYPE>`` or other HTML declaration if none is present.
* Improved handling for ``redir_to_exact=False`` mode. When set, no redirect on memento timegate, and serve ``Content-Location`` headers for actual memento, in conformance with Memento RFC Pattern 2.2 (http://tools.ietf.org/html/rfc7089#section-4.2.2)
* Proxy Mode Fixes: Ensure ``Content-Length`` header is always added and correct in proxy mode, needed for proper HTTPS
* s3 loading: support ``s3://`` scheme in block loader, allowing for loading index and archive files from s3. ``boto`` library must be installed separately
via ``pip install boto``. Attempt default boto auth path, and if that fails, attempt anonymous s3 connection.
* Embedding improvements: If set, the contents of ``environ['pywb.template_params']`` dictionary are added directly to Jinja context, allowing for custom template
* Root collection support: Can specify a route with ``''`` which will be the root collection. Fix routing paths to ensure root collection is checked last.
* Customization: support custom route_class for cdx server and pass wbrequest to ``not_found_html`` error handlers.
* Manager: Validate collection names to start with word char and contain alphanum or dash only.
More details at: `Auto-Configuration and Wayback Collections Manager <https://github.com/ikreymer/pywb/wiki/Auto-Configuration-and-Wayback-Collections-Manager>`_
* Support for user metadata via per-collection ``metadata.yaml``
* Templates: improved/simplified home page and collection search page, show user metadata by default.
* Support for writing and reading new cdx JSON format (.cdxj), with searchable key followed by json dictionary: ``urlkey timestamp { ... }`` on each line
* ``cdx-indexer -j``: support for generating cdxj format
* ``cdx-indexer -mj``: support for minimal cdx format (in JSON format) only which skips reading the HTTP record.
Fields included in minimal format are: urlkey, timestamp, original url, record length, digest, offset, and filename
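A line in this layout can be read by splitting off the two leading fields and decoding the remainder as JSON. The following is a minimal sketch; ``parse_cdxj_line`` is a hypothetical helper, and the keys in the sample JSON dictionary are illustrative assumptions rather than the exact field names the indexer writes.

```python
import json


def parse_cdxj_line(line):
    """Split a 'urlkey timestamp { ... }' line into its three parts."""
    # only the first two spaces are separators; the JSON may contain spaces
    urlkey, timestamp, json_part = line.rstrip('\n').split(' ', 2)
    return urlkey, timestamp, json.loads(json_part)


# sample line; the JSON keys here are assumed for illustration
sample = ('com,example)/ 20140101000000 '
          '{"url": "http://example.com/", "offset": "333"}\n')

urlkey, timestamp, fields = parse_cdxj_line(sample)
```

Because the searchable key and timestamp stay outside the JSON dictionary, the file can still be sorted and searched line by line like a classic cdx file.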
* rewrite: fix for redirect loop related to pages with 'www.' prefix. Since canonicalization removes the prefix, treat redirect to 'www.' as self-redirect (for now).
* memento: ensure rel=memento url matches timegate redirect exactly (urls may differ due to canonicalization, use actual instead of requested for both)
- via head-insert, the exact request timestamp is provided as ``wbinfo.request_ts`` and accessible to the banner insert or the top frame when in framed mode.
* new not found Jinja2 template: Add per-collection-overridable ``not_found.html`` template, specified via ``not_found_html`` option. For missing resources, the ``not_found_html`` template is now used instead of the generic ``error_html``
* JS Rewriters: add mixins for link + location (default), link only, location only rewriting by setting ``js_rewrite_location`` to ``all``, ``urls``, ``location``, respectively.
(New: location only rewriting does not change JS urls)
* Minor fixes for extensibility and support https://webrecorder.io, easier to override any request (handle_request), handle_replay or handle_query via WBHandler
* Invert framed replay paradigm: Canonical page is always without a modifier (instead of with ``mp_``), if using frames, the page redirects to ``tf_``, and uses replaceState() to change url back to canonical form.
* Easier to customize just the banner html, via ``banner_html`` setting in the config. Default banner uses ui/banner.html and inserts the script default_banner.js, which creates the banner.
* Improved cookie and csrf-token rewriting, including: ability to set ``cookie_scope: root`` per collection to have all replayed cookies have their Path set to application root.
* better framed replay for non-html content -- include live rewrite timestamp via temp 'pywb.timestamp' cookie, updating banner of iframe load. All timestamp formatting moved to client-side for better customization.
* Support for a fallback handler which will be called from a replay handler instead of a 404 response.
The handler, specified via the ``fallback`` option, can be the name of any other replay handler. Typically, it can be used with a live rewrite handler to fetch missing content from live instead of showing a 404.
* ``live-rewrite-server`` has optional ``--proxy host:port`` param to specify loading live web data through an HTTP/S proxy, such as for use with a recording proxy.
* Tests: Additional testing of bad cdx lines, missing revisit records.
* Rewrite: Removal of lxml support for now, as it leads to problematic replay and not much performance improvement.
* Rewrite: Parsing of html as raw bytes instead of decode/encode, detection still needed for non-ascii compatible encoding.
* Indexing: Refactoring of cdx-indexer using a separate 'archive record iterator' and pluggable cdx writer classes. Groundwork for creating custom indexers.
* Indexing: Support for 9 field cdx formats with -9 flag.
* Rewrite: Improved top -> WB_wombat_top rewriting.
* Rewrite: Better handling of framed replay url notification
* Support for framed or non-framed mode replay, toggleable via the ``framed_replay`` flag in the config.yaml
* Cookie rewriter: remove Max-Age to ensure session expiry instead of long-term cookie (experimental).
* Live Rewrite: proxy all headers, instead of a whitelist.
* Fixes to ``<base>`` tag handling, now correctly rewriting remainder of urls with the set base.
* ``cdx-indexer`` options for resolving POST requests, and indexing request records. (``-p`` and ``-a``)
* Improved `POST request replay <https://github.com/ikreymer/pywb/wiki/POST-request-replay>`_, allowing for improved replay of many captures relying on POST requests.
* Cookie Rewriting in Archival Mode: HTTP Set-Cookie header rewritten to remove Expires, rewrite Path and Domain. If Domain is used, Path is set to / to ensure cookie is visible from all archival urls.
* Much improved handling of chunk encoded responses, better handling of zero-length chunks and fix bug where not enough gzip data was read for a full chunk to be decoded. Support for chunk-decoding w/o gzip decompression
* Further rewrite of wombat.js: support for window.open, postMessage overrides, additional rewriting at Node creation time, better hash change detection.
* Support for optional LXML html-based parser for fastest possible parsing. If lxml is installed on the system via ``pip install lxml``, lxml parser is enabled by default.
(This can be turned off by setting ``use_lxml_parser: false`` in the config)
* Full support for `Memento Protocol RFC7089 <http://www.mementoweb.org/guide/rfc/>`_ Memento, TimeGate and TimeMaps. Memento: TimeMaps in ``application/link-format`` provided via the ``/timemap/*/`` query, eg: http://localhost:8080/pywb/timemap/\*/http://example.com
* pywb now features new `domain-specific rules <https://github.com/ikreymer/pywb/blob/master/pywb/rules.yaml>`_ which are applied to resolve and render certain difficult and dynamic content, in order to make accurate web replay work.
This ruleset will be under further iteration to address further challenges as the web evolves.