pywb 2.6.0b3 changelist
~~~~~~~~~~~~~~~~~~~~~~~

* Display 'ignoring locales' warning only if locales specified (don't specify any by default)

* Add -V flag to wb-manager and pywb/wayback commands to display current version and exit


pywb 2.6.0b2 changelist
~~~~~~~~~~~~~~~~~~~~~~~

* Update documentation for CDX Server API (by @sebastian-nagel) `#651 `_

Localization fixes: `#653 `_

* Ensure banner template is not autoescaped

* Don't show locale switch on not found pages (redundant with banner)

* Ensure wb-manager works when optional i18n dependencies are not installed


pywb 2.6.0b1 changelist
~~~~~~~~~~~~~~~~~~~~~~~

Additional documentation / localization fixes `#650 `_

* Ensure the home page and error page keep the locale and that language switching works.

* Add autoescaping to Jinja2 to avoid XSS issues (suggested by @sebastian-nagel)

* Add support for 'pywb[i18n]' extra to install localization dependencies

Documentation typo fixes (by @ldko, `#649 `_)


pywb 2.6.0b0 changelist
~~~~~~~~~~~~~~~~~~~~~~~

Documentation Updates:

* `Embargo + ACL system updates `_

* `New ACL header configuration `_

* `Localization / Multi-lingual Support Guide `_

Localization Improvements: (`#647 `_)

* Support for extracting, updating, listing and removing localizable commands via ``wb-manager i18n`` command.

* UI: Add language switch header to all UI templates.

* Mark localizable strings as translatable in existing templates.

Access Control Improvements:

* Support for Embargo System for date-based embargo, overridable via ACL ``allow_ignore_embargo`` `#642 `_

* Support for custom ACL 'user' specified via ``X-pywb-ACL-User`` header passed from frontend proxies.

* Fixes for exact rule matching `#629 `_

* Fixes for ACL for auto-collections `#620 `_

Rewriting Improvements:

* Updated YT rewriting rules `#635 `_

* POST-to-GET rewriting consistent with cdxj-indexer, wabac.js/replayweb.page `#636 `_

* Improved fuzzy matching to ensure non-POST requests handled via fuzzy matching.
* Live web: never truncate when reading POST request to avoid hung requests! (Apply limit only on indexing)

CDX Server / API Compatibility Fixes:

* XmlQuery: set WARC record length field, if available `#633 `_

* ZipNum: Don't count pages with filter `#631 `_

* Better handling of CDX Server HTTP status `#624 `_

* Better handling of errors from CDX Server API with 400 `#623 `_, `#625 `_, `#626 `_, `#630 `_

* Backwards compatibility of ``fl`` param `#621 `_

Recording Redis Dedup mode:

* Fix dedup index config loading `#617 `_

* Add recording size counter to track any in-flight requests `#637 `_


pywb 2.5.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Update to latest wombat.js (3.0.3)

* Dedup Mode: Support for Redis-based dedup index to skip or write revisit records for duplicates, replay from Redis-based index `#597 `_, `#611 `_

* Rewriting: Updated rules for YouTube and Vimeo replay `#610 `_

* CDX Indexing: More efficient cdx sorting `#609 `_

* Set default CDX closest lookup limit to 100 instead of 10 `#606 `_

* UI: Try to avoid css class conflicts in injected banner `#604 `_

* Catch invalid headers in uWSGI `#603 `_

* Config option to support certificate validation when capturing `#596 `_

* Fix indexing POST requests with multipart/form-data without boundary `#599 `_

* New OpenWayback->pywb Transition Guide: `https://pywb.readthedocs.io/en/latest/manual/owb-transition.html `_

* Sample deployments with Docker Compose for running with Apache, Nginx and OutbackCDX in ``sample-deploy`` directory.
* Update to latest gevent to fix issues with latest python `#583 `_


pywb 2.4.2 changelist
~~~~~~~~~~~~~~~~~~~~~

* ensure RemoteCDXIndexSource also passes ``matchType`` to upstream

* cdx-indexer: use ``-o`` flag to specify output, not first param (output to stdout by default)

* static paths cleanup, move ``url-polyfill.min.js`` to correct dir (fixes `#571 `_)

* minor fixes to docs

* logo: resize new logo to actual size, add logo via absolute link to ensure it works on pypi also


pywb 2.4.1 changelist
~~~~~~~~~~~~~~~~~~~~~

* Minor fix: allow timegate content check in `#564 `_ to be ignored (for use with derived classes)


pywb 2.4.0 changelist
~~~~~~~~~~~~~~~~~~~~~

This release includes significant updates, specifically the merge of the https://github.com/ukwa/pywb branch.

A few selected improvements:

* New Access Control System: https://pywb.readthedocs.io/en/latest/manual/access-control.html

* Support for Localization, configuring multiple languages (not enabled by default): https://github.com/ukwa/ukwa-pywb/blob/master/docs/localization.md

* Support for OpenWayback-style XML-based index source (xmlquery)

* Support for loading from WebHDFS via ``webhdfs://`` scheme.

* Initial support for a new embeds/transclusions replay system, in combination with warcit: https://github.com/webrecorder/warcit/wiki/Warcit-Video-Audio-Conversion

* Proxy mode improvements: handle OPTIONS requests and CORS `#520 `_

* Memento Prefer header: support for experimental ``Prefer`` header to select 'raw' or 'rewritten' memento

* Other memento fixes: fix timemap including invalid mementos, correct timegate behavior on top frame `#564 `_

* Fixes for collection metadata display: `#509 `_

* Fix for incorrect WARC record length due to re-serialized headers: `#561 `_

* Filter invalid WARC records `#536 `_

* Updated fuzzy matching rules and wombat client-side rewriting.


For the full changelist, see this PR: `#565 `_

* Access Control System


pywb 2.3.5 changelist
~~~~~~~~~~~~~~~~~~~~~

* General auto-fetch fixes (#503)

  - Fixed issue that caused HTTP 404 errors to happen when parsing stylesheet hrefs as sheets (webrecorder/wombat #11)
  - Ensured that requests made are cached by the browser (webrecorder/wombat #13 & #15)
  - Ensured that requests made by the backing web worker in proxy mode are not blocked by CORS (webrecorder/wombat #13 & #15)

* SOCKS proxy fixes (#504)

  - simplify SOCKS config (avoiding global socket monkey patch), default to no cert verify to match non-proxy behavior
  - SOCKS proxy can be disabled dynamically by setting SOCKS_DISABLE


pywb 2.3.4 changelist
~~~~~~~~~~~~~~~~~~~~~

* Improvements to auto-fetch to support page fetch (webrecorder/wombat#5, #497)

  - Support fetching page with ``X-Wombat-History-Page`` and title ``X-Wombat-History-Title`` headers present.
  - Attempt to extract title and pass along with cdx to ``_add_history_page()`` callback in RewriterApp, to indicate a url is a page. (#498)
  - General auto-fetch fixes: queue messages if worker not yet inited (in proxy mode), only parse stylesheet hrefs as sheets.
* Cookie Rewriting Fix: don't update cookie cache on service worker (``sw_`` modifier) responses (#499)

* Rewriting: HTML Unescape Fix: Attempt to HTML-entity-decode urls and inline styles that contain ``&#`` to get correct rewriting of encoded urls (#500)


pywb 2.3.3 changelist
~~~~~~~~~~~~~~~~~~~~~

* Proxy Mode: Ensure head insert added even if no ```` tag, insert after first tag that is not ```` or ```` (#496)


pywb 2.3.2 changelist
~~~~~~~~~~~~~~~~~~~~~

* Eval rewriting fix: don't rewrite ``$eval``, only ``eval`` identifier (#493)

* Cookie rewriting improvements: (#491)

  - Enable domain cookie cache for live index and recording modes using fakeredis, previously only available in Webrecorder
  - Don't add duplicate cookies to Set-Cookie or Cookie headers
  - Don't include cached Set-Cookie headers to serviceworkers for non-200 responses.
  - Add cookies for ``sw_/`` and ``wkrf_`` modifiers
  - Testing: add initial testing for domain cookie rewriting

* Misc fixes: (#490)

  - Ensure SCRIPT_NAME never empty (#490)
  - Static Paths: load ``/index.html`` for paths ending in ``/``, ensure static_prefix always inited correctly
  - Docker: switch to designated $VOLUME_DIR before initializing
  - Rules: update rules for soundcloud


pywb 2.3.1 changelist
~~~~~~~~~~~~~~~~~~~~~

* Fix regression in wombat: the new window.parent override from (webrecorder/wombat#2) was throwing an exception if top-frame was cross-origin (webrecorder/wombat#3)

* Update to latest wombat, v3.0.0


pywb 2.3.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Wombat Improvements and modularization:

  - Client-side rewriting and auto-fetch systems moved to https://github.com/webrecorder/wombat
  - Module-based setup and full testing for wombat
  - Continuous auto-fetch up to 20 requests (#484)

* Replay / Fidelity Improvements (#451):

  - Introduced a new server-side rewriter, JSWorkerRewriter, that handles rewriting JS workers and service-workers
  - Improvements to JSONP Rewriter to handle empty query (#475)
  - Improvements to postMessage rewriting,
    override ``eval(`` while preserving scope (#475)
  - Fixes to ``this`` proxy rewrite to include ``, this``

* Misc Changes:

  - Versioning: switched back to semver to more easily keep track of versions (#488)
  - Improved handling of open http connections and file handles (#463)
  - Fixes for latest urllib3, not verifying SSL certs (#467), (#469)
  - Better logging for invalid cdxlines and cookies (#477), (#478)
  - Fix warning in yaml.load (#472)
  - Index invalid form-data as binary (#471)


pywb 2.2.20190410 changelist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Improved rewriting of JSONP, support matching JSONP with ``//`` comments (fixes #459)


pywb 2.2.20190311 changelist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Support for setting timestamp in proxy mode via ``--proxy-default-timestamp`` (fixes #452)

* Remove any ``WB_wombat_`` found in POST requests from old versions of pywb.

* Fixes new query UI when loading traditional calendar ``/*/`` pages (#455, #456)


pywb 2.2.x changelist
~~~~~~~~~~~~~~~~~~~~~

* New Versioning System: (#445)

  - Switching to hybrid semantic / calendar ``major.minor.yyyymmdd`` versioning.
  - The ``major.minor`` version will be updated for larger changes.
  - The ``.yyyymmdd`` date component will be updated for smaller incremental releases, for fidelity improvements and smaller bug fixes.
* Auto Fetch System:

  - Added ``picture > source[srcset]`` extraction and increased the robustness of relative srcset URL resolution (#415)
  - Enabled auto-fetching of video, audio resources (#427)
  - Exposed AutoFetchWorker API in proxy mode to allow external JS to initiate checks (#389)

* Build / CI Improvements:

  - Tweaked usage of wr-tests in CI (#431)
  - Ensured that usage of XVFB works on travis.ci (#436)
  - Updated Docker image to support
  - Python 3.7 support and CI testing (#447)

* Docker:

  - Updated Docker image to Python 3.7.2, match docker user uid/gid to that of existing volume (#446)
  - Add documentation for using Docker image and automated images (#448)

* Fuzzy Matching:

  - Added an additional Facebook rule targeting timeline replay (#440)

* Memento:

  - Fixed regression in FrontendApp when handling TimeMap requests (#423)

* Recording:

  - Remove Transfer-Encoding from internal response (#437)
  - If brotli decoding package can't be loaded, remove ``br`` from ``Accept-Encoding`` header (#444)

* Replay / Fidelity Improvements:

  - Wombat now uses the actual page scheme instead of defaulting to http when extracting the original url (#404)
  - Improved URL rewriting in web workers (#420)
  - Improved replay of content coming from a frameset's frame (#438)
  - Updated rules for facebook (#440)
  - Introduced new banner behavior and ensured that banner does not become stuck displaying "Loading..." (#418)

* Server-Side Rewriting:

  - Improved the rewriting process of HTTP headers that are encoded in the non-standard ``UTF-8`` encoding (#402)
  - Improved the JavaScript rewriter's rewrites of the ``location`` symbol in order to avoid rewriting ``$location`` (#403)
  - Added an additional check of ``text/html`` content to ensure that it is actually ``html`` (#428)
  - Fixed HTML detection for UTF-8 files starting with BOM (#441)
  - Fixed parsing of invalid conditional comments, e.g.
    treat '' as '' (#441)

* UI:

  - New Query UI with support for prefix queries, forms for advanced search via cdx server api, incremental results loading (#421)


pywb 2.1.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Replay Fidelity Improvements:

  - Improved wombat web worker rewriting overrides, use custom modifier ``wkr_`` (#351)
  - Added checks to wombat that preserve the behavior of non-wombat added polyfills to native functions (#350)
  - Framed replay: Ensured the page title and favicon are displayed in the top-frame (#356, #369)
  - Improved replay of requests sent as ``text/html`` that are actually ``application/json`` (#367)
  - Added replay of compressed resources by forcing decompression if the UA did not indicate it could handle the resource's encoding (#372)
  - Added ``window.origin``, and ``setTimeout``, ``setInterval`` overrides to wombat to handle the non-function callback case (#381)
  - Added ``CSSStyleSheet.insertRule`` and ``Text`` overrides to wombat to improve rewriting of dynamically added or modified CSS (#382)
  - Remove extra ``window.frames`` override to avoid extra override if ``window.frames === window`` (#383)
  - Wombat inited via ``window._WBWombatInit(wbinfo);``, allows for reinit if inited 'synthetically' and not from the page html insert (#383)
  - Added ``document.evaluate`` override in order to deproxy the context node (#385)
  - Optimized argument de-proxying in wombat (#385)
  - Improved iframe srcdoc rewriting in wombat (#386)
  - Improved rewriting strings of full HTML by making the check case insensitive and looking for `` and