pywb 2.7.3 changelist
~~~~~~~~~~~~~~~~~~~~~

* issue_792 catch warcio exception by @oskarhek in https://github.com/webrecorder/pywb/pull/793
* Add ui.logo_home_url as config.yaml option by @tw4l in https://github.com/webrecorder/pywb/pull/791
* [#795] Show error when adding duplicate warc file by @kuechensofa in https://github.com/webrecorder/pywb/pull/797
* Make search page more intuitive by @krakan in https://github.com/webrecorder/pywb/pull/794
* Modify search template buttons by @tw4l in https://github.com/webrecorder/pywb/pull/801
* [#804] Use default_locale when lang not set in the request by @krakan in https://github.com/webrecorder/pywb/pull/805
* feat: regex substitution on surt rules match by @mijho in https://github.com/webrecorder/pywb/pull/780
* Bump minimatch from 3.0.4 to 3.1.2 in /pywb/vueui by @dependabot in https://github.com/webrecorder/pywb/pull/777
* Bump decode-uri-component from 0.2.0 to 0.2.2 in /pywb/vueui by @dependabot in https://github.com/webrecorder/pywb/pull/786
* rules: add 'debugNoBatch' rewrite for fb and insta by @ikreymer in https://github.com/webrecorder/pywb/pull/806
* Vue main order by @tw4l in https://github.com/webrecorder/pywb/pull/809
* wombat: bump to 3.4.4 https://github.com/webrecorder/pywb/pull/808


pywb 2.7.2 changelist
~~~~~~~~~~~~~~~~~~~~~

* Fix regression introduced by improper wombat update in 2.7.1
* Fix `redirect_to_exact: false` functionality: if not set, UI will stay on current timestamp, but will display info on actual capture.
* Location bar nav now keeps current timestamp instead of defaulting to calendar view.
* 'Live' mode fixes, no longer cache live cdx entry, don't add timestamp when navigating in live mode without timestamp
* Calendar dropdown on replay now scrollable.
* Timeline toggle on replay is 'sticky', will stay on if toggled on replay.
* Capture text: use '|' as in 'Current Capture: [title] | [capture date]'
* Document title: Add 'Archived Page: ' prefix to avoid confusion with live pages.

pywb 2.7.1 changelist
~~~~~~~~~~~~~~~~~~~~~

* Add locale-dependent handling of first day of week by @krakan in https://github.com/webrecorder/pywb/pull/781
* Make filter expressions translatable by @krakan in https://github.com/webrecorder/pywb/pull/783
* Add title to top frame in framed replay
* Add missing tooltip translation strings
* Fix calendar and timeline rendering for replay URLs without a timestamp
* Update template documentation


pywb 2.7.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* New banner and calendar implementation in Vue.js, which supports localization/internationalization and easier local theming by @vanecat @ikreymer @tw4l with helpful feedback from @ldko
* New interactive timeline to assist in navigating between captures
* Add basic development Docker Compose configuration file
* Update documentation
* Add contributing guide


pywb 2.6.9 changelist
~~~~~~~~~~~~~~~~~~~~~

* eval rewrite update + latest wombat by @ikreymer in https://github.com/webrecorder/pywb/pull/763
* Rewrite: Support target rewriting, open new windows in top-frame instead by @tw4l in https://github.com/webrecorder/pywb/pull/767
* Add arm64 platform support by @luandro in https://github.com/webrecorder/pywb/pull/775
* Add uwsgi virtualenv information by @tw4l in https://github.com/webrecorder/pywb/pull/770
* update to wombat 3.3.11 to support additional replay improvements
* automated pypi publish on release https://github.com/webrecorder/pywb/pull/776


pywb 2.6.8 changelist
~~~~~~~~~~~~~~~~~~~~~

* Upgrade webassets to v2.0 by @m4rk3r in https://github.com/webrecorder/pywb/pull/730
* Encoding image 'srcset' value including the intrinsic width by @yasarkunduz in https://github.com/webrecorder/pywb/pull/712
* Prevent jinja2 from escaping HTML markup in collection metadata by @tw4l in https://github.com/webrecorder/pywb/pull/747
* Increase uwsgi_buffer_size for nginx config by @edsu in https://github.com/webrecorder/pywb/pull/716
* Add missing translation for the filter-expression field
  placeholder by @krakan in https://github.com/webrecorder/pywb/pull/721
* Activate field validation when expanding the advanced options by @krakan in https://github.com/webrecorder/pywb/pull/722
* S3 loader to use boto3 built-in credential configuration by @sebastian-nagel in https://github.com/webrecorder/pywb/pull/723
* describing installation using pip by @sepastian in https://github.com/webrecorder/pywb/pull/726
* Add missing org/image to docker run commands by @heyvito in https://github.com/webrecorder/pywb/pull/733
* Format error messages by @edsu in https://github.com/webrecorder/pywb/pull/737
* Ensure CDX status is a string by @edsu in https://github.com/webrecorder/pywb/pull/739
* Improve replay banner's accessibility by @lwrubel in https://github.com/webrecorder/pywb/pull/742
* Revisit headers load fix by @ikreymer in https://github.com/webrecorder/pywb/pull/751
* Enable translation for the remaining strings on the search results page by @krakan in https://github.com/webrecorder/pywb/pull/752
* revisit of redirect optimization: by @ikreymer in https://github.com/webrecorder/pywb/pull/753
* proxy: add COEP header for proxy mode to avoid errors by @ikreymer in https://github.com/webrecorder/pywb/pull/755
* tests run improvements: update from python setup.py test -> tox by @ikreymer in https://github.com/webrecorder/pywb/pull/754
* rewrite: detect edge-case where html starts with bom followed by @ikreymer in https://github.com/webrecorder/pywb/pull/758
* tests options: add PYWB_NO_VERIFY_SSL env var for tests to avoid fail… by @ikreymer in https://github.com/webrecorder/pywb/pull/760
* rewriting fix: twitter video in embedded tweets by @ikreymer in https://github.com/webrecorder/pywb/pull/761
* Add ir_ modifier by @ikreymer in https://github.com/webrecorder/pywb/pull/759
* Remove unused Appveyor badge


pywb 2.6.7 changelist
~~~~~~~~~~~~~~~~~~~~~

* dependency: bump gevent to latest (21.12.0)
* rewrite: fix eval rewriting where '._eval' was accidentally being
  rewritten
* post-to-get conversion: properly handle json with top-level lists, to match cdxj-indexer, print parse errors, fixes #709


pywb 2.6.6 changelist
~~~~~~~~~~~~~~~~~~~~~

* dependency: don't use obsolete werkzeug useragent package #704
* fix user-agent detection: use ua-parser module, default to new js-proxy mode, unless older browser detected #707
* fix tests: disable broken s3 tests for now
* Dockerfile: use python 3.8 by default


pywb 2.6.5 changelist
~~~~~~~~~~~~~~~~~~~~~

* fix build: add 'markupsafe<2.1.0' to requirements


pywb 2.6.4 changelist
~~~~~~~~~~~~~~~~~~~~~

* wombat.js: actually update to 3.3.6, update built wombat.js
* Fix live mode when ``redirect_to_exact`` is enabled #692
* Rules: additional fuzzy ignore of facebook query param: #691
* Docs: typo fixes: #669, #670


pywb 2.6.3 changelist
~~~~~~~~~~~~~~~~~~~~~

* Fix false-positive rewriting of ``location`` through additional check if local var is used, fixes #684
* Fix missing localization of placeholder, fixes #685
* Fix regression caused by 2.6.2, ensure pywb.app_prefix, pywb.host_prefix and pywb.static_prefix paths set correctly for all pages #688, fixes #686
* Documentation: Fixes to ``cdx-indexer`` helped (from @ldko) #683
* Update wombat.js to 3.3.6
* Add automatic Docker push on new GitHub release


pywb 2.6.2 changelist
~~~~~~~~~~~~~~~~~~~~~

Fix regression caused by 2.6.1, with static files not being loaded correctly. #678


pywb 2.6.1 changelist
~~~~~~~~~~~~~~~~~~~~~

* Domain-Specific Rewriting Rules: Rewrite twitter video to capture full videos.
* Disable rewriting ``data-`` attributes, better fidelity without rewriting, fixes #676
* Fix regression in autoescaping URL in frame_insert.html
* Feature: ability to set path used to serve static assets (default ``static``) via ``static_prefix`` config option.
* Update wombat.js 3.3.4 (includes various rewriting fixes)


pywb 2.6.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Improvements for eval() rewriting + extra unnamed scope to avoid variable collision #668
* fix for documentation links #666
* Update to latest wombat.js (3.3.0)


pywb 2.6.0b4 changelist
~~~~~~~~~~~~~~~~~~~~~~~

* Update rules for IG rewriting to disable Dash #662
* Support for adding custom resource records via PUT ``//record`` #661
* Fixes for URL encoding for query and remote index #657 and #658
* Doc fixes for incorrect param name #651
* Update to latest wombat.js (3.2.2)


pywb 2.6.0b3 changelist
~~~~~~~~~~~~~~~~~~~~~~~

* Display 'ignoring locales' warning only if locales specified (don't specify any by default)
* Add -V flag to wb-manager and pywb/wayback commands to display current version and exit


pywb 2.6.0b2 changelist
~~~~~~~~~~~~~~~~~~~~~~~

* Update documentation for CDX Server API (by @sebastian-nagel) #651

Localization fixes: #653

* Ensure banner template is not autoescaped
* Don't show locale switch on not found pages (redundant with banner)
* Ensure wb-manager works when optional i18n dependencies are not installed


pywb 2.6.0b1 changelist
~~~~~~~~~~~~~~~~~~~~~~~

Additional documentation / localization fixes #650

* Ensure home page and error page keeps locale, language switching is working.
* Add autoescaping to Jinja2 to avoid XSS issues (suggested by @sebastian-nagel)
* Add support for 'pywb[i18n]' extra to install localization dependencies

Documentation typo fixes (by @ldko, #649)


pywb 2.6.0b0 changelist
~~~~~~~~~~~~~~~~~~~~~~~

Documentation Updates:

* Embargo + ACL system updates
* New ACL header configuration
* Localization / Multi-lingual Support Guide

Localization Improvements: (#647)

* Support for extracting, updating, listing and removing localizable commands via ``wb-manager i18n`` command.
* UI: Add language switch header to all UI templates.
* Mark localizable strings as translatable in existing templates.

Access Control Improvements:

* Support for Embargo System for date-based embargo, overridable via ACL ``allow_ignore_embargo`` #642
* Support for custom ACL 'user' specified via ``X-pywb-ACL-User`` header passed from frontend proxies.
* Fixes for exact rule matching #629
* Fixes for ACL for auto-collections #620

Rewriting Improvements:

* Updated YT rewriting rules #635
* POST-to-get rewriting consistent with cdxj-indexer, wabac.js/replayweb.page #636
* Improved fuzzy matching to ensure non-POST requests handled via fuzzy matching.
* Live web: never truncate when reading POST request to avoid hung requests! (Apply limit only on indexing)

CDX Server / API Compatibility Fixes:

* XmlQuery: set WARC record length field, if available #633
* ZipNum: Don't count pages with filter #631
* Better handle of CDX Server HTTP status #624
* Better handling of errors from CDX Server API with 400 #623, #625, #626, #630
* Backwards compatibility of ``fl`` param #621

Recording Redis Dedup mode:

* Fix dedup index config loading #617
* Add recording size counter to track any in-flight requests #637


pywb 2.5.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Update to latest wombat.js (3.0.3)
* Dedup Mode: Support for Redis-based dedup index to skip or write revisit records for duplicates, replay from Redis-based index #597, #611
* Rewriting: Updated Rules for youtube and vimeo replay #610
* CDX Indexing: More efficient cdx sorting #609
* Set default CDX closest lookup limit to 100 instead of 10 #606
* UI: Try to avoid css class conflicts in injected banner #604
* Catch invalid headers in uWSGI #603
* Config option to support certificate validation when capturing #596
* Fix indexing POST requests with multipart/form-data without boundary #599
* New OpenWayback->pywb Transition Guide:
  https://pywb.readthedocs.io/en/latest/manual/owb-transition.html
* Sample deployments with Docker Compose for running with Apache, Nginx and OutbackCDX in ``sample-deploy`` directory.
* Update to latest gevent to fix issues with latest python #583


pywb 2.4.2 changelist
~~~~~~~~~~~~~~~~~~~~~

* ensure RemoteCDXIndexSource also passes ``matchType`` to upstream
* cdx-indexer: use ``-o`` flag to specify output, not first param (output to stdout by default)
* static paths cleanup, move ``url-polyfill.min.js`` to correct dir (fixes #571)
* minor fixes to docs
* logo: resize new logo to actual size, add logo via absolute link to ensure it works on pypi also


pywb 2.4.1 changelist
~~~~~~~~~~~~~~~~~~~~~

* Minor fix: allow timegate content check in #564 to be ignored (for use with derived classes)


pywb 2.4.0 changelist
~~~~~~~~~~~~~~~~~~~~~

This release includes significant updates, specifically the merging of the https://github.com/ukwa/pywb branch.

A few selected improvements:

* New Access Control System: https://pywb.readthedocs.io/en/latest/manual/access-control.html
* Support for Localization, configuring multiple languages (not enabled by default): https://github.com/ukwa/ukwa-pywb/blob/master/docs/localization.md
* Support for OpenWayback-style XML-based index source (xmlquery)
* Support for loading from WebHDFS via `webhdfs://` scheme.
* Initial support for a new embeds/transclusions replay system, in combination with warcit: https://github.com/webrecorder/warcit/wiki/Warcit-Video-Audio-Conversion
* Proxy mode improvements: handle OPTIONS requests and CORS #520
* Memento Prefer header: support for experimental `Prefer` header to select 'raw' or 'rewritten' memento
* Other memento fixes: fix timemap including invalid mementos, correct timegate behavior on top frame #564
* Fixes for collection metadata display: #509
* Fix for incorrect WARC record length due to re-serialized headers: #561
* Filter invalid WARC records #536
* Updated fuzzy matching rules and wombat client-side rewriting.

For the full changelist, see this PR: #565

* Access Control System


pywb 2.3.5 changelist
~~~~~~~~~~~~~~~~~~~~~

* General auto-fetch fixes (#503)

  - Fixed issue that caused HTTP 404 errors to happen when parsing stylesheet hrefs as sheets (webrecorder/wombat #11)
  - Ensured that requests made are cached by the browser (webrecorder/wombat #13 & #15)
  - Ensured that the request made by the backing web worker when in proxy mode are not blocked by CORS (webrecorder/wombat #13 & #15)

* SOCKS proxy fixes (#504)

  - simplify SOCKS config (avoiding global socket monkey patch), default to no cert verify to match non-proxy behavior
  - SOCKS proxy can be disabled dynamically by setting SOCKS_DISABLE


pywb 2.3.4 changelist
~~~~~~~~~~~~~~~~~~~~~

* Improvements to auto-fetch to support page fetch (webrecorder/wombat#5, #497)

  - Support fetching page with ``X-Wombat-History-Page`` and title ``X-Wombat-History-Title`` headers present.
  - Attempt to extract title and pass along with cdx to ``_add_history_page()`` callback in RewriterApp, to indicate a url is a page. (#498)
  - General auto-fetch fixes: queue messages if worker not yet inited (in proxy mode), only parse stylesheet hrefs as sheets.

* Cookie Rewriting Fix: don't update cookie cache on service worker (``sw_`` modifier) responses (#499)
* Rewriting: HTML Unescape Fix: Attempt to HTML-entity-decode urls and inline styles that contain ``&#`` to get correct rewriting of encoded urls (#500)


pywb 2.3.3 changelist
~~~~~~~~~~~~~~~~~~~~~

* Proxy Mode: Ensure head insert added even if no ```` tag, insert after first tag that is not ```` or ```` (#496)


pywb 2.3.2 changelist
~~~~~~~~~~~~~~~~~~~~~

* Eval rewriting fix: don't rewrite ``$eval``, only ``eval`` identifier (#493)

* Cookie rewriting improvements: (#491)

  - Enable domain cookie cache for live index and recording modes using fakeredis, previously only available in Webrecorder
  - Don't add duplicate cookies to Set-Cookie or Cookie headers
  - Don't include cached Set-Cookie headers to serviceworkers for non-200 responses.
  - Add cookies for ``sw_/`` and ``wkrf_`` modifiers
  - Testing: add initial testing for domain cookie rewriting

* Misc fixes: (#490)

  - Ensure SCRIPT_NAME never empty (#490)
  - Static Paths: load ``/index.html`` for paths ending in ``/``, ensure static_prefix always inited correctly
  - Docker: switch to designated $VOLUME_DIR before initializing
  - Rules: update rules for soundcloud


pywb 2.3.1 changelist
~~~~~~~~~~~~~~~~~~~~~

* Fix regression in wombat, new window.parent override from (webrecorder/wombat#2) was throwing exception if top-frame was cross-origin (webrecorder/wombat#3)
* Update to latest wombat, v3.0.0


pywb 2.3.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Wombat Improvements and modularization:

  - Client-side rewriting and auto-fetch systems moved to https://github.com/webrecorder/wombat
  - Module-based setup and full testing for wombat
  - Continuous auto-fetch up to 20 requests (#484)

* Replay / Fidelity Improvements (#451):

  - Introduced a new server-side rewriter, JSWorkerRewriter, that handles rewriting JS workers and service-workers
  - Improvements to JSOP Rewriter to handle empty query (#475)
  - Improvements to postMessage rewriting,
    override `eval(` while preserving scope (#475)
  - Fixes to ``this`` proxy rewrite to include ``, this``

* Misc Changes:

  - Versioning: switched back to semver to more easily keep track of versions (#488)
  - Improved handling of open http connections and file handles (#463)
  - Fixes for latest urllib3, not verifying SSL certs (#467), (#469)
  - Better logging for invalid cdxlines and cookies (#477), (#478)
  - Fix warning in yaml.load (#472)
  - Index invalid form-data as binary (#471)


pywb 2.2.20190410 changelist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Improved rewriting of JSONP, support matching JSONP with ``//`` comments (fixes #459)


pywb 2.2.20190311 changelist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Support for setting timestamp in proxy mode via ``--proxy-default-timestamp`` (fixes #452)
* Remove any ``WB_wombat_`` found in POST requests from old versions of pywb.
* Fixes new query UI when loading traditional calendar ``/*/`` pages (#455, #456)


pywb 2.2.x changelist
~~~~~~~~~~~~~~~~~~~~~

* New Versioning System: (#445)

  - Switching to hybrid semantic / calendar ``major.minor.yyyymmdd`` versioning.
  - The ``major.minor`` version will be updated for larger changes.
  - The ``.yyyymmdd`` date component will be updated for smaller incremental releases, for fidelity improvements and smaller bug fixes.

* Auto Fetch System:

  - Added ``picture > source[srcset]`` extraction and increased the robustness of relative srcset URLs resolution (#415)
  - Enabled auto-fetching of video, audio resources (#427)
  - Exposed AutoFetchWorker api in proxy mode to allow external JS to initiate checks (#389)

* Build / CI Improvements:

  - Tweaked usage of wr-tests in CI (#431)
  - Ensured that usage of XVFB works on travis.ci (#436)
  - Updated Docker image to support
  - Python 3.7 support and CI testing (#447)

* Docker:

  - Updated Docker image to Python 3.7.2, match docker user uid/gid to that of existing volume (#446)
  - Add documentation for using Docker image and automated images (#448)

* Fuzzy Matching:

  - Added an additional Facebook rule targeting timeline replay (#440)

* Memento:

  - Fixed regression in FrontendApp when handling TimeMap requests (#423)

* Recording:

  - Remove Transfer-Encoding from internal response (#437)
  - If brotli decoding package can't be loaded, remove ``br`` from ``Accept-Encoding`` header (#444)

* Replay / Fidelity Improvements:

  - Wombat now uses the actual page scheme instead of defaulting to http when extracting the original url (#404)
  - Improved URL rewriting in web workers (#420)
  - Improved replay of content coming from a frameset's frame (#438)
  - Updated rules for facebook (#440)
  - Introduce new banner behavior and ensured that banner does not become stuck displaying "Loading..." (#418)

* Server-Side Rewriting:

  - Improved the rewriting process of HTTP headers that are encoded in the non-standard ``UTF-8`` encoding (#402)
  - Improved the JavaScript rewriter's rewrites of the ``location`` symbol in order to avoid rewriting ``$location`` (#403)
  - Added an additional check of ``text/html`` content to ensure that it is actually ``html`` (#428)
  - Fixed HTML detection for UTF-8 files starting with BOM (#441)
  - Fixed parsing of invalid conditional comments, eg.
    treat '' as '' (#441)

* UI:

  - New Query UI with support for prefix queries, forms for advanced search via cdx server api, incremental results loading (#421)


pywb 2.1.0 changelist
~~~~~~~~~~~~~~~~~~~~~

* Replay Fidelity Improvements:

  - Improved wombat web worker rewriting overrides, use custom modifier ``wkr_`` (#351)
  - Added checks to wombat that preserve the behavior of non-wombat added polyfills to native functions (#350)
  - Framed replay: Ensured the page title and favicon are displayed in the top-frame (#356, #369)
  - Improved replay of requests sent as ``text/html`` that are actually ``application/json`` (#367)
  - Added replay of compressed resources by forcing decompression if the UA did not indicate it could handle the resources encoding (#372)
  - Added ``window.origin``, and ``setTimeout``, ``setInterval`` overrides to wombat to handle the non-function callback case (#381)
  - Added ``CSSStyleSheet.insertRule`` and ``Text`` overrides to wombat to improve rewriting of dynamically added/modified CSS (#382)
  - Remove extra ``window.frames`` override to avoid extra override if ``window.frames === window`` (#383)
  - Wombat inited via ``window._WBWombatInit(wbinfo);``, allows for reinit if inited 'synthetically' and not from the page html insert (#383)
  - Added ``document.evaluate`` override in order to deproxy the context node (#385)
  - Optimized argument de-proxying in wombat (#385)
  - Improved iframe srcdoc rewriting in wombat (#386)
  - Improved rewriting strings of full HTML by making the check case insensitive and looking for `` and