- insert head-insert before first tag that is not <html> or <head>
- addresses issue with rewriting pages that have no <head> tag (already handled in full rewriter)
- tests: add tests for HTMLInsertOnlyRewriter
- bump version to 2.3.3, update changelist
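A minimal sketch of the insert-only idea described above, not the actual HTMLInsertOnlyRewriter: it scans for the first opening tag that is not <html> or <head> and places the insert text just before it. The function name `insert_head_text` and the regex are assumptions made for illustration; tags inside comments are not handled.

```python
import re

# Matches the start of an opening tag and captures its name.
# Declarations and comments (e.g. <!DOCTYPE html>) start with "<!" and are skipped,
# as are closing tags, which start with "</".
TAG_RX = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)\b')


def insert_head_text(html, head_insert):
    """Insert head_insert before the first tag that is not <html> or <head>."""
    for match in TAG_RX.finditer(html):
        if match.group(1).lower() not in ('html', 'head'):
            pos = match.start()
            return html[:pos] + head_insert + html[pos:]
    # No other tag found: append the insert at the end.
    return html + head_insert
```

With this approach a page that has no <head> tag still receives the insert, placed before its first content tag.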
* domain cookie fix:
- don't set cookies for service worker modifiers if response is not 200
- don't add existing cookies to Cookie or Set-Cookie headers
- add sw_/, wkrf_/ modifiers to generate paths
- enable domain cookie caching by default with fakeredis for live index and record mode, keyed by collection
- reqs: add fakeredis, tldextract, update warcio
- tests: add initial tests for domain cookie rewriting
* misc fixes:
- ensure SCRIPT_NAME is never empty, fixes #466
- static: if ending in '/' look for '/index.html'
- tests: use local httpbin instead of iana.org tests
- docker: switch to $VOLUME_DIR before initing collection
- ensure static_prefix is set correctly after host prefix
- bump version to 2.3.2.dev0
* rules update: fix fuzzy matching, rewriting rules for soundcloud
versioning: switch back to semver for 2.3.0, manual version updates
- rename update-version.sh -> update-tag.sh to push tag for existing versions
- bump version to 2.3.0 for release
- tweaked the JSWombatProxyRules regex that matched `= this` so that it matches both `= this` and `, this`
- added comments to the more complicated regexes used by JSWombatProxyRules
- added test case for tweaked regex
- reworked both proxy and non-proxy mode backing workers to no longer fetch in burst mode; requests are now fetched as they are sent, with a maximum of 20 fetches running at a time
- added just-fetch to non-proxy mode backing worker
- updated the auto fetch worker abstraction in non-proxy mode used by wombat to be exposed as in proxy mode, and ensured that the value property of the srcset object is used when sending rewritten srcset values to the backing worker
- combined the backing worker proxy & non-proxy mode into a single file
- added rollup config for backing auto fetch worker
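The backing workers themselves are JavaScript; the sketch below only illustrates the "at most 20 fetches running at a time" idea in Python, using a semaphore. The names `fetch_all` and `fetch` are hypothetical and are not part of wombat or pywb.

```python
import asyncio

MAX_CONCURRENT = 20  # the limit named in the entry above


async def fetch_all(urls, fetch, limit=MAX_CONCURRENT):
    """Run fetch(url) for every URL with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # Each fetch waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch(url)

    # Results are returned in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))
```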
* Handle incorrectly formatted form data; address #471.
* Attempt to always decode application/x-www-form-urlencoded form-data as utf-8; if decoding fails, treat it as binary post data (base64 encode and add with __wb_post_data=)
- fixed edge case in jsonP rewriting where no callback name is supplied, only ?, but the body has a normal jsonP callback (url = https://geolocation.onetrust.com/cookieconsentpub/v1/geo/countries/EU?callback=?)
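The decode-with-fallback behavior can be sketched roughly as follows. The function name `post_data_to_query` is hypothetical, and whether the base64 value is additionally URL-quoted is not stated in the entry, so it is left unquoted here.

```python
import base64


def post_data_to_query(body):
    """Return form data as text, or a base64 __wb_post_data= fallback."""
    try:
        return body.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8: treat the body as binary post data.
        encoded = base64.b64encode(body).decode('ascii')
        return '__wb_post_data=' + encoded
```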
- made the `!self.__WB_pmw` server side inject match the client side one done via wombat
- added regexes for eval override to JSWombatProxyRules
* removed the definition of `__WB_pmw` from `ensureServerSideInjectsExistOnWindow` in order to allow more proper handling of that definition to occur from `initNewWindowWombat` or `wombatInit`.
`initNewWindowWombat` now initializes wombat for (i)frame's with src values prefixed with about: as about:srcdoc is commonly used
tweaked postMessage and event listener overrides to be more like the previous wombat revision
* rebased on develop and rebuilt bundle
* improved pywb's closing of open file handles and http connects by adding to pywb.util.io no_except_close
replaced close calls with no_except_close
reformatted and optimized imports of files that were modified
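Only the name `no_except_close` comes from the entry above; the body below is an assumed, minimal version of such a helper, shown to illustrate the pattern of closing a handle without letting a failing close() propagate.

```python
import io


def no_except_close(closable):
    """Close an object if possible, ignoring any error raised by close()."""
    if closable is None:
        return
    try:
        closable.close()
    except Exception:
        # A failed close should not mask the result or error of the caller.
        pass


# Usage: safe to call on open handles, closed handles, or None.
buff = io.BytesIO(b'data')
no_except_close(buff)
no_except_close(None)
```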
additional ci build fixes:
- pin gevent to 1.4.0 to ensure that builds of pywb on ubuntu use gevent's wheel distribution
- youtube-dl fix: use youtube-dl in quiet mode to avoid errors with youtube-dl logging in pytest
- Fix: a few broken tests due to iana.org requiring a user agent in its requests
rewrite:
- introduced a new JSWorkerRewriter class in order to support rewriting via wombat workers in the context of all supported worker variants via
- ensured rewriter app correctly sets the static prefix
wombat:
- add wombat as submodule!
* proxy: add option to set default timestamp for proxy mode, fixes #452
- set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp'
- can be iso date or all-digit timestamp
- overridable via accept-datetime header
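A rough sketch of how the two accepted formats and the header override could be handled, under the assumption that everything is normalized to a 14-digit timestamp. `to_timestamp` and `pick_timestamp` are hypothetical helpers, not pywb functions, and partial all-digit timestamps are passed through unchanged here.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

TS_FORMAT = '%Y%m%d%H%M%S'


def to_timestamp(value):
    """Accept an all-digit timestamp as-is, or convert an ISO date."""
    if value.isdigit():
        return value
    # Drop a trailing 'Z' so older Python versions can parse the value.
    return datetime.fromisoformat(value.rstrip('Z')).strftime(TS_FORMAT)


def pick_timestamp(default_value, accept_datetime=None):
    """Prefer the Accept-Datetime header (an HTTP date) over the default."""
    if accept_datetime:
        return parsedate_to_datetime(accept_datetime).strftime(TS_FORMAT)
    return to_timestamp(default_value)
```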
* docs: update docs for proxy timestamp
- add docs on memento support in proxy mode
* update-version: script can update version only, commit with 'update-version.sh commit'
* indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!
* Misc improvements, including fixes from @funkyfuture:
- Dockerfile: Reduces number of created layers and source contents
- Support for automatic collection creation if INIT_COLLECTION is defined
- Add entry point script docker-entrypoint.sh
- update to latest python (3.7.2 currently)
- additions to .dockerignore
- setup.py and requirements cleanup (just use plain 'gevent' requirement)
* docker-entrypoint.sh improvements:
- before running cmd, match uid/gid to that of volume dir (specified via $VOLUME_DIR, defaulting to /webarchive)
- if volume is owned by root (default if none mounted), just run as root
- if volume is owned by different user, create/update user 'archivist' to match the uid/gid of $VOLUME_DIR, then run cmd as 'su archivist'
* py3.7 fixes:
- add __repr__ to WBException for consistent output in py3.7
- don't raise StopIteration in generator, just return
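For context, PEP 479 (enabled by default in Python 3.7) turns a StopIteration raised inside a generator body into a RuntimeError, so a plain return is the way to end a generator. The generator below is a made-up example, not code from pywb.

```python
import io


def read_blocks(stream, size=4):
    """Yield fixed-size blocks from stream until it is exhausted."""
    while True:
        block = stream.read(size)
        if not block:
            # 'raise StopIteration' here would become RuntimeError in py3.7;
            # returning ends the generator cleanly on all versions.
            return
        yield block


blocks = list(read_blocks(io.BytesIO(b'abcdefghij')))
```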
* ci: add py3.7 builds to travis and appveyor, (don't include in integration test suite for now)
* versioning: new MAJ.MIN.DATE versioning
move version to version.py for easier updates
add update-version.sh for autoupdating version in version.py, pushing new tag with current version
* brotli: if the brotli module can not be loaded, print warning
and also remove `br` from any Accept-Encoding header to avoid recording with brotli, addresses #434
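A minimal sketch of this fallback, assuming the check is done with an optional import. The helper name `strip_br` and the warning text are illustrative and not taken from pywb.

```python
try:
    import brotli  # optional dependency
    HAS_BROTLI = True
except ImportError:
    HAS_BROTLI = False
    print('Warning: brotli module could not be loaded, br encoding disabled')


def strip_br(accept_encoding):
    """Remove the 'br' token from an Accept-Encoding header value."""
    tokens = [tok.strip() for tok in accept_encoding.split(',')]
    # Compare only the coding name, ignoring any ';q=' weight.
    kept = [tok for tok in tokens
            if tok and tok.split(';')[0].strip().lower() != 'br']
    return ', '.join(kept)
```

A caller would apply `strip_br` to the request header only when `HAS_BROTLI` is false, so that upstream servers are never asked for a brotli-encoded response that cannot be decoded.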
* Reworked query.js to know the difference between date search and advanced searching.
Exposed cdx api's through the query html page
- from, to
- matchType
- filter
Added more appealing styling to the error, index, not-found, query, and search templates
Updated the included jquery and bootstrap static files to jQuery v3.3.1, Bootstrap v4.1.3
Implemented optionally using a web worker for making the cdx api request and processing the results
Documented the code
* ensure the display count str function uses the correct "first" value
* added view all captures for a result displayed in the advanced results view
query worker now sends over the recordCount as an integer and as a formatted string
moved the search button to the right after advanced options
* tests: fixed test_intergration.py:test_static_nested_dir failing due to updates
* recorder fix: ensure Transfer-Encoding header is not passed through by RecorderApp,
as may result in duplicate Transfer-Encoding in py2.7, fixes #432
* html rewriter fixes:
- html detection: allow for UTF-8 BOM when detecting if text is html
- html decl parsing: modify base parser regex to allow IE conditional declaration to also
end with -->, e.g. support '<![endif]-->' in addition to '<![endif]>', fixes #425
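The two fixes can be illustrated with the sketch below. Both the detection rule (first non-whitespace byte is '<') and the `ENDIF_RX` pattern are simplified assumptions, not the regexes used by the pywb rewriter or the base parser.

```python
import codecs
import re

# Accepts both '<![endif]>' and '<![endif]-->' as the end of a conditional.
ENDIF_RX = re.compile(r'<!\[endif\](?:--)?>')


def looks_like_html(data):
    """Guess whether bytes are HTML, allowing for a leading UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.lstrip().startswith(b'<')
```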
* travis: add allow failure for integration tests (for now)
update pip and setuptools when running install.sh found in .travis
use xenial
removed trailing dash
only run webrecorder-tests using chrome and firefox
only run webrecorder-tests using pywbtest and chrometest marker expression
Added methods to AutoFetchWorker in proxy mode that allow external JS to initiate checks
Updated the actual proxy mode worker implementation to match the functionality added
Changes:
Reworked ContentFrame and the default banner to be ES5 classes.
Introduced an optional relationship between ContentFrame and banners.
If a banner is exposed then ContentFrame controls the initialization of the banner and routes any messages received from the replay iframe to the banner.
When the replay iframe is navigated to a page and the replay iframe loads, the ContentFrame waits 2 seconds before checking whether the banner still indicates a loading state and, if so, updates the displayed information using the URL and timestamp the replay iframe was navigated to.