backup/pywb - pywb - Source code and issue tracker for Open Eggbert

mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

Author	SHA1	Message	Date
Ilya Kreymer	bf9284fec5	proxy mode HTMLInsertOnlyRewriter: (#496 ) - insert head-insert before first tag that is not <html> or <head> insert before - addresses issue with rewriting pages that have no <head> tag (already handled in full rewriter) - tests: add tests for HTMLInsertOnlyRewriter - bump version to 2.3.3, update changelist	2019-08-03 11:24:50 -07:00
Ilya Kreymer	42089e237b	update CHANGELIST and version for 2.3.2 release	2019-08-01 16:23:31 -07:00
NeolithEra	af1a34cb58	Fix dependency conflict for issue (#494 ) #492	2019-08-01 15:23:34 -07:00
Ilya Kreymer	05cc593da6	tests: don't run video tests on ci due to rate limiting	2019-07-31 18:11:42 -07:00
John Berlin	511c6f7985	ensured that the regular expressions for rewriting JavaScript eval usage do not match "$eval", only "eval" identifier (#493 ) added tests for new JS eval rewriting regex tweaks	2019-07-31 15:03:42 -07:00
Ilya Kreymer	ffca45c855	Support/Improvements to Domain Cookie Cache (#491) * domain cookie fix: - don't set cookies for service worker modifiers if response is not 200 - don't add existing cookies to Cookie or Set-Cookie headers - add sw_/, wkrf_/ modifiers to generate paths - enable domain cookie caching by default with fakeredis for live index and record mode, keyed by collection - reqs: add fakeredis, tldextract, update warcio - tests: add initial tests for domain cookie rewriting	2019-07-31 14:58:15 -07:00
Ilya Kreymer	837894a07f	Misc fixes for 2.3.2 release (#490 ) * misc fixes: - ensure SCRIPT_NAME is never empty, fixes #466 - static: if ending in '/' look for '/index.html' - tests: use local httpbin instead of iana.org tests - docker: switch to $VOLUME_DIR before initing collection - ensure static_prefix is set correctly after host prefix - bump version to 2.3.2.dev0 * rules update: fix fuzzy matching, rewriting rules for soundcloud	2019-07-24 10:47:17 -07:00
Ilya Kreymer	d4518ae557	update to latest wombat 3.0.0, fix issue with parent override (webrecorder/wombat#3 ) bump version to 2.3.1 v-2.3.1	2019-07-10 18:09:22 -07:00
Ilya Kreymer	a72d938f15	README: Update for 2.3 v-2.3.0	2019-07-09 19:37:03 -07:00
Ilya Kreymer	a4027c7904	Switch back to Semver for 2.3.0 (#488 ) versioning: switch back to semver for 2.3.0, manual version updates - rename update-version.sh -> update-tag.sh to push tag for existing versions - bump version to 2.3.0 for release	2019-07-09 19:29:52 -07:00
Ilya Kreymer	11610f6e04	2.3 Changelist + Docs Update (#487 ) * docs: update changelist and add docs about new wombat * update to latest wombat * update wombat, fix pytest cmdline in setup	2019-07-09 17:50:57 -07:00
Eoin Kilfeather	96a7a4bbb0	Update configuring.rst to reflect default config.yaml. (#483 ) The Docs specify the default value for the warc files path as 'archives' but the default config.yaml file specifies 'archive' https://github.com/webrecorder/pywb/blob/master/pywb/default_config.yaml#L4	2019-07-08 14:16:57 -07:00
Ilya Kreymer	d2467d5fad	wombat + tests - add build-wombat.sh for building wombat - fix tests (no more karma tests, now in wombat) - update to latest wombat	2019-07-02 19:25:13 -07:00
John Berlin	db50efc558	server side rewriting: (#486 ) - tweaked the JSWombatProxyRules regex for = this to be = this and , this - added comments to the more complicated regex's used by JSWombatProxyRules - added test case for tweaked regex	2019-07-02 19:24:28 -07:00
John Berlin	06513c2592	auto-fetch: (#484 ) - reworked both proxy and non-proxy mode backing workers to no-longer fetch in burst mode but as sent with a maximum of 20 fetches running at a time - added just-fetch to non-proxy mode backing worker - updated the auto fetch worker abstraction in non-proxy mode used by wombat to exposed like in proxy mode and ensured that value property for the srcset object is used when sending rewritten srcset values to the backing worker - combined the backing worker proxy & non-proxy mode into a single file - added rollup config for back auto fetch worker	2019-07-02 19:24:28 -07:00
Rebecca Lynn Cremona	193607eed8	inputrequest/indexing: Fix #471 : failed playback due to encoding issue (#480 ) * Handle incorrectly formatted form data; address #471. * Attempt to always decode application/x-www-form-urlencoded form-data as utf-8, if fails to decode, treat it as binary post data (base64 encode and add with __wb_post_data=)	2019-07-02 19:24:28 -07:00
John Berlin	56fc26333e	server side rewriting: (#475 ) - fixed edge case in jsonP rewriting where no callback name is supplied only ? but body has normal jsonP callback (url = https://geolocation.onetrust.com/cookieconsentpub/v1/geo/countries/EU?callback=?) - made the `!self.__WB_pmw` server side inject match the client side one done via wombat - added regex's for eval override to JSWombatProxyRules	2019-07-02 19:24:28 -07:00
Rebecca Lynn Cremona	178413fe0c	More detailed logging of invalid cdxlines. (#478 )	2019-07-02 19:24:28 -07:00
Rebecca Lynn Cremona	d74d4f92a3	Quieter logging of cookie errors. (#477 )	2019-07-02 19:24:28 -07:00
John Berlin	c55518640f	wombat postMessage override tweaking (#473) * removed the definition of `__WB_pmw` from `ensureServerSideInjectsExistOnWindow` in order to allow more proper handling of that definition to occur from `initNewWindowWombat` or `wombatInit`. `initNewWindowWombat` now initializes wombat for (i)frame's with src values prefixed with about: as about:srcdoc is commonly used tweaked postMessage and event listener overrides to be more like the previous wombat revision * rebased on develop and rebuilt bundle	2019-07-02 19:24:28 -07:00
John Berlin	361ac0081b	made the rewrite modifier wombat's rewriting of js workers init'd as a blob is wkrf_ not wkr_ to match the python JSWorkerRewriter (#470 )	2019-07-02 19:24:28 -07:00
John Berlin	6794f6d79d	specified the loader for yaml.load since calling yaml.load without a loader is now deprecated (#472)	2019-07-02 19:24:28 -07:00
John Berlin	cef557eb40	added custom requests HTTPAdapter, PywbHttpAdapter, that restores the behavior of urllib3 < 1.25.x which was to not verify ssl certs fixes #467 (#469 )	2019-07-02 19:24:28 -07:00
John Berlin	a907b2b511	Improved handling of open http connections and file handles (#463 ) * improved pywb's closing of open file handles and http connects by adding to pywb.util.io no_except_close replaced close calls with no_except_close reformatted and optimizes import of files that were modified additional ci build fixes: - pin gevent to 1.4.0 in order to ensure build of pywb on ubuntu use gevent's wheel distribution - youtube-dl fix: use youtube-dl in quiet mode to avoid errors with youtube-dl logging in pytest	2019-07-02 19:24:28 -07:00
John Berlin	22b4297fc5	pywb: - Fix: a few broken tests due to iana.org requiring a user agent in its requests rewrite: - introduced a new JSWorkerRewriter class in order to support rewriting via wombat workers in the context of all supported worker variants via - ensured rewriter app correctly sets the static prefix wombat: - add wombat as submodule!	2019-07-02 19:24:11 -07:00
Ilya Kreymer	77f8bb6476	CHANGES: update changelist bump version to 2.2.20190410 v-2.2.20190410	2019-04-10 11:17:33 -07:00
Ilya Kreymer	32962be7c4	JSONP Rewriter: Fix regex to match both /* and // comments (#460 ) * jsonp rewriter: improve regex to match starting /* and // multiline comments, update test * fix regex, add and cleanup jsonp rewriter tests * Fixes #459	2019-04-10 10:38:58 -07:00
Ilya Kreymer	9448f4fe45	release: update changelist for 2.2.20190311 docs: fix typos v-2.2.20190311	2019-03-11 16:40:53 -07:00
John Berlin	4e4f1d80c1	query ui: reworked how we construct the query to better differentiate between coming from the collection search interface vs direct querying in particular the prefix//url vs prefix/?url= case fixes #455 (#456 )	2019-03-11 16:31:34 -07:00
Ilya Kreymer	455efb17ad	Support for default timestamp/date for proxy mode (#454 ) * proxy: add option to set default timestamp for proxy mode, fixes #452 - set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp' - can be iso date or all-digit timestamp - overridable via accept-datetime header * docs: update docs for proxy timestamp - add docs on memento support in proxy mode * update-version: script can update version only, commit with 'update-version.sh commit' * indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!	2019-03-11 16:28:09 -07:00
Ilya Kreymer	4b5c397992	readme: update for 2.2 release version update: tweak script, ensure tag added after commit v-2.2.20190227	2019-02-27 16:07:43 -08:00
Ilya Kreymer	21b5cf36b1	version: update to 2.2.20190227	2019-02-27 15:51:31 -08:00
Ilya Kreymer	24f92054d9	versioning: update version update script to include push, commit message	2019-02-27 15:51:02 -08:00
John Berlin	a2ea925d17	pywb 2.2.x release changelist (#443 )	2019-02-27 15:34:13 -08:00
Ilya Kreymer	1fcc239ecf	Add Docker info to Docs (#448 ) * docs: add docs on running with Docker, Docker image versions, fixes #299	2019-02-27 14:38:59 -08:00
Ilya Kreymer	b90ee427cf	Docker Improvements (#446 ) * Misc improvements, including fixes from @funkyfuture: - Dockerfile: Reduces number of created layers and source contents - Support for automatic collection creation if INIT_COLLECTION is defined - Add entry point script docker-entrypoint.sh - update to latest python (3.7.2 currently) - additions to .dockerignore - setup.py and requirements cleanup (just use plain 'gevent' requirement) * docker-entrypoint.sh improvements: - before running cmd, match uid/gid to that of volume dir (specified via $VOLUME_DIR, defaulting to /webarchive) - if volume is owned by root (default if none mounted), just run as root - if volume is owned by different user, create/update user 'archivist' to match the uid/gid of $VOLUME_DIR, then run cmd as 'su archivist'	2019-02-27 09:13:38 -08:00
Ilya Kreymer	259f571cb9	Python 3.7 Support (#447 ) * py3.7 fixes: - add __repr__ to WBException for consistent output in py3.7 - don't raise StopIteration in generator, just return * ci: add py3.7 builds to travis and appveyor, (don't include in integration test suite for now)	2019-02-27 08:43:33 -08:00
Ilya Kreymer	0fb1fa68a8	Versioning: Add script to set up MAJ.MIN.DATE version (#445 ) * versioning: new MAJ.MIN.DATE versioning move version to version.py for easier updates add update-version.sh for autoupdating version in version.py, pushing new tag with current version	2019-02-25 11:46:37 -08:00
Ilya Kreymer	32c1e6c85b	Brotli: Don't accept brotli if library can't be loaded. (#444 ) * brotli: if the brotli module can not be loaded, print warning and also remove `br` from any Accept-Encoding header to avoid recording with brotli, addresses #434	2019-02-19 17:19:24 -08:00
John Berlin	000ed89dc3	Improved Query Interface and Result viewing (#421) * Reworked query.js to know the difference between date search and advanced searching. Exposed cdx api's through the query html page - from, to - matchType - filter Added more appealing styling to the error, index, not-found, query, and search templates Updated the included jquery and bootstrap static files to jQuery v3.3.1, Bootstrap v4.1.3 Implemented optionally using a web worker for making the cdx api request and processing the results Documented the code * ensure the display count str function uses the correct "first" value * added view all captures for a result displayed in the advanced results view query worker now sends over the recordCount as an integer and as a formatted string moved the search button to the right after advanced options * tests: fixed test_intergration.py:test_static_nested_dir failing due to updates	2019-02-18 10:26:29 -08:00
Ilya Kreymer	38c1b1cc3e	Edge-case and HTML Rewrite Fixes (#441) * recorder fix: ensure Transfer-Encoding header is not passed through by RecorderApp, as may result in duplicate Transfer-Encoding in py2.7, fixes #432 * html rewriter fixes: - html detection: allow for UTF-8 BOM when detecting if text is html - html decl parsing: modify base parser regex to allow IE conditional declaration to also end with -->, eg. support '<![endif]-->' in addition to '<![endif]>', fixes #425 * travis: add allow failure for integration tests (for now)	2019-02-18 10:11:29 -08:00
Ilya Kreymer	100c7f5509	rules: add new fb rule for pages (#440 )	2019-02-07 13:15:30 -08:00
John Berlin	777cc30e82	Updated RewriteInfo._resolve_text_type to recognize the `fr_` rewrite modifier (indicates that the content is from a frameset's frame) (#438 ) Added a test, test_rewrite_frameset_frame_content, to test_content_rewriter.py for these changes	2019-02-05 15:11:21 -08:00
Ilya Kreymer	529a587cdc	recorder fix: ensure Transfer-Encoding header is not passed through by RecorderApp, (#437) as may result in duplicate Transfer-Encoding in py2.7, fixes #432	2019-01-30 18:14:09 -05:00
John Berlin	3b64b6d2c9	travis fix: added xvfb to services due to travis changes on xenial (#436 )	2019-01-30 17:39:11 -05:00
John Berlin	9be9815da4	travis integration test fixes: removed caching of pip from .travis.yml (#431 ) update pip and setuptools when running install.sh found in .travis use xenial removed trailing dash only run webrecorder-tests using chrome and firefox only run webrecorder-tests using pywbtest and chrometest marker expression	2019-01-30 16:36:45 -05:00
Ilya Kreymer	c86add9b40	setup: use 'fakeredis<1.0' until fully ported to new fakeredis version	2019-01-27 14:26:50 -05:00
John Berlin	9597a632c8	Exposed AutoFetchWorker on window in proxy-mode (#389 ) Added methods to AutoFetchWorker in proxy mode that allow external JS to initiate checks Updated the actual proxy mode worker implementation to match the functionality added	2018-12-13 18:48:16 -08:00
John Berlin	2c8d607b18	Ensured that the banner does not become stuck displaying Loading... on non-html content fixes #417 (#418) Changes: Reworked ContentFrame and the default banner to be ES5 classes. Introduced an optional relationship between ContentFrame and banners. If a banner is exposed then ContentFrame controls the initialization of the banner and routes any messages received from the replay iframe to the banner. When the replay iframe is navigated to a page and the replay iframe loads, the ContentFrame waits 2 seconds before checking to see if the banner still indicates a loading state and if so updates the displayed information using the URL and timestamp the replay iframe was navigated to.	2018-12-05 18:47:10 -08:00
Ilya Kreymer	f7e8217e23	requirements and version: - bump to 2.2.0.dev0 - requirements: set redis dependency 'redis<3'	2018-12-05 16:58:06 -08:00
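
Commit 259f571cb9 in the list above mentions "don't raise StopIteration in generator, just return" as a Python 3.7 fix. The sketch below illustrates the general language behaviour behind that note (PEP 479), not pywb's actual code; both function names are hypothetical and exist only for this example.

```python
def take_until_blank(lines):
    """Yield lines until the first blank one (hypothetical helper)."""
    for line in lines:
        if not line.strip():
            # A plain return ends the generator cleanly on every Python version.
            return
        yield line


def take_until_blank_old_style(lines):
    """Same logic, but ending the generator the pre-3.7 way (hypothetical)."""
    for line in lines:
        if not line.strip():
            # Under PEP 479 (default from Python 3.7) this surfaces
            # to the caller as a RuntimeError instead of ending iteration.
            raise StopIteration
        yield line
```

On Python 3.7 or later, consuming the first generator stops quietly at the blank line, while consuming the second raises RuntimeError at the same point.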


2036 Commits
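
Commit 193607eed8 describes decoding application/x-www-form-urlencoded form data as UTF-8 and, if that fails, treating it as binary post data that is base64 encoded and added with `__wb_post_data=`. A minimal sketch of that fallback idea follows, assuming a hypothetical helper name `post_body_to_query`; it is an illustration of the described behaviour, not the code pywb ships.

```python
import base64


def post_body_to_query(body: bytes) -> str:
    """Return a query-string form of a POST body (hypothetical helper)."""
    try:
        # Well-formed form data should decode cleanly as UTF-8.
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise treat the body as opaque binary and carry it base64 encoded.
        encoded = base64.b64encode(body).decode("ascii")
        return "__wb_post_data=" + encoded
```

For example, `post_body_to_query(b"a=1&b=2")` passes the text through unchanged, while a body containing bytes that are not valid UTF-8 comes back prefixed with `__wb_post_data=`.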