backup/pywb - pywb - Source code and issue tracker for Open Eggbert

mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

Author	SHA1	Message	Date
Ilya Kreymer	32962be7c4	JSONP Rewriter: Fix regex to match both /* and // comments (#460 ) * jsonp rewriter: improve regex to match starting /* and // multiline comments, update test * fix regex, add and cleanup jsonp rewriter tests * Fixes #459	2019-04-10 10:38:58 -07:00
Ilya Kreymer	9448f4fe45	release: update changelist for 2.2.20190311 docs: fix typos v-2.2.20190311	2019-03-11 16:40:53 -07:00
John Berlin	4e4f1d80c1	query ui: reworked how we construct the query to better differentiate between coming from the collection search interface vs direct querying in particular the prefix//url vs prefix/?url= case fixes #455 (#456 )	2019-03-11 16:31:34 -07:00
Ilya Kreymer	455efb17ad	Support for default timestamp/date for proxy mode (#454 ) * proxy: add option to set default timestamp for proxy mode, fixes #452 - set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp' - can be iso date or all-digit timestamp - overridable via accept-datetime header * docs: update docs for proxy timestamp - add docs on memento support in proxy mode * update-version: script can update version only, commit with 'update-version.sh commit' * indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!	2019-03-11 16:28:09 -07:00
Ilya Kreymer	4b5c397992	readme: update for 2.2 release version update: tweak script, ensure tag added after commit v-2.2.20190227	2019-02-27 16:07:43 -08:00
Ilya Kreymer	21b5cf36b1	version: update to 2.2.20190227	2019-02-27 15:51:31 -08:00
Ilya Kreymer	24f92054d9	versioning: update version update script to include push, commit message	2019-02-27 15:51:02 -08:00
John Berlin	a2ea925d17	pywb 2.2.x release changelist (#443 )	2019-02-27 15:34:13 -08:00
Ilya Kreymer	1fcc239ecf	Add Docker info to Docs (#448 ) * docs: add docs on running with Docker, Docker image versions, fixes #299	2019-02-27 14:38:59 -08:00
Ilya Kreymer	b90ee427cf	Docker Improvements (#446 ) * Misc improvements, including fixes from @funkyfuture: - Dockerfile: Reduces number of created layers and source contents - Support for automatic collection creation if INIT_COLLECTION is defined - Add entry point script docker-entrypoint.sh - update to latest python (3.7.2 currently) - additions to .dockerignore - setup.py and requirements cleanup (just use plain 'gevent' requirement) * docker-entrypoint.sh improvements: - before running cmd, match uid/gid to that of volume dir (specified via $VOLUME_DIR, defaulting to /webarchive) - if volume is owned by root (default if none mounted), just run as root - if volume is owned by different user, create/update user 'archivist' to match the uid/gid of $VOLUME_DIR, then run cmd as 'su archivist'	2019-02-27 09:13:38 -08:00
Ilya Kreymer	259f571cb9	Python 3.7 Support (#447 ) * py3.7 fixes: - add __repr__ to WBException for consistent output in py3.7 - don't raise StopIteration in generator, just return * ci: add py3.7 builds to travis and appveyor, (don't include in integration test suite for now)	2019-02-27 08:43:33 -08:00
Ilya Kreymer	0fb1fa68a8	Versioning: Add script to set up MAJ.MIN.DATE version (#445 ) * versioning: new MAJ.MIN.DATE versioning move version to version.py for easier updates add update-version.sh for autoupdating version in version.py, pushing new tag with current version	2019-02-25 11:46:37 -08:00
Ilya Kreymer	32c1e6c85b	Brotli: Don't accept brotli if library can't be loaded. (#444 ) * brotli: if the brotli module can not be loaded, print warning and also remove `br` from any Accept-Encoding header to avoid recording with brotli, addresses #434	2019-02-19 17:19:24 -08:00
John Berlin	000ed89dc3	Improved Query Interface and Result viewing (#421 ) * Reworked query.js to know the difference between date search and advanced searching. Exposed cdx api's through the query html page - from, to - matchType - filter Added more appealing styling to the error, index, not-found, query, and search templates Updated the included jquery and bootstrap static files to jQuery v3.3.1, Bootstrap v4.1.3 Implemented optionally using a web worker for making the cdx api request and processing the results Documented the code * ensure the display count str function uses the correct "first" value * added view all captures for a result displayed in the advanced results view query worker now sends over the recordCount as an integer and as a formatted string moved the search button to the right after advanced options * tests: fixed test_integration.py:test_static_nested_dir failing due to updates	2019-02-18 10:26:29 -08:00
Ilya Kreymer	38c1b1cc3e	Edge-case and HTML Rewrite Fixes (#441 ) * recorder fix: ensure Transfer-Encoding header is not passed through by RecorderApp, as may result in duplicate Transfer-Encoding in py2.7, fixes #432 * html rewriter fixes: - html detection: allow for UTF-8 BOM when detecting if text is html - html decl parsing: modify base parser regex to allow IE conditional declaration to also end with -->, eg. support '<![endif]-->' in addition to '<![endif]>', fixes #425 * travis: add allow failure for integration tests (for now)	2019-02-18 10:11:29 -08:00
Ilya Kreymer	100c7f5509	rules: add new fb rule for pages (#440 )	2019-02-07 13:15:30 -08:00
John Berlin	777cc30e82	Updated RewriteInfo._resolve_text_type to recognize the `fr_` rewrite modifier (indicates that the content is from a frameset's frame) (#438 ) Added a test, test_rewrite_frameset_frame_content, to test_content_rewriter.py for these changes	2019-02-05 15:11:21 -08:00
Ilya Kreymer	529a587cdc	recorder fix: ensure Transfer-Encoding header is not passed through by RecorderApp, (#437 ) as may result in duplicate Transfer-Encoding in py2.7, fixes #432	2019-01-30 18:14:09 -05:00
John Berlin	3b64b6d2c9	travis fix: added xvfb to services due to travis changes on xenial (#436 )	2019-01-30 17:39:11 -05:00
John Berlin	9be9815da4	travis integration test fixes: removed caching of pip from .travis.yml (#431 ) update pip and setuptools when running install.sh found in .travis use xenial removed trailing dash only run webrecorder-tests using chrome and firefox only run webrecorder-tests using pywbtest and chrometest marker expression	2019-01-30 16:36:45 -05:00
Ilya Kreymer	c86add9b40	setup: use 'fakeredis<1.0' until fully ported to new fakeredis version	2019-01-27 14:26:50 -05:00
John Berlin	9597a632c8	Exposed AutoFetchWorker on window in proxy-mode (#389 ) Added methods to AutoFetchWorker in proxy mode that allow external JS to initiate checks Updated the actual proxy mode worker implementation to match the functionality added	2018-12-13 18:48:16 -08:00
John Berlin	2c8d607b18	Ensured that the banner does not become stuck displaying Loading... on non-html content fixes #417 (#418 ) Changes: Reworked ContentFrame and the default banner to be ES5 classes. Introduced an optional relationship between ContentFrame and banners. If a banner is exposed then ContentFrame controls the initialization of the banner and routes any messages received from the replay iframe to the banner. When the replay iframe is navigated to a page and the replay iframe loads, the ContentFrame waits 2 seconds before checking to see if the banner still indicates a loading state and if so updates the displayed information using the URL and timestamp the replay iframe was navigated to.	2018-12-05 18:47:10 -08:00
Ilya Kreymer	f7e8217e23	requirements and version: - bump to 2.2.0.dev0 - requirements: set redis dependency 'redis<3'	2018-12-05 16:58:06 -08:00
John Berlin	9ab248e791	Improved rewriting URLs within web workers by including the full URL the worker came from. (#420 )	2018-12-05 16:39:37 -08:00
John Berlin	323edcf47c	enabled auto-fetching of video, audio resources in wombat in non-proxy mode and proxy mode (#427 )	2018-12-05 16:03:00 -08:00
Ilya Kreymer	3235c382a5	Check text/html content to ensure actually html (#428 ) * html rewrite: when encountering 'text/html' content-type, add html-detection check before assuming content is html (similar to text/plain) supersedes #426, fixes #424 -- binary files served under mp_/ as text/html should now be served as binary - when guessing if html, add additional regex to check if text does not start with < -- perhaps html but starting with plain text. only check for text/html content-type and not js_/cs_ mod	2018-12-05 15:32:38 -08:00
John Berlin	2b8bf76c9a	ensure that the timemap path information is not in wb_url_str when serving a timemap (#423 ) updated memento tests to ensure the timemap tests include REQUEST_URI	2018-12-05 15:06:40 -08:00
John Berlin	f78bac9474	Automatic fetching of picture > source[srcset] fixes #414 (#415 ) - added to the auto-fetch worker of both wombat and wombatProxymode - added utility function isImageSrcset to wombat for determining if the srcset values being rewritten are from either an image tag or a source tag within a picture tag - added utility function isImageDataSrcset to wombat to check for img/source data-srcset attributes - reworked the backing auto-fetch worker to now queue all URLs and perform fetch batching with maximum batch size of 60. A delay of 2 seconds is applied after each batch. Ensured that the srcset values sent to the auto-fetch worker can be resolved in non-proxy mode fixes #413 Renamed the auto-fetch class name used in proxy mode from AutoFetchWorker to AutoFetchWorkerProxyMode Added checking of script tag types application/json and text/template to rewrite_script	2018-11-21 08:43:18 +13:00
Ilya Kreymer	3e0bb49ae1	Use actual page scheme instead of defaulting to http when extracting original url (#404 ) * client-side rewrite: fix extract_orig() to unrewrite relative urls using current page scheme, don't default to http * wombat tests: fix karma tests by adding 'wombat_scheme' to test setup	2018-10-31 20:50:43 -07:00
Ilya Kreymer	f805f79388	Server-Side Rewrite: 'location' rewrite fix to avoid rewriting '$location' (#403 ) * server-side rewrite: tweak 'location' rewrite to ensure $location is not rewritten! tests: add additional rewrite tests for 'location', 'this.$location' and 'this.location'	2018-10-31 20:18:18 -07:00
Ilya Kreymer	e1e8917bc3	live rewriting/utf-8 headers: fix for sites that have utf-8 in headers despite standard (#402 ) - attempt to encode headers as utf-8 first for live web, then latin-1 (similar to warcio http header parsing) - only encode headers for py3 (in py2, headers are already bytestrings) - tests: add tests for utf-8 in header bump version to 2.1.1	2018-10-26 15:06:59 -07:00
John Berlin	1b151b74bf	CHANGELIST: Update 2.1.0 changes.rst to include PRs #395 , #397 , #398 (#400 )	2018-10-23 16:02:52 -07:00
John Berlin	cb8b269539	improved the rewrite_html_full check in wombat: (#398 ) - FullHTMLRegex: performs a case insensitive check for <html, <body, <head and <!doctype html> updated rewrite_elem to: - rewrite meta tags that deliver csp policies - check for additional attributes that could contain un-rewritten URLs (form.style, iframe.style) Made check for full html into regex	2018-10-23 15:36:04 -07:00
John Berlin	82f2dace64	autoFetchWorker.js improvements: (#397 ) - ensured that autoFetchWorker uses full srcset URLs - resolves the URL against the img.src or document.baseURI if not rewritten - otherwise ensures the rewritten URL is not relative or schemeless wombat.js: - AutoFetchWorker updated extractFromLocalDoc to send URL resolution information to the worker - defer extractFromLocalDoc and preserveSrcset postMessages to ensure page viewer can see the images first	2018-10-23 12:52:58 -07:00
Ilya Kreymer	a9e4b5c469	README: update 2.0 -> 2.1 (#396 ) cli: fix typo in enable-auto-fetch, add test	2018-10-23 09:58:10 -07:00
Ilya Kreymer	0db8e5d718	Merge branch 'master' into develop for PR #395	2018-10-23 09:38:53 -07:00
anarcat	40f904af79	add sample Apache configuration (#374 ) * add sample Apache configuration This configuration can be used when launching `wayback` in the default configuration, which is useful to add stuff like access control, authentication, or encryption without going through the trouble of setting up a UWSGI proxy. * enable support for X-Forwarded-Proto headers from #395	2018-10-23 09:35:15 -07:00
Ilya Kreymer	08b0ac87f7	scheme: add support for X-Forwarded-Proto header to specify the scheme to better address #314 , #374 (#395 )	2018-10-23 09:13:23 -07:00
Ilya Kreymer	b39274cf12	CHANGELIST: Tweak changes, update to 2.1.0	2018-10-22 17:52:49 -07:00
Ilya Kreymer	3a70769c58	Cleanup CLI Switches and Docs for Auto-Fetch System (#394 ) Rename: - rename auto-fetch config to 'enable_auto_fetch' and '--enable-auto-fetch' cli param - rename 'use_head_insert' -> 'enable_content_rewrite' - rename 'use_banner' -> 'enable_banner' - rename 'use_wombat' -> 'enable_wombat' Misc Cleanup: - enable_auto_fetch applies to both proxy and non-proxy mode - remove setting 'wbinfo.use_wombat', implied if wombatProxyMode.js is included - docs: add docs for auto-fetch system, improved docs for proxy rewrite options - tests: test with enable_auto_fetch, update tests for renames - bump version to 2.1.0 due to breaking changes - changelist: updates to changelist - requirements: use bounded version for gevent	2018-10-22 17:12:22 -07:00
John Berlin	d0efd7567d	started on pywb 2.0.5 changelist (#387 ) (wip)	2018-10-22 10:31:56 -07:00
Ilya Kreymer	f76ba06c42	header rewriter: ensure the 'Status' header is prefix-rewritten, update test	2018-10-21 13:59:29 -07:00
John Berlin	c28e38718c	Updated html_rewriter.py to correctly handle self-closing <script> elements: (#392 ) - adding the 'xlink:href' attribute to script element attributes to rewrite Updated html_rewriter.py to better handle self closing tags: - added boolean set_parsing_context arg to _rewrite_tag_attrs to indicate if the parsing context is to be set - the call to _rewrite_tag_attrs from handle_startendtag now sets set_parsing_context to false Added a test to test_html_rewriter.py for rewriting SVGScriptElements	2018-10-10 15:24:34 -07:00
Ilya Kreymer	1c7badf117	wombat init fix from #383 : - Ensure WombatInit() methods end in ';' - pass 'wbinfo' to WombatInit()	2018-10-05 23:47:23 +00:00
Ilya Kreymer	671dd2c204	Rewriting fixes for http-only cookies, bad content-length, and document with base (#386 ) * rewriting fixes: server side: cookie rewriting: if httponly cookie with mp_/if_ modifier and path ends with '/', add set-cookie for all known modifiers content length parsing: improve content-length parsing to support 'content-length: num,num', parse out the first number (occasionally seen with range requests when range is dropped for upstream) wombat: rewrite_elem: use element.ownerDocument for resolving baseUri for parent paths tests: add tests for cookie all modifier rewrite, bad content-length parsing (skip for py2.7)	2018-10-05 14:37:32 -07:00
Ilya Kreymer	e6f00ce58d	wombat: document.evaluate param de-proxy and optimization: (#385 ) - rename override_func_first_arg_proxy_to_obj -> override_func_arg_proxy_to_obj to support resolving object proxy not just from first param - add document.evaluate() 'de-proxy' to 2nd param - optimize override_func_arg_proxy_to_obj() to call original apply, avoid modifying arguments array in place	2018-10-05 01:03:33 -04:00
Ilya Kreymer	9f81933fbd	wombat reinit fix (#383 ) * wombat init fix: - fix change from #339 which removed reiniting of wombat - allow reiniting of wombat if inited via init_new_window_wombat() - don't allow if reinited directly from <head>, as happened in document import * tests: fix tests for 'new _WBWombat -> WombatInit' change * wombat: window.frames optimization: - since window.frames === window, no need for separate override! - ensure init_new_window_wombat() is called on any returned window from object proxy	2018-10-04 17:29:18 -04:00
John Berlin	e7098522b2	Added window.Text override to wombat.js to account for css in JS (#382 ) frameworks that like to append a single text node as a child to a style node modifying and then only modify that text node to add/remove css dynamically via: - initTextNodeOverrides (entry point) - overrideTextProtoFunction (overrides the appendData, insertData, and replaceData functions of inherited by Text) - overrideTextProtoGetSet (overrides property getters and setters of data and wholeText) Added window.CSSStyleSheet.insertRule override - dynamically adds a raw css rule (text) to an existing stylesheet	2018-10-04 13:41:48 -04:00
John Berlin	ec0df7b9ae	Refactor of auto-fetch worker system with support for proxy mode, fixes https://github.com/webrecorder/pywb/issues/371 : (#379 ) - Split wombat and auto-fetch worker into two files (proxy mode and non-proxy mode) - Renamed preservationWorker to autoFetchWorker in order to better convey what it does - Root config file control over including wombat and auto-fetch worker in proxy or non-proxy mode - Added additional proxy mode + auto-fetch worker only route for fetching the auto-fetch worker code nicely for CORS - templateview: add 'tobool' formatter to more cleanly format python bools to JS 'true'/'false' - proxy options: config and command line: 'use_auto_fetch_worker' and '--proxy-with-auto-fetch' 'use_wombat' and '--proxy-with-wombat' - head_insert.html: only include wombat in proxy mode when use_wombat or use_auto_fetch_worker are set. - wombatProxyMode.js: slimmed down wombat for proxy mode only including auto-fetch support. - more consistent naming: rename 'preserveWorker' and 'autoArchive' to 'auto-fetch' Updated tests: - test_wbrequestresponse.py: added tests covering constructor defaults, _init_derived, options_response, json_response, encode_stream, text_stream - test_auto_colls.py: fixed broken test test_more_custom_templates, reason using ujson now not json so spacing was off - test_proxy.py: updated existing tests to reflect splitting wombat into proxy and non-proxy mode, added tests covering auto-fetch worker specific endpoints in proxy mode removed duplicate addons key in .travis.yml - test_cli.py: updated to properly test the cli with these changes added ultrajson dep to tests_require in setup.py to reflect its usage by wbrequestresponse.py Fully documented: - cli.py - frontendapp.py - templateview.py - wbrequestresponse.py Removed duplicate addons key in .travis.yml Added ultrajson dependency to tests_require in setup.py to reflect its usage by wbrequestresponse.py Fixes #371	2018-10-03 16:27:49 -04:00
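
A few of the fixes listed above describe small, self-contained techniques: taking the first number from a 'content-length: num,num' value (#386), removing `br` from an Accept-Encoding header when brotli cannot be loaded (#444), and returning from a generator instead of raising StopIteration for Python 3.7 (#447). The sketch below illustrates those ideas only. It is not the pywb implementation, and the function names (parse_content_length, strip_brotli, read_until_blank) are hypothetical names chosen for this example.

```python
import re


def parse_content_length(value):
    """Return the first integer in a Content-Length value, or None.

    Illustrative only: handles values such as '1234,1234' by taking
    the first number and ignoring the rest.
    """
    match = re.match(r'\s*(\d+)', value or '')
    return int(match.group(1)) if match else None


def strip_brotli(accept_encoding):
    """Return an Accept-Encoding value with any 'br' coding removed.

    Illustrative only: a coding may carry parameters ('br;q=1.0'),
    so compare just the part before the first ';'.
    """
    kept = []
    for token in accept_encoding.split(','):
        token = token.strip()
        coding = token.split(';', 1)[0].strip().lower()
        if token and coding != 'br':
            kept.append(token)
    return ', '.join(kept)


def read_until_blank(lines):
    """Yield lines until the first blank one.

    Since Python 3.7 (PEP 479), a StopIteration raised inside a
    generator body is turned into RuntimeError, so the generator
    ends with a plain 'return' instead.
    """
    for line in lines:
        if not line.strip():
            return
        yield line


if __name__ == '__main__':
    print(parse_content_length('1234,1234'))
    print(strip_brotli('gzip, deflate, br'))
    print(list(read_until_blank(['a', 'b', '', 'c'])))
```

Each helper is deliberately standalone so it can be tried in isolation; the actual changes in the commits may be structured differently.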


2160 Commits