backup/pywb - pywb - Source code and issue tracker for Open Eggbert

mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

Author	SHA1	Message	Date
Ilya Kreymer	a9e4b5c469	README: update 2.0 -> 2.1 (#396 ) cli: fix typo in enable-auto-fetch, add test	2018-10-23 09:58:10 -07:00
Ilya Kreymer	0db8e5d718	Merge branch 'master' into develop for PR #395	2018-10-23 09:38:53 -07:00
anarcat	40f904af79	add sample Apache configuration (#374 ) * add sample Apache configuration This configuration can be used when launching `wayback` in the default configuration, which is useful to add stuff like access control, authentication, or encryption without going through the trouble of setting up a UWSGI proxy. * enable support for X-Forwarded-Proto headers from #395	2018-10-23 09:35:15 -07:00
Ilya Kreymer	08b0ac87f7	scheme: add support for X-Forwarded-Proto header to specify the scheme to better address #314 , #374 (#395 )	2018-10-23 09:13:23 -07:00
Ilya Kreymer	b39274cf12	CHANGELIST: Tweak changes, update to 2.1.0	2018-10-22 17:52:49 -07:00
Ilya Kreymer	3a70769c58	Cleanup CLI Switches and Docs for Auto-Fetch System (#394 ) Rename: - rename auto-fetch config to 'enable_auto_fetch' and '--enable-auto-fetch' cli param - rename 'use_head_insert' -> 'enable_content_rewrite' - rename 'use_banner' -> 'enable_banner' - rename 'use_wombat' -> 'enable_wombat' Misc Cleanup: - enable_auto_fetch applies to both proxy and non-proxy mode - remove setting 'wbinfo.use_wombat', implied if wombatProxyMode.js is included - docs: add docs for auto-fetch system, improved docs for proxy rewrite options - tests: test with enable_auto_fetch, update tests for renames - bump version to 2.1.0 due to breaking changes - changelist: updates to changelist - requirements: use bounded version for gevent	2018-10-22 17:12:22 -07:00
John Berlin	d0efd7567d	started on pywb 2.0.5 changelist (#387 ) (wip)	2018-10-22 10:31:56 -07:00
Ilya Kreymer	f76ba06c42	header rewriter: ensure the 'Status' header is prefix-rewritten, update test	2018-10-21 13:59:29 -07:00
John Berlin	c28e38718c	Updated html_rewriter.py to correctly handle self-closing <script> elements: (#392 ) - adding the 'xlink:href' attribute to script element attributes to rewrite Updated html_rewriter.py to better handle self closing tags: - added boolean set_parsing_context arg to _rewrite_tag_attrs to indicate if the parsing context is to be set - the call to _rewrite_tag_attrs from handle_startendtag now sets set_parsing_context to false Added a test to test_html_rewriter.py for rewriting SVGScriptElements	2018-10-10 15:24:34 -07:00
Ilya Kreymer	1c7badf117	wombat init fix from #383: - Ensure WombatInit() methods end in ';' - pass 'wbinfo' to WombatInit()	2018-10-05 23:47:23 +00:00
Ilya Kreymer	671dd2c204	Rewriting fixes for http-only cookies, bad content-length, and document with base (#386 ) * rewriting fixes: server side: cookie rewriting: if httponly cookie with mp_/if_ modifier and path ends with '/', add set-cookie for all known modifiers content length parsing: improve content-length parsing to support 'content-length: num,num', parse out the first number (occasionally seen with range requests when range is dropped for upstream) wombat: rewrite_elem: use element.ownerDocument for resolving baseUri for parent paths tests: add tests for cookie all modifier rewrite, bad content-length parsing (skip for py2.7)	2018-10-05 14:37:32 -07:00
Ilya Kreymer	e6f00ce58d	wombat: document.evaluate param de-proxy and optimization: (#385 ) - rename override_func_first_arg_proxy_to_obj -> override_func_arg_proxy_to_obj to support resolving object proxy not just from first param - add document.evaluate() 'de-proxy' to 2nd param - optimize override_func_arg_proxy_to_obj() to call original apply, avoid modifying arguments array in place	2018-10-05 01:03:33 -04:00
Ilya Kreymer	9f81933fbd	wombat reinit fix (#383 ) * wombat init fix: - fix change from #339 which removed reiniting of wombat - allow reiniting of wombat if inited via init_new_window_wombat() - don't allow if reinited directly from <head>, as happened in document import * tests: fix tests for 'new _WBWombat -> WombatInit' change * wombat: window.frames optimization: - since window.frames === window, no need for separate override! - ensure init_new_window_wombat() is called on any returned window from object proxy	2018-10-04 17:29:18 -04:00
John Berlin	e7098522b2	Added window.Text override to wombat.js to account for css in JS (#382 ) frameworks that like to append a single text node as a child to a style node modifying and then only modify that text node to add/remove css dynamically via: - initTextNodeOverrides (entry point) - overrideTextProtoFunction (overrides the appendData, insertData, and replaceData functions of inherited by Text) - overrideTextProtoGetSet (overrides property getters and setters of data and wholeText) Added window.CSSStyleSheet.insertRule override - dynamically adds a raw css rule (text) to an existing stylesheet	2018-10-04 13:41:48 -04:00
John Berlin	ec0df7b9ae	Refactor of auto-fetch worker system with support for proxy mode, fixes https://github.com/webrecorder/pywb/issues/371: (#379) - Split wombat and auto-fetch worker into two files (proxy mode and non-proxy mode) - Renamed preservationWorker to autoFetchWorker in order to better convey what it does - Root config file control over including wombat and auto-fetch worker in proxy or non-proxy mode - Added additional proxy mode + auto-fetch worker only route for fetching the auto-fetch worker code nicely for CORS - templateview: add 'tobool' formatter to more cleanly format python bools to JS 'true'/'false' - proxy options: config and command line: 'use_auto_fetch_worker' and '--proxy-with-auto-fetch' 'use_wombat' and '--proxy-with-wombat' - head_insert.html: only include wombat in proxy mode when use_wombat or use_auto_fetch_worker are set. - wombatProxyMode.js: slimmed down wombat for proxy mode only including auto-fetch support. - more consistent naming: rename 'preserveWorker' and 'autoArchive' to 'auto-fetch' Updated tests: - test_wbrequestresponse.py: added tests covering constructor defaults, _init_derived, options_response, json_response, encode_stream, text_stream - test_auto_colls.py: fixed broken test test_more_custom_templates, reason using ujson now not json so spacing was off - test_proxy.py: updated existing tests to reflect splitting wombat into proxy and non-proxy mode, added tests covering auto-fetch worker specific endpoints in proxy mode removed duplicate addons key in .travis.yml - test_cli.py: updated to properly test the cli with these changes added ultrajson dep to tests_require in setup.py to reflect its usage by wbrequestresponse.py Fully documented: - cli.py - frontendapp.py - templateview.py - wbrequestresponse.py Removed duplicate addons key in .travis.yml Added ultrajson dependency to tests_require in setup.py to reflect its usage by wbrequestresponse.py Fixes #371	2018-10-03 16:27:49 -04:00
John Berlin	71c3eb77de	Added override for setTimeout and setInterval because [setTimeout\|setInterval]('document.location.href = "xyz.com"', time) is legal and used (#381 ) Added override for window.origin (https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/origin) available in Chrome 59+ and FF 54+	2018-09-19 17:07:17 -07:00
Ilya Kreymer	adf34cdb35	wrong encoding fallback: don't rely on content-type charset=utf-8 as being accurate! (#380 ) - only use utf-8 decoding optimization for html - when parsing as html, if utf-8 encoding fails, default to iso-8859-1/latin-1 for remainder (usually will happen right away eg. if actually binary content) - tests: add tests rewriting css and html with wrong charset	2018-09-11 11:51:09 -07:00
John Berlin	348e434bee	Pass sheet to deferredSheetExtraction rather than rules in order to ensure that the CSS rule extraction from style tags is guarded with null check on the property containing the css rules (edge case). (#378 )	2018-09-06 16:30:43 -07:00
Ilya Kreymer	d3e66b581a	encoding fix: additional fix to #376 for banner encoding: (#377 ) - if no encoding is detected, don't default to utf-8 - if no encoding known, encode banner as 'ascii' with 'xmlcharrefreplace', converting to xml entities - tests: add tests for rewriting with no known encoding	2018-09-06 17:09:30 -04:00
Ilya Kreymer	cabb488f4e	Encoding Fix (#376 ) * encoding fix: a better fix from #361: - when dealing with unicode urls, don't assume always %-encoded. if no change, (eg. anchor), then return url in original encoding - utf-8 optimization: if content is known to be in utf-8, use utf-8 directly, don't decode as iso-8859-1 and then re-encode to utf-8 for rewriting * content rewriter decoding fix: use incrementaldecoder for incrementally decoding utf-8 stream tests: add test which splits utf-8 char along 16k boundary to test incremental decoding	2018-09-06 13:32:54 -04:00
Ilya Kreymer	5c00743bdd	rules: add fuzzy matching rule for vimeo, canonicalizing out a timestamp/HMAC portion of the url (non-query) (#375 )	2018-09-06 12:17:03 -04:00
Ilya Kreymer	0bf2e08b27	non-root deployment and static prefix: (ported from uk-pywb fork) (#373 ) - store original wsgi SCRIPT_NAME (before collection path is pushed) in 'pywb.app_prefix' env var - set 'pywb.host_prefix' via rewriterapp - add 'static_prefix' jinja env global which defaults to 'pywb.host_prefix + pywb.app_prefix + static/' - set 'static_prefix' to absolute url if available (to support proxy mode) - update existing templates to use '{{ static_prefix }}' instead of '{{ host_prefix }}/{{ static_path }' - update index.html to use pywb.app_prefix for collection links - tests: add test_prefixed_deploy.py to ensure all paths are prefixed as expected	2018-08-24 17:59:02 -07:00
eszense	6a2423e754	Add recorder option to filter source collection (#368 ) * Add source_filter option to recorder. * Add test and docs for source_filter option. * Update test_record_replay.py - Split source_filter test into skip existing and new recording	2018-08-24 17:57:47 -07:00
Ilya Kreymer	9c44739bae	content rewriter: encoding check: if response has Content-Encoding but no match found in Accept-Encoding header, auto decode response (even if not otherwise rewriting) (#372 ) rewriterapp: pass environ to content rewriter to allow access to request http headers tests: test brotli served with 'br' in Accept-Encoding (no change), and without (response auto-decoded)	2018-08-23 17:50:06 -07:00
John Berlin	dfc3033117	Added skipping of metadata records with mime = text/anvl to cdxindexer.py. (#366 ) Updated test_indexing.py to include a test for no-indexing metadata records with mime == text/anvl Fixes https://github.com/webrecorder/webrecorderplayer-electron/issues/63.	2018-08-20 15:04:09 -07:00
John Berlin	d62ab14914	Add content sniffing to the html check of `_fill_text_type_and_charset` when the url ends with .json (#367) Detect if .json urls served with text/html are actually json and not html. Tests: updated test_content_rewriter.py to test for json sent as mime text/html	2018-08-20 15:03:28 -07:00
John Berlin	b4d4be8a64	Advanced preservation of media query based style rules and complete preservation of srcset values to fix https://github.com/webrecorder/webrecorder/issues/64. (#359) wombat.js: - Finalized PreserveWorker that preserves srcset values and Media Query values - Deferred extraction and preservation of the values to be preserved so that the UI thread is not clobbered - Hooked into places where wombat rewrites the values we are interested in wombatPreservationWorker.js: - Updated handling of srcset extraction now that we are sending wombat srcset rewrites - Added check to see if we have seen a URL to be fetched - Added light polyfill of Promise and fetch if they are not defined in wombatPreservationWorker.js, for safari wombat.spec.js - Updated to include values necessary to work with PWorker changes.	2018-08-20 13:12:43 -07:00
Ilya Kreymer	841687fcc0	favicon and title pass-through: improvements from #356 , closes #342 - only add icons if in top frame, fix indent - favicon: move icon and title logic to default_banner to allow overriding default behavior (eg. Webrecorder uses its own favicon) - title: prepend original page title with 'pywb Live: ' or 'pywb Archived: ' in default banner to avoid confusion with actual site, also works for frameless mode.	2018-08-20 09:35:43 -07:00
Devhercule	dd76ed2818	Page title and favicon display (#356) Set favicon and title from top-most replay frame to the top frame (work from @Devhercule): Favicon display in no-proxy mode with framed_replay: true. When "iframe": "#replay_iframe", the icon of the tab in question is not visible (or a wrong icon is displayed, provided from cache memory) because of the presence of an added frame (#replay_iframe). The modification allows to get the replay_iframe favicon and pass it to the main frame to be correctly displayed in the tab. (see Issue #342)	2018-08-20 09:35:43 -07:00
Frank Sachsenheim	538ce88abc	Fixes an enumeration issue in docs/usage.rst (#364 ) Thanks! put it on develop so it can be part of next release.	2018-08-17 19:33:42 -07:00
John Berlin	c08d0d676a	Added facebook profile timeline fuzzy lookup rule to rules.yaml (#363 ) The value of __adt is incremented to indicate position in timeline as shown below and the profile_id or pagelet_token contained in the data param identify the facebook user the timeline data is for	2018-08-14 18:31:39 -07:00
John Berlin	5f938e6879	Less aggressive fuzzy matching on mime type. (#362 ) * When mime type match is made also match on extension in order to be less aggressive when matching prefix matches. * fuzzy matching: further restrict fuzzy matching on mime or ext match by ensuring the matched result differs only by query	2018-08-07 12:03:57 -07:00
Ilya Kreymer	5476d75294	htmlrewriter: if urls contain non-ascii chars, ensure the url is reencoded with expected charset, using same charset as for banner insert (#361 ) (instead of default iso-8859-1) before %-encoding and rewriting tests: add test to ensure correct %-encoding of utf-8 urls	2018-08-06 22:42:24 -07:00
John Berlin	1156032e0e	wombat.js: (#351 ) - improved worker rewriting: updated worker rewriting handles non-blob urls, added SharedWorker override ww_rw.js: - updated to be a much more complete rewriting system: overrides for importScripts, and fetch content_rewriter.py: - added wkr_ mod for handling Worker/SharedWorker, follows convention of service worker test_content_rewriter.py - added test for content rewriting of Worker/SharedWorker	2018-08-06 10:12:16 -07:00
Martin Hoppenheit	ac930c340a	Enhance CLI help messages. (#360 )	2018-08-05 17:26:38 -07:00
Ilya Kreymer	973a2dcff9	RegexRewriter Optimization (#354 ) * bump version to 2.0.5 * regexrewriter: work on splitting rules into separate class hierarchy from rewriter. rules logic and regexs can be inited once, while rewriter is per response being rewritten * regexrewriter: refactor remaining rewriters to use a shared rules factory to avoid reiniting rules * fix spacing * fixes: ensure custom rules added first, fix fb rewrite_dash content_rewriter tests: update tests to check with location-only and js obj proxy rewriter, check fb dash rewriter * simplify JSNoneRewriter	2018-08-05 16:40:19 -07:00
John Berlin	2f062cf5c7	New integration tests using webrecorder-tests: (#355 ) New integration tests using webrecorder-tests: - WR_TEST=true is set for integration test run (only run with py3.6, excluded for py2.7, 3.5) - Added .travis directory that includes two scripts: install.sh and test.sh. - install.sh handles all installation and test.sh handles running of unit or integration tests - sudo: true required to run headless chrome	2018-07-09 13:21:14 -07:00
John Berlin	3e7ec05cfe	Updated the gevent requirement: (#347 ) - Removed strict version limit (1.2.2), using latest gevent - changed the import "gevent.wsgi" to "gevent.pywsgi" (needed in latest gevent) - Installing with extra requirement gevent[dnspython] (existing dns resolver in gevent considered deprecated)	2018-07-09 11:28:11 -07:00
Ilya Kreymer	c3b6a580fd	bump version to 2.0.5	2018-07-06 15:06:52 -07:00
John Berlin	a52fdeef5b	Add issue and pull request templates (#353 ) * added issue pr templates	2018-07-06 15:06:02 -07:00
Ilya Kreymer	819e8adf48	text updates: (#352 ) - Update CHANGES.rst for 2.0.4 - Docs: Improve new proxy docs for (#316), fix URL-T->URI-T - Requirements: bump to wsgiprox>=1.5.1	2018-06-27 09:02:01 -07:00
John Berlin	0c087d383e	wombat.js: default_proxy_get improvement Facebook fix (#350 ) - if prop is requestAnimationFrame (raf) or cancelAnimationFrame and it was polyfilled by FB do not bind	2018-06-21 13:02:32 -07:00
John Berlin	0b87f32d10	Started the pywb 2.0.4 change list (#348 ) * Started the pywb 2.0.3 changelist by adding my commits * Finished documentation blurb about improving the un-rewrite regex	2018-06-21 11:35:49 -07:00
Ilya Kreymer	a192932858	slash redirects: if a capture ends with '/' (with or without a query), and requested url does not end in '/', (#346 ) redirect to '/' version, fixes #344	2018-06-14 18:01:14 -04:00
John Berlin	9404f89e31	client-side rewrite: Add rewriting of SVG Filter attribute for http://fotopaulmartens.netcam.nl/vucht.php (#341 )	2018-06-14 14:00:31 -04:00
John Berlin	bb5d46d19b	Server-side rewriting of script[src='js/...'] and link rel='import' (#334 ) * Updated html_rewriter.py to account for rewriting of script[src] values that are super relative (http://fotopaulmartens.netcam.nl/vucht.php) and added link rel='import' rewriting Updated test_html_rewriter.py for super rel script[src] rewriting and link rel='import' Updated wombat to account for the new rewriting of script[src] (http://fotopaulmartens.netcam.nl/vucht.php) Changed the postMessage override in wombat to use $wbwindow rather than window to fix google calendar replay / recording (http://qasrcc.org/events/calendar/) * Updated tests for forcing absolute and fixed merge conflicts * wombat: extracted removal and retrieval of __wb_original_src into own functions	2018-06-14 13:56:46 -04:00
Ilya Kreymer	ac5b4da9eb	Self-Redirect Fix (#345 ) * self-redirect fix for multiple continuous 3xx responses: if after one self-redirect, next match is also a redirect where url canonicalizes to same as previously rejected, also treat as self-redirect tests: add new test_self_redirect for generating example pattern where self-redirect could occur * self-redirect: ensure warc records are closed when handling self-redirect exception!	2018-06-14 10:48:32 -04:00
Ilya Kreymer	a3476d8baa	tests: also rewrite 'test.httpbin.org' to internal httpbin to allow subdomain testing	2018-06-08 16:20:43 -07:00
John Berlin	2825535ae2	Added FontFace to wombat overrides, https://drafts.csswg.org/css-font-loading/#FontFace-interface (#340 )	2018-06-01 15:13:43 -07:00
Ilya Kreymer	1e9f457ef1	setup: bump min versions for wsgiprox, warcio rewriterapp: add warc record param to _add_custom_params() to expose record to extensions	2018-05-31 17:29:37 -07:00
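
Commit cabb488f4e above mentions using an incremental decoder so that a UTF-8 character split across a chunk boundary (16k in the commit's test) still decodes correctly. The following is a minimal sketch of that general technique using Python's standard `codecs` module. It is not pywb's code; the function name `decode_chunks` and the sample data are hypothetical, chosen only for illustration.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks, yielding text pieces.

    Hypothetical helper for illustration. The incremental decoder
    buffers an incomplete multi-byte sequence at the end of one chunk
    and completes it when the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; raises UnicodeDecodeError if bytes are left dangling.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# 'é' is two bytes in UTF-8 (0xC3 0xA9); split the stream between them.
data = "caf\u00e9".encode("utf-8")
pieces = [data[:4], data[4:]]
result = "".join(decode_chunks(pieces))
```

Decoding each chunk independently with `bytes.decode("utf-8")` would raise on the first piece here, which is the failure mode the incremental approach avoids.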

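Commit 671dd2c204 above describes tolerating header values of the form 'content-length: num,num' by parsing out the first number. A rough sketch of that idea follows, assuming a plain string header value. The function name `parse_content_length` is hypothetical and the regular expression is one possible approach, not necessarily what pywb does.

```python
import re


def parse_content_length(value):
    """Return the first integer in a Content-Length value, or None.

    Hypothetical helper for illustration: accepts values such as
    '1234' or '1234,1234' and ignores everything after the first number.
    """
    if value is None:
        return None
    match = re.match(r"\s*(\d+)", value)
    return int(match.group(1)) if match else None


length_plain = parse_content_length("1234")
length_dupe = parse_content_length("1234,1234")
length_bad = parse_content_length("abc")
```

Returning `None` for an unparseable value leaves it to the caller to decide whether to drop the header or fall back to reading until the stream ends.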

2275 Commits