mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

1651 Commits

Author SHA1 Message Date
Ilya Kreymer
099a81b786 wb_frame: add support for optional 'wbinfo.outer_prefix' which if set, is used for making the top frame url (#191) 2016-08-20 00:03:21 -04:00
Ilya Kreymer
892ebacead cross-frame improvements: #191
- make hashchange functions use postMessage(), support setting top->replay and replay->top
- special postMessage() option for sending message from top frame -> replay frame
- fix history navigation, mimic top frame history same as replay frame as much as possible
- remove iframe_loaded() callback, using postMessage() notifications only
- include document title in 'load' message
2016-08-19 23:44:15 -04:00
Ilya Kreymer
6af1a7856e top-frame handling: don't access contents of top frame directly to support cross-domain frames
set __WB_top_Frame in wombat if is_framed property is true, don't check wbinfo (#191)
2016-08-19 13:59:42 -04:00
Ilya Kreymer
2fb1df34c9 recorder: add upload/streaming support with put_record=stream where the content being uploaded is already in WARC record form 2016-08-12 21:23:25 -04:00
Ilya Kreymer
c8b6a48005 webagg: use prepare_auth() to ensure Authorization header is set for http://user:pass@host urls 2016-08-12 21:22:17 -04:00
Ilya Kreymer
82d3b61523 recorder: catch exception in close_idle_files() if file no longer exists and ensure it's removed 2016-08-12 01:19:30 -04:00
Ilya Kreymer
594aff86d3 webagg: response self-redir: don't check if live, throw correct exception 2016-08-10 00:50:43 -04:00
Ilya Kreymer
92dfcbfcbe rewrite: don't rewrite 'www-authenticate' and 'proxy-authenticate' headers 2016-08-10 00:02:53 -04:00
Ilya Kreymer
cca0c01547 urlrewrite misc fixes:
- ensure content-length is converted to str
- templateview: support optional extensions
- fix test
2016-08-09 19:53:22 -04:00
Ilya Kreymer
b22a29df5f vidrw: also check for 'src' param as well as movie 2016-08-08 19:50:16 -04:00
Ilya Kreymer
c93d7ecafc webagg: Fix loading of url-lookup (url agnostic) revisits, ensure all params passed to cdx lookup, add tests for url-agnostic revisit lookup 2016-08-04 16:53:24 -04:00
Ilya Kreymer
e04095ffbb rewrite css: leave spaces in css url, eg url(' http://example.com/ ') rewritten with spaces intact 2016-08-01 10:29:04 -04:00
Ilya Kreymer
d5adc05cbb history rewrite check: don't check empty urls (#188) 2016-08-01 10:27:38 -04:00
Ilya Kreymer
20b161bf90 debug: print stacktrace when debugging 2016-08-01 02:12:15 -04:00
Ilya Kreymer
68b94fe671 record parser: arc-to-warc: support converting arc records to warc 'response' records on-the-fly to simplify
processing for tools that read WARC records. arc headers are converted to equivalent warc header, WARC-Record-ID
generated on the fly #190
2016-07-31 22:31:21 -04:00
Ilya Kreymer
66ca8d8b26 http block loader: raise exception for 4xx, 5xx responses
tests: add tests for limitreader posting, fix charset for frame test
2016-07-31 12:56:00 -04:00
Ilya Kreymer
db3b92e228 writing: add write_stream_to_file() function to be able to write to a WARC an existing input stream
refactor _do_write_req_resp to pass callback to actual writing (eg. _write_to_file)
2016-07-31 00:49:57 -04:00
Ilya Kreymer
1b09015954 recorder: split up _open_file() into get_new_filename() and allow_new_file() to customize skipping recording by returning false
from allow_new_file()
create_warcinfo_record() - switch to dict args over kwargs, update tests
2016-07-30 13:11:12 -04:00
Ilya Kreymer
c3389987cd frame timestamp extract: fix extracting timestamp for non-html resources for use with frame display (#189) 2016-07-28 10:06:10 -04:00
Ilya Kreymer
c8c0cecda3 rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
Ilya Kreymer
cd15dbfe48 head_insert: add decodeURI() to prefix to ensure unicode prefix string 2016-07-27 10:34:54 -04:00
Ilya Kreymer
498f87fb54 add Dockerfile to git! 2016-07-26 19:42:59 -04:00
Ilya Kreymer
a5696fc2d4 rewriter: range massage for patch as well as record 2016-07-26 19:42:32 -04:00
Ilya Kreymer
14cf68e4e5 custom record: don't override WARC-Date if provided in request header,
return chosen WARC-Date in json response
2016-07-26 19:41:47 -04:00
Ilya Kreymer
6928d72f68 rewrite css: handle rewriting with entities around url() css by leaving them in place, eg: url(&quot;http://example.com/&quot;) 2016-07-26 18:12:32 -04:00
Ilya Kreymer
782f95fa97 rules: rules for yt video info update 2016-07-24 19:39:43 -04:00
Ilya Kreymer
34a710e51a custom response: add utf-8 encoding, unless framed replay 2016-07-24 00:14:43 -04:00
Ilya Kreymer
9588e8622f responseloader: quote/unquote Webagg-Source-Coll header as source may contain unicode chars 2016-07-23 21:57:24 -04:00
Ilya Kreymer
42a2fa02fe wombat: history check fix: ensure check applies to absolute url #188 2016-07-16 13:32:46 -04:00
Ilya Kreymer
64a49b3e4d wombat: history change improvements (#188):
- ensure back, go, forward also propagated to top frame
- ensure pushState propagated as pushState and replaceState as replaceState to top frame
- security: prevent pushState or replaceState from changing to different domain
2016-07-16 13:18:08 -04:00
Ilya Kreymer
605ee22bec html rewrite: rewrite href on any element, not just few designated ones, as client side rewriting does the same.
avoids edge cases where href used on other tags (eg. a div) that results in incorrect rewriting, #187
2016-07-16 12:55:24 -04:00
Ilya Kreymer
b46cf8492f bump version to 0.31.5 2016-07-16 12:48:26 -04:00
Ilya Kreymer
ae290587f6 temp cookie store: add add_cookie() function for explicitly adding cookie, make expiry configurable
related to webrecorder/webrecorder#79
2016-07-01 10:15:59 -04:00
Ilya Kreymer
0b57f4a352 cookie notification: use postMessage() instead of callback to notify top frame of cookie setting with custom domain, #186 2016-07-01 09:58:25 -04:00
Ilya Kreymer
827ba9b50f cookies: add optional callback when setting cookie with domain (to experiment with server side handling of custom domain) 2016-06-30 12:26:18 -04:00
Ilya Kreymer
f4e5a7df5d Merge branch 'develop' 2016-06-16 00:41:08 -04:00
Ilya Kreymer
2fba97683a CHANGES for 0.31.0 2016-06-16 00:40:53 -04:00
Ilya Kreymer
5024234552 CHANGES for 0.31.0 2016-06-16 00:39:51 -04:00
Ilya Kreymer
d457223555 tests: add brotli compression test #184 2016-06-16 00:00:47 -04:00
Ilya Kreymer
457a1a564c bufferedreader: support brotli decompression
rewrite: handle Content-Encoding: br using brotli decompressor
setup: add brotlipy as dependency
2016-06-15 01:37:29 -04:00
Ilya Kreymer
bc36ae1302 rewriter: update for moved RewriterAMF in pywb 2016-06-14 00:14:29 -04:00
Ilya Kreymer
c1d7111841 webagg: store original 'source' value in cdx for properly mapping in WARC file resolver
error handling: ensure 'last_exc' is a string
2016-06-14 00:13:01 -04:00
Ilya Kreymer
3b68ef6540 html rewriter: cleanup rewrite_srcset, add more tests for empty rewrite 2016-06-12 01:57:21 -04:00
Ilya Kreymer
6a5842d983 Merge branch 'chdorner-fix-empty-srcset' into empty-attr 2016-06-12 01:53:53 -04:00
Ilya Kreymer
1bfec37970 html rewriter: attr rewrite ops check for empty/blank attr value, return empty string 2016-06-12 01:50:55 -04:00
Ilya Kreymer
d2c37f7d91 html parser: attr_value can now be None -- default to '' for string ops, write attr w/o assignment 2016-06-12 01:38:03 -04:00
Ilya Kreymer
0f530a3e0e dependencies: remove pyamf, update to latest surt (0.3.0) 2016-06-12 00:44:52 -04:00
Ilya Kreymer
9f299eb8e9 amf rewriting: move to separate file, mark as experimental, and don't include as default (for now) 2016-06-12 00:40:35 -04:00
Ilya Kreymer
527a3bc89c bufferedreader: be lenient of partially decompressed data: return what was decompressed, rather than just throw exception
esp. useful if record was decompressed, but an error in crc check
may add additional options for toggling 'leniency' if needed
2016-06-12 00:37:14 -04:00
Ilya Kreymer
4c7da0f6ef recorder: support overriding get_params() in subclass
multiwarcwriter: support multiple warcs in same dir, support random component in path, and a custom
key template for selecting current warc file, not related to current directory
2016-06-07 12:55:04 -04:00