mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

1692 Commits

Author SHA1 Message Date
Ilya Kreymer
b22a29df5f vidrw: also check for 'src' param as well as movie 2016-08-08 19:50:16 -04:00
Ilya Kreymer
c93d7ecafc webagg: Fix loading of url-lookup (url agnostic) revisits, ensure all params passed to cdx lookup, add tests for url-agnostic revisit lookup 2016-08-04 16:53:24 -04:00
Ilya Kreymer
e04095ffbb rewrite css: leave spaces in css url, eg url(' http://example.com/ ') rewritten with spaces intact 2016-08-01 10:29:04 -04:00
Ilya Kreymer
d5adc05cbb history rewrite check: don't check empty urls (#188) 2016-08-01 10:27:38 -04:00
Ilya Kreymer
20b161bf90 debug: print stack trace when debugging 2016-08-01 02:12:15 -04:00
Ilya Kreymer
68b94fe671 record parser: arc-to-warc: support converting arc records to warc 'response' records on-the-fly to simplify
processing for tools that read WARC records. arc headers are converted to equivalent warc header, WARC-Record-ID
generated on the fly #190
2016-07-31 22:31:21 -04:00
Ilya Kreymer
66ca8d8b26 http block loader: raise exception for 4xx, 5xx responses
tests: add tests for limitreader posting, fix charset for frame test
2016-07-31 12:56:00 -04:00
Ilya Kreymer
db3b92e228 writing: add write_stream_to_file() function to write an existing input stream to a WARC
refactor _do_write_req_resp to pass callback to actual writing (eg. _write_to_file)
2016-07-31 00:49:57 -04:00
Ilya Kreymer
1b09015954 recorder: split up _open_file() into get_new_filename() and allow_new_file() to customize skipping recording by returning false
from allow_new_file()
create_warcinfo_record() - switch to dict args over kwargs, update tests
2016-07-30 13:11:12 -04:00
Ilya Kreymer
c3389987cd frame timestamp extract: fix timestamp extraction for non-html resources for use with frame display (#189) 2016-07-28 10:06:10 -04:00
Ilya Kreymer
c8c0cecda3 rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
Ilya Kreymer
cd15dbfe48 head_insert: add decodeURI() to prefix to ensure unicode prefix string 2016-07-27 10:34:54 -04:00
Ilya Kreymer
498f87fb54 add Dockerfile to git! 2016-07-26 19:42:59 -04:00
Ilya Kreymer
a5696fc2d4 rewriter: range massage for patch as well as record 2016-07-26 19:42:32 -04:00
Ilya Kreymer
14cf68e4e5 custom record: don't override WARC-Date if provided in request header,
return chosen WARC-Date in json response
2016-07-26 19:41:47 -04:00
Ilya Kreymer
6928d72f68 rewrite css: handle rewriting with entities around url() css by leaving them in place, eg: url(&quot;http://example.com/&quot;) 2016-07-26 18:12:32 -04:00
Ilya Kreymer
782f95fa97 rules: rules for yt video info update 2016-07-24 19:39:43 -04:00
Ilya Kreymer
34a710e51a custom response: add utf-8 encoding, unless framed replay 2016-07-24 00:14:43 -04:00
Ilya Kreymer
9588e8622f responseloader: quote/unquote Webagg-Source-Coll header as source may contain unicode chars 2016-07-23 21:57:24 -04:00
Ilya Kreymer
42a2fa02fe wombat: history check fix: ensure check applies to absolute url #188 2016-07-16 13:32:46 -04:00
Ilya Kreymer
64a49b3e4d wombat: history change improvements (#188):
- ensure back, go, forward also propagated to top frame
- ensure pushState propagated as pushState and replaceState as replaceState to top frame
- security: prevent pushState or replaceState from changing to different domain
2016-07-16 13:18:08 -04:00
Ilya Kreymer
605ee22bec html rewrite: rewrite href on any element, not just few designated ones, as client side rewriting does the same.
avoids edge cases where href used on other tags (eg. a div) that results in incorrect rewriting, #187
2016-07-16 12:55:24 -04:00
Ilya Kreymer
b46cf8492f bump version to 0.31.5 2016-07-16 12:48:26 -04:00
Ilya Kreymer
ae290587f6 temp cookie store: add add_cookie() function for explicitly adding cookie, make expiry configurable
related to webrecorder/webrecorder#79
2016-07-01 10:15:59 -04:00
Ilya Kreymer
0b57f4a352 cookie notification: use postMessage() instead of callback to notify top frame of cookie setting with custom domain, #186 2016-07-01 09:58:25 -04:00
Ilya Kreymer
827ba9b50f cookies: add optional callback when setting cookie with domain (to experiment with server side handling of custom domain) 2016-06-30 12:26:18 -04:00
Ilya Kreymer
f4e5a7df5d Merge branch 'develop' 2016-06-16 00:41:08 -04:00
Ilya Kreymer
2fba97683a CHANGES for 0.31.0 2016-06-16 00:40:53 -04:00
Ilya Kreymer
5024234552 CHANGES for 0.31.0 2016-06-16 00:39:51 -04:00
Ilya Kreymer
d457223555 tests: add brotli compression test #184 2016-06-16 00:00:47 -04:00
Ilya Kreymer
457a1a564c bufferedreader: support brotli decompression
rewrite: handle Content-Encoding: br using brotli decompressor
setup: add brotlipy as dependency
2016-06-15 01:37:29 -04:00
Ilya Kreymer
bc36ae1302 rewriter: update for moved RewriterAMF in pywb 2016-06-14 00:14:29 -04:00
Ilya Kreymer
c1d7111841 webagg: store original 'source' value in cdx for properly mapping in WARC file resolver
error handling: ensure 'last_exc' is a string
2016-06-14 00:13:01 -04:00
Ilya Kreymer
3b68ef6540 html rewriter: cleanup rewrite_srcset, add more tests for empty rewrite 2016-06-12 01:57:21 -04:00
Ilya Kreymer
6a5842d983 Merge branch 'chdorner-fix-empty-srcset' into empty-attr 2016-06-12 01:53:53 -04:00
Ilya Kreymer
1bfec37970 html rewriter: attr rewrite ops check for empty/blank attr value, return empty string 2016-06-12 01:50:55 -04:00
Ilya Kreymer
d2c37f7d91 html parser: attr_value can now be None -- default to '' for string ops, write attr w/o assignment 2016-06-12 01:38:03 -04:00
Ilya Kreymer
0f530a3e0e dependencies: remove pyamf, update to latest surt (0.3.0) 2016-06-12 00:44:52 -04:00
Ilya Kreymer
9f299eb8e9 amf rewriting: move to separate file, mark as experimental, and don't include as default (for now) 2016-06-12 00:40:35 -04:00
Ilya Kreymer
527a3bc89c bufferedreader: be lenient of partially decompressed data: return what was decompressed, rather than just throw exception
esp. useful if record was decompressed, but an error in crc check
may add additional options for toggling 'leniency' if needed
2016-06-12 00:37:14 -04:00
Ilya Kreymer
4c7da0f6ef recorder: support overriding get_params() in subclass
multiwarcwriter: support multiple warcs in same dir, support random component in path, and a custom
key template for selecting current warc file, not related to current directory
2016-06-07 12:55:04 -04:00
Ilya Kreymer
3fec766e39 webagg: redis lookup: if url contains wildcard, scan redis keys to check multiple keys until one is found
webagg tests: fix test to include mime in live cdx
2016-06-07 12:54:28 -04:00
Ilya Kreymer
197ed5be98 loader: profile urls: ensure the profile prefix is removed from url before passing to loader, #180 2016-06-04 14:09:18 -04:00
chdorner
b54347f8d1 Allow rewriting of empty srcset attributes
Strictly speaking a `srcset` attribute must consist of one or more
strings
(http://w3c.github.io/html/semantics-embedded-content.html#element-attrdef-img-srcset)
However, there are websites out there that specify an empty string as the
value.

This commit makes sure that the rewriting does not break and just
returns an empty string.
2016-06-01 11:31:26 +02:00
Ilya Kreymer
d7c74b68de video loader support: add VideoLoader, which uses youtube-dl to create a metadata record
of video info. Activated with explicit content_type param 'application/vnd.youtube-dl_formats+json'
2016-05-28 15:01:33 -07:00
Ilya Kreymer
30f9d0aca7 recorder put custom record: add support for put/post of a custom record. If put_record= param is included, the request body
is written to the specified record type.
move record creation functions to the warcwriter
add tests for custom record
2016-05-26 20:49:40 -07:00
Ilya Kreymer
ea3efdf84d responseloader: use PreparedRequest() to ensure url properly formatted
tests: update tests for latest, live data
2016-05-24 18:01:44 -07:00
Ilya Kreymer
e28f294302 wombat: ensure window.open() rewrite happens even if open is not in prototype
rewrite mod: allow empty "" as set mod, check for undefined
2016-05-24 17:55:17 -07:00
Ilya Kreymer
f858be4d7d Merge branch 'frame-postMessage' into develop 2016-05-24 15:40:51 -07:00
Ilya Kreymer
84c829467b framed replay: use postMessage() instead of custom function to notify of replay frame changing url, include different type of change, eg. load, replaceState, pushState, #181 2016-05-23 12:10:10 -07:00