Ilya Kreymer
92dfcbfcbe
rewrite: don't rewrite 'www-authenticate' and 'proxy-authenticate' headers
2016-08-10 00:02:53 -04:00
Ilya Kreymer
cca0c01547
urlrewrite misc fixes:
...
- ensure content-length is converted to str
- templateview: support optional extensions
- fix test
2016-08-09 19:53:22 -04:00
Ilya Kreymer
b22a29df5f
vidrw: also check for 'src' param as well as movie
2016-08-08 19:50:16 -04:00
Ilya Kreymer
c93d7ecafc
webagg: Fix loading of url-lookup (url agnostic) revisits, ensure all params passed to cdx lookup, add tests for url-agnostic revisit lookup
2016-08-04 16:53:24 -04:00
Ilya Kreymer
e04095ffbb
rewrite css: leave spaces in css url, eg url(' http://example.com/ ') rewritten with spaces intact
2016-08-01 10:29:04 -04:00
Ilya Kreymer
d5adc05cbb
history rewrite check: don't check empty urls (#188)
2016-08-01 10:27:38 -04:00
Ilya Kreymer
20b161bf90
debug: print stack trace when debugging
2016-08-01 02:12:15 -04:00
Ilya Kreymer
68b94fe671
record parser: arc-to-warc: support converting arc records to warc 'response' records on-the-fly to simplify processing for tools that read WARC records
...
arc headers are converted to equivalent warc headers, WARC-Record-ID generated on the fly #190
2016-07-31 22:31:21 -04:00
Ilya Kreymer
66ca8d8b26
http block loader: raise exception for 4xx, 5xx responses
...
tests: add tests for limitreader posting, fix charset for frame test
2016-07-31 12:56:00 -04:00
Ilya Kreymer
db3b92e228
writing: add write_stream_to_file() function to write an existing input stream to a WARC
...
refactor _do_write_req_resp to pass callback to actual writing (eg. _write_to_file)
2016-07-31 00:49:57 -04:00
Ilya Kreymer
1b09015954
recorder: split up _open_file() into get_new_filename() and allow_new_file() to customize skipping recording by returning false from allow_new_file()
...
create_warcinfo_record() - switch to dict args over kwargs, update tests
2016-07-30 13:11:12 -04:00
Ilya Kreymer
c3389987cd
frame timestamp extract: fix timestamp extraction for non-html resources for use with frame display (#189)
2016-07-28 10:06:10 -04:00
Ilya Kreymer
c8c0cecda3
rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
...
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
Ilya Kreymer
cd15dbfe48
head_insert: add decodeURI() to prefix to ensure unicode prefix string
2016-07-27 10:34:54 -04:00
Ilya Kreymer
498f87fb54
add Dockerfile to git!
2016-07-26 19:42:59 -04:00
Ilya Kreymer
a5696fc2d4
rewriter: range massage for patch as well as record
2016-07-26 19:42:32 -04:00
Ilya Kreymer
14cf68e4e5
custom record: don't override WARC-Date if provided in request header,
...
return chosen WARC-Date in json response
2016-07-26 19:41:47 -04:00
Ilya Kreymer
6928d72f68
rewrite css: handle rewriting with entities around url() css by leaving them in place, eg: url(&quot;http://example.com/&quot;)
2016-07-26 18:12:32 -04:00
Ilya Kreymer
782f95fa97
rules: rules for yt video info update
2016-07-24 19:39:43 -04:00
Ilya Kreymer
34a710e51a
custom response: add utf-8 encoding, unless framed replay
2016-07-24 00:14:43 -04:00
Ilya Kreymer
9588e8622f
responseloader: quote/unquote Webagg-Source-Coll header as source may contain unicode chars
2016-07-23 21:57:24 -04:00
Ilya Kreymer
42a2fa02fe
wombat: history check fix: ensure check applies to absolute url #188
2016-07-16 13:32:46 -04:00
Ilya Kreymer
64a49b3e4d
wombat: history change improvements (#188):
...
- ensure back, go, forward also propagated to top frame
- ensure pushState propagated as pushState and replaceState as replaceState to top frame
- security: prevent pushState or replaceState from changing to different domain
2016-07-16 13:18:08 -04:00
Ilya Kreymer
605ee22bec
html rewrite: rewrite href on any element, not just a few designated ones, as client side rewriting does the same.
...
avoids edge cases where href used on other tags (eg. a div) that results in incorrect rewriting, #187
2016-07-16 12:55:24 -04:00
Ilya Kreymer
b46cf8492f
bump version to 0.31.5
2016-07-16 12:48:26 -04:00
Ilya Kreymer
ae290587f6
temp cookie store: add add_cookie() function for explicitly adding cookie, make expiry configurable
...
related to webrecorder/webrecorder#79
2016-07-01 10:15:59 -04:00
Ilya Kreymer
0b57f4a352
cookie notification: use postMessage() instead of callback to notify top frame of cookie setting with custom domain, #186
2016-07-01 09:58:25 -04:00
Ilya Kreymer
827ba9b50f
cookies: add optional callback when setting cookie with domain (to experiment with server side handling of custom domain)
2016-06-30 12:26:18 -04:00
Ilya Kreymer
f4e5a7df5d
Merge branch 'develop'
2016-06-16 00:41:08 -04:00
Ilya Kreymer
2fba97683a
CHANGES for 0.31.0
2016-06-16 00:40:53 -04:00
Ilya Kreymer
5024234552
CHANGES for 0.31.0
2016-06-16 00:39:51 -04:00
Ilya Kreymer
d457223555
tests: add brotli compression test #184
2016-06-16 00:00:47 -04:00
Ilya Kreymer
457a1a564c
bufferedreader: support brotli decompression
...
rewrite: handle Content-Encoding: br using brotli decompressor
setup: add brotlipy as dependency
2016-06-15 01:37:29 -04:00
Ilya Kreymer
bc36ae1302
rewriter: update for moved RewriterAMF in pywb
2016-06-14 00:14:29 -04:00
Ilya Kreymer
c1d7111841
webagg: store original 'source' value in cdx for properly mapping in WARC file resolver
...
error handling: ensure 'last_exc' is a string
2016-06-14 00:13:01 -04:00
Ilya Kreymer
3b68ef6540
html rewriter: cleanup rewrite_srcset, add more tests for empty rewrite
2016-06-12 01:57:21 -04:00
Ilya Kreymer
6a5842d983
Merge branch 'chdorner-fix-empty-srcset' into empty-attr
2016-06-12 01:53:53 -04:00
Ilya Kreymer
1bfec37970
html rewriter: attr rewrite ops check for empty/blank attr value, return empty string
2016-06-12 01:50:55 -04:00
Ilya Kreymer
d2c37f7d91
html parser: attr_value can now be None -- default to '' for string ops, write attr w/o assignment
2016-06-12 01:38:03 -04:00
Ilya Kreymer
0f530a3e0e
dependencies: remove pyamf, update to latest surt (0.3.0)
2016-06-12 00:44:52 -04:00
Ilya Kreymer
9f299eb8e9
amf rewriting: move to separate file, mark as experimental, and don't include as default (for now)
2016-06-12 00:40:35 -04:00
Ilya Kreymer
527a3bc89c
bufferedreader: be lenient with partially decompressed data: return what was decompressed, rather than just throwing an exception
...
esp. useful if record was decompressed, but there was an error in the crc check
may add additional options for toggling 'leniency' if needed
2016-06-12 00:37:14 -04:00
Ilya Kreymer
4c7da0f6ef
recorder: support overriding get_params() in subclass
...
multiwarcwriter: support multiple warcs in same dir, support random component in path, and a custom
key template for selecting current warc file, not related to current directory
2016-06-07 12:55:04 -04:00
Ilya Kreymer
3fec766e39
webagg: redis lookup: if url contains wildcard, scan redis keys to check multiple keys until one is found
...
webagg tests: fix test to include mime in live cdx
2016-06-07 12:54:28 -04:00
Ilya Kreymer
197ed5be98
loader: profile urls: ensure the profile prefix is removed from url before passing to loader, #180
2016-06-04 14:09:18 -04:00
chdorner
b54347f8d1
Allow rewriting of empty srcset attributes
...
Strictly speaking, a `srcset` attribute must consist of one or more strings
(http://w3c.github.io/html/semantics-embedded-content.html#element-attrdef-img-srcset).
However, there are websites out there that specify an empty string as the value.
This commit makes sure that the rewriting does not break and just returns an empty string.
2016-06-01 11:31:26 +02:00
Ilya Kreymer
d7c74b68de
video loader support: add VideoLoader, which uses youtube-dl to create a metadata record of video info
...
Activated with explicit content_type param 'application/vnd.youtube-dl_formats+json'
2016-05-28 15:01:33 -07:00
Ilya Kreymer
30f9d0aca7
recorder put custom record: add support for put/post of a custom record. If put_record= param is included, the request body is written to the specified record type.
...
move record creation functions to the warcwriter
add tests for custom record
2016-05-26 20:49:40 -07:00
Ilya Kreymer
ea3efdf84d
responseloader: use PreparedRequest() to ensure url properly formatted
...
tests: update tests for latest, live data
2016-05-24 18:01:44 -07:00
Ilya Kreymer
e28f294302
wombat: ensure window.open() rewrite happens even if open is not in prototype
...
rewrite mod: allow empty "" as set mod, check for undefined
2016-05-24 17:55:17 -07:00