Ilya Kreymer
3f8480c37e
typo: fix typo after rename!
2016-10-20 11:47:06 -07:00
Ilya Kreymer
40b0a291a9
rewrite: don't rewrite ajax-requested html content
js regex: add special regex to rewrite '?location:'
2016-10-20 11:30:14 -07:00
Ilya Kreymer
52ce45beee
tests: additional test for new modifier form
2016-10-19 21:17:40 -07:00
Ilya Kreymer
7b45df7338
wburl: support for new modifier form: $mod as well as 'mod_'
2016-10-10 17:00:36 -07:00
Ilya Kreymer
b8769c7de0
proxy mode: use js_proxy rewriter for js embedded in html when in proxy mode #198
2016-10-01 21:08:08 -07:00
Ilya Kreymer
a4efa58d1e
proxy mode: add special 'proxy_js' rewriter which defaults to none rewriter, but supports custom rules
from rules.yaml, to avoid inserting WB_wombat_ overrides in proxy mode #198
2016-09-30 11:33:30 -07:00
Ilya Kreymer
2079ce191c
header rewriter improvements: better define headers rewritten/prefixed due to content rewrite vs url rewriting
when in proxy mode, don't rewrite headers unless related to content, transfer-encoding or caching (separate settings) #197
2016-09-30 09:02:50 -07:00
Ilya Kreymer
1bb7aa01ce
wburl improved scheme detection: use regex to match acceptable scheme before :/, don't treat something like 'a.com/?x=http://' as having a scheme, update tests to check for this
2016-09-20 15:44:50 -07:00
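The scheme check described in the commit above can be sketched as follows. The exact pattern pywb uses is not shown here, so `SCHEME_RX` and `has_scheme` are hypothetical names, and using the RFC 3986 scheme grammar is an assumption.

```python
import re

# Hypothetical pattern: an RFC 3986-style scheme (a letter, then letters,
# digits, '+', '-' or '.') anchored at the start and followed by ':/'.
SCHEME_RX = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*:/')


def has_scheme(url):
    """Return True only if the url begins with a plausible scheme."""
    return SCHEME_RX.match(url) is not None


print(has_scheme('http://example.com/'))  # True
print(has_scheme('a.com/?x=http://'))     # False: '/' and '?' come before ':/'
```

Anchoring the match at the start is what keeps a scheme-like substring inside a query string from counting.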
Ilya Kreymer
1fb6e9b5fa
rewrite: url rewriter: don't rewrite relative urls, only those that start with scheme, / or contain ../ #195
update tests to reflect this new behavior
2016-09-14 13:04:46 -07:00
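A minimal sketch of the rule in the commit above, assuming a simple regex for "starts with a scheme". `should_rewrite` is a hypothetical helper, not pywb's actual function.

```python
import re

# Hypothetical: a leading scheme is a letter followed by scheme chars and ':'.
SCHEME_RX = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*:')


def should_rewrite(url):
    """Rewrite only urls with a scheme, a leading '/', or a '../' segment."""
    url = url.strip()
    return (SCHEME_RX.match(url) is not None
            or url.startswith('/')   # also covers protocol-relative '//'
            or '../' in url)


for u in ('page.html', 'img/a.png', '/img/a.png', '../a.png'):
    print(u, should_rewrite(u))
```

Plain relative urls such as `img/a.png` are left alone because the browser already resolves them against the rewritten page url.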
Ilya Kreymer
f47ae0bb7e
rewrite: for rewriting on* attr, add 'window.' before WB_wombat_ as window may not be in scope (if no '.' before WB_wombat)
2016-09-08 18:38:35 -07:00
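One way to express the `window.` prefixing from the commit above is a negative lookbehind, sketched below. `prefix_window` is a hypothetical name, and treating `.`, identifier characters and `$` as "already qualified" is an assumption.

```python
import re

# Matches WB_wombat_ when it is not preceded by '.', an identifier char or '$',
# i.e. a bare global reference rather than a property access.
BARE_WOMBAT_RX = re.compile(r'(?<![.\w$])WB_wombat_')


def prefix_window(attr_js):
    """Prefix bare WB_wombat_ references in an on* handler with 'window.'."""
    return BARE_WOMBAT_RX.sub('window.WB_wombat_', attr_js)


print(prefix_window("WB_wombat_location.href = '/a'"))
print(prefix_window("document.WB_wombat_location = '/a'"))  # left unchanged
```

Because the inserted text is itself preceded by `.`, running the function twice does not add a second prefix.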
Ilya Kreymer
1fe201c528
rewrite: html: rewrite svg <image> tag
client: update textContent after rewrite_style() in rewrite_elem()
2016-09-08 10:06:47 -07:00
Ilya Kreymer
92dfcbfcbe
rewrite: don't rewrite 'www-authenticate' and 'proxy-authenticate' headers
2016-08-10 00:02:53 -04:00
Ilya Kreymer
e04095ffbb
rewrite css: leave spaces in css url, eg url(' http://example.com/ ') rewritten with spaces intact
2016-08-01 10:29:04 -04:00
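A sketch of keeping the spaces inside `url()` intact, as described in the commit above. The regex, `rewrite_css_urls`, and the `/web/` prefix are hypothetical; pywb's real CSS rewriter is not reproduced here.

```python
import re

# Captures an optional quote, leading spaces, the url and trailing spaces,
# so only the url itself is rewritten and the surrounding spacing survives.
CSS_URL_RX = re.compile(r"""url\((['"]?)(\s*)([^'"\s)]+)(\s*)\1\)""")


def rewrite_css_urls(css, prefix):
    def repl(m):
        quote, lead, url, trail = m.groups()
        return 'url({q}{l}{p}{u}{t}{q})'.format(
            q=quote, l=lead, p=prefix, u=url, t=trail)
    return CSS_URL_RX.sub(repl, css)


print(rewrite_css_urls("a{background:url(' http://example.com/ ')}", '/web/'))
```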
Ilya Kreymer
c8c0cecda3
rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
Ilya Kreymer
6928d72f68
rewrite css: handle rewriting with entities around url() css by leaving them in place, eg: url(&quot; http://example.com/&quot;)
2016-07-26 18:12:32 -04:00
Ilya Kreymer
605ee22bec
html rewrite: rewrite href on any element, not just few designated ones, as client side rewriting does the same.
avoids edge cases where href used on other tags (eg. a div) that results in incorrect rewriting, #187
2016-07-16 12:55:24 -04:00
Ilya Kreymer
457a1a564c
bufferedreader: support brotli decompression
rewrite: handle Content-Encoding: br using brotli decompressor
setup: add brotlipy as dependency
2016-06-15 01:37:29 -04:00
Ilya Kreymer
3b68ef6540
html rewriter: cleanup rewrite_srcset, add more tests for empty rewrite
2016-06-12 01:57:21 -04:00
Ilya Kreymer
6a5842d983
Merge branch 'chdorner-fix-empty-srcset' into empty-attr
2016-06-12 01:53:53 -04:00
Ilya Kreymer
1bfec37970
html rewriter: attr rewrite ops check for empty/blank attr value, return empty string
2016-06-12 01:50:55 -04:00
Ilya Kreymer
d2c37f7d91
html parser: attr_value can now be None -- default to '' for string ops, write attr w/o assignment
2016-06-12 01:38:03 -04:00
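Python's `html.parser.HTMLParser` reports a valueless attribute such as `<input disabled>` with a value of `None`. The sketch below shows the two behaviours named in the commit above, defaulting to `''` for string ops and writing the attribute without `=`. `attr_text`, `format_attrs` and `TagEcho` are hypothetical names, not pywb's code.

```python
from html import escape
from html.parser import HTMLParser


def attr_text(value):
    """Default a missing (None) attribute value to '' for string operations."""
    return value or ''


def format_attrs(attrs):
    """Serialize (name, value) pairs; a None value is written without '='."""
    parts = []
    for name, value in attrs:
        if value is None:
            parts.append(name)  # e.g. <input disabled>
        else:
            parts.append('{0}="{1}"'.format(name, escape(value, quote=True)))
    return ' '.join(parts)


class TagEcho(HTMLParser):
    """Re-serializes start tags to show how None attr values are handled."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        attr_str = format_attrs(attrs)
        self.out.append('<{0}{1}>'.format(tag, ' ' + attr_str if attr_str else ''))


parser = TagEcho()
parser.feed('<input disabled value="x">')
parser.close()
print(parser.out)
```

The parser unescapes attribute values, so the sketch re-escapes them with `html.escape` when writing them back out.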
Ilya Kreymer
9f299eb8e9
amf rewriting: move to separate file, mark as experimental, and don't include as default (for now)
2016-06-12 00:40:35 -04:00
chdorner
b54347f8d1
Allow rewriting of empty srcset attributes
Strictly speaking, a `srcset` attribute must consist of one or more
strings
(http://w3c.github.io/html/semantics-embedded-content.html#element-attrdef-img-srcset).
However, there are websites out there that specify an empty string as the
value.
This commit makes sure that the rewriting does not break and just
returns an empty string.
2016-06-01 11:31:26 +02:00
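A minimal sketch of the empty-value guard described in the commits above. `rewrite_srcset` and the `rewrite_url` callback are hypothetical, and splitting candidates on `,` is a simplification that would mishandle urls containing commas.

```python
def rewrite_srcset(value, rewrite_url):
    """Rewrite each url in a srcset value; empty or blank input yields ''."""
    if not value or not value.strip():
        return ''
    candidates = []
    for candidate in value.split(','):
        parts = candidate.strip().split(None, 1)  # [url] or [url, descriptor]
        if not parts:
            continue  # tolerate stray commas
        parts[0] = rewrite_url(parts[0])
        candidates.append(' '.join(parts))
    return ', '.join(candidates)


print(rewrite_srcset('a.jpg 1x, b.jpg 2x', lambda u: '/web/' + u))
print(repr(rewrite_srcset('  ', lambda u: '/web/' + u)))
```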
Ilya Kreymer
87da25c703
post request mapping improvements: work on #178, including:
- mapping multipart/form-data same as x-www-form-urlencoded
- parsing application/x-amf with pyamf
- RewriteContentAMF for rewriting AMF response to match request
- default encoding of other POST data as base64 encoded __wb_post_data param
2016-05-06 10:19:08 -07:00
Ilya Kreymer
1bea9d73ed
rewrite: rewrite .frameElement -> WB_wombat_frameElement server-side to handle cases when default frameElement cannot be overridden
2016-04-30 01:36:26 -07:00
Ilya Kreymer
37609ebdc9
rewrite: support custom cookie_rewriter passed to 'rewrite_content'
2016-04-30 01:35:55 -07:00
Ilya Kreymer
e669ecba15
rewrite: html rewrite fix such that head insert is placed before other <script> tags even if no head
2016-04-30 01:32:16 -07:00
Ilya Kreymer
658303caad
rewrite headers: undo not rewriting x- headers, needs more research and exclusions (eg. x-frame-options)
2016-04-26 13:11:08 -07:00
Ilya Kreymer
cf6cfc0c44
tests: fix cookie rewriter tests to exclude 2.6
2016-04-26 10:32:43 -07:00
Ilya Kreymer
4a60e15577
cookie rewrite improvements: #177
- don't remove max-age and expires if in 'live' rewrite mode (flag set on urlrewriter)
- remove secure only if replay prefix is not https
- fix expires UTC->GMT as cookie parsing chokes on UTC
- other rewriting: don't append rewrite prefix to x- headers
tests: add more cookie rewriting tests
2016-04-26 09:45:23 -07:00
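The cookie rules listed in the commit above might look roughly like this. `rewrite_set_cookie`, its parameters, and the plain `;` split are hypothetical simplifications. Path and domain rewriting are left out.

```python
def rewrite_set_cookie(header, https_prefix, live=False):
    """Hypothetical sketch of the Set-Cookie attribute rules listed above."""
    parts = [p.strip() for p in header.split(';')]
    kept = [parts[0]]  # name=value pair (path/domain rewriting omitted)
    for attr in parts[1:]:
        if not attr:
            continue
        name, _, val = attr.partition('=')
        key = name.strip().lower()
        if key == 'secure' and not https_prefix:
            continue  # drop Secure only when the replay prefix is not https
        if key in ('max-age', 'expires') and not live:
            continue  # keep expiry attributes only in 'live' rewrite mode
        if key == 'expires':
            attr = name + '=' + val.replace('UTC', 'GMT')  # parsers expect GMT
        kept.append(attr)
    return '; '.join(kept)


h = 'id=1; Path=/; Secure; Expires=Wed, 21 Oct 2026 07:28:00 UTC; Max-Age=60'
print(rewrite_set_cookie(h, https_prefix=False))
print(rewrite_set_cookie(h, https_prefix=True, live=True))
```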
Ilya Kreymer
95a212ed79
wombat rewrite: add custom X-Pywb-Requested-With header which turns off rewriting and is never sent upstream
2016-04-06 12:05:53 -07:00
Ilya Kreymer
fe0f8ed1d8
Merge branch '0.11.3' into develop
2016-03-16 14:38:49 -07:00
Ilya Kreymer
f962418c1f
html rewrite typo: ensure rw_mod is set for meta content rewrite
2016-03-16 14:27:55 -07:00
Ilya Kreymer
c26660e20f
cookies: use httplib headers pair list instead of requests headers dict to avoid 'set-cookie' headers being concatenated, as that messes up parsing in 3.5.1
2016-03-16 09:47:55 -07:00
Ilya Kreymer
e5ca9bf601
Merge branch 'master' into py3
2016-03-10 10:53:30 -08:00
Ilya Kreymer
effd618bb3
tests: add parse_comment test for html_rewriter
2016-03-10 10:10:51 -08:00
Ilya Kreymer
bb806d7f26
Merge branch 'develop' into py3
2016-03-03 14:09:00 -08:00
Ilya Kreymer
fc5d7cc7cd
rewrite: add rewriting of <meta> content="" attribute if it is a url
2016-02-25 18:49:31 -08:00
Ilya Kreymer
8fc789cc8f
rewrite: leave out charset in top-frame and don't modify it in replay frame
(to allow the browser to detect the best charset, as it would on the original page if it is absent)
see #170 for details
2016-02-25 18:25:53 -08:00
Ilya Kreymer
cebd6b6239
rewrite: fix rewriting encoding -- for best rewriting, keep strategy of encoding
insert to match page, then use latin-1 for rewriting. Support for non-ASCII-based
encodings is still needed
2016-02-23 18:07:34 -08:00
Ilya Kreymer
3a584a1ec3
py3: all tests pass, at last!
but not yet py2... need to resolve encoding issues in rewriting
2016-02-23 13:26:53 -08:00
Ilya Kreymer
bd841b91a9
more python 3 support work -- pywb.cdx, pywb.warc tests succeed
most relative imports replaced with absolute
2016-02-18 21:26:40 -08:00
Robert Knight
83a33e0541
Resolve relative canonical paths if rewriting is disabled
For Via, we want rel=canonical links to resolve to the same
absolute URL as they did on the original page.
For absolute URLs, no rewriting is necessary. If the original
rel=canonical URL was relative however, it needs to be resolved
relative to the original URL.
See https://github.com/hypothesis/via/issues/65 for context.
2015-12-10 08:31:50 +00:00
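The resolution step described in the commit above maps directly onto the standard library's `urllib.parse.urljoin`. The `resolve_canonical` wrapper is a hypothetical name used for illustration.

```python
from urllib.parse import urljoin


def resolve_canonical(original_url, href):
    """Resolve a rel=canonical href against the page's original url."""
    # Absolute hrefs come back unchanged; relative ones are joined to the base.
    return urljoin(original_url, href)


print(resolve_canonical('http://example.com/a/b.html', '/c'))
print(resolve_canonical('http://example.com/a/b.html', 'https://other.org/x'))
```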
Ilya Kreymer
2922801b7c
test: rewrite_live: pass Accept-Encoding: identity to disable gzip, simplified version of fix in #151
2015-11-26 00:34:45 -08:00
Ilya Kreymer
eeb35ea3b4
proxy: add ProxyRouter wrapper to check for content-length and, if missing, perform full buffering (http1.0) or chunked encoding (http1.1) (separate from replay view buffering)
add tests for buffering and chunked encoding, fixes #143, also tests no banner url-rewrite only proxy related to #142
2015-10-25 18:02:51 -07:00
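The fallback described in the commit above (full buffering for HTTP/1.0, chunked encoding for HTTP/1.1 when Content-Length is missing) can be sketched as below. `chunked` follows the HTTP/1.1 chunked transfer coding. `frame_body` and its `(headers, body)` shape are hypothetical and not ProxyRouter's actual interface.

```python
def chunked(body_iter):
    """Wrap an iterable of byte strings in HTTP/1.1 chunked transfer coding."""
    for data in body_iter:
        if not data:
            continue  # a zero-length chunk would end the body early
        yield b'%X\r\n' % len(data) + data + b'\r\n'
    yield b'0\r\n\r\n'  # last-chunk marker followed by an empty trailer


def frame_body(body_iter, headers, http_version):
    """Hypothetical: buffer (HTTP/1.0) or chunk (HTTP/1.1) if no Content-Length."""
    if any(k.lower() == 'content-length' for k, _ in headers):
        return headers, body_iter
    if http_version == 'HTTP/1.0':
        body = b''.join(body_iter)  # full buffering to compute the length
        return headers + [('Content-Length', str(len(body)))], [body]
    return headers + [('Transfer-Encoding', 'chunked')], chunked(body_iter)


print(b''.join(chunked([b'hello', b'world!'])))
```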
Ilya Kreymer
0c96591c49
proxy: change HttpsUrlRewriter to SchemeOnlyUrlRewriter, which fixes http->https or https->http to match
the scheme of the current page.
url-rewrite-only mode: add uo_ mod and use that to rewrite only urls (no banner, no client side rewrite)
addresses #142
2015-10-24 15:10:30 -07:00
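The idea behind the scheme-only rewriter in the commit above is to switch http/https so it matches the current page's scheme. Here is a minimal sketch; `match_scheme` is a hypothetical helper, not SchemeOnlyUrlRewriter's real method.

```python
def match_scheme(url, page_scheme):
    """Switch an absolute http/https url to the current page's scheme."""
    for scheme in ('http://', 'https://'):
        if url.startswith(scheme):
            return page_scheme + '://' + url[len(scheme):]
    return url  # relative and non-http urls are left alone


print(match_scheme('http://example.com/a', 'https'))
print(match_scheme('https://example.com/a', 'http'))
```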
Ilya Kreymer
979fcaeda3
tests: fix mock YoutubeDLWrapper after refactor, #141
2015-10-23 12:19:15 -07:00
Ilya Kreymer
39e824cb3a
live rewrite proxy: decouple having http/https proxy from recording,
move youtubedl wrapper calls, metadata add calls to live rewrite proxy class for easier extension
closes #141 also improves #136
2015-10-23 11:57:12 -07:00
Jack Cushman
633eb31f57
Use webencodings to encode head_insert_str.
2015-10-22 16:40:59 -04:00
Ilya Kreymer
9e1447c448
rewrite: strip spaces when rewriting urls in html, closes #134
2015-10-20 12:59:07 -07:00