mirror of https://github.com/webrecorder/pywb.git synced 2025-03-22 03:21:42 +01:00

274 Commits

Author SHA1 Message Date
Ilya Kreymer
3f8480c37e typo: fix typo after rename! 2016-10-20 11:47:06 -07:00
Ilya Kreymer
40b0a291a9 rewrite: don't rewrite ajax-requested html content
js regex: add special regex to rewrite '?location:'
2016-10-20 11:30:14 -07:00
Ilya Kreymer
52ce45beee tests: additional test for new modifier form 2016-10-19 21:17:40 -07:00
Ilya Kreymer
7b45df7338 wburl: support for new modifier form: $mod as well as 'mod_' 2016-10-10 17:00:36 -07:00
Ilya Kreymer
b8769c7de0 proxy mode: use js_proxy rewriter for js embedded in html when in proxy mode #198 2016-10-01 21:08:08 -07:00
Ilya Kreymer
a4efa58d1e proxy mode: add special 'proxy_js' rewriter which defaults to none rewriter, but supports custom rules
from rules.yaml, to avoid inserting WB_wombat_ overrides in proxy mode #198
2016-09-30 11:33:30 -07:00
Ilya Kreymer
2079ce191c header rewriter improvements: better define headers rewritten/prefixed due to content rewrite vs url rewriting
when in proxy mode, don't rewrite headers unless related to content, transfer-encoding or caching (separate settings) #197
2016-09-30 09:02:50 -07:00
Ilya Kreymer
1bb7aa01ce wburl improved scheme detection: use regex to match acceptable scheme before :/, don't treat something like 'a.com/?x=http://' as having a scheme, update tests to check for this 2016-09-20 15:44:50 -07:00
Ilya Kreymer
1fb6e9b5fa rewrite: url rewriter: don't rewrite relative urls, only those that start with scheme, / or contain ../ #195
update tests to reflect this new behavior
2016-09-14 13:04:46 -07:00
Ilya Kreymer
f47ae0bb7e rewrite: for rewriting on* attr, add 'window.' before WB_wombat_ as window may not be in scope (if no '.' before WB_wombat) 2016-09-08 18:38:35 -07:00
Ilya Kreymer
1fe201c528 rewrite: html: rewrite svg <image> tag
client: update textContent after rewrite_style() in rewrite_elem()
2016-09-08 10:06:47 -07:00
Ilya Kreymer
92dfcbfcbe rewrite: don't rewrite 'www-authenticate' and 'proxy-authenticate' headers 2016-08-10 00:02:53 -04:00
Ilya Kreymer
e04095ffbb rewrite css: leave spaces in css url, eg url(' http://example.com/ ') rewritten with spaces intact 2016-08-01 10:29:04 -04:00
Ilya Kreymer
c8c0cecda3 rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
Ilya Kreymer
6928d72f68 rewrite css: handle rewriting with entities around url() css by leaving them in place, eg: url(&quot;http://example.com/&quot;) 2016-07-26 18:12:32 -04:00
Ilya Kreymer
605ee22bec html rewrite: rewrite href on any element, not just few designated ones, as client side rewriting does the same.
avoids edge cases where href used on other tags (eg. a div) that results in incorrect rewriting, #187
2016-07-16 12:55:24 -04:00
Ilya Kreymer
457a1a564c bufferedreader: support brotli decompression
rewrite: handle Content-Encoding: br using brotli decompressor
setup: add brotlipy as dependency
2016-06-15 01:37:29 -04:00
Ilya Kreymer
3b68ef6540 html rewriter: cleanup rewrite_srcset, add more tests for empty rewrite 2016-06-12 01:57:21 -04:00
Ilya Kreymer
6a5842d983 Merge branch 'chdorner-fix-empty-srcset' into empty-attr 2016-06-12 01:53:53 -04:00
Ilya Kreymer
1bfec37970 html rewriter: attr rewrite ops check for empty/blank attr value, return empty string 2016-06-12 01:50:55 -04:00
Ilya Kreymer
d2c37f7d91 html parser: attr_value can now be None -- default to '' for string ops, write attr w/o assignment 2016-06-12 01:38:03 -04:00
Ilya Kreymer
9f299eb8e9 amf rewriting: move to separate file, mark as experimental, and don't include as default (for now) 2016-06-12 00:40:35 -04:00
chdorner
b54347f8d1 Allow rewriting of empty srcset attributes
Strictly speaking, a `srcset` attribute must consist of one or more
strings
(http://w3c.github.io/html/semantics-embedded-content.html#element-attrdef-img-srcset).
However, there are websites out there that specify an empty string as the
value.

This commit makes sure that the rewriting does not break and just
returns an empty string.
2016-06-01 11:31:26 +02:00
Ilya Kreymer
87da25c703 post request mapping improvements: work on #178, including:
- mapping multipart/form-data same as x-www-form-urlencoded
- parsing application/x-amf with pyamf
- RewriteContentAMF for rewriting AMF response to match request
- default encoding of other POST data as base64 encoded __wb_post_data param
2016-05-06 10:19:08 -07:00
Ilya Kreymer
1bea9d73ed rewrite: rewrite .frameElement -> WB_wombat_frameElement server-side to handle cases when default frameElement can not be overridden 2016-04-30 01:36:26 -07:00
Ilya Kreymer
37609ebdc9 rewrite: support custom cookie_rewriter passed to 'rewrite_content' 2016-04-30 01:35:55 -07:00
Ilya Kreymer
e669ecba15 rewrite: html rewrite fix such that head insert is placed before other <script> tags even if no head 2016-04-30 01:32:16 -07:00
Ilya Kreymer
658303caad rewrite headers: undo not rewriting x- headers, needs more research and exclusions (eg. x-frame-options) 2016-04-26 13:11:08 -07:00
Ilya Kreymer
cf6cfc0c44 tests: fix cookie rewriter tests to exclude 2.6 2016-04-26 10:32:43 -07:00
Ilya Kreymer
4a60e15577 cookie rewrite improvements: #177
- don't remove max-age and expires if in 'live' rewrite mode (flag set on urlrewriter)
- remove secure only if replay prefix is not https
- fix expires UTC->GMT as cookie parsing chokes on UTC
- other rewriting: don't append rewrite prefix to x- headers
tests: add more cookie rewriting tests
2016-04-26 09:45:23 -07:00
Ilya Kreymer
95a212ed79 wombat rewrite: add custom X-Pywb-Requested-With header which turns off rewriting and is never sent upstream 2016-04-06 12:05:53 -07:00
Ilya Kreymer
fe0f8ed1d8 Merge branch '0.11.3' into develop 2016-03-16 14:38:49 -07:00
Ilya Kreymer
f962418c1f html rewrite typo: ensure rw_mod is set for meta content rewrite 2016-03-16 14:27:55 -07:00
Ilya Kreymer
c26660e20f cookies: use httplib headers pair list instead of requests headers dict to avoid 'set-cookie' headers being concatenated, as that messes up parsing in 3.5.1 2016-03-16 09:47:55 -07:00
Ilya Kreymer
e5ca9bf601 Merge branch 'master' into py3 2016-03-10 10:53:30 -08:00
Ilya Kreymer
effd618bb3 tests: add parse_comment test for html_rewriter 2016-03-10 10:10:51 -08:00
Ilya Kreymer
bb806d7f26 Merge branch 'develop' into py3 2016-03-03 14:09:00 -08:00
Ilya Kreymer
fc5d7cc7cd rewrite: add rewriting of <meta> content="" attribute if it is a url 2016-02-25 18:49:31 -08:00
Ilya Kreymer
8fc789cc8f rewrite: leave out charset in top-frame and don't modify it in replay frame
(to allow browser to detect best charset, as it would on original page if it is absent)
see #170 for details
2016-02-25 18:25:53 -08:00
Ilya Kreymer
cebd6b6239 rewrite: fix rewriting encoding -- for best rewriting, keep strategy of encoding
insert to match page, then using latin-1 for rewriting. support for non-ascii
based encoding still needed
2016-02-23 18:07:34 -08:00
Ilya Kreymer
3a584a1ec3 py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
bd841b91a9 more python 3 support work -- pywb.cdx, pywb.warc tests succeed
most relative imports replaced with absolute
2016-02-18 21:26:40 -08:00
Robert Knight
83a33e0541 Resolve relative canonical paths if rewriting is disabled
For Via, we want rel=canonical links to resolve to the same
absolute URL as they did on the original page.

For absolute URLs, no rewriting is necessary. If the original
rel=canonical URL was relative however, it needs to be resolved
relative to the original URL.

See https://github.com/hypothesis/via/issues/65 for context.
2015-12-10 08:31:50 +00:00
Ilya Kreymer
2922801b7c test: rewrite_live: pass Accept-Encoding: identity to disable gzip, simplified version of fix in #151 2015-11-26 00:34:45 -08:00
Ilya Kreymer
eeb35ea3b4 proxy: add ProxyRouter wrapper to check for content-length and, if missing, perform full buffering (http1.0) or chunked encoding (http1.1) (separate from replay view buffering)
add tests for buffering and chunked encoding, fixes #143, also tests no banner url-rewrite only proxy related to #142
2015-10-25 18:02:51 -07:00
Ilya Kreymer
0c96591c49 proxy: change HttpsUrlRewriter to SchemeOnlyUrlRewriter, which fixes http->https or https->http to match
the scheme of the current page.
url-rewrite-only mode: add uo_ mod and use that to rewrite only urls (no banner, no client side rewrite)
addresses #142
2015-10-24 15:10:30 -07:00
Ilya Kreymer
979fcaeda3 tests: fix mock YoutubeDLWrapper after refactor, #141 2015-10-23 12:19:15 -07:00
Ilya Kreymer
39e824cb3a live rewrite proxy: decouple having http/https proxy from recording,
move youtubedl wrapper calls, metadata add calls to live rewrite proxy class for easier extension
closes #141 also improves #136
2015-10-23 11:57:12 -07:00
Jack Cushman
633eb31f57 Use webencodings to encode head_insert_str. 2015-10-22 16:40:59 -04:00
Ilya Kreymer
9e1447c448 rewrite: strip spaces when rewriting urls in html, closes #134 2015-10-20 12:59:07 -07:00
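
Note on 1bb7aa01ce (wburl improved scheme detection): the behavior described in that message could be sketched roughly as below. This is only an illustration under stated assumptions, not pywb's actual code. The pattern `SCHEME_RX` and the helper `has_scheme` are hypothetical names, and the character class is simply the RFC 3986 scheme syntax followed by ':/'.

```python
import re

# Hypothetical pattern (an assumption, not taken from pywb):
# a scheme must start at the beginning of the string, begin with a
# letter, and be followed directly by ':/'.
SCHEME_RX = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*:/')


def has_scheme(url):
    """Return True if url begins with an acceptable scheme before ':/'."""
    return SCHEME_RX.match(url) is not None


if __name__ == '__main__':
    for candidate in ('http://example.com/',
                      'a.com/?x=http://',
                      '//example.com/'):
        print(candidate, has_scheme(candidate))
```

Because the match is anchored at the start, a string such as 'a.com/?x=http://' is not treated as having a scheme: the first '/' ends the candidate scheme before any ':' is seen.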