mirror of https://github.com/webrecorder/pywb.git synced 2025-03-25 07:27:47 +01:00

69 Commits

Author SHA1 Message Date
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
3f8480c37e typo: fix typo after rename! 2016-10-20 11:47:06 -07:00
Ilya Kreymer
40b0a291a9 rewrite: don't rewrite ajax-requested html content
js regex: add special regex to rewrite '?location:'
2016-10-20 11:30:14 -07:00
Ilya Kreymer
b8769c7de0 proxy mode: use js_proxy rewriter for js embedded in html when in proxy mode #198 2016-10-01 21:08:08 -07:00
Ilya Kreymer
a4efa58d1e proxy mode: add special 'proxy_js' rewriter which defaults to none rewriter, but supports custom rules
from rules.yaml, to avoid inserting WB_wombat_ overrides in proxy mode #198
2016-09-30 11:33:30 -07:00
Ilya Kreymer
c8c0cecda3 rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
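A minimal sketch of the content-type override described in c8c0cecda3, assuming the rule is "trust the `js_` / `cs_` modifier when the server reports `text/plain`". The function name `pick_text_type` and its return values are hypothetical and are not pywb's actual API.

```python
def pick_text_type(content_type: str, mod: str) -> str:
    """Return 'js', 'css', or 'plain' for a response (hypothetical helper)."""
    # Drop any parameters such as "; charset=utf-8" before comparing.
    base = content_type.split(";", 1)[0].strip().lower()
    if base == "text/plain":
        if mod == "js_":
            return "js"    # requested as a script, so rewrite as JS
        if mod == "cs_":
            return "css"   # requested as a stylesheet, so rewrite as CSS
    return "plain"
```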
Ilya Kreymer
457a1a564c bufferedreader: support brotli decompression
rewrite: handle Content-Encoding: br using brotli decompressor
setup: add brotlipy as dependency
2016-06-15 01:37:29 -04:00
Ilya Kreymer
9f299eb8e9 amf rewriting: move to separate file, mark as experimental, and don't include as default (for now) 2016-06-12 00:40:35 -04:00
Ilya Kreymer
87da25c703 post request mapping improvements: work on #178, including:
- mapping multipart/form-data same as x-www-form-urlencoded
- parsing application/x-amf with pyamf
- RewriteContentAMF for rewriting AMF response to match request
- default encoding of other POST data as base64 encoded __wb_post_data param
2016-05-06 10:19:08 -07:00
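One way to picture the fallback described in 87da25c703 is the sketch below. It covers only two of the listed cases (form-urlencoded bodies appended as-is, everything else base64-encoded into a `__wb_post_data` parameter); the multipart and AMF handling are left out. The function name `append_post_query` is an assumption for illustration, not pywb's implementation.

```python
import base64
from urllib.parse import urlencode

def append_post_query(url: str, body: bytes, content_type: str) -> str:
    """Fold a POST body into the URL query string (hypothetical helper)."""
    base = content_type.split(";", 1)[0].strip().lower()
    if base == "application/x-www-form-urlencoded":
        # The body is already a query string, so append it unchanged.
        query = body.decode("ascii", "replace")
    else:
        # Fallback: carry arbitrary bytes as a base64-encoded parameter.
        encoded = base64.b64encode(body).decode("ascii")
        query = urlencode({"__wb_post_data": encoded})
    sep = "&" if "?" in url else "?"
    return url + sep + query
```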
Ilya Kreymer
37609ebdc9 rewrite: support custom cookie_rewriter passed to 'rewrite_content' 2016-04-30 01:35:55 -07:00
Ilya Kreymer
4a60e15577 cookie rewrite improvements: #177
- don't remove max-age and expires if in 'live' rewrite mode (flag set on urlrewriter)
- remove secure only if replay prefix is not https
- fix expires UTC->GMT as cookie parsing chokes on UTC
- other rewriting: don't append rewrite prefix to x- headers
tests: add more cookie rewriting tests
2016-04-26 09:45:23 -07:00
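The UTC->GMT fix in 4a60e15577 can be illustrated with a small regex substitution. This is a sketch of the idea only, under the assumption that the fix is a textual replacement inside the `expires` attribute; `fix_expires_utc` is a hypothetical name and the other cookie rules from the commit are not shown.

```python
import re

def fix_expires_utc(set_cookie: str) -> str:
    """Replace a trailing 'UTC' zone in an expires attribute with 'GMT'."""
    # Only touch text between "expires=" and the next ";".
    return re.sub(r"(expires=[^;]*?)\bUTC\b", r"\1GMT", set_cookie, flags=re.I)
```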
Ilya Kreymer
bb806d7f26 Merge branch 'develop' into py3 2016-03-03 14:09:00 -08:00
Ilya Kreymer
8fc789cc8f rewrite: leave out charset in top-frame and don't modify it in replay frame
(to allow browser to detect best charset, as it would on original page if it is absent)
see #170 for details
2016-02-25 18:25:53 -08:00
Ilya Kreymer
cebd6b6239 rewrite: fix rewriting encoding -- for best rewriting, keep strategy of encoding
insert to match page, then using latin-1 for rewriting. support for non-ascii
based encoding still needed
2016-02-23 18:07:34 -08:00
Ilya Kreymer
3a584a1ec3 py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
bd841b91a9 more python 3 support work -- pywb.cdx, pywb.warc tests succeed
most relative imports replaced with absolute
2016-02-18 21:26:40 -08:00
Ilya Kreymer
0c96591c49 proxy: change HttpsUrlRewriter to SchemeOnlyUrlRewriter, which fixes http->https or https->http to match
the scheme of the current page.
url-rewrite-only mode: add uo_ mod and use that to rewrite only urls (no banner, no client side rewrite)
addresses #142
2015-10-24 15:10:30 -07:00
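The scheme-matching behaviour described for SchemeOnlyUrlRewriter in 0c96591c49 might look roughly like the following. This is a standalone sketch, assuming absolute http/https URLs are the only ones changed; `match_scheme` is a hypothetical function, not the class's real interface.

```python
def match_scheme(url: str, page_scheme: str) -> str:
    """Rewrite only the scheme of an absolute URL to match the current page."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return page_scheme + "://" + url[len(prefix):]
    # Relative or non-http(s) URLs are returned untouched.
    return url
```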
Jack Cushman
633eb31f57 Use webencodings to encode head_insert_str. 2015-10-22 16:40:59 -04:00
Ilya Kreymer
925b23f8a8 rewrite: guard against invalid encoding in html charset= and default to utf-8 if specified encoding fails, related to hypothesis/via#53 2015-10-15 13:58:15 -07:00
Ilya Kreymer
cb67c172ed rewrite: html rewriter can accept optional url for initial base url of page 2015-10-02 14:01:33 -07:00
Ilya Kreymer
0b4ceb9cde rewrite: if removing content-encoding, also remove the content-length as it will need to be recomputed!
proxy: for proxy mode, must buffer fully so that content-length can be added (may add chunked encoding later)
2015-07-28 14:23:50 -07:00
Ilya Kreymer
9b08ca9005 vidrw: ensure iframe replacement does get rewritten!
regex rewrite: include '==top?' for wombat rewrite
rewrite css: if js_ modifier on text/css, treat as css
2015-07-18 12:59:20 -07:00
Ilya Kreymer
06fcc89de6 readers: support 'content-encoding: deflate' using different zlib decompression options
support default and alt settings for attempting to decompress deflate stream
tests: add tests with httpbin.org/deflate Fixes #115
2015-06-24 13:11:33 -07:00
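The "default and alt settings" in 06fcc89de6 plausibly refer to the two forms of deflate seen in practice: zlib-wrapped and raw. A minimal sketch of that fallback using only the standard `zlib` module follows; the function name `decompress_deflate` is an assumption and the commit's own reader code may be structured differently.

```python
import zlib

def decompress_deflate(data: bytes) -> bytes:
    """Decompress 'Content-Encoding: deflate' data, trying two stream formats."""
    try:
        # Default setting: expect a zlib header and checksum.
        return zlib.decompress(data)
    except zlib.error:
        # Alternate setting: negative wbits means a raw deflate stream.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```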
Ilya Kreymer
bd21480db9 framed replay: add support for 'inverting' frame and replay modifiers,
setting default mod to be top-frame and inner frame to be 'mp_' #92
can enable this mode by setting framed_replay: inverse instead of true
modifiers passed to client side script via wbinfo as well
2015-04-01 10:13:56 -07:00
Ilya Kreymer
c378cb5188 rewrite: check for closed before any use of readline() (2.6 may throw if closed),
only use readline() if line alignment needed (non-html), related to #86 work
2015-04-01 07:54:17 -07:00
Ilya Kreymer
199f552f73 rewrite: if no charset specified, attempt to read first 1024 bytes and set charset in header,
to avoid charset warning if head insert exceeds 1024 bytes (#86)
also encode head insert with detected charset, if possible
chunkeddatareader: add read() function to ensure read will read up to specified
length across chunks
2015-03-31 22:38:20 -07:00
Ilya Kreymer
ffb702ce03 rewrite: content detection for specific case: if content type is html and mod type is css
or js, peek stream to determine actual type. Addresses #31 in part.
Fix typo in wb_frame.js
2014-12-26 13:08:35 -08:00
Ilya Kreymer
181c18a1b8 pep8 pass: fix spacing, line length, issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
e8d3965269 pep8 style fixes, remove unused methods 2014-10-21 19:06:16 -07:00
Ilya Kreymer
4a1cc46fa3 framed replay: invert framed replay paradigm, replay always uses
canonical, no-modifier archival url (instead of mp_).
When using frames, the page redirects to a 'tf_' page, which then uses
replaceHistory() to change url back to canonical form.
memento: support for framed replay, include memento headers in top frame
bump version to 0.6.2
2014-10-18 11:21:07 -07:00
Ilya Kreymer
aecc847ec1 rewrite: separate stream_to_gen and text_rewriting_stream_to_gen
The regular stream_to_gen is much simpler and specifically for
binary/unrewritten content. text_rewriting_stream_to_gen() performs
rewriting. Use fixed buffer of 16384 for read size, allows for better
streaming when using live rewrite
2014-10-16 20:13:53 -07:00
Ilya Kreymer
f1b3f8c76f cookie rewriter work: ability to set a custom 'root scope' rewriter,
which sets the path of all cookies to pywb root.
Option to enable per url-prefix in rules, still more testing, other
options needed
2014-09-30 12:42:11 -07:00
Ilya Kreymer
da7e6f31ac tests: pep8 and coverage pass, getting ready for release 2014-09-06 15:19:28 -07:00
Ilya Kreymer
da6c61376c fix errors from merge 2014-08-05 11:14:22 -07:00
Ilya Kreymer
95c3f080c3 Merge branch '0.5.4-fixes' into develop 2014-08-05 10:46:18 -07:00
Ilya Kreymer
b68ef06067 banner: add back inner frame update of banner on load, if html
rewrite: banner only mode encodes to utf-8, adjusts length
2014-08-05 10:12:54 -07:00
Ilya Kreymer
4f9310fe4d rewrite: add support for js rewriting ';http:\\/' urls
add 'parse_comments' rule options for parsing comment contents via regex
banner: simplify banner insertion check, only insert for top frame, and check
for canon_url matching current href at top before redirecting to top
replace em_ -> mp_ as default embedded mod
2014-08-05 01:47:52 -07:00
Ilya Kreymer
9e4459ae50 rewrite: remove extra wb_url param from rewrite_content(), the wb_url
will come from the urlrewriter, to get the 'mod'
2014-08-04 22:51:42 -07:00
Ilya Kreymer
c3004007d7 rewrite: add test for banner-only mode, rewriting w/o a head using local
'sample_no_head' file.
query.html: use client side rewriting for calendar dates
rewrite: remove unused decode stuff
2014-08-04 22:51:42 -07:00
Ilya Kreymer
8d54153326 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 22:51:42 -07:00
Ilya Kreymer
160182ec48 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-08-04 22:51:42 -07:00
Ilya Kreymer
a2d86fa495 Merge branch 'develop' into https-proxy 2014-08-04 22:01:16 -07:00
Ilya Kreymer
2792a92ff6 rewrite: remove extra wb_url param from rewrite_content(), the wb_url
will come from the urlrewriter, to get the 'mod'
2014-08-04 21:11:46 -07:00
Ilya Kreymer
71e8ada57d rewrite: add test for banner-only mode, rewriting w/o a head using local
'sample_no_head' file.
query.html: use client side rewriting for calendar dates
rewrite: remove unused decode stuff
2014-08-04 20:45:02 -07:00
Ilya Kreymer
492aaa4a01 Merge branch 'develop' into https-proxy 2014-08-04 13:00:25 -07:00
Ilya Kreymer
95028ab692 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 01:18:46 -07:00
Ilya Kreymer
407da7528b proxy/rewrite: don't rewrite headers banner_only 2014-07-31 17:02:26 -07:00
Ilya Kreymer
b92eda77f6 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-07-29 12:20:22 -07:00
Ilya Kreymer
fa813bdd19 pep8 cleanup pass 2014-07-20 18:26:16 -07:00
Ilya Kreymer
70b7e29b36 pass raw bytes to htmlparser, assuming ascii-compatibility
(todo: add tests for non-ascii compatible encodings)
improved rendering of certain pages, needs more testing

lxml: remove lxml and complexity associated with having the parser,
as it's too unpredictable for older html, does its own decoding.
2014-06-27 19:03:06 -07:00