mirror of https://github.com/webrecorder/pywb.git synced 2025-03-22 22:32:19 +01:00

321 Commits

Author SHA1 Message Date
Ilya Kreymer
d7eb40af20 rewrite: properly rewrite scheme relative JS-escaped urls:
'\/\/example.com', '\\/\\/example.com/', treat same as '//example.com'
adding http: prefix
2014-11-23 18:56:49 -08:00
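A minimal Python sketch of the normalization this commit describes. It assumes a regex that accepts two slashes, each preceded by zero, one or two backslashes. `SCHEME_REL_RX` and `add_http_prefix` are hypothetical names, not pywb's actual code. The sketch leaves the slash escaping intact, on the assumption that the result has to stay valid inside the surrounding JS string.

```python
import re

# Hypothetical pattern: two slashes, each optionally preceded by one or two
# backslashes, so '//', '\/\/' and '\\/\\/' are all treated alike.
SCHEME_REL_RX = re.compile(r'^(?:\\{0,2}/){2}')


def add_http_prefix(url: str) -> str:
    """Prefix a scheme-relative (possibly JS-escaped) URL with 'http:'.

    Other URLs, including root-relative '/path' URLs, are returned unchanged.
    """
    if SCHEME_REL_RX.match(url):
        return 'http:' + url
    return url


print(add_http_prefix('//example.com'))      # http://example.com
print(add_http_prefix(r'\/\/example.com'))   # http:\/\/example.com
print(add_http_prefix('/path/page.html'))    # /path/page.html
```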
Ilya Kreymer
b8b8c30573 cookie_rewriter: add tests for exact cookie rewriter 2014-11-13 09:43:50 -08:00
Ilya Kreymer
20070e95b6 cookie_rewriter: add 'exact' cookie rewriter which never changes the
path/domain
2014-11-13 09:24:34 -08:00
Ilya Kreymer
388f31e08f rewrite: don't rewrite rel=canonical links, need to make rewriting more
configurable (#50)
2014-11-11 15:34:14 -08:00
Ilya Kreymer
88f553dce7 video work: live rewrite pings proxy with full rewrite, proxies direct
range request
reorg rangecache to support is_range() check, yt-specific logic
(experimental)
wombat: add date override (experimental)
bump tentative version to 0.7.0!
yt replays work with native player! (though still issues remain)
2014-11-04 22:11:25 -08:00
Ilya Kreymer
fea48fd27a Merge branch 'develop' into video 2014-11-04 12:19:58 -08:00
Ilya Kreymer
e4bcef1c8b rewrite: default HTMLParser entityref and charref are treated as plain
data for HTMLRewriter, since they are never rewritten, and to avoid
semicolon ambiguity, since there is no way to determine whether there is a ';'
at the end. Addresses #43
2014-11-04 12:14:00 -08:00
Ilya Kreymer
7aac3aa2dd rewrite: add support for srcset rewriting for img tag 2014-11-02 16:10:38 -08:00
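A simplified sketch of srcset rewriting. A srcset value is a comma-separated list of "URL [descriptor]" candidates, so each URL has to be rewritten individually while its descriptor (`1x`, `480w`, ...) is preserved. The function name and the `rewrite_url` callback are hypothetical. The comma split is an assumption that does not cover URLs that themselves contain commas.

```python
from typing import Callable


def rewrite_srcset(value: str, rewrite_url: Callable[[str], str]) -> str:
    """Rewrite each candidate URL in an <img srcset> value.

    Simplified: candidates are split on ',', so URLs that themselves contain
    commas are not handled as the HTML spec's srcset parser would.
    """
    candidates = []
    for candidate in value.split(','):
        parts = candidate.strip().split(None, 1)  # [url] or [url, descriptor]
        if not parts:
            continue  # tolerate empty candidates, e.g. a trailing comma
        parts[0] = rewrite_url(parts[0])
        candidates.append(' '.join(parts))
    return ', '.join(candidates)


def add_prefix(url: str) -> str:
    # Hypothetical replay prefix, for illustration only.
    return '/web/20141102000000/' + url


print(rewrite_srcset('small.jpg 1x, large.jpg 2x', add_prefix))
```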
Ilya Kreymer
5b9dcba15f video: add video rewriting using vidrw client side and youtube-dl on the server
add vi_ modifier:
-on record, gets video_info from youtube-dl, sends to proxy,
if any, via PUTMETA to create metadata record
-on playback, fetches special metadata record with video info and
returns to client as json
-vidrw script: fetches video info, if any, and attempts to replace
iframe and embed tags (so far) which are videos
wombat: export extract_url function, fix spaces and use object instance
semantics
2014-11-01 15:41:00 -07:00
Ilya Kreymer
a3b931b45e regex rewrite: fix js regex (dashes), add additional test case 2014-11-01 15:39:51 -07:00
Ilya Kreymer
f14f37d5b1 tests: use httpbin for redirect tests 2014-10-29 09:47:32 -07:00
Ilya Kreymer
c9273ee5ed rewrite: add 'deprefix' support to remove wburl prefix from any query
params
2014-10-26 12:12:37 -07:00
Ilya Kreymer
e8d3965269 pep8 style fixes, remove unused methods 2014-10-21 19:06:16 -07:00
Ilya Kreymer
d99f7f996c urlrewriter refactor: replace get_abs_url and get_timestamp_url with
get_new_url(), which just calls wburl.to_str and applies the rewriter prefix
allows creating a new wburl with any component(s) changed
2014-10-19 00:24:00 -07:00
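A sketch of the single-entry-point design this refactor describes. Any wburl component is overridden, the URL is serialized, and the rewriter prefix is prepended. `ArchivalUrl` is a hypothetical, minimal stand-in for a wburl, and its `{timestamp}{mod}/{url}` serialization is a simplifying assumption. Using a frozen dataclass with `dataclasses.replace` means the overrides never mutate the rewriter's own URL.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ArchivalUrl:
    """Hypothetical, minimal stand-in for a wburl: timestamp, modifier, url."""
    url: str
    timestamp: str = ''
    mod: str = ''

    def to_str(self, **overrides) -> str:
        # Build a copy with any field(s) replaced, then serialize the copy.
        wb = replace(self, **overrides)
        if wb.timestamp or wb.mod:
            return f'{wb.timestamp}{wb.mod}/{wb.url}'
        return wb.url


@dataclass
class UrlRewriter:
    wburl: ArchivalUrl
    prefix: str

    def get_new_url(self, **overrides) -> str:
        # One method instead of separate get_abs_url / get_timestamp_url.
        return self.prefix + self.wburl.to_str(**overrides)


rw = UrlRewriter(ArchivalUrl('http://example.com/', '20141019002400', 'mp_'), '/web/')
print(rw.get_new_url())                          # /web/20141019002400mp_/http://example.com/
print(rw.get_new_url(timestamp='2013', mod=''))  # /web/2013/http://example.com/
```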
Ilya Kreymer
4a1cc46fa3 framed replay: invert framed replay paradigm, replay always uses
canonical, no-modifier archival url (instead of mp_).
When using frames, the page redirects to a 'tf_' page, which then uses
replaceHistory() to change url back to canonical form.
memento: support for framed replay, include memento headers in top frame
bump version to 0.6.2
2014-10-18 11:21:07 -07:00
Ilya Kreymer
aecc847ec1 rewrite: seperate stream_to_gen and text_rewriting_stream_to_gen
The regular stream_to_gen is much simpler and specifically for
binary/unrewritten content. text_rewriting_stream_to_gen() performs
rewriting. Use fixed buffer of 16384 for read size, allows for better
streaming when using live rewrite
2014-10-16 20:13:53 -07:00
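A sketch of the split this commit describes. The two function names and the 16384-byte read size come from the commit message. The bodies, the `rewrite`/`flush` callbacks and the use of an incremental decoder are assumptions, not pywb's code. The decoder keeps a multi-byte character that is split across two reads intact. A real rewriter must also buffer tokens that span chunk boundaries.

```python
import codecs
import io
from typing import BinaryIO, Callable, Iterator

BUFF_SIZE = 16384  # fixed read size named in the commit message


def stream_to_gen(stream: BinaryIO) -> Iterator[bytes]:
    """Binary or unrewritten content: pass chunks through untouched."""
    try:
        while True:
            buff = stream.read(BUFF_SIZE)
            if not buff:
                break
            yield buff
    finally:
        stream.close()


def text_rewriting_stream_to_gen(stream: BinaryIO,
                                 rewrite: Callable[[str], str],
                                 flush: Callable[[], str],
                                 encoding: str = 'utf-8') -> Iterator[bytes]:
    """Text content: decode each chunk, rewrite it, then re-encode it.

    `rewrite` is assumed to handle tokens spanning chunk boundaries itself;
    `flush` returns whatever the rewriter still holds at end of stream.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors='replace')
    try:
        while True:
            buff = stream.read(BUFF_SIZE)
            text = rewrite(decoder.decode(buff, final=not buff))
            if text:
                yield text.encode(encoding)
            if not buff:
                break
        tail = flush()
        if tail:
            yield tail.encode(encoding)
    finally:
        stream.close()


print([len(c) for c in stream_to_gen(io.BytesIO(b'x' * 20000))])  # [16384, 3616]
```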
Ilya Kreymer
50bf7d2634 rewrite: move extract_client_cookie to utils for access at rewrite
root cookie_rewriter: keep max-age
add csrf token copying (experimental)
update tests
2014-10-12 03:07:54 -07:00
Ilya Kreymer
498a864441 rewriting: support setting cookie_scope at collection level
js rewriting: add custom url rewrite option to per-url rewrite rules
2014-10-06 10:14:45 -07:00
Ilya Kreymer
f1b3f8c76f cookie rewriter work: ability to set a custom 'root scope' rewriter,
which sets the path of all cookies to pywb root.
Option to enable per url-prefix in rules, still more testing, other
options needed
2014-09-30 12:42:11 -07:00
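A sketch of a "root scope" cookie rewrite using the standard library's `http.cookies`. Every cookie's Path is forced to the replay root. `root_scope_rewrite` is a hypothetical name. Dropping the Domain attribute is an added assumption: the archived domain would not match the replay host, and the commit only mentions the path.

```python
from http.cookies import SimpleCookie
from typing import List, Tuple


def root_scope_rewrite(set_cookie: str, root: str = '/') -> List[Tuple[str, str]]:
    """Rewrite a Set-Cookie header so every cookie is scoped to `root`.

    Path is forced to the replay root. Domain is cleared (an assumption, see
    above); other attributes such as Max-Age are kept.
    """
    cookie = SimpleCookie()
    cookie.load(set_cookie)
    headers = []
    for morsel in cookie.values():
        morsel['path'] = root
        morsel['domain'] = ''  # empty attributes are omitted from the output
        headers.append(('Set-Cookie', morsel.OutputString()))
    return headers


print(root_scope_rewrite('sid=abc; Path=/account; Domain=.example.com', '/pywb/'))
```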
Ilya Kreymer
7feb0893eb rewrite: add 'application/json' to a separate 'json' regex rewriter type (rewrite links only, no
http), can be customized via rules
wombat: add rewrite_style for rewriting style attrs
query: don't include any filter in latest, custom filter can be used
without any other filters
tests: fix typos in tests
2014-09-30 10:57:25 -07:00
Ilya Kreymer
7ac98fbfe2 cookie rewriter: use relative path for cookie path rewriting, pass
relative path to urlrewriter
rules: add more rules
2014-09-21 13:23:19 -07:00
Ilya Kreymer
da7e6f31ac tests: pep8 and coverage pass, getting ready for release 2014-09-06 15:19:28 -07:00
Ilya Kreymer
da6c61376c fix errors from merge 2014-08-05 11:14:22 -07:00
Ilya Kreymer
95c3f080c3 Merge branch '0.5.4-fixes' into develop 2014-08-05 10:46:18 -07:00
Ilya Kreymer
b68ef06067 banner: add back inner frame update of banner on load, if html
rewrite: banner only mode encodes to utf-8, adjusts length
2014-08-05 10:12:54 -07:00
Ilya Kreymer
4f9310fe4d rewrite: add support for js rewriting ';http:\\/' urls
add 'parse_comments' rule options for parsing comment contents via regex
banner: simplify banner insertion check, only insert for top frame, and check
for canon_url matching current href at top before redirecting to top
replace em_ -> mp_ as default embedded mod
2014-08-05 01:47:52 -07:00
Ilya Kreymer
6e6688beb3 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 22:51:43 -07:00
Ilya Kreymer
9e4459ae50 rewrite: remove extra wb_url param from rewrite_content(), the wb_url
will come from the urlrewriter, to get the 'mod'
2014-08-04 22:51:42 -07:00
Ilya Kreymer
c3004007d7 rewrite: add test for banner-only mode, rewriting w/o a head using local
'sample_no_head' file.
query.html: use client side rewriting for calendar dates
rewrite: remove unused decode stuff
2014-08-04 22:51:42 -07:00
Ilya Kreymer
103a1c6455 client js: use iframe onload event to detect when iframe changes, allows
setting banner even for non-html captures, instead of frame notifying parent
will fix issue mentioned in #41
move script from frame_insert.html -> wb_frame.js
2014-08-04 22:51:42 -07:00
Ilya Kreymer
8d54153326 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 22:51:42 -07:00
Ilya Kreymer
160182ec48 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-08-04 22:51:42 -07:00
Ilya Kreymer
a2d86fa495 Merge branch 'develop' into https-proxy 2014-08-04 22:01:16 -07:00
Ilya Kreymer
e1e8f679b2 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 21:59:46 -07:00
Ilya Kreymer
2792a92ff6 rewrite: remove extra wb_url param from rewrite_content(), the wb_url
will come from the urlrewriter, to get the 'mod'
2014-08-04 21:11:46 -07:00
Ilya Kreymer
71e8ada57d rewrite: add test for banner-only mode, rewriting w/o a head using local
'sample_no_head' file.
query.html: use client side rewriting for calendar dates
rewrite: remove unused decode stuff
2014-08-04 20:45:02 -07:00
Ilya Kreymer
924f71a4cc Merge branch 'develop' into https-proxy 2014-08-04 18:44:01 -07:00
Ilya Kreymer
25fe5d685c client js: use iframe onload event to detect when iframe changes, allows
setting banner even for non-html captures, instead of frame notifying parent
will fix issue mentioned in #41
move script from frame_insert.html -> wb_frame.js
2014-08-04 17:54:33 -07:00
Ilya Kreymer
492aaa4a01 Merge branch 'develop' into https-proxy 2014-08-04 13:00:25 -07:00
Ilya Kreymer
95028ab692 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 01:18:46 -07:00
Ilya Kreymer
37fd75f744 update version to 0.6.0, update CHANGELIST
add quotes around "coll" in header
2014-07-31 21:17:07 -07:00
Ilya Kreymer
f5c27d7b06 rewrite: fix header rewrite test
proxy_pac: use http host header if available for proxy host
2014-07-31 17:33:43 -07:00
Ilya Kreymer
407da7528b proxy/rewrite: don't rewrite headers banner_only 2014-07-31 17:02:26 -07:00
Ilya Kreymer
522ea87637 proxy: timestamp selection support!
certauth: wildcard support, use *.host wildcard for proxy certs whenever possible
ui: add coll info/switch and calendar links to banner
2014-07-31 11:12:50 -07:00
Ilya Kreymer
b92eda77f6 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-07-29 12:20:22 -07:00
Ilya Kreymer
fa813bdd19 pep8 cleanup pass 2014-07-20 18:26:16 -07:00
Ilya Kreymer
6da27789eb live handler: allow live rewrite handler to be specified as one of the collections in pywb
by setting index_paths to '$liveweb'. When used, creates a RewriteHandler instead of WBHandler
Can also specify 'proxyhostport' to set the live rewrite to go through a proxy

fallback: allow fallback to a different handler (usually live rewrite) by specifying
'redir_fallback' with name of handler. Instead of 404, a not found response will
internally call the fallback handler to get a response
2014-07-20 16:42:00 -07:00
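A sketch of the fallback behaviour described above. A replay handler that cannot find a capture hands the request to the handler named by 'redir_fallback' instead of returning a 404. `NotFound`, `with_fallback`, `build_handler` and the string-returning handler signature are all hypothetical simplifications. Only the 'redir_fallback' key comes from the commit message.

```python
from typing import Callable, Dict, Optional


class NotFound(Exception):
    """Hypothetical 'capture not found' error raised by a replay handler."""


Handler = Callable[[str], str]


def with_fallback(primary: Handler, fallback: Handler) -> Handler:
    """Wrap `primary` so that a not-found result is served by `fallback`."""
    def handle(url: str) -> str:
        try:
            return primary(url)
        except NotFound:
            # Instead of a 404, internally call the fallback (e.g. live rewrite).
            return fallback(url)
    return handle


def build_handler(coll_config: Dict[str, str], handlers: Dict[str, Handler],
                  replay: Handler) -> Handler:
    """Apply 'redir_fallback' from a collection config, if present."""
    name: Optional[str] = coll_config.get('redir_fallback')
    return with_fallback(replay, handlers[name]) if name else replay


def replay(url: str) -> str:
    if url == 'http://example.com/':
        return 'archived copy'
    raise NotFound(url)


def live(url: str) -> str:
    return 'live copy of ' + url


handler = build_handler({'redir_fallback': 'live'}, {'live': live}, replay)
print(handler('http://example.com/'))  # archived copy
print(handler('http://other.com/'))    # live copy of http://other.com/
```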
Ilya Kreymer
96fcaab521 live-rewrite-server: add ability to specify http/https proxy for live fetching
(for example, for use with a recording proxy)
2014-07-19 14:43:28 -07:00
Ilya Kreymer
f80c27ec00 cookie: add test for 'document.cookie' rewriting 2014-07-15 12:57:02 -07:00
Ilya Kreymer
fa52e0126d cookies: support client side rewriting of document.cookie -> WB_wombat_cookie to rewrite cookie path, if present 2014-07-15 12:52:42 -07:00
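A sketch of only the server-side half of the cookie redirection above. Accesses to `document.cookie` in archived JS are renamed so that a client-side `WB_wombat_cookie` property can adjust the cookie path. It assumes the rename is done by a regex pass in the JS rewriter; `DOC_COOKIE_RX` and `rewrite_document_cookie` are hypothetical names. Being a regex rather than a JS parser, it also matches inside strings and comments.

```python
import re

# Hypothetical pattern: 'cookie' directly after 'document.', but not the start
# of a longer identifier such as 'cookies'.
DOC_COOKIE_RX = re.compile(r'(?<=document\.)cookie\b')


def rewrite_document_cookie(js: str) -> str:
    """Redirect document.cookie reads and writes to document.WB_wombat_cookie."""
    return DOC_COOKIE_RX.sub('WB_wombat_cookie', js)


print(rewrite_document_cookie('document.cookie = "a=b; path=/";'))
# document.WB_wombat_cookie = "a=b; path=/";
```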