mirror of https://github.com/webrecorder/pywb.git synced 2025-03-16 00:24:48 +01:00

896 Commits

Author SHA1 Message Date
Ilya Kreymer
9f838241c7 wb.js bug fix: use only window.__orig_parent and not window.parent, as window.parent is overridden. Use window instead of window.self 2015-02-21 12:34:35 -08:00
Ilya Kreymer
c0ff596c68 tests: add tests for recursive cdx indexing,
cross-platform: store rel filename path as '/', but convert to os.path.sep
when resolving to full path as prefix
2015-02-20 13:56:35 -08:00
Ilya Kreymer
8d52be4c44 live proxy: enable ssl validation for live proxy, was initially disabled for testing, should be on by default! 2015-02-20 13:22:21 -08:00
Ilya Kreymer
1646c90cd0 cdxindexer: add -r option to support recursive indexing when input is a directory.
filename field in cdx contains relative path including subdir, e.g. subdir/file.warc.gz
related to 
2015-02-20 02:40:32 -08:00
Ilya Kreymer
adeb8bfb27 bump version to 0.8.1, (fix blank spacing in changelist) 2015-02-20 02:02:34 -08:00
Ilya Kreymer
cb6aebf06d Merge CHANGES.rst from 'develop' 0.8.0 2015-02-19 01:29:22 -08:00
Ilya Kreymer
bf203a2dc6 Merge branch 'develop' of https://github.com/ikreymer/pywb into develop 2015-02-19 01:29:03 -08:00
Ilya Kreymer
121e1df3c9 README: update branch config to master 2015-02-19 01:26:55 -08:00
Ilya Kreymer
824587bd90 A few more CHANGES.rst tweaks 2015-02-19 01:24:52 -08:00
Ilya Kreymer
26df8d7784 remove debug logging and spaces 2015-02-19 01:17:31 -08:00
Ilya Kreymer
0ddc490b8d Update CHANGELIST for 0.8.0! 2015-02-19 01:16:25 -08:00
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via 
2015-02-17 17:51:45 -08:00
Ilya Kreymer
9623f95439 memento: add rel="memento" header to timegate as well, improve memento test, clearly differentiate between
timegate redirect and intermediate resource redirect, related to 
2015-02-16 09:59:03 -08:00
Ilya Kreymer
c4d5dd4690 rewrite: optimize / sanity, only %-encode urls that are actually idna-encoded,
otherwise return as is, 
2015-02-15 10:34:56 -08:00
Ilya Kreymer
afe49a91f4 rewrite: more fixes for IDN - add _do_percent_encode field to wburl itself
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
remove support for returning IRI, as that requires detecting charset, instead just use %-encoded form
and let browser decode. Should address 

Add rewrite option 'punycode_links_only' (default to false) to skip the %-encoded conversion of host, and just return punycode.

wombat: use getAttribute('href') on <a> tag to get original url, not punycode version

replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
79cfdd6a08 framework/urlrewriter: allow overriding UrlRewriter with optional urlrewriter_class param,
easier to override create_rebased_rewriter() with custom rewriter as well
2015-02-12 10:34:04 -08:00
Ilya Kreymer
dcf3688dc3 wombat: also override frameElement when changing window.parent for top-level replay frame 2015-02-11 19:26:45 -08:00
Ilya Kreymer
0b72bfe911 add 'none' js regex rewriter, which does not rewrite urls or location regexs
add test for none rewriter in test rule
2015-02-11 15:01:29 -08:00
Ilya Kreymer
f068186e37 wombat: replace window.self -> window for clarity 2015-02-11 15:01:04 -08:00
Ilya Kreymer
78bd89b4cb rewrite: simplify deprefix, url already unquoted now so remove extra unquote 2015-02-11 14:28:45 -08:00
Ilya Kreymer
4e7f95081f url_rewriter: catch exception when encoding to utf-8, may not be properly encoded, in which
case treat as bytes
2015-02-10 15:05:15 -08:00
Ilya Kreymer
90aba00ca0 not_found: catch NotFoundException from any part of handle_request, not just indexing; allows for more flexible
usage with cdx iterators that are lazily evaluated on replay
2015-02-10 15:03:21 -08:00
Ilya Kreymer
148651680a wombat fix: use __orig_parent when referencing top-frame, since window.parent is being overridden 2015-02-10 15:02:08 -08:00
Ilya Kreymer
78ae86b6b6 Merge branch 'master' for 0.7.8 into develop 2015-02-05 08:45:55 -08:00
Ilya Kreymer
384e68c84b bump version to 0.7.8 for latest fix 2015-02-04 21:46:57 -08:00
Ilya Kreymer
cc144fdead rewrite: add basic test for X-Forwarded-Proto 2015-02-04 21:44:18 -08:00
Ilya Kreymer
78812c8085 rewrite: more conservative change, only rewrite the X-Forwarded-Proto
header for now, 
2015-02-04 15:17:23 -08:00
Ilya Kreymer
cdb3dcc3d2 rewrite_live: don't forward via or https_x headers, only standard (for
now) possible fix for 
2015-02-04 14:19:37 -08:00
Ilya Kreymer
40fba3c27b cdx-indexer: minor cleanup, add custom writer override to
write_multi_cdx_index
2015-02-04 11:17:26 -08:00
Ilya Kreymer
ef98716bd8 bump version to 0.7.7 in prep for release 2015-02-03 11:23:12 -08:00
Ilya Kreymer
c47d3ca925 wombat: add mutation observers, addressing and maybe
rules: fix regex for yt, add rx for wikimedia
2015-02-03 11:19:41 -08:00
Ilya Kreymer
734ee4471b frame ui: pass timestamp to frame banner, fix typo in html
banner: allow overriding of banner id by returning custom id
2015-02-02 09:41:49 -08:00
Ilya Kreymer
29c6a36dac cdx api query: pass query timestamp mod to index query via 'query_closest'
field, to avoid confusion with 'closest'
2015-01-31 17:45:46 -08:00
Ilya Kreymer
55426e7619 memento: fix headers to be more consistent for framed replay. when using
frames, the outer frame 'mirrors' mementos of the inner frame to be
discoverable by client side memento tools, tracked via 
2015-01-29 22:27:15 -08:00
Ilya Kreymer
757345d317 replay api: make ReplayView overridable in WBHandler subclass,
allow custom content loader callable
2015-01-29 20:10:41 -08:00
Ilya Kreymer
7e017fd85e rewrite fixes: don't rewrite window.parent as it is overridable directly
html rewriter: ensure style is rewritten for all elements, add test!
wombat: cleanup and additional checks for assign(), setAttribute()
2015-01-29 20:08:00 -08:00
Ilya Kreymer
043ad5c860 wombat: improve createElementNS override to set prototype, just assign
window.parent directly
2015-01-29 10:13:32 -08:00
Ilya Kreymer
bf3d256a51 rewrite: add css-in-js rewrite rule for wikimedia, tracking via for
perhaps a more general solution
2015-01-28 09:20:42 -08:00
Ilya Kreymer
ccedb2d60e regex_rewrite: add 'parent' rewrite in addition to 'top' for frames, add
WB_wombat_parent to wombat, add test for WB_wombat_parent
2015-01-27 19:57:56 -08:00
Ilya Kreymer
976decb3f1 wombat: ensure document.write override handles elements that go into
head as well as body
2015-01-27 18:02:14 -08:00
Ilya Kreymer
59630c08f6 bump version to 0.8.0! 2015-01-26 11:08:08 -08:00
Ilya Kreymer
695245d9e8 wburl idn: more complete support for idn urls ()
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
Ilya Kreymer
edff3f17fb wburl: convert %-encoded hostnames or unicode urls to punycode for
better IDN support ()
2015-01-26 11:07:58 -08:00
Ilya Kreymer
933343fa01 update README for 0.7.2 master 2015-01-26 11:07:58 -08:00
Ilya Kreymer
8b5a6be956 Merge branch 'develop' for 0.7.6 2015-01-26 10:38:35 -08:00
Ilya Kreymer
8567b3fa76 CHANGELIST tweaks 2015-01-26 10:37:51 -08:00
Ilya Kreymer
5acd1164ab update CHANGELIST for 0.7.6 2015-01-26 10:31:24 -08:00
Ilya Kreymer
38e3bbbaef templates: add new 'not_found.html' template, which will be called for any missing replay request
instead of default error.html
'not_found_html' settable in the config per collection, as per 
for not found index query, still use query.html but add condition to check for 0 results
add more query and replay not found
remove unused conditional (for search_view -- always exists)
2015-01-24 12:32:50 -08:00
Ilya Kreymer
80fd47ba3e add rules for vine () 2015-01-22 16:45:09 -05:00
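
The IDN-related commits above (edff3f17fb, f9452bf48e, c4d5dd4690) describe converting unicode hostnames to punycode, and returning a utf-8 %-encoded host only when the url is actually idna-encoded. The following is a minimal sketch of that idea using only the Python 3 standard library. The function names are hypothetical and are not taken from pywb; the real WbUrl code may differ.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def _replace_host(url, new_host):
    """Return url with its hostname swapped for new_host, keeping any port.

    Illustrative helper (not pywb API); userinfo in the netloc is dropped.
    """
    parts = urlsplit(url)
    netloc = new_host
    if parts.port is not None:
        netloc += ':' + str(parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


def to_punycode_url(url):
    """Convert a unicode hostname to its ascii (punycode) form."""
    host = urlsplit(url).hostname or ''
    # The built-in 'idna' codec leaves plain ascii hostnames unchanged.
    return _replace_host(url, host.encode('idna').decode('ascii'))


def to_percent_encoded_url(url):
    """Return the host as utf-8 %-encoded, but only if it is idna-encoded."""
    host = urlsplit(url).hostname or ''
    if 'xn--' not in host:
        # Not idna-encoded: return the url as is.
        return url
    unicode_host = host.encode('ascii').decode('idna')
    return _replace_host(url, quote(unicode_host, safe='.'))
```

With this sketch, a browser receiving the %-encoded form decodes the hostname itself, so no charset detection is needed on the server side, which matches the reasoning given in f9452bf48e.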