mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

853 Commits

Author SHA1 Message Date
Ilya Kreymer
39824711f0 memento tweak: ensure rel=memento link for timegate uses exact in Location (cdx original) as opposed to url from request 2015-02-23 23:21:39 -08:00
Ilya Kreymer
435fa390ed config system: initial work on automated directory-convention based config!
config.yaml file now optional, add default_config.yaml which provides default settings #55
2015-02-23 21:59:41 -08:00
Ilya Kreymer
5d80d2d891 replay: change strip_scheme() to strip_scheme_www() to also strip away www. prefix for self-redirect checking, #73 2015-02-22 22:51:35 -08:00
Ilya Kreymer
83f8d7d29b bump version to 0.8.2 2015-02-22 22:51:23 -08:00
Ilya Kreymer
de40e2920a update README for 0.8.1 2015-02-21 14:29:40 -08:00
Ilya Kreymer
7989c06ea4 Add webarchiveplayer link to README 2015-02-21 14:28:04 -08:00
Ilya Kreymer
80da0e91da update CHANGELIST for 0.8.1 2015-02-21 14:13:35 -08:00
Ilya Kreymer
9f838241c7 wb.js bug fix: use only window.__orig_parent and not window.parent, as window.parent is overridden. Use window instead of window.self 2015-02-21 12:34:35 -08:00
Ilya Kreymer
c0ff596c68 tests: add tests for recursive cdx indexing, #64
cross-platform: store rel filename path as '/', but convert to os.path.sep
when resolving to full path as prefix
2015-02-20 13:56:35 -08:00
Ilya Kreymer
8d52be4c44 live proxy: enable ssl validation for live proxy, was initially disabled for testing, should be on by default! 2015-02-20 13:22:21 -08:00
Ilya Kreymer
1646c90cd0 cdxindexer: add -r option to support recursive indexing when input is a directory.
filename field in cdx contains relative path including subdir, eg. subdir/file.warc.gz
related to #64
2015-02-20 02:40:32 -08:00
Ilya Kreymer
adeb8bfb27 bump version to 0.8.1, (fix blank spacing in changelist) 2015-02-20 02:02:34 -08:00
Ilya Kreymer
cb6aebf06d Merge CHANGES.rst from 'develop' 0.8.0 2015-02-19 01:29:22 -08:00
Ilya Kreymer
bf203a2dc6 Merge branch 'develop' of https://github.com/ikreymer/pywb into develop 2015-02-19 01:29:03 -08:00
Ilya Kreymer
121e1df3c9 README: update branch config to master 2015-02-19 01:26:55 -08:00
Ilya Kreymer
824587bd90 A few more CHANGES.rst tweaks 2015-02-19 01:24:52 -08:00
Ilya Kreymer
26df8d7784 remove debug logging and spaces 2015-02-19 01:17:31 -08:00
Ilya Kreymer
0ddc490b8d Update CHANGELIST for 0.8.0! 2015-02-19 01:16:25 -08:00
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
9623f95439 memento: add rel="memento" header to timegate as well, improve memento test, clearly differentiate between
timegate redirect and intermediate resource redirect, related to #70
2015-02-16 09:59:03 -08:00
Ilya Kreymer
c4d5dd4690 rewrite: optimize / sanity, only %-encode urls that are actually idna-encoded,
otherwise return as is, #66
2015-02-15 10:34:56 -08:00
Ilya Kreymer
afe49a91f4 rewrite: more fixes for IDN #66 - add _do_percent_encode field to wburl itself
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
remove support for returning IRI, as that requires detecting charset, instead just use %-encoded form
and let browser decode. Should address #66

Add rewrite option 'punycode_links_only' (default to false) to skip the %-encoded conversion of host, and just return punycode.

wombat: use getAttribute('href') on <a> tag to get original url, not punycode version

replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
79cfdd6a08 framework/urlrewriter: allow overriding UrlRewriter with optional urlrewriter_class param,
easier to override create_rebased_rewriter() with custom rewriter as well
2015-02-12 10:34:04 -08:00
Ilya Kreymer
dcf3688dc3 wombat: also override frameElement when changing window.parent for top-level replay frame 2015-02-11 19:26:45 -08:00
Ilya Kreymer
0b72bfe911 add 'none' js regex rewriter, which does not rewrite urls or location regexes
add test for none rewriter in test rule
2015-02-11 15:01:29 -08:00
Ilya Kreymer
f068186e37 wombat: replace window.self -> window for clarity 2015-02-11 15:01:04 -08:00
Ilya Kreymer
78bd89b4cb rewrite: simplify deprefix, url already unquoted now so remove extra unquote 2015-02-11 14:28:45 -08:00
Ilya Kreymer
4e7f95081f url_rewriter: catch exception when encoding to utf-8, may not be properly encoded, in which
case treat as bytes
2015-02-10 15:05:15 -08:00
Ilya Kreymer
90aba00ca0 not_found: catch NotFoundException from any part of handle_request, not just indexing. Allows for more flexible
usage with cdx iterators that are lazily evaluated on replay
2015-02-10 15:03:21 -08:00
Ilya Kreymer
148651680a wombat fix: use __orig_parent when referencing top-frame, since window.parent is being overridden 2015-02-10 15:02:08 -08:00
Ilya Kreymer
78ae86b6b6 Merge branch 'master' for 0.7.8 into develop 2015-02-05 08:45:55 -08:00
Ilya Kreymer
384e68c84b bump version to 0.7.8 for latest fix 2015-02-04 21:46:57 -08:00
Ilya Kreymer
cc144fdead rewrite: add basic test for X-Forwarded-Proto #57 2015-02-04 21:44:18 -08:00
Ilya Kreymer
78812c8085 rewrite: more conservative change, only rewrite the X-Forwarded-Proto
header for now, #57
2015-02-04 15:17:23 -08:00
Ilya Kreymer
cdb3dcc3d2 rewrite_live: don't forward via or https_x headers, only standard (for
now) possible fix for #57
2015-02-04 14:19:37 -08:00
Ilya Kreymer
40fba3c27b cdx-indexer: minor cleanup, add custom writer override to
write_multi_cdx_index
2015-02-04 11:17:26 -08:00
Ilya Kreymer
ef98716bd8 bump version to 0.7.7 in prep for release 2015-02-03 11:23:12 -08:00
Ilya Kreymer
c47d3ca925 wombat: add mutation observers, addressing #71 and maybe #67
rules: fix regex for yt, add rx for wikimedia
2015-02-03 11:19:41 -08:00
Ilya Kreymer
734ee4471b frame ui: pass timestamp to frame banner, fix typo in html
banner: allow overriding of banner id by returning custom id
2015-02-02 09:41:49 -08:00
Ilya Kreymer
29c6a36dac cdx api query: pass query timestamp mod to index query via 'query_closest'
field, to avoid confusion with 'closest'
2015-01-31 17:45:46 -08:00
Ilya Kreymer
55426e7619 memento: fix headers to be more consistent for framed replay. when using
frames, the outer frame 'mirrors' the mementos of the inner frame so that they are
discoverable by client side memento tools, tracked via #70
2015-01-29 22:27:15 -08:00
Ilya Kreymer
757345d317 replay api: make ReplayView overridable in WBHandler subclass,
allow custom content loader callable
2015-01-29 20:10:41 -08:00
Ilya Kreymer
7e017fd85e rewrite fixes: don't rewrite window.parent as it is overridable directly
html rewriter: ensure style is rewritten for all elements, add test!
wombat: cleanup and additional checks for assign(), setAttribute()
2015-01-29 20:08:00 -08:00
Ilya Kreymer
043ad5c860 wombat: improve createElementNS override to set prototype, just assign
window.parent directly
2015-01-29 10:13:32 -08:00
Ilya Kreymer
bf3d256a51 rewrite: add css-in-js rewrite rule for wikimedia, tracking via #67 for
perhaps a more general solution
2015-01-28 09:20:42 -08:00
Ilya Kreymer
ccedb2d60e regex_rewrite: add 'parent' rewrite in addition to 'top' for frames, add
WB_wombat_parent to wombat, add test for WB_wombat_parent
2015-01-27 19:57:56 -08:00
Ilya Kreymer
976decb3f1 wombat: ensure document.write override handles elements that go into
head as well as body
2015-01-27 18:02:14 -08:00
Ilya Kreymer
59630c08f6 bump version to 0.8.0! 2015-01-26 11:08:08 -08:00
Ilya Kreymer
695245d9e8 wburl idn: more complete support for idn urls (#66)
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
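
The last entry above (695245d9e8) describes keeping an ASCII URI as the internal representation and converting to an IRI for display. A minimal sketch of that idea follows, using only the Python 3 standard library. The function names to_uri() and to_iri() are borrowed from the commit message, but the bodies are an assumption for illustration and are not pywb's actual WbUrl code; userinfo in the URL is ignored for brevity.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_uri(url):
    """Return an ASCII-only form: punycode host, %-encoded UTF-8 path/query.

    Hypothetical helper, not the pywb implementation.
    """
    parts = urlsplit(url)
    host = parts.hostname or ''
    # The built-in 'idna' codec converts each label to its xn-- form.
    netloc = host.encode('idna').decode('ascii')
    if parts.port:
        netloc += ':' + str(parts.port)
    # Keep existing %-escapes intact by treating '%' as safe.
    path = quote(parts.path, safe='/%')
    query = quote(parts.query, safe='=&%')
    return urlunsplit((parts.scheme, netloc, path, query, parts.fragment))


def to_iri(url):
    """Return a human-readable form: Unicode host, decoded path.

    Hypothetical helper, not the pywb implementation.
    """
    parts = urlsplit(url)
    host = parts.hostname or ''
    netloc = host.encode('ascii').decode('idna')
    if parts.port:
        netloc += ':' + str(parts.port)
    # Only the path is decoded here; the query is left as-is.
    path = unquote(parts.path)
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


if __name__ == '__main__':
    print(to_uri('http://bücher.example/ü'))
    print(to_iri('http://xn--bcher-kva.example/%C3%BC'))
```

Storing only the ASCII form means index lookups never depend on guessing a page's charset; the Unicode form is produced only when a URL is shown to the user.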