Ilya Kreymer
60f33412ff
collections manager: add new collections manager, first pass #74
...
add cli 'wb-manager' tool
very preliminary, needs testing still
2015-02-25 13:19:20 -08:00
Ilya Kreymer
69613a0e25
tests: disable 'invalid config' test as it's no longer applicable, fix default banner to just 'banner.html'
2015-02-25 13:18:32 -08:00
Ilya Kreymer
5c67782a2c
config system: some fixes for auto-init, add trailing '/' for dir paths, #55
2015-02-25 13:15:48 -08:00
Ilya Kreymer
e39d6e207c
config & collections: auto static path and templates working! #55
2015-02-24 14:32:51 -08:00
Ilya Kreymer
a932235f85
Merge branch 'develop' into config-work
2015-02-24 10:40:58 -08:00
Ilya Kreymer
cb857df125
memento: fix MementoTimemapView to have consistent signature with other query views
2015-02-24 10:35:49 -08:00
Ilya Kreymer
39824711f0
memento tweak: ensure rel=memento link for timegate uses exact in Location (cdx original) as opposed to url from request
2015-02-23 23:21:39 -08:00
Ilya Kreymer
435fa390ed
config system: initial work on automated directory-convention based config!
...
config.yaml file now optional, add default_config.yaml which provides default settings, #55
2015-02-23 21:59:41 -08:00
Ilya Kreymer
5d80d2d891
replay: change strip_scheme() to strip_scheme_www() to also strip away www. prefix for self-redirect checking, #73
2015-02-22 22:51:35 -08:00
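
Note on the commit above: a minimal sketch of how stripping the scheme and a leading "www." could support a self-redirect check. The function body and the helper is_self_redirect() are assumptions for illustration, not the project's actual implementation.

```python
def strip_scheme_www(url):
    """Illustrative sketch only: drop the scheme and a leading 'www.' from a url."""
    # Drop everything up to and including '://', if present
    pos = url.find('://')
    if pos >= 0:
        url = url[pos + 3:]
    # Drop a leading 'www.' so www and non-www hosts compare equal
    if url.startswith('www.'):
        url = url[4:]
    return url


def is_self_redirect(request_url, location_url):
    """Hypothetical helper: treat urls differing only by scheme or 'www.' as the same."""
    return strip_scheme_www(request_url) == strip_scheme_www(location_url)
```

With this comparison, a redirect from http://example.com/a to https://www.example.com/a would count as a self-redirect.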
Ilya Kreymer
9f838241c7
wb.js bug fix: use only window.__orig_parent and not window.parent, as window.parent is overridden. use window instead of window.self
2015-02-21 12:34:35 -08:00
Ilya Kreymer
c0ff596c68
tests: add tests for recursive cdx indexing, #64
...
cross-platform: store rel filename path as '/', but convert to os.path.sep
when resolving to full path as prefix
2015-02-20 13:56:35 -08:00
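
Note on the commit above: a rough sketch of the cross-platform convention it describes, storing the relative filename with '/' and converting to os.path.sep only when resolving the full path. Both function names are hypothetical and do not come from the project.

```python
import os


def to_index_filename(full_path, root_dir):
    """Hypothetical: relative path as stored in the index, always using '/'."""
    rel = os.path.relpath(full_path, root_dir)
    return rel.replace(os.path.sep, '/')


def resolve_filename(index_filename, prefix_dir):
    """Hypothetical: convert the stored '/' path back to the local separator."""
    local_rel = index_filename.replace('/', os.path.sep)
    return os.path.join(prefix_dir, local_rel)
```

Keeping '/' in the stored form means an index written on one platform can still be resolved on another.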
Ilya Kreymer
8d52be4c44
live proxy: enable ssl validation for live proxy, was initially disabled for testing, should be on by default!
2015-02-20 13:22:21 -08:00
Ilya Kreymer
1646c90cd0
cdxindexer: add -r option to support recursive indexing when input is a directory.
...
filename field in cdx contains relative path including subdir, eg. subdir/file.warc.gz
related to #64
2015-02-20 02:40:32 -08:00
Ilya Kreymer
26df8d7784
remove debug logging and spaces
2015-02-19 01:17:31 -08:00
Ilya Kreymer
80dcb6ff27
rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
...
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
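
Note on the commit above: one possible shape for a timestamp_now() helper. The 14-digit YYYYMMDDhhmmss format in UTC is an assumption based on the usual web archive timestamp convention; the actual function in timeutils may differ.

```python
from datetime import datetime, timezone


def timestamp_now():
    """Sketch only: current UTC time as an assumed 14-digit YYYYMMDDhhmmss string."""
    return datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')
```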
Ilya Kreymer
9623f95439
memento: add rel="memento" header to timegate as well, improve memento test, clearly differentiate between
...
timegate redirect and intermediate resource redirect, related to #70
2015-02-16 09:59:03 -08:00
Ilya Kreymer
c4d5dd4690
rewrite: optimize / sanity, only %-encode urls that are actually idna-encoded,
...
otherwise return as is, #66
2015-02-15 10:34:56 -08:00
Ilya Kreymer
afe49a91f4
rewrite: more fixes for IDN #66 - add _do_percent_encode field to wburl itself
...
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e
rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
...
remove support for returning IRI, as that requires detecting charset, instead just use %-encoded form
and let browser decode. Should address #66
Add rewrite option 'punycode_links_only' (default to false) to skip the %-encoded conversion of host, and just return punycode.
wombat: use getAttribute('href') on <a> tag to get original url, not punycode version
replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
79cfdd6a08
framework/urlrewriter: allow overriding UrlRewriter with optional urlrewriter_class param,
...
easier to override create_rebased_rewriter() with custom rewriter as well
2015-02-12 10:34:04 -08:00
Ilya Kreymer
dcf3688dc3
wombat: also override frameElement when changing window.parent for top-level replay frame
2015-02-11 19:26:45 -08:00
Ilya Kreymer
0b72bfe911
add 'none' js regex rewriter, which does not rewrite urls or location regexes
...
add test for none rewriter in test rule
2015-02-11 15:01:29 -08:00
Ilya Kreymer
f068186e37
wombat: replace window.self -> window for clarity
2015-02-11 15:01:04 -08:00
Ilya Kreymer
78bd89b4cb
rewrite: simplify deprefix, url already unquoted now so remove extra unquote
2015-02-11 14:28:45 -08:00
Ilya Kreymer
4e7f95081f
url_rewriter: catch exception when encoding to utf-8, may not be properly encoded, in which
...
case treat as bytes
2015-02-10 15:05:15 -08:00
Ilya Kreymer
90aba00ca0
not_found: catch NotFoundException from any part of handle_request, not just indexing.. allows for more flexible
...
usage with cdx iterators that are lazily evaluated on replay
2015-02-10 15:03:21 -08:00
Ilya Kreymer
148651680a
wombat fix: use __orig_parent when referencing top-frame, since window.parent is being overridden
2015-02-10 15:02:08 -08:00
Ilya Kreymer
78ae86b6b6
Merge branch 'master' for 0.7.8 into develop
2015-02-05 08:45:55 -08:00
Ilya Kreymer
cc144fdead
rewrite: add basic test for X-Forwarded-Proto #57
2015-02-04 21:44:18 -08:00
Ilya Kreymer
78812c8085
rewrite: more conservative change, only rewrite the X-Forwarded-Proto
...
header for now, #57
2015-02-04 15:17:23 -08:00
Ilya Kreymer
cdb3dcc3d2
rewrite_live: don't forward via or https_x headers, only standard (for
...
now) possible fix for #57
2015-02-04 14:19:37 -08:00
Ilya Kreymer
40fba3c27b
cdx-indexer: minor cleanup, add custom writer override to
...
write_multi_cdx_index
2015-02-04 11:17:26 -08:00
Ilya Kreymer
c47d3ca925
wombat: add mutation observers, addressing #71 and maybe #67
...
rules: fix regex for yt, add rx for wikimedia
2015-02-03 11:19:41 -08:00
Ilya Kreymer
734ee4471b
frame ui: pass timestamp to frame banner, fix typo in html
...
banner: allow overriding of banner id by returning custom id
2015-02-02 09:41:49 -08:00
Ilya Kreymer
29c6a36dac
cdx api query: pass query timestamp mod to index query via 'query_closest'
...
field, to avoid confusion with 'closest'
2015-01-31 17:45:46 -08:00
Ilya Kreymer
55426e7619
memento: fix headers to be more consistent for framed replay. when using
...
frames, outer frames 'mirrors' mementos of the inner frame to be
discoverable by client side memento tools, tracked via #70
2015-01-29 22:27:15 -08:00
Ilya Kreymer
757345d317
replay api: make ReplayView overridable in WBHandler subclass,
...
allow custom content loader callable
2015-01-29 20:10:41 -08:00
Ilya Kreymer
7e017fd85e
rewrite fixes: don't rewrite window.parent as it is overridable directly
...
html rewriter: ensure style is rewritten for all elements, add test!
wombat: cleanup and additional checks for assign(), setAttribute()
2015-01-29 20:08:00 -08:00
Ilya Kreymer
043ad5c860
wombat: improve createElementNS override to set prototype, just assign
...
window.parent directly
2015-01-29 10:13:32 -08:00
Ilya Kreymer
bf3d256a51
rewrite: add css-in-js rewrite rule for wikimedia, tracking via #67 for
...
perhaps a more general solution
2015-01-28 09:20:42 -08:00
Ilya Kreymer
ccedb2d60e
regex_rewrite: add 'parent' rewrite in addition to 'top' for frames, add
...
WB_wombat_parent to wombat, add test for WB_wombat_parent
2015-01-27 19:57:56 -08:00
Ilya Kreymer
976decb3f1
wombat: ensure document.write override handles elements that go into
...
head as well as body
2015-01-27 18:02:14 -08:00
Ilya Kreymer
695245d9e8
wburl idn: more complete support for idn urls ( #66 )
...
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
Ilya Kreymer
edff3f17fb
wburl: convert %-encoded hostnames or unicode urls to punycode for
...
better IDN support (#66 )
2015-01-26 11:07:58 -08:00
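
Note on the commit above: a small sketch of converting a %-encoded or unicode hostname to punycode using only the Python standard library. The function name host_to_punycode() is hypothetical, and the built-in 'idna' codec (IDNA 2003) is an assumption about a reasonable approach, not necessarily what WbUrl does.

```python
from urllib.parse import unquote


def host_to_punycode(host):
    """Hypothetical helper: return the ascii (punycode) form of a hostname."""
    # Decode any %-encoded utf-8 bytes first, e.g. 'b%C3%BCcher' -> 'bücher'
    host = unquote(host)
    # The stdlib 'idna' codec encodes each non-ascii label as 'xn--...'
    return host.encode('idna').decode('ascii')
```

Hostnames that are already ascii pass through unchanged, so the conversion can be applied to any host.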
Ilya Kreymer
38e3bbbaef
templates: add new 'not_found.html' template, which will be called for any missing replay request
...
instead of default error.html
'not_found_html' settable in the config per collection, as per #65
for not found index query, still use query.html but add condition to check for 0 results
add more query and replay not found
remove unused conditional (for search_view -- always exists)
2015-01-24 12:32:50 -08:00
Ilya Kreymer
80fd47ba3e
add rules for vine ( #62 )
2015-01-22 16:45:09 -05:00
Ilya Kreymer
c9b2e3e69e
wombat 2.2 improvements:
...
* for postMessage, add receive message overrides which uses original origin
to fix message passing tests that check for origin
* for createElementNS, ensure that the namespace url is not rewritten
* add equals_any() method, add "poster" attr to attr rewriting list
(solves several issues for vine replay, #62 )
2015-01-22 16:43:52 -05:00
Ilya Kreymer
48b7751f80
bump version to 0.7.6
...
jinja2: allow adding multiple packages to search path
2015-01-19 21:54:11 -05:00
Ilya Kreymer
43805c67ef
view: fix format_ts, use existing utc timestamp_to_sec conversion for %s
2015-01-12 00:28:06 -08:00
Ilya Kreymer
ac525b0937
tests: add tests for extract_post_query()
...
add test for HttpsUrlRewriter, remove unnecessary check in
bufferedreader
2015-01-11 23:54:29 -08:00