Ilya Kreymer
48eab2662d
cdx indexer: refactor indexer into mixins for different formats for easier customization
2015-02-25 16:45:47 -08:00
Ilya Kreymer
ee1fabf600
config fix: check for existence of root 'collections dir', #55
2015-02-25 13:51:12 -08:00
Ilya Kreymer
11c8cc92f3
add beta to README
2015-02-25 13:33:42 -08:00
Ilya Kreymer
671f45f69f
cdx indexing: wrap record iterator global functions in class DefaultRecordIter to allow for better extensibility
add 'minimal' option to skip digest/mime/status extraction and only include minimal data (url+timestamp)
cdx-indexer: add -6 option to create 6-field index
2015-02-25 13:31:37 -08:00
Ilya Kreymer
1d4c54deaa
frames ui: update frames to use <!DOCTYPE html>, improved css and html5 compatibility
2015-02-25 13:25:05 -08:00
Ilya Kreymer
60f33412ff
collections manager: add new collections manager, first pass #74
add cli 'wb-manager' tool
very preliminary, needs testing still
2015-02-25 13:19:20 -08:00
Ilya Kreymer
69613a0e25
tests: disable 'invalid config' test as it's no longer applicable, fix default banner to just 'banner.html'
2015-02-25 13:18:32 -08:00
Ilya Kreymer
5c67782a2c
config system: some fixes for auto-init, add trailing '/' for dir paths, #55
2015-02-25 13:15:48 -08:00
Ilya Kreymer
7c60bf17f7
bump version to 0.9.0-beta!
2015-02-24 16:54:49 -08:00
Ilya Kreymer
e39d6e207c
config & collections: auto static path and templates working! #55
2015-02-24 14:32:51 -08:00
Ilya Kreymer
a932235f85
Merge branch 'develop' into config-work
2015-02-24 10:40:58 -08:00
Ilya Kreymer
cb857df125
memento: fix MementoTimemapView to have consistent signature with other query views
2015-02-24 10:35:49 -08:00
Ilya Kreymer
39824711f0
memento tweak: ensure rel=memento link for timegate uses exact in Location (cdx original) as opposed to url from request
2015-02-23 23:21:39 -08:00
Ilya Kreymer
435fa390ed
config system: initial work on automated directory-convention based config!
config.yaml file now optional, add default_config.yaml for default settings, #55
2015-02-23 21:59:41 -08:00
Ilya Kreymer
5d80d2d891
replay: change strip_scheme() to strip_scheme_www() to also strip away www. prefix for self-redirect checking, #73
2015-02-22 22:51:35 -08:00
Ilya Kreymer
83f8d7d29b
bump version to 0.8.2
2015-02-22 22:51:23 -08:00
Ilya Kreymer
de40e2920a
update README for 0.8.1
2015-02-21 14:29:40 -08:00
Ilya Kreymer
7989c06ea4
Add webarchiveplayer link to README
2015-02-21 14:28:04 -08:00
Ilya Kreymer
80da0e91da
update CHANGELIST for 0.8.1
2015-02-21 14:13:35 -08:00
Ilya Kreymer
9f838241c7
wb.js bug fix: use only window.__orig_parent and not window.parent, as window.parent is overridden. Use window instead of window.self
2015-02-21 12:34:35 -08:00
Ilya Kreymer
c0ff596c68
tests: add tests for recursive cdx indexing, #64
cross-platform: store relative filename path with '/' separators, but convert to os.path.sep when resolving to full path as prefix
2015-02-20 13:56:35 -08:00
Ilya Kreymer
8d52be4c44
live proxy: enable ssl validation for live proxy, was initially disabled for testing, should be on by default!
2015-02-20 13:22:21 -08:00
Ilya Kreymer
1646c90cd0
cdxindexer: add -r option to support recursive indexing when input is a directory.
filename field in cdx contains relative path including subdir, eg. subdir/file.warc.gz
related to #64
2015-02-20 02:40:32 -08:00
Ilya Kreymer
adeb8bfb27
bump version to 0.8.1, (fix blank spacing in changelist)
2015-02-20 02:02:34 -08:00
Ilya Kreymer
cb6aebf06d
Merge CHANGES.rst from 'develop'
0.8.0
2015-02-19 01:29:22 -08:00
Ilya Kreymer
bf203a2dc6
Merge branch 'develop' of https://github.com/ikreymer/pywb into develop
2015-02-19 01:29:03 -08:00
Ilya Kreymer
121e1df3c9
README: update branch config to master
2015-02-19 01:26:55 -08:00
Ilya Kreymer
824587bd90
A few more CHANGES.rst tweaks
2015-02-19 01:24:52 -08:00
Ilya Kreymer
26df8d7784
remove debug logging and spaces
2015-02-19 01:17:31 -08:00
Ilya Kreymer
0ddc490b8d
Update CHANGELIST for 0.8.0!
2015-02-19 01:16:25 -08:00
Ilya Kreymer
80dcb6ff27
rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. For exact replay, request_ts == timestamp.
For latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
9623f95439
memento: add rel="memento" header to timegate as well, improve memento test, clearly differentiate between timegate redirect and intermediate resource redirect, related to #70
2015-02-16 09:59:03 -08:00
Ilya Kreymer
c4d5dd4690
rewrite: optimize / sanity, only %-encode urls that are actually idna-encoded, otherwise return as is, #66
2015-02-15 10:34:56 -08:00
Ilya Kreymer
afe49a91f4
rewrite: more fixes for IDN #66 - add _do_percent_encode field to wburl itself
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e
rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
remove support for returning IRI, as that requires detecting charset, instead just use %-encoded form and let browser decode. Should address #66
Add rewrite option 'punycode_links_only' (default to false) to skip the %-encoded conversion of host, and just return punycode.
wombat: use getAttribute('href') on <a> tag to get original url, not punycode version
replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
79cfdd6a08
framework/urlrewriter: allow overriding UrlRewriter with optional urlrewriter_class param, easier to override create_rebased_rewriter() with custom rewriter as well
2015-02-12 10:34:04 -08:00
Ilya Kreymer
dcf3688dc3
wombat: also override frameElement when changing window.parent for top-level replay frame
2015-02-11 19:26:45 -08:00
Ilya Kreymer
0b72bfe911
add 'none' js regex rewriter, which does not rewrite urls or location regexes
add test for none rewriter in test rule
2015-02-11 15:01:29 -08:00
Ilya Kreymer
f068186e37
wombat: replace window.self -> window for clarity
2015-02-11 15:01:04 -08:00
Ilya Kreymer
78bd89b4cb
rewrite: simplify deprefix, url already unquoted now so remove extra unquote
2015-02-11 14:28:45 -08:00
Ilya Kreymer
4e7f95081f
url_rewriter: catch exception when encoding to utf-8, may not be properly encoded, in which case treat as bytes
2015-02-10 15:05:15 -08:00
Ilya Kreymer
90aba00ca0
not_found: catch NotFoundException from any part of handle_request, not just indexing; allows for more flexible usage with cdx iterators that are lazily evaluated on replay
2015-02-10 15:03:21 -08:00
Ilya Kreymer
148651680a
wombat fix: use __orig_parent when referencing top-frame, since window.parent is being overridden
2015-02-10 15:02:08 -08:00
Ilya Kreymer
78ae86b6b6
Merge branch 'master' for 0.7.8 into develop
2015-02-05 08:45:55 -08:00
Ilya Kreymer
384e68c84b
bump version to 0.7.8 for latest fix
2015-02-04 21:46:57 -08:00
Ilya Kreymer
cc144fdead
rewrite: add basic test for X-Forwarded-Proto #57
2015-02-04 21:44:18 -08:00
Ilya Kreymer
78812c8085
rewrite: more conservative change, only rewrite the X-Forwarded-Proto header for now, #57
2015-02-04 15:17:23 -08:00
Ilya Kreymer
cdb3dcc3d2
rewrite_live: don't forward via or https_x headers, only standard (for now), possible fix for #57
2015-02-04 14:19:37 -08:00
Ilya Kreymer
40fba3c27b
cdx-indexer: minor cleanup, add custom writer override to write_multi_cdx_index
2015-02-04 11:17:26 -08:00
Ilya Kreymer
ef98716bd8
bump version to 0.7.7 in prep for release
2015-02-03 11:23:12 -08:00