* proxy router for handling only proxy
* proxy/archival router for handling both archival and proxy mode,
togglable with 'enable_http_proxy' setting in config
* supports only most recent capture playback -- no support for
selecting replay date/calendar view yet
* not testable with WebTest -- need better way to unit test proxy mode
wrapping previous WbResponse
overhaul yaml config to be much simpler, move best resolver and
best index reader to respective classes
add config_utils for sharing config, standard non-yaml config
provides defaults for testing
fix bug in query.html
Changes WbUrl forms:
/2013/im_/example.com -> 2013/im_/example.com
/*/example.com -> */example.com
/example.com -> example.com
* also simplify scheme-agnostic url (//) handling by just eating up extra
slashes
* add additional doctests on route, with and w/o custom SCRIPT_NAME
supports /cdx?url=... and other params including
filter=<regex>
collapse_time=<0-14>
resolve_revisits=<true|false>
reverse=<true|false>
closest=<timestamp>
* Refactor views class to support more Jinja2 views (J2Template)
* Add a home page, collection search page, and error pages, all optional
* all exceptions appear on error page
* wbrequest supports a request with an empty or / wb_url
- pywb_init module inits from ./test directory
misc:
- router has lookahead for '/'
- dechunk even for transparent/binary
- 'text' query mode displays cdx
fix html_rewriter missing ; on entities
js rewriter: only rewrite full document.domain
PathIndexPrefixResolver using binsearch on path index, for #9
resolvers moved to replay_resolvers.py
improve path-resolver logic: each resolver returns an array of possible
files (could be from primary or secondary storage).
then, iterate over all possible files from all resolvers until
a successful load, or raise exception if all failed
All rewriters can support either buffered or streaming mode.
In buffered mode, the full text content is written into a buffer
and served with a Content-Length
in streaming mode, text is streamed as it is rewritten and
no Content-Length is written
Default is to stream the response
Rename archivalrouter.MatchRegex -> archivalrouter.Route, supporting regex/prefix matching
add redir_to_exact to turn off redirect to exact timestamp in RewritingReplayHandler
update README
* Instead of relying on REQUEST_URI, pywb constructs a
REL_REQUEST_URI, from PATH_INFO + QUERY_STRING.
SCRIPT_NAME auto-added to prefix
* MatchPrefix is now superceded by MatchRegex, which
can match a plain string -- collId defaults to the full match
* Added optional archivalurl_class to router to allow for customized
ArchivalUrl implementations to be specified
* run.sh can test on a non-root mountpoint, eg. ./run.sh "/approot"
- Manually set env[‘REQUEST_URI’] (which is nonstandard) the same way
it’s set by uwsgi.
- Include HTTP error code reasons in error response. (wsgiref checks
that error code is at least 4 characters, i.e. includes reason)
- Add a ChunkedLineReader to deal with replays with the
transfer-encoding: chunked header.
- Catch UnicodeDecodeErrors caused by multibyte characters getting
split during buffering.
- A couple of tiny bugs in replay.py
currently MatchPrefix and MatchRegex. handler returns a single response
(no chaining for now)
* rewriting: don't rewrite anchor only urls
* perf: add a very basic profiler in WBHandler for testing