and 'fuzzy' matching when not found
handled via
BaseCDXServer contains a canonicalizer object and a fuzzy query
canonicalizer abstracted to a separate class (in
clean up cdx-related exceptions
default rules read from cdx/rules.yaml
filename configurable via 'domain_specific_rules' setting in config.yaml
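The urlkey values that appear in this log, such as `com,example)/?example=1`, are canonicalized SURT-style keys. As a rough illustration only, here is a simplified sketch of how such a key can be derived from a URL. `simple_urlkey` is a hypothetical helper, not pywb's canonicalizer, and it skips most real canonicalization rules.

```python
from urllib.parse import urlsplit


def simple_urlkey(url):
    """Hypothetical, simplified SURT-style key: reversed host + ')' + path[?query].

    Drops the port and does not lowercase the path or sort query args,
    unlike a full canonicalizer.
    """
    parts = urlsplit(url)
    host = (parts.hostname or '').lower()  # .hostname already excludes the port
    if host.startswith('www.'):
        host = host[len('www.'):]  # treat www.example.com as example.com
    key = ','.join(reversed(host.split('.'))) + ')'
    key += parts.path or '/'  # an empty path becomes '/'
    if parts.query:
        key += '?' + parts.query
    return key


print(simple_urlkey('http://example.com/?example=1'))  # com,example)/?example=1
print(simple_urlkey('http://www.Example.com'))          # com,example)/
```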
fix typo in pywb/rewrite
(as opposed to regex matches)
e.g. filter:urlkey=com,example)/?example=1 matches the exact
string 'com,example)/?example=1' in the urlkey field
(as opposed to applying it as a regex)
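Exact matching matters because urlkeys contain regex metacharacters such as `)` and `?`. A minimal sketch, assuming CDX records parsed into dicts and a `field=value` spec (both hypothetical representations, not pywb's internal ones):

```python
import re


def exact_filter(cdx_records, spec):
    """Keep records whose field equals the value exactly.

    `spec` is assumed to be 'field=value'; split on the first '=' only,
    because the value itself may contain '=' (as in '?example=1').
    """
    field, value = spec.split('=', 1)
    return [rec for rec in cdx_records if rec.get(field) == value]


# Made-up sample records for illustration.
records = [
    {'urlkey': 'com,example)/?example=1'},
    {'urlkey': 'com,example)/?example=10'},
]

# Exact match: only the first record is kept.
print(exact_filter(records, 'urlkey=com,example)/?example=1'))

# Treated as a regex, the same string is not even valid: ')' is unbalanced.
try:
    re.compile('com,example)/?example=1')
except re.error as err:
    print('regex error:', err)
```

Even with `re.escape`, a regex search for `com,example)/?example=1` would still match the `?example=10` record as a substring, which an exact comparison avoids.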
a cdx server need only implement a single interface:
load_cdx(self, **params)
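To show how small that interface is, here is a sketch of a hypothetical in-memory server. The class name and the `key` parameter are assumptions for illustration; only the `load_cdx(self, **params)` signature comes from the log.

```python
class ListCDXServer:
    """Hypothetical CDX server backed by an in-memory list of CDX lines.

    The only method callers rely on is load_cdx(**params).
    """

    def __init__(self, cdx_lines):
        self.cdx_lines = sorted(cdx_lines)  # CDX files are sorted by urlkey

    def load_cdx(self, **params):
        # 'key' is an assumed parameter holding an already-canonicalized urlkey
        key = params['key']
        for line in self.cdx_lines:
            if line.split(' ', 1)[0] == key:
                yield line


# Made-up sample lines: urlkey, timestamp, original URL.
server = ListCDXServer([
    'org,iana)/ 20140126200624 http://www.iana.org/',
    'com,example)/ 20140127171200 http://example.com/',
])
for line in server.load_cdx(key='com,example)/'):
    print(line)
```

Because `load_cdx` is a generator, a caller can stop early without the server materializing every matching line.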
CDXServer and RemoteCDXServer distinct classes in
utility function cdxserver.create_cdx_server() to create the
appropriate server based on the input
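The factory idea can be sketched as follows, assuming that an `http://` or `https://` source means a remote index and anything else means local path(s). All names here are hypothetical stand-ins, not the actual CDXServer/RemoteCDXServer classes or create_cdx_server().

```python
class LocalCDXSketch:
    """Hypothetical stand-in for a server reading local CDX files."""

    def __init__(self, paths):
        self.paths = [paths] if isinstance(paths, str) else list(paths)


class RemoteCDXSketch:
    """Hypothetical stand-in for a server querying a remote CDX endpoint."""

    def __init__(self, url):
        self.url = url


def create_cdx_server_sketch(source):
    # Assumption: an http(s) URL selects the remote server; otherwise local.
    if isinstance(source, str) and source.startswith(('http://', 'https://')):
        return RemoteCDXSketch(source)
    return LocalCDXSketch(source)


print(type(create_cdx_server_sketch('http://localhost:8080/cdx')).__name__)
print(type(create_cdx_server_sketch('./sample_archive/cdx/')).__name__)
```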
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
- add defaults dictionary, chain dictionaries rather than copying
- allow custom classes to be loaded explicitly via yaml
- for LineReader, assume ungzipped if first decompress fails
- properly ignore bad local paths
- add optional reporter object
- don't store explicit static path, but allow it to be set in the insert
- store host_prefix, which is either server name or empty
- for archival mode, the absolute_paths setting controls whether absolute paths are used
- for proxy mode, always use absolute_paths
- default static path is: /static/default/
- allow extension apps to provide custom /static/X/ path
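Chaining dictionaries rather than copying them can be done with the standard library's `collections.ChainMap`; this is one way to get the behavior described, not necessarily how pywb implements it. The default values below are hypothetical; only the setting names `enable_http_proxy` and `absolute_paths` come from this log.

```python
from collections import ChainMap

# Hypothetical default values for settings named in this log.
DEFAULTS = {
    'enable_http_proxy': False,
    'absolute_paths': True,
}

user_config = {'enable_http_proxy': True}  # e.g. loaded from config.yaml

# Lookups try user_config first, then fall back to DEFAULTS; nothing is copied.
config = ChainMap(user_config, DEFAULTS)

print(config['enable_http_proxy'])  # True (from user_config)
print(config['absolute_paths'])     # True (from DEFAULTS)
```

Writes through a ChainMap go to the first mapping only, so the shared defaults dictionary is never mutated by per-instance config.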
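The LineReader fallback ("assume ungzipped if first decompress fails") can be illustrated with a whole-buffer sketch using `zlib`. The real reader works incrementally on a stream; `read_maybe_gzipped` is a hypothetical name.

```python
import gzip
import zlib


def read_maybe_gzipped(data):
    """Return decompressed bytes, or the input unchanged if it is not gzip.

    Sketch only: handles a single gzip member held fully in memory, and
    assumes the buffer holds at least the 2-byte gzip magic number.
    """
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+ selects gzip framing
    try:
        return decomp.decompress(data) + decomp.flush()
    except zlib.error:
        # First decompress attempt failed: assume the data was never gzipped.
        return data


plain = b'com,example)/ 20140127171200 http://example.com/\n'
print(read_maybe_gzipped(gzip.compress(plain)) == plain)  # True
print(read_maybe_gzipped(plain) == plain)                 # True
```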
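The static-path rules above can be summarized in a small helper. This is a hypothetical sketch of the decision, not pywb code. It assumes `host_prefix` is either the server name (e.g. `http://localhost:8080`) or the empty string, as described.

```python
DEFAULT_STATIC_PATH = '/static/default/'


def static_url(host_prefix, absolute_paths, proxy_mode,
               static_path=DEFAULT_STATIC_PATH):
    """Hypothetical helper for building the URL prefix of static resources."""
    if proxy_mode or absolute_paths:
        # Proxy mode always uses absolute paths; archival mode only if configured.
        return host_prefix + static_path
    return static_path


print(static_url('http://localhost:8080', absolute_paths=False, proxy_mode=True))
print(static_url('http://localhost:8080', absolute_paths=False, proxy_mode=False))
```

Since an empty `host_prefix` yields a host-relative path either way, the same code path works when no server name is known.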
Route overriding:
- ability to set Route class
- custom init method
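Route overriding can be sketched as a route class passed in as a parameter plus an init hook that subclasses override. All names here (`Route`, `_custom_init`, `build_routes`, the `label` setting) are hypothetical illustrations, not pywb's actual API.

```python
class Route:
    """Hypothetical base route: maps a collection prefix to a handler."""

    def __init__(self, prefix, handler, config=None):
        self.prefix = prefix
        self.handler = handler
        self._custom_init(config or {})

    def _custom_init(self, config):
        # Hook for subclasses; the base route needs no extra setup.
        pass


class LabeledRoute(Route):
    """Example override: reads an extra (hypothetical) 'label' setting."""

    def _custom_init(self, config):
        self.label = config.get('label', self.prefix)


def build_routes(collections, route_class=Route):
    # The route class is a parameter, so callers can swap in a subclass.
    return [route_class(prefix, handler, cfg)
            for prefix, (handler, cfg) in collections.items()]


routes = build_routes({'pywb': (print, {'label': 'Sample Archive'})},
                      route_class=LabeledRoute)
print(routes[0].label)  # Sample Archive
```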
Archival Relative Redirect:
- if starting with timestamp, drop timestamp and assume host-relative path
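A minimal sketch of the relative-redirect rule, assuming the timestamp is a run of 1-14 digits optionally followed by a modifier such as `im_`, and that the path is resolved against the URL of the archived page. The helper and regex are hypothetical, not pywb's implementation.

```python
import re
from urllib.parse import urljoin

# Assumed timestamp form: 1-14 digits, optionally followed by a modifier
# such as 'im_'. Note that any leading digit run is treated as a timestamp.
TIMESTAMP_PREFIX = re.compile(r'^\d{1,14}(?:[a-z]+_)?/')


def resolve_relative(rel_path, page_url):
    """Hypothetical helper: resolve rel_path against the archived page URL."""
    m = TIMESTAMP_PREFIX.match(rel_path)
    if m:
        # Drop the timestamp and treat the rest as a host-relative path.
        rel_path = '/' + rel_path[m.end():]
    return urljoin(page_url, rel_path)


page = 'http://example.com/some/page.html'
print(resolve_relative('20140127171200/images/logo.png', page))
# http://example.com/images/logo.png
print(resolve_relative('logo.png', page))
# http://example.com/some/logo.png
```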
Integration Tests:
- test proxy mode by using REQUEST_URI
- test archival relative redirect!
adding StaticHandler and loading templates and static resources from current package
add default template and static data to be included in the pywb package
add test for custom static route
* proxy router for handling proxy mode only
* proxy/archival router for handling both archival and proxy modes,
toggleable with the 'enable_http_proxy' setting in config
* supports only most recent capture playback -- no support for
selecting replay date/calendar view yet
* not testable with WebTest -- need a better way to unit test proxy mode
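One way a combined proxy/archival router can tell the modes apart: a browser configured to use an HTTP proxy sends the absolute URL in the request line, while an archival-mode request carries only a path. The sketch below is hypothetical. It reads `REQUEST_URI`, which is not part of the WSGI spec but is set by some servers and can be set directly in tests, and it ignores HTTPS, which proxies see as `CONNECT` requests.

```python
def is_proxy_request(environ, enable_http_proxy):
    """Hypothetical check for a combined proxy/archival router."""
    if not enable_http_proxy:
        return False
    # Proxy requests carry an absolute URI; archival requests carry a path.
    return environ.get('REQUEST_URI', '').startswith('http://')


print(is_proxy_request({'REQUEST_URI': 'http://example.com/'}, True))  # True
print(is_proxy_request(
    {'REQUEST_URI': '/pywb/20140127171200/http://example.com/'}, True))  # False
```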
wrapping previous WbResponse
overhaul yaml config to be much simpler, move best resolver and
best index reader to respective classes
add config_utils for sharing config; a standard non-yaml config
provides defaults for testing
fix bug in query.html
Changes WbUrl forms:
/2013/im_/ -> 2013/im_/
/*/ -> */
/ -> (empty string)
* also simplify scheme-agnostic url (//) handling by just eating up extra
* add doctests for route, with and without a custom SCRIPT_NAME
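The WbUrl form changes listed above amount to dropping one leading slash. A tiny hypothetical helper makes the mapping concrete; it does not attempt the scheme-agnostic (`//`) handling, which the log only partially describes.

```python
def strip_leading_slash(wb_url):
    """Hypothetical helper mirroring the WbUrl form changes."""
    return wb_url[1:] if wb_url.startswith('/') else wb_url


for old in ['/2013/im_/', '/*/', '/']:
    print(repr(old), '->', repr(strip_leading_slash(old)))
```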
supports /cdx?url=... and other params including
* Refactor views class to support more Jinja2 views (J2Template)
* Add a home page, collection search page, and error pages, all optional
* all exceptions appear on error page
* wbrequest supports a request with an empty or / wb_url
- pywb_init module inits from ./test directory
- router has lookahead for '/'
- dechunk even for transparent/binary
- 'text' query mode displays cdx
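A regex lookahead lets a router match a collection name only when it is followed by `/` or the end of the path, without consuming the `/`. A sketch, assuming paths are given without their leading `/`; `route_regex` is a hypothetical name.

```python
import re


def route_regex(coll):
    # '(?=/|$)' is a lookahead: it requires '/' or end-of-string after the
    # collection name without consuming it, so the remainder keeps its '/'.
    return re.compile('^' + re.escape(coll) + '(?=/|$)')


rx = route_regex('pywb')
path = 'pywb/20140127171200/http://example.com/'
m = rx.match(path)
print(path[m.end():])               # /20140127171200/http://example.com/
print(rx.match('pywbx/') is None)   # True: a longer name does not match
```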
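Dechunking HTTP/1.1 chunked transfer coding works the same for text and binary payloads because it only reads hex size lines and byte counts. A minimal whole-buffer sketch, assuming well-formed input (malformed data raises `ValueError`) and ignoring any trailer headers; `dechunk` here is a hypothetical helper, not pywb's implementation.

```python
def dechunk(data):
    """Decode a complete chunked-encoded body held in memory."""
    out = bytearray()
    pos = 0
    while True:
        eol = data.index(b'\r\n', pos)
        # Chunk size is hex; drop any ';ext=...' chunk extensions.
        size = int(data[pos:eol].split(b';', 1)[0], 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailers (if any) are ignored
        out += data[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF
    return bytes(out)


print(dechunk(b'5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n'))  # b'hello world'
```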