split binsearch further into binsearch and linearsearch components
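A minimal sketch of the binary/linear split, assuming the index is a sorted sequence of text lines held in memory. The function signatures and the sample lines are illustrative only and are not taken from the actual binsearch module; the binary step here is done with the standard library's bisect.

```python
from bisect import bisect_left


def binsearch(lines, prefix):
    """Binary step: return the index of the first line that is >= prefix."""
    return bisect_left(lines, prefix)


def linearsearch(lines, start, prefix):
    """Linear step: yield consecutive lines from start that begin with prefix."""
    for line in lines[start:]:
        if not line.startswith(prefix):
            break
        yield line


def search(lines, prefix):
    """Combine both steps: seek with binsearch, then scan with linearsearch."""
    return list(linearsearch(lines, binsearch(lines, prefix), prefix))


# Hypothetical sorted index lines (urlkey, timestamp, payload)
cdx_lines = sorted([
    "com,example)/ 2013 a",
    "com,example)/ 2014 b",
    "com,example)/about 2013 c",
    "org,iana)/ 2013 d",
])

print(search(cdx_lines, "com,example)/ "))
```

Keeping the two steps separate means the binary step only has to find a starting position, while the linear step decides how far to read.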
reading blocks one at a time currently, due to zlib decompress limitations
fix bufferedreader.readline() and fileloader bugs
remove max_len from DecompressingBufferedReader as it applied to
the compressed size, not original size.
Add integration test for verifying content length of larger file
exclusions: add AllAllowPerms and refactor exclusions interface
add TestExclusionPerms and a sample exclusion integration test
refactor cdx server init params into **kwargs
convert all cdx params to use camelCase
and 'fuzzy' matching when not found
handled via cdxdomainspecific.py
BaseCDXServer contains a canonicalizer object and a fuzzy query
canonicalizer abstracted to separate class (in canonicalizer.py)
clean up cdx related exceptions
default rules read from cdx/rules.yaml
filename configurable via 'domain_specific_rules' setting in config.yaml
fix typo in pywb/rewrite
(as opposed to regex matches)
eg: filter:urlkey=com,example)/?example=1 matches exact
string 'com,example)/?example=1' in the urlkey field
(as opposed to applying it as a regex)
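A rough sketch of the difference between an exact filter and a regex filter, assuming records are plain dicts keyed by field name and that "exact" means string equality. make_filter and the record layout are hypothetical and do not reflect the real filter code. The sample value from the note above is not a valid pattern for Python's re module (the ')' is unbalanced), which is one reason an exact mode is useful.

```python
import re


def make_filter(field, value, exact):
    """Return a predicate over a record dict (hypothetical helper)."""
    if exact:
        # Exact mode: compare the field to the literal string
        return lambda rec: rec[field] == value
    # Regex mode: the value is compiled and matched from the field start
    pattern = re.compile(value)
    return lambda rec: pattern.match(rec[field]) is not None


record = {'urlkey': 'com,example)/?example=1'}

exact_filter = make_filter('urlkey', 'com,example)/?example=1', exact=True)
regex_filter = make_filter('urlkey', r'com,example\)/.*', exact=False)

print(exact_filter(record), regex_filter(record))
```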
a cdx server needs to implement only a single interface:
load_cdx(self, **params)
CDXServer and RemoteCDXServer distinct classes in cdxserver.py
utility function cdxserver.create_cdx_server() to create
appropriate server based on input
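A simplified sketch of the single-method interface and the factory, not the actual cdxserver.py code. The class and function names come from the notes above; the constructor arguments, the 'key' parameter, and the rule used to pick a server (an http(s) URL means remote) are assumptions made for illustration.

```python
class CDXServer:
    """Local server: answers queries from an in-memory list of cdx lines."""

    def __init__(self, lines):
        self.lines = sorted(lines)

    def load_cdx(self, **params):
        # 'key' is a hypothetical parameter name used only in this sketch
        key = params.get('key', '')
        return [line for line in self.lines if line.startswith(key)]


class RemoteCDXServer:
    """Remote server: would forward the same params to another cdx server."""

    def __init__(self, remote_url):
        self.remote_url = remote_url

    def load_cdx(self, **params):
        raise NotImplementedError('network access omitted in this sketch')


def create_cdx_server(source):
    """Pick a server class based on the input (assumed selection rule)."""
    if isinstance(source, str) and source.startswith(('http://', 'https://')):
        return RemoteCDXServer(source)
    return CDXServer(source)


server = create_cdx_server(['org,iana)/ 2013', 'com,example)/ 2013'])
print(server.load_cdx(key='com,example)/'))
```

Because both classes expose only load_cdx(self, **params), callers do not need to know whether the index is local or remote.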
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
- add defaults dictionary, chain dictionaries rather than copying
- allow custom classes to be loaded explicitly via yaml
- for LineReader, assume ungzipped if first decompress fails
- properly ignore bad local paths
- add optional reporter object
- don't store explicit static path, but allow it to be set in the insert
- store host_prefix, which is either server name or empty
- for archival mode, the absolute_paths setting controls whether absolute paths are used
- for proxy always use absolute_paths
- default static path is: /static/default/
- allow extension apps to provide custom /static/X/ path
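The first item in the list above (chaining a defaults dictionary rather than copying it) can be illustrated with the standard library's collections.ChainMap. This is a stand-in technique, not necessarily how the project implements its chaining, and the 'static_path' key is a hypothetical name chosen for the example.

```python
from collections import ChainMap

# Defaults live in one dictionary and are never copied or mutated
defaults = {'absolute_paths': True, 'static_path': '/static/default/'}

# Per-collection or user settings override only the keys they define
user_config = {'static_path': '/static/custom/'}

# Lookups try user_config first, then fall through to defaults
config = ChainMap(user_config, defaults)

print(config['static_path'])
print(config['absolute_paths'])
```

Since nothing is copied, a later change to the defaults dictionary is visible through every chained config that does not override that key.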
Route overriding:
- ability to set Route class
- custom init method
Archival Relative Redirect:
- if starting with timestamp, drop timestamp and assume host-relative path
Integration Tests:
- test proxy mode by using REQUEST_URI
- test archival relative redirect!
adding StaticHandler and loading templates and static resources from current package
add default template and static data to be included in the pywb package
add test for custom static route
* proxy router for handling only proxy
* proxy/archival router for handling both archival and proxy mode,
togglable with 'enable_http_proxy' setting in config
* supports only most recent capture playback -- no support for
selecting replay date/calendar view yet
* not testable with WebTest -- need better way to unit test proxy mode
wrapping previous WbResponse
overhaul yaml config to be much simpler, move best resolver and
best index reader to respective classes
add config_utils for sharing config, standard non-yaml config
provides defaults for testing
fix bug in query.html
Changes WbUrl forms:
/2013/im_/example.com -> 2013/im_/example.com
/*/example.com -> */example.com
/example.com -> example.com
* also simplify scheme-agnostic url (//) handling by just eating up extra
slashes
* add additional doctests on route, with and w/o custom SCRIPT_NAME
supports /cdx?url=... and other params including
filter=<regex>
collapse_time=<0-14>
resolve_revisits=<true|false>
reverse=<true|false>
closest=<timestamp>
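As an illustration of the query parameters listed above, the following sketch builds a /cdx query string with the standard library. The host and port are placeholders, build_cdx_query is a hypothetical helper, and the parameter values are examples only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_cdx_query(base, url, **params):
    """Build a /cdx query URL from a target url and optional params."""
    query = {'url': url}
    query.update(params)
    return base + '/cdx?' + urlencode(query)


# Placeholder host; parameter names are the ones listed above
query_url = build_cdx_query('http://localhost:8080', 'example.com',
                            reverse='true', closest='20130101000000')

# Parse the query string back to check what would be sent
parsed = parse_qs(urlsplit(query_url).query)
print(query_url)
print(parsed)
```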