use StatusAndHeaders instead of requests' CaseInsensitiveDict for consistency
refactor writer api: create_warc_record() for creating a new record, copy_warc_record() for copying a full record from a stream
add writer tests, separate from recorder
frontendapp compatibility
- add support for a separate not-found page for 404s (not_found.html)
- add support for exception handling with an error template (error.html)
- add support for a home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
support for dynamic collections: check all .cdxj files in /<coll>/indexes/*.cdxj when accessing /<coll>
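A minimal sketch of the per-collection index lookup described above, assuming a `<root>/<coll>/indexes/` layout on local disk. `find_coll_indexes()` is a hypothetical helper, not pywb's actual function:

```python
import glob
import os


def find_coll_indexes(root_dir, coll):
    """Return sorted paths of every .cdxj file in <root_dir>/<coll>/indexes/.

    Hypothetical helper: pywb's real collection lookup may differ.
    """
    pattern = os.path.join(root_dir, coll, 'indexes', '*.cdxj')
    # glob returns [] for a missing collection, so unknown colls yield no indexes
    return sorted(glob.glob(pattern))
```

Re-globbing on each access means newly added .cdxj files are picked up without a restart, at the cost of a directory listing per request.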
support for fixed routes: specified in config.yaml as per https://github.com/ikreymer/pywb/wiki/Distributed-Archive-Config
werkzeug routing in FrontEndApp: default query, replay, search pages working
route listing: /_coll_info.json for listing fixed + dynamic routes
autoindexing enabled: WARCs added to the archive directory are indexed into a .cdxj index
Addresses #196
- xhr responseURL override, extract original url
- Worker override: if using 'blob:', extract blob and remove any postMessage() rewriting (workers won't have the __WB_pmw function)
- eval() override: convert argument to string before rewriting
if no content-length and http 1.1, chunk encode the response
if no content-length and http 1.0, buffer response and add content-length
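The two rules above can be sketched as a standalone framing decision. This assumes headers are a list of `(name, value)` tuples and the protocol is a plain string such as `'HTTP/1.1'`. `prepare_response()` and `chunk_encode()` are hypothetical names, and the HTTP/1.0 branch buffers in memory only for brevity:

```python
def chunk_encode(body_iter):
    """Wrap an iterable of byte strings in HTTP/1.1 chunked transfer coding."""
    for chunk in body_iter:
        if not chunk:
            continue  # a zero-length chunk would signal end-of-body too early
        yield b'%X\r\n' % len(chunk) + chunk + b'\r\n'
    yield b'0\r\n\r\n'  # terminating zero-size chunk


def prepare_response(headers, body_iter, protocol):
    """Choose framing for a response; only acts when Content-Length is absent."""
    if any(name.lower() == 'content-length' for name, _ in headers):
        return headers, body_iter
    if protocol == 'HTTP/1.1':
        headers = headers + [('Transfer-Encoding', 'chunked')]
        return headers, chunk_encode(body_iter)
    # HTTP/1.0 has no chunked coding: buffer the body to learn its length
    body = b''.join(body_iter)
    headers = headers + [('Content-Length', str(len(body)))]
    return headers, [body]
```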
utils: port buffer_iter() for buffering an iterator and returning a new iterator over the buffered data
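A minimal sketch of the buffer-then-reiterate idea, assuming a spooled temp file (memory first, disk past `max_mem`) and an assumed `BUFF_SIZE`. The signature and the `(length, iterator)` return value are hypothetical, not pywb's actual `buffer_iter()`:

```python
import tempfile

BUFF_SIZE = 16384  # assumed read size for the returned iterator


def buffer_iter(body_iter, max_mem=1024 * 1024):
    """Drain body_iter into a spooled temp file; return (length, new_iter)."""
    buff = tempfile.SpooledTemporaryFile(max_size=max_mem)
    length = 0
    for chunk in body_iter:
        buff.write(chunk)
        length += len(chunk)
    buff.seek(0)

    def read_iter():
        try:
            while True:
                data = buff.read(BUFF_SIZE)
                if not data:
                    break
                yield data
        finally:
            buff.close()  # also runs if the consumer stops early

    return length, read_iter()
```

The returned length is what an HTTP/1.0 response would use for its Content-Length header.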
utils load_config: expand any env vars
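A minimal sketch of env var expansion over an already-parsed config, assuming YAML has been loaded into nested dicts and lists. `expand_env()` is a hypothetical helper; the real `load_config` might instead expand the raw file text before parsing:

```python
import os


def expand_env(value):
    """Recursively expand $VAR / ${VAR} references in string config values."""
    if isinstance(value, str):
        # os.path.expandvars leaves references to unset variables unchanged
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(val) for key, val in value.items()}
    if isinstance(value, list):
        return [expand_env(val) for val in value]
    return value  # numbers, booleans, None pass through untouched
```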
decompressor: allow plaintext after gzipped record fully finished, as next member
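A minimal in-memory sketch of the behavior: decode gzip members one at a time with `zlib`, and once a member has fully finished, pass through any following bytes that lack the gzip magic as plaintext instead of failing. `read_members()` is a hypothetical helper; a real loader would stream rather than hold all the data:

```python
import zlib

GZIP_MAGIC = b'\x1f\x8b'


def read_members(data):
    """Yield each gzip member's payload; trailing non-gzip bytes come out as-is."""
    while data:
        if not data.startswith(GZIP_MAGIC):
            yield data  # plaintext following a completed member
            return
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+: gzip wrapper
        payload = decomp.decompress(data) + decomp.flush()
        if not decomp.eof:
            raise ValueError('truncated gzip member')
        yield payload
        data = decomp.unused_data  # bytes after the end of this member
```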
warc loader: ignore blank-line records; if the status headers are empty, set length to 0 and skip the record instead of reading indefinitely
- make recorder tempfile used by request/response wrappers overridable, better checks to ensure temp file is closed after recording is done/failed
- ensure ParamsFormatter inited for all requests
- writer: ensure writing from temp buffer done in BUFF_SIZE increments
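A minimal sketch of the increment-based copy from the recorder's temp buffer to the output, assuming file-like `src` and `dest` objects. `copy_in_increments()` and the `BUFF_SIZE` value are illustrative assumptions:

```python
BUFF_SIZE = 16384  # assumed increment size


def copy_in_increments(src, dest, buff_size=BUFF_SIZE):
    """Copy src to dest in fixed-size reads so a large record never sits fully in memory."""
    total = 0
    while True:
        chunk = src.read(buff_size)
        if not chunk:
            break
        dest.write(chunk)
        total += len(chunk)
    return total
```

The standard library's `shutil.copyfileobj(src, dest, length)` performs the same loop. The explicit version is shown only because it also reports the byte count.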
responseloader: direct loader: unrewrite location, content-location headers for non-live responses
autoapp: support custom indexsource list
indexsource: ensure closest query is added for RemoteIndexSource
utils res_template: urlencode '{url}' param if it appears after '?'
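A minimal sketch of the rule: percent-encode the substituted URL only when the `{url}` placeholder sits in the query string. This hypothetical re-implementation uses `str.format`, so it assumes every other placeholder is supplied in `params`:

```python
from urllib.parse import quote_plus


def res_template(template, params):
    """Fill {name} placeholders, URL-encoding {url} when it follows '?'."""
    query_start = template.find('?')
    url_pos = template.find('{url}')
    url = params.get('url', '')
    if query_start >= 0 and url_pos > query_start:
        url = quote_plus(url)  # safe='' by default, so ':' and '/' are encoded too
    return template.format(**dict(params, url=url))
```

In a path position the raw URL is kept, so `/replay/{url}` still yields a readable path.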
autoapp: add init_index_agg() for initializing indexes from a config dict
autoapp config: use RedisMultiKeyIndexSource for redis urls and ZipNumIndexSource for zipnum+ urls
all index sources can be inited from string or dictionary (loaded from yaml)
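A minimal sketch of dispatching an index entry given as a string or a dict. The prefix rules follow the two lines above. Everything else is an assumption: the dict keys `type`/`path`, the stripping of `zipnum+`, returning class names as strings, and rejecting other strings.

```python
def resolve_index_source(spec):
    """Return (source_class_name, location) for a str or dict index entry."""
    if isinstance(spec, dict):
        # assumed dict form, e.g. as loaded from yaml
        return spec['type'], spec['path']
    if spec.startswith('redis://'):
        return 'RedisMultiKeyIndexSource', spec
    if spec.startswith('zipnum+'):
        # assumption: the prefix only selects the source and is stripped
        return 'ZipNumIndexSource', spec[len('zipnum+'):]
    raise ValueError('unsupported index spec: %s' % spec)
```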
support for dynamic directory-based collections based on the file system, as well as explicitly specified static routes
add `-cdx` path for compatibility with existing pywb -cdx interface
tests: add tests for AutoConfigApp yaml loading
add WSGI app shortcut for AutoConfigApp