Ilya Kreymer
0399cc1046
webagg app: support bottle debug properly as opt param
2016-03-26 22:30:47 -04:00
Ilya Kreymer
7884d4394b
recorder: close_file() by params rather than exact path, update tests
2016-03-26 13:07:53 -04:00
Ilya Kreymer
7deba42851
add urlrewrite pywb-adapter PlatformHandler for using traditional pywb
setup with webrecorder components recorder and webagg
2016-03-24 16:33:03 -04:00
Ilya Kreymer
2bfe5d4f9e
inputreq: only use REQUEST_URI if no SCRIPT_NAME is set (otherwise reconstruct the path)
2016-03-24 16:17:46 -04:00
Ilya Kreymer
b6e988d9a1
self-redirect: if 'status' is a 3xx, call raise_on_self_redirect() to check Location for exact url redirect.
supports both WARC and live loaders, addresses #1
2016-03-24 16:08:29 -04:00
Ilya Kreymer
61921d6c4a
tests: add FakeRedisTests class mixin for patching in FakeRedis for tests
2016-03-24 10:45:48 -04:00
Ilya Kreymer
7cc772329c
redis: add tests for RedisMultiKeyIndexSource
2016-03-24 10:44:14 -04:00
Ilya Kreymer
64b32dc57a
redis support: add RedisMultiKeyIndexSource for using redis SCAN wildcard query and aggregate results from several
redis keys
2016-03-24 01:17:18 -04:00
Ilya Kreymer
e5ddf9d4f4
utils: res_template() supports extra params for interpolation
2016-03-23 23:58:49 -04:00
Ilya Kreymer
ba66d0bb5e
recorder: use res_template() to resolve params, rename indexing method to add_urls_to_index
2016-03-23 23:55:21 -04:00
Ilya Kreymer
aa80cd6881
recorder: add simple recorder config indexing to redis
2016-03-21 11:50:01 -07:00
Ilya Kreymer
d38bb5a1fd
filters: add extensible 'skip filters', with default filters to accept certain collections, filter out
recording of range requests. Opportunity to skip recording at request or response time
RespWrapper handles reading stream fully on close() (no need for old ReadFullyStream),
skips recording if read was interrupted/incomplete
writer: avoid writing duplicate content-length/content-type headers
2016-03-21 11:47:12 -07:00
Ilya Kreymer
cbe7d1c981
webagg: add tests for RedisPathResolver and errors on missing warc, missing warc keys
2016-03-21 11:44:32 -07:00
Ilya Kreymer
22ead52604
webagg: convert StreamIter to generator, remove unused ReadFullyStream
loaders: add support for RedisResolver as well as PathPrefixResolver
inputreq: reconstruct_request() skips host header if already present
improve test app to include replay
2016-03-21 11:04:52 -07:00
Ilya Kreymer
4cf935abd1
directory agg: add CacheDirectoryAggregator to cache file listing, rescan dir only if changed
2016-03-19 20:34:09 -07:00
Ilya Kreymer
f5ee3c7bca
inputreq: add reconstruct_request() to return a bytestring of the request, add test for inputreq
2016-03-19 20:32:37 -07:00
Ilya Kreymer
c96e419341
recorder: ensure filename is also tracked by the indexer, add tests
for redis file mapping
2016-03-19 10:24:28 -07:00
Ilya Kreymer
3452cf39e0
recorder: use more general MultiFileWARCWriter, supporting both keeping file open
and one-warc-per record use cases
2016-03-18 21:40:41 -07:00
Ilya Kreymer
e81457df5f
rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00
Ilya Kreymer
b64be0dff1
recorder: add tests for single file writer, including file locking
dedup policy: support customizable dedup/skip/write policy plugins and add tests
2016-03-18 15:28:24 -07:00
Ilya Kreymer
cba8e4ee3a
filters: more functional filter impl for header exclusion
2016-03-17 18:22:26 -07:00
Ilya Kreymer
58e8c709aa
docker: add initial docker-compose, webagg Dockerfile
2016-03-16 18:42:15 -07:00
Ilya Kreymer
8dc59ef6bd
webagg: add test for live server config
2016-03-13 16:53:39 -07:00
Ilya Kreymer
06978bd8d2
recorder: check for empty input stream (support for direct proxy?)
2016-03-13 11:17:52 -07:00
Ilya Kreymer
709d2b1ea2
reorg: move StreamIter to utils
2016-03-12 23:29:23 -08:00
Ilya Kreymer
7a828017d1
recorder: clean up logging, ReadFullyStream moves to utils, get_request_uri to inputreq
2016-03-12 22:18:01 -08:00
Ilya Kreymer
49b6ae78a8
live loader: remove liverec (doesn't work well with gevent), use regular requests
instead of overridden version.
reconstruct header block from httplib header pairs list
move ReadFullyStream to utils
2016-03-12 22:15:24 -08:00
Ilya Kreymer
9adb8da3b7
recorder: add support for filtering collections to record by regex (default: .*)
add support for excluding certain headers when writing WARCs
tests: add first batch of tests for recorder, using live upstream server
2016-03-11 11:12:25 -08:00
Ilya Kreymer
2003925b75
setup: fix pywb py3 version to 0.30.0, add coverage for recorder
2016-03-11 11:11:43 -08:00
Ilya Kreymer
3b3e190cf4
testing: use test mixins for class-scope temp directory, live server creation
use processes instead of threads for live server
2016-03-11 11:10:22 -08:00
Ilya Kreymer
46d013ab19
test redis: minor tweak to use @patch for fakeredis mock
2016-03-10 21:35:01 -08:00
Ilya Kreymer
c309637a3a
tests: webagg test tweaks, create TempDirTests for sharing tests that require a temp dir
2016-03-10 16:04:27 -08:00
Ilya Kreymer
7b847311d5
dir agg: include filename in dir source name
2016-03-10 15:51:01 -08:00
Ilya Kreymer
31fb2f926f
add recorder app, initial pass!
2016-03-09 14:33:36 -08:00
Ilya Kreymer
1499f0e611
add shared README.rst and coverage
2016-03-09 14:33:11 -08:00
Ilya Kreymer
34386578a5
shared setup: move webagg test to webagg/test
2016-03-09 14:29:14 -08:00
Ilya Kreymer
3477cb0bb5
drop process/thread mixin support (doesn't work as well on py2); could re-add processes only if need arises, but for now focusing on gevent
rename header Source-Coll -> WebAgg-Source-Coll
2016-03-08 10:56:03 -08:00
Ilya Kreymer
348fb133e0
add upstream/proxy tests
2016-03-08 10:29:59 -08:00
Ilya Kreymer
107ba9aabc
add ProxyLiveIndexSource for proxying upstream conn directly w/o a second index query
liveloader: if 'memento_url' key is set, then memento-datetime header must be present or it's an error response
liveindexsource: add option to specify custom live path (eg. prefix for caching)
fix test cases changed due to ia (todo: mock up all external data!)
2016-03-08 10:27:13 -08:00
Ilya Kreymer
c1895ae70f
loaders: return full WARC record in response, no need for upstream response handler
add UpstreamAggIndexSource to simplify upstream aggregator config, add test for upstream config
bottle app: wrap in a ResAppAgg, allow multiple bottle apps
py2: non-gevent concurrency not supported
2016-03-06 23:12:14 -08:00
Ilya Kreymer
0823ff4bd0
added 'upstream' handler for connecting to another webagg when 'upstream_url' is set
output 'is_live' as string in live index
2016-03-06 09:10:17 -08:00
Ilya Kreymer
20ebccc13e
handlers: return out_headers directly instead of setting bottle response, contains bottle dependency to app.py (to allow alternate impl not using bottle)
param parsing: instead of setting custom _src_params and _all_params, use a custom ParamFormatter which will check param dict for params with prefix and custom name
2016-03-05 16:49:26 -08:00
Ilya Kreymer
bdda1b8c03
minor fixes for py2 support
2016-03-03 13:58:09 -08:00
Ilya Kreymer
896f81fd1c
Add README.rst
2016-03-03 12:09:17 -08:00
Ilya Kreymer
ed1d3555c3
rename rezag -> webagg
rename aggindexsource -> aggregator
2016-03-03 11:55:43 -08:00
Ilya Kreymer
98830147b5
add memento headers to all response loaders, use BaseLoader base class, update tests
for memento headers
2016-03-03 11:04:28 -08:00
Ilya Kreymer
65e969a492
errors and timeouts reported back to the user via ResErrors header
add new /index, /resource access point system
2016-03-02 18:13:13 -08:00
Ilya Kreymer
1f3763d02c
misc fixes: add route listing, more not found tests, timemap use file:// with ranges
2016-03-01 14:46:05 -08:00
Ilya Kreymer
008e5284b1
separate iter_sources from list_sources api
all errors returned as json block with error msg
tests for not found, invalid errors
2016-02-29 12:34:06 -08:00
Ilya Kreymer
68090d00c1
add routing setup via app.py
add full test suite for handlers and responseloaders, as well as timeouts
2016-02-28 14:33:08 -08:00