Ilya Kreymer
cbe7d1c981
webagg: add tests for RedisPathResolver and errors on missing warc, missing warc keys
2016-03-21 11:44:32 -07:00
Ilya Kreymer
22ead52604
webagg: convert StreamIter to generate, remove unused ReadFullyStream
...
loaders: add support for RedisResolver as well as PathPrefixResolver
inputreq: reconstruct_request() skips host header if already present
improve test app to include replay
2016-03-21 11:04:52 -07:00
Ilya Kreymer
4cf935abd1
directory agg: add CacheDirectoryAggregator to cache file listing, rescan dir only if changed
2016-03-19 20:34:09 -07:00
Ilya Kreymer
f5ee3c7bca
inputreq: add reconstruct_request() to return a bytestring of the request, add test for inputreq
2016-03-19 20:32:37 -07:00
Ilya Kreymer
c96e419341
recorder: ensure filename is also tracked by the indexer, add tests
...
for redis file mapping
2016-03-19 10:24:28 -07:00
Ilya Kreymer
3452cf39e0
recorder: use more general MultiFileWARCWriter, supporting both keeping file open
...
and one-warc-per record use cases
2016-03-18 21:40:41 -07:00
Ilya Kreymer
e81457df5f
rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
...
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00
Ilya Kreymer
b64be0dff1
recorder: add tests for single file writer, including file locking
...
dedup policy: support customizable dedup/skip/write policy plugins and add tests
2016-03-18 15:28:24 -07:00
Ilya Kreymer
cba8e4ee3a
filters: more functional filter impl for header exclusion
2016-03-17 18:22:26 -07:00
Ilya Kreymer
58e8c709aa
docker: add initial docker-compose, webagg Dockerfile
2016-03-16 18:42:15 -07:00
Ilya Kreymer
8dc59ef6bd
webagg: add test for live server config
2016-03-13 16:53:39 -07:00
Ilya Kreymer
06978bd8d2
recorder: check for empty input stream (support for direct proxy?)
2016-03-13 11:17:52 -07:00
Ilya Kreymer
709d2b1ea2
reorg: move StreamIter to utils
2016-03-12 23:29:23 -08:00
Ilya Kreymer
7a828017d1
recorder: clean up logging, ReadFullyStream moves to utils, get_request_uri to inputreq
2016-03-12 22:18:01 -08:00
Ilya Kreymer
49b6ae78a8
live loader: remove liverec (doesn't work well with gevent), use regular requests
...
instead of overridden version.
reconstruct header block from httplib header pairs list
move ReadFullyStream to utils
2016-03-12 22:15:24 -08:00
Ilya Kreymer
9adb8da3b7
recorder: add support for filtering collections to record by regex (default: .*)
...
add support for excluding certain headers when writing WARCs
tests: add first batch of tests for recorder, using live upstream server
2016-03-11 11:12:25 -08:00
Ilya Kreymer
2003925b75
setup: fix pywb py3 version to 0.30.0, add coverage for recorder
2016-03-11 11:11:43 -08:00
Ilya Kreymer
3b3e190cf4
testing: use test mixins for class-scope temp directory, live server creation
...
use processes instead of threads for live server
2016-03-11 11:10:22 -08:00
Ilya Kreymer
46d013ab19
test redis: minor tweak to use @patch for fakeredis mock
2016-03-10 21:35:01 -08:00
Ilya Kreymer
c309637a3a
tests: webagg test tweaks, create TempDirTests for sharing tests that require a temp dir
2016-03-10 16:04:27 -08:00
Ilya Kreymer
7b847311d5
dir agg: include filename in dir source name
2016-03-10 15:51:01 -08:00
Ilya Kreymer
31fb2f926f
add recorder app, initial pass!
2016-03-09 14:33:36 -08:00
Ilya Kreymer
1499f0e611
add shared README.rst and coverage
2016-03-09 14:33:11 -08:00
Ilya Kreymer
34386578a5
shared setup: move webagg test to webagg/test
2016-03-09 14:29:14 -08:00
Ilya Kreymer
3477cb0bb5
drop process/thread mixin support (doesn't work as well on py2); could re-add processes only if need arises, but for now focusing on gevent
...
rename header Source-Coll -> WebAgg-Source-Coll
2016-03-08 10:56:03 -08:00
Ilya Kreymer
348fb133e0
add upstream/proxy tests
2016-03-08 10:29:59 -08:00
Ilya Kreymer
107ba9aabc
add ProxyLiveIndexSource for proxying upstream conn directly w/o a second index query
...
liveloader: if 'memento_url' key is set, then memento-datetime header must be present or it's an error response
liveindexsource: add option to specify custom live path (eg. prefix for caching)
fix test cases changed due to ia (todo: mock up all external data!)
2016-03-08 10:27:13 -08:00
Ilya Kreymer
c1895ae70f
loaders: return full WARC record in response, no need for upstream response handler
...
add UpstreamAggIndexSource to simplify upstream aggregator config, add test for upstream config
bottle app: wrap in a ResAppAgg, allow multiple bottle apps
py2: non-gevent concurrency not supported
2016-03-06 23:12:14 -08:00
Ilya Kreymer
0823ff4bd0
added 'upstream' handler for connecting to another webagg when 'upstream_url' is set
...
output 'is_live' as string in live index
2016-03-06 09:10:17 -08:00
Ilya Kreymer
20ebccc13e
handlers: return out_headers directly instead of setting bottle response, confining the bottle dependency to app.py (to allow alternate impl not using bottle)
...
param parsing: instead of setting custom _src_params and _all_params, use a custom ParamFormatter which will check param dict for params with prefix and custom name
2016-03-05 16:49:26 -08:00
Ilya Kreymer
bdda1b8c03
minor fixes for py2 support
2016-03-03 13:58:09 -08:00
Ilya Kreymer
896f81fd1c
Add README.rst
2016-03-03 12:09:17 -08:00
Ilya Kreymer
ed1d3555c3
rename rezag -> webagg
...
rename aggindexsource -> aggregator
2016-03-03 11:55:43 -08:00
Ilya Kreymer
98830147b5
add memento headers to all response loaders, use BaseLoader base class, update tests
...
for memento headers
2016-03-03 11:04:28 -08:00
Ilya Kreymer
65e969a492
errors and timeouts reported back to the user via ResErrors header
...
add new /index, /resource access point system
2016-03-02 18:13:13 -08:00
Ilya Kreymer
1f3763d02c
misc fixes: add route listing, more not found tests, timemap use file:// with ranges
2016-03-01 14:46:05 -08:00
Ilya Kreymer
008e5284b1
separate iter_sources from list_sources api
...
all errors returned as json block with error msg
tests for not found, invalid errors
2016-02-29 12:34:06 -08:00
Ilya Kreymer
68090d00c1
add routing setup via app.py
...
add full test suite for handlers and responseloaders, as well as timeouts
2016-02-28 14:33:08 -08:00
Ilya Kreymer
c88c5f4cca
add new package setup!
...
add tests and testdata, splitting mem and dir agg tests
2016-02-26 18:25:10 -08:00
Ilya Kreymer
398e8f1a77
inputrequest: add input request handling (direct wsgi headers) or as a prepared post request
...
add timemap link output
rename source_name -> source
2016-02-24 14:22:29 -08:00
Ilya Kreymer
1a0b2fba17
add aggregate index source and tests!
2016-02-22 13:30:12 -08:00
Ilya Kreymer
37198767ed
add utils, responseloader and liverec
2016-02-19 17:27:19 -08:00
Ilya Kreymer
baa02add69
add indexloader and tests, including file, redis, remote cdx, memento, and live sources
2016-02-19 17:25:54 -08:00