1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-20 10:49:11 +01:00

21 Commits

Author SHA1 Message Date
Ilya Kreymer
9588e8622f responseloader: quote/unquote Webagg-Source-Coll header as source may contain unicode chars 2016-07-23 21:57:24 -04:00
Ilya Kreymer
c1d7111841 webagg: store original 'source' value in cdx for properly mapping in WARC file resolver
error handling: ensure 'last_exc' is a string
2016-06-14 00:13:01 -04:00
Ilya Kreymer
3fec766e39 webagg: redis lookup: if url contains wildcard, scan redis keys to check multiple keys until one is found
webagg tests: fix test to include mime in live cdx
2016-06-07 12:54:28 -04:00
Ilya Kreymer
d7c74b68de video loader support: add VideoLoader, which uses youtube-dl to create a metadata record
of video info. Activated with explicit content_type param 'application/vnd.youtube-dl_formats+json'
2016-05-28 15:01:33 -07:00
Ilya Kreymer
ea3efdf84d responseloader: use PreparedRequest() to ensure url properly formatted
tests: update tests for latest, live data
2016-05-24 18:01:44 -07:00
Ilya Kreymer
7a0dd463cd webagg: responseloader: use urllib3 directly instead of requests to
take advantage of connection pooling w/o storing/sharing cookies
2016-04-27 10:16:54 -07:00
Ilya Kreymer
a93f75dca2 webagg: add preliminary 'fuzzy matching' fallback support, currently enabled for all sources
(todo: need to only include sources that support it)
2016-04-15 02:18:20 +00:00
Ilya Kreymer
017e9802f8 tests: fix fakeredis patch not running on test_handlers,
use exc str instead of repr for error message for consistency
all tests pass on py2 and py3 again!
2016-03-26 22:32:21 -04:00
Ilya Kreymer
b6e988d9a1 self-redirect: if 'status' is a 3xx, call raise_on_self_redirect() to check Location for exact url redirect.
supports both WARC and live loaders, addresses #1
2016-03-24 16:08:29 -04:00
Ilya Kreymer
cbe7d1c981 webagg: add tests for RedisPathResolver and errors on missing warc, missing warc keys 2016-03-21 11:44:32 -07:00
Ilya Kreymer
22ead52604 webagg: convert StreamIter to generate, remove unused ReadFullyStream
loaders: add support for RedisResolver as well as PathPrefixResolver
inputreq: reconstruct_request() skips host header if already present
improve test app to include replay
2016-03-21 11:04:52 -07:00
Ilya Kreymer
4cf935abd1 directory agg: add CacheDirectoryAggregator to cache file listing, rescan dir only if changed 2016-03-19 20:34:09 -07:00
Ilya Kreymer
709d2b1ea2 reorg: move StreamIter to utils 2016-03-12 23:29:23 -08:00
Ilya Kreymer
49b6ae78a8 live loader: remove liverec (doesn't work well with gevent), use regular requests
instead of overriden version.
reconstruct header block from httplib header pairs list
move ReadFullyStream to utils
2016-03-12 22:15:24 -08:00
Ilya Kreymer
3477cb0bb5 drop process/thread mixin support (doesn't work as well on py2) could readd processes only if need arises, but for now focusing on gevent
rename header Source-Coll -> WebAgg-Source-Coll
2016-03-08 10:56:03 -08:00
Ilya Kreymer
107ba9aabc add ProxyLiveIndexSource for proxying upstream conn directly w/o a second index query
liveloader: if 'memento_url' key is set, then memento-datetime header must be present or its an error response
liveindexsource: add option to specify custom live path (eg. prefix for cacheing)
fix test cases changed due to ia (todo: mock up all external data!)
2016-03-08 10:27:13 -08:00
Ilya Kreymer
c1895ae70f loaders: return full WARC record in response, no need for upstream response handler
add UpstreamAggIndexSource to simplify upstream aggregator config, add test for upstream config
bottle app: wrap in a ResAppAgg, allow multiple bottle apps
py2: non-gevent concurrency not supported
2016-03-06 23:12:14 -08:00
Ilya Kreymer
0823ff4bd0 added 'upstream' handler for connecting to another webagg when 'upstream_url' is set
output 'is_live' as string in live index
2016-03-06 09:10:17 -08:00
Ilya Kreymer
20ebccc13e handlers: return out_headers directly instead of setting bottle response, contains bottle dependency to app.py (to allow alternate impl not using bottle)
param parsing: instead of setting custom _src_params and _all_params, use a custom ParamFormatter which will check param dict for params with prefix and custom name
2016-03-05 16:49:26 -08:00
Ilya Kreymer
bdda1b8c03 minor fixes for py2 support 2016-03-03 13:58:09 -08:00
Ilya Kreymer
ed1d3555c3 rename rezag -> webagg
rename aggindexsource -> aggregator
2016-03-03 11:55:43 -08:00