mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

68 Commits

Author SHA1 Message Date
Ilya Kreymer
0b255819ff recorder warcwriter: allow skipping writing of only request or only response by overriding _is_write_req and _is_write_resp in subclass
(todo: rethink the interface)
2016-04-15 02:19:34 +00:00
Ilya Kreymer
a93f75dca2 webagg: add preliminary 'fuzzy matching' fallback support, currently enabled for all sources
(todo: need to only include sources that support it)
2016-04-15 02:18:20 +00:00
Ilya Kreymer
00bdddd1e9 recorder: SkipDupePolicy only skips if url is an exact match (not just by urlkey) 2016-04-07 10:44:05 -07:00
Ilya Kreymer
f4cc143dc7 urlrewrite: generalize support for overridable handle_custom_response() callback for handling modifiers (default support top-frame)
pass headers to add_custom_params, include error message on error if available
headers: use add_header() to support multiple headers with same name
is_ajax(): check for X-Pywb-Requested-With header to mark as ajax and not pass to upstream
2016-04-07 10:39:12 -07:00
Ilya Kreymer
fa5d5e6bcc urlrewrite templates: add get_top_frame_params() callback for adding custom params for top frame,
also inject env['webrec.template_params'] if set
2016-04-05 02:45:00 -07:00
Ilya Kreymer
d40edfc22d warcwriter: add create_warcinfo_record() for creating a warcinfo and a SimpleTempWARCWriter for writing records to temp buff/file 2016-04-03 12:19:54 -07:00
Ilya Kreymer
fd76030cb3 urlrewriter: allow passing in existing jinja_env wrapper 2016-04-02 21:36:54 -07:00
Ilya Kreymer
01c21d3a43 recorder: redis indexer accepts arg list, supports separate redis and key_template args
add length param to add_urls_to_index() in redis indexer, return cdx list
2016-04-02 21:36:36 -07:00
Ilya Kreymer
6157cebcc9 testutils: when mock patching FakeStrictRedis, use a subclass with a shared pubsub (to match real redis) 2016-04-02 21:33:39 -07:00
Ilya Kreymer
ddee9236c6 webagg: rename key_prefix -> key_template 2016-04-02 21:33:23 -07:00
Ilya Kreymer
70fbb5f7a6 urlrewrite: fix typos, add full package paths 2016-03-28 22:59:22 -07:00
Ilya Kreymer
f12be3bc91 urlrewrite app: add bottle-based app, templateview separate from pywb webapp framework 2016-03-27 17:34:45 -04:00
Ilya Kreymer
017e9802f8 tests: fix fakeredis patch not running on test_handlers,
use exc str instead of repr for error message for consistency
all tests pass on py2 and py3 again!
2016-03-26 22:32:21 -04:00
Ilya Kreymer
0399cc1046 webagg app: support bottle debug properly as opt param 2016-03-26 22:30:47 -04:00
Ilya Kreymer
7884d4394b recorder: close_file() by params rather than exact path, update tests 2016-03-26 13:07:53 -04:00
Ilya Kreymer
7deba42851 add urlrewrite pywb-adapter PlatformHandler for using traditional pywb
setup with webrecorder components recorder and webagg
2016-03-24 16:33:03 -04:00
Ilya Kreymer
2bfe5d4f9e inputreq: only use REQUEST_URI if no SCRIPT_NAME is set (otherwise reconstruct the path) 2016-03-24 16:17:46 -04:00
Ilya Kreymer
b6e988d9a1 self-redirect: if 'status' is a 3xx, call raise_on_self_redirect() to check Location for exact url redirect.
supports both WARC and live loaders, addresses #1
2016-03-24 16:08:29 -04:00
Ilya Kreymer
61921d6c4a tests: add FakeRedisTests class mixin for patching in FakeRedis for tests 2016-03-24 10:45:48 -04:00
Ilya Kreymer
7cc772329c redis: add tests for RedisMultiKeyIndexSource 2016-03-24 10:44:14 -04:00
Ilya Kreymer
64b32dc57a redis support: add RedisMultiKeyIndexSource for using redis SCAN wildcard query and aggregate results from several
redis keys
2016-03-24 01:17:18 -04:00
Ilya Kreymer
e5ddf9d4f4 utils: res_template() supports extra params for interpolation 2016-03-23 23:58:49 -04:00
Ilya Kreymer
ba66d0bb5e recorder: use res_template() to resolve params, rename indexing method to add_urls_to_index 2016-03-23 23:55:21 -04:00
Ilya Kreymer
aa80cd6881 recorder: add simple recorder config indexing to redis 2016-03-21 11:50:01 -07:00
Ilya Kreymer
d38bb5a1fd filters: add extensible 'skip filters', with default filters to accept certain collections, filter out
recording of range requests. Opportunity to skip recording at request or response time
RespWrapper handles reading stream fully on close() (no need for old ReadFullyStream),
skips recording if read was interrupted/incomplete
writer: avoid writing duplicate content-length/content-type headers
2016-03-21 11:47:12 -07:00
Ilya Kreymer
cbe7d1c981 webagg: add tests for RedisPathResolver and errors on missing warc, missing warc keys 2016-03-21 11:44:32 -07:00
Ilya Kreymer
22ead52604 webagg: convert StreamIter to generator, remove unused ReadFullyStream
loaders: add support for RedisResolver as well as PathPrefixResolver
inputreq: reconstruct_request() skips host header if already present
improve test app to include replay
2016-03-21 11:04:52 -07:00
Ilya Kreymer
4cf935abd1 directory agg: add CacheDirectoryAggregator to cache file listing, rescan dir only if changed 2016-03-19 20:34:09 -07:00
Ilya Kreymer
f5ee3c7bca inputreq: add reconstruct_request() to return a bytestring of the request, add test for inputreq 2016-03-19 20:32:37 -07:00
Ilya Kreymer
c96e419341 recorder: ensure filename is also tracked by the indexer, add tests
for redis file mapping
2016-03-19 10:24:28 -07:00
Ilya Kreymer
3452cf39e0 recorder: use more general MultiFileWARCWriter, supporting both keeping file open
and one-warc-per record use cases
2016-03-18 21:40:41 -07:00
Ilya Kreymer
e81457df5f rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00
Ilya Kreymer
b64be0dff1 recorder: add tests for single file writer, including file locking
dedup policy: support customizable dedup/skip/write policy plugins and add tests
2016-03-18 15:28:24 -07:00
Ilya Kreymer
cba8e4ee3a filters: more functional filter impl for header exclusion 2016-03-17 18:22:26 -07:00
Ilya Kreymer
58e8c709aa docker: add initial docker-compose, webagg Dockerfile 2016-03-16 18:42:15 -07:00
Ilya Kreymer
8dc59ef6bd webagg: add test for live server config 2016-03-13 16:53:39 -07:00
Ilya Kreymer
06978bd8d2 recorder: check for empty input stream (support for direct proxy?) 2016-03-13 11:17:52 -07:00
Ilya Kreymer
709d2b1ea2 reorg: move StreamIter to utils 2016-03-12 23:29:23 -08:00
Ilya Kreymer
7a828017d1 recorder: clean up logging, ReadFullyStream moves to utils, get_request_uri to inputreq 2016-03-12 22:18:01 -08:00
Ilya Kreymer
49b6ae78a8 live loader: remove liverec (doesn't work well with gevent), use regular requests
instead of overridden version.
reconstruct header block from httplib header pairs list
move ReadFullyStream to utils
2016-03-12 22:15:24 -08:00
Ilya Kreymer
9adb8da3b7 recorder: add support for filtering collections to record by regex (default: .*)
add support for excluding certain headers when writing WARCs
tests: add first batch of tests for recorder, using live upstream server
2016-03-11 11:12:25 -08:00
Ilya Kreymer
2003925b75 setup: fix pywb py3 version to 0.30.0, add coverage for recorder 2016-03-11 11:11:43 -08:00
Ilya Kreymer
3b3e190cf4 testing: use test mixins for class-scope temp directory, live server creation
use processes instead of threads for live server
2016-03-11 11:10:22 -08:00
Ilya Kreymer
46d013ab19 test redis: minor tweak to use @patch for fakeredis mock 2016-03-10 21:35:01 -08:00
Ilya Kreymer
c309637a3a tests: webagg test tweaks, create TempDirTests for sharing tests that require a temp dir 2016-03-10 16:04:27 -08:00
Ilya Kreymer
7b847311d5 dir agg: include filename in dir source name 2016-03-10 15:51:01 -08:00
Ilya Kreymer
31fb2f926f add recorder app, initial pass! 2016-03-09 14:33:36 -08:00
Ilya Kreymer
1499f0e611 add shared README.rst and coverage 2016-03-09 14:33:11 -08:00
Ilya Kreymer
34386578a5 shared setup: move webagg test to webagg/test 2016-03-09 14:29:14 -08:00
Ilya Kreymer
3477cb0bb5 drop process/thread mixin support (doesn't work as well on py2); could re-add processes only if the need arises, but focusing on gevent for now
rename header Source-Coll -> WebAgg-Source-Coll
2016-03-08 10:56:03 -08:00