1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

77 Commits

Author SHA1 Message Date
Ilya Kreymer
228ca58c5b recorer: actually fix content-type on warcinfo, add to test! 2016-04-30 13:07:53 -07:00
Ilya Kreymer
0fbae1c7f8 recorder: ensure warcinfo record has a content-type 2016-04-30 10:19:20 -07:00
Ilya Kreymer
7a0dd463cd webagg: responseloader: use urllib3 directly instead of requests to
take advantage of connection pooling w/o storing/sharing cookies
2016-04-27 10:16:54 -07:00
Ilya Kreymer
9010e52663 urlrewrite: refactor simpleapp to support live/record/replay 2016-04-27 10:15:48 -07:00
Ilya Kreymer
f119d05724 recorder: fix simplerec init
tests: improve tests for skipping request and response headers
2016-04-27 09:52:56 -07:00
Ilya Kreymer
a82e2785c7 tests: add basic test for rewriterapp 2016-04-25 14:29:28 -07:00
Ilya Kreymer
3b6cab1730 urlrewrite: remove dependency on bottle from rewriterapp,
add overridable error and query views, with extensible get_query_params() and process_cdx_query()
to extend cdx for query view
add get_top_url() for adding custom top_url for frame insert
add call_with_params() for adding custom params to environ
2016-04-25 12:05:43 -07:00
Ilya Kreymer
b056acd88e urlrewrite: add support for index query 2016-04-15 04:01:36 +00:00
Ilya Kreymer
0370470e68 urlrewrite: http range: support skipping record for range requests not starting at 0-
and performing async request,
support converting unbounded 0- to non-ranged and back
2016-04-15 02:21:39 +00:00
Ilya Kreymer
0b255819ff recorder warcwriter: allow skipping writing of only request or only response by overriding _is_write_req and _is_write_resp in subclass
(todo: rethink the interface)
2016-04-15 02:19:34 +00:00
Ilya Kreymer
a93f75dca2 webagg: add preliminary 'fuzzy matching' fallback support, currently enabled for all sources
(todo: need to only include sources that support it)
2016-04-15 02:18:20 +00:00
Ilya Kreymer
00bdddd1e9 recorder: SkipDupePolicy only skips if url is an exact match (not just by urlkey) 2016-04-07 10:44:05 -07:00
Ilya Kreymer
f4cc143dc7 urlrewrite: generalize support for overridable handle_custom_response() callback for handling modifiers (default support top-frame)
pass headers to add_custom_params, include error message on error if available
headers: use add_header() to support multiple headers with same name
is_ajax(): check for X-Pywb-Requested-With header to make as ajax and not pass to upstream
2016-04-07 10:39:12 -07:00
Ilya Kreymer
fa5d5e6bcc urlrewrite templates: add get_top_frame_params() callback for adding custom params for top frame,
also inject env['webrec.template_params'] if set
2016-04-05 02:45:00 -07:00
Ilya Kreymer
d40edfc22d warcwriter: add create_warcinfo_record() for creating a warcinfo and a SimpleTempWARCWriter for writing records to temp buff/file 2016-04-03 12:19:54 -07:00
Ilya Kreymer
fd76030cb3 urlrewriter: allow passing in existing jinja_env wrapper 2016-04-02 21:36:54 -07:00
Ilya Kreymer
01c21d3a43 recorder: redis indexer accepts arg list, supports separate redis and key_template args
add length param to add_urls_to_index() in redis indexer, return cdx list
2016-04-02 21:36:36 -07:00
Ilya Kreymer
6157cebcc9 testutils: when mock patching FakeStrictRedis, use a subclass with a shared pubsub (to match real redis) 2016-04-02 21:33:39 -07:00
Ilya Kreymer
ddee9236c6 webagg: rename key_prefix -> key_template 2016-04-02 21:33:23 -07:00
Ilya Kreymer
70fbb5f7a6 ulrewrite: fix typos, add full package paths 2016-03-28 22:59:22 -07:00
Ilya Kreymer
f12be3bc91 urlrewrite app: add bottle-based app, templateview separate from pywb webapp framework 2016-03-27 17:34:45 -04:00
Ilya Kreymer
017e9802f8 tests: fix fakeredis patch not running on test_handlers,
use exc str instead of repr for error message for consistency
all tests pass on py2 and py3 again!
2016-03-26 22:32:21 -04:00
Ilya Kreymer
0399cc1046 webagg app: support bottle debug properly as opt param 2016-03-26 22:30:47 -04:00
Ilya Kreymer
7884d4394b recorder: close_file() by params rather than exact path, update tests 2016-03-26 13:07:53 -04:00
Ilya Kreymer
7deba42851 add urlrewrite pywb-adapter PlatformHandler for using traditional pywb
setup with webrecorder components recorder and webagg
2016-03-24 16:33:03 -04:00
Ilya Kreymer
2bfe5d4f9e inputreq: only use REQUEST_URI if no SCRIPT_NAME is set (otherwise reconstruct the path) 2016-03-24 16:17:46 -04:00
Ilya Kreymer
b6e988d9a1 self-redirect: if 'status' is a 3xx, call raise_on_self_redirect() to check Location for exact url redirect.
supports both WARC and live loaders, addresses #1
2016-03-24 16:08:29 -04:00
Ilya Kreymer
61921d6c4a tests: add FakeRedisTests class mixin for patching in FakeRedis for tests 2016-03-24 10:45:48 -04:00
Ilya Kreymer
7cc772329c redis: add tests for RedisMultiKeyIndexSource 2016-03-24 10:44:14 -04:00
Ilya Kreymer
64b32dc57a redis support: add RedisMultiKeyIndexSource for using redis SCAN wildcard query and aggregate results from several
redis keys
2016-03-24 01:17:18 -04:00
Ilya Kreymer
e5ddf9d4f4 utils: res_template() supports extra params for interpolation 2016-03-23 23:58:49 -04:00
Ilya Kreymer
ba66d0bb5e recorder: use res_template() to resolve params, rename indexing method to add_urls_to_index 2016-03-23 23:55:21 -04:00
Ilya Kreymer
aa80cd6881 recorder: add simple recorder config indexing to redis 2016-03-21 11:50:01 -07:00
Ilya Kreymer
d38bb5a1fd filters: add extensible 'skip filters', with default filters to accept certain collections, filter out
recording of range requests. Opportunity to skip recording at request or response time
RespWrapper handles reading stream fully on close() (no need for old ReadFullyStream),
skips recording if read was interrupted/incomplete
writer: avoiding writing duplicate content-length/content-type headers
2016-03-21 11:47:12 -07:00
Ilya Kreymer
cbe7d1c981 webagg: add tests for RedisPathResolver and errors on missing warc, missing warc keys 2016-03-21 11:44:32 -07:00
Ilya Kreymer
22ead52604 webagg: convert StreamIter to generate, remove unused ReadFullyStream
loaders: add support for RedisResolver as well as PathPrefixResolver
inputreq: reconstruct_request() skips host header if already present
improve test app to include replay
2016-03-21 11:04:52 -07:00
Ilya Kreymer
4cf935abd1 directory agg: add CacheDirectoryAggregator to cache file listing, rescan dir only if changed 2016-03-19 20:34:09 -07:00
Ilya Kreymer
f5ee3c7bca inputreq: add reconstruct_request() to return a bytestring of the request, add test for inputreq 2016-03-19 20:32:37 -07:00
Ilya Kreymer
c96e419341 recorder: ensure filename is also tracked by the indexer, add tests
for redis file mapping
2016-03-19 10:24:28 -07:00
Ilya Kreymer
3452cf39e0 recorder: use more general MultiFileWARCWriter, supporting both keeping file open
and one-warc-per record use cases
2016-03-18 21:40:41 -07:00
Ilya Kreymer
e81457df5f rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00
Ilya Kreymer
b64be0dff1 recorder: add tests for single file writer, including file locking
dedup policy: support customizable dedup/skip/write policy plugins and add tests
2016-03-18 15:28:24 -07:00
Ilya Kreymer
cba8e4ee3a filters: more functional filter impl for header exclusion 2016-03-17 18:22:26 -07:00
Ilya Kreymer
58e8c709aa docker: add initial docker-compose, webagg Dockerfile 2016-03-16 18:42:15 -07:00
Ilya Kreymer
8dc59ef6bd webagg: add test for live server config 2016-03-13 16:53:39 -07:00
Ilya Kreymer
06978bd8d2 recorder: check for empty input stream (support for direct proxy?) 2016-03-13 11:17:52 -07:00
Ilya Kreymer
709d2b1ea2 reorg: move StreamIter to utils 2016-03-12 23:29:23 -08:00
Ilya Kreymer
7a828017d1 recorder: clean up logging, ReadFullyStream moves to utils, get_request_uri to inputreq 2016-03-12 22:18:01 -08:00
Ilya Kreymer
49b6ae78a8 live loader: remove liverec (doesn't work well with gevent), use regular requests
instead of overriden version.
reconstruct header block from httplib header pairs list
move ReadFullyStream to utils
2016-03-12 22:15:24 -08:00
Ilya Kreymer
9adb8da3b7 recorder: add support for filtering collections to record by regex (default: .*)
add support for excluding certain headers when writing WARCs
tests: add first batch of tests for recorder, using live upstream server
2016-03-11 11:12:25 -08:00