1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

84 Commits

Author SHA1 Message Date
Ilya Kreymer
80d9805a58 webagg: tests: flush fakeredis for reentrancy
utils: add load_config() with option for main and override configs
2016-05-19 17:01:09 -07:00
Ilya Kreymer
45c8fcddbd recorder: add max_idle_secs / close_idle_files() to close any open files that have not been modified longer than set threshold, in prep for webrecorder/webrecorder#92
indexer: add 'full_warc_prefix' for setting full path prefix in add_warc_file() (eg. for http load) for webrecorder/webrecorder#95
2016-05-11 21:40:02 -07:00
Ilya Kreymer
94d6098238 app: separate json_encode() func
compat: py2 fixes
2016-05-11 11:38:59 -07:00
Ilya Kreymer
c45f5cb749 webagg: use werkzeug routing instead of wrapping Bottle app 2016-05-10 16:31:44 -07:00
Ilya Kreymer
464eca2fa0 test apps: enable debugging for test apps
test recorder: write to a temp dir for each run
2016-05-06 16:33:18 -07:00
Ilya Kreymer
e64ae780c6 urlrewrite: improve POST request support for ikreymer/pywb#178 2016-05-06 16:32:13 -07:00
Ilya Kreymer
ab3af90df2 cookie_tracker: add support for redis-based subdomain cookie tracker, which temp caches cookies with Domain= set in redis and passes them upstream
when rewriting. addresses webrecorder/webrecorder#79
2016-05-04 16:39:47 -07:00
Ilya Kreymer
228ca58c5b recorer: actually fix content-type on warcinfo, add to test! 2016-04-30 13:07:53 -07:00
Ilya Kreymer
0fbae1c7f8 recorder: ensure warcinfo record has a content-type 2016-04-30 10:19:20 -07:00
Ilya Kreymer
7a0dd463cd webagg: responseloader: use urllib3 directly instead of requests to
take advantage of connection pooling w/o storing/sharing cookies
2016-04-27 10:16:54 -07:00
Ilya Kreymer
9010e52663 urlrewrite: refactor simpleapp to support live/record/replay 2016-04-27 10:15:48 -07:00
Ilya Kreymer
f119d05724 recorder: fix simplerec init
tests: improve tests for skipping request and response headers
2016-04-27 09:52:56 -07:00
Ilya Kreymer
a82e2785c7 tests: add basic test for rewriterapp 2016-04-25 14:29:28 -07:00
Ilya Kreymer
3b6cab1730 urlrewrite: remove dependency on bottle from rewriterapp,
add overridable error and query views, with extensible get_query_params() and process_cdx_query()
to extend cdx for query view
add get_top_url() for adding custom top_url for frame insert
add call_with_params() for adding custom params to environ
2016-04-25 12:05:43 -07:00
Ilya Kreymer
b056acd88e urlrewrite: add support for index query 2016-04-15 04:01:36 +00:00
Ilya Kreymer
0370470e68 urlrewrite: http range: support skipping record for range requests not starting at 0-
and performing async request,
support converting unbounded 0- to non-ranged and back
2016-04-15 02:21:39 +00:00
Ilya Kreymer
0b255819ff recorder warcwriter: allow skipping writing of only request or only response by overriding _is_write_req and _is_write_resp in subclass
(todo: rethink the interface)
2016-04-15 02:19:34 +00:00
Ilya Kreymer
a93f75dca2 webagg: add preliminary 'fuzzy matching' fallback support, currently enabled for all sources
(todo: need to only include sources that support it)
2016-04-15 02:18:20 +00:00
Ilya Kreymer
00bdddd1e9 recorder: SkipDupePolicy only skips if url is an exact match (not just by urlkey) 2016-04-07 10:44:05 -07:00
Ilya Kreymer
f4cc143dc7 urlrewrite: generalize support for overridable handle_custom_response() callback for handling modifiers (default support top-frame)
pass headers to add_custom_params, include error message on error if available
headers: use add_header() to support multiple headers with same name
is_ajax(): check for X-Pywb-Requested-With header to make as ajax and not pass to upstream
2016-04-07 10:39:12 -07:00
Ilya Kreymer
fa5d5e6bcc urlrewrite templates: add get_top_frame_params() callback for adding custom params for top frame,
also inject env['webrec.template_params'] if set
2016-04-05 02:45:00 -07:00
Ilya Kreymer
d40edfc22d warcwriter: add create_warcinfo_record() for creating a warcinfo and a SimpleTempWARCWriter for writing records to temp buff/file 2016-04-03 12:19:54 -07:00
Ilya Kreymer
fd76030cb3 urlrewriter: allow passing in existing jinja_env wrapper 2016-04-02 21:36:54 -07:00
Ilya Kreymer
01c21d3a43 recorder: redis indexer accepts arg list, supports separate redis and key_template args
add length param to add_urls_to_index() in redis indexer, return cdx list
2016-04-02 21:36:36 -07:00
Ilya Kreymer
6157cebcc9 testutils: when mock patching FakeStrictRedis, use a subclass with a shared pubsub (to match real redis) 2016-04-02 21:33:39 -07:00
Ilya Kreymer
ddee9236c6 webagg: rename key_prefix -> key_template 2016-04-02 21:33:23 -07:00
Ilya Kreymer
70fbb5f7a6 ulrewrite: fix typos, add full package paths 2016-03-28 22:59:22 -07:00
Ilya Kreymer
f12be3bc91 urlrewrite app: add bottle-based app, templateview separate from pywb webapp framework 2016-03-27 17:34:45 -04:00
Ilya Kreymer
017e9802f8 tests: fix fakeredis patch not running on test_handlers,
use exc str instead of repr for error message for consistency
all tests pass on py2 and py3 again!
2016-03-26 22:32:21 -04:00
Ilya Kreymer
0399cc1046 webagg app: support bottle debug properly as opt param 2016-03-26 22:30:47 -04:00
Ilya Kreymer
7884d4394b recorder: close_file() by params rather than exact path, update tests 2016-03-26 13:07:53 -04:00
Ilya Kreymer
7deba42851 add urlrewrite pywb-adapter PlatformHandler for using traditional pywb
setup with webrecorder components recorder and webagg
2016-03-24 16:33:03 -04:00
Ilya Kreymer
2bfe5d4f9e inputreq: only use REQUEST_URI if no SCRIPT_NAME is set (otherwise reconstruct the path) 2016-03-24 16:17:46 -04:00
Ilya Kreymer
b6e988d9a1 self-redirect: if 'status' is a 3xx, call raise_on_self_redirect() to check Location for exact url redirect.
supports both WARC and live loaders, addresses #1
2016-03-24 16:08:29 -04:00
Ilya Kreymer
61921d6c4a tests: add FakeRedisTests class mixin for patching in FakeRedis for tests 2016-03-24 10:45:48 -04:00
Ilya Kreymer
7cc772329c redis: add tests for RedisMultiKeyIndexSource 2016-03-24 10:44:14 -04:00
Ilya Kreymer
64b32dc57a redis support: add RedisMultiKeyIndexSource for using redis SCAN wildcard query and aggregate results from several
redis keys
2016-03-24 01:17:18 -04:00
Ilya Kreymer
e5ddf9d4f4 utils: res_template() supports extra params for interpolation 2016-03-23 23:58:49 -04:00
Ilya Kreymer
ba66d0bb5e recorder: use res_template() to resolve params, rename indexing method to add_urls_to_index 2016-03-23 23:55:21 -04:00
Ilya Kreymer
aa80cd6881 recorder: add simple recorder config indexing to redis 2016-03-21 11:50:01 -07:00
Ilya Kreymer
d38bb5a1fd filters: add extensible 'skip filters', with default filters to accept certain collections, filter out
recording of range requests. Opportunity to skip recording at request or response time
RespWrapper handles reading stream fully on close() (no need for old ReadFullyStream),
skips recording if read was interrupted/incomplete
writer: avoiding writing duplicate content-length/content-type headers
2016-03-21 11:47:12 -07:00
Ilya Kreymer
cbe7d1c981 webagg: add tests for RedisPathResolver and errors on missing warc, missing warc keys 2016-03-21 11:44:32 -07:00
Ilya Kreymer
22ead52604 webagg: convert StreamIter to generate, remove unused ReadFullyStream
loaders: add support for RedisResolver as well as PathPrefixResolver
inputreq: reconstruct_request() skips host header if already present
improve test app to include replay
2016-03-21 11:04:52 -07:00
Ilya Kreymer
4cf935abd1 directory agg: add CacheDirectoryAggregator to cache file listing, rescan dir only if changed 2016-03-19 20:34:09 -07:00
Ilya Kreymer
f5ee3c7bca inputreq: add reconstruct_request() to return a bytestring of the request, add test for inputreq 2016-03-19 20:32:37 -07:00
Ilya Kreymer
c96e419341 recorder: ensure filename is also tracked by the indexer, add tests
for redis file mapping
2016-03-19 10:24:28 -07:00
Ilya Kreymer
3452cf39e0 recorder: use more general MultiFileWARCWriter, supporting both keeping file open
and one-warc-per record use cases
2016-03-18 21:40:41 -07:00
Ilya Kreymer
e81457df5f rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00
Ilya Kreymer
b64be0dff1 recorder: add tests for single file writer, including file locking
dedup policy: support customizable dedup/skip/write policy plugins and add tests
2016-03-18 15:28:24 -07:00
Ilya Kreymer
cba8e4ee3a filters: more functional filter impl for header exclusion 2016-03-17 18:22:26 -07:00