1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

12 Commits

Author SHA1 Message Date
Ilya Kreymer
32c1e6c85b
Brotli: Don't accept brotli if library can't be loaded. (#444)
* brotli: if the brotli module can not be loaded, print warning
and also remove `br` from any Accept-Encoding header to avoid recording with brotli, addresses #434
2019-02-19 17:19:24 -08:00
eszense
6a2423e754 Add recorder option to filter source collection (#368)
* Add source_filter option to recorder.

* Add test and docs for source_filter option.

* Update test_record_replay.py - Split source_filter test into skip existing and new recording
2018-08-24 17:57:47 -07:00
Ilya Kreymer
bef63b4c6c
Local httpbin tests + LiveIndexSource improvement (#318)
tests and LiveIndexSource improvements:
- run local instance of httpbin in separate gevent server for any httpbin.org requests
- LiveIndexSource: has overridable get_load_url(), also use 'load_url' for HEAD check, remove unused proxy_url
- test update: add HttpBinLiveTests which patches LiveIndexSource.get_load_url() to redirect httpbin requests to local instance
- test update: just use httpbin.org/get instead of httpbin.org/anything, unsupported in older version (0.5.0) require for windows support
- setup: add 'httpbin==0.5.0' to test requires, remove jinja2 pin to old version
2018-04-28 18:20:37 -07:00
Ilya Kreymer
8d9951bc7b misc test fixes: make record_replay tests for consistent, use different url to ensure consistent ordering
fakeredistests: fix for fakenewredis, clear fakeredis databases and pubsub list
2018-03-29 21:43:37 -07:00
Ilya Kreymer
a954a5470f HEAD requests: fix pywb recording & replay of HEAD requests (force payload of 0 instead of content-length if HEAD request from live web)
tests: fix socks-proxy test to fast-fail to a random unused port to detect proxy hook is enabled
2018-01-29 16:34:25 -08:00
Ilya Kreymer
459cd706d3 include the collection in Memento Link outputs: (#259)
* include the collection in Memento Link outputs:
- add new cdx 'source-coll' field, storing only the collection
- ensure rel="collection" property included in the TimeMap and Link header
- tests: update all tests to include the 'source-coll' property
- docs: add 'collection provenance' to auto-all collection configuration docs
2017-10-23 15:33:23 -07:00
Ilya Kreymer
1bfba09c94 config: proxy and recorder improvements
- proxy and recorder config loaded from 'proxy' and 'recorder' string or dicts in config
- proxy settings loaded from config, wsgiproxmiddleware applied within main init path
- cli --proxy-record add to indicate recording, optional dict to set options
- optional recorder dict to configure other recorder options, file max_size, filename_template, etc..
- proxy tests: add proxy cli tests
- recorder tests: add recorder custom config test
2017-10-02 15:54:08 -07:00
Ilya Kreymer
924b983a8f dyn collection and all coll improvements: (#69)
support dynamic collections, all collection with remote archives (eg. s3:// paths)
- warcserver: allow custom dynamic collections index and archive path templates via 'dyn_index_path' and 'dyn_archive_path'
- pathresolver: allow resolving wildcard path prefixes with collection, to support remote paths and avoid globbing
- warcserver: don't add fixed collections dir to source to support resolving wildcard
- pathresolver: add wildcard resolving s3 path test
- referrer unrewrite: ensure referrer not empty
2017-09-29 04:20:51 +00:00
Ilya Kreymer
02f8fa9ff3 windows: fix file path to/from file:// url conversion, add
from_file_url() and use to_file_url() more consistently
resolvers: make_best_resolver() handles file:// urls, but not
PrefixResolver itself
2017-09-28 08:37:04 -07:00
Ilya Kreymer
a870f7e91a memento timemap and test improvements:
- windows: fix paths for pathresolver test on windows
- timemap: add tests for all collection timemap, add cdxj timemap test
- timemap: only add original, timegate links for 'link' timemap
2017-09-28 07:15:58 -07:00
Ilya Kreymer
a32c6f089c auto-all aggregate collection support: (#69)
- enabled with 'all_coll' in config or --all-coll cli option, eg. --all-coll all to enable
- supported for replay, timemap and cdx endpoints, uses wildcard '*' for coll name with directory aggregator
- tests: record/replay tests updated to replay via all collection, check all collection cdxj
2017-09-28 02:08:31 -07:00
Ilya Kreymer
93921aadb7 Recorder App Support (#241)
recording support: now available for dynamic collections via config
- config.yaml 'recorder: live' entry enables /record/ subpath which records to any dynamic collections (can record from any collection, though usually live)
- autoindex refactor: simplified, standalone AutoIndexer() -- indexes any changed warc files to autoindex.cdxj
- windows autoindex support: also check for changed file size, as last modified time may not be changing
- manager: remove autoindex, now part of main cli
- tests: updated test_auto_colls with autoindex changes
- tests: add record/replay tests for recording and replay
2017-09-21 22:12:57 -07:00