mirror of https://github.com/webrecorder/pywb.git synced 2025-04-03 20:45:44 +02:00

10 Commits

Author SHA1 Message Date
Ilya Kreymer
93921aadb7 Recorder App Support (#241)
recording support: now available for dynamic collections via config
- config.yaml 'recorder: live' entry enables /record/ subpath, which records to any dynamic collection (can record from any collection, though usually live)
- autoindex refactor: simplified, standalone AutoIndexer() -- indexes any changed warc files to autoindex.cdxj
- windows autoindex support: also check for changed file size, as last modified time may not be changing
- manager: remove autoindex, now part of main cli
- tests: updated test_auto_colls with autoindex changes
- tests: add record/replay tests for recording and replay
2017-09-21 22:12:57 -07:00
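
The Windows autoindex note above (compare file size as well as last modified time) could be sketched roughly as follows. This is only an illustration, not the actual AutoIndexer code: `ChangeTracker` and `has_changed` are hypothetical names introduced here.

```python
import os


class ChangeTracker:
    """Hypothetical helper: remembers (mtime, size) for each path it is asked about."""

    def __init__(self):
        self._seen = {}

    def has_changed(self, path):
        """Return True if the path is new, or its mtime or size differs from the last check."""
        st = os.stat(path)
        current = (st.st_mtime, st.st_size)
        changed = self._seen.get(path) != current
        # Remember the latest state so the next call compares against it
        self._seen[path] = current
        return changed
```

Comparing the pair means an appended WARC file is still picked up on platforms where the modification time does not move, because the size will differ.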
Ilya Kreymer
772993ba53 Adaptive Streaming Improvements (#236)
* adaptive rewrite improvements:
- Add 'application/vnd.apple.mpegurl' as HLS type in rules.yaml and default_rewriter.py
- Support setting max resolution and max bandwidth to choose, defaults to 480x854 and 200000 respectively
- LiveWebLoader provides a get_custom_metadata for specifying WARC-JSON-Metadata header, per mime type (TODO: support customization via rules)
- When filtering, first limit by resolution (if set), then by bandwidth (if set), otherwise default to max bandwidth
- Max resolution/max bandwidth stored in WARC record under WARC-JSON-Metadata as 'adaptive_max_resolution' and 'adaptive_max_bandwidth' to ensure replayability. If absent, choose absolute max in manifest to be backwards compatible
- Add sample HLS and DASH manifests for testing, with and without max resolution/bandwidth settings.
2017-09-06 23:23:39 -07:00
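
One possible reading of the filtering order described above, as a minimal sketch rather than the rewriter's actual code. The `pick_variant` function and the dict layout of each variant are hypothetical, and comparing resolution by pixel count (width times height) is an assumption of this sketch.

```python
def pick_variant(variants, max_resolution=None, max_bandwidth=None):
    """Pick one manifest variant.

    variants: list of dicts with 'width', 'height', 'bandwidth' (hypothetical layout).
    max_resolution: pixel-count cap (assumed here to be width * height), or None.
    max_bandwidth: bandwidth cap, or None.
    """
    candidates = list(variants)

    # First limit by resolution, if a cap is set
    if max_resolution is not None:
        candidates = [v for v in candidates
                      if v["width"] * v["height"] <= max_resolution]

    # Then limit by bandwidth, if a cap is set
    if max_bandwidth is not None:
        candidates = [v for v in candidates
                      if v["bandwidth"] <= max_bandwidth]

    if not candidates:
        return None

    # With no caps this is the absolute max bandwidth in the manifest
    return max(candidates, key=lambda v: v["bandwidth"])
```

With neither cap set, the function returns the highest-bandwidth entry, which mirrors the backwards-compatible behavior described for records that lack the metadata.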
Ilya Kreymer
36abd032ce warcserver: logging: use 'warcserver' logger for index and response load errors
wbmementoindexsource: use timegate_url for initial head query to allow for different urls (proxy, etc..)
2017-07-03 23:25:25 -07:00
Ilya Kreymer
84eb070938 warcserver: support different default adapters, for live web and remote sources
warcserver.http.DefaultAdapters.live_adapter used if is_live, else DefaultAdapters.remote_adapter
tests: fix test to ignore order in dir listing
2017-07-02 03:58:55 +00:00
Ilya Kreymer
dd7c1bd752 warcserver: define default HTTPAdapter in warcserver.http.default_adapter, for use with both index sources and responseloader
responseloader uses existing pool from shared HTTPAdapter
fix tests: call_release_conn() checks if release_conn() exists before calling, else default to close()
2017-06-29 22:33:16 -07:00
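
The fallback described for call_release_conn() (use release_conn() when the stream has it, otherwise close()) might look roughly like this. It is a sketch of the idea only; the function name `release_or_close` is hypothetical and the real helper may be structured differently.

```python
def release_or_close(stream):
    """Return the connection to its pool if possible, otherwise just close the stream."""
    release = getattr(stream, "release_conn", None)
    if callable(release):
        # urllib3-style responses expose release_conn() to recycle the connection
        release()
    else:
        # Plain file-like objects (e.g. in tests) only have close()
        stream.close()
```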
Ilya Kreymer
1bd8a85a4d mementoindexsource: add 'connection: close' to ensure connection closed after memento timegate query!
io utils: StreamIter() supports custom closer
responseloader: use release_conn() instead of close() to recycle urllib3 connections!
2017-06-29 20:03:42 -07:00
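
A stream iterator that accepts a custom closer, as mentioned above, could be sketched like this. The names `stream_iter` and `leave_open` are hypothetical and the signature is an assumption, not the actual StreamIter() API.

```python
from contextlib import closing, contextmanager


def stream_iter(stream, size=8192, closer=closing):
    """Yield chunks from stream; `closer` is a context manager factory run around the read loop."""
    with closer(stream):
        while True:
            buff = stream.read(size)
            if not buff:
                break
            yield buff


@contextmanager
def leave_open(stream):
    """Example custom closer: does nothing on exit, so the caller keeps ownership of the stream."""
    yield stream
```

The default `closing` calls `close()` once iteration finishes. A caller that wants to recycle a pooled connection could pass a closer that calls `release_conn()` instead.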
Ilya Kreymer
d12f715d81 refactor: split warcserver.utils into utils package:
- utils.io for stream/compression related utils
- utils.format for string formatting
- utils.memento for memento
- load_config -> utils.loaders.load_overlay_config
- also: use warcio.utils.to_native_str instead of utils.loaders.to_native_str
2017-06-05 17:43:46 -07:00
Ilya Kreymer
dbc56b864b Merge branch 'aggregator-improvements' into refactor2
2017-06-02 21:33:23 -07:00
Ilya Kreymer
5930b2acb3 provenance improvement: don't store source id as provenance,
instead write full url to WARC-Recorded-From-URI, current datetime to WARC-Recorded-On-Date
warcwriter: ensure WARC-Recorded-* headers copied to request record as well
2017-05-25 13:26:17 -07:00
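
A rough sketch of the provenance headers described above. Only the two header names come from the commit message. The `provenance_headers` helper and the ISO 8601 UTC date format are assumptions made for illustration.

```python
from datetime import datetime, timezone


def provenance_headers(source_url, now=None):
    """Build the WARC-Recorded-* headers for a record fetched from source_url."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        # Full url of the source, rather than a short source id
        "WARC-Recorded-From-URI": source_url,
        # Date format is an assumption of this sketch
        "WARC-Recorded-On-Date": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def copy_provenance(response_headers, request_headers):
    """Copy any WARC-Recorded-* headers from the response record onto the request record."""
    for name, value in response_headers.items():
        if name.startswith("WARC-Recorded-"):
            request_headers[name] = value
    return request_headers
```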
Ilya Kreymer
ad33dc6728 refactor: webagg -> warcserver rename
- ResAggApp -> BaseWarcServer
- AutoApp -> WarcServer
- move index related files to warcserver.index package, tests to warcserver.index.test
- move resource loading related files to warcserver.resource package, tests to warcserver.resource.test
- pywb.cdx -> pywb.warcserver.index
- split pywb.warc -> pywb.warcserver.resource or pywb.indexer (for cdx generation)
- bump to 0.51.0 for now!
- tests for pywb.warcserver should be working
2017-05-23 09:21:43 -07:00