1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-04-01 19:51:28 +02:00

13 Commits

Author SHA1 Message Date
Ilya Kreymer
bcb5bef39d Windows Build Fixes/Appveyor CI (#225)
windows build fixes: all tests should pass, ci with appveyor
- add appveyor.yml
- path fixes for windows, use os.path.join
- templates_dir: use '/' always for jinja2 paths
- auto colls: ensure chdir before deleting dir
- recorder: ensure warc writer is always closed
- recorder: disable locking in warcwriter on windows for now (read access not avail, shared
lock seems to not be working)
- zipnum: ensure block is closed after read!
- cached dir test: wait before adding file
- tests: adjust timeout tests to allow more leeway in timing
2017-08-05 17:12:16 -07:00
Ilya Kreymer
d12f715d81 refactor: split warcserver.utils into utils package:
- utils.io for stream/compression related utils
- utils.format for string formatting
- utils.memento for memento
- load_config -> utils.loaders.load_overlay_config
- also: use warcio.utils.to_native_str instead of utils.loaders.to_native_str
2017-06-05 17:43:46 -07:00
Ilya Kreymer
dbc56b864b Merge branch 'aggregator-improvements' into refactor2 2017-06-02 21:33:23 -07:00
Ilya Kreymer
06b1134be5 aggregator: support 'invert_sources' option to exclude source list, rather than include
can be set explicitly or via '!' on the sources list
tests: test invert sources
filters: include params to skip_response() filter
warc headers: change headers for recording from other source to: WARC-Source-URI and WARC-Created-Date
2017-06-01 07:45:02 -07:00
Ilya Kreymer
5930b2acb3 provenance improvement: don't store source id as provenance,
instead write full url to WARC-Recorded-From-URI, current datetime to WARC-Recorded-On-Date
warcwriter: ensure WARC-Recorded-* headers copied to request record as well
2017-05-25 13:26:17 -07:00
Ilya Kreymer
630911ef23 provenance improvement: don't store source id as provenance,
instead write full url to WARC-Recorded-From-URI, current datetime to WARC-Recorded-On-Date
warcwriter: ensure WARC-Recorded-* headers copied to request record as well
2017-05-25 13:14:10 -07:00
Ilya Kreymer
2907ed01c8 refactor:
- fix pywb.indexer, pywb.manager, pywb.recorder packages, tests pass
rename geventeventserver -> pywb.utils
move extract_post_query/append_post_query to inputrequest.PostQueryExtractor
remove to_native_str() in pywb.utils, redundant with warcio.utils version
remove obsolete readme, dockerfile
2017-05-23 16:43:41 -07:00
Ilya Kreymer
685804919a aggregator improvements:
- support for 'WARC-Provenance' header added to response
- aggregator supports source collection: if 'name:coll', coll parsed out and stored in 'param.<name>.src_coll' field,
available for use in remote index, included in provenance
- remoteindexsource: support interpolating '{src_coll}' in api_url and replay_url to allow handling src_coll
- recorder: CollectionFilter supports dict of prefixes to filter regexs, and catch-all '*' prefix
- recorder: provenance written to paired request record
- rename: ProxyIndexSource -> UpstreamIndexSource to avoid confusion with actual proxy
- autoapp: register_source() supports adding source classes at beginning of list
2017-05-21 05:37:58 +00:00
Ilya Kreymer
331320b17a aggregator improvements:
- support for 'WARC-Provenance' header added to response
- aggregator supports source collection: if 'name:coll', coll parsed out and stored in 'param.<name>.src_coll' field,
available for use in remote index, included in provenance
- remoteindexsource: support interpolating '{src_coll}' in api_url and replay_url to allow handling src_coll
- recorder: CollectionFilter supports dict of prefixes to filter regexs, and catch-all '*' prefix
- recorder: provenance written to paired request record
- rename: ProxyIndexSource -> UpstreamIndexSource to avoid confusion with actual proxy
- autoapp: register_source() supports adding source classes at beginning of list
2017-05-20 09:50:26 -07:00
Ilya Kreymer
147c3217dd update to warcio==1.3
recorder: use ArcWarcRecordLoader() for parsing response record
multifilewarcwriter: ensure digest is computed before trying to lookup revisits
2017-05-01 21:50:39 -07:00
Ilya Kreymer
d04f8fc2e3 recorder: cookie filter:
- update ExcludeSpecificHeaders() to be passed directly as a filter to warcio
- add ExcludeHttpOnlyCookiesHeader() to exclude only Set-Cookie if HttpOnly is present
remove unused code
2017-03-10 10:07:13 -08:00
Ilya Kreymer
0784e4e5aa spin-off warcio!
update imports to point to warcio
warcio rename fixes:
- ArcWarcRecord.stream -> raw_stream
- ArcWarcRecord.status_headers -> http_headers
- ArchiveLoadFailed single param init
2017-03-07 10:58:00 -08:00
Ilya Kreymer
1213466afb warc & recorder refactor: split BaseWARCWriter from MultiWARCWriter, move to warc/warcwriter.py, recorder/multifilewarcwriter.py
split indexing functionality from base warc iterator, move to archiveindexer.py
2017-03-01 14:18:44 -08:00