Ilya Kreymer
custom record: don't override WARC-Date if provided in request header,
return chosen WARC-Date in json response
2016-07-26 19:41:47 -04:00
Ilya Kreymer
recorder: support overriding get_params() in subclass
multiwarcwriter: support multiple warcs in same dir, support random component in path, and a custom
key template for selecting the current warc file, independent of the current directory
2016-06-07 12:55:04 -04:00
Ilya Kreymer
video loader support: add VideoLoader, which uses youtube-dl to create a metadata record
of video info. Activated with explicit content_type param 'application/'
2016-05-28 15:01:33 -07:00
Ilya Kreymer
recorder put custom record: add support for put/post of a custom record. If the put_record=
param is included, the request body is written as a record of the specified type.
move record creation functions to the warcwriter
add tests for custom record
2016-05-26 20:49:40 -07:00
Ilya Kreymer
recorder warcwriter: allow skipping writing of only request or only response by overriding _is_write_req and _is_write_resp in subclass
(todo: rethink the interface)
2016-04-15 02:19:34 +00:00
Ilya Kreymer
recorder: close_file() by params rather than exact path, update tests
2016-03-26 13:07:53 -04:00
Ilya Kreymer
filters: add extensible 'skip filters', with default filters to accept certain collections and to skip
recording of range requests. Recording can be skipped at request or response time
RespWrapper handles reading stream fully on close() (no need for old ReadFullyStream),
skips recording if read was interrupted/incomplete
writer: avoid writing duplicate content-length/content-type headers
2016-03-21 11:47:12 -07:00
Ilya Kreymer
rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00
Ilya Kreymer
recorder: add tests for single file writer, including file locking
dedup policy: support customizable dedup/skip/write policy plugins and add tests
2016-03-18 15:28:24 -07:00
Ilya Kreymer
recorder: check for empty input stream (support for direct proxy?)
2016-03-13 11:17:52 -07:00
Ilya Kreymer
reorg: move StreamIter to utils
2016-03-12 23:29:23 -08:00
Ilya Kreymer
recorder: clean up logging, ReadFullyStream moves to utils, get_request_uri to inputreq
2016-03-12 22:18:01 -08:00
Ilya Kreymer
recorder: add support for filtering collections to record by regex (default: .*)
add support for excluding certain headers when writing WARCs
tests: add first batch of tests for recorder, using live upstream server
2016-03-11 11:12:25 -08:00
Ilya Kreymer
add recorder app, initial pass!
2016-03-09 14:33:36 -08:00