Ilya Kreymer
d7c74b68de
video loader support: add VideoLoader, which uses youtube-dl to create a metadata record of video info. Activated with explicit content_type param 'application/vnd.youtube-dl_formats+json'
2016-05-28 15:01:33 -07:00
Ilya Kreymer
30f9d0aca7
recorder put custom record: add support for put/post of a custom record. If put_record= param is included, the request body is written to the specified record type.
move record creation functions to the warcwriter
add tests for custom record
2016-05-26 20:49:40 -07:00
Ilya Kreymer
45c8fcddbd
recorder: add max_idle_secs / close_idle_files() to close any open files that have not been modified for longer than the set threshold, in prep for webrecorder/webrecorder#92
indexer: add 'full_warc_prefix' for setting the full path prefix in add_warc_file() (e.g. for loading over HTTP) for webrecorder/webrecorder#95
2016-05-11 21:40:02 -07:00
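The idle-file cleanup described in the recorder commit above can be sketched as follows. This is a minimal, hypothetical illustration, not the recorder's actual code: `IdleFileTracker`, its `opener` argument and its `write()` method are assumptions; only `max_idle_secs` and `close_idle_files()` come from the commit message. The sketch records the time of each write and closes any file whose last write is older than the threshold.

```python
import io
import time


class IdleFileTracker:
    """Hypothetical sketch: close open files once they sit idle too long."""

    def __init__(self, max_idle_secs, opener=None):
        self.max_idle_secs = max_idle_secs
        # opener(path) -> writable file object; defaults to appending on disk
        self.opener = opener or (lambda path: open(path, 'ab'))
        # path -> [file object, time of last write]
        self.open_files = {}

    def write(self, path, data, now=None):
        now = time.time() if now is None else now
        entry = self.open_files.get(path)
        if entry is None:
            # (Re)open the file on demand, e.g. after it was closed as idle
            entry = [self.opener(path), now]
            self.open_files[path] = entry
        entry[0].write(data)
        entry[1] = now  # remember when this file was last modified

    def close_idle_files(self, now=None):
        """Close and forget every file not written to within max_idle_secs."""
        now = time.time() if now is None else now
        closed = []
        for path, (fh, last_modified) in list(self.open_files.items()):
            if now - last_modified > self.max_idle_secs:
                fh.close()
                del self.open_files[path]
                closed.append(path)
        return closed


if __name__ == '__main__':
    # In-memory buffers stand in for WARC files in this demo
    tracker = IdleFileTracker(max_idle_secs=10, opener=lambda path: io.BytesIO())
    tracker.write('a.warc.gz', b'record', now=100)
    tracker.write('b.warc.gz', b'record', now=105)
    print(tracker.close_idle_files(now=112))  # ['a.warc.gz']
```

Passing `now` explicitly keeps the logic testable; in practice something would call `close_idle_files()` periodically, and a later write to a closed path simply reopens it in append mode.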
Ilya Kreymer
94d6098238
app: separate json_encode() func
compat: py2 fixes
2016-05-11 11:38:59 -07:00
Ilya Kreymer
228ca58c5b
recorder: actually fix content-type on warcinfo, add to test!
2016-04-30 13:07:53 -07:00
Ilya Kreymer
0fbae1c7f8
recorder: ensure warcinfo record has a content-type
2016-04-30 10:19:20 -07:00
Ilya Kreymer
0b255819ff
recorder warcwriter: allow skipping writing of only request or only response by overriding _is_write_req and _is_write_resp in subclass
(todo: rethink the interface)
2016-04-15 02:19:34 +00:00
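The subclass-override pattern from the warcwriter commit above might look like this. It is a hypothetical sketch: `BaseWARCWriterSketch`, `write_req_resp()`, `_write_record()` and the hook arguments are assumptions standing in for the real writer; only the hook names `_is_write_req` and `_is_write_resp` come from the commit message. Each hook independently decides whether its record is written, so a subclass can drop requests or responses without touching the write path.

```python
class BaseWARCWriterSketch:
    """Hypothetical stand-in for the recorder's WARC writer."""

    def __init__(self):
        self.written = []  # records "written" so far (a list, for illustration)

    def _is_write_req(self, req, params):
        return True  # default: write request records

    def _is_write_resp(self, resp, params):
        return True  # default: write response records

    def _write_record(self, record):
        self.written.append(record)

    def write_req_resp(self, req, resp, params):
        # Each hook gates its own record; the order here is arbitrary
        if self._is_write_resp(resp, params):
            self._write_record(resp)
        if self._is_write_req(req, params):
            self._write_record(req)


class ResponseOnlyWriter(BaseWARCWriterSketch):
    """Skips request records by overriding only the request hook."""

    def _is_write_req(self, req, params):
        return False
```

As the commit itself notes, the interface was still provisional, so treat this only as an illustration of the override idea.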
Ilya Kreymer
d40edfc22d
warcwriter: add create_warcinfo_record() for creating a warcinfo record, and a SimpleTempWARCWriter for writing records to a temp buffer/file
2016-04-03 12:19:54 -07:00
Ilya Kreymer
01c21d3a43
recorder: redis indexer accepts arg list, supports separate redis and key_template args
add length param to add_urls_to_index() in redis indexer, return cdx list
2016-04-02 21:36:36 -07:00
Ilya Kreymer
7884d4394b
recorder: close_file() by params rather than exact path, update tests
2016-03-26 13:07:53 -04:00
Ilya Kreymer
ba66d0bb5e
recorder: use res_template() to resolve params, rename indexing method to add_urls_to_index
2016-03-23 23:55:21 -04:00
Ilya Kreymer
d38bb5a1fd
filters: add extensible 'skip filters', with default filters to accept certain collections and filter out recording of range requests. Recording can be skipped at request or response time.
RespWrapper handles reading the stream fully on close() (no need for the old ReadFullyStream), and skips recording if the read was interrupted/incomplete
writer: avoid writing duplicate content-length/content-type headers
2016-03-21 11:47:12 -07:00
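The 'skip filters' idea from the commit above, with a check at request time and another at response time, can be sketched as a list of filter objects where any one of them may veto recording. All names here (`SkipFilter`, `CollectionFilter`, `SkipRangeRequestFilter`, `skip_at_request`, `skip_at_response`, and the `'coll'` param key) are hypothetical and not the recorder's real API. Treating a `Range` request header or an HTTP 206 Partial Content status as a range request follows standard HTTP semantics.

```python
class SkipFilter:
    """Hypothetical base filter: never skips unless a subclass says so."""

    def skip_request(self, params, req_headers):
        return False

    def skip_response(self, params, status_code):
        return False


class CollectionFilter(SkipFilter):
    """Skip recording unless the target collection is in an accepted set."""

    def __init__(self, accept_colls):
        self.accept_colls = set(accept_colls)

    def skip_request(self, params, req_headers):
        return params.get('coll') not in self.accept_colls


class SkipRangeRequestFilter(SkipFilter):
    """Skip recording HTTP range requests and partial responses."""

    def skip_request(self, params, req_headers):
        # Header names are case-insensitive in HTTP
        return any(name.lower() == 'range' for name in req_headers)

    def skip_response(self, params, status_code):
        return status_code == 206  # 206 Partial Content


def skip_at_request(filters, params, req_headers):
    return any(f.skip_request(params, req_headers) for f in filters)


def skip_at_response(filters, params, status_code):
    return any(f.skip_response(params, status_code) for f in filters)
```

Keeping each filter small and combining them with `any()` makes the set extensible: a new rule is just another subclass appended to the list.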
Ilya Kreymer
c96e419341
recorder: ensure filename is also tracked by the indexer, add tests for redis file mapping
2016-03-19 10:24:28 -07:00
Ilya Kreymer
3452cf39e0
recorder: use more general MultiFileWARCWriter, supporting both keeping-file-open and one-warc-per-record use cases
2016-03-18 21:40:41 -07:00
Ilya Kreymer
e81457df5f
rename WARCRecorder -> WARCWriter, add optional max_size to single warc recorder
per-record recorder combines http response/req into single file
2016-03-18 19:49:14 -07:00