mirror of
https://github.com/webrecorder/pywb.git
synced 2025-03-15 08:04:49 +01:00
* docs work: - write warcserver and beginnings of recorder docs! - add cdx api docs! - add indexing docs - refactor architecture section, remove readme - update readme with better new features list, work-in-progress list - add placeholder docs for apps, indexing - remove unused readme - update README with better docs link, features
52 lines
1.6 KiB
ReStructuredText
52 lines
1.6 KiB
ReStructuredText
Recorder
|
|
========
|
|
|
|
The recorder component acts a proxy component, intercepting requests to and response from the :ref:`warcserver` and recording them
|
|
to a WARC file on disk.
|
|
|
|
The recorder uses the :class:`pywb.recorder.multifilewarcwriter.MultiFileWARCWriter` which extends the base :class:`warcio.warcwriter.WARCWriter` from :mod:`warcio` and provides support for:
|
|
|
|
* appending to multiple WARC files at once
|
|
|
|
* WARC 'rollover' based on maximum size idle time
|
|
|
|
* indexing (CDXJ) on write
|
|
|
|
|
|
Many of the features of the Recorder are created for use with Webrecorder project, although the core recorder is used to provide
|
|
a basic recording via ``/record/`` endpoint. (See: :ref:`recording-mode`)
|
|
|
|
|
|
Deduplication Filters
|
|
---------------------
|
|
|
|
The core recorder class provides for optional deduplication using the :class:`pywb.recorder.redisindexer.WritableRedisIndexer` class which requires Redis to store the index, and can be used to either:
|
|
|
|
* write duplicates responses.
|
|
|
|
* write ``revisit`` records.
|
|
|
|
* ignore duplicates and don't write to WARC.
|
|
|
|
|
|
Custom Filtering
|
|
----------------
|
|
|
|
The recorder filter system also includes a filtering system to allow for not writing certain requests and responses.
|
|
Filters include:
|
|
|
|
* Skipping by regex applied to source (``Warcserver-Source-Coll`` header from Warcserver)
|
|
|
|
* Skipping if ``Recorder-Skip: 1`` header is provided
|
|
|
|
* Skipping if ``Range`` request header is provided
|
|
|
|
* Filtering out certain HTTP headers, for example, http-only cookies
|
|
|
|
The additional recorder functionality will be enchanced in a future version.
|
|
|
|
For a more detailed examples, please consult the tests in :mod:`pywb.recorder.test.test_recorder`
|
|
|
|
|
|
|