mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

.. _recorder:

Recorder
========

The recorder component acts as a proxy, intercepting requests to and responses from the :ref:`warcserver` and recording them
to a WARC file on disk.

The recorder uses the :class:`pywb.recorder.multifilewarcwriter.MultiFileWARCWriter`, which extends the base :class:`warcio.warcwriter.WARCWriter` from :mod:`warcio` and provides support for:

* appending to multiple WARC files at once
* WARC 'rollover' based on maximum size or idle time
* indexing (CDXJ) on write

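
The rollover rule can be illustrated with a minimal, standard-library-only sketch. The ``RolloverPolicy`` class and its ``max_size`` and ``max_idle_secs`` parameters are hypothetical names chosen for this example and are not taken from the pywb source. The sketch only shows the decision "close the current WARC once it is too large or has been idle too long", assuming a value of ``0`` disables a check.

.. code-block:: python

    import time


    class RolloverPolicy:
        """Hypothetical helper: decides when an open WARC file should be closed."""

        def __init__(self, max_size=0, max_idle_secs=0):
            # Assumption of this sketch: 0 disables the corresponding check.
            self.max_size = max_size
            self.max_idle_secs = max_idle_secs

        def should_rollover(self, current_size, last_write_time, now=None):
            now = time.time() if now is None else now
            # Roll over once the file has reached the maximum size...
            if self.max_size and current_size >= self.max_size:
                return True
            # ...or once nothing has been written for the idle period.
            if self.max_idle_secs and (now - last_write_time) >= self.max_idle_secs:
                return True
            return False


    policy = RolloverPolicy(max_size=1000, max_idle_secs=60)
    size_exceeded = policy.should_rollover(current_size=1500, last_write_time=100.0, now=110.0)
    idle_exceeded = policy.should_rollover(current_size=10, last_write_time=100.0, now=200.0)
    still_open = policy.should_rollover(current_size=10, last_write_time=100.0, now=110.0)

Passing ``now`` explicitly keeps the check deterministic. A real writer would apply such a check on each write and open a new file when it returns ``True``.
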

Many of the Recorder's features were created for use with the Webrecorder project, although the core recorder also provides
basic recording via the ``/record/`` endpoint. (See: :ref:`recording-mode`)

Deduplication Filters
---------------------

The core recorder class provides optional deduplication using the :class:`pywb.recorder.redisindexer.WritableRedisIndexer` class, which requires Redis to store the index and can be configured to handle duplicates in one of the following ways:

* write duplicate responses.
* write ``revisit`` records.
* ignore duplicates and not write them to the WARC.

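
The three options can be illustrated with a small sketch that uses an in-memory set in place of Redis. The ``DedupIndex`` class, the ``WRITE``/``REVISIT``/``SKIP`` constants and the choice of a SHA-1 payload digest as the duplicate key are all assumptions made for this example. None of them are the actual pywb classes or the actual index layout.

.. code-block:: python

    import hashlib

    # Hypothetical action names for this sketch only.
    WRITE, REVISIT, SKIP = 'write', 'revisit', 'skip'


    class DedupIndex:
        """Toy duplicate detector; a set stands in for the Redis index."""

        def __init__(self, on_duplicate=REVISIT):
            # What to do when the same URL and payload are seen again:
            # WRITE (full response), REVISIT (revisit record) or SKIP.
            self.on_duplicate = on_duplicate
            self.seen = set()

        def decide(self, url, payload):
            digest = hashlib.sha1(payload).hexdigest()
            key = (url, digest)
            if key not in self.seen:
                # First time this URL/payload pair is seen: always write it.
                self.seen.add(key)
                return WRITE
            return self.on_duplicate


    index = DedupIndex(on_duplicate=REVISIT)
    first = index.decide('http://example.com/', b'hello')
    second = index.decide('http://example.com/', b'hello')
    changed = index.decide('http://example.com/', b'hello again')

Here the first capture and the changed payload are both written in full, while the repeated capture is reduced to the configured duplicate action.
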

Custom Filtering
----------------

The recorder also includes a filtering system that allows certain requests and responses to be excluded from the WARC.

Filters include:

* Skipping by regex applied to the source (``Warcserver-Source-Coll`` header from Warcserver)
* Skipping if a ``Recorder-Skip: 1`` header is provided
* Skipping if a ``Range`` request header is provided
* Filtering out certain HTTP headers, for example, http-only cookies

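
The filter rules listed above can be sketched in plain Python. The function names ``should_skip`` and ``drop_http_only_cookies`` and the ``skip_source_rx`` parameter are hypothetical and do not reflect the pywb filter classes. The sketch also assumes that ``Recorder-Skip`` and ``Range`` arrive as request headers and that ``Warcserver-Source-Coll`` is read from the Warcserver response.

.. code-block:: python

    import re


    def _lower_keys(headers):
        # HTTP header names are case-insensitive, so normalize them first.
        return {name.lower(): value for name, value in headers.items()}


    def should_skip(req_headers, resp_headers, skip_source_rx=None):
        """Return True if this request/response pair should not be recorded."""
        req = _lower_keys(req_headers)
        resp = _lower_keys(resp_headers)

        # Skip by regex applied to the source collection.
        source = resp.get('warcserver-source-coll', '')
        if skip_source_rx and re.match(skip_source_rx, source):
            return True

        # Skip when the client explicitly opts out.
        if req.get('recorder-skip') == '1':
            return True

        # Skip range requests, which only cover part of a resource.
        if 'range' in req:
            return True

        return False


    def drop_http_only_cookies(header_list):
        """Remove Set-Cookie headers flagged HttpOnly from (name, value) pairs."""
        kept = []
        for name, value in header_list:
            if name.lower() == 'set-cookie' and 'httponly' in value.lower():
                continue
            kept.append((name, value))
        return kept


    skipped = should_skip({'Range': 'bytes=0-10'}, {})
    cleaned = drop_http_only_cookies([
        ('Set-Cookie', 'session=abc; HttpOnly'),
        ('Content-Type', 'text/html'),
    ])

The skip decision and the header filtering are kept as separate functions because the first drops a whole exchange while the second only trims what is written.
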

The additional recorder functionality will be enhanced in a future version.

For more detailed examples, please consult the tests in :mod:`pywb.recorder.test.test_recorder`.