docs updates

This commit is contained in:
Barbara Miller 2019-06-13 17:18:51 -07:00
parent d133565061
commit 8c52bd8442
2 changed files with 10 additions and 7 deletions


@@ -89,12 +89,13 @@ for deduplication works similarly to deduplication by `Heritrix
 4. If not found,
 
    a. Write ``response`` record with full payload
-   b. Store new entry in deduplication database
+   b. Store new entry in deduplication database (can be disabled, see
+      `Warcprox-Meta HTTP request header <api.rst#warcprox-meta-http-request-header>`_)
 
 The deduplication database is partitioned into different "buckets". URLs are
 deduplicated only against other captures in the same bucket. If specified, the
-``dedup-bucket`` field of the `Warcprox-Meta HTTP request header
-<api.rst#warcprox-meta-http-request-header>`_ determines the bucket. Otherwise,
+``dedup-buckets`` field of the `Warcprox-Meta HTTP request header
+<api.rst#warcprox-meta-http-request-header>`_ determines the bucket(s). Otherwise,
 the default bucket is used.
 
 Deduplication can be disabled entirely by starting warcprox with the argument
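
A minimal Python sketch of the bucket rules this hunk describes, for illustration only and not warcprox's own code. It assumes the Warcprox-Meta value has already been parsed from JSON and uses a hypothetical `DEFAULT_BUCKET` placeholder, because these docs do not name the real default bucket. It also assumes, as the example bucket name `my-read-only-dedup-bucket` in api.rst suggests, that a bucket in mode `"ro"` is consulted on lookup but never written to, which would be one way the "can be disabled" note in step 4b applies per request.

```python
import json

# Hypothetical placeholder: these docs do not name warcprox's default bucket.
DEFAULT_BUCKET = "__default__"


def buckets_to_check(warcprox_meta):
    """Buckets a capture is deduplicated against.

    Per the README text: if ``dedup-buckets`` is given, it names the
    bucket(s); otherwise the default bucket is used.
    """
    buckets = warcprox_meta.get("dedup-buckets")
    if not buckets:  # field missing or empty
        return [DEFAULT_BUCKET]
    return list(buckets)  # dict keys, in header order


def buckets_to_store(warcprox_meta):
    """Buckets a new, non-duplicate capture would be recorded in.

    Assumption: a bucket in mode "ro" is read-only, i.e. looked up
    but never written to.
    """
    buckets = warcprox_meta.get("dedup-buckets")
    if not buckets:
        return [DEFAULT_BUCKET]
    return [name for name, mode in buckets.items() if mode != "ro"]


meta = json.loads('{"dedup-buckets":{"my-dedup-bucket":"rw", '
                  '"my-read-only-dedup-bucket": "ro"}}')
print(buckets_to_check(meta))  # ['my-dedup-bucket', 'my-read-only-dedup-bucket']
print(buckets_to_store(meta))  # ['my-dedup-bucket']
```
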

api.rst (10 lines changed)

@@ -137,14 +137,16 @@ Example::
 
     Warcprox-Meta: {"warc-prefix": "special-warc"}
 
-``dedup-bucket`` (string)
+``dedup-buckets`` (string)
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-Specifies the deduplication bucket. For more information about deduplication
+Specifies the deduplication bucket(s). For more information about deduplication
 see `<README.rst#deduplication>`_.
 
-Example::
+Examples::
 
-    Warcprox-Meta: {"dedup-bucket":"my-dedup-bucket"}
+    Warcprox-Meta: {"dedup-buckets":{"my-dedup-bucket":"rw"}}
+
+    Warcprox-Meta: {"dedup-buckets":{"my-dedup-bucket":"rw", "my-read-only-dedup-bucket": "ro"}}
 
 ``blocks`` (list)
 ~~~~~~~~~~~~~~~~~
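
The new examples pass a JSON object mapping bucket name to mode. The following Python sketch, using only the standard library, attaches such a header to a request. It assumes warcprox is listening as an HTTP proxy on `localhost:8000` (adjust for your deployment). The helper names `warcprox_meta_value`, `build_request` and `open_via_warcprox` are hypothetical and are not part of warcprox.

```python
import json
import urllib.request

# Assumption: warcprox runs locally on port 8000; adjust for your setup.
WARCPROX_PROXY = "http://localhost:8000"


def warcprox_meta_value(buckets):
    """Serialize a {bucket_name: mode} mapping as a Warcprox-Meta value."""
    # Compact separators match the form of the first example above.
    return json.dumps({"dedup-buckets": buckets}, separators=(",", ":"))


def build_request(url, buckets):
    """Build, but do not send, a request carrying Warcprox-Meta."""
    return urllib.request.Request(
        url, headers={"Warcprox-Meta": warcprox_meta_value(buckets)})


def open_via_warcprox(request):
    """Send a request with warcprox configured as the HTTP(S) proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(
        {"http": WARCPROX_PROXY, "https": WARCPROX_PROXY}))
    return opener.open(request)


req = build_request("http://example.com/", {"my-dedup-bucket": "rw"})
print(req.get_header("Warcprox-meta"))
# {"dedup-buckets":{"my-dedup-bucket":"rw"}}
```

`urllib` stores header names via `str.capitalize()`, so the header is looked up as `Warcprox-meta`. Because HTTP header names are case-insensitive, warcprox still receives the same field. Fetching HTTPS URLs through warcprox also requires the client to trust warcprox's CA certificate.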