Mirror of https://github.com/internetarchive/warcprox.git, synced 2025-01-18 13:22:09 +01:00
docs updates
Parent d133565061, commit 8c52bd8442
@@ -89,12 +89,13 @@ for deduplication works similarly to deduplication by `Heritrix
 4. If not found,
 
    a. Write ``response`` record with full payload
-   b. Store new entry in deduplication database
+   b. Store new entry in deduplication database (can be disabled, see
+      `Warcprox-Meta HTTP request header <api.rst#warcprox-meta-http-request-header>`_)
 
 The deduplication database is partitioned into different "buckets". URLs are
 deduplicated only against other captures in the same bucket. If specified, the
-``dedup-bucket`` field of the `Warcprox-Meta HTTP request header
-<api.rst#warcprox-meta-http-request-header>`_ determines the bucket. Otherwise,
+``dedup-buckets`` field of the `Warcprox-Meta HTTP request header
+<api.rst#warcprox-meta-http-request-header>`_ determines the bucket(s). Otherwise,
 the default bucket is used.
 
 Deduplication can be disabled entirely by starting warcprox with the argument
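The README steps touched by this hunk (look the payload up, write a ``revisit`` or a ``response`` record, then store a new entry) can be sketched in Python as follows. This is an illustrative toy, not warcprox's implementation: ``ToyDedupDb``, ``record_type_for``, the in-memory dict storage, the ``"default"`` bucket name and the ``sha1:`` hex digest format are all hypothetical. Treating an ``"ro"`` bucket as consulted but never written follows the read-only bucket example in api.rst; checking several buckets this way is an assumption.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyDedupDb:
    """Hypothetical in-memory stand-in for a deduplication database.

    buckets maps bucket name -> {payload digest -> info about first capture}.
    """
    buckets: dict = field(default_factory=dict)

    def lookup(self, digest, bucket):
        return self.buckets.get(bucket, {}).get(digest)

    def save(self, digest, bucket, url, date):
        self.buckets.setdefault(bucket, {})[digest] = {"url": url, "date": date}


def record_type_for(db, url, date, payload, dedup_buckets=None):
    """Return "revisit" or "response" for one capture (toy sketch).

    dedup_buckets has the shape of the ``dedup-buckets`` field,
    {bucket_name: "rw" or "ro"}; None stands for a hypothetical
    read-write "default" bucket.
    """
    if not dedup_buckets:
        dedup_buckets = {"default": "rw"}
    # Assumed digest format, for illustration only.
    digest = "sha1:" + hashlib.sha1(payload).hexdigest()

    # Found in any requested bucket: write a revisit record, store nothing.
    if any(db.lookup(digest, b) is not None for b in dedup_buckets):
        return "revisit"

    # Not found: write a full response record, then store a new entry,
    # but only in buckets opened read-write ("ro" buckets are never written).
    for bucket, mode in dedup_buckets.items():
        if mode == "rw":
            db.save(digest, bucket, url, date)
    return "response"
```

For example, a payload fetched twice with no ``dedup-buckets`` field yields ``"response"`` and then ``"revisit"``, while a new payload fetched repeatedly with only ``{"archive": "ro"}`` (a hypothetical bucket name) yields ``"response"`` every time, because nothing is ever stored in a read-only bucket.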
api.rst (10 lines changed)
@@ -137,14 +137,16 @@ Example::
 
     Warcprox-Meta: {"warc-prefix": "special-warc"}
 
-``dedup-bucket`` (string)
+``dedup-buckets`` (string)
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-Specifies the deduplication bucket. For more information about deduplication
+Specifies the deduplication bucket(s). For more information about deduplication
 see `<README.rst#deduplication>`_.
 
-Example::
+Examples::
 
-    Warcprox-Meta: {"dedup-bucket":"my-dedup-bucket"}
+    Warcprox-Meta: {"dedup-buckets":{"my-dedup-bucket":"rw"}}
+
+    Warcprox-Meta: {"dedup-buckets":{"my-dedup-bucket":"rw", "my-read-only-dedup-bucket": "ro"}}
 
 ``blocks`` (list)
 ~~~~~~~~~~~~~~~~~
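The ``dedup-buckets`` value is a JSON object mapping each bucket name to ``"rw"`` or ``"ro"``, as in the examples above. A minimal Python sketch of building that header value follows; the function name ``dedup_buckets_header`` and its check that rejects any mode other than ``"rw"``/``"ro"`` are hypothetical conveniences, not part of warcprox.

```python
import json

ALLOWED_MODES = {"rw", "ro"}  # read-write and read-only, per the examples


def dedup_buckets_header(buckets):
    """Return a Warcprox-Meta header value with a ``dedup-buckets`` field.

    buckets: mapping of bucket name -> "rw" or "ro" (hypothetical helper).
    """
    for name, mode in buckets.items():
        if mode not in ALLOWED_MODES:
            raise ValueError(f"bucket {name!r}: expected 'rw' or 'ro', got {mode!r}")
    # Compact separators reproduce the single-bucket example exactly.
    return json.dumps({"dedup-buckets": dict(buckets)}, separators=(",", ":"))


if __name__ == "__main__":
    print(dedup_buckets_header({"my-dedup-bucket": "rw"}))
    print(dedup_buckets_header({"my-dedup-bucket": "rw",
                                "my-read-only-dedup-bucket": "ro"}))
```

The returned string is what a client would send as the ``Warcprox-Meta`` header value on requests made through the proxy. The second call produces JSON equivalent to the two-bucket example, without the optional whitespace after the comma and colon.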