mirror of https://github.com/internetarchive/warcprox.git synced 2025-01-18 13:22:09 +01:00
Barbara Miller 2019-06-20 14:52:28 -07:00
parent c0fcf59c86
commit 48d96fbc79

@@ -90,7 +90,7 @@ for deduplication works similarly to deduplication by `Heritrix
 a. Write ``response`` record with full payload
 b. Store new entry in deduplication database (can be disabled, see
-`Warcprox-Meta HTTP request header <api.rst#warcprox-meta-http-request-header>`
+`Warcprox-Meta HTTP request header <api.rst#warcprox-meta-http-request-header>`_)
 The deduplication database is partitioned into different "buckets". URLs are
 deduplicated only against other captures in the same bucket. If specified, the