dedup-bucket is required in Warcprox-Meta to do dedup

Modify `DedupableMixin.should_dedup` to check Warcprox-Meta for
`dedup-bucket` in order to perform dedup.
This commit is contained in:
Vangelis Banos 2018-05-04 14:27:42 +00:00
parent 9baa2e22d5
commit 432e42803c

View File

@ -44,8 +44,12 @@ class DedupableMixin(object):
def should_dedup(self, recorded_url): def should_dedup(self, recorded_url):
"""Check if we should try to run dedup on resource based on payload """Check if we should try to run dedup on resource based on payload
size compared with min text/binary dedup size options. Return Boolean. size compared with min text/binary dedup size options.
`dedup-bucket` is required in Warcprox-Meta to perform dedup.
Return Boolean.
""" """
if "dedup-bucket" not in recorded_url.warcprox_meta:
return False
if recorded_url.is_text(): if recorded_url.is_text():
return recorded_url.response_recorder.payload_size() > self.min_text_size return recorded_url.response_recorder.payload_size() > self.min_text_size
else: else: