Mirror of https://github.com/internetarchive/warcprox.git, synced 2025-01-18 13:22:09 +01:00.

Commit 68ede68e5f (parent cd6e30fe36): "little edits"

api.rst: 27 changed lines

@@ -195,7 +195,7 @@ Example::

    Warcprox-Meta: {"blocks": [{"ssurt": "com,example,//http:/"}, {"domain": "malware.us", "substring": "wp-login.php?action=logout"}]}
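A Python client might build this header with the standard ``json`` module and, when warcprox answers ``403``, read the matching rule back out of the ``Warcprox-Meta`` response header described next. This is a minimal sketch rather than warcprox or brozzler code: ``blocks_header`` and ``blocked_by_rule`` are hypothetical helpers, and response headers are assumed to be available as a plain ``dict``.

```python
import json
from typing import Any, Dict, List, Optional


def blocks_header(rules: List[Dict[str, str]]) -> Dict[str, str]:
    """Build a Warcprox-Meta request header carrying ``blocks`` rules.

    Hypothetical helper: pass the result as extra headers to whatever
    HTTP client is configured to use warcprox as its proxy.
    """
    return {"Warcprox-Meta": json.dumps({"blocks": rules})}


def blocked_by_rule(status: int, headers: Dict[str, str]) -> Optional[Any]:
    """Return the rule reported in ``blocked-by-rule``, or None.

    Header names are compared case-insensitively, as HTTP requires.
    """
    if status != 403:
        return None
    for name, value in headers.items():
        if name.lower() == "warcprox-meta":
            return json.loads(value).get("blocked-by-rule")
    # No Warcprox-Meta response header, so there is no blocking rule to report.
    return None


rules = [
    {"ssurt": "com,example,//http:/"},
    {"domain": "malware.us", "substring": "wp-login.php?action=logout"},
]
request_headers = blocks_header(rules)

# Illustrative response headers for a request blocked by the first rule.
response_headers = {"Warcprox-Meta": json.dumps({"blocked-by-rule": rules[0]})}
print(blocked_by_rule(403, response_headers))  # {'ssurt': 'com,example,//http:/'}
```

Comparing header names in lower case matters because the document itself writes the header both as ``Warcprox-Meta`` and ``warcprox-meta``, and HTTP libraries normalize header case differently.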

If any of the rules match the url being requested, warcprox aborts normal
processing and responds with an http ``403``. The http response includes
a ``Warcprox-Meta`` **response** header with one field, ``blocked-by-rule``,
which reproduces the value of the matching rule that resulted in the block. The
presence of the ``Warcprox-Meta`` response header can be used by the client to

@@ -229,6 +229,11 @@ dictionary is ``{stats_key: numerical_limit, ...}`` where stats key has the
format ``"bucket/sub-bucket/statistic"``. See `readme.rst#statistics`_ for
further explanation of what "bucket", "sub-bucket", and "statistic" mean here.

If processing a request would result in exceeding a limit, warcprox aborts
normal processing and responds with an http ``420 Reached Limit``. The http
response includes a ``Warcprox-Meta`` **response** header with the complete set
of statistics for the bucket whose limit has been reached.
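The check that leads to a ``420`` can be sketched as comparing each ``limits`` entry with the current statistics. This is an illustration, not warcprox's implementation: ``reached_limits`` is a hypothetical name, statistics are assumed to be nested dictionaries of the form ``{bucket: {sub_bucket: {statistic: value}}}``, and a limit is assumed to count as reached once the current value is at or above it, because processing another request would then exceed it.

```python
from typing import Dict, List

# Assumed shape for illustration: {bucket: {sub_bucket: {statistic: value}}}
Stats = Dict[str, Dict[str, Dict[str, int]]]


def reached_limits(limits: Dict[str, int], stats: Stats) -> List[str]:
    """Return the ``limits`` keys whose statistic is at or above its limit."""
    reached = []
    for key, limit in limits.items():
        # Keys have the form "bucket/sub-bucket/statistic"; splitting from the
        # right keeps a bucket name that itself contains "/" in one piece.
        bucket, sub_bucket, statistic = key.rsplit("/", 2)
        # Missing buckets or statistics are treated as zero.
        value = stats.get(bucket, {}).get(sub_bucket, {}).get(statistic, 0)
        if value >= limit:
            reached.append(key)
    return reached


limits = {"test_limits_bucket/total/urls": 10}
stats = {"test_limits_bucket": {"total": {"urls": 10}}}
print(reached_limits(limits, stats))  # ['test_limits_bucket/total/urls']
```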

Example::

    {"stats": {"buckets": ["test_limits_bucket"]}, "limits": {"test_limits_bucket/total/urls": 10}}

@@ -250,16 +255,16 @@ Example::
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From warcprox's perspective, ``soft-limits`` work almost exactly the same way
as ``limits``. The only difference is that when a soft limit is hit, warcprox
responds with an http ``430 Reached soft limit`` instead of http ``420``.
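Since only the status code distinguishes the two cases, a client that knows a response came from warcprox itself can tell them apart by status alone. A minimal sketch; ``LimitKind`` and ``limit_kind`` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class LimitKind(Enum):
    """Hypothetical labels for the two kinds of limit warcprox reports."""
    LIMIT = 420       # "420 Reached Limit": a ``limits`` entry was reached
    SOFT_LIMIT = 430  # "430 Reached soft limit": a ``soft-limits`` entry was reached


def limit_kind(status: int) -> Optional[LimitKind]:
    """Map a warcprox status code to the kind of limit hit, or None otherwise."""
    try:
        return LimitKind(status)
    except ValueError:
        return None


print(limit_kind(430))  # LimitKind.SOFT_LIMIT
```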

Warcprox clients might treat a ``430`` very differently from a ``420``. From
brozzler's perspective, for instance, ``soft-limits`` are very different from
``limits``. When brozzler receives a ``420`` from warcprox because a ``limit``
has been reached, this means that crawling for that seed is finished, and
brozzler sets about finalizing the crawl of that seed. On the other hand,
brozzler blissfully ignores ``430`` responses, because soft limits only apply
to a particular bucket (like a domain), and don't have any effect on crawling
of urls that don't fall in that bucket.
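The brozzler behaviour described above can be sketched as a per-response decision. This is not brozzler's code: ``SeedCrawl`` and ``handle_status`` are hypothetical, and what finalizing a seed actually involves is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeedCrawl:
    """Hypothetical per-seed crawl state."""
    seed: str
    finished: bool = False
    skipped_urls: List[str] = field(default_factory=list)


def handle_status(crawl: SeedCrawl, url: str, status: int) -> None:
    """React to a warcprox status code as the text describes brozzler doing."""
    if status == 420:
        # A hard limit was reached: crawling for this seed is finished.
        crawl.finished = True
    elif status == 430:
        # A soft limit covers only one bucket (e.g. a domain): skip this url,
        # recorded here for illustration, and keep crawling everything else.
        crawl.skipped_urls.append(url)


crawl = SeedCrawl("https://example.com/")
handle_status(crawl, "https://other.example.org/page", 430)
print(crawl.finished, crawl.skipped_urls)  # False ['https://other.example.org/page']
handle_status(crawl, "https://example.com/next", 420)
print(crawl.finished)  # True
```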

Example::

@@ -300,7 +305,7 @@ Example::

    Warcprox-Meta: {"accept": ["capture-metadata"]}

The response will include a ``Warcprox-Meta`` response header with one field
also called ``capture-metadata``. Currently warcprox reports one piece of
capture metadata, ``timestamp``, which represents the time the fetch began
for the resource and matches the ``WARC-Date`` written to the warc record. For