docs still in progress

Noah Levitt 2018-05-25 17:36:26 -07:00
parent 195faa5cff
commit 07dc978f09

22
api.rst

@@ -136,7 +136,7 @@ remote server, and also does not write it in the warc request record.
 Brozzler knows about ``warcprox-meta``. For information on configuring
 it in brozzler, see
-`https://github.com/internetarchive/brozzler/blob/master/job-conf.rst#warcprox-meta`_.
+https://github.com/internetarchive/brozzler/blob/master/job-conf.rst#warcprox-meta.
 ``Warcprox-Meta`` is often a very important part of brozzler job configuration.
 It is the way url and data limits on jobs, seeds, and hosts are implemented,
 among other things.
@@ -156,14 +156,14 @@ Example::
 ``stats`` (dictionary)
 ~~~~~~~~~~~~~~~~~~~~~~
 ``stats`` is a dictionary with only one field understood by warcprox,
-``"buckets"``. The value of ``"buckets"`` is a list of strings and/or
+``buckets``. The value of ``buckets`` is a list of strings and/or
 dictionaries. A string signifies the name of the bucket; a dictionary is
-expected to have at least an item with key ``"bucket"`` whose value is the name
-of the bucket. The other currently recognized key is ``"tally-domains"``, which
+expected to have at least an item with key ``bucket`` whose value is the name
+of the bucket. The other currently recognized key is ``tally-domains``, which
 if supplied should be a list of domains. This instructs warcprox to
 additionally tally substats of the given bucket by domain. Host stats are
 stored in the stats table under the key
-``{parent-bucket}:{domain(normalized)}``, e.g. `"bucket2:foo.bar.com"` for the
+``{parent-bucket}:{domain(normalized)}``, e.g. ``"bucket2:foo.bar.com"`` for the
 example below.
 Examples::
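
As a rough illustration of the ``stats`` structure described in this hunk, a client could assemble the header value along the following lines. This is only a sketch: the helper name ``stats_meta`` is hypothetical and not part of warcprox, and the bucket names and domain are placeholders chosen to match the ``bucket2:foo.bar.com`` key mentioned above.

```python
import json


def stats_meta(buckets):
    """Serialize a ``stats`` section into a Warcprox-Meta header value.

    Hypothetical helper for illustration; warcprox does not ship it.
    """
    for entry in buckets:
        # A dictionary entry must at least name its bucket.
        if isinstance(entry, dict) and "bucket" not in entry:
            raise ValueError("bucket dictionary needs a 'bucket' key: %r" % (entry,))
    return json.dumps({"stats": {"buckets": buckets}})


# "bucket1" is a plain named bucket; "bucket2" additionally asks for
# per-domain substats via "tally-domains".
header_value = stats_meta([
    "bucket1",
    {"bucket": "bucket2", "tally-domains": ["foo.bar.com"]},
])
headers = {"Warcprox-Meta": header_value}
print(headers["Warcprox-Meta"])
```

The resulting ``headers`` dictionary can be passed to whatever HTTP client is making requests through the proxy.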
@@ -196,13 +196,13 @@ Example::
 If any of the rules match the url being requested, warcprox aborts normal
 processing and responds with a http 403. The http response includes
-a ``Warcprox-Meta`` **response** header with one field, `"blocked-by-rule"`,
+a ``Warcprox-Meta`` **response** header with one field, ``blocked-by-rule``,
 which reproduces the value of the match rule that resulted in the block. The
 presence of the ``warcprox-meta`` response header can be used by the client to
 distinguish this type of a response from a 403 from the remote url being
 requested.
-For example::
+An example::
 $ curl -iksS --proxy localhost:8000 --header 'Warcprox-Meta: {"blocks": [{"ssurt": "com,example,//http:/"}, {"domain": "malware.us", "substring": "wp-login.php?action=logout"}]}' http://example.com/foo
 HTTP/1.0 403 Forbidden
@@ -217,10 +217,10 @@ For example::
 You might be wondering why ``blocks`` is necessary. Why would the warcprox
 client make a request that it should already know will be blocked by the proxy?
-The answer is that the request may be initiated somewhere where it's not
-possible, or at least not convenient, to evaluate the block rules. In
-particular, this circumstance prevails when the browser controlled by brozzler
-is requesting images, javascript, css, and so on, embedded in a page.
+The answer is that the request may be initiated somewhere where it's difficult
+to evaluate the block rules. In particular, this circumstance prevails when the
+browser controlled by brozzler is requesting images, javascript, css, and so
+on, embedded in a page.
 ``limits`` (dictionary)
 ~~~~~~~~~~~~~~~~~~~~~~~
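
The ``blocks`` text in this diff says a client can tell a proxy-side block apart from a 403 sent by the remote server by looking for the ``Warcprox-Meta`` response header. A minimal sketch of that check follows. The function name ``blocked_by_rule`` is hypothetical, and the sample header value is simulated from the description (a single ``blocked-by-rule`` field echoing the matching rule) rather than captured from a real warcprox response.

```python
import json


def blocked_by_rule(status, headers):
    """Return the matching block rule if the proxy refused the request, else None.

    Hypothetical helper for illustration; not part of warcprox or brozzler.
    """
    if status != 403:
        return None
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("warcprox-meta")
    if raw is None:
        return None  # an ordinary 403 from the remote server
    try:
        meta = json.loads(raw)
    except ValueError:
        return None  # header present but not valid JSON
    if not isinstance(meta, dict):
        return None
    return meta.get("blocked-by-rule")


# Simulated response headers, modeled on the rule used in the curl example.
simulated = {"Warcprox-Meta": '{"blocked-by-rule": {"ssurt": "com,example,//http:/"}}'}
print(blocked_by_rule(403, simulated))
print(blocked_by_rule(403, {"Content-Type": "text/html"}))
```

Keeping the check as a pure function over a status code and a header mapping means it works with any HTTP client that exposes those two things.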