From cdb17f4790af7ec2367167a92a7e535a53577012 Mon Sep 17 00:00:00 2001
From: Gretchen Miller
Date: Mon, 23 Sep 2024 15:21:04 -0700
Subject: [PATCH] WT-2955 documentation for MIME type filtering

---
 api.rst | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/api.rst b/api.rst
index eee3219..86876a4 100644
--- a/api.rst
+++ b/api.rst
@@ -186,6 +186,21 @@ to evaluate the block rules. In particular, this circumstance prevails when the
 browser controlled by brozzler is requesting images, javascript, css, and so
 on, embedded in a page.
 
+``mime-type-filters`` (list)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``mime-type-filters`` is a list of dictionaries, each of which has two required
+fields, ``regex`` and ``type``. Each entry in the ``mime-type-filters`` list
+defines behavior to filter WARC-writing by the MIME type specified in the HTTP
+response's Content-Type header.
+
+There are two expected keys in a MIME type filter block:
+
+* ``regex``: A regex expression to be applied to the Content-Type header value.
+* ``type``: The type of filtering logic to apply. Two values are supported.
+  * ``REJECT``: Any Content-Type header value matching the regex will be
+    rejected.
+  * ``LIMIT``: Only Content-Type values matching the regex will be allowed.
+
 ``stats`` (dictionary)
 ~~~~~~~~~~~~~~~~~~~~~~
 ``stats`` is a dictionary with only one field understood by warcprox,
@@ -307,4 +322,3 @@ that it sends to the client. As with the request header, the value is a json
 blob. It is only included if something in the ``warcprox-meta`` request
 header calls for it. Those cases are described above in the
 `Warcprox-Meta http request header`_ section.
-
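
The ``mime-type-filters`` semantics documented in this patch can be illustrated with a short Python sketch. It is not warcprox's implementation. The function name ``should_archive`` and the sample filter values are hypothetical, and two behaviors the patch does not specify are assumed: regexes are applied with ``re.search``, and multiple filters are combined so that a response must pass every one of them.

```python
import json
import re

# Hypothetical Warcprox-Meta payload using the two documented keys,
# ``regex`` and ``type``. The regex values are illustrative only.
warcprox_meta = {
    "mime-type-filters": [
        {"regex": r"^text/", "type": "LIMIT"},
        {"regex": r"javascript", "type": "REJECT"},
    ]
}

# The Warcprox-Meta request header carries a JSON blob, so the payload
# would be serialized like this before being sent.
header_value = json.dumps(warcprox_meta)


def should_archive(content_type, filters):
    """Return True if a response with this Content-Type passes every filter.

    Assumptions (not confirmed by the patch): regexes are applied with
    re.search, and multiple filters are combined with logical AND.
    """
    for f in filters:
        matched = re.search(f["regex"], content_type) is not None
        # REJECT: a matching Content-Type is excluded from WARC writing.
        if f["type"] == "REJECT" and matched:
            return False
        # LIMIT: only a matching Content-Type is allowed through.
        if f["type"] == "LIMIT" and not matched:
            return False
    return True
```

Under these assumptions, ``text/html; charset=utf-8`` would be written, while ``text/javascript`` (caught by the ``REJECT`` entry) and ``image/png`` (not matched by the ``LIMIT`` entry) would be skipped. How warcprox actually combines several entries should be checked against its source.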