* add localization utilities:
- add locmanager to support extract, update, remove, list using pybabel
- add po2csv/csv2po conversion with translate-utils
- docs: add localization.rst to manual!
* add language switch header (via header.html) to all pages if more than one locale is present.
* localization: wrap more text strings in templates in existing templates
* docs:
- document `wb-manager i18n` commands
- mention `<html lang>` setting
- include csv example
- add info about adding localizable text in templates
* add localization to CHANGES
* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option
* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching
* docs: add docs for new embargo system!
* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring X-pywb-ACL-user header based on IP for nginx and apache sample deployments
* docs: fix access control page header, text tweaks
* bump version to 2.6.0b0
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion
* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx
* ci: fixes
- tox: run full test suite!
- disable appveyor
* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
The field is unfortunately misnamed compressedendoffset in XML but OWB
actually uses this for the compressed length 'S' CDX field.
Without this field when WARC files are accessed over HTTP pywb will make
open byte range requests which results in a lot more data being read
from disk than necessary.
* FrontendApp: forward HTTP status of CDX backend to allow clients
to handle errors more easily
* Handle CDXExceptions properly, returning the exception status code
- make that CDXException is raised early so that it can be handled
in the IndexHandler
* FrontendApp: forward HTTP status of CDX backend to allow clients
to handle errors more easily
* WarcServer: keep the HTTP status lines short
- append the exception message only if the status isn't a string
(WbException and inherited classes already have nice status string)
- avoid overlong status lines, eg.
HTTP/1.1 404 Not Found No Captures found for: https://very-long.url/...
* Add unit test to verify whether ACL exact-match rules in a single-line
*.aclj file are found
* Fix AccessChecker to match exact rules in a single-line rule file
* dedup improvements on top of #597, work towards patching support (#601)
- single key 'dedup_policy' of 'skip', 'revisit', 'keep'
- optional 'dedup_index_url', defaults to redis urls
- support for 'cache: always' to further add cacheing on all requests that have a referrer
- updated docs to mention latest config, explain 'instant replay' that is possible when dedup_policy is set
- add check to ensure only redis:// URLs can be set for dedup_index_url for now
- config: convert shorthand 'recorder: <source_coll>' setting string to dict, don't override custom config
* rules: updated rule to fix replay of latest youtube watch and embed pages
include youtube-nocookie variant
fixes#607
part of fix for webrecorder/browsertrix-crawler#4
* rules: additional rules fix for vimeo
At UKWA we're hitting cases where crawl variation means we have e.g. a lot of redirect records and in these cases the 10 record limit is too low. I can't see any way of configuring this value, so I'm proposing the default is raised.
* Add support for verifying ssl certificates
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
* Add documentation for new certificate configuration options
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
* Add test to check the verification of ssl certificates
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
* docs work on OpenWayback -> pywb transition, part 1
* docs: add config change examples, exclusions and deploy recommendations
* update with path index example
* update terms with collection info
* docs update:
- add zipnum examples to owb-to-pywb config transition
- add working docker compose examples for nginx subdirectory, apache subdirectory and outback cdx deployment in ./sample-deploy
- update usage and owb-to-pywb deployment docs with updated subdiretory deployment info + sample-deploy links
* tweak exclusion info, deploy title
* add missing filee uwsgi_subdir.ini
* Docs: fix typos and clarifications from review (thanks @ldko!)
Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
* docs: explain that existing cdx can be added to outbackcdx, explain reindexing is optional
* docs: elaborate on docker-compose examples
* minor tweaks
* update to latest wombat 3.0.2
* update CHANGES.rst
* bump version to 2.5.0 for release
Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
* ensure that the RemoteCDXIndexSource also adds a 'matchType=' param, fix for ukwa-pywb/ukwa#57
* 2.4.2 fixes:
- cdxindexer: don't treat first param as output, require '-o <output>' instead, update tests
- cleanup: move url-polyfill.min.js to correct static dir, addresses #571
- update to latest wombat
- move logo to ./pywb/static, fix README path
- tests: update indexing tests for cdx-indexer fix
- bump version to 2.4.2
- Fix link in access-control docs to use RST instead of MD syntax (#568) (by @machawk1)