mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

49 Commits

Author SHA1 Message Date
Sebastian Nagel
f9f5d2dc33
Improve docs about CDXJ Server API endpoint (#651)
- replace erroneous/outdated `/coll-cdx` API endpoint
  by default API endpoint `/<coll>/cdx`
- if clear from preceding context: reduce examples
  to params only `?url=...&param1=...`
2021-06-15 18:12:48 -07:00
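
The endpoint form named in this commit, `/<coll>/cdx` followed by `?url=...&param1=...`, can be sketched with a small helper. This is a minimal illustration, not part of pywb: the function `cdx_query_url`, the host `http://localhost:8080`, and the collection name `my-web-archive` are assumptions, and `matchType=prefix` is used only as a sample extra parameter.

```python
from urllib.parse import quote, urlencode


def cdx_query_url(base, coll, **params):
    """Build a query URL of the form <base>/<coll>/cdx?url=...&param1=...

    Hypothetical helper for illustration; it only assembles the string.
    """
    return f"{base.rstrip('/')}/{quote(coll)}/cdx?{urlencode(params)}"


# Assumed host and collection name; adjust to the local deployment.
print(cdx_query_url(
    "http://localhost:8080",
    "my-web-archive",
    url="http://example.com/",
    matchType="prefix",
))
```

The query values are percent-encoded by `urlencode`, so a full URL can be passed as the `url` parameter without further escaping.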
Ilya Kreymer
f7bd84cdac
Localization / doc fixes (#650)
* localization / doc fixes:
- add missing header.html
- docs: support 'i18n' extra, mention in docs
- use 'default_locale' for html lang tag
- access control docs: fix documentation for adding user with acl command

* localization: add compile_catalog after extract as well to simplify updates for identity (en) locale

* ui: 
- include locale in home page collection listing
- keep locale on error page home link

* autoescape:
- ensure jinja2 templates are autoescaped to prevent xss issues (thanks @sebastian-nagel for suggested fix)
- ensure banner inserts are not double-escaped
- update tests for template autoescaping

* update CHANGES.rst

* bump version to 2.6.0b1
2021-06-14 17:09:00 -07:00
Lauren Ko
9587954856
Fix typos in localization and access-control docs (#649)
* Fix typos in localization doc

* Fix typos in access-control doc
2021-06-11 22:50:35 -07:00
Ilya Kreymer
12fcc87962
Localization Support (#647)
* add localization utilities:
- add locmanager to support extract, update, remove, list using pybabel
- add po2csv/csv2po conversion with translate-utils
- docs: add localization.rst to manual!

* add language switch header (via header.html) to all pages if more than one locale is present.

* localization: wrap more text strings in templates in existing templates

* docs:
- document `wb-manager i18n` commands
- mention `<html lang>` setting
- include csv example
- add info about adding localizable text in templates

* add localization to CHANGES
2021-06-09 13:12:53 -07:00
Ilya Kreymer
f07d35709a
Access Control Improvements: Embargo + ACL User Support (#642)
* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option

* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching

* docs: add docs for new embargo system!

* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring X-Pywb-ACL-User header based on IP for nginx and apache sample deployments

* docs: fix access control page header, text tweaks

* bump version to 2.6.0b0
2021-05-18 20:09:18 -07:00
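
The four embargo options listed in this commit can be sketched roughly as follows. This is not pywb's implementation: the function `is_embargoed`, the dictionary layout, and the reading of each option (for example, that 'newer' blocks captures more recent than the given span) are assumptions inferred from the option names. Years and months are approximated as 365 and 30 days to stay within the standard library.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit capture timestamp (assumed format)


def _cutoff(now, span):
    """Convert a {'years','months','days'} dict into a cutoff datetime.

    Approximation: 1 year = 365 days, 1 month = 30 days.
    """
    days = (span.get("years", 0) * 365
            + span.get("months", 0) * 30
            + span.get("days", 0))
    return now - timedelta(days=days)


def is_embargoed(capture_ts, embargo, now):
    """Return True if a capture falls inside the (assumed) embargo range."""
    capture = datetime.strptime(capture_ts, TS_FORMAT)
    if "before" in embargo:
        return capture < datetime.strptime(embargo["before"], TS_FORMAT)
    if "after" in embargo:
        return capture > datetime.strptime(embargo["after"], TS_FORMAT)
    if "newer" in embargo:
        return capture > _cutoff(now, embargo["newer"])
    if "older" in embargo:
        return capture < _cutoff(now, embargo["older"])
    return False


now = datetime(2021, 5, 18)
print(is_embargoed("20210101000000", {"newer": {"years": 1}}, now))
print(is_embargoed("20191231235959", {"before": "20200101000000"}, now))
```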
Sebastian Nagel
662fc747bf
Fix ACL loading for auto collections (#620)
* Pass collection name to ACL checker to load ACL lists
for automatic collections

* Typo: file suffix must be `.aclj`
2021-04-26 19:58:56 -07:00
Ilya Kreymer
e1cad621b9
Dedup Improvements (#611)
* dedup improvements on top of #597, work towards patching support (#601)
- single key 'dedup_policy' of 'skip', 'revisit', 'keep'
- optional 'dedup_index_url', defaults to redis urls
- support for 'cache: always' to further add caching on all requests that have a referrer
- updated docs to mention latest config, explain 'instant replay' that is possible when dedup_policy is set
- add check to ensure only redis:// URLs can be set for dedup_index_url for now
- config: convert shorthand 'recorder: <source_coll>' setting string to dict, don't override custom config
2021-01-26 18:53:54 -08:00
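
The constraints this commit describes for the dedup settings (one of three policy values, and only redis:// index URLs for now) could be checked along these lines. This is a hedged sketch, not pywb code: `check_dedup_config`, the flat dictionary layout, and the sample URL are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Policy values named in the commit message.
VALID_POLICIES = ("skip", "revisit", "keep")


def check_dedup_config(config):
    """Validate a dict holding 'dedup_policy' and optional 'dedup_index_url'.

    Hypothetical helper; raises ValueError on an unsupported value.
    """
    policy = config.get("dedup_policy")
    if policy is not None and policy not in VALID_POLICIES:
        raise ValueError(f"unsupported dedup_policy: {policy!r}")

    url = config.get("dedup_index_url")
    if url is not None and urlsplit(url).scheme != "redis":
        raise ValueError("dedup_index_url must be a redis:// URL")
    return config


# Sample values only; the URL is an assumed placeholder.
check_dedup_config({
    "dedup_policy": "revisit",
    "dedup_index_url": "redis://localhost:6379/0",
})
print("config ok")
```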
Lukey3332
ddf3207e40
Add configuration options for dedup (#597)
* Add configuration options for dedup

Signed-off-by: Lukas Straub <lukasstraub2@web.de>

* Add documentation for new dedup_index configuration options

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
2021-01-26 17:06:18 -08:00
Lukey3332
f628b40e02
Add support for verifying ssl certificates (#596)
* Add support for verifying ssl certificates

Signed-off-by: Lukas Straub <lukasstraub2@web.de>

* Add documentation for new certificate configuration options

Signed-off-by: Lukas Straub <lukasstraub2@web.de>

* Add test to check the verification of ssl certificates

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
2021-01-26 12:41:26 -08:00
Ilya Kreymer
9e09bcd2a7
Docs Update: OpenWayback -> pywb Transition Guide (#588)
* docs work on OpenWayback -> pywb transition, part 1

* docs: add config change examples, exclusions and deploy recommendations

* update with path index example

* update terms with collection info

* docs update:
- add zipnum examples to owb-to-pywb config transition
- add working docker compose examples for nginx subdirectory, apache subdirectory and outback cdx deployment in ./sample-deploy
- update usage and owb-to-pywb deployment docs with updated subdirectory deployment info + sample-deploy links

* tweak exclusion info, deploy title

* add missing file uwsgi_subdir.ini

* Docs: fix typos and clarifications from review (thanks @ldko!)

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>

* docs: explain that existing cdx can be added to outbackcdx, explain reindexing is optional

* docs: elaborate on docker-compose examples

* minor tweaks

* update to latest wombat 3.0.2
* update CHANGES.rst

* bump version to 2.5.0 for release

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
2020-12-04 18:40:58 -08:00
Ilya Kreymer
7b51101b04 license: add NOTICE, update license statement for docs (gplv3) 2020-10-27 16:19:19 -07:00
Max Maass
3f3f8caef1
docs: Fix incorrect example (#574)
minor fix to docs example http://localhost:8080/my-web-archive/record/<url> -> http://localhost:8080/my-web-archive/record/http://example.com/
2020-07-10 20:40:24 -07:00
Ilya Kreymer
9b8c187b3a
2.4.2 Develop->Master (#572)
* ensure that the RemoteCDXIndexSource also adds a 'matchType=' param, fix for ukwa-pywb/ukwa#57

* 2.4.2 fixes:
- cdxindexer: don't treat first param as output, require '-o <output>' instead, update tests
- cleanup: move url-polyfill.min.js to correct static dir, addresses #571
- update to latest wombat
- move logo to ./pywb/static, fix README path
- tests: update indexing tests for cdx-indexer fix
- bump version to 2.4.2
- Fix link in access-control docs to use RST instead of MD syntax (#568) (by @machawk1)
2020-07-10 20:22:58 -07:00
thomas536
8f0ce45b27
docs: fix proxy default timestamp yaml example (#544)
Per the code, the key should use an underscore, not a hyphen. It also seems like the value is parsed as a number instead of a string, which then fails with a type error later, so quote it to force it to be a string.

```
$ pywb
2020-03-10 21:06:33,084: [INFO]: Proxy enabled for collection "web"
Traceback (most recent call last):
  File "/tmp/pywb_venv/bin/pywb", line 8, in <module>
    sys.exit(wayback())
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/cli.py", line 20, in wayback
    desc='pywb Wayback Machine Server').run()
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/cli.py", line 89, in __init__
    self.application = self.load()
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/cli.py", line 181, in load
    return FrontEndApp(custom_config=self.extra_config)
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/frontendapp.py", line 79, in __init__
    self.init_proxy(config)
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/frontendapp.py", line 569, in init_proxy
    if not self.ALL_DIGITS.match(self.proxy_default_timestamp):
TypeError: expected string or buffer
```
2020-04-30 16:18:44 -07:00
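
The failure in the traceback above can be reproduced in isolation. An unquoted all-digit YAML scalar loads as an integer, and a compiled regular expression cannot match an integer. A minimal sketch, assuming the pattern behind `ALL_DIGITS` (the traceback shows only the name, not the regex) and using a hypothetical `describe` helper:

```python
import re

# Assumed pattern; the source shows only the attribute name ALL_DIGITS.
ALL_DIGITS = re.compile(r"^\d+$")


def describe(value):
    """Report what happens when the pattern is applied to a config value."""
    try:
        return "match" if ALL_DIGITS.match(value) else "no match"
    except TypeError:
        # re refuses non-string input, which is the error in the traceback.
        return "TypeError"


unquoted = 20200310    # what an unquoted YAML value of digits becomes (int)
quoted = "20200310"    # what the quoted YAML value becomes (str)

print(describe(unquoted))
print(describe(quoted))
```

Quoting the value in the YAML file keeps it a string, which is why the fix in this commit quotes the example timestamp.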
Ivo Branco
8d8cf7eb58
Fix documentation: replace `fl` with `fields` in docs webrecorder/pywb#542 (#543) 2020-04-30 16:16:07 -07:00
Yvan
8baa8cbdb7 docs: fix doc typo in BaseWarcServer example (#507) 2019-10-31 17:09:25 -07:00
Ilya Kreymer
fed3263ac6
Docs: Fix access controls and ui customizations docs links (#513)
* docs: ensure docs added to access controls, fix typos

* begin changelist for 2.4.0
2019-10-31 16:56:36 -07:00
Ilya Kreymer
6f79840b79
Docs, custom metadata improvements (#509)
* metadata/coll_config: don't confuse user metadata with collection config, don't display collection config settings as metadata (ukwa/ukwa-pywb#47)
- for collection template, add separate 'coll_config' dict, keep user metadata only in 'metadata' dict (default to empty)
- for static collections, assume metadata is in the 'metadata' dict of collection config
- for dynamic collections, load metadata.yaml into 'metadata' dict
- ensure 'metadata' key is passed to frame_insert
- ensure 'metadata' added consistently in framed and non-framed mode
- tests: update tests to ensure metadata is added consistently

- fuzzymatch: don't match 204 OPTIONS responses, update fuzzymatcher test

* documentation
- add documentation for metadata in ui-customization, rebuild docs, 
- add link to ui customization from configuring
- work on access control docs
* fixed small typos in ui-customization.rst
* frontendapp: fix doc string

- misc: remove warning on urllib3 Retry init

- set version to pywb 2.4.0rc0

Co-Authored-By: John Berlin <n0tan3rd@gmail.com>
2019-10-27 01:39:52 +01:00
John Berlin
379f7de1ba
manual
- split out the ui customization documentation into its own file ui-customization.rst
 - added initial documentation covering the new template setup to the ui-customization.rst
2019-09-05 18:13:12 -04:00
Ilya Kreymer
11610f6e04
2.3 Changelist + Docs Update (#487)
* docs: update changelist and add docs about new wombat

* update to latest wombat

* update wombat, fix pytest cmdline in setup
2019-07-09 17:50:57 -07:00
Eoin Kilfeather
96a7a4bbb0 Update configuring.rst to reflect default config.yaml. (#483)
The Docs specify the default value for the warc files path as 'archives' but the default config.yaml file specifies 'archive'
https://github.com/webrecorder/pywb/blob/master/pywb/default_config.yaml#L4
2019-07-08 14:16:57 -07:00
Ilya Kreymer
9448f4fe45 release: update changelist for 2.2.20190311
docs: fix typos
2019-03-11 16:40:53 -07:00
Ilya Kreymer
455efb17ad
Support for default timestamp/date for proxy mode (#454)
* proxy: add option to set default timestamp for proxy mode, fixes #452
- set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp'
- can be iso date or all-digit timestamp
- overridable via accept-datetime header

* docs: update docs for proxy timestamp
- add docs on memento support in proxy mode

* update-version: script can update version only, commit with 'update-version.sh commit'

* indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!
2019-03-11 16:28:09 -07:00
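
The commit above says the default timestamp can be either an ISO date or an all-digit timestamp. One way to accept both forms is sketched below. This is an illustration only, not the code pywb uses: `to_timestamp` is a hypothetical helper, and the 14-digit output format is an assumption.

```python
from datetime import datetime


def to_timestamp(value):
    """Normalize an ISO date or an all-digit timestamp to a digit string.

    All-digit input is returned unchanged; ISO input is converted to
    YYYYMMDDhhmmss (assumed target format).
    """
    value = str(value)  # also tolerates a value that was loaded as an int
    if value.isdigit():
        return value
    return datetime.fromisoformat(value).strftime("%Y%m%d%H%M%S")


print(to_timestamp("2019-03-11"))
print(to_timestamp("2019-03-11T16:28:09"))
print(to_timestamp("20190311"))
```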
Ilya Kreymer
1fcc239ecf
Add Docker info to Docs (#448)
* docs: add docs on running with Docker, Docker image versions, fixes #299
2019-02-27 14:38:59 -08:00
Ilya Kreymer
0db8e5d718 Merge branch 'master' into develop for PR #395 2018-10-23 09:38:53 -07:00
anarcat
40f904af79 add sample Apache configuration (#374)
* add sample Apache configuration

This configuration can be used when launching `wayback` in the default
configuration, which is useful to add stuff like access control,
authentication, or encryption without going through the trouble of
setting up a UWSGI proxy.

* enable support for X-Forwarded-Proto headers from #395
2018-10-23 09:35:15 -07:00
Ilya Kreymer
3a70769c58
Cleanup CLI Switches and Docs for Auto-Fetch System (#394)
Rename:
- rename auto-fetch config to 'enable_auto_fetch' and '--enable-auto-fetch' cli param
- rename 'use_head_insert' -> 'enable_content_rewrite'
- rename 'use_banner' -> 'enable_banner'
- rename 'use_wombat' -> 'enable_wombat'

Misc Cleanup:
- enable_auto_fetch applies to both proxy and non-proxy mode
- remove setting 'wbinfo.use_wombat', implied if wombatProxyMode.js is included
- docs: add docs for auto-fetch system, improved docs for proxy rewrite options
- tests: test with enable_auto_fetch, update tests for renames
- bump version to 2.1.0 due to breaking changes
- changelist: updates to changelist
- requirements: use bounded version for gevent
2018-10-22 17:12:22 -07:00
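
Because the renames in this commit are breaking changes, an existing options dictionary would need its keys updated. The sketch below shows one way to do that. It is not a pywb utility: `migrate_options` is a hypothetical helper, and only the three old-to-new pairs spelled out in the commit message are included.

```python
# Old -> new option names, taken from the commit message above.
RENAMES = {
    "use_head_insert": "enable_content_rewrite",
    "use_banner": "enable_banner",
    "use_wombat": "enable_wombat",
}


def migrate_options(options):
    """Return a copy of the options dict with renamed keys applied.

    Keys not listed in RENAMES are passed through unchanged.
    """
    return {RENAMES.get(key, key): value for key, value in options.items()}


old = {"use_banner": False, "use_head_insert": True, "coll": "web"}
print(migrate_options(old))
```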
eszense
6a2423e754 Add recorder option to filter source collection (#368)
* Add source_filter option to recorder.

* Add test and docs for source_filter option.

* Update test_record_replay.py - Split source_filter test into skip existing and new recording
2018-08-24 17:57:47 -07:00
Frank Sachsenheim
538ce88abc Fixes an enumeration issue in docs/usage.rst (#364)
Thanks! put it on develop so it can be part of next release.
2018-08-17 19:33:42 -07:00
Ilya Kreymer
819e8adf48
text updates: (#352)
- Update CHANGES.rst for 2.0.4
- Docs: Improve new proxy docs for (#316), fix URL-T->URI-T
- Requirements: bump to wsgiprox>=1.5.1
2018-06-27 09:02:01 -07:00
Ilya Kreymer
de3ec0e1bc proxy: use FrontEndApp.proxy_route_request() to determine proxy route
Extensions can override this function to provide custom proxy routing
Update docs
2018-04-20 15:20:56 -07:00
Ilya Kreymer
5349d0518c
Proxy Options (#317)
* proxy mode options: #316
- add 'use_banner' option, if false, will disable standard banner.html from being added
- add 'use_head_insert' option, if false, will disable injecting head_insert.html in proxy mode
both options default to true

* docs: add docs for new proxy options

* also add 'override_route' option and docs for extending proxy routing
2018-04-20 10:04:34 -07:00
Ilya Kreymer
d732cdd01f aggregator timeout fixes (#310):
- fix memento aggregation if timeout is 0.0
- use default timeout (5.0), instead of default to 0.0 and failing
- add 'timeout' property to warcserver aggregation tests
- docs: mention property in warcserver docs also
2018-04-02 17:52:13 -07:00
Ilya Kreymer
8f981743ae docs: add sample nginx config to deployment section, mention how https is handled, fixes #314 2018-04-02 17:23:04 -07:00
Ilya Kreymer
6d879cb8b8 docs: fix typos in memento docs (#307)
- URI-M instead of URL-M
- remove mention of vary: accept-datetime for URI-M
2018-03-05 13:12:12 -08:00
Ilya Kreymer
61bf5e09ca
proxy-mode tweaks: (fixes #302): (#304)
- don't include wombat.js in banner only mode, including in proxy mode
  (instead, do set devicePixelRatio to fix certain fidelity issues)
- default_banner: set title to document.title on load when frameless, including in proxy mode
- improve docs for configuring proxy mode cert
- tests: update tests to ensure no wombat.js injected in proxy or banner-only mode
2018-02-27 15:52:19 -08:00
Ilya Kreymer
fc48e23dae docs/README: fix typos, add changes for 2.0.1 2018-02-10 11:48:50 -08:00
Ilya Kreymer
008504d055
Text tweaks/Dockerfile update (#288)
README: update features list, contributing section, fix typos
docs: update features list, fix wording, add more links to other sections, fix typos
renaming: change 'ikreymer/pywb' -> 'webrecorder/pywb', add Rhizome to copyright statement
Dockerfile: remove deprecated MAINTAINER, add 'ARG PYTHON' to support custom base python image
2018-01-30 07:49:54 -08:00
Ilya Kreymer
34902df80c docs work and misc:
- set depth in main toc to 3
- add info on cli apps in apps.rst
- fix typos, update links
setup: add 'pywb' cli script to be same as 'wayback'
appveyor: remove coveralls
2018-01-29 18:36:14 -08:00
Ilya Kreymer
131c5ff5da
SOCKS proxy (#281)
warcserver: SOCKS proxy:
- add support for running warcserver through a socks proxy specified via SOCKS_HOST and SOCKS_PORT
- move socks patch setup, http max_header adjustment to http module
- logging: print stack trace only if debugging
- add pysocks to extra_requirements, enable in ci
- add simple test (not actual proxy) to check that connection through proxy is attempted
- docs: add SOCKS proxy section to docs
2018-01-17 10:51:49 -08:00
Ilya Kreymer
0c24f8a1c1
Docs and README Update for 2.0.0 (#277)
* docs and version update:
- add docs for compatibility features
- add docs for memento
- update rewriter docs
- bump version to 2.0.0, update README, and changelist
2018-01-11 21:34:04 -08:00
Ilya Kreymer
df14c67a56 docs: docs update, start rewriter section 2017-12-09 22:51:19 -08:00
raffaele messuti
aea9be5291 Update configuring.rst (#265)
small fix on customizing frame
2017-11-07 18:13:57 -08:00
Ilya Kreymer
459cd706d3 include the collection in Memento Link outputs: (#259)
* include the collection in Memento Link outputs:
- add new cdx 'source-coll' field, storing only the collection
- ensure rel="collection" property included in the TimeMap and Link header
- tests: update all tests to include the 'source-coll' property
- docs: add 'collection provenance' to auto-all collection configuration docs
2017-10-23 15:33:23 -07:00
Ilya Kreymer
30be6f2e4c docs: add uwsgi info, rearrange ui customizations 2017-10-20 17:21:02 -07:00
Ilya Kreymer
61f825330c Docs Update (#256)
* docs work:
- write warcserver and beginnings of recorder docs!
- add cdx api docs!
- add indexing docs
- refactor architecture section, remove readme
- update readme with better new features list, work-in-progress list
- add placeholder docs for apps, indexing
- remove unused readme
- update README with better docs link, features
2017-10-18 10:12:44 -07:00
Ilya Kreymer
02bc7776ca config and docs work: (#255)
config and docs work:
- autoindexing now set in config via 'autoindex: <secs>' option
- autoindexing only runs in first uwsgi worker if in uwsgi
- recorder config: rename props to 'rollover_' to match docs
- docs: write configuring.rst section for recording mode, autoindexing and proxy mode!
- update README for new pywb release, point to new docs!
2017-10-15 22:47:23 -07:00
Ilya Kreymer
9b73200d90 docs: fix customizing banner example 2017-10-13 10:06:29 -07:00
Ilya Kreymer
31209db311 New Documentation (#252)
* docs work:
- remove old doc folder
- generate new sphinx docs
rewrite: fix existing docstrings for rst
add 'make apidoc' to rerun apidoc on pywb root
apidocs in docs/code
first pass on usage manual in docs/manual

* use default theme

* docs config work:
- remove modules.rst, use pywb toc directly
- make apidoc force rebuild
- comment out alabaster theme config

* Update usage.rst with working dir info

* docs: add configuring web archive page, ui customizations, custom collections explanations

* work on 'custom collections' section

* docs: update dir tree, switch recording/proxy order

* docs: improve framed vs frameless intro
add 'custom outer replay frame' section
2017-10-04 22:02:03 -07:00