mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

298 Commits

Author SHA1 Message Date
Tessa Walsh
7c5fa48667 Add test 2024-04-02 17:18:45 -04:00
Ed Summers
b4955cca66
Upgrade dependencies (#839)
- Update and pin dependencies to specific versions that support Python 3.7-3.11
- Replace deprecated werkzeug.pop_path_info with wsgiref.shift_path_info
- Use the latest httpbin from psf/httpbin
- Remove unused flask test dependency
- Drop Python 2 and Python <3.7 support
- Ensure greenlet 2 is used for now, as psf/httpbin doesn't yet work with greenlet 3

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-04-02 17:16:50 -04:00
kuechensofa
f40e7ef18c
Sort index when adding wacz archives (#820) 2023-11-23 12:10:52 -05:00
Jonas Linde
5c427b9ff2
[#715] Forward custom headers for cdx queries (#813)
In particular the X-Pywb-ACL-User header must be forwarded in order
for it to be able to control CDX-queries
2023-02-15 17:05:21 -05:00
kuechensofa
454486bf75
[#799] wb-manager: Add wacz archives to collection with --uncompress-wacz (#800)
Add WACZ support for `wb-manager add` by unpacking WACZ files with --uncompress-wacz.

A future commit will add pywb support for WACZ files without requiring them to be unpacked.
2023-02-15 17:00:38 -05:00
Sara Tavares
c441d83435
chore(typos): fix typos across codebase (#811)
Co-authored-by: stavares843 <stavares843@users.noreply.github.com>
2023-02-15 13:04:20 -05:00
Tessa Walsh
2d19b6b18d
Merge 2.7.1 development branch (#785)
* Add locale-dependent handling of first day of week

The Intl.Locale is a proposed standard not yet supported by Firefox so
in Firefox the first day of week will default to Monday (as specified
in ISO-8601).

* Set top frame document title when Vue updates

* Update template guide for 2.7

* Drop Python 3.6 and add 3.10 in test CI

* Allow either JS mimetype in test_add_static

* Add convenience build script for Vue UI

* Add build flag to docker compose example

* Fix Vue app issue with redirect_to_exact: false

Fixes #779

Undated URLs were resulting in a broken calendar and timeline in the
Vue app when redirect_to_exact was set to false. This was due to
TopFrameView using the current datetime if no timestamp was included,
which caused a failed snapshot lookup in the Vue app.

This commit changes the default timestamp in TopFrameView to None and
adds additional logic in the Vue app to use the last snapshot's
timestamp as the default if one is not present to match the snapshot
that pywb loads by default under the same conditions.

* Add filter instead of submitting form when pressing enter in the filtering expression field

* Make filter expressions translatable

* Add missing tooltip strings to vue_loc

* Add changelog

* Bump version to 2.7.1

* Use empty string as default template timestamp

* Bump wombat to 3.3.13

Co-authored-by: Jonas Linde <jonasjlinde@gmail.com>
2022-12-07 18:16:18 -08:00
Ilya Kreymer
e20fac2c75 head_insert: don't include banner_html, only used for framed replay now
wombat: bump to latest wombat 3.3.7
add new custom_banner to head_insert template for frameless replay
2022-11-21 12:46:28 -05:00
Tessa Walsh
c28941a0b6 Rework Vue banner UI
- Make Vue banner responsive with Bootstrap 4
- Add previous/next year arrows to calendar
- Make navbar background, text color, and button outlines configurable
via config.yaml
- Toggle calendar and timeline separately
- Fix bug preventing title from displaying
- Make app keyboard-navigable
- Fix banner background color configuration
- Comment out vue_navbar_background_hash
- Display linear timeline tooltip centrally on enter
- Improve header styling on small screens
- Add titles to font awesome icons
- Remove old default banner (calendar retained for advanced search
  results)
- Fix TimelineLinear TypeError that broke calendar
- Bump version to 2.7.0b2
- Set Cache-Control header on CDXJ API response to mark returned CDX as
stale after 1 day
- Add commented out UI values to config.yaml to aid users
- Remove timeline and calendar card borders
- Fix issues with snapshot navigation
- Center search bar and align with buttons
- Make Vue app bfcache-ineligible: By adding an empty unload event
listener, we make pages serving the Vue app ineligible for bfcache,
which prevents unexpected behavior when navigating via the browser's
back/forward buttons.
2022-11-21 12:46:09 -05:00
Ilya Kreymer
1fddec216d
Add ir_ modifier (#759)
* rewrite: add 'ir_' mod to support header only url-rewriting with no content rewriting
* tests: add tests for ir_ to test that content is identical to id_, but Location headers are rewritten with ir_ modifier.
2022-08-31 18:49:45 -07:00
Ilya Kreymer
2ccd8eb2c3
tests run improvements: update from python setup.py test -> tox (#754)
* tests cleanup:
- move test requirements to test_requirements.txt to share between setup.py and tox.ini
- README: update to recommend using 'tox --current-env' for running tests locally
- replaces #741

* test tweaks:
- don't require i18n to import locmanager, instead set flag on load (to avoid breaking tox / pytest)
- don't add werkzeug to test requirements
2022-08-31 16:04:55 -07:00
Ilya Kreymer
c121198183
revisit of redirect optimization: (#753)
- if a revisit is of a redirect (3xx response) and revisit has http headers, return
the http headers with empty payload -- don't bother loading the original record
builds on changes in #751
- cleanup redirect revisit tests from #751
2022-08-20 13:53:16 -07:00
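The header-selection rule from #751 and the redirect shortcut from #753 can be combined into one small decision function. This is a minimal sketch, not pywb's code. The `resolve_revisit()` name, the plain-dict headers, and the `load_original` callback are hypothetical stand-ins for pywb's record loading.

```python
from typing import Callable, Optional, Tuple

Headers = dict
Loader = Callable[[], Tuple[Headers, bytes]]


# Hypothetical helper for illustration; not pywb's record-loading code.
def resolve_revisit(revisit_status: Optional[int],
                    revisit_headers: Optional[Headers],
                    load_original: Loader) -> Tuple[Headers, bytes]:
    """Return (headers, payload) to serve for a revisit record."""
    # #753: a revisit of a 3xx redirect that carries its own HTTP headers
    # is served as-is with an empty payload; the original is never loaded.
    if (revisit_headers is not None and revisit_status is not None
            and 300 <= revisit_status < 400):
        return revisit_headers, b""

    orig_headers, payload = load_original()
    # #751: if the revisit has HTTP headers, always use them;
    # otherwise fall back to the headers of the payload record.
    headers = revisit_headers if revisit_headers is not None else orig_headers
    return headers, payload
```

The callback keeps the expensive original-record lookup lazy, so the redirect case never pays for it.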
Ilya Kreymer
f190190128
Revisit headers load fix (#751)
* revisit loading fix for revisit records with http headers:
- if revisit record has http headers, always use those headers
- otherwise, continue to use http headers from payload record
- parse headers of http and payload records on initial lookup, to simplify loading
- tests: add test for loading revisit records with different urls, different headers but same payload
- fix for sul-dlss/was-pywb#64
* also bump version to 2.6.8
2022-08-18 23:25:38 -07:00
Ilya Kreymer
403167fbe0
User-Agent Detection Fix + New-Style rewriting on by default + Dependency Update (2.6.6) (#708)
* js rewriting: default to modern js-proxy-based rewriting, using legacy rewriting only if the browser is older than the minimum supported version, as suggested in #707
* user-agent detection: use ua_parser for user-agent detection instead of obsolete werkzeug.useragent, which also did not support browsers >=100
* tests: additional tests for rewriting with various user-agents, defaulting to new-style rewriting for unknown browsers
* dockerfile: Update Dockerfile to use py3.8
* tests: skip s3 tests dependent on commoncrawl data (for now, need better s3 tests).
* bump to 2.6.6, update CHANGES
2022-04-11 14:51:11 -07:00
Ilya Kreymer
38b1952d34
live route fix: (#692)
- when 'redirect_to_exact' is enabled, the top frame expects a redirect; however, live mode does not redirect to the top frame, so render the live top frame the same as before
- tests: ensure top-frame loads correctly for live mode with redirect_to_exact enabled
- tests: fix webenact index tests
2022-01-25 19:10:28 -08:00
Ilya Kreymer
c97a66703b
More consistent env var setting / static path fix (#688)
* template/custom env var fix:
- ensure pywb.host_prefix, pywb.app_prefix and pywb.static_prefix set for all requests via prepare_env()
- ensure X-Forwarded-Proto is accounted for in pywb.host_prefix
- call prepare_env() in handle_request(), and also in rewriterapp (in case using a different front-end app).

* update wombat to 3.3.6 (includes partial fix for #684)
* bump version to 2.6.3
2021-12-22 16:15:27 -08:00
Ilya Kreymer
e64e58f040
2.6.2 fix (#682)
2.6.2 release:
* fix for regression caused by 2.6.1, invalid static path #681
* add missing base.css
2021-11-12 17:51:34 -08:00
Ilya Kreymer
98c6fba44d
Support for custom data being added via 'PUT /<coll>/record' when… (#661)
* add support for custom data being added via 'PUT /<coll>/record' when in recording mode and 'enable_put_custom_record: true' set in 'recorder' config
- url specified via 'url' query arg and content type via request Content-Type
- update docs for put custom record options

* bump version to 2.6.0b4
2021-07-18 17:04:34 -07:00
Ilya Kreymer
f7bd84cdac
Localization / doc fixes (#650)
* localization / doc fixes:
- add missing header.html
- docs: support 'i18n' extra, mention in docs
- use 'default_locale' for html lang tag
- access control docs: fix documentation for adding user with acl command

* localization: add compile_catalog after extract as well to simplify updates for identity (en) locale

* ui: 
- include locale in home page collection listing
- keep locale on error page home link

* autoescape:
- ensure jinja2 templates are autoescaped to prevent xss issues (thanks @sebastian-nagel for suggested fix)
- ensure banner inserts are not double-escaped
- update tests for template autoescaping

* update CHANGES.rst

* bump version to 2.6.0b1
2021-06-14 17:09:00 -07:00
Ilya Kreymer
f07d35709a
Access Control Improvements: Embargo + ACL User Support (#642)
* embargo: add support for per-collection date range embargo with embargo options of 'before', 'after', 'newer' and 'older'
'before' and 'after' accept a timestamp
'newer' and 'older' options configured with a dictionary consisting of any combo of 'years', 'months', 'days'
add basic test for each embargo option

* acl/embargo work:
- support acl access value 'allow_ignore_embargo' for overriding embargo
- support 'user' in acl setting, matched with value of 'X-Pywb-ACL-User' header
- support passing through 'X-Pywb-ACL-User' setting to warcserver
- aclmanager: support -u/--user param for adding, removing and matching rules
- tests: add test for 'allow_ignore_embargo', user-specific acl rule matching

* docs: add docs for new embargo system!

* docs: add info on how to configure ACL header with short examples to usage page.
sample-deploy: add examples of configuring the X-Pywb-ACL-User header based on client IP for the nginx and Apache sample deployments

* docs: fix access control page header, text tweaks

* bump version to 2.6.0b0
2021-05-18 20:09:18 -07:00
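The four embargo modes can be illustrated with a small, self-contained check. This is a minimal sketch, not pywb's implementation. The helper names (`parse_ts`, `shift_back`, `is_embargoed`), the padding of partial timestamps, and the reading of 'newer'/'older' as ages relative to the current time are all assumptions.

```python
import calendar
from datetime import datetime, timedelta, timezone


# Hypothetical helpers for illustration; not pywb's embargo code.
def parse_ts(ts):
    """Parse a (possibly partial) 14-digit timestamp such as '2021'."""
    # Pad missing fields with the earliest valid value: Jan 1st, 00:00:00.
    return datetime.strptime(ts + "00000101000000"[len(ts):], "%Y%m%d%H%M%S")


def shift_back(now, years=0, months=0, days=0):
    """Return `now` moved back by the given years, months and days."""
    total = now.year * 12 + (now.month - 1) - (years * 12 + months)
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that e.g. March 31st minus one month is Feb 28th/29th.
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day) - timedelta(days=days)


def is_embargoed(capture_ts, rule, now=None):
    """Return True if a capture falls inside the embargo described by `rule`."""
    now = now or datetime.now(timezone.utc).replace(tzinfo=None)
    capture = parse_ts(capture_ts)
    if "before" in rule:  # embargo captures before a fixed timestamp
        return capture < parse_ts(rule["before"])
    if "after" in rule:   # embargo captures after a fixed timestamp
        return capture > parse_ts(rule["after"])
    if "newer" in rule:   # embargo captures younger than a relative age
        return capture > shift_back(now, **rule["newer"])
    if "older" in rule:   # embargo captures older than a relative age
        return capture < shift_back(now, **rule["older"])
    return False
```

For example, with `now` fixed at 2021-05-18, `{"newer": {"years": 1}}` embargoes anything captured after 2020-05-18.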
Ilya Kreymer
abb76911f5
Recorder Pending count (#637)
* recorder: add pending counter (in redis) when using the redis-based dedup system, supports webrecorder/browsertrix#44
2021-04-28 16:10:39 -07:00
Ilya Kreymer
626da99899
POST request handling and indexing improvements (#636)
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion

* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx

* ci: fixes
- tox: run full test suite!
- disable appveyor

* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
2021-04-27 20:52:24 -07:00
Sebastian Nagel
106a9e9200
IndexHandler: report BadRequestException as error while loading index (#625) 2021-04-27 12:47:13 -07:00
Sebastian Nagel
212691bd38
Handle CDXException and respond with HTTP 400 Bad Request (#626)
* FrontendApp: forward HTTP status of CDX backend to allow clients
to handle errors more easily

* Handle CDXExceptions properly, returning the exception status code
- make sure that CDXException is raised early so that it can be handled
  in the IndexHandler
2021-04-26 20:51:33 -07:00
Sebastian Nagel
c62b1bc987
Warcserver / CDXJ API: properly handle unsupported output formats (#623)
- add unit test to verify unknown output formats are handled
  if output fields param is in request
2021-04-26 20:33:37 -07:00
Ilya Kreymer
b475d85c4f tests: fix failing test?
update to latest wombat (3.1.4)
2021-04-26 18:22:43 -07:00
Ilya Kreymer
78a9888b46
Dedup Policy Tests (#613)
* dedup tests: add basic tests for dedup system, continuing from #611
- ensure config merge works correctly
2021-01-26 22:39:52 -08:00
Lukey3332
f628b40e02
Add support for verifying ssl certificates (#596)
* Add support for verifying ssl certificates

Signed-off-by: Lukas Straub <lukasstraub2@web.de>

* Add documentation for new certificate configuration options

Signed-off-by: Lukas Straub <lukasstraub2@web.de>

* Add test to check the verification of ssl certificates

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
2021-01-26 12:41:26 -08:00
Ilya Kreymer
de81efac78
Use Github Actions for CI (#600)
* ci: use gh actions for ci!

* use tox-gh-actions

* add missing tox.ini

* skip proxy tests for now
2020-12-05 20:20:38 -08:00
Ilya Kreymer
9b8c187b3a
2.4.2 Develop->Master (#572)
* ensure that the RemoteCDXIndexSource also adds a 'matchType=' param, fix for ukwa-pywb/ukwa#57

* 2.4.2 fixes:
- cdxindexer: don't treat first param as output, require '-o <output>' instead, update tests
- cleanup: move url-polyfill.min.js to correct static dir, addresses #571
- update to latest wombat
- move logo to ./pywb/static, fix README path
- tests: update indexing tests for cdx-indexer fix
- bump version to 2.4.2
- Fix link in access-control docs to use RST instead of MD syntax (#568) (by @machawk1)
2020-07-10 20:22:58 -07:00
Ilya Kreymer
3c53c2731b
memento timegate: make timegate headers for /<coll>/<url> behave correctly per the Memento spec (#564)
return 404 if not found and return the latest memento header. This is done by performing the actual response lookup,
but then returning the top-frame response if the lookup succeeded. addresses ukwa/ukwa-pywb#58
2020-06-08 13:26:20 -07:00
Ilya Kreymer
7e56ca8ca2
RC7 Fixes (#561)
* misc fixes for 2.4.0rc7:
- warcserver: when parsing headers to check for redirect, reserialized headers
may be of a different length than the original, causing the warcserver->app response to hang;
now adjusting the content-length on the warc record and also not including a fixed
length when serving warcserver->app, possible fix for ukwa/ukwa-pywb#53
- undo change in path resolvers to use os.path.join, just concatenate full_path + filename
- rewrite 'date' -> 'x-orig-archive-date' header to avoid confusion (eg. #548)
- bump version to rc7

* ci: attempt to fix travis build for 27, 35
2020-04-30 22:39:47 -07:00
Ilya Kreymer
92e459bda5
R6 - Various Fixes (#540)
* fixes for RC6:
- blockrecordloader: ensure record stream is closed after parsing one record 
- wrap HttpLoader streams in StreamClosingReader() which should close the connection even if stream not fully consumed
- simplify no_except_close
may help with ukwa/ukwa-pywb#53
- iframe: add allow fullscreen, autoplay
- wombat: update to latest, filter out custom wombat props from getOwnPropertyNames
- rules: add rule for vimeo

* cdx formatting: fix output=text to return plain text / non-cdxj output

* auto fetch fix:
- update to latest wombat to fix auto-fetch in rewriting mode
- fix /proxy-fetch/ endpoint for proxy mode recording, switch proxy-fetch to run in recording mode
- don't use global to allow repeated checks

* rewriter html check: peek 1024 bytes to determine if page is html instead of 128

* fix jinja2 dependency for py2
2020-02-20 21:53:00 -08:00
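The RC6 change to the HTML check can be illustrated with a peek-based sniffer. This is a minimal sketch under stated assumptions: `looks_like_html()` and its tag-matching regex are hypothetical, and pywb's actual detection logic may differ. It shows why the window size matters. Markup that appears after the first 128 bytes, for example behind leading whitespace or a comment, is only detected with the larger window.

```python
import io
import re

PEEK_SIZE = 1024  # window size from the RC6 change (previously 128)

# Hypothetical heuristic: look for an opening doctype/html/head/body tag.
_HTML_RE = re.compile(rb"<\s*(!doctype\s+html|html|head|body)\b", re.IGNORECASE)


def looks_like_html(stream: io.BufferedReader) -> bool:
    """Check the first PEEK_SIZE bytes without consuming them."""
    # peek() may return more or fewer bytes than requested, so slice.
    head = stream.peek(PEEK_SIZE)[:PEEK_SIZE]
    return bool(_HTML_RE.search(head))
```

Because `peek()` does not advance the stream, the rewriter can still read the full body afterwards.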
Ilya Kreymer
fa021eebab
Misc Fixes for RC5 (#534)
* misc fixes (rc 5):
- banner: only auto init banner if not in top-frame (check for no-frame mode and replay url is set)
- index: 'cdx+' fix for use as internal index: if cdx has a warc filename and offset, don't attempt default live web load
- improved self-redirect: avoid www2 -> www redirect altogether, not just for second redirect
- tests: update tests for improved self-redirect checking
- bump version to pywb-2.4.0-rc5
2020-01-17 17:38:08 -08:00
Ilya Kreymer
fb8aa7cbc1
revisit lookup fix (possible fix for ukwa/ukwa-pywb#53) (#530)
- if a revisit record has empty hash, don't attempt to lookup an original, simply use with empty payload
2020-01-11 11:12:31 -08:00
Ilya Kreymer
30680803e8
proxy mode: replay improvements for content not captured via proxy mode (#520)
- if preflight OPTIONS request, respond directly (don't attempt OPTIONS capture lookup)
- if preflight CORS request, ensure response has appropriate CORS headers, even if not captured
- wombat: update to latest wombat with updated Date() fixed timezone in proxy mode
- bump version to 2.4.0rc3
2019-11-12 12:41:04 -08:00
Ilya Kreymer
0d819aadeb
Localization and Banner Update (#517)
* banner: add banner and localization improvements from ukwa branch:
- show 'view all captures' link if not live
- optional logo
- loc options, if available
- banner options set via window.banner_info in banner.html

localization support: 
- add init_loc() to templateview
- loc available if config options set
- tests: add tests for loading localized messages, override .gitignore to allow test messages.mo
2019-11-11 09:51:26 -08:00
Ilya Kreymer
66ac3ca114
config limit: add query_limit config options to specify optional limit for both exact and prefix queries, addresses ukwa/ukwa-pywb#49 (#518) 2019-11-07 10:25:49 -08:00
Ilya Kreymer
6f79840b79
Docs, custom metadata improvements (#509)
* metadata/coll_config: don't confuse user metadata with collection config, don't display collection config settings as metadata (ukwa/ukwa-pywb#47)
- for collection template, add separate 'coll_config' dict, keep user metadata only in 'metadata' dict (default to empty)
- for static collections, assume metadata is in the 'metadata' dict of collection config
- for dynamic collections, load metadata.yaml into 'metadata' dict
- ensure 'metadata' key is passed to frame_insert
- ensure 'metadata' added consistently in framed and non-framed mode
- tests: update tests to ensure metadata is added consistently

- fuzzymatch: don't match 204 OPTIONS responses, update fuzzymatcher test

* documentation
- add documentation for metadata in ui-customization, rebuild docs, 
- add link to ui customization from configuring
- work on access control docs
* fixed small typos in ui-customization.rst
* frontendapp: fix doc string

- misc: remove warning on urllib3 Retry init

- set version to pywb 2.4.0rc0

Co-Authored-By: John Berlin <n0tan3rd@gmail.com>
2019-10-27 01:39:52 +01:00
Ilya Kreymer
dc30c890a6 enable new transclusion system for tests (not enabled by default) 2019-09-11 09:34:57 -07:00
Ilya Kreymer
a3294c8b25 fix exception handling:
- don't rethrow HTTPException from WbException
- catch RequestRedirect to issue 307 redirect, check referrer
- tests: add referrer redirect tests with missing slash
defaults: don't enable new transclusions by default
2019-09-11 09:03:55 -07:00
Ilya Kreymer
e04adea7a8
transclusions/augmentations: add new video/audio transclusions script
- enabled with 'transclusions: 2' (default) config option
- legacy flash-supporting transclusions script (still working) available via 'transclusions: 1' or enable_flash_video_rewrite option
- add transclusions.js with support for poster image
- legacy vidrw: don't add undefined url as source
- localization: wrap text in not_found.html to be translatable
2019-09-03 18:37:15 -04:00
Ilya Kreymer
7ac9a37bb4
acl: support for exact acl rules via '###' suffix
- ex: rule 'com,example)/###' matches http://example.com/ only
- wb-manager acl add/remove --exact-match adds/removes exact-match rules
- tests: add tests for exact match queries, acl
2019-09-03 18:37:14 -04:00
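The '###' convention can be illustrated with a hypothetical `rule_matches()` helper. It assumes that rules without the suffix are plain prefix rules on the SURT-ordered url key. This is a sketch, not the actual pywb ACL matcher.

```python
EXACT_SUFFIX = "###"


# Hypothetical helper for illustration; not pywb's ACL matching code.
def rule_matches(rule_key: str, url_key: str) -> bool:
    """Return True if an ACL rule key applies to a SURT-style url key."""
    if rule_key.endswith(EXACT_SUFFIX):
        # Exact rule: 'com,example)/###' matches only 'com,example)/'.
        return url_key == rule_key[:-len(EXACT_SUFFIX)]
    # Assumed default for this sketch: any other rule is a prefix rule.
    return url_key.startswith(rule_key)
```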
Ilya Kreymer
3589240431
ui template overhaul to simplify customization:
- add base.html template with head, header, footer optional customizations
- refactor all top-level templates to extend base.html, except frame_insert.html
- localization: add placeholder support for jinja2 localization extension, '{% trans %}' and _('') tags, placeholder null localization
- refactor new query UI to support localization
- update some text to match localized versions used in ukwa-pywb, update test
2019-09-03 18:37:14 -04:00
Ilya Kreymer
42b8c3a22b
merge: additional fixes after merge of ukwa/pywb and 2.2
rewrite: remove custom modifiers for now, use oe_ for non-import css embeds
bump version to 2.3.dev0
2019-09-03 18:26:09 -04:00
Ilya Kreymer
54a4e38531
memento 404 fix: ensure timemap only includes memento headers on success 200 response
fuzzy match limit: add 'fuzzy_search_limit' option to default_filters in rules.yaml
default fuzzy matching search limit to 100 results to avoid timeouts for large result sets that don't have any matches
2019-09-03 18:24:01 -04:00
Ilya Kreymer
0a9ad5c8dc
timemap format fix: fixes ukwa-pywb/pywb#37
- ensure timemap returns full url-m; warcserver supports a 'memento_format' param which, if present, specifies the
full format to use for memento links in the timemap
- memento tests: timemap tests include full url-m, test both framed and frameless timemap responses
2019-09-03 18:24:01 -04:00
Ilya Kreymer
5da6122d83
memento timemap fix: further fix for ukwa/ukwa-pywb#37
- fix timemap in 'redirect-to-exact' mode, (ensure timegate redirect condition applies only to top-frame)
- tests: add additional timemap tests, with and without exact redirect
2019-09-03 18:24:00 -04:00
Ilya Kreymer
9b2ae35b93
acl optimization: fixes ukwa/ukwa-pywb#39
- don't parse json on every aclj line until key prefix matches, resulting in speed boost!
- convert aclj to dict (via cdxobject) only when match is found (disable aggregator source tracking)
2019-09-03 18:23:59 -04:00
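The optimization in this commit amounts to doing a cheap string comparison on the key before paying for JSON decoding. A minimal sketch follows. It assumes aclj lines of the form `<urlkey> - <json>` and longest-prefix-wins semantics. The `find_acl_rule()` name is hypothetical, and pywb's real lookup works on a sorted index rather than a linear scan.

```python
import json
from typing import Iterable, Optional


# Hypothetical helper for illustration; not pywb's ACL lookup.
def find_acl_rule(lines: Iterable[str], url_key: str) -> Optional[dict]:
    """Return the longest-prefix matching rule, decoding JSON only once."""
    best_key, best_json = None, None
    for line in lines:
        key, _, rest = line.partition(" - ")
        # Cheap prefix check first; the JSON text is kept as a string.
        if url_key.startswith(key) and (best_key is None or len(key) > len(best_key)):
            best_key, best_json = key, rest
    # json.loads runs only for the winning line, never for non-matches.
    return json.loads(best_json) if best_json is not None else None
```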
Ilya Kreymer
ce0ed610bd
memento-fix: fix for ukwa/ukwa-pywb#37.
- support memento timegate on top-frame (when no timestamp is provided)
- treat top-frame no-timestamp url as canonical timegate
- tests: update tests, add memento redirect mode tests for timegate, timegate with accept-dt header
2019-09-03 18:19:59 -04:00