mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

13 Commits

Author SHA1 Message Date
Sebastian Nagel
106a9e9200
IndexHandler: report BadRequestException as error while loading index (#625) 2021-04-27 12:47:13 -07:00
Sebastian Nagel
c62b1bc987
Warcserver / CDXJ API: properly handle unsupported output formats (#623)
- add unit test to verify unknown output formats are handled
  if output fields param is in request
2021-04-26 20:33:37 -07:00
Ilya Kreymer
92e459bda5
R6 - Various Fixes (#540)
* fixes for RC6:
- blockrecordloader: ensure record stream is closed after parsing one record 
- wrap HttpLoader streams in StreamClosingReader() which should close the connection even if stream not fully consumed
- simplify no_except_close
may help with ukwa/ukwa-pywb#53
- iframe: add allow fullscreen, autoplay
- wombat: update to latest, filter out custom wombat props from getOwnPropertyNames
- rules: add rule for vimeo

* cdx formatting: fix output=text to return plain text / non-cdxj output

* auto fetch fix:
- update to latest wombat to fix auto-fetch in rewriting mode
- fix /proxy-fetch/ endpoint for proxy mode recording, switch proxy-fetch to run in recording mode
- don't use global to allow repeated checks

* rewriter html check: peek 1024 bytes to determine if page is html instead of 128

* fix jinja2 dependency for py2
2020-02-20 21:53:00 -08:00
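The HTML check mentioned in the commit above (peek 1024 bytes instead of 128) could look roughly like the following. This is a minimal sketch and not pywb's actual rewriter code: `looks_like_html`, `PEEK_SIZE`, and the regular expression are hypothetical and chosen only for illustration.

```python
import io
import re

# Hypothetical constant: how many leading bytes to inspect.
PEEK_SIZE = 1024

# Hypothetical heuristic: optional whitespace, then a doctype, a comment,
# or something shaped like an opening tag. Other markup (e.g. XML) also matches.
HTML_START_RX = re.compile(rb'\s*<(?:!doctype\s+html|!--|[a-z][a-z0-9]*[\s>/])', re.I)


def looks_like_html(stream):
    """Guess whether a buffered binary stream starts with HTML.

    Uses peek() so the stream position is not advanced and the caller
    can still read the full body afterwards.
    """
    head = stream.peek(PEEK_SIZE)[:PEEK_SIZE]
    return HTML_START_RX.match(head) is not None


if __name__ == '__main__':
    padded = io.BufferedReader(io.BytesIO(b' ' * 500 + b'<html><body>hi</body></html>'))
    print(looks_like_html(padded))
```

A larger window matters for pages that begin with a long run of whitespace: in the usage above, the first tag starts at byte 500, which a 128-byte window would miss.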
Ilya Kreymer
273b3eec30
warcserver/cdx query: filter improvements (#285)
- pywb.utils.format: add query_to_dict() to convert query string with support for list for certain params
- support multiple values for 'filter' cdx server param (fixes #284)
- pywb.utils.format: add to_bool() to convert string/int to bool (eg. for query args)
- fuzzymatch: add 'allowFuzzy' (default to true) to allow disabling fuzzy matcher
- tests: fuzzymatcher: test disabling fuzzy matcher with allowFuzzy=0
- tests: cdx-server api: add multiple filter tests, with and without fuzzy matching
2018-01-29 15:08:50 -08:00
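The query handling described in the commit above (list support for certain params such as `filter`, and string/int to bool conversion for args such as `allowFuzzy`) can be sketched as follows. This is an illustration only, not the code in `pywb.utils.format`: the `_sketch` function names, the `multi` parameter, and the set of strings treated as false are assumptions.

```python
from urllib.parse import parse_qs


def to_bool_sketch(value):
    """Convert a string or int query value to a bool (hypothetical rules)."""
    if isinstance(value, str):
        return value.strip().lower() not in ('', '0', 'false', 'no', 'off')
    return bool(value)


def query_to_dict_sketch(query, multi=('filter',)):
    """Parse a query string into a dict.

    Params named in `multi` keep all their values as a list;
    every other param keeps only its last value as a string.
    """
    parsed = parse_qs(query, keep_blank_values=True)
    result = {}
    for key, values in parsed.items():
        result[key] = values if key in multi else values[-1]
    return result


if __name__ == '__main__':
    params = query_to_dict_sketch(
        'url=example.com&filter=status:200&filter=mime:text/html&allowFuzzy=0'
    )
    print(params['filter'])
    print(to_bool_sketch(params.get('allowFuzzy', '1')))
```

Keeping single-valued params as plain strings means only the params that are expected to repeat need list handling downstream.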
Ilya Kreymer
459cd706d3 include the collection in Memento Link outputs: (#259)
* include the collection in Memento Link outputs:
- add new cdx 'source-coll' field, storing only the collection
- ensure rel="collection" property included in the TimeMap and Link header
- tests: update all tests to include the 'source-coll' property
- docs: add 'collection provenance' to auto-all collection configuration docs
2017-10-23 15:33:23 -07:00
Ilya Kreymer
33eb4a4ae1 cdx-server/frontendapp refactor: (#237)
frontendapp/warcserver improvements:
- support '/cdx' endpoint for every collection, exposing standard cdx-server api
- remove '-cdx' endpoint in warcserver, redundant with index and frontend /cdx endpoint
- warcserver: simplify paths! support static paths (/A, /B) + dynamic paths (/<path>) on same endpoint
2017-09-06 23:25:30 -07:00
Ilya Kreymer
97182b71b7 refactor:
- merge pywb.urlrewrite -> pywb.rewrite, remove obsolete stuff (rewrite_content.py, rewrite_live.py, dsrules.py)
- move wbrequestresponse -> pywb.apps
- move pywb.webapp.handlers -> pywb.apps.static_handler
- remove pywb.webapp, pywb.framework packages
- disable old header_rewriter, content_rewriter tests
- finish renaming from previous warcserver refactor
- all other tests passing!
2017-05-23 19:08:29 -07:00
Ilya Kreymer
58f39f0558 setup: update to warcio==1.2
add ensure_http_headers=True when reading WARC records
tests: fix pytest warnings, use webtest.TestApp instead of TestApp
2017-04-29 13:47:54 -07:00
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
3a584a1ec3 py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
ea460bb0f0 cdxj: support cdx json output from cdx server with output='json' (not yet default)
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names are still accepted for query/filtering; however, cdx json will use the new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
2015-03-19 13:33:49 -07:00
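The field renaming in the commit above (old names still accepted as input, new names used in cdx json output) can be illustrated with a small mapping. This is a sketch under assumptions, not pywb's implementation: `LEGACY_FIELD_NAMES`, `canonical_field`, and `to_canonical_record` are hypothetical names, and only the three renames listed in the commit message are covered.

```python
import json

# Old field name -> new canonical field name, as listed in the commit message.
LEGACY_FIELD_NAMES = {
    'statuscode': 'status',
    'mimetype': 'mime',
    'original': 'url',
}


def canonical_field(name):
    """Return the canonical name for a field; unknown names pass through."""
    return LEGACY_FIELD_NAMES.get(name, name)


def to_canonical_record(record):
    """Return a copy of a cdx record dict keyed by canonical field names."""
    return {canonical_field(key): value for key, value in record.items()}


if __name__ == '__main__':
    old_style = {
        'original': 'http://example.com/',
        'statuscode': '200',
        'mimetype': 'text/html',
    }
    print(json.dumps(to_canonical_record(old_style), sort_keys=True))
```

Normalizing names at the input boundary lets queries written against the old names keep working while all output uses one vocabulary.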
Ilya Kreymer
80f2da9548 refactor: move configs/config.yaml to root again
remove cdx-server specific config, instead make cdx server api-only
path configurable from regular config
2014-04-02 21:26:53 -07:00
Ilya Kreymer
0bf651c2e3 add cdx_server app!
port wsgi cdx server tests to test new app!
move base handlers to basehandlers in framework pkg
(remove werkzeug dependency)
2014-03-02 23:41:44 -08:00