mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

2326 Commits

Author SHA1 Message Date
Ilya Kreymer
5e9b13e267
proxy mode: don't rewrite xml for ajax requests. Support python 3.8 (#563)
* rewrite:
- don't rewrite xml in proxy mode / html-insert only mode
- ajax: if sec-fetch-mode is set to non-navigate, also treat as 'ajax'

* ci: build python 3.8, ignore 2.7 failures

* reqs: use released ujson for extra_reqs

* hmac: add digestmod, fix for py3.8
2020-06-08 09:40:59 -07:00
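The "hmac: add digestmod" item above reflects a Python 3.8 change: `hmac.new()` no longer falls back to MD5 when `digestmod` is omitted and raises a `TypeError` instead. A minimal sketch of the explicit form follows. The `sign` helper and the choice of SHA-256 are assumptions for illustration; the commit message does not say which digest pywb passes.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC of message (hypothetical helper, not pywb code)."""
    # digestmod is passed explicitly; Python 3.8+ rejects calls that omit it.
    # SHA-256 is an illustrative choice only.
    return hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, signature: str) -> bool:
    """Compare signatures in constant time."""
    return hmac.compare_digest(sign(key, message), signature)
```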
Ilya Kreymer
ed89fcc6f8 rules: update yt rules 2020-06-01 19:06:32 -07:00
Ilya Kreymer
7e56ca8ca2
RC7 Fixes (#561)
* misc fixes for 2.4.0rc7:
- warcserver: when parsing headers to check for redirect, reserialized headers
may be of different length than the original, causing warcserver->app response to hang;
now adjusting the content-length on the warc record and also not including a fixed
length when serving warcserver->app, possible fix for ukwa/ukwa-pywb#53
- undo change in path resolvers to use os.path.join, just concatenate full_path + filename
- rewrite 'date' -> 'x-orig-archive-date' header to avoid confusion (eg. #548)
- bump version to rc7

* ci: attempt to fix travis build for 27, 35
v-2.4.0-rc7
2020-04-30 22:39:47 -07:00
micronn
871a05a76a
proxy mode: respect settings when started from cli (#557) 2020-04-30 22:38:13 -07:00
John Vandenberg
be90e06742
MANIFEST.in: Create (#559)
Fixes https://github.com/webrecorder/pywb/issues/558
2020-04-30 16:21:20 -07:00
thomas536
8f0ce45b27
docs: fix proxy default timestamp yaml example (#544)
Per the code, the key should use an underscore, not a hyphen. It also seems like the value is parsed as a number instead of a string, which then fails with a type error later, so quote it to force it to be a string.

```
$ pywb
2020-03-10 21:06:33,084: [INFO]: Proxy enabled for collection "web"
Traceback (most recent call last):
  File "/tmp/pywb_venv/bin/pywb", line 8, in <module>
    sys.exit(wayback())
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/cli.py", line 20, in wayback
    desc='pywb Wayback Machine Server').run()
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/cli.py", line 89, in __init__
    self.application = self.load()
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/cli.py", line 181, in load
    return FrontEndApp(custom_config=self.extra_config)
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/frontendapp.py", line 79, in __init__
    self.init_proxy(config)
  File "/tmp/pywb_venv/local/lib/python2.7/site-packages/pywb/apps/frontendapp.py", line 569, in init_proxy
    if not self.ALL_DIGITS.match(self.proxy_default_timestamp):
TypeError: expected string or buffer
```
2020-04-30 16:18:44 -07:00
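The traceback above comes from passing a non-string to a compiled regex. A minimal sketch of that failure mode, assuming a digits-only pattern: `ALL_DIGITS` here is a stand-in for the attribute named in the traceback, and `is_digit_timestamp` is a hypothetical helper.

```python
import re

# Stand-in for the ALL_DIGITS pattern named in the traceback (assumed shape).
ALL_DIGITS = re.compile(r"^\d+$")


def is_digit_timestamp(value) -> bool:
    """Return True only for a str made of digits (hypothetical helper)."""
    try:
        return ALL_DIGITS.match(value) is not None
    except TypeError:
        # An unquoted YAML scalar such as 20200310 is loaded as an int,
        # and re cannot match against an int.
        return False
```

Quoting the value in the YAML file makes the loader produce a `str`, so the match succeeds.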
Ivo Branco
8d8cf7eb58
Fix documentation: replace fl to fields on doc webrecorder/pywb#542 (#543) 2020-04-30 16:16:07 -07:00
Daniel Bicho
6b014d05bf
try to remove headers with illegal characters. arquivo/pwa-technologies#774 (#536) 2020-04-30 16:14:04 -07:00
Ilya Kreymer
92e459bda5
R6 - Various Fixes (#540)
* fixes for RC6:
- blockrecordloader: ensure record stream is closed after parsing one record 
- wrap HttpLoader streams in StreamClosingReader() which should close the connection even if stream not fully consumed
- simplify no_except_close
may help with ukwa/ukwa-pywb#53
- iframe: add allow fullscreen, autoplay
- wombat: update to latest, filter out custom wombat props from getOwnPropertyNames
- rules: add rule for vimeo

* cdx formatting: fix output=text to return plain text / non-cdxj output

* auto fetch fix:
- update to latest wombat to fix auto-fetch in rewriting mode
- fix /proxy-fetch/ endpoint for proxy mode recording, switch proxy-fetch to run in recording mode
- don't use global to allow repeated checks

* rewriter html check: peek 1024 bytes to determine if page is html instead of 128

* fix jinja2 dependency for py2
2020-02-20 21:53:00 -08:00
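The "peek 1024 bytes" item above can be sketched as follows. The marker check is an assumed heuristic for illustration, not the rewriter's actual test, and `looks_like_html` is a hypothetical name. The point is that a larger peek window sees past long leading comments or whitespace without consuming the stream.

```python
import io


def looks_like_html(stream: io.BufferedReader, peek_size: int = 1024) -> bool:
    """Guess whether a buffered stream starts with HTML (illustrative heuristic)."""
    # peek() may return more or fewer bytes than requested, so slice explicitly.
    head = stream.peek(peek_size)[:peek_size].lower()
    return b"<html" in head or b"<!doctype html" in head
```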
Ilya Kreymer
fa021eebab
Misc Fixes for RC5 (#534)
* misc fixes (rc 5):
- banner: only auto init banner if not in top-frame (check for no-frame mode and replay url is set)
- index: 'cdx+' fix for use as internal index: if cdx has a warc filename and offset, don't attempt default live web load
- improved self-redirect: avoid www2 -> www redirect altogether, not just for second redirect
- tests: update tests for improved self-redirect checking
- bump version to pywb-2.4.0-rc5
v-2.4.0-rc5
2020-01-17 17:38:08 -08:00
Ilya Kreymer
93ce4f6f7a
Banner fix (#531)
* banner: fix banner display for non-framed and proxy mode replay, ensure new 'View All Captures' ancillary section is also shown

* bump version to 2.4.0rc4
v-2.4.0rc4
2020-01-11 13:05:28 -08:00
Ilya Kreymer
fb8aa7cbc1
revisit lookup fix (possible fix for ukwa/ukwa-pywb#53) (#530)
- if a revisit record has empty hash, don't attempt to lookup an original, simply use with empty payload
2020-01-11 11:12:31 -08:00
Ilya Kreymer
f0b9d5b8e8
Rewriting fix for DASH FB and document.write (#529)
* rewrite fixes:
- dash rewrite fix for fb: when rewriting, match quoted '"dash_prefetched_representation_ids"' as well as w/o quotes,
update tests to ensure rewriting both old and new formats
- wombat update to fix #527: ensure document.write() doesn't accidentally remove end-tag if end-tag was not lowercase (see webrecorder/wombat#21)

* tests: fix recorder cookie filtering test, use https://www.google.com/ for testing

* appveyor: fix appveyor builds
2020-01-11 10:44:49 -08:00
Noah Levitt
523e35d973 fuzzy matching: apply fuzzy match if url prefix and regex match, even if no groups are captured by the regex (#524) 2019-12-20 17:20:45 -08:00
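A minimal sketch of the condition described in the fuzzy matching commit above, assuming a rule is a URL prefix plus a regex. `fuzzy_groups` is a hypothetical helper, not pywb's matcher. A regex with no capture groups yields an empty tuple from `groups()`, which is falsy, so the check has to be on the match object itself.

```python
import re
from typing import List, Optional


def fuzzy_groups(url: str, prefix: str, pattern: str) -> Optional[List[str]]:
    """Return captured groups if the rule applies, else None (hypothetical)."""
    if not url.startswith(prefix):
        return None
    match = re.search(pattern, url)
    if match is None:
        return None
    # An empty list still means the rule applies: the regex matched
    # but defined no capture groups.
    return list(match.groups())
```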
Ilya Kreymer
0be84520ed
index query limit: ensure 'limit' is correctly applied to XmlQueryIndexSource, fixes ukwa/ukwa-pywb#49 (#523) 2019-11-22 12:25:18 -08:00
Ilya Kreymer
30680803e8
proxy mode: replay improvements for content not captured via proxy mode (#520)
- if preflight OPTIONS request, respond directly (don't attempt OPTIONS capture lookup)
- if preflight CORS request, ensure response has appropriate CORS headers, even if not captured
- wombat: update to latest wombat with updated Date() fixed timezone in proxy mode
- bump version to 2.4.0rc3
2019-11-12 12:41:04 -08:00
Ilya Kreymer
c7fdfe72a7
Restrict POST query size (#519)
* indexing: restrict POST body appended to query to 16384, avoid reading very large POST requests on indexing
2019-11-12 12:38:01 -08:00
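A minimal sketch of capping the POST body read during indexing, assuming the 16384 limit is in bytes. `read_post_query` and the constant name are hypothetical, not the indexer's code.

```python
import io

# Limit taken from the commit message; the unit (bytes) is an assumption.
MAX_POST_QUERY_SIZE = 16384


def read_post_query(stream: io.BufferedIOBase, content_length: int) -> bytes:
    """Read at most MAX_POST_QUERY_SIZE bytes of a POST body (hypothetical)."""
    # Clamp so a huge or negative Content-Length never triggers a large read.
    to_read = min(max(content_length, 0), MAX_POST_QUERY_SIZE)
    return stream.read(to_read)
```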
Ilya Kreymer
0d819aadeb
Localization and Banner Update (#517)
* banner: add banner and localization improvements from ukwa branch:
- show 'view all captures' link if not live
- optional logo
- loc options, if available
- banner options set via window.banner_info in banner.html

localization support: 
- add init_loc() to templateview
- loc available if config options set
- tests: add tests for loading localized messages, override .gitignore to allow test messages.mo
v-2.4.0-rc2
2019-11-11 09:51:26 -08:00
Ilya Kreymer
66ac3ca114
config limit: add query_limit config options to specify optional limit for both exact and prefix queries, addresses ukwa/ukwa-pywb#49 (#518) v-2.4.0-rc1 2019-11-07 10:25:49 -08:00
Ilya Kreymer
fe09d9991e
rewrite fix: don't inject checkThis function into every script, now handled by wombat via prototype (#516)
update to latest wombat (includes webrecorder/wombat#19, webrecorder/wombat#18, webrecorder/wombat#17)
2019-11-06 16:55:34 -08:00
mark f beasley
44dcd39c02 UI: tweak query page to be responsive (#515) v-2.4.0-rc0 2019-11-01 15:30:22 -07:00
Ilya Kreymer
02cc7035e8
query: fix query for IE11, don't use ES6 syntax, add URL polyfill (#514) 2019-10-31 17:09:42 -07:00
Yvan
8baa8cbdb7 docs: fix doc typo in BaseWarcServer example (#507) 2019-10-31 17:09:25 -07:00
Ilya Kreymer
fed3263ac6
Docs: Fix access controls and ui customizations docs links (#513)
* docs: ensure docs added to access controls, fix typos

* begin changelist for 2.4.0
2019-10-31 16:56:36 -07:00
Ilya Kreymer
6f79840b79
Docs, custom metadata improvements (#509)
* metadata/coll_config: don't confuse user metadata with collection config, don't display collection config settings as metadata (ukwa/ukwa-pywb#47)
- for collection template, add separate 'coll_config' dict, keep user metadata only in 'metadata' dict (default to empty)
- for static collections, assume metadata is in the 'metadata' dict of collection config
- for dynamic collections, load metadata.yaml into 'metadata' dict
- ensure 'metadata' key is passed to frame_insert
- ensure 'metadata' added consistently in framed and non-framed mode
- tests: update tests to ensure metadata is added consistently

- fuzzymatch: don't match 204 OPTIONS responses, update fuzzymatcher test

* documentation
- add documentation for metadata in ui-customization, rebuild docs, 
- add link to ui customization from configuring
- work on access control docs
* fixed small typos in ui-customization.rst
* frontendapp: fix doc string

- misc: remove warning on urllib3 Retry init

- set version to pywb 2.4.0rc0

Co-Authored-By: John Berlin <n0tan3rd@gmail.com>
2019-10-27 01:39:52 +01:00
John Berlin
35004c1675 Fixed calendar view dropping query parameters by using encodeURIComponent fixes #510 (#512) 2019-10-26 09:25:13 +01:00
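The fix above is in JavaScript, but the underlying problem can be sketched in Python with `urllib.parse.quote`, which plays a similar role to `encodeURIComponent` when called with `safe=""`. The `/coll/` path and `url` parameter name are hypothetical. Without encoding, the `&b=2` in the target URL is parsed as a separate parameter of the outer link and is lost from the target.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_query_link(base: str, target_url: str) -> str:
    """Embed target_url as a single query parameter (hypothetical helper)."""
    # safe="" also escapes "/", "?", "&" and "=", so the target's own query
    # string cannot leak into the outer link's query string.
    return base + "?url=" + quote(target_url, safe="")


target = "http://example.com/?a=1&b=2"
encoded = parse_qs(urlsplit(build_query_link("/coll/", target)).query)
unencoded = parse_qs(urlsplit("/coll/?url=" + target).query)
```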
Ilya Kreymer
59b735ee99
tests: fix all tests for update to webenact, use https when possible for webenact and example page tests (#511) 2019-10-26 09:03:25 +01:00
Ilya Kreymer
9ce324212a
Merge pull request #453 from webrecorder/ukwa-merge
Merge ukwa/pywb changes into mainline!
2019-10-08 14:13:44 -07:00
Ilya Kreymer
dc30c890a6 enable new transclusion system for tests (not enabled by default) v-2.4.0-beta 2019-09-11 09:34:57 -07:00
Ilya Kreymer
2f6fb74ea1 bump version to 2.4.0 2.4.0-beta 2019-09-11 09:17:41 -07:00
Ilya Kreymer
a3294c8b25 fix exception handling:
- don't rethrow HTTPException from WbException
- catch RequestRedirect to issue 307 redirect, check referrer
- tests: add referrer redirect tests with missing slash
defaults: don't enable new transclusions by default
2019-09-11 09:03:55 -07:00
John Berlin
802b9fa4f5
apps:
- frontendapp.py: restored the pulling out of collection route creation into its own function
 - rewriterapp.py: reformatted file and added documentation

 utils:
  - geventserver.py: added documentation
  - wbexception.py: updated documentation
2019-09-10 14:45:05 -04:00
John Berlin
379f7de1ba
manual
- split out the ui customization documentation into its own file ui-customization.rst
 - added initial documentation covering the new template setup to the ui-customization.rst
2019-09-05 18:13:12 -04:00
John Berlin
d6ab31d529
templates:
- migrated proxy templates to use new template setup
2019-09-05 16:41:14 -04:00
John Berlin
5ab97a41c2
templates:
- not_found.html: removed un-needed closing div
2019-09-04 15:39:47 -04:00
John Berlin
69f7f02006
static files:
- re-formatted: default_banner.js, queryWorker.js, search.js, wb_frame.js
2019-09-04 14:59:50 -04:00
John Berlin
ae78a955de
templates
- base.html: removed including the query pages query.css in every page
 - query.html: include query.css in head block
2019-09-04 14:57:09 -04:00
John Berlin
e34606cecb
static files:
- formatted them according to project
 - query.js: ensured correct timestamp to date function is used
templates:
 - head_insert.html: is_framed check is no longer a string it is a boolean, corrected redirect check
tests:
 - test_html_rewriter.py: added missing rewrite modifier test checking i.style containing a background image html encoded
 warcserver:
  - added missing quote_plus import and cleaned up imports
2019-09-04 14:28:54 -04:00
John Berlin
61b6ff21e1
added missing comma to setup.py's tests_require list
removed package.json from project as it is no longer required
removed npm install command from .travis/install.sh
2019-09-04 13:41:56 -04:00
John Berlin
8d98b9111e
added additional code documentation in order to meet the documentation requirements of pywb 2019-09-03 18:40:35 -04:00
John Berlin
9a40d29ac3
added lxml requirements entry to extra_requirments.txt and documented pywb.warcserver.index.indexsource.XmlQueryIndexSource 2019-09-03 18:39:31 -04:00
John Berlin
41c37129c0
documented and cleaned up the aclmanager.py2 2019-09-03 18:37:46 -04:00
John Berlin
1a7fdd0d70
documented and cleaned up the aclmanager.py 2019-09-03 18:37:45 -04:00
Ilya Kreymer
ce10d9af7c
docstrings: add docstrings, remove duplicate call, cleanup ACLManager init 2019-09-03 18:37:45 -04:00
Ilya Kreymer
e04adea7a8
transclusions/augmentations: add new video/audio transclusions script
- enabled with 'transclusions: 2' (default) config option
- legacy flash-supporting transclusions script (still working) available via 'transclusions: 1' or enable_flash_video_rewrite option
- add transclusions.js with support for poster image
- legacy vidrw: don't add undefined url as source
- localization: wrap text in not_found.html to be translatable
2019-09-03 18:37:15 -04:00
Ilya Kreymer
7ac9a37bb4
acl: support for exact acl rules via '###' suffix
- ex: rule 'com,example)/###' matches http://example.com/ only
- wb-manager acl add/remove --exact-match adds/remove exact match rules
- tests: add tests for exact match queries, acl
2019-09-03 18:37:14 -04:00
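A minimal sketch of how a '###' suffix could separate exact rules from prefix rules, assuming both the rule and the URL are already in the SURT-style key form shown in the example above ('com,example)/'). `rule_matches` is a hypothetical helper, not the ACL implementation.

```python
EXACT_SUFFIX = "###"


def rule_matches(rule_key: str, url_key: str) -> bool:
    """Check a URL key against an ACL rule key (hypothetical helper)."""
    if rule_key.endswith(EXACT_SUFFIX):
        # Exact rule: the URL key must equal the rule without its suffix.
        return url_key == rule_key[: -len(EXACT_SUFFIX)]
    # Prefix rule: any URL key under the rule matches.
    return url_key.startswith(rule_key)
```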
Ilya Kreymer
3589240431
ui template overhaul to simplify customization:
- add base.html template with head, header, footer optional customizations
- refactor all top-level templates to extend base.html, except frame_insert.html
- localization: add placeholder support for jinja2 localization extension, '{% trans %}' and _('') tags, placeholder null localization
- refactor new query UI to support localization
- update some text to match localized versions used in ukwa-pywb, update test
2019-09-03 18:37:14 -04:00
Ilya Kreymer
1b0c9c6895
misc fixes from merge:
- xmlqueryindexsource: fix typo, improve tests to be more clear with url encoding
- exceptions: move UpstreamException and AppNotFound to wbexceptions
- docker: ensure sample_archive is added to Dockerfile still
- yaml: use python Loader to support custom interpolation of env vars
- content rewrite: ensure custom exceptions passed up to frontendapp
2019-09-03 18:30:42 -04:00
Ilya Kreymer
42b8c3a22b
merge: additional fixes after merge of ukwa/pywb and 2.2
rewrite: remove custom modifiers for now, use oe_ for non-import css embeds
bump version to 2.3.dev0
2019-09-03 18:26:09 -04:00
Ilya Kreymer
e92b1969e8
xmlindexsource: fix tests for double escaping of query (for ukwa/ukwa-pywb#29) 2019-09-03 18:24:03 -04:00