mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

144 Commits

Author SHA1 Message Date
Ilya Kreymer
39e824cb3a live rewrite proxy: decouple having http/https proxy from recording,
move youtubedl wrapper calls, metadata add calls to live rewrite proxy class for easier extension
closes #141 also improves #136
2015-10-23 11:57:12 -07:00
Ilya Kreymer
c7224ecceb tests: use proxy str directly (improve test cov) 2015-10-23 11:54:16 -07:00
Ilya Kreymer
4ba4521b56 tests: use random port instead of 8080 for cli test to avoid conflicts with running services 2015-10-23 11:53:28 -07:00
Ilya Kreymer
e37636de84 cdxindexer: if latest ujson (with forward slash not-escaping) is available, use that when indexing, closes #140
tests: update indexer CDXJ tests to be order-independent
travis: install ujson for testing
2015-10-22 17:46:05 -07:00
Ilya Kreymer
e249f300e3 tests refactor! init pywb once per module, instead of once per test
refactor common init pattern to server_mock for now (can add fixtures also)
2015-10-14 20:34:46 -07:00
Ilya Kreymer
b612c584de tests: test fixes for windows 2015-10-13 21:36:27 -07:00
Ilya Kreymer
6f7bd8c291 proxy resolvers: add tests for ip-based resolver
cache: default cache returns empty instead of raise KeyError on invalid key, to be consistent with uwsgi
2015-10-11 17:46:12 -07:00
Ilya Kreymer
31912b3bf7 proxy: update tests for new use_banner, use_client_rewrite options, #107 2015-09-09 13:22:32 -07:00
Ilya Kreymer
e1a9334a54 tests: update test to match cdx-convert 2015-08-25 23:06:00 +03:00
Ilya Kreymer
63c6efc851 autocolls test: patch wsgiref not waitress as it is default 2015-07-31 09:26:48 -07:00
Ilya Kreymer
f2a2c86552 tests: proxy check to ensure content-length header is always present in proxy mode 2015-07-30 11:06:44 -07:00
Ilya Kreymer
c2f99d6cfd replay/memento: always include 'Content-Location' in no-redir mode replay (not just for memento timegate), #122 2015-07-19 00:11:25 -07:00
Ilya Kreymer
66f5ad62b3 memento: when redir_to_exact is false, don't redirect latest replay/timegate to current timestamp, but return directly latest capture.
when memento enabled, the timegate now follows memento pattern 2.2 (http://tools.ietf.org/html/rfc7089#section-4.2.2)
also return content-location instead of location, update memento no-redirect tests to match new behavior. closes #122
2015-07-18 23:30:31 -07:00
Ilya Kreymer
080587516b youtube-dl tests: use mock youtube-dl info for tests 2015-06-27 20:46:55 -07:00
Ilya Kreymer
f0359877f0 youtube-dl: remove from dependency, installation is optional. Return 404 if attempting live
proxy of videos and youtube-dl is not available (the only use case).
HTTPParser wrapping logic no longer needed in latest versions
Modify tests to only run if youtube-dl is installed in cases where it is not available #118
2015-06-27 16:11:59 -07:00
Ilya Kreymer
06fcc89de6 readers: support 'content-encoding: deflate' using different zlib decompression options
support default and alt settings for attempting to decompress deflate stream
tests: add tests with httpbin.org/deflate Fixes #115
2015-06-24 13:11:33 -07:00
Ilya Kreymer
7bf8b97cb0 tests: add tests for root collection access, and also a custom handler passed to pywb_init
(a simple redirect handler)
2015-04-17 11:48:50 -07:00
Ilya Kreymer
307809bbe9 live-rewrite-server: switch to 'inverse' frame mode by default,
switch from /rewrite/ to /live/ path, update tests
2015-04-13 13:00:06 -07:00
Ilya Kreymer
f9bd2ba55a jinja template: use shared template in J2Template, init on first use 2015-04-03 10:43:39 -07:00
Ilya Kreymer
4a85869427 cli refactor: use classes in cli to allow custom options
get rid of custom init for live_rewrite_handler, just use create_wb_router()
with custom config for consistent init
2015-04-03 10:43:39 -07:00
Ilya Kreymer
a34607764e manager: validate name on collection init: must start with wordchar and can contain wordchar or - 2015-04-03 01:18:35 -07:00
Ilya Kreymer
8bd6787595 'inverse' framed replay: ensure memento headers point to actual memento in inverse framed replay
add additional test for inverse framed replay, #92
fix framed replay url replace slash
2015-04-01 16:21:44 -07:00
Ilya Kreymer
dd30e3f2a7 refactor: fixes for compat with latest certauth>=1.1.0 2015-03-30 09:38:42 -07:00
Ilya Kreymer
90eee03cdb fixes for windows:
indexing: ensure '/' always written to cdx
autoindex: improved test case, ensure threads exit with join
style: fix long lines
2015-03-25 10:56:53 -07:00
Ilya Kreymer
ec7a29a3ba static paths: ensure consistent renaming of static/default -> static/__pywb for bundled static path 2015-03-23 16:15:37 -07:00
Ilya Kreymer
da7532a1f8 wb-manager: rename 'migrate' to 'cdx-convert' for clarity 2015-03-23 11:05:02 -07:00
Ilya Kreymer
ae363ad368 autoindex and cli: add autoindex to cli with 'wayback -a' option, #81 2015-03-22 23:03:39 -07:00
Ilya Kreymer
e8db31d066 cli: improve wayback cli to take optional port, threads and working dir arguments
switch to waitress as default WSGI server instead of wsgiref
2015-03-22 21:50:56 -07:00
Ilya Kreymer
733642551d manager: support autoindexing! (#91) wb-manager autoindex will use watchdog library to detect creation/updates
to any warc/arc in specified collection or across all and update autoindex cdx
cdx indexing: add --dir-root option to specify custom relative root dir for filenames used in cdx
2015-03-22 17:55:38 -07:00
Ilya Kreymer
b43a7f94f3 manager: add cdx -> cdxj migration tool #80, which will convert all cdxs in a directory to cdxj, removing original files
migration will also recanonicalize the urlkey to surt form
add migration test using non-surt, 9-field cdx (created from samples)
cdxindexer: fix multi warc->multi cdx indexing options
2015-03-19 20:57:33 -07:00
Ilya Kreymer
c5b5c8ee4b manager: fix index path to index.cdxj 2015-03-19 13:41:48 -07:00
Ilya Kreymer
ea460bb0f0 cdxj: support cdx json output from cdx server with output='json' (not yet default)
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names still accepted for query/filtering, however, cdx json will use new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
2015-03-19 13:33:49 -07:00
Ilya Kreymer
fe1c32c8f7 cdxj: support loading cdxj (#76)
cdx obj: allow alt field names to be used (eg. mime, mimetype, m)
(status/statuscode/s) in querying and reading cdx
cdx minimal: (#75) now implies cdxj to avoid more formats
minimal includes digest always and mime when warc/revisit
tests for cdxj loading
indexing optimization: reuse same entry obj for records of same type
2015-03-19 12:36:49 -07:00
Ilya Kreymer
73f24f5a2b manager: fixes for windows: use shutil.move instead of os.rename to allow move to
existing file
tests: reset workdir before deleting temp dir
2015-03-18 13:14:05 -07:00
Ilya Kreymer
bfe590996b auto-config: add support for loading from root ./static/ directory,
available under /static/__shared/ path
default path changed from /static/default -> /static/__pywb/
rename wayback-manager to wb-manager
2015-03-17 19:05:39 -07:00
Ilya Kreymer
4b45e789df templates: ensure shared templates are loaded from root templates/ subdir
manager: add shared templates to templates subdir, not root dir #55 and #74
2015-03-16 19:57:28 -07:00
Ilya Kreymer
2f6780a576 rename for 0.9.0:
rename default templates package from ui/* templates to templates/*
rename default subdirs: warcs -> archive, cdx -> indexes
2015-03-16 18:48:09 -07:00
Ilya Kreymer
19b8650891 manager: templates: add collections manager (#74) commands for adding, removing and listing
available ui templates. Support for both collection and shared templates.
confirmation for overwrite/remove
updated full template list in default_config and added tests
2015-03-16 16:55:06 -07:00
Ilya Kreymer
be5139b635 fix tests for coll listing, #78
config override: when loading from coll-specific config.yaml, resolve
relative paths to that collection, not to root #55
2015-03-15 22:23:08 -07:00
Ilya Kreymer
30454abb6b metadata: add support for user-defined per-collection metadata! #78
metadata stored in wbrequest.user_metadata and available to all templates

collections manager: refactor to use subparsers, add list collections and set metadata commands
update tests for new commands
index template: use user metadata title for collections listing
search template: display all metadata and title, if available
2015-03-15 21:24:15 -07:00
Ilya Kreymer
b417b47835 collections manager: support for merge when adding warc, explicit --index-warcs
option to index and merge instead of reindexing whole dir, #74
additional testing for recursive indexing, index merge
timeutils: add timestamp20_now() function
2015-03-14 14:56:15 -07:00
Ilya Kreymer
759d151551 tests: add test for directory auto collection loader,
collection manager and new 6-field minimal cdx format
2015-03-13 19:53:50 -07:00
Ilya Kreymer
69613a0e25 tests: disable 'invalid config' test as it's no longer applicable, fix default banner to just 'banner.html' 2015-02-25 13:18:32 -08:00
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
9623f95439 memento: add rel="memento" header to timegate as well, improve memento test, clearly differentiate between
timegate redirect and intermediate resource redirect, related to #70
2015-02-16 09:59:03 -08:00
Ilya Kreymer
55426e7619 memento: fix headers to be more consistent for framed replay. when using
frames, outer frame 'mirrors' mementos of the inner frame to be
discoverable by client side memento tools, tracked via #70
2015-01-29 22:27:15 -08:00
Ilya Kreymer
695245d9e8 wburl idn: more complete support for idn urls (#66)
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
Ilya Kreymer
38e3bbbaef templates: add new 'not_found.html' template, which will be called for any missing replay request
instead of default error.html
'not_found_html' settable in the config per collection, as per #65
for not found index query, still use query.html but add condition to check for 0 results
add more query and replay not found
remove unused conditional (for search_view -- always exists)
2015-01-24 12:32:50 -08:00
Ilya Kreymer
4c08a6a064 video work: improved yt handling:
- disable yt using yt api, for forced html/flash, disable on load
- use yt error event to detect error
- better fallback on recorded video
use separate cache for range and video info tracking
fix yt rules query to account for & and ?
2014-12-26 13:02:47 -08:00
Ilya Kreymer
ad5a43db76 replay redirect: ensure no timestamp redirect when range request is
present, alter test to include inexact timestamp
2014-12-23 21:19:39 -08:00