1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

72 Commits

Author SHA1 Message Date
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
3a584a1ec3 py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
d98c1f6cf7 memento/api: add a new /collinfo.json end-point, enabled with 'enable_coll_info' config setting, which returns
the value fo collinfo.json template. Default template returns an entry for each handler route,
including the route path (id), title (name) and memento timegate and timemap paths, to be used with
an aggregator. Using a custom 'info_json' template can specify a different collinfo template, alternative to #69 (local aggregation)
Closes #146
2015-11-04 15:36:44 -08:00
Ilya Kreymer
e249f300e3 tests refactor! init pywb once per module, instead of once per test
refactor common init pattern to server_mock for now (can add fixtures also)
2015-10-14 20:34:46 -07:00
Ilya Kreymer
c2f99d6cfd replay/memento: always include 'Content-Location' for in no-redir mode replay (not just for memento timegate), #122 2015-07-19 00:11:25 -07:00
Ilya Kreymer
ea460bb0f0 cdxj: support cdx json output from cdx server with output='json' (not yet default)
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names still accept for query/filtering, however, cdx json will use new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
2015-03-19 13:33:49 -07:00
Ilya Kreymer
fe1c32c8f7 cdxj: support loading cdxj (#76)
cdx obj: allow alt field names to be used (eg. mime, mimetype, m)
(status/statuscode/s) in querying and reading cdx
cdx minimal: (#75) now implies cdxj to avoid more formats
minimal includes digest always and mime when warc/revisit
tests for cdxj loading
indexing optimization: reuse same entry obj for records of same type
2015-03-19 12:36:49 -07:00
Ilya Kreymer
69613a0e25 tests: disable 'invalid config' test as its no longer applicable, fix default banner to just 'banner.html' 2015-02-25 13:18:32 -08:00
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
695245d9e8 wburl idn: more complete support for idn urls (#66)
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
Ilya Kreymer
38e3bbbaef templates: add new 'not_found.html' template, which will be called for any missing replay request
instead of default error.html
'not_found_html' settable in the config per collection, as per #65
for not found index query, still use query.html but add condition to check for 0 results
add more query and replay not found
remove unused conditional (for search_view -- always exists)
2015-01-24 12:32:50 -08:00
Ilya Kreymer
ad5a43db76 replay redirect: ensure no timestamp redirect when range request is
present, alter test to include inexact timestamp
2014-12-23 21:19:39 -08:00
Ilya Kreymer
51919ed1e7 replay: make range cache available by default in replay_views since its
inited on first use. remove
separate subclass. 'enable_ranges' can be set to false to disable range
cache altogether
improve tests
2014-12-23 14:34:59 -08:00
Ilya Kreymer
c28304fd90 tests rangecache: added integration tests for range support via range
cache, using enable_cache option
2014-12-23 11:09:19 -08:00
Ilya Kreymer
5b9dcba15f video: add video rewriting use vidrw client side and youtube-dl on the server
add vi_ modifier:
-on record, gets video_info from youtube-dl, sends to proxy,
if any, via PUTMETA to create metadata record
-on playback, fetches special metadata record with video info and
returns to client as json
-vidrw script: fetches video info, if any, and attempts to replace
iframe and embed tags (so far) which are videos
wombat: export extract_url function, fix spaces and use object instance
semantics
2014-11-01 15:41:00 -07:00
Ilya Kreymer
4a1cc46fa3 framed replay: invert framed replay paradigm, replay always uses
canonical, no-modifier archival url (instead of mp_).
When using frames, the page redirects to a 'tf_' page, which then uses
replaceHistory() to change url back to canonical form.
memento: support for framed replay, include memento headers in top frame
bump version to 0.6.2
2014-10-18 11:21:07 -07:00
Ilya Kreymer
cede54f0c1 self-redir: remove referrer-based self-redirect check, as it may be
triggered incorrectly during refresh.. (will need to investigate more if
there's an edge-case to test against)
2014-10-17 08:54:03 -07:00
Ilya Kreymer
eaaefbfd24 * config cleanup: remove 'hostpaths' setting entirely, avoiding the need to specify host on which pywb
will run (this was cumbersome to maintain and not really useful)
ReferRedirect just checks that the current request host header, if present, matches that of the referrer
and checks that the coll and script name match.
* removed proxy_pac as it was also unneeded/unused and required use of the hostpaths
* added test for invalid CONNECT usage (405 response)
2014-08-20 02:02:47 -04:00
Ilya Kreymer
c3c7935546 Merge branch '0.5.4-work' into develop 2014-08-06 13:22:08 -07:00
Ilya Kreymer
501c942a6f tests: add test for rel self-redirect 2014-08-06 13:19:52 -07:00
Ilya Kreymer
1cd82c1bc4 proxy: move test to seperate file
cert: create seperate get_wildcard_cert for clarity
2014-08-06 12:39:06 -07:00
Ilya Kreymer
6e6688beb3 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 22:51:43 -07:00
Ilya Kreymer
ef8d910d01 banner: remove client side 'capture_str' formatting, just output wbinfo.timestamp,
allow js to format as needed, also helps with #41
update tests to only look at timestamp
2014-08-04 22:51:42 -07:00
Ilya Kreymer
8d54153326 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 22:51:42 -07:00
Ilya Kreymer
160182ec48 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-08-04 22:51:42 -07:00
Ilya Kreymer
a2d86fa495 Merge branch 'develop' into https-proxy 2014-08-04 22:01:16 -07:00
Ilya Kreymer
e1e8f679b2 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 21:59:46 -07:00
Ilya Kreymer
924f71a4cc Merge branch 'develop' into https-proxy 2014-08-04 18:44:01 -07:00
Ilya Kreymer
86bc2f17ba banner: remove client side 'capture_str' formatting, just output wbinfo.timestamp,
allow js to format as needed, also helps with #41
update tests to only look at timestamp
2014-08-04 18:19:28 -07:00
Ilya Kreymer
492aaa4a01 Merge branch 'develop' into https-proxy 2014-08-04 13:00:25 -07:00
Ilya Kreymer
95028ab692 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 01:18:46 -07:00
Ilya Kreymer
2ca4757599 fix integration test for proxy_pac 2014-07-31 18:03:18 -07:00
Ilya Kreymer
b92eda77f6 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-07-29 12:20:22 -07:00
Ilya Kreymer
7c57345363 proxy: add 'unaltered_replay' option to proxy_options to replay
all content unaltered (no rewriting html, no banner, no wombat)
use 'proxy_options' instead of 'routing_options', add additional
tests for proxy mode
2014-07-21 16:42:14 -07:00
Ilya Kreymer
e4297ddabe tests: add integration tests for $liveweb rewrite handler and replay
with fallback
2014-07-20 18:25:47 -07:00
Ilya Kreymer
1317b2b10f route selection via proxy auth!
refactor poute request parsing to happen in the actual router class instead of in the route
in proxy mode, add support for picking a route via proxy-auth
improve test for 'top' rewriting
2014-07-10 21:54:23 -07:00
Ilya Kreymer
fb07775d38 tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
missing/non-existant warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31 warc: simplify recordloader a bit more, only response and request records
get parsed as http (excluding dns: and whois: uris)
All others have an '-' status and no headers parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
Ilya Kreymer
80e80e97d3 replay: support 'framed_replay' option in config for both replay and live rewrite
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!
2014-06-14 18:26:19 -07:00
Ilya Kreymer
0d3f663ef1 rewrite: disable refer-redirect in case of POST, handle request w/o redirect
(can't use 307 because of FF)
2014-06-13 16:23:11 -07:00
Ilya Kreymer
41e1809039 update wombat.js (support for write override, fill in WB_wombat_location on new iframe)
disable 307 redirects as FF always displays modal confirmation for these, even for same host
2014-06-11 20:12:05 -07:00
Ilya Kreymer
0c9d88f032 POST replay: treat POST form data same as get query, no '&&&' marker
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
f9710d033c fix integration test for 307
update head_insert for new wombat
remove redundant host jinja func, use 'urlsplit' instead
2014-05-30 11:17:12 -07:00
Ilya Kreymer
923421d637 rewrite_content: add a few tests for cs_, js_, remove redundant except 2014-05-16 22:43:53 -07:00
Ilya Kreymer
2600d870d7 improved test: dsrules remove redundant check
static: check invalid static paths and file_wrapper
memento: check non-memento paths
test debug handlers and custom '-cdx' suffix
2014-05-16 22:17:51 -07:00
Ilya Kreymer
ca33287051 test: move non-surt-cdx sample to non-surt-cdx/ dir for clarity / avoid confusion
when bulk loading cdx/ dir (surt and non-surt cdx should NOT be mixed)
2014-05-16 21:21:14 -07:00
Ilya Kreymer
7d236af7d7 cdx: fix creation and add test for non-surt cdx (pywb-nonsurt/ test)
archiveindexer: -u option to generate non-surt cdx
tests: full test coverage for cdxdomainspecific (fuzzy and custom canon)
2014-05-16 21:16:50 -07:00
Ilya Kreymer
b0b0adb043 refactor: rename pywb.core -> pywb.webapp
move perms/test/test_perms_policy -> tests/perms_fixture
for rules file, use single DEFAULT_RULES_FILE import
2014-04-04 10:09:26 -07:00
Ilya Kreymer
91184426b7 test coverage pass:
refactor and cleanup to improve coverage for corner cases
2014-04-02 13:16:54 -07:00