mirror of https://github.com/webrecorder/pywb.git synced 2025-03-30 10:45:31 +02:00

37 Commits

Author SHA1 Message Date
Ilya Kreymer
5d80d2d891 replay: change strip_scheme() to strip_scheme_www() to also strip away www. prefix for self-redirect checking, #73 2015-02-22 22:51:35 -08:00
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
9623f95439 memento: add rel="memento" header to timegate as well, improve memento test, clearly differentiate between
timegate redirect and intermediate resource redirect, related to #70
2015-02-16 09:59:03 -08:00
Ilya Kreymer
afe49a91f4 rewrite: more fixes for IDN #66 - add _do_percent_encode field to wburl itself
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
remove support for returning IRI, as that requires detecting charset, instead just use %-encoded form
and let browser decode. Should address #66

Add rewrite option 'punycode_links_only' (default to false) to skip the %-encoded conversion of host, and just return punycode.

wombat: use getAttribute('href') on <a> tag to get original url, not punycode version

replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
757345d317 replay api: make ReplayView overridable in WBHandler subclass,
allow custom content loader callable
2015-01-29 20:10:41 -08:00
Ilya Kreymer
ad5a43db76 replay redirect: ensure no timestamp redirect when range request is
present, alter test to include inexact timestamp
2014-12-23 21:19:39 -08:00
Ilya Kreymer
181c18a1b8 pep8 pass: fix spacing, line length, issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
51919ed1e7 replay: make range cache available by default in replay_views since it's
inited on first use. remove
separate subclass. 'enable_ranges' can be set to false to disable range
cache altogether
improve tests
2014-12-23 14:34:59 -08:00
Ilya Kreymer
0f2c96879c refactor: split out optional cached replay components into cached_replay,
toggleable via 'enable_cache' in config -- regular replayview does not
need any cache info
move add_range() components to statusandheaders from wbrequestresponse
add 'x-pywb-noredirect' header which disables date related redirect
video replay works w/o cache if supported by frontend (nginx)
2014-12-19 18:40:45 -08:00
Ilya Kreymer
9929737a8e rangecache: don't redirect when using range header, don't cache non-200
responses
2014-11-06 22:14:41 -08:00
Ilya Kreymer
88f553dce7 video work: live rewrite pings proxy with full rewrite, proxies direct
range request
reorg rangecache to support is_range() check, yt-specific logic
(experimental)
wombat: add date override (experimental)
bump tentative version to 0.7.0!
yt replays work with native player! (though still issues remain)
2014-11-04 22:11:25 -08:00
Ilya Kreymer
72aa921ce5 video: work on domain-specific range cache rewrites 2014-11-04 08:44:45 -08:00
Ilya Kreymer
1aac5a9f15 cache: move cache wrappers to separate cache.py in framework from
proxy_resolvers
range cache: add buffering cache for serving range requests, intended
for videos but not only. full response cached in temp file and range
requests served from cache, still experimental
need to add deletion.
youtube_dl: wrap youtube-dl import due to youtube-dl HTMLParser regex
bug
tests: add test for vi_ handler
2014-11-01 15:41:01 -07:00
Ilya Kreymer
d99f7f996c urlrewriter refactor: replace get_abs_url and get_timestamp_url with
get_new_url() which just calls wburl.to_str and applies rewriter prefix
allows creating a new wburl with any component(s) changed
2014-10-19 00:24:00 -07:00
Ilya Kreymer
cede54f0c1 self-redir: remove referrer-based self-redirect check, as it may be
triggered incorrectly during refresh (will need to investigate more if
there's an edge-case to test against)
2014-10-17 08:54:03 -07:00
Ilya Kreymer
ba1e276e2f misc fixes: ensure buffered response is an iterator (no need to explicitly check, check doesn't work in jython)
query_handler: include check for '-' status code for revisits
2014-08-15 14:23:25 -07:00
Ilya Kreymer
75cda15ea4 fix self-redirect check with relative urls in Location 2014-08-06 12:39:48 -07:00
Ilya Kreymer
9e4459ae50 rewrite: remove extra wb_url param from rewrite_content(), the wb_url
will come from the urlrewriter, to get the 'mod'
2014-08-04 22:51:42 -07:00
Ilya Kreymer
8d54153326 refactoring for better extensibility:
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
2014-08-04 22:51:42 -07:00
Ilya Kreymer
160182ec48 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-08-04 22:51:42 -07:00
Ilya Kreymer
8ea7f5d3a0 framed replay: don't use is_timegate to determine frame usage due to potential
ambiguity, memento will need to use the mp_ modifier
2014-07-23 15:31:38 -07:00
Ilya Kreymer
b8a17b7cab refactor webapp: RewriteLiveHandler and WBHandler share a common base class,
SearchPageWbUrlHandler which renders the search page when there is no wburl
move some inits from pywb_init to WBHandler itself
2014-07-21 21:25:10 -07:00
Ilya Kreymer
a2973b04e7 wbrequest: add options dictionary to store misc request options 2014-07-21 14:02:31 -07:00
Ilya Kreymer
fa813bdd19 pep8 cleanup pass 2014-07-20 18:26:16 -07:00
Ilya Kreymer
96fcaab521 live-rewrite-server: add ability to specify http/https proxy for live fetching
(for example, for use with a recording proxy)
2014-07-19 14:43:28 -07:00
Ilya Kreymer
70b7e29b36 pass raw bytes to htmlparser, assuming ascii-compatibility
(todo: add tests for non-ascii compatible encodings)
improved rendering of certain pages, needs more testing

lxml: remove lxml and complexity associated with having the parser,
as it's too unpredictable for older html, does its own decoding.
2014-06-27 19:03:06 -07:00
Ilya Kreymer
88d3e94b36 fixes for pep8, name fixes 2014-06-15 11:57:48 -07:00
Ilya Kreymer
80e80e97d3 replay: support 'framed_replay' option in config for both replay and live rewrite
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!
2014-06-14 18:26:19 -07:00
Ilya Kreymer
41e1809039 update wombat.js (support for write override, fill in WB_wombat_location on new iframe)
disable 307 redirects as FF always displays modal confirmation for these, even for same host
2014-06-11 20:12:05 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
cf119174ea rewrite: for rewriting purposes, use original cdx url, not the request url
(significance if trailing '/' is present)
2014-06-05 14:09:30 -07:00
Ilya Kreymer
46449ac188 rewrite: pass wburl mod to rewriter, so that css/js rewriting
rules may override default content-type (in cases where it is incorrect)
allows for rule based customization (to be added later)
2014-05-05 22:12:45 -07:00
Ilya Kreymer
bfc2e63793 live rewriter: integrate handler with rewrite_live.py module,
clean up css, add unit and integration tests
clean up cli server now known as 'live-rewrite-server', which performs live rewrite using
iframe paradigm
2014-04-09 15:49:55 -07:00
Ilya Kreymer
19f2df4717 refactor:
- move is_identity(), is_embed() to wburl from wbrequest
- add is_mainpage() predicate
- add create_template() to each J2TemplateView to create itself
- add HeadInsertView to create a reusable head insert for
RewriteContent
- add 'mp_' as modifier for frames mode to be used as possible
  modifier with HTMLRewriter
2014-04-09 15:49:55 -07:00
Ilya Kreymer
64eef7063d record reading: better handling of empty arc (or warc) records
for indexing, index empty/invalid length as '-' status code
for reading, serve as 204 no content.
ensure that StatusAndHeaders has a valid statusline when serving
if http content-length is valid, limit stream to that content-length
as well as record content-length (whichever is smaller)
replace content-length when buffering
2014-04-07 17:08:39 -07:00
Ilya Kreymer
b0b0adb043 refactor: rename pywb.core -> pywb.webapp
move perms/test/test_perms_policy -> tests/perms_fixture
for rules file, use single DEFAULT_RULES_FILE import
2014-04-04 10:09:26 -07:00