Ilya Kreymer
990af5ee79
rewrite: add extra test for rewriting html with <script> tag that's never closed
2015-03-31 23:30:56 -07:00
Ilya Kreymer
199f552f73
rewrite: if no charset specified, attempt to read first 1024 bytes and set charset in header,
to avoid charset warning if head insert exceeds 1024 bytes (#86)
also encode head insert with detected charset, if possible
chunkeddatareader: add read() function to ensure read will read up to specified
length across chunks
2015-03-31 22:38:20 -07:00
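The charset sniffing described in 199f552f73 can be sketched roughly as below. This is a minimal illustration, not pywb's code: the helper names `detect_charset()` and `add_charset()` and the `<meta>` regex are assumptions made for this example.

```python
import re

# Hypothetical pattern: find a charset declared in a <meta> tag, covering both
# <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)

def detect_charset(buff, limit=1024):
    """Return the charset declared in the first `limit` bytes, or None."""
    m = META_CHARSET.search(buff[:limit])
    return m.group(1).decode('ascii').lower() if m else None

def add_charset(content_type, buff):
    """Append a detected charset to a Content-Type value that lacks one."""
    if 'charset=' in content_type.lower():
        return content_type
    charset = detect_charset(buff)
    return f'{content_type}; charset={charset}' if charset else content_type
```

Only the first 1024 bytes are inspected, so a declaration pushed further down the document (for example by a large head insert) is not found.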
Ilya Kreymer
30ab27bb1c
indexing: support indexing (and even replay of) records where target-uri is a 'urn:' identifier (#91)
for canonicalization, treat urns as is, already canonical
for wburl, don't add http:// prefix if urn: prefix is present
add example-wpull warc for testing
2015-03-30 17:23:50 -07:00
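A rough sketch of the two urn rules from 30ab27bb1c: skip canonicalization for `urn:` ids and never prepend `http://` to them. The function names are hypothetical, and `canonical_key()` is a toy SURT-style key, far simpler than the canonicalizer pywb actually uses.

```python
from urllib.parse import urlsplit

def add_default_scheme(url):
    """Prefix http:// unless the url already has a scheme or is a urn: id."""
    if url.startswith('urn:') or '://' in url:
        return url
    return 'http://' + url

def canonical_key(url):
    """Toy SURT-style key (reversed host + path); urn: ids pass through."""
    if url.startswith('urn:'):
        return url  # treated as already canonical
    parts = urlsplit(add_default_scheme(url))
    host = parts.hostname or ''  # hostname is lowercased by urlsplit
    if host.startswith('www.'):
        host = host[4:]
    key = ','.join(reversed(host.split('.'))) + ')' + (parts.path or '/')
    if parts.query:
        key += '?' + parts.query
    return key
```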
Ilya Kreymer
ec7a29a3ba
static paths: ensure consistent renaming of static/default -> static/__pywb for bundled static path
2015-03-23 16:15:37 -07:00
Ilya Kreymer
4aa6512b05
rewrite: fix WbUrl parsing for urls that start with a digit, e.g. 1234.example.com
split latest replay url from timestamped replay regex
add additional rewrite tests
2015-03-23 15:38:10 -07:00
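Why splitting the timestamped regex from latest replay helps with urls like 1234.example.com can be sketched as follows. The pattern and `parse_wburl()` are hypothetical, not pywb's actual WbUrl regexes: a timestamp match requires the digits (plus an optional modifier such as `id_`) to be followed directly by `/`, and anything else falls through to latest replay.

```python
import re

# Hypothetical pattern: 1-14 digit timestamp, optional two-letter modifier
# like "id_", then a slash and the target url.
TIMESTAMP_RX = re.compile(r'^(\d{1,14})([a-z]{2}_)?/(.+)$')

def parse_wburl(path):
    """Return (timestamp, modifier, url); no timestamp means latest replay."""
    m = TIMESTAMP_RX.match(path)
    if m:
        return m.group(1), m.group(2) or '', m.group(3)
    return '', '', path
```

A path made only of digits followed by `/` (e.g. `1234/foo`) stays ambiguous under this sketch and is read as a timestamp.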
Ilya Kreymer
6acac67d3c
rewrite: fix js rewrite again to ensure '// comments' are not rewritten as scheme-rel urls
add tests
2015-03-23 11:49:24 -07:00
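The js regex behaviour from this commit and from aa427bd6d0 (just below) can be sketched like this: match only the start of a url, just prepend the prefix, and accept a scheme-relative `//host` only right after a quote so `// comments` are left alone. The regex and `rewrite_js()` are assumptions for illustration, not pywb's rewriter.

```python
import re

# Hypothetical pattern: absolute http(s) url start, or a scheme-relative
# '//host' only when it directly follows a quote character.
URL_START_RX = re.compile(r'https?://[\w.-]+|(?<=["\'])//[\w.-]+')

def rewrite_js(text, prefix):
    """Prepend prefix to each matched url start; the rest of the url is untouched."""
    return URL_START_RX.sub(lambda m: prefix + m.group(0), text)
```

Because only scheme and host are matched, any odd characters later in the url never need to be handled by the regex.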
Ilya Kreymer
aa427bd6d0
rewrite: js regex: fix js rewrite regex to only match beginning of url for rewriting, since
rewrite is just adding a prefix for abs urls in the js use case (avoids dealing with any invalid chars that
may occur later in url)
2015-03-21 13:58:36 -07:00
Ilya Kreymer
ea460bb0f0
cdxj: support cdx json output from cdx server with output='json' (not yet default)
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names are still accepted for query/filtering, however, cdx json will use new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
2015-03-19 13:33:49 -07:00
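A minimal sketch of the field renaming in ea460bb0f0, assuming a cdx line is already split into field names and values. The helper names are hypothetical; the same `canonical_field()` mapping could also be applied to query/filter field names so old names keep working.

```python
import json

# Old -> new canonical cdx field names, as listed in the commit message.
FIELD_RENAMES = {'statuscode': 'status', 'mimetype': 'mime', 'original': 'url'}

def canonical_field(name):
    """Map an old field name to its new name; other names pass through."""
    return FIELD_RENAMES.get(name, name)

def cdx_to_json(fields, values):
    """Serialize one cdx line as a JSON object using the new field names."""
    return json.dumps({canonical_field(f): v for f, v in zip(fields, values)})
```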
Ilya Kreymer
24021fcd57
html rewrite: add trailing slash for <base> tag rewrite if url is a scheme://host
with no path component #77
cleanup: remove unused code path for tags with no rewriting -- all tags
now checked for dynamic attrs which may need rewriting
update tests, including live rewrite test dependent on live site (FB)
2015-03-13 10:53:57 -07:00
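The `<base>` fix in 24021fcd57 amounts to appending `/` when the url has a scheme and host but no path. A sketch with a hypothetical helper name:

```python
from urllib.parse import urlsplit

def ensure_base_slash(url):
    """Add a trailing slash to a bare scheme://host url (no path, query or fragment)."""
    parts = urlsplit(url)
    if parts.scheme and parts.netloc and not (parts.path or parts.query or parts.fragment):
        return url + '/'
    return url
```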
Ilya Kreymer
f2d7bd074a
bump version to 0.8.3
cookie rewrite: remove 'secure' flag if present
2015-03-05 16:18:56 -08:00
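Removing the `secure` flag from a Set-Cookie value, as in f2d7bd074a, can be sketched with plain string handling. This is an illustration only; the function name is hypothetical and pywb's cookie rewriter may parse cookies differently.

```python
def remove_secure_flag(set_cookie):
    """Drop the 'secure' attribute (any case) from a Set-Cookie header value."""
    parts = [p.strip() for p in set_cookie.split(';')]
    return '; '.join(p for p in parts if p and p.lower() != 'secure')
```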
Ilya Kreymer
8d52be4c44
live proxy: enable ssl validation for live proxy, was initially disabled for testing, should be on by default!
2015-02-20 13:22:21 -08:00
Ilya Kreymer
80dcb6ff27
rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
c4d5dd4690
rewrite: optimize / sanity, only %-encode urls that are actually idna-encoded,
otherwise return as is, #66
2015-02-15 10:34:56 -08:00
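The IDN handling across #66 (punycode internally, utf-8 %-encoded output, and %-encoding only hosts that are actually idna-encoded) can be sketched with Python's built-in `idna` codec (IDNA 2003) and `urllib.parse.quote`. The function names are hypothetical; pywb's WbUrl code is not reproduced here.

```python
from urllib.parse import quote

def to_punycode_host(host):
    """Encode a unicode hostname to its ascii (punycode) form."""
    return host.encode('idna').decode('ascii')

def to_percent_host(host):
    """Return the utf-8 %-encoded form of a punycode host.

    Hosts with no 'xn--' label are not idna-encoded and are returned as is.
    """
    if 'xn--' not in host:
        return host
    unicode_host = host.encode('ascii').decode('idna')
    return quote(unicode_host, safe='.-')
```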
Ilya Kreymer
afe49a91f4
rewrite: more fixes for IDN #66 - add _do_percent_encode field to wburl itself
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e
rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
remove support for returning IRI, as that requires detecting charset, instead just use %-encoded form
and let browser decode. Should address #66
Add rewrite option 'punycode_links_only' (defaults to false) to skip the %-encoded conversion of host, and just return punycode.
wombat: use getAttribute('href') on <a> tag to get original url, not punycode version
replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
79cfdd6a08
framework/urlrewriter: allow overriding UrlRewriter with optional urlrewriter_class param,
easier to override create_rebased_rewriter() with custom rewriter as well
2015-02-12 10:34:04 -08:00
Ilya Kreymer
0b72bfe911
add 'none' js regex rewriter, which does not rewrite urls or location regexes
add test for none rewriter in test rule
2015-02-11 15:01:29 -08:00
Ilya Kreymer
78bd89b4cb
rewrite: simplify deprefix, url already unquoted now so remove extra unquote
2015-02-11 14:28:45 -08:00
Ilya Kreymer
4e7f95081f
url_rewriter: catch exception when encoding to utf-8, may not be properly encoded, in which
case treat as bytes
2015-02-10 15:05:15 -08:00
Ilya Kreymer
78ae86b6b6
Merge branch 'master' for 0.7.8 into develop
2015-02-05 08:45:55 -08:00
Ilya Kreymer
cc144fdead
rewrite: add basic test for X-Forwarded-Proto #57
2015-02-04 21:44:18 -08:00
Ilya Kreymer
78812c8085
rewrite: more conservative change, only rewrite the X-Forwarded-Proto
header for now, #57
2015-02-04 15:17:23 -08:00
Ilya Kreymer
cdb3dcc3d2
rewrite_live: don't forward via or https_x headers, only standard (for
now) possible fix for #57
2015-02-04 14:19:37 -08:00
Ilya Kreymer
55426e7619
memento: fix headers to be more consistent for framed replay. when using
frames, the outer frame 'mirrors' mementos of the inner frame to be
discoverable by client side memento tools, tracked via #70
2015-01-29 22:27:15 -08:00
Ilya Kreymer
7e017fd85e
rewrite fixes: don't rewrite window.parent as it is overridable directly
html rewriter: ensure style is rewritten for all elements, add test!
wombat: cleanup and additional checks for assign(), setAttribute()
2015-01-29 20:08:00 -08:00
Ilya Kreymer
ccedb2d60e
regex_rewrite: add 'parent' rewrite in addition to 'top' for frames, add
WB_wombat_parent to wombat, add test for WB_wombat_parent
2015-01-27 19:57:56 -08:00
Ilya Kreymer
695245d9e8
wburl idn: more complete support for idn urls (#66)
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
Ilya Kreymer
edff3f17fb
wburl: convert %-encoded hostnames or unicode urls to punycode for
better IDN support (#66)
2015-01-26 11:07:58 -08:00
Ilya Kreymer
ac525b0937
tests: add tests for extract_post_query()
add test for HttpsUrlRewriter, remove unnecessary check in
bufferedreader
2015-01-11 23:54:29 -08:00
Ilya Kreymer
cf0a21509b
loaders: add to_file_url() for converting between filename and file://,
used in live rewrite and tests
2015-01-11 13:05:48 -08:00
Ilya Kreymer
7f52ecdca9
tests: fix indexing test, remove extra space/print
2015-01-10 15:36:53 -08:00
Ilya Kreymer
1eb0f96f92
windows support work: fix loaders to use pathname2url to convert to
file:/// url, use urlopen to open file paths
fix some tests to use universal line breaks
2015-01-10 14:06:15 -08:00
Ilya Kreymer
205aeca4a1
bump version to 0.7.3
rewrite: add additional tags for client side src rewrite, add missing
tags to server-side html rewrite
2015-01-04 17:32:58 -08:00
Ilya Kreymer
d9c5345d3c
rewrite: add support for Cookie request header rewrite to support sites
which require a cookie to be set. req_cookie_rewrite directive can be
set in rules.yaml per url prefix with a list of match/replace regexs
2015-01-03 12:51:09 -08:00
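The `req_cookie_rewrite` directive in d9c5345d3c applies a list of match/replace regexes to the Cookie request header for a given url prefix. A sketch under assumptions: the `(prefix, [(match, replace), ...])` rule structure and plain string-prefix matching are hypothetical, not the actual rules.yaml schema or pywb's matching logic.

```python
import re

def rewrite_req_cookie(url, cookie, rules):
    """Apply the (match, replace) regexes of the first rule whose prefix matches url."""
    for prefix, subs in rules:
        if url.startswith(prefix):
            for match, replace in subs:
                cookie = re.sub(match, replace, cookie)
            break
    return cookie
```

Usage: with `rules = [('http://example.com/', [(r'sessionid=[^;]*', 'sessionid=archived')])]`, a request to `http://example.com/page` with `a=1; sessionid=xyz` would be sent as `a=1; sessionid=archived`.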
Ilya Kreymer
a76bf79b83
html_rewriter: add explicit <video>, <audio> tags to html_rewriter tag
list
2014-12-26 18:15:49 -08:00
Ilya Kreymer
ffb702ce03
rewrite: content detection for specific case: if content type is html and mod type is css
or js, peek stream to determine actual type. Addresses #31 in part.
Fix typo in wb_frame.js
2014-12-26 13:08:35 -08:00
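The content detection in ffb702ce03 peeks at the stream when the Content-Type says html but the request asked for css or js. A heuristic sketch: the `cs_`/`js_` modifier names, `resolve_type()`, and the "starts with `<`" test are assumptions for illustration. `peek()` leaves the bytes in the buffer, so the body can still be read in full afterwards.

```python
import io

JS_MOD, CSS_MOD = 'js_', 'cs_'  # assumed modifier names

def resolve_type(stream, content_type, mod, peek_size=128):
    """Decide whether an html-labelled response requested as css/js is really html.

    stream must support peek() (e.g. io.BufferedReader) so no bytes are consumed.
    """
    if 'html' not in content_type or mod not in (JS_MOD, CSS_MOD):
        return content_type
    head = stream.peek(peek_size)[:peek_size]
    if head.lstrip().startswith(b'<'):
        return content_type  # looks like markup: keep html
    return 'text/css' if mod == CSS_MOD else 'application/javascript'
```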
Ilya Kreymer
8f57ce622d
Improved top rewriting, addressing #54
2014-12-26 13:06:33 -08:00
Ilya Kreymer
181c18a1b8
pep8 pass: fix spacing and line length issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
a8b4041716
live rewrite: proxy setup refactor: ignore_proxy flag, pass proxy during constructor only
2014-12-22 21:58:07 -08:00
Ilya Kreymer
b54e4c1c06
tests: add more tests for cookie, html and rewrite_live csrf
2014-12-22 20:34:18 -08:00
Ilya Kreymer
ab087afa4e
Merge branch 'develop' into video, JS rewriter refactoring
2014-12-07 21:11:20 -08:00
Ilya Kreymer
5a11714b41
rewrite: refactor JS rewriters into separate mixins, allowing for
link only, location only, and link + location JS rewriters.
location-only rewriter is new
js_rewrite_location options: all, location, urls (for now)
2014-12-07 21:09:37 -08:00
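The mixin split in 5a11714b41 can be sketched with cooperative `super()` calls, where each mixin contributes its regex rules. All class names, the `/pywb/` prefix, and the option-to-class mapping are hypothetical; `WB_wombat_location` stands in for the wombat-managed object.

```python
import re

class BaseJSRewriter:
    """Applies the (regex, replacement) rules collected from the mixins."""
    def get_rules(self):
        return []

    def rewrite(self, text):
        for rx, repl in self.get_rules():
            text = re.sub(rx, repl, text)
        return text

class LinkRewriterMixin:
    """Prefix absolute http(s) urls with a hypothetical replay prefix."""
    def get_rules(self):
        return super().get_rules() + [(r'https?://', r'/pywb/\g<0>')]

class LocationRewriterMixin:
    """Redirect bare 'location' references to a wombat-managed object."""
    def get_rules(self):
        return super().get_rules() + [(r'\blocation\b', 'WB_wombat_location')]

class JSLinkOnlyRewriter(LinkRewriterMixin, BaseJSRewriter):
    pass

class JSLocationOnlyRewriter(LocationRewriterMixin, BaseJSRewriter):
    pass

class JSLinkAndLocationRewriter(LinkRewriterMixin, LocationRewriterMixin, BaseJSRewriter):
    pass

# Assumed mapping of the js_rewrite_location option values to rewriters.
JS_REWRITERS = {
    'all': JSLinkAndLocationRewriter,
    'location': JSLocationOnlyRewriter,
    'urls': JSLinkOnlyRewriter,
}
```

With this layout a new combination needs only a new class listing the mixins; the method resolution order chains their rules together.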
Ilya Kreymer
7e36ad29e7
Merge branch 'develop' 0.6.6 into video
2014-12-06 19:19:12 -08:00
Ilya Kreymer
0495423e86
rewrite: add per-collection rewrite options, settable in 'rewrite_opts'
block in each collection. Added rewrite_base to disable rewriting <base>
tag and rewrite_rel_canon to disable rewriting link rel=canon.
Disabling <base> tag rewrite fixes #51 and new system addresses #50 as
well.
2014-12-06 17:16:35 -08:00
Ilya Kreymer
8a87966ebd
video fixes: disable adding a fixed buffer on unbounded range requests,
as that messes up FF html5 player (it assumes a full stream)
video response: ensure Accept-Ranges: bytes is being added on 206
responses
2014-12-03 21:59:03 -08:00
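The range handling in 8a87966ebd (no fixed buffer on open-ended ranges, and `Accept-Ranges: bytes` on 206 responses) can be sketched for a single `bytes=start-[end]` range. The function name is hypothetical; suffix ranges such as `bytes=-500` and multi-range requests are not handled and return None.

```python
import re

def range_response_headers(range_header, total_len):
    """Build status and headers for a single 'bytes=start-[end]' request.

    An open-ended range ('bytes=0-') runs to the end of the resource rather
    than being capped at a fixed buffer size. Returns None if unparseable.
    """
    m = re.fullmatch(r'bytes=(\d+)-(\d*)', range_header.strip())
    if not m:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else total_len - 1
    end = min(end, total_len - 1)
    if start > end:
        return None
    headers = [
        ('Content-Range', f'bytes {start}-{end}/{total_len}'),
        ('Content-Length', str(end - start + 1)),
        ('Accept-Ranges', 'bytes'),
    ]
    return '206 Partial Content', headers
```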
Ilya Kreymer
f21f4fb1ba
Merge branch 'develop' into video
2014-12-01 09:10:08 -08:00
Ilya Kreymer
c996e70a6e
wburl: detect and decode partially encoded schemes in url, such as http%3A//,
https%3A%2F%2F before handling further
add additional tests for wburl
2014-11-29 11:13:57 -08:00
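Decoding a partially encoded scheme, as in c996e70a6e, can be sketched by decoding only the scheme prefix and leaving the rest of the url as is. The regex and `decode_partial_scheme()` are assumptions for illustration, not pywb's WbUrl code.

```python
import re

# Matches 'http%3A//', 'https%3A%2F%2F' and mixed forms, case-insensitively.
PARTIAL_SCHEME_RX = re.compile(r'^(https?)%3A(?:%2F|/){2}', re.IGNORECASE)

def decode_partial_scheme(url):
    """Decode only a %-encoded scheme prefix; the rest of the url is untouched."""
    m = PARTIAL_SCHEME_RX.match(url)
    if m:
        return m.group(1).lower() + '://' + url[m.end():]
    return url
```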
Ilya Kreymer
87d791eba8
html rewrite: rewrite param value only if it starts with http
2014-11-29 11:03:09 -08:00
Ilya Kreymer
3e3a74619f
various fixes: wombat: add Date.UTC and Date.parse
rewrite: support vi_ https -> metadata
video: fallback to vi_ call on current page
remove debug logging
2014-11-25 00:21:28 -08:00
Ilya Kreymer
d3ef47342c
Merge branch 'develop' into video
2014-11-23 18:58:31 -08:00