1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-31 19:24:11 +02:00

30 Commits

Author SHA1 Message Date
Ilya Kreymer
93d49ae24b rewrite deprefix: improve query deprefix to also test url-encoded params, closes #119 2015-07-06 18:19:01 -07:00
Ilya Kreymer
756b80cac4 rewrite: remove urlrewriter optimization which uses wburl.original_url, as its
not realiable (%-encoding, no trailing slash issues) and not really needed.. rename original_url -> _original_url, should not be used
directly.
add test for rewriting when base url has no trailing slash
2015-04-13 12:51:38 -07:00
Ilya Kreymer
b21e288063 wburl to_uri: when parsing scheme, also for first '?' in addition to '/' to catch any irregular urls (to be ignored) 2015-04-11 10:21:52 -07:00
Ilya Kreymer
1844597889 wburl to_uri: catch error on idna encode with very invalid urls 2015-04-04 12:51:49 -07:00
Ilya Kreymer
30ab27bb1c indexing: support indexing (and even replay of) records where target-uri is a 'urn:' identifier (#91)
for canonicalzation, treat urns as is, already canonical
for wburl, don't add http:// prefix if urn: prefix is present
add example-wpull warc for testing
2015-03-30 17:23:50 -07:00
Ilya Kreymer
4aa6512b05 rewrite: fix WbUrl parsing for urls that start with a digit, eg. 1234.example.com
split latest replay url from timestamped replay regex
add additional rewrite tests
2015-03-23 15:38:10 -07:00
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
c4d5dd4690 rewrite: optimize / sanity, only %-encode urls that are actually idna-encoded,
otherwise return as is, #66
2015-02-15 10:34:56 -08:00
Ilya Kreymer
afe49a91f4 rewrite: more fixes for IDN #66 - add _do_percent_encode field to wburl itself
defaults to true, may be disabled with 'punycode_links'
remove wbrequest and urlrewriter from get_url path, simply call wb_url.get_url() to get properly formatted url
2015-02-14 20:55:36 -08:00
Ilya Kreymer
f9452bf48e rewrite: refactor IDN support: instead of returning IRI, return utf-8 %-encoded url
remove support for  returning IRI, as that requires detecting charset, instead just use %-encoded form
and let browser decode. Should address #66

Add rewrite option 'punycode_links_only' (default to false) to skip the %-encoded conversion of host, and just return punycode.

wombat: use getAttribute('href') on <a> tag to get original url, not punycode version

replay: add extra sanity check on Location header to ensure utf-8
2015-02-14 17:26:39 -08:00
Ilya Kreymer
78bd89b4cb rewrite: simplify deprefix, url already unquoted now so remove extra unquote 2015-02-11 14:28:45 -08:00
Ilya Kreymer
695245d9e8 wburl idn: more complete support for idn urls (#66)
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
2015-01-26 11:07:59 -08:00
Ilya Kreymer
edff3f17fb wburl: convert %-encoded hostnames or unicode urls to punycode for
better IDN support (#66)
2015-01-26 11:07:58 -08:00
Ilya Kreymer
181c18a1b8 pep8 pass: fix spacing, line length, issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
c996e70a6e wburl: detect and decode partially encoded schemes in url, such as http%3A//,
https%A2F2F// before handling further
add additional tests for wburl
2014-11-29 11:13:57 -08:00
Ilya Kreymer
c9273ee5ed rewrite: add 'deprefix' support to remove wburl prefix from any query
params
2014-10-26 12:12:37 -07:00
Ilya Kreymer
e8d3965269 pep8 style fixes, remove unused methods 2014-10-21 19:06:16 -07:00
Ilya Kreymer
4a1cc46fa3 framed replay: invert framed replay paradigm, replay always uses
canonical, no-modifier archival url (instead of mp_).
When using frames, the page redirects to a 'tf_' page, which then uses
replaceHistory() to change url back to canonical form.
memento: support for framed replay, include memento headers in top frame
bump version to 0.6.2
2014-10-18 11:21:07 -07:00
Ilya Kreymer
b92eda77f6 rewrite: add 'bn_' banner only rewrite
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
2014-07-29 12:20:22 -07:00
Ilya Kreymer
d2795dfdaa minor cleanup:
wburl: add is_url_query() check
views: add kwargs to J2HtmlCapturesView for better extensibility
query_handler: simplify make_cdx_response() arguments
2014-05-01 11:58:34 -07:00
Ilya Kreymer
85593696fa remove rfc3987 validation, was rejecting valid urls
add extract_referer_wburl_str() to extract WbUrl str, if any,
from the referrer. Use that for live_rewrite_handler to override
default referrer
2014-04-15 16:38:53 -07:00
Ilya Kreymer
19f2df4717 refactor:
- move is_identity(), is_embed() to wburl from wbrequest
- add is_mainpage() predicate
- add create_template() to each J2TemplateView to create itself
- add HeadInsertView to create a reusable head insert for
RewriteContent
- add 'mp_' as modifier for frames mode to be used as possible
  modifier with HTMLRewriter
2014-04-09 15:49:55 -07:00
Ilya Kreymer
91184426b7 test coverage pass:
refactor and cleanup to improve coverage for corner cases
2014-04-02 13:16:54 -07:00
Ilya Kreymer
2a605652c6 add memento timemap support (for archival mode only)
add timemap Link headers to timegate and memento responses
timemap accessible via /timemap/*/ path
2014-03-24 14:00:06 -07:00
Ilya Kreymer
ac0bf5a415 refactor: IndexReader -> QueryHandler, move query output support
to QueryHandler. allow for multiple query views in QueryHandler
2014-03-23 12:44:28 -07:00
Ilya Kreymer
a69d565af5 make pywb.rewrite package pep8-compatible
move doctests to test subdir
2014-03-14 16:44:23 -07:00
Ilya Kreymer
a1ab54c340 first pass at memento support #10!
memento support enabled by default, togglable via 'enable_memento' config property
supporting timegate and memento apis, no timemap yet
supporting pattern 2.3 for archival and pattern 1.3 for proxy modes
also:
simplify exception hierarchy a bit more, move down to utils
make WbRequest and WbResponse extensible with mixins (eg for memento)
2014-03-14 10:46:20 -07:00
Ilya Kreymer
584d826f05 rewrite: fix html rewriting, if forcing end </script>, </style>,
don't actually output to preserve original
wombat: copy over all Location settings
wburl: convert :/ -> :// if 2nd slash missing, only check for <scheme>:/
and ignore subsequent slashes
2014-03-08 15:10:35 -08:00
Ilya Kreymer
d702b299ae wburl: split into BaseWbUrl and WbUrl for better extensibility 2014-02-24 21:30:38 -08:00
Ilya Kreymer
5345459298 pywb 0.2!
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
2014-02-17 10:01:09 -08:00