Jack Cushman
903583c3d7
Handle ArchivalUrl subclasses.
2014-01-20 14:13:16 -05:00
Ilya Kreymer
9ff3fc300b
Fix #5, bringing back customParams optional params sent to cdx server
Rename archivalrouter.MatchRegex -> archivalrouter.Route, supporting regex/prefix matching
add redir_to_exact to turn off redirect to exact timestamp in RewritingReplayHandler
update README
2014-01-20 10:50:06 -08:00
Ilya Kreymer
6cb1743163
Merge branch 'master' of github.com:ikreymer/pywb into work
2014-01-19 12:31:53 -08:00
Ilya Kreymer
354040a7e0
support for url-agnostic dedup, e.g. loading payload from a different url
than the revisit
2014-01-19 12:31:19 -08:00
Jack Cushman
c9d0b0ba7b
Handle transfer-encoding:chunked; misc. replay bugs.
- Add a ChunkedLineReader to deal with replays with the
transfer-encoding: chunked header.
- Catch UnicodeDecodeErrors caused by multibyte characters getting
split during buffering.
- Fix a couple of tiny bugs in replay.py
2014-01-18 21:32:49 -05:00
Ilya Kreymer
7ce6d0d22b
first pass on html rendering via jinja, support for query (cdx) rendering
2014-01-17 16:24:36 -08:00
Ilya Kreymer
bcc9588c00
* archivalrouter: take a list of handlers, currently MatchPrefix and MatchRegex;
a handler returns a single response (no chaining for now)
* rewriting: don't rewrite anchor-only urls
* perf: add a very basic profiler in WBHandler for testing
2014-01-16 20:33:51 -08:00
Ilya Kreymer
c4457abc4c
Update README
Rename FullHandler -> WBHandler
Add additional comments!
2014-01-03 21:44:20 -08:00
Ilya Kreymer
d820a8c06a
add some comments, make charset parsing lower()
2014-01-03 17:40:20 -08:00
Ilya Kreymer
c3767cd31b
fix css url parsing typo
always default to utf-8 if chardet thinks ascii
tweak banner
2014-01-03 21:38:18 +00:00
Ilya Kreymer
2357f108a3
rename rewriters
header_rewriter added!
support for encoding detection
various fixes
xmlrewriter
2014-01-03 13:03:03 -08:00
Ilya Kreymer
cca9071c53
minor tweaks, increase num closest searched, upper case url check
css remove fixed pos
2013-12-31 21:01:18 +00:00
Ilya Kreymer
d9930322f1
support utf-8 (so far)
support protocol-agnostic prefix //
failedFile list for warc loading
2013-12-31 00:18:12 +00:00
Ilya Kreymer
997dc5df0f
fixes! Fix typos in html parsing, fix base, support attrs w/o values
2013-12-30 03:03:33 +00:00
Ilya Kreymer
a84ec2abc7
first iteration of archival mode working w/ banner insertion!!
2013-12-28 17:39:43 -08:00
Ilya Kreymer
16f458d5ec
archiveloader: Support for loading warc/arc records using hanzo parser (for record header parsing only)
ReplayHandler: load replay from query response, find best option
basic support for matching url, checking self-redirects!
2013-12-28 05:00:06 -08:00