Ilya Kreymer
c255f4e47f
fix typos
2014-01-03 17:04:15 -08:00
Ilya Kreymer
246b3fba43
cleanup, setup runnable testwb, or pluggable 'globalwb'
2014-01-04 00:21:52 +00:00
Ilya Kreymer
c3767cd31b
fix css url parsing typo
...
always default to utf-8 if chardet thinks ascii
tweak banner
2014-01-03 21:38:18 +00:00
Ilya Kreymer
1e03cad25c
update setup.py, static files
2014-01-03 13:06:27 -08:00
Ilya Kreymer
2357f108a3
rename rewriters
...
header_rewriter added!
support for encoding detection
various fixes
xmlrewriter
2014-01-03 13:03:03 -08:00
Ilya Kreymer
edbcaaf108
big update: refactor archiveloader,
...
StatusAndHeaders obj and StatusAndHeaders parser
remove dependency on hanzo
Add sample example.warc.gz for very basic unit testing
2014-01-02 20:21:18 -08:00
Ilya Kreymer
cca9071c53
minor tweaks, increase num closest searched, upper case url check
...
css remove fixed pos
2013-12-31 21:01:18 +00:00
Ilya Kreymer
d9930322f1
support utf-8 (so far)
...
support protocol-agnostic prefix //
failedFile list for warc loading
2013-12-31 00:18:12 +00:00
Ilya Kreymer
b8c4a453c9
wbhtml: add utf-8 tests
2013-12-29 22:42:29 -08:00
Ilya Kreymer
997dc5df0f
fixes! Fix typos in html parsing, fix base, support attrs w/o values
2013-12-30 03:03:33 +00:00
Ilya Kreymer
a84ec2abc7
first iteration of archival mode working w/ banner insertion!!
2013-12-28 17:39:43 -08:00
Ilya Kreymer
16f458d5ec
archiveloader: Support for loading warc/arc records using hanzo parser (for record header parsing only)
...
ReplayHandler: load replay from query response, find best option
basic support for matching url, checking self-redirects!
2013-12-28 05:00:06 -08:00
Ilya Kreymer
787dfc136e
wbhtml: add script and style doctests
...
override close() to handle open <script> and <style> tags by forcing an end tag,
otherwise parser does not process the remainder
2013-12-24 22:51:33 -08:00
Ilya Kreymer
6050ea1ffa
standard JS and CSS rewriting working, with generic regex rewriter
...
which supports extensions!
2013-12-23 23:57:13 -08:00
Ilya Kreymer
3a896f7cd3
move norewrite prefixes down to ArchivalUrlRewriter (was in html parser)
...
Add new general regex match work, (several attempts, though last one is simplest/best!)
2013-12-23 15:52:33 -08:00
Ilya Kreymer
37e57f7013
html parser fleshed out!
2013-12-22 18:12:05 -08:00
Ilya Kreymer
fbf29e80d6
add html parser!
...
urlrewriter support for changing modifier
2013-12-20 19:11:52 -08:00
Ilya Kreymer
072befe3c8
archivalrouter: support handler chaining, using call convention and pass prev response
2013-12-20 15:10:12 -08:00
Ilya Kreymer
4cf4bf3bbb
add wburlrewriter, ReferRedirect uses the rewriter
...
more refactoring, ReferRedirect moved into archivalrouter module
wbrequest: parses from uri directly, keeps track of wburl and prefix
2013-12-20 14:54:41 -08:00
Ilya Kreymer
0a2b16407d
better exception handling, specific status codes for exceptions,
...
detect access control and not found exceptions more consistently
2013-12-19 12:06:47 -08:00
Ilya Kreymer
ebc76c0791
update readme
2013-12-18 18:57:55 -08:00
Ilya Kreymer
c8d2271e8a
archiveurl: add support for url_query, format modifier for more unit tests
...
archivalrouter: flesh out router separately
indexreader: RemoteCDXServer reader
unit tests for req/resp
wbapp -- cdx output for query, urlquery, replay and latest_replay!
2013-12-18 18:52:52 -08:00
Ilya Kreymer
5d42cc0cac
rename aurl -> archiveurl, add default scheme, test for empty url
2013-12-13 15:43:07 -08:00
Ilya Kreymer
6b78f59e49
Merge branch 'master' of github.com:ikreymer/pywb
2013-12-13 15:22:06 -08:00
Ilya Kreymer
27b35f31e8
add basic wsgi app for parsing archivalurls, fallback on a referrer based redirect
2013-12-13 15:20:13 -08:00
ikreymer
d546fcc82c
Merge pull request #1 from nlevitt/setup.py
...
setuptools config
2013-12-09 20:26:14 -08:00
Noah Levitt
89481f162e
setuptools config
2013-12-09 11:58:50 -08:00
Ilya Kreymer
b10f0cd041
switch to IRI
2013-12-08 19:44:14 -08:00
Ilya Kreymer
10bf465367
add aurl.py with a few tests
2013-12-08 19:31:58 -08:00
ikreymer
0dc56ee074
Initial commit
2013-12-08 19:30:31 -08:00