Ilya Kreymer
a6b4ae4c47
chardet optimization: using chardet feed() approach to avoid passing in entire buffer
2014-03-17 20:53:42 -07:00
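The feed() approach in the commit above can be illustrated with a minimal sketch. It assumes chardet's incremental `UniversalDetector` interface (`feed()`, a `done` flag, `close()` and a `result` dict). The helper names `iter_chunks` and `detect_encoding` are hypothetical and are not pywb's code.

```python
def iter_chunks(stream, block_size=1024):
    """Yield successive blocks from a binary file-like object."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


def detect_encoding(stream, detector, block_size=1024):
    """Detect the encoding incrementally instead of passing in the whole buffer.

    `detector` is assumed to follow chardet's UniversalDetector interface:
    feed(bytes), a boolean `done` attribute, close() and a `result` dict.
    """
    for block in iter_chunks(stream, block_size):
        detector.feed(block)
        if detector.done:
            # The detector is already confident, so the rest of the stream is skipped.
            break
    detector.close()
    return detector.result.get("encoding")
```

With chardet installed, pass `UniversalDetector()` from `chardet.universaldetector` as `detector`. The gain over a one-shot detect call is that reading stops as soon as the detector reports `done`.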
Ilya Kreymer
d1ad9b5e69
refactor: clean up HTMLRewriter/LXMLHTMLRewriter close path,
single close() in base class, delegating to _internal_close()
Also, HTMLRewriter auto-terminates <script> and <style> tags
for consistency with lxml
2014-03-17 20:50:35 -07:00
Ilya Kreymer
10c84d8354
embed rewriting: add 'em_' flag for all regex-based rewrites
(js, css, xml) to be able to distinguish between embeds and non-embeds
more conclusively
wbrequest: add is_embed(), is_identity() properties
update tests
don't insert html banner if detected as an embed
2014-03-17 19:36:25 -07:00
Ilya Kreymer
52d99aef57
misc fixes: RemoteCDXServer throws NotFoundException on 404
fix typo in handlers
make WBHandler overridable in pywb_init
make perms_policy optional in IndexReader
2014-03-17 17:35:10 -07:00
Ilya Kreymer
2e7b17ed56
cleanup: move lxml tests to separate test dir; separate html, lxml html and regex
tests into separate files
fix lxml toggle in rewriterrules
2014-03-17 15:30:45 -07:00
Ilya Kreymer
f35e82a4d5
ensure final output from close() is encoded!
add config option 'use_lxml_parser' to use lxml if available; if not,
will default to regular parser
testing on travis with lxml (not adding to dep yet)
2014-03-17 13:19:51 -07:00
Ilya Kreymer
1404177c6f
fixes for unicode (doctests)
remove explicit </html> since lxml does not parse past the </html>
tag and adds one anyway (not ideal, but the only workaround for html after the closing tag)
2014-03-17 11:55:45 -07:00
Ilya Kreymer
23d60b0bb8
more work on lxml parser: always write
start/end tags
rewriterules: experiment defaulting to lxml if possible!
2014-03-17 09:48:31 -07:00
Ilya Kreymer
bd10c6c2d2
first pass -- lxml parser!
2014-03-16 23:12:04 -07:00
Ilya Kreymer
b0a7cafe6d
update default static path to: pywb/static/
2014-03-14 18:37:03 -07:00
Ilya Kreymer
6461af030b
refactoring: clean up handlers and replay_views for pep8
use BlockLoader().load for StaticHandler static file resolving
update static paths to point to pywb/static instead of static
2014-03-14 18:17:22 -07:00
Ilya Kreymer
a69d565af5
make pywb.rewrite package pep8-compatible
move doctests to test subdir
2014-03-14 16:44:23 -07:00
Ilya Kreymer
bfffac45b0
remove reference to deleted file wbexceptions.py
2014-03-14 11:22:50 -07:00
Ilya Kreymer
14a12f95b2
pep8 fixes, improve docs for proxy
move CaptureException into replay_views
2014-03-14 11:02:03 -07:00
Ilya Kreymer
a1ab54c340
first pass at memento support #10 !
memento support enabled by default, togglable via 'enable_memento' config property
supporting timegate and memento apis, no timemap yet
supporting pattern 2.3 for archival and pattern 1.3 for proxy modes
also:
simplify exception hierarchy a bit more, move down to utils
make WbRequest and WbResponse extensible with mixins (eg for memento)
2014-03-14 10:46:20 -07:00
Ilya Kreymer
45972df6c4
minor fixes, copyright update
2014-03-10 18:45:45 -07:00
Ilya Kreymer
3322fb233f
fixup wb and wombat.js:
fix formatting to 4-tab snake_case, remove obsolete code
2014-03-10 00:55:41 -07:00
Ilya Kreymer
e3d700a50f
wombat improvements: override history, ajax and use
seeded random number gen (with seed from capture timestamp)
2014-03-10 00:10:20 -07:00
Ilya Kreymer
e346dfb024
remove accidental logging
2014-03-09 23:03:55 -07:00
Ilya Kreymer
e384425d48
proxy cleanup: move HttpsUrlRewriter to url_rewriter module,
move strip_scheme to replay_views where it is used
regex rewriters: use url rewriter for rewriting http:// in JS,
instead of just prefix, to support custom rewriters (such as
https->http rewriter in proxy mode)
2014-03-09 14:21:32 -07:00
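The regex-rewriter change above (delegating each http:// URL in JS to a url rewriter object instead of blindly prepending a prefix) can be sketched as follows. This is a simplified illustration under stated assumptions: the classes `PrefixUrlRewriter` and `HttpsToHttpRewriter`, the `rewrite()` method and the URL regex are all hypothetical and do not reproduce pywb's `HttpsUrlRewriter` API.

```python
import re


class PrefixUrlRewriter:
    """Hypothetical default rewriter: prepend an archive prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def rewrite(self, url):
        return self.prefix + url


class HttpsToHttpRewriter:
    """Hypothetical proxy-mode rewriter: downgrade https:// to http://."""

    def rewrite(self, url):
        if url.startswith("https://"):
            return "http://" + url[len("https://"):]
        return url


# Simplified: absolute http(s) URLs up to whitespace or a quote character.
ABS_URL_RX = re.compile(r"https?://[^\s'\"]+")


def rewrite_js(text, url_rewriter):
    """Hand every matched URL to the supplied rewriter object."""
    return ABS_URL_RX.sub(lambda m: url_rewriter.rewrite(m.group(0)), text)
```

Because the regex pass only locates URLs and the rewriter decides what to emit, the same JS rewriter serves both archival mode (prefixing) and proxy mode (scheme downgrade).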
Ilya Kreymer
68878fa72a
update domain-specific rules to make flickr replay work better!
2014-03-08 15:53:52 -08:00
archiveit
4fdcdc98ae
replay: ignore 304 captures
2014-03-08 23:46:59 +00:00
Ilya Kreymer
584d826f05
rewrite: fix html rewriting, if forcing end </script>, </style>,
don't actually output to preserve original
wombat: copy over all Location settings
wburl: convert :/ -> :// if 2nd slash missing, only check for <scheme>:/
and ignore subsequent slashes
2014-03-08 15:10:35 -08:00
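The wburl fix above (convert `:/` to `://` when the second slash is missing, checking only `<scheme>:/`) can be sketched with a single anchored regex. The function name `fix_scheme_slashes` and the exact scheme pattern are assumptions for illustration, not pywb's implementation.

```python
import re

# '<scheme>:/' at the start of the string, NOT followed by a second slash.
SINGLE_SLASH_RX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):/(?!/)")


def fix_scheme_slashes(url):
    """Restore '<scheme>:/host' to '<scheme>://host'.

    Only the slash right after '<scheme>:' is examined; any later
    slashes in the URL are left untouched.
    """
    return SINGLE_SLASH_RX.sub(r"\1://", url, count=1)
```

URLs that already have `://` do not match the lookahead and pass through unchanged.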
Ilya Kreymer
541c076b77
setup: add cli scripts for wayback, cdx-server
fix logging of app name, make most logging debug
2014-03-08 15:09:53 -08:00
Ilya Kreymer
3b1afc3e3d
replace StringIO with BytesIO
2014-03-08 09:30:19 -08:00
Ilya Kreymer
e3618871c8
proxy: support setting hostname via env variable
2014-03-07 11:42:09 -08:00
Ilya Kreymer
a60ab1f118
routing/proxy: pass in hostpaths to proxy routing
add PYWB_HOST_NAME env var to allow overriding default hostname
add request_hostname jinja filter
2014-03-07 10:29:11 -08:00
Ilya Kreymer
702e5e0143
perms test: moved test perms policy to perms/test/test_perms_policy.py
all perms related configs exist within perms package
2014-03-06 18:24:53 -08:00
Ilya Kreymer
681c2fd8d5
perms: refactor perms config to make interface much clearer
'perms_policy' is a callback which returns a Perms object, which may
filter cdx lines from the response
2014-03-06 18:06:05 -08:00
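The perms interface described above ('perms_policy' is a callback that returns a Perms object, which may filter cdx lines) can be sketched as follows. Everything here is hypothetical and for illustration only: the class `PrefixBlockPerms`, its `filter_cdx()` method, the `example_perms_policy` callback and the blocked prefix are not pywb's API.

```python
class PrefixBlockPerms:
    """Hypothetical Perms object that filters cdx lines by urlkey prefix."""

    def __init__(self, blocked_prefixes):
        self.blocked_prefixes = tuple(blocked_prefixes)

    def filter_cdx(self, cdx_lines):
        # Generator: lines are filtered lazily as they are read.
        for line in cdx_lines:
            urlkey = line.split(" ", 1)[0]
            if not urlkey.startswith(self.blocked_prefixes):
                yield line


def example_perms_policy(wbrequest):
    """Hypothetical 'perms_policy' callback that returns a Perms object.

    A real policy could inspect the request (user, collection, ...);
    this one ignores it and always blocks one fixed prefix.
    """
    return PrefixBlockPerms(["com,example)/private"])


def apply_perms(perms_policy, wbrequest, cdx_lines):
    """Build the Perms object for this request and filter the response."""
    perms = perms_policy(wbrequest)
    return perms.filter_cdx(cdx_lines)
```

Keeping the policy as a callback means the per-request decision (which Perms object to build) stays separate from the cdx filtering itself.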
Ilya Kreymer
7b5cbaa878
cdx: clean up closest, reverse ops
closest takes precedence over reverse
'reverse closest' not supported, add test to reflect that
2014-03-06 16:11:46 -08:00
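The rule that 'closest' takes precedence over 'reverse' can be sketched as follows. The ops are written as generator functions, so no sorting happens until the result is iterated. The function names and the `(timestamp, line)` pair representation are assumptions for illustration, not pywb's cdxops code.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"


def _to_dt(timestamp):
    """Parse a 14-digit cdx timestamp."""
    return datetime.strptime(timestamp, TS_FORMAT)


def sort_closest(cdx_items, closest):
    """Yield (timestamp, line) pairs nearest to `closest` first."""
    target = _to_dt(closest)

    def distance(item):
        return abs((_to_dt(item[0]) - target).total_seconds())

    # sorted() must consume the input, but because this is a generator
    # function, that only happens once iteration starts.
    for item in sorted(cdx_items, key=distance):
        yield item


def sort_reverse(cdx_items):
    """Yield (timestamp, line) pairs in reverse input order."""
    for item in reversed(list(cdx_items)):
        yield item


def apply_sort_ops(cdx_items, closest=None, reverse=False):
    """'closest' takes precedence: 'reverse' is ignored when both are set."""
    if closest:
        return sort_closest(cdx_items, closest)
    if reverse:
        return sort_reverse(cdx_items)
    return cdx_items
```

Timestamps are compared as real datetimes rather than as integers, because subtracting 14-digit numbers does not give a true time distance across month or year boundaries.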
Ilya Kreymer
c42a96386f
cdx: fix the 'yield nothing' case when limit==1
add additional test case for limit==1 and reverse=True,
as limit is optimized out
2014-03-06 16:01:49 -08:00
Ilya Kreymer
4e71a0b772
better rules.yaml fix
2014-03-06 02:51:54 -08:00
Ilya Kreymer
3718e1d21b
rewrite fixes: html_rewriter do not unescape attrs!
rules: don't rewrite past end of block or line
2014-03-06 02:29:52 -08:00
Ilya Kreymer
673ff35d15
minor fixes: wombat add document.WB_wombat_location
loaders: file 'urls' starting with . and / are always file paths
pep8 fixes for cdx, utils packages
2014-03-05 17:13:14 -08:00
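The loader rule above (file 'urls' starting with . or / are always file paths) can be sketched as a small dispatch function. The name `loader_kind` is hypothetical. Falling back to "file" when no scheme is present is an assumption of this sketch, not documented pywb behavior.

```python
from urllib.parse import urlsplit


def loader_kind(url):
    """Hypothetical dispatch on a 'url' string passed to a loader.

    Anything starting with '.' or '/' is a local file path, regardless
    of what follows. Otherwise the URL scheme decides, and (by assumption)
    no scheme at all also means a file.
    """
    if url.startswith((".", "/")):
        return "file"
    return urlsplit(url).scheme or "file"
```

Checking the prefix first means that relative and absolute paths never reach scheme parsing at all.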
Kenji Nagahashi
64f4699203
clean up docstrings: fix reST formatting issues.
cherry-picked f03e0a7092 + some more.
2014-03-05 22:07:27 +00:00
Ilya Kreymer
fe1fa43fef
zipnum: remove time-based reloading for now, just look at mtime
and reload if changed
2014-03-04 21:29:05 -08:00
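The zipnum change above (drop time-based reloading and just reload when the file's mtime changes) can be sketched as a small cache. The class name `MtimeReloadingFile` and its `parse` callback are hypothetical. The sketch assumes a single local file, such as a zipnum location file.

```python
import os


class MtimeReloadingFile:
    """Hypothetical cache that re-reads a file only when its mtime changes."""

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse
        self._mtime = None
        self._data = None

    def get(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            # First load or file changed on disk: reparse and remember the mtime.
            with open(self.path, "rb") as fh:
                self._data = self.parse(fh.read())
            self._mtime = mtime
        return self._data
```

Compared with reloading on a timer, an mtime check costs one stat() call per access and rereads the file only when it has actually changed.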
Ilya Kreymer
df2f7ba496
warc: add digest filter only if digest is present for url-agnostic load
ensure cdxobject format set on cdx load callback
limit reader: add length wrapping utility func to limitreader
2014-03-05 05:12:25 +00:00
Ilya Kreymer
cc22448cc5
fixes for 2.6 and pypy
2014-03-04 19:11:17 -08:00
Ilya Kreymer
202f6101e0
coverage work! add additional test for wsgi_wrappers
additional test for zipnum bad location
for now, not testing cli interfaces which depend on opt params
2014-03-04 16:13:49 -08:00
Ilya Kreymer
d702a98bbc
url-agnostic revisit testing!
add sample warc and cdx for url-agnostic revisits
add unit test and integration test
resolvingloader: pass callback instead of full cdx server
for use for loading cdx in case of url-agnostic revisit
2014-03-04 20:12:09 +00:00
Ilya Kreymer
cf5aaf5de4
add new perms_handler for supporting direct permissions api
currently just returning ["allow"] or ["block"] for a single url
2014-03-03 19:37:37 -08:00
Ilya Kreymer
577c74be49
cdx: move perms related handling to pywb.perms package, support
custom processing ops, of which perms is a specific type
add lazy_ops test to ensure all cdx processing ops are lazy
perms: set up a 'perms policy' factory and perms policy implementation
perms policy setting results in a custom processing op
update tests to work with new config
IndexReader handles both cdx server + perms policy
2014-03-03 18:27:04 -08:00
Ilya Kreymer
e0d5846484
separate 'perms_checker' config loading into its own param
simplify IndexReader wrapper init, just init with a cdx server
2014-03-03 13:40:48 -08:00
Ilya Kreymer
331976748e
cdxops: make sure sort reverse and closest are lazy (create generators)
perms: allow_url_lookup() only takes key param for simplicity
2014-03-03 12:16:07 -08:00
Ilya Kreymer
2d4ae62fbe
- cdx handler refactoring: factor out CDXHandler and init to
separate cdx_handler module
- Make wsgi app a class, add port as an optional field in wsgi app
and router. (not required to be specified)
2014-03-03 10:35:57 -08:00
Ilya Kreymer
0bf651c2e3
add cdx_server app!
port wsgi cdx server tests to test new app!
move base handlers to basehandlers in framework pkg
(remove werkzeug dependency)
2014-03-02 23:41:44 -08:00
Ilya Kreymer
f0a0976038
more refactoring!
create 'framework' subpackage for general purpose components!
contains routing, request/response, exceptions and wsgi wrappers
update framework package for pep8
dsrules: using load_config_yaml() (pushed to utils)
to init default config
2014-03-02 21:42:05 -08:00
Ilya Kreymer
f1acad53fc
wsgi wrapper reorg!
support pluggable wsgi apps
utils: BlockLoader() supports loading from package
exceptions: base WbException moved to utils
2014-03-02 19:26:06 -08:00
Ilya Kreymer
47271bbfab
remove extra .gz file, change test to use zipnum file instead
2014-03-02 08:55:26 -08:00
Ilya Kreymer
19f86305bf
update pkg-reorg with changes from master, including
CDXQuery configuration
2014-03-02 00:26:29 -08:00