Ilya Kreymer
99eadb3d4f
update package paths
2014-03-28 11:57:13 -07:00
Ilya Kreymer
9700004dc8
move configs to pywb package as package data
2014-03-28 11:53:59 -07:00
Ilya Kreymer
49d2d5b035
customizations: support custom cdx api suffix, custom
cdx server class
2014-03-28 10:58:14 -07:00
Ilya Kreymer
e2f7777c7d
jinja2: add decorator for adding custom filters
2014-03-28 10:57:55 -07:00
Ilya Kreymer
2c74ea9f23
fuzzy match: make filter string optionally overridable
setup.py: unset PYWB_CONFIG_ENV
2014-03-27 21:43:30 -07:00
Ilya Kreymer
41d51a6427
ensure 'cdx_' modifier is working
2014-03-27 14:46:59 -07:00
Ilya Kreymer
093d8310e5
config: move config files to ./configs/
PYWB_CONFIG_FILE setting overrides passed in config
2014-03-27 14:31:27 -07:00
Ilya Kreymer
b5e70f5dc6
timeutils: add sec_to_timestamp() func
2014-03-27 14:24:49 -07:00
Ilya Kreymer
da0623fbbb
lxml: ensure lxml support is optional: if not available,
use_lxml_parser() will return false and doctests/pytest collection
won't test the lxml parser
2014-03-26 14:05:02 -07:00
Ilya Kreymer
4e53c2e9d8
remote cdx refactoring: refactor remote cdx source and server to support
fuzzy matching
test local cdx server, remote cdx source, local and remote filtering
with self-contained unit tests
map remote cdx httperrors to pywb exceptions
2014-03-26 11:33:46 -07:00
Ilya Kreymer
5847087aae
add fakeredis mock, test for RedisCDXSource
2014-03-25 11:02:32 -07:00
Ilya Kreymer
87df7c22f1
standardize test scripts to test_*.py instead of *_test.py
2014-03-25 11:01:51 -07:00
Ilya Kreymer
c6c9fe680a
memento: add original link to timemap #10
2014-03-24 14:57:41 -07:00
Ilya Kreymer
2a605652c6
add memento timemap support (for archival mode only)
add timemap Link headers to timegate and memento responses
timemap accessible via /timemap/*/ path
2014-03-24 14:00:06 -07:00
Ilya Kreymer
9654c22bed
rewrite: add doctype rewriting, more tests on various markup edge cases
2014-03-23 23:46:49 -07:00
Ilya Kreymer
742df6238e
fix typo in renaming file
2014-03-23 13:12:06 -07:00
Ilya Kreymer
bcaacaf642
rename handlers
pep8 cleanup for all packages
remove obsolete statictextview
2014-03-23 12:59:21 -07:00
Ilya Kreymer
ac0bf5a415
refactor: IndexReader -> QueryHandler, move query output support
to QueryHandler. allow for multiple query views in QueryHandler
2014-03-23 12:44:28 -07:00
Ilya Kreymer
79da12348f
limit stream by warc/arc record length instead of
http content length.
track length of StatusAndHeaders also.
add tests to verify content length correct for identity
arc and arcgz replays as well
2014-03-22 11:30:51 -07:00
Ilya Kreymer
53590537e0
Merge develop and lxml
2014-03-18 17:14:27 -07:00
Ilya Kreymer
a6b4ae4c47
chardet optimization: using chardet feed() approach to avoid passing in entire buffer
2014-03-17 20:53:42 -07:00
Ilya Kreymer
d1ad9b5e69
refactor: cleanup HTMLRewriter/LXMLHTMLRewriter close path,
single close in base class delegating to _internal_close()
Also, HTMLRewriter auto-terminates <script> and <style> tags
for consistency with lxml
2014-03-17 20:50:35 -07:00
Ilya Kreymer
10c84d8354
embed rewriting: add 'em_' flag for all regex-based rewrites
(js, css, xml) to be able to distinguish between embeds and non-embeds
more conclusively
wbrequest: add is_embed(), is_identity() properties
update tests
don't insert html banner if detected as an embed
2014-03-17 19:36:25 -07:00
Ilya Kreymer
52d99aef57
misc fixes: RemoteCDXServer throws NotFoundException on 404
fix typo in handlers
make WBHandler overridable in pywb_init
make perms_policy optional in IndexReader
2014-03-17 17:35:10 -07:00
Ilya Kreymer
2e7b17ed56
cleanup: move lxml tests to separate test dir, separate html, lxml html and regex
tests into separate files
fix lxml toggle in rewriterrules
2014-03-17 15:30:45 -07:00
Ilya Kreymer
f35e82a4d5
ensure final output from close() is encoded!
add config option to 'use_lxml_parser' if available, if not,
will default to regular parser
testing on travis with lxml (not adding to dep yet)
2014-03-17 13:19:51 -07:00
Ilya Kreymer
1404177c6f
fixes for unicode (doctests)
remove explicit </html> since lxml does not parse past the </html>
tag and adds one anyway (not ideal but only workaround for html after closing tag)
2014-03-17 11:55:45 -07:00
Ilya Kreymer
23d60b0bb8
more work on lxml parser.. always write
start/end tags..
rewriterules: experiment defaulting to lxml if possible!
2014-03-17 09:48:31 -07:00
Ilya Kreymer
bd10c6c2d2
first pass -- lxml parser!
2014-03-16 23:12:04 -07:00
Ilya Kreymer
b0a7cafe6d
update default static path to: pywb/static/
2014-03-14 18:37:03 -07:00
Ilya Kreymer
6461af030b
refactoring: clean up handlers and replay_views for pep8
use BlockLoader().load for StaticHandler static file resolving
update static paths to point to pywb/static instead of static
2014-03-14 18:17:22 -07:00
Ilya Kreymer
a69d565af5
make pywb.rewrite package pep8-compatible
move doctests to test subdir
2014-03-14 16:44:23 -07:00
Ilya Kreymer
bfffac45b0
remove reference to deleted file wbexceptions.py
2014-03-14 11:22:50 -07:00
Ilya Kreymer
14a12f95b2
pep8 fixes, improve docs for proxy
move CaptureException into replay_views
2014-03-14 11:02:03 -07:00
Ilya Kreymer
a1ab54c340
first pass at memento support #10 !
memento support enabled by default, togglable via 'enable_memento' config property
supporting timegate and memento apis, no timemap yet
supporting pattern 2.3 for archival and pattern 1.3 for proxy modes
also:
simplify exception hierarchy a bit more, move down to utils
make WbRequest and WbResponse extensible with mixins (eg for memento)
2014-03-14 10:46:20 -07:00
Ilya Kreymer
45972df6c4
minor fixes, copyright update
2014-03-10 18:45:45 -07:00
Ilya Kreymer
3322fb233f
fixup wb and wombat.js:
fix formatting to 4-tab snake_case, remove obsolete code
2014-03-10 00:55:41 -07:00
Ilya Kreymer
e3d700a50f
wombat improvements: override history, ajax and use
seeded random number gen (with seed from capture timestamp)
2014-03-10 00:10:20 -07:00
Ilya Kreymer
e346dfb024
remove accidental logging
2014-03-09 23:03:55 -07:00
Ilya Kreymer
e384425d48
proxy cleanup: move HttpsUrlRewriter to url_rewriter module,
move strip_scheme to replay_views where it is used
regex rewriters: use url rewriter for rewriting http:// in JS,
instead of just prefix, to support custom rewriters (such as
https->http rewriter in proxy mode)
2014-03-09 14:21:32 -07:00
Ilya Kreymer
68878fa72a
update domain-specific rules to make flickr replay work better!
2014-03-08 15:53:52 -08:00
archiveit
4fdcdc98ae
replay: ignore 304 captures
2014-03-08 23:46:59 +00:00
Ilya Kreymer
584d826f05
rewrite: fix html rewriting, if forcing end </script>, </style>,
don't actually output to preserve original
wombat: copy over all Location settings
wburl: convert :/ -> :// if 2nd slash missing, only check for <scheme>:/
and ignore subsequent slashes
2014-03-08 15:10:35 -08:00
Ilya Kreymer
541c076b77
setup: add cli scripts for wayback, cdx-server
fix logging of app name, make most logging debug
2014-03-08 15:09:53 -08:00
Ilya Kreymer
3b1afc3e3d
replace StringIO with BytesIO
2014-03-08 09:30:19 -08:00
Ilya Kreymer
e3618871c8
proxy: support setting hostname via env variable
2014-03-07 11:42:09 -08:00
Ilya Kreymer
a60ab1f118
routing/proxy: pass in hostpaths to proxy routing
add PYWB_HOST_NAME env var to allow overriding default hostname
add request_hostname jinja filter
2014-03-07 10:29:11 -08:00
Ilya Kreymer
702e5e0143
perms test: moved test perms policy to perms/test/test_perms_policy.py
all perms related configs exist within perms package
2014-03-06 18:24:53 -08:00
Ilya Kreymer
681c2fd8d5
perms: refactor perms config to make interface much clearer
'perms_policy' is a callback which returns a Perms object, which may
filter cdx lines from the response
2014-03-06 18:06:05 -08:00
Ilya Kreymer
7b5cbaa878
cdx: clean up closest, reverse ops
closest takes precedence over reverse
'reverse closest' not supported, add test to reflect that
2014-03-06 16:11:46 -08:00