mirror of https://github.com/webrecorder/pywb.git synced 2025-03-21 11:02:10 +01:00

479 Commits

Author SHA1 Message Date
Ilya Kreymer
e4f409b2a4 simplify pywb_init config:
- add defaults dictionary, chain dictionaries rather than copying
- allow custom classes to be loaded explicitly via yaml
- for LineReader, assume ungzipped if first decompress fails
- properly ignore bad local paths
- add optional reporter object
2014-02-11 14:10:40 -08:00
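A minimal sketch of "chain dictionaries rather than copying", assuming the standard-library `collections.ChainMap` from modern Python. The commit does not say how the chaining is implemented, and `DEFAULTS`, `chain_config` and the `archive_paths` key are hypothetical names used only for illustration.

```python
from collections import ChainMap

# Hypothetical defaults; the real pywb defaults are not shown in this log.
DEFAULTS = {"archive_paths": "./sample_archive/warcs/", "enable_http_proxy": False}


def chain_config(user_config, defaults=DEFAULTS):
    """Return a view that looks in user_config first, then in defaults.

    Neither dictionary is copied: lookups fall through to the defaults
    only when a key is missing from the user-supplied config.
    """
    return ChainMap(user_config, defaults)


config = chain_config({"enable_http_proxy": True})
print(config["enable_http_proxy"])  # value from the user config
print(config["archive_paths"])      # value from the defaults
```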
Ilya Kreymer
8b2bfa570c referer redirect fixes:
- allow redirect if current Host: matches
- redirect request uri to host root, not current host path
2014-02-09 20:19:43 +00:00
Ilya Kreymer
232ac733ab referer redirect: check against registered routes
js rewriter: only rewrite quoted strings, support relative redirect
Jinja view: add 'host' filter for extracting hostname
css tweak
2014-02-09 01:42:42 -08:00
Ilya Kreymer
a757f53bd5 cleanup Route config, move filters init into custom_init
remove extra print
2014-02-08 22:01:31 -08:00
Ilya Kreymer
44f38f44d5 paths cleanup:
- don't store explicit static path, but allow it to be set in the insert
- store host_prefix, which is either server name or empty
- for archival mode, the absolute_paths setting controls whether absolute paths are used
- for proxy always use absolute_paths
- default static path is: /static/default/
- allow extension apps to provide custom /static/X/ path

Route overriding:
- ability to set Route class
- custom init method

Archival Relative Redirect:
- if starting with timestamp, drop timestamp and assume host-relative path

Integration Tests:
- test proxy mode by using REQUEST_URI
- test archival relative redirect!
2014-02-08 20:07:16 -08:00
Ilya Kreymer
b11f4fad93 add support for pywb static content routes (separate from uwsgi)
adding StaticHandler and loading templates and static resources from current package
add default template and static data to be included in the pywb package
add test for custom static route
2014-02-07 19:32:58 -08:00
Ilya Kreymer
00a7691f69 add optional filters to default Route
add examples to config.yaml and test_config.yaml and integration test
per route config is inherited globally if only name is set
2014-02-06 17:28:08 -08:00
Ilya Kreymer
d347b4952b don't mask raised exceptions, to address #23 2014-02-05 13:21:57 -08:00
Ilya Kreymer
1a1aa814d0 first pass at simple http proxy! #8
* proxy router for handling only proxy
* proxy/archival router for handling both archival and proxy mode,
  togglable with 'enable_http_proxy' setting in config
* supports only most recent capture playback -- no support for
selecting replay date/calendar view yet
* not testable with WebTest -- need better way to unit test proxy mode
2014-02-05 13:08:10 -08:00
Ilya Kreymer
3168b80cfa improve docs for config.yaml, group all ui settings together
create separate test_config.yaml for testing
rename ArchivalRequestRouter -> ArchivalRouter for consistency
2014-02-05 10:10:33 -08:00
Ilya Kreymer
6388a78162 refactor: replay_views to support cleaner inheritance, no longer
wrapping previous WbResponse

overhaul yaml config to be much simpler, move best resolver and
best index reader to respective classes

add config_utils for sharing config, standard non-yaml config
provides defaults for testing

fix bug in query.html
2014-02-03 09:24:40 -08:00
Ilya Kreymer
bdef00cb8d refactor WbUrl and UrlRewriter to drop requirement for having a WbUrl start with /
Changes WbUrl forms:
/2013/im_/example.com -> 2013/im_/example.com
/*/example.com -> */example.com
/example.com -> example.com

* also simplify scheme-agnostic url (//) handling by just eating up extra
slashes
* add additional doctests on route, with and w/o custom SCRIPT_NAME
2014-02-01 18:20:23 -08:00
Ilya Kreymer
9f258fa64c fix up cdx server query interface
supports /cdx?url=... and other params including
filter=<regex>
collapse_time=<0-14>
resolve_revisits=<true|false>
reverse=<true|false>
closest=<timestamp>
2014-02-01 14:47:07 -08:00
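A small sketch of building a query string for the parameters listed in the commit above. The `/cdx` path and the parameter names come from the log; the helper `cdx_query` and the example values are hypothetical, and this is not pywb's own client code.

```python
from urllib.parse import urlencode


def cdx_query(url, **params):
    """Build a relative cdx query; 'url' goes first, other params are sorted."""
    query = [("url", url)] + sorted(params.items())
    return "/cdx?" + urlencode(query)


# Ask for the capture closest to a timestamp, in reverse order.
print(cdx_query("example.com", closest="20140201", reverse="true"))
```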
Ilya Kreymer
b685772b96 fixup loading from archive, add LimitReader to ensure record length is respected
rename FileReader -> FileLoader, HttpReader -> HttpLoader
loaders create 'readers', which support read()/readline()
2014-02-01 14:02:53 -08:00
Ilya Kreymer
d9c4e5cba4 make RemoteCDXServer api conform to LocalCDXServer api, addressing #19 2014-02-01 13:19:30 -08:00
Ilya Kreymer
86a093d164 support cdx server query endpoint (at /cdx in default config)
also enable /echo_env and /echo_req debug handlers
2014-02-01 00:43:24 -08:00
Ilya Kreymer
2f5ffb3a88 switch test framework to use py.test instead of nose 2014-02-01 00:12:11 -08:00
Ilya Kreymer
6d2c8286ca render_response has option to pass in statuscode 2014-01-31 19:45:01 -08:00
Ilya Kreymer
bd94e3c656 fix replay_resolvers tests, don't use abs paths! 2014-01-31 10:33:47 -08:00
Ilya Kreymer
304ddbec84 Support for new UI, as per #16
* Refactor views class to support more Jinja2 views (J2Template)
* Add a home page, collection search page, and error pages, all optional
* all exceptions appear on error page
* wbrequest supports a request with an empty or / wb_url
2014-01-31 10:04:21 -08:00
Ilya Kreymer
937fc7229e update README, fix typo 2014-01-29 02:12:58 -08:00
Ilya Kreymer
7a20d26d5f support non-surt ordered cdx
add unsurt() util func and surt_ordered init param to LocalCDXServer
test make_best_resolver()
2014-01-29 00:58:37 -08:00
Ilya Kreymer
411e7fe8a3 cleanup pywb_init, work on documenting config.yaml! 2014-01-29 00:03:24 -08:00
Ilya Kreymer
43a46b373d move sample/test data to ./sample_archive/warcs and ./sample_archive/cdx
pywb_init now driven by config.yaml! (#14)

Not yet supporting customized handlers, views, etc...
2014-01-28 22:03:01 -08:00
Ilya Kreymer
35f7cb0477 new-feature: support jinja2 template generated banner
template receives cdx and wbrequest
default template inserts capture time into banner
2014-01-28 20:18:47 -08:00
Ilya Kreymer
6de794a4e1 style fixes: convert camelCase func and var names to 'not_camel_case'
WbHtml -> HTMLRewriter
ArchivalUrl -> WbUrl
2014-01-28 19:37:37 -08:00
Ilya Kreymer
c0f8edf517 more refactoring: separate top-level handlers (WBHandler) from
views (html, text)
Add CDXHandler for interfacing with cdx server directly, #12
2014-01-28 17:23:44 -08:00
Ilya Kreymer
1a234f2953 refactor: remove intermediate query object.
rename query -> views
wbhandler queries index, replayer and renders via view

new feature: 'cdx_' modifier can be used to render cdx from any request
2014-01-28 16:41:19 -08:00
Ilya Kreymer
a6458b056f some tweaks on transfer-encoding: always remove and serve unchunked
(should allow the front-end server to rechunk as needed)
2014-01-27 22:05:49 -08:00
Ilya Kreymer
8732499dd5 - cdx server bootstrap configured, #12
- pywb_init module inits from ./test directory

misc:
- router has lookahead for '/'
- dechunk even for transparent/binary
- 'text' query mode displays cdx
2014-01-27 21:46:38 -08:00
Ilya Kreymer
c55bdf0e1f -binsearch: add tests, support both prefix and exact loading, for #11
-cdx server first pass for #12: implement cdx parsing and transforming
-operations supported: merge sort, regex filter, resolve revisits, closest sort, reverse sort,
timestamp collapse
timestamp parsing utils
2014-01-27 17:02:48 -08:00
Ilya Kreymer
e1b669fdea improved customization: can setup pywb_init.pywb_config() config,
or specify custom init module <initmodule>.py_config() by
setting PYWB_INIT=<initmodule>
fix run.sh to support testing with custom mount point
2014-01-24 12:25:27 -08:00
Ilya Kreymer
44f68158a9 update README and comments 2014-01-24 01:17:18 -08:00
Ilya Kreymer
1033feb2f8 use sample settings if driver file not found 2014-01-24 00:59:15 -08:00
Ilya Kreymer
391f3bf81d remove pycdx_server pkg for now, move binsearch into pywb package,
update setup.py
2014-01-24 00:54:48 -08:00
Ilya Kreymer
03b6938b9c referer fallback: check for non empty SCRIPT_NAME when parsing referrer 2014-01-24 00:53:55 -08:00
Ilya Kreymer
94326dafc1 html_rewriter: default attrs without value to empty str value, instead of no value 2014-01-24 00:52:17 -08:00
Ilya Kreymer
e95e17b9e6 pycdx_server initial binsearch module, with support for exact match iterator!
fix html_rewriter missing ; on entities
js rewriter: only rewrite full document.domain
PathIndexPrefixResolver using binsearch on path index, for #9
resolvers moved to replay_resolvers.py

improve path-resolver logic: each resolver returns an array of possible
files (could be from primary or secondary storage).
then, iterate over all possible files from all resolvers until
a successful load, or raise exception if all failed
2014-01-23 01:38:09 -08:00
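The path-resolver logic described in the commit above can be sketched roughly as follows. All names here (`load_from_resolvers`, `primary`, `secondary`, `fake_loader`, `STORE`) are hypothetical stand-ins, not the functions in replay_resolvers.py.

```python
def load_from_resolvers(filename, resolvers, loader):
    """Try every candidate path from every resolver until one loads.

    Each resolver returns a list of possible paths for the file. An
    IOError is raised only after all candidates have failed.
    """
    failures = []
    for resolver in resolvers:
        for candidate in resolver(filename):
            try:
                return loader(candidate)
            except IOError as exc:
                failures.append((candidate, exc))
    raise IOError("could not load %r; tried %d candidate(s)"
                  % (filename, len(failures)))


# Hypothetical resolvers for primary and secondary storage.
def primary(name):
    return ["/primary/" + name]


def secondary(name):
    return ["/secondary/" + name]


# Stand-in for real storage: only the secondary copy exists.
STORE = {"/secondary/a.warc.gz": b"record"}


def fake_loader(path):
    if path not in STORE:
        raise IOError(path)
    return STORE[path]


print(load_from_resolvers("a.warc.gz", [primary, secondary], fake_loader))
```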
Ilya Kreymer
b237b144ff further refactor streaming of responses related to #13: always create a generator from
response stream, and if buffering, read entire generator into temp buffer
remove duplicate reading logic
2014-01-22 17:55:55 -08:00
Ilya Kreymer
2d0cb5745d enable bulk doctest testing via nosetests --with-doctest
as well as individual doctests
add utils.enable_doctests() func which checks if executing
app is nosetests (is there a better way?)
2014-01-22 15:28:01 -08:00
Ilya Kreymer
7722014a96 Cleanup rewrite interfaces to address #13
All rewriters can support either buffered or streaming mode.
In buffered mode, the full text content is written into a buffer
and served with a Content-Length
in streaming mode, text is streamed as it is rewritten and
no Content-Length is written
Default is to stream the response
2014-01-22 14:03:41 -08:00
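A rough sketch of the two modes described in the commit above, assuming the rewritten content arrives as an iterable of byte chunks. `build_response` is a hypothetical helper, not a pywb function.

```python
def build_response(chunks, buffered=False):
    """Return (headers, body_iterable) for rewritten content.

    Buffered: read everything first, so Content-Length can be sent.
    Streaming (the default): pass chunks through without Content-Length.
    """
    if buffered:
        body = b"".join(chunks)
        return [("Content-Length", str(len(body)))], [body]
    return [], chunks


headers, body = build_response(iter([b"ab", b"cd"]), buffered=True)
print(headers, list(body))
```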
Jack Cushman
6581f54fad Robust chunked data exception handling. 2014-01-21 20:00:52 -05:00
Ilya Kreymer
a1cd40fba1 support replay of records that have Transfer-Encoding: chunked, but
were not actually rewritten to the warc as chunked.
Attempt to parse chunk length, and if failed, fallback to treating
record as not chunked
2014-01-20 23:06:45 -08:00
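The fallback described in the commit above might look roughly like this. It is a sketch only: `dechunk`, `read_body` and `ChunkParseError` are hypothetical names, and the real implementation works on record streams rather than on a complete byte string.

```python
import io


class ChunkParseError(Exception):
    """Raised when a chunk-size line is not a valid hex length."""


def dechunk(stream):
    """Decode a chunked body from a file-like object."""
    parts = []
    while True:
        size_line = stream.readline()
        try:
            # The size is hex, optionally followed by ';extension'.
            size = int(size_line.split(b";", 1)[0].strip(), 16)
        except ValueError:
            raise ChunkParseError(size_line)
        if size < 0:
            raise ChunkParseError(size_line)
        if size == 0:
            break
        parts.append(stream.read(size))
        stream.readline()  # consume the CRLF that ends the chunk data
    return b"".join(parts)


def read_body(raw):
    """Try to dechunk; if the length cannot be parsed, treat as not chunked."""
    try:
        return dechunk(io.BytesIO(raw))
    except ChunkParseError:
        return raw


print(read_body(b"5\r\nhello\r\n0\r\n\r\n"))
print(read_body(b"<html>plain</html>"))
```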
Ilya Kreymer
8fd10673e8 refactor: cleanup the revisit resolving logic in replay
also, update documented logic on wiki at:
https://github.com/ikreymer/pywb/wiki/PyWb-Record-Lookup-and-Revisits
2014-01-20 17:52:14 -08:00
Jack Cushman
903583c3d7 Handle ArchivalUrl subclasses. 2014-01-20 14:13:16 -05:00
Ilya Kreymer
9ff3fc300b Fix #5, bringing back customParams optional params sent to cdx server
Rename archivalrouter.MatchRegex -> archivalrouter.Route, supporting regex/prefix matching
add redir_to_exact to turn off redirect to exact timestamp in RewritingReplayHandler
update README
2014-01-20 10:50:06 -08:00
Ilya Kreymer
80b2585d22 Should resolve #4 -- supports pywb running as a non-root app
* Instead of relying on REQUEST_URI, pywb constructs a
REL_REQUEST_URI, from PATH_INFO + QUERY_STRING.
SCRIPT_NAME auto-added to prefix
* MatchPrefix is now superseded by MatchRegex, which
can match a plain string -- collId defaults to the full match
* Added optional archivalurl_class to router to allow for customized
ArchivalUrl implementations to be specified
* run.sh can test on a non-root mountpoint, eg. ./run.sh "/approot"
2014-01-19 21:13:48 -08:00
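A minimal sketch of deriving a request URI relative to the mount point from the standard WSGI keys named in the commit above. The function name `rel_request_uri` is hypothetical, and the real code may differ in details such as encoding.

```python
def rel_request_uri(environ):
    """Build the app-relative request URI from PATH_INFO + QUERY_STRING."""
    uri = environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    if query:
        uri += "?" + query
    return uri


# App mounted at a non-root prefix, as in: ./run.sh "/approot"
environ = {
    "SCRIPT_NAME": "/approot",
    "PATH_INFO": "/2013/example.com",
    "QUERY_STRING": "a=b",
}
print(rel_request_uri(environ))
# SCRIPT_NAME is kept separately and used as the prefix for generated links.
print(environ["SCRIPT_NAME"] + rel_request_uri(environ))
```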
Ilya Kreymer
2e4d78d079 request_uri: only generate REQUEST_URI manually if not provided by wsgi framework
only encode chars that are not allowed in path segment, per
http://tools.ietf.org/html/rfc3986#section-3.3
2014-01-19 16:51:17 -08:00
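A sketch of encoding only the characters that RFC 3986 section 3.3 does not allow in a path segment, using `urllib.parse.quote` from modern Python. `PATH_SAFE` and `encode_path` are hypothetical names, and an existing `%` is encoded as well, which may not match the original behaviour.

```python
from urllib.parse import quote

# "/" as the segment separator, plus the pchar set from RFC 3986:
# unreserved (-._~), sub-delims (!$&'()*+,;=), ":" and "@".
PATH_SAFE = "/-._~!$&'()*+,;=:@"


def encode_path(path):
    """Percent-encode everything outside the allowed path characters."""
    return quote(path, safe=PATH_SAFE)


print(encode_path("/2013/example.com/a b"))
```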
Jack Cushman
595c9b0c3c wsgiref compatibility fixes.
- Manually set env['REQUEST_URI'] (which is nonstandard) the same way
it’s set by uwsgi.
- Include HTTP error code reasons in error response. (wsgiref checks
that error code is at least 4 characters, i.e. includes reason)
2014-01-19 16:22:06 -05:00
Ilya Kreymer
6cb1743163 Merge branch 'master' of github.com:ikreymer/pywb into work 2014-01-19 12:31:53 -08:00