backup/pywb - pywb - Source code and issue tracker for Open Eggbert

mirror of https://github.com/webrecorder/pywb.git synced 2025-03-22 14:24:27 +01:00

Author	SHA1	Message	Date
Ilya Kreymer	d347b4952b	don't mask raised exceptions, to address #23	2014-02-05 13:21:57 -08:00
Ilya Kreymer	1a1aa814d0	first pass at simple http proxy! #8 * proxy router for handling only proxy * proxy/archival router for handling both archival and proxy mode, togglable with 'enable_http_proxy' setting in config * supports only most recent capture playback -- no support for selecting replay date/calendar view yet * not testable with WebTest -- need better way to unit test proxy mode	2014-02-05 13:08:10 -08:00
Ilya Kreymer	3168b80cfa	improve docs for config.yaml, group all ui settings together; create separate test_config.yaml for testing; rename ArchivalRequestRouter -> ArchivalRouter for consistency	2014-02-05 10:10:33 -08:00
Ilya Kreymer	6388a78162	refactor: replay_views to support cleaner inheritance, no longer wrapping previous WbResponse overhaul yaml config to be much simpler, move best resolver and best index reader to respective classes add config_utils for sharing config, standard non-yaml config provides defaults for testing fix bug in query.html	2014-02-03 09:24:40 -08:00
Ilya Kreymer	bdef00cb8d	refactor WbUrl and UrlRewriter to drop requirement for having a WbUrl start with / Changes WbUrl forms: /2013/im_/example.com -> 2013/im_/example.com //example.com -> /example.com /example.com -> example.com * also simplify scheme-agnostic url (//) handling by just eating up extra slashes * add additional doctests on route, with and w/o custom SCRIPT_NAME	2014-02-01 18:20:23 -08:00
Ilya Kreymer	9f258fa64c	fix up cdx server query interface supports /cdx?url=... and other params including filter=<regex> collapse_time=<0-14> resolve_revisits=<true|false> reverse=<true|false> closest=<timestamp>	2014-02-01 14:47:07 -08:00
Ilya Kreymer	b685772b96	fixup loading from archive, add LimitReader to ensure record length is respected rename FileReader -> FileLoader, HttpReader -> HttpLoader loaders create 'readers', which support read()/readline()	2014-02-01 14:02:53 -08:00
Ilya Kreymer	d9c4e5cba4	make RemoteCDXServer api conform to LocalCDXServer api, addressing #19	2014-02-01 13:19:30 -08:00
Ilya Kreymer	86a093d164	support cdx server query at (/cdx in default config) also enable /echo_env and /echo_req debug handlers	2014-02-01 00:43:24 -08:00
Ilya Kreymer	2f5ffb3a88	switch test framework to use py.test instead of nose	2014-02-01 00:12:11 -08:00
Ilya Kreymer	6d2c8286ca	render_response has option to pass in statuscode	2014-01-31 19:45:01 -08:00
Ilya Kreymer	bd94e3c656	fix replay_resolvers tests, don't use abs paths!	2014-01-31 10:33:47 -08:00
Ilya Kreymer	304ddbec84	Support for new UI, as per #16 * Refactor views class to support more Jinja2 views (J2Template) * Add a home page, collection search page, and error pages, all optional * all exceptions appear on error page * wbrequest supports a request with an empty or / wb_url	2014-01-31 10:04:21 -08:00
Ilya Kreymer	937fc7229e	update README, fix typo	2014-01-29 02:12:58 -08:00
Ilya Kreymer	7a20d26d5f	support non-surt ordered cdx add unsurt() util func and surt_ordered init param to LocalCDXServer test make_best_resolver()	2014-01-29 00:58:37 -08:00
Ilya Kreymer	411e7fe8a3	cleanup pywb_init, work on documenting config.yaml!	2014-01-29 00:03:24 -08:00
Ilya Kreymer	43a46b373d	move sample/test data to ./sample_archive/warcs and ./sample_archive/cdx pywb_init now driven by config.yaml! (#14) Not yet supporting customized handlers, views, etc...	2014-01-28 22:03:01 -08:00
Ilya Kreymer	35f7cb0477	new-feature: support jinja2 template generated banner template receives cdx and wbrequest default template inserts capture time into banner	2014-01-28 20:18:47 -08:00
Ilya Kreymer	6de794a4e1	style fixes: convert camelCase func and var names to 'not_camel_case' WbHtml -> HTMLRewriter ArchivalUrl -> WbUrl	2014-01-28 19:37:37 -08:00
Ilya Kreymer	c0f8edf517	more refactoring: separate top-level handlers (WBHandler) from views (html, text) Add CDXHandler for interfacing with cdx server directly, #12	2014-01-28 17:23:44 -08:00
Ilya Kreymer	1a234f2953	refactor: remove intermediate query object. rename query -> views wbhandler queries index, replayer and renders via view new feature: 'cdx_' modifier can be used to render cdx from any request	2014-01-28 16:41:19 -08:00
Ilya Kreymer	a6458b056f	some tweaks on transfer-encoding: always remove and serve unchunked (should allow a front-end server to rechunk as needed)	2014-01-27 22:05:49 -08:00
Ilya Kreymer	8732499dd5	- cdx server bootstrap configured, #12 - pywb_init module inits from ./test directory misc: - router has lookahead for '/' - dechunk even for transparent/binary - 'text' query mode displays cdx	2014-01-27 21:46:38 -08:00
Ilya Kreymer	c55bdf0e1f	-binsearch: add tests, support both prefix and exact loading, for #11 -cdx server first pass for #12: implement cdx parsing and transforming -operations supported: merge sort, regex filter, resolve revisits, closest sort, reverse sort, timestamp collapse timestamp parsing utils	2014-01-27 17:02:48 -08:00
Ilya Kreymer	e1b669fdea	improved customization: can setup pywb_init.pywb_config() config, or specify custom init module <initmodule>.py_config() by setting PYWB_INIT=<initmodule> fix run.sh to support testing with custom mount point	2014-01-24 12:25:27 -08:00
Ilya Kreymer	44f68158a9	update README and comments	2014-01-24 01:17:18 -08:00
Ilya Kreymer	1033feb2f8	use sample settings if driver file not found	2014-01-24 00:59:15 -08:00
Ilya Kreymer	391f3bf81d	remove pycdx_server pkg for now, move binsearch into pywb package, update setup.py	2014-01-24 00:54:48 -08:00
Ilya Kreymer	03b6938b9c	referer fallback: check for non empty SCRIPT_NAME when parsing referrer	2014-01-24 00:53:55 -08:00
Ilya Kreymer	94326dafc1	html_rewriter: default attrs without value to empty str value, instead of no value	2014-01-24 00:52:17 -08:00
Ilya Kreymer	e95e17b9e6	pycdx_server initial binsearch module, with support exact match iterator! fix html_rewriter missing ; on entities js rewriter: only rewrite full document.domain PathIndexPrefixResolver using binsearch on path index, for #9 resolvers moved to replay_resolvers.py improve path-resolver logic: each resolver returns an array of possible files (could be from primary or secondary storage). then, iterate over all possible files from all resolvers until a successful load, or raise exception if all failed	2014-01-23 01:38:09 -08:00
Ilya Kreymer	b237b144ff	further refactor streaming of responses related to #13 : always create a generator from response stream, and if buffering, read entire generator into temp buffer remove duplicate reading logic	2014-01-22 17:55:55 -08:00
Ilya Kreymer	2d0cb5745d	enable bulk doctest testing via `nosetests --with-doctest` as well as individual doctests; add utils.enable_doctests() func which checks if executing app is nosetests (is there a better way?)	2014-01-22 15:28:01 -08:00
Ilya Kreymer	7722014a96	Cleanup rewrite interfaces to address #13 All rewriters can support either buffered or streaming mode. In buffered mode, the full text content is written into a buffer and served with a Content-Length in streaming mode, text is streamed as it is rewritten and no Content-Length is written Default is to stream the response	2014-01-22 14:03:41 -08:00
Jack Cushman	6581f54fad	Robust chunked data exception handling.	2014-01-21 20:00:52 -05:00
Ilya Kreymer	a1cd40fba1	support replay of records that have Transfer-Encoding: chunked, but were not actually rewritten to the warc as chunked. Attempt to parse chunk length, and if failed, fallback to treating record as not chunked	2014-01-20 23:06:45 -08:00
Ilya Kreymer	8fd10673e8	refactor: cleanup the revisit resolving logic in replay also, update documented logic on wiki at: https://github.com/ikreymer/pywb/wiki/PyWb-Record-Lookup-and-Revisits	2014-01-20 17:52:14 -08:00
Jack Cushman	903583c3d7	Handle ArchivalUrl subclasses.	2014-01-20 14:13:16 -05:00
Ilya Kreymer	9ff3fc300b	Fix #5 , bringing back customParams optional params sent to cdx server Rename archivalrouter.MatchRegex -> archivalrouter.Route, supporting regex/prefix matching add redir_to_exact to turn off redirect to exact timestamp in RewritingReplayHandler update README	2014-01-20 10:50:06 -08:00
Ilya Kreymer	80b2585d22	Should resolve #4 -- supports pywb running as a non-root app * Instead of relying on REQUEST_URI, pywb constructs a REL_REQUEST_URI, from PATH_INFO + QUERY_STRING. SCRIPT_NAME auto-added to prefix * MatchPrefix is now superseded by MatchRegex, which can match a plain string -- collId defaults to the full match * Added optional archivalurl_class to router to allow for customized ArchivalUrl implementations to be specified * run.sh can test on a non-root mountpoint, eg. ./run.sh "/approot"	2014-01-19 21:13:48 -08:00
Ilya Kreymer	2e4d78d079	request_uri: only generate REQUEST_URI manually if not provided by wsgi framework only encode chars that are not allowed in path segment, per http://tools.ietf.org/html/rfc3986#section-3.3	2014-01-19 16:51:17 -08:00
Jack Cushman	595c9b0c3c	wsgiref compatibility fixes. - Manually set env['REQUEST_URI'] (which is nonstandard) the same way it's set by uwsgi. - Include HTTP error code reasons in error response. (wsgiref checks that error code is at least 4 characters, i.e. includes reason)	2014-01-19 16:22:06 -05:00
Ilya Kreymer	6cb1743163	Merge branch 'master' of github.com:ikreymer/pywb into work	2014-01-19 12:31:53 -08:00
Ilya Kreymer	354040a7e0	support for url-agnostic dedup, eg loading payload from a different url than the revisit	2014-01-19 12:31:19 -08:00
Jack Cushman	c9d0b0ba7b	Handle transfer-encoding:chunked; misc. replay bugs. - Add a ChunkedLineReader to deal with replays with the transfer-encoding: chunked header. - Catch UnicodeDecodeErrors caused by multibyte characters getting split during buffering. - A couple of tiny bugs in replay.py	2014-01-18 21:32:49 -05:00
Ilya Kreymer	7ce6d0d22b	first pass on html rendering via jinja, support for query (cdx) rendering	2014-01-17 16:24:36 -08:00
Ilya Kreymer	bcc9588c00	* archivalrouter: to take a list of handlers, currently MatchPrefix and MatchRegex. handler returns a single response (no chaining for now) * rewriting: don't rewrite anchor only urls * perf: add a very basic profiler in WBHandler for testing	2014-01-16 20:33:51 -08:00
Ilya Kreymer	c4457abc4c	Update README Rename FullHandler -> WBHandler Add additional comments!	2014-01-03 21:44:20 -08:00
Ilya Kreymer	d820a8c06a	add some comments, make charset parsing lower()	2014-01-03 17:40:20 -08:00
Ilya Kreymer	c255f4e47f	fix typos	2014-01-03 17:04:15 -08:00


572 Commits