ikreymer
b6846c54e0
Merge pull request #21 from ikreymer/wburl-drop-slash
refactor WbUrl and UrlRewriter to drop requirement for having a WbUrl start with /
2014-02-01 19:47:18 -08:00
Ilya Kreymer
bdef00cb8d
refactor WbUrl and UrlRewriter to drop requirement for having a WbUrl start with /
Changes WbUrl forms:
/2013/im_/example.com -> 2013/im_/example.com
/*/example.com -> */example.com
/example.com -> example.com
* also simplify scheme-agnostic url (//) handling by just eating up extra
slashes
* add additional doctests on route, with and w/o custom SCRIPT_NAME
2014-02-01 18:20:23 -08:00
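A minimal sketch of how the new slash-less WbUrl forms could be split into (timestamp, modifier, url). The regex and the name parse_wb_url are hypothetical, not pywb's actual WbUrl parser. The sketch only shows that a leading slash becomes optional and that extra slashes (for example from a scheme-agnostic //url) are simply eaten.

```python
import re

# Hypothetical pattern: optional leading slashes, an optional timestamp
# (digits) or '*' query marker, an optional modifier such as 'im_',
# then the archived url. Runs of '/' between the parts are eaten.
WB_URL_RE = re.compile(r'^/*(?:(\d+|\*)/+)?(?:([a-z]+_)/+)?(.+)$')


def parse_wb_url(wb_url):
    """Split a WbUrl-style string into (timestamp, modifier, url)."""
    match = WB_URL_RE.match(wb_url)
    if not match:
        raise ValueError('invalid wb url: %r' % wb_url)
    timestamp, modifier, url = match.groups()
    return timestamp or '', modifier or '', url
```

With this pattern the old form '/2013/im_/example.com' and the new form '2013/im_/example.com' parse the same way.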
Ilya Kreymer
9f258fa64c
fix up cdx server query interface
supports /cdx?url=... and other params including
filter=<regex>
collapse_time=<0-14>
resolve_revisits=<true|false>
reverse=<true|false>
closest=<timestamp>
2014-02-01 14:47:07 -08:00
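A small client-side sketch for building a query against the /cdx endpoint using the parameters listed above. The helper name make_cdx_query and the host/port in the usage line are assumptions for illustration; only the parameter names come from the commit.

```python
from urllib.parse import urlencode


def make_cdx_query(base, url, **params):
    """Build a /cdx?url=... query (hypothetical helper)."""
    encoded = {'url': url}
    for key, value in params.items():
        # resolve_revisits / reverse are documented as <true|false>
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        encoded[key] = value
    return base + '?' + urlencode(encoded)


# Usage, assuming a local instance mounted at /cdx
query = make_cdx_query('http://localhost:8080/cdx', 'example.com',
                       closest='20140101000000', reverse=True)
```

urlencode() percent-encodes values, so a filter=<regex> containing special characters is passed through safely.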
Ilya Kreymer
b685772b96
fixup loading from archive, add LimitReader to ensure record length is respected
rename FileReader -> FileLoader, HttpReader -> HttpLoader
loaders create 'readers', which support read()/readline()
2014-02-01 14:02:53 -08:00
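A minimal sketch of a LimitReader-style wrapper: it exposes read()/readline() like the loaders' readers, but never returns more than the record length. This is an illustration of the idea, not pywb's actual class.

```python
import io


class LimitReader(object):
    """Wrap a stream so at most `limit` bytes can be read (sketch)."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.limit = limit

    def _consume(self, buff):
        # Track how much of the record length is left
        self.limit -= len(buff)
        return buff

    def _clamp(self, length):
        if length is None or length < 0 or length > self.limit:
            return self.limit
        return length

    def read(self, length=None):
        return self._consume(self.stream.read(self._clamp(length)))

    def readline(self, length=None):
        return self._consume(self.stream.readline(self._clamp(length)))


# Usage: a 6-byte record inside a longer stream
reader = LimitReader(io.BytesIO(b'abc\ndef\nghi'), 6)
first_line = reader.readline()  # b'abc\n'
rest = reader.read()            # b'de', limit now reached
```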
Ilya Kreymer
d9c4e5cba4
make RemoteCDXServer api conform to LocalCDXServer api, addressing #19
2014-02-01 13:19:30 -08:00
Ilya Kreymer
86a093d164
support cdx server query at (/cdx in default config)
also enable /echo_env and /echo_req debug handlers
2014-02-01 00:43:24 -08:00
Ilya Kreymer
f00ac826cf
fix typo in setup.py
2014-02-01 00:15:31 -08:00
Ilya Kreymer
2f5ffb3a88
switch test framework to use py.test instead of nose
2014-02-01 00:12:11 -08:00
Ilya Kreymer
6d2c8286ca
render_response has option to pass in statuscode
2014-01-31 19:45:01 -08:00
Ilya Kreymer
44ef14b022
add first integration tests with WebTest!
covers home page, search page, replay, calendar, redirect + replay, cdx
2014-01-31 19:41:44 -08:00
Ilya Kreymer
bd94e3c656
fix replay_resolvers tests, don't use abs paths!
2014-01-31 10:33:47 -08:00
Ilya Kreymer
304ddbec84
Support for new UI, as per #16
* Refactor views class to support more Jinja2 views (J2Template)
* Add a home page, collection search page, and error pages, all optional
* all exceptions appear on error page
* wbrequest supports a request with an empty or / wb_url
2014-01-31 10:04:21 -08:00
Ilya Kreymer
57fe9515db
- support for running uwsgi with virtualenv
- text changes in banner
- some info about testing in README
2014-01-29 17:23:19 -08:00
Ilya Kreymer
467d880681
update README
2014-01-29 15:15:39 -08:00
Ilya Kreymer
53eb5072ec
more README tweaks
2014-01-29 15:12:57 -08:00
Ilya Kreymer
28618c69c6
update query.html, listing unique timestamps
update README
2014-01-29 15:07:45 -08:00
Ilya Kreymer
e7b70ae496
fix links in README
2014-01-29 12:08:51 -08:00
Ilya Kreymer
f45234f39b
README tweaks
2014-01-29 12:07:33 -08:00
Ilya Kreymer
a6cfe9a87b
update README.md
2014-01-29 12:01:03 -08:00
Ilya Kreymer
937fc7229e
update README, fix typo
2014-01-29 02:12:58 -08:00
Ilya Kreymer
9cde058ccf
check for osx uwsgi path and use that, otherwise run 'uwsgi'
2014-01-29 02:12:54 -08:00
Ilya Kreymer
84ffec9b8d
update README.md
2014-01-29 01:52:30 -08:00
Ilya Kreymer
eb9cef9e28
update README.md for beta!!!
2014-01-29 01:36:31 -08:00
Ilya Kreymer
7a20d26d5f
support non-surt ordered cdx
add unsurt() util func and surt_ordered init param to LocalCDXServer
test make_best_resolver()
2014-01-29 00:58:37 -08:00
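A possible shape for an unsurt() helper, which turns a SURT-ordered key such as 'com,example)/path' back into 'example.com/path' so SURT and non-SURT cdx can be compared. This is a sketch under the assumption that keys look like 'host,parts[:port])/rest'; it is not the project's actual function.

```python
def unsurt(surt):
    """Convert a SURT-ordered key back to a plain url key (sketch)."""
    host, sep, rest = surt.partition(')')
    # Not SURT form (no ')' before the path): leave unchanged
    if not sep or '/' in host:
        return surt
    # An optional port stays attached after the host
    host, colon, port = host.partition(':')
    domain = '.'.join(reversed(host.split(',')))
    return domain + colon + port + rest
```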
Ilya Kreymer
9a3449dfd5
add pyyaml to dependency
2014-01-29 00:04:54 -08:00
Ilya Kreymer
411e7fe8a3
cleanup pywb_init, work on documenting config.yaml!
2014-01-29 00:03:24 -08:00
Ilya Kreymer
43a46b373d
move sample/test data to ./sample_archive/warcs and ./sample_archive/cdx
pywb_init now driven by config.yaml! (#14)
Not yet supporting customized handlers, views, etc...
2014-01-28 22:03:01 -08:00
Ilya Kreymer
35f7cb0477
new-feature: support jinja2 template generated banner
template receives cdx and wbrequest
default template inserts capture time into banner
2014-01-28 20:18:47 -08:00
Ilya Kreymer
6de794a4e1
style fixes: convert camelCase func and var names to 'not_camel_case'
WbHtml -> HTMLRewriter
ArchivalUrl -> WbUrl
2014-01-28 19:37:37 -08:00
Ilya Kreymer
c0f8edf517
more refactoring: separate top-level handlers (WBHandler) from
views (html, text)
Add CDXHandler for interfacing with cdx server directly, #12
2014-01-28 17:23:44 -08:00
Ilya Kreymer
1a234f2953
refactor: remove intermediate query object.
rename query -> views
wbhandler queries index, replayer and renders via view
new feature: 'cdx_' modifier can be used to render cdx from any request
2014-01-28 16:41:19 -08:00
Ilya Kreymer
a83d527702
add surt to dependency list
2014-01-27 22:07:27 -08:00
Ilya Kreymer
a6458b056f
some tweaks to transfer-encoding: always remove it and serve unchunked
(should allow the front-end server to rechunk as needed)
2014-01-27 22:05:49 -08:00
Ilya Kreymer
8732499dd5
- cdx server bootstrap configured, #12
- pywb_init module inits from ./test directory
misc:
- router has lookahead for '/'
- dechunk even for transparent/binary
- 'text' query mode displays cdx
2014-01-27 21:46:38 -08:00
Ilya Kreymer
c55bdf0e1f
- binsearch: add tests, support both prefix and exact loading, for #11
- cdx server first pass for #12: implement cdx parsing and transforming
- operations supported: merge sort, regex filter, resolve revisits, closest sort, reverse sort, timestamp collapse
- timestamp parsing utils
2014-01-27 17:02:48 -08:00
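A compact sketch of several of the cdx operations listed above, applied to space-separated cdx lines whose first two fields are the urlkey and a 14-digit timestamp. All function names are hypothetical, revisit resolution is omitted, and collapse_time assumes sorted input so that duplicates are adjacent.

```python
import heapq
import re
from datetime import datetime


def pad_timestamp(ts):
    # Pad a partial timestamp, e.g. '2013' -> '20130101000000'
    return ts + '19700101000000'[len(ts):]


def timestamp_to_sec(ts):
    dt = datetime.strptime(pad_timestamp(ts), '%Y%m%d%H%M%S')
    return (dt - datetime(1970, 1, 1)).total_seconds()


def merge_sorted(*sources):
    # Merge several already-sorted cdx sources into one sorted stream
    return heapq.merge(*sources)


def regex_filter(lines, pattern):
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def collapse_time(lines, digits):
    # Keep the first line per (urlkey, first `digits` of timestamp)
    last = None
    for line in lines:
        urlkey, timestamp = line.split(' ', 2)[:2]
        key = (urlkey, timestamp[:digits])
        if key != last:
            last = key
            yield line


def closest_sort(lines, closest):
    target = timestamp_to_sec(closest)
    return sorted(lines, key=lambda line:
                  abs(timestamp_to_sec(line.split(' ')[1]) - target))


def reverse_sort(lines):
    return list(reversed(list(lines)))
```

Generators are used where possible so that filtering and collapsing can stream over large indexes; closest_sort and reverse_sort necessarily materialize the list.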
Ilya Kreymer
e1b669fdea
improved customization: can set up pywb_init.pywb_config() config,
or specify a custom init module <initmodule>.py_config() by
setting PYWB_INIT=<initmodule>
fix run.sh to support testing with custom mount point
2014-01-24 12:25:27 -08:00
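One way the PYWB_INIT switch could work: read the module name from the environment, fall back to pywb_init, import it, then call its config function. The helper names are hypothetical, and the sketch assumes the custom module exposes the same pywb_config() entry point as pywb_init.

```python
import importlib
import os


def load_init_module(environ=None, default='pywb_init'):
    """Import the module named by PYWB_INIT, else the default (sketch)."""
    environ = os.environ if environ is None else environ
    return importlib.import_module(environ.get('PYWB_INIT', default))


def load_pywb_config(environ=None):
    # Assumes the init module defines pywb_config()
    return load_init_module(environ).pywb_config()
```

Passing the environment explicitly keeps the lookup testable without touching the process environment.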
Ilya Kreymer
44f68158a9
update README and comments
2014-01-24 01:17:18 -08:00
Ilya Kreymer
1033feb2f8
use sample settings if driver file not found
2014-01-24 00:59:15 -08:00
Ilya Kreymer
391f3bf81d
remove pycdx_server pkg for now, move binsearch into pywb package,
update setup.py
2014-01-24 00:54:48 -08:00
Ilya Kreymer
03b6938b9c
referer fallback: check for non-empty SCRIPT_NAME when parsing referrer
2014-01-24 00:53:55 -08:00
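A sketch of the referrer fallback idea: take the path of the Referer URL and, only when SCRIPT_NAME is non-empty, strip that mount point before treating the rest as a wb_url. The function name and return convention (None when the referrer is outside the app) are assumptions for illustration.

```python
from urllib.parse import urlsplit


def referer_wb_path(referer, script_name=''):
    """Return the referrer path after the app mount point (sketch)."""
    parts = urlsplit(referer)
    path = parts.path
    if parts.query:
        path += '?' + parts.query
    # Only strip a prefix when the app is mounted under a SCRIPT_NAME
    if script_name:
        if not path.startswith(script_name):
            return None
        path = path[len(script_name):]
    return path
```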
Ilya Kreymer
94326dafc1
html_rewriter: default attrs without value to empty str value, instead of no value
2014-01-24 00:52:17 -08:00
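With the standard library HTMLParser, an attribute written without a value (such as disabled) arrives as None. A minimal sketch of defaulting it to an empty string, as the commit describes; AttrCollector is a hypothetical class, not the project's rewriter.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record start tags, normalizing valueless attrs to '' (sketch)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes None for attributes with no value
        attrs = [(name, value if value is not None else '')
                 for name, value in attrs]
        self.tags.append((tag, attrs))
```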
Ilya Kreymer
5987a0c047
update README.md!
2014-01-23 16:30:37 -08:00
Ilya Kreymer
cbf0e23ad9
add .travis.yml for Travis CI!
2014-01-23 16:20:51 -08:00
Ilya Kreymer
e95e17b9e6
pycdx_server initial binsearch module, with support for exact match iterator!
fix html_rewriter missing ; on entities
js rewriter: only rewrite full document.domain
PathIndexPrefixResolver using binsearch on path index, for #9
resolvers moved to replay_resolvers.py
improve path-resolver logic: each resolver returns an array of possible
files (could be from primary or secondary storage).
then, iterate over all possible files from all resolvers until
a successful load, or raise exception if all failed
2014-01-23 01:38:09 -08:00
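An in-memory sketch of the two ideas in this commit: a binary-search prefix/exact iterator over sorted index lines, and trying every candidate file from the resolvers until one loads. The real module searches files on disk; here bisect over a sorted list stands in for that, and all names are hypothetical.

```python
import bisect


def iter_prefix(sorted_lines, prefix):
    """Yield every line starting with prefix (lines must be sorted)."""
    i = bisect.bisect_left(sorted_lines, prefix)
    while i < len(sorted_lines) and sorted_lines[i].startswith(prefix):
        yield sorted_lines[i]
        i += 1


def iter_exact(sorted_lines, key, sep=' '):
    # Exact match on the first field: the key followed by the separator
    return iter_prefix(sorted_lines, key + sep)


def load_first(candidates, loader):
    """Return the first successful load, or raise if all fail."""
    errors = []
    for path in candidates:
        try:
            return loader(path)
        except Exception as exc:
            errors.append(exc)
    raise IOError('all candidates failed: %r' % errors)
```

Because all lines sharing a prefix are contiguous in sorted order, a single bisect plus a linear scan is enough.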
Ilya Kreymer
b237b144ff
further refactor streaming of responses related to #13: always create a generator from
response stream, and if buffering, read entire generator into temp buffer
remove duplicate reading logic
2014-01-22 17:55:55 -08:00
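A sketch of the "always a generator, optionally buffered" approach: the response stream is wrapped as a chunk generator, and when buffering is needed the whole generator is read into a temp buffer so the length is known. The function names and the spill-to-disk threshold are assumptions.

```python
import io
import tempfile


def stream_gen(stream, chunk_size=8192):
    # Always expose the response body as a generator of byte chunks
    while True:
        buff = stream.read(chunk_size)
        if not buff:
            break
        yield buff


def _drain_and_close(buff):
    # Replay the buffered bytes, then release the temp file
    try:
        yield from stream_gen(buff)
    finally:
        buff.close()


def buffer_gen(gen, max_mem=1024 * 1024):
    """Read a whole generator into a temp buffer; return (length, chunks)."""
    buff = tempfile.SpooledTemporaryFile(max_size=max_mem)
    for chunk in gen:
        buff.write(chunk)
    length = buff.tell()
    buff.seek(0)
    return length, _drain_and_close(buff)


# Usage: buffer a small in-memory "response"
length, body = buffer_gen(stream_gen(io.BytesIO(b'hello world'), 4))
```

SpooledTemporaryFile keeps small bodies in memory and only spills to disk past max_mem.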
Ilya Kreymer
2d0cb5745d
enable bulk doctest testing via nosetests --with-doctest
...
as well as individual doctests
andd utils.enable_doctests() func which checks if executing
app is nosetests (is there a better a way?)
2014-01-22 15:28:01 -08:00
Ilya Kreymer
7722014a96
Cleanup rewrite interfaces to address #13
All rewriters can support either buffered or streaming mode.
In buffered mode, the full text content is written into a buffer
and served with a Content-Length
in streaming mode, text is streamed as it is rewritten and
no Content-Length is written
Default is to stream the response
2014-01-22 14:03:41 -08:00
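The two modes described above can be sketched as a single function that either joins the rewritten chunks and sets Content-Length, or streams them without it. The names are hypothetical, and rewriting chunk by chunk is a simplification: a real rewriter must also handle matches that span chunk boundaries.

```python
def rewrite_response(chunks, rewrite, buffered=False):
    """Return (extra_headers, body_iter); streaming is the default."""
    rewritten = (rewrite(chunk) for chunk in chunks)
    if buffered:
        # Full text is known, so Content-Length can be sent
        body = b''.join(rewritten)
        return [('Content-Length', str(len(body)))], iter([body])
    # Streaming: length is unknown up front, so no Content-Length
    return [], rewritten
```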
ikreymer
33c135b337
Merge pull request #7 from jcushman/master
Robust chunked data exception handling.
2014-01-21 19:23:03 -08:00
Jack Cushman
6581f54fad
Robust chunked data exception handling.
2014-01-21 20:00:52 -05:00
Ilya Kreymer
a1cd40fba1
support replay of records that have Transfer-Encoding: chunked, but
were not actually rewritten to the warc as chunked.
Attempt to parse chunk length, and if failed, fallback to treating
record as not chunked
2014-01-20 23:06:45 -08:00
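A sketch of dechunking with the fallback described above: if the very first chunk length cannot be parsed, the record is treated as not chunked and returned as-is; errors after a valid first chunk are raised. ChunkedDataException and dechunk are hypothetical names, and the whole body is assumed to fit in memory.

```python
import io


class ChunkedDataException(Exception):
    pass


def dechunk(data):
    """Decode chunked transfer-encoding, falling back to raw data (sketch)."""
    stream = io.BytesIO(data)
    out = []
    first = True
    while True:
        size_line = stream.readline()
        try:
            # chunk-size is hex, optionally followed by ';extensions'
            size = int(size_line.split(b';')[0].strip(), 16)
            if size < 0:
                raise ValueError(size_line)
        except ValueError:
            if first:
                # Header claimed chunked, but body is not: serve unchanged
                return data
            raise ChunkedDataException('bad chunk size: %r' % size_line)
        first = False
        if size == 0:
            return b''.join(out)
        chunk = stream.read(size)
        if len(chunk) < size or stream.read(2) != b'\r\n':
            raise ChunkedDataException('truncated chunk')
        out.append(chunk)
```

Note that a non-chunked body that happens to start with hex digits and a newline would still be parsed as chunked; the fallback only covers an unparseable first length.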