mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

554 Commits

Author SHA1 Message Date
Ilya Kreymer
d702b299ae wburl: split into BaseWbUrl and WbUrl for better extensibility 2014-02-24 21:30:38 -08:00
Ilya Kreymer
21e885b78a statusandheaders: add support for header line continuations with space/tab
add basic unit test for statusandheaders
2014-02-24 21:14:10 -08:00
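The continuation handling described in this commit can be sketched as follows. This is a minimal illustration, not the code in pywb's statusandheaders module: the function name `parse_header_lines` is hypothetical, and joining folded lines with a single space is an assumption.

```python
def parse_header_lines(lines):
    """Parse 'Name: value' lines, folding continuation lines
    (those starting with a space or tab) into the previous header."""
    headers = []
    for raw in lines:
        line = raw.rstrip('\r\n')
        if not line:
            break  # a blank line ends the header block
        if line[0] in ' \t' and headers:
            # continuation: append to the value of the previous header
            name, value = headers[-1]
            headers[-1] = (name, value + ' ' + line.strip())
            continue
        name, sep, value = line.partition(':')
        if not sep:
            continue  # this sketch silently skips malformed lines
        headers.append((name.strip(), value.strip()))
    return headers
```

Called on the lines `'Content-Type: text/html;\r\n'` and `'\tcharset=utf-8\r\n'`, the sketch yields a single `Content-Type` header whose value contains both parts.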
Ilya Kreymer
7968f360ce timeutils: timestamp_to_datetime() uses custom timestamp parsing
instead of strptime to automatically clamp timestamp to allowed
range (instead of erroring) on invalid timestamps.
returns datetime.datetime as advertised instead of struct_time as well
2014-02-24 16:30:11 -08:00
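The clamping behavior this commit describes could look roughly like the sketch below. It is an assumption-laden illustration, not pywb's timeutils code: the names `_field` and `timestamp_to_datetime_sketch` are hypothetical, and defaulting missing fields to their lowest value is an assumption about how short timestamps are padded.

```python
import calendar
import datetime
import string


def _field(digits, start, end, lo, hi):
    """Return digits[start:end] as an int clamped to [lo, hi];
    fall back to lo when the field is missing or incomplete."""
    part = digits[start:end]
    if len(part) < end - start:
        return lo
    return max(lo, min(hi, int(part)))


def timestamp_to_datetime_sketch(ts):
    """Convert a yyyyMMddhhmmss-style string to datetime.datetime,
    clamping out-of-range fields instead of raising."""
    digits = ''.join(ch for ch in ts if ch in string.digits)
    year = _field(digits, 0, 4, datetime.MINYEAR, datetime.MAXYEAR)
    month = _field(digits, 4, 6, 1, 12)
    # the last valid day depends on the (already clamped) year and month
    last_day = calendar.monthrange(year, month)[1]
    day = _field(digits, 6, 8, 1, last_day)
    hour = _field(digits, 8, 10, 0, 23)
    minute = _field(digits, 10, 12, 0, 59)
    second = _field(digits, 12, 14, 0, 59)
    return datetime.datetime(year, month, day, hour, minute, second)
```

With `strptime`, an input such as `'20140231256199'` would raise `ValueError`; the sketch clamps each field and returns a `datetime.datetime` instead.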
Ilya Kreymer
a474335501 fix missing param, typo 2014-02-24 19:42:37 +00:00
Ilya Kreymer
ef062fee7b cdx: add prototype support for redis cdx source (need testing) 2014-02-24 11:05:48 -08:00
Ilya Kreymer
51d61a8738 package reorg!
split up remaining parts of pywb root pkg
into core, dispatch and bootstrap
2014-02-24 03:00:01 -08:00
Ilya Kreymer
9194e867ea - add referrer self-redirect check and test case
- dispatching: cleanup wbrequestresponse, move tests to a separate file
- wbrequest: store both rel_prefix and host_prefix, with wb_prefix either full
or rel path as needed, so that full and relative paths are
both available in wbrequest
- create WbUrlHandler to differentiate handlers which
support WbUrl (timestamp[mod]/url) semantic vs other request handlers.
2014-02-23 23:31:54 -08:00
Ilya Kreymer
a4f1224d16 zipnum: add file mtime check to location loading #17 2014-02-22 18:28:54 -08:00
Ilya Kreymer
d8d7435d77 add zipnum location reloading support
default to 10 min interval #17
2014-02-22 16:49:37 -08:00
Ilya Kreymer
1754f15831 Combine FileLoader/HttpLoader into a single BlockLoader which
delegates based on scheme
2014-02-22 16:49:26 -08:00
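Delegating on the URL scheme, as this commit describes, might be sketched like this. The class name `BlockLoaderSketch`, its method names, and the `offset`/`length` parameters are hypothetical and do not reflect pywb's actual BlockLoader interface; the HTTP branch assumes the server honors `Range` requests.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


class BlockLoaderSketch:
    """Load a byte range from a local path or an http(s) URL."""

    def load(self, url, offset=0, length=-1):
        if length == 0:
            return b''
        scheme = urlsplit(url).scheme
        if scheme in ('http', 'https'):
            return self._load_http(url, offset, length)
        # anything else is treated as a local file in this sketch
        return self._load_file(url, offset, length)

    def _load_file(self, url, offset, length):
        path = url[len('file://'):] if url.startswith('file://') else url
        with open(path, 'rb') as fh:
            fh.seek(offset)
            return fh.read() if length < 0 else fh.read(length)

    def _load_http(self, url, offset, length):
        # open-ended range when no length is given
        end = '' if length < 0 else str(offset + length - 1)
        req = Request(url, headers={'Range': 'bytes={}-{}'.format(offset, end)})
        with urlopen(req) as resp:
            return resp.read()
```

Callers use one `load()` entry point regardless of where the block lives, which is the point of combining the two loaders.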
Ilya Kreymer
434fd23a95 optimize zipnum to support loading multiple continuous blocks,
decompressing each one individually. #17
2014-02-22 10:50:30 -08:00
Ilya Kreymer
8e840ccaaf zipnum first version! #17
split binsearch further into binsearch and linearsearch components
reading blocks one at a time currently, due to zlib decompress limitations
fix bufferedreader.readline() and fileloader bugs
2014-02-22 10:50:03 -08:00
Ilya Kreymer
a56cbcf62e binsearch: add range based matching via iter_range()
support for: exact, prefix, host, domain match types
2014-02-20 21:21:12 -08:00
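The prefix match type can be illustrated with a small sketch. Note the substitution: this uses `bisect` over an in-memory sorted list rather than a binary search over a sorted file, and `iter_prefix` is a hypothetical name, not pywb's `iter_range()`.

```python
import bisect


def iter_prefix(sorted_lines, prefix):
    """Yield every line starting with prefix from a sorted list.

    Lines sharing a prefix are contiguous in sorted order, so a
    binary search finds the first candidate and a linear scan
    stops at the first line that no longer matches."""
    start = bisect.bisect_left(sorted_lines, prefix)
    for line in sorted_lines[start:]:
        if not line.startswith(prefix):
            break
        yield line
```

The same two-phase shape (binary search to the start, linear scan to the end) is what makes range queries over a sorted index cheap.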
Ilya Kreymer
922917a631 rename BufferedReader -> DecompressingBufferedReader
remove max_len from DecompressingBufferedReader as it applied to
the compressed size, not original size.
Add integration test for verifying content length of larger file
2014-02-20 11:53:08 -08:00
Kenji Nagahashi
bb87d98b73 Merge remote-tracking branch 'origin/master' into cdx-server 2014-02-20 18:10:51 +00:00
Ilya Kreymer
433b150542 Merge branch 'master' into perms-work 2014-02-20 09:22:13 -08:00
Ilya Kreymer
0cd6588a1d Add exclusions support #24
exclusions: add AllAllowPerms and refactor exclusions interface
add TestExclusionPerms and a sample exclusion integration test
refactor cdx server init params into **kwargs
convert all cdx params to use camelCase
2014-02-20 09:18:10 -08:00
Ilya Kreymer
4c96993411 fix missed param conversion 2014-02-20 09:14:27 -08:00
Kenji Nagahashi
0b768ce11a Merge remote-tracking branch 'origin/perms-work' into cdx-server 2014-02-20 10:05:07 +00:00
Kenji Nagahashi
79eb3be44f rewrite wsgi_cdxserver with werkzeug
use pkg_resources instead of pkgutil because pkgutil breaks with auto-reload.
add --port command line option.
2014-02-20 09:58:08 +00:00
Ilya Kreymer
ff428ed43e exclusions: add AllAllowPerms and refactor exclusions interface
add TestExclusionPerms and a sample exclusion integration test
refactor cdx server init params into **kwargs
convert all cdx params to use camelCase
2014-02-19 20:20:31 -08:00
Ilya Kreymer
be284859be sample perms addition to cdx ops 2014-02-19 17:52:13 -08:00
Kenji Nagahashi
d0229b6b2d cleanup setup.py indent for ease of add/remove things. also use find_package(). 2014-02-19 23:37:44 +00:00
Ilya Kreymer
531464902f add uncompressed warc 2014-02-19 00:14:23 -08:00
Ilya Kreymer
312bd71568 automatic record (warc/arc) format detection and decompression if needed.
no need to rely on file type listing
2014-02-19 00:13:15 -08:00
Ilya Kreymer
84e0121aa5 fixup READMEs, add domain-specific rules to cdx sample app 2014-02-18 18:18:46 -08:00
Ilya Kreymer
7c1ac10d6f update subpackage READMEs 2014-02-18 18:13:44 -08:00
Ilya Kreymer
a09dec4b3e cdx: add domain-specific rules at cdx layer for custom canonicalization!
and 'fuzzy' matching when not found
handled via cdxdomainspecific.py
BaseCDXServer contains a canonicalizer object and a fuzzy query
canonicalizer abstracted to separate class (in canonicalizer.py)
clean up cdx related exceptions
default rules read from cdx/rules.yaml
filename configurable via 'domain_specific_rules' setting in config.yaml
fix typo in pywb/rewrite
2014-02-18 14:56:13 -08:00
Ilya Kreymer
ab95524b7b update README for 0.2! 2014-02-17 15:29:39 -08:00
Ilya Kreymer
5b34803a99 cdx: add support for filter:= and filter:!= for doing exact
(as opposed to regex) matches
eg: filter:urlkey=com,example)/?example=1 matches exact
string 'com,example)/?example=1' in the urlkey field
(as opposed to applying it as a regex)
2014-02-17 15:19:33 -08:00
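The difference between exact and regex filtering can be sketched as below. The function `make_filter` and its `exact`/`negate` parameters are hypothetical; how the cdx server parses the `filter:` query syntax is not shown, and anchoring the regex with `match` is an assumption.

```python
import re


def make_filter(field, value, exact=False, negate=False):
    """Build a predicate over a record (a dict of cdx fields)."""
    if exact:
        def matches(rec):
            return rec.get(field) == value
    else:
        pattern = re.compile(value)

        def matches(rec):
            return pattern.match(rec.get(field, '')) is not None

    if negate:
        return lambda rec: not matches(rec)
    return matches
```

The urlkey from the commit message shows why the exact form is useful: `'com,example)/?example=1'` contains an unbalanced `)`, so passing it to `re.compile` raises `re.error`, while the exact comparison treats it as a plain string.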
Ilya Kreymer
28187b34d3 fix typos in remotecdxserver, url-agnostic dedup
when raising new exception, pass traceback of original also!
2014-02-17 14:52:13 -08:00
Ilya Kreymer
158b490453 add wsgi_cdxserver_test for testing cdx server app example
pywb.cdx.wsgi_cdxserver
2014-02-17 13:59:57 -08:00
Ilya Kreymer
abea504b04 cleanup cdx server config, refactored such that
a cdx server need implement a single interface:
load_cdx(self, **params)

CDXServer and RemoteCDXServer distinct classes in cdxserver.py
utility function cdxserver.create_cdx_server() to create
appropriate server based on input
2014-02-17 13:58:02 -08:00
Ilya Kreymer
94f1dc3be5 cleanup wbexceptions, remove unused 2014-02-17 10:23:37 -08:00
Ilya Kreymer
5345459298 pywb 0.2!
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
2014-02-17 10:01:09 -08:00
Ilya Kreymer
2528ee0a7c refactoring of binsearch and cdxserver into separate packages
also move complicated doctests and integration tests to tests/
2014-02-12 13:16:07 -08:00
Ilya Kreymer
e4f409b2a4 simplify pywb_init config:
- add defaults dictionary, chain dictionaries rather than copying
 - allow custom classes to be loaded explicitly via yaml
 - for LineReader, assume ungzipped if first decompress fails
 - properly ignore bad local paths
 - add optional reporter object
2014-02-11 14:10:40 -08:00
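Chaining dictionaries rather than copying them, as the first bullet of this commit describes, can be sketched with `collections.ChainMap`. This is an illustration only: pywb_init may implement chaining differently, `build_settings` is a hypothetical name, and the default values shown are made up (the setting names `enable_http_proxy` and `absolute_paths` appear elsewhere in this log).

```python
from collections import ChainMap

# hypothetical defaults; the real default values are not given in this log
DEFAULTS = {'enable_http_proxy': False, 'absolute_paths': True}


def build_settings(global_config, route_config=None):
    """Look up keys in the route config first, then the global
    config, then the defaults, without copying any of them."""
    return ChainMap(route_config or {}, global_config, DEFAULTS)
```

Lookups fall through the chain in order, and writes go only to the first mapping, so the shared defaults dictionary is never mutated.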
Ilya Kreymer
8b2bfa570c referer redirect fixes:
- allow redirect if current Host: matches
- redirect request uri to host root, not current host path
2014-02-09 20:19:43 +00:00
Ilya Kreymer
2a828fab32 add sample data to dist egg to allow tests against the installed package 2014-02-09 12:06:35 -08:00
Ilya Kreymer
232ac733ab referer redirect: check against registered routes
js rewriter: only rewrite quoted strings, support relative redirect
Jinja view: add 'host' filter for extracting hostname
css tweak
2014-02-09 01:42:42 -08:00
Ilya Kreymer
a757f53bd5 cleanup Route config, move filters init into custom_init
remove extra print
2014-02-08 22:01:31 -08:00
Ilya Kreymer
44f38f44d5 paths cleanup:
- don't store explicit static path, but allow it to be set in the insert
- store host_prefix, which is either server name or empty
- for archival mode, absolute_paths settings controls if using absolute paths,
- for proxy always use absolute_paths
- default static path is: /static/default/
- allow extension apps to provide custom /static/X/ path

Route overriding:
- ability to set Route class
- custom init method

Archival Relative Redirect:
- if starting with timestamp, drop timestamp and assume host-relative path

Integration Tests:
- test proxy mode by using REQUEST_URI
- test archival relative redirect!
2014-02-08 20:07:16 -08:00
Ilya Kreymer
b11f4fad93 add support for pywb static content routes (separate from uwsgi)
adding StaticHandler and loading templates and static resources from current package
add default template and static data to be included in the pywb package
add test for custom static route
2014-02-07 19:32:58 -08:00
Ilya Kreymer
00a7691f69 add optional filters to default Route
add examples to config.yaml and test_config.yaml and integration test
per route config is inherited globally if only name is set
2014-02-06 17:28:08 -08:00
Ilya Kreymer
d347b4952b don't mask raised exceptions, to address #23 2014-02-05 13:21:57 -08:00
Ilya Kreymer
1a1aa814d0 first pass at simple http proxy! #8
* proxy router for handling only proxy
* proxy/archival router for handling both archival and proxy mode,
  togglable with 'enable_http_proxy' setting in config
* supports only most recent capture playback -- no support for
selecting replay date/calendar view yet
* not testable with WebTest -- need better way to unit test proxy mode
2014-02-05 13:08:10 -08:00
Ilya Kreymer
848dc6d000 add new test_config.yaml! 2014-02-05 10:17:06 -08:00
Ilya Kreymer
3168b80cfa improve docs for config.yaml, group all ui settings together
create separate test_config.yaml for testing
rename ArchivalRequestRouter -> ArchivalRouter for consistency
2014-02-05 10:10:33 -08:00
ikreymer
94f8f080c4 Merge pull request #22 from rajbot/master
Add Vagrantfile
2014-02-04 13:19:34 -08:00
rajbot
bad88e298f Clarify vagrant section of the README 2014-02-04 13:12:25 -08:00