mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

2131 Commits

Author SHA1 Message Date
Ilya Kreymer
8d3d326c9e tests: add pathresolver tests for RedisResolver and PathIndexResolver 2014-04-02 11:41:20 -07:00
Ilya Kreymer
90f4833df3 add cli interface for archiveindexer, expose as 'cdx-indexer'
add tests for cli interface
additional tests for statusheaders
2014-04-02 10:36:55 -07:00
Ilya Kreymer
732df1a172 add cmdline interface with argparse to archiveindexer 2014-04-02 00:18:57 -07:00
Ilya Kreymer
28d65ce717 archiveindexer major refactoring using zlib only
supports warc.gz, arc.gz, warc, arc and optional sorting
outputs cdx 11 but possible to extend to other formats
(additional edge case testing needed)
DecompressingBufferedReader refactoring to support multi-member gzip
Unit tests for indexer, additional unit tests for bufferedreaders and loaders,
and recordloaders
2014-03-30 23:47:33 -07:00
Ilya Kreymer
26bb695292 archiveindex: use list instead of ordereddict for cdx,
will add customizations later
2014-03-29 17:37:23 -07:00
Ilya Kreymer
cedc58a405 add archiveindexer! 2014-03-29 16:10:16 -07:00
Ilya Kreymer
7760b9b5a2 warc: separate parse_record_loader() to enable direct parsing
of a file-like stream
detect and ignore warcinfo and arc header
2014-03-29 15:58:03 -07:00
Ilya Kreymer
99eadb3d4f update package paths 2014-03-28 11:57:13 -07:00
Ilya Kreymer
9700004dc8 move configs to pywb package as package data 2014-03-28 11:53:59 -07:00
Ilya Kreymer
49d2d5b035 customizations: support custom cdx api suffix, custom
cdx server class
2014-03-28 10:58:14 -07:00
Ilya Kreymer
e2f7777c7d jinja2: add decorator for adding custom filters 2014-03-28 10:57:55 -07:00
Ilya Kreymer
83e07442f0 add configs to datadirs 2014-03-28 10:54:37 -07:00
Ilya Kreymer
2c74ea9f23 fuzzy match: make filter string optionally overridable
setup.py: unset PYWB_CONFIG_ENV
2014-03-27 21:43:30 -07:00
Ilya Kreymer
41d51a6427 ensure 'cdx_' modifier is working 2014-03-27 14:46:59 -07:00
Ilya Kreymer
093d8310e5 config: move config files to ./configs/
PYWB_CONFIG_FILE setting overrides passed in config
2014-03-27 14:31:27 -07:00
Ilya Kreymer
b5e70f5dc6 timeutils: add sec_to_timestamp() func 2014-03-27 14:24:49 -07:00
Ilya Kreymer
da0623fbbb lxml: ensure lxml support is optional: if not available,
use_lxml_parser() will return false and doctests/pytest collection
won't test the lxml parser
2014-03-26 14:05:02 -07:00
Ilya Kreymer
4e53c2e9d8 remote cdx refactoring: refactor remote cdx source and server to support
fuzzy matching
test local cdx server, remote cdx source, local and remote filtering
with self-contained unit tests
map remote cdx httperrors to pywb exceptions
2014-03-26 11:33:46 -07:00
Ilya Kreymer
5847087aae add fakeredis mock, test for RedisCDXSource 2014-03-25 11:02:32 -07:00
Ilya Kreymer
87df7c22f1 standardize test scripts to test_*.py instead of *_test.py 2014-03-25 11:01:51 -07:00
Ilya Kreymer
596f67437b update README with changes for memento, lxml and badges for develop 2014-03-24 15:01:33 -07:00
Ilya Kreymer
c6c9fe680a memento: add original link to timemap #10 2014-03-24 14:57:41 -07:00
Ilya Kreymer
2a605652c6 add memento timemap support (for archival mode only)
add timemap Link headers to timegate and memento responses
timemap accessible via /timemap/*/ path
2014-03-24 14:00:06 -07:00
Ilya Kreymer
9654c22bed rewrite: add doctype rewriting, more tests on various markup edge cases 2014-03-23 23:46:49 -07:00
Ilya Kreymer
742df6238e fix typo in renaming file 2014-03-23 13:12:06 -07:00
Ilya Kreymer
bcaacaf642 rename handlers
pep8 cleanup for all packages
remove obsolete statictextview
2014-03-23 12:59:21 -07:00
Ilya Kreymer
ac0bf5a415 refactor: IndexReader -> QueryHandler, move query output support
to QueryHandler. allow for multiple query views in QueryHandler
2014-03-23 12:44:28 -07:00
Ilya Kreymer
79da12348f limit stream by warc/arc record length instead of
http content length.
track length of StatusAndHeaders also.
add tests to verify content length correct for identity
arc and arcgz replays as well
2014-03-22 11:30:51 -07:00
Ilya Kreymer
53590537e0 Merge develop and lxml 2014-03-18 17:14:27 -07:00
Ilya Kreymer
a6b4ae4c47 chardet optimization: using chardet feed() approach to avoid passing in entire buffer 2014-03-17 20:53:42 -07:00
Ilya Kreymer
d1ad9b5e69 refactor: cleanup HTMLRewriter/LXMLHTMLRewriter close path,
single close in base class delegating to _internal_close()
Also, HTMLRewriter auto-terminates <script> and <style> tags
for consistency with lxml
2014-03-17 20:50:35 -07:00
Ilya Kreymer
10c84d8354 embed rewriting: add 'em_' flag for all regex-based rewrites
(js, css, xml) to be able to distinguish between embeds and non-embeds
more conclusively
wbrequest: add is_embed(), is_identity() properties
update tests
don't insert html banner if detected as an embed
2014-03-17 19:36:25 -07:00
Ilya Kreymer
52d99aef57 misc fixes: RemoteCDXServer throws NotFoundException on 404
fix typo in handlers
make WBHandler overridable in pywb_init
make perms_policy optional in IndexReader
2014-03-17 17:35:10 -07:00
Ilya Kreymer
2e7b17ed56 cleanup: move lxml tests to separate test dir, separate html, lxml html and regex
tests into separate files
fix lxml toggle in rewriterrules
2014-03-17 15:30:45 -07:00
Ilya Kreymer
f35e82a4d5 ensure final output from close() is encoded!
add config option to 'use_lxml_parser' if available, if not,
will default to regular parser
testing on travis with lxml (not adding to dep yet)
2014-03-17 13:19:51 -07:00
Ilya Kreymer
1404177c6f fixes for unicode (doctests)
remove explicit </html> since lxml does not parse past the </html>
tag and adds one anyway (not ideal but only workaround for html after closing tag)
2014-03-17 11:55:45 -07:00
Ilya Kreymer
23d60b0bb8 more work on lxml parser.. always write
start/end tags..
rewriterules: experiment defaulting to lxml if possible!
2014-03-17 09:48:31 -07:00
Ilya Kreymer
bd10c6c2d2 first pass -- lxml parser! 2014-03-16 23:12:04 -07:00
Ilya Kreymer
b0a7cafe6d update default static path to: pywb/static/ 2014-03-14 18:37:03 -07:00
Ilya Kreymer
6461af030b refactoring: clean up handlers and replay_views for pep8
use BlockLoader().load for StaticHandler static file resolving
update static paths to point to pywb/static instead of static
2014-03-14 18:17:22 -07:00
Ilya Kreymer
a69d565af5 make pywb.rewrite package pep8-compatible
move doctests to test subdir
2014-03-14 16:44:23 -07:00
Ilya Kreymer
bfffac45b0 remove reference to deleted file wbexceptions.py 2014-03-14 11:22:50 -07:00
Ilya Kreymer
cb244a8c25 more readme tweaks 2014-03-14 11:19:05 -07:00
Ilya Kreymer
535cbc6dde update README 2014-03-14 11:05:05 -07:00
Ilya Kreymer
14a12f95b2 pep8 fixes, improve docs for proxy
move CaptureException into replay_views
2014-03-14 11:02:03 -07:00
Ilya Kreymer
bdcda1df6f add test config for memento #10 2014-03-14 11:01:47 -07:00
Ilya Kreymer
a1ab54c340 first pass at memento support #10!
memento support enabled by default, togglable via 'enable_memento' config property
supporting timegate and memento apis, no timemap yet
supporting pattern 2.3 for archival and pattern 1.3 for proxy modes
also:
simplify exception hierarchy a bit more, move down to utils
make WbRequest and WbResponse extensible with mixins (eg for memento)
2014-03-14 10:46:20 -07:00
Ilya Kreymer
dd9a2c635f disable pypy travis builds for now 2014-03-13 22:38:56 -07:00
Ilya Kreymer
29ecadee54 update README, fix setup.py typo 2014-03-12 18:35:21 -07:00
Ilya Kreymer
3222f3ee08 update setup, remove markdown readme 2014-03-12 17:57:54 -07:00