mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

1193 Commits

Author SHA1 Message Date
Ilya Kreymer
5388a0b03b Merge branch 'develop' of https://github.com/ikreymer/pywb into develop 2014-04-03 12:45:54 -07:00
Ilya Kreymer
5dd586cf07 refactor: simplify rewrite_content and replay_views, remove
redundant code.. everything goes through rewrite_content(),
is sanitized (for transfer encoding) if needed
additional testing for decode_buff
fix failed_files bug in resolvingloader, add tests
2014-04-03 12:44:00 -07:00
Ilya Kreymer
5155a5c842 fix README headings 2014-04-03 09:25:10 -07:00
Ilya Kreymer
bd21fec6d4 update run-uwsgi.sh and add run-gunicorn.sh
update README and INSTALL, fix typo
only list wb handlers on home page by default
pep8 fixes
2014-04-03 08:56:18 -07:00
Ilya Kreymer
1e7ecb901a tweak README, add no cover pragmas to blocking cli apps (for now) 2014-04-02 21:43:09 -07:00
Ilya Kreymer
80f2da9548 refactor: move configs/config.yaml to root again
remove cdx-server specific config, instead make cdx server api-only
path configurable from regular config
2014-04-02 21:26:53 -07:00
Ilya Kreymer
8bdafeb040 Update README.rst
move changes, installation to separate files.. add simplified install guide
2014-04-02 20:29:00 -07:00
Ilya Kreymer
05eba0194a add CHANGES.rst changelist 2014-04-02 20:19:17 -07:00
Ilya Kreymer
bfa3f64121 create INSTALL.rst
advanced install info moved to INSTALL.rst
2014-04-02 19:23:56 -07:00
Ilya Kreymer
399642d719 add missing cdxserver test file 2014-04-02 18:34:05 -07:00
Ilya Kreymer
8b37fef8e0 tests: add explicit cdxserver config testing with different config variations 2014-04-02 15:01:40 -07:00
Ilya Kreymer
91184426b7 test coverage pass:
refactor and cleanup to improve coverage for corner cases
2014-04-02 13:16:54 -07:00
Ilya Kreymer
8d3d326c9e tests: add pathresolver tests for RedisResolver and PathIndexResolver 2014-04-02 11:41:20 -07:00
Ilya Kreymer
90f4833df3 add cli interface for archiveindexer expose as 'cdx-indexer'
add tests for cli interface
additional tests for statusheaders
2014-04-02 10:36:55 -07:00
Ilya Kreymer
732df1a172 add cmdline interface with argparse to archiveindexer 2014-04-02 00:18:57 -07:00
Ilya Kreymer
28d65ce717 archiveindexer major refactoring using zlib only
supports warc.gz, arc.gz, warc, arc and optional sorting
outputs cdx 11 but possible to extend to other formats
(additional edge case testing needed)
DecompressingBufferedReader refactoring to support multi-member gzip
Unit tests for indexer, additional unit tests for bufferedreaders and loaders,
and recordloaders
2014-03-30 23:47:33 -07:00
Ilya Kreymer
26bb695292 archiveindex: use list instead of ordereddict for cdx,
will add customizations later
2014-03-29 17:37:23 -07:00
Ilya Kreymer
cedc58a405 add archiveindexer! 2014-03-29 16:10:16 -07:00
Ilya Kreymer
7760b9b5a2 warc: separate parse_record_loader() to enable direct parsing
of a file-like stream
detect and ignore warcinfo and arc header
2014-03-29 15:58:03 -07:00
Ilya Kreymer
99eadb3d4f update package paths 2014-03-28 11:57:13 -07:00
Ilya Kreymer
9700004dc8 move configs to pywb package as package data 2014-03-28 11:53:59 -07:00
Ilya Kreymer
49d2d5b035 customizations: support custom cdx api suffix, custom
cdx server class
2014-03-28 10:58:14 -07:00
Ilya Kreymer
e2f7777c7d jinja2: add decorator for adding custom filters 2014-03-28 10:57:55 -07:00
Ilya Kreymer
83e07442f0 add configs to datadirs 2014-03-28 10:54:37 -07:00
Ilya Kreymer
2c74ea9f23 fuzzy match: make filter string optionally overridable
setup.py: unset PYWB_CONFIG_ENV
2014-03-27 21:43:30 -07:00
Ilya Kreymer
41d51a6427 ensure 'cdx_' modifier is working 2014-03-27 14:46:59 -07:00
Ilya Kreymer
093d8310e5 config: move config files to ./configs/
PYWB_CONFIG_FILE setting overrides passed in config
2014-03-27 14:31:27 -07:00
Ilya Kreymer
b5e70f5dc6 timeutils: add sec_to_timestamp() func 2014-03-27 14:24:49 -07:00
Ilya Kreymer
da0623fbbb lxml: ensure lxml support is optional: if not available,
use_lxml_parser() will return false and doctests/pytest collection
won't test the lxml parser
2014-03-26 14:05:02 -07:00
Ilya Kreymer
4e53c2e9d8 remote cdx refactoring: refactor remote cdx source and server to support
fuzzy matching
test local cdx server, remote cdx source, local and remote filtering
with self-contained unit tests
map remote cdx httperrors to pywb exceptions
2014-03-26 11:33:46 -07:00
Ilya Kreymer
5847087aae add fakeredis mock, test for RedisCDXSource 2014-03-25 11:02:32 -07:00
Ilya Kreymer
87df7c22f1 standardize test scripts to test_*.py instead of *_test.py 2014-03-25 11:01:51 -07:00
Ilya Kreymer
596f67437b update README with changes for memento, lxml and badges for develop 2014-03-24 15:01:33 -07:00
Ilya Kreymer
c6c9fe680a memento: add original link to timemap #10 2014-03-24 14:57:41 -07:00
Ilya Kreymer
2a605652c6 add memento timemap support (for archival mode only)
add timemap Link headers to timegate and memento responses
timemap accessible via /timemap/*/ path
2014-03-24 14:00:06 -07:00
Ilya Kreymer
9654c22bed rewrite: add doctype rewriting, more tests on various markup edge cases 2014-03-23 23:46:49 -07:00
Ilya Kreymer
742df6238e fix typo in renaming file 2014-03-23 13:12:06 -07:00
Ilya Kreymer
bcaacaf642 rename handlers
pep8 cleanup for all packages
remove obsolete statictextview
2014-03-23 12:59:21 -07:00
Ilya Kreymer
ac0bf5a415 refactor: IndexReader -> QueryHandler, move query output support
to QueryHandler. allow for multiple query views in QueryHandler
2014-03-23 12:44:28 -07:00
Ilya Kreymer
79da12348f limit stream by warc/arc record length instead of
http content length.
track length of StatusAndHeaders also.
add tests to verify content length correct for identity
arc and arcgz replays as well
2014-03-22 11:30:51 -07:00
Ilya Kreymer
53590537e0 Merge develop and lxml 2014-03-18 17:14:27 -07:00
Ilya Kreymer
a6b4ae4c47 chardet optimization: using chardet feed() approach to avoid passing in entire buffer 2014-03-17 20:53:42 -07:00
Ilya Kreymer
d1ad9b5e69 refactor: cleanup HTMLRewriter/LXMLHTMLRewriter close path,
single close in base class delegating to _internal_close()
Also, HTMLRewriter auto-terminates <script> and <style> tags
for consistency with lxml
2014-03-17 20:50:35 -07:00
Ilya Kreymer
10c84d8354 embed rewriting: add 'em_' flag for all regex-based rewrites
(js, css, xml) to be able to distinguish between embeds and non-embeds
more conclusively
wbrequest: add is_embed(), is_identity() properties
update tests
don't insert html banner if detected as an embed
2014-03-17 19:36:25 -07:00
Ilya Kreymer
52d99aef57 misc fixes: RemoteCDXServer throws NotFoundException on 404
fix typo in handlers
make WBHandler overridable in pywb_init
make perms_policy optional in IndexReader
2014-03-17 17:35:10 -07:00
Ilya Kreymer
2e7b17ed56 cleanup: move lxml tests to separate test dir, separate html, lxml html and regex
tests into separate files
fix lxml toggle in rewriterrules
2014-03-17 15:30:45 -07:00
Ilya Kreymer
f35e82a4d5 ensure final output from close() is encoded!
add config option to 'use_lxml_parser' if available, if not,
will default to regular parser
testing on travis with lxml (not adding to dep yet)
2014-03-17 13:19:51 -07:00
Ilya Kreymer
1404177c6f fixes for unicode (doctests)
remove explicit </html> since lxml does not parse past the </html>
tag and adds one anyway (not ideal but only workaround for html after closing tag)
2014-03-17 11:55:45 -07:00
Ilya Kreymer
23d60b0bb8 more work on lxml parser.. always write
start/end tags..
rewriterules: experiment defaulting to lxml if possible!
2014-03-17 09:48:31 -07:00
Ilya Kreymer
bd10c6c2d2 first pass -- lxml parser! 2014-03-16 23:12:04 -07:00