mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

Commit Graph

  • 05eba0194a add CHANGES.rst changelist Ilya Kreymer 2014-04-02 20:19:17 -07:00
  • bfa3f64121 create INSTALL.rst Ilya Kreymer 2014-04-02 19:23:56 -07:00
  • 399642d719 add missing cdxserver test file Ilya Kreymer 2014-04-02 18:34:05 -07:00
  • 8b37fef8e0 tests: add explicit cdxserver config testing with different config variations Ilya Kreymer 2014-04-02 15:01:40 -07:00
  • 91184426b7 test coverage pass: refactor and cleanup to improve coverage for corner cases Ilya Kreymer 2014-04-02 13:16:54 -07:00
  • 8d3d326c9e tests: add pathresolver tests for RedisResolver and PathIndexResolver Ilya Kreymer 2014-04-02 11:41:20 -07:00
  • 90f4833df3 add cli interface for archiveindexer expose as 'cdx-indexer' add tests for cli interface additional tests for statusheaders Ilya Kreymer 2014-04-02 10:36:55 -07:00
  • 732df1a172 add cmdline interface with argparse to archiveindexer Ilya Kreymer 2014-04-02 00:18:57 -07:00
  • 28d65ce717 archiveindexer major refactoring using zlib only supports warc.gz, arc.gz, warc, arc and optional sorting outputs cdx 11 but possible to extend to other formats (additional edge case testing needed) DecompressingBufferedReader refactoring to support multi-member gzip Unit tests for indexer, additional unit tests for bufferedreaders and loaders, and recordloaders Ilya Kreymer 2014-03-30 23:47:33 -07:00
  • 26bb695292 archiveindex: use list instead of ordereddict for cdx, will add customizations later Ilya Kreymer 2014-03-29 17:37:23 -07:00
  • cedc58a405 add archiveindexer! Ilya Kreymer 2014-03-29 16:10:16 -07:00
  • 7760b9b5a2 warc: separate parse_record_loader() to enable direct parsing of a file-like stream detect and ignore warcinfo and arc header Ilya Kreymer 2014-03-29 15:58:03 -07:00
  • 99eadb3d4f update package paths Ilya Kreymer 2014-03-28 11:57:13 -07:00
  • 9700004dc8 move configs to pywb package as package data Ilya Kreymer 2014-03-28 11:53:59 -07:00
  • 49d2d5b035 customizations: support custom cdx api suffix, custom cdx server class Ilya Kreymer 2014-03-28 10:58:14 -07:00
  • e2f7777c7d jinja2: add decorator for adding custom filters Ilya Kreymer 2014-03-28 10:57:55 -07:00
  • 83e07442f0 add configs to datadirs Ilya Kreymer 2014-03-28 10:54:37 -07:00
  • 2c74ea9f23 fuzzy match: make filter string optionally overridable setup.py: unset PYWB_CONFIG_ENV Ilya Kreymer 2014-03-27 21:43:30 -07:00
  • 41d51a6427 ensure 'cdx_' modifier is working Ilya Kreymer 2014-03-27 14:46:59 -07:00
  • 093d8310e5 config: move config files to ./configs/ PYWB_CONFIG_FILE setting overrides passed in config Ilya Kreymer 2014-03-27 14:31:27 -07:00
  • b5e70f5dc6 timeutils: add sec_to_timestamp() func Ilya Kreymer 2014-03-27 14:24:49 -07:00
  • da0623fbbb lxml: ensure lxml support is optional: if not available, use_lxml_parser() will return false and doctests/pytest collection won't test the lxml parser Ilya Kreymer 2014-03-26 14:05:02 -07:00
  • 4e53c2e9d8 remote cdx refactoring: refactor remote cdx source and server to support fuzzy matching test local cdx server, remote cdx source, local and remote filtering with self-contained unit tests map remote cdx httperrors to pywb exceptions Ilya Kreymer 2014-03-26 11:33:46 -07:00
  • 5847087aae add fakeredis mock, test for RedisCDXSource Ilya Kreymer 2014-03-25 11:02:32 -07:00
  • 87df7c22f1 standardize test scripts to test_*.py instead of *_test.py Ilya Kreymer 2014-03-25 11:01:51 -07:00
  • 596f67437b update README with changes for memento, lxml and badges for develop Ilya Kreymer 2014-03-24 15:01:33 -07:00
  • c6c9fe680a memento: add original link to timemap #10 Ilya Kreymer 2014-03-24 14:57:41 -07:00
  • 2a605652c6 add memento timemap support (for archival mode only) add timemap Link headers to timegate and memento responses timemap accessible via /timemap/*/ path Ilya Kreymer 2014-03-24 14:00:06 -07:00
  • 9654c22bed rewrite: add doctype rewriting, more tests on various markup edge cases Ilya Kreymer 2014-03-23 23:46:49 -07:00
  • 742df6238e fix typo in renaming file Ilya Kreymer 2014-03-23 13:12:06 -07:00
  • bcaacaf642 rename handlers pep8 cleanup for all packages remove obsolete statictextview Ilya Kreymer 2014-03-23 12:59:21 -07:00
  • ac0bf5a415 refactor: IndexReader -> QueryHandler, move query output support to QueryHandler. allow for multiple query views in QueryHandler Ilya Kreymer 2014-03-23 12:44:28 -07:00
  • 79da12348f limit stream by warc/arc record length instead of http content length. track length of StatusAndHeaders also. add tests to verify content length correct for identity arc and arcgz replays as well Ilya Kreymer 2014-03-22 11:30:51 -07:00
  • 53590537e0 Merge develop and lxml Ilya Kreymer 2014-03-18 17:14:27 -07:00
  • a6b4ae4c47 chardet optimization: using chardet feed() approach to avoid passing in entire buffer Ilya Kreymer 2014-03-17 20:53:42 -07:00
  • d1ad9b5e69 refactor: cleanup HTMLRewriter/LXMLHTMLRewriter close path, single close in base class delegating to _internal_close() Also, HTMLRewriter auto-terminates <script> and <style> tags for consistency with lxml Ilya Kreymer 2014-03-17 20:50:35 -07:00
  • 10c84d8354 embed rewriting: add 'em_' flag for all regex-based rewrites (js, css, xml) to be able to distinguish between embeds and non-embeds more conclusively wbrequest: add is_embed(), is_identity() properties update tests don't insert html banner if detected as an embed Ilya Kreymer 2014-03-17 19:36:25 -07:00
  • 52d99aef57 misc fixes: RemoteCDXServer throws NotFoundException on 404 fix typo in handlers make WBHandler overridable in pywb_init make perms_policy optional in IndexReader Ilya Kreymer 2014-03-17 17:35:10 -07:00
  • 2e7b17ed56 cleanup: move lxml tests to separate test dir, separate html, lxml html and regex tests into separate files fix lxml toggle in rewriterrules Ilya Kreymer 2014-03-17 15:30:45 -07:00
  • f35e82a4d5 ensure final output from close() is encoded! add config option to 'use_lxml_parser' if available, if not, will default to regular parser testing on travis with lxml (not adding to dep yet) Ilya Kreymer 2014-03-17 13:17:02 -07:00
  • 1404177c6f fixes for unicode (doctests) remove explicit </html> since lxml does not parse past the </html> tag and adds one anyway (not ideal but only workaround for html after closing tag) Ilya Kreymer 2014-03-17 11:55:45 -07:00
  • 23d60b0bb8 more work on lxml parser.. always write start/end tags.. rewriterules: experiment defaulting to lxml if possible! Ilya Kreymer 2014-03-17 09:48:31 -07:00
  • bd10c6c2d2 first pass -- lxml parser! Ilya Kreymer 2014-03-16 23:12:04 -07:00
  • b0a7cafe6d update default static path to: pywb/static/ Ilya Kreymer 2014-03-14 18:37:03 -07:00
  • 6461af030b refactoring: clean up handlers and replay_views for pep8 use BlockLoader().load for StaticHandler static file resolving update static paths to point to pywb/static instead of static Ilya Kreymer 2014-03-14 18:17:22 -07:00
  • a69d565af5 make pywb.rewrite package pep8-compatible move doctests to test subdir Ilya Kreymer 2014-03-14 16:34:51 -07:00
  • bfffac45b0 remove reference to deleted file wbexceptions.py Ilya Kreymer 2014-03-14 11:22:50 -07:00
  • cb244a8c25 more readme tweaks Ilya Kreymer 2014-03-14 11:16:36 -07:00
  • 535cbc6dde update README Ilya Kreymer 2014-03-14 11:05:05 -07:00
  • 14a12f95b2 pep8 fixes, improve docs for proxy move CaptureException into replay_views Ilya Kreymer 2014-03-14 11:02:03 -07:00
  • bdcda1df6f add test config for memento #10 Ilya Kreymer 2014-03-14 11:01:47 -07:00
  • a1ab54c340 first pass at memento support #10! memento support enabled by default, togglable via 'enable_memento' config property supporting timegate and memento apis, no timemap yet supporting pattern 2.3 for archival and pattern 1.3 for proxy modes also: simplify exception hierarchy a bit more, move down to utils make WbRequest and WbResponse extensible with mixins (eg for memento) Ilya Kreymer 2014-03-14 10:46:20 -07:00
  • dd9a2c635f disable pypy travis builds for now 0.2.0 Ilya Kreymer 2014-03-13 22:38:56 -07:00
  • 29ecadee54 update README, fix setup.py typo Ilya Kreymer 2014-03-12 18:35:21 -07:00
  • 3222f3ee08 update setup, remove markdown readme Ilya Kreymer 2014-03-12 17:57:54 -07:00
  • 78af82d6b1 Fixes for README.rst ikreymer 2014-03-12 17:50:47 -07:00
  • 681fd79974 Update README.rst ikreymer 2014-03-10 19:25:14 -07:00
  • 1c85aebbf0 fix setup.py Ilya Kreymer 2014-03-10 19:19:41 -07:00
  • fe9eaea006 update setup.py classifiers Ilya Kreymer 2014-03-10 19:11:19 -07:00
  • f578bdf5de add README.rst Ilya Kreymer 2014-03-10 19:01:20 -07:00
  • 45972df6c4 minor fixes, copyright update Ilya Kreymer 2014-03-10 18:45:45 -07:00
  • 3322fb233f fixup wb and wombat.js: fix formatting to 4-tab snake_case, remove obsolete code Ilya Kreymer 2014-03-10 00:55:41 -07:00
  • e3d700a50f wombat improvements: override history, ajax and use seeded random number gen (with seed from capture timestamp) Ilya Kreymer 2014-03-10 00:10:20 -07:00
  • e346dfb024 remove accidental logging Ilya Kreymer 2014-03-09 23:03:55 -07:00
  • e384425d48 proxy cleanup: move HttpsUrlRewriter to url_rewriter module, move strip_scheme to replay_views where it is used regex rewriters: use url rewriter for rewriting http:// in JS, instead of just prefix, to support custom rewriters (such as https->http rewriter in proxy mode) Ilya Kreymer 2014-03-09 14:21:32 -07:00
  • 68878fa72a update domain-specific rules to make flickr replay work better! Ilya Kreymer 2014-03-08 15:53:52 -08:00
  • 4fdcdc98ae replay: ignore 304 captures archiveit 2014-03-08 23:46:59 +00:00
  • 584d826f05 rewrite: fix html rewriting, if forcing end </script>, </style>, don't actually output to preserve original wombat: copy over all Location settings wburl: convert :/ -> :// if 2nd slash missing, only check for <scheme>:/ and ignore subsequent slashes Ilya Kreymer 2014-03-08 15:10:35 -08:00
  • 541c076b77 setup: add cli scripts for wayback, cdx-server fix logging of app name, make most logging debug Ilya Kreymer 2014-03-08 15:09:53 -08:00
  • 40b7a8e921 move pytest args to pytest.ini Ilya Kreymer 2014-03-08 09:30:56 -08:00
  • 3b1afc3e3d replace StringIO with BytesIO Ilya Kreymer 2014-03-08 09:30:19 -08:00
  • 1a6f2e2fe1 Merge pull request #32 from kngenie/add-api-docs ikreymer 2014-03-07 16:02:36 -08:00
  • 4e7ec4dede Merge remote-tracking branch 'origin/master' into add-api-docs Kenji Nagahashi 2014-03-07 20:14:40 +00:00
  • 1829de2123 add API doc for all packages with sphinx-apidoc Kenji Nagahashi 2014-03-07 20:09:33 +00:00
  • e3618871c8 proxy: support setting hostname via env variable Ilya Kreymer 2014-03-07 11:42:09 -08:00
  • a60ab1f118 routing/proxy: pass in hostpaths to proxy routing add PYWB_HOST_NAME env var to allow overriding default hostname add request_hostname jinja filter Ilya Kreymer 2014-03-07 10:29:11 -08:00
  • 702e5e0143 perms test: moved test perms policy to perms/test/test_perms_policy.py all perms related configs exist within perms package Ilya Kreymer 2014-03-06 18:24:53 -08:00
  • 681c2fd8d5 perms: refactor perms config to make interface much clearer 'perms_policy' is a callback which returns a Perms object, which may filter cdx lines from the response Ilya Kreymer 2014-03-06 18:06:05 -08:00
  • 7b5cbaa878 cdx: clean up closest, reverse ops closest takes precedence over reverse 'reverse closest' not supported, add test to reflect that Ilya Kreymer 2014-03-06 16:11:46 -08:00
  • c42a96386f cdx: fix the 'yield nothing' case when limit==1 add additional test case for limit==1 and reverse=True, as limit is optimized out Ilya Kreymer 2014-03-06 16:01:49 -08:00
  • 4e71a0b772 better rules.yaml fix Ilya Kreymer 2014-03-06 02:51:54 -08:00
  • 3718e1d21b rewrite fixes: html_rewriter do not unescape attrs! rules: don't rewrite past end of block or line Ilya Kreymer 2014-03-06 02:29:52 -08:00
  • 673ff35d15 minor fixes: wombat add document.WB_wombat_location loaders: file 'urls' starting with . and / are always file paths pep8 fixes for cdx, utils packages Ilya Kreymer 2014-03-05 17:13:14 -08:00
  • de18e44231 Merge 28b49f9aeb59d12a6f39d57a7bf7dead2138e1e7 into 03ebca47c063b05ad386095c0931261a6cd06906 ikreymer 2014-03-06 00:12:28 +00:00
  • 28b49f9aeb add doc directory for Sphinx documentation Kenji Nagahashi 2014-03-05 23:03:04 +00:00
  • 03ebca47c0 Merge pull request #29 from kngenie/just-a-cleanup ikreymer 2014-03-05 14:36:07 -08:00
  • 64f4699203 clean up docstrings: fix reST formatting issues. cherry-picked f03e0a7092 + some more. Kenji Nagahashi 2014-03-04 19:08:23 +00:00
  • daf868fd61 README tweaks update setup.py to support setup.py test! .travis.yml uses python setup.py test Ilya Kreymer 2014-03-05 11:19:26 -08:00
  • 25a8514352 Update README (move pywb configuration section to wiki), recommend running pywb.apps.wayback make uWSGI optional (but included in Vagrant) rename run.sh -> run-uwsgi.sh Ilya Kreymer 2014-03-05 10:42:08 -08:00
  • fe1fa43fef zipnum: remove time-based reloading for now, just look at mtime and reload if changed Ilya Kreymer 2014-03-04 21:29:05 -08:00
  • df2f7ba496 warc: add digest filter only if digest is present for url-agnostic load ensure cdxobject format set on cdx load callback limit reader: add length wrapping utility func to limitreader Ilya Kreymer 2014-03-05 05:12:25 +00:00
  • 9690d84798 travis-ci: attempt to fix 2.6 build Ilya Kreymer 2014-03-04 19:36:29 -08:00
  • f25de8af2a tweak travis pip install config Ilya Kreymer 2014-03-04 19:17:33 -08:00
  • cc22448cc5 fixes for 2.6 and pypy Ilya Kreymer 2014-03-04 18:49:36 -08:00
  • 2d48f2d733 add testing of 2.6 and pypy (attempt) Ilya Kreymer 2014-03-04 18:12:36 -08:00
  • 202f6101e0 coverage work! add additional test for wsgi_wrappers additional test for zipnum bad location for now, not testing cli interfaces which depend on opt params Ilya Kreymer 2014-03-04 16:13:49 -08:00
  • d702a98bbc url-agnostic revisit testing! add sample warc and cdx for url-agnostic revisits add unit test and integration test resolvingloader: pass callback instead of full cdx server for use for loading cdx in case of url-agnostic revisit Ilya Kreymer 2014-03-04 20:12:09 +00:00
  • cf5aaf5de4 add new perms_handler for supporting direct permissions api currently just returning ["allow"] or ["block"] for a single url Ilya Kreymer 2014-03-03 19:37:37 -08:00
  • 577c74be49 cdx: move perms related handling to pywb.perms package, support custom processing ops, of which perms is a specific type add lazy_ops test to ensure all cdx processing ops are lazy Ilya Kreymer 2014-03-03 18:27:04 -08:00
  • e0d5846484 separate 'perms_checker' config loading as a separate param simplify IndexReader wrapper init, just init with a cdx server Ilya Kreymer 2014-03-03 13:40:48 -08:00