mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

622 Commits

Author SHA1 Message Date
Ilya Kreymer
cd017669ae bugfix: ChunkedDataReader handles zero-length chunk properly, add test 2014-04-23 10:00:25 -07:00
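In HTTP/1.1 chunked transfer coding, a chunk whose size line is 0 marks the end of the body. A reader must stop there rather than treat it as data or keep reading. Below is a minimal sketch of such a decoder over a binary file-like object. dechunk() is a hypothetical name, not pywb's ChunkedDataReader API, and chunk extensions and trailers are simply ignored.

```python
import io


def dechunk(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Sketch only: chunk extensions are discarded and trailers are not parsed.
    """
    out = bytearray()
    while True:
        header = stream.readline()
        if not header:
            # Stream ended without the terminating zero-length chunk.
            break
        # The chunk size is hex; anything after ';' is a chunk extension.
        size = int(header.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # A zero-length chunk marks the end of the body.
            break
        out += stream.read(size)
        stream.readline()  # consume the CRLF that follows each chunk's data
    return bytes(out)


print(dechunk(io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")))  # b'Wikipedia'
```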
Ilya Kreymer
48e8e8eb1c allow passing optional kwargs to render search page
add configurable 'default_mod' param
2014-04-22 16:33:47 -07:00
Ilya Kreymer
2ad41e2b94 rewrite: rewrite data-* attributes if they look like links (http, https, //) 2014-04-22 16:32:36 -07:00
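The data-* rule above can be sketched as a filter over (name, value) attribute pairs. This assumes the parser hands attributes over the way html.parser.HTMLParser does, where the value may be None. The function names and the /web/<timestamp>/ replay prefix are hypothetical placeholders, not pywb's rewriter interface.

```python
# Values starting with one of these are treated as links (per the commit message).
LINK_PREFIXES = ("http://", "https://", "//")


def rewrite_data_attrs(attrs, rewrite_url):
    """Rewrite data-* attribute values that look like absolute or scheme-relative links."""
    rewritten = []
    for name, value in attrs:
        # Only data-* attributes are considered; value may be None for bare attributes.
        if name.startswith("data-") and value and value.startswith(LINK_PREFIXES):
            value = rewrite_url(value)
        rewritten.append((name, value))
    return rewritten


# Hypothetical rewriter: prefix every link with an archival replay path.
def to_replay(url):
    return "/web/20140422000000/" + url


result = rewrite_data_attrs([("data-src", "http://example.com/a.png"),
                             ("data-id", "42"),
                             ("class", "http://not-a-data-attr")], to_replay)
# data-src is rewritten; data-id (not a link) and class (not data-*) are unchanged.
print(result)
```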
Ilya Kreymer
6eef0afb86 add new custom rewriting rule (flickr) 2014-04-20 21:40:27 -07:00
Ilya Kreymer
e1e55ac061 minor tweaks: rewrite 'crossorigin' -> '_crossorigin' param to disable
crossorigin as it may interfere with loading rewritten content, add
tests for html and lxml parsers
add server_cls as optional param to QueryHandler.init_from_config()
for easier customization
views: don't create template if empty template file specified
2014-04-19 12:04:43 -07:00
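Renaming the attribute is enough to make browsers ignore it, since '_crossorigin' is not a recognized HTML attribute. Below is a sketch using the standard library's html.parser. The CrossoriginStripper class is hypothetical. It only records the rewritten start tags and does not re-serialize HTML.

```python
from html.parser import HTMLParser


class CrossoriginStripper(HTMLParser):
    """Collect start tags, renaming 'crossorigin' to '_crossorigin'."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        renamed = [("_crossorigin" if name == "crossorigin" else name, value)
                   for name, value in attrs]
        self.tags.append((tag, renamed))


parser = CrossoriginStripper()
parser.feed('<script src="/a.js" crossorigin="anonymous"></script>')
print(parser.tags)
# [('script', [('src', '/a.js'), ('_crossorigin', 'anonymous')])]
```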
Ilya Kreymer
23bb5bd175 rewrite: wombat update 2.0! Using Object.defineProperty() to better
override .href and .hash properties when possible.
.href returns original url, but on assignment rewrites before redirecting
.hash proxies to location.hash
Also added:
- window.top -> window.WB_wombat_top
- document.referrer -> document.WB_wombat_referrer
- <source> html tag rewriting
2014-04-18 19:30:48 -07:00
Ilya Kreymer
e011da43f2 live rewrite: use custom REL_REFERER field, don't override HTTP_REFERER
if REL_REFERER not set, don't send any referrer
2014-04-15 16:44:02 -07:00
Ilya Kreymer
85593696fa remove rfc3987 validation, was rejecting valid urls
add extract_referer_wburl_str() to extract WbUrl str, if any,
from the referrer. Use that for live_rewrite_handler to override
default referrer
2014-04-15 16:38:53 -07:00
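Extracting the archival URL from a referrer amounts to stripping a known replay prefix. The sketch below assumes referrers of the form <prefix><timestamp>/<original url> with a purely numeric timestamp. Real WbUrl strings can also carry modifiers such as 'mp_'. Both function names are hypothetical stand-ins and do not reproduce extract_referer_wburl_str().

```python
def wburl_from_referer(referer, prefix):
    """Return the '<timestamp>/<url>' part of a replay referrer, or None."""
    if not referer or not referer.startswith(prefix):
        return None  # no referrer, or it does not point into the replay path
    return referer[len(prefix):] or None


def original_url(wburl):
    """Strip a leading numeric timestamp from '<timestamp>/<url>', if present."""
    timestamp, sep, url = wburl.partition("/")
    if sep and timestamp.isdigit():
        return url
    return wburl  # no timestamp: assume it is already a plain url


prefix = "http://localhost:8080/live/"
ref = prefix + "20140415000000/http://example.com/page.html"
wburl = wburl_from_referer(ref, prefix)
print(original_url(wburl))  # http://example.com/page.html
```

A live rewriter could then send the recovered original URL as the upstream referrer instead of the replay URL.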
Ilya Kreymer
611b9093bd html insert: add include_ts option to optionally not add timestamp 2014-04-13 18:17:31 -07:00
Ilya Kreymer
d8c9a803f6 add support for optional proxies (verify set to false for now) 2014-04-13 17:50:26 -07:00
Ilya Kreymer
7636c9d3f7 fix: when reading response, only readline() if previous read()
was non-empty
2014-04-09 16:44:45 -07:00
Ilya Kreymer
bfc2e63793 live rewriter: integrate handler with rewrite_live.py module,
clean up css, add unit and integration tests
clean up cli server, now known as 'live-rewrite-server', which performs live rewrite using the
iframe paradigm
2014-04-09 15:49:55 -07:00
Ilya Kreymer
11202c462f support both frames and non-frames mode
add automatic framing when in framed mode
2014-04-09 15:49:55 -07:00
Ilya Kreymer
b4f30a770f ChunkedDataReader: if determined to be non-chunked, read full buffer
unchunked
2014-04-09 15:49:55 -07:00
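One way to read "if determined to be non-chunked, read full buffer unchunked" is to attempt chunked parsing and fall back to the raw buffer on the first malformed size line. The sketch below works on an in-memory bytes buffer. read_maybe_chunked() is a hypothetical name, and the detection rule is an assumption, not necessarily the check ChunkedDataReader uses.

```python
def read_maybe_chunked(buff):
    """Dechunk buff if it parses as chunked data; otherwise return it unchanged."""
    out = bytearray()
    pos = 0
    try:
        while True:
            eol = buff.index(b"\r\n", pos)  # ValueError if no size line
            # ValueError here if the size line is not hex, i.e. not chunked.
            size = int(buff[pos:eol].split(b";", 1)[0], 16)
            pos = eol + 2
            if size == 0:
                return bytes(out)  # zero-length chunk: end of body
            out += buff[pos:pos + size]
            pos += size + 2  # skip chunk data and its trailing CRLF
    except ValueError:
        # Not valid chunked framing: treat the whole buffer as the body.
        return buff


print(read_maybe_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
print(read_maybe_chunked(b"<html>not chunked</html>"))  # b'<html>not chunked</html>'
```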
Ilya Kreymer
19f2df4717 refactor:
- move is_identity(), is_embed() to wburl from wbrequest
- add is_mainpage() predicate
- add create_template() to each J2TemplateView to create itself
- add HeadInsertView to create a reusable head insert for
RewriteContent
- add 'mp_' as modifier for frames mode to be used as possible
  modifier with HTMLRewriter
2014-04-09 15:49:55 -07:00
Ilya Kreymer
1fb6f5eff7 add rewriter_handler, frame wrapper support! 2014-04-09 15:49:55 -07:00
Ilya Kreymer
8897a0a7c9 decompressingbufferedreader: default to 'gzip' decompression instead of
none. ChunkedDataReader also automatically attempts decompression, by default
Add tests to verify
2014-04-08 21:49:04 -07:00
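Defaulting to gzip is safe only if non-gzip input still comes through intact. Below is a sketch with Python's zlib. wbits=16+MAX_WBITS tells zlib to expect a gzip header. The decompress_body() name, its decomp_type parameter and the 'deflate' option are assumptions for illustration, not DecompressingBufferedReader's interface.

```python
import gzip
import zlib


def decompress_body(data, decomp_type="gzip"):
    """Decompress data (gzip by default); return it unchanged if it is not compressed."""
    if decomp_type == "gzip":
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip header
    elif decomp_type == "deflate":
        decomp = zlib.decompressobj()  # zlib-wrapped deflate stream
    else:
        return data  # no decompression requested
    try:
        result = decomp.decompress(data) + decomp.flush()
    except zlib.error:
        return data  # not actually compressed: pass through as-is
    # Input too short or truncated to form a complete stream also falls back to raw.
    return result if decomp.eof else data


print(decompress_body(gzip.compress(b"hello")))  # b'hello'
print(decompress_body(b"plain text"))  # b'plain text'
```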
Ilya Kreymer
02fe78cb0b update changes, add more tests 2014-04-07 17:41:14 -07:00
Ilya Kreymer
a331061691 minor tweaks: add default static_path for jinja,
remove unused import
2014-04-07 17:19:07 -07:00
Ilya Kreymer
c23dd7bda4 wombat update:
- support scheme-relative (//) urls
- override dom manipulation (appendChild, insertBefore, replaceChild)
- disable Worker() interface for now
2014-04-07 17:17:08 -07:00
Ilya Kreymer
2a318527df lxml: use lxml's parse interface instead of feed interface to allow
lxml to handle decoding unicode data, better address #36
2014-04-07 17:13:43 -07:00
Ilya Kreymer
890c323617 update bad.arc with empty record example 2014-04-07 17:12:33 -07:00
Ilya Kreymer
64eef7063d record reading: better handling of empty arc (or warc) records
for indexing, index empty/invalid length as '-' status code
for reading, serve as 204 no content.
ensure that StatusAndHeaders has a valid statusline when serving
if http content-length is valid, limit stream to that content-length
as well as record content-length (whichever is smaller)
replace content-length when buffering
2014-04-07 17:08:39 -07:00
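This commit has two rules that can be sketched as small helpers. First, serve no more than the smaller of the HTTP Content-Length (when valid) and what remains of the record. Second, answer empty records with 204. The names and signatures below are hypothetical. record_remaining is assumed to be the number of record bytes left after the HTTP headers.

```python
def effective_length(record_remaining, http_length=None):
    """Bytes to serve: the remaining record bytes, capped by a valid HTTP Content-Length."""
    try:
        http_len = int(http_length)
    except (TypeError, ValueError):
        return record_remaining  # header missing or not a number
    if http_len < 0:
        return record_remaining  # negative lengths are invalid
    return min(record_remaining, http_len)


def replay_status(record_length, statusline):
    """Serve empty records as 204 No Content; otherwise keep the recorded status."""
    if record_length <= 0:
        return "204 No Content"
    return statusline


print(effective_length(1000, "250"))  # 250
print(effective_length(1000, "5000"))  # 1000
print(replay_status(0, "200 OK"))  # 204 No Content
```

A reader would then call stream.read(effective_length(...)) so that neither length can be overrun.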
Ilya Kreymer
d8c20a59cf update to version 0.3.1 2014-04-06 11:46:43 -07:00
Ilya Kreymer
d6006acdc3 rewrite: when using lxml parser, just pass raw stream to lxml
without decoding. lxml parser expects to have raw bytes and will determine
encoding on its own. then serve back as utf-8 if no encoding specified.
should address #36
2014-04-06 09:47:34 -07:00
Ilya Kreymer
3c0ca9d874 update README.rst for master 0.2.2 0.3.0 2014-04-04 13:04:30 -07:00
Ilya Kreymer
e077c23de7 fuzzy match: modify existing params to ensure any custom params are preserved
templates: add ability to set custom global vars, such as 'static_path'
for all templates
2014-04-04 12:20:54 -07:00
Ilya Kreymer
b0b0adb043 refactor: rename pywb.core -> pywb.webapp
move perms/test/test_perms_policy -> tests/perms_fixture
for rules file, use single DEFAULT_RULES_FILE import
2014-04-04 10:09:26 -07:00
Ilya Kreymer
3aa4a4da7a rewrite: ensure lxml parser closes gracefully on no input 2014-04-03 13:00:22 -07:00
Ilya Kreymer
5388a0b03b Merge branch 'develop' of https://github.com/ikreymer/pywb into develop 2014-04-03 12:45:54 -07:00
Ilya Kreymer
5dd586cf07 refactor: simplify rewrite_content and replay_views, remove
redundant code; everything goes through rewrite_content() and
is sanitized (for transfer encoding) if needed
additional testing for decode_buff
fix failed_files bug in resolvingloader, add tests
2014-04-03 12:44:00 -07:00
Ilya Kreymer
5155a5c842 fix README headings 2014-04-03 09:25:10 -07:00
Ilya Kreymer
bd21fec6d4 update run-uwsgi.sh and add run-gunicorn.sh
update README and INSTALL, fix typo
only list wb handlers on home page by default
pep8 fixes
2014-04-03 08:56:18 -07:00
Ilya Kreymer
1e7ecb901a tweak README, add no cover pragmas to blocking cli apps (for now) 2014-04-02 21:43:09 -07:00
Ilya Kreymer
80f2da9548 refactor: move configs/config.yaml to root again
remove cdx-server specific config, instead make cdx server api-only
path configurable from regular config
2014-04-02 21:26:53 -07:00
Ilya Kreymer
8bdafeb040 Update README.rst
move changes and installation info to separate files, add simplified install guide
2014-04-02 20:29:00 -07:00
Ilya Kreymer
05eba0194a add CHANGES.rst changelist 2014-04-02 20:19:17 -07:00
Ilya Kreymer
bfa3f64121 create INSTALL.rst
advanced install info moved to INSTALL.rst
2014-04-02 19:23:56 -07:00
Ilya Kreymer
399642d719 add missing cdxserver test file 2014-04-02 18:34:05 -07:00
Ilya Kreymer
8b37fef8e0 tests: add explicit cdxserver config testing with different config variations 2014-04-02 15:01:40 -07:00
Ilya Kreymer
91184426b7 test coverage pass:
refactor and cleanup to improve coverage for corner cases
2014-04-02 13:16:54 -07:00
Ilya Kreymer
8d3d326c9e tests: add pathresolver tests for RedisResolver and PathIndexResolver 2014-04-02 11:41:20 -07:00
Ilya Kreymer
90f4833df3 add cli interface for archiveindexer, exposed as 'cdx-indexer'
add tests for cli interface
additional tests for statusheaders
2014-04-02 10:36:55 -07:00
Ilya Kreymer
732df1a172 add cmdline interface with argparse to archiveindexer 2014-04-02 00:18:57 -07:00
Ilya Kreymer
28d65ce717 archiveindexer major refactoring using zlib only
supports warc.gz, arc.gz, warc, arc and optional sorting
outputs cdx 11 but possible to extend to other formats
(additional edge case testing needed)
DecompressingBufferedReader refactoring to support multi-member gzip
Unit tests for indexer, additional unit tests for bufferedreaders and loaders,
and recordloaders
2014-03-30 23:47:33 -07:00
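A .warc.gz or .arc.gz file is usually a series of concatenated gzip members, typically one per record. With zlib alone, each member can be decompressed in turn. decompressobj leaves the bytes after the current member in unused_data, which also yields each member's offset and compressed length, the values an index needs to locate a record. iter_gzip_members() is a hypothetical name, and this is a sketch rather than the archiveindexer code.

```python
import gzip
import zlib


def iter_gzip_members(data):
    """Yield (offset, compressed_length, payload) for each gzip member in data."""
    offset = 0
    while offset < len(data):
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # one gzip member per object
        payload = decomp.decompress(data[offset:]) + decomp.flush()
        # Bytes after the end of this member are left in unused_data.
        consumed = len(data) - offset - len(decomp.unused_data)
        yield offset, consumed, payload
        offset += consumed


first = gzip.compress(b"record one")
blob = first + gzip.compress(b"record two")
for offset, length, payload in iter_gzip_members(blob):
    print(offset, length, payload)
```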
Ilya Kreymer
26bb695292 archiveindex: use list instead of ordereddict for cdx,
will add customizations later
2014-03-29 17:37:23 -07:00
Ilya Kreymer
cedc58a405 add archiveindexer! 2014-03-29 16:10:16 -07:00
Ilya Kreymer
7760b9b5a2 warc: separate parse_record_loader() to enable direct parsing
of a file-like stream
detect and ignore warcinfo and arc header
2014-03-29 15:58:03 -07:00
Ilya Kreymer
99eadb3d4f update package paths 2014-03-28 11:57:13 -07:00
Ilya Kreymer
9700004dc8 move configs to pywb package as package data 2014-03-28 11:53:59 -07:00