mirror of https://github.com/webrecorder/pywb.git synced 2025-03-16 00:24:48 +01:00

1019 Commits

Author SHA1 Message Date
Ilya Kreymer
c378cb5188 rewrite: check for closed before any use of readline() (2.6 may throw if closed),
only use readline() if line alignment needed (non-html), related to  work
2015-04-01 07:54:17 -07:00
Ilya Kreymer
e806a33289 add unclosed script sample 2015-04-01 07:13:51 -07:00
Ilya Kreymer
8e60a6464c chunkeddatareader: read(): catch ValueError when attempting to read again in case stream is already closed 2015-03-31 23:31:49 -07:00
Ilya Kreymer
990af5ee79 rewrite: add extra test for rewriting html with <script> tag that's never closed 2015-03-31 23:30:56 -07:00
Ilya Kreymer
c137dd30b8 misc fixes: remove extra debug logging
add --framed option to 'live-rewrite-server' cli app
2015-03-31 23:08:56 -07:00
Ilya Kreymer
199f552f73 rewrite: if no charset specified, attempt to read first 1024 bytes and set charset in header,
to avoid charset warning if head insert exceeds 1024 bytes ()
also encode head insert with detected charset, if possible
chunkeddatareader: add read() function to ensure read will read up to specified
length across chunks
2015-03-31 22:38:20 -07:00
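The chunkeddatareader change above makes read() keep pulling chunks until it has the requested length, instead of stopping at a chunk boundary. Below is a minimal Python sketch of that idea. It assumes the chunks are already de-chunked byte strings, and the class name ChunkedReaderSketch is hypothetical. It is not pywb's ChunkedDataReader.

```python
import io


class ChunkedReaderSketch:
    """Hypothetical reader over already-decoded chunks (not pywb's ChunkedDataReader).

    read(length) keeps pulling chunks until it has `length` bytes or the
    chunks run out, so one call can span several chunk boundaries.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buff = io.BytesIO()  # unread remainder of the current chunk

    def _read_once(self, length):
        data = self._buff.read(length)
        # current chunk exhausted: load the next non-empty chunk, if any
        while not data:
            try:
                self._buff = io.BytesIO(next(self._chunks))
            except StopIteration:
                return b''
            data = self._buff.read(length)
        return data

    def read(self, length):
        parts = []
        remaining = length
        while remaining > 0:
            data = self._read_once(remaining)
            if not data:
                break  # no more chunks
            parts.append(data)
            remaining -= len(data)
        return b''.join(parts)
```

For example, reading 5 bytes from chunks `[b'abc', b'', b'defg']` returns `b'abcde'` in a single call, where a naive per-chunk read would return only `b'abc'`.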
Ilya Kreymer
30ab27bb1c indexing: support indexing (and even replay of) records where target-uri is a 'urn:' identifier ()
for canonicalization, treat urns as-is, as they are already canonical
for wburl, don't add http:// prefix if urn: prefix is present
add example-wpull warc for testing
2015-03-30 17:23:50 -07:00
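A rough sketch of the urn: special case. It assumes a hypothetical helper, normalize_target_sketch, that would otherwise add an http:// prefix to scheme-less URLs. It only illustrates leaving 'urn:' identifiers untouched and is not pywb's canonicalization or WbUrl code.

```python
def normalize_target_sketch(url):
    """Hypothetical: prefix scheme-less urls with http://, but leave urn: ids alone."""
    if url.startswith('urn:'):
        # urn identifiers are treated as already canonical
        return url
    if '://' not in url:
        return 'http://' + url
    return url
```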
Ilya Kreymer
002fe6a338 certauth: change 'get_cert_for_host' -> 'cert_for_host' 2015-03-30 15:47:53 -07:00
Ilya Kreymer
dd30e3f2a7 refactor: fixes for compat with latest certauth>=1.1.0 2015-03-30 09:38:42 -07:00
Ilya Kreymer
cda7705075 split and refactor: remove certauth.py / test_certauth.py and instead use this functionality from 'certauth' package. Also remove proxy-cert-auth cli as
the 'certauth' tool supersedes this functionality. ().
To use https proxy mode, 'pip install certauth' is required. (update travis config)
2015-03-29 17:38:57 -07:00
Ilya Kreymer
273176bce5 cdx: when reading cdxj and encountering non-ascii chars in the url, utf-8 encode and %-encode them 2015-03-29 09:21:50 -07:00
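One way to utf-8 encode and %-encode only the non-ASCII characters of a url is sketched below with Python 3's urllib.parse.quote. The commits in this log refer to Python 2.6 and urllib2, so this illustrates the idea rather than the project's own code. The function name is hypothetical.

```python
from urllib.parse import quote


def encode_non_ascii_sketch(url):
    """Percent-encode only non-ASCII characters (as UTF-8 bytes); ASCII passes through."""
    return ''.join(ch if ord(ch) < 128 else quote(ch) for ch in url)
```

Encoding character by character leaves existing ASCII punctuation such as `?`, `&` and any existing `%XX` escapes exactly as they were in the cdxj line.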
Ilya Kreymer
fc9d659b5d loaders: switch BlockLoader to use requests instead of urllib2 2015-03-28 16:41:52 -07:00
Ilya Kreymer
f3a066f58b cdx-server query & zipnum: fixes for showNumPages query:
- if the query is contained within less than one secondary index block, must read the first line of the cdx to determine whether there are any matches
- if no matches, don't throw 404 exception but always return json info with 0 pages
2015-03-28 16:15:24 -07:00
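The showNumPages arithmetic can be sketched as follows. The sketch assumes that the count is taken over matching secondary index blocks and that a page covers pageSize blocks (10 by default, per the pagination commit further down this log). The function names and JSON field names are hypothetical.

```python
import json
import math


def num_pages_sketch(num_blocks, page_size=10):
    """Hypothetical showNumPages reply: ceil(blocks / pageSize) pages.

    With no matching blocks this still returns JSON reporting 0 pages,
    rather than raising a 404-style error.
    """
    pages = math.ceil(num_blocks / page_size)
    return json.dumps({'pages': pages, 'pageSize': page_size, 'blocks': num_blocks})


def page_block_range_sketch(page, num_blocks, page_size=10):
    """Return [start, end) block indices for page N, valid for 0 <= N <= numPages - 1."""
    num_pages = math.ceil(num_blocks / page_size)
    if not 0 <= page < num_pages:
        raise ValueError('page must be between 0 and numPages - 1')
    start = page * page_size
    return start, min(start + page_size, num_blocks)
```

With 25 matching blocks this reports 3 pages, and page 2 covers blocks 20 to 24. Getting the block count right matters here: an off-by-one in it shifts every page boundary.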
Ilya Kreymer
313a2efeac bump version to 0.9.3-dev 2015-03-28 16:12:28 -07:00
Ilya Kreymer
c3a108b169 minor readme tweaks 2015-03-27 09:31:17 -07:00
Ilya Kreymer
d2be90d4a1 test case tweak 2015-03-27 08:56:43 -07:00
Ilya Kreymer
41487dd9d4 update changelist for 0.9.2
cdx: include match type in cdx query error
2015-03-27 07:58:51 -07:00
Ilya Kreymer
8d686a4a98 README typos fix 2015-03-26 19:56:09 -07:00
Ilya Kreymer
6bbbb51f6e manager: relax template requirements, allow any collection template to also be added to shared dir 2015-03-26 19:40:43 -07:00
Ilya Kreymer
753300d5ed manager: use absolute path when adding warcs, () 2015-03-26 19:18:55 -07:00
Ilya Kreymer
6ce75f80f5 replay: remove restricting to provided http Content-Length (in addition to record content-length) as it may be incorrect for variety of reasons 2015-03-26 17:12:38 -07:00
Ilya Kreymer
0a4e97baa1 revisit resolving: if cdx digest is missing, attempt to resolve revisits based on url + timestamp only, if warc-refers-to-target-uri and warc-refers-to-date are available, even if warc-refers-to-target-uri == target-uri (see for more info) 2015-03-26 14:20:08 -07:00
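The fallback order described in the revisit-resolving commits can be sketched as a small decision function. The cdx dict keys 'urlkey' and 'digest', and the use of '-' for a missing digest, are assumptions made for illustration. The function itself is hypothetical and is not pywb's resolver.

```python
def revisit_lookup_sketch(cdx, warc_headers):
    """Hypothetical choice of how to locate the original record for a revisit.

    cdx: dict with assumed keys 'urlkey' and 'digest' ('-' or absent means missing).
    warc_headers: dict of the revisit record's WARC headers.
    """
    digest = cdx.get('digest')
    if digest and digest != '-':
        # normal path: look up the original by digest
        return ('digest', cdx['urlkey'], digest)

    # digest missing: fall back to url + timestamp from the WARC-Refers-To-* headers
    target = warc_headers.get('WARC-Refers-To-Target-URI')
    date = warc_headers.get('WARC-Refers-To-Date')
    if target and date:
        return ('url_timestamp', target, date)

    # nothing to go on: skip revisit resolution
    return None
```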
Ilya Kreymer
85082e46bf cdxj: ensure revisit resolve is skipped if the digest is missing, as may be case in cdxj () 2015-03-26 11:11:10 -07:00
Ilya Kreymer
2dbde35d74 bump version to 0.9.2 2015-03-26 09:14:27 -07:00
Ilya Kreymer
cf4b5c50dd more README.rst fixes 2015-03-25 22:08:53 -07:00
Ilya Kreymer
e8b6a1af88 README typo fixes 2015-03-25 21:52:38 -07:00
Ilya Kreymer
1cfe73c9db zipnum: fix block count off-by-1 error in showNumPages query 2015-03-25 20:43:59 -07:00
Ilya Kreymer
72ddb54f82 Minor README tweaks 2015-03-25 15:01:12 -07:00
Ilya Kreymer
3efbfaa8c8 pywb_init: simplify DictChain usage, remove unused methods 2015-03-25 13:30:16 -07:00
Ilya Kreymer
f808f34ba7 Update CHANGES for 0.9.1 2015-03-25 12:16:26 -07:00
Ilya Kreymer
0e8b305adc Update README to 0.9.1, add cdx api link, fix typo 2015-03-25 12:06:05 -07:00
Ilya Kreymer
15d1aea5ec Update README, improve existing collection instructions. 2015-03-25 12:02:57 -07:00
Ilya Kreymer
a6c24c2882 autoindex: undo stop/join call for indexing, as it breaks the OS X unit test (autoindex test may need more improvements on Windows) 2015-03-25 11:09:17 -07:00
Ilya Kreymer
90eee03cdb fixes for windows:
indexing: ensure '/' always written to cdx
autoindex: improved test case, ensure threads exit with join
style: fix long lines
2015-03-25 10:56:53 -07:00
Ilya Kreymer
a7307a6d98 pywb_init: auto-collections init: inherit shared archive_paths, if any are set in main config.yaml 2015-03-25 09:36:00 -07:00
Ilya Kreymer
6a3ca566db zipnum: clean up shared location resolution; in addition to the .loc file,
support a prefix resolver, which can be a regex replacement on the index path
(default is unchanged index path) ()
2015-03-25 09:07:54 -07:00
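A regex-replacement prefix resolver can be sketched as below. The function name and the example paths are hypothetical. The only behaviour taken from the commit is that the index path is used unchanged by default and otherwise rewritten by a regex replacement.

```python
import re


def resolve_block_location_sketch(index_path, pattern=None, replacement=None):
    """Hypothetical prefix resolver for zipnum blocks.

    With no rule configured the index path is returned unchanged (the default
    described in the commit); otherwise a regex replacement is applied to it.
    """
    if pattern is None:
        return index_path
    return re.sub(pattern, replacement, index_path)
```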
Ilya Kreymer
1a8211d752 cdx server: add simplified matchType notation, using host* for prefix and *.host for domain matchType
()
2015-03-24 19:49:54 -07:00
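The simplified matchType notation can be sketched as a small parser from a url parameter to a (url, matchType) pair. The function name is hypothetical, and falling back to 'exact' when no wildcard is present is an assumption of this sketch.

```python
def parse_match_type_sketch(url):
    """Map the shorthand to (url, matchType).

    'host*'  -> prefix match on 'host'
    '*.host' -> domain match on 'host'
    otherwise -> 'exact' (assumed default)
    """
    if url.startswith('*.'):
        return url[2:], 'domain'
    if url.endswith('*'):
        return url[:-1], 'prefix'
    return url, 'exact'
```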
Ilya Kreymer
2af5a25009 zipnum: support for pagination api! and . cdx server now bounded by pageSize (default 10 blocks),
showNumPages=true returns json indicating num pages, page=N can be set to a page number from 0 to numPages - 1
loaders: add read_last_line() to read last line of a seekable file, used to read last line of index file when
at end
tests: additional test for binsearch boundary conditions
zipnum: secondary index output supports json also
2015-03-24 18:56:13 -07:00
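Reading the last line of a seekable file, as the read_last_line() commit describes, can be done by seeking backwards from the end until a line break is found. This sketch doubles the look-back window each time. The function name and the doubling strategy are assumptions, not pywb's loaders implementation.

```python
import io
import os


def read_last_line_sketch(fh, step=512):
    """Return the last line of a seekable binary file, without its line break."""
    fh.seek(0, os.SEEK_END)
    size = fh.tell()
    back = step
    while True:
        start = max(size - back, 0)
        fh.seek(start)
        data = fh.read(size - start)
        # drop trailing line break(s) so the final line is not seen as empty
        content = data.rstrip(b'\r\n')
        pos = content.rfind(b'\n')
        # a full line is in the buffer once a break is found or we hit the file start
        if pos >= 0 or start == 0:
            return content[pos + 1:]
        back *= 2  # no line break yet: look further back


sample = io.BytesIO(b'a 1\nb 2\nc 3\n')
last = read_last_line_sketch(sample, step=2)  # b'c 3'
```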
Ilya Kreymer
872607c07d README: move new features towards the top 2015-03-24 10:56:56 -07:00
Ilya Kreymer
3dd600c530 wombat: improve document.write override to write each elem at a time for body as well as head, 2015-03-24 10:46:10 -07:00
Ilya Kreymer
e5f321e32f bump version to 0.9.1 for further dev 2015-03-23 20:21:09 -07:00
Ilya Kreymer
57be9ca7bc tweak CHANGES.rst and INSTALL.rst for release 0.9.0 2015-03-23 17:38:22 -07:00
Ilya Kreymer
cda9f435a3 update README for final 0.9.0 release 2015-03-23 17:34:16 -07:00
Ilya Kreymer
c93501e16d more CHANGES.rst updates 2015-03-23 16:29:18 -07:00
Ilya Kreymer
500a441ea9 README tweaks and edits from Dragan (@despens) 2015-03-23 16:16:16 -07:00
Ilya Kreymer
ec7a29a3ba static paths: ensure consistent renaming of static/default -> static/__pywb for bundled static path 2015-03-23 16:15:37 -07:00
Ilya Kreymer
5b4d12eb05 wombat: fix wombat_location.href assign when url is already rewritten; compare against the current url, not the passed-in url
fixes 
2015-03-23 16:12:58 -07:00
Ilya Kreymer
5020a09004 more CHANGES.rst updates 2015-03-23 15:43:05 -07:00
Ilya Kreymer
4aa6512b05 rewrite: fix WbUrl parsing for urls that start with a digit, e.g. 1234.example.com
split latest replay url from timestamped replay regex
add additional rewrite tests
2015-03-23 15:38:10 -07:00
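Splitting the "latest" replay url from the timestamped form matters because a host such as 1234.example.com begins with digits. A sketch of the distinction follows, assuming a hypothetical regex in which a timestamp must be followed by an optional modifier and then '/'. This is not pywb's actual WbUrl pattern.

```python
import re

# Hypothetical pattern (not pywb's WbUrl regex): "<digits><optional mod_>/<url>"
TIMESTAMP_RX = re.compile(r'^(\d+)([a-z]+_)?/(.+)$')


def parse_replay_path_sketch(path):
    m = TIMESTAMP_RX.match(path)
    if m:
        return {'timestamp': m.group(1), 'mod': m.group(2) or '', 'url': m.group(3)}
    # no timestamp prefix: treat the whole path as a latest-replay url
    return {'timestamp': '', 'mod': '', 'url': path}
```

Because the digits must be followed directly by a modifier or '/', the '.' after 1234 prevents a match, so 1234.example.com is parsed as a url rather than a timestamp.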
Ilya Kreymer
6acac67d3c rewrite: fix js rewrite again to ensure '// comments' are not rewritten as scheme-rel urls
add tests
2015-03-23 11:49:24 -07:00
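One way to avoid treating '// comments' as scheme-relative urls is to rewrite '//' only when it directly follows a quote character. This is a hypothetical rule chosen for illustration, and the prefix value in the usage line is made up. It is not the regex pywb's js rewriter actually uses.

```python
import re

# Hypothetical rule: '//' counts as a scheme-relative url only right after a quote,
# e.g. "//example.com/a.js"; bare '// comment' text in code is left alone.
SCHEME_REL_RX = re.compile(r'(?<=["\'])//(?=[\w.-])')


def rewrite_scheme_rel_sketch(js, prefix):
    # use a function replacement so backslashes in prefix are not treated as escapes
    return SCHEME_REL_RX.sub(lambda m: prefix + m.group(0), js)


out = rewrite_scheme_rel_sketch('var u = "//example.com/a.js"; // comment', '/replay/http:')
```

Here `out` becomes `var u = "/replay/http://example.com/a.js"; // comment`: the quoted url is prefixed and the trailing comment is unchanged.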