Vangelis Banos
|
975f2479a8
|
Acquire an exclusive file lock when not using .open WARC suffix
|
2017-10-26 21:58:31 +00:00 |
|
Vangelis Banos
|
c9f1feb3db
|
Add hidden --no-warc-open-suffix CLI option
By default warcprox adds the `.open` suffix to open WARC files. This
option disables that. The option does not appear in the program help.
|
2017-10-26 19:44:22 +00:00 |
|
Vangelis Banos
|
66b4c35322
|
Remove unused imports
|
2017-09-24 11:15:30 +00:00 |
|
Noah Levitt
|
24082c2e8c
|
don't wait for queue to be empty to do idle rollovers, because sometimes warcprox can stay busy for a long, long time
|
2017-06-22 15:04:01 -07:00 |
|
Noah Levitt
|
1500341875
|
use %r instead of calling repr()
|
2017-06-07 16:05:47 -07:00 |
|
Noah Levitt
|
ef5dd2e4ae
|
multiple warc writer threads (hacked in with little thought to code organization)
|
2017-05-19 16:10:44 -07:00 |
|
Noah Levitt
|
fd770b71bc
|
revert stuff accidentally committed as part of eea582c6db9ed6d :(
|
2017-05-11 11:56:01 -07:00 |
|
Noah Levitt
|
eea582c6db
|
rewrite run-benchmarks.py for aiohttp2
|
2017-05-08 20:56:32 -07:00 |
|
Noah Levitt
|
2c65ff89fa
|
add license headers
|
2016-04-06 19:37:55 -07:00 |
|
Noah Levitt
|
decb985250
|
add length field to each record in big captures table (size in bytes of compressed warc record) because pywayback needs it
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
44a62111fb
|
support for deduplication buckets specified in warcprox-meta header {"captures-bucket":...,...}
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
c430f81883
|
some refactoring to prep for big rethinkdb capture table
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
d3df48b97e
|
shorten warc filename template
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
274a2f6b1d
|
refactor warc writing, deduplication for somewhat cleaner separation of concerns
|
2016-01-26 18:45:36 -08:00 |
|