Author | Commit | Message | Date
Noah Levitt | ca4c62fc6d | don't load dedup info for empty payload | 2016-01-26 18:47:08 -08:00
Noah Levitt | f806cd3e4a | use Rethinker.dbname to avoid conflict with rethinkdb.db | 2016-01-26 18:47:08 -08:00
Noah Levitt | 69d641cd50 | avoid attempting to create tables with more shards or replicas than the number of servers | 2016-01-26 18:47:08 -08:00
Noah Levitt | 3b9345e7d7 | use nicer rethinkdbstuff.Rethinker api | 2016-01-26 18:47:08 -08:00
Noah Levitt | f90c3a6403 | Rethinker class moved to its own pyrethink project | 2016-01-26 18:47:08 -08:00
Noah Levitt | 022f6e7215 | wrap rethinkdb operations and retry if appropriate (as best as we can tell) | 2016-01-26 18:47:08 -08:00
Noah Levitt | 44a62111fb | support for deduplication buckets specified in warcprox-meta header {"captures-bucket":...,...} | 2016-01-26 18:47:08 -08:00
Noah Levitt | 6d673ee35f | tests pass with big rethinkdb captures table | 2016-01-26 18:47:08 -08:00
Noah Levitt | c430f81883 | some refactoring to prep for big rethinkdb capture table | 2016-01-26 18:47:08 -08:00
Noah Levitt | e66dc3a9fb | rethinkdb dedup | 2016-01-26 18:46:13 -08:00
Noah Levitt | a876152026 | fix exception, make some tweaks | 2016-01-26 18:46:13 -08:00
Noah Levitt | 274a2f6b1d | refactor warc writing, deduplication for somewhat cleaner separation of concerns | 2016-01-26 18:45:36 -08:00
Noah Levitt | 5f84b061f3 | make it work with python 2.7 again | 2015-03-18 16:29:44 -07:00
Noah Levitt | 9b8ffbbb51 | separate WarcWriter and WarcWriterThread | 2014-11-15 04:47:26 -08:00
Noah Levitt | b34edf8fb1 | split into multiple files | 2014-11-15 03:20:05 -08:00