v1.0.0

parent 0dc3390b7f
commit 67e7b8c904

README.rst | 46
@@ -22,27 +22,28 @@ will report all errors, e.g. files that changed on the hard drive but
 still have the same modification date.
 
 All paths stored in ``.bitrot.db`` are relative so it's safe to rescan
-a folder after moving it to another drive.
+a folder after moving it to another drive. Just remember to move it in
+a way that doesn't touch modification dates. Otherwise the checksum
+database is useless.
 
 Performance
 -----------
 
-Obviously depends on how fast the underlying drive is. Since bandwidth
-for checksum calculations is greater than your drive's data transfer
-rate, even when comparing mobile CPUs vs. SSD drives, the script is
-single-threaded.
+Obviously depends on how fast the underlying drive is. Historically
+the script was single-threaded because back in 2013 checksum
+calculations on a single core still outran typical drives, including
+the mobile SSDs of the day. In 2020 this is no longer the case, so the
+script now uses a process pool to calculate SHA1 hashes and perform
+``stat()`` calls.
 
-No rigorous performance tests have been done. Scanning a ~1000 files
-totalling ~4 GB takes 20 seconds on a 2015 Macbook Air (SM0256G SSD).
-This is with cold disk cache.
+No rigorous performance tests have been done. Scanning a ~1000-file
+directory totalling ~5 GB takes 2.2s on a 2018 MacBook Pro 15" with
+an AP0512M SSD. Back in 2013, the same feat on a 2015 MacBook Air with
+an SM0256G SSD took over 20 seconds.
 
-Some other tests back from 2013: a typical 5400 RPM laptop hard drive
-scanning a 60+ GB music library took around 15 minutes. On an OCZ
-Vertex 3 SSD drive ``bitrot`` was able to scan a 100 GB Aperture library
-in under 10 minutes. Both tests on HFS+.
-
-If you'd like to contribute some more rigorous benchmarks or any
-performance improvements, I'm accepting pull requests! :)
+On that same 2018 MacBook Pro 15", scanning a 60+ GB music library takes
+24 seconds. Back in 2013, with a typical 5400 RPM laptop hard drive,
+it took around 15 minutes. How times have changed!
 
 Tests
 -----
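The Performance section above describes fanning SHA1 hashing and ``stat()``
calls out to a pool of worker processes. A minimal sketch of that idea,
assuming nothing about bitrot's real code beyond what the README states;
the helper names and the ``workers`` parameter are illustrative only::

    # Sketch only: hash and stat() files in worker processes instead of
    # the main one. Names here are illustrative, not bitrot's own.
    import hashlib
    import os
    from concurrent.futures import ProcessPoolExecutor

    def hash_and_stat(path):
        """Return (path, SHA1 hex digest, mtime) for one file."""
        digest = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(16384), b''):
                digest.update(chunk)
        st = os.stat(path)
        return path, digest.hexdigest(), st.st_mtime

    def scan(paths, workers=None):
        # workers=None lets the pool size default to the CPU count;
        # a slow magnetic drive may be better served by a single worker.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            for path, sha1, mtime in pool.map(hash_and_stat, paths):
                print(path, sha1, int(mtime))

Judging by the changelog below, ``-w1`` caps the pool at a single worker
for runs on slow magnetic drives.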
@@ -54,17 +55,22 @@ file in the `tests` directory to run it.
 Change Log
 ----------
 
-0.9.3
+1.0.0
 ~~~~~
 
+* significantly sped up execution on solid state drives by using
+  a process pool executor to calculate SHA1 hashes and perform ``stat()``
+  calls; use ``-w1`` if your runs on slow magnetic drives were
+  negatively affected by this change
+
+* sped up execution by pre-loading all SQLite-stored hashes to memory
+  and doing comparisons using Python sets
+
 * all UTF-8 filenames are now normalized to NFKD in the database to
   enable cross-operating system checks
 
 * the SQLite database is now vacuumed to minimize its size
 
-* sped up execution by pre-loading all SQLite-stored hashes to memory
-  and doing comparisons using Python sets
-
 * bugfix: additional Python 3 fixes when Unicode names were encountered
 
 0.9.2
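Two of the 1.0.0 entries above can be pictured in a few lines: pre-loading
the stored hashes into memory for set-based comparison, and NFKD-normalizing
filenames so they compare equal across operating systems. The ``bitrot``
table and ``path`` column below are assumed names for illustration, not
necessarily the real schema::

    # Sketch of set-based comparison plus NFKD filename normalization.
    import sqlite3
    import unicodedata

    conn = sqlite3.connect('.bitrot.db')

    # One query up front instead of a SELECT per file on disk.
    stored_paths = {row[0] for row in conn.execute('SELECT path FROM bitrot')}

    # NFKD makes the same filename compare equal across operating systems
    # (macOS keeps decomposed forms on disk, Linux usually composed ones).
    on_disk = {unicodedata.normalize('NFKD', p) for p in ('./café/notes.txt',)}

    missing = stored_paths - on_disk   # tracked in the database, gone from disk
    new = on_disk - stored_paths       # on disk, not yet tracked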
@@ -201,4 +207,4 @@ improvements by
 `Reid Williams <rwilliams@ideo.com>`_,
 `Stan Senotrusov <senotrusov@gmail.com>`_,
 `Yang Zhang <mailto:yaaang@gmail.com>`_, and
 `Zhuoyun Wei <wzyboy@wzyboy.org>`_.
@@ -45,7 +45,7 @@ from concurrent.futures import ProcessPoolExecutor, wait, as_completed
 
 DEFAULT_CHUNK_SIZE = 16384  # block size in HFS+; 4X the block size in ext4
 DOT_THRESHOLD = 200
-VERSION = (0, 9, 2)
+VERSION = (1, 0, 0)
 IGNORED_FILE_SYSTEM_ERRORS = {errno.ENOENT, errno.EACCES}
 FSENCODING = sys.getfilesystemencoding()
 
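Judging by its name and contents, ``IGNORED_FILE_SYSTEM_ERRORS`` in the hunk
above covers files that vanish or become unreadable mid-scan. A sketch of
that kind of guard, with an illustrative helper name::

    # Illustrative guard around stat(): errno values listed in
    # IGNORED_FILE_SYSTEM_ERRORS are treated as "skip this file".
    import errno
    import os

    IGNORED_FILE_SYSTEM_ERRORS = {errno.ENOENT, errno.EACCES}

    def safe_stat(path):
        try:
            return os.stat(path)
        except OSError as exc:
            if exc.errno in IGNORED_FILE_SYSTEM_ERRORS:
                return None  # deleted or permission denied mid-scan
            raise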