Media File Integrity for NAS

I'm looking for a tool to help me preserve the integrity of my media collection. I have a backup scheme, but this weekend I discovered that a handful of my files differed in checksum from the backup. It was a huge PITA working out which was the "right" file in each case, and I'd rather not go through that hassle again. (The corruption turned out to be due to an ungraceful shutdown, by the way.)

The features I'm after (assuming the tool works by checksumming; there's a rough sketch of what I mean just after this list):
* works recursively
* checksum data doesn't have to go into a file in every subfolder, but can instead just be one file at the root (with relative paths of course)
* checksum data can be UPDATED in place (i.e. one command removes the entries for any deleted files and generates and adds entries for any new files) instead of having to be regenerated from scratch
* checksum data (along with this tool) can be used on different systems with (almost) zero extra configuration (i.e. no tools intended for "full system" integrity)
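To be concrete, here's roughly the behaviour I'm imagining. It's a throwaway Python sketch, not a real tool, and the manifest name 'checksums.sha256' is just a placeholder:

#!/usr/bin/env python3
# Throwaway sketch: walk a tree and write ONE SHA-256 manifest at the root,
# with paths stored relative to that root.
import hashlib
import os
import sys

MANIFEST = "checksums.sha256"  # placeholder name, not any real convention

def file_sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root="."):
    root = os.path.abspath(root)
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if rel == MANIFEST:
                continue  # don't checksum the manifest itself
            entries[rel] = file_sha256(os.path.join(root, rel))
    with open(os.path.join(root, MANIFEST), "w") as out:
        for rel in sorted(entries):
            out.write(f"{entries[rel]}  {rel}\n")

if __name__ == "__main__":
    build_manifest(sys.argv[1] if len(sys.argv) > 1 else ".")

The '<hash>  <relative path>' lines are the same shape sha256sum produces, so on another machine the manifest could be verified with plain 'sha256sum -c' from the root, no special tooling needed.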

I've tried out 'cfv' and 'cksfv'. 'cfv' works recursively and lets you generate one single file, but has no functionality (that I can see) to "update" an existing .sfv. 'cksfv' doesn't let you put everything in one single file - the checksums have to be spread across all the subfolders - and it also can't "update".

I looked into "aide" but this is obviously meant for "full system" use, and seems to need non-trivial configuration to even get working, but its database approach seems promising.

I'm at a loss. I struggle to believe that I'm the first person with this issue, so there must be something others use daily with success. Any ideas?

P.S. I'm on GNU/Linux, of course.

>there must be something others use daily with success
Good hard drives and proper shutdowns

Thanks for your very constructive comment.

Something similar would be SnapRAID, but without the need for a parity disk, just the checksums.

ZFS. Just use it.

Fuck, read this:

Meme.

Literally use fucking ZFS, you idiot, it does exactly what you want.

btrfs

>P.S. I'm on GNU/Linux, of course.
Then it's easy. Just do what others have said ITT and use a good checksumming filesystem like ZFS or Btrfs.
Data integrity is the sort of thing that should be handled by the filesystem.

ZFS is all of that in a nutshell.
And it's available on Linux with only minor issues.

ZFS's RAM requirements are too high for it to be useful to me, and it has "lock-in" properties that make me not want to use it.

ZFS has no minimum RAM requirement; you only need large amounts of RAM if you're going to be using the entire array constantly.

btrfs
But really ZFS

Well, I said ZFS or Btrfs, but I really meant Btrfs.
ZFS is not even really open source.

ZFS only has high RAM requirements if you use block-based deduplication or a very large L2ARC device.

Otherwise, it uses free memory for caching like every other filesystem.

linuxatemyram.com/

Hmm. I've been using XFS for a while now because of its reputed performance benefits with large files (and indeed almost all of my files are 'large'). I use Btrfs for my /, but it never occurred to me to use it for media - I've always been put off by what people say about its reliability, though admittedly I've never had the chance to pursue those claims very far.

If you don't want to change filesystems, look at git-annex. It keeps file metadata in a git repo and can track multiple copies (including backups).

If it detects file corruption, it can recover the file from a backup or another replica.
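
The core idea is content-addressed storage: the actual bytes live under a path derived from their checksum, and a symlink sits where the file used to be. A toy illustration of the idea, nothing to do with git-annex's real code ('.objects' is just a made-up store directory):

#!/usr/bin/env python3
# Toy content-addressed store: move a file under a hash-derived name
# and leave a symlink behind where it used to be.
import hashlib
import os
import sys

def add_to_store(path, store_dir=".objects"):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    obj = os.path.join(store_dir, digest)
    if os.path.exists(obj):
        os.remove(path)       # identical content is already stored
    else:
        os.rename(path, obj)  # assumes the store is on the same filesystem
    os.symlink(os.path.relpath(obj, os.path.dirname(path) or "."), path)

if __name__ == "__main__":
    for p in sys.argv[1:]:
        add_to_store(p)

With that layout, checking for corruption is just re-hashing an object and comparing against its own filename, and restoring it means copying the object with that name from another copy of the store.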

>git-annex uses git to index files but does not store them in the git history. Instead a Symbolic link representing and linking to the probably large file is committed. git-annex manages a content-addressable storage for the files under its control. A separate git branch logs the location of every file. Thus users can clone a git-annex repository and then decide for every file whether to make it locally available.
Sounds interesting, but still really over-engineered.

I might just end up using cfv and accept making .sfv files for each subfolder. At least there's an option to check which files aren't described by the current checksum set.

Part of the problem is that my server only has 2GB of RAM. Well, "server": I'm using an ARM board for convenience and power efficiency.

I think it's clear that I just want a checksumming database tool, without all the extra bells, whistles, and complexity aimed at vastly different use-cases.
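
Honestly, an "update" pass along these lines would probably cover it (still a rough sketch under the same assumptions as before, with 'checksums.sha256' as a placeholder manifest name): drop entries for files that are gone, hash files that aren't listed yet, and leave everything else alone.

#!/usr/bin/env python3
# Toy "update" pass over a root-level manifest: remove entries for files
# that no longer exist, add entries for files that aren't listed yet, and
# deliberately leave existing entries untouched so later corruption still
# shows up as a mismatch.
import hashlib
import os
import sys

MANIFEST = "checksums.sha256"  # placeholder name

def file_sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def update_manifest(root="."):
    root = os.path.abspath(root)
    manifest_path = os.path.join(root, MANIFEST)
    old = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            for line in f:
                digest, _, rel = line.rstrip("\n").partition("  ")
                old[rel] = digest
    new = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if rel == MANIFEST:
                continue
            new[rel] = old.get(rel) or file_sha256(os.path.join(root, rel))
    with open(manifest_path, "w") as out:
        for rel in sorted(new):
            out.write(f"{new[rel]}  {rel}\n")

if __name__ == "__main__":
    update_manifest(sys.argv[1] if len(sys.argv) > 1 else ".")

Verification after that is just 'sha256sum -c checksums.sha256' run from the root, or an equivalent loop that re-hashes and compares.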