Split and Reassemble Files

Posted by admin on June 3, 2007 under Tech Tips

If you ever need to work with a large file and wish you could split it into smaller pieces, you’ll be pleased to know that it’s extremely easy to do in Linux. You can use the “split” utility that comes standard with most *nix variations. Let’s walk through a simple example.

To create a test file to work with, the following command produces one that’s exactly 100 megabytes (104857600 bytes). Note that I am using ‘dd’ with /dev/urandom so the file is filled with random data, which makes it a good test that the split and reassembly reproduce the original exactly. We will confirm that with an md5 hash comparison at the end of this process.

$ dd if=/dev/urandom of=testfile bs=1k count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 23.2982 seconds, 4.5 MB/s

$ ls -lh testfile
-rw-r--r-- 1 gmendoza gmendoza 100M 2007-06-03 22:45 testfile
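
If the numbers in that output look inconsistent, a little shell arithmetic clears it up. This is only a quick sketch, and the variable names are my own:

```shell
# Check the size arithmetic behind the dd command above.
block_size=1024      # bs=1k means 1024-byte blocks
block_count=102400   # count=102400
total_bytes=$((block_size * block_count))

echo "bytes: $total_bytes"                                 # 104857600
echo "in units of 1024*1024: $((total_bytes / 1048576))"   # 100
echo "in units of 1000*1000, rounded: $(( (total_bytes + 500000) / 1000000 ))"   # 105
```

That is why dd reports 105 MB (it counts in decimal megabytes) while ls -lh shows 100M (it counts in units of 1024 × 1024 bytes). Both describe the same 104857600 bytes.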

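To split the file into five 20MB files, use the split command as shown below. The -b option sets the size of each piece in bytes (20971520 is 20 × 1024 × 1024), and -d asks for numeric suffixes instead of the default alphabetic ones. The last argument, “splitfiles”, is the prefix used for the output file names.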

$ split -b 20971520 -d testfile splitfiles

Verify by listing all files that begin with “splitfiles”. Below, you see the new files with the appropriate sequence numbers as a result of the split command.

$ ls -l splitfiles*
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles00
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles01
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles02
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles03
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles04
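
If you want to try this without waiting on a 100MB file, here is a scaled-down sketch of the same split: a 100 KiB file cut into 20 KiB pieces. It assumes GNU coreutils (the “20K” size suffix in particular), and the names demo.bin and piece_ are made up for the demo.

```shell
# Small-scale version of the split above: 100 KiB file, 20 KiB pieces.
# Assumes GNU coreutils; demo.bin and the piece_ prefix are made-up names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

dd if=/dev/urandom of=demo.bin bs=1k count=100 2>/dev/null

# -b 20K means 20 * 1024 bytes per piece; -d asks for numeric suffixes (00, 01, ...).
split -b 20K -d demo.bin piece_

# Count the pieces and list them with their sizes.
piece_count=$(ls piece_* | wc -l)
echo "pieces: $piece_count"
ls -l piece_*
```

With GNU split, the same suffix style should let you write “-b 20M” in place of 20971520 in the command above.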

To reassemble the smaller files back to their original state, concatenate them together using a simple redirect.

$ cat splitfiles* > newtestfile

… and list again to show your handiwork…

$ ls -lh newtestfile
-rw-r--r-- 1 gmendoza gmendoza 100M 2007-06-03 22:52 newtestfile

As proof that the original and the newly reassembled file are exactly the same, compare their md5 hashes:

$ md5sum testfile newtestfile
54a07d5011ca893eddfab29960a7f232 testfile
54a07d5011ca893eddfab29960a7f232 newtestfile
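
If you would rather not compare hashes by eye, cmp can do a byte-for-byte comparison and report the result through its exit status. Here is a minimal sketch of the whole round trip on a small file. It assumes GNU coreutils, and all of the file names are made up for the demo.

```shell
# Round trip on a small file: split, reassemble, then compare byte for byte.
# Assumes GNU coreutils; all file names here are made up for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

dd if=/dev/urandom of=original.bin bs=1k count=100 2>/dev/null
split -b 20480 -d original.bin part_

# The shell expands part_* in sorted order, so the pieces are joined in sequence.
cat part_* > rebuilt.bin

# cmp -s prints nothing; its exit status is 0 only when the files are identical.
if cmp -s original.bin rebuilt.bin; then
    result="identical"
else
    result="different"
fi
echo "original.bin vs rebuilt.bin: $result"
```

Because cmp reports through its exit status, it is handy in scripts where you want to stop if the reassembled file does not match.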

Cool stuff.
