dd, bs= and why you should use conv=fsync
This story starts with me having to simulate a faulty disk device for testing. The Linux kernel's device mapper is a good fit for this, so I created a faulty device backed by a plain file:
These commands set up a new device at /dev/mapper/baddisk with a size of 1 GB. Starting at sector 6050 there are 155 faulty sectors, on which any read or write operation should cause an I/O error.
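A minimal sketch of how such a mapping table could be generated, not necessarily the exact commands used here. It assumes a loop device at /dev/loop0 as the backing device, 512-byte sectors, and 1 GB meaning 1 GiB (2097152 sectors); the helper name make_error_table is hypothetical. The table maps a healthy region, then a region served by the device mapper's error target, then the healthy remainder.

```shell
#!/bin/sh
# Print a device-mapper table with one faulty region in the middle.
# Arguments (hypothetical helper): backing device, total sectors,
# first faulty sector, number of faulty sectors.
make_error_table() {
    dev=$1
    total=$2
    bad_start=$3
    bad_len=$4
    bad_end=$((bad_start + bad_len))
    # Healthy region before the faulty sectors.
    echo "0 $bad_start linear $dev 0"
    # Faulty region: the "error" target fails every I/O request.
    echo "$bad_start $bad_len error"
    # Healthy region after the faulty sectors.
    echo "$bad_end $((total - bad_end)) linear $dev $bad_end"
}

# Assumption: 1 GiB device = 2097152 sectors of 512 bytes.
make_error_table /dev/loop0 2097152 6050 155

# As root, the table could then be loaded with something like:
#   make_error_table /dev/loop0 2097152 6050 155 | dmsetup create baddisk
```

Only the table is printed here, so the sketch runs without root privileges; loading it requires an existing loop device and dmsetup.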
I went on and used dd to write to the device. Since the first faulty block should start at around 3 MB, I used the following command:
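The "around 3MB" figure follows from the sector numbers. A quick sketch of the arithmetic, assuming 512-byte sectors (the variable names are only for illustration):

```shell
#!/bin/sh
# Convert the faulty sector range into byte offsets.
sector_size=512   # assumption: 512-byte sectors
first_bad=6050
bad_count=155

bad_start_bytes=$((first_bad * sector_size))
bad_end_bytes=$(( (first_bad + bad_count) * sector_size ))

echo "faulty region starts at byte $bad_start_bytes"   # 3097600, about 2.95 MiB
echo "faulty region ends before byte $bad_end_bytes"   # 3176960
```

Any write that covers more than the first 3097600 bytes of the device therefore touches the faulty region.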
To my surprise, the command succeeded. I assumed an error in my setup, so I re-created the device mapper target and tried again, this time with the following command:
Nice, the device behaves as expected! While taking notes in another terminal and switching back and forth between workspaces, I issued the following command again:
What? It succeeded in writing 4.1 MB of data across a faulty segment of the disk, which should clearly fail! This was strange. Even after many attempts with this command, writing to the complete device until it ran out of space, dd reported no I/O errors.
The dmesg output showed that the kernel correctly reported errors on the underlying device:
And running badblocks on the device also correctly reported them.
Why does dd not report this error?
The difference between the commands is the block size, so I suspected caching as the cause. Or maybe dd opens the file with different flags, such as O_DIRECT or O_SYNC, when smaller block sizes are used?
I straced the dd command, and the openat, write and close calls behaved exactly the same. This time I used a 5 MB block size for simpler debugging:
The strace output shows that the succeeding command opens the file without any notable difference to the command writing with 512 bytes block size. The write and close functions return with no error whatsoever. dd simply does not notice the data loss while writing to the storage!
Making dd open the output file with the O_DIRECT or the O_SYNC flag catches the error:
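For reference, GNU dd exposes these open flags as oflag=direct and oflag=sync. A rough sketch of the two variants, wrapped in hypothetical helper functions; the 5M block size and the scratch-file demo are assumptions for illustration, and on the faulty device the target would be /dev/mapper/baddisk instead:

```shell
#!/bin/sh
# Hypothetical helpers: write one 5 MiB block of zeros to the given target.

# O_DIRECT: bypass the page cache, so a failing write is reported immediately.
dd_direct() {
    dd if=/dev/zero of="$1" bs=5M count=1 oflag=direct
}

# O_SYNC: each write() returns only after the data has reached the device.
dd_sync() {
    dd if=/dev/zero of="$1" bs=5M count=1 oflag=sync
}

# Demo against a scratch file, which needs no root privileges.
# Only the O_SYNC variant is run, because not every filesystem
# accepts O_DIRECT.
scratch=$(mktemp)
dd_sync "$scratch"
rm -f "$scratch"
```

On a healthy scratch file both variants simply succeed; the point is only which open flag each option maps to.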
What is the reason for this? I assume that dd, with its default block size of 512 bytes, does not hit the Linux kernel's buffered I/O path. With bigger block sizes the I/O becomes buffered and asynchronous, and by default dd, as a user-space application, does not verify the write (using fsync), so it does not notice errors that occur during buffered I/O.
This leads us to my next finding: “the Linux fsync() gate”, which dates back to 2018 and started with the following question on Stack Overflow:
And the resulting LWN articles:
These provide great insight into the Linux kernel's error handling and into how such errors are propagated to user-space applications when writing data to faulty devices.
Long story short: if you use dd with a bigger block size (>= 4096 bytes), be sure to use either oflag=direct or conv=fsync to get proper error reporting when writing data to a device. I prefer conv=fsync: dd then calls fsync() on the output file once before finishing and reports the error, without the performance impact of oflag=direct.
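As a sketch of that recommendation, a small wrapper (the name write_image and the 4M block size are hypothetical choices) that always passes conv=fsync and checks dd's exit status could look like this. It assumes GNU dd, which provides conv=fsync and status=none:

```shell
#!/bin/sh
# Hypothetical wrapper: copy a source to a target and make sure write
# errors that surface at flush time are not silently ignored.
write_image() {
    src=$1
    dst=$2
    # conv=fsync makes dd call fsync() on the output before exiting,
    # so errors from buffered writes show up in dd's exit status.
    if dd if="$src" of="$dst" bs=4M conv=fsync status=none; then
        echo "write flushed to $dst"
    else
        echo "write to $dst failed" >&2
        return 1
    fi
}

# Demo with scratch files; for a real device the target would be
# something like /dev/mapper/baddisk.
demo_src=$(mktemp)
demo_dst=$(mktemp)
printf 'hello' > "$demo_src"
write_image "$demo_src" "$demo_dst"
rm -f "$demo_src" "$demo_dst"
```

The wrapper relies only on dd's exit status, which is non-zero if the final fsync() reports an error.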