Feature #14177
PATCH: File::Stat#dev on Windows
Description
On Unix, two files are identical when their File::Stat#dev and File::Stat#ino
pairs are the same. On Windows, however, two files may not be identical even
when their pairs are the same, if a volume (disk partition) is mounted on a
directory, because File::Stat#dev is based on drive letters.
I did the following on Windows running on VMware:
- Attach a new virtual disk to the VM.
- Create two NTFS volumes on the disk and mount them on c:\volume1 and c:\volume2, respectively.
- Create a file in each of c:\volume1 and c:\volume2.
- File.stat(filename).dev returns 2 ('C' - 'A') for both files.
- File.stat(filename).ino returns 281474976710691 for both files.
The inode number of the first file created on an NTFS volume seems to be the same.
- The pairs of #dev and #ino are the same even though the files aren't identical.
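The Unix identity rule from the description can be sketched in Ruby as below. `same_file_by_dev_ino?` is a hypothetical helper written only for illustration, not part of Ruby; Ruby's built-in `File.identical?` answers the same question directly. Given the values reported above (#dev 2 and the same #ino for both files), this helper would wrongly report the files in c:\volume1 and c:\volume2 as the same file.

```ruby
require "tmpdir"

# Hypothetical helper (not part of Ruby): two paths refer to the same file
# when both File::Stat#dev and File::Stat#ino match, as on Unix.
def same_file_by_dev_ino?(path_a, path_b)
  a = File.stat(path_a)
  b = File.stat(path_b)
  a.dev == b.dev && a.ino == b.ino
end

Dir.mktmpdir do |dir|
  one  = File.join(dir, "one.txt")
  two  = File.join(dir, "two.txt")
  link = File.join(dir, "link.txt")
  File.write(one, "1")
  File.write(two, "2")
  File.link(one, link) # hard link: the same file under a second name

  p same_file_by_dev_ino?(one, link) # => true on Unix
  p same_file_by_dev_ino?(one, two)  # => false on Unix
end
```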
The attached patch does the following:
- Changes _dev_t to the 64-bit rb_dev_t in struct stati128.
- Uses FILE_ID_INFO.VolumeSerialNumber (64-bit)
or BY_HANDLE_FILE_INFORMATION.dwVolumeSerialNumber (32-bit)
as File::Stat#dev.
- Uses path_drive() only when open_special() fails.
- Deletes code that becomes unnecessary because of the above changes.
However, I think there are pros and cons to the patch.
Pros:
- Two files are identical when their File::Stat#dev and File::Stat#ino
pairs are the same, as on Unix.
Cons:
- File::Stat#dev returns a very large number (a 32-bit or 64-bit integer).
- The manual says File::Stat#dev returns an integer representing the device.
However, this patch makes it return the volume serial number, which is
a slightly different concept from a device.
Updated by nobu (Nobuyoshi Nakada) almost 7 years ago
I'm curious why/how you use rdev/ino on Windows.
They are not guaranteed to be stable across processes.
Updated by kubo (Takehiro Kubo) almost 7 years ago
> I'm curious why/how you use rdev/ino on Windows.
I don't have any actual plan to use rdev/ino. I was just curious about it.
> They are not guaranteed to be stable across processes.
If I read this correctly, do you mean that the rdev/ino values read in one
process may differ from those read in another process?
I hadn't thought about that, so I investigated it.
As for #ino, file IDs aren't stable over time in the FAT file system.
The file ID of a file changes when the file is (1) moved to another
directory, (2) renamed to a longer file name, or (3) relocated on disk
by defragmentation.
In https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788(v=vs.85).aspx:
In the FAT file system, the file ID is generated from the first
cluster of the containing directory and the byte offset within the
directory of the entry for the file. Some defragmentation products
change this byte offset. (Windows in-box defragmentation does not.)
Thus, a FAT file ID can change over time. Renaming a file in
the FAT file system can also change the file ID, but only if the
new file name is longer than the old one.
However, file IDs seem to be stable in the NTFS file system. I confirmed
that the file ID of a file didn't change when the file was moved to another
directory and renamed to a longer file name.
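The check described above can be reproduced with a short Ruby sketch: record File::Stat#ino, move the file into another directory under a longer name, and read #ino again. `ino_stable_across_move?` and the file names are hypothetical, chosen only for illustration. Per the documentation quoted here, the result would be expected to hold on NTFS but may not on FAT; on a typical Unix file system a rename within one volume also keeps the inode number.

```ruby
require "tmpdir"

# Hypothetical helper: returns true if File::Stat#ino is unchanged after the
# file is moved to a subdirectory and given a longer name on the same volume.
def ino_stable_across_move?(dir)
  original = File.join(dir, "a.txt")
  File.write(original, "data")
  before = File.stat(original).ino

  subdir = File.join(dir, "subdir")
  Dir.mkdir(subdir)
  moved = File.join(subdir, "a_much_longer_file_name.txt")
  File.rename(original, moved) # same volume, so this is a rename, not a copy

  File.stat(moved).ino == before
end

Dir.mktmpdir do |dir|
  puts ino_stable_across_move?(dir)
end
```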
In https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788(v=vs.85).aspx:
In the NTFS file system, a file keeps the same file ID until it is deleted.
As for #dev, volume serial numbers seem to be stable on local file systems.
In https://msdn.microsoft.com/en-us/library/windows/desktop/aa364993(v=vs.85).aspx:
This function returns the volume serial number that the operating
system assigns when a hard disk is formatted.
In the SMB file system, volume serial numbers seem to be sent by the SMB server.
Samba, at least, derives it from the hostname and the service name.
https://github.com/samba-team/samba/blob/a0f6ea8de/source3/smbd/trans2.c#L3525-L3526
Well, feel free to reject this patch if nobody uses File::Stat#dev on Windows.
I also have no plan to use it.