Filesystems, Inodes and Directories

So, how does a Linux filesystem work?

In the conceptual "Unix" or "Linux" filesystem, all files start with an "inode". Such a filesystem consists of three parts:

  1. a "superblock" that describes the configuration of the filesystem,
  2. an array of inodes, which describe individual files of data, and
  3. an array of data blocks, which hold the data associated with individual files.

   |<---------------- filesystem ------------------>|
   |superblock|<-- inode table -->|<---- data ----->|
   | -- -- -- |[inode][...][inode]|[data][...][data]|

The "superblock" holds information about the filesystem as a whole, including a count of the number of inodes in the filesystem, the size of a single data block, and a count of the number of data blocks in the filesystem.
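On a running system you can ask the kernel for a few of these filesystem-wide figures. The following is a minimal sketch, assuming a POSIX system where Python's os.statvfs() is available. It does not read the on-disk superblock itself, only the summary the kernel reports for a mounted filesystem. The function name superblock_summary and the dictionary keys are made up for this illustration.

```python
import os

def superblock_summary(path="/"):
    """Return a few filesystem-wide figures for the filesystem holding `path`."""
    vfs = os.statvfs(path)  # POSIX only; asks the kernel, not the raw disk
    return {
        "total_inodes": vfs.f_files,    # inode count (some filesystems report 0)
        "free_inodes": vfs.f_ffree,     # inodes not currently in use
        "block_size": vfs.f_frsize,     # fundamental block size, in bytes
        "total_blocks": vfs.f_blocks,   # data blocks, counted in f_frsize units
    }

if __name__ == "__main__":
    for key, value in superblock_summary("/").items():
        print(f"{key}: {value}")
```

The numbers differ from one filesystem to the next, and some filesystem types do not keep a fixed inode count at all, so treat the output as descriptive rather than exact.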

The inode table consists of an array of fixed-length inodes, some in use and some not. An "inode" is the filesystem data element that records a file's access permissions, timestamps, size, and the location of its data. A file need not contain any data at all; such a file consists of only an "inode".
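Much of what an inode records can be seen from userland. Here is a small sketch, assuming a POSIX system, that uses Python's os.stat() to show the inode number, permissions, size and modification time of a file. The helper name describe_inode is hypothetical. The example runs it against an empty temporary file, that is, a file that consists of only an inode.

```python
import os
import stat
import tempfile

def describe_inode(path):
    """Collect a few of the fields the inode holds for `path`."""
    st = os.stat(path)
    return {
        "inode_number": st.st_ino,
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "size_bytes": st.st_size,
        "modified": st.st_mtime,                   # seconds since the epoch
    }

if __name__ == "__main__":
    # An empty file still has an inode, even though it holds no data.
    with tempfile.NamedTemporaryFile() as empty:
        print(describe_inode(empty.name))
```

Note that os.stat() does not expose the location of the data blocks; that part of the inode stays inside the kernel.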

The data block table consists of an array of fixed-length data blocks, some in use and some not. A "data block" (or "block") is a fixed-size filesystem data element, set aside for the storage of file data only.

When you create a filesystem, you specify the number of inodes, and the size of the data blocks that the filesystem will hold. This information is stored in the superblock, and used to format and manage the filesystem. When the kernel mounts a filesystem, it reads the superblock to determine exactly how to access the file data on the filesystem.

Internally, within the kernel, all file access is done through the inode of the file.

So, where does that leave the usual userland interface of filenames and directories?

Well, imagine that a directory is just another file, with an inode part and data parts. The data in this file consists of small records, each one containing just two things: a text string (a name), and the address of an inode. The file "gives" each inode a name (or gives each name an inode).

Now, imagine a mechanism that searches through this sort of file, looking for a specific name, and returning the inode address associated with that name. You could then access an inode (and thus data) by knowing the associated name as recorded in this file.


  [inode]
     \
      \     |<---------- "directory" file ------------->|
       '--->|[name,inode#][name,inode#][name,inode#][...]
                     \             \             \
                      \             \             '--->[inode]
                       \             '--->[inode]          \
                        '--->[inode]          \             '--->[data][data][...]
                                \              '--->[data][data][...]
                                 '--->[data][data][...]
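
The search mechanism just described can be sketched in a few lines. This is a toy model, not how any real filesystem lays out its directories: the "directory file" is a plain Python list of (name, inode number) pairs, and the names and inode numbers are invented for the example.

```python
# Toy model: a "directory file" is a list of (name, inode number) entries.
def lookup(directory_entries, name):
    """Scan the entries for `name`; return its inode number, or None if absent."""
    for entry_name, inode_number in directory_entries:
        if entry_name == name:
            return inode_number
    return None

# Invented entries, purely for illustration.
directory = [("notes.txt", 12), ("todo.txt", 17), ("photo.jpg", 23)]

print(lookup(directory, "todo.txt"))  # 17
print(lookup(directory, "missing"))   # None
```

Once you have the inode number, everything else about the file (permissions, size, data) comes from the inode, not from the directory.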

Let's add to that: some of those directory entries point to inodes that manage "directory file" data:


  [inode]
     \
      \     |<---------- "directory" file ------------->|
       '--->|[name,inode#][name,inode#][name,inode#][...]
                     \             \             \
                      \             \             '--->[inode]
                       \             '--->[inode]          \
                        '--->[inode]          \             '--->[data][data][...]
                                \              '--->[data][data][...]
                                 \
                                  \     |<-------- "directory" file -------------->|
                                   '--->[name,inode#][name,inode#][name,inode#][...]
                                                 \            \            \
                                                  \            \            '--->[inode]
                                                   \            '--->[inode]         \
                                                    '--->[inode]         \            '--->[data][data][...]
                                                             \            '--->[data][data][...]
                                                              '--->[data][data][...]

Now, we have a directory tree! We just have to know which inode to start at.

Part of the superblock tells the kernel which inode number is the root of the tree. Each inode contains a flag that tells the kernel whether to interpret the associated data as a directory or as a data file. Together, these two things let the filesystem store files under a hierarchical, tree-structured naming convention.
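
To see how a starting inode and a directory flag are enough to walk a whole path, here is a toy sketch. Everything in it is an assumption made for illustration: the inode table is a Python dictionary, ROOT_INODE stands in for the value the superblock would supply, and the names and numbers are invented. A real kernel does considerably more (permissions, mount points, symbolic links, caching).

```python
ROOT_INODE = 2  # hypothetical starting point; stands in for the superblock's value

# Toy inode table: inode number -> directory flag plus the associated data.
inode_table = {
    2: {"is_dir": True, "data": [("home", 5), ("motd", 9)]},
    5: {"is_dir": True, "data": [("notes.txt", 12)]},
    9: {"is_dir": False, "data": b"welcome\n"},
    12: {"is_dir": False, "data": b"buy milk\n"},
}

def resolve(path):
    """Walk `path` one name at a time from the root; return the final inode number."""
    current = ROOT_INODE
    for component in path.strip("/").split("/"):
        if not component:
            continue  # empty piece, e.g. the path "/" itself
        inode = inode_table[current]
        if not inode["is_dir"]:
            raise NotADirectoryError(path)
        for name, number in inode["data"]:
            if name == component:
                current = number
                break
        else:
            raise FileNotFoundError(path)
    return current

print(resolve("/home/notes.txt"))  # 12
print(resolve("/"))                # 2
```

Each step is the same lookup: read the current directory's entries, find the name, and move to the inode it points at.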

But, there's a side effect of this. There's nothing stopping two directory entries (even in different branches of the tree) from specifying the same inode number. This side effect allows two different names to point to the same file. It also allows (in limited cases) two different names to point to the same directory.
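
You can watch two names share one inode from userland. The sketch below assumes a POSIX system whose temporary directory sits on a filesystem that permits os.link() (what Unix calls a hard link). The helper names same_inode and demo_hard_link, and the file names, are made up for this example.

```python
import os
import tempfile

def same_inode(path_a, path_b):
    """True if both names refer to the same inode on the same filesystem."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def demo_hard_link():
    """Create a file, give it a second name, and report what the inode says."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        alias = os.path.join(workdir, "alias.txt")
        with open(original, "w") as handle:
            handle.write("shared data\n")
        os.link(original, alias)  # second directory entry, same inode
        with open(alias) as handle:
            content = handle.read()
        return same_inode(original, alias), os.stat(original).st_nlink, content

if __name__ == "__main__":
    print(demo_hard_link())
```

The link count (st_nlink) is kept in the inode, which is how the filesystem knows how many directory entries still refer to it.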

In each "directory", two entries are special: the directory entry named ".", and the directory entry named "..". The "." directory entry points to the inode that is associated with the directory file that the "." entry is found in. In other words, it points to itself. The ".." directory entry points to the inode that manages the directory file containing the name/inode# entry for the directory that the ".." entry is found in. In other words, it points to the directory above itself.



\
 :--->[inode]
 |       \     |<--- "directory" file --->|
 A        '--->|[.......][name,inode#][...]
 |                                \
 |                                 :--->[inode]
 |                                 |        \     |<----- "directory" file ---->|
 |                                 A         '--->|[".",inode#]["..",inode#][...]
 |                                 |                     /            /
 |                                 '-<------------------'            /
 '-<----------------------------------------------------------------'
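
The "." and ".." relationships can be checked by comparing inode numbers. This is a small sketch, assuming a POSIX system; the function name dot_entries_check and the directory name "child" are invented for the example.

```python
import os
import tempfile

def dot_entries_check():
    """Compare the inodes behind '.' and '..' with the directories themselves."""
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "child")
        os.mkdir(child)
        parent_ino = os.stat(parent).st_ino
        child_ino = os.stat(child).st_ino
        dot_ino = os.stat(os.path.join(child, ".")).st_ino
        dotdot_ino = os.stat(os.path.join(child, "..")).st_ino
        # "." should be the child itself; ".." should be its parent.
        return dot_ino == child_ino, dotdot_ino == parent_ino

if __name__ == "__main__":
    print(dot_entries_check())
```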