Formatting and File Systems

Before a disk can be used it must be partitioned and formatted. Some confusion arises because the term formatting actually refers to two separate steps.

Low-Level Formatting

The first stage is low-level formatting (a.k.a. physical formatting), which is almost always done by the manufacturer (and therefore rarely by the end user). This process divides each platter surface of the drive into concentric circles called tracks. The tracks are then further divided into 512-byte regions called sectors. (Actually, the sectors are larger than this, because they also contain additional data such as error correction codes. However, for the purposes of storing user data, only 512 bytes per sector are usable.)

Low-level formatting essentially makes the drive readable and writable by the system BIOS. Note that because the actuator arms are connected as a block, the read/write heads will all sit over the same track of their own platters at any given time. For this reason, the location of the heads is typically referred to not as a track location, but as a cylinder location, where a cylinder is the collective term for a given single track across all platters. This is best explained with a diagram:

Figure: Tracks, cylinders and sectors

Because outer tracks are longer than inner ones (the radius, and hence the track's circumference, is larger), it is actually possible to have more sectors on the outer tracks than on the inner ones. However, early BIOS implementations and DOS versions could not handle disks with different numbers of sectors on different tracks. One way around this was for the integrated disk controller to provide the BIOS with false drive geometry data. Thus the drive would appear to have a constant number of sectors per track.

The other approach, which is the one used in drives today, is called Logical Block Addressing (LBA), which must be supported by both the BIOS and the OS. Under LBA, each sector is addressed by a single number rather than by a cylinder, head and sector combination. This allows different sectors-per-track figures on different tracks, and also supports much larger disks than was previously possible.
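As a rough illustration of how a cylinder/head/sector (CHS) address relates to a logical block address, here is a minimal Python sketch of the conventional mapping for a drive that reports a fixed geometry. The function names and the geometry used (16 heads, 63 sectors per track) are assumptions chosen for the example, not values taken from any particular drive.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a CHS address to a logical block address.

    Cylinders and heads count from 0; sectors count from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a logical block address back to a (cylinder, head, sector) triple."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


# Hypothetical geometry, for illustration only.
HEADS = 16
SECTORS_PER_TRACK = 63

print(chs_to_lba(0, 0, 1, HEADS, SECTORS_PER_TRACK))  # first sector on the disk
print(chs_to_lba(2, 3, 4, HEADS, SECTORS_PER_TRACK))
print(lba_to_chs(2208, HEADS, SECTORS_PER_TRACK))
```

The point of the sketch is simply that an LBA is one running count of sectors, so the OS no longer needs to know (or be lied to about) the real layout of the platters.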

Partitioning

The drive must then be divided into partitions using a tool such as FDISK or Partition Magic on DOS/Windows-based operating systems. A typical arrangement under Microsoft platforms is to have a single primary active partition followed by an extended partition. This extended partition must then be further divided into one or more logical partitions. Drive letters that you see in Windows Explorer are mapped to the primary partition and to each logical partition (on each hard disk that you may have). This is why you can have more than one drive letter with only one drive.

Furthermore, partitioning involves writing information to the master boot record (MBR) of the hard disk. This is data held in the very first sector of the drive that tells the system how and where to boot from. When the PC is powered on, information held in the BIOS tells the system how to access the MBR. The MBR holds the master partition table, which describes how the drive has been partitioned and tells the BIOS which partition is set as the active partition to boot from.
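To show what the master partition table looks like in practice, here is a minimal Python sketch that reads the four primary entries from a 512-byte MBR sector. It assumes the classic MBR layout (table at byte offset 446, four 16-byte entries, 0x55 0xAA signature in the last two bytes). The helper names and the demo sector built at the end are made up for illustration; no real disk is read.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446       # partition table follows the boot code
ENTRY_SIZE = 16
SIGNATURE_OFFSET = 510   # last two bytes hold 0x55 0xAA
# status, CHS start (3 bytes), type, CHS end (3 bytes), first LBA, sector count
ENTRY_FORMAT = "<B3sB3sII"


def parse_mbr(sector):
    """Return a list of dicts describing the non-empty primary partition entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA boot signature")
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        status, _chs_first, ptype, _chs_last, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE]
        )
        if ptype == 0:           # type 0 marks an unused slot
            continue
        partitions.append({
            "index": index,
            "active": status == 0x80,   # 0x80 flags the bootable partition
            "type": ptype,
            "first_lba": first_lba,
            "sectors": count,
        })
    return partitions


def build_demo_mbr():
    """Build a made-up MBR with one active FAT16 partition (type 0x06)."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack(ENTRY_FORMAT, 0x80, b"\x00" * 3, 0x06, b"\x00" * 3, 63, 204800)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[SIGNATURE_OFFSET:] = b"\x55\xaa"
    return bytes(sector)


for part in parse_mbr(build_demo_mbr()):
    size_mib = part["sectors"] * SECTOR_SIZE // (1024 * 1024)
    print(part["index"], hex(part["type"]), part["active"], part["first_lba"], size_mib)
```

An extended partition shows up here as just another primary entry; the logical partitions inside it are described by further records stored within the extended partition itself, which this sketch does not follow.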

High-Level Formatting

The last stage is high-level formatting, a.k.a. logical formatting. When people speak of "formatting their hard disk," it is this process they are usually referring to.

This process arranges the sectors into larger, more usable chunks called clusters, a.k.a. allocation units. Essentially, a cluster is a contiguous (i.e. adjacent) group of sectors, and is the smallest unit of storage the operating system can allocate to a file. The exact number of clusters created on your disk by formatting depends upon the operating system and the type of file system you choose. Under the DOS and Windows environments (apologies to Unix/Linux users and OS/2 users, but I won't be discussing UFS and HPFS here) there are three main types of file system: FAT16 (or just FAT), FAT32 and NTFS. (Note, FAT = File Allocation Table. In simple terms, the FAT maps clusters to files.) The file system is simply a way of organising stored data.
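To make the idea of an allocation unit concrete, here is a small Python sketch that works out how many clusters a file occupies and how much of the last cluster goes unused. The function names and the file and cluster sizes are assumptions picked for the example.

```python
def clusters_needed(file_size, cluster_size):
    """Number of whole clusters needed to hold file_size bytes (rounded up)."""
    return -(-file_size // cluster_size)   # ceiling division


def unused_bytes(file_size, cluster_size):
    """Bytes allocated to the file but not occupied by its data."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


KIB = 1024

# A 1-byte file still ties up a whole 32 KiB cluster.
print(clusters_needed(1, 32 * KIB), unused_bytes(1, 32 * KIB))

# A 40000-byte file on 4 KiB clusters.
print(clusters_needed(40000, 4 * KIB), unused_bytes(40000, 4 * KIB))
```

Because a file can never be given a fraction of a cluster, larger clusters mean more wasted space for small files.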

FAT16 partitions have limitations. Not surprisingly, they are 16 bit! What does this mean? It means that cluster locations are described using 16-bit numbers. Thus the maximum number of clusters in a partition is 65536 (since 2^16 is 65536). The maximum cluster size for a FAT16 partition is 32 kB. This means that the largest clusters are groups of 64 sectors, since sectors are 512 bytes. Thus the maximum possible size for a FAT16 partition is 2 gigabytes (65536 x 32 kB). Furthermore, FAT16 partitions only allow up to 512 entries in the root directory.

The FAT32 structure, supported by Windows 95 Service Release 2, Windows 98 and later operating systems, circumvents both of these limitations. There is no fixed limit to the number of files in the root directory, and the theoretical partition size limit is 8 TB. (This is based on 28 bits per cluster number, since the top 4 bits are reserved, with 32 kB allocation units.) However, FAT32 is not supported by older versions of DOS or by Windows NT 4.0 and earlier. Instead, Windows NT has a file system called NTFS, which implements built-in security and robustness.
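The FAT16 and FAT32 size limits quoted above are just the number of addressable clusters multiplied by the cluster size. Here is a minimal Python sketch of that arithmetic; the function name is made up for the example. Treat the results as upper bounds, since real implementations set aside a few cluster values for special purposes, which makes the true limits slightly lower.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4
SECTOR_SIZE = 512


def max_partition_bytes(cluster_bits, cluster_size):
    """Upper bound on partition size: addressable clusters times cluster size."""
    return (2 ** cluster_bits) * cluster_size


cluster_size = 32 * KIB

# Sectors in one 32 kB cluster.
print(cluster_size // SECTOR_SIZE)

# FAT16: 16-bit cluster numbers.
print(max_partition_bytes(16, cluster_size) // GIB, "GB")

# FAT32: 28 usable bits per cluster number.
print(max_partition_bytes(28, cluster_size) // TIB, "TB")
```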

Modern operating systems like Windows XP actually support both FAT32 and NTFS. Careful consideration is required when deciding upon your choice of file system. For example, if you are planning a dual boot system using more than one operating system, you need to be aware that you may make some partitions invisible to one or more of your boot environments, depending on your decision.

If security is paramount, NTFS is possibly your best choice. Furthermore, NTFS supposedly keeps itself defragmented (see later), alleviating the need for defragmentation runs. Personally, I find this utter rubbish. I ran two similar partitions, one with NTFS and one with FAT32, for an extended period. Both were exposed to large amounts of data movement. After a couple of months, the FAT32 partition was 10% fragmented, but the NTFS partition was over 70% fragmented! That's frankly ridiculous! Still, Microsoft must also realise how ridiculous that claim is, otherwise they wouldn't have included a disk defragmenter that supports NTFS with Windows 2000. (Recall that Windows 2000 is actually NT5.0; NT4.0 didn't come with a defragmenter. Join the dots...)

It's also worth bearing in mind that if you do decide to go with NTFS, you may run into problems if you have a severe system crash (e.g. one that won't allow you to boot into Windows). Because it is secure, you can't simply boot from a system disk and have a look around. You will have extremely limited access to your system. There are tools to hack the security, and Windows 2000 and XP do come with a DOS-like recovery console that provides minimal system access in the event of a serious system error. Even so, you will still be in for a lot of trouble.

Move on to the performance section, which discusses performance measurements and physical factors, cluster sizes and slack space, and finally fragmentation.