The File System
COP-3402

The file browser

Screenshot of the Nautilus File Browser

When you think of a file system, you might imagine something like this: the file browser.

File browsers (or explorers) let you see which files and directories are available, and open, copy, move, or delete them.

But this is just a view of the file system. In reality, files do much more under the hood of the operating system.

The File Abstraction

Files provide a way to group sequences of binary data.

That's basically all files are. No naming, no hardware specifics, no file formats.

Why is it an abstraction?

Files capture what's common about storage hardware.

What are some different data storage technologies?

  • Magnetic, optical, flash, RAM

What's different about them?

What's common about them?

  • They provide a way to read and write binary on a physical medium

Files abstract away the specifics of using different hardware technologies. As we'll see, the abstraction can be applied to other hardware, such as network devices, and even used to create non-physical files, such as pipes.

File systems group together data on a storage medium, which provides a large addressable sequence of bytes (usually organized as blocks).

How is the abstraction used?

Reading bytes of data

Writing bytes of data

Abstractions are defined by how they are used.

We need to be able to write data, extending the size of files, delete data, and read it, independently of where it may lie on the physical storage medium.
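
The operations above can be sketched from the shell. This is a minimal illustration, not how any particular program must do it; the file name data.bin is made up, and mktemp -d is assumed to be available (it is on GNU/Linux and macOS, though it is not strictly POSIX).

```shell
#!/bin/sh
# Work in a throwaway directory.
dir=$(mktemp -d)
f="$dir/data.bin"    # hypothetical file name

# Write: create the file with 5 bytes.
printf 'hello' > "$f"
size_before=$(wc -c < "$f" | tr -d ' ')

# Write more: appending extends the file to 11 bytes.
printf ' world' >> "$f"
size_after=$(wc -c < "$f" | tr -d ' ')

# Read: the whole byte sequence.
all=$(cat "$f")

# Read from an offset: skip the first 6 bytes, then read 5 (a "seek").
tail_part=$(dd if="$f" bs=1 skip=6 count=5 2>/dev/null)

echo "$size_before -> $size_after bytes"
echo "all:    $all"
echo "offset: $tail_part"

rm -rf "$dir"
```

None of these commands mention where the bytes live on the storage medium; they only name the file and a position within it.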

What other common operations can we do on files? Seek? Open/close?

Do other devices besides storage hardware fit this abstraction?

We'll be using the UNIX-style file abstraction in this class.

  • Used in all modern OSes (GNU/Linux, macOS, Windows)

Looking at a file's contents

hello.c:

#include <stdio.h>

int main() {
  printf("hello, world!\n");
}

hexdump -C hello.c

00000000  23 69 6e 63 6c 75 64 65  20 3c 73 74 64 69 6f 2e  |#include <stdio.|
00000010  68 3e 0a 0a 69 6e 74 20  6d 61 69 6e 28 29 20 7b  |h>..int main() {|
00000020  0a 20 20 70 72 69 6e 74  66 28 22 68 65 6c 6c 6f  |.  printf("hello|
00000030  2c 20 77 6f 72 6c 64 21  5c 6e 22 29 3b 0a 7d 0a  |, world!\n");.}.|
00000040

Notice the hexadecimal numbers, which represent the binary data, and the text on the right, which is the same data interpreted as ASCII characters (man ascii).
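
To see the byte/character correspondence on a smaller scale, here is a sketch that uses od, a standard tool similar to hexdump. The file name note.txt is made up for the example.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'hi\n' > "$dir/note.txt"   # hypothetical file name

# od -An -tx1 prints each byte as two hex digits, without the offset column.
hex=$(od -An -tx1 "$dir/note.txt" | tr -d ' \n')
echo "$hex"

# Going the other way: octal escapes for the same three bytes rebuild the text.
text=$(printf '\150\151\012')
echo "$text"

rm -rf "$dir"
```

The three bytes are 68 (h), 69 (i), and 0a (newline); the file stores only those numbers, and "text" is one way of reading them.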

What about file extensions?

Extensions are a convention used by applications

Files are oblivious to their extension, .c, .txt, .mp3, etc.
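
A quick way to convince yourself: copy a file under a misleading extension and compare the bytes. This is only a sketch, and the file names are made up.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'int main() { return 0; }\n' > "$dir/prog.c"   # hypothetical file names

# Copy under a misleading extension.
cp "$dir/prog.c" "$dir/prog.mp3"

# cmp -s exits 0 only when the two files are byte-for-byte identical.
if cmp -s "$dir/prog.c" "$dir/prog.mp3"; then
  same=yes
else
  same=no
fi
echo "identical bytes: $same"

rm -rf "$dir"
```

The extension lives in the name, and the name is not part of the file's contents.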

The map is not the territory

What does a file abstraction that doesn't capture names or file formats mean for file extensions?

The file command

file looks at contents of the file instead of extension

file hello.c
file stomping_grounds.mp3

cp hello.c hello.mp3
cp stomping_grounds.mp3 stomping_grounds.c

file hello.mp3
file stomping_grounds.c

xdg-open hello.mp3
xdg-open stomping_grounds.c

xdg-open hello.c
xdg-open stomping_grounds.mp3

file is not bullet-proof. Some malware will modify the magic bytes to avoid detection.

Some file types

Type                 Contents
Text files           strings of characters
Program files        sequences of machine code
Images, music, etc.  sequences of bytes in a format recognized by applications

Magic bytes

file/magic/Magdir/audio

file stomping_grounds.mp3
hexdump -C stomping_grounds.mp3 | head
gcc -o hello hello.c
file hello
hexdump -C hello | head
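
The idea behind magic bytes can be imitated in a few lines. The sniff function below is a hypothetical helper that knows a single signature, the four bytes that begin ELF executables (0x7f, then the letters E, L, F); the real file command consults a much larger database. The test file is a stand-in that merely starts with those bytes, which also shows why this kind of detection can be fooled.

```shell
#!/bin/sh
# sniff: classify a file by its first four bytes (hypothetical helper).
sniff() {
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "$magic" in
    7f454c46) echo "ELF" ;;      # 0x7f followed by the letters E, L, F
    *)        echo "unknown" ;;
  esac
}

dir=$(mktemp -d)
# A stand-in that only starts with the ELF magic bytes; it is not a real executable.
printf '\177ELF' > "$dir/fake.txt"
printf 'hello\n' > "$dir/plain.txt"

r1=$(sniff "$dir/fake.txt")
r2=$(sniff "$dir/plain.txt")
echo "fake.txt:  $r1"
echo "plain.txt: $r2"

rm -rf "$dir"
```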

ILOVEYOU computer worm

Loveletter

  • Email attachment: LOVE-LETTER-FOR-YOU.TXT.vbs
  • The .vbs extension was hidden, so users clicked on what they thought was a text file
  • .vbs is a script that gets executed

Anatomy of an Attack: Detecting and Defeating CRASHOVERRIDE

EXEC xp_cmdshell 'move C:\Delta\m32.txt C:\Delta\m32.exe';

How do files get named?

If the file itself doesn't store its own name, how do files get their name?

Directories

  • Directories map the names of files to the file
    • Think pointers in C
    • Think name to phone number

The map is not the territory

Files are given unique IDs

  • OS kernel assigns ID to file
  • Called inode numbers

Take an operating systems class, or read about operating systems, to find out how file systems are implemented.

Why separate the name from the file?

  • Renaming is easy
  • We can give many names to the same file (links)
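
Giving one file two names can be tried directly with ln. This is a sketch with made-up file names, and it assumes the temporary directory sits on a file system that supports hard links.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'one file\n' > original.txt    # hypothetical names
ln original.txt alias.txt             # a second name (hard link) for the same file

# ls -i prints the inode number before the name; keep only the number.
ino_a=$(ls -i original.txt | awk '{print $1}')
ino_b=$(ls -i alias.txt | awk '{print $1}')

# A write through one name is visible through the other.
printf 'more\n' >> alias.txt
lines=$(wc -l < original.txt | tr -d ' ')

echo "original.txt inode: $ino_a"
echo "alias.txt inode:    $ino_b"
echo "lines seen via original.txt: $lines"

cd / && rm -rf "$dir"
```

Both names report the same inode number, so there is one file and two directory entries pointing at it.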

Directories themselves are also files

A directory behaves exactly like an ordinary file except that it cannot be written on by unprivileged programs, so that the system controls the contents of directories. -The UNIX Time-Sharing System

Example directory file contents

Name                  inode number
hello                 44214038
hello.c               44214011
stomping_grounds.mp3  44214055
ls -i

Move vs. rename

  • How do we rename a file?
  • How do we move a file?

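Renaming and moving are actually the same command in UNIX: mv, for "move". Within a single file system, both only change which directory entry maps a name to the file; the file itself stays where it is.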
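
One way to check this is to watch the inode number across a rename and a move. The sketch below uses made-up names and assumes everything stays on one file system (moving across file systems has to copy the data instead).

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir sub
printf 'data\n' > old_name.txt        # hypothetical names

before=$(ls -i old_name.txt | awk '{print $1}')

mv old_name.txt new_name.txt          # "rename": same directory, new name
after_rename=$(ls -i new_name.txt | awk '{print $1}')

mv new_name.txt sub/new_name.txt      # "move": same name, different directory
after_move=$(ls -i sub/new_name.txt | awk '{print $1}')

echo "$before $after_rename $after_move"

cd / && rm -rf "$dir"
```

All three numbers should match: only the directory entries changed.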

Can directories contain other directories?

Yes! Directories are also files

What happens when directories contain directories?

We have a hierarchical file system

File system hierarchy

Conventions

  • The root directory is called / (forward slash)
  • We also separate nested directories with /
  • All directories contain a . (dot) directory that points to itself
  • All directories contain a .. (double dot) directory that points to its parent
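
The . and .. entries can be checked by comparing inode numbers. A minimal sketch, using a made-up directory named child:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir child

# ls -d lists the directory itself rather than its contents; -i adds the inode number.
parent_ino=$(ls -id . | awk '{print $1}')
child_ino=$(ls -id child | awk '{print $1}')
child_dot=$(ls -id child/. | awk '{print $1}')
child_dotdot=$(ls -id child/.. | awk '{print $1}')

echo "child and child/.  : $child_ino $child_dot"
echo "parent and child/..: $parent_ino $child_dotdot"

cd / && rm -rf "$dir"
```

child/. names the same directory as child, and child/.. names the same directory as its parent.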

Example file system hierarchy

Diagram

  • Root directory
  • Directory contents via child nodes
  • Color code directories vs. files
  • Directories can contain other directories
  • File tree
  • Absolute paths
  • Working directory
  • Relative paths
  • Current directory
  • Parent directory
  • Links (cross-tree edges)
  • Hard vs. soft links

Paths

A path is a string that contains the sequence of directory names along the file tree to the directory that contains the file, followed by the file name itself. Paths allow you to uniquely identify files, even those with the same name.

Absolute paths

Relative paths

./hello
cd ..

Where is the "." and ".." in our file tree?

In order to answer this, we need to introduce a new concept, the working directory.

The working directory
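
Here is a sketch of how the working directory changes what a relative path means. The directory layout (project/src/notes.txt) is made up for the example, and mktemp -d is assumed to return an absolute path, as it does by default.

```shell
#!/bin/sh
dir=$(mktemp -d)
mkdir -p "$dir/project/src"                       # hypothetical layout
printf 'found me\n' > "$dir/project/src/notes.txt"

# Absolute path: starts at the root directory, works from any working directory.
abs=$(cat "$dir/project/src/notes.txt")

# Relative path: resolved starting from the working directory.
cd "$dir/project" || exit 1
rel=$(cat src/notes.txt)

# ".." climbs to the parent, so this names the same file from inside src.
cd src || exit 1
via_parent=$(cat ../src/notes.txt)
where=$(basename "$(pwd)")

echo "$abs / $rel / $via_parent (working directory: $where)"

cd / && rm -rf "$dir"
```

The same file is reached three ways; only the starting point of the lookup differs.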

Parent directory

Extending the file abstraction

  • Network sockets
  • Pipes
  • Random number generation, /dev/urandom
  • /dev/null
  • RAM itself, /dev/mem
  • Kernel settings and information /proc
  • Graphics
  • Peripherals, e.g., keyboard and mouse
  • Temperature sensors and other measurement devices

  • The file abstraction can be used for all I/O on your system and to interact with all hardware
  • In practice other models and implementations of I/O are used (sockets, raw access)
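
A few of these non-storage files can be exercised with the same read and write tools used on ordinary files. This sketch assumes a UNIX-like system that provides /dev/null and /dev/urandom.

```shell
#!/bin/sh
# /dev/null accepts writes and discards them; reading it yields end-of-file at once.
echo "discard me" > /dev/null
null_bytes=$(wc -c < /dev/null | tr -d ' ')

# /dev/urandom is an endless byte sequence; read just the first 8 bytes.
rand_bytes=$(dd if=/dev/urandom bs=1 count=8 2>/dev/null | wc -c | tr -d ' ')

# A pipe has no storage behind it: one process writes bytes, the other reads them.
piped=$(printf 'abc' | tr 'a-z' 'A-Z')

echo "read from /dev/null: $null_bytes bytes"
echo "read from /dev/urandom: $rand_bytes bytes"
echo "through a pipe: $piped"
```

No disk is involved in any of these, yet the programs read and write them exactly as they would a regular file.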

Other topics for an OS class

  • Permissions and security
  • Implementing file systems
  • Kernel design, layered approaches
  • Block vs. character devices
  • Sockets and networking

Key takeaways

A file is an abstraction

  • most commonly associated with persistent storage hardware, but a file does not have to mean data on a disk (and you can have data on a disk without a file)
  • it is an abstraction for reading and writing sequences of bytes, which can apply to any I/O (in C, unistd.h and fcntl.h declare the UNIX system calls, and stdio.h declares the libc functions layered on top of them)

How file hierarchies work

  • directories are special files that store mappings from names to other files
  • if a directory contains a mapping to another directory file, we have a directory hierarchy

Referring to files with paths

  • referring to files using relative and absolute naming (using the unix convention)
  • relative paths are relative to the current working directory, which is stored with a running program (process)

Author: Paul Gazzillo

Created: 2024-08-27 Tue 10:27
