The File System
COP-3402
The file browser
When you think of a file system, you might imagine something like this: the file browser.
File browsers or explorers let you see what files and directories are available, open, copy, move, delete them, etc.
But this is just a view of the file system. In reality, files do much more under the hood of the operating system.
The File Abstraction
Files provide a way to group sequences of binary data.
That's basically all files are. No naming, no hardware specifics, no file formats.
Why is it an abstraction?
Files capture what's common about storage hardware.
What are some different data storage technologies?
- Magnetic, optical, flash, RAM
What's different about them?
What's common about them?
- They provide a way to read and write binary on a physical medium
Files abstract away the specifics of using different hardware technologies. As we'll see, the abstraction can be applied to other hardware, such as network devices, and even used to create non-physical files, such as pipes.
File systems group together data on a storage medium, providing a large addressable sequence of bytes (usually as blocks).
How is the abstraction used?
Reading bytes of data
Writing bytes of data
Abstractions are defined by how they are used.
We need to be able to write data, extending the size of files, delete data, and read it, independently of where it may lie on the physical storage medium.
What other common operations can we do on files? Seek? Open/close?
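The operations named above map onto a handful of UNIX system calls. As a minimal sketch (the function name `write_then_read_at` and the permission bits `0644` are illustrative choices, not something from the course), the following opens a file, writes bytes, seeks to an offset, and reads from there:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` to `path`, seeks back to byte `offset`,
 * and reads up to `cap - 1` bytes into `out` (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
ssize_t write_then_read_at(const char *path, const char *text,
                           off_t offset, char *out, size_t cap)
{
    if (cap == 0)
        return -1;

    /* open: ask the kernel for a file descriptor */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write: store bytes at the current position, extending the file */
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* seek: move the current position without touching the data */
    if (lseek(fd, offset, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    /* read: copy bytes starting from the current position */
    ssize_t n = read(fd, out, cap - 1);
    if (n >= 0)
        out[n] = '\0';

    close(fd);
    return n;
}
```

Calling it from `main` with the text `"hello, world!\n"` and offset 7 should leave `"world!\n"` in the output buffer. Nothing in the code says where on the storage medium the bytes live.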
Do other devices besides storage hardware fit this abstraction?
We'll be using the UNIX-style file abstraction in this class.
- Used in all modern OSes (GNU/Linux, macOS, Windows)
Looking at a file's contents
hexdump -C hello.c
#include <stdio.h>

int main() {
  printf("hello, world!\n");
}

00000000  23 69 6e 63 6c 75 64 65  20 3c 73 74 64 69 6f 2e  |#include <stdio.|
00000010  68 3e 0a 0a 69 6e 74 20  6d 61 69 6e 28 29 20 7b  |h>..int main() {|
00000020  0a 20 20 70 72 69 6e 74  66 28 22 68 65 6c 6c 6f  |.  printf("hello|
00000030  2c 20 77 6f 72 6c 64 21  5c 6e 22 29 3b 0a 7d 0a  |, world!\n");.}.|
00000040
Notice the hexadecimal numbers, which represent the binary data, and the text on the right, which is the same data interpreted as ASCII codes (see man ascii).
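To see that there is nothing special behind such a dump, here is a rough sketch of one in C. The function name `dump_file` is made up for this example, and the layout is simplified compared to the real hexdump -C (no extra gap after the eighth byte):

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: prints the bytes of `path` to `out`, 16 per row:
 * offset, hex values, then the same bytes as ASCII ('.' for bytes that
 * are not printable). Returns the number of bytes dumped, or -1 if the
 * file cannot be opened. */
long dump_file(const char *path, FILE *out)
{
    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    unsigned char row[16];
    long offset = 0;
    size_t n;

    while ((n = fread(row, 1, sizeof row, in)) > 0) {
        fprintf(out, "%08lx  ", (unsigned long)offset);
        for (size_t i = 0; i < sizeof row; i++) {
            if (i < n)
                fprintf(out, "%02x ", row[i]);
            else
                fprintf(out, "   ");   /* pad a short final row */
        }
        fprintf(out, " |");
        for (size_t i = 0; i < n; i++)
            fputc(isprint(row[i]) ? row[i] : '.', out);
        fprintf(out, "|\n");
        offset += (long)n;
    }
    fprintf(out, "%08lx\n", (unsigned long)offset);  /* total size in bytes */

    fclose(in);
    return offset;
}
```

Calling `dump_file("hello.c", stdout)` from `main` prints one row per 16 bytes. The file only ever hands back bytes; reading them as ASCII text is the program's choice.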
What about file extensions?
Extensions are a convention used by applications
Files are oblivious to their extension, .c, .txt, .mp3, etc.
The map is not the territory
What does a file abstraction that doesn't capture names or file formats mean for file extensions?
The file command
file looks at the contents of the file instead of the extension
file hello.c
file stomping_grounds.mp3
cp hello.c hello.mp3
cp stomping_grounds.mp3 stomping_grounds.c
file hello.mp3
file stomping_grounds.c
xdg-open hello.mp3
xdg-open stomping_grounds.c
xdg-open hello.c
xdg-open stomping_grounds.mp3
file is not bullet-proof. Some malware will modify the magic bytes to avoid detection.
Some file types
Type | Contents |
---|---|
Text files | strings of characters |
Program files | sequences of machine code |
Images, music, etc. | sequences of bytes in a format recognized by applications |
Magic bytes
file stomping_grounds.mp3
hexdump -C stomping_grounds.mp3 | head
gcc -o hello hello.c
file hello
hexdump -C hello | head
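A toy version of this check can be sketched in C. The function name `guess_type` and its return strings are invented for illustration, and it knows only two signatures: the ELF header (byte 0x7f followed by "ELF") and the "ID3" tag that often starts MP3 files. The real file command consults a much larger database.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: guesses a file type from its first bytes and
 * ignores the file name entirely. Only two signatures are checked. */
const char *guess_type(const char *path)
{
    unsigned char head[4] = {0};

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return "unreadable";
    size_t n = fread(head, 1, sizeof head, in);
    fclose(in);

    /* "\x7f" "ELF" is split so the E is not parsed as a hex digit */
    if (n >= 4 && memcmp(head, "\x7f" "ELF", 4) == 0)
        return "ELF executable or object";
    if (n >= 3 && memcmp(head, "ID3", 3) == 0)
        return "audio with an ID3v2 tag (often MP3)";
    return "unknown";
}
```

Because only the contents are inspected, copying an MP3 to a name ending in .c does not change the answer. It also means anyone who can edit the first few bytes can fool a check like this.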
- Email attachment: LOVE-LETTER-FOR-YOU.TXT.vbs
- .vbs hidden, so users clicked on what they thought was a textfile
- .vbs is a script that gets executed
By Mario23, Public Domain, https://commons.wikimedia.org/w/index.php?curid=19189003
Anatomy of an Attack: Detecting and Defeating CRASHOVERRIDE
EXEC xp_cmdshell 'move C:\Delta\m32.txt C:\Delta\m32.exe';
How do files get named?
If the file itself doesn't store its own name, how do files get their name?
Directories
- Directories map the names of files to the file
- Think pointers in C
- Think name to phone number
The map is not the territory
Files are given unique IDs
- OS kernel assigns ID to file
- Called inode numbers
Take an operating systems class, or read about operating systems, to find out how file systems are implemented.
Why separate the name from the file?
- Renaming is easy
- We can give many names to the same file (links)
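Giving one file two names can be tried directly with the link() system call. This is a sketch under the assumption that both paths are on the same file system; the function name `same_file_two_names` is hypothetical:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates an empty file at `name`, gives it a second
 * name `alias` with link(), and reports whether both names resolve to the
 * same inode. Returns 1 if they do, 0 if not, -1 on error. */
int same_file_two_names(const char *name, const char *alias)
{
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    unlink(alias);                 /* ignore failure: alias may not exist yet */
    if (link(name, alias) != 0)    /* add a second directory entry */
        return -1;

    struct stat a, b;
    if (stat(name, &a) != 0 || stat(alias, &b) != 0)
        return -1;

    /* Same device and same inode number means the same underlying file. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

After the call, removing one of the two names leaves the data reachable through the other, since the name was never part of the file.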
Directories themselves are also files
A directory behaves exactly like an ordinary file except that it cannot be written on by unprivileged programs, so that the system controls the contents of directories. -The UNIX Time-Sharing System
Example directory file contents
Name | inode number |
---|---|
hello | 44214038 |
hello.c | 44214011 |
stomping_grounds.mp3 | 44214055 |
ls -i
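Roughly the same listing can be produced from C by reading the directory's entries. A minimal sketch, assuming a POSIX system where struct dirent exposes d_ino (the function name `list_inodes` is made up here):

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints "inode  name" for every entry in the
 * directory `path`. Returns the number of entries, or -1 on error. */
int list_inodes(const char *path, FILE *out)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* each entry is one row of the name-to-inode table */
        fprintf(out, "%10llu  %s\n",
                (unsigned long long)entry->d_ino, entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

Calling `list_inodes(".", stdout)` from `main` prints one name and inode number per line, in whatever order the file system returns them.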
Move vs. rename
- How do we rename a file?
- How do we move a file?
Renaming and moving are actually the same command in UNIX: mv, for "move".
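At the system-call level, both cases can be handled by rename(), which only changes directory entries. The sketch below (the function name `rename_keeps_inode` is hypothetical) checks that the inode number survives the operation. It assumes both paths are on the same file system; across file systems rename() fails, and mv falls back to copying.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates `from`, renames it to `to`, and checks that
 * the inode number is unchanged and the old name is gone.
 * Returns 1 if so, 0 if not, -1 on error. */
int rename_keeps_inode(const char *from, const char *to)
{
    int fd = open(from, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    struct stat before, after, gone;
    if (stat(from, &before) != 0)
        return -1;
    if (rename(from, to) != 0)     /* works for "rename" and for "move" */
        return -1;
    if (stat(to, &after) != 0)
        return -1;

    /* the old name must no longer resolve, and the file must be the same */
    return before.st_ino == after.st_ino && stat(from, &gone) != 0;
}
```

Passing a `to` path in the same directory is a rename; passing one in another directory is a move. In both cases the file's contents are never touched.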
Can directories contain other directories?
Yes! Directories are also files
What happens when directories contain directories?
We have a hierarchical file system
File system hierarchy
Conventions
- The root directory is called / (forward slash)
- We also separate nested directories with /
- All directories contain a . (dot) directory that points to itself
- All directories contain a .. (double dot) directory that points to its parent
Example file system hierarchy
Diagram
- Root directory
- Directory contents via child nodes
- Color code directories vs. files
- Directories can contain other directories
- File tree
- Absolute paths
- Working directory
- Relative paths
- Current directory
- Parent directory
- Links (cross-tree edges)
- Hard vs. soft links
Paths
A path is a string that contains the sequence of directory names along the file tree to the directory that contains the file, followed by the file name itself. Paths let you uniquely identify files, even those with the same name.
Absolute paths
Relative paths
./hello
cd ..
Where are "." and ".." in our file tree?
In order to answer this, we need to introduce a new concept, the working directory.
The working directory
Parent directory
Extending the file abstraction
- Network sockets
- Pipes
- Random number generation, /dev/urandom
- /dev/null
- RAM itself, /dev/mem
- Kernel settings and information /proc
- Graphics
- Peripherals, e.g., keyboard and mouse
- Temperature sensors and other measurement devices
- The file abstraction can be used for all I/O on your system and to interact with all hardware
- In practice other models and implementations of I/O are used (sockets, raw access)
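To make the list above concrete, here is a sketch that uses the same read() and write() calls on a device file and on a pipe. The function names are hypothetical, and the device paths assume a Linux-like system where /dev/urandom and /dev/null exist:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: reads up to `n` bytes from whatever `path` names,
 * whether it is a regular file or a device. Returns bytes read, or -1. */
ssize_t read_from(const char *path, void *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}

/* Hypothetical helper: sends `n` bytes through a pipe and reads them back.
 * Keep `n` small: a write larger than the pipe's buffer would block here,
 * because nothing drains the pipe until the write returns. */
ssize_t through_pipe(const void *in, void *out, size_t n)
{
    int ends[2];   /* ends[0] is the read end, ends[1] is the write end */
    if (pipe(ends) != 0)
        return -1;

    ssize_t got = -1;
    if (write(ends[1], in, n) == (ssize_t)n)
        got = read(ends[0], out, n);

    close(ends[0]);
    close(ends[1]);
    return got;
}
```

Reading a few bytes from /dev/urandom yields random data, while reading from /dev/null immediately reports end of file (a return value of 0). The pipe has no name on disk at all, yet it is read and written like any other file.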
Other topics for an OS class
- Permissions and security
- Implementing file systems
- Kernel design, layered approaches
- Block vs. character devices
- Sockets and networking
Key takeaways
A file is an abstraction
- most commonly known for persistent hardware storage, but a file does not mean data on a disk (you can have that without a file)
- but it is an abstraction for reading and writing sequences of bytes, which can apply to any I/O (unistd.h and fcntl.h declare the UNIX system calls; stdio.h declares the C library's buffered interface built on top of them)
How file hierarchies work
- directories are special files that store mappings from names to other files
- if a directory contains a mapping to another directory file, we have a directory hierarchy
Referring to files with paths
- referring to files using relative and absolute naming (using the unix convention)
- relative paths are relative to the current working directory, which is stored with a running program (process)