Assessment 1 Practice

Question 1 (win20 Midterm 2b)

Write the read_line function for this program. The function should populate buf with characters read from fd until receiving a newline, or until the end of the file. The string in buf should be a properly null-terminated C-string.

int read_line(int fd, char *buf, int buf_len) {
     // reads up to and including a newline (or EOF) from fd
     // replaces newline with null terminator
     // only reads up to buf_len - 1 bytes
     // @param fd: the file descriptor to read from
     // @param buf: the buffer to populate
     // @param buf_len: the size of the buffer
     // @return: the number of bytes read, not including
     //          the newline. Can be 0 if at end of file

     // TODO: Your code here
}

int read_line(int fd, char *buf, int buf_len) {
    int bytes_read = 0;
    while (bytes_read < buf_len - 1) {
        int actually_read = read(fd, buf + bytes_read, 1);
        if (actually_read == 0 || buf[bytes_read] == '\n') break;
        bytes_read++;
    }
    buf[bytes_read] = '\0';
    return bytes_read;
}

Question 2 (win20 Midterm 4) (20-25min)

a) For assignment 2, we saw that a directory is modeled the same as a file, except its contents are a list of unsorted directory entries. A fellow filesystem designer suggests storing the entries in sorted order. In 3 sentences or less for each, explain 1 benefit and 1 drawback in terms of performance of this change.

A benefit is quicker lookup of an entry, as we can use binary search to search for an entry name rather than looping over each entry. A drawback is when we add or rename directory entries, we must potentially re-sort the elements, which is extra work as opposed to the unsorted version.

b) We have seen that an inode stores only 8 block numbers, and if a file has more block numbers than this, we must use approaches such as singly and doubly indirect blocks to store them. A fellow filesystem designer suggests adding more space in each inode to store block numbers. In three sentences or less for each, explain 1 benefit and 1 drawback in terms of performance of increasing the number of block numbers stored in an inode.

One benefit of this is files are stored more directly, as we can fit more block numbers directly (or indirectly) in an inode. A drawback of this is more space may be taken up on disk by non-file data.

c) For each bullet below, specify a single range of file block indexes (as a number or mathematical expression) that could be stored there, assuming the same filesystem implementation as in assign2. File block index is defined the same as in assign2 - it is a data block index for the file, starting at 0; so the first block of file data has number 0, the second block of file data has number 1, etc. As an example, if a bullet said "the last three direct blocks in an inode using direct addressing", the answer would be 5 - 7, since file block index 5 is stored in the 3rd to last block, and index 7 in the last block. You do not need to explain your answers.

The first singly-indirect block in an inode using indirect addressing
The doubly-indirect block in an inode using indirect addressing

The first singly-indirect block in this case would hold file block numbers 0 - 255, inclusive. The doubly-indirect block in this case would hold block numbers 7 * 256 through 7 * 256 + 256 * 256 - 1, inclusive.

Question 3 (spr19 Midterm 1)

a) Write a function, writeBuf that uses the write function to print an entire buffer to a file descriptor (you are not allowed to use the dprintf function for this question).

/* Function: writeBuf
 * ------------------
 * Writes the contents of buf with size len to the file descriptor fdOut
 * @param fdOut - the file descriptor where the data will be written
 * @param buf - the buffer to write
 * @param len - the number of bytes in the buffer
 */
void writeBuf(int fdOut, char *buf, ssize_t len) {
    // TODO
}

void writeBuf(int fdOut, char *buf, ssize_t len) {
    ssize_t bytesWritten = 0;
    while (bytesWritten < len) {
        bytesWritten += write(fdOut, buf + bytesWritten, len - bytesWritten);
    }
}

b) Write a function, dualEcho that reads from fdIn and writes to both fdOut1 and fdOut2 until fdIn closes. If either fdOut1 or fdOut2 has a value of -1, you should not write any data to that file descriptor. You can assume that your writeBuf function from part (a) works perfectly. You are not allowed to use any dynamic memory for this function, and you cannot presume to know how much data will be read from fdIn.

/* Function: dualEcho
 * ------------------
 * Writes all input from fdIn to both fdOut1 and fdOut2
 * @param fdIn - the file descriptor to read from (until EOF)
 * @param fdOut1 - the first output file descriptor (ignore if -1)
 * @param fdOut2 - the second output file descriptor (ignore if -1)
 */
void dualEcho(int fdIn, int fdOut1, int fdOut2) {
    // TODO
}

void dualEcho(int fdIn, int fdOut1, int fdOut2) {
    // read from fdIn and echo to fdOut1 and fdOut2
    // if either fdOut1 or fdOut2 are -1, don't write
    char buf[1024];
    while (true) {
        ssize_t bytesRead = read(fdIn, buf, sizeof(buf));
        if (bytesRead <= 0) break;
        if (fdOut1 != -1) {
            writeBuf(fdOut1, buf, bytesRead);
        }
        if (fdOut2 != -1) {
            writeBuf(fdOut2, buf, bytesRead);
        }
    }
}

Question 4 (spr19 Miderm 4) (30min)

Recall that the Unix v6 filesystem has 512 byte sector sizes, with a 32-byte inode per file, 2-byte block id numbers, and 16-byte struct direntv6 entries. Use this information to answer the following questions. All answers must be 100 words or less for credit.

a) Newer hard drives and file systems sometimes define the sector size to be 4096 bytes. Describe the pros and cons of a larger sector size.

Pros of a 4096 byte sector size:

Files fit into fewer blocks, meaning that there is less overhead per file.
It takes less time to read files, because there are fewer blocks to look through - There would be less disk fragmentation because files would have fewer blocks

Con of a 4096 byte sector size:

Every file takes at least 4096 bytes, so if there are a lot of small files, this can waste space.

b) Assume a new file system changed the size of an inode to 64 bytes. Explain how you would use the extra 32 bytes for the greatest benefit of the file system. List one negative of having a larger inode size.

Possible uses for extra 32-bytes

more direct or indirect blocks, so files can be bigger
more metadata about the files
for very small files, keep the file data itself in the inode

Tradeoffs to having larger inode

metadata is overhead, which takes away from usable payload space
more complicated handling of files could be inefficient

c) What is the main limitation to a 2-byte block id number? Name a tradeoff to having a 4-byte block number.

Main limitation to a 2-byte inumber: file count is limited to 2^16 (64K) files for the entire file system

Tradeoff for 4-byte block number: Needs more metadata space (2x) to hold the numbers themselves.

d) Explain how a directory is "just a regular file" in terms of the Unix v6 filesystem.

A directory is just a file in the sense that it is accessed like any other file (with direct and indirect blocks), and it simply holds payload data that refers to other files. It isn’t a separate part of the file system, and there is no need for the filesystem code to treat a directory differently when reading and writing to their associated files.

Question 5 (Practice midterm 1 1) (45-50min)

Your answers to the following questions should be 50 words or less. Responses longer than 50 words will receive 0 points. You needn’t write in complete sentences provided it’s clear what you’re saying. Full credit will only be given to clear, correct, complete, relevant responses. Just because everything you write is technically true doesn’t mean you get all the points.

a) The block layer of a file system needs to keep track of which blocks are allocated and which ones aren’t. One scheme sets aside a portion of the underlying device to store a bitmap, where a single bit notes that some block is free if it’s 0, or allocated if it’s 1. The 0th bit tracks the allocation status of the 0th block beyond the bitmap, the 1th bit tracks the allocation status of the 1th block, and so forth. Another scheme might thread all of the unallocated blocks into a linked list (called a free list), where the first four bytes (i.e. the improvised next pointer) of each unallocated block store the block number of the next unallocated block in the free list, and so forth. The super block could store the block number of the first unallocated block in this list, and the last unallocated block could store 0 in its next pointer to signal the end of the free list. Briefly present one advantage of each approach.

Advantages of the bitmap scheme:

no lseek required
more cache friendly

Advantages of the free list approach:

zero memory footprint/overhead (clearly the better answer of the two I’m presenting)
arguably simpler to implement (though eventually both of these are equally easy to implement, so this is the weaker of the two answers)

b) If block sizes are 1024 (or 2^10) bytes and inodes are 32 (or 2^5) bytes, what percentage of the storage device should be allocated for the inode table if we never want to run out of inodes? Your answer can be approximate, and the 50-words-or-less defense of your answer should include the necessary math. Assume that all files require at least one block of payload, and assume a minimum storage device size of 1 terabyte (or 2^40 bytes).

One block of inodes can lead to 2^5 files, each with at least one block. 1 out of every 2^5 blocks must store inodes. That’s about 3%. (Arguments involving 16 bytes of dirent structure are also good, but remember that answer only needed to be approximate.)

c) A soft symbolic link is an alias for another pathname (e.g. cs110 could be a soft symbolic link in your home directory that aliases /usr/class/cs110). Identify two of the layers contributing to your assignment file system you would need to change (and how you would change them) to support this new file type.

inode layer would need to change to include a new file type
pathname layer would need to change to replace the symbolic link component with its expansion
other layers might be impacted, but above two layers require the most obvious changes.

d) A variant of the open system call—called openat—is equivalent to open unless the path argument is relative. In the case where the provided path argument is relative, the file to be opened is relative to the directory associated with the provided descriptor (instead of the current working directory). Leveraging your understanding of how open works with the file descriptor tables, the file entry table, and vnode entry table, and the layers of the file system, briefly explain how openat might be implemented.

int openat(int fd, const char *path, int oflag, ...);

If path is absolute, ignore fd and resolve as usual. If path is relative, drill through fd to file entry to vnode entry to get companion inode number (not the same as the fd), and use that inode number as the start inode (instead of ROOT_INUMBER).

e) Describe a simple modification to your assignment file system that would allow arbitrarily large file names while respecting the fact that most file names are short. We’d like the directory file payload to be reasonably compact.

If the file name is 14 or more characters, let the companion inumber refer to a new type of inode (similar to that used for symlinks in part 2b) or let it identify a block (or the head of a linked list of blocks) containing the full string.

Question 6 (Practice Midterm 2 2) (20-25min)

Unless otherwise noted, your answers to the following questions should be 50 words or fewer. Responses longer than 50 words will receive 0 points. You needn’t write in complete sentences provided it’s clear what you’re saying. Full credit will only be given to the best of responses. Just because everything you write is true doesn’t mean you get all the points.

b) Explain what happens when you type cd ../.. at the shell prompt. Frame your explanation in terms of your file system assignment and the fact that the inode number of the current working directory is the only relevant global variable maintained by your shell.

Search cwd’s payload for .., set inumber of cwd to inumber associated with ..; repeat one more time.

c) Consider the prototype for the flock system call, which is as follows:

int flock(int fd, int op);

flock can be used to gain exclusive access to the file session bound to fd. The op parameter can (for the purposes of this problem) be one of two constants, and those constants are:

LOCK_EX, which is a request to grab exclusive access to a file session that should be respected by all other processes. If the resource isn’t locked at the time of the call, then it is locked and flock returns right away. If the resource is locked, then the process blocks within the flock call until the lock is lifted by another process.
LOCK_UN, which releases the lock held on a resource (or is a no-op if the lock wasn’t held in the first place).

Explain why information about the locked state of a file session needs to be stored in a file entry table instead of a file descriptor table.

For lock to be respected by all processes, info must be stored in data structure referenced by all processes.

e) Your assignment file system relied on direct indexing for small files and singly and doubly indirect indexing for large files. In the name of code uniformity, you could have just represented all files, large and small, using doubly indirect indexing. Briefly describe the primary advantage (other than uniformity of implementation) and primary disadvantage of relying on doubly indirect indexing for all file sizes.

Advantage: supports even larger files than original design.

Disadvantage: accessing payload for very small files requires many disk accesses, cache unfriendly

Question 7 (Practice Midterm 3 2) (20min)

Unless otherwise noted, your answers to the following questions should be 75 words or fewer. You needn’t write in complete sentences provided it’s clear what you’re saying. Full credit will only be given to the best of responses. Just because everything you write is true doesn’t mean you get all the points.

a) remove is a C library function that removes a name from the file system. If the supplied name was the last one to identify the file, then the file itself is truly deleted and its resources donated back for reuse. In the context of your assignment filesystem design, explain how the file system would need to be updated to fully realize a call to remove.

Find relevant directory entry in parent directory’s payload, remove entry after decrementing the corresponding inode’s reference count. If that reference count falls to zero, deallocate all payload blocks and mark inode as free.

b) Recall that the struct inode from the filesystem assignment looked like this:

struct inode { // some fields irrelevant to the problem are omitted
    uint16_t i_mode; // bit vector of file type and permissions
    uint8_t i_size0; // most significant byte of size
    uint16_t i_size1; // lower two bytes of size
    uint16_t i_addr[8]; // device addresses constituting file
};

One CS110 student once proposed the following idea: for files with sizes that are just slightly larger than a perfect multiple of the block size (e.g. 1027, when the block size is 256), those last few bytes (e.g. the last three bytes of a 1027-byte file) could be stored in the inode itself. Describe how you would support this optimization so that entire blocks needn’t be allocated just to store a few bytes of payload.

Store last few bytes in the unused i_addr entries. In the case of the 1027-byte file, those three extra bytes could be dropped in the space set aside for i_addr[4] and the first byte of i_addr[5]. It’s easy to discern from the overall file size how many extra bytes there are and if they’ll fit in unused i_addr entries.

Question 8 (Practice Midterm 4 3) (20-25min)

a) The rename system call renames a file, moving it from one directory to another if necessary. It comes with the following prototype:

int rename(const char *ep, const char *np);

ep is short for existing path, and we’ll assume it’s an absolute path to a valid file you have permission to rename. np (short for new path, and also absolute) identifies where the file should be moved to and what new name it should assume. Any intermediate directories needed for the move are created. So, a call to

rename("/WWW/index.html","/archive/winter-2019/index-w19.html");

would remove index.html from WWW and move it to archive/winter-2019, creating archive and winter-2019 if necessary, with the name of index-w19.html. The renaming works even if the file being moved is a directory or a symbolic link. Without worrying about error checking, describe how rename could be efficiently implemented in terms of your assignment file system.

Find the inode number of the file being moved by iteratively drilling toward it as pathname_lookup would. Remove file’s dirent from parent directory payload, and then drill toward new parent directory of second argument, creating new dirent's along the way as needed. Finally, append new dirent on behalf of new name, with new name (e.g. index-w19.html) and existing inode number.

b) Consider the following program, which creates and opens a file in read-write mode, writes five characters to the file, and then attempts to read a single character through the same descriptor:

int main(int argc, char *argv[]) {
    int f = open("foo.txt", O_RDWR | O_CREAT | O_EXCL, 0644); 
    write(f, "abcde", 5);
    char ch;
    int count = read(f, &ch, 1);
    if (count == 0) {
        printf("No data!\n"); 
    } else {
        printf("Got this: %c\n", ch);
    }
    close(f);
    return 0;
}

When the above program runs, the output is No data!. Given the output, and leveraging what you know about descriptor, open file, and vnode tables, explain why the output is what it is.

Open file table entry contains a single cursor, which is left to reference the end of the file by the one write operation. That means the read call attempts to pull from the end of the file, so it returns 0.

CS110: Principle of Computer Systems, Winter 2022