May 24th Lecture
================

[ Multiplexing I/O ]

Consider an X server, which must:
  * read from the keyboard
  * read from the mouse
  * write to the framebuffer
  * read from network connections (multiple)
  * write to network connections (multiple)

One option: NBIO / polling
  - inefficient: the CPU spins; we would like to keep the process off the
    ready list while nothing is ready

Another option: fork a process for each I/O source
  * set up n+1 pipes for n children
    - one for all children to talk to the server
    - one per child for server -> child communication
  * high overhead in context switches, memory requirements, etc.

Another option: signals on I/O arrival
  * expensive to catch; no good for high-I/O processes

What we want: a query mechanism to find out which descriptors in a set are
actually ready, or block if none are ready.

select()  (see the sketch at the end of these notes)
  * process passes 3 bitmaps: read, write, exceptions
  * returns as soon as a descriptor in any of the sets is ready
  * can set a timeout
  * O(n)

Other interfaces:
  * poll()   -- portable, similar to select(), O(n)
  * epoll    -- Linux only, more efficient, O(1)  (sketch at end of notes)
  * kqueue() -- FreeBSD only, more efficient, O(1)

[ Vectored I/O: Scatter / Gather ]

    ssize_t readv(int fd, const struct iovec* iov, int iovcnt);
    ssize_t writev(int fd, const struct iovec* iov, int iovcnt);

    struct iovec {
        void*  iov_base;
        size_t iov_len;
    };

Compare to:

    ssize_t read(int fd, void* buf, size_t nbytes);

All kernel-internal I/O is done with iovecs.

Advantages (see the writev() sketch at the end of these notes):
  * atomicity
  * fewer syscalls, without copying data into one contiguous buffer
  * header/body splitting/collection (e.g., in a webserver)

[ VFS ]

In early OSes, an entry in the fd table pointed to a structure holding
fs-specific information (the inode); there was only one filesystem for all
storage.

Now we have different filesystems for different purposes:
  * large files vs. small files
  * random access vs. sequential access
  * read-only vs. read-write
  * local storage vs. remote storage

A new vnode layer was added above the inode layer (see the vnode sketch at
the end of these notes):
  * extensible, object-oriented interface
  * common filesystem information:
    - type (VREG, VDIR, VLNK)
    - mode (permissions)
    - owner UID, GID
    - file size
    - access time
    - modification time
  * operations:
    - create
    - remove
    - lookup
    - link
    - rename
    - mkdir
    - rmdir
    - getpages
    - putpages

Every file/directory in active use is represented by a vnode object in
kernel memory.

    syscall -> fd
    fd -> open file entry
    open file entry -> vnode
    vnode -> {inode, socket, etc.} -> storage

vnodes are reference counted by the structures that point to them:
  * open file table entries
  * the "current directory" field of each process
  * FS mount points

Each FS keeps a list of its vnodes; the VFS layer maintains an overall free
list.

Stackable Filesystems:
  * vnodes can be stacked on top of each other to perform transformations on
    the data (see the null-layer sketch at the end of these notes)
    - compression: read/write/size are transformed; date, permissions, etc.
      pass through unchanged
    - encryption
    - mirroring / transparent redundancy
    - null (just pass everything to the lower layer) -- used for mounting

Union FS: merge two filesystems; modifications are made only to the second
one (see the whiteout sketch at the end of these notes)
  * NFS-mounted source repositories
  * CD-ROM as root FS (e.g., a live CD)
  * to delete a file, use a "whiteout" -- a special entry in FS #2 that
    removes the entry from the directory listing
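
[ Appendix: Illustrative Sketches ]

To make the select() discussion concrete, here is a minimal sketch of a
server loop multiplexing the keyboard and one network connection.  The
descriptor "sock" and the function name "serve" are hypothetical; this is an
illustration of the interface, not code from the lecture.

    /* Minimal select() loop: block until the keyboard or the socket is
     * readable, or until the (optional) timeout expires. */
    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    void serve(int sock)                      /* sock: an already-open fd */
    {
        for (;;) {
            fd_set readfds;
            FD_ZERO(&readfds);
            FD_SET(STDIN_FILENO, &readfds);   /* watch the keyboard */
            FD_SET(sock, &readfds);           /* watch the network  */

            int maxfd = (sock > STDIN_FILENO) ? sock : STDIN_FILENO;
            struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };   /* timeout */

            /* O(n): the kernel scans the bitmaps up to maxfd+1. */
            int n = select(maxfd + 1, &readfds, NULL, NULL, &tv);
            if (n < 0) { perror("select"); return; }
            if (n == 0) continue;             /* timed out, nothing ready */

            char buf[512];
            if (FD_ISSET(STDIN_FILENO, &readfds))
                read(STDIN_FILENO, buf, sizeof buf);   /* known to be ready */
            if (FD_ISSET(sock, &readfds))
                read(sock, buf, sizeof buf);
        }
    }

Note that select() modifies the bitmaps in place, so they must be rebuilt
before every call -- part of the reason the interface is O(n).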
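
For comparison, a sketch of the same loop with Linux epoll.  The interest set
is registered with the kernel once, and each epoll_wait() returns only the
descriptors that are ready, so the per-wakeup work does not grow with the
number of watched descriptors.  Again, "sock" and "serve_epoll" are
hypothetical names used only for the sketch.

    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    void serve_epoll(int sock)
    {
        int epfd = epoll_create1(0);          /* create the interest set */
        if (epfd < 0) { perror("epoll_create1"); return; }

        struct epoll_event ev = { .events = EPOLLIN };
        ev.data.fd = sock;
        epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);           /* register once */
        ev.data.fd = STDIN_FILENO;
        epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev);

        for (;;) {
            struct epoll_event events[16];
            int n = epoll_wait(epfd, events, 16, 5000 /* ms */);
            if (n < 0) { perror("epoll_wait"); break; }

            for (int i = 0; i < n; i++) {     /* only ready fds come back */
                char buf[512];
                read(events[i].data.fd, buf, sizeof buf);
            }
        }
        close(epfd);
    }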
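
A sketch of header/body collection with writev(), in the spirit of the
webserver example above: the HTTP header and the file body live in separate
buffers but go out in a single syscall, without first copying them into one
contiguous buffer.  The function name "send_response" and its arguments are
hypothetical.

    #include <sys/uio.h>

    ssize_t send_response(int sock, const char* body, size_t body_len)
    {
        static const char header[] =
            "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n";

        struct iovec iov[2];
        iov[0].iov_base = (void*)header;       /* gather piece 1: header   */
        iov[0].iov_len  = sizeof header - 1;   /* exclude the trailing NUL */
        iov[1].iov_base = (void*)body;         /* gather piece 2: body     */
        iov[1].iov_len  = body_len;

        /* One syscall; the two pieces appear on the wire back to back. */
        return writev(sock, iov, 2);
    }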
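
The vnode layer's object-oriented interface can be pictured roughly as a
table of function pointers per filesystem.  The declarations below are a
simplified, hypothetical sketch, not the actual BSD definitions; real vnode
structures carry locks, flags, and many more operations.

    #include <sys/types.h>

    enum vtype { VREG, VDIR, VLNK };          /* regular file, dir, symlink */

    struct vnode;                              /* forward declaration */

    struct vnode_ops {                         /* hypothetical op table */
        int (*lookup)(struct vnode* dir, const char* name, struct vnode** out);
        int (*create)(struct vnode* dir, const char* name, mode_t mode);
        int (*remove)(struct vnode* dir, const char* name);
        int (*getpages)(struct vnode* vp, off_t off, size_t len, void* pages);
        /* ... link, rename, mkdir, rmdir, putpages, ... */
    };

    struct vnode {
        enum vtype              v_type;     /* VREG, VDIR, VLNK, ...         */
        int                     v_refcount; /* held by fd table, cwd, mounts */
        const struct vnode_ops* v_ops;      /* fs-specific implementation    */
        void*                   v_data;     /* e.g., pointer to the inode    */
    };

    /* The VFS layer dispatches through the table without knowing the fs type. */
    static int vfs_lookup(struct vnode* dir, const char* name, struct vnode** out)
    {
        return dir->v_ops->lookup(dir, name, out);
    }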
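
A stackable null (pass-through) layer, reusing the hypothetical vnode types
from the previous sketch: the upper vnode remembers the vnode of the layer
below and forwards every operation to it.  A compression or encryption layer
would transform the data in its read/write operations instead of passing it
straight through.

    struct null_node {
        struct vnode* lower;        /* vnode of the filesystem underneath */
    };

    static int null_lookup(struct vnode* dir, const char* name,
                           struct vnode** out)
    {
        struct null_node* nn = dir->v_data;
        /* Nothing to transform: just hand the call to the layer below. */
        return nn->lower->v_ops->lookup(nn->lower, name, out);
    }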
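
Finally, a sketch of union-mount lookup with whiteouts, again using the
hypothetical vnode types above.  Names are resolved in the writable layer
(FS #2) first; a whiteout entry there hides the name even if it still exists
in the read-only layer (FS #1).  EWHITEOUT is an invented error code for the
sketch, not a real errno value.

    #include <errno.h>

    #define EWHITEOUT 200           /* hypothetical: "name is whited out" */

    static int union_lookup(struct vnode* upper, struct vnode* lower,
                            const char* name, struct vnode** out)
    {
        int err = upper->v_ops->lookup(upper, name, out);
        if (err == 0)
            return 0;               /* found in the writable layer          */
        if (err == EWHITEOUT)
            return ENOENT;          /* deleted: whiteout masks the lower FS */
        return lower->v_ops->lookup(lower, name, out);   /* fall through    */
    }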