
Windows Internals [ PART II ]

Published by Willington Island, 2021-09-03 14:56:13

Description: [ PART II ]

See how the core components of the Windows operating system work behind the scenes—guided by a team of internationally renowned internals experts. Fully updated for Windows Server® 2008 and Windows Vista®, this classic guide delivers key architectural insights on system design, debugging, performance, and support—along with hands-on experiments to experience Windows internal behavior firsthand.

Delve inside Windows architecture and internals:


Understand how the core system and management mechanisms work—from the object manager to services to the registry

Explore internal system data structures using tools like the kernel debugger

Grasp the scheduler's priority and CPU placement algorithms

Go inside the Windows security model to see how it authorizes access to data

Understand how Windows manages physical and virtual memory

Tour the Windows networking stack from top to bottom—including APIs, protocol drivers, and network adapter drivers


15. Container size: 10 Mb
16. Total log capacity: 20 Mb
17. Total free log space: 14 Mb
18. Minimum containers: 2
19. Maximum containers: 20
20. Log growth increment: 2 container(s)
21. Auto shrink: Not enabled
22. RM prefers availability over consistency.

As mentioned, the fsutil resource command has many options for configuring TxF resource managers, including the ability to create a secondary resource manager in any directory of your choice. For example, you can use the fsutil resource create c:\rmtest command to create a secondary resource manager in the Rmtest directory, followed by the fsutil resource start c:\rmtest command to initiate it. Note the presence of the $Tops and $TxfLogContainer* files and of the TxfLog and $Txf directories in this folder.

On-Disk Implementation

As shown earlier in Table 11-5, TxF uses the $LOGGED_UTILITY_STREAM attribute type to store additional data for files and directories that are or have been part of a transaction. This attribute is called $TXF_DATA and contains important information that allows TxF to keep active offline data for a file that is part of a transaction. The attribute is permanently stored in the MFT; that is, even after the file is no longer part of a transaction, the stream remains, for reasons we'll explain shortly. The major components of the attribute are shown in Figure 11-48.

The first field shown is the file reference to the root of the resource manager responsible for the transaction associated with this file. For the default resource manager, the file reference ID is 5, which is the file record for the root directory (\), as shown earlier in Figure 11-25. TxF needs this information when it creates an FCB for the file so that it can link it to the correct resource manager, which in turn needs to create an enlistment for the transaction when a transacted file request is received by NTFS. (For more information on enlistments and transactions, see the KTM section in Chapter 3.)
Another important piece of data stored in the $TXF_DATA attribute is the TxF file ID, or TxID, and this explains why $TXF_DATA attributes are never deleted. Because NTFS writes file names to its records when writing to the transaction log, it needs a way to uniquely identify files in the same directory that may have had the same name. For example, if sample.txt is deleted from a directory in a transaction and later a new file with the same name is created in the same directory (and as part of the same transaction), TxF needs a way to uniquely identify the two instances of sample.txt. This identification is provided by a 64-bit unique number, the TxID, that TxF increments when a new file (or an instance of a file) becomes part of a transaction. Because they can never be reused, TxIDs are permanent, so the $TXF_DATA attribute will never be removed from a file.

Last but not least, three CLFS LSNs are stored for each file that is part of a transaction. Whenever a transaction is active, such as during create, rename, or write operations, TxF writes a log record to its CLFS log. Each record is assigned an LSN, and that LSN gets written to the appropriate field in the $TXF_DATA attribute. The first LSN is used to store the log record that identifies the changes to NTFS metadata in relation to this file. For example, if the standard attributes of a file are changed as part of a transacted operation, TxF must update the relevant MFT file record, and the LSN for the log record describing the change is stored. TxF uses the second LSN when the file's data is modified. Finally, TxF uses the third LSN when the file name index for the directory requires a change related to a transaction the file took part in, or when a directory was part of a transaction and received a TxID.

The $TXF_DATA attribute also stores internal flags that describe the state information to TxF and the index of the USN record that was applied to the file on commit. A TxF transaction can span multiple USN records that may have been partly updated by NTFS's recovery mechanism (described shortly), so the index tells TxF how many more USN records must be applied after a recovery.

Logging Implementation

As mentioned earlier, each time a change is made to the disk because of an ongoing transaction, TxF writes a record of the change to its log.
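As a brief aside, the per-file bookkeeping just described (the resource manager root reference, the permanent TxID, and the three LSNs) can be modeled with a small sketch. All names here are invented for illustration; TxF's real structures are internal to NTFS and are not laid out this way:

```python
from dataclasses import dataclass

@dataclass
class TxfData:
    # Illustrative fields only; this is not the real on-disk layout.
    rm_root: int        # file reference of the resource manager root (5 = "\" for the default RM)
    tx_id: int          # 64-bit TxID: incremented per enlisted file instance, never reused
    metadata_lsn: int   # LSN of the log record describing MFT metadata changes
    data_lsn: int       # LSN of the log record describing file data changes
    dir_index_lsn: int  # LSN of the log record describing file-name-index changes

_next_txid = 0

def enlist(rm_root: int) -> TxfData:
    """Hand out a fresh, never-reused TxID for a file joining a transaction."""
    global _next_txid
    _next_txid += 1
    return TxfData(rm_root, _next_txid, 0, 0, 0)

old = enlist(5)   # sample.txt deleted in a transaction...
new = enlist(5)   # ...then recreated with the same name in the same transaction
assert old.tx_id != new.tx_id  # the TxID tells the two instances apart
```

Because TxIDs only ever grow, the two same-named instances of sample.txt remain distinguishable in the log forever, which is exactly why the attribute is never removed.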
TxF uses a variety of log record types to keep track of transactional changes, but regardless of the record type, all TxF log records have a generic header that contains information identifying the type of the record, the action related to the record, the TxID that the record applies to, and the GUID of the KTM transaction that the record is associated with.

A redo record specifies how to reapply a change that is part of a transaction that's already been committed to the volume if the transaction has actually never been flushed from cache to disk. An undo record, on the other hand, specifies how to reverse a change that is part of a transaction that hasn't been committed at the time of a rollback. Some records are redo-only, meaning they don't contain any equivalent undo data, while other records contain both redo and undo information.

Through the TOPS file, TxF maintains two critical pieces of data, the base LSN and the restart LSN. The base LSN determines the LSN of the first valid record in the log, while the restart LSN indicates at which LSN recovery should begin when starting the resource manager. When TxF writes a restart record, it updates these two values, indicating that changes have been made to the volume and flushed out to disk—meaning that the file system is fully consistent up to the new restart LSN.

TxF also writes compensating log records, or CLRs. These records store the actions that are being performed during transaction rollback (explained next). They're primarily used to store the undo-next LSN, which allows the recovery process to avoid repeated undo operations by bypassing undo records that have already been processed, a situation that can happen if the system fails during the recovery phase and has already performed part of the undo pass. Finally, TxF also deals with prepare records, abort records, and commit records, which describe the state of the KTM transactions related to TxF.

Recovery Implementation

When a resource manager starts because of an FSCTL_TXFS_START_RM call (or, for the default resource manager, as soon as the volume is mounted), TxF runs the recovery process. It reads the TOPS file to determine the restart LSN, where the recovery process should start, and then reads each record forward through the log (called the redo pass). As each record is processed, TxF opens the file referenced by the record and compares the LSN in the $TXF_DATA attribute with the LSN in the record. If the LSN stored in the attribute is greater than or equal to the LSN of the log record, the action is not applied because the on-disk copy of the file is as new as or newer than that of the log record action. If the LSN is not greater than or equal to the LSN in the record, the log contains information about the file that was never written to the file itself. In this case, TxF applies whichever action was recorded in the log record and updates the LSN in the $TXF_DATA attribute with the LSN from the record.

As TxF processes its redo pass, it builds its transaction table, which describes the operations that it has completed; if it encounters an abort or commit record along the way, TxF discards the related transactions. By the end of the redo pass, TxF parses the final transaction table and connects to the KTM to see whether the KTM recorded a commit or an abort for the transactions. (KTM stores this information in the KtmLog stream of the TxF multiplexed log, as explained earlier.) After TxF has finished communicating with the KTM, it looks at any leftover transactions in the transaction table and begins the undo pass.
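The LSN comparison that drives the redo pass can be condensed into a short sketch. This is a toy model with invented names and record shapes, not the NTFS driver's actual logic:

```python
def redo_pass(records, txf_data_lsn):
    """Sketch of the TxF redo pass: a log record is reapplied only when the
    file's $TXF_DATA LSN shows the on-disk copy is older than the record."""
    applied = []
    for rec in sorted(records, key=lambda r: r["lsn"]):  # forward from the restart LSN
        if txf_data_lsn.get(rec["file"], 0) >= rec["lsn"]:
            continue  # on-disk copy is as new as or newer; skip the action
        applied.append((rec["lsn"], rec["action"]))
        txf_data_lsn[rec["file"]] = rec["lsn"]  # stamp the applied LSN back into $TXF_DATA
    return applied

# The file's $TXF_DATA already records LSN 10, so only the LSN-12 record is reapplied.
lsns = {"sample.txt": 10}
log = [{"file": "sample.txt", "lsn": 8,  "action": "write"},
       {"file": "sample.txt", "lsn": 12, "action": "rename"}]
done = redo_pass(log, lsns)
assert done == [(12, "rename")] and lsns["sample.txt"] == 12
```

The stamped-back LSN is what makes the pass safe to interrupt and restart: a record already applied will compare as "not newer" the second time around.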
In the undo pass, TxF aborts all the remaining transactions in the transaction table by traversing each transaction's undo LSN chain and applying the undo action for each log record. At the end of the undo pass, the resource manager is consistent and initialized.

This process is very similar to the log file service's recovery procedure, which is described later in more detail. You should refer to this description for a complete picture of the standard transactional recovery mechanisms.

11.8 NTFS Recovery Support

NTFS recovery support ensures that if a power failure or a system failure occurs, no file system operations (transactions) will be left incomplete and the structure of the disk volume will remain intact without the need to run a disk repair utility. The NTFS Chkdsk utility is used to repair catastrophic disk corruption caused by I/O errors (bad disk sectors, electrical anomalies, or disk failures, for example) or software bugs. But with the NTFS recovery capabilities in place, Chkdsk is rarely needed.

As mentioned earlier (in the section "Recoverability"), NTFS uses a transaction-processing scheme to implement recoverability. This strategy ensures a full disk recovery that is also extremely fast (on the order of seconds) for even the largest disks. NTFS limits its recovery procedures to file system data to ensure that at the very least the user will never lose a volume because of a corrupted file system; however, unless an application takes specific action (such as flushing cached files to disk), NTFS's recovery support doesn't guarantee that user data will be fully updated if a crash occurs. This is the job of Transactional NTFS (TxF). The following sections detail the transaction-logging scheme NTFS uses to record modifications to file system data structures and explain how NTFS recovers a volume if the system fails.

11.8.1 Design

NTFS implements the design of a recoverable file system. These file systems ensure volume consistency by using logging techniques (sometimes called journaling) originally developed for transaction processing. If the operating system crashes, the recoverable file system restores consistency by executing a recovery procedure that accesses information that has been stored in a log file. Because the file system has logged its disk writes, the recovery procedure takes only seconds, regardless of the size of the volume. The recovery procedure for a recoverable file system is exact, guaranteeing that the volume will be restored to a consistent state.

A recoverable file system incurs some costs for the safety it provides. Every transaction that alters the volume structure requires that one record be written to the log file for each of the transaction's suboperations. This logging overhead is ameliorated by the file system's batching of log records—writing many records to the log file in a single I/O operation. In addition, the recoverable file system can employ the optimization techniques of a lazy write file system. It can even increase the length of the intervals between cache flushes because the file system can be recovered if the system crashes before the cache changes have been flushed to disk.
This gain over the caching performance of lazy write file systems makes up for, and often exceeds, the overhead of the recoverable file system's logging activity.

Neither careful write nor lazy write file systems guarantee protection of user file data. If the system crashes while an application is writing a file, the file can be lost or corrupted. Worse, the crash can corrupt a lazy write file system, destroying existing files or even rendering an entire volume inaccessible. The NTFS recoverable file system implements several strategies that improve its reliability over that of the traditional file systems. First, NTFS recoverability guarantees that the volume structure won't be corrupted, so all files will remain accessible after a system failure. Second, although NTFS doesn't guarantee protection of user data in the event of a system crash—some changes can be lost from the cache—applications can take advantage of the NTFS write-through and cache-flushing capabilities to ensure that file modifications are recorded on disk at appropriate intervals.

Both cache write-through—forcing write operations to be immediately recorded on disk—and cache flushing—forcing cache contents to be written to disk—are efficient operations. NTFS doesn't have to do extra disk I/O to flush modifications to several different file system data structures because changes to the data structures are recorded—in a single write operation—in the log file; if a failure occurs and cache contents are lost, the file system modifications can be recovered from the log. Furthermore, unlike the FAT file system, NTFS guarantees that user data will be consistent and available immediately after a write-through operation or a cache flush, even if the system subsequently fails.

11.8.2 Metadata Logging

NTFS provides file system recoverability by using the same logging technique used by TxF, which consists of recording all operations that modify file system metadata to a log file. Unlike TxF, however, NTFS's built-in file system recovery support doesn't make use of CLFS but uses an internal logging implementation called the log file service. Another difference is that while TxF is used only when callers opt in to transacted operations, NTFS records all metadata changes so that the file system can be made consistent in the face of a system failure.

Log File Service

The log file service (LFS) is a series of kernel-mode routines inside the NTFS driver that NTFS uses to access the log file. NTFS passes the LFS a pointer to an open file object, which specifies a log file to be accessed. The LFS either initializes a new log file or calls the Windows cache manager to access the existing log file through the cache, as shown in Figure 11-49. Note that although LFS and CLFS have similar-sounding names, they are separate logging implementations used for different reasons, although their operation is similar in many ways. The LFS divides the log file into two regions: a restart area and an "infinite" logging area, as shown in Figure 11-50.
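To make the two-region layout concrete, here is a toy model of a restart area kept in duplicate plus a finite logging area that is reused circularly while LSNs keep growing. All names and sizes are invented, and unlike the real LFS this sketch does not protect records it still needs:

```python
class LfsSketch:
    """Toy model of the LFS layout: a duplicated restart area and a circular
    logging area addressed by ever-increasing 64-bit LSNs."""
    def __init__(self, slots=8):
        self.restart = [None, None]   # two copies, in case one becomes corrupted
        self.area = [None] * slots    # finite buffer reused circularly
        self.next_lsn = 1

    def write_record(self, rec):
        lsn = self.next_lsn
        self.area[lsn % len(self.area)] = (lsn, rec)  # the slot wraps; the LSN never does
        self.next_lsn += 1
        return lsn

    def write_restart(self, lsn):
        self.restart = [lsn, lsn]     # both copies record where recovery should start

lfs = LfsSketch()
for i in range(10):                   # more records than slots: the file is "reused"
    last = lfs.write_record(f"rec{i}")
lfs.write_restart(last)
assert last == 10 and lfs.restart == [10, 10]
```

The point of the sketch is the separation of concerns: slots wrap around, but the LSN itself only ever increases, which is why 64-bit LSNs make the log appear infinite.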

NTFS calls the LFS to read and write the restart area. NTFS uses the restart area to store context information such as the location in the logging area at which NTFS will begin to read during recovery after a system failure. The LFS maintains a second copy of the restart data in case the first becomes corrupted or otherwise inaccessible. The remainder of the log file is the logging area, which contains transaction records NTFS writes to recover a volume in the event of a system failure. The LFS makes the log file appear infinite by reusing it circularly (while guaranteeing that it doesn't overwrite information it needs). Just like CLFS, the LFS uses LSNs to identify records written to the log file. As the LFS cycles through the file, it increases the values of the LSNs. NTFS uses 64 bits to represent LSNs, so the number of possible LSNs is so large as to be virtually infinite.

NTFS never reads transactions from or writes transactions to the log file directly. The LFS provides services that NTFS calls to open the log file, write log records, read log records in forward or backward order, flush log records up to a particular LSN, or set the beginning of the log file to a higher LSN. During recovery, NTFS calls the LFS to perform the same actions as described in the TxF recovery section: a redo pass for nonflushed committed changes, followed by an undo pass for noncommitted changes.

Here's how the system guarantees that the volume can be recovered:

1. NTFS first calls the LFS to record in the (cached) log file any transactions that will modify the volume structure.

2. NTFS modifies the volume (also in the cache).

3. The cache manager prompts the LFS to flush the log file to disk. (The LFS implements the flush by calling the cache manager back, telling it which pages of memory to flush. Refer back to the calling sequence shown in Figure 11-49.)

4. After the cache manager flushes the log file to disk, it flushes the volume changes (the metadata operations themselves) to disk.

These steps ensure that if the file system modifications are ultimately unsuccessful, the corresponding transactions can be retrieved from the log file and can be either redone or undone as part of the file system recovery procedure.

File system recovery begins automatically the first time the volume is used after the system is rebooted. NTFS checks whether the transactions that were recorded in the log file before the crash were applied to the volume, and if they weren't, it redoes them. NTFS also guarantees that transactions not completely logged before the crash are undone so that they don't appear on the volume.

Log Record Types

The NTFS recovery mechanism uses log record types similar to those of the TxF recovery mechanism: update records, which correspond to the redo and undo records that TxF uses, and checkpoint records, which are similar to the restart records used by TxF. Figure 11-51 shows three update records in the log file. Each record represents one suboperation of a transaction, creating a new file. The redo entry in each update record tells NTFS how to reapply the suboperation to the volume, and the undo entry tells NTFS how to roll back (undo) the suboperation.
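Returning to the four-step logging sequence above, its ordering guarantee (log records reach disk before the metadata changes they describe) can be expressed as a toy model. Class and method names here are invented:

```python
class WalVolume:
    """Toy write-ahead logging sequence: log first, metadata second, and the
    log page flush always precedes the metadata flush."""
    def __init__(self):
        self.cached_log, self.cached_meta = [], []
        self.disk_log, self.disk_meta = [], []

    def transact(self, change):
        self.cached_log.append(change)   # step 1: record the transaction in the cached log
        self.cached_meta.append(change)  # step 2: modify the volume, also in the cache

    def flush(self):
        self.disk_log += self.cached_log     # step 3: log pages go to disk first...
        self.disk_meta += self.cached_meta   # step 4: ...then the metadata changes
        self.cached_log, self.cached_meta = [], []

v = WalVolume()
v.transact("allocate MFT record")
assert v.disk_meta == []                 # a crash here loses nothing the log can't redo or undo
v.flush()
assert set(v.disk_meta) <= set(v.disk_log)  # metadata on disk is always covered by the log
```

The invariant in the last assertion is the whole point of write-ahead logging: no volume change ever reaches disk without a log record that can redo or undo it.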

After logging a transaction (in this example, by calling the LFS to write the three update records to the log file), NTFS performs the suboperations on the volume itself, in the cache. When it has finished updating the cache, NTFS writes another record to the log file, recording the entire transaction as complete—an operation known as committing a transaction. Once a transaction is committed, NTFS guarantees that the entire transaction will appear on the volume, even if the operating system subsequently fails.

When recovering after a system failure, NTFS reads through the log file and redoes each committed transaction. Although NTFS completed the committed transactions from before the system failure, it doesn't know whether the cache manager flushed the volume modifications to disk in time. The updates might have been lost from the cache when the system failed. Therefore, NTFS executes the committed transactions again just to be sure that the disk is up to date.

After redoing the committed transactions during a file system recovery, NTFS locates all the transactions in the log file that weren't committed at failure and rolls back each suboperation that had been logged. In Figure 11-51, NTFS would first undo the T1c suboperation and then follow the backward pointer to T1b and undo that suboperation. It would continue to follow the backward pointers, undoing suboperations, until it reached the first suboperation in the transaction. By following the pointers, NTFS knows how many and which update records it must undo to roll back a transaction.

Redo and undo information can be expressed either physically or logically.
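The commit-then-redo/undo behavior just described can be condensed into a sketch: committed transactions are replayed, and uncommitted ones are rolled back by walking each update record's backward pointer. Record shapes and field names are invented for illustration:

```python
def recover(log):
    """Toy recovery: redo committed transactions; roll back uncommitted ones
    by following backward pointers between their update records."""
    committed = {r["tx"] for r in log if r["type"] == "commit"}
    redo = [r["redo"] for r in log
            if r["type"] == "update" and r["tx"] in committed]
    # Index uncommitted update records by LSN so backward pointers can be chased,
    # and note the last-logged LSN per transaction (a toy "transaction table").
    by_lsn, last = {}, {}
    for r in log:
        if r["type"] == "update" and r["tx"] not in committed:
            by_lsn[r["lsn"]] = r
            last[r["tx"]] = max(last.get(r["tx"], 0), r["lsn"])
    undo = []
    for lsn in last.values():
        while lsn is not None:
            rec = by_lsn[lsn]
            undo.append(rec["undo"])   # undo actions run newest-first
            lsn = rec["prev"]          # backward pointer to the prior suboperation
    return redo, undo

log = [
    {"type": "update", "tx": 1, "lsn": 1, "prev": None, "redo": "r1", "undo": "u1"},
    {"type": "commit", "tx": 1},
    {"type": "update", "tx": 2, "lsn": 3, "prev": None, "redo": "r3", "undo": "u3"},
    {"type": "update", "tx": 2, "lsn": 4, "prev": 3,    "redo": "r4", "undo": "u4"},
]
redo, undo = recover(log)
assert redo == ["r1"] and undo == ["u4", "u3"]
```

Note the asymmetry: redo walks the log forward, while undo follows each transaction's chain backward from its last-logged record, exactly as described for T1c, T1b, and so on above.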
As the lowest layer of software maintaining the file system structure, NTFS writes update records with physical descriptions that specify volume updates in terms of particular byte ranges on the disk that are to be changed, moved, and so on, unlike TxF, which uses logical descriptions that express updates in terms of operations such as "delete file A.dat." NTFS writes update records (usually several) for each of the following transactions:

■ Creating a file

■ Deleting a file

■ Extending a file

■ Truncating a file

■ Setting file information

■ Renaming a file

■ Changing the security applied to a file
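Because recovery can replay update records whose effects already reached disk, redo and undo actions have to tolerate repetition. The contrast between a replay-safe operation and one that is not can be shown with bitmap bits (a toy in plain Python; NTFS's actual record formats are more involved):

```python
def set_bit(bitmap: int, i: int) -> int:
    """Idempotent: applying it again leaves the state unchanged."""
    return bitmap | (1 << i)

def toggle_bit(bitmap: int, i: int) -> int:
    """Not idempotent: a second replay silently reverts the change."""
    return bitmap ^ (1 << i)

state = 0b0001
assert set_bit(set_bit(state, 2), 2) == set_bit(state, 2)  # safe to replay during recovery
assert toggle_bit(toggle_bit(state, 2), 2) == state        # replaying undoes the work
```

A log format built on operations like set_bit can be reapplied blindly; one built on operations like toggle_bit would corrupt the volume if recovery ran a record twice.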

The redo and undo information in an update record must be carefully designed because, whether NTFS is undoing a transaction, recovering from a system failure, or even operating normally, it might try to redo a transaction that has already been done or, conversely, to undo a transaction that never occurred or that has already been undone. Similarly, NTFS might try to redo or undo a transaction consisting of several update records, only some of which are complete on disk. The format of the update records must ensure that executing redundant redo or undo operations is idempotent—that is, that repeating them has a neutral effect. For example, setting a bit that is already set has no effect, but toggling a bit that has already been toggled does. The file system must also handle intermediate volume states correctly.

In addition to update records, NTFS periodically writes a checkpoint record to the log file, as illustrated in Figure 11-52. A checkpoint record helps NTFS determine what processing would be needed to recover a volume if a crash were to occur immediately. Using information stored in the checkpoint record, NTFS knows, for example, how far back in the log file it must go to begin its recovery. After writing a checkpoint record, NTFS stores the LSN of the record in the restart area so that it can quickly find its most recently written checkpoint record when it begins file system recovery after a crash occurs—this is similar to the restart LSN used by TxF for the same reason.

Although the LFS presents the log file to NTFS as if it were infinitely large, it isn't. The generous size of the log file and the frequent writing of checkpoint records (an operation that usually frees up space in the log file) make the possibility of the log file filling up a remote one.
Nevertheless, the LFS, just like CLFS, accounts for this possibility by tracking several numbers:

■ The available log space

■ The amount of space needed to write an incoming log record and to undo the write, should that be necessary

■ The amount of space needed to roll back all active (noncommitted) transactions, should that be necessary

If the log file doesn't contain enough available space to accommodate the total of the last two items, the LFS returns a "log file full" error, and NTFS raises an exception. The NTFS exception handler rolls back the current transaction and places it in a queue to be restarted later.

To free up space in the log file, NTFS must momentarily prevent further transactions on files. To do so, NTFS blocks file creation and deletion and then requests exclusive access to all system files and shared access to all user files. Gradually, active transactions either are completed successfully or receive the "log file full" exception. NTFS rolls back and queues the transactions that receive the exception.

Once it has blocked transaction activity on files as just described, NTFS calls the cache manager to flush unwritten data to disk, including unwritten log file data. After everything is safely flushed to disk, NTFS no longer needs the data in the log file. It resets the beginning of the log file to the current position, making the log file "empty." Then it restarts the queued transactions. Beyond the short pause in I/O processing, the "log file full" error has no effect on executing programs.

This scenario is one example of how NTFS uses the log file not only for file system recovery but also for error recovery during normal operation. You'll find out more about error recovery in the following section.

11.8.3 Recovery

NTFS automatically performs a disk recovery the first time a program accesses an NTFS volume after the system has been booted. (If no recovery is needed, the process is trivial.) Recovery depends on two tables NTFS maintains in memory: a transaction table, which behaves just like the one TxF maintains, and a dirty page table, which records which pages in the cache contain modifications to the file system structure that haven't yet been written to disk. This data must be flushed to disk during recovery.

NTFS writes a checkpoint record to the log file once every 5 seconds. Just before it does, it calls the LFS to store a current copy of the transaction table and of the dirty page table in the log file. NTFS then records in the checkpoint record the LSNs of the log records containing the copied tables. When recovery begins after a system failure, NTFS calls the LFS to locate the log records containing the most recent checkpoint record and the most recent copies of the transaction and dirty page tables. It then copies the tables to memory.

The log file usually contains more update records following the last checkpoint record.
These update records represent volume modifications that occurred after the last checkpoint record was written. NTFS must update the transaction and dirty page tables to include these operations. After updating the tables, NTFS uses the tables and the contents of the log file to update the volume itself.

To effect its volume recovery, NTFS scans the log file three times, loading the file into memory during the first pass to minimize disk I/O. Each pass has a particular purpose:

1. Analysis

2. Redoing transactions

3. Undoing transactions

Analysis Pass

During the analysis pass, as shown in Figure 11-53, NTFS scans forward in the log file from the beginning of the last checkpoint operation to find update records and use them to update the transaction and dirty page tables it copied to memory. Notice in the figure that the checkpoint operation stores three records in the log file and that update records might be interspersed among these records. NTFS therefore must start its scan at the beginning of the checkpoint operation.

Most update records that appear in the log file after the checkpoint operation begins represent a modification to either the transaction table or the dirty page table. If an update record is a "transaction committed" record, for example, the transaction the record represents must be removed from the transaction table. Similarly, if the update record is a "page update" record that modifies a file system data structure, the dirty page table must be updated to reflect that change.

Once the tables are up to date in memory, NTFS scans the tables to determine the LSN of the oldest update record that logs an operation that hasn't been carried out on disk. The transaction table contains the LSNs of the noncommitted (incomplete) transactions, and the dirty page table contains the LSNs of records in the cache that haven't been flushed to disk. The LSN of the oldest update record that NTFS finds in these two tables determines where the redo pass will begin. If the last checkpoint record is older, however, NTFS will start the redo pass there instead.

Note: In the TxF recovery model, there is no distinct analysis pass. Instead, as described in the TxF recovery section, TxF performs the equivalent work in the redo pass.

Redo Pass

During the redo pass, as shown in Figure 11-54, NTFS scans forward in the log file from the LSN of the oldest update record, which it found during the analysis pass. It looks for "page update" records, which contain volume modifications that were written before the system failure but that might not have been flushed to disk. NTFS redoes these updates in the cache.
When NTFS reaches the end of the log file, it has updated the cache with the necessary volume modifications, and the cache manager's lazy writer can begin writing cache contents to disk in the background.

Undo Pass

After it completes the redo pass, NTFS begins its undo pass, in which it rolls back any transactions that weren't committed when the system failed. Figure 11-55 shows two transactions in the log file; transaction 1 was committed before the power failure, but transaction 2 wasn't. NTFS must undo transaction 2. Suppose that transaction 2 created a file, an operation that comprises three suboperations, each with its own update record. The update records of a transaction are linked by backward pointers in the log file because they are usually not contiguous.

The NTFS transaction table lists the LSN of the last-logged update record for each noncommitted transaction. In this example, the transaction table identifies LSN 4049 as the last update record logged for transaction 2. As shown from right to left in Figure 11-56, NTFS rolls back transaction 2. After locating LSN 4049, NTFS finds the undo information and executes it, clearing bits 3 through 9 in its allocation bitmap. NTFS then follows the backward pointer to LSN 4048, which directs it to remove the new file name from the appropriate file name index. Finally, it follows the last backward pointer and deallocates the MFT file record reserved for the file, as the update record with LSN 4046 specifies. Transaction 2 is now rolled back. If there are other noncommitted transactions to undo, NTFS follows the same procedure to roll them back.

Because undoing transactions affects the volume's file system structure, NTFS must log the undo operations in the log file. After all, the power might fail again during the recovery, and NTFS would have to redo its undo operations!

When the undo pass of the recovery is finished, the volume has been restored to a consistent state. At this point, NTFS is prepared to flush the cache changes to disk to ensure that the volume is up to date. Before doing so, however, it executes a callback that TxF registers for notifications

of LFS flushes. Because TxF and NTFS both use write-ahead logging, TxF must flush its log through CLFS before the NTFS log is flushed to ensure consistency of its own metadata. (And similarly, the TOPS file must be flushed before the CLFS-managed log files.) NTFS then writes an "empty" LFS restart area to indicate that the volume is consistent and that no recovery need be done if the system should fail again immediately. Recovery is complete.

NTFS guarantees that recovery will return the volume to some preexisting consistent state, but not necessarily to the state that existed just before the system crash. NTFS can't make that guarantee because, for performance, it uses a "lazy commit" algorithm, which means that the log file isn't immediately flushed to disk each time a "transaction committed" record is written. Instead, numerous "transaction committed" records are batched and written together, either when the cache manager calls the LFS to flush the log file to disk or when the LFS writes a checkpoint record (once every 5 seconds) to the log file.

Another reason the recovered volume might not be completely up to date is that several parallel transactions might be active when the system crashes, and some of their "transaction committed" records might make it to disk whereas others might not. The consistent volume that recovery produces includes all the volume updates whose "transaction committed" records made it to disk and none of the updates whose "transaction committed" records didn't make it to disk.

NTFS uses the log file to recover a volume after the system fails, but it also takes advantage of an important "freebie" it gets from logging transactions. File systems necessarily contain a lot of code devoted to recovering from file system errors that occur during the course of normal file I/O.
Because NTFS logs each transaction that modifies the volume structure, it can use the log file to recover when a file system error occurs and thus can greatly simplify its error handling code. The “log file full” error described earlier is one example of using the log file for error recovery. Most I/O errors a program receives aren’t file system errors and therefore can’t be resolved entirely by NTFS. When called to create a file, for example, NTFS might begin by creating a file record in the MFT and then enter the new file’s name in a directory index. When it tries to allocate space for the file in its bitmap, however, it could discover that the disk is full and the create request can’t be completed. In such a case, NTFS uses the information in the log file to undo the part of the operation it has already completed and to deallocate the data structures it reserved for the file. Then it returns a “disk full” error to the caller, which in turn must respond appropriately to the error.

11.8.4 NTFS Bad-Cluster Recovery

The volume manager included with Windows (VolMgr) can recover data from a bad sector on a fault-tolerant volume, but if the hard disk doesn’t use the SCSI protocol or runs out of spare sectors, a volume manager can’t perform sector sparing to replace the bad sector. (See Chapter 8 for more information on the volume manager.) When the file system reads from the sector, the volume manager instead recovers the data and returns the warning to the file system that there is only one copy of the data. The FAT file system doesn’t respond to this volume manager warning. Moreover, neither FAT nor the volume manager keeps track of the bad sectors, so a user must run the Chkdsk or
Format utility to prevent the volume manager from repeatedly recovering data for the file system. Both Chkdsk and Format are less than ideal for removing bad sectors from use. Chkdsk can take a long time to find and remove bad sectors, and Format wipes all the data off the partition it’s formatting. In the file system equivalent of a volume manager’s sector sparing, NTFS dynamically replaces the cluster containing a bad sector and keeps track of the bad cluster so that it won’t be reused. (Recall that NTFS maintains portability by addressing logical clusters rather than physical sectors.) NTFS performs these functions when the volume manager can’t perform sector sparing. When a volume manager returns a bad-sector warning or when the hard disk driver returns a bad-sector error, NTFS allocates a new cluster to replace the one containing the bad sector. NTFS copies the data that the volume manager has recovered into the new cluster to reestablish data redundancy. Figure 11-57 shows an MFT record for a user file with a bad cluster in one of its data runs as it existed before the cluster went bad. When it receives a bad-sector error, NTFS reassigns the cluster containing the sector to its bad-cluster file. This prevents the bad cluster from being allocated to another file. NTFS then allocates a new cluster for the file and changes the file’s VCN-to-LCN mappings to point to the new cluster. This bad-cluster remapping (introduced earlier in this chapter) is illustrated in Figure 11-57. Cluster number 1357, which contains the bad sector, must be replaced by a good cluster. Bad-sector errors are undesirable, but when they do occur, the combination of NTFS and the volume manager provides the best possible solution. If the bad sector is on a redundant volume, the volume manager recovers the data and replaces the sector if it can. If it can’t replace the sector, it returns a warning to NTFS, and NTFS replaces the cluster containing the bad sector. 
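The remapping sequence just described can be sketched as follows. This is a deliberately simplified model, assuming a flat VCN-to-LCN dictionary rather than NTFS’s real run lists; the cluster numbers follow the Figure 11-57 example, and the data structures are hypothetical stand-ins.

```python
# Sketch of NTFS bad-cluster remapping: when a bad sector is reported for
# LCN 1357, the cluster is assigned to the bad-cluster file (so it is never
# reused) and the file's VCN is pointed at a freshly allocated LCN.

vcn_to_lcn = {0: 1355, 1: 1356, 2: 1357, 3: 1358}  # file's cluster mappings
bad_cluster_file = set()                           # stand-in for $BadClus
free_clusters = [2000, 2001]                       # stand-in free-cluster pool

def remap_bad_cluster(bad_lcn):
    bad_cluster_file.add(bad_lcn)      # never reallocate this cluster
    new_lcn = free_clusters.pop(0)     # allocate a replacement cluster
    for vcn, lcn in vcn_to_lcn.items():
        if lcn == bad_lcn:
            vcn_to_lcn[vcn] = new_lcn  # update the VCN-to-LCN mapping
    # On a redundant volume, the data recovered by the volume manager would
    # be copied into new_lcn here; on a nonredundant volume a read produces
    # a "data read" error (a write loses nothing, since remapping happens
    # before the data is written).
    return new_lcn

remap_bad_cluster(1357)
```

The key property is that the file’s logical view (its VCNs) is unchanged; only the physical placement moves, which is why portability by logical cluster addressing makes this possible.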
If the volume isn’t configured as a redundant volume, the data in the bad sector can’t be recovered. When the volume is formatted as a FAT volume and the volume manager can’t recover the data, reading from the bad sector yields indeterminate results. If some of the file system’s control structures reside in the bad sector, an entire file or group of files (or potentially, the whole disk) can be lost. At best, some data in the affected file (often, all the data in the file beyond the bad sector) is lost. Moreover, the FAT file system is likely to reallocate the bad sector to the same or another file on the volume, causing the problem to resurface. Like the other file systems, NTFS can’t recover data from a bad sector without help from a volume manager. However, NTFS greatly contains the damage a bad sector can cause. If NTFS discovers the bad sector during a read 892
operation, it remaps the cluster the sector is in, as shown in Figure 11-58. If the volume isn’t configured as a redundant volume, NTFS returns a “data read” error to the calling program. Although the data that was in that cluster is lost, the rest of the file—and the file system—remains intact; the calling program can respond appropriately to the data loss, and the bad cluster won’t be reused in future allocations. If NTFS discovers the bad cluster on a write operation rather than a read, NTFS remaps the cluster before writing and thus loses no data and generates no error. The same recovery procedures are followed if file system data is stored in a sector that goes bad. If the bad sector is on a redundant volume, NTFS replaces the cluster dynamically, using the data recovered by the volume manager. If the volume isn’t redundant, the data can’t be recovered, and NTFS sets a bit in the volume file that indicates corruption on the volume. The NTFS Chkdsk utility checks this bit when the system is next rebooted, and if the bit is set, Chkdsk executes, fixing the file system corruption by reconstructing the NTFS metadata. In rare instances, file system corruption can occur even on a fault-tolerant disk configuration. A double error can destroy both file system data and the means to reconstruct it. If the system crashes while NTFS is writing the mirror copy of an MFT file record—of a file name index or of the log file, for example—the mirror copy of such file system data might not be fully updated. If the system were rebooted and a bad-sector error occurred on the primary disk at exactly the same location as the incomplete write on the disk mirror, NTFS would be unable to recover the correct data from the disk mirror. NTFS implements a special scheme for detecting such corruptions in file system data. If it ever finds an inconsistency, it sets the corruption bit in the volume file, which causes Chkdsk to reconstruct the NTFS metadata when the system is next rebooted. 
Because file system corruption is rare on a fault-tolerant disk configuration, Chkdsk is seldom needed. It is supplied as a safety precaution rather than as a first-line data recovery strategy.
The use of Chkdsk on NTFS is vastly different from its use on the FAT file system. Before writing anything to disk, FAT sets the volume’s dirty bit and then resets the bit after the modification is complete. If any I/O operation is in progress when the system crashes, the dirty bit is left set and Chkdsk runs when the system is rebooted. On NTFS, Chkdsk runs only when unexpected or unreadable file system data is found and NTFS can’t recover the data from a redundant volume or from redundant file system structures on a single volume. (The system boot sector is duplicated, as are the parts of the MFT required for booting the system and running the NTFS recovery procedure. This redundancy ensures that NTFS will always be able to boot and recover itself.) Table 11-9 summarizes what happens when a sector goes bad on a disk volume formatted for one of the Windows-supported file systems according to various conditions we’ve described in this section. If the volume on which the bad sector appears is a fault-tolerant volume (a mirrored or RAID-5 volume) and if the hard disk is one that supports sector sparing (and that hasn’t run out of spare sectors), it doesn’t matter which file system you’re using (FAT or NTFS). The volume manager replaces the bad sector without the need for user or file system intervention. If a bad sector is located on a hard disk that doesn’t support sector sparing, the file system is responsible for replacing (remapping) the bad sector or—in the case of NTFS—the cluster in which the bad sector resides. The FAT file system doesn’t provide sector or cluster remapping. The benefits of NTFS cluster remapping are that bad spots in a file can be fixed without harm to the file (or harm to the file system, as the case may be) and that the bad cluster won’t be reallocated to the same or another file.

11.8.5 Self-Healing

With today’s multiterabyte storage devices, taking a volume offline for a consistency check can result in a service outage of many hours.
Recognizing that many disk corruptions are localized
to a single file or portion of metadata, NTFS implements a self-healing feature to repair damage while a volume remains online. When NTFS detects corruption, it prevents access to the damaged file or files and creates a system worker thread that executes Chkdsk-like corrections to the corrupted data structures, allowing access to the repaired files when it has finished. Access to other files continues normally during this operation, minimizing service disruption. You can use the fsutil repair set command to view and set a volume’s repair options, which are summarized in Table 11-10. The Fsutil utility uses the FSCTL_SET_REPAIR file system control code to set these settings, which are saved in the VCB for the volume. In all cases, including when the visual warning is disabled (the default), NTFS will log any self-healing operation it undertook in the System event log. Apart from periodic automatic self-healing, NTFS also supports manually initiated self-healing cycles through the FSCTL_INITIATE_REPAIR and FSCTL_WAIT_FOR_REPAIR control codes, which can be initiated with the fsutil repair initiate and fsutil repair wait commands. This allows the user to force the repair of a specific file and to wait until repair of that file is complete. To check the status of the self-healing mechanism, the FSCTL_QUERY_REPAIR control code or the fsutil repair query command can be used, as shown here:

1. C:\>fsutil repair query c:
2. Self healing is enabled for volume c: with flags 0x1.
3. flags: 0x01 - enable general repair
4. 0x08 - warn about potential data loss
5. 0x10 - disable general repair and bugcheck once on first corruption

11.9 Encrypting File System Security

EFS security relies on cryptography support. The first time a file is encrypted, EFS assigns the account of the user performing the encryption a private/public key pair for use in file encryption.
Users can encrypt files via Windows Explorer by opening a file’s Properties dialog box, clicking Advanced, and then selecting the Encrypt Contents To Secure Data option, as shown in Figure 11-59. Users can also encrypt files via a command-line utility named cipher. Windows automatically encrypts files that reside in directories that are designated as encrypted directories. When a file is encrypted, EFS generates a random number for the file that EFS calls the file’s file
encryption key (FEK). EFS uses the FEK to encrypt the file’s contents with a stronger variant of the Data Encryption Standard (DES) algorithm—Triple-DES (3DES) or Advanced Encryption Standard (AES). EFS stores the file’s FEK with the file but encrypts the FEK with the user’s EFS public key by using the RSA public key–based encryption algorithm. After EFS completes these steps, the file is secure: other users can’t decrypt the data without the file’s decrypted FEK, and they can’t decrypt the FEK without the private key.

EFS FEK Key Strength

The default FEK encryption algorithm is AES. The Windows AES algorithm uses 256-bit keys. Use of 3DES allows access to larger-sized keys, so if you require greater key strength you can enable 3DES encryption in one of two ways: either as the algorithm for all system cryptographic services or just for EFS. To have 3DES be the encryption algorithm for all system cryptographic services, open the Local Security Policy Editor by entering secpol.msc in the Run dialog box from the Start menu and open the Security Options node under Local Policies. View the properties of System Cryptography: Use FIPS Compliant Algorithms For Encryption, Hashing And Signing, and enable it. To enable 3DES for EFS only, create the DWORD value HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\EFS\AlgorithmID, set it to 0x6603, and reboot. EFS uses a private/public key algorithm to encrypt FEKs. To encrypt file data, EFS uses AES or 3DES because both are symmetric encryption algorithms, which means that they use the same key to encrypt and decrypt data. Symmetric encryption algorithms are typically very fast, which makes them suitable for encrypting large amounts of data, such as file data. However, symmetric encryption algorithms have a weakness: you can bypass their security if you obtain the key. If multiple users want to share one encrypted file protected only by AES or 3DES, each user would require access to the file’s FEK.
Leaving the FEK unencrypted would obviously be a security problem, but encrypting the FEK once would require all the users to share the same FEK decryption key—another potential security problem. Keeping the FEK secure is a difficult problem, which EFS addresses with the public key–based half of its encryption architecture. Encrypting a file’s FEK for individual users who access the file lets multiple users share an encrypted file. EFS can encrypt a file’s FEK with each user’s public key and can store each user’s encrypted FEK with the file. Anyone can access a
user’s public key, but no one can use a public key to decrypt the data that the public key encrypted. The only way users can decrypt a file is with their private key, which the operating system must access. A user’s private key decrypts the user’s encrypted copy of a file’s FEK. Public key–based algorithms are usually slow, but EFS uses these algorithms only to encrypt FEKs. Splitting key management between a publicly available key and a private key makes key management a little easier than symmetric encryption algorithms do and solves the dilemma of keeping the FEK secure. Windows stores a user’s private keys in the user’s profile directory (typically under \\Users) within the AppData\\Roaming\\Microsoft\\Crypto\\RSA subdirectory. To protect private keys, Windows encrypts all files within the RSA folder with a random symmetric key called the user’s master key. The master key is 64 bytes in length and is generated by a strong random number generator. The master key is also stored in the user’s profile under the AppData\\Roaming \\Microsoft\\Protect directory and is 3DES-encrypted with a key that’s in part based on the user’s password. When a user changes his or her password, master keys are automatically unencrypted and re-encrypted using the new password. Several components work together to make EFS work, as the diagram of EFS architecture in Figure 11-60 shows. EFS support is merged into the NTFS driver. Whenever NTFS encounters an encrypted file, NTFS executes EFS functions that it contains. The EFS functions encrypt and decrypt file data as applications access encrypted files. Although EFS stores an FEK with a file’s data, users’ public keys encrypt the FEK. To encrypt or decrypt file data, EFS must decrypt the file’s FEK with the aid of cryptography services that reside in user mode. The Local Security Authority Subsystem (Lsass; \\%SystemRoot%\\System32\\Lsass.exe) manages logon sessions but also handles EFS key management chores. 
For example, when EFS needs to decrypt an FEK to decrypt file data a user wants to access, NTFS sends a request to Lsass. EFS sends the request via an advanced local procedure call (ALPC) message. The KSecDD (\%SystemRoot%\System32\Drivers\Ksecdd.sys) device driver exports functions for other drivers
that need to send ALPC messages to Lsass. The Local Security Authority Server (Lsasrv; \%SystemRoot%\System32\Lsasrv.dll) component of Lsass that listens for remote procedure call (RPC) requests passes requests to decrypt an FEK to the appropriate EFS-related decryption function, which also resides in Lsasrv. Lsasrv uses functions in Microsoft CryptoAPI (also referred to as CAPI) to decrypt the FEK, which the NTFS driver sent to Lsass in encrypted form. CryptoAPI comprises cryptographic service provider (CSP) DLLs that make various cryptography services (such as encryption/decryption and hashing) available to applications. The CSP DLLs manage retrieval of user private and public keys, for example, so that Lsasrv doesn’t need to concern itself with the details of how keys are protected or even with the details of the encryption algorithms. EFS uses the RSA encryption algorithms provided by the Microsoft Enhanced Cryptographic provider (\%SystemRoot%\System32\Rsaenh.dll). After Lsasrv decrypts an FEK, Lsasrv returns the FEK to the NTFS driver via an ALPC reply message. After EFS receives the decrypted FEK, EFS can use AES to decrypt the file data for NTFS. Let’s look at the details of how EFS integrates with NTFS and how Lsasrv uses CryptoAPI to manage keys.

11.9.1 Encrypting a File for the First Time

The NTFS driver calls its internal EFS functions when it encounters an encrypted file. A file’s attributes record that the file is encrypted in the same way that a file records that it is compressed (discussed earlier in this chapter). NTFS has specific interfaces for converting a file from nonencrypted to encrypted form, but user-mode components primarily drive the process. As described earlier, Windows lets you encrypt a file in two ways: by using the cipher command-line utility or by checking the Encrypt Contents To Secure Data check box in the Advanced Attributes dialog box for a file in Windows Explorer.
Both Windows Explorer and the cipher command rely on the EncryptFile Windows API that Advapi32.dll (Advanced Windows APIs DLL) exports. Advapi32 loads another DLL, Feclient.dll (File Encryption Client DLL), to obtain APIs that Advapi32 can use to invoke EFS interfaces in Lsasrv via ALPC. When Lsasrv receives an RPC message from Feclient to encrypt a file, Lsasrv uses the Windows impersonation facility to impersonate the user that ran the application (either cipher or Windows Explorer) that is encrypting the file. (Impersonation is described in Chapter 6.) This procedure lets Windows treat the file operations that Lsasrv performs as if the user who wants to encrypt the file is performing them. Lsasrv usually runs in the System account. (The System account is described in Chapter 6.) In fact, if it doesn’t impersonate the user, Lsasrv usually won’t have permission to access the file in question. Lsasrv next creates a log file in the volume’s System Volume Information directory into which Lsasrv records the progress of the encryption process. The log file usually has the name Efs0.log, but if other files are undergoing encryption, increasing numbers replace the 0 until a unique log file name for the current encryption is created. CryptoAPI relies on information that a user’s registry profile stores, so if the profile is not already loaded, Lsasrv next uses the LoadUserProfile API function of Userenv.dll (User Environment DLL) to load the profile into the registry of the user it is impersonating. Typically, the user profile is already loaded because Winlogon loads a user’s profile when a user logs on. However, if
a user uses the Windows RunAs command to log on to a different account, when the user tries to access encrypted files from that account, the account’s profile might not be loaded. Lsasrv then generates an FEK for the file by using the RSA encryption facilities of the Microsoft Base Cryptographic Provider 1.0 CSP.

Constructing Key Rings

At this point, Lsasrv has an FEK and can construct EFS information to store with the file, including an encrypted version of the FEK. Lsasrv reads the HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\EFS\CurrentKeys\CertificateHash value of the user performing the encryption to obtain the user’s public key signature. (Note that this key doesn’t appear in the registry until a file or folder is encrypted.) Lsasrv uses the signature to access the user’s public key and encrypt FEKs. Lsasrv can now construct the information that EFS stores with the file. EFS stores only one block of information in an encrypted file, and that block contains an entry for each user sharing the file. These entries are called key entries, and EFS stores them in the Data Decryption Field (DDF) portion of the file’s EFS data. A collection of multiple key entries is called a key ring because, as mentioned earlier, EFS lets multiple users share encrypted files. Figure 11-61 shows a file’s EFS information format and key entry format. EFS stores enough information in the first part of a key entry to precisely describe a user’s public key. This data includes the user’s security ID (SID) (note that the SID is not guaranteed to be present), the container name in which the key is stored, the cryptographic provider name, and the private/public key pair certificate hash. Only the private/public key pair certificate hash is used by the decryption process. The second part of the key entry contains an encrypted version of the FEK. Lsasrv uses the CryptoAPI to encrypt the FEK with the RSA algorithm and the user’s public key.
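The key ring construction described above lends itself to a short structural sketch: one entry per user, each holding key-identification data plus the FEK wrapped with that user’s public key. The xor “encryption” below is a deliberately trivial stand-in for the RSA operations CryptoAPI performs, and the entry fields are simplified versions of the ones listed in the text.

```python
# Structural sketch of a DDF key ring. The "public keys" and the xor
# wrap/unwrap are toy placeholders, not real cryptography.

import secrets

FEK = secrets.randbits(128)                 # random file encryption key

users = {"Mark": 0x1111, "Alice": 0x2222}   # hypothetical per-user "keys"

def toy_encrypt(data, key):  # stand-in for RSA public-key encryption
    return data ^ key

def toy_decrypt(blob, key):  # stand-in for RSA private-key decryption
    return blob ^ key

# Build the key ring: each entry identifies the key and carries the
# FEK encrypted under that user's public key.
ddf_key_ring = [
    {
        "user": name,
        "certificate_hash": hash(name),  # stand-in for the certificate hash
        "encrypted_fek": toy_encrypt(FEK, key),
    }
    for name, key in users.items()
]

# Any listed user can recover the FEK with their own key.
entry = next(e for e in ddf_key_ring if e["user"] == "Alice")
assert toy_decrypt(entry["encrypted_fek"], users["Alice"]) == FEK
```

The point of the structure is that the single symmetric FEK is stored once per sharer, each copy sealed with a different public key, so no user ever sees another user’s private key material.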
Next, Lsasrv creates another key ring that contains recovery key entries. EFS stores information about recovery key entries in a file’s Data Recovery Field (DRF). The format of DRF entries is identical to the format of DDF entries. The DRF’s purpose is to let designated accounts, or Recovery Agents, decrypt a user’s file when administrative authority must have access to the user’s data. For example, suppose a company employee forgot his or her logon password. An administrator can reset the user’s password, but without Recovery Agents, no one can recover the user’s encrypted data.
Recovery Agents are defined with the Encrypted Data Recovery Agents security policy of the local computer or domain. This policy is available from the Local Security Policy MMC snap-in, as shown in Figure 11-62. When you use the Add Recovery Agent Wizard (by right-clicking Encrypting File System and then clicking Add Data Recovery Agent), you can add Recovery Agents and specify which private/public key pairs (designated by their certificates) the Recovery Agents use for EFS recovery. Lsasrv interprets the recovery policy when it initializes and when it receives notification that the recovery policy has changed. EFS creates a DRF key entry for each Recovery Agent by using the cryptographic provider registered for EFS recovery. In the final step in creating EFS information for a file, Lsasrv calculates a checksum for the DDF and DRF by using the MD5 hash facility of Base Cryptographic Provider 1.0. Lsasrv stores the checksum’s result in the EFS information header. EFS references this checksum during decryption to ensure that the contents of a file’s EFS information haven’t become corrupted or been tampered with.

Encrypting File Data

Figure 11-63 illustrates the flow of the encryption process. After Lsasrv constructs the necessary information for a file a user wants to encrypt, it can begin encrypting the file. Lsasrv creates a backup file, Efs0.tmp, for the file undergoing encryption. (Lsasrv uses higher numbers in the backup file name if other backup files exist.) Lsasrv creates the backup file in the directory that contains the file undergoing encryption. Lsasrv applies a restrictive security descriptor to the backup file so that only the System account can access the file’s contents. Lsasrv next initializes the log file that it created in the first phase of the encryption process. Finally, Lsasrv records in the log file that the backup file has been created. Lsasrv encrypts the original file only after the file is completely backed up.
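The backup-and-log discipline described above can be sketched as a small routine in which every step is recorded before the next begins, so a crash at any point leaves either the intact original or a complete backup to restore from. The file names and log format here are illustrative stand-ins for Efs0.tmp and Efsx.log, and the xor transform stands in for real encryption.

```python
# Sketch of crash-safe in-place encryption via a backup file and a
# progress log. Names and formats are hypothetical simplifications.

import os
import tempfile

def encrypt_with_backup(path, encrypt):
    log = []                          # stand-in for the Efsx.log progress log
    backup = path + ".bak"            # stand-in for the Efs0.tmp backup file
    with open(path, "rb") as f:
        data = f.read()
    with open(backup, "wb") as f:     # back up before touching the original
        f.write(data)
    log.append("backup complete")     # recovery can now restore from backup
    with open(path, "wb") as f:
        f.write(encrypt(data))        # overwrite original with encrypted data
    log.append("encryption successful")
    os.remove(backup)                 # only now discard backup and log
    log.clear()

tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
tmp.write(b"secret")
tmp.close()
encrypt_with_backup(tmp.name, lambda d: bytes(b ^ 0x5A for b in d))
```

A recovery routine inspecting the log after a crash can apply the same rule the text describes: if the original was never modified, delete backup and log; otherwise copy the backup over the partially encrypted original.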
Lsasrv next sends the EFS kernel-mode code inside NTFS a command to add to the original file the EFS information that it just created. The EFS kernel-mode code takes the EFS information that Lsasrv sent and applies the information to the file, which lets EFS add the $EFS attribute to NTFS files. Execution returns to Lsasrv, which copies the contents of the file undergoing encryption to the backup file. When the backup copy is complete, including backups of all alternate data streams, Lsasrv records in the log file that the backup file is up to date. Lsasrv then sends another command to NTFS to tell NTFS to encrypt the contents of the original file.
When NTFS receives the EFS command to encrypt the file, NTFS deletes the contents of the original file and copies the backup data to the file. After NTFS copies each section of the file, NTFS flushes the section’s data from the file system cache, which prompts the cache manager to tell NTFS to write the file’s data to disk. Because the file is marked as encrypted, at this point in the file-writing process, NTFS calls its EFS routines to encrypt the data before it writes the data to disk. EFS uses the unencrypted FEK that NTFS passes it to perform AES or 3DES encryption of the file as appropriate, one sector (512 bytes) at a time. After EFS encrypts the file, Lsasrv records in the log file that the encryption was successful and deletes the file’s backup copy. Finally, Lsasrv deletes the log file and returns control to the application that requested the file’s encryption.

Encryption Process Summary

The following list summarizes the steps EFS performs to encrypt a file:
1. The user profile is loaded if necessary.
2. A log file is created in the System Volume Information directory with the name Efsx.log, where x is a unique number (for example, Efs0.log). As subsequent steps are performed, records are written to the log so that the file can be recovered in case the system fails during the encryption process.
3. Base Cryptographic Provider 1.0 generates a random 128-bit FEK for the file.
4. A user EFS private/public key pair is generated or obtained. HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\EFS\CurrentKeys\CertificateHash identifies the user’s key pair.
5. A DDF key ring is created for the file that has an entry for the user. The entry contains a copy of the FEK that has been encrypted with the user’s EFS public key.
6. A DRF key ring is created for the file. It has an entry for each Recovery Agent on the system, with each entry containing a copy of the FEK encrypted with the agent’s EFS public key.
7. A backup file with a name in the form Efs0.tmp is created in the same directory as the file to be encrypted.
8. The DDF and DRF key rings are added to a header and augment the file as its EFS attribute.
9. The backup file is marked as encrypted, and the original file is copied to the backup.
10. The original file’s contents are destroyed, and the backup is copied to the original. This copy operation results in the data in the original file being encrypted because the file is now marked as encrypted.
11. The backup file is deleted.
12. The log file is deleted.
13. The user profile is unloaded (if it was loaded in step 1).

If the system crashes during the encryption process, either the original file remains intact or the backup file contains a consistent copy. When Lsasrv initializes after a system crash, it looks for log files under the System Volume Information subdirectory on each NTFS volume on the system. If Lsasrv finds one or more log files, it examines their contents and determines how recovery should take place. Lsasrv deletes the log file and the corresponding backup file if the original file wasn’t modified at the time of the crash; otherwise, Lsasrv copies the backup file over the original, partially encrypted file and then deletes the log and backup. After Lsasrv processes log files, the file system will be in a consistent state with respect to encryption, with no loss of user data.

11.9.2 The Decryption Process

The decryption process begins when a user opens an encrypted file. NTFS examines the file’s attributes when opening the file and reads the $EFS attribute associated with the encrypted file. NTFS completes the necessary steps to open the file and ensures that the user opening the file has access privileges to the file’s encrypted data (that is, that an encrypted FEK in either the DDF or DRF key ring corresponds to a private/public key pair associated with the user).
As EFS performs this validation, EFS obtains the file’s decrypted FEK to use in subsequent data operations the user might perform on the file. EFS can’t decrypt an FEK and relies on Lsasrv (which can use CryptoAPI) to perform FEK decryption. EFS sends an ALPC message by way of the Ksecdd.sys driver to Lsasrv that asks Lsasrv to obtain the decrypted form of the encrypted FEK in the $EFS attribute data (the EFS data) that corresponds to the user who is opening the file. When Lsasrv receives the ALPC message, Lsasrv executes the Userenv.dll (User Environment DLL) LoadUserProfile API function to bring the user’s profile into the registry, if the profile isn’t already loaded. Lsasrv proceeds through each key field in the EFS data, using the user’s private key to try to decrypt each FEK. For each key, Lsasrv attempts to decrypt a DDF or DRF key entry’s FEK. If the certificate hash in a key field doesn’t refer to a key the user owns,
Lsasrv moves on to the next key field. If Lsasrv can’t decrypt any DDF or DRF key field’s FEK, the user can’t obtain the file’s FEK. Consequently, EFS denies access to the application opening the file. However, if Lsasrv identifies a hash as corresponding to a key the user owns, it decrypts the FEK with the user’s private key using CryptoAPI. Because Lsasrv processes both DDF and DRF key rings when decrypting an FEK, it automatically performs file recovery operations. If a Recovery Agent that isn’t registered to access an encrypted file (that is, it doesn’t have a corresponding field in the DDF key ring) tries to access a file, EFS will let the Recovery Agent gain access because the agent has access to a key pair for a key field in the DRF key ring.

Decrypted FEK Caching

Traveling the path from the NTFS driver to Lsasrv and back can take a relatively long time—in the process of decrypting an FEK, CryptoAPI results in more than 2,000 registry API calls and 400 file system accesses on a typical system. The NTFS driver uses a cache to try to avoid this expense.

Decrypting File Data

After an application opens an encrypted file, the application can read from and write to the file. NTFS uses EFS to decrypt file data as NTFS reads the data from the disk and before NTFS places the data in the file system cache. Similarly, when an application writes data to a file, the data remains in unencrypted form in the file system cache until the application or the cache manager uses NTFS to flush the data back to disk. When an encrypted file’s data writes back from the cache to the disk, NTFS uses EFS to encrypt the data. As stated earlier, EFS performs encryption and decryption in 512-byte units. The 512-byte size is the most convenient for the driver because disk reads and writes occur in multiples of the 512-byte sector.
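The 512-byte unit processing can be illustrated with a short sketch. The per-sector transform below is a placeholder xor rather than the AES or 3DES that EFS actually uses; only the sector-at-a-time structure is the point.

```python
# Sketch of processing a buffer one 512-byte sector at a time, mirroring
# the unit size EFS uses. The transform is a toy xor, not real encryption.

SECTOR = 512

def transform_sectors(data, transform):
    out = bytearray()
    for off in range(0, len(data), SECTOR):
        out += transform(data[off:off + SECTOR])  # one sector at a time
    return bytes(out)

toy = lambda sector: bytes(b ^ 0xA5 for b in sector)

plain = b"x" * 1024 + b"tail"          # two full sectors plus a partial one
cipher = transform_sectors(plain, toy)
assert transform_sectors(cipher, toy) == plain  # xor is its own inverse
```

Working in fixed sector-sized units means each disk read or write maps cleanly onto whole encryption units, so the driver never has to decrypt more than the sectors an I/O actually touches.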
11.9.3 Backing Up Encrypted Files

An important aspect of any file encryption facility’s design is that file data is never available in unencrypted form except to applications that access the file via the encryption facility. This restriction particularly affects backup utilities, in which archival media store files. EFS addresses this problem by providing a facility for backup utilities so that the utilities can back up and restore files in their encrypted states. Thus, backup utilities don’t have to be able to decrypt file data, nor do they need to encrypt file data in their backup procedures. Backup utilities use the EFS API functions OpenEncryptedFileRaw, ReadEncryptedFileRaw, WriteEncryptedFileRaw, and CloseEncryptedFileRaw in Windows to access a file’s encrypted contents. The Advapi32.dll library provides these API functions, which all use ALPC to invoke corresponding functions in Lsasrv. For example, after a backup utility opens a file for raw access during a backup operation, the utility calls ReadEncryptedFileRaw to obtain the file data. The Lsasrv function EfsReadFileRaw issues control commands (which the EFS session key encrypts

with AES or 3DES) to the NTFS driver to read the file’s EFS attribute first and then the encrypted contents. EfsReadFileRaw might have to perform multiple read operations to read a large file. As EfsReadFileRaw reads each portion of such a file, Lsasrv sends an RPC message to Advapi32.dll that executes a callback function that the backup program specified when it issued the ReadEncryptedFileRaw API function. EfsReadFileRaw hands the encrypted data it just read to the callback function, which can write the data to the backup media.

Backup utilities restore encrypted files in a similar manner. The utilities call the WriteEncryptedFileRaw API function, which invokes a callback function in the backup program to obtain the encrypted data from the backup media while Lsasrv’s EfsWriteFileRaw function is restoring the file’s contents.

EXPERIMENT: Viewing EFS Information

EFS has a handful of other API functions that applications can use to manipulate encrypted files. For example, applications use the AddUsersToEncryptedFile API function to give additional users access to an encrypted file and RemoveUsersFromEncryptedFile to revoke users’ access to an encrypted file. Applications use the QueryUsersOnEncryptedFile function to obtain information about a file’s associated DDF and DRF key fields. QueryUsersOnEncryptedFile returns the SID, certificate hash value, and display information that each DDF and DRF key field contains. The following output is from the EFSDump utility, from Sysinternals, when an encrypted file is specified as a command-line argument:

1. C:\>efsdump test.txt
2. EFS Information Dumper v1.02
3. Copyright (C) 1999 Mark Russinovich
4. Systems Internals – http://www.sysinternals.com
5. test.txt:
6. DDF Entry:
7. DARYL\Mark:
8. CN=Mark,L=EFS,OU=EFS File Encryption Certificate
9. DRF Entry:
10. Unknown user:
11.
EFS Data Recovery

You can see that the file test.txt has one DDF entry for user Mark and one DRF entry for the EFS Data Recovery agent, which is the only Recovery Agent currently registered on the system.

11.10 Conclusion

Windows supports a wide variety of file system formats accessible to both the local system and remote clients. The file system filter driver architecture provides a clean way to extend and augment file system access, and NTFS provides a reliable, secure, scalable file system format for local file system storage. In the next chapter, we’ll look at networking on Windows.

12. Networking

Windows was designed with networking in mind, and it includes broad networking support that is integrated with the I/O system and the Windows API. The four basic types of networking software are services, APIs, protocols, and network adapter device drivers, and each is layered on the next to form a network stack. Windows has well-defined interfaces for each layer, so in addition to using the wide variety of APIs, protocols, and adapter device drivers that ship with Windows, third parties can extend the operating system’s networking capabilities by developing their own.

In this chapter, we take you from the top of the Windows networking stack to the bottom. First, we present the mapping between the Windows networking software components and the Open Systems Interconnection (OSI) reference model. Then we briefly describe the networking APIs available on Windows and explain how they are implemented. You’ll learn how multiple redirector support and name resolution work and how protocol drivers are implemented. After looking at the implementation of network adapter device drivers, we examine binding, which is the glue that connects services, protocol stacks, and network adapters.

12.1 Windows Networking Architecture

The goal of network software is to take a request (usually an I/O request) from an application on one machine, pass it to another machine, execute the request on the remote machine, and return the results to the first machine. In the course of this process, the request must be transformed several times. A high-level request, such as “read x number of bytes from file y on machine z,” requires software that can determine how to get to machine z and what communication software that machine understands. Then the request must be altered for transmission across a network—for example, divided into short packets of information.
When the request reaches the other side, it must be checked for completeness, decoded, and sent to the correct operating system component for execution. Finally, the reply must be encoded for sending back across the network.

12.1.1 The OSI Reference Model

To help different computer manufacturers standardize and integrate their networking software, in 1984 the International Organization for Standardization (ISO) defined a software model for sending messages between machines. The result was the Open Systems Interconnection (OSI) reference model. The model defines six layers of software and one physical layer of hardware, as shown in Figure 12-1.

The OSI reference model is an idealized scheme that few systems implement precisely, but it’s often used to frame discussions of networking principles. Each layer on one machine assumes that it is “talking to” the same layer on the other machine. Both machines “speak” the same language, or protocol, at the same level. In reality, however, a network transmission must pass down each layer on the client machine, be transmitted across the network, and then pass up the layers on the destination machine until it reaches a layer that can understand and implement the request.

The purpose of each layer in the OSI model is to provide services to higher layers and to abstract how the services are implemented at lower layers. Detailing the purpose of each layer is beyond the scope of this book, but here are some brief descriptions of the various layers:

■ Application layer Handles information transfer between two network applications, including functions such as security checks, identification of the participating machines, and initiation of the data exchange.

■ Presentation layer Handles data formatting, including issues such as whether lines end in a carriage return/line feed (CR/LF) or just a carriage return (CR), whether data is to be compressed or encrypted, and so forth.

■ Session layer Manages the connection between cooperating applications, including high-level synchronization and monitoring of which application is “talking” and which is “listening.”

■ Transport layer On the client, this layer divides messages into packets and assigns them sequence numbers to ensure that they are all received in the proper order. On the destination, it assembles packets that have been received. It also shields the session layer from the effects of changes in hardware.

■ Network layer Creates packet headers and handles routing, congestion control, and internetworking.
It is the highest layer that understands the network’s topology—that is, the physical configuration of the machines in the network, any limitations in bandwidth, and so on.

■ Data-link layer Transmits low-level data frames, waits for acknowledgment that they were received, and retransmits frames that were lost over unreliable lines as a result of collisions.

■ Physical layer Passes bits to the network cable or other physical transmission medium.

The gray lines in Figure 12-1 represent protocols used in transmitting a request to a remote machine. As stated earlier, each layer of the hierarchy assumes that it is speaking to the same layer
on another machine and uses a common protocol. The collection of protocols through which a request passes on its way down and back up the layers of the network is called a protocol stack.

12.1.2 Windows Networking Components

Figure 12-2 provides an overview of the components of Windows networking, showing how each component fits into the OSI reference model and which protocols are used between layers. The mapping between OSI layers and networking components isn’t precise, which is the reason that some components cross layers. The various components include the following:

■ Networking APIs provide a protocol-independent way for applications to communicate across a network. Networking APIs can be implemented in user mode or in both user mode and kernel mode, and in some cases they are wrappers around another networking API that implements a specific programming model or provides additional services. (Note that the term networking API also describes any programming interfaces provided by networking-related software.)

■ Transport Driver Interface (TDI) clients are legacy kernel-mode device drivers that usually implement the kernel-mode portion of a networking API’s implementation. TDI clients get their name from the fact that the I/O request packets (IRPs) they send to protocol drivers are formatted according to the Windows Transport Driver Interface standard (documented in the Windows Driver Kit). This standard specifies a common programming interface for kernel-mode device drivers. (See Chapter 7 for more information about IRPs.)

■ TDI transports, also known as transports, Network Driver Interface Specification (NDIS) protocol drivers, and protocol drivers, are kernel-mode protocol drivers. They accept IRPs from TDI clients and process the requests these IRPs represent.
This processing might require network communications with a peer, prompting the TDI transport to add protocol-specific headers (for example, TCP, UDP, and/or IP) to data passed in the IRP and to communicate with adapter drivers using NDIS functions (also documented in the Windows Driver Kit). TDI transports generally facilitate application network communications by transparently performing message operations such as segmentation and reassembly, sequencing, acknowledgment, and retransmission.

■ Winsock Kernel (WSK) is a transport-independent, kernel-mode networking API that replaces the legacy TDI mechanism. WSK provides network communication by using socketlike programming semantics similar to user-mode Winsock, while also providing unique features such as asynchronous I/O operations built on IRPs and event callbacks. WSK also natively supports IP version 6 (IPv6) functionality in the Next Generation TCP/IP network stack in Windows.

■ The Windows Filtering Platform (WFP) is a set of APIs and system services that provide the ability to create network filtering applications. The WFP allows applications to interact with packet processing at different levels of the Windows networking stack, much like file system filters. Similarly, network data can be traced, filtered, and also modified before it reaches its destination.

■ WFP callout drivers are kernel-mode drivers that implement one or more callouts, which extend the capabilities of the WFP by processing TCP/IP-based network data in ways that extend the basic functionality provided by the WFP.

■ The NDIS library (Ndis.sys) provides encapsulation for adapter drivers, hiding from them specifics of the Windows kernel-mode environment. The NDIS library exports functions for use by TDI transports as well as support functions for adapter drivers.

■ NDIS miniport drivers are kernel-mode drivers that are responsible for interfacing TDI transports to particular network adapters. NDIS miniport drivers are written so that they are wrapped by the Windows NDIS library. NDIS miniport drivers don’t process IRPs; rather, they register a call-table interface to the NDIS library that contains pointers to functions corresponding to ones that the NDIS library exports to TDI transports. NDIS miniport drivers communicate with network adapters by using NDIS library functions that resolve to hardware abstraction layer (HAL) functions.

As Figure 12-2 shows, the OSI layers don’t correspond to actual software. WSK transport providers, for example, frequently cross several boundaries. In fact, the bottom three layers of software and the hardware layer are often referred to collectively as “the transport.” Software components residing in the upper three layers are referred to as “users of the transport.”

In the remainder of this chapter, we’ll examine the networking components shown in Figure 12-2 (as well as others not shown in the figure), looking at how they fit together and how they relate to Windows as a whole.

12.2 Networking APIs

Windows implements multiple networking APIs to provide support for legacy applications and compatibility with industry standards. In this section, we’ll briefly look at the networking APIs and describe how applications use them. It’s important to keep in mind that the decision about which API an application uses depends on characteristics of the API, such as which protocols the API can layer over, whether the API supports reliable or bidirectional communication, and the API’s portability to other Windows platforms the application might run on. We’ll discuss the following networking APIs:

■ Windows Sockets (Winsock)
■ Winsock Kernel
■ Remote procedure call (RPC)

■ Web access APIs
■ Named pipes and mailslots
■ NetBIOS
■ Other networking APIs

12.2.1 Windows Sockets

The original Windows Sockets (Winsock) (version 1.0) was Microsoft’s implementation of BSD (Berkeley Software Distribution) Sockets, a programming API that became the standard by which UNIX systems have communicated over the Internet since the 1980s. Support for sockets on Windows makes the task of porting UNIX networking applications to Windows relatively straightforward. The modern versions of Winsock include most of the functionality of BSD Sockets but also include Microsoft-specific enhancements, which continue to evolve. Winsock supports reliable connection-oriented communication as well as unreliable connectionless communication. Windows provides Winsock 2.2, which adds numerous features beyond the BSD Sockets specification, such as functions that take advantage of Windows asynchronous I/O to offer far better performance and scalability than straight BSD Socket programming.

Winsock includes the following features:

■ Support for scatter-gather and asynchronous application I/O.

■ Quality of service (QoS) conventions so that applications can negotiate latency and bandwidth requirements when the underlying network supports QoS.

■ Extensibility so that Winsock can be used with protocols other than those Windows requires it to support.

■ Support for integrated namespaces other than those defined by a protocol that an application is using with Winsock. A server can publish its name in Active Directory, for example, and by using namespace extensions, a client can look up the server’s address in Active Directory.

■ Support for multipoint messages where messages transmit from a single source to multiple receivers simultaneously.

We’ll examine typical Winsock operation and then describe ways that Winsock can be extended.

Winsock Client Operation

The first step a Winsock application takes is to initialize the Winsock API with a call to an initialization function.
A Winsock application’s next step is to create a socket that will represent a communications endpoint. The application obtains the address of the server to which it wants to connect by calling getaddrinfo (and later calling freeaddrinfo to release the information). The getaddrinfo function returns the list of addresses assigned to the server, and the client attempts to connect to each one in turn until it is able to establish a connection with one of them.
This ensures that a client that supports both IP version 4 (IPv4) and IPv6 will connect to the appropriate and/or most efficient address on a server that might have both IPv4 and IPv6 addresses assigned to it. Winsock is a protocol-independent API, so an address can be specified for any protocol installed on the system over which Winsock operates (TCP/IP or TCP/IP with IPv6). After obtaining the server address, a connection-oriented client attempts to connect to the server by using connect and specifying the server address. When a connection is established, the client can send and receive data over its socket using recv and send, for example. A connectionless client specifies the remote address with connectionless APIs, such as sendto and recvfrom, the connectionless equivalents of send and recv. Clients can also use the select and WSAPoll APIs to wait on or poll multiple sockets for synchronous I/O operations or to check their state.

Winsock Server Operation

The sequence of steps for a server application differs from that of a client. After initializing the Winsock API, the server creates a socket and then binds it to a local address by using bind. Again, the address type specified—whether TCP/IP, TCP/IP with IPv6, or some other address type—is up to the server application. If the server is connection oriented, it performs a listen operation on the socket, indicating the backlog, or the number of connections the server asks Winsock to hold until the server is able to accept them. Then it performs an accept operation to allow a client to connect to the socket. If there is a pending connection request, the accept call completes immediately; otherwise, it completes when a connection request arrives. When a connection is made, the accept function returns a new socket that represents the server’s end of the connection. The server can perform receive and send operations by using functions such as recv and send.
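The client and server call sequences just described can be sketched with Python’s socket module, which wraps Winsock on Windows, so the calls map directly onto the Winsock functions named here (the loopback address and ephemeral port are illustrative choices):

```python
import socket

# Server side: socket -> bind -> listen; accept completes when a client connects.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the stack pick an ephemeral port
server.listen(5)                      # backlog of 5 pending connection requests
host, port = server.getsockname()

# Client side: getaddrinfo returns candidate addresses; try each until connect succeeds.
client = None
for family, type_, proto, _, addr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
    try:
        client = socket.socket(family, type_, proto)
        client.connect(addr)
        break
    except OSError:
        client.close()
        client = None

conn, _ = server.accept()             # a new socket for the server's end of the connection
client.sendall(b"hello")
received = conn.recv(1024)
assert received == b"hello"

conn.close(); client.close(); server.close()
```

The connect-before-accept ordering works because the listen backlog completes the handshake on the server’s behalf; accept then simply dequeues the established connection.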
Like Winsock clients, servers can use the select and WSAPoll functions to query the state of one or more sockets; however, the Winsock WSAEventSelect function and overlapped I/O extensions are preferable for higher scalability. Figure 12-3 shows connection-oriented communication between a Winsock client and server.

After binding an address, a connectionless server is no different from a connectionless client: it can send and receive data over the socket simply by specifying the remote address with each operation. Most connectionless protocols are unreliable, and the sender in general will not know whether the destination actually received the sent data packets, known as datagrams. Datagram protocols are ideal for quick message passing, where the overhead of establishing a connection is too much and reliability is not required (although an application can build reliability on top of the protocol).

Winsock Extensions

In addition to supporting functions that correspond directly to those implemented in BSD Sockets, Microsoft has added a handful of functions that aren’t part of the Winsock standard. Two of these functions, AcceptEx and TransmitFile, are worth describing because many Web servers on Windows use them to achieve high performance. AcceptEx is a version of the accept function that, in the process of establishing a connection with a client, returns the client’s address and the client’s first message. AcceptEx allows the server application to prepost multiple accept operations so that high volumes of incoming connection requests can be handled. With this function, a Web server avoids executing multiple Winsock functions that would otherwise be required.

After establishing a connection with a client, a Web server sometimes sends a file, such as a Web page, to the client. The TransmitFile function’s implementation is integrated with the Windows cache manager so that a client can send a file directly from the file system cache. Sending data in this way is called zero-copy because the server doesn’t have to touch the file data to send it; it simply specifies a handle to a file and the ranges of the file to send. In addition, TransmitFile allows a server to prepend or append data to the file’s data so that the server can send header information, which might include the name of the Web server and a field that indicates to the client the size of the message the server is sending. Internet Information Services (IIS), which is included with Windows, uses both AcceptEx and TransmitFile.

Windows also supports a handful of other, multifunction APIs, including ConnectEx, DisconnectEx, and TransmitPackets. ConnectEx establishes a connection and sends the first message on the connection. DisconnectEx closes a connection and allows the socket handle representing the connection to be reused in a call to AcceptEx or ConnectEx.
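The zero-copy idea behind TransmitFile can be demonstrated with Python’s socket.sendfile, which hands the kernel a file handle instead of copying file bytes through user space (on Linux this maps to the sendfile(2) system call; TransmitFile is the analogous Windows mechanism — the loopback transfer below is purely illustrative):

```python
import socket, tempfile, os

payload = b"<html>hello</html>" * 64

# A throwaway file standing in for the Web page the server would send.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

# sendfile lets the kernel stream the file to the socket directly;
# the application never reads the file data into its own buffers.
with open(path, "rb") as f:
    conn.sendfile(f)
conn.close()                          # closing signals end-of-stream to the receiver

received = b""
while chunk := client.recv(4096):
    received += chunk
assert received == payload

client.close(); server.close(); os.unlink(path)
```

A real server using TransmitFile would additionally pass prepend/append buffers for the HTTP headers, which the API sends in the same operation as the file data.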
Finally, TransmitPackets is similar to TransmitFile, except that it allows for the sending of in-memory data in addition to or in lieu of file data. In addition, by using the WSAImpersonateSocketPeer and WSARevertImpersonation functions, Winsock servers can perform impersonation (described in Chapter 6) to perform authorization or to gain access to resources based on the client’s security credentials.

Extending Winsock

Winsock is an extensible API on Windows because third parties can add a transport service provider that interfaces Winsock with other protocols or layers on top of existing protocols to provide functionality such as proxying. Third parties can also add a namespace service provider to augment Winsock’s name-resolution facilities. Service providers plug in to Winsock by using the Winsock service provider interface (SPI). When a transport service provider is registered with Winsock, Winsock uses the transport service provider to implement socket functions, such as connect and accept, for the address types that the provider indicates it implements. There are no restrictions on how the transport service provider implements the functions, but the implementation usually involves communicating with a transport driver in kernel mode.

A requirement of any Winsock client/server application is for the server to make its address available to clients so that the clients can connect to the server. Standard services that execute on the TCP/IP protocol use “well-known addresses” to make their addresses available. As long as a browser knows the name of the computer a Web server is running on, it can connect to the Web server by specifying the well-known Web server address (the IP address of
the server concatenated with :80, the port number used for HTTP). Namespace service providers make it possible for servers to register their presence in other ways. For example, one namespace service provider might on the server side register the server’s address in Active Directory, and on the client side look up the server’s address in Active Directory. Namespace service providers supply this functionality to Winsock by implementing standard Winsock name-resolution functions such as getaddrinfo and getnameinfo.

EXPERIMENT: Looking at Winsock Service Providers

The Net Shell (Netsh.exe) utility included with Windows is able to show the registered Winsock transport and namespace providers by using the netsh winsock show catalog command. For example, if there are two TCP/IP transport service providers, the first one listed is the default provider for Winsock applications using the TCP/IP protocol. Here’s sample output from Netsh showing the registered transport service providers:

1. C:\Users\Administrator>netsh winsock show catalog
2. Winsock Catalog Provider Entry
3. ------------------------------------------------------
4. Entry Type: Base Service Provider
5. Description: MSAFD Tcpip [TCP/IP]
6. Provider ID: {E70F1AA0-AB8B-11CF-8CA3-00805F48A192}
7. Provider Path: %SystemRoot%\system32\mswsock.dll
8. Catalog Entry ID: 1001
9. Version: 2
10. Address Family: 2
11. Max Address Length: 16
12. Min Address Length: 16
13. Socket Type: 1
14. Protocol: 6
15. Protocol Chain Length: 1
16. Winsock Catalog Provider Entry
17. ------------------------------------------------------
18. Entry Type: Base Service Provider
19. Description: MSAFD Tcpip [UDP/IP]
20. Provider ID: {E70F1AA0-AB8B-11CF-8CA3-00805F48A192}
21. Provider Path: %SystemRoot%\system32\mswsock.dll
22. Catalog Entry ID: 1002
23. Version: 2
24. Address Family: 2
25. Max Address Length: 16
26. Min Address Length: 16
27. Socket Type: 2
28. Protocol: 17
29. Protocol Chain Length: 1
30.
Winsock Catalog Provider Entry
31. ------------------------------------------------------

32. Entry Type: Base Service Provider
33. Description: MSAFD Tcpip [RAW/IP]
34. Provider ID: {E70F1AA0-AB8B-11CF-8CA3-00805F48A192}
35. Provider Path: %SystemRoot%\system32\mswsock.dll
36. Catalog Entry ID: 1003
37. Version: 2
38. Address Family: 2
39. Max Address Length: 16
40. Min Address Length: 16
41. Socket Type: 3
42. Protocol: 0
43. Protocol Chain Length: 1

You can also use the Autoruns utility from Windows Sysinternals (www.microsoft.com/technet/sysinternals) to view namespace and transport providers, as well as to disable those that might be causing problems or unwanted behavior on the system.

Winsock Implementation

Winsock’s implementation is shown in Figure 12-4. Its application interface consists of an API DLL, Ws2_32.dll (%SystemRoot%\System32\Ws2_32.dll), which provides applications access to Winsock functions. Ws2_32.dll calls on the services of namespace and transport service providers to carry out name and message operations. The Mswsock.dll library acts as a transport service provider for the protocols Microsoft provides support for in Winsock and uses Winsock Helper libraries that are protocol specific to communicate with kernel-mode protocol drivers. For example, Wshtcpip.dll is the TCP/IP helper. Mswsock.dll (%SystemRoot%\System32\Mswsock.dll) implements the Microsoft Winsock extension functions, such as TransmitFile, AcceptEx, and WSARecvEx. Windows ships with helper DLLs for TCP/IP, TCP/IP with IPv6, Bluetooth, NetBIOS, IrDA (Infrared Data Association), and PGM (Pragmatic General Multicast). It also includes namespace service providers for DNS (TCP/IP), Active Directory (LDAP), and NLA (Network Location Awareness).

Like the named pipe and mailslot APIs (described later in this chapter), Winsock integrates with the Windows I/O model and uses file handles to represent sockets. This support requires the aid of a kernel-mode file system driver, so Msafd.dll uses the services of the Ancillary Function Driver (AFD—%SystemRoot%\System32\Drivers\Afd.sys) to implement socket-based functions. AFD is a Transport Layer Network Provider Interface (TLNPI) client and executes network socket operations, such as sending and receiving messages, by sending TDI IRPs to protocol drivers. AFD isn’t coded to use particular protocol drivers; instead, Msafd.dll informs AFD of the name of the protocol used for each socket so that AFD can open the device object representing the protocol.

12.2.2 Winsock Kernel (WSK)

To enable kernel-mode drivers and modules to have access to networking API interfaces similar to those available to user-mode applications, Windows implements a socket-based networking programming interface called Winsock Kernel (WSK). WSK replaces the legacy TDI API interface present on older versions of Windows but maintains the TDI SPI interface for transport providers. Compared to TDI, WSK provides better performance and scalability and much easier programming, because it relies less on internal kernel behavior and more on socket-based semantics. Additionally, WSK was written to take full advantage of the latest technologies in the Windows TCP/IP stack, which TDI was not originally anticipated to support. As shown in Figure 12-5, WSK makes use of the Network Module Registrar (NMR) component of Windows to attach to and detach from transport devices, and it can be used, just like Winsock, to support many different kinds of network clients—for example, the Http.sys driver for the HTTP Server API (mentioned later in the chapter) is a WSK client.

WSK Implementation

WSK’s implementation is shown in Figure 12-6. At its core is the WSK subsystem itself, which uses the Next Generation TCP/IP Stack (%SystemRoot%\System32\Drivers\Netio.sys) but is actually implemented in AFD. The subsystem is responsible for the provider side of the WSK API. The subsystem interfaces with transport providers (shown at the bottom of Figure 12-5) to provide support for various transport protocols. Attached to the WSK subsystem are WSK applications, which are kernel-mode drivers that implement the client-side WSK API in order to perform network operations. The WSK subsystem calls WSK applications to notify them of asynchronous events.

WSK applications are bound to the WSK subsystem through the NMR or through the WSK’s registration functions, which allow WSK applications to dynamically detect when the WSK subsystem becomes available and then load their own dispatch table to describe the provider and client-side implementations of the WSK API. These implementations provide the standard WSK socket-based functions, such as WskSocket, WskAccept, WskBind, WskConnect, WskReceive, and WskSend, which have similar semantics (but not necessarily similar parameters) as their user-mode Winsock counterparts. However, unlike user-mode Winsock, the WSK subsystem defines four different kinds of socket categories, which identify which functions and events are available:

■ Basic sockets, which are used only to get and set information on the transport. They cannot be used to send or receive data or be bound to an address.

■ Listening sockets, which are used for sockets that accept only incoming connections.

■ Datagram sockets, which are used solely for sending and receiving datagrams.

■ Connection-oriented sockets, which support all the functionality required to send and receive network traffic over an established connection.
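The restrictions implied by the four categories can be summarized as a small capability table. The sketch below is an illustrative model of that taxonomy, not WSK code; the operation names are informal stand-ins for the corresponding Wsk* functions:

```python
from enum import Enum, auto

class WskSocketCategory(Enum):
    BASIC = auto()
    LISTENING = auto()
    DATAGRAM = auto()
    CONNECTION_ORIENTED = auto()

# Informal capability table distilled from the four categories described above.
CAPABILITIES = {
    WskSocketCategory.BASIC:               set(),     # transport get/set information only
    WskSocketCategory.LISTENING:           {"bind", "accept"},
    WskSocketCategory.DATAGRAM:            {"bind", "sendto", "recvfrom"},
    WskSocketCategory.CONNECTION_ORIENTED: {"bind", "connect", "send", "receive"},
}

def permits(category: WskSocketCategory, operation: str) -> bool:
    """True if the given socket category allows the named data/control operation."""
    return operation in CAPABILITIES[category]

assert not permits(WskSocketCategory.BASIC, "send")       # basic sockets move no data
assert permits(WskSocketCategory.DATAGRAM, "sendto")
assert permits(WskSocketCategory.CONNECTION_ORIENTED, "connect")
```

Choosing the narrowest category at WskSocket time lets the subsystem reject inapplicable operations up front rather than at the transport.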

Apart from the socket functions described, WSK also provides events through which applications are notified of network status. Unlike the model for socket functions, in which a client controls the connection, events allow the subsystem to control the connection and merely notify the client. These include the WskAcceptEvent, WskInspectEvent, WskAbortEvent, WskReceiveFromEvent, WskReceiveEvent, WskDisconnectEvent, and WskSendBacklogEvent routines. Finally, like user-mode Winsock, WSK can be extended through extension interfaces that applications can associate with sockets. These extensions can enhance the default functionality provided by the WSK subsystem.

12.2.3 Remote Procedure Call

Remote procedure call (RPC) is a network programming standard originally developed in the early 1980s. The Open Software Foundation (now The Open Group) made RPC part of the distributed computing environment (DCE) distributed computing standard. Although there is a second RPC standard, SunRPC, the Microsoft RPC implementation is compatible with the OSF/DCE standard. RPC builds on other networking APIs, such as named pipes or Winsock, to provide an alternate programming model that in some sense hides the details of networking programming from an application developer.

RPC Operation

An RPC facility is one that allows a programmer to create an application consisting of any number of procedures, some that execute locally and others that execute on remote computers via a network. It provides a procedural view of networked operations rather than a transport-centered view, thus simplifying the development of distributed applications.

Networking software is traditionally structured around an I/O model of processing. In Windows, for example, a network operation is initiated when an application issues a remote I/O request.
The operating system processes the request accordingly by forwarding it to a redirector, which acts as a remote file system by making the client interaction with the remote file system invisible to the client. The redirector passes the operation to the remote file system, and after the remote system fills the request and returns the results, the local network card interrupts. The kernel handles the interrupt, and the original I/O operation completes, returning results to the caller.

RPC takes a different approach altogether. RPC applications are like other structured applications, with a main program that calls procedures or procedure libraries to perform specific tasks. The difference between RPC applications and regular applications is that some of the procedure libraries in an RPC application execute on remote computers, as shown in Figure 12-7, whereas others execute locally. To the RPC application, all the procedures appear to execute locally. In other words, instead of making a programmer actively write code to transmit computational or I/O-related requests across a network, handle network protocols, deal with network errors, wait for results, and so forth, RPC software handles these tasks automatically. And the Windows RPC facility can operate over any available transports loaded into the system.

To write an RPC application, the programmer decides which procedures will execute locally and which will execute remotely. For example, suppose an ordinary workstation has a network connection to a Cray supercomputer or to a machine designed specifically for highspeed vector operations. If the programmer were writing an application that manipulated large matrices, it would make sense from a performance point of view to off-load the mathematical calculations to the remote computer by writing the program as an RPC application. RPC applications work like this: As an application runs, it calls local procedures as well as procedures that aren’t present on the local machine. To handle the latter case, the application is linked to a local static-link library or DLL that contains stub procedures, one for each remote procedure. For simple applications, the stub procedures are statically linked with the application, but for bigger components the stubs are included in separate DLLs. In DCOM, covered later in the chapter, the latter method is typically used. The stub procedures have the same name and use the same interface as the remote procedures, but instead of performing the required operations, the stub takes the parameters passed to it and marshals them for transmission across the network. Marshaling parameters means ordering and packaging them in a particular way to suit a network link, such as resolving references and picking up a copy of any data structures that a pointer refers to. The stub then calls RPC run-time procedures that locate the computer where the remote procedure resides, determine which transport mechanisms that computer uses, and send the request to it using local transport software. When the remote server receives the RPC request, it unmarshals the parameters (the reverse of marshaling them), reconstructs the original procedure call, and calls the procedure. When the server finishes, it performs the reverse sequence to return results to the caller. 
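The marshal/unmarshal round trip that the stubs perform can be illustrated with a toy sketch. This is purely conceptual: real RPC uses the NDR wire format and generated stubs, whereas here the procedure name, the buffer layout, and the `add` procedure are all invented for the example.

```python
import struct

# Toy illustration of stub-style marshaling (NOT the real RPC/NDR wire
# format): the "client stub" packs the procedure name and parameters
# into a byte buffer, and the "server stub" unpacks the buffer,
# reconstructs the call, and dispatches it to the real procedure.

def marshal_call(proc_name: str, a: int, b: int) -> bytes:
    name = proc_name.encode()
    # length-prefixed name followed by two 32-bit integers, network order
    return struct.pack(f"!H{len(name)}sii", len(name), name, a, b)

def unmarshal_and_dispatch(buffer: bytes, procedures: dict) -> int:
    (name_len,) = struct.unpack_from("!H", buffer)
    name, a, b = struct.unpack_from(f"!{name_len}sii", buffer, 2)
    return procedures[name.decode()](a, b)

# The "server" side registers its procedures by name.
procedures = {"add": lambda a, b: a + b}

request = marshal_call("add", 2, 3)      # client stub marshals the call
print(unmarshal_and_dispatch(request, procedures))  # 5
```

In real RPC the buffer would additionally travel across a transport (named pipes, TCP/IP, and so on) between the client and server stubs, and marshaling would also resolve pointers by copying the data structures they refer to.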
In addition to the synchronous function-call-based interface described here, Windows RPC also supports asynchronous RPC. Asynchronous RPC lets an RPC application execute a function but not wait until the function completes to continue processing. Instead, the application can execute other code and later, when a response has arrived from the server, the RPC runtime notifies the client that the operation has completed. The RPC runtime uses the notification mechanism requested by the client. If the client uses an event synchronization object for notification, it waits for the signaling of the event object by calling the WaitForSingleObject or WaitForMultipleObjects function. If the client provides an asynchronous procedure call (APC), the runtime queues the execution of the APC to the thread that executed the RPC function. If the client program uses an I/O completion port as its notification mechanism, it must call 

GetQueuedCompletionStatus to learn of the function’s completion. Alternatively, a client can poll for completion by calling RpcAsyncGetCallStatus. In addition to the RPC runtime, Microsoft’s RPC facility includes a compiler, called the Microsoft Interface Definition Language (MIDL) compiler. The MIDL compiler simplifies the creation of an RPC application. The programmer writes a series of ordinary function prototypes (assuming a C or C++ application) that describe the remote routines and then places the routines in a file. The programmer then adds some additional information to these prototypes, such as a network-unique identifier for the package of routines and a version number, plus attributes that specify whether the parameters are input, output, or both. The embellished prototypes form the developer’s Interface Definition Language (IDL) file. Once the IDL file is created, the programmer compiles it with the MIDL compiler, which produces both client-side and server-side stub routines, mentioned previously, as well as header files to be included in the application. When the client-side application is linked to the stub routines file, all remote procedure references are resolved. The remote procedures are then installed, using a similar process, on the server machine. A programmer who wants to call an existing RPC application need only write the client side of the software and link the application to the local RPC run-time facility. The RPC runtime uses a generic RPC transport provider interface to talk to a transport protocol. The provider interface acts as a thin layer between the RPC facility and the transport, mapping RPC operations onto the functions provided by the transport. The Windows RPC facility implements transport provider DLLs for named pipes, HTTP, TCP/IP, and UDP. In a similar fashion, the RPC facility is designed to work with different network security facilities. 
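The IDL file described above might look like the following minimal sketch. The interface name, UUID, and function are invented for illustration; a real IDL file would be compiled with the MIDL compiler to produce the client-side and server-side stubs.

```idl
[
    uuid(891ab234-5678-4cde-9012-3456789abcde),  // network-unique identifier (placeholder)
    version(1.0)
]
interface MathService
{
    // Attributes mark each parameter as input, output, or both.
    int Sum([in] int a, [in] int b);
}
```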
Most of the Windows networking services are RPC applications, which means that both local processes and processes on remote computers can call them. Thus, a remote client computer can call the server service to list shares, open files, write to print queues, or activate users on your server, all subject to security constraints, of course. Server name publishing, which is the ability of a server to register its name in a location accessible for client lookup, is built into RPC and is integrated with Active Directory. If Active Directory isn’t installed, the RPC name locator services fall back on NetBIOS broadcast. This behavior allows RPC to function on stand-alone servers and workstations. RPC Security Windows RPC includes integration with security support providers (SSPs) so that RPC clients and servers can use authenticated or encrypted communications. When an RPC server wants secure communication, it tells the RPC runtime what authentication service to add to the list of available authentication services. When a client wants to use secure communication, it binds to the server. At that time, it must tell the RPC runtime the authentication service and authentication level it wants. Various authentication levels exist to ensure that only authorized clients connect to a server, verify that each message a server receives originates at an authorized client, check the integrity of RPC messages to detect manipulation, and even encrypt RPC message data. Obviously, higher authentication levels require more processing. The client can also optionally specify the 

server principal name. A principal is an entity that the RPC security system recognizes. The server must register its SSP-specific principal name with an SSP. An SSP handles the details of performing network communication authentication and encryption, not only for RPC but also for Winsock. Windows includes a number of built-in SSPs, including a Kerberos SSP to implement Kerberos version 5 authentication (including AES support) and Secure Channel (SChannel), which implements Secure Sockets Layer (SSL) and the Transport Layer Security (TLS) protocols. SChannel also supports TLS and SSL extensions, which allow you to use the AES cipher as well as elliptic curve cryptographic (ECC) ciphers on top of the protocols. Also, because it supports an open cryptographic interface (OCI) and crypto-agile capabilities, SChannel allows government organizations to substitute the existing ciphers with more complex combinations. In the absence of a specified SSP, RPC software uses the built-in security of the underlying transport. Some transports, such as named pipes or local RPC, have built-in security. Others, like TCP, do not, and in this case RPC makes unsecure calls in the absence of a specified SSP. Another feature of RPC security is the ability of a server to impersonate the security identity of a client with the RpcImpersonateClient function. After a server has finished performing impersonated operations on behalf of a client, it returns to its own security identity by calling RpcRevertToSelf or RpcRevertToSelfEx. (See Chapter 6 for more information on impersonation.) RPC Implementation RPC implementation is depicted in Figure 12-8, which shows that an RPC-based application links with the RPC run-time DLL (%SystemRoot%\System32\Rpcrt4.dll). The RPC run-time DLL provides marshaling and unmarshaling functions for use by an application’s RPC function stubs as well as functions for sending and receiving marshaled data. 
The RPC run-time DLL includes support routines to handle RPC over a network as well as a form of RPC called local RPC. Local RPC can be used for communication between two processes located on the same system, and the RPC run-time DLL uses the advanced local procedure call (ALPC) facilities in kernel mode as the local networking API. (See Chapter 3 for more information on ALPC.) When RPC is based on nonlocal communication mechanisms, the RPC run-time DLL uses the Winsock or named pipe APIs. 

The RPC subsystem (RPCSS—%SystemRoot%\System32\Rpcss.dll) is implemented as a Windows service. RPCSS is itself an RPC application that communicates with instances of itself on other systems to perform name lookup, registration, and dynamic endpoint mapping. (For clarity, Figure 12-8 doesn’t show RPCSS linked with the RPC run-time DLL.) Windows also includes support for RPC in kernel mode through the kernel-mode RPC driver (%SystemRoot%\System32\Drivers\Msrpc.sys). Kernel-mode RPC is for internal use by the system and is implemented on top of ALPC. Winlogon includes an RPC server with a documented set of interfaces that user-mode RPC clients can access, while Win32k.sys includes an RPC client that communicates with Winlogon for internal notifications, such as the secure attention sequence (SAS; see Chapter 6 for more information). The TCP/IP stack in Windows (as well as the WFP) also uses kernel-mode RPC to communicate with the Network Store Interface (NSI) service, which handles network routing information. 12.2.4 Web Access APIs To ease the development of Internet applications, Windows provides both client and server Internet APIs. By using the APIs, applications can provide and use FTP and HTTP services without knowledge of the intricacies of the corresponding protocols. The client APIs include Windows Internet, also known as WinInet, which enables applications to interact with the FTP and HTTP protocols, and WinHTTP, which enables applications to interact with the HTTP protocol and is more suitable than WinInet in certain situations. HTTP Server is a server-side API that enables the development of Web server applications. WinInet WinInet supports the FTP and HTTP 1.0 and 1.1 protocols. The APIs break down into sub-API sets specific to each protocol. 
Using the FTP-related APIs—such as InternetConnect to connect to an FTP server, FtpFindFirstFile and FtpFindNextFile to enumerate the contents of an FTP directory, and FtpGetFile and FtpPutFile to receive and send files—an application developer avoids the details of establishing a connection and formatting TCP/IP messages to the FTP protocol. The HTTP-related APIs provide similar levels of abstraction, providing cookie persistence, automatic dial-up services, client-side file caching, and automatic credential dialog handling. WinInet is used by core Windows components such as Windows Explorer and Internet Explorer. WinHTTP The current version of the WinHTTP API, 6.0, is available on Windows Vista and Windows Server 2008. It provides an abstraction of the HTTP/1.1 protocol for HTTP client applications similar to what the WinInet HTTP-related APIs provide. However, whereas the WinInet HTTP API is intended for user-interactive client-side applications, the WinHTTP API is designed for server applications that communicate with HTTP servers. Server applications are often implemented as Windows services that do not provide a user interface and so do not desire the dialog boxes that WinInet APIs display. In addition, the WinHTTP APIs are more scalable (such as supporting uploads of greater than 4 GB) and offer security functionality, such as thread 

impersonation, not available from the WinInet API. WinHTTP also provides support for chunked transfer encoding, issuer list retrieval for SSL authentication, client certificate requests, and port range reservations. HTTP Server API Using the HTTP Server API, which Windows Vista and Windows Server 2008 implement, server applications can register to receive HTTP requests for particular URLs, receive HTTP requests, and send HTTP responses. The HTTP Server API includes SSL support so that applications can exchange data over secure HTTP connections. The API includes server-side caching capabilities, synchronous and asynchronous I/O models, and both IPv4 and IPv6 addressing. IIS version 7, the version that ships with Windows Server 2008, uses the HTTP Server API. The HTTP Server API, which applications access through the Httpapi.dll library, relies on the kernel-mode Http.sys driver. Http.sys starts on demand the first time any application on the system calls HttpInitialize. Applications then call HttpCreateServerSession to initialize a server session for the HTTP Server API. Next they use HttpCreateRequestQueue to create a private request queue and HttpCreateUrlGroup to create a URL group, specifying the URLs that they want to handle requests for with HttpAddUrlToUrlGroup. Using the request queues and their registered URLs (which they associate by using HttpSetUrlGroupProperty), Http.sys allows more than one application to service HTTP requests on a given port (port 80 for example), with each servicing HTTP requests to different parts of the URL namespace, as shown in Figure 12-9. HttpReceiveHttpRequest receives incoming requests directed at registered URLs, and HttpSendHttpResponse sends HTTP responses. Both functions offer asynchronous operation so that an application can use GetOverlappedResult or I/O completion ports to determine when an operation is completed. 
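The URL-namespace sharing that lets several applications service one port can be modeled with a short sketch. This is a conceptual illustration, not Http.sys code: the registration table, prefixes, and "request queues" (plain lists here) are all invented for the example.

```python
from typing import Optional

# Conceptual model of Http.sys URL-namespace routing: each application
# registers a URL prefix, and an incoming request is delivered to the
# request queue whose registered prefix is the longest match.

registrations: dict = {}   # URL prefix -> request queue

def add_url(prefix: str, queue: list) -> None:
    registrations[prefix] = queue

def route(url: str) -> Optional[list]:
    matches = [p for p in registrations if url.startswith(p)]
    # Longest-prefix match: the more specific registration wins.
    return registrations[max(matches, key=len)] if matches else None

app1, app2 = [], []                         # two apps sharing port 80
add_url("http://server:80/", app1)
add_url("http://server:80/reports/", app2)

route("http://server:80/reports/q3").append("request A")  # -> app2
route("http://server:80/index.html").append("request B")  # -> app1
```

The real driver performs this demultiplexing in kernel mode against the URL groups registered with HttpAddUrlToUrlGroup, so requests never have to round-trip through a user-mode dispatcher.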
Applications can use Http.sys to cache data in nonpaged physical memory by calling HttpAddFragmentToCache and associating a fragment name (specified as a URL prefix) with the 

cached data. Http.sys invokes the memory manager function MmAllocatePagesForMdlEx to allocate unmapped physical pages (for large requests, Http.sys also attempts to use large pages to optimize access to the buffered data). When Http.sys requires a virtual address mapping for the physical memory described by an entry in the cache—for instance, when it copies data to the cache or sends data from the cache—it uses MmMapLockedPagesSpecifyCache and then MmUnmapLockedPages after it completes its access. Http.sys maintains cached data until an application invalidates it or an optional application-specified timeout associated with the data expires. Http.sys also trims cached data in a worker thread that wakes up when the low memory notification event is signaled. (See Chapter 9 for information on the low memory notification event.) When an application specifies one or more fragment names in a call to HttpSendHttpResponse, Http.sys passes a pointer to the cached data in physical memory to the TCP/IP driver and avoids a copy operation. Http.sys also contains code for performing server-side authentication, including full SSL support, which removes the need to call back to the user-mode API to perform encryption and decryption of traffic. Finally, the HTTP Server API contains many configuration options that clients can use to set functionality, such as authentication policies, bandwidth throttling, logging, connection limits, server state, response caching, and SSL certificate binding. 12.2.5 Named Pipes and Mailslots Named pipes and mailslots are programming APIs for interprocess communication. Named pipes provide for reliable bidirectional communications, whereas mailslots provide unreliable unidirectional data transmission. An advantage of mailslots is that they support broadcast capability. In Windows, both APIs take advantage of Windows security, which allows a server to control precisely which clients can connect to it. 
The names that servers assign to named pipes and mailslots, and that clients use to connect to them, conform to the Windows Universal Naming Convention (UNC), which is a protocol-independent way to identify resources on a Windows network. The implementation of UNC names is described later in the chapter. Named Pipe Operation Named pipe communication consists of a named pipe server and a named pipe client. A named pipe server is an application that creates a named pipe to which clients can connect. A named pipe’s name has the format \\Server\Pipe\PipeName. The Server component of the name specifies the computer on which the named pipe server is executing. (A named pipe server can’t create a named pipe on a remote system.) The name can be a DNS name (for example, mspress.microsoft.com), a NetBIOS name (mspress), or an IP address (131.107.0.1). The Pipe component of the name must be the string “Pipe”, and PipeName is the unique name assigned to a named pipe. The unique portion of the named pipe’s name can include subdirectories; an example of a named pipe name with a subdirectory is \\MyComputer\Pipe\MyServerApp\ConnectionPipe. A named pipe server uses the CreateNamedPipe Windows function to create a named pipe. One of the function’s input parameters is a pointer to the named pipe name, in the form \\.\Pipe\PipeName. The “\\.\” is a Windows-defined alias for “this computer.” Other parameters 

the function accepts include an optional security descriptor that protects access to the named pipe, a flag that specifies whether the pipe should be bidirectional or unidirectional, a value indicating the maximum number of simultaneous connections the pipe supports, and a flag specifying whether the pipe should operate in byte mode or message mode. Most networking APIs operate only in byte mode, which means that a message sent with one send function might require the receiver to perform multiple receive operations, building up the complete message from fragments. A named pipe operating in message mode simplifies the implementation of a receiver because there is a one-to-one correspondence between send and receive requests. A receiver therefore obtains an entire message each time it completes a receive operation and doesn’t have to concern itself with keeping track of message fragments. The first call to CreateNamedPipe for a particular name creates the first instance of that name and establishes the behavior of all named pipe instances having that name. A server creates additional instances, up to the maximum specified in the first call, with additional calls to CreateNamedPipe. After creating at least one named pipe instance, a server executes the ConnectNamedPipe Windows function, which enables the named pipe the server created to establish connections with clients. ConnectNamedPipe can be executed synchronously or asynchronously, and it doesn’t complete until a client establishes a connection with the instance (or an error occurs). A named pipe client uses the Windows CreateFile or CallNamedPipe function, specifying the name of the pipe a server has created, to connect to a server. If the server has performed a ConnectNamedPipe call, the client’s security profile and the access it requests to the pipe (read, write) are validated against the named pipe’s security descriptor. (See Chapter 6 for more information on the security-check algorithms Windows uses.) 
If the client is granted access to a named pipe, it receives a handle representing the client side of a named pipe connection and the server’s call to ConnectNamedPipe completes. After a named pipe connection is established, the client and server can use the ReadFile and WriteFile Windows functions to read from and write to the pipe. Named pipes support both synchronous and asynchronous operations for message transmittal. Figure 12-10 shows a server and client communicating through a named pipe instance. Another characteristic of the named pipe networking API is that it allows a server to impersonate a client by using the ImpersonateNamedPipeClient function. See the “Impersonation” section in Chapter 6 for a discussion of how impersonation is used in client/server applications. A second advanced area of functionality of the named pipe API is that it allows for atomic send and receive operations through the TransactNamedPipe API, which behaves according to a simple transactional model in which a message is both sent and received in the same operation. Mailslot Operation 

Mailslots provide an unreliable unidirectional broadcast mechanism. One example of an application that can use this type of communication is a time-synchronization service, which might broadcast a source time across the domain every few seconds. Receiving the source-time message isn’t crucial for every computer on the network and is therefore a good candidate for the use of mailslots. Like named pipes, mailslots are integrated with the Windows API. A mailslot server creates a mailslot by using the CreateMailslot function. CreateMailslot accepts a name of the form “\\.\Mailslot\MailslotName” as an input parameter. Again like named pipes, a mailslot server can create mailslots only on the machine it’s executing on, and the name it assigns to a mailslot can include subdirectories. CreateMailslot also takes a security descriptor that controls client access to the mailslot. The handles returned by CreateMailslot are overlapped, which means that operations performed on the handles, such as sending and receiving messages, are asynchronous. Because mailslots are unidirectional and unreliable, CreateMailslot doesn’t take many of the parameters that CreateNamedPipe does. After it creates a mailslot, a server simply listens for incoming client messages by executing the ReadFile function on the handle representing the mailslot. Mailslot clients use a naming format similar to that used by named pipe clients but with variations that make it possible to broadcast messages to all the mailslots of a given name within the client’s domain or a specified domain. To send a message to a particular instance of a mailslot, the client calls CreateFile, specifying the computer-specific name. An example of such a name is “\\Server\Mailslot\MailslotName”. (The client can specify “\\.\” to represent the local computer.) 
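The name format just described can be captured in a small helper. This is purely illustrative (it is not a Windows API), and the mailslot and server names are made up for the example.

```python
# Illustrative helper (NOT a Windows API): builds the name a mailslot
# client passes to CreateFile. A scope of "." targets a mailslot on the
# local computer; a server name targets one specific computer.

def mailslot_name(name: str, scope: str = ".") -> str:
    return rf"\\{scope}\Mailslot\{name}"

print(mailslot_name("TimeSync"))            # \\.\Mailslot\TimeSync
print(mailslot_name("TimeSync", "Server"))  # \\Server\Mailslot\TimeSync
```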
If the client wants to obtain a handle representing all the mailslots of a given name on the domain it’s a member of, it specifies the name in the format “\\*\Mailslot\MailslotName”, and if the client wants to broadcast to all the mailslots of a given name within a different domain, the format it uses is “\\DomainName\Mailslot\MailslotName”. After obtaining a handle representing the client side of a mailslot, the client sends messages by calling WriteFile. Because of the way mailslots are implemented, only messages smaller than 424 bytes can be broadcast. If a message is larger than 424 bytes, the mailslot implementation uses a reliable communications mechanism that requires a one-to-one client/server connection, which precludes broadcast capability. Figure 12-11 shows an example of a client broadcasting to multiple mailslot servers within a domain. Named Pipe and Mailslot Implementation 

As evidence of their tight integration with Windows, named pipe and mailslot functions are all implemented in the Kernel32.dll Windows client-side DLL. ReadFile and WriteFile, which are the functions applications use to send and receive messages using named pipes or mailslots, are the primary Windows I/O routines. The CreateFile function, which a client uses to open either a named pipe or a mailslot, is also a standard Windows I/O routine. However, the names specified by named pipe and mailslot applications specify file system namespaces managed by the named pipe file system driver (%SystemRoot%\System32\Drivers\Npfs.sys) and the mailslot file system driver (%SystemRoot%\System32\Drivers\Msfs.sys), as shown in Figure 12-12. The named pipe file system driver creates a device object named \Device\NamedPipe and a symbolic link to that object named \Global??\Pipe. The mailslot file system driver creates a device object named \Device\Mailslot and a symbolic link named \Global??\Mailslot that points to that object. (See Chapter 3 for an explanation of the \Global?? object manager directory.) Names passed to CreateFile of the form \\.\Pipe\… and \\.\Mailslot\… have their prefix of \\.\ translated to \Global??\ so that the names resolve through a symbolic link to a device object. The special functions CreateNamedPipe and CreateMailslot use the corresponding native functions NtCreateNamedPipeFile and NtCreateMailslotFile. Later in the chapter, we’ll discuss how the redirector file system driver is involved when a name that specifies a remote named pipe or mailslot resolves to a remote system. However, when a named pipe or mailslot is created by a server or opened by a client, the appropriate file system driver (FSD) on the machine where the named pipe or mailslot is located is eventually invoked. 
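The prefix translation described above can be sketched as a simple string rewrite. This is a conceptual model of the behavior the text describes, not the actual translation code, and the example names are invented; the resulting \Global??\ name is then resolved by the object manager through the symbolic link to the FSD's device object.

```python
# Sketch of the name translation the text describes: Win32 names of the
# form \\.\Pipe\... and \\.\Mailslot\... have their \\.\ prefix rewritten
# to \Global??\ so they resolve through a symbolic link (\Global??\Pipe
# or \Global??\Mailslot) to the FSD's device object.

def translate_local_name(win32_name: str) -> str:
    prefix = "\\\\.\\"                       # literal \\.\
    if win32_name.startswith(prefix):
        return "\\Global??\\" + win32_name[len(prefix):]
    return win32_name                        # remote names go to the redirector

print(translate_local_name(r"\\.\Pipe\MyServerApp\ConnectionPipe"))
# \Global??\Pipe\MyServerApp\ConnectionPipe
```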
There are several reasons why FSDs in kernel mode implement named pipes and mailslots, the main one being that they integrate with the object manager namespace and can use file objects to represent opened named pipes and mailslots. This integration results in several benefits:
■ The FSDs use kernel-mode security functions to implement standard Windows security for named pipes and mailslots.
■ Applications can use CreateFile to open a named pipe or mailslot because FSDs integrate with the object manager namespace.
■ Applications can use Windows functions such as ReadFile and WriteFile to interact with named pipes and mailslots.
■ The FSDs rely on the object manager to track handle and reference counts for file objects representing named pipes and mailslots. 

■ The FSDs can implement their own named pipe and mailslot namespaces, complete with subdirectories. Because named pipes and mailslots use name resolution that relies on the redirector FSD to communicate across the network, they indirectly rely on the Server Message Block (SMB) protocol. SMB works by using the TCP/IP and TCP/IP with IPv6 protocols, so applications running on systems that have at least one of these in common can use named pipes and mailslots. For more information about SMB, see Chapter 11. EXPERIMENT: Listing the Named Pipe Namespace and Watching Named Pipe Activity It’s not possible to use the Windows API to open the root of the named pipe FSD and perform a directory listing, but you can do this by using native API services. The PipeList tool from Sysinternals shows you the names of the named pipes defined on a computer as well as the number of instances that have been created for a name and the maximum number of instances as defined by a server’s call to CreateNamedPipe. Here’s an example of PipeList output:
1. C:\>pipelist
2. PipeList v1.01
3. by Mark Russinovich
4. http://www.sysinternals.com
5. Pipe Name Instances Max Instances
6. --------- --------- -------------
7. InitShutdown 3 -1
8. lsass 6 -1
9. protected_storage 3 -1
10. ntsvcs 3 -1
11. scerpc 3 -1
12. net\NtControlPipe1 1 1
13. plugplay 3 -1
14. net\NtControlPipe2 1 1
15. Winsock2\CatalogChangeListener-394-0 1 1
16. epmapper 3 -1
17. Winsock2\CatalogChangeListener-25c-0 1 1
18. LSM_API_service 3 -1
19. net\NtControlPipe3 1 1
20. eventlog 3 -1
21. net\NtControlPipe4 1 1
22. Winsock2\CatalogChangeListener-3f8-0 1 1
23. net\NtControlPipe5 1 1
24. net\NtControlPipe6 1 1
25. net\NtControlPipe0 1 1
26. atsvc 3 -1
27. Winsock2\CatalogChangeListener-438-0 1 1
28. Winsock2\CatalogChangeListener-2c8-0 1 1

29. net\NtControlPipe7 1 1
30. net\NtControlPipe8 1 1
31. net\NtControlPipe9 1 1
32. net\NtControlPipe10 1 1
33. net\NtControlPipe11 1 1
34. net\NtControlPipe12 1 1
35. 142CDF96-10CC-483c-A516-3E9057526912 1 1
36. net\NtControlPipe13 1 1
37. net\NtControlPipe14 1 1
38. TSVNCache-000000000001b017 20 -1
39. TSVNCacheCommand-000000000001b017 2 -1
40. Winsock2\CatalogChangeListener-2b0-0 1 1
41. Winsock2\CatalogChangeListener-468-0 1 1
42. TermSrv_API_service 3 -1
43. Ctx_WinStation_API_service 3 -1
44. PIPE_EVENTROOT\CIMV2SCM EVENT PROVIDER 2 -1
45. net\NtControlPipe15 1 1
46. keysvc 3 -1
It’s clear from this output that several system components use named pipes as their communications mechanism. For example, the InitShutdown pipe is created by Winlogon to accept remote shutdown commands, and the Atsvc pipe is created by the Task Scheduler service to enable applications to communicate with it to schedule tasks. You can determine what process has each of these pipes open by using the object search facility in Process Explorer. Note A Max Instances value of –1 means that there is no upper limit on the number of instances. 12.2.6 NetBIOS Until the 1990s, the Network Basic Input/Output System (NetBIOS) programming API had been the most widely used network programming API on PCs. NetBIOS allows for both reliable-connection-oriented and unreliable-connectionless communication. Windows supports NetBIOS for its legacy applications. Microsoft discourages application developers from using NetBIOS because other APIs, such as named pipes and Winsock, are much more flexible and portable. NetBIOS is supported by the TCP/IP protocol on Windows. NetBIOS Names NetBIOS relies on a naming convention whereby computers and network services are assigned a 16-byte name called a NetBIOS name. The sixteenth byte of a NetBIOS name is treated as a modifier that can specify a name as unique or as part of a group. 
Only one instance of a unique NetBIOS name can be assigned to a network, but multiple applications can assign the same group name. A client can broadcast messages by sending them to a group name. 
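The 16-byte layout just described can be sketched with a small helper. This is illustrative only (not a Windows API); the space padding and uppercasing follow common NetBIOS convention, and the suffix values used below are invented for the example.

```python
# Illustrative helper (NOT a Windows API): a NetBIOS name is 16 bytes.
# The first 15 bytes hold the space-padded name, and the 16th byte is
# the modifier that can mark the name as unique or as part of a group.

def netbios_name(name: str, suffix: int = 0x00) -> bytes:
    # Truncate to 15 characters, pad with spaces, append the suffix byte.
    return name.upper()[:15].ljust(15).encode("ascii") + bytes([suffix])

unique = netbios_name("mspress")          # 16 bytes total
group = netbios_name("mspress", 0x1E)     # same name, different suffix
print(len(unique))                        # 16
print(unique[:15].rstrip())               # b'MSPRESS'
```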

To support interoperability with Windows NT 4 systems as well as Windows 9x/Me, Windows automatically defines a NetBIOS name for a domain that includes up to the first 15 bytes of the left-most Domain Name System (DNS) name that an administrator assigns to the domain. For example, if a domain were named mspress.microsoft.com, the NetBIOS name of the domain would be mspress. Similarly, Windows requires an administrator to assign each computer a NetBIOS name at the time of installation. Another concept used by NetBIOS is that of LAN adapter (LANA) numbers. A LANA number is assigned to every NetBIOS-compatible protocol that layers above a network adapter. For example, if a computer has two network adapters and TCP/IP and NWLink can use either adapter, there would be four LANA numbers. LANA numbers are important because a NetBIOS application must explicitly assign its service name to each LANA through which it’s willing to accept client connections. If the application listens for client connections on a particular name, clients can access the name only via protocols on the network adapters for which the name is registered. The “Windows Internet Name Service” section later in this chapter describes the resolution of NetBIOS names to TCP/IP addresses. NetBIOS Operation A NetBIOS server application uses the NetBIOS API to enumerate the LANAs present on a system and assign a NetBIOS name representing the application’s service to each LANA. If the server is connection oriented, it performs a NetBIOS listen command to wait for client connection attempts. After a client is connected, the server executes NetBIOS functions to send and receive data. Connectionless communication is similar, but the server simply reads messages without establishing connections. A connection-oriented client uses NetBIOS functions to establish a connection with a NetBIOS server and then executes further NetBIOS functions to send and receive data. 
An established NetBIOS connection is also known as a session. If the client wants to send connectionless messages, it simply specifies the NetBIOS name of the server with the send function. NetBIOS consists of a number of functions, but they all route through the same interface: Netbios. This routing scheme is the result of a legacy left over from the time when NetBIOS was implemented on MS-DOS as an MS-DOS interrupt service. A NetBIOS application would execute an MS-DOS interrupt and pass a data structure to the NetBIOS implementation that specified every aspect of the command being executed. As a result, the Netbios function in Windows takes a single parameter, which is a data structure that contains the parameters specific to the service the application requests. EXPERIMENT: Using Nbtstat to See NetBIOS Names You can use the Nbtstat command, which is included with Windows, to list the active sessions on a system, the NetBIOS-to-TCP/IP name mappings cached on a computer, and the NetBIOS names defined on a computer. Here’s an example of the Nbtstat command with the –n option, which lists the NetBIOS names defined on the computer: 

