| # @(#)TOUR 5.1 (Berkeley) 3/7/91 |
| # |
| # /b/source/CVS/src/bin/sh/TOUR,v 1.3 1993/03/23 00:27:32 cgd Exp |
| |
| A Tour through Ash |
| |
| Copyright 1989 by Kenneth Almquist. |
| |
| |
| DIRECTORIES: The subdirectory bltin contains commands which can |
| be compiled stand-alone. The rest of the source is in the main |
| ash directory. |
| |
| SOURCE CODE GENERATORS: Files whose names begin with "mk" are |
| programs that generate source code. A complete list of these |
| programs is: |
| |
| program         input files           generates |
| -------         -----------           --------- |
| mkbuiltins      builtins              builtins.h builtins.c |
| mkinit          *.c                   init.c |
| mknodes         nodetypes             nodes.h nodes.c |
| mksignames      -                     signames.h signames.c |
| mksyntax        -                     syntax.h syntax.c |
| mktokens        -                     token.def |
| bltin/mkexpr    unary_op binary_op    operators.h operators.c |
| |
| There are undoubtedly too many of these. Mkinit searches all the |
| C source files for entries looking like: |
| |
| INIT { |
| x = 1; /* executed during initialization */ |
| } |
| |
| RESET { |
| x = 2; /* executed when the shell does a longjmp |
| back to the main command loop */ |
| } |
| |
| SHELLPROC { |
| x = 3; /* executed when the shell runs a shell procedure */ |
| } |
| |
| It pulls this code out into routines which are called when particular |
| events occur. The intent is to improve modularity by isolating |
| the information about which modules need to be explicitly |
| initialized/reset within the modules themselves. |
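As a rough illustration of the idea, the three entries shown above could end up collected into routines like the ones below. This is only a sketch: the routine names init, reset and initshellproc are assumptions, and the real output of mkinit may be laid out differently.

```c
/* Hypothetical shape of a generated init.c for the INIT, RESET and
   SHELLPROC entries shown above. Routine names are assumed. */

static int x;   /* stands in for state owned by some module */

/* Collected INIT blocks: run once during initialization. */
static void init(void) {
    {
        x = 1;
    }
}

/* Collected RESET blocks: run after a longjmp back to the main loop. */
static void reset(void) {
    {
        x = 2;
    }
}

/* Collected SHELLPROC blocks: run when a shell procedure is started. */
static void initshellproc(void) {
    {
        x = 3;
    }
}
```

Each module keeps its own block next to the data it touches, and the generated file is the only place that lists them all.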
| |
| Mkinit recognizes several constructs for placing declarations in |
| the init.c file. |
| INCLUDE "file.h" |
| includes a file. The storage class MKINIT makes a declaration |
| available in the init.c file, for example: |
| MKINIT int funcnest; /* depth of function calls */ |
| MKINIT alone on a line introduces a structure or union declara- |
| tion: |
| MKINIT |
| struct redirtab { |
| short renamed[10]; |
| }; |
| Preprocessor #define statements are copied to init.c without any |
| special action to request this. |
| |
| INDENTATION: The ash source is indented in multiples of six |
| spaces. The only study that I have heard of on the subject con- |
| cluded that the optimal amount to indent is in the range of four |
| to six spaces. I use six spaces since it is not too big a jump |
| from the widely used eight spaces. If you really hate six space |
| indentation, use the adjind (source included) program to change |
| it to something else. |
| |
| EXCEPTIONS: Code for dealing with exceptions appears in |
| exceptions.c. The C language doesn't include exception handling, |
| so I implement it using setjmp and longjmp. The global variable |
| exception contains the type of exception. EXERROR is raised by |
| calling error. EXINT is an interrupt. EXSHELLPROC is an excep- |
| tion which is raised when a shell procedure is invoked. The pur- |
| pose of EXSHELLPROC is to perform the cleanup actions associated |
| with other exceptions. After these cleanup actions, the shell |
| can interpret a shell procedure itself without exec'ing a new |
| copy of the shell. |
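The setjmp/longjmp technique can be sketched as follows. This is a minimal illustration, not the ash code: the names handler, raise_exception and run_protected are hypothetical, and the numeric values of the exception types are assumed.

```c
#include <setjmp.h>

/* Exception types named in the text; the numeric values are assumed. */
enum { EXINT = 0, EXERROR = 1, EXSHELLPROC = 2 };

static jmp_buf handler;   /* hypothetical: where a raised exception lands */
static int exception;     /* type of the most recent exception */

/* Raise an exception: record its type, then unwind to the handler. */
static void raise_exception(int type) {
    exception = type;
    longjmp(handler, 1);
}

/* Run fn under a handler. Returns -1 if fn returned normally,
   otherwise the type of the exception that was raised. */
static int run_protected(void (*fn)(void)) {
    if (setjmp(handler) != 0)
        return exception;   /* we arrived here through longjmp */
    fn();
    return -1;
}

/* Two sample callees, one that raises and one that does not. */
static void fails(void) { raise_exception(EXERROR); }
static void succeeds(void) { }
```

With these definitions, run_protected(fails) yields EXERROR and run_protected(succeeds) yields -1.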
| |
| INTERRUPTS: In an interactive shell, an interrupt will cause an |
| EXINT exception to return to the main command loop. (Exception: |
| EXINT is not raised if the user traps interrupts using the trap |
| command.) The INTOFF and INTON macros (defined in exception.h) |
| provide uninterruptible critical sections. Between the execution |
| of INTOFF and the execution of INTON, interrupt signals will be |
| held for later delivery. INTOFF and INTON can be nested. |
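One way to get nestable critical sections is a depth counter plus a pending flag. The sketch below assumes that approach; the variable names and the deliver routine are hypothetical, and a counter stands in for raising EXINT.

```c
static int suppressint;   /* nesting depth of INTOFF (name assumed) */
static int intpending;    /* an interrupt arrived while held */
static int delivered;     /* interrupts actually acted upon */

/* Stand-in for raising EXINT; the real shell would longjmp here. */
static void deliver(void) {
    intpending = 0;
    delivered++;
}

#define INTOFF (suppressint++)
#define INTON  do { \
        if (--suppressint == 0 && intpending) \
            deliver(); \
    } while (0)

/* Stand-in for the interrupt signal handler. */
static void on_interrupt(void) {
    if (suppressint > 0)
        intpending = 1;   /* hold for later delivery */
    else
        deliver();
}
```

Only the outermost INTON delivers a held interrupt, which is what makes nesting safe.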
| |
| MEMALLOC.C: Memalloc.c defines versions of malloc and realloc |
| which call error when there is no memory left. It also defines a |
| stack oriented memory allocation scheme. Allocating off a stack |
| is probably more efficient than allocation using malloc, but the |
| big advantage is that when an exception occurs all we have to do |
| to free up the memory in use at the time of the exception is to |
| restore the stack pointer. The stack is implemented using a |
| linked list of blocks. |
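The following sketch shows a stack built from a linked list of blocks, with a mark that can be restored later. The names stack_alloc, stack_mark and stack_release and the block size are hypothetical, and unlike memalloc.c this version returns NULL instead of calling error.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCKSIZE 512   /* assumed block size */

struct block {
    struct block *prev;   /* older block, or NULL */
    size_t used;          /* bytes handed out from space */
    char space[BLOCKSIZE];
};

struct mark {
    struct block *blk;
    size_t used;
};

static struct block *top;   /* newest block */

/* Allocate n bytes from the top block, adding a block if needed. */
static void *stack_alloc(size_t n) {
    /* round up to pointer size to keep later allocations aligned */
    n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (n > BLOCKSIZE)
        return NULL;   /* sketch: oversized requests not handled */
    if (top == NULL || top->used + n > BLOCKSIZE) {
        struct block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->prev = top;
        b->used = 0;
        top = b;
    }
    void *p = top->space + top->used;
    top->used += n;
    return p;
}

/* Remember the current stack position. */
static struct mark stack_mark(void) {
    struct mark m = { top, top ? top->used : 0 };
    return m;
}

/* Free everything allocated since the mark was taken. */
static void stack_release(struct mark m) {
    while (top != m.blk) {
        struct block *b = top;
        top = b->prev;
        free(b);
    }
    if (top != NULL)
        top->used = m.used;
}
```

An exception handler only needs to call stack_release with a mark taken earlier; no individual object has to be tracked.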
| |
| STPUTC: If the stack were contiguous, it would be easy to store |
| strings on the stack without knowing in advance how long the |
| string was going to be: |
| p = stackptr; |
| *p++ = c; /* repeated as many times as needed */ |
| stackptr = p; |
| The following three macros (defined in memalloc.h) perform these |
| operations, but grow the stack if you run off the end: |
| STARTSTACKSTR(p); |
| STPUTC(c, p); /* repeated as many times as needed */ |
| grabstackstr(p); |
| |
| We now start a top-down look at the code: |
| |
| MAIN.C: The main routine performs some initialization, executes |
| the user's profile if necessary, and calls cmdloop. Cmdloop |
| repeatedly parses and executes commands. |
| |
| OPTIONS.C: This file contains the option processing code. It is |
| called from main to parse the shell arguments when the shell is |
| invoked, and it also contains the set builtin. The -i and -j op- |
| tions (the latter turns on job control) require changes in signal |
| handling. The routines setjobctl (in jobs.c) and setinteractive |
| (in trap.c) are called to handle changes to these options. |
| |
| PARSING: The parser code is all in parser.c. A recursive des- |
| cent parser is used. Syntax tables (generated by mksyntax) are |
| used to classify characters during lexical analysis. There are |
| three tables: one for normal use, one for use when inside single |
| quotes, and one for use when inside double quotes. The tables |
| are machine dependent because they are indexed by character vari- |
| ables and the range of a char varies from machine to machine. |
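The machine dependence comes from indexing a table with a char that may be negative. A sketch of one way to handle it, assuming an offset equal to -CHAR_MIN, is given below. The class names, the SYNBASE macro and the table contents are assumptions, and only two of the three tables are shown.

```c
#include <limits.h>

/* Character classes; names and the set of classes are assumed. */
enum { CWORD, CNL, CSQUOTE, CVAR };

#define SYNBASE (-CHAR_MIN)               /* shifts negative chars to >= 0 */
#define TABSIZE (CHAR_MAX - CHAR_MIN + 1) /* one slot per char value */

static char basesyntax[TABSIZE];   /* normal use */
static char sqsyntax[TABSIZE];     /* inside single quotes */

/* Fill the tables; mksyntax would emit them as initialized arrays. */
static void initsyntax(void) {
    for (int i = 0; i < TABSIZE; i++)
        basesyntax[i] = sqsyntax[i] = CWORD;
    basesyntax[SYNBASE + '\n'] = CNL;
    basesyntax[SYNBASE + '\''] = CSQUOTE;
    basesyntax[SYNBASE + '$'] = CVAR;
    sqsyntax[SYNBASE + '\n'] = CNL;
    sqsyntax[SYNBASE + '\''] = CSQUOTE;   /* '$' stays an ordinary char */
}

/* Look up the class of c in the given table. */
static int classify(const char *table, char c) {
    return table[SYNBASE + c];
}
```

Because the table size and offset depend on CHAR_MIN and CHAR_MAX, a table generated on one machine is not necessarily valid on another.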
| |
| PARSE OUTPUT: The output of the parser consists of a tree of |
| nodes. The various types of nodes are defined in the file node- |
| types. |
| |
| Nodes of type NARG are used to represent both words and the con- |
| tents of here documents. An early version of ash kept the con- |
| tents of here documents in temporary files, but keeping here do- |
| cuments in memory typically results in significantly better per- |
| formance. It would have been nice to make it an option to use |
| temporary files for here documents, for the benefit of small |
| machines, but the code to keep track of when to delete the tem- |
| porary files was complex and I never fixed all the bugs in it. |
| (AT&T has been maintaining the Bourne shell for more than ten |
| years, and to the best of my knowledge they still haven't gotten |
| it to handle temporary files correctly in obscure cases.) |
| |
| The text field of a NARG structure points to the text of the |
| word. The text consists of ordinary characters and a number of |
| special codes defined in parser.h. The special codes are: |
| |
| CTLVAR Variable substitution |
| CTLENDVAR End of variable substitution |
| CTLBACKQ Command substitution |
| CTLBACKQ|CTLQUOTE Command substitution inside double quotes |
| CTLESC Escape next character |
| |
| A variable substitution contains the following elements: |
| |
| CTLVAR type name '=' [ alternative-text CTLENDVAR ] |
| |
| The type field is a single character specifying the type of sub- |
| stitution. The possible types are: |
| |
| VSNORMAL $var |
| VSMINUS ${var-text} |
| VSMINUS|VSNUL ${var:-text} |
| VSPLUS ${var+text} |
| VSPLUS|VSNUL ${var:+text} |
| VSQUESTION ${var?text} |
| VSQUESTION|VSNUL ${var:?text} |
| VSASSIGN ${var=text} |
| VSASSIGN|VSNUL ${var:=text} |
| |
| In addition, the type field will have the VSQUOTE flag set if the |
| variable is enclosed in double quotes. The name of the variable |
| comes next, terminated by an equals sign. If the type is not |
| VSNORMAL, then the text field in the substitution follows, ter- |
| minated by a CTLENDVAR byte. |
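To make the layout concrete, here is a small decoder for one encoded substitution. It is a sketch under stated assumptions: the numeric values of the CTL and VS constants are guesses rather than copies of parser.h, struct varsub and decode_varsub are hypothetical, and nested substitutions or CTLESC bytes inside the alternative text are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values; see parser.h for the real definitions. */
#define CTLVAR    '\202'
#define CTLENDVAR '\203'
#define VSTYPE    07     /* mask for the substitution type */
#define VSNUL     040    /* the ':' variants */
#define VSQUOTE   0100   /* inside double quotes */
#define VSNORMAL  1
#define VSMINUS   2

struct varsub {
    int subtype;        /* VSNORMAL, VSMINUS, ... */
    int nulflag;        /* nonzero if VSNUL was set */
    int quoted;         /* nonzero if VSQUOTE was set */
    const char *name;   /* not NUL terminated; see namelen */
    size_t namelen;
    const char *alt;    /* alternative text, or NULL */
    size_t altlen;
};

/* Decode the substitution starting at s. Returns a pointer just past
   it, or NULL if s is not a well formed substitution. */
static const char *decode_varsub(const char *s, struct varsub *v) {
    if (*s != CTLVAR)
        return NULL;
    int type = (unsigned char)s[1];
    v->subtype = type & VSTYPE;
    v->nulflag = (type & VSNUL) != 0;
    v->quoted = (type & VSQUOTE) != 0;

    const char *eq = strchr(s + 2, '=');   /* name ends at '=' */
    if (eq == NULL)
        return NULL;
    v->name = s + 2;
    v->namelen = (size_t)(eq - (s + 2));
    v->alt = NULL;
    v->altlen = 0;
    s = eq + 1;

    if (v->subtype != VSNORMAL) {          /* text up to CTLENDVAR */
        const char *end = strchr(s, CTLENDVAR);
        if (end == NULL)
            return NULL;
        v->alt = s;
        v->altlen = (size_t)(end - s);
        s = end + 1;
    }
    return s;
}
```

For example, ${HOME:-/tmp} would be stored as CTLVAR, the byte VSMINUS|VSNUL, "HOME=", "/tmp" and CTLENDVAR.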
| |
| Commands in back quotes are parsed and stored in a linked list. |
| The locations of these commands in the string are indicated by |
| CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether |
| the back quotes were enclosed in double quotes. |
| |
| The character CTLESC escapes the next character, so that in case |
| any of the CTL characters mentioned above appear in the input, |
| they can be passed through transparently. CTLESC is also used to |
| escape '*', '?', '[', and '!' characters which were quoted by the |
| user and thus should not be used for file name generation. |
| |
| CTLESC characters have proved to be particularly tricky to get |
| right. In the case of here documents which are not subject to |
| variable and command substitution, the parser doesn't insert any |
| CTLESC characters to begin with (so the contents of the text |
| field can be written without any processing). Other here docu- |
| ments, and words which are not subject to splitting and file name |
| generation, have the CTLESC characters removed during the |
| variable and command substitution phase. Words which are subject |
| to splitting and file name generation have the CTLESC characters |
| removed as part of the file name generation phase. |
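Removing the escapes is a single in-place pass. The sketch below assumes a value for CTLESC and uses a hypothetical routine name; it is meant to show the rule (drop the escape, keep the byte it protects), not to reproduce the ash routine.

```c
#define CTLESC '\201'   /* assumed value; see parser.h */

/* Remove CTLESC bytes in place, keeping the byte each one protects. */
static void remove_escapes(char *s) {
    char *out = s;
    while (*s != '\0') {
        if (*s == CTLESC && s[1] != '\0')
            s++;          /* skip the escape itself */
        *out++ = *s++;    /* copy the (possibly protected) byte */
    }
    *out = '\0';
}
```

Note that an escaped CTLESC survives as a single literal CTLESC byte, which is how control bytes in the input pass through transparently.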
| |
| EXECUTION: Command execution is handled by the following files: |
| eval.c The top level routines. |
| redir.c Code to handle redirection of input and output. |
| jobs.c Code to handle forking, waiting, and job control. |
| exec.c Code to do path searches and the actual exec sys call. |
| expand.c Code to evaluate arguments. |
| var.c Maintains the variable symbol table. Called from expand.c. |
| |
| EVAL.C: Evaltree recursively executes a parse tree. The exit |
| status is returned in the global variable exitstatus. The alter- |
| native entry evalbackcmd is called to evaluate commands in back |
| quotes. It saves the result in memory if the command is a buil- |
| tin; otherwise it forks off a child to execute the command and |
| connects the standard output of the child to a pipe. |
| |
| JOBS.C: To create a process, you call makejob to return a job |
| structure, and then call forkshell (passing the job structure as |
| an argument) to create the process. Waitforjob waits for a job |
| to complete. These routines take care of process groups if job |
| control is defined. |
| |
| REDIR.C: Ash allows file descriptors to be redirected and then |
| restored without forking off a child process. This is |
| accomplished by duplicating the original file descriptors. The |
| redirtab structure records where the file descriptors have been |
| duplicated to. |
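The save-and-restore idea can be sketched with the POSIX calls fcntl, dup2 and close. The function names are hypothetical, and parking the saved copy at descriptor 10 or above is an assumption made for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Redirect fd to the file at path. Returns a duplicate of the original
   descriptor for later restoration, or -1 on failure. */
static int redirect_save(int fd, const char *path) {
    int saved = fcntl(fd, F_DUPFD, 10);   /* copy lands at fd >= 10 */
    if (saved < 0)
        return -1;
    int newfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (newfd < 0) {
        close(saved);
        return -1;
    }
    if (newfd != fd) {
        dup2(newfd, fd);   /* fd now refers to the file */
        close(newfd);
    }
    return saved;
}

/* Undo redirect_save: put the original descriptor back on fd. */
static int redirect_restore(int fd, int saved) {
    if (dup2(saved, fd) < 0)
        return -1;
    close(saved);
    return 0;
}
```

A table such as redirtab would hold the value returned by redirect_save for each redirected descriptor.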
| |
| EXEC.C: The routine find_command locates a command, and enters |
| the command in the hash table if it is not already there. The |
| third argument specifies whether it is to print an error message |
| if the command is not found. (When a pipeline is set up, |
| find_command is called for all the commands in the pipeline before |
| any forking is done, so as to get the commands into the hash |
| table of the parent process. But to make command hashing as |
| transparent as possible, we silently ignore errors at that point |
| and only print error messages if the command cannot be found |
| later.) |
| |
| The routine shellexec is the interface to the exec system call. |
| |
| EXPAND.C: Arguments are processed in three passes. The first |
| (performed by the routine argstr) performs variable and command |
| substitution. The second (ifsbreakup) performs word splitting |
| and the third (expandmeta) performs file name generation. If the |
| "/u" directory is simulated, then when "/u/username" is replaced |
| by the user's home directory, the flag "didudir" is set. This |
| tells the cd command that it should print out the directory name, |
| just as it would if the "/u" directory were implemented using |
| symbolic links. |
| |
| VAR.C: Variables are stored in a hash table. Probably we should |
| switch to extensible hashing. The variable name is stored in the |
| same string as the value (using the format "name=value") so that |
| no string copying is needed to create the environment of a com- |
| mand. Variables which the shell references internally are preal- |
| located so that the shell can reference the values of these vari- |
| ables without doing a lookup. |
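The benefit of the "name=value" format is easiest to see in code. In the sketch below the table size, the hash function and all routine names are hypothetical; the point is that the environment vector is just a list of pointers to strings already held in the table.

```c
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39   /* assumed table size */

struct var {
    struct var *next;
    char *text;       /* "name=value", owned by the caller */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part only (up to '=' or end of string). */
static unsigned hashname(const char *s) {
    unsigned h = 0;
    while (*s != '\0' && *s != '=')
        h = h * 31 + (unsigned char)*s++;
    return h % VTABSIZE;
}

/* Compare the name parts of two "name" or "name=value" strings. */
static int samename(const char *a, const char *b) {
    while (*a == *b && *a != '=' && *a != '\0') {
        a++;
        b++;
    }
    return (*a == '=' || *a == '\0') && (*b == '=' || *b == '\0');
}

/* Enter text ("name=value") in the table, replacing any old value. */
static int setvar(char *text) {
    struct var **vp = &vartab[hashname(text)];
    for (; *vp != NULL; vp = &(*vp)->next) {
        if (samename((*vp)->text, text)) {
            (*vp)->text = text;
            return 0;
        }
    }
    struct var *v = malloc(sizeof *v);
    if (v == NULL)
        return -1;
    v->text = text;
    v->next = NULL;
    *vp = v;
    return 0;
}

/* Return the value of name, or NULL if it is not set. */
static const char *lookupvar(const char *name) {
    for (struct var *v = vartab[hashname(name)]; v != NULL; v = v->next)
        if (samename(v->text, name))
            return strchr(v->text, '=') + 1;
    return NULL;
}

/* Fill envp (room for max pointers, max >= 1) without copying. */
static size_t environment(char **envp, size_t max) {
    size_t n = 0;
    for (int i = 0; i < VTABSIZE; i++)
        for (struct var *v = vartab[i]; v != NULL; v = v->next)
            if (n + 1 < max)
                envp[n++] = v->text;
    envp[n] = NULL;
    return n;
}
```

Entering the same name twice replaces the earlier string, which is also why putting temporary assignments in the table strips duplicates.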
| |
| When a program is run, the code in eval.c sticks any environment |
| variables which precede the command (as in "PATH=xxx command") in |
| the variable table as the simplest way to strip duplicates, and |
| then calls "environment" to get the value of the environment. |
| There are two consequences of this. First, if an assignment to |
| PATH precedes the command, the value of PATH before the assign- |
| ment must be remembered and passed to shellexec. Second, if the |
| program turns out to be a shell procedure, the strings from the |
| environment variables which preceded the command must be pulled |
| out of the table and replaced with strings obtained from malloc, |
| since the former will automatically be freed when the stack (see |
| the entry on memalloc.c) is emptied. |
| |
| BUILTIN COMMANDS: The procedures for handling these are scat- |
| tered throughout the code, depending on which location appears |
| most appropriate. They can be recognized because their names al- |
| ways end in "cmd". The mapping from names to procedures is |
| specified in the file builtins, which is processed by the mkbuil- |
| tins command. |
| |
| A builtin command is invoked with argc and argv set up like a |
| normal program. A builtin command is allowed to overwrite its |
| arguments. Builtin routines can call nextopt to do option pars- |
| ing. This is kind of like getopt, but you don't pass argc and |
| argv to it. Builtin routines can also call error. This routine |
| normally terminates the shell (or returns to the main command |
| loop if the shell is interactive), but when called from a builtin |
| command it causes the builtin command to terminate with an exit |
| status of 2. |
| |
| The directory bltin contains commands which can be compiled |
| independently but can also be built into the shell for efficiency |
| reasons. The makefile in this directory compiles these programs |
| in the normal fashion (so that they can be run regardless of |
| whether the invoker is ash), but also creates a library named |
| bltinlib.a which can be linked with ash. The header file bltin.h |
| takes care of most of the differences between the ash and the |
| stand-alone environment. The user should call the main routine |
| "main", and #define main to be the name of the routine to use |
| when the program is linked into ash. This #define should appear |
| before bltin.h is included; bltin.h will #undef main if the pro- |
| gram is to be compiled stand-alone. |
| |
| CD.C: This file defines the cd and pwd builtins. The pwd com- |
| mand runs /bin/pwd the first time it is invoked (unless the user |
| has already done a cd to an absolute pathname), but then |
| remembers the current directory and updates it when the cd com- |
| mand is run, so subsequent pwd commands run very fast. The main |
| complication in the cd command is in the docd command, which |
| resolves symbolic links into actual names and informs the user |
| where the user ended up if he crossed a symbolic link. |
| |
| SIGNALS: Trap.c implements the trap command. The routine set- |
| signal figures out what action should be taken when a signal is |
| received and invokes the signal system call to set the signal ac- |
| tion appropriately. When a signal that a user has set a trap for |
| is caught, the routine "onsig" sets a flag. The routine dotrap |
| is called at appropriate points to actually handle the signal. |
| When an interrupt is caught and no trap has been set for that |
| signal, the routine "onint" in error.c is called. |
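The split between onsig and dotrap is the usual "set a flag in the handler, act on it later" pattern. The sketch below assumes flag arrays named gotsig and pendingsigs and a MAXSIG limit, and it counts handled signals instead of running a trap action.

```c
#include <signal.h>

#define MAXSIG 64   /* assumed upper bound on signal numbers */

static volatile sig_atomic_t gotsig[MAXSIG + 1];   /* per-signal flags */
static volatile sig_atomic_t pendingsigs;          /* any flag set? */
static int handled[MAXSIG + 1];   /* stand-in for running trap actions */

/* Signal handler: only record that the signal arrived. */
static void onsig(int signo) {
    if (signo > 0 && signo <= MAXSIG)
        gotsig[signo] = 1;
    pendingsigs = 1;
}

/* Called at safe points: act on every signal recorded so far. */
static void dotrap(void) {
    while (pendingsigs) {
        pendingsigs = 0;
        for (int i = 1; i <= MAXSIG; i++) {
            if (gotsig[i]) {
                gotsig[i] = 0;
                handled[i]++;   /* the shell would run the trap here */
            }
        }
    }
}

/* Arrange for signo to be caught by onsig. */
static int settrap(int signo) {
    return signal(signo, onsig) == SIG_ERR ? -1 : 0;
}
```

Keeping the handler this small means the trap action itself never runs in signal context.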
| |
| OUTPUT: Ash uses its own output routines. There are three output |
| structures allocated. "Output" represents the standard output, |
| "errout" the standard error, and "memout" contains output |
| which is to be stored in memory. This last is used when a buil- |
| tin command appears in backquotes, to allow its output to be col- |
| lected without doing any I/O through the UNIX operating system. |
| The variables out1 and out2 normally point to output and errout, |
| respectively, but they are set to point to memout when appropri- |
| ate inside backquotes. |
| |
| INPUT: The basic input routine is pgetc, which reads from the |
| current input file. There is a stack of input files; the current |
| input file is the top file on this stack. The code allows the |
| input to come from a string rather than a file. (This is for the |
| -c option and the "." and eval builtin commands.) The global |
| variable plinno is saved and restored when files are pushed and |
| popped from the stack. The parser routines store the number of |
| the current line in this variable. |
| |
| DEBUGGING: If DEBUG is defined in shell.h, then the shell will |
| write debugging information to the file $HOME/trace. Most of |
| this is done using the TRACE macro, which takes a set of printf |
| arguments inside two sets of parentheses. Example: |
| "TRACE(("n=%d\n", n))". The double parentheses are necessary |
| because the preprocessor can't handle functions with a variable |
| number of arguments. Defining DEBUG also causes the shell to |
| generate a core dump if it is sent a quit signal. The tracing |
| code is in show.c. |
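The double-parenthesis trick works because the macro takes one argument, the whole parenthesized list, and pastes it after a function name. The sketch below assumes a trace function built on stdarg and a tracefile variable; the real code in show.c may differ, and DEBUG is defined here only to keep the example self-contained.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1   /* normally set in shell.h; defined here for the demo */

static FILE *tracefile;   /* assumed: opened on $HOME/trace by the shell */

/* Write one formatted message to the trace file, if it is open. */
static void trace(const char *fmt, ...) {
    va_list ap;
    if (tracefile == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);
}

#ifdef DEBUG
#define TRACE(param) trace param   /* TRACE(("x=%d", x)) -> trace ("x=%d", x) */
#else
#define TRACE(param) ((void)0)     /* compiles away entirely */
#endif
```

When DEBUG is not defined, the argument list is discarded by the preprocessor, so tracing costs nothing in a production build.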