Useful Advanced Topics

I really wanted this tutorial to be five parts. Five is a clean and handy number, but I discovered while writing the examples that there were a lot of tremendously handy features that needed to be introduced beyond the most easily recognizable ones. Also, I'll admit that the section on non-local exits probably belongs in the Control Structures unit, but, since one can write a fully functional program without it, I moved it here.


Eval is a strange beast. It's usually used in conjunction with read-from-string, as in the example below (where it's useful). By itself, you can think of it as the little cousin of eval-buffer or eval-last-sexp (think C-x C-e): it evaluates the form you hand it and returns the result. On its own, it doesn't look like it does much. Go ahead and try this:

(eval '(eval '(eval '(message "Hello"))))

When you execute the buffer, you'll get only one single "Hello", but eval is far from useless...
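Here is a minimal sketch of why the nesting collapses to a single "Hello": a quoted list is only data, and each eval evaluates one layer of it. The variable my-form is a made-up name for illustration; everything else is built in.

```elisp
;; A quoted list is just data until something evaluates it.
(setq my-form '(+ 1 2))   ; my-form is a hypothetical variable name

(car my-form)             ; returns the symbol +; nothing has been added yet
(eval my-form)            ; evaluates the list as code and returns 3

;; Nesting works the same way: each eval evaluates one layer.
(eval '(eval '(+ 1 2)))   ; also returns 3
```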

"REPL" stands for Read Evaluate Print Loop. It'll simply take some input, execute it, report the output of said execution, and then come back for more. Let's make a REPL environment in Emacs (though one already exists with M-x ielm):

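(defun repl-eval ()
  "Does the back-breaking work of EVALing the current line."
  (interactive) ; needed so the function can be bound to a key
  (let ( (input "") )
    (backward-to-indentation 0)
    ; Skip the two-character "> " prompt and grab the rest of the buffer.
    (setq input (buffer-substring (+ (point) 2) (+ (buffer-size) 1)) )
    (goto-char (point-max))
    (insert "\n")
    ; insert only takes strings and characters, so print the result first.
    (insert (format "%S" (eval (car (read-from-string input ) ) )))
    (insert "\n> ")))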

(defun repl-mode ()
  "Start Elisp REPL session."
  (interactive) ; lets you start the session with M-x repl-mode
  (switch-to-buffer (read-from-minibuffer "Buffer name: "))
  (local-set-key (kbd "RET") 'repl-eval)
  (insert "ELISP REPL ****\n\n> "))

If you want to execute more than one command on a single line in a REPL, remember the magic of progn, which evaluates, in turn, any number of arguments. It was introduced in part 4.

This is powerful sauce. read-from-string and eval, when used together, allow metaprogramming in the strictest sense. Theoretically, one could create a defun that spewed out realistic Elisp that could get thrown through this pair of functions through iterations until the function defined something somewhat autonomous. This is where artificial intelligence becomes achievable: a program that writes itself on the fly.
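To make the pairing concrete, here is a small sketch of read-from-string and eval working together, including a function that writes Elisp as text and then runs it. The helper my-make-adder-source and the generated my-add-5 are made-up names for illustration, not part of Emacs.

```elisp
;; read-from-string returns a cons cell: (OBJECT . INDEX-AFTER-OBJECT).
(read-from-string "(+ 1 2) leftover")        ; returns ((+ 1 2) . 7)

;; car extracts the object, and eval runs it.
(eval (car (read-from-string "(+ 1 2)")))    ; returns 3

;; A function that writes code as text.
;; my-make-adder-source is a hypothetical helper.
(defun my-make-adder-source (n)
  "Return Elisp source text defining a function my-add-N that adds N."
  (format "(defun my-add-%d (x) (+ x %d))" n n))

;; Read the generated text and evaluate it, which defines my-add-5.
(eval (car (read-from-string (my-make-adder-source 5))))
(my-add-5 10)                                ; returns 15
```

This is also why repl-eval wraps read-from-string in car: the cdr is only the index where reading stopped.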

Regular Expressions

Lots of programming languages support "regular expressions". I mentioned them in an example in part 5. If you press M-%, you can replace one dumb string with another, but suppose you want something a bit more abstract? Say you didn't just want to match "C6H12O6", but any run of one letter followed by any number of digits? Regular expressions (abbreviated "regexps" by Emacs) can do things like this. Most of Elisp's regexp functions work on a buffer rather than on a string, though. A temporary buffer doesn't bother the user, or even blink on screen while it's active. There is a single function to replace a regular expression inside a string, but the escaping required on that is ludicrous. Simply use with-temp-buffer when you need to regexp search and replace. The main functions here are re-search-forward, match-string, and replace-match.
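As a rough sketch of that with-temp-buffer pattern, here is the "one letter followed by digits" idea applied to the glucose formula. The function name my-bracket-formula-parts is made up for this example, and wrapping each match in square brackets is just an arbitrary replacement so you can see what matched.

```elisp
(defun my-bracket-formula-parts (text)
  "Return TEXT with every letter-plus-digits run wrapped in square brackets."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward "[A-Za-z][0-9]+" nil t)
      ;; The two t arguments keep the case as typed and treat the
      ;; replacement text literally.
      (replace-match (concat "[" (match-string 0) "]") t t))
    (buffer-string)))

(my-bracket-formula-parts "C6H12O6 is glucose")   ; returns "[C6][H12][O6] is glucose"
```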

Regexps are a sort of language within Elisp in much the same way format (section 2) was. In fact, where format describes the shape of output, regexps describe the shape of input. They do, however, have a really funny syntax. Let's start with a simple US phone number validation function: we want to see that the string passed to the function matches "(XXX) XXX-XXXX" where the "X"es are digits. Let's do this with re-search-forward.

(defun is-phone (phone)
  (with-temp-buffer
    (insert phone)
    (goto-char (point-min))
    (cond
      ((re-search-forward "([0-9]\\{3\\}) [0-9]\\{3\\}-[0-9]\\{4\\}" nil t)
        (message "Yes."))
      (t (message "No.")))))
(is-phone "(410) 867-5309") ; messages "Yes".

Oh dear... What happened? with-temp-buffer creates a new buffer and insert does the familiar insertion. goto-char is used to move the point to the beginning of the buffer, then re-search-forward does the real work here. The second and third parameters are kind-of required. The second limits the search and that's useless for most cases, so we set it to nil. The third argument is really important: without it, if this expression doesn't find a match, an error will get thrown and evaluation will stop.
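Here is a small sketch of what that third argument changes. Both helpers below (my-quiet-search and my-loud-search) are made-up names; the second one traps the error with condition-case just so you can see which error gets signaled.

```elisp
(defun my-quiet-search (text)
  "Search TEXT for digits with the third argument set to t."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (re-search-forward "[0-9]+" nil t)))

(defun my-loud-search (text)
  "Search TEXT for digits without the third argument, trapping the error."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (condition-case err
        (re-search-forward "[0-9]+")
      ;; A failed search signals search-failed; return the error symbol.
      (search-failed (car err)))))

(my-quiet-search "no digits here")   ; returns nil
(my-loud-search "no digits here")    ; returns the symbol search-failed
(my-quiet-search "room 42")          ; returns 8, the position just after the match
```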

Regular expressions can contain special strings that match a source string in a general way. In the example, it doesn't matter what the numbers are, just as long as they're numbers. So, with that said, these special strings are built from these magical elements:

\( [ \{ . * + - ? \} ] \)— Explained thusly:

So, the regular expression was: ([0-9]\{3\}) [0-9]\{3\}-[0-9]\{4\}. Extra backslashes had to be added inside the Elisp string ("escaping" from section 2). The 0-9 part defines a range inside a "character class". We could have just as reasonably said [a-z] or [A-Z] to specify a range. For individual characters, just type them in by themselves without brackets. For a literal dash inside a character class, put it first or last in the class ([0-9-]); a backslash isn't special inside the brackets. If you want to negate a set, use "^" at the beginning ([^a-z]). Things do get a little special. Square brackets aren't escaped, but the braces ({}) are. Also, escaped parentheses have special meaning.

The braces specify how many times to match that character class. \{4\} will match exactly four of whatever is in the brackets to its left.
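A few one-liners to poke at character classes and counts, using the built-in string-match-p so no buffer is needed. The sample strings are invented for illustration.

```elisp
;; string-match-p returns the index where the match starts, or nil.
(string-match-p "[0-9]\\{4\\}" "call 5309 now")    ; returns 5
(string-match-p "[0-9]\\{4\\}" "call 530 now")     ; returns nil

;; A leading ^ negates the class: find the first character that is
;; not a lower-case letter.
(string-match-p "[^a-z]" "abc7def")                ; returns 3

;; A dash placed last in the class is a literal dash.
(string-match-p "[0-9-]+$" "ext 867-5309")         ; returns 4
```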

Let me explain grouping with a real-world example that I used for writing this page, which began with M-x query-replace-regexp (which I have bound to M-^). I supplied "\([^"]*\)" and <\1> to replace things that I put in quotes with HTML italics. Grouping is useful for replacement or for isolating a particular part of a string. Whatever gets enclosed in escaped parentheses ( \( ... \) ) can be retrieved with either a backslash followed by the number of the parenthesized group (counting from one) that you're trying to retrieve, or the match-string function.
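Here is a minimal sketch of grouping with match-string, reusing the phone number pattern with three groups added. The function name my-split-phone is made up; it uses string-match on a string directly, in which case match-string needs that same string as its second argument.

```elisp
(defun my-split-phone (phone)
  "Return (AREA-CODE EXCHANGE LINE) from PHONE, or nil if it does not match."
  (when (string-match
         "(\\([0-9]\\{3\\}\\)) \\([0-9]\\{3\\}\\)-\\([0-9]\\{4\\}\\)"
         phone)
    ;; Groups are numbered from one, in the order their \\( appears.
    (list (match-string 1 phone)
          (match-string 2 phone)
          (match-string 3 phone))))

(my-split-phone "(410) 867-5309")    ; returns ("410" "867" "5309")
(my-split-phone "867-5309")          ; returns nil
```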

This is another real-life example, pulling down some JSON and pretty-printing it for a ticketing system at my job... the input looks like "key1":"value1","key2":"value2", where the keys are lower-case letters and numbers and underscores. There was more to the pretty-printing script, but this made the keys fall one-per-line (thanks to the \n in the replace-match), and blue (thanks to propertize).

(goto-char (point-min)) ; re-search-forward uses the point. Start at the beginning.

(while (re-search-forward "\"\\([a-z0-9_]+\\)\":" nil t)
  (replace-match (concat "\n" (propertize (match-string 1) 'face
    '(:foreground "blue")) ": ")))

Next, I promised to explain "non-greedy evaluation" (specified with a question mark). Suppose you wanted to match an HTML tag (looks like <strong>Text goes here</strong>, for example) and just isolate the name of the tag (strong in our example). A good first guess might be the regexp <\(.*\)>. By default, regular expressions match as much as possible, so our example sent through our regular expression would match strong>Text goes here</strong, which isn't what we wanted at all. If we modify our regexp to <\(.*?\)>, we get strong.
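You can watch the difference yourself with a quick sketch. The helper my-first-tag-name is a made-up function whose second argument simply picks between the greedy and non-greedy versions of the regexp.

```elisp
(defun my-first-tag-name (html greedy)
  "Return what group 1 captures in HTML; GREEDY selects the greedy regexp."
  (let ((regexp (if greedy "<\\(.*\\)>" "<\\(.*?\\)>")))
    (when (string-match regexp html)
      (match-string 1 html))))

(my-first-tag-name "<strong>Text goes here</strong>" t)    ; returns "strong>Text goes here</strong"
(my-first-tag-name "<strong>Text goes here</strong>" nil)  ; returns "strong"
```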


The intro states that "it's a great tool for quick and dirty front-ends for back-end tools or web APIs". The above example hints at its usefulness with web formats (like HTML or JSON), but if you cruise the function index, there aren't any functions for doing Internet-related things. Also, if you run M-x grep, you'll notice that Emacs simply runs the external grep program and reports the output. How does it all work?

There are two common answers (though there are more ways in the manual): call-process and start-process. call-process will "synchronously" call another program, wait for it to finish, and output the results. start-process "asynchronously" starts a program and continues executing stuff in Emacs while the program runs silently in the background. Let's talk about invocation. curl is a pretty common tool in the Linux world for calling a website and pulling down its HTML, or, in the case of a web service, some JSON or XML. Let's look at this first. Once again, this example comes from real life, but the variable names have been changed to protect the innocent:

(call-process "/usr/bin/curl" nil
  "buffername" t
  "-s" ; The 's' is for "shhh!"
  "-H" (concat "Authorization: Basic " (base64-encode-string (concat username ":" password)))
  url) ; url, like username and password, is a placeholder for your own value

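What's the story with the parameters? The first one is the command to run. The second is a filename for standard input, if you'd like to use it (sort of like "<" on the command line). The third is a buffer name to catch the output, or t for the current buffer, or nil to throw the output away. The fourth, when non-nil, tells Emacs to redisplay that buffer as output arrives.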

All the other arguments are command-line arguments that get sent to the program.
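If you'd rather not hit a web service just to try this, here is a minimal sketch that captures a program's output in a temporary buffer. The wrapper my-run-and-capture is a made-up name, and the usage line assumes a Unix-like system with echo on the PATH.

```elisp
(defun my-run-and-capture (program &rest args)
  "Run PROGRAM with ARGS and return a cons of (EXIT-STATUS . OUTPUT)."
  (with-temp-buffer
    ;; nil: no input file; t: put output in the current (temporary) buffer;
    ;; nil: don't redisplay while output arrives.
    (let ((status (apply #'call-process program nil t nil args)))
      (cons status (buffer-string)))))

(my-run-and-capture "echo" "hello")    ; on a typical Linux system: (0 . "hello\n")
```

Because call-process is synchronous, the buffer already holds all the output by the time buffer-string runs.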

Phew! That just leaves start-process, which, due to its asynchronous nature, is a little hairy to manage. You can imagine that it's used by Emacs' own ERC, the IRC client. Invocation is pretty simple: the first argument is a name, the second is its output buffer, the third argument is the command, and all the rest of the arguments are arguments to the command. Unfortunately, management requires its own host of functions: process-send-string or process-send-region handle sending input to a process, and process-status is used to see if it's still running. These are pretty intuitive to figure out.
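Here is a rough sketch of those pieces working together, assuming a Unix-like system with cat available. The function my-cat-roundtrip and the process name "my-cat" are made up for illustration. It starts cat in the background, sends it some text, and collects whatever comes back.

```elisp
(defun my-cat-roundtrip (text)
  "Pipe TEXT through an asynchronous cat process.
Return a list of the status seen while running and the collected output."
  (with-temp-buffer
    (let* ((process-connection-type nil) ; talk over a pipe instead of a pty
           (proc (start-process "my-cat" (current-buffer) "cat"))
           (status-while-running (process-status proc)))
      ;; Replace the default sentinel, which would write a
      ;; "Process my-cat finished" note into the output buffer.
      (set-process-sentinel proc #'ignore)
      (process-send-string proc text)
      (process-send-eof proc)
      ;; Keep collecting output until the connection closes.
      (while (accept-process-output proc))
      (list status-while-running (buffer-string)))))

(my-cat-roundtrip "ahoy\n")
```

The status comes back as the symbol run while cat is still alive, and the second element should be the text you sent. In a real program you would usually attach a sentinel or filter function rather than wait in a loop like this, since the waiting throws away the point of being asynchronous.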

Non-Local Exits

If you've messed with loops over data structures, you'll pretty quickly discover that Elisp's lack of a "return" function can be kind of annoying... or "break", for that matter. Lisp in general really isn't big on the whole procedural line-by-line execution thing, but it does have a feature to help us. If you've played with C++ or Java, brace yourself for this: throw and catch. These are a catch-all for when you need to hop back up your program's call stack. So, Function A calls Function B, which calls Function C (not unusual in Lisp), and now you're done and need to get back. Well... throw and catch. First, set up a catch with a single symbol as a tag. Anywhere inside the body of that catch (or in any function called from it), you can call throw, specify that tag, and pass it a value. The throw stops execution immediately, and catch returns whatever value you threw at it.
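The A-calls-B-calls-C situation can be sketched like this. All three function names and the my-done tag are made up for illustration; the point is only that the throw in the innermost function lands at the catch in the outermost one.

```elisp
(defun my-function-c (n)
  "Throw a message to the catch tagged my-done when N is negative."
  (when (< n 0)
    (throw 'my-done (format "bailed out on %d" n)))
  n)

(defun my-function-b (numbers)
  "Call my-function-c on each element of NUMBERS."
  (dolist (n numbers)
    (my-function-c n))
  "finished normally")

(defun my-function-a (numbers)
  "Return the thrown value, or B's return value if nothing was thrown."
  (catch 'my-done
    (my-function-b numbers)))

(my-function-a '(1 2 3))     ; returns "finished normally"
(my-function-a '(1 -2 3))    ; returns "bailed out on -2"
```

When nothing is thrown, catch simply returns the value of the last form in its body.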

I was pretty confused until I tried iterating through an array and needed some way out. The example in the Elisp manual is a bit dense, but hopefully this will be easier to follow. Imagine I have a nested data structure: vectors inside a list... not unusual. I want to get out one of those vectors by supplying some sort of criterion. I could initialize a temporary variable, iterate the entire list until I find the desired element, attach that element to the temporary variable, wait for the list to finish iterating, and return that temporary variable, or...

; Ship arrays ["Name" maxHealth curHealth minDamage maxDamage]
(setq ships (list
  ["Nina" 100 80 20 40]
  ["Pinta" 110 72 20 40]
  ["Santa Maria" 150 112 30 50]))

; Example to find the ship...
(defun getShipByName (name)
  "Given 'Name', return the vector of that ship!"
  (setq retIndex
    ; Using setq to get whatever value is thrown to this here catch...
    (catch 'foundShip
      (dotimes (i (length ships))
        ; When name equals the first element in the SHIPS vector... exit
        (when (string= (downcase name) (downcase (aref (nth i ships) 0)))
          ; We've got a match! Jump ship back to "catch" and give it the index.
          (throw 'foundShip i)))))
  ; retIndex is nil when nothing was thrown, so only look up a real index.
  (when retIndex (nth retIndex ships)))

; Let's get the maxHealth of the Pinta and print it out:
(message (number-to-string (aref (getShipByName "Pinta") 1)))
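One possible variation, sketched under my own naming (my-fleet and my-find-ship are illustrative, not part of the example above): throw the vector itself instead of its index, and let the function return nil when no ship matches.

```elisp
;; my-fleet is a hypothetical stand-in for the ships list.
(setq my-fleet (list ["Nina" 100 80 20 40]
                     ["Pinta" 110 72 20 40]))

(defun my-find-ship (name fleet)
  "Return the ship vector in FLEET whose name matches NAME, or nil."
  (catch 'found-ship
    (dolist (ship fleet)
      (when (string= (downcase name) (downcase (aref ship 0)))
        ;; Throw the vector itself, so no index lookup is needed afterwards.
        (throw 'found-ship ship)))
    ;; Falling out of the loop means nothing matched.
    nil))

(my-find-ship "pinta" my-fleet)      ; returns ["Pinta" 110 72 20 40]
(my-find-ship "Mayflower" my-fleet)  ; returns nil
```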

The End

I don't really know how to end this tutorial, but I hope you found some value in it and you'll write me if there are any omissions. I know there are some topics that could use a deeper dig. There are some vague plans to make this tutorial into a YouTube series. Keep your eyes peeled!