Hashes and raw strings in Common Lisp

So I'm learning Common Lisp.  Why would I do that?  Well, for one thing, I don't have much work lately, which is pretty bad because it translates directly into a poor financial situation (wanna hire me?).  For another, Lisp is cool and I've long wanted to learn it.  Friends of mine keep asking "What can you do with Lisp?  Can you find a job?  Can you make any money with it?"  I haven't got an answer to these questions yet, but one thing is certain: I can use it myself and finally replace Perl for my server-side needs.

In this article I describe how I solved two of the things that bothered me.  It's probably not useful for experienced Lisp hackers. ;-)

Hashes

As a long-time Perl and JavaScript hacker, I'm used to certain “syntactic sugar” for defining hash tables, or accessing hash elements.  In JavaScript:

var hash = {
    foo: "bar",
    other: {
        name: "Joe"
    }
};

alert(hash.foo);
alert(hash.other.name);

Nice and clean, isn't it?  Now let's see how we would do the above in Common Lisp:

(setq hash (make-hash-table))
(setf (gethash 'foo hash) "bar")
(setf (gethash 'other hash) (make-hash-table))
(setf (gethash 'name (gethash 'other hash)) "Joe")

(print (gethash 'foo hash))
(print (gethash 'name (gethash 'other hash)))

[ Oh, please notice there are fewer lines in Lisp. :-p ]

I'm not sure if experienced Lisp hackers are able to endure such code, but I'm not.  I searched for ways to make it better, because Lisp is the "programmable programming language", and I came up with a reader macro that allows me to write the previous example like this:

(setq hash %((make-hash-table) <--
                'foo = "bar"
                'other = %((make-hash-table) <-- 'name = "Joe"))

(print %(hash --> 'foo))
(print %(%(hash --> 'other) --> 'name))

To me, that looks better.  It would be even nicer if the last access could have been written like this:

%(hash -> 'other -> 'name)

but so far I haven't gotten around to doing that.
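Until the macro supports chaining, a plain function can get close.  The following is only a sketch under my own assumptions: gethash-path is a hypothetical name, not part of the macro above, and it simply gives up with NIL when a key is missing or an intermediate value is not a hash table.

```lisp
;; Hypothetical helper, not part of the reader macro in this article:
;; follow a chain of keys through nested hash tables.
(defun gethash-path (table &rest keys)
  "Look up KEYS one after another, starting in TABLE.
Returns NIL if a key is missing or an intermediate value is not a hash table."
  (let ((current table))
    (dolist (key keys current)
      (if (hash-table-p current)
          (setf current (gethash key current))
          (return nil)))))

;; Demo data mirroring the example above.
(defparameter *demo* (make-hash-table))
(setf (gethash 'foo *demo*) "bar")
(setf (gethash 'other *demo*) (make-hash-table))
(setf (gethash 'name (gethash 'other *demo*)) "Joe")

(gethash-path *demo* 'other 'name)   ; => "Joe"
```

Being a function, it needs no readtable changes, at the price of not looking like the arrow syntax.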

The macro starts with %( and ends with ) (of course, nested parens are allowed).  The first thing after the start paren is the hash table, and the next is one of these symbols (no need to quote them): <-- or <-, and --> or ->.  Left arrow means “put data into this hash”, while right arrow means “get data from this hash”.  In a put operation the = between a key and its value is only a separator; the macro skips it.  The put operations return the hash, which is why setq above works fine.  If we macroexpand the first form in the code above, we get:

(LET ((#:G792 (MAKE-HASH-TABLE)))
  (SETF (GETHASH 'FOO #:G792) "bar")
  (SETF (GETHASH 'OTHER #:G792)
          (LET ((#:G791 (MAKE-HASH-TABLE)))
            (SETF (GETHASH 'NAME #:G791) "Joe")
            #:G791))
  #:G792)

It's possible to fetch multiple values at once in a "get" operation:

(multiple-value-bind (foo bar baz) %(hash --> 'foo 'bar 'baz)
  (print foo) (print bar) (print baz))

When you fetch a single value, though, the form expands to a plain gethash call, so it returns two values just like gethash does: the value stored under the key, and a boolean telling whether the key was present in the hash.
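The second return value matters when NIL is a legitimate stored value.  Here is a small sketch using plain gethash (so it does not depend on the reader macro); the names *settings* and lookup-status are made up for the illustration.

```lisp
;; Hypothetical example data: one key stored with the value NIL.
(defparameter *settings* (make-hash-table))
(setf (gethash 'verbose *settings*) nil)
(setf (gethash 'name *settings*) "Joe")

(defun lookup-status (key table)
  "Return :MISSING, :STORED-NIL or :STORED-VALUE for KEY in TABLE."
  ;; GETHASH returns the value and a flag telling whether KEY was found.
  (multiple-value-bind (value present-p) (gethash key table)
    (cond ((not present-p) :missing)
          ((null value) :stored-nil)
          (t :stored-value))))

(lookup-status 'verbose *settings*)  ; => :STORED-NIL
(lookup-status 'debug *settings*)    ; => :MISSING
```

Looking only at the first value, both lookups above would give NIL and be indistinguishable.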

The code for the super miraculous hash syntactic sugar is the following:

(make-dispatch-macro-character #\%)

(defmacro defreader (left right parms &body body)
  `(ddfn ,left ,right #'(lambda ,parms ,@body)))

(let ((rpar (get-macro-character #\) )))
  (defun ddfn (left right fn)
    (set-macro-character right rpar)
    (set-dispatch-macro-character #\% left
                                  #'(lambda (stream char1 char2)
                                      (declare (ignore char1 char2))
                                      (apply fn
                                             (read-delimited-list right stream t))))))

(defreader #\( #\) (hash op &rest stuff)
  (let ((var-hash (gensym)))
    (cond ((or (eq op '->) (eq op '-->))
           ;; "GET" operations
           (cond ((= 1 (length stuff))
                  (setq stuff (car stuff))
                  `(gethash ,stuff ,hash))
                 (t
                  `(let ((,var-hash ,hash))
                     (values ,@(mapcar #'(lambda(key)
                                           `(gethash ,key ,var-hash))
                                       stuff))))))
          ((or (eq op '<-) (eq op '<--))
           ;; "SET" operations
           (cond ((= 3 (length stuff))
                  `(let ((,var-hash ,hash))
                     (setf (gethash ,(first stuff) ,var-hash) ,(third stuff))
                     ,var-hash))
                 (t
                  (setq stuff
                        (loop :for (key nil value) :on stuff :by #'cdddr
                           :collecting (list key value)))
                  `(let ((,var-hash ,hash))
                     ,@(loop :for c :in stuff
                          :collecting
                          `(setf (gethash ,(first c) ,var-hash) ,(second c)))
                     ,var-hash)))))))

(Note: the defreader macro and the ddfn function are copied from Paul Graham's book "On Lisp".)

Raw strings

In Emacs Lisp, regular expressions follow an archaic syntax where group-defining characters (and other special characters) need to be prefixed by a backslash.  This is different from Perl-compatible regular expressions, which most other languages have adopted nowadays.  On top of that, ELisp doesn't have literal regexps, so you need to write them as strings, where every backslash has to be doubled.  The result tends to get pretty ugly:

In Perl: /(foo|bar|baz)/
In ELisp: "\\(foo\\|bar\\|baz\\)"

As I started learning Common Lisp, I figured it would be great to find a more enjoyable way to write regular expressions, so I defined a reader macro that reads “raw strings”.  A raw string is a string where the backslash doesn't have any special meaning, so in order to include a literal backslash in the string you don't need to type it twice.  It would allow me to write the regexp above like this:

#R"\(foo\|bar\|baz\)"

Of course, it was only after I finished this macro that I realized that cl-ppcre, the "de-facto standard" Common Lisp regexp library, doesn't suffer from the backslash plague found in Emacs, simply because it follows the syntax of Perl regexps, so this macro became pretty much pointless...  But I still think it's useful.  For example, it allows you to use something other than quotes to delimit a string, which can be handy if you need to include lots of literal quotes in the string.  Examples:

#R(foo "bar" baz) ==> "foo \"bar\" baz"
#R/(this|looks|like|a|literal|regexp)/  ==>  "(this|looks|like|a|literal|regexp)"
#R[We have a \ backslash] ==> "We have a \\ backslash"

Basically the character that follows "#R" must be repeated to end the string, unless it's one of the opening brackets (that is, one of "(", "[", "{", "<"), in which case the matching closing bracket is expected instead, as you can see in the first and last lines above.

The only allowed backslashing is to escape the closing character, when you need to include it literally:

#R(foo (bar\)) ==> "foo (bar)"

In all other cases, the backslash is literal and is included in the final string as such.

Here's the source for this macro (note that it requires the macro above for hashes):

(defparameter *raw-string-end-quotes*
  %( (make-hash-table) <--
     #\( = #\)
     #\{ = #\}
     #\[ = #\]
     #\< = #\> ))

(defun read-raw-string (stream c1 c2)
  (declare (ignore c1 c2))
  (loop
     :with value = (make-array 0 :element-type 'character :fill-pointer 0 :adjustable t)
     :and quote = (read-char stream)
     :initially (setq quote (or %(*raw-string-end-quotes* --> quote) quote))
     :for next = (read-char stream)
     ;; char= rather than eq: eq is not guaranteed to work on characters
     :for escaped = (and (char= next #\\) (char= quote (peek-char nil stream)))
     :do
     (if (not escaped)
         (if (char= quote next)
             (return value)
             (vector-push-extend next value))
         (vector-push-extend (read-char stream) value))))

(set-dispatch-macro-character #\# #\R #'read-raw-string)
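One thing to keep in mind: in standard Common Lisp, #R with a numeric argument is the radix syntax (for example #16RFF reads as 255), so installing the macro globally replaces it.  Below is a sketch of a variant that does not need the hash macro and lives in its own copy of the readtable.  It is not the code above, just one way to do the same job; the names closing-delimiter, read-raw-string*, *raw-string-readtable* and read-with-raw-strings are mine.

```lisp
;; Hypothetical, self-contained variant of the raw-string reader.
(defun closing-delimiter (open)
  "Return the character that closes a raw string opened with OPEN."
  (case open
    (#\( #\))
    (#\[ #\])
    (#\{ #\})
    (#\< #\>)
    (t open)))

(defun read-raw-string* (stream sub-char arg)
  (declare (ignore sub-char arg))
  (let ((close (closing-delimiter (read-char stream t nil t)))
        (out (make-string-output-stream)))
    (loop
      (let ((next (read-char stream t nil t)))
        (cond ((and (char= next #\\)
                    (char= (peek-char nil stream t nil t) close))
               ;; Backslash before the closing char: keep only that char.
               (write-char (read-char stream t nil t) out))
              ((char= next close)
               (return (get-output-stream-string out)))
              ;; Anything else, including a lone backslash, is literal.
              (t (write-char next out)))))))

;; Install #R in a copy of the standard readtable, not the global one.
(defparameter *raw-string-readtable*
  (let ((rt (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\R #'read-raw-string* rt)
    rt))

(defun read-with-raw-strings (text)
  "Read one form from TEXT with #R enabled only during this call."
  (let ((*readtable* *raw-string-readtable*))
    (values (read-from-string text))))

(read-with-raw-strings "#R/(a|b)/")   ; => "(a|b)"
```

Because the macro is only active while *readtable* is bound to the copy, code read elsewhere keeps the standard meaning of #R.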

Well, that's it for today; I hope to be able to write more interesting stuff later.  I wish I could explain these macros, BTW, but I won't; reading the first few chapters of Practical Common Lisp should help anyone understand them.

Comments

  • By: Iulian, Sep 12 (18:42) 2009. RE: Hashes and raw strings in Common Lisp

    Take a look at Clojure. It's a very well-designed Lisp. Hash maps are a matter of:
    (def h (hash-map :january 31 :february 28 :march 31))

    • By: Leslie P. Polzer, Sep 14 (10:06) 2009. RE[2]: Hashes and raw strings in Common Lisp

      Hello Mihai,

      you can make money with Lisp, but to get the chance you need to show that you're proficient with it. Most companies (esp. the small ones that usually want Lispers) cannot afford to hire Lisp programmers that don't know Lisp well...

      I would suggest you try to get used to the hash syntax in CL. The beauty of Lisp is its quite uniform syntax compared to traditional languages. It sometimes seems more verbose but it's much more readable this way in fact. I think no serious Lisp programmer is using read macros.

      Your string read macro is available out of the box with CL-INTERPOL btw.

      Iulian's Clojure demonstration (hash-map function) can be easily rewritten in CL like this:

      (defun plist->hash-table (plist &rest hash-table-initargs)
        "Returns a hash table containing the keys and values of the property list
      PLIST. Hash table is initialized using the HASH-TABLE-INITARGS."
        (let ((table (apply #'make-hash-table hash-table-initargs)))
          (do ((tail plist (cddr tail)))
              ((not tail))
            (setf (gethash (car tail) table) (cadr tail)))
          table))

      Hope that helps,

        Leslie

  • By: Marijn, Nov 10 (21:18) 2009. RE: Hashes and raw strings in Common Lisp

    Reader macros are awesome. And also completely non-modular, and a pain to use in serious systems. The hash-table one could be rewritten as a function -- I often use something like (hash "foo" '(bar) "bar" :foo), which uses scary heuristics to figure out whether the keys should be eq/eql/equal.

    • By: mishoo, Nov 10 (22:27) 2009. RE[2]: Hashes and raw strings in Common Lisp

      “Reader macros are awesome. And also completely non-modular, and a pain to use in serious systems”

      Yeah.. Eventually I learned this lesson the hard way.  At some point I tried to use closure-html for parsing a piece of HTML from the browser, and it wouldn't compile (failing with a rather cryptic error message for a novice like me).  I spent like one hour on it, then bugged the good folks on irc.freenode.org/#lisp for another hour, until I finally realized that it failed because "%" was a macro dispatch character in my Lisp core, but was also used in closure-html as part of some variable names.  Imagine how stupid I felt. :-)

      Still, the syntactic sugar was too nice to have in some places, so I hid it with "named-readtables" (suggested by someone on #lisp, I think it was the author).  Pretty cool stuff.

Page info
Created: 2009/09/12 16:09
Modified: 2009/09/12 16:17
Author: Mihai Bazon
Comments: 4
Tags: lisp, programming