ich1 the CSI Killer

One of the optional parts of implementing MAL was line editing for easier interaction with the interpreter. Most implementations either load an existing binding to readline or make use of it directly with FFI magicks, but neither was an option for me, as there simply was no such library (yet) and Emacs module support isn’t terribly well documented (just like the C API), not terribly good-looking, and requires a recent build plus a compile-time option to be made use of. So, not an option other than for a few enthusiasts.

The naïve way of doing event handling wasn’t an option either, as one can verify by looking at read_minibuf_noninteractive and how it reads a string outside the command loop. By that time I had concluded that it was impossible to reimplement a fancier variant of it in Emacs Lisp, yet asked around for help on #emacs. Truth be told, I didn’t expect to find any, but then the unlikely happened and Alain Kalker proposed a hack crazy enough to work. One thing led to another and after two weeks of hacking I can finally present emacsrepl, the Emacs Lisp REPL you’ve all been waiting for! It’s fairly useless at the moment as it doesn’t have any redeeming features (yet), but it has been an interesting exercise and could not only make it into my MAL implementation, but even pave the way for standard input handling in Emacs…

You may wonder what could possibly be complicated about letting the cursor dance in the terminal. As it is with most problems, it’s not a single reason, but rather a combination of unfortunate factors. In this case: featuritis, legacy hardware support, arcane documentation and the curse of existing work being good enough to not bother walking a different path. Or maybe it’s just people not giving a damn about how smelly widely used code can be:

/* Delete the string between FROM and TO.  FROM is inclusive, TO is not.
   Returns the number of characters deleted. */
int
rl_delete_text (from, to)
     int from, to;
{
  register char *text;
  register int diff, i;

  /* Fix it if the caller is confused. */
  if (from > to)
    SWAP (from, to);

  /* fix boundaries */
  if (to > rl_end)
    {
      to = rl_end;
      if (from > to)
        from = to;
    }
  if (from < 0)
    from = 0;

  text = rl_copy_text (from, to);

  ...
}

This is just not right. Fixing API usage mistakes reeks of Windows 95 programming practices. Even worse if you consider that this function is not part of the external API and therefore the “confused caller” is something inside readline itself that prompted this addition. Why one would not simply debug the codebase to not require that way of doing things is beyond me.

Another problem I’ve got with this is that readline clocks in at about 23k SLOC. Fortunately I’m not the only one considering that fact a problem: Salvatore Sanfilippo wrote linenoise as a minimal replacement for it. The code is clean, very readable and was consulted extensively for getting the design and implementation of emacsrepl right.

The program itself can be split into two parts, a shell script conjuring the spirits of the terminal and the Emacs Lisp code receiving input and printing output.

To react to every single input of the user, two conditions must be fulfilled: characters are read in raw mode, a terminal state that deactivates any special-casing that would keep us from detecting a C-c, and characters are read in one at a time. The former is typically done in C with the termios.h family and a menagerie of flags (which must be undone on exit). Fortunately it can be done with the stty command and trapping exit.
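For comparison, here is a minimal sketch of what the C route looks like. The function names (make_raw, enable_raw_mode, restore_terminal) are made up for illustration, and the flag set is the conventional raw-mode recipe, not something taken from emacsrepl, which relies on stty instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

static struct termios saved_termios; /* settings to put back on exit */
static int raw_mode_active = 0;

/* Turn a copy of the current settings into raw-mode settings. */
void make_raw(struct termios *t)
{
    assert(t != NULL);
    /* input: no break signal, no CR-to-NL mapping, no parity check,
       no stripping of the 8th bit, no XON/XOFF flow control */
    t->c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
    /* output: no post-processing */
    t->c_oflag &= ~(OPOST);
    /* 8-bit characters */
    t->c_cflag |= CS8;
    /* local: no echo, no line buffering, no extended input processing,
       no signals, so C-c arrives as a plain byte */
    t->c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);
    /* read() returns as soon as one byte is available, without a timeout */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}

/* Undo enable_raw_mode(); safe to call more than once. */
void restore_terminal(void)
{
    if (raw_mode_active) {
        tcsetattr(STDIN_FILENO, TCSAFLUSH, &saved_termios);
        raw_mode_active = 0;
    }
}

/* Returns 0 on success, -1 if stdin is not a terminal or a call fails. */
int enable_raw_mode(void)
{
    static int exit_hook_registered = 0;
    struct termios raw;

    if (!isatty(STDIN_FILENO))
        return -1;
    if (tcgetattr(STDIN_FILENO, &saved_termios) == -1)
        return -1;
    raw = saved_termios;
    make_raw(&raw);
    if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) == -1)
        return -1;
    raw_mode_active = 1;
    if (!exit_hook_registered) {
        atexit(restore_terminal); /* the "must be undone on exit" part */
        exit_hook_registered = 1;
    }
    return 0;
}
```

From a shell, roughly the same effect is available by saving the settings with stty -g, switching with stty raw -echo and restoring the saved settings from an exit trap.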

The next problem is getting one character at a time into Emacs. As I’m leveraging emacs --batch to be able to read from standard input in the first place, I can only read lines with it, so the next part of the hack is printing each character on its own line, piping that into Emacs and ensuring line buffering for it to work as expected. This obviously introduces overhead, but nothing noticeable for this use case.
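The character-per-line trick can be sketched as a tiny filter. This is only an illustration with hypothetical function names: it assumes each byte is written as its decimal code so that control characters and newlines stay unambiguous, which may well differ from the format the actual shell script uses.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Format one byte as its decimal code followed by a newline.
   Returns the number of characters written, excluding the NUL. */
int format_byte_line(unsigned char byte, char *out, size_t cap)
{
    assert(out != NULL && cap >= 5); /* "255\n" plus the terminating NUL */
    return snprintf(out, cap, "%d\n", byte);
}

/* Copy bytes from file descriptor `in` to stream `out`, one line per
   byte, flushing after every line so the reader sees it immediately.
   Returns 0 on end of input, -1 on a read or write error. */
int bytes_to_lines(int in, FILE *out)
{
    unsigned char byte;
    char line[8];
    ssize_t n;

    while ((n = read(in, &byte, 1)) == 1) {
        format_byte_line(byte, line, sizeof line);
        if (fputs(line, out) == EOF || fflush(out) == EOF)
            return -1;
    }
    return n == 0 ? 0 : -1;
}
```

A program built around bytes_to_lines(STDIN_FILENO, stdout) would sit between the raw terminal and the pipe into emacs --batch; the explicit flush after each line stands in for the line buffering mentioned above.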

Now for the Emacs side of things. Characters are read in successfully, but not everything of interest can be expressed as a single byte. Experimenting with cat -A reveals that key combinations involving modifiers look different, and special keys even more so. These sequences are decoded with a simple state machine. The same approach was chosen for UTF-8 data, as anything beyond ASCII is represented with more than one byte, with the state machine being a faithful port of prior art.
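To give an idea of what such a state machine involves, here is a small sketch in C. It is not the emacsrepl decoder: the type and function names are hypothetical, and it only recognizes the common xterm-style sequences for the arrow keys (ESC [ A through ESC [ D) and delete (ESC [ 3 ~). Anything else that starts with ESC is swallowed and reported as unknown.

```c
#include <assert.h>
#include <stddef.h>

enum key { KEY_NONE, KEY_CHAR, KEY_UP, KEY_DOWN, KEY_RIGHT, KEY_LEFT,
           KEY_DELETE, KEY_UNKNOWN };

enum state { ST_GROUND, ST_ESC, ST_CSI };

struct decoder {
    enum state state;
    int param; /* digits collected inside a CSI sequence */
    int plain; /* 0 once a ';' or other non-digit parameter byte was seen */
};

void decoder_init(struct decoder *d)
{
    d->state = ST_GROUND;
    d->param = 0;
    d->plain = 1;
}

/* Feed one input byte. Returns KEY_NONE while a sequence is still
   incomplete. For KEY_CHAR the byte itself is stored in *ch. */
enum key decoder_feed(struct decoder *d, unsigned char byte, unsigned char *ch)
{
    assert(d != NULL && ch != NULL);
    switch (d->state) {
    case ST_GROUND:
        if (byte == 0x1b) { /* ESC starts a sequence */
            d->state = ST_ESC;
            return KEY_NONE;
        }
        *ch = byte;
        return KEY_CHAR;
    case ST_ESC:
        if (byte == '[') { /* ESC [ is the CSI introducer */
            d->state = ST_CSI;
            d->param = 0;
            d->plain = 1;
            return KEY_NONE;
        }
        d->state = ST_GROUND;
        return KEY_UNKNOWN; /* e.g. Alt-modified keys, not handled here */
    case ST_CSI:
        if (byte >= '0' && byte <= '9') {
            if (d->param < 1000) /* guard against overflow */
                d->param = d->param * 10 + (byte - '0');
            return KEY_NONE;
        }
        if (byte >= 0x20 && byte <= 0x3f) { /* ';' and friends */
            d->plain = 0;
            return KEY_NONE;
        }
        d->state = ST_GROUND; /* any other byte ends the sequence */
        if (!d->plain)
            return KEY_UNKNOWN;
        switch (byte) {
        case 'A': return KEY_UP;
        case 'B': return KEY_DOWN;
        case 'C': return KEY_RIGHT;
        case 'D': return KEY_LEFT;
        case '~': return d->param == 3 ? KEY_DELETE : KEY_UNKNOWN;
        default:  return KEY_UNKNOWN;
        }
    }
    return KEY_UNKNOWN;
}
```

One limitation worth knowing about: a lone press of the Escape key is indistinguishable from the start of a sequence without a timeout, so this sketch simply waits for the next byte.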

To move the cursor around and update the edited line, one needs to print out characters, some of which are special and terminal-specific. I did initially reach out to the terminfo database, but gave up quickly due to the lack of documentation on what sequences like ich1 do and unexpected interactions such as typing being messed up after exiting the REPL. Fortunately linenoise goes for an easier approach: picking a small set of primitives from the nearly ubiquitously supported CSI codes and redrawing with these only. This works reasonably well at the cost of not being completely compatible with everything out there and should be the slower approach, but again it isn’t an issue in practice.
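The redraw-everything idea fits in a few lines. The following is a sketch in the spirit of linenoise's single-line refresh, not a copy of it or of emacsrepl; build_refresh is a hypothetical name, and the code assumes every character occupies exactly one byte and one column (plain ASCII).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the byte sequence that redraws a one-line prompt:
     \r        carriage return, cursor to column 0
     prompt    followed by the edited text
     ESC [ 0 K erase from the cursor to the end of the line
     \r        back to column 0
     ESC [ n C move the cursor n columns to the right
   `point` is the cursor offset inside `text`.
   Returns the length of the sequence, or -1 if it does not fit. */
int build_refresh(char *out, size_t cap,
                  const char *prompt, const char *text, size_t point)
{
    size_t col;
    int n;

    assert(out != NULL && prompt != NULL && text != NULL);
    assert(point <= strlen(text));

    col = strlen(prompt) + point;
    if (col > 0)
        n = snprintf(out, cap, "\r%s%s\x1b[0K\r\x1b[%luC",
                     prompt, text, (unsigned long)col);
    else /* a count of 0 is commonly treated as 1, so skip the move */
        n = snprintf(out, cap, "\r%s%s\x1b[0K\r", prompt, text);

    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Writing the result to the terminal after every edit repaints the whole line. That is more output than a targeted update via terminfo capabilities would produce, which is the trade-off described above.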

As the appearance of edited text is manipulated, the underlying text representation of the user input so far must be kept up to date. For this Emacs offers the perfect data structure: the buffer! This helps keep the code size small, as boundaries, contents and the position of point are tracked for you. I don’t have much to complain about here, save that some essential operations like replacing text must be implemented as deletion and insertion. Another benefit is that more complex editing operations (think Paredit) don’t need to be reimplemented, only the redisplay of them.
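To appreciate what the buffer does for free, here is roughly the bookkeeping one would otherwise write by hand. This is a C analogue with hypothetical names and a fixed capacity, not a description of how Emacs buffers work internally. It tracks contents, length and point, and builds replacement out of deletion and insertion.

```c
#include <assert.h>
#include <string.h>

#define LINE_CAP 256

struct line {
    char text[LINE_CAP]; /* NUL-terminated contents */
    size_t len;          /* number of characters in text */
    size_t point;        /* cursor position, 0..len */
};

/* Insert s at point and move point behind it.
   Returns 0 on success, -1 if the line would overflow. */
int line_insert(struct line *l, const char *s)
{
    size_t n = strlen(s);

    assert(l->point <= l->len);
    if (l->len + n >= LINE_CAP)
        return -1;
    /* shift the tail (including the NUL) to make room */
    memmove(l->text + l->point + n, l->text + l->point,
            l->len - l->point + 1);
    memcpy(l->text + l->point, s, n);
    l->len += n;
    l->point += n;
    return 0;
}

/* Delete the range [from, to). An invalid range is rejected
   and leaves the line untouched. Returns 0 on success, -1 otherwise. */
int line_delete(struct line *l, size_t from, size_t to)
{
    if (from > to || to > l->len)
        return -1;
    memmove(l->text + from, l->text + to, l->len - to + 1);
    l->len -= to - from;
    if (l->point > to)
        l->point -= to - from; /* point was behind the deleted range */
    else if (l->point > from)
        l->point = from;       /* point was inside the deleted range */
    return 0;
}

/* Replace [from, to) with s: a deletion followed by an insertion.
   Point ends up behind the inserted text. */
int line_replace(struct line *l, size_t from, size_t to, const char *s)
{
    if (from > to || to > l->len ||
        l->len - (to - from) + strlen(s) >= LINE_CAP)
        return -1; /* check everything first so failure changes nothing */
    line_delete(l, from, to);
    l->point = from;
    return line_insert(l, s);
}
```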

Source: Emacs Ninja
