Editing Text with Lightweight Structure
Lightweight structure is the ability to recognize
text structure automatically, using an extensible library
of patterns and parsers. Structure can be detected in
many ways: grammars (e.g. Java or HTML), regular expressions,
even manual selections by the user. With lightweight structure,
it doesn't matter how the structure was detected, whether
by a regular expression, a grammar, or a hand-coded
parser. All that matters is the result: a region
set, which is a set of intervals in the text.
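To make the idea of a region set concrete, here is a minimal Python sketch. It assumes a region is a half-open pair of character offsets, and the names RegionSet, regex_regions, and word_regions are hypothetical illustrations, not LAPIS's actual API. One region set comes from a regular expression and another from a small hand-coded parser, yet both end up with the same type and can be combined directly, which is the point of lightweight structure.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Assumption: a region is a half-open interval [start, end) of character offsets.
Region = Tuple[int, int]


@dataclass(frozen=True)
class RegionSet:
    """A set of intervals in a text; it does not record how they were found."""
    regions: FrozenSet[Region]

    def ordered(self) -> List[Region]:
        return sorted(self.regions)

    def texts(self, text: str) -> List[str]:
        return [text[s:e] for s, e in self.ordered()]

    def intersect(self, other: "RegionSet") -> "RegionSet":
        # Regions present in both sets, whichever detector produced them.
        return RegionSet(self.regions & other.regions)


def regex_regions(text: str, pattern: str) -> RegionSet:
    """Structure detected by a regular expression (hypothetical helper)."""
    return RegionSet(frozenset(m.span() for m in re.finditer(pattern, text)))


def word_regions(text: str) -> RegionSet:
    """Structure detected by a tiny hand-coded parser: maximal runs of letters."""
    regions = []
    start = None
    for i, ch in enumerate(text):
        if ch.isalpha():
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(text)))
    return RegionSet(frozenset(regions))


text = "one two\nthree\n"
lines = regex_regions(text, r"[^\n]+")     # lines found by a regular expression
words = word_regions(text)                 # words found by a hand-coded parser
print(lines.texts(text))                   # ['one two', 'three']
print(words.texts(text))                   # ['one', 'two', 'three']
print(lines.intersect(words).texts(text))  # ['three']
```

The intersection step never asks where either set came from; it only compares intervals.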
LAPIS is an experimental web browser
and text editor that demonstrates how lightweight structure can be
useful. Its most novel and interesting features are:
- Text constraints, a new pattern language
that lets you write simple but powerful patterns using
lightweight structure. For example:
    first Line in Java.Comment just before Java.Method
This pattern uses several kinds of lightweight structure.
Java syntax is detected by a Java parser
and lines are found by a regular expression,
but these facts are irrelevant to the pattern. Text
constraints are used throughout LAPIS:
to make multiple selections for editing, to give arguments
to commands, to give feedback for inference, and
to add more lightweight structure to the structure library.
- Simultaneous editing, a technique
for doing repetitive text edits by controlling multiple
cursors. Lightweight structure makes inference
faster, more accurate, and higher-level. When
you select a Java expression, the system can infer
"Java.Expression", not because the inference engine
knows anything about Java syntax, but only because the library
contains a Java parser that spits out region sets.
- Outlier finding, a technique for catching
errors in user-written patterns and inferred
patterns. Unusual pattern matches are highlighted
as possible errors. Lightweight structure
- Structured text tools that operate
on region sets. Think "Unix tools for structured
text." Where grep and sort manipulate lines, however,
the LAPIS tools can operate on
any region set. With lightweight structure, that
means you can sort words, filter HTML table rows, and count
Java statements, all with the same general set of tools.
- A browser shell, a command shell built
into a web browser. You can invoke Tcl commands
and external programs and see their results displayed
in the browser. The browser shell is useful for building
command pipelines and automating web browsing. Lightweight
structure contributes here by making it easy to write
patterns that match parts of web pages.
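The text-constraint operators and the structured text tools described above can be illustrated in the same spirit. The following is a rough sketch under stated assumptions, not LAPIS's implementation: the operator semantics of in_, just_before, and first_in are simplified guesses (for example, "just before" is assumed to allow only whitespace in between), sort_tool is a hypothetical stand-in for a region-set sort, and Java comments and methods are found with crude regular expressions instead of a real Java parser. The sketch approximates the pattern "first Line in Java.Comment just before Java.Method" by composing the operators, then sorts the matches.

```python
import re
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end) character offsets (assumption)


def regex_regions(text: str, pattern: str, flags: int = 0) -> List[Region]:
    """Region set from a regular expression, as sorted (start, end) pairs."""
    return sorted(m.span() for m in re.finditer(pattern, text, flags))


def in_(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions of a that lie entirely inside some region of b."""
    return [r for r in a if any(s <= r[0] and r[1] <= e for s, e in b)]


def just_before(a: List[Region], b: List[Region], text: str) -> List[Region]:
    """Regions of a followed by some region of b with only whitespace between
    (assumed semantics for illustration)."""
    return [r for r in a
            if any(r[1] <= s and text[r[1]:s].strip() == "" for s, _ in b)]


def first_in(a: List[Region], b: List[Region]) -> List[Region]:
    """For each region of b, the earliest region of a that lies inside it."""
    result = []
    for container in b:
        inside = in_(a, [container])
        if inside:
            result.append(min(inside))
    return result


def sort_tool(text: str, regions: List[Region]) -> List[str]:
    """A 'sort' that works on any region set, not just lines (hypothetical)."""
    return sorted(text[s:e] for s, e in regions)


SOURCE = """/* Returns the answer.
   Computed once. */
void answer() { }

/* Unattached comment. */

int x = 1;

/* Prints a greeting. */
void greet() { }
"""

# Crude regex stand-ins for Java.Comment and Java.Method (assumption, not a parser).
java_comments = regex_regions(SOURCE, r"/\*.*?\*/", re.DOTALL)
java_methods = regex_regions(SOURCE, r"void \w+\(\)\s*\{[^}]*\}")
lines = regex_regions(SOURCE, r"[^\n]+")

# "first Line in (Java.Comment just before Java.Method)"
doc_comments = just_before(java_comments, java_methods, SOURCE)
matches = first_in(lines, doc_comments)
print([SOURCE[s:e] for s, e in matches])
# ['/* Returns the answer.', '/* Prints a greeting. */']
print(sort_tool(SOURCE, matches))
# ['/* Prints a greeting. */', '/* Returns the answer.']
```

The unattached comment is excluded because non-whitespace text separates it from the next method. Note that sort_tool only sees intervals, so the same function could sort words, table rows, or statements.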
A screenshot of LAPIS is shown below.
This research couldn't have been done without USENIX
Student Research Grants.
Many thanks to all the people who have contributed
to the development of LAPIS with
ideas, code, or their precious time: Brad Myers, Laura
Cassenti, John Pane, Dorothy Zabrowski, Brice Cassenti,
Jean Cassenti, Dan Cassenti, Yuzo Fujishima, Rich Clingman,
Monty Zukowski, Julián Jesús Martínez
López, John Padula, Franklin Chen, Brian Kernighan,
David Garlan, Jim Morris, Chris Long, and John Gersh.
Send comments or questions to Rob Miller (firstname.lastname@example.org).
Copyright © 2003 Massachusetts Institute of Technology. All Rights Reserved.