Thin Air

Scripting languages and IDEs

2006-10-20T18:51:32-07:00

On the Squeak development list there's been a lot of talk lately about creating a scripting language based on Squeak. On the surface it seems like a great idea. Scripting languages are popular, dynamism is in vogue, and it would be nice to be able to use Smalltalk for all the day-to-day utilities and admin tools that tend to get done in Perl or Ruby. On top of that, the main drawback of scripting languages is that there aren't any good IDEs for them. Squeak has a great IDE, and should be able to provide a great script development environment.

I'm pretty skeptical of the idea, because I think scripting languages and IDEs are like oil and water. They just don't mix. What follows is a post I made to the Squeak list defending this position. First, I'd like to define some terms.

IDE - This is a program that allows one to view and manipulate another program in terms of its semantic elements, such as classes and methods, rather than in terms of the sequence of characters that will be fed to a parser. IDEs might happen to display text, but they also provide tools like class browsers, refactoring and other transformations, auto-completion of identifiers, etc., things that require a higher-level model of the program than text. Examples include various Smalltalk implementations, Eclipse, Visual Studio, IDEA.

Scripting language - a programming language and execution model where the program is stored as text until it is executed. Immediately prior to execution, the runtime environment is created, the program's source code is parsed and executed, and then the runtime environment is destroyed. This is an important point - the state of the runtime environment is not preserved when execution terminates, and one invocation of a program cannot influence future invocations.

Now, one might quibble over my definition of "scripting language." Fine, I agree that it's not a good general definition of everyday use of the term. But it's an important feature of languages like Ruby, Python, Perl, Javascript, and PHP and one that makes IDEs for those languages particularly hard to write.

Damien Pollet brought up the key issue in designing a Smalltalk-based scripting language - should the syntax be declarative or imperative?

Imperative syntax gives us a lot of flexibility and power in the language. A lot of the current fascination with Ruby stems from Java programmers discovering what can be done with imperative class definitions. The Ruby pickaxe book explains this well:

In languages such as C++ and Java, class definitions are processed
at compile time: the compiler loads up symbol tables, works out how much
storage to allocate, constructs dispatch tables, and does all those other
obscure things we'd rather not think too hard about. Ruby is different. In
Ruby, class and module definitions are executable code.

Executable definitions are how metaprogramming is done in scripting languages. Ruby on Rails gets a lot of mileage out of this, essentially by adding class-side methods that can be called from within these executable class definitions to generate a lot of boring support code. In Java, we can't modify class definitions at runtime, and that's why Java folks use so much XML configuration.

Python does this too. Perl5 is pretty weird, but Perl6 is slated to handle class definition this way as well. Javascript doesn't have class definitions, but we can build up pseudoclasses by creating objects and assigning functions to their properties.

When writing an executable class definition, we have the full power of the language available. You can create methods inside of conditionals to tailor the class to its environment. You can use eval() to create methods by manipulating strings. You can send messages to other parts of the system. You can do anything.
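To make this concrete, here is a minimal sketch in Python, one of the languages mentioned above. The class name `Report` and its fields are hypothetical, invented purely for illustration. The class body is ordinary code: it picks a method with a conditional and builds two accessors from strings with `exec()`.

```python
import os


class Report:
    # The class body runs top to bottom when the class statement executes.

    # A method chosen by a conditional, tailoring the class to its environment.
    if os.name == "nt":
        def line_ending(self):
            return "\r\n"
    else:
        def line_ending(self):
            return "\n"

    # Accessor methods created by manipulating strings and executing them.
    # Inside a class body, exec() defines names in the class namespace.
    for _field in ("title", "author"):
        exec(f"def get_{_field}(self):\n    return self._data['{_field}']")
    del _field  # remove the loop variable so it does not become an attribute

    def __init__(self, **data):
        self._data = data


report = Report(title="Q3 summary", author="Ana")
```

Calling `report.get_title()` works even though no `get_title` definition appears anywhere in the source text as such.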

I'm making a big deal out of this, because I think it's a really, really important feature of modern scripting languages.

Declarative syntax, on the other hand, gives us a lot of flexibility and power in the tools. Java, C++ and C# have declarative class definitions. This means that IDEs can read in the source code, create a semantic model of it, manipulate that model in response to user commands, and write it back out as source code. The source code has a canonical representation as text, so the code that's produced is similar to the code that was read in, with the textual changes proportional to the semantic changes that were made in between.

This is really hard to do with scripting languages, because we can't create the semantic units of the program just by parsing the source code. You actually have to execute it to fully create the program's structure. This is problematic to an IDE for many reasons: the program might take a long time to run, it might have undesirable side effects (like deleting files), and in the end, there's no way to tell whether the program structure we end up with is dependent on the input to the program.
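The gap between what a parser sees and what execution produces can be shown with a small Python sketch. The `Account` script and both helper functions are hypothetical, written only for this illustration. One helper inspects the syntax tree with the standard `ast` module; the other actually runs the script and then looks at the resulting class.

```python
import ast

# A tiny script whose class gains two methods imperatively, after the
# class statement itself has run.
SOURCE = '''
class Account:
    def balance(self):
        return 0

for name in ("deposit", "withdraw"):
    setattr(Account, name, lambda self, amount, _n=name: (_n, amount))
'''


def methods_seen_by_parser(source, class_name):
    """Method names recoverable from the syntax tree alone."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return sorted(n.name for n in node.body
                          if isinstance(n, ast.FunctionDef))
    return []


def methods_after_execution(source, class_name):
    """Method names that exist once the script has actually been executed."""
    namespace = {}
    exec(source, namespace)
    cls = namespace[class_name]
    return sorted(name for name, value in vars(cls).items()
                  if callable(value) and not name.startswith("__"))


static_view = methods_seen_by_parser(SOURCE, "Account")
runtime_view = methods_after_execution(SOURCE, "Account")
```

Here `static_view` contains only `balance`, while `runtime_view` also contains `deposit` and `withdraw`. A tool that refuses to run the script never learns about the last two.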

Even if we did have a way to glean the program structure from a script, there would be no way to write it back out again as source code. All of the metaprogramming in the script would be undone, partially evaluated, as it were, and we'd be stuck with whatever structures were created on that particular invocation of the script.

So, it would appear that we can have either a powerful language, or powerful tools, but not both at the same time. And looking around, it's notable that there are no good IDEs for scripting languages, and none of the languages that have good IDEs lend themselves to metaprogramming.

There is, of course, one exception. Smalltalk.

With Smalltalk, we have the best of both worlds. A highly dynamic language where metaprogramming is incredibly easy, and at the same time, a very powerful IDE. We can do this because we sidestep the whole issue of declarative vs. imperative syntax by not having any syntax at all.

In Smalltalk, classes and methods are created by executing Smalltalk code, just like in scripting languages. That code creates objects which reflect the semantic elements of the program, just like in the IDEs for compiled languages. One might say that programs in compiled languages are primarily state, while programs in scripting languages are primarily behavior. Smalltalk programs are object-oriented; they have both state and behavior. The secret ingredient that makes this work is the image - Smalltalk programs don't have to be represented as text.

And that's why a Smalltalk-like scripting language wouldn't be worthwhile. It leaves out the very thing that makes Smalltalk work so well - the image. It would have to have syntax for creating classes - either imperatively or declaratively. We'd end up limiting either the language or the tools, or if we tried hard enough, both.

I'd much rather see a Smalltalk that let me create small, headless images, tens or hundreds of kilobytes in size, with just the little bits of functionality I need for a particular task. If they had good libraries for file I/O, processing text on stdin/stdout and executing other commandline programs, they'd fill the "scripting language" niche very well. If they could be created and edited by a larger IDE image, they'd have the Smalltalk tools advantages as well.

I have high hopes for Spoon in this regard. Between shrinking, remote messaging and Flow, it's already got most of the ingredients. It just needs to be packaged with a stripped down VM, and integrated into the host operating system.

Announcements

2006-07-08T19:31:43-07:00

The basic design strategy for OmniBrowser is simple: rather than modelling a browser with one large and complex object (like Browser does), break it up into a network of smaller, simpler objects. From there, the design is pretty straightforward, and it's much easier to build lots of kinds of browsers from the same code base.

This design does have a downside, though. It makes event handling more difficult, because the objects that need to communicate to respond to events are often in distant parts of the network, and can't rely on the structure of the network to find each other. Early versions of OmniBrowser responded to events, such as a click, with a cascade of messages, with each object letting its neighbors know about the event. This had the advantage that each object only needed to know about its immediate neighbors, but it was also fragile and prone to infinite loops as neighbors repeatedly notified each other of the same event.

My second attempt to address this problem involved the use of a Dispatcher. This was a central object that all notification messages would flow through. As the various parts of the browser were created, they would register with the dispatcher to receive messages. This was an improvement, because objects could send messages to "everybody" rather than to an explicit receiver. But it was still awkward, and the event handling code was still convoluted and difficult to understand.

I've just finished up the implementation of my third attempt, this time based on Vassili Bykov's notion of Announcements. I talked to the folks at Cincom about porting the code to Squeak, but that didn't work out. I ended up just doing a mini-implementation that meets my needs for OmniBrowser. (Actually this was probably what I should have done in the first place. It was probably less work for me to re-implement Announcements from scratch than it would have been for someone at Cincom to get corporate approval to release the code under an open source license.)

Despite all the positive things Vassili had to say about Announcements, I have to admit I was surprised what an improvement it made in OmniBrowser's event handling code. My first pass at the conversion was simple. I replaced messages sent to the dispatcher with announcements sent to the announcer. Then I installed an announcement spy and browsed around the image a bit. It turned out that every event resulted in 3 or 4 redundant announcements, and probably even more unnecessary updates to the UI.

So I made a second pass, explicitly aimed at removing all the redundant announcements. In many cases, this meant finding the ultimate source of a particular announcement. For example, OBSelectionChanged should only be announced from two places in the code. All the other places where it was being announced were redundant, and had to be removed. By spying on announcements, I was able to get a clearer idea of the code flow in response to different events, and find other ways to simplify.
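For readers who haven't seen the pattern, here is a rough Python sketch of the idea: events are objects, subscribers register by event class, and a spy sees everything that gets announced. This is not Vassili's framework or the OmniBrowser code; every name in it (`Announcer`, `subscribe`, `spy`, `SelectionChanged`) is a stand-in chosen for illustration.

```python
from collections import Counter


class Announcement:
    """Base class for events; subclasses carry event-specific data."""


class SelectionChanged(Announcement):
    def __init__(self, node):
        self.node = node


class Announcer:
    def __init__(self):
        self._subscriptions = []  # (announcement class, handler) pairs
        self._spies = []          # callables that see every announcement

    def subscribe(self, announcement_class, handler):
        self._subscriptions.append((announcement_class, handler))

    def spy(self, observer):
        self._spies.append(observer)

    def announce(self, announcement):
        # Spies are told first, then any subscriber whose class matches.
        for observer in self._spies:
            observer(announcement)
        for announcement_class, handler in self._subscriptions:
            if isinstance(announcement, announcement_class):
                handler(announcement)
        return announcement


announcer = Announcer()

# The spy just tallies announcements by class name.
seen = Counter()
announcer.spy(lambda a: seen.update([type(a).__name__]))

selected = []
announcer.subscribe(SelectionChanged, lambda a: selected.append(a.node))

# One user action that is (redundantly) announced from two places.
announcer.announce(SelectionChanged("Object>>printString"))
announcer.announce(SelectionChanged("Object>>printString"))
```

After the two calls, the tally in `seen` shows two `SelectionChanged` announcements for a single selection, which is the kind of redundancy a spy makes easy to notice.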

I suspect there's even more simplification that can be made, but even without it, moving to Announcements was a big improvement.

Questions on the versioning model

2006-06-11T22:22:24-07:00

Bruce Badger posted a comment in response to my post on the versioning model used in Monticello 2. He has some questions about methods:

What is the identity (or primary key) of a method?
Within what scope is the identity unique?
If I wanted to use a particular version of a particular method in two classes, could I (setting aside the question of whether this is a good idea or not)?

The short answer is that Monticello 2 uses the same semantics that the Smalltalk runtime uses. The identity of a MethodElement is class name and selector; it's only guaranteed to be unique within a given image. You couldn't put the same method in two classes; it would have to be copied.

Now, Avi and I have kicked around ideas for a deeper model of Smalltalk code. Rather than identifying elements by name, they'd each have UUIDs. Method sources would be versioned as an AST. The nodes for variable references would have the UUIDs of the elements the variables are bound to in the compiled method.
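The difference between the two identity schemes can be sketched in a few lines of Python. None of this is Monticello code: `MethodKey`, `Element` and `VariableRef` are hypothetical names, and the renamed class is only an illustrative example of a dialect-specific name.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MethodKey:
    """Name-based identity: class name plus selector."""
    class_name: str
    selector: str


@dataclass
class Element:
    """UUID-based identity: the name is only a mutable label."""
    name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass(frozen=True)
class VariableRef:
    """An AST node for a variable reference, bound by UUID, not by name."""
    target: uuid.UUID


# Name-based keys are equal whenever the names are equal.
same_key = MethodKey("Object", "printString") == MethodKey("Object", "printString")

# UUID-based binding survives a rename.
collection_class = Element("OrderedCollection")
elements_by_id = {collection_class.id: collection_class}
ref = VariableRef(collection_class.id)

collection_class.name = "Core.OrderedCollection"  # illustrative rename
resolved_name = elements_by_id[ref.target].name
```

The reference still resolves after the rename because it never depended on the name in the first place.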

This would have two advantages:

First, it would help with platform independence. Rather than depending on names to bind variables during compilation, we'd be relying on UUIDs. This would make it easier to transform the names when moving code back and forth between dialects. This would make it easier to handle Namespaces in VW, for example, or differences in platform libraries.

Second, it would allow us to provide a more accurate reproduction of code between images. We'd be restoring methods to their compiled states rather than just their source code. This is one of the things that's so compelling about Spoon, and it would allow Bruce's scenario of the same method version being used in two different classes.

On the other hand, it's that much more code and complexity. It would require a custom parser, an AST able to handle all the syntactic quirks of the various dialects of Smalltalk where Monticello will run, and a compiler back end for each platform. Monticello 2 is already an ambitious project, and a significant improvement over Monticello 1. Our goal for now is to get the current version up to production quality so we can start using it. Maybe some of these ideas will be part of Monticello 3.

Slicing the image

2006-06-01T21:31:12-07:00

In my last post, I mentioned that version history in Monticello 2 isn't tied to packages. Instead, it introduces the concept of slices.

A slice is, quite simply, a set of elements - an arbitrary slice of the code in the image. We can define several different kinds of slices:

Packages

In Squeak, we can use PackageInfoSlice to get packages identical to those used by Monticello 1. In other dialects we'd create slices to interface with the native packaging code - PackageSlice and BundleSlice in VisualWorks for example.

Change Sets

A ChangeSet also defines an interesting slice of the image, and by implementing ChangeSetSlice, we can make them versionable and mergeable, just like packages. I'm really looking forward to this one, actually. It'll make the lives of package maintainers easier, since contributors can just send them change sets rather than full packages.

Modules

Lately, I've become interested in combining Monticello with Spoon. One of the keys to that integration would be to create a NaiadSlice. This would define a slice based on the elements involved in executing a given Smalltalk expression.

Explicit

Probably the simplest kind of slice is defined with a collection of elements. At some point, I'd like to create a UI for easily creating an ExplicitSlice. I'm imagining a window which lists the contents of the slice, and accepts new elements via drag and drop from OmniBrowser. For now, though, ExplicitSlices can be created programmatically, and are really handy for testing.

Others

Although they're probably not useful for everyday development, there are other kinds of slice one might want. A FileOutSlice would enumerate all the elements in a particular chunk file. We could do the same thing with the sources and changes files. We could create a slice that scanned the changes file and included all elements modified between a pair of snapshot markers. When demoing Monticello 2 I sometimes joke about creating a slice that includes all the elements that match a given rewrite rule. I don't know how useful it would be, but why not?

For the moment, I've only implemented ExplicitSlice and PackageInfoSlice, since they're needed to achieve feature parity with Monticello 1.
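As a rough illustration of the idea, here is how two kinds of slice might look in Python. This is a sketch, not the Monticello 2 implementation: the `Element` fields, the `elements(image)` method, and `CategoryPrefixSlice` (a loose stand-in for a package-style slice that matches on category names) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str            # e.g. "class" or "method"
    class_name: str
    selector: str = ""
    category: str = ""


class ExplicitSlice:
    """A slice defined by listing its elements directly."""

    def __init__(self, elements):
        self._elements = frozenset(elements)

    def elements(self, image):
        return self._elements


class CategoryPrefixSlice:
    """A slice computed from the image: elements whose category has a prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def elements(self, image):
        return frozenset(e for e in image if e.category.startswith(self.prefix))


# A toy "image": just a list of elements with hypothetical names.
image = [
    Element("class", "Invoice", category="Billing-Model"),
    Element("method", "Invoice", "total", category="Billing-Model"),
    Element("method", "String", "asCurrency", category="Strings-Extras"),
]

explicit = ExplicitSlice([image[1]])
package_like = CategoryPrefixSlice("Billing")
```

Both objects answer the same question, "which elements are in this slice?", so anything built on top of slices doesn't need to care how the set was defined.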

Monticello's versioning model

2006-05-25T22:03:31-07:00

Although Monticello has proven very useful for developing applications that run in Squeak, it hasn't been very helpful in supporting the development of Squeak itself. The problem is that the versioning model used in Monticello 1 is based on the assumption of packages with well-defined and relatively stable boundaries. In Squeak, the well-defined packages have already been removed, and what remains is a large chunk of tangled and inter-dependent code.

Monticello 2 adopts a new versioning model, one that's not tied to packages as the fundamental unit of versioning. Instead, Monticello 2 divides the system into its fundamental elements. In Squeak, Smalltalk code is made up of the following elements:

Classes
Methods
Class comments
Instance variables
Class variables
Class instance variables
Pool Imports

Rather than maintaining the version history of packages, Monticello 2 keeps version history for each element.

Right off the bat, this makes it easy to implement a feature that Monticello has never had before: the ability to view previous versions of a given method. More importantly, though, it makes it much easier to deal with fluid package boundaries. Packages can be created, renamed or destroyed, elements can move back and forth between packages, elements can even belong to more than one package at a time. Since the version history is attached to the element, it's not affected.

Another consequence of element-level version history is that merges can be performed on individual elements. Although Monticello 1 supports cherry-picking, it does so in an awkward and non-intuitive way. In Monticello 2, cherry picking is the norm, and merging an entire package is just a special case.
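A small Python sketch may help show why attaching history to elements makes cherry-picking the normal case. It is deliberately simplistic and is not how Monticello 2 is implemented: `ElementHistory` and `merge` are hypothetical names, versions are plain strings, and "merging" here just adopts the other side's latest version without any conflict detection.

```python
from collections import defaultdict


class ElementHistory:
    """Version history kept per element rather than per package."""

    def __init__(self):
        # element key -> list of versions, oldest first
        self._versions = defaultdict(list)

    def record(self, element, source):
        self._versions[element].append(source)

    def versions(self, element):
        return list(self._versions[element])

    def latest(self, element):
        return self._versions[element][-1]


def merge(local, remote, elements):
    """Adopt the remote's latest version, but only for the chosen elements."""
    for element in elements:
        local.record(element, remote.latest(element))


total_method = ("method", "Invoice", "total")
print_method = ("method", "Invoice", "printOn:")

local = ElementHistory()
local.record(total_method, "total v1")
local.record(print_method, "printOn: v1")

remote = ElementHistory()
remote.record(total_method, "total v2")
remote.record(print_method, "printOn: v2")

# Cherry-pick a single element from the remote history.
merge(local, remote, [total_method])
```

A package is then just a set of element keys, so merging a whole package amounts to calling `merge` with every key in that set, and moving a key from one set to another leaves its history untouched.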

Monticello 2 alpha release

2006-05-24T22:05:39-07:00

One of the things that surprised me at Smalltalk Solutions this year was the continuing interest in Monticello 2 from outside the Squeak world. Now that I'm not working in VisualWorks day-to-day anymore, I've been more focused on solving the problems that we have with using Monticello 1 in Squeak.

However, there is a real need for tools to make cross-dialect development easier, and versioning is an important component of that. After doing a few demos, I had volunteers to maintain VisualAge and Dolphin ports. The VisualWorks folks all seem pretty busy, but I'm sure somebody will step up when MC2 gets to production quality.

With all that momentum coming out of the conference, I cleaned up the code a bit, wrote an installer and posted the first alpha to SqueakMap. The reaction has been mostly positive, particularly given that Monticello 2 is still very raw and there's no documentation at all.

To remedy that I'll post some discussion of the architecture and features of Monticello 2 over the coming weeks.

Humor

2006-05-23T18:13:01-07:00

Today Patricia asked me to explain about "knock, knock." So I explained the call-and-answer sequence and gave a couple of examples. Then I had to tell her that they were meant to be jokes. She just stared at me and said, "but it's not funny." She's right. It's not.

Newfie jokes seem to be universal, though. In Ecuador, they make jokes about "Pastusos" - people from Pasto, a Colombian city just across the border.

New Digs

2006-05-21T22:07:27-07:00

After months of neglect, I've finally put some time into fixing my blog. I'm ditching SmallBlog and moving to Typo. We'll see if Rails is all it's cracked up to be. As much as I'm a believer in eating my own dogfood, SmallBlog was just painfully difficult to post to, and I'd rather spend my spare time writing than tinkering with the infrastructure. I've still got to find a theme that doesn't clash with the rest of the site, but posting works, finally.

The old posts are still available, but I'll migrate them over to Typo as soon as I can.

I've decided to stick with "Thin Air," despite the fact that it's wildly inappropriate these days. Now that I work for Smallthought, I expect to be near Vancouver again soon.

BASTUG Meeting

2005-09-15T11:28:41-07:00

I'm stoked that we're back on track with BASTUG, the Boston Area Smalltalk User's Group. It went on a short hiatus while I got married and Rob's wife had a baby, but now that things have settled down, we've decided on the third Tuesday of the month for regular meetings. That makes for short notice of this month's meeting, but here's where it will be:

Date: Tuesday, September 20, 2005
Time: 6:00 pm

The Joshua Tree Bar & Grille
256 Elm St.
Somerville, MA 02144
617-623-9910


Not Messages

2005-07-12T22:04:25-07:00

In my last post, I speculated that a hypothetical "good modelling language" wouldn't revolve around message-sending, but rather focus on the relationships between objects and making explicit the patterns of cooperating objects that we see in good OO design. I shouldn't have imagined that I could get away with being that vague; Vincent Foley quickly asked the pertinent question:

I have a little question: I quite like Smalltalk (though I'm more a Ruby
guy), but I was wondering what you meant by a language that is not
message-oriented? What would that look like?

One of the first things I do when modelling is to identify some of the objects that will be in the model: the nouns in the domain language. These would be objects. I also want to classify objects, so they should have classes or types attached to them.

I also want to describe relationships between objects. I'm imagining a way to build up complex relationships from simpler ones, in the same way that "high level" methods can call "low level" methods. Perhaps the language or libraries would provide basic relationships such as is-a and has-a. By combining several of these I could build complexity. For example, I might define the relationship between an invoice and line items like this:

an invoice has a collection of line items
an invoice has a total
a line item has a value
an invoice's total is the sum of the values of its line items

Now, this sort of declarative definition of relationships is great in the abstract, but I still need a way of causing computation to occur. The state of the program has to change over time, so I need a way to describe the changes that might happen in the relationships of objects at runtime. By defining a transformation, I provide a transition from one state to another. For example, I might say that given an invoice and a line item, the line item can be added to the invoice's collection.
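The invoice example can be approximated in plain Python, which is of course not the hypothetical language being imagined here, only a way to see the shape of the idea. The class and function names are assumptions for the sketch: the total is expressed as a derived relationship rather than stored state, and the transformation maps one invoice state to the next instead of mutating it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    # a line item has a value
    value: Decimal


@dataclass(frozen=True)
class Invoice:
    # an invoice has a collection of line items
    line_items: tuple = ()

    @property
    def total(self):
        # an invoice's total is the sum of the values of its line items
        return sum((item.value for item in self.line_items), Decimal("0"))


def add_line_item(invoice, item):
    """Transformation: given an invoice and a line item, produce the next state."""
    return Invoice(invoice.line_items + (item,))


empty = Invoice()
one_item = add_line_item(empty, LineItem(Decimal("10.00")))
two_items = add_line_item(one_item, LineItem(Decimal("2.50")))
```

Because the total is derived, it is never out of date: each new state answers `total` from its own line items, and the earlier states remain unchanged.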

Finally, all this needs to be hooked up to input and output. With the right hooks, input would create new objects, whose existence would trigger a cascade of transformations, state changes in the model, and ultimately, a result going to output.

I'm still doing a lot of hand-waving here, but one can at least imagine that such a language could exist, and perhaps what programming it might feel like. This notion of input triggering a cascade of transformations sounds a bit like monads, which makes me wonder if this is maybe a lazy functional language in disguise. I should probably take a cue from Blaine and learn Haskell.