Thin Air

Beyond Smalltalk

Well, I'm back from Smalltalk Solutions, and now that I've caught up on my sleep, I'm reflecting a bit on what I saw and heard. Rather than reporting blow by blow from the conference, I'd like to record what I learned while I was there.

For me, the highlight of the official program was Eric Evans' keynote. He took us through a fairly simple refactoring of a hypothetical shipping application, showing some of the techniques he uses to distill a domain expert's knowledge into a model that can drive the design of the application. His thesis was that a good model provides the language that the team uses to communicate about the domain, and should be directly reflected in the code.

One of the points Eric made in his talk was that he preferred language rather than pictures for modelling, and as such, he preferred modelling in Smalltalk or Java rather than UML. Later in the day, I joined a conversation he was having with Blaine Buxton, and the three of us spent quite some time arguing about language design. If Smalltalk is the best modelling language we've yet encountered, and we were to design something better, what would it be like?

One of the key things we settled on was that it would be object-oriented, but not message-oriented. Eric insisted that sending messages is a much lower-level operation than what we do when we talk about models with experts in the business domain, and that a good modelling language should operate at that level. What's missing from OOP, we decided, is explicit capture of the relationships between objects.

Thinking about it over the last few days, it occurred to me that relationships are also at the heart of many of the Design Patterns that have become popular in the OO world over the last several years. Indeed, Ubiquitous Language and Pattern Languages serve much the same purpose in a software development context. Of course, the term "language" is overloaded here: it covers human languages such as English, computer languages such as Smalltalk, and pattern languages such as those created by Christopher Alexander or the Gang of Four.

One of the primary criticisms of design patterns is that they're really just techniques for working around language deficiencies. Dynamic typing, block closures and metaclasses make many of the patterns used in C++ and Java unnecessary in Smalltalk.

Iterator is a classic example. An iterator object encapsulates the state required to loop over the contents of a collection. In Smalltalk, one might use an iterator to send #doSomething to all the objects in a collection:

iterator := aCollection iterator. 
[iterator hasNext] whileTrue: 
    [iterator next doSomething]

The beauty of iterators is that by encapsulating the loop state, they make all collections polymorphic. One can use the same looping code to iterate over any collection, be it an Array, a Set or a LinkedList. The problem with iterators is that they are an incomplete abstraction. They capture the state of the loop, but leave the looping behavior itself implicit, and require the loop to be duplicated wherever the iterator is used. In contrast, Smalltalk's #do: method provides the complete abstraction: by making the implicit loop explicit, it provides reusable polymorphism.
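To make the contrast concrete, here is a minimal sketch in Python rather than Smalltalk. The names ArrayIterator, Bag and do are hypothetical and chosen only for illustration; the point is where the loop lives in each style.

```python
class ArrayIterator:
    """External iterator: holds the loop state, but not the loop itself."""

    def __init__(self, items):
        self._items = items
        self._index = 0

    def has_next(self):
        return self._index < len(self._items)

    def next(self):
        item = self._items[self._index]
        self._index += 1
        return item


class Bag:
    """Hypothetical collection offering both styles of iteration."""

    def __init__(self, *items):
        self._items = list(items)

    def iterator(self):
        return ArrayIterator(self._items)

    def do(self, block):
        # The loop is written here, once, instead of at every call site.
        for item in self._items:
            block(item)


bag = Bag(1, 2, 3)

# External iteration: every caller has to repeat the loop.
seen_external = []
iterator = bag.iterator()
while iterator.has_next():
    seen_external.append(iterator.next())

# Internal iteration: the caller supplies only the loop body.
seen_internal = []
bag.do(seen_internal.append)
```

Both styles visit the same elements; the difference is that the second caller never writes a loop at all.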

I think that step of making the implicit explicit is important. How might we make explicit, for example, the relationship between an AST and a Visitor? I don't know, but I think such a language would be good for domain driven development.

Posted in design

Modules and Late Binding

Travis Griggs just posted some musings on namespaces and imports in VW. We do things a bit differently at Quallaby. We have very few namespaces - a "main" one for most of the code, one for test cases, and a couple of other special purpose namespaces that help enforce conceptual boundaries. This works pretty well for us; most of the time we only think about it when creating a new package, and even then the norm is to import just the main namespace. Still, it feels like this is a way to avoid the problem without really solving it.

This issue has come up several times on the squeak-dev list in recent months and has been debated pretty extensively. There hasn't been anything even approaching a consensus, but a couple of interesting tidbits have come up.

Forth has been put forward as an example of how to do namespaces right. The idea, as I understand it, is to decide on how the names in a module should be resolved, not when a module is defined, but when it's loaded. When you load a module you give the compiler an (ordered) list of namespaces to look in to resolve names, and a "target" namespace, where the names defined in the module will be placed. (This seems pretty unusual to me - I don't know of any other language that allows a module to be compiled without reference to its own contents!)
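A rough sketch of load-time resolution, written in Python purely for illustration. Everything here is an assumption made for the example: the load_module function, the tuple format for definitions and the use of plain dicts as namespaces are hypothetical, and this is not how Forth or any Smalltalk implements it.

```python
def load_module(definitions, search_path, target):
    """Bind a module's free names at load time.

    definitions: list of (name, free_names, builder) tuples, in source order
    search_path: ordered list of namespaces (dicts) used to resolve names
    target:      namespace (dict) that receives the module's own definitions
    """
    for name, free_names, builder in definitions:
        resolved = {}
        for free in free_names:
            # The first namespace on the search path that knows the name wins.
            for namespace in search_path:
                if free in namespace:
                    resolved[free] = namespace[free]
                    break
            else:
                raise NameError(f"{free!r} not found in any namespace")
        target[name] = builder(resolved)
    return target


kernel = {"greeting": "hello"}
overrides = {"greeting": "bonjour"}

# One module definition, with "greeting" left unresolved until load time.
module = [
    ("greet", ["greeting"], lambda env: lambda who: f"{env['greeting']}, {who}"),
]

plain = load_module(module, [kernel], {})
french = load_module(module, [overrides, kernel], {})
```

The same module text ends up bound to different names depending only on the search path supplied by whoever loads it. The target namespace is not searched unless the loader puts it on the path.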

This is an attractive idea to me because it casts the issue of namespace and imports as a question of early- vs. late-binding. Do we decide on how variables will be resolved when the code is written, or when it's compiled?

Another option that takes that idea even further is Dan Ingalls' "Environments," which was used as part of the (now-defunct) modules system in Squeak 3.3. It pushed name-binding even later, from compile-time to execution-time, by making it a message send. Instead of writing dotted names (Module.Class new), you'd send messages: Module Class new.
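As a loose analogy only, execution-time binding can be sketched in Python by routing every name reference through a lookup method. The Environment class below is hypothetical and is not a description of the Squeak 3.3 implementation.

```python
class Environment:
    """Hypothetical namespace whose names are resolved on every reference."""

    def __init__(self, **bindings):
        self._bindings = dict(bindings)

    def bind(self, name, value):
        self._bindings[name] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so each reference
        # to a bound name is resolved at execution time, not compile time.
        bindings = self.__dict__.get("_bindings", {})
        try:
            return bindings[name]
        except KeyError:
            raise AttributeError(name) from None


class OldPoint:
    pass


class NewPoint:
    pass


Module = Environment(Point=OldPoint)


def make_point():
    # Roughly analogous to sending messages: Module Class new
    return Module.Point()


first = make_point()
Module.bind("Point", NewPoint)  # rebinding needs no recompilation of make_point
second = make_point()
```

Python attribute access is already late-bound in this sense; the sketch just makes the lookup step visible.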

It would be interesting to see how late-bound module dependencies work in practice.

Posted in design

Tyranny considered beneficial

Ian Bicking has posted a couple of thoughtful comments on this versioning thread. This one in particular caught my attention. In conclusion, he writes:

In the end, it doesn't seem like Smalltalk is an evolving language, and
there's no one to even go to to ask for new features (who would realistically
respond in a positive way). So it's hard to compare; in all of these
languages, you can cope somehow. You can always do code generation, after all.
In evolving languages there are options besides coping; that people respond to
that and ask for changes doesn't indicate a more flawed language.

I wonder if much of the disconnect between Smalltalk and the newer breed of dynamic "scripting" languages stems from this difference in social organization. The Benevolent Dictator is a very common pattern in the open source world - think Linus Torvalds, Miguel de Icaza, Larry Wall, Guido van Rossum, Yukihiro Matsumoto, and many more. In the Smalltalk world, we have icons like Alan Kay and Dan Ingalls, but they've moved away from active leadership roles in the community. That makes it hard to see all the things that are going on in the community.

In fact, the Smalltalk world is evolving quite rapidly, despite (or perhaps because of) the lack of a dictator to set direction and coordinate development. In the last few years there have been several notable language changes in the Smalltalk world:

  • The Croquet project introduced new syntax to make their 3D work easier: matrix literals and message sends using positional rather than keyword parameters.
  • The folks at SCG introduced Traits, a new mechanism for composing object behavior that avoids the problems of multiple inheritance and mixins. It will probably be part of Squeak 3.9 (and, incidentally, Perl 6).
  • Cincom added Namespaces to VisualWorks.
  • Cincom removed class variables from VisualWorks.
  • There have been 3 new implementations of Smalltalk: #Smalltalk, which runs on the .NET VM, Ambrai for Mac OS X, and OOVM, for embedded platforms. There may be one or two others that I'm forgetting at the moment.
  • Brian Rice and Lee Salzman created Slate, which is a Smalltalk variant with a number of language-level changes, most notably multiple dispatch.

The thing is, these don't feel like language changes to people outside the community. For one thing, each one affects only one dialect. There's no "official Smalltalk" that one can refer to for a language definition, and so we end up using either Smalltalk-80 or ANSI Smalltalk when the need arises. Of course, the real meaning of "Smalltalk" is more of a weighted average of all the dialects, a kind of eigenlanguage that you can only absorb through experience.

Ian is absolutely right: there's no one to ask for a change in the language, but neither is there anything to stop you from doing it yourself if you feel the need. And isn't that the spirit of open source?

Posted in community

Versioning Smalltalk

Having been working in Smalltalk for a few years now, I find I occasionally forget just how different it is from the mainstream world of programming. The other day Avi posted about the recent interest in versioning systems and how what we're doing in Monticello is both similar and different to what's going on in other languages.

On the one hand, we're wrestling with the same information-theoretic problems as all other versioning systems. Essentially we want to be able to merge the work done by developers working separately in such a way that changes that don't affect each other are handled automatically, but those that do conflict are detected so that a human can figure out how to harmonize them. We want the merge process to be fast, the history data to be compact, and the restrictions placed on how developers work to be minimal.

On the other hand, Smalltalk code isn't like that of other languages. The issue isn't so much where it's stored - text files or image files - but how it's created. The structures needed to execute the code at runtime, classes and compiled method objects, are built up directly by the development tools. The only text involved is little snippets that make up method bodies. Heck, even when Smalltalk is written out to a text file, that file just contains a series of expressions that can be compiled and executed to rebuild the same executable objects in another image.

So for large parts of a Smalltalk program, there is no text to version. This is a problem because it means versioning Smalltalk programs with the same tools that the rest of the world uses is very difficult.

It can be done, of course. The precursor to Monticello was called DVS, and was mainly concerned with representing Smalltalk code textually so that we could version it with CVS. It would scan the text files for CVS's conflict markers and present them to the user for resolution. This worked OK most of the time, and was an improvement over collaborating via change sets.

But CVS has problems (hence the need for Subversion, Arch, Monotone, darcs, Codeville, BitKeeper etc.), and DVS wasn't able to completely bridge the gap between the objects created by the Smalltalk dev tools and the textual representations that CVS was dealing with. The result was lots of bogus conflicts. If two developers created methods that sorted near each other alphabetically, for example, that would be a textual conflict as far as CVS was concerned, but not a conflict at all in the Smalltalk world.

In trying to work around these problems, DVS had grown from a "little utility" for versioning Smalltalk code with CVS into a versioning system that used CVS as a backend. The only way to improve it was to ditch CVS and do the versioning in Smalltalk. And this is where the lack of a textual representation turned into an opportunity.

A Monticello snapshot is a list of definitions that make up a package. Working with them is almost absurdly easy compared to working with text. The standard diffing and patching that tools like CVS do is trivial, and that let us put our effort into solving the harder problems that the post-CVS generation of tools are tackling. As Avi noted, the solutions we came up with work, but they're not very elegant, and now we're looking for better ones.
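To illustrate why diffing and merging definitions is so much easier than diffing text, here is a minimal sketch in Python. It assumes a snapshot can be modelled as a dict keyed by (class name, selector); that model, the function names and the sample methods are all hypothetical, and this is not Monticello's actual data structure or merge algorithm.

```python
def diff(base, target):
    """Return the operations that turn snapshot `base` into `target`.

    A snapshot is modelled as a dict mapping a definition key, here a
    (class name, selector) pair, to that definition's source text.
    """
    ops = {}
    for key in base.keys() | target.keys():
        before, after = base.get(key), target.get(key)
        if before != after:
            ops[key] = after  # None means the definition was removed
    return ops


def merge(base, ours, theirs):
    """Three-way merge; returns (merged snapshot, set of conflicting keys)."""
    our_ops, their_ops = diff(base, ours), diff(base, theirs)
    merged = dict(base)
    conflicts = set()
    for key, value in {**our_ops, **their_ops}.items():
        if key in our_ops and key in their_ops and our_ops[key] != their_ops[key]:
            conflicts.add(key)  # both sides changed the same definition
            continue
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged, conflicts


base = {("Account", "balance"): "balance ^balance"}

# Two developers add methods that sort next to each other alphabetically.
ours = {**base, ("Account", "deposit:"): "deposit: amount ..."}
theirs = {**base, ("Account", "debit:"): "debit: amount ..."}
merged, conflicts = merge(base, ours, theirs)

# Two developers change the same method in different ways.
ours_edit = {("Account", "balance"): "balance ^balance ifNil: [0]"}
theirs_edit = {("Account", "balance"): "balance ^balance abs"}
_, real_conflicts = merge(base, ours_edit, theirs_edit)
```

The neighbouring additions merge cleanly because the two definitions have different keys, while the competing edits to the same method are flagged for a human to resolve.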

Now, Smalltalkers tend to be enthusiastic about Smalltalk, and that can come across as arrogant. zippy's reaction isn't all that unusual. But I think language holy wars are a distraction from the intent of Avi's post. Smalltalk really is different from other languages, and that makes it interesting. What happened with Monticello is a recurring pattern. There are lots of tools out there that the Smalltalk community can't use, and so we're forced to write our own. Fortunately, doing so is easier than one might think, and what we end up with is pretty good.

The other thing that's easy to miss is that the Monticello approach can be applied to any language, not just Smalltalk. It's a bit more work, because you need to parse the language syntax before doing versioning operations, and of course, you lose the language-independence that text-based tools have traditionally enjoyed.

Even so, I think mainstream versioning systems will end up there eventually. IDEs are leading the way - Eclipse, IDEA and their ilk are gradually replacing generic text editors like vi and Emacs, opening the way for syntax-aware versioning. The Stellation project was pursuing this, though it doesn't seem to have made progress for a while.

In the meantime, it'll be interesting to see how Monticello evolves as we make the most of our handicap.

Posted in monticello

Scalability

Now that I've been at Quallaby for a little while, I've begun to get a sense of what is going on in our app. The most striking thing is an apparent contradiction: At first glance it's an incredibly boring, even trivial application. We fetch files from remote machines, parse them, and load the data into a relational database. Users view the data via a web-based reporting tool. But when you look more closely, the gymnastics we have to go through to accomplish this are amazing.

One reason is scale. We're processing statistical data gathered from devices on very large networks. The exact volume varies from customer to customer of course, but it's pretty easy to get over a million records per minute going into the database, hour after hour, day after day. It's so much data that statistical reports on it can't be computed on the fly. It all has to be pre-computed as the data is loaded, or the reporting interface won't be responsive enough to be usable. Of course, that puts even more stress on the backend - that "trivial" application to fetch data and load it into the database.
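The idea of computing statistics as the data is loaded, rather than when a report is requested, can be sketched as follows. This is a toy in-memory illustration in Python; the class name, the hourly buckets and the choice of statistics are assumptions made for the example, not a description of the real application, which loads into a relational database.

```python
from collections import defaultdict


class LoadTimeAggregator:
    """Keeps running statistics per (device, port, hour) as records arrive."""

    def __init__(self):
        self._stats = defaultdict(lambda: {"count": 0, "total": 0, "max": None})

    def load(self, device, port, hour, packets):
        # Update the aggregate on the way in, so that no report ever
        # has to scan the raw records.
        stats = self._stats[(device, port, hour)]
        stats["count"] += 1
        stats["total"] += packets
        if stats["max"] is None or packets > stats["max"]:
            stats["max"] = packets

    def report(self, device, port, hour):
        # A report is a constant-time lookup of a pre-computed aggregate.
        stats = self._stats.get((device, port, hour))
        if stats is None:
            return None
        return {
            "count": stats["count"],
            "total": stats["total"],
            "mean": stats["total"] / stats["count"],
            "max": stats["max"],
        }


aggregator = LoadTimeAggregator()
for packets in (10, 20, 30):
    aggregator.load("router-1", 3, 14, packets)

summary = aggregator.report("router-1", 3, 14)
missing = aggregator.report("router-2", 1, 14)
```

The cost moves from the reporting side to the loading side, which is exactly the extra stress on the backend described above.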

Another source of interesting complications is the nature of the statistics we need to compute. Conceptually they're pretty simple; for example, the number of packets sent or received on a particular port of a particular device. But the method for locating those bits of data is enormously variable, as each type of network device presents the information differently. So we make this part of the application scriptable, and turn over the job of dealing with the quirks of ATM paths, in-octets and QoS thresholds to the networking experts.

As a result, the subproblems we have to solve to get data from A to B are fascinating. Take scripting: we currently have several DSLs for specifying how data should be handled as it goes through various stages of processing on its way to the database. There are too many of them, in fact, and we're currently working on consolidating the user-scriptable portions of the app on two languages: ECMAScript and SQL.

From a computer-science point of view these are really interesting choices. On one hand we have a dynamic, imperative, prototype OO/functional language. ECMAScript might be described as a cross between Self and Lisp, wrapped up in C syntax. It fits in nicely with many of the things we're used to doing in the Smalltalk world, but with a more mainstream syntax.

On the other hand, we have SQL, a declarative query language based on relational algebra. But instead of executing the queries against tables in a database, we're applying them to virtual tables representing data in network devices, in intermediate results as they move through the processing pipeline, or in any one of several tables in the central database. Naturally, the implementations of both languages have to be robust, memory efficient and fast.

Personally, I'm fascinated by computer languages, so for me this is the most interesting part of what we're doing. But there are gobs of other interesting problems that we run into: memory management, execution optimization, cluster computing etc. Recently we've been digging into the research that Google Labs has been doing in this space. We don't have scale nearly as high as Google does, of course, but we're running into many of the same issues they are, and every good idea helps.

Posted in design

Automated Builds

Travis Griggs, over on This TAG is Extra, has a post on doing automated builds in VisualWorks. We do this at Quallaby too, and we're expecting to make changes to the process in the near future, so here's a summary of how we do it now. Once the new process is in place I'll post again on what we changed and why. I'm still a little new here at Quallaby, so I'm a bit hazy on some of the details about how things work, but this is the gist of it.

Unlike KeyTech, we have only one product, but it involves many components that run on different machines and communicate with each other. To keep things simple, we just use one image for all the components and configure them at installation to perform different functions at run time.

Our build runs on a headless Solaris server and is invoked every night at midnight via cron. There's a directory for our current development stream that contains the static elements of the build - the VM, a base image, pre-compiled binaries for some external libraries we use etc. This stuff is organized by platform, with the platform-independent bits going to a 'common' directory.

The base image has just enough in it to get the process going; Store is installed and the repository is configured. When the build is kicked off, the base image is launched and begins loading code from Store. It first loads the most recent version of our root bundle, then updates any packages that are more recent still.

Then it creates a directory for the build, naming it based on the development stream, version and timestamp of the build. Within this directory are three subdirectories: release, test and working.
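The directory layout is simple enough to sketch. The Python below is only an illustration: the real build does this from inside the Smalltalk image, and the exact naming format, the function name and the sample stream and version values are assumptions.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def create_build_dirs(root, stream, version, when):
    """Create <stream>-<version>-<timestamp>/ with its three subdirectories."""
    # Assumed naming scheme; the source only says the name is based on
    # the development stream, version and timestamp.
    name = f"{stream}-{version}-{when:%Y%m%d-%H%M%S}"
    build_dir = Path(root) / name
    for sub in ("release", "test", "working"):
        (build_dir / sub).mkdir(parents=True, exist_ok=False)
    return build_dir


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    build_dir = create_build_dirs(root, "main", "1.0", datetime(2005, 3, 1, 0, 0, 0))
    build_name = build_dir.name
    created = sorted(path.name for path in build_dir.iterdir())
```

Including the timestamp in the name means each nightly build gets a fresh directory, and exist_ok=False makes an accidental reuse fail loudly instead of mixing two builds.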

The contents of the 'release' directory are what we would ship to a customer. There's a subdirectory for each platform we support, each containing a complete release for that platform - VM, headless and headful images, shared libraries and a couple of installation scripts. The build populates these directories then builds tarballs for delivery.

The 'test' directory is for running the unit tests. The build image saves a headless image into this directory, and also copies some files with test data and generates a test script. Then it launches the test image. The test image reads in the test script, which launches a headless test runner. As the tests run, the results are logged to a text file in the same directory, and any errors that occur are caught and the stack is dumped to a text file. When the tests are complete the results are mailed to the team and the image exits.

The 'working' directory is for continuing with development based on the code in the build. Again it contains platform-specific subdirectories for each architecture, each with a VM, precompiled libraries and headful images. The basic code is saved into an image called 'build.im' and then other images are saved and launched with a parameter for customizing the image. Several of the developers have their own customization code which gets called based on the parameter, and might involve loading goodies from the Cincom Public Store, configuring key bindings or anything else that might make development more pleasant.

All in all this process works pretty well for us. I find the 'working' directory a nice touch, as it means we don't spend much time configuring our development environments, and we have no qualms about discarding images and starting fresh each day or even more frequently.

There are a couple of things we'd like to improve. For one thing, it takes too long to run the tests - over an hour at the moment. Some of this can be improved through old-fashioned optimization; many of the tests do a lot more work than necessary, and we can get the same level of coverage with simpler and faster tests.

We also have tests that probably shouldn't be run during the build at all. These tend to be end-to-end tests of complete subsystems, using data with known characteristics. They take much longer to run than unit tests and should be broken out into a separate (but also automated) testing regime. If we can get the tests to run in about 10 minutes or so, it would be feasible to run builds automatically after publishing, as Travis describes.

The other problem we run into sometimes is broken builds. The code that loads packages from Store isn't very smart - it doesn't distinguish between trunk and branches, and doesn't pay attention to dependencies. Occasionally this leads to packages not loading correctly, or test runs going horribly wrong. Shouldn't be hard to fix this up.

Posted in programming

Farewell Ecuador

That's it. My sabbatical is over. Reality being what it is, it couldn't last forever, and recently I decided it was time to wrap things up in Ecuador, head back to North America, and figure out what to do next.

So much for that plan. A week later I was on my way to Boston for a job interview. A week after that I was frantically packing. I'm now camped out on the floor of my new and very empty apartment in Lowell, Massachusetts. I've just started a new job at Quallaby, working on the back end to PROVISO.

Now that I've had a little time to get my bearings, all I can say is wow. The scale of this project is amazing, and the quality of the code and the development processes it entails is impressive. I can tell I'm going to learn a lot here and have a lot of fun in the process.

One of the things that has been most interesting so far is using VisualWorks for actual work instead of just poking around. I'm constantly noticing little details of the interface and development tools - where they're similar to Squeak, where they're different. I've reached two main conclusions so far: (1) The VW tools are very good, and (2) Squeak holds its own much better than I thought it would.

All in all, I'm very happy to be where I am, and doing what I'm doing. It's a chance to learn from some very skilled Smalltalkers and very difficult problems.

I'm gonna miss Ecuador though.

Posted in ecuador

Galeras

The other day I noticed an odd-looking cloud on the horizon. There was the usual horizontal band shrouding the mountain tops, but rising out of it was a vertical plume that was amazingly tall. It seemed like a mini mushroom cloud or maybe a volcanic eruption. But it was too small, and there was no ash... I shrugged and went inside.

Today I found out that it was, in fact, a volcanic eruption. Volcán Galeras in south-western Colombia erupted this week, with plumes as high as 6 km. No wonder I could see it all the way from Quito. Apparently the ash got as far as Ibarra, which is a town about an hour north of Quito.

Posted in ecuador

Ya Mismo

Ya mismo is an expression that Ecuadorians use a lot. If you ask a question like "when will lunch be ready," chances are good that the answer will be "ya mismo." In theory, it means "right now," or "immediately." However, a Canadian friend of mine who has lived in Ecuador for a while has a favorite rant about how the real meaning of ya mismo is somewhere between "in two minutes" and "in two weeks".

{{http://www.wiresong.ca/static/blog/clock.jpeg|Clock}} Complaining about it doesn't do any good, of course. Things happen on their own schedule here, called la hora Ecuatoriana - Ecuadorian time. Ecuadorians themselves seem to view la hora Ecuatoriana with a mixture of amused cynicism and rueful helplessness. It's terrible, they'll agree, but what can we do?

The government's answer to the problem was to launch a public campaign to promote punctuality. The clock on the right is probably the most significant aspect; there are quite a few of them scattered around Quito. The sign says, "It's time to be punctual." I don't know how effective the signs are, but the clocks, which also display the temperature, are pretty handy.

Of course, the campaign didn't get off to a very good start. The official launch was delayed 15 minutes because President Gutiérrez was late.

Posted in ecuador

Bodily Processes

Over on Exploration Through Example, Brian Marick notes that biological systems, despite being fantastically flexible and resilient, are kind of ugly from an engineering and design point of view, as they flagrantly violate the OneResponsibilityRule:

The fact is, the body is a gross kludge. You'd fire anyone who designed
software that way.

To me the interesting thing about biology isn't the designs that one finds in living organisms, marvelous as they are, but the process that produces them. Evolution is the ultimate agile development process.

The key to this metaphor is that evolution doesn't design bodies - individual organisms - but genomes. Species. Individual organisms are nothing more than test runs for the genome, with the world as a test fixture with only one assertion: any organism that creates a new copy of the genome gets a green bar. Development proceeds by making small design changes, testing them, and immediately applying the feedback.

The amazing thing about evolution is not that it's able to produce designs that work, but that it's able to do so without intelligent direction. It's the process of constantly gathering and applying feedback that enables evolution to work so well.

The same can be said of agile development in the software world. The problem with BigDesignUpFront is that it relies too much on the intelligence of the designers. Nobody is smart enough to create a design that can perfectly meet the needs of everyone who will ever use a piece of software, nor anticipate all the ways in which it might need to be modified or extended. By seeking and applying feedback, agile developers eliminate the need for an omniscient designer, and bring the task down to a level that mere mortals can handle.

Where agile software design differs from evolution is that we do apply intelligent direction to the process, and so we have to arrange our designs so that we can bring our intelligence to bear on them. The agile practices that aren't about feedback tend to be about comprehension. Refactoring, pair programming, retrospectives, group ownership and so on are all aimed at making the design and process as comprehensible as possible to as many people as possible.

And isn't this what OneResponsibilityRule and OnceAndOnlyOnce are about anyway? After all, one organ that performs several complementary functions is more efficient than several organs, from a strictly functional point of view, and that's actually quite beautiful in its own way.

Posted in programming
