UML Diagrams with MetaUML

I'm a big fan of UML as a standardized notation. I haven't been a big fan, though, of UML generation software. My first experience with UML diagramming software was Rational Rose: I was working for a real estate firm with deep enough pockets to buy RR licenses at the time, or else I doubt I ever would have tried it. What I rapidly found out was that Rational Rose is really great for trivial cases, but breaks down entirely for the cases where it might actually be useful. It could, of course, scan your entire project and auto-generate a UML diagram for you. It couldn't, however, figure out which relationships were meaningful and which weren't, or which methods and fields were relevant to a human, so you ended up looking at everything. Are serialVersionUID fields relevant? What about relationships to java.lang.String? Getters and setters? Is a class with a java.util.LinkedList field related to LinkedList, or does it have a one-to-many association with whatever the LinkedList contains? I found that I was spending as much time removing irrelevant details from diagrams just to put something useful together as I would have spent adding all of the classes "by hand", and then I had to remove them all over again whenever I re-imported the latest changes. It was also insanely opinionated about how things should look, so adding annotations and moving things around was usually an exercise in masochism.

Then, of course, there's Visio. It seems like the perfect solution: a box-and-line generator program! It was definitely easier to work with than Rational Rose, but it was incredibly manual in a drag, drop, point-and-click sort of way. Even then, its functionality was pretty limited. If I tried to draw multiple associations out of one class to several others, they all had to originate from the middle of the top, bottom, left or right side of the box; it wouldn't let me have one association originate from the middle, one from halfway between the middle and the top, and a third from halfway between the middle and the bottom (at least not if I wanted the association to move with the class itself). If I had multiple attributes of the same class (very common!), there was no way to annotate the relationship with two different association names.

Visio is also Windows-only; I can run PowerPoint, Word, Excel and even Outlook on my Mac, but it doesn't look like we'll ever get Visio for Mac. There's the web-based Gliffy, but in spite of its developers' best efforts, it can't even match Visio's performance. I also tried ArgoUML, which somehow managed to combine the worst of Rational Rose (minus, I guess, the license fees) with the worst of Visio. And it looks like it requires JDK 1.6 (that's a maximum, not a minimum!).

Maybe there's a way to work around all of those limitations if you spend a lot of time reading the (not-so-user-friendly) documentation. Since generating UML diagrams was just a small part of my job, though, I never could justify the time it would have taken just to see if that was possible. Instead, I ended up manually working around the limitations and, in most cases, producing substandard documentation. And of course, being the command-line fanatic that I am, I was always irritated by having to interrupt my flow by clicking the mouse, returning to the keyboard, clicking the mouse, returning to the keyboard... Being GUI-focused applications, they precluded any sort of useful automation. Rational Rose automated too much; the others didn't automate quite enough.

I finally found what I think is the optimal approach (at least for folks like me): MetaUML.

Honestly, TeX, LaTeX, MetaFont, MetaPost, etc. have been on my to-learn list for a long time. I dabbled in TeX for formatting mathematical equations when I was in grad school, but I succumbed to time pressure and embarrassed myself by turning in a thesis written in MS Word instead. Learning to generate diagrams with MetaUML finally gave me a good excuse to at least scratch the surface. Rather than dragging boxes and lines around, you create an input file in a text editor, like the one in listing 1.

input metauml;
beginfig(1);
  Class.widget("Widget")()
    ("draw()");
  Class.textInput("TextInput")()();
  Class.button("Button")()();
  Class.window("Window")()();

  leftToRight(30)(textInput, button, window);
  topToBottom(30)(widget, button);

  drawObjects(widget, textInput, button, window);

  link(inheritance)(textInput.n -- widget.s);
  link(inheritance)(button.n -- widget.s);
  link(inheritance)(window.n -- widget.s);
  link(aggregation)(widget.e -- window.n);
endfig;
end

Listing 1: Sample UML input file

This is pretty self-explanatory (the .n, .s, .e and .w suffixes refer to the north, south, east and west sides of an element). In particular, MetaUML works out how to position everything without my needing to explicitly place any of the components; in this relatively simple case, I can just tell it how the objects should be related to each other. The output that the mptopdf command produces from this file is shown in figure 1.
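To make the compass-point suffixes concrete, here's a small Python model of a box and its named anchor points. It assumes (my reading of the MetaUML manual, not its actual implementation) that .n, .s, .e and .w are the midpoints of the top, bottom, right and left edges, with Y increasing upward; the Box class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Illustrative model of a diagram element's bounding box (y grows upward)."""
    left: float
    bottom: float
    width: float
    height: float

    @property
    def n(self):
        # Assumed: midpoint of the top edge
        return (self.left + self.width / 2, self.bottom + self.height)

    @property
    def s(self):
        # Assumed: midpoint of the bottom edge
        return (self.left + self.width / 2, self.bottom)

    @property
    def e(self):
        # Assumed: midpoint of the right edge
        return (self.left + self.width, self.bottom + self.height / 2)

    @property
    def w(self):
        # Assumed: midpoint of the left edge
        return (self.left, self.bottom + self.height / 2)


button = Box(left=0, bottom=0, width=60, height=20)
widget = Box(left=0, bottom=50, width=60, height=30)
# button.n -- widget.s in listing 1 is a segment between two such points
print(button.n, widget.s)  # (30.0, 20) (30.0, 50)
```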

Figure 1: Basic output

I don't like the diagonal lines; I can "stair-step" them with a few extra changes, as shown in listing 2.

input metauml;
beginfig(1);
  Class.widget("Widget")()
    ("draw()");
  Class.textInput("TextInput")()();
  Class.button("Button")()();
  Class.window("Window")()();

  leftToRight(30)(textInput, button, window);
  topToBottom(30)(widget, button);

  drawObjects(widget, textInput, button, window);

  link(inheritance)(pathStepY(textInput.n, widget.s, 10));
  link(inheritance)(pathStepY(button.n, widget.s, 10));
  link(inheritance)(pathStepY(window.n, widget.s, 10));
  link(aggregation)(rpathStepX(window.e, widget.e, 20));
endfig;
end

Listing 2: Stair-step paths

Figure 2: Stair-step output
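As I understand the MetaUML manual (treat the exact geometry as an assumption), pathStepY(A, B, delta) leaves A vertically, turns horizontal delta units later, and then runs vertically into B. The hypothetical Python function below computes those corner points, which can help when working out why a stair-stepped link lands where it does.

```python
def path_step_y(a, b, delta):
    """Corner points of a stair-step path from a to b (assumed pathStepY geometry).

    Hypothetical helper: go vertically from a by delta, across horizontally to
    b's x-coordinate, then vertically into b.
    """
    ax, ay = a
    bx, by = b
    turn_y = ay + delta
    return [(ax, ay), (ax, turn_y), (bx, turn_y), (bx, by)]


# e.g. textInput.n at (20, 20), widget.s at (110, 50), stepping up 10 units first
print(path_step_y((20, 20), (110, 50), 10))
# [(20, 20), (20, 30), (110, 30), (110, 50)]
```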

If you use the topToBottom and leftToRight macros to position things and later add fields or methods, the next run of mptopdf pushes the neighboring objects down or to the right to make room!
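The "push things out of the way" behavior falls out of the equations: leftToRight(30)(a, b, c) says, roughly, that each box's left edge sits 30 units past the previous box's right edge. A toy Python version of that constraint (it ignores vertical alignment, and left_to_right is a made-up name, not MetaUML's implementation) shows why widening one box shifts everything after it:

```python
def left_to_right(widths, spacing, start=0.0):
    """Left-edge x-coordinate of each box laid out in a row.

    Toy model of the constraint next.left = prev.left + prev.width + spacing.
    """
    lefts = []
    x = start
    for width in widths:
        lefts.append(x)
        x += width + spacing
    return lefts


print(left_to_right([40, 60, 50], 30))  # [0.0, 70.0, 160.0]
# A longer method signature widens the middle box; the last box moves right.
print(left_to_right([40, 90, 50], 30))  # [0.0, 70.0, 190.0]
```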

Still, MetaUML is part of the TeX/MetaPost ecosystem, so it takes some getting used to:

  • One of the hardest things to get used to with MetaUML is that the coordinate system works like ordinary Cartesian coordinates (positive Y goes up, not down as we're used to from most graphics libraries), and the origin moves around to accommodate the drawing rather than staying fixed: if you place an object at (-100, -100), the picture simply grows to include it, so the origin ends up 100 units up and to the right of the picture's lower-left corner, rather than the object being clipped out of view as it would be with most other graphical libraries.
  • Each drawObjects call overwrites the previous one(s), so everything has to be drawn in a single call, and the links have to come after it.
  • MetaUML is based on MetaPost, which is itself based on Donald Knuth's MetaFont, and at heart both are equation solvers (MetaFont was built for designing fonts). That's how the topToBottom and leftToRight macros do their job, and it's why you'll get an inscrutable "Inconsistent equation" error if your constraints can't all be satisfied at once.
  • If you try to link two objects without specifying which of their edges to link, you'll get an inexplicable error like this:
    ! Missing `)' has been inserted.
    <to be read again>
                       {
    --->
         curl1}..{curl1}
    l.50   link(aggregation)(Container -- Contained);
    
  • A MetaPost file is a program and, just like any program, you'll probably need to debug it at some point. Since it's declarative, there are fewer debugging options than with imperative programming languages, but the show command (which is documented in the MetaPost guide but not the MetaUML guide) works well enough.
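To see why a negative coordinate seems to "move the origin", it helps to remember that the output picture is cropped to the bounding box of whatever was drawn. This Python sketch (a simplified model, not MetaPost's actual code; to_page_coordinates is a hypothetical name) computes that bounding box and converts Cartesian points into the Y-down, top-left-origin coordinates most GUI libraries use:

```python
def to_page_coordinates(points):
    """Map y-up drawing coordinates onto a y-down page whose top-left is (0, 0).

    Simplified model: the page is the bounding box of everything drawn, so
    points at negative coordinates enlarge the page instead of being clipped.
    """
    min_x = min(x for x, _ in points)
    max_y = max(y for _, y in points)
    return [(x - min_x, max_y - y) for x, y in points]


# One object at the origin and one at (-100, -100): the page grows down and left.
print(to_page_coordinates([(0, 0), (-100, -100)]))  # [(100, 0), (0, 100)]
```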

Of course, even though I can work from the command line, there's still a lot of tedious typing involved if you're trying to document an existing code base. To me, the best compromise between the opinionated Rational Rose and the anything-goes world of Visio would be a converter from source code to MetaUML format, whose output I could then manipulate in a text editor. Fortunately, since MetaUML's input is text, it's not too hard to put together a simple "parser" that does exactly that. The Python file in listing 3 reads in a Java source file, looks for private/public/protected modifiers and outputs the structure in MetaUML format (that's right, I wrote a Java parser in Python! You got a problem with that?)

import sys
import re

if len(sys.argv) < 2:
  print("Usage: convertJavaToMetaUML <filename>", file=sys.stderr)
  sys.exit(1)

# TODO: doesn't deal with inner classes

infilename = sys.argv[1]
className = ''
fields = []
methods = []
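# Identifiers and types (including generics and arrays), plus the punctuation the
# state machine below looks for: parentheses, commas, '=', ';' and comment delimiters
tokenizer = re.compile(r'[a-zA-Z0-9_$<>\[\]\.]+|\(|\)|,|=|;|/\*|\*/')
for line in open(infilename):
  content = line.strip()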
  # will skip package-private methods...
  if content.startswith("private") or content.startswith("protected") or content.startswith("public"):
    tokens = tokenizer.findall(line)
    stage = 0
    umlDecl = ''
    returnType = ''
    parameterReturnType = ''
    parameterName = ''
    first = True
    inComment = False
    isMethod = False
    while len(tokens) > 0:
      token = tokens.pop(0).strip()
      # ignore whitespace
      if len(token) == 0:
        continue

      # skip inline comments
      if token == '/*':
        while len(tokens) > 0 and token != '*/':
          token = tokens.pop(0)
        continue  # don't treat the closing '*/' as part of the declaration

      # Special handling for declarations like Map<String, Object>: if the token contains a < but not a >,
      # keep concatenating tokens until the end delimiter is found
      if token.find('<') > -1:
        supp = token
        while len(tokens) > 0 and supp.find('>') == -1:
          supp = tokens.pop(0)
          token += supp

      # public, private or protected
      if stage == 0:
        if token == 'private':
          umlDecl += '-'
        elif token == 'protected':
          umlDecl += '#'
        elif token == 'public':
          umlDecl += '+'
        stage += 1
      elif stage == 1:  # scanning for return type
        if token in ['abstract', 'final', 'static', 'synchronized', 'transient', 'volatile', 'native']:
          continue
        # This is either a class declaration, a field, a method, or a constructor.
        if token == 'class':
          while len(tokens) > 0 and className == '':
            className = tokens.pop(0).strip()
          tokens = [] # don't care about implements or extends (at least for now...)
        else:
          if token == className:  # it's a constructor, don't output anything
            tokens = []
          else: 
            returnType = token
            stage += 1
      elif stage == 2: # expecting method or field declaration
        # skip getters, setters and standard methods
        if token[0:3] == 'get' or token[0:3] == 'set' or token == 'toString' or token == 'hashCode':
          tokens = []
          returnType = ''
          break
        umlDecl += token
        stage += 1
      elif stage == 3: # if this is a '(', starts a method.  Otherwise, starts a variable
        if token == '=':
          tokens = [] # variable declaration; ignore rest of line
          continue
        if token == ';':
          continue
        if token == '(':
          isMethod = True
        umlDecl += token
        stage += 1
      elif stage == 4: # scanning for parameters
        if token == ')':
          break
        if token in ['final', ',']:
          continue
        parameterReturnType = token
        stage += 1
      elif stage == 5:
        umlDecl += ('' if first else ', ') + token + ': ' + parameterReturnType
        first = False
        stage -= 1

    if returnType != '':
      if isMethod:
        methods.append('%s): %s' % (umlDecl, returnType))
      else:
        fields.append('%s: %s' % (umlDecl, returnType))
    else:
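      print('%s had no return type' % content, file=sys.stderr)

# Emit Class.Name("Name")(fields)(methods); with comma-separated string lists
print('Class.%s("%s")(' % (className, className))
print(',\n'.join('  "%s"' % field for field in fields))
print(')(')
print(',\n'.join('  "%s"' % method for method in methods))
print(');')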

Listing 3: Java source file converter

This doesn't capture the parameters after the first line of a multiline declaration; if I made it any more complicated, I'd probably be better off using a proper Java parser, but it's good enough to handle the sort of code that comes out of code generators like JAXB, which I find myself trying to get my head around quite a bit.
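One cheap way around the multiline limitation, short of a real parser, is to glue continuation lines together before tokenizing: keep appending lines until the parentheses balance. The join_multiline_declarations helper below is a hypothetical sketch under that assumption; it would be fooled by parentheses inside string literals or comments.

```python
def join_multiline_declarations(lines):
    """Merge declarations that span several lines into single logical lines.

    Assumption: a declaration is complete once its '(' and ')' counts balance.
    """
    joined = []
    buffer = ''
    depth = 0
    for line in lines:
        content = line.strip()
        buffer = content if not buffer else buffer + ' ' + content
        depth += content.count('(') - content.count(')')
        if depth <= 0:
            joined.append(buffer)
            buffer = ''
            depth = 0
    if buffer:  # unbalanced input at end of file: keep what we have
        joined.append(buffer)
    return joined


source = ["public void draw(int x,\n", "    String label) {\n", "private int count;\n"]
print(join_multiline_declarations(source))
# ['public void draw(int x, String label) {', 'private int count;']
```

Looping over join_multiline_declarations(open(infilename)) instead of open(infilename) would then hand whole declarations to the tokenizer.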

Joshua Davies