Understanding Interaction Styles to Make Your UIs Richer
Command line, menu, form, and direct manipulation: these are the basic interaction styles, as Shneiderman calls them, or dialogue types, as ISO 9241 calls them. They are your alternatives for interacting with users, and they go back to some of the earliest personal computer UIs. Command line, menu, and form predate GUIs, while direct manipulation (DM) is impractical without reasonably large bitmapped displays and a precise pointer such as a mouse.
At one time, it was useful to contrast these interaction styles, and designers would consider the appropriateness of each for a given application. In modern GUIs with windowing environments, however, multiple styles coexist within the same application or even for the same feature. Users can now seamlessly switch from one style to another without any thought. The styles blend together; you can’t easily tell where one style ends and another begins. For example, consider your basic Open dialog box. At first glance, it appears to be a form, which the user completes with the necessary information and then executes. However, choosing a file from a list box is essentially picking a file name from a menu. Even selecting the Open button rather than Cancel is essentially picking from a two-item menu. Or is clicking a button direct manipulation? The Open dialog box provided through Microsoft’s common dialogs also supports drag and drop, further complicating the question.
It could be argued that it doesn’t matter anymore. The key question facing the designer of a UI for a feature is not the interaction style but the proper selection and combination of controls. Command button, check box, text box, and so on have become the building blocks of design, rather than menu, form, and typed command. Does it matter that a list box is a menu of sorts? Maybe it matters more that a list box is faster and easier to use than a drop-down list, but at the price of consuming more screen real estate.
However, this control-oriented approach to user interface design has led to a certain degree of stagnation. While I do not endorse the wholesale discarding of the GUI, which has been a great success, I believe it can be given a shot in the arm by revisiting the classic interaction styles within the context of the GUI. Advancements to the basic GUI may come from combining and integrating the aspects of each style more effectively than is currently conventional. Furthermore, as we build our libraries for rich-interaction web applications, interaction styles have much to teach us on making a rich UI rich.
Inductions and Deliveries
For GUIs, it’s clear that it doesn’t make sense to talk about the interaction style of a feature, let alone an entire application. However, we can talk about the interaction style of a specific induction that a user sends to the computer. I define “induction” as an instance of an action on a particular data object. An induction has three components:
- The data object, which is a logically persistent instance of content or information that exists independent of the user interface. A data object may be a database record, a document, or a shape or letter within a document.
- The action to be applied to the data object, the action being a logically atomic process that changes the state of the data object or the collection it belongs to. Actions include displaying, creating, deleting, duplicating, converting, associating, and dissociating the object, and changing object attribute values.
- Optionally, parameters detailing how the action is completed. A parameter may be another data object, such as when associating two data objects.
For example, opening the file “bunny-rabbit.doc” in an editor application is an induction, where opening is the action, the file is the data object, and the specific editor to use is a parameter. At a low level, the task of the user breaks down to delivering the right sequence of inductions to the computer.
The induction is the actual information as received by the computer, but a user interface may support multiple deliveries for a given induction, where a “delivery” is a method or pathway the user may use to transmit the information to the computer. Different deliveries supported by the UI may use different interaction styles, or what we should now call “delivery styles.” For example, one can open a webpage by picking the page name from a Favorites list (Menu style), or by dragging and dropping a shortcut of the webpage from the desktop to a browser window (DM), or by typing the URL into the address bar (Form).
A given delivery doesn’t necessarily explicitly specify all three induction components. Any component may be default. For example, when opening a file by double-clicking it, the user specifies the file, but the action (opening) and the parameters (into what application) are default.
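The induction model above, including defaulted components, can be sketched in code. This is a minimal illustration only; the class, field names, and default values are all invented for the example, and Python is used purely for brevity:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Induction:
    """One instance of an action applied to a data object.
    Components left as None fall back to defaults supplied by the UI."""
    action: Optional[str] = None       # e.g. "open"
    data_object: Optional[str] = None  # e.g. "bunny-rabbit.doc"
    parameters: Optional[dict] = None  # e.g. {"editor": "WordPad"}

# Hypothetical defaults a UI might supply for an unqualified delivery
DEFAULTS = {"action": "open", "parameters": {"editor": "registered app"}}

def resolve(delivered: Induction) -> Induction:
    """Fill in any components the delivery left unspecified."""
    return Induction(
        action=delivered.action or DEFAULTS["action"],
        data_object=delivered.data_object,
        parameters=delivered.parameters or DEFAULTS["parameters"],
    )

# Double-clicking a file specifies only the data object; the action
# and parameters are default:
full = resolve(Induction(data_object="bunny-rabbit.doc"))
# full.action == "open"
```

The point of the sketch is that a delivery need not spell out every component; the UI quietly completes the induction from its defaults.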
With that as background, I’d like to define three base delivery styles:
- Command. The user assembles an induction relying on a memorized list of possible components. Generally this is done by typing; traditionally each component is a delimited “word,” and a special key such as the Enter key signals the end of the delivery. However, by my definition, the terminating signal could be anything within the delivery, and a complete delivery can be as little as a single key press.
- Menu. The user assembles an induction from a displayed presentation of possible components. In GUIs, the components are usually selected by clicking on the display of the component with the mouse, but components may also be selected by entering some token, such as a function key, that represents the component. If the menu uses such tokens, by definition they are displayed with the component.
- Direct Manipulation (DM). The data object is visually represented and the action is delivered by a gesture applied directly on the object image, where the effects of the action are shown on the object as the gesture is applied. Usually, the gesture is some sort of dragging motion on the object image with the mouse, but it could be other acts, including a single click. However, the mere selection of an object does not constitute a complete induction.
An induction with all its components can be completed with any one of these styles alone. For example, to put a copy of the document “bunny-rabbit.doc” in the “babies” directory with a command style, the user can type at the command line “copy bunny-rabbit.doc babies,” specifying action, object, and a parameter, in that order. With the menu style, the user can pick the Copy menu item, which pops up a list of documents to copy; on choosing bunny-rabbit.doc, the next menu lists places to copy the document to (including the babies directory), which the user can select. With DM, the user sees the bunny-rabbit.doc object image (e.g., an icon) and the babies folder, and ctrl-drags the bunny-rabbit.doc object image to the babies folder.
These definitions are intended to divide the styles along functional criteria to highlight their advantages and disadvantages in design. For example, a command style implies the user will be using only recall to create a delivery, while a menu style implies the user will use recognition. A menu style will consume more display space than Command. Of the three, DM is the most metaphorically consistent with the manipulation of physical objects.
The definitions are also intended to be interpretable when applied to GUIs with clear differences among them. Clicking a button is not DM because a button is not a data object. Clicking a button is Menu style, as is clicking a link, even a link embedded in a paragraph of text. Executing Save through keyboard access keys (Alt-F, S) is also menu style, since the menu is visible while making the delivery. However, executing Save through the Ctrl-S accelerator is, perhaps surprisingly, Command style.
Command versus DM
To fully apply Command, Menu, and DM as defined above to inductions in a GUI, they should be regarded as pure styles on a continuum, a delivery spectrum.
At one extreme is Command, a verbal means of delivering an induction, paralleling conversation, where the user “says” a command to the computer, and the computer replies back. Most command line languages explicitly exploit the conversational metaphor. We type “copy bunny-rabbit.doc babies,” echoing in the syntax how we would command a human: “copy the bunny-rabbit document to the babies directory.” There’s no reason we couldn’t make a command line interpreter whose syntax is “bunny-rabbit copy babies,” that is, input, function name, output. Temporally, that would make a lot of sense. However, we deliberately choose syntax that makes commands mirror human speech, presumably for usability reasons. The very name of the delivery style “command” indicates a verbal way of thinking. Commands imply verbal behavior. No one talks about flicking a wall switch as “commanding” the light to come on.
At the other end of the delivery spectrum is DM, a spatial means of delivering an induction. Data objects appear as images arranged in space that might correspond to physical reality, such as letters on a printed page, but space can also be used metaphorically, such as to represent “folders” to store documents, recognizing that the folder is merely a user-defined category for the document, and does not represent its physical location on the disk.
Where objects, actions, and parameters are usually represented as words in a command style, in the DM style, data objects are represented as object images, actions are visible user gestures applied to the object, and parameters are spatial details of those gestures (e.g., what part of the object is dragged and where it is dragged to). The feedback is also spatial, for example showing the object image traversing the screen as it is dragged. Induction completion is indicated by the object assuming a new location or shape, not a verbal confirmatory reply like “1 file(s) copied.”
All this, of course, is a deliberate attempt to mimic human interaction with physical objects, to remove the apparent intermediary command-following agent and provide apparently more “direct” manipulation of data objects. It is the use of this spatial-physical metaphor that results in DM having reversed the syntax typical of Command. A person will say “move that folder” to someone else, but when a person goes to do it, she or he must first reach for the folder before moving it. Thus, DM uses object-action syntax in contrast to Command action-object syntax.
DM and Command are in their pure forms opposites of each other along the verbal-spatial continuum, representing two very different ways of even thinking about human-computer interaction, with Command seeing the interaction as communication between two agents, and DM seeing the interaction as one agent (the user) utilizing a tool on a set of objects. These are useful metaphors, but it’s important to recognize they are only metaphors that are only good if they help us design. We should avoid letting them limit our creativity.
Parenthetically, if you find my use of the terms “induction” and “delivery” awkward, I’m with you, but I couldn’t think of any alternatives that aren’t biased towards one end of the spectrum or the other (e.g., instruction and expression, or operation and implementation).
By seeing delivery styles as a spectrum, we can see Menu as a style in between the verbal Command style and the spatial DM style. The exact point on the spectrum depends on the menu’s specific design. Selecting an object image with a right click, choosing an action from the context menu, and seeing feedback rendered on the object image leans towards the spatial side of the spectrum. Textual links embedded in a paragraph of prose lean towards the verbal side.
The range Menu covers on the spectrum is significant. In providing a menu style of delivery, the designer has a certain degree of flexibility to combine verbal and spatial aspects to best suit design goals.
While Menu delivery can shift along the spectrum, there is still a boundary that distinguishes a “pure” menu from Command and DM. A menu item may be verbal like a command. A menu delivery may even follow the syntax of a verbal command, where the action precedes the data object and parameters. For example, the user may first choose the Format pulldown menu followed by the Background menu item, followed by a Fuchsia menu item from the cascade menu, essentially saying “Format the background color as fuchsia.” However, as long as the actions, objects, and parameters are displayed to the user, it’s a pure menu delivery style.
In DM, the objects are displayed, just as objects may be displayed in a menu as menu items to complete an induction. The difference is that in DM all induction components and the feedback are proximal and integral to the object images. For example, in moving a data object with DM, the user drags the object image to the new location. The induction’s action (move), the parameters (the new location), and the feedback (object appears in selected location) are all proximal and integrated with each other and the larger representation of the collection of data objects. When moving a data object with Menu, the user selects the object to move from one menu, the Move action from another menu, and the location to move to with a third menu (not necessarily in that order). The menus for the action and location are not part of the object image nor are they all spatially associated with the object image.
Part of the beauty of the GUI is how successfully these alternative styles are compactly combined. When users move a rectangle in a drawing program by dragging and dropping it, they are using the DM style. But when users move it by selecting the rectangle and using the cut and paste menu items, they are using the Menu style. The same visual entity, the rectangle, acts as both an object image for DM and a menu item for Menu. The combination is so seamless that users hardly notice the change in styles.
Wait, What Happened to the Form Style?
GUIs, as I’ve said, blend the three pure styles. A given delivery may combine aspects of styles in any number of ways, resulting in many possible styles along the spectrum. One case of this is the Form delivery style, which includes aspects of Menu and Command.
In GUIs, form interfaces are typically implemented as dialog boxes. Inductions are delivered by navigating to the dialog box, often through Menu, often specifying the action and object in the process. The dialog box then provides a place to enter parameters for the command. If all the parameters are specified with limited-choice controls like option buttons and drop-down lists, the delivery remains technically Menu style. However, if one or more parameters are specified with a text box, the delivery now includes aspects of a Command: a component is entered as an alphanumeric sequence relying on memory.
Thus the Form delivery style, with its combination of displayed and non-displayed component possibilities, is not a distinct delivery style but a hybrid of Menu and Command. It lies between Menu and Command on the verbal side of the spectrum. In structure, your basic dialog box parallels the command line. The action is specified first (through opening the dialog box) and is followed optionally by the user specifying the data object and parameters with the controls in the dialog box, any of which may have default values. Parameters are delimited not by the separating spaces seen on the command line but by separate controls. The induction is finalized with the Enter key or its mouse equivalent, clicking an OK button. In using a dialog box, the user is giving a verbal command, albeit a prompted one. Even the term “dialog box” signals the verbal metaphor invoked.
While firmly verbal like the Command, the Form’s inclusion of Menu aspects makes it “friendlier” than the usual Command. Default object and parameter values are visible for user verification, not implicit as on a command line. Parameters with a fairly limited number of options may be selected rather than typed, reducing errors and often allowing faster entry (a user can click on an option button in about the same time it takes to type 5 characters plus the space bar, not counting time to switch from mouse to keyboard in either case). Action and parameter names can be complete words or phrases rather than the cryptic abbreviations seen on command lines in an effort to keep them fast and easy to type.
Delivery Styles Multiply Like Rabbits
Symmetry dictates that if Form is a blend of Command and Menu, then there must be a blend of Menu and DM. There is, and you’ve probably used it yourself, but it doesn’t have a well-known name. I call it the “Pointer-tool” delivery style.
One of the disadvantages of DM is that it is hard for the user to predict exactly how an object image will respond to a mouse gesture such as drag and drop. Will it move or copy the object? Often the action for the induction will depend on combining the gesture with an arbitrary meta-key. Alternatively, it may depend on exactly where the user applies the gesture on the object image: the particular handle selected. For example, in a drawing program, dragging one handle may change an object’s height, another changes the width, another rotates the object, and another may move the entire object. Handle or drag imagery attempts to communicate the action, but imagery is not necessarily understandable, and its meaning may have to be memorized.
The Pointer-tool style avoids this problem by having the user select the action component of the induction from a “palette” of “tools” (a mixed metaphor if there ever was one, but there you go). This determines the action carried out by the mouse. The gesture itself (e.g., the start and end points of the drag) determines the parameter values. For example, to move a polygon in a drawing program, the user uses the default Select tool to drag the polygon. To change the position of the vertices of the polygon, the user selects a Shape tool and drags individual vertices. Often in Pointer-tool there are separate tools to create different classes of objects, where dragging determines the size and position of the object. With the Pointer-tool style, the user can easily anticipate the action because it is explicitly labeled in the palette.
The metaphor for the Pointer-tool style is straight out of the spatial-physical side of the delivery spectrum. The act of selecting a tool is, as the name suggests, analogous to picking up a physical tool from a tool rack. The gesture is the act of applying the tool to the object. The use of icons to represent tools on the palette is an attempt to make the tools seem more like physical things. However, there are aspects of Pointer-tool that shift it away from pure DM towards the verbal side of the spectrum. First of all, the syntax is now action-object, more along the lines of a verbal command, although still, in the context of the tool rack metaphor, consistent with the physical world. More significant is the palette. It’s functionally a menu: a displayed list of potential induction components.
So we have not three, not four, but five delivery styles, all present in modern GUIs, sometimes in the same application, and sometimes providing alternative deliveries to create the same induction.
But wait, there’s more. Form is only one way to combine Menu and Command aspects. There’s no reason you can’t combine different aspects in different ways to make additional styles. There are at least two others I can think of that are already seen in today’s UIs. I mentioned that the use of accelerator keys, such as Ctrl-S for save, is by my definition Command (with object and parameters default). But what about copying an object through accelerators? Selecting the object is Menu-like, but the accelerator (such as Ctrl-D in certain apps) is Command-like. Being neither pure Menu nor Command, and clearly not Form, it has to be regarded as a style of its own, although one with limited applicability in current GUIs.
Then there’s Microsoft’s Intellisense, a distinct hybrid of Command and Menu often seen in GUI programming environments. As the user types a function name, prompts appear labeling parameters and providing menus from which to choose values. Like Form, this eases the memory burden inherent in pure Command styles, but, in contrast to Form, it allows the user to work within the compact confines of lines of code.
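The Intellisense hybrid amounts to two simple operations: narrow a displayed menu as the user types (the Menu aspect) and label the parameters of whatever was chosen (prompting the Command aspect). Here is a minimal sketch; the function names and signatures are invented for the example:

```python
# Hypothetical sketch of Intellisense-style completion. Typing narrows
# a visible menu while the user stays on the line of "code."

SIGNATURES = {
    "copy":    ["source", "destination"],
    "convert": ["source", "format"],
}

def suggest(prefix: str) -> list:
    """Return the menu of known names matching what was typed so far."""
    return sorted(name for name in SIGNATURES if name.startswith(prefix))

def prompt(name: str) -> str:
    """Label the parameters once a name is chosen, as a tooltip would."""
    return f"{name}({', '.join(SIGNATURES[name])})"

suggest("co")   # returns ['convert', 'copy'] -- both still match
prompt("copy")  # returns 'copy(source, destination)'
```

Note the openness: the user can click a suggestion (Menu) or keep typing until the prefix is unambiguous (Command); neither path forces a mode switch.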
There’s more than one way to combine Menu and DM too. Have you ever tried to drag and drop a document in MS Windows using the right mouse button? Do so, and you’ll see yet another way aspects of the menu are applied to make the action delivered with the DM gesture unambiguous.
Effectively Integrating Styles
Only the designer’s imagination limits the number of combinations of style aspects along the spectrum. All it comes down to is decomposing the induction into its components and choosing the best style for each. Are users having a hard time remembering the action associated with drag and drop in DM? Add some Menu in the form of Pointer-tool, or something else. Does it take too long to specify an action via Menu? Add some Command in the form of an accelerator or other memorized shortcut. However, for a style to perform well, the aspects have to be effectively integrated and the resulting style has to be cohesive with the other styles used in the interface. There are several attributes of successful combinations:
- Overlap. Successful styles re-use components of other styles. For example, the same display of objects used for DM is also used as a menu for Menu. Menu items used for Menu also display the accelerators used for Command. The same dialog called up through a menu item is also called up through an accelerator. Changes made through a dialog are reflected immediately in the objects on the screen, just like in DM. An example of poor overlap would be the old Motif convention of the Command Area, where a text box is hung on the bottom of a primary window to provide users a location for entering arbitrary commands.
- Openness. Successful styles do not force users into modes, instead allowing them to change styles on the fly, sometimes in mid-delivery. A user may select an object to drag it to the Recycle Bin, but then elect to simply hit the Delete key instead. In using Intellisense, the user may select a suggested value from a menu, or simply keep typing to complete the delivery; the user doesn’t have to dismiss the menu. Good integration means never having to explicitly opt out.
- Smoothness. Successful styles allow the efficient execution of inductions, consuming only necessary user time, effort, mental capacity, and visual field while avoiding jarring transitions that take time for the user to re-orient. For example, the best dialog boxes are kept small and simple, allowing the user to see the selected object the dialog applies to, so that continuity between the object and the other components of the induction is maintained and the user doesn’t have to keep the selected object in memory. Intellisense is even smoother, with menu options appearing immediately under the user’s typing as part of the document.
- Consistency. Successful styles use idioms consistent with or similar to the other styles they seek to combine. If right-clicking brings up a menu, then right dragging should also result in a menu. The menu of actions in a Pointer-tool style (the palette) is represented as physical “tools” to be consistent with the physical metaphor used in DM.
Rich Apps are Broad Spectrum
The basic desktop GUI owes much of its success to its combination of multiple styles under these principles. Many desktop GUI apps commonly integrate DM, Menu, Form, and a little Command (the latter as accelerators). Pointer-tool is commonly added too, making for a full-spectrum UI, the very definition of a rich app. A complex and capable application (a “sovereign” application, to use Alan Cooper’s term) will typically employ many well-integrated styles. In contrast, classic pre-AJAX web apps and most portable devices such as cell phones are limited to Menu and a little Form, a narrow slice of the spectrum that contributes greatly to their clunky feel compared to a modest desktop app.
So, broad equals rich. But before getting all excited about enriching your applications by using more of the spectrum, it’s worth asking, “Is rich good?” Is a broad spectrum UI appropriate for your app?
An application could use (and some have used) a single point on the delivery spectrum, but these make for impoverished user interfaces. An app that only uses the Command style works efficiently for short commonly used commands, but becomes slow and unwieldy as commands multiply and users have to refer to documentation for parameter values. An application that only uses Menu works well for beginners but becomes slow and confusing as menu hierarchies become deeper trying to include all possible parameters and values as lists of choices. DM is fine when the only action allowed is to simply “move” (e.g., re-classify) a data object, but becomes unpredictable when subtly and possibly arbitrarily different gestures represent different actions. As has been learned with web apps, the more interaction supported, the more the app will benefit from a rich wide-spectrum user interface.
Each style has its drawbacks, so the best solution is to draw on all of them. When done right, the result is a user interface where styles may be applied to optimize the interaction. If the users know the accelerator key, they can specify the action faster with that than the menu, but if they don’t know the accelerator, they can use the menu. By having both verbally and spatially oriented delivery styles, the designer or user can select the style that best fits the task. In some cases, it’s important that the style be compatible with the task. For example, to resize a drawing object, it usually makes sense to use DM so the user can see exactly the resulting size while delivering the induction. However, if users know the precise numeric dimensions they want, something more verbal like Form is better. Sometimes it’s better for the delivery to use the channel opposite to the one used by the task. Humans have a capacity to do limited parallel processing through verbal and spatial channels. Thus, by using drag and drop in a word processor to move a paragraph (spatial channel) rather than accelerators, users have more cognitive resources to simultaneously think about what they’re going to write next (verbal channel).
That’s the advantage of rich apps in theory. However, when given a choice of the style to use, I’m not so sure users choose optimally. Also, if you do give users a choice on how to deliver an induction, then the act of making the choice itself adds to the cognitive workload for the user, slowing them down. Rich applications have the potential for smoother, faster, more powerful UIs, but they also can add complexity, even with good integration. So when is rich good? Rich is good when the app is meant for complex and demanding tasks where the benefits of greater power and speed are substantial. Rich is also good when the app is frequently used, so the user has a chance to develop expertise in the UI. So, yes to rich full-spectrum UIs for sovereign apps; no to rich UIs for simple-task use-once-in-a-blue-moon apps.
The Future of Delivery Styles
Comeback for Command?
GUI apps may in principle be full spectrum, but in many cases, particularly for line-of-business database apps, the interaction tends to be concentrated in the middle of the spectrum around Menu and Form, with only small amounts of DM and Command, rendering them little better than a web app (to which many business apps have migrated, it would seem). Advancements in an individual app’s UI and in the GUI in particular may come from developing the ends of the spectrum more.
Some observers here, there, and yonder are predicting the come-back of the Command style, and it must be admitted that the accelerators employed in GUIs are a poor stub of once powerful command line languages. The appeal of Command over other points on the delivery spectrum is the speed and flexibility potential. Typically a user can type six keys faster than one can slew the mouse to a button or link and click it, making the quick one- or two-key accelerator more efficient than a toolbar item. Humans are naturally verbal creatures capable of elaborate expression through language, so it would seem advantageous to tap that capability and provide a highly expressive language for communicating with a computer. If a command language can be designed such that six key presses expresses more than the point-and-grunt click of the mouse, such a language will make a significant contribution to the UI.
However, realizing this potential is not easy even with the human capacity for language. If the user needs to slow down to try to remember the command, then the speed advantage of Command is likely counteracted. If the user fails to correctly recall the command, resulting in an input error, the speed advantage is certainly blown away. Ironically, while such a Command interface can be slower than the classic GUI, it may seem faster to the users because they are constantly thinking and interacting. It is daunting to make an expressive but easy to remember language where on average each induction component is expressed with six characters. Too easily the language can degenerate into a large vocabulary of arbitrary codes, which, while theoretically faster than other styles, fails to perform in practice due to the memory burden.
This seems to be what has happened to YubNub and SugarCodes, browser plug-ins that allow users to execute a command for a particular web site (e.g., search the Internet Movie Database for a given title). But this means there’s a different command for every web site that can be used. Worse, to keep commands short, instead of using parameters to specify exactly what happens at the web site, there are multiple commands per site. The result is a lengthy list of inevitably arbitrary commands. In SugarCodes, for example, NOAA’s National Weather Service web site has nws, noaa, and noa. The differences apparently are meant to be memorized. Sometimes the command is downright cryptic, such as “scro” in YubNub.
It seems questionable whether this is actually faster than something more in a Form style. For example, one could have a drop-down menu of web site actions, similar to Bookmarks or Favorites, but where clicking on an item presents an adjacent text box ready to accept parameters. Provide access keys for the menu items, and speed is equal to or faster than YubNub or SugarCodes, and there are no commands to memorize.
If Command is going to play a larger role in GUIs, it will likely leverage syntax rather than vocabulary. A successful Command implementation will have a small set of words and some simple rules for combining them such that the order of the words has meaning. This allows the user to deliver many possible inductions with minimal memory burden. This is what we see in the most powerful Command type interfaces such as the UNIX shell and grep. Another feature of a successful Command implementation in a GUI is that it works on relatively “distant” data objects or other components, that is, those not immediately accessible in the interface by pointing and clicking.
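The syntax-over-vocabulary idea can be shown with a toy parser: a handful of verbs plus one ordering rule (action, object, optional parameter) yields many inductions with almost nothing to memorize. This is a hedged sketch, not a real shell; the verbs and the grammar are invented for illustration:

```python
# Toy command language: a small vocabulary combined by position,
# shell-style, so word order rather than word count carries meaning.

VERBS = {"copy", "move", "delete"}

def parse(line: str) -> dict:
    """action object [parameter] -- order, not vocabulary, does the work."""
    words = line.split()
    if not words or words[0] not in VERBS:
        raise ValueError(f"unknown action: {line!r}")
    if len(words) < 2:
        raise ValueError("missing data object")
    induction = {"action": words[0], "object": words[1]}
    if len(words) > 2:
        induction["parameter"] = words[2]
    return induction

parse("copy bunny-rabbit.doc babies")
# returns {'action': 'copy', 'object': 'bunny-rabbit.doc', 'parameter': 'babies'}
```

Three verbs and one rule already cover a large space of inductions; contrast that with the YubNub/SugarCodes approach of coining a fresh command for every site and variation.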
Finally, just because you’re adding more Command doesn’t mean you have to be a purist and not also include aspects of other styles when advantageous. Ideally, you’d mix in some Menu to overcome the memory problems of Command, maybe using something like Intellisense or auto-complete. Consider how Menu can be applied to teach users commands, so they become faster and more proficient with practice. Take, for example, the menu for the old DOS-based Lotus 1-2-3 spreadsheet program. The user selected a menu item by typing the first letter of its name. Users navigated through the menu hierarchy with a sequence of key presses, somewhat like the use of access keys to execute a menu item in a modern pulldown menu. As a result, the act of using the menu also caused users to memorize the key sequences until they soon no longer needed to look at the menu for common commands (again, the way some users just know that Alt-F, X exits a Windows application). When it comes time for the user to write macros, the commands are represented by these same key sequences, which the user already knows.
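The Lotus-style first-letter mechanism is easy to sketch: the menu tree is walked by matching each keystroke against the first letter of the visible items, so the path through the menu is the memorized command. The menu tree below is invented for illustration, not Lotus 1-2-3’s actual menu:

```python
# Sketch of a Lotus 1-2-3 style menu: each item is chosen by typing
# the first letter of its name, so using the menu teaches the key
# sequence that later serves as a Command-style shortcut (or macro).

MENU = {
    "File": {"Retrieve": "load worksheet", "Save": "save worksheet"},
    "Copy": "copy range",
}

def navigate(keys: str):
    """Follow a sequence of first letters down the menu tree."""
    node = MENU
    for key in keys:
        matches = [name for name in node if name[0].lower() == key.lower()]
        if len(matches) != 1:
            raise KeyError(f"ambiguous or unknown key: {key!r}")
        node = node[matches[0]]
    return node

navigate("fs")  # returns 'save worksheet' -- "fs" is both menu path and macro
```

The same keystrokes work whether the user reads the menu (Menu style) or types from memory (Command style), which is exactly the teaching effect described above.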
The most promising production application I’ve seen to add more Command to the desktop is Enso by Humanized. Enso is an excellent example of a Command UI that uses several of these principles to its advantage. Humanized eases the memory burden of Command by extensively using Form- and Menu-like aspects such as autocomplete, drop-down suggestions, and prompts. Integration is excellent, with the current selection serving as the data object for commands like Spellcheck and Learn. The Open and Spellcheck commands access functionality that is otherwise not easily available (buried in the Start menu or another application, respectively). Enso could leverage this principle even further if the Close and Minimize commands would optionally take as a parameter the name of the window affected. Often it’s an inactive window I want to get out of the way; it became inactive because I’ve lost interest in it. But if I have to first select the window before commanding Close or Minimize, I can hit the title bar buttons (or do Alt-F4 or Alt-Space, N) faster than I can do Caps Lock, Command, Caps Lock.
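The Menu-into-Command blending that Enso relies on can be sketched as simple prefix matching over a known vocabulary; a typed prefix narrows a visible list of suggestions, trading recall for recognition (the command names here are illustrative, not Enso’s actual set):

```python
# Hypothetical command vocabulary. As the user types, the prefix
# filters the suggestions shown, so the user recognizes rather
# than recalls the full command.
COMMANDS = ["open", "close", "minimize", "spellcheck", "learn as open"]

def suggest(prefix):
    p = prefix.lower()
    return [c for c in COMMANDS if c.startswith(p)]
```

An empty prefix behaves like a full menu; each additional character behaves more like Command.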
Google is an Improved Form, Not Command
Others have pointed to Google and Vista Search as other examples of Command coming back. For example, if you type an address into Google, it will include in the search results a map locating that address. Type “weather” followed by a city name, and it will summarize the forecast for that city at the top of the results.
That’s really cool, but it’s a stretch to call it a Command style. Google is a Form style. Most of these capabilities represent semi-intelligent interpretation of a parameter, not a full induction. The Google Search blank is still largely a place for a single parameter, the search criteria. It’s a significant advance that Google Search recognizes more of the semantics of the criteria (e.g., that it’s a flight number or UPS tracking number) rather than treating it as just some words. Google is doing what it has always done: provide information about whatever you type in the blank. This is part of Google’s general trend to broaden its inclusion of content and increase its reach, first into web pages, then PDFs, and now maps, videos, and more.
However, Google is still mostly just a form for content searches: one action, unlike a command language, which supports numerous actions. It seems the web has so dominated our consciousness of what a computer does that we think finding content is all there is to do with it. What about creating content? Or combining it? Modifying? Duplicating? Deleting? Editing? Converting? Describing? Transferring? What do you type in Google or Vista Search for those actions?
It may be argued that the Google text box shifts towards the Command side of the spectrum when it does things like calculate arithmetic, essentially inferring the action from the format of the parameter. “Calculate result” is a different command than “Find content.” However, I see it as a case of raising the level of abstraction of the action, of making the same form do two similar actions depending on the context. Google is becoming a general question-answering tool, using the web and its own resources to answer the implicit “what,” “where,” and “how much” questions in our input strings. That’s a significant development for the Form style. I’m anticipating the day I can enter “Lincoln Assassination” into Google and get back “April 14, 1865” at the top of the search results. Even then, though, Google would still really do just one action.
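This kind of action abstraction, a single blank whose implicit action is “answer this” and whose response is inferred from the input’s shape, can be sketched as a format dispatcher. The patterns and responses below are hypothetical, not Google’s actual logic:

```python
import re

def answer(query):
    """One blank, one abstract action: infer the response from the
    input's format. The patterns here are illustrative only."""
    q = query.strip()
    if re.fullmatch(r"[\d\s+\-*/().]+", q):      # looks like arithmetic
        return str(eval(q))                       # toy calculator path
    if re.fullmatch(r"1Z[0-9A-Z]{16}", q):        # tracking-number shape
        return f"package status for {q}"
    return f"search results for '{q}'"
```

However the input is classified, the user performed the same act: typing one parameter into one blank. That is why this is abstraction within Form, not a command language.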
We see this sort of action abstraction today in other styles, such as with Menu. Depending on my selection in Windows Explorer, I can use the same Copy and Paste menu items to duplicate a file or to copy a substring of a file name to another file name. File copying and file renaming are two different actions, but at a higher level of abstraction represented by Copy and Paste, they’ve become the same. No one would say that the Copy and Paste menu items represent a command language.
It’s not just an academic difference to say whether Google is becoming Command or abstracted. If we are going to regard Google as a success that should be emulated, we had better understand what makes it a success. The design lesson from Google is not to bring back the command line or to make your UIs more verbal. The lesson is to consider supporting smarter, more abstract actions to simplify your user interface.
More Better DM
While in principle desktop GUIs are full-spectrum, DM, like Command, is often underdeveloped in many apps. Humans are naturally verbal creatures, but they are also natural tool users. Anyone who says point-and-grunt DM will never achieve the expressiveness of a language should take a close look at Michelangelo’s David, a product of nothing more than direct manipulation with a dumb hammer and chisel.
The apparent lack of expressiveness of DM in software user interfaces probably has more to do with its limited use than with any inherent limits on the power of DM. Many apps fail to implement basic drag and drop, usually restricting it to “moving” data objects among “folders.” My email client, Foxmail, for example, does not even support moving selected text by drag and drop. No wonder novice users typically do little dragging and dropping: they so rarely have a chance to practice and generally can’t rely on it being available anyway.
Even rarer are applications that support actions other than “move” by direct manipulation. One good way to do this is provide multiple handles that affect different attributes of the data object. For example, why can’t I set text size by DM of handles on a cursor or selected text?
Drag the top handle vertically to set the font size; drag horizontally to set paragraph alignment. For the bottom handle dragging vertically can set the text to be subscript or superscript (well, I would use it) while dragging horizontally sets paragraph indenting.
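A sketch of such a two-axis handle, assuming the dominant drag axis picks which attribute changes (the mapping and the increments are invented for illustration):

```python
def apply_top_handle_drag(dx, dy, style):
    """Hypothetical top-handle behavior: the dominant axis of the
    drag decides which text attribute is adjusted."""
    if abs(dy) >= abs(dx):
        # vertical drag adjusts font size (4 px of drag per point)
        style["font_size"] = max(6, style["font_size"] + dy // 4)
    else:
        # horizontal drag sets paragraph alignment
        style["alignment"] = "right" if dx > 0 else "left"
    return style
```

One handle, two attributes, and no dialog box in sight.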
As another example, in MS Word, a dialog box is used to change the cropping of a picture, which can make for a frustrating cut-and-try operation. Why aren’t there handles for that in the picture?
Why are we doing DM with one arm tied behind our backs? Think of the possibilities that two mice, one in each hand, could provide. You can have multi-touch interactions, but with the comfort and precision of using mice.
Command + DM?
You know, I’ve always thought this left-brain/right-brain verbal/spatial distinction is oversold. In some sense Command and DM are opposites, but they also have a lot in common compared to Menu. First of all, both have memory burdens. In Command you have to remember what to type; in DM you have to remember what dragging a particular data object or handle will do. Menu is all recognition. This contributes to drag and drop and accelerators functioning as expert shortcuts in GUIs, rarely used by novices. Both Command and DM have syntax. That is, the user controls the order of the induction components, and that order has meaning. Typing “copy bunny-rabbit.doc babies” means something different from “copy babies bunny-rabbit.doc.” Likewise, dropping bunny-rabbit.doc onto the babies folder will have a different effect than dragging the babies folder to the location of the bunny-rabbit.doc. Syntax is what makes these styles versatile and expressive, much more expressive than Menu. In pure Menu style, the order is structured by the design of the menu. The user doesn’t have the opportunity to express anything through syntax. Both Command and DM are efficient, consuming little screen real estate. Menus need a place to appear.
These similarities suggest the delivery spectrum is not so much linear as triangular, not a one dimensional spectrum but more like a gamut.
Rather than incompatible opposites, perhaps aspects of Command and DM can also be combined into new styles. In real life we integrate spatial/physical with verbal interactions all the time without difficulty. We say things like, “get that bunny-rabbit out of the hutch over there” (while pointing), or “hold this bunny-rabbit, while I tie a ribbon on it.” There’s no reason our interaction with computers can’t do the same, but I’m not aware of any that do.
One new style that may come from combining DM and Command is what might be called “Hold and Type,” where the user clicks on an object image and holds the mouse button down while typing something with the other hand. This may be combined with a drag to provide more information. For example, what if I drag bunny-rabbit.doc to the babies folder and, before I release the mouse button, type “12”? The result? Twelve copies of bunny-rabbit.doc appear in the babies folder. Click and hold a scrollbar for an editable table and type a string of letters, and the scrollbar (and pointer) jumps to the first instance of an object with a name matching those letters. Sure would beat opening a Find dialog box. Click and hold an object image or one of its resize handles and type the cursor keys, and the object moves or resizes in discrete amounts. No more tiresome toggling in and out of “snap to grid”; another mode bites the dust.
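The drag-plus-type gesture can be sketched as a small state machine where digits typed while the button is held become a parameter consumed at the drop. The class, its method names, and the behavior are all speculative:

```python
class HoldAndType:
    """Hypothetical 'Hold and Type' gesture: press on an object,
    type while holding, and the typed digits parameterize the drop."""
    def __init__(self):
        self.held = None
        self.typed = ""

    def press(self, obj):
        self.held, self.typed = obj, ""

    def key(self, ch):
        if self.held is not None:      # keys only count while holding
            self.typed += ch

    def release_into(self, folder):
        count = int(self.typed) if self.typed.isdigit() else 1
        folder.extend([self.held] * count)
        self.held, self.typed = None, ""
        return count
```

Holding the button is what keeps this modeless: the “mode” lasts exactly as long as the physical grip.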
And of course, we could combine all three corners of the spectrum. Why not use a keyboard accelerator to select the tool in a Pointer-tool style? That would sure save a lot of slewing back and forth between the object images and the palette. When moving a data object where precise position counts (e.g., in a solid modeling application), I’d like to be able to drag the object image to the general location, but as I’m dragging, have text boxes hover over the object image giving a continuous readout of the distance dragged and the position relative to other objects. While holding the mouse button down, I’d be able to navigate to any text box with the Tab key and enter the precise final values.
Combining Command and DM, with assistance from Menu, may produce the ultimate in rich, fluid, expressive, and powerful delivery styles.
Problem: Achieving rich, fluid, powerful UIs.
- Create full-spectrum UIs, appropriately using the following delivery styles:
- Give special attention to styles towards the DM and Command ends of the spectrum that tend to be overlooked or oversimplified.
- Invent new styles by combining aspects of Command, Menu, and DM.
- Integrate your delivery styles with each other by having