I don’t think I have to explain why applications on the web have historically been hard to use. If you want to argue that web-based apps are easier to update and maintain than thick-client desktop apps, I won’t disagree. However, I have had developers tell me we should go web-based rather than thick client for an application because the web is so user friendly. They say words to the effect of, “My Aunt Tilly can use the web, but she sure can’t use Excel.” What they don’t seem to realize is that the web is not easy because of something inherent in HTML. The classic web is easy because it does so little. In classic web sites, the users’ only task is pretty much just finding content to display. I suppose Aunt Tilly could use Excel if all it did was open spreadsheets. Web applications are different from classic web sites, and the truth is even simple web applications, using little other than HTML forms, generally have dismal usability compared to what can be accomplished with a thick client.
Now our hero AJAX is here, allowing new flexibility in web app UIs, so we’re finally ready to make truly usable web applications. Right? I mean it’s very cool how we are working to use AJAX and other techniques to validate or auto-correct a field at the time the user enters data into it. We can provide unobtrusive but effective sanity checking, control enabling dependent on input (e.g., of required fields), and fill calculated fields on the fly. We have continuously zoomable and pannable maps, reacting in real time to user control.
But, while this is a great step forward, a lot more has to be done before web apps have anything like the UI power of desktop apps. So if you’re working on your library of functions or patterns for these new rich highly interactive web apps, I’ve prepared a list of things you should have in it before you can say you can make apps “just as good as a thick client.” These are the basic elements that a UI designer is going to need.
Firstly, let’s look more closely at what it means to be a “highly interactive” app. Classic web sites have little interactivity, with user inputs mostly limited to selecting content to display. The vast majority of information travels one way: from server to user. However, in a rich app, there is extensive two-way exchange of information. The bulk of user time and effort isn’t to find content. Rather, it’s to build content, to create, update, relate, and destroy information in order to achieve a desired end product. To do this, the user needs far more actions available than just a way to select content from a server.
For the user interface designer, the goal is to support many more user actions on content while at the same time preserving the direct manipulation aspect of interaction that has made classic web sites and GUI applications successful. Direct manipulation means having a visual representation of the data, allowing changes to the data by the user directly interacting with the representation, and showing the changes immediately in the same representation. Your classic HTML page uses direct manipulation, to the degree there’s any manipulation at all, in that to change the content shown, the user clicks on a label for the content. Your basic thick-client GUI application supports many more kinds of direct manipulation and not-so-direct manipulation, which all have their uses for making usable applications. These are the UI elements that you need in your library.
Interactivity Means Menus and More
Pulldown Menus
Many actions mean many controls to execute those actions. But direct manipulation principles mean you need to dedicate the bulk of your real estate to your content, not to the controls to manipulate that content. You need a compact but easy to access method for storing all those action controls. A pulldown menu is just the thing.
It’s high time we stopped dragooning the combo box for duty as a pulldown menu. To achieve usability goals, a rich app will often need a pulldown menu with look and feel consistent with those in the resident OS. Pulldown menus are fairly common on the web even on older sites, but ever notice that often those custom-made pulldowns on certain websites are really awkward to use? The mechanics for a smoothly working pulldown menu are not so straightforward.
Meanwhile, in addition to supporting your generic menu items, you’ll need to support cascade menus, separators, and toggling menu items. Extra points if the menu items can have icons too. I took Airset’s on-line calendar for a spin, mostly just because Joel said he liked it. It has a pretty good menu that does all these things except toggling menu items. And it won’t let you do the more efficient Mac-style click-and-drag menu selection, unlike MS Windows and, well, Mac.
Oh, and you need to provide keyboard access to the menu and each item in it, preferably by something easier than just tabbing tabbing tabbing up to it. And of course, you need to indicate in the menu captions any accelerators.
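To make that shopping list a bit more concrete, here is a minimal sketch in TypeScript of a data model that can describe plain items, toggling items, separators, and cascade menus, with accelerators shown in the captions. Every name in it (MenuItem, caption, renderMenu) is made up for illustration and is not the API of any real toolkit. It renders to text lines rather than to the screen, on the assumption that the drawing and keyboard handling would be layered on top.

```typescript
// Hypothetical menu model for illustration; these names are not from any real library.
type MenuItem =
  | { kind: "action"; label: string; accelerator?: string; icon?: string; enabled: boolean }
  | { kind: "toggle"; label: string; checked: boolean; accelerator?: string }
  | { kind: "separator" }
  | { kind: "cascade"; label: string; items: MenuItem[] };

// Append the accelerator to a caption, the way desktop menus show "Save  Ctrl+S".
function caption(label: string, accelerator?: string): string {
  return accelerator ? label + "  " + accelerator : label;
}

// Flatten the menu into indented text lines so the structure is easy to inspect.
function renderMenu(items: MenuItem[], indent: string = ""): string[] {
  const lines: string[] = [];
  for (const item of items) {
    switch (item.kind) {
      case "separator":
        lines.push(indent + "--------");
        break;
      case "cascade":
        lines.push(indent + item.label + " >");
        // Items of a cascade menu nest one level deeper.
        lines.push(...renderMenu(item.items, indent + "  "));
        break;
      case "toggle":
        lines.push(indent + (item.checked ? "[x] " : "[ ] ") + caption(item.label, item.accelerator));
        break;
      case "action":
        lines.push(indent + caption(item.label, item.accelerator) + (item.enabled ? "" : " (disabled)"));
        break;
    }
  }
  return lines;
}
```

The tagged union means a renderer cannot forget a case: add a new kind of item and the compiler can point at every switch that needs updating.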
Accelerators
Oh, yes. In addition to some sort of keyboard access to all menu items (well, all controls for that matter), your app will undoubtedly also require its own accelerator or shortcut keys. For all its advantages, the pulldown menu isn’t always the most convenient way to execute an action, being way up there out of the way and hidden. Accelerators are necessary for your app to be usable for your more experienced users, especially for keyboard intensive apps. Heavy keyboard use is rare in classic web pages, but relatively common in highly interactive apps.
Remember not too long ago that when you viewed a PDF document in a browser how Ctrl-F opened the browser’s Find dialog rather than Acrobat’s, which of course would fail to find anything? Annoying and downright misleading. The standard accelerators offered by the browser will often cause confusion and will need to be suppressed.
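One way to picture this is a lookup table from key combinations to application actions. The sketch below is only that, a sketch: KeyPress is a hypothetical, simplified stand-in for a browser keyboard event, and AcceleratorTable and comboOf are invented names. The assumption is that real glue code would call dispatch from a keydown listener and, when it returns true, suppress the browser's own shortcut (for instance with the event's preventDefault method).

```typescript
// Hypothetical, simplified key event; a real handler would fill this from a keydown event.
interface KeyPress {
  key: string;
  ctrl: boolean;
  alt: boolean;
  shift: boolean;
}

type Handler = () => void;

// Normalize a key press to a canonical string such as "Ctrl+Shift+F".
function comboOf(e: KeyPress): string {
  const parts: string[] = [];
  if (e.ctrl) parts.push("Ctrl");
  if (e.alt) parts.push("Alt");
  if (e.shift) parts.push("Shift");
  // Single characters are upper-cased so "f" and "F" map to the same accelerator.
  parts.push(e.key.length === 1 ? e.key.toUpperCase() : e.key);
  return parts.join("+");
}

class AcceleratorTable {
  private handlers: { [combo: string]: Handler } = {};

  register(combo: string, handler: Handler): void {
    this.handlers[combo] = handler;
  }

  // Returns true when the app handled the key. The caller is then expected
  // to suppress the browser's default action for that key combination.
  dispatch(e: KeyPress): boolean {
    const combo = comboOf(e);
    if (!Object.prototype.hasOwnProperty.call(this.handlers, combo)) return false;
    this.handlers[combo]();
    return true;
  }
}
```

Returning a boolean keeps the table ignorant of the browser: unhandled keys fall through to whatever the browser normally does, and only the registered ones get taken over.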
Context Menus
For the same reasons you need accelerators, your app’s usability will be enhanced by having “right-click” context menus, with menu items specific to your application’s objects, giving the user faster and often easier access to relevant actions. As with the accelerator keys, the standard context menus of the browser will need to be suppressed. Bonus points if each menu can have a designer-specified default action that is triggered by double-clicking. Windows Live Local has a cool context menu for its maps, and supports a handy double-click action, but, oddly, does not include this action in its context menu. However, that’s better than Google Maps, which supports a double-click action, but has no context menu at all.
Disabling
In a classic web page, the user is in a relatively constrained and static mode, and no action controls are visible that aren’t also available. But once you give your user a whole host of actions available through menus and such, there are going to be situations where some actions are not going to be appropriate. You need to support disabling, keeping the control for the action visible, but removing its ability to execute and making the inability visually obvious to the user. For example, if you must have required fields for entering a contact, then don’t enable the Save button or menu item until those required fields are filled. Anything else would be misleading. If you insist on some actions being executed through web-like links, you need to invent the disabled link.
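The required-fields example boils down to deriving the enabled state from the current input instead of checking after the fact. Here is a minimal sketch, assuming a hypothetical contact form where name and email are the required fields; ContactForm, REQUIRED_FIELDS, missingRequired, and canSave are all illustrative names. In a browser, the result would presumably be pushed into the Save control each time a field changes (for a button, by setting its disabled property).

```typescript
// Hypothetical contact form; which fields are required is an assumption for this example.
interface ContactForm {
  name: string;
  email: string;
  phone: string;
}

const REQUIRED_FIELDS: (keyof ContactForm)[] = ["name", "email"];

// List the required fields that are still empty (whitespace does not count as input).
function missingRequired(form: ContactForm): (keyof ContactForm)[] {
  return REQUIRED_FIELDS.filter((field) => form[field].trim() === "");
}

// Save is enabled only when nothing required is missing.
function canSave(form: ContactForm): boolean {
  return missingRequired(form).length === 0;
}
```

Keeping the list of missing fields around, rather than just a yes or no, also gives you something to say in a tool tip on the disabled control.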
Dialog Boxes
The principles of direct manipulation state that as much as possible input should be done directly on the content as represented in your window. However, inevitably you’ll have among your many new actions some which require additional information or parameters from the user. To get this information without disrupting the display of your content, you need windows that act as secondary windows to the parent window showing the content, which means the secondary windows always float above the parent in the z-order (but not above other windows), can be floated outside the boundaries of the parent, close with the parent, disappear when the parent is iconified, and lack a task bar icon of their own. It should be possible for such windows to be modeless or modal, where modal means modal just for the parent window or your entire application, not for every browser window open on the user’s desktop.
Airset takes a shot at making dialog boxes. In one case it can’t be floated outside the bounds of the parent browser window, which really defeats half the advantage of a dialog box: that you can move it out of the way to see the content below. In another case, the dialog box is at the same level in the z-order as the parent. Now it can get too far out of the way, easily getting hidden under the parent. Keep trying, guys.
Message Boxes
This is easy once you have a capability to throw up dialog boxes, since a message box is just a specific kind of modal dialog box. You just need to make it easy to generate the standard types: error, informative, warning, question, and progress, using the resident OS’s standard icons and buttons. At the same time, you’ll need full control over the text in the message, icons used, and labels for the buttons.
And you need to be able to show the messages whenever there is an appropriate event that warrants them. No app can call itself usable if it ever lets the user close a window without first verifying that the user wants to discard any unsaved input. Sorry, Airset.
Preserved Input
Classic web sites can get away with being stateless, because there’s so little the user can do to change the state of things anyway. In a rich web app, changing the state is the whole point, so you need provisions to preserve the users’ input over time, and I’m not just talking about the content they create. When a user executes an action through a dialog box, then returns to execute the action again, chances are she or he will want to use the same or similar parameter values that were entered before in the dialog box. So to save the user a lot of boring re-entry, each dialog box should preserve the last input given by the user for at least the duration of the session if not indefinitely. Which reminds me: have you given any thought to how to handle “sessions” in your web app? What are the boundaries of preserving the state, and how will the user know them?
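As a rough illustration of per-dialog memory, the sketch below keeps the last values submitted for each dialog and overlays them on the designer's defaults the next time the dialog opens. DialogMemory, FieldValues, and the dialog ids are hypothetical names. This version lives only as long as the object does, which answers the session question in the simplest possible way; persisting it somewhere longer-lived (the browser's sessionStorage, say, or the server) is a design decision the sketch leaves open.

```typescript
// Hypothetical store of the last input given to each dialog box.
type FieldValues = { [field: string]: string };

function copyValues(values: FieldValues): FieldValues {
  const out: FieldValues = {};
  for (const key of Object.keys(values)) out[key] = values[key];
  return out;
}

class DialogMemory {
  private last: { [dialogId: string]: FieldValues } = {};

  // Call when the user confirms a dialog, so the values can be offered again later.
  remember(dialogId: string, values: FieldValues): void {
    this.last[dialogId] = copyValues(values);
  }

  // Values to show when the dialog opens: defaults, overridden by remembered input.
  initialValues(dialogId: string, defaults: FieldValues): FieldValues {
    const merged = copyValues(defaults);
    if (Object.prototype.hasOwnProperty.call(this.last, dialogId)) {
      const saved = this.last[dialogId];
      for (const key of Object.keys(saved)) merged[key] = saved[key];
    }
    return merged;
  }
}
```

Merging over defaults, rather than replacing them, means a field added to the dialog later still gets a sensible starting value.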
Tool Tips
Tool bar? Sure, knock yourself out. Yet another way to provide convenient access to an action is to have a toolbar with compact buttons that execute common actions otherwise buried in a pulldown menu or (better) in a dialog box. Just don’t forget the tool tips if you’re using icons alone to label these buttons. In fact, there may be many ways tool tips can be used to enhance usability.
Interactivity Means Selectable Objects
In an ordinary web page, the user manipulates the content very little so we could get away with the primitive support of selecting parts of the content, essentially limited to highlighting blocks of text and right-clicking pictures or links. In other cases, where a site-specific action is available for part of the content, we have provided a link or other control for each content part that appears on the page (e.g., a “Delete this item” link for every item in an e-commerce site’s shopping cart). However, as the number of actions multiplies in a rich web app, that’s going to generate far too much clutter. Direct manipulation requires the content dominate the window. That’s where you want the user’s attention to focus. Listing all available commands for each part of the content crowds out the content, allowing little content to be displayed at a time, forcing more scrolling, paging, or other navigation. For example, if Windows Explorer, an app with relatively few actions, were designed this way, it would look something like this to show just four files:
In real life, it looks like this:
The tried and true solution, of course, is the model of the user selecting an object then selecting an action from a single control in a centralized location such as the pulldown menu. Thus, your content is going to have to be broken down into selectable objects. An appointment book will have days and hours, appointments and alerts, contacts and locations. Each will need to be selectable as a whole so the user can actually do something with each. Just having text and pictures as your only object types is not going to fly.
A selectable objects capacity is all but necessary for accelerators to work on parts of your content. Unless you seriously want to consider giving each part a different accelerator for the same command, there is no other way for the accelerator to know what part to act on. An enabling capability is also closely associated with the presence of selectable objects since often what makes an action appropriate or not is the current class of object selected.
Object Selection and Indication
So if the user must select then act, then you need to make it clear where to click to select and what gets selected when you click there. There has to be an obvious target point, but not too obvious because there’ll be a lot of them and you have to control the clutter. Icons are used to represent selectable objects on the desktop, but given icons are used for everything these days, that probably won’t be sufficient. Selectability has also been indicated by changes in the pointer sprite or the object representation when hovering the pointer over the object.
Once an object is selected, then you need to provide some feedback. You need to show in a way obvious to the user which objects are selected. MS’s Hotmail drafts the checkbox control for object selection duty, which is one way to do it. A weird way, but one way. Some sort of use of reverse video, handles, or the resident OS’s Selected color are more conventional ways to indicate currently selected objects.
Multi-Selection
Did I say “objects”? Yes, you need to support multi-selection. A typical usable app would certainly allow users to select multiple objects for a single action, and it would let them select by dragging a rectangle around the objects as well as using shift-click and control/command-click. This ability greatly saves on clicks or key-presses when the same action is to be applied to multiple objects, and especially when the same sequence of multiple actions is to be applied to multiple objects. This is an advantage that no alternative to the object-selection-action model can match.
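The three gestures can share one small selection model. What follows is a minimal sketch under some assumptions: objects are laid out in a known order, each has a bounding rectangle, and the mouse handling elsewhere has already worked out which gesture happened. SelectionModel, Item, Rect, and the method names are all hypothetical. Shift-click selects the range from an anchor (the last plainly clicked or toggled object), which is one common desktop convention, not the only one.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Item { id: string; bounds: Rect; }

// True when the two rectangles overlap by any amount.
function intersects(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

class SelectionModel {
  private items: Item[];
  private selected: string[] = [];
  private anchor: string | null = null;

  constructor(items: Item[]) {
    this.items = items;
  }

  // Plain click: select just this object.
  click(id: string): void {
    this.selected = [id];
    this.anchor = id;
  }

  // Control/command-click: toggle this object, keep the rest.
  ctrlClick(id: string): void {
    const index = this.selected.indexOf(id);
    if (index >= 0) this.selected.splice(index, 1);
    else this.selected.push(id);
    this.anchor = id;
  }

  // Shift-click: select everything between the anchor and this object, inclusive.
  shiftClick(id: string): void {
    const ids = this.items.map((item) => item.id);
    const to = ids.indexOf(id);
    if (to < 0) return;
    const from = this.anchor === null ? -1 : ids.indexOf(this.anchor);
    if (from < 0) { this.click(id); return; }
    this.selected = ids.slice(Math.min(from, to), Math.max(from, to) + 1);
  }

  // Rubber-band drag: select every object the rectangle touches.
  dragRect(rect: Rect): void {
    this.selected = this.items
      .filter((item) => intersects(item.bounds, rect))
      .map((item) => item.id);
  }

  current(): string[] {
    return this.selected.slice();
  }
}
```

With the selection held in one place, a menu item or accelerator only has to ask current() to know what to act on.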
Drag and Drop
Menus and buttons aren’t the only way to effect actions. An app will also often need to support drag and drop to fulfill usability needs. Say you need to move an appointment 1 hour earlier. Use the Change Appointment Time menu item? Don’t be absurd. Just grab and move that appointment object within the time grid of the day of the appointment. Drag and drop should also work between applications when that’s useful.
Pointer Tools
Speaking of selection and drag and drop, there’s often more to do with a mouse than just select. Some actions are best done by the user directly dragging on the appropriate part of the content. For example, to zoom in on a map, the user drags a rectangle around the area the user wants to zoom. Much easier than taking multiple panning and zooming steps as seen in Google Maps, where the latter action uses an abstract slider control on the side. Changes to content, such as drawing lines or connecting objects, may also be best done with the mouse pointer directly on the content. To do this, you need to support multiple modes for the mouse pointer –tools that the user selects which change what a click or drag does. It’s essential to provide continuous feedback on the current mode of the pointer by changing its sprite.
Clipboard
Why clutter your UI up with application specific menu items that the user has to learn, like Duplicate Appointment? A more usable app will be able to use the clipboard to move and copy objects, in addition to accomplishing this with drag and drop. And don’t go building your own application-specific clipboard. It should be the resident OS’s so your app can export objects to other apps. Don’t you think your users are going to want to paste an appointment into an email?
Pages are so One-way Web
Another thing to do when designing your app or developing your UI library is to take a good hard look at the browser-style multi-page model of interaction. The page model works pretty well when the task is for the user to flip through a bunch of content to find something to sit there and read. When the only available action is to choose some content to show, metaphorically navigating to a “place” is quite appropriate. It’s great when you’re basically dealing with an electronic reference book, which is exactly what most traditional web sites are. There’s a reason why we call it a browser: it’s used for browsing, wandering around, looking at this or that.
However, direct manipulation principles dictate that the user stay with all the objects being manipulated. Spreading interaction across multiple pages is going to result in unnecessary modes, a lot of unproductive navigation effort for the users, and a sense of detachment from the content they’re trying to build. Don’t make users go to a specific place to do an action on an object. Instead, let them complete the action in place. For example, don’t make users open an “edit contact” page to edit a contact in an appointment book. Give them the control to edit it directly in the same page that displays it. Don’t have one page to enter a query (e.g., address to show on a map) then have another page to see the results. Instead show the query result (map) in a window, and provide a simple pane, palette, or dialog for that same window to specify or adjust the query, eliminating the need to navigate back to change the query; the user just changes it right there while still viewing the current map.
In other words, in a rich app you should seriously consider the thick-client style “primary window” model. Here, a complex primary window presents the objects to be manipulated in a particular session. The user works directly and extensively on these objects, completing multiple interactions to build content. Users don’t just look over a window full of content and move on, but may spend perhaps hours or even days on a single set of content. “Navigation,” if you still want to call it that, is less a linear wander through a hierarchy and more a star-pattern radiating out and back to and from the set of objects being manipulated. Even when users leave the content to visit a dialog box, they aren’t really leaving the content cognitively. Rather, they are still applying actions on the content, just via a set of side controls.
Others may disagree, but you may even want to consider suppressing all browser controls –menu bar, tool bar, status bar, the works. More than distractions, these might even screw up things for your users. Even the all-mighty Back button can become a liability. The browser has great usability –for browsing. Not so good when it comes time to really interact with something.
And Don’t Forget…
Undo
What’s a rich app without undo? Unusable, that’s what. The Back Button is no longer Undo. It only undoes what content to show, but in a rich app there are so many more actions than selecting content to show. Any object creation, update, relate, or deletion should be undoable through a standard action, such as a menu item. Even changes committed to the backend should be undoable.
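The usual way to get there is to wrap every change in an object that knows how to reverse itself, and to keep those objects on a stack. Below is a minimal sketch of that command-stack idea; Command, UndoStack, Appointment, and moveAppointment are hypothetical names invented for the example. It only changes an in-memory object. For a change already committed to the backend, the assumption is that revert would have to send a compensating request of its own, which this sketch does not attempt.

```typescript
// A reversible action. The label is what an Undo menu item could display.
interface Command {
  label: string;
  apply(): void;
  revert(): void;
}

class UndoStack {
  private done: Command[] = [];
  private undone: Command[] = [];

  execute(cmd: Command): void {
    cmd.apply();
    this.done.push(cmd);
    this.undone = []; // a fresh action invalidates the redo history
  }

  // Reverts the most recent command; returns its label, or null if there was nothing to undo.
  undo(): string | null {
    const cmd = this.done.pop();
    if (!cmd) return null;
    cmd.revert();
    this.undone.push(cmd);
    return cmd.label;
  }

  // Re-applies the most recently undone command; returns its label, or null if there was none.
  redo(): string | null {
    const cmd = this.undone.pop();
    if (!cmd) return null;
    cmd.apply();
    this.done.push(cmd);
    return cmd.label;
  }

  canUndo(): boolean { return this.done.length > 0; }
  canRedo(): boolean { return this.undone.length > 0; }
}

// Hypothetical domain object and one undoable action on it.
interface Appointment { title: string; startHour: number; }

function moveAppointment(appt: Appointment, newStartHour: number): Command {
  const oldStartHour = appt.startHour; // captured so the move can be reversed
  return {
    label: "Move Appointment",
    apply: () => { appt.startHour = newStartHour; },
    revert: () => { appt.startHour = oldStartHour; },
  };
}
```

The canUndo and canRedo checks are there for the Disabling section's sake: they are what the Undo and Redo menu items would consult to decide whether to gray themselves out.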
Speed
If you really want to compete with thick clients on usability, you need speed. Four seconds to drill down for more detailed content is just too slow, Airset.
Innovate –If You’re Not Busy Enough
Now you don’t have to have all these things implemented just like I described. If there’s a better way to cover the same underlying functionality, then go for it. For example, relative to using a message box to show error messages, there’s a lot to be said for showing the messages in the context of the offending object (e.g., beside a field with an invalid date entered). As another example, Windows Live Local’s map uses Ctrl-middle-mouse-button to allow the user to drag a rectangle to pan and zoom to, eliminating the need for a less convenient (if more discoverable) zoom pointer tool. Not bad. Wouldn’t mind seeing that be a new convention even in thick client apps. With a sort of a clean slate to work with, rich web apps may have an opportunity to make new conventions and avoid some unfortunate choices made early on in GUIs.
Except that it is not really a clean slate: users will be interacting with your web app simultaneously with thick client apps and expectations from 20 years of GUIs are going to carry over to your app. In employing novel UI design elements, being equally good as prior conventions is not good enough. If you’re going to be inconsistent with standards or conventions, you have to provide a usability benefit to be worth the inevitable confusion it will bring some users. Given your new UI element is going to be less easy to learn (because it’s new), then it should, for example, make the task faster to complete.
A rich web app is going to be much less tolerant of deviations from conventions than a traditional web site. Pretty much all the user has to figure out from a traditional web site is where’s the menu and what color are links this time. Not too hard. But in a rich app, there’s much more to do, and therefore potentially much more to learn. If you load your rich web app with all kinds of novel UI elements, your user will get stuck quite often. To justify the risk, any novel element has to be demonstrably better than convention. Maybe much better. Unless you have a durn good usability lab and lots of time and/or intent to train your users, you probably should stay as close to current GUI conventions as possible.
Back to Work
So build a library with everything on this list, and you’ll be able to make a web app UI as advanced as a desktop app from about 1995. It’s a start. There’re additional things to think about that depend on the specific application type. For example, a database front end will almost certainly need support of subforms and continuous forms. I’ve yet to see a grid control or HTML table employed for this that is truly adequate. Other app types (e.g., graphics app, process control) have their specialized UI elements too.
Mm, long list. Glad I’m not a developer.
Hi Michael,
Nice blog. Very well articulated. You have identified the issues. And the solutions are not that far behind, if you know the problem very well. It is a simple logic: we need to invent reusable Ajax GUI Classes that are better than Windows/VC++ GUI classes.
Simple isn’t it?
We have already done it. It is possible to leapfrog the 25-year-old desktop GUI paradigm, when next generation Vector Graphics platforms are released. Please review:
http://cbsdf.com/technologies/DHTML-Widgets/TECH-Summary.htm
But DHTML can never match the traditional GUI platforms, because we cannot use GIF images for many GUI components. For example, both Microsoft and Google ended up using SVG (in Mozilla) and VML (in IE) in their Maps to draw lines for directions.
If one uses a GIF image for Flights in a DHTML-based Air traffic monitoring system, how does one rotate the Flight to show its direction or change its color? In the case of vector graphics, it is as simple as changing a single XML element’s attribute.
http://cbsdf.com/technologies/jsp/atc_test1a.jsp
You made a very good point:
“If there’s a better way to cover the same underlying functionality, then go for it.”
The XML graphics technologies offer unprecedented flexibility and freedom to invent new GUI paradigms. We can do a lot more things that are practically impossible in the traditional GUI platforms. The following web page just briefly illustrates those.
http://cbsdf.com/technologies/demo-links/Demo-SVGS/Widget-design.html
This is just a beginning and we are exploring even more amazing possibilities for the XML based graphics technologies. Best part is that the online applications would cost a lot less to build and be a lot more agile to maintain.
Regards,
Raju