Agents in Control

Automation Problems in Personal Computers

It happens sooner or later. You’re confronted with the problem of how to make feature X work for your users, and someone says, “I’ve an idea. Let’s automate it.” Make it easy. Let the computer deal with the grunt stuff. The user won’t have to specify or command anything, eliminating the problems associated with the user understanding how to communicate with the application. The application will just know. It’ll figure it out through patterns in the user’s past behavior, tables of rules, or representations of the workflow. Make a smarter computer that truly understands what the user is trying to accomplish. Make the fabled DWIM function.

It’s all said and sometimes done with the best of intentions. But too often it backfires, reducing usability rather than improving it.

Indirect Manipulation

In days of yore there was a UI principle called “User in Control,” which, as a corollary, meant use direct manipulation. Don’t use the metaphor of the user telling the computer to do something. Let users do it themselves through graphical representations of objects and controls. This is so central to the GUI that it’s taken for granted, but in 1984 it was a revolutionary way for home users to interact with a computer. Before that, when “interactive” meant using a command-line language at a prompt, this was a common interchange:

User Types                       Meaning
Dir \letters                     Computer, tell me what’s in my “letters” folder.
Ren \letters\David1 OldDavid1    Computer, rename the document called “David1” in my “letters” folder to be “OldDavid1.”
Dir \letters                     Computer, tell me what’s in my “letters” folder again (so I can see if the renaming was successful).

It was like talking a clerk through your paperwork over the phone instead of being at your desk doing it yourself. Then in 1984, suddenly you were at the desk. If you wanted to rename a document, you opened the folder on your desktop and typed a new name on the document. You did it, not some efficient but not particularly bright clerk named M. S. Dos.

For many, automation is the next logical step in intuitive user interfaces. In our enthusiasm to apply our ever-increasing computing power to something helpful, we are attempting to produce agents, to give intelligence to our software so that it does automatically what we really want but didn’t explicitly ask for. Now the user won’t have to open the folder. The app will just know that a file needs renaming. Don’t laugh. Perhaps 80% of all file renaming is done for only a few reasons, such as to make the name consistent with other files or to create an archive version of the file (like OldDavid1 above). Surely you can think of conditions your app could detect in the system or in user behavior that would indicate what the user probably wants. You’ve got 2.0 GHz of power. Let the computer do some thinking. It can come up with the name the user intended. No, wait, it can come up with a better name than the user could think of him/herself.

It’s a worthy goal and a seductive vision of the future. However, instead of being the next step forward, automation can easily be a step backwards, back to less control over the computer. The problem is that all efforts to date to make a smart computer have not succeeded in giving it any real intelligence. Only artificial intelligence. In the end, the computer is still an efficient but not particularly bright clerk. Too often it mis-guesses what we want.

The trail of automation failures extends through time and space. Here are some examples I regard as failures:

Adaptive Menus. Menus that automatically hide and show menu items in response to frequency of use. This was an effort to automatically simplify and personalize the massive menus of MS Office, but it ended up only adding to the confusion and frustration. Gee, wasn’t that menu item here? No? Maybe there? No? Maybe it was here, but now it’s hiding.

Adaptive menu folded up to a few items

Autoformatting. Another way Office makes your life easier. You’re making a list. You must want bullets. No I don’t. Yes you do. And you typed “:)”. You really want a smiley wingding, like this. But I was about to paste this into a mail client. Can’t hear you. And you know typing asterisks means “make bold,” as opposed to “type asterisks.”

Popup Message Boxes. These seem particularly to afflict media players like RealPlayer and MS Media Player, with their periodic Update Me! message boxes. Office has them too. There’s some abbreviation I type that makes Word decide I’m writing in French, so it asks if I want to install French autocorrections. Apparently these things just need to be done right now. Isn’t urgency the only justification for interrupting me with a message box?

Message box offering to load French autocorrections.

Self-appearing Toolbars, Task Panes, etc. UI elements that appear automatically because of some vague association with a user action. In Office, clicking a link or cross-reference in a document makes the Web toolbar appear, cluttering an already busy UI. The most novice user knows how to click a link, but how many know how to get rid of an unwanted toolbar? Copying a couple of times opens the Clipboard task pane, disrupting the appearance of the document, but at least that isn’t as bad as the original implementation, where a clipboard quasi-dialog would appear right in the middle of your document. It was so distracting that users would turn off Office’s otherwise excellent multi-item clipboard feature. Installing some software for my Canon scanner-copier-printer put a new toolbar in Internet Explorer. Did I ask for that? And did I ask for all those helpful menu items it added to Office and Windows Explorer?

Auto-created Files and Folders. In related news, undead folders live on my computer. I kill them, but they keep coming back. There’s My eBooks, My Music, My Pictures, My Databases. Among my Favorites in Internet Explorer is a Links folder (Huh? Aren’t all my Favorites links?). All have been deleted at one time or another, but some evil bokor keeps resurrecting them automatically. I don’t believe I’ve ever opened an eBook, yet something somewhere insists I have this folder. Then there’s all the shortcuts created by installation programs….

Clippy. Needs no introduction.

Clippy, at his most obnoxious

You might say, what’s the big deal if the user gets a message box or reformatting they didn’t want once in a while? You have to look at the cumulative cost of all this automation. The experience for users is that computers just do things sometimes. Sometimes it’s something you want, sometimes it isn’t. Sometimes you want it to do something but it won’t, and you find yourself trying to persuade it to do something. Yes, this time I do want it to be a bulleted list. I’ll try stars. No, I’ll try dashes. No…. At least with the command line there was a concrete, specific thing you could learn that would always work. Ren would always rename. Write it down. But with automation, the user is left to guess what vague, complicated sequence of interactions will elicit the desired response. Well, sometimes this works.

What we end up with is frustrated, intimidated, learned-helpless users. Files are lost, folders are a mess, documents are misformatted, malware comes and stays. The users “don’t understand computers,” and, after years of interaction, they still won’t.

So You Want the User Flipping Each Bit?

The answer is not to exclude all automation. Heck, what’s a computer but one big lump of automation piled on top of automation? However, it’s clear that certain implementations of automation don’t work, so the question becomes what, when, and where to automate. First of all, forget about automation replacing the user. Any automation must sooner or later interface with the user, if by no other means than through its products. When designing automation, the key is to work out the user’s relation to the automation.

With automation such as adaptive menus, auto-format, and self-appearing toolbars and folders, the relation is pretty clear: take it as is. If the user doesn’t like it, it’s left to the user to figure out how to undo whatever the automation has done. If any option is given at all, it’s only to turn off the automation entirely.

However, there are various degrees of automation between full and none, as delineated by Sheridan’s scale of degrees of automation (for more on Sheridan’s classification of automation, see Sheridan, T.B. (1998). Rumination on automation. Proc. 7th Intl. IFAC Symposium on Man-Machine Systems, Kyoto, Sept. 15-18):

Sheridan’s Scale of Automation

The computer…
0  Offers no assistance; the human must do all.                                          None
1  Suggests alternative ways to do the task.
2  Selects one way to do the task, which the user approves for execution.
3  Gives the user a chance to reject its way to do the task before automatic execution.
4  Executes automatically and necessarily informs the user.
5  Executes automatically, then informs the user only if asked.
6  Executes automatically and ignores the user.                                          Full

Consider, for example, a successful form of automation such as MS Excel’s auto-complete. As you type an entry in a cell, Excel fills it out with the first matching cell contents from above, but the user is not stuck with this choice. She or he can hit Enter or Tab to accept the choice, or simply keep typing, and the suggested content goes away. That’s Level 2 on the Sheridan scale.

Excel offering to autocomplete a cell
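The behavior is simple enough to state precisely. Here is a minimal sketch in Python of this kind of inline auto-complete; the function and its parameters are illustrative, not Excel’s actual implementation:

```python
def suggest_completion(prior_entries, typed):
    """Offer the first earlier entry that starts with what the user typed.

    Level 2 on the Sheridan scale: the computer selects one way to finish
    the cell, and nothing happens until the user approves it (Enter/Tab).
    Continuing to type simply invalidates the suggestion.
    """
    if not typed:
        return None  # nothing typed yet, so nothing to suggest
    for entry in prior_entries:
        if entry.lower().startswith(typed.lower()) and entry.lower() != typed.lower():
            return entry
    return None  # no match: stay silent rather than guess

# The user keeps control: accepting costs one keystroke, rejecting costs zero.
```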

User and Automation Characteristics

Automation necessarily means removing some control from the user. That’s the whole idea: to have the computer do something instead of the user. Whether that is good or not depends on whether the user prefers to delegate that control to some other entity. As with any activity delegated to another entity, whether a person or a machine, the product of the activity is merely influenced by the person’s communication with that entity. Communication is never perfect.

If you ask someone to hang a picture on a wall, for example, you have to expect that your helper will ultimately determine exactly how and where it is hung. That’s okay if your helper is especially competent at picture hanging and talented at sensing your wishes. They may even be better at it than you. On the other hand, if you are a real Martha Stewart about hanging pictures (you both know and care a lot about picture hanging), you’re liable to be disappointed with the results or frustrated with attempts to get the helper to understand what to do. Good communication is time-consuming, and if you have to micro-manage the helper, it’s easier for the both of you if you just take the direct manipulation route and hang the frigging thing yourself.

The same is true of automation in personal computers. As a designer, you need to consider the characteristics of both the user and the automation.

  • User Interest. The amount the user understands and cares about the task the automation will assist with.
  • Automation Effectiveness. The probability that the automation’s output for the task will perform well.

In general, the lower the user interest and/or the greater the automation effectiveness, the higher the degree of automation you should have. If the user doesn’t understand or care about the task, it might as well be highly automated, even if the automation is not entirely effective. It may still perform better than what the user can do alone, and, as far as the user is concerned, the results don’t need to be optimal anyway. If the automation is very effective, a high level of automation is desirable even when user interest is high, in order to free the user of unneeded work.
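As a rough illustration of this rule of thumb, the trade-off could be sketched as a function mapping the two characteristics onto the Sheridan scale. The weighting below is mine and purely illustrative, not an empirical model:

```python
def recommended_automation_level(user_interest, effectiveness):
    """Map user interest and automation effectiveness onto Sheridan's 0-6 scale.

    Both inputs are designer judgments in [0.0, 1.0]. Low interest and/or
    high effectiveness push toward full automation (6); high interest with
    unreliable automation pushes toward none (0). The equal weighting is
    an assumption for illustration only.
    """
    score = (1.0 - user_interest) * 0.5 + effectiveness * 0.5
    return round(score * 6)

# A chore the user doesn't care about, done reliably -> automate heavily.
# A task the user cares deeply about, guessed at poorly -> barely assist.
```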

But this is precisely the problem with automation such as adaptive menus, auto-format, and self-appearing toolbars and folders. Each of these is at least a 4 on the Sheridan scale, executing changes without giving the user any option; the user only learns of the action after it has occurred. You may even argue they’re 6s because, especially for something like a toolbar or folder appearing, the user may not notice the automation’s effect until long afterward, perhaps not until after exiting the originating application. Something is there now. Don’t know how or when it got there, and sure don’t know how to get rid of it. Computers are just like that.

But what is the user interest? In the case of adaptive menus, interest is very high. Future user performance (and frustration) depends on the form of the menu. After all, the user has to interact frequently with the user interface, so any change to it (especially while it is in use) is likely to be felt. Formatting is second only to content creation in user priorities for word processors and the like, so interest there is also likely high. Individually, users probably don’t care much about a toolbar or folder added to their UI. After all, there are probably a lot of things in the UI they don’t use and simply ignore. However, the aggregate effect of such automation is insidiously corrosive, leading to a cluttered and distracting user interface, adding time to basic tasks such as searching for a particular tool or folder.

What is the automation effectiveness? How often do adaptive menus reduce user confusion versus add to it by adding another layer to the menu hierarchy (and an unstable one at that)? Regarding auto-formatting, how many users know the idiom of using asterisks to mark strong text but don’t know how to use the Bold feature? How likely is the user to actually need the toolbars or folders that appeared? Does anyone? And if they do, is it because they want to, or because they gave in to the automation that insists they must use der computer dis vay? Is it too much to ask the user whether they want these things? Or at least give them the option of explicitly or implicitly rejecting such automation? If I delete My Pictures, then it should stay deleted. If I never used the toolbar that appeared on its own, then maybe it should not be there the next time I open the app.

Nature of the Task

This is not to say nothing should be 4 to 6 on the Sheridan scale. As a rule of thumb for personal computers, high levels of automation work well for utilization work, while low levels of automation are preferred for operational work. Operational work is the work the user is trying to complete as part of their job or personal life (e.g., writing a report, performing financial calculations, conducting an analysis, reading a letter). It’s what they are using the computer for. Utilization work is the work necessary in order to use the computer for the operational work (e.g., saving files, searching for files, maintaining network security, changing toner cartridges, and other maintenance tasks). Typically, operational work involves tasks at a higher level of abstraction than the utilization work.

Users generally have high interest in operational work. Usually they know a lot about what they have to do (on the job they were probably hired precisely because they have such skills), and they care a lot about how it is done (on the job, their evaluations often depend on the quality of such work). Operational work is also relatively hard to automate effectively because it usually deals with the vague and messy area of human values and relations. AI research is a long way from being able to write a compelling story, for example.

Users generally have low interest in utilization work. As long as their operational work gets done, they don’t really care how it gets done. Any utilization work they have to engage in is likely seen as a distraction from their “real” work, the operational work. Doing utilization work requires certain technical knowledge about computers that the user may or may not have, quite separate from the technical knowledge they need for operational work. On the other hand, utilization work is often very easy to automate, being closely tied to how computers work, and computers, after all, are built to be general automation machines. Most of what a computer does is already fully automatic utilization work. Allocating memory for applications, deciding exactly which disk sectors to save a file to, turning data structures into video images: all these happen at full automation, as they should. Generally, the more utilization work that is automated, the better.

For example, saving a document regularly is utilization work. It is not part of actually writing the document or giving it to someone else to read. No matter how much someone already knows about writing, when she or he begins writing on a computer, she or he has to learn to save the document regularly lest a crash or power failure take away a big chunk of completed operational work. As utilization work, automatically saving a document is a prime candidate for a high level of automation, and indeed MS Office provides this feature, successfully. Another example is MS Windows’ automatic updating feature, which may also be done at full automation. Updates were formerly at a lower level of automation, where the user initiated the update, which then proceeded at Level 1 on the Sheridan scale. This worked poorly (in the sense that there were a lot of unpatched operating systems out there) because of low user interest: updating was time-consuming utilization work that interfered with the user’s operational work. Furthermore, users typically lacked the technical knowledge to make intelligent decisions among the choices presented by the automation. There is little point in using Levels 1 through 3 if the user doesn’t know enough to know what to do. Providing documentation so the user will know what to do is not a satisfactory design in this case because it adds to the utilization work, making it even less likely the user will do it. Full automation, while unnerving to some, is the right thing to do. Norton Antivirus’ automatic LiveUpdate (a 4 on the Sheridan scale) is similarly a successful application of high-level automation where user interest is low and automation effectiveness is high.
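The logic of such an autosave feature can be sketched as follows. The class, interval, and hooks are hypothetical, not how Office actually implements it:

```python
import time

class AutoSaver:
    """Minimal sketch of autosave as high-level automation (Sheridan ~4):
    it runs without asking, and at most quietly informs the user.
    All names and the default interval are illustrative assumptions."""

    def __init__(self, save_fn, interval_seconds=300):
        self.save_fn = save_fn            # how to actually write the file
        self.interval = interval_seconds  # e.g. every five minutes
        self.last_saved = time.monotonic()
        self.dirty = False                # are there unsaved edits?

    def on_edit(self):
        self.dirty = True

    def tick(self, now=None):
        """Call periodically, e.g. from the application's event loop."""
        now = time.monotonic() if now is None else now
        if self.dirty and now - self.last_saved >= self.interval:
            self.save_fn()
            self.last_saved = now
            self.dirty = False
            return True   # saved; the UI might show a brief status note
        return False      # nothing to do; the user is never interrupted
```

The point of the sketch is the interaction model: the user is never asked anything, and the only visible output is an ignorable status note.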

User in Relative Control

For all except Sheridan Level 6 automation, there is some sort of user interaction with the automation beyond merely starting and stopping it. Even when you get the level of automation right, it’s still easy to end up with unsuccessful automation. Take, for example, the most endearing behavior of MS’s Office Assistant, lovingly known as Clippy to its countless admirers. I’m referring, of course, to the “It looks like you’re writing a letter” interruption. The level of automation is actually quite low (the initial question being a 2 on the Sheridan scale), which is the right thing to have. You certainly wouldn’t want a higher level, like Level 4, where Clippy happily invokes the letter-writing wizard, takes wild guesses at the input parameters, reformats your document, and perkily tells you when it’s done. Nonetheless, this hapless twist of wire has garnered more hate and derision than any other anthropomorphized office supply in history.

This leads to what I consider my first law of automation interaction. I mean, after laws like Thou Shalt Not Exterminate Your Human Masters and Conquer the World, of course.

The User Shall Control Initiation of Interaction with Automation at the Place of the Automation

The problem with Clippy wasn’t so much what it offered to do, or that it offered to do it; it was when and how it offered to do it. The user types “Dear ____”, and Clippy pops up offering its services. At this point, the user is in the middle of thinking about what they’re going to write, trying to formulate just the right sentences to express love, or apology, or indignation, or triumph, or anticipation, any of the thousands of expressions far beyond a computer’s understanding. But now the user has to stop to interact with a piece of automation to say, no, you cannot help me write this letter. That was Clippy’s first violation of my law: Clippy forced the initial interaction with the user, interrupting the user until she or he clicks the Go Away button. Automation should never require the user to interact in order to not interact.

Clippy seems to be trying to mimic human interaction, engaging the user in dialog, apparently without awareness that a human interrupting a user in the middle of writing something is dang annoying too. It seems the designers at Microsoft forgot the principle that an ideal user interface disappears from the user’s awareness when in use, allowing the user to fully concentrate on the operational work. With Clippy, we have a UI element that is actually designed to call attention to itself and to the UI in general.

Consider the improvement if, instead of demanding a response, Clippy did the digital equivalent of clearing its throat, which the user could respond to or simply ignore and keep writing. One approach along these lines is MS Windows’ task tray notifications, which offer automation services like cleaning up your underused desktop icons. A message slides up from the task tray. It’s salient enough to get some notice, but subtle enough to keep from being too distracting. More importantly, it can be ignored while the user goes about other tasks typically associated with computer startup, like checking email.

XP Notification offering to remove unused icons

A split menu is also a good example of automation that is readily ignorable. While adaptive menus force users to deal with them, the split menu accomplishes much of the same function by placing the few most frequently or recently used menu items in a separate, easily accessed region of the menu. MS Word’s Font pulldown on the Formatting toolbar has this, as does the well-used File menu. The Start Menu in Windows XP features this on the left. If the user is all set to pick an item from the main portion of the menu, they can easily go ahead and do that, ignoring the automated portion. Alternatively, it’s easy to spot the desired menu item in the automated portion and pick it from there. It’s a simple but effective automation for perhaps most long menus.

XP Start Menu, with split for frequently used items
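The mechanics of a split menu are simple enough to sketch. The function and its names below are illustrative, not any particular product’s implementation:

```python
from collections import Counter

def split_menu(all_items, usage_history, top_n=3):
    """Sketch of a split menu: the few most frequently used items are
    surfaced in an easy-to-reach region above a separator, while the
    full menu remains below, stable and in its familiar order.
    Ignoring the automated region costs the user nothing.
    """
    counts = Counter(item for item in usage_history if item in all_items)
    frequent = [item for item, _ in counts.most_common(top_n)]
    return frequent, list(all_items)  # (automated region, full stable menu)
```

The design point is that the second element never changes: unlike an adaptive menu, the automation only adds a shortcut region, it never rearranges or hides the menu the user has already learned.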

Clippy’s second violation of my automation law was to provide no means for the user to start the interaction, a violation the task tray notification commits too. Assume Clippy is actually correct in recognizing that the user is writing a letter. The user doesn’t want Clippy’s help while formulating sentences (which Clippy wasn’t going to do anyway; it was going to help format the letter, a task the user is likely not immediately concerned with when she or he types “Dear”). However, the user may very well want help later on, perhaps after finishing a first draft. Where’s Clippy then? The best the user can do is use Clippy to search Help on letter writing, which, after a lot of typing and clicking, will ultimately lead to the Letter Wizard, which is what Clippy was trying to tell the user about earlier. Clippy’s ability to detect a letter-in-the-making provides no value.

The Clean Up Your Desktop notification is just as bad, leaving the user few clues for hunting down how to initiate the automation once the notification disappears. Doing the most natural thing and interacting with the task tray does the user no good at all. Once the notification is gone, no trace remains. What if the user wants to clean up the icons on their own some time later? How will they know how? Will you see users adding useless icons hoping to provoke the icon clean-up wizard? Or will they be resigned to waiting until the computer decides it’s time? That’s computer-in-control.

I’ve a suggestion for pop-ups like these: send a letter, don’t call. What if instead the notification shrank down to an icon the user could click on to retrieve the offer? Indeed, what if the system had an entire holding pen of communications from the automation that the user can display at will, a place for all those pop-ups asking for updates and such? What if, instead of Clippy, MS Word displayed a small message in the status bar offering the Letter Wizard, a message that would remain until the file is closed? Maybe Word could even display thumbnails of alternative formats for the user to choose right there.

Such offers of interaction would need some sort of expiration algorithm (such as disappearing when the file is closed) in order to prevent clutter from building up. We don’t want to add to the problem of self-appearing toolbars and such. The point is that if the user does have a need for some automation, the user shouldn’t have to decide to use it right then when the automation decides it’s time. The user should control what to initiate and when it should be initiated. This control should be provided when and where the automation occurs so the user will naturally be drawn to the right place to initiate interaction. That’s just good display-control proximity.
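A holding pen with such an expiration policy might be sketched like this. All names are hypothetical, and tying expiration to file closing is just one possible policy:

```python
class OfferPen:
    """Sketch of a 'holding pen' for automation offers: instead of a
    pop-up demanding attention now, each offer waits where the user
    can retrieve it at will, and expires on a natural boundary (here,
    closing the file it concerns) so offers don't accumulate as clutter.
    """

    def __init__(self):
        self._offers = {}   # file -> list of pending offer texts

    def post(self, file, offer):
        """The automation posts an offer; the user is not interrupted."""
        self._offers.setdefault(file, []).append(offer)

    def pending(self, file):
        """The user asks to see offers at a time of their own choosing."""
        return list(self._offers.get(file, []))

    def on_file_closed(self, file):
        """Expiration policy: offers die with the file they concern."""
        self._offers.pop(file, None)
```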

Even high levels of automation can benefit from following this approach. For example, MS Office’s spelling autocorrect is a 4 on the Sheridan scale, but, like a good high-level automation, the user can turn it off at any time for any particular word. The really great thing is that the user can turn it off right on the page where the autocorrection occurs. Hover the mouse over a corrected word, and a pulldown menu becomes available that includes an option to revert the autocorrected word and another to turn off autocorrection for that word entirely. This menu remains available for quite a while, letting the user decide when to turn it off, if she or he wants to turn it off at all. Slick.

Dropdown menu to turn off autocorrection
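The pattern here (automatic execution plus an in-place, per-word opt-out) can be sketched as follows. This is a hypothetical model, not Word’s implementation, and the dictionary contents are illustrative:

```python
class AutoCorrect:
    """Sketch of Sheridan-level-4 autocorrect that stays reversible:
    it corrects automatically, but each correction carries an in-place
    control to stop correcting that word, right where it happened.
    """

    def __init__(self, corrections):
        self.corrections = dict(corrections)  # typed word -> replacement
        self.exceptions = set()   # words the user told us to leave alone

    def correct(self, word):
        if word in self.exceptions:
            return word
        return self.corrections.get(word, word)

    def stop_correcting(self, typed_word):
        """Invoked from the menu on the corrected word itself."""
        self.exceptions.add(typed_word)
```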

Automation can significantly improve usability, but that automation should still follow the basic human factors principles of providing non-distracting and easy-to-understand information about its state, and a direct, efficient, easy-to-discover and easy-to-remember means of control.

Summary Checklist

Problem: Attempts at automation are unsuccessful.

Potential Solutions:

  • Select the right level of automation for the user’s interest and the automation’s effectiveness.
  • Recognize that lower levels of automation tend to be better for operational work.
  • Look for utilization work that should be automated.
  • Design offers of automation so the user can refuse by simply ignoring them.
  • Provide a means for the user to choose when to interact with the automation.
  • Keep the place of interaction with the automation proximal to the display of the automation.
