Preface: This paper examines issues pertaining to the design of a user interface which must be scalable from multimodal high-performance terminals to hand-held palm terminals. The interface must be consistent, so that a user of the interface is not disoriented by moving from a very rich terminal environment to a very simple one.
This paper reflects a first examination of relevant research by the leading centers in the US in human-computer interaction, with two motivations in the literature search. One was to look for current research relevant to the task of minimizing disorientation in moving from a very rich to a very limited communication medium and to various degrees of communication capability between "very rich" and "very limited". In particular, we were looking for existing wisdom regarding moving between modalities - systems with text, speech, GUIs, styluses, tablets, physical manipulables such as dials, touchpads, mice, and perhaps haptic feedback, versus systems with subsets of the previous list. We were also looking for up-to-date indications of what the computing systems available in a five-year span might be, from some of the most likely originators of the design ideas for those systems.
What we found was very little of the first and quite a lot of food for thought regarding the second.
The report covers a six-week period of investigation and is not
comprehensive. It is the result of work by Lois Boggess and Joshua
Kilpatrick in the summer of 1998.
Issues Related to Cooperative Human/Computer Interaction, Flexible Interfaces, Wearable Computers and Ubiquitous Computing
Major Sites and issues:
Summaries of some recent research
Indoor positioning is not hard to come by - there are several devices on the
market, and there are others that are not hard to assemble [1]. Reference [1] also
lists a number of characteristics that are important - among the quantitative
measures, they indicate that it is hardest to serve a large number of users in
a large space (other quantitative measures: cost, latency, data rate, resolution and
accuracy, with no indication of why these last are two different things).
Among qualitative measures they include non-intrusiveness to the user and to
the environment, and the choice of whether the data should reside in "the
environment" (presumably a distributed system) or with the user. [1]
implies that the latter choice is dictated by payoff to the user - that a
single user system would favor the information's residing with the user, but in
a collaborative system or any situation where the user benefits by the
information's residing in the system itself, clearly the design choice should
be in the other direction. It sounds as if there's no particular dilemma
to the choice: no tradeoff is mentioned. In fact, their application
CyberGuide apparently does both. That raises the question of what they
mean when they say "If all information is local and CyberGuide is working
in single user mode, the user benefits by having the position
information. If information is distributed and CyberGuide is working in
collaborative mode, users benefit by having the system know their positions."
[1] also indicates that it is inexpensive to achieve outdoor positioning using GPS, because the cost of the GPS infrastructure was borne by the military and the initial costs have not been passed on to consumers.
Other issues raised by [1]: there are two kinds of communication: the position information itself, and other kinds of communication - maps, text, speech. In some systems, especially indoor systems using IR, both can be accomplished using the same communication medium. In GPS, position information is patently separate from communication. So broad-coverage untethered communication must be present in addition to the satellite communication. We don't want gaps in the physical space that is blanketed by our communications system.
[1] also mentions using a voice-only interaction system. The user has a hands-free headphone/microphone setup, a cellular phone, a portable computer, and a GPS device. They implemented a voice-only interface to allow retrieval of email and internet information including weather reports, news, sports information and restaurants (I infer that these sources were specifically programmed for). They indicate that the speech recognition software was a weak point in the implementation - not ready for continuous speech. So they implemented a version of CyberGuide that was voice only and which took advantage of physical context (a campus tour guide, for example) to pre-load "grammars" that helped with the word recognition task.
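The idea of using physical context to pre-load grammars can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CyberGuide's implementation: the location names, the vocabularies, and the functions `active_vocabulary` and `recognize` are hypothetical, and matching on text candidates stands in for real acoustic recognition.

```python
# Hypothetical per-location vocabularies ("grammars"); none of these come from [1].
GRAMMARS = {
    "library": {"books", "hours", "catalog", "reserve"},
    "stadium": {"tickets", "score", "schedule", "parking"},
}

# Hypothetical words accepted in every context.
BASE_VOCABULARY = {"where", "is", "the", "what", "are", "show", "me"}


def active_vocabulary(location):
    """Return the words the recognizer should consider at this location."""
    return BASE_VOCABULARY | GRAMMARS.get(location, set())


def recognize(candidates, location):
    """Pick, for each spoken word, the first guess in the active vocabulary.

    `candidates` holds one list per spoken word: the recognizer's guesses
    in order of preference. A word with no in-vocabulary guess yields None.
    """
    vocabulary = active_vocabulary(location)
    result = []
    for guesses in candidates:
        match = next((g for g in guesses if g in vocabulary), None)
        result.append(match)
    return result
```

Restricting the vocabulary by location means that a poor first guess can be rejected in favor of a word that makes sense where the user is standing, which is the benefit the authors appear to be describing.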
They got into user modeling almost right away. Not just a model of the
physical context and what a user might want that relates to it, although they
expected that. Once they were using natural language to communicate, they
apparently needed to keep short-term track of names, etc. for language modeling
purposes - their example was that if the user asked for "Dave's"
email address, then the name Dave needed to be kept for a while for pronoun
reference: the user might say to send a note "to him". They
also felt that a voice interface needed to be more flexible than a graphical
user interface, taking into account such voice indicators as whether a user
was asking a question out of interest or out of confusion, and adapting system
response accordingly.
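The short-term name tracking described above can be sketched as follows. This is a minimal illustration, not the mechanism used in [1]: the class `DiscourseMemory`, its pronoun list, and the rule that a name expires after a fixed number of turns are all assumptions made for the example.

```python
class DiscourseMemory:
    """Remember the most recently mentioned person for a few dialogue turns."""

    # Hypothetical pronoun list; a real system would need far more care.
    PRONOUNS = {"him", "her", "them"}

    def __init__(self, max_turns=3):
        self.max_turns = max_turns  # how many turns a name stays available
        self.last_name = None
        self.age = 0                # turns elapsed since the name was mentioned

    def mention(self, name):
        """Record that the user just referred to `name`."""
        self.last_name = name
        self.age = 0

    def next_turn(self):
        """Advance one dialogue turn and forget a name that is too old."""
        if self.last_name is not None:
            self.age += 1
            if self.age > self.max_turns:
                self.last_name = None

    def resolve(self, word):
        """Map a pronoun to the remembered name; return other words unchanged."""
        if word.lower() in self.PRONOUNS and self.last_name is not None:
            return self.last_name
        return word
```

In the example from the text, a request for "Dave's" email address would call `mention("Dave")`, and a later request to send a note "to him" would resolve "him" to "Dave" as long as the name had not yet expired.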
At CMU, one project designed a wearable computer for Marines performing maintenance on amphibious tracked vehicles [2]. The project used student design teams from mechanical and industrial engineering for the human factors and mechanical elements, and presumably computer and electrical engineering students for the electrical and software design. At the time, the users were using clipboards to record status during the maintenance and to determine what needed to be replaced (and hence ordered). Prior to the project, the maintenance inspection and the recording of the results in a computer were performed by different people.
Inspectors at times performed the maintenance lying on their backs or stomachs, and in tight places. They might also be standing or squatting. Computer cables on a simulated wearable computer were likely to snag. Inspectors sometimes had to wear gloves, and sometimes put the clipboard down.
The designers, in conjunction with their users, felt that the inspectors rarely needed to use both hands, so one hand would normally be available for the wearable computer. One goal was to have a system about as easy to use as a clipboard (interpreted as taking only five minutes of training to become accustomed to the system).
An unexpected design direction was that they developed a dial (with large "buttons", including one that is a curved bar near the palm, according to a photo in [2]) for primary input. A large dial was usable with gloves, and it was possible to make the dial housing resistant to dirt and chemicals. That led to building the electronics in a curved shape that housed the dial. And the software design echoed the input device - selections were placed around the outside perimeter of the screen instead of looking like linear lists or rectangular menus. It became natural to move clockwise and counterclockwise among highlight-able items on the screen.
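The pairing of a dial with selections arranged around the screen perimeter can be sketched as below. This is an illustrative sketch only, not the software described in [2]: the function `perimeter_positions`, the class `DialMenu`, and the idea of counting dial movement in "clicks" are assumptions made for the example.

```python
import math


def perimeter_positions(count, radius=1.0):
    """Place `count` menu items evenly on a circle, clockwise from the top.

    Returns (x, y) pairs with y increasing upward.
    """
    positions = []
    for i in range(count):
        angle = 2 * math.pi * i / count  # clockwise angle measured from the top
        positions.append((radius * math.sin(angle), radius * math.cos(angle)))
    return positions


class DialMenu:
    """Track which perimeter item is highlighted as the dial turns."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0  # the first item starts highlighted

    def turn(self, clicks):
        """Positive clicks move clockwise, negative clicks counterclockwise."""
        self.index = (self.index + clicks) % len(self.items)
        return self.items[self.index]
```

Because the highlight wraps around with modular arithmetic, turning the dial in either direction always reaches every item, which mirrors the way the physical dial has no end stops in this sketch.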
User acceptance was high, and there was a substantial reduction in task time (40%, with indications that an integrated system would have reduced time much more - the Marines were using 286's and the oldest systems CMU could work on were 386's).
Maintenance seems to be THE wearable computer application, at present. There are published accounts of a wearable computer for copy machine maintenance, and for aircraft maintenance, the latter for Boeing. (Both are mentioned in a proposal for instrumenting a number of environments for ubiquitous computing, by GIT.)
Work at Oregon Graduate Institute reported in [3] is actually focused on analysis of multimodal interfaces at the "sentence" level. For example, they would characterize the differences between interactions which were purely unimodal (speech only, graphic only, etc.) and those which were multimodal (usually speech and writing, or speech and pointing). The tasks that users performed were primarily map-based, but presumably the same behaviors would be evidenced in a graphical user interface which might present diagrams of machinery, or shipboard layouts. There were a variety of types of interactions, and speech and text were quite adequate for some of them. In fact, electronic pointing was apparently used for locative purposes but once location had been established, users often shifted to unimodal interaction (speech).
Some observations: all their users preferred multimodal interaction, and all used it. (In this context, multimodal means more than that one action in one mode might be followed by a different action in a different mode; individual actions themselves were multimodal.) At the same time, most of the actions of these users were unimodal. There were different kinds of actions/commands, some of which were much more likely to be multimodal than others. As noted above, it was frequently the case that a multimodal action such as pointing to an item on the graphics tablet while speaking (to establish referent or location) was followed by unimodal speech which continued to take advantage of the already-established referent. It was also noted that the "grammar" used in multimodal interactions was distinctly different from the grammar of spoken interactions by the same users on the same system in the same scenarios. Graphical writing included pointing to locations, drawing symbols to represent buildings, etc., drawing lines and arrows to indicate direction, or actually writing words. In speech, locatives were almost always in sentence-final position, [S-]V-O-LOC (as in Oviatt's example: "Add apple orchard east of Sugarloaf Mountain Park."). In contrast, when locatives were provided by electronic pointing or by the user's drawing some graphic symbol, the graphical locative almost always was in "sentence"-initial position, typically preceding the speech, resulting in a LOC[-S]-V-O construction.
Another way that the grammars differed was that unimodal spoken actions were generally much more complex than multimodal user interaction. An implication for us is that we should anticipate the need for a well-developed speech recognition system if our system is to recognize speech at all.
On the subject of speech, [3] indicates that they deliberately required subjects to indicate to the system that they were speaking to it (a "click-to-speak interface") because previous research showed that "off-line speech ... contain[s] as many as 12,400% more unintelligible words than on-line speech directed to the system [4]. That is, massive differences can exist between the intelligibility and processability of speech in a click-to-speak versus open-microphone implementation, with click-to-speak interfaces presently offering the more viable alternative."[3]
One of the surprises was that in this experiment, even though electronic pointing was easy and seemed natural, the experimenters reported that most of the multimodal interactions did not have pointing as the non-speech element. 17% of the graphical components of multimodal interactions were pointing, 7% were actual writing of words, and a whopping 76% were either what the experimenters called "drawn graphics" or symbols. Examples of drawn graphics which were mentioned included squares for buildings, a rectangle for a shopping center, a line for a roadway. Symbols included math symbols such as ">" and arrows to indicate direction. The authors concluded from this observation that "Given the more powerful and multifunctional capabilities of new pen devices, which can generate symbolic information as well as selecting things, it is clear that a broader set of multimodal integration issues needs to be addressed in future work."
A long line of research from Stanford's Center for the Study of Language and Information is represented by two papers by Clifford Nass and others [5], [6]. There are two reasons why their work is presented here: one is that the government (NSF and DoD) has been showcasing their research in gatherings of top-level computer interface researchers. The other reason is the plausibility that all interfaces built for human-computer cooperative enterprise should be designed with foreknowledge of the issues that Nass and his colleagues investigate.
Nass proposes the viewpoint that "Computers Are Social Actors" (CASA) and has published CASA studies for years. A central theme of his work is that people react to computers as if the computers have personalities. This thesis applies even to highly sophisticated people who certainly know that computers do not have personalities. Among the examples cited are that
Nass and his colleagues pursue the social interaction between computers and
people in part because it has been suggested that computers already serve
various familiar roles for people - for example, coaches, or secretaries
- and indications are that they will do so to an ever greater
degree. The CASA researchers use typical experiments from the social
sciences for the study of interpersonal interaction, except that one side of
the interaction is a computer. In [5], they show strong evidence that if
a computer is given simple attributes that are associated with dominant or
submissive personalities, then humans respond accordingly. That is,
humans themselves tend to be dominant or submissive, and to be attracted to
similar personalities. The experiment in [5] showed that humans were "more
favourably disposed" toward computers that were similarly dominant or
submissive, and that they were more satisfied with their interaction with the
computer having a similar "personality". Moreover, it was very
simple to give the personality flags to the computer - in canned text, for
example, a "dominant" computer might present information to the user
as "You should definitely rate the flashlight higher. It is your
only reliable night signalling device", while a "submissive"
computer might convey the same information as "Perhaps the flashlight
should be rated higher? It may be your only reliable night signalling
device" [5].
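How little is needed to attach such a personality flag to canned text can be sketched as follows. The function `phrase_advice` and its parameters are hypothetical and are not taken from [5]; only the two sentence patterns come from the examples quoted above.

```python
def phrase_advice(item, reason, personality):
    """Word the same advice in a "dominant" or a "submissive" style.

    The two sentence patterns follow the examples quoted from [5];
    everything else here is an illustrative assumption.
    """
    if personality == "dominant":
        # Assertive wording: a direct instruction and an unqualified claim.
        return f"You should definitely rate the {item} higher. It is {reason}."
    if personality == "submissive":
        # Tentative wording: a question and a hedged claim.
        return f"Perhaps the {item} should be rated higher? It may be {reason}."
    raise ValueError(f"unknown personality: {personality!r}")
```

The information conveyed is identical in both cases; only the degree of assertiveness in the wording changes.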
The experimenters state that "Subjects found no difference between the dominant and submissive computer with respect to the Competence index" and slightly later "When the subject and the computer shared the same personality, the computer received higher competence ratings ... compared to when the subject and the computer had different personalities" [5]. From the preceding, this reader concludes that there is no evidence that a submissive computer is generally perceived as less competent. Rather, dominant humans perceive a dominant computer as more competent, and submissive humans perceive a submissive computer as more competent.
The point of [6] is to explore whether known psychological effects that apply to teams of humans also apply to teams consisting of a human and a computer. Some subjects were told that they were interdependent with their computer in the solution of a problem, while others were told that their performance evaluation would be independent of the evaluation of the computer's performance. The former subjects were highly likely to think of themselves as a team with the computer, and this strongly correlated with their perceptions of similarity to the computer, cooperativeness with the computer, openness to influence from the computer, assessment of the quality of information that they received from the computer, and their perception of "friendliness" of the computer. Significantly, such subjects were also more likely to adjust their own solution to the problem set toward an original proposed solution of the computer or their perception of the solution that the computer would propose at the end of the problem solving activity.
Scott MacKenzie of the University of Toronto has a long track record of research in human-computer interaction specializing in input/output devices - keyboards, trackballs, styluses, mice, and touchpads. Some of this research has been in the context of wearable computers. We have reviewed [7], and reviews of [8], [9], and [10] are in progress.
The research reported in [7] indicates that the non-dominant hand should not simply be considered a less capable instrument for spatially oriented operations. While the dominant hand does seem to be more accurate for small-target spatial movements, the non-dominant hand appears to be faster for movements with grosser target areas. One implication is that future interfaces should take advantage of both hands for spatial interaction. This seems to agree with the experience of Beverly Harrison of Xerox-PARC that computer animators were very pleased with a mechanical interface (using dials) which allowed them to use both hands at once to manipulate the image on the screen. Harrison says "Animators and modeling experts are highly trained, highly skilled, and expensive. Improving their tools (and hence their productivity) has dramatic impact on the perceived usefulness of the products they buy. Ironically, taking automation OUT of their process - for certain tasks - was incredibly well received. By moving functionality from software features onto dial boxes and knobs, two-handed input was enabled. Animators could obtain tactile feedback while focusing their visual attention on the creature they were moving and altering...as they were moving it, by jiggling the knobs and dials. To me, this is also ubiquitous computing. It wasn't invisible or passive but it absolutely blended the physical and the virtual. It supported the way in which people wanted to work." [11]
The goal of work at Cambridge University [12] is to develop an environment in which a user moves around and the interface which provides information and data to that user essentially moves around with him or her, even though no single computer does. (E.g., a user is presently using a video-image based application, and the image "follows" the user from room to room.) The research environment is a laboratory with many rooms, many computers, and many devices with different interface characteristics. Adly and his colleagues use the infrared-based Active Badge system to keep position information (to within several meters) on individuals and equipment, augmented by an ultrasound system to locate entities to within 10 centimeters. They use Oracle technology both to keep track of the status of the many entities in the environment (many of which are dynamic: system topology, current operating system of a given device, etc., in addition to position) and to deliver GUI-based database services. While this work seems highly relevant to our current objectives, the only paper we have encountered at this writing indicates that the work is very much in the initial stages - developing the database needed to keep track of the changing environment.
The work at Xerox-PARC is highly influential. Much of the ubiquitous computing paradigm was formulated by Mark Weiser of Xerox-PARC. One of his ideas not referenced above but influencing research in a number of places is the concept of "peripheral" information presentation. On his home pages, he gives the example that we do not pay explicit attention to the sound of our car's engine as we drive, but if it starts to sound wrong, we are often very quickly conscious of it. In a similar vein, he advocates presenting information peripherally, without implying that information presented this way is unimportant. He also advocates "calm technology". One example is a string sculpture (which could also be considered an example of peripheral information presentation). The dangling string sculpture is in a hallway and visible to many researchers. It receives perturbations which are directly related to the amount of traffic on the internet, so that a person is indirectly aware of how heavily used the network is at any given time.
MSU anticipates looking further into "calm technology" and "peripheral" information presentation, as likely influences in cooperative computing environments in the next few years. We expect to continue to look explicitly for research in cooperative human-computer environments and environments to enhance human cooperative problem solving. We will review work at MIT, further investigate the work of the University of Toronto, the Responsive Workbench of Stanford, implications of the smart kiosk of Anderson Computing, and the augmented computing environments of Columbia University. We are also aware that there is related research work at Apple Computer which should be investigated.
References
[1] Abowd, Gregory D., Anind Dey, Robert Orr, and Jason Brotherton. 1997. Context-awareness in wearable and ubiquitous computing. Framework discussion, research agenda for context-aware computing. GVU Center, College of Computing, Georgia Institute of Technology.
[2] Bass, Len, Chris Kasabach, Richard Martin, Dan Siewiorek, Asim Smailagic, and John Stivoric. 1997. The design of a wearable computer. Proceedings of CHI97. Association for Computing Machinery. http://www.acm.org/sigchi/chi97/proceedings/paper/ljb.htm (accessed June 1998)
[3] Oviatt, Sharon, Antonella DeAngeli, and Karen Kuhn. 1997. Integration and synchronization of input modes during multimodal human-computer interaction. In Proceedings of CHI97. Association for Computing Machinery. http://www.acm.org/sigchi/chi97/proceedings/paper/slo.htm (accessed June 1998)
[4] Oviatt, Sharon, P. Cohen, and M. Wang. 1994. Toward interface design for human language technology: Modality and structure as determinants of linguistic complexity. Speech Communication, 15(3-4):283-300
[5] Nass, Clifford, Youngme Moon, B.J. Fogg, Byron Reeves, and D. Christopher Dryer. 1995. Can computer personalities be human personalities? International Journal of Human-Computer Studies. 43:223-239.
[6] Nass, Clifford, B.J. Fogg, and Youngme Moon. 1996. Can computers be teammates? International Journal of Human-Computer Studies. 45:669-678.
[7] Kabbash, Paul, I. Scott MacKenzie, and William Buxton. 1993. Human performance using computer input devices in the preferred and non-preferred hands. Proceedings of InterCHI'93, 474-481. http://www.dgp.toronto.edu/OTP/paperrs/bill.buxton/LHfitts.html (accessed June '98)
[8] Matias, Edgar, I. Scott MacKenzie, and William Buxton. 1996. A wearable computer for use in microgravity space and other non-desktop environments. Companion of the CHI '96 Conference on Human Factors in Computing Systems. 69-70 http://www.dgp.toronto.edu/people/ematias/papers/chi96/ (accessed June '98)
[9] Akamatsu, Motoyuki, I. Scott MacKenzie, and Thierry Hasbroucq. 1995. A comparison of tactile, auditory, and visual feedback in a pointing task using a mouse-type device. Ergonomics, 38:816-827.
[10] MacKenzie, I. Scott, and Aleks Oniszczak. 1998. A comparison of three selection techniques for touchpads. Proceedings of the CHI'98 Conference on Human Factors in Computing Systems. 336-343.
[11] Harrison, Beverly. Position paper for ubiquitous computing workshop.
[12] Adly, Noha, Pete Steggles, and Andy Harter. SPIRIT: A resource
database for mobile users. Cambridge University. copied from
WWW in June 1998.