Human Computer Interaction

HCI: Bringing Computers Closer to Humans

M Sasikumar, CDAC Mumbai, sasi@cdacmumbai.in
Paper presented as keynote address at BVCON-09, Sangli, February, 2009

Introduction

The notion of computing has changed in many ways over the last half-a-century of its existence. From flipping switches to load instructions into memory and a completely non-interactive execution mode, we now have massive reusable libraries and drag-and-drop environments to write programs, and a variety of mechanisms to interact with the computer even while a program is running. The early users were largely specialists and technical geeks running scientific analysis programs, whereas today's computer users are mostly non-technical, and the usage includes even 'non-computing' tasks such as watching movies, booking tickets and checking out current news. This shift has a bi-directional impact on user interface aspects, and has led to the formation of the field of human computer interaction (HCI). This is today a vast area with a number of concerns spanning many disciplines, from psychology to electronics. In this paper, I will focus on some aspects of HCI, mostly in the area of emerging trends.

In the next section, we will look at the current HCI scenario and at some of its challenges. Then I will look at how some of these challenges can be addressed, and at some of the issues in doing so. The coverage will by no means be exhaustive, but indicative, and also biased by my own areas of research interest and exposure. The treatment will be mostly from the software side, and will largely stay clear of hardware development and related technologies.

The most important element of HCI is interaction. This means the humans should be able to understand what the computer is saying and vice versa. In general, if the computer cannot understand your input, it can only ask you to try again – often with some revision. The failure could be due to a spelling error in issuing the command or an inappropriate context. This flexibility is generally not available the other way around: we allow, or take for granted, a lot of flexibility when communicating with another person, but computers are not generally programmed this way. They usually do not have the ability to use context (much less any common sense) in a generic way to help understand our commands. Therefore, over the years, we have invented precise but increasingly richer artificial languages and mechanisms to communicate with the computer, and used more natural means for the computer to communicate with us. So we have rich programming languages to talk to the computer, but the outputs are usually produced in natural language, using colour and visuals. An observation of the various input and output mechanisms would provide many more illustrations of this.

HCI Today

In the early days of computing, one had to come down to the level of the computer, almost talking binary, to get it to do any work. There was no flexibility and no tolerance. Being a computer user was a specialised job. Since then, there has been a steady effort to raise the level of the computer so that we need to bend less and less. That quest still continues along many paths.

We moved from machine language to assembly language to higher and higher levels of programming languages to help us instruct the machine more comfortably. Compare the level of abstraction in Java to that of C or Fortran. We built visual development environments to provide better tolerance of errors and to reduce the artificial constructs we need to memorise. Dragging a text box icon from the menu and placing it at a suitable position on the application screen automatically generates the code for putting a text box with the right coordinate values and relevant validation checks. The garbage collector takes away our worries about managing memory – at least to a large extent.
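
As a rough illustration, the code such a form designer emits behind the scenes might look like the sketch below; Python's Tkinter is used here purely as an example toolkit, and the widget name and coordinates are stand-ins for whatever was dragged and dropped:

    import tkinter as tk

    # A sketch of the kind of code a visual form designer generates
    # when you drag a text box onto a form: the tool fills in the
    # widget creation call and the coordinates for you, so you need
    # not memorise the API or compute positions by hand.
    root = tk.Tk()
    name_entry = tk.Entry(root, width=30)   # the dragged text box
    name_entry.place(x=120, y=40)           # position chosen by dragging
    root.mainloop()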

Attempts have also been made to build programming abstractions away from the model of a physical machine. Logic, functions, sets, etc. have formed the basis for languages like Prolog and Haskell. Scripting languages such as Perl and Python made development itself more interactive and incremental. Sophisticated IDEs – integrated development environments – made programming much less of a nightmare by providing large reusable libraries, error tracking, revision control, and so on.

On the other side, the complexity of programs has been growing far more rapidly. Compared to the hundred-line programs of the early days, today's programs often run into millions of lines of high-level language code – usually well beyond the capability of the human mind to grasp or comprehend. Thus the challenge of producing software that is error-free, scalable, etc. at a fast pace still remains. We now have the added complexity of heterogeneous platforms and a (theoretically) unreliable internet to add to our sources of problems.

Along with these developments on the software development side, the end user interface has also been changing drastically. It may be noted that the term HCI normally concerns this target segment, and not the software developer community; but it is useful to keep the changes in that segment in mind while discussing HCI. In the (g)olden days, the user would never 'interact' with the computer; his role was to anticipate all the data required by the program, provide them in one step at the start, and then wait for the final output. The arrival of interactive terminals brought him a step closer: he could run, interrupt or terminate his programs as he wished and enter inputs as and when required. This helped reduce the time spent tracking problems due to invalid input.

Today most users use an independent computer with a rich graphical interface as the medium for interaction. A number of software and hardware innovations enable a much more pleasant experience. But is that enough?

The familiar keyboard is still the most popular input device. The mouse, meant as a pointing device, has been adapted to provide many more functionalities: one can scroll a page, copy text and images (or even files) from one place to another, resize or zoom windows, and even enter text using an on-screen keyboard. Joysticks are popular in games as they can provide directional indications along with intensity: your movement is faster if you push the joystick farther from the centre. Many laptops use a small joystick-like pointer as a substitute for the mouse. Touch screens, though comparatively more expensive, are another significant development. With a mouse, the device movement is on a different plane, and often at a different pace, compared to the movement of the pointer on the screen, and hence requires higher cognitive involvement. The touch screen eliminates this two-level indirection and allows a more natural 'pointer control'. Building a scroll bar and similar controls on screen provides scrolling and other facilities. A touch screen combined with a good stylus can allow you to draw or even take notes in your own handwriting. One major challenge today is handwriting recognition, so that one can write commands or data on the screen and have the system decode them as if they were entered by keyboard (see next section).

Some Challenges in HCI

Text-based input continues to be the main mode of input to the computer, and screen display the main mode of output. There are three broad directions in which the field of HCI is moving forward:

  1. Improving the effectiveness of the current I/O modalities
  2. New I/O modalities to cater to specific requirements
  3. Design elements of information presentation, etc. to enhance impact and access.

The last is based on psychological studies of human cognition, perception, etc.; we will not discuss it much here. For the other two, there are some major current weaknesses which are driving the change, and as ICT moves into lower strata of society, these weaknesses become more and more important. Current interface mechanisms make a lot of assumptions about the user:

  1. The user needs to be English-literate to understand the commands and messages.
  2. The user must have a reading literacy level in some language (mostly English today – see previous point).
  3. The user needs a reasonable proficiency with keyboard, mouse, etc. for non-trivial usage of the computer (e.g., making a document, sending e-mail).
  4. The user needs normal eyesight and physical abilities (use of hands, etc.) to operate these devices.
  5. The user must understand the normal conventions of software applications – the technical terms and their significance, the common options, and the behavioural assumptions.

If you look at the user profile, there are two broad categories of users (we attempt a finer classification under 'Types of Users' below): those who use computers regularly at a reasonable depth, and the occasional users (checking mail, using e-governance applications, etc.). The learning curve for the occasional users to build the necessary comfort in interfacing with computers is a significant concern, since they will not be ready to invest much in it. Hence, all the assumptions above have serious implications for them.

With the spread of computing to schools, homes and SMEs, all these assumptions are becoming major concerns, and are surely affecting the future of HCI. Relaxing each of the assumptions requires work on different dimensions of technology. We outline some of these in the subsections below.

Need for English Language

Software localisation is the area concerned with adapting existing software to local languages and culture [1]. Mostly this amounts to first enabling the computer system to display the relevant language on screen and providing mechanisms to enter text in that language, and then having all textual messages output by the software translated into the respective language. Thus all menu items, messages, tool-tip help, prompts, etc. are made to appear in the target language.

Suitably internationalised software has all such strings stored separately from the program source code, making it possible for language translators to do the translation without having to worry about the syntax of the programming language used. Gettext is a framework for producing such internationalised software. Tools such as KBabel (kbabel.kde.org) and SuTra [4] provide a rich environment to undertake such translation work effectively.
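
As a minimal sketch of how this separation works in practice, here is the gettext pattern in Python; the domain name "myapp", the locale directory and the choice of Hindi are illustrative assumptions:

    import gettext

    # Load the Hindi message catalogue; fall back to the original
    # (English) strings if no translation has been installed.
    t = gettext.translation("myapp", localedir="locale",
                            languages=["hi"], fallback=True)
    _ = t.gettext

    # Every user-visible string is wrapped in _(); translators work
    # on the extracted .po catalogue, never on the source code.
    print(_("File saved successfully."))
    print(_("Do you want to overwrite the existing file?"))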

One can today find GNU/Linux desktops, office suites, browsers, etc. available in a number of Indian languages (see the BOSS distribution of GNU/Linux, for example, at www.bosslinux.in). However, these have not yet propagated to the relevant target user community. Though localisation is now a technologically well-understood domain, a number of challenges remain before it can solve the language-divide problem. These include choosing local-language terminology for computer jargon that is intelligible to common users, adopting metaphors that are more natural to these users, and so on.

Need for Reading-Literacy Level

Much of India is still illiterate at the reading/writing level. This means that even if the interface is in an Indian language, such users can neither read the screen nor enter their responses. The HCI of tomorrow needs to address these people. Speech synthesis and recognition are among the major options being explored. While both these related disciplines are quite old, the performance of speech recognition and synthesis systems today is still not very good. Synthesis systems still offer a largely robotic voice, with little ability to bring in emotional content (prosody), or to vary the pace and pitch depending on the nature of the content. One would like to see, for example, a difference in the way a normal diagnostic message and an error message are read out. On the recognition side, there are challenges of speaker independence, ambient noise handling, quality of speech, etc. While there has been reasonable progress in speech processing for English, there is relatively little for Indian languages. One needs to build good-quality speech corpora, language models, and linguistic resources like dictionaries for all languages. Speech recognition for Indian languages is an active research area.
Gesture recognition is another emerging option. In general, this refers to recognising gestures made by the user and understanding the message. Often, the term is used for stroke gestures that serve as a means of entering character-level input (as with a keyboard). The general gesture recognition problem is a challenging research problem, requiring one to define the alphabet of gestures and to build effective recognition systems.

Unfamiliarity with Devices

Keyboards and mice may appear natural to those who are used to them. But for occasional users and those who are not, these are cumbersome devices. As mentioned earlier, speech provides one option for minimising their use. For addressing the unfamiliarity issue, if one is comfortable with reading/writing skills, handwriting recognition would be a good alternative. Again, one can get good performance with languages like English, which have a small set of letters whose shapes do not change with context. Indian language scripts are comparatively more complex in shape, and in most cases, vowels modify the shape of the associated consonant quite significantly. This adds a high degree of complexity to Indian language rendering and handwriting recognition. This is another area of serious research [2]. We are able to get decent performance with the vowels and consonants independently; combining them is still a challenging research problem.

Unfamiliarity with the Syntax

Apart from the use of devices, one barrier in computer use is the typical metaphor of working with a computer. Consider the sequence of actions to create a document, play a media file, and so on. Today's GUIs have reduced many of these to a click on some icon. There are still many occasions where users are confused as to the right sequence of actions. A natural language interface, which can accept instructions or inputs in normal human language (as we would give to another human), is a possible remedy. But typing full sentences using a keyboard is often more cumbersome than learning to follow the simple command sequences. The ability to understand sentences in natural language, combined with speech recognition, offers a powerful alternative. Today this is a dream, given the state-of-the-art performance in speech recognition and natural language processing; but it shows promise if attempted as a combination of technology and engineering.
Intelligent context-sensitive help is another element in addressing this concern. Systems today can collect traces of your activity on the computer, and use them to predict your intention. Some of you may recall the Office Assistant introduced by Microsoft a few years back, which attempted this kind of task, but quite unsuccessfully. A good prediction can help you locate the right sequence and complete it quite easily, including identification of the right parameters (e.g., printer settings for printing a letter).
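
As a toy illustration of such trace-based prediction (only the general idea, not how any particular assistant is built; the action names are hypothetical), one can learn which action typically follows which from past traces:

    from collections import defaultdict, Counter

    class ActionPredictor:
        """Learn which action usually follows which, from past
        activity traces, and suggest the most likely next step."""

        def __init__(self):
            self.follows = defaultdict(Counter)

        def observe(self, trace):
            # Record every consecutive pair of actions in the trace.
            for prev, nxt in zip(trace, trace[1:]):
                self.follows[prev][nxt] += 1

        def suggest(self, last_action):
            counts = self.follows.get(last_action)
            return counts.most_common(1)[0][0] if counts else None

    predictor = ActionPredictor()
    predictor.observe(["open_document", "edit", "print_preview", "print"])
    predictor.observe(["open_document", "edit", "save"])
    print(predictor.suggest("print_preview"))   # -> print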

Changing Face of Computing

The picture of a powerful computer has changed drastically over the years, from huge boxes filling a large room to a small handheld device. Increasingly, mobile phones and PDAs are being used as the interface point to computing today, replacing desktops and laptops. While the computing power in them is increasing steadily, they often act simply as gateways to richer computing environments. When you use Google Docs (docs.google.com) to edit and share a document, you are using a computing environment outside your system to host your resources and even the software environment. Cloud computing is another example of this 'externalisation' of computing, with you needing to provide just a gateway. We are almost going back to the era of dumb terminals, with the computing and storage happening at a server elsewhere, and the terminal providing simple input/output mechanisms. Today the I/O requirements are high – speech based, gesture based, Indian language, and so on – and hence the terminals may not be all that dumb; but the approach is similar. Ubiquitous computing offers another major thread in this changing face. With computing everywhere, and tasks highly distributed, the requirements on the interfaces are changing drastically.
Thus, small hand-held devices are a major constituent of HCI research. These devices, given their memory constraints and small physical footprint, offer new challenges to HCI design. For example, large keyboards with many keys, mice, and large display areas to point-and-select are not possible. Again, there are various ways to address these problems. On one side, we look at devices such as virtual keyboards, over which you can type as if there were a real keyboard. On the other, we look at innovative interface and interaction mechanisms. The telephone keypad adapted to enter the full English alphabet (as in cellphones) is one such example. Think of how to enter Indian language text using such devices – there are interesting ideas being explored here. One idea, for example, allows you to enter the consonant from the keypad and indicate the following vowel with a stroke of the stylus on the touch screen.
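
To make the keypad example concrete, here is a minimal sketch of classic multi-tap decoding with the standard English key layout; an Indian-language adaptation would map keys to consonants and enter the vowel signs separately (for instance via stylus strokes, as described above):

    # Standard multi-tap layout: repeated presses of a key cycle
    # through the letters assigned to it.
    KEYPAD = {
        "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    }

    def decode_multitap(presses):
        """presses is a list of (key, count) pairs; ('4', 2) means
        key 4 pressed twice, i.e. the letter 'h'."""
        return "".join(KEYPAD[key][(count - 1) % len(KEYPAD[key])]
                       for key, count in presses)

    print(decode_multitap([("4", 2), ("4", 3)]))   # -> hi
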
Equally important are the ways in which information is organised on the limited screen space – the cost of "real estate" is much higher than on a desktop. Effective use of colour, high resolution and other visual metaphors to communicate information is one aspect of this concern.
The web is now emerging as the uniform base for interaction with a computer – whether a local desktop or a remote machine. Most applications today are operated through the browser. This trend is also a factor in interface design, providing a uniform convention and model for doing a variety of tasks on the machine, compared to the variety of applications from different sources, each using its own conventions and interfaces.

Physical Constraints and Devices

There is a growing concern across the world about the accessibility of computing. While language and literacy are two major reasons for computing being inaccessible to people, various types of disability are another. The Web Accessibility Initiative (www.w3.org/WAI) launched by the W3C has chosen "web for all" as its major driving theme. Partial or complete blindness makes the screen a meaningless device and metaphor for the user. Physical disabilities may make a person incapable of operating a keyboard or a mouse. Even the trembling of the hand caused by old age or weakness can make the mouse very hard to deal with. There are interesting alternative devices, operated with eye movement, head movement, feet, etc., that address these kinds of problems. While these may not cause changes in the basic software architecture, they bring in their own nuances and even opportunities. There are also switches one can turn on and off by blowing or sucking air. Recall, too, the machine used by Stephen Hawking, the famous physicist. These are relatively open and fertile areas at present.

Types of Users

As remarked earlier, a few clearly different classes of users are emerging in the computing space. At the lowest end are those who use a computer for one or two well-defined tasks – such as sending a mail, drafting a letter, or printing a document. They need a simple interface requiring minimal training, "obvious" action sequences, and simple and visible monitoring. This will be the largest segment of users as we go along, for whom computing is a productivity-enhancing tool like a diary or a cellphone. The next segment uses more sophisticated applications – graphical design studios, layout designers, non-IT researchers, and so on. These users work with computers more heavily, and may need to use more of the computer than the first group. They may create a large set of files, will need to deal with external storage devices and backups, and for them the cost of disasters is high. They need a richer set of tools, good organisation of the system, efficient management support, etc.
At the upper end of the user spectrum are hardcore system developers, writing serious software. Those involved in simple database-driven client-server applications, such as shop management or hotel reservation, can be placed a level lower than this group. Both these groups need good performance, use a variety of software – often expensive in CPU and memory – need to install and remove software, and may even change the system configuration from time to time. The uppermost level may also need to interface additional devices – say, barcode readers or laboratory equipment – and hence needs familiarity with, and access to, the system at a very deep level.
As one can guess, the HCI constraints of these users are very different. Unfortunately, all of them work with the same HCI, and this causes some of the difficulties of use. This needs to change.

Newer Metaphors

For a few decades, the notion of a desktop has driven the interface design of computers – from the concept of a neighbourhood to that of a trash can. With the changes in the computing landscape, this is no longer an obvious choice; a lot of our users may not understand the notion of a desktop, for example. The kind of 'surface computing' model introduced by Microsoft recently provides a richer interface scheme – though the theme is still that of a desktop. One can perform a lot of the common operations using just one's fingers, through touch and pressure variation.
Progress in the virtual reality area offers even more interesting ways to interact with computers. One can move from the 1-D world of a keyboard and the 2-D world of a computer screen into a 3-D world, with the associated degrees of freedom. An office or a house can now become the main interface, where you can wander from room to room in all three dimensions, providing newer ways of organising information and documents.

Enhancing Effectiveness of Existing Modalities

A number of ideas have been brought into user interfaces to reduce errors and improve speed even with existing modalities such as the keyboard and mouse. Today, no online form would require you to type in a country name or even a city name; these can easily be selected from a standard list of valid names. This reduces typing overhead and also typographical errors. Even numeric values such as percentages can be entered efficiently by dragging a slider or rotating a wheel. The system can also compute dependent parameters on the fly as you experiment with a parameter, providing additional feedback for making your selection. Imagine selecting R-G-B values to indicate your choice of colour without getting to see what the resulting colour looks like.
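
A small sketch of such on-the-fly feedback: as the user drags R-G-B sliders, the system can derive dependent values – here a hex code and a perceived-brightness estimate using the standard luma weights; the function itself is illustrative:

    def rgb_feedback(r, g, b):
        """Derive dependent values from slider positions so the user
        is not choosing a colour blind."""
        hex_code = f"#{r:02x}{g:02x}{b:02x}"
        # Standard luma approximation for perceived brightness (0-255).
        brightness = 0.299 * r + 0.587 * g + 0.114 * b
        return hex_code, round(brightness)

    print(rgb_feedback(200, 120, 40))   # -> ('#c87828', 135)
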
Dictionaries of common words, spell checkers and grammar checkers are also significant productivity enhancers. Many word processors provide auto-completion of a word based on the first few characters you type; this saves a lot of effort, particularly for long words. Systems like Anumaan (www.ossrc.org.in) take this further, to word-level prediction, and also personalise the prediction using documents the user has created earlier. These improve data entry speed and reduce errors in entering text.
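
A toy sketch of such personalised completion: rank candidate completions by how often the user has typed them in earlier documents. This captures only the general idea, not how Anumaan itself is implemented:

    import re
    from collections import Counter

    class WordCompleter:
        """Prefix completion ranked by the user's own word frequencies."""

        def __init__(self):
            self.freq = Counter()

        def learn(self, text):
            # Build a personal frequency table from earlier documents.
            self.freq.update(re.findall(r"[a-z']+", text.lower()))

        def complete(self, prefix, n=3):
            prefix = prefix.lower()
            matches = [w for w in self.freq if w.startswith(prefix)]
            return sorted(matches, key=self.freq.get, reverse=True)[:n]

    wc = WordCompleter()
    wc.learn("The localisation framework manages localisation catalogues.")
    print(wc.complete("loc"))   # -> ['localisation']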

Conclusion

As the previous section illustrated, there are a number of threads of activity influencing the human computer interaction metaphor of tomorrow. In addition to these, we are also concerned with raising the level of interaction between human and computer. The computer today is no longer an idiot savant, compared to a decade or two back; but it is still quite a way from offering the comfort of interacting with another human being. Artificial intelligence techniques are sure to play a role in this realm. One interesting thread is for the machine to be able to recognise the emotional state of the human user [3] and adapt its response suitably. Emotions are now recognised to be a key component of the human intelligence infrastructure, and hence the interest in equipping the machine with some capability to deal with this aspect. There are many challenges here in recognising the emotional state (without disturbing the state!) and modifying the course of action suitably. The affective computing initiative at the MIT Media Lab is an illustration of the trends in this direction.
A lot of the challenges facing HCI have no single optimal solution. Many of the technological areas concerned are far too complex for perfect solutions. Effective modelling of the domain, exploitation of the context, and some amount of engineering can often lead to effective solutions. For example, speech recognition is a difficult problem in general; but for a speech interface to the desktop, the possible utterances are limited, and this can be exploited to provide good-enough recognition performance.
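
A minimal sketch of that last point: with a small, fixed command set, even a noisy recognition result can be snapped to the nearest valid command. The command list and the use of plain string similarity are illustrative assumptions:

    import difflib

    # The desktop interface accepts only a handful of commands.
    COMMANDS = ["open file", "close file", "save file",
                "print document", "shut down"]

    def nearest_command(utterance, cutoff=0.6):
        """Snap a (possibly garbled) recognised utterance to the
        closest valid command, or None if nothing is close enough."""
        match = difflib.get_close_matches(utterance.lower(), COMMANDS,
                                          n=1, cutoff=cutoff)
        return match[0] if match else None

    # A noisy recogniser output still maps to the right command.
    print(nearest_command("pint documen"))   # -> print document
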
As computing changes its appearance in different ways – the user profile, the task profile, the infrastructure, and so on – HCI will be impacted heavily. What has been sketched in this paper is just some of the directions.

References

[1] M Sasikumar, Aparna R, Naveen K, Rajendra Prasad M. Guide to Software Localisation. IOSN, 2004. http://www.ossrc.org.in/resources/index.html
[2] Leena Ragha and M Sasikumar. Dynamic Preprocessing and Feature Analysis for Handwritten Character Recognition in Kannada. IEEE International Advanced Computing Conference, Patiala, March 2009.
[3] M Sasikumar and Preeti Khanna. Incorporating Emotion in HCI. Invited paper for the 'instructional conference on research trends in Information Technology', Kolhapur, March 2007.
[4] Aparna R, Sasikumar M, Santosh M. SuTra – an Intelligent Suggestive Translator Tool for Incremental Localisation. E-gov Magazine, September 2008.
