Computers are becoming increasingly advanced and accessible, and the world is slowly approaching the age of ubiquitous computing. Virtual Reality can completely occupy a user's attention within a computer-generated environment through the use of data gloves and head-mounted displays. However, these peripheral devices draw the user's attention away from the physical world. One way to cope with this drawback is to incorporate the virtual world into the physical world. In addition, efforts have been made to reduce the intrusion caused by such cumbersome devices, which are mostly heavy and expensive. Augmented Reality (AR) builds a bridge between the physical world of pens, people and paper, and the virtual world of pixels, programs and pointers, through cameras and projectors.
Augmented Reality (AR), also known as Mixed Reality, is a variation of Virtual Environments (VE), more commonly called Virtual Reality. Virtual Reality is a computer-generated 3D spatial environment in which the user participates in real time through dedicated hardware. Virtual Reality technologies provide a fully immersive experience that places the user entirely inside a synthetic environment. While immersed, the user does not see the real world around him: the immersive environment gives the impression of being in the synthetic world, even though the user is physically elsewhere in the real world. In contrast, AR allows the user to see the real world, with virtual objects superimposed upon or composited with it. AR therefore supplements reality rather than completely replacing it.
Augmented Reality is a perceptual space in which virtual information, such as text or objects, is merged with the user's actual view of the surrounding environment. AR is thus an environment that combines virtual content with real-world elements. Some researchers define AR as requiring the use of Head-Mounted Displays (HMDs). To avoid limiting AR to specific technologies, Azuma (1997) defined AR as any system that possesses three characteristics: it combines the real and the virtual, it is interactive in real time, and it is registered in 3-D. Figure 1.1 shows an example of an augmented reality application, named Magic Book, in which a planet together with its orbit is displayed as a virtual object on top of a physical book.
Figure 1.1: Magic Book (Source: http://arblog.inglobetechnologies.com)
Generally, AR can be classified into two categories: marker-based AR and marker-less AR. The two approaches have complementary strengths and weaknesses. Marker-based AR has a superior recognition rate, but the visible marker reduces immersion. Marker-less AR, on the other hand, is valued for its high immersion, yet suffers from a poorer recognition rate.
One of the most important issues in an augmented reality system is the interface it provides. According to the work by Wanderley et al. (2006), interfaces can be grouped into the following three categories:
1. Classical Interfaces usually utilize tracking devices such as gloves, wands and infrared sensors. ARQuake and Human Pacman are typical examples that use wearable devices such as Head-Mounted Displays (HMDs) and GPS-tracked laptops. Classical interfaces give users a practical and direct way of interacting. However, besides being expensive, such interfaces are intrusive and less practical, since cables and sensors must be carried or worn by the user.
2. Tangible Interfaces use graspable physical objects to control digital models. Virtual objects are manipulated through a one-to-one mapping to physical metaphors or background objects. Through tangible interfaces, users can feel the touch, weight and control of tangible objects. MulTetris is one example that uses a graspable interface to manipulate bricks in the traditional Tetris game. Further work has extended the use of markers: as markers are attached to background objects, virtual objects can be displayed on top of these markers through vision-based tracking. Examples include Magic Book, an augmented reality book displaying virtual graphics on real physical pages, and the Harbour Game, an urban planning game for harbour areas played on a physical board with markers. Tangible interfaces provide user-friendly interaction and therefore suit users with little skill in operating computers. However, the physical metaphors may not be satisfactorily natural, and the markers must be specially designed for calibration and tracking purposes.
3. Bare-hand Interfaces use the user's hands for interaction. As hands are commonly used in real life, bare-hand interfaces are more natural than classical and tangible interfaces. Interaction is achieved by detecting the user's bare hands; no auxiliary devices are required. This benefits users, as bare-hand interfaces give simpler and more natural control over the interaction, and no pre-designed markers are necessary.
A variety of interaction techniques have been presented, including the use of tracked objects (Kiyokawa 1999), mouse and keyboard input, and pen and tablet (Schmalstieg 2000). Compared with hand interaction, however, these techniques are less natural to use. Since hands are our main means of interacting with objects in real life, AR interfaces should adopt hand interaction to manipulate virtual objects by detecting the user's fingertips, without any worn devices. In this project we develop an augmented reality system that uses hand interaction to manipulate 3D virtual objects through finger tracking.
The technologies for hand and finger-based interfaces can be roughly split into two categories: sensing-based and computer-vision-based. Sensing-based systems are very robust, yet unable to recognize hands or other physical objects that come into view, and are hence often limited to detecting only a "touch" behaviour. Computer-vision-based systems, on the other hand, are often limited by lighting conditions and may not respond well to sudden changes in the field of view.
1.2 Problem Background
To achieve more natural manipulation for transformation operations such as scaling and rotation, the idea of two-handed interaction was first established in a study of 2D environments by Buxton and Myers (1986). Using two hands for manipulation and interaction enhances both performance and accuracy.
Wellner (1992, 1993) was the first to use the term 'DigitalDesk'. The word 'digital' does not only imply that the desk is digital; it also draws out the idea of the finger, the 'digit', in the DigitalDesk. Applications of the DigitalDesk include the Calculator, PaperPaint and the Double DigitalDesk, to name a few. Nevertheless, Wellner's systems were slow and served mainly to demonstrate the concepts of the DigitalDesk. Later, building on the idea of the DigitalDesk, Crowley and Coutaz (1995) refined the technology, in particular the requirement that the user's hand should act as a mouse. They addressed the problem of tracking a fingertip in real time over a cluttered background, with 'clicking' and 'dragging' operations simulated by the space bar on a keyboard, and demonstrated their program in an application called FingerPaint. However, the tracking performance was reported as unsatisfactory with respect to Fitts' Law, which models "fast, aimed movement" well but not drawing actions where motion is constrained. As these attempts concentrated more on the vision aspect of tracking general objects than on exploiting what is already known about the user's hand, further research was carried out by Brown and Thomas (2000) to track the user's fingertip as fast as possible in real time, so that the system could be compared with other input devices using models such as Fitts' Law.
Crowley, Bérard and Coutaz (1995) presented one of the first approaches to finger tracking in AR, using a camera and projector mounted on the ceiling. A common approach to finger tracking is marker-based: gloves with embedded retro-reflective markers are used in many augmented reality applications for high accuracy and reliability, as in Dorfmüller-Ulhaas and Schmalstieg (2001). Zeng et al. (1997) used a simple colour patch to keep the fingers trackable as they moved over other parts of the body. The main drawback of the marker-based approach is the need to carry or wear particular devices as aids.
The Tinmith-Metro application (1998) by Piekarski and Thomas extended earlier image-plane techniques (Pierce, Forsberg, Conway, Hong, Zeleznik and Mine 1997) to support object selection and manipulation, such as translation, rotation and scaling, in mobile augmented reality. Tinmith-Metro continued the study of outdoor augmented reality by demonstrating the capture and creation of 3D geometry outdoors in real time. A vision-based tracking technique was implemented for a 3D cursor (Kato and Billinghurst 1999) by placing fiducial markers on the tips of the thumbs; the interface could perform selection, manipulation and creation operations by pointing into the virtual environment.
Dorfmüller-Ulhaas and Schmalstieg (2001) attached markers to the index finger to accomplish interaction in an AR board game. The finger tracking enabled the user to grab, translate, rotate and release objects in a natural way. In another study, Koike, Sato and Kobayashi (2001) implemented interaction with an augmented desk environment through automatic finger tracking.
Von Hardenberg and Bérard (2001) introduced a finger-tracking algorithm using a single camera. In their approach, the user's hand is easily differentiated from the static background by smart image differencing; fingertips are then detected for interaction.
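A generic image-differencing hand segmentation can be sketched as follows. This is a minimal illustration of the general idea, not von Hardenberg and Bérard's actual algorithm; the threshold and blur values are assumptions chosen for illustration.

```cpp
#include <opencv2/opencv.hpp>

// Separate the hand from a static background by differencing the live
// frame against a stored reference frame of the empty scene.
cv::Mat segmentHand(const cv::Mat& frame, const cv::Mat& background) {
    cv::Mat diff, gray, mask;
    cv::absdiff(frame, background, diff);                    // per-pixel difference
    cv::cvtColor(diff, gray, cv::COLOR_BGR2GRAY);
    cv::threshold(gray, mask, 30, 255, cv::THRESH_BINARY);   // assumed threshold
    cv::medianBlur(mask, mask, 5);                           // suppress speckle noise
    return mask;  // white where the scene changed, i.e. where the hand is
}
```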
Several studies use computer vision techniques for natural hand tracking. Piekarski and Thomas (2002) described a hand tracker based on ARToolKit for mobile outdoor environments. Their work provided a way to interact with augmented objects using hand gestures, but custom glove hardware had to be worn.
An ARToolKit fiducial marker was attached to the thumb and each finger of the glove, with metallic sensors on the fingertips, thumb and palm to detect finger presses. The wearable system captured video using a camera mounted on the user's head and passed it to ARToolKit to track the user's thumbs in world coordinates.
HandVu, adopted by Kölsch, Turk and Höllerer (2004), is a computer vision module for gesture recognition in an AR interface that detects the user's hand in a standard pose based on colour and texture. The hand is then tracked by their "flock-of-features" tracker, which is robust to appearance changes. A weakness nevertheless lies in the output: the user's hand location in 2D image coordinates cannot easily be used to manipulate virtual objects in 3D space.
Lee and Höllerer (2007) presented Handy AR, which differs from previous work in that it used a human hand as the tracking pattern for displaying augmented virtual objects. In an offline calibration phase, each fingertip position was measured in the presence of ground-truth scale information to build a hand pose model. The camera pose relative to the hand was then reconstructed in real time, and a virtual object was augmented on top of the user's hand in 3D based on that pose. Handy AR does not concentrate on interaction with virtual objects, but rather on using the user's hand for marker-free augmentation.
While a number of AR interfaces use computer-vision-based hand tracking, few support 3D natural hand interaction, and little research has been done on multimodal input for AR interfaces. Lee, Green and Billinghurst (2008) therefore presented a detailed study on the usability and usefulness of finger tracking for AR environments using head-mounted displays (HMDs). They developed a computer-vision-based 3D hand tracking system for a multimodal Augmented Reality interface, with a 3D vision-based natural hand interaction method supporting finger pointing and direct touch of augmented objects. The interface consists of four steps: (1) skin colour segmentation, (2) feature point finding, (3) hand direction calculation, and (4) simple collision detection based on a short finger ray for interaction between the user's hand and augmented objects. The resulting fingertip tracking accuracy varied from 3 mm to 20 mm depending on the distance between the user's hand and the stereo camera. To give a better user experience, this hand tracking was applied in three AR applications that combine gesture and speech input.
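The first two steps of such a pipeline can be sketched with standard computer vision operations. The following is a minimal OpenCV sketch, not Lee et al.'s implementation; the YCrCb skin thresholds and the camera index are illustrative assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::VideoCapture cap(0);                    // assumed default camera
    cv::Mat frame, ycrcb, mask;
    while (cap.read(frame)) {
        // (1) Skin colour segmentation in YCrCb space.
        cv::cvtColor(frame, ycrcb, cv::COLOR_BGR2YCrCb);
        cv::inRange(ycrcb, cv::Scalar(0, 133, 77),
                    cv::Scalar(255, 173, 127), mask);
        cv::morphologyEx(mask, mask, cv::MORPH_OPEN, cv::Mat());

        // (2) Feature point finding: take the largest skin contour and
        // use its convex hull vertices as fingertip candidates.
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL,
                         cv::CHAIN_APPROX_SIMPLE);
        if (!contours.empty()) {
            std::size_t big = 0;
            for (std::size_t i = 1; i < contours.size(); ++i)
                if (cv::contourArea(contours[i]) > cv::contourArea(contours[big]))
                    big = i;
            std::vector<cv::Point> hull;
            cv::convexHull(contours[big], hull);
            for (const cv::Point& p : hull)
                cv::circle(frame, p, 6, cv::Scalar(0, 255, 0), 2);
        }
        cv::imshow("fingertip candidates", frame);
        if (cv::waitKey(1) == 27) break;        // Esc to quit
    }
    return 0;
}
```

Steps (3) and (4) would follow by, for example, taking the hand direction from the contour's image moments and intersecting a short ray from the fingertip with the augmented object's bounding volume.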
To provide robust real-time hand and finger tracking in the presence of rapid hand movements and without an initial setup stage, Peng Song, Hang Yu and Stefan Winkler (2009) proposed an improved finger tracking method, building on von Hardenberg's earlier fingertip shape detection with greater robustness and accuracy. They proposed a 3D finger-based interaction interface for mixed reality games with physical simulation, in which all operations are performed by finger gestures using an improved vision-based finger tracking technique. The interface uses a stereo camera to track the 3D location and direction of the user's fingers robustly and accurately. To add physical realism to the manipulation of virtual objects, a physics engine was integrated into the system to manage the physics-based interactions: when the user manipulated a virtual object with a finger, physical characteristics of the objects such as gravity and collision effects were simulated by the physics engine.
Visual detection of fingertips has been widely applied in augmented reality. A commonly used methodology is to define a simple fingertip model and compare it against image patches in a running window. This can be done in two ways. In the first, the model describes the shape and is applied to the binary masks obtained from a preceding hand segmentation step; in the second, the model describes the appearance, for instance the colour of the fingertips, and is applied directly to the original colour images. Circular shape masks encoding the half-circular shape of fingertips have been proposed by Oka, Sato and Koike (2002), Letessier and Bérard (2004), and Song, Yu and Winkler (2008).
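The shape-based variant can be sketched as a template match over the binary hand mask. The sketch below is a generic stand-in for the cited methods, not any one author's implementation; the fingertip radius is an illustrative assumption.

```cpp
#include <opencv2/opencv.hpp>

// Slide a circular fingertip template over a binary hand mask and
// return the centre of the best-matching window.
cv::Point detectFingertip(const cv::Mat& mask, int radius = 10) {
    // Template: a filled disc in the centre of an otherwise empty
    // window. With normalised cross-correlation, windows whose
    // surroundings are also filled (e.g. the palm interior) score lower
    // than isolated disc-shaped fingertips.
    int win = 4 * radius;
    cv::Mat templ = cv::Mat::zeros(win, win, CV_8UC1);
    cv::circle(templ, cv::Point(win / 2, win / 2), radius,
               cv::Scalar(255), -1);

    cv::Mat score;
    cv::matchTemplate(mask, templ, score, cv::TM_CCORR_NORMED);

    double maxVal;
    cv::Point maxLoc;
    cv::minMaxLoc(score, nullptr, &maxVal, nullptr, &maxLoc);
    return maxLoc + cv::Point(win / 2, win / 2);  // window centre in mask coordinates
}
```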
Yin and Xie (2007) searched for plausible finger positions by analysing the binary profile of circles of varying radius around the centre of mass of the segmented hand. Cinque et al. (2008) adopted a hybrid approach that combines an elongated shape model with a colour-based appearance model. As alternatives to running-window detection, border tracing with detection of local curvature maxima has been proposed by O'Hagan, Zelinsky and Rougeaux (2002), Malik and Laszlo (2004), Ravikiran et al. (2009), and Mustafa and Venkatesh (2011), as well as skeletonization (Dawod, Abdullah and Alam 2010; An, Min and Hong 2011).
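Curvature-based border tracing can be sketched with the classic k-curvature test, given below as a generic illustration of this family of methods rather than any cited author's algorithm; the neighbourhood size k and the angle threshold are assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Mark contour points where the traced border turns sharply; such
// curvature maxima are fingertip candidates.
std::vector<cv::Point> fingertipsByCurvature(const std::vector<cv::Point>& contour,
                                             int k = 20, double maxAngleDeg = 60.0) {
    std::vector<cv::Point> tips;
    int n = static_cast<int>(contour.size());
    if (n < 2 * k + 1) return tips;
    for (int i = 0; i < n; ++i) {
        cv::Point p = contour[i];
        cv::Point a = contour[(i - k + n) % n] - p;  // vector to an earlier border point
        cv::Point b = contour[(i + k) % n] - p;      // vector to a later border point
        double dot   = a.dot(b);
        double cross = static_cast<double>(a.x) * b.y - static_cast<double>(a.y) * b.x;
        double angle = std::abs(std::atan2(cross, dot)) * 180.0 / CV_PI;
        if (angle < maxAngleDeg)                     // sharp turn: curvature maximum
            tips.push_back(p);
    }
    // A fuller implementation would also test the sign of the cross
    // product to separate convex peaks (fingertips) from concave valleys
    // between fingers, and cluster neighbouring candidates.
    return tips;
}
```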
An et al. (2011) detected fingertip gestures using the rear-facing camera of mobile phones. The finger was placed close to the camera so that only its upper part was visible; the finger was therefore much larger in the image, which made the detection task substantially easier.
Hurst and van Wezel (2012) evaluated the usability of potential gesture-based interaction techniques in mobile Augmented Reality applications. Coloured markers were attached to the user's fingertips to facilitate the detection.
A related marker-based prototype for mobile gesture detection was introduced in MIT's SixthSense project by Mistry, Maes and Chang (2009) for interacting with content projected by a wearable pico projector. SixthSense, invented by Pranav Mistry of the MIT Media Lab, is a wearable gestural interface that augments the physical world with digital information and enables users to interact with that information through natural hand gestures.
1.3 Problem Statement
Traditional keyboard and mouse input devices are less natural as interaction tools in an Augmented Reality system. As the hand provides natural interaction in real life, hand interaction is highly desirable in an AR system. Moreover, compared with interaction approaches such as keyboard and mouse, the hand offers easier, simpler and more direct control when manipulating virtual objects in AR. However, hand interaction is not straightforward, as it involves not only the hand but also fingertips and gestures. The bottleneck of this project is loading more than one marker into ARToolKit, the software library used for building the AR system. Furthermore, the mapping between markers and 3D virtual objects is not an easy task, since more than one finger is used in this project.
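Loading several markers amounts to registering one trained pattern per fingertip and matching detected markers against those patterns every frame. The following is a minimal sketch using the classic ARToolKit C API under assumed pattern file names and marker width; it is not the project's actual code.

```cpp
#include <AR/ar.h>

static int patt_thumb_id, patt_index_id;

// Register one trained pattern file per fingertip marker.
void loadPatterns(void) {
    patt_thumb_id = arLoadPatt("Data/patt.thumb");    // assumed file names
    patt_index_id = arLoadPatt("Data/patt.index");
}

// Detect markers in the current video frame and recover the pose of
// each fingertip marker relative to the camera.
void findFingerMarkers(ARUint8* image, int thresh,
                       double thumb_trans[3][4], double index_trans[3][4]) {
    ARMarkerInfo* marker_info;
    int marker_num;
    double centre[2] = {0.0, 0.0};
    double width = 40.0;                              // assumed marker width in mm

    arDetectMarker(image, thresh, &marker_info, &marker_num);
    for (int j = 0; j < marker_num; j++) {
        if (marker_info[j].id == patt_thumb_id)
            arGetTransMat(&marker_info[j], centre, width, thumb_trans);
        else if (marker_info[j].id == patt_index_id)
            arGetTransMat(&marker_info[j], centre, width, index_trans);
    }
}
```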
1.4 Aim
The main goal of this project is to manipulate 3D objects by using fingertips in an Augmented Reality (AR) system.
1.5 Objectives
The objectives of this project are:
To develop a 3D model.
To create interaction between fingertips and 3D objects using fingertip interaction techniques.
1.6 Scope
The scope of this project includes:
1. Modelling of a 3D object.
2. ARToolKit is the software library used for building this Augmented Reality (AR) system.
3. Markers are used for finger tracking.
4. Two fingers are used in the interaction with the 3D object, as illustrated in the sketch after the limitations below.
The limitations of this project include:
1. No lighting is applied in this system.
2. No texture mapping is applied in this system.
3. The 3D object would be a simple object instead of a complex object.
4. No shadow is applied in this system.
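Given the poses of the two fingertip markers, a simple manipulation can be derived from the distance between them, in the spirit of a pinch gesture. The sketch below is illustrative only; the rest distance and the linear scale mapping are assumptions.

```cpp
#include <cmath>

// Map the separation of the two fingertip markers to a scale factor for
// the 3D object. Column 3 of each 3x4 transform from arGetTransMat holds
// the marker origin in camera coordinates (millimetres).
double pinchScale(const double thumb_trans[3][4],
                  const double index_trans[3][4],
                  double restDistanceMm = 80.0) {     // assumed rest separation
    double dx = thumb_trans[0][3] - index_trans[0][3];
    double dy = thumb_trans[1][3] - index_trans[1][3];
    double dz = thumb_trans[2][3] - index_trans[2][3];
    double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    // Fingers at the rest distance give scale 1.0; moving them closer
    // shrinks the object, moving them apart enlarges it.
    return dist / restDistanceMm;
}
```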
1.7 Justification
Augmented Reality enhances the user's perception of, and interaction with, the real world. Whereas Virtual Reality immerses the user in a synthetic world, Augmented Reality connects the virtual and real worlds by incorporating virtual content into the real environment.
An interactive augmented reality system provides more natural control through hand and finger tracking, and the interaction is simple as well as natural. Besides enabling natural and intuitive interaction with virtual objects, it eases the transition between interacting with real and virtual objects at the same time. Some augmented reality systems also mix the physical world with digital information and let users apply natural hand gestures to that information, making everyday use easier and pushing computer technology forward.
An interactive augmented reality system improves realism and the sense of immersion compared with methods such as traditional keyboard and mouse input. Users are able to experience more immersive interaction with virtual objects in a real environment.