This literature review discusses advances in 3D vision technologies in the areas of active and passive range 3D modeling techniques. Laser scanner based approaches and structured light based approaches are discussed under active range techniques. Stereo image based modeling, photometric stereo based modeling and silhouette based modeling are discussed under passive range modeling techniques. Several recent research articles are examined and the approaches presented in them are discussed under each subtopic. Finally, the pros and cons of the different approaches are presented and compared to highlight their strengths and weaknesses.
\end{abstract}
\chapter{Introduction}
The purpose of this report is to present the results of a literature survey on advances in 3-dimensional computer vision technologies. The survey focuses mainly on generating 3D models using computer vision systems rather than on processing scenery to extract information.
\section{Vision}
Vision is the primary medium of interaction for most autonomous systems. Vision systems may deal with pattern and object recognition as well as providing a detailed description of spatial data and orientation.
\subsection {Biological Vision}
The most sophisticated autonomous systems we are familiar with are still humans and animals. A biological system typically consists of one dominant vision system and a few less accurate secondary systems. For example, the main vision system of humans is the eyes, which are sensitive to light, while the ears act as a less accurate secondary sensing system based on sound. Most biological systems use light as the signal carrier in their main vision system. A few biological vision systems use sound instead, and no species is known to use heat or another form of energy as the input for its primary vision system. Vision systems are significant because they are directly sensitive to some form of energy.
\subsection {Computer Vision}
Computer vision deals with recognizing objects, locating objects, tracking object motion and recognizing actions. Vision systems can be categorized into two types based on how they model the environment. Systems that represent the environment as a 2-dimensional array are called 2D vision systems. In 2D systems, data related to the depth of objects in a scene is neglected; objects are modeled in a 2-dimensional plane where location can only be identified using x and y coordinates. The other type is 3D vision systems, in which data on the depth and location of objects is preserved or regenerated. The main focus of this report is 3D vision and its advances.
\subsection {3-Dimensional Vision}
3D vision is still a very challenging area in computer science. A 3D vision system should be able to provide height, width and depth data to its user. The user can be a human or software that processes the data to extract information from the scenery. Accurate calculation of the distance and depth of an object in focus is essential in many advanced applications of computer science. Robotics, augmented reality and security systems are a few examples of areas that need 3D vision. Robotics has a vast range of uses, from household work to military activities; an accurate 3D vision system allows robots to move without colliding in real environments while other objects exist autonomously in the same environment. Augmented reality applications use 3D vision to simulate scenarios with military and public safety value. The main target of computer vision research so far has been to develop a mechanical vision system with the capabilities of a biological one.
\section {3-Dimensional Vision Systems}
3D vision systems can be categorized into two types according to how they capture their environment.
\subsection {Active Vision Systems}
Active vision systems emit some kind of ray to detect objects in their environment. Radar and laser systems are examples of active mechanical vision systems, while creatures such as bats use active biological vision systems. These systems calculate depth data mainly from the time taken by a ray to travel to an object, bounce off it and return to its origin.
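As a simple illustration (a generic relation rather than a formula taken from any particular system reviewed here), the time-of-flight principle these systems rely on can be written as
\begin{equation}
d = \frac{c \cdot \Delta t}{2}
\end{equation}
where $d$ is the distance to the object, $c$ is the propagation speed of the emitted signal (the speed of light for laser and radar, the speed of sound for sonar) and $\Delta t$ is the measured time between emission and reception of the ray.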
In this type of system, rays that do not find their way back to the origin, due to deflection or dissipation while travelling, are a point of concern.
Another concern is the power that must be supplied to the system in order to produce a sufficiently strong ray, which can limit the mobility of the system.
\subsection {Passive Vision Systems}
Passive vision systems are based on rays emitted or reflected by the objects in the environment. All photo and video cameras are passive vision systems, and most species, including humans, use biological passive vision systems. Mechanical passive vision systems mainly use 2D images to reconstruct a 3D model of the environment. This is commonly done using stereo images: algorithms and prior knowledge are used to combine two precisely taken images into a 3D model of the object that the images represent.
In these systems, precise calibration of the cameras is very important, as it is essential for developing an accurate 3D model, and some of the latest research addresses camera calibration problems.
The speed and efficiency of the algorithms used are also very important for real-time use of the vision system.
\section {Target of the Report}
This report approaches the topic along two paths, analyzing active and passive vision systems separately. It contains extensive discussions of both types of vision systems, and the pros and cons of each system and of the methods used within them are presented.
\chapter {Active Range 3D Modeling Techniques}
\section {Laser Scanner based approaches to 3D modeling}
Generating 3D models with a laser scanner based system has two main phases after the range images are obtained; as stated in [1], they are the following (a minimal illustrative sketch of these two phases is given after the list):
1. Registration - aligning two range images that have captured overlapping parts of a scene, so that their common regions coincide in the final image.
2. Integration - the process of creating a single 3D model from the sample points of several range images.
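As a minimal illustrative sketch of these two phases (the function name, transform notation and use of Python/NumPy are assumptions for illustration, not the actual implementation of [1]), registration can be thought of as estimating a rigid transform between two range images, and integration as merging the transformed sample points into one model:
\begin{verbatim}
import numpy as np

def register_and_integrate(points_a, points_b, R, t):
    """Illustrative sketch of the registration/integration phases.

    points_a, points_b : (N, 3) and (M, 3) arrays of 3D sample points
                         from two overlapping range images.
    R, t               : rigid transform (3x3 rotation, translation
                         vector) assumed to have been estimated by a
                         registration method such as the distance
                         minimization described in [1].
    """
    # Registration: bring the second range image into the frame of
    # the first using the estimated rigid transform.
    points_b_aligned = points_b @ R.T + t

    # Integration: combine sample points from both views into a single
    # model.  Real systems merge overlapping surface descriptions
    # rather than simply concatenating points.
    return np.vstack([points_a, points_b_aligned])
\end{verbatim}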
After these two phases are complete, a 3D model that provides a comprehensive idea of the structure of the scene is available. What the model lacks is data on the texture of the objects in the scene, and new research is under way on applying texture to laser-generated 3D models.
The first focus is a system that models the 3D structure of an indoor scene with a laser scanner, proposed by Victor et al. [1]. The data acquired by the laser scanner is first converted into a scanner-independent format. These range data are used to extract surface data, which is approximated by a 3D triangular surface mesh. Because the acquired data is used to create a 3D model, high accuracy in edge localization and classification is needed when geometric information is extracted. To fulfil this need, an edge based method is used rather than a region based method, since region based methods are poor at edge localization and classification. The image plane is then triangulated, and the resulting 2D triangular tessellation is back-projected into 3D space using the edges detected in the previous phase. The problem of creating the complete 3D model is then addressed. For image registration, a method based on distance minimization is used, as higher accuracy is expected from it. First, two matching points are approximated under the assumption that the two images being processed have a partially overlapping area; the two images are then processed further to find an exact match based on this initial approximation. A contour-propagation based integration method is proposed in the system: surface descriptions of the registered images are developed, and the bounding contours of the first image are moved to the second image using the 3D rigid transform obtained during the registration process.
The significance of the registration method used in the proposed system is that an initial approximation is made of the two points that are going to be matched. This increases the efficiency of the system when large images are processed, because exhaustive matching is eliminated. Secondly, the integration method proposed in the paper is deliberately not carried out at pixel level or surface patch level, for the following reasons.
1. Integration at pixel level results in loss of image structure.
2. If performed at surface patch level, it is difficult to find identical surface patches in two neighboring views.
The quality of the generated 3D model depends on:
1. Precision of the extracted features
2. Quality of the registration between the two images
3. Sensor resolution
4. Influence of the geometrical nature of the scene
5. Quality of the extracted features
Further research has been conducted by Vitor and Joao [2] on applying texture to 3D models generated by a laser scanner. Digital images of the scene are combined with the laser-generated 3D model to achieve this objective. The scene is first scanned from multiple viewpoints so that occlusions can be removed and high resolution images can be produced when the 3D model is constructed. This is done with a laser scanner and a digital camera mounted on a rover robot, which is maneuvered across the environment to be modeled. The 3D model is reconstructed from the range images and reflectance images of the laser scanner using a novel algorithm based on polygonal meshes. The proposed algorithm uses edge information to reduce noise and to model depth and orientation discontinuities. The digital image and the reflectance image are mapped to each other to produce the textured model. An iterative registration process is needed in which the digital image is rescaled to fit the reflectance image; only the digital image is rescaled, because the reflectance and range images are already registered perfectly, since they are generated from the same laser pulse. The iterative approach is chosen to reduce unsatisfactory calibration results, as the digital and laser images are not taken from the same point. An objective function based on occlusion analysis is proposed for perception planning, which accounts for a quality criterion and the cost of the next acquisition. The perception planning approach enables the system to be used for scanning large spaces with eight degrees of freedom.
Pre-calibration or a static arrangement is not required by the system, and it operates on data gathered in a single session. The system can be used with all common range scenes and digital videos. Higher quality modeling with multiple digital images, extension of the perception planning technique and extraction of structural data from heritage sites are stated as future developments of the system.
Further research, similar to the above, has been conducted to generate textured 3D models from data taken by Unmanned Aerial Vehicles (UAVs). Research on creating 3D models of an area using a small laser scanner and a Charge-Coupled Device (CCD) sensor mounted on an unmanned helicopter was conducted by Masahiko et al. [3]. An Inertial Measurement Unit (IMU) and a Global Positioning System (GPS) were additionally required by the system. Base-station GPS data, remote-station GPS data, IMU data, CCD images and laser range data are among the data gathered for 3D modeling. Kinematic GPS post-processing is conducted and the output is integrated with the IMU data using a Kalman filter. Next, bundle block adjustment of the CCD images is performed with the support of the GPS/IMU data. Finally, the GPS/IMU data and the processed CCD images are combined to generate high precision, time-series position and attitude. The CCD images of the area are intentionally taken with 60\% overlap in the forward direction and 30\% overlap to the sides so that the bundle block adjustment produces output with higher accuracy and reliability. The coordinates of the laser scanner are converted to the digital camera's (CCD image sensor's) coordinate system using geo-referencing, and this system is assumed to be the global coordinate system. The resulting hybrid model is then used to map the laser range data and produce a digital surface model.
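The GPS/IMU integration step can be illustrated with a heavily simplified, one-dimensional Kalman filter sketch. The filter in [3] operates on full 3D position and attitude; the function name, state model and noise values below are assumptions made purely for illustration:
\begin{verbatim}
import numpy as np

def kalman_fuse(gps_positions, imu_accels, dt, gps_var=1.0, accel_var=0.1):
    """One-dimensional sketch of GPS/IMU fusion with a Kalman filter."""
    x = np.zeros(2)                          # state: [position, velocity]
    P = np.eye(2)                            # state covariance
    F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity model
    B = np.array([0.5 * dt**2, dt])          # acceleration (IMU) input
    H = np.array([[1.0, 0.0]])               # GPS measures position only
    Q = accel_var * np.outer(B, B)           # process noise
    R = np.array([[gps_var]])                # GPS measurement noise

    estimates = []
    for z, a in zip(gps_positions, imu_accels):
        # Predict the state using the IMU acceleration.
        x = F @ x + B * a
        P = F @ P @ F.T + Q
        # Correct the prediction with the GPS position measurement.
        y = z - H @ x                        # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return estimates
\end{verbatim}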
Computational overhead is reduced considerably by assuming the digital camera's coordinate system to be the global coordinate system. Efficiency could be further improved if the overlap percentage of the CCD images could be reduced without compromising the accuracy and reliability of the bundle block adjustment results.
Most digital-laser hybrid modeling systems face the problem of efficiently registering two digital images, and every system is designed to reduce processing time by carrying out some sort of initial approximation. The vision-aided, laser scanner based registration system proposed by Henrik and Achim [4] is designed to address this using SIFT features. First, SIFT features are extracted from an image and compared with those of a previously taken image to find matching points. Multiple match candidates are usually found when SIFT features are matched using Euclidean distance; this problem is solved by accepting a point as a match only if its smallest distance is sufficiently smaller than the second smallest distance. Depth values for all the matching points are then estimated using the closest projected 3D point in the laser range image, and the relative pose is estimated using the pairs of matching 3D points found in the previous step.
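A sketch of this nearest/second-nearest matching criterion, written with OpenCV purely for illustration (the ratio value and the specific library calls are assumptions, not the implementation of [4]), could look as follows:
\begin{verbatim}
import cv2

def match_sift(img1, img2, ratio=0.8):
    """Sketch of SIFT matching with a nearest/second-nearest test."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match each descriptor to its two nearest neighbours
    # (Euclidean distance between descriptor vectors).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(des1, des2, k=2)

    # Keep a match only when the best distance is clearly smaller than
    # the second best, which removes most ambiguous matches.
    good = [m for m, n in candidates if m.distance < ratio * n.distance]
    return kp1, kp2, good
\end{verbatim}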
Image registration is done using visual features rather than distance, which eliminates the need for an external initial pose estimate; the system can therefore be used even when an initial pose estimate is not available, since it carries out the initial pose estimation itself using SIFT features. Registration is done only on visually matching points, which in turn increases robustness against changes in the environment and the accuracy of the output.
\section {Structured light based approaches to 3D modeling}
The basic functionality of any structured light 3D modeling system can be abstracted as follows.
A coded light stripe is projected onto the object to be modeled. When the stripe falls on the object, its reflection forms a pattern that is specific to the geometrical shape of the object. This reflection is photographed by a camera and processed to obtain 3D geometrical data about the object. Obtaining the 3D data is done by determining the correspondence between points on the light plane (the light stripe) and the camera plane (the image taken by the camera). Usually multiple light stripes are used to achieve faster data acquisition rates.
\begin{figure}[h]
\includegraphics{fig1.jpg}
\caption{Structured Light Methodology}
\end{figure}
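A minimal sketch of how a 3D point can be recovered from such a correspondence, assuming a pinhole camera and a calibrated light plane (a generic formulation and hypothetical parameter names, not taken from any specific paper reviewed here), is given below:
\begin{verbatim}
import numpy as np

def intersect_ray_with_light_plane(pixel, K, plane_n, plane_d):
    """Sketch of recovering a 3D point in structured-light scanning.

    pixel   : (u, v) image coordinates of a point on a detected stripe.
    K       : 3x3 camera intrinsic matrix (camera at the origin).
    plane_n : normal of the calibrated light plane for that stripe.
    plane_d : plane offset, so points X on the plane satisfy
              plane_n . X + plane_d = 0.
    """
    u, v = pixel
    # Back-project the pixel into a viewing ray through the camera centre.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Intersect the ray X = s * ray with the light plane.
    s = -plane_d / (plane_n @ ray)
    return s * ray
\end{verbatim}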
Research on structured light is mainly focused on improving accuracy, calibration and acquisition speed.
Olaf and Szymon [5] have carried out research on improving accuracy and on using structured light based 3D modeling in real-time systems. A light pattern that varies in both time and space is described as the optimal solution for the system they introduce. 111 light stripes (giving 110 stripe boundaries) are aligned next to each other (the space-varying quality) and projected onto the object; one such pattern is called a frame. Four different frames are used and they are projected at different times (the time-varying quality). A black and white (1, 0) color model is used for the stripes, and any stripe boundary can be identified by the binary value assigned according to its appearance over time and space (e.g. 0000 for all white, 1111 for all black). Stripe boundaries that separate two stripes of the same color are called ghost boundaries, since they cannot be seen.
\begin{figure}[h]
\includegraphics{fig2.jpg}
\caption{Structured light pattern from [5]}
\end{figure}
Ghost boundaries are placed in a specific pattern so that their positions are restricted to odd-numbered positions in frames 1 and 3 and even-numbered positions in frames 2 and 4. Thus any boundary that is a ghost in one frame is visible in another frame. The object is then illuminated with the light frames at a frequency of 60 Hz and images are recorded with a video camera.
Processing is done mainly with three algorithms. Illuminated and non-illuminated parts (white and black) are separated by the segmentation algorithm: during segmentation the image is scanned along scan lines, and large gradients in intensity, regardless of the texture of the object, are taken as stripe boundaries. The stripes are then matched with the stripes of the previous and next images by the stripe matching algorithm; the proposed algorithm tracks the movement of the objects and takes the presence of ghost boundaries into account during matching. Finally, a decoding algorithm is used to determine the 3D location of each point on the object being modeled. The researchers claim that the depth information gathered from this system is of higher accuracy than that of traditional methods.
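The space-time coding idea can be illustrated with a simplified per-pixel decoding sketch. Note that the system of Olaf and Szymon [5] actually identifies and tracks stripe boundaries over time rather than decoding individual pixels, so the code below is only an assumed, simplified illustration of how a temporal binary code maps back to a projector stripe:
\begin{verbatim}
import numpy as np

def decode_temporal_code(frames, code_to_stripe):
    """Sketch of decoding a time-multiplexed binary stripe pattern.

    frames         : list of four binarised images (0 = black, 1 = white)
                     captured while the four patterns are projected.
    code_to_stripe : dictionary mapping a 4-bit code string such as
                     "0110" to a projector stripe index (an assumed
                     calibration table).
    """
    h, w = frames[0].shape
    stripe_index = np.full((h, w), -1, dtype=int)
    for y in range(h):
        for x in range(w):
            # Assemble the 4-bit code observed at this pixel over time.
            code = "".join(str(int(f[y, x])) for f in frames)
            stripe_index[y, x] = code_to_stripe.get(code, -1)
    return stripe_index
\end{verbatim}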
According to the researchers, the efficiency of the stripe matching algorithm can be improved by introducing new heuristics. The assumption made in the segmentation phase, that no high frequency textures are present on the object, can be considered a weakness, as it may not hold in every instance. The presence of silhouettes, objects moving at high speed and external lighting devices are pointed out as limitations of the system in the paper.
Research similar to that of Olaf and Szymon [5] was carried out by Xu et al. [6] on real-time 3D modeling using structured light; the newer research improves both the accuracy and the speed of 3D modeling. A space-time coded light system is also used by Xu and his fellow researchers, but in this system the number of stripe boundaries is limited to 56 and only three frames are used. According to the paper, coding efficiency can be increased to 87.5\%, which is a remarkable achievement compared with the 43.4\% efficiency of the 111-stripe-boundary light system. The light is projected onto the object and images are acquired with a synchronized camera.
\begin{figure}[h]
\includegraphics{fig3.jpg}
\caption{Structured light pattern from [6]}
\end{figure}
The captured data is then registered to produce a 3D model. An advanced Iterative Closest Point (ICP) algorithm is used to align the 3D data into a single coordinate system and render the 3D points. The ICP algorithm is based on the geometric characteristics of the data in the overlapping area. The algorithm used in the system is quoted below (a generic sketch of one point-to-plane ICP iteration is given after the quoted steps):
Step 1: Preprocessing the model data. By searching the closest three points composing tri-plane in model set, it can calculate some geometry characteristic, such as normal vector and plane equation.
Step 2: Searching points with closest distance from points of goal set to tri-plane of model set. During searching processing, it is necessary to use the prior knowledge of neighbor points, which advance searching speed.
Step 3: Rejecting edge points and points with conflicting distribution of neighbor points.
Step 4: Assigning the point-to-plane error metric, minimizing it and estimating rigid transform.
Step 5: Applying present rigid transform, iterative estimating more correct transform. The iterative processing would end when the distance between points is more than the threshold.
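A generic sketch of a single point-to-plane ICP iteration, roughly corresponding to steps 2 to 4 above, is given below. It uses the standard small-angle linearisation of the rotation found in the ICP literature and a k-d tree for the closest-point search; it is an illustrative formulation, not the exact solver of [6]:
\begin{verbatim}
import numpy as np
from scipy.spatial import cKDTree

def point_to_plane_icp_step(source, target, target_normals):
    """Sketch of one point-to-plane ICP iteration.

    source         : (N, 3) points of the view to be aligned ("goal set").
    target         : (M, 3) points of the model set.
    target_normals : (M, 3) unit normals of the model points, assumed to
                     be precomputed from neighbouring points (step 1).
    """
    # Step 2: closest-point correspondences (a k-d tree stands in for
    # the neighbour-based search described in the paper).
    tree = cKDTree(target)
    dist, idx = tree.query(source)
    q, n = target[idx], target_normals[idx]

    # Step 3 (simplified): reject the worst correspondences as outliers.
    keep = dist < np.percentile(dist, 90)
    p, q, n = source[keep], q[keep], n[keep]

    # Step 4: minimise the point-to-plane error metric with a
    # linearised rotation (solve A x = b in the least-squares sense).
    A = np.hstack([np.cross(p, n), n])       # rows are [p x n, n]
    b = np.einsum('ij,ij->i', n, q - p)      # n . (q - p)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    rx, ry, rz, tx, ty, tz = x
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])          # small-angle rotation
    t = np.array([tx, ty, tz])
    return R, t
\end{verbatim}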
This approach is better suited to real-time systems, as it is faster than the previous method [5]. The speed is achieved by the proposed color pattern, since coding and decoding are made faster by the relatively smaller frame size and the smaller number of frames used. Similar colors used in adjacent stripes of the projected light are resolved during decoding with the help of the ICP algorithm, and the color code presented by Xu et al. [6] is not constrained to a specific format as in the research by Olaf and Szymon [5]. The limitations in modeling objects with textured surfaces, silhouettes and shiny (highly reflective) surfaces are visible in this system [6] as well, as in the previous research [5].
Further research has been conducted, and some of it has been able to address the limitations pointed out in the previous work [5, 6]. Research by Mazaheri and Momeni [7] addresses the limitations noted above: they successfully negate the effect of textured object surfaces and silhouettes by using image subtraction. The setup can be described as follows. First, two images (left and right) of the object are taken under white light from different camera positions. Structured light is then projected onto the object and another two images are taken from exactly the same two positions. The image taken under white light is subtracted from the corresponding image taken under structured light. Edge points of the resulting left image (the projection of the structured light on the object, without background) are then labeled using the hue values of the previous and next light stripes. The corresponding epipolar points in the right image are found, and the rest of the points of the 3D model are calculated using a space intersection method.
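The subtraction step can be sketched as follows. The threshold and smoothing values are assumptions made for illustration, not parameters reported in [7]:
\begin{verbatim}
import cv2

def isolate_stripes(structured_img, white_img):
    """Sketch of the image-subtraction idea described in [7].

    structured_img : image taken while the coloured pattern is projected.
    white_img      : image of the same view taken under white light.
    """
    # Subtracting the white-light image suppresses the object's own
    # texture and the background, leaving mostly the projected stripes.
    diff = cv2.absdiff(structured_img, white_img)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, mask = cv2.threshold(gray, 20, 255, cv2.THRESH_BINARY)
    # Keep the coloured stripe pixels for later hue-based labelling.
    return cv2.bitwise_and(structured_img, structured_img, mask=mask)
\end{verbatim}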
In contrast to the other structured light modeling systems [5, 6], a colored structured light pattern is used in this system [7]. The Hue-Saturation-Brightness (HSB) color system is used, and the color stripes vary in their hue value. The light pattern is specifically designed so that no similar colors are placed adjacent to each other.
\begin{figure}[h]
\includegraphics{fig4.jpg}
\caption{Structured light pattern from [7]}
\end{figure}
The system is not designed to model moving objects, but the problems of silhouettes and textures on the object surface are successfully addressed. The selection of the HSB color system is also considered important, as it is not affected by environmental variables in the way the RGB system would be. The use of stereo images within structured light 3D modeling can also be pointed out as a significant aspect of the approach.
\chapter {Passive Range 3D Modeling Techniques}
\section {Stereo Images based approaches to 3D modeling}
Stereo image based 3D modeling has become one of the most researched areas in 3D computer vision; the stereo vision based approach closely resembles biological vision systems. An abstract description of stereo 3D modeling can be given as follows. Two images are taken, and pixels in these images are triangulated to identify their corresponding locations in 3D space. The relative motion and calibration of the cameras and the corresponding image points are needed to carry out this process. Recent research on camera calibration and motion tracking has introduced self-calibrating algorithms [8] that rely solely on image properties; images are scaled to a global scale factor by these algorithms. The second step is matching one image with the other, where each pixel in the first image is matched with a pixel in the second image. New research has been conducted on matching whole scan lines at once, improving the efficiency of such systems. Finally, the images used in the modeling are used again for texture mapping.
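The triangulation step can be sketched as follows, assuming a calibrated stereo pair with a shared intrinsic matrix (an assumption made for brevity; OpenCV is used only for illustration and this is not the pipeline of any specific paper reviewed below):
\begin{verbatim}
import cv2
import numpy as np

def triangulate_pair(pts1, pts2, K, R, t):
    """Sketch of triangulating matched pixels from a calibrated pair.

    pts1, pts2 : (N, 2) arrays of corresponding pixel coordinates.
    K          : shared 3x3 intrinsic matrix.
    R, t       : rotation and translation of the second camera relative
                 to the first, e.g. from calibration or self-calibration.
    """
    # Build the two projection matrices (first camera at the origin).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    # cv2.triangulatePoints expects 2xN arrays and returns
    # homogeneous 4xN points.
    X_h = cv2.triangulatePoints(P1, P2,
                                pts1.T.astype(float),
                                pts2.T.astype(float))
    return (X_h[:3] / X_h[3]).T          # Euclidean 3D points
\end{verbatim}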
Yang and Zhang have carried out research [9] on head pose tracking with stereo vision. The stereo cameras are aligned vertically, one on top of the display screen and the other at the bottom; this camera positioning helps to deal with the ambiguities that are very common in face modeling. Camera calibration is done using the method proposed by Zhang [10]. It is assumed that the first camera defines the global coordinate system, and the second camera's images are transformed to fit it. Some amount of human intervention is needed by this approach: landmark features and the head pose in the first two stereo images must be selected manually. This is needed only at the start of modeling a face; from then on, images are registered automatically. Outlier identification and elimination is based on epipolar constraints. The process runs in a loop for every frame, so each time a more accurate set of feature points is fed into the algorithm, improving the accuracy. An automatic tracking recovery mechanism is also integrated into the system.
The face model generated is mainly a triangular mesh with around 300 triangles, and only the model's geometric and semantic information is used in pose tracking. Satisfaction of the epipolar constraints in outlier identification is relaxed slightly, as camera calibration and feature localization may not be highly accurate. The features selected must have rich texture information, be visible in every image and belong to rigid parts of the face (unlike the mouth region) in order to track the face accurately.
The need for explicit camera calibration and manual selection of feature points can be pointed out as areas for improvement.
Se and Jasiobedzki [11] carried out research on modeling scenes using a stereo camera mounted on an unmanned ground vehicle. The system was named the Instant Scene Modeler, as it produced near real-time 3D models. The hardware setup for the vision system consisted of a stereo camera and a computer. Localization of image points is done using SIFT features, which are well suited to the purpose because of their uniqueness within an image. The SIFT features are stored in a database and each frame is matched against the database. 3D data is computed in the camera coordinate system and later converted to the global reference frame. The 3D data obtained is converted into a triangular mesh, and color images, selected so that they cover the largest number of triangles, are used to map texture onto the model.
The accuracy of the system is highly dependent on the quality of the images taken. The use of SIFT features removes the need to apply an explicit calibration algorithm to the camera images, and storing the computed SIFT features in a database improves efficiency and speed, since computing SIFT features requires significant processing power. The use of a triangular mesh makes dealing with outliers and missing data points, and matching feature points, very efficient. Model quality and loading speed are improved by using images that cover the largest number of triangles during the texture mapping phase.
With the advancement of stereo vision based modeling systems, researchers have been focusing on optimizing the methods; model quality and fast modeling algorithms are among the main areas targeted for improvement. Xi and Duan have proposed an iterative surface evolution algorithm for 3D modeling using multi-view stereo. The proposed algorithm contains five main phases, which can be listed as:
1. Visual hull reconstruction
2. 3D point generation
3. Outlier removal
4. Implicit surface evolution
5. Explicit surface evolution
First, an initial shape estimate is obtained during the visual hull reconstruction phase. This is done using the silhouettes generated by the object, which are back-projected to form cones; the intersection of these cones gives an outer approximation of the object. Secondly, depth values are estimated for 3D point generation: the surface is rendered into the image plane using OpenGL, and depth values are estimated under the Lambertian assumption using the Z-buffer values. Based on this assumption, a line search is conducted to find the patch in another camera image that has the best correlation with the patch in the image being processed. Outlier removal is done as the third step, using a Parzen-window based nonparametric density estimation method. The refined set of 3D points is then used to generate an implicit 3D model using a fast implicit-distance-function based region growing method, and the final complete 3D model is obtained using an implicit tagging algorithm. The steps from 3D point generation to explicit surface evolution may be iterated three to four times in practice to obtain a good quality 3D model.
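The outlier removal phase can be illustrated with a simplified kernel density sketch. The paper uses an anisotropic kernel, whereas the isotropic Gaussian kernel, bandwidth and keep fraction below are assumptions made only for illustration:
\begin{verbatim}
import numpy as np
from scipy.spatial import cKDTree

def density_filter(points, bandwidth=0.05, keep_fraction=0.9):
    """Sketch of kernel-density based outlier removal (phase 3)."""
    tree = cKDTree(points)
    densities = np.zeros(len(points))
    for i, p in enumerate(points):
        # Sum Gaussian kernel contributions of neighbours within a
        # 3-sigma radius (a Parzen-window style density estimate).
        idx = tree.query_ball_point(p, 3.0 * bandwidth)
        d = np.linalg.norm(points[idx] - p, axis=1)
        densities[i] = np.sum(np.exp(-0.5 * (d / bandwidth) ** 2))
    # Discard the lowest-density points as outliers.
    threshold = np.quantile(densities, 1.0 - keep_fraction)
    return points[densities >= threshold]
\end{verbatim}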
The approach discussed here has some speed advantages over other algorithms in use. According to Xi and Duan, the research makes the following contributions:
1. A novel iterative refinement scheme between the depth estimation and the data fusion
2. A novel anisotropic kernel density estimation based outlier removal algorithm
3. A novel data fusion algorithm that integrates the fast implicit distance function-based region growing method with the high-quality explicit surface evolution.
The algorithm could be improved to start from a very abstract shape, such as a bounding box, since an iterative approach is used in the process.
\section {Photometric Stereo based approaches to 3D modeling}
Photometric stereo based methods are also known as shading based 3D modeling methods. A special case of photometric stereo, in which a single image is used, is known as shape from shading. Shading based methods calculate surface normals (the vectors perpendicular to the surface) or depth values based on changes in illumination. The approaches differ depending on the light sources used and on the number of images used; multiple images are commonly preferred, as this relaxes some of the constraints that apply in the process. Shading based approaches are used mainly in 3D face reconstruction systems. This review discusses new methods that use photometric 3D modeling rather than the details of how the 3D models are generated.
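The core computation behind photometric stereo, recovering per-pixel surface normals from images taken under known light directions, can be sketched with the textbook Lambertian least-squares formulation (a generic illustration, not the method of any specific paper reviewed below):
\begin{verbatim}
import numpy as np

def photometric_stereo(images, light_dirs):
    """Sketch of classic Lambertian photometric stereo.

    images     : list of k grayscale images taken under k known,
                 distant light sources (same viewpoint).
    light_dirs : (k, 3) array of unit light directions.
    """
    k = len(images)
    h, w = images[0].shape
    # Stack pixel intensities: one column per pixel, one row per light.
    I = np.stack([im.reshape(-1) for im in images])         # (k, h*w)
    # Solve I = L (albedo * normal) per pixel in the least-squares sense.
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)      # (3, h*w)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-8)
    return normals.reshape(3, h, w), albedo.reshape(h, w)
\end{verbatim}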
The presence of shadows creates many problems for shading based 3D modeling. A solution to this problem is presented by Schlüns in his research [12], titled Shading Based 3D Shape Recovery in the Presence of Shadows. In fact, experimental results show that using shadow information improves the accuracy of the generated 3D model. In his method, shadows are analyzed based on the surface normals and the angles of incidence of the light, and self-shadows (shaded regions) and cast shadows are analyzed separately during the process. The derived information is then used to develop a more accurate 3D model. The accuracy of extracting surface normals using shadow information is presented with graphical charts in the article.
Most of the latest research on shading based techniques focuses on improving 3D models and generation methods by fusing them with other approaches. One such successful approach is described in the paper by Chen and his team [13], which is based on fusing models generated from contours with the photometric stereo method.
Three light sources are used in the photometric stereo approach adopted in this research. The object is rotated and images are taken at predefined angular intervals; these images are then processed to obtain local surface orientation vectors. The resulting partial surfaces are merged with the 3D model from the shape-from-contour method to produce the final 3D model. Before merging, vertical profiles from the partial surfaces are compared with the corresponding profiles of the contour model, and the depth values of the surface profiles are adjusted accordingly.
Although the acquired images represent the full 360-degree circumference of the object, not all of that data is used to generate the 3D model in this method. It is also stated in the paper that accuracy starts to decrease as the observed point approaches a 90-degree angle with respect to the viewing direction, because shadows start to have an effect. Future work on this research will focus on merging surface patches rather than horizontal profiles and on applying the method to surfaces under varying illumination conditions.
Yoshiki et al. [14] carried out research on generating 3D models of faces using shading based methods. The similarity between the previous work by Chen et al. [13] and this one is the use of a predefined 3D model; in this case an anatomical 3D model is used instead of a model generated from contours. Generation of the final 3D model is thereby simplified to a parameter adjustment problem.
The basic step in this approach is generating a model image from 3D vertices and patch images retrieved from a database. To generate the initial 3D model, the pose of the camera plane and the 3D position of the model need to be estimated with respect to the face image. This is done by selecting eigenvectors from the database based on Principal Component Analysis (PCA) of the input face image. After pose and position estimation, the fit between the input face image and the initial model is optimized using cost functions; the cost functions used are normalized correlation and the range error of feature points. The final 3D model is generated by comparing the shading information of the initial model with that of the input 2D image.
The most important feature of this approach is the simplification of 3D model generation: if an initial 3D model of a face is available, a good quality 3D model can be generated from a single image. This approach has potential for straightforward commercial applications such as online 3D face model generation.
\section {Silhouette based approaches to 3D modeling}
A basic description of silhouette based 3D modeling can be given as follows. The object to be modeled is photographed around its circumference. Conceptually, these images are placed in a circle in the order in which they were taken, and the shape in each image is projected towards the middle. The intersections of the projection rays are then extracted; these points represent the boundaries of the modeled 3D object. The procedure is well explained by the following figure from [15].
\begin{figure}[h]
\includegraphics{fig5.jpg}
\caption{3D modeling with silhouettes methodology}
\end{figure}
Much research has been conducted on optimizing silhouette based 3D modeling techniques and on how the method can be used in practical scenarios. Esteban and Schmitt [15] have researched the use of silhouette coherence to model rotating objects, using silhouette coherence for image and camera calibration. The approach relies mainly on 2D computations: optic rays are projected onto each silhouette and 2D intersection intervals are obtained, which are then back-projected into 3D space to obtain the 3D boundary. For rotation estimation, an algorithm was developed based on the direction of the rotation axis, the angular interval at which images are taken and the camera translation direction. Silhouette coherence based camera and image calibration is a novel approach presented for the first time in this paper.
This approach is claimed to work much faster, as it processes far fewer pixels than its counterparts. The drawback is the need for accurate segmentation of the object, which is sometimes difficult for certain objects.
A complete 3D modeling system using silhouettes is described in the paper by Mulayim, Yılmaz and Atalay [16]. Most of the techniques used in this system are already available in the literature; the contribution of the paper is bringing those separate pieces of research together into a complete system. The approach can be described as follows.
First, the cameras are calibrated using the acquired images. Secondly, a bounding cube is estimated using the images, and the results are used to extract the silhouettes from the images. The initial estimate of the model is then fine-tuned using the silhouettes during the silhouette based volume intersection phase. The optimized bounding box is then divided into voxels, and each voxel is tested to check whether it projects completely inside the silhouette images; if not, the voxel is rejected. This is done during the 3D model fine-tuning (photo consistency) phase. The appearance is then reconstructed using 2D texture mapping with the acquired images, and the concept of particles is used to reduce the drawbacks associated with 2D texture mapping.
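The silhouette based volume intersection test can be illustrated with a generic voxel carving sketch. Testing only the voxel centre, rather than the whole voxel as described in [16], is a simplification made for this illustration:
\begin{verbatim}
import numpy as np

def carve_voxels(voxel_centers, silhouettes, projections):
    """Sketch of the silhouette-based volume intersection test.

    voxel_centers : (N, 3) centres of candidate voxels in the bounding box.
    silhouettes   : list of binary silhouette images (object = 1).
    projections   : list of 3x4 camera projection matrices, one per image.
    """
    keep = np.ones(len(voxel_centers), dtype=bool)
    homog = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    for sil, P in zip(silhouettes, projections):
        # Project all voxel centres into this view.
        uvw = P @ homog.T
        u = np.round(uvw[0] / uvw[2]).astype(int)
        v = np.round(uvw[1] / uvw[2]).astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # A voxel survives only if it falls on the silhouette in every view.
        hit = np.zeros(len(voxel_centers), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]] > 0
        keep &= hit
    return voxel_centers[keep]
\end{verbatim}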
Like any other silhouette based 3D modeling system, this one may be affected by a lack of silhouette images, so acquiring a sufficient number of images is vital. System performance can also be improved by obtaining an accurate initial estimate of the bounding box. Detection and appropriate handling of concave regions can be noted as a significant feature of this system.
\chapter {Conclusion}
\section{Discussion and Future Directions}
This literature review discussed advances in 3D vision technology. Five main 3D modeling approaches were described and categorized into active range techniques and passive range techniques. 3D modeling with laser scanners and structured light was discussed under active range techniques, and stereo vision, photometric stereo and silhouette based approaches were discussed under passive range techniques.
Laser based modeling is used extensively in practical applications, ranging from modeling a simple object on a table to modeling geographical areas from small aerial vehicles. The models generated are highly accurate compared with other systems, and laser scanner based modeling does not require extensive processing. One drawback of using a laser scanner alone is the loss of texture, but in recent research texture is mapped onto the 3D model using images from an optical camera. The most significant drawbacks of laser based modeling are:
1. High cost
2. Limited range
3. High power usage
4. Hazards associated with the laser rays (health risks and effects on archaeological monuments)
If the range is increased, power usage also increases, and so do the hazards caused by the high-power laser rays.
The second active range method uses structured light. Unlike laser or stereo methods, structured light is mainly used to model a specific object. Structured light methods are capable of producing models with high precision and can be considered a good trade-off between low cost and good quality compared with other approaches. Practical use in open environments is strongly affected by their high dependence on illumination, and the use of structured light in large scene modeling is very limited because of the attenuation of the structured light as it travels across open areas.
Most of the improvement in structured light is focused on generating new light patterns for effective and robust model generation; the problem of object surface textures is handled by new advances in structured light patterns.
Among the passive methods, stereo based approaches can be considered very significant because of their wide applicability. Just like laser based techniques, stereo image based approaches can be used to model simple objects as well as wide scenes, and none of the disadvantages of laser based techniques apply to them.
However, stereo image techniques require substantial computational power, and the techniques used are not as straightforward as in laser based methods. The models generated are lower in precision than those from laser based methods, but they can be improved by using a sufficient number of images and by integration with other techniques.
Another method described here is the photometric stereo approach, which is based on the shading of object surfaces. The modeling process requires knowledge of the light source, especially its placement and the angle it makes with the object. This method is used for single object modeling as well as scene modeling, but its limitations are considerable.
Integrating photometric stereo with other 3D modeling techniques has made it possible to overcome some of the limitations associated with it.
One of the most active research areas described here is silhouette based 3D modeling. This method is low in cost compared with laser based approaches and is also capable of giving high quality results, but the limitation of being able to model only a single object is prominent. The method is still at a very early stage and there is a lot of room for new research.
Considering all the approaches discussed above, an important observation is that researchers are trying to combine these methods to produce stronger approaches. Combining methods allows systems to exploit the strengths of each approach and find effective solutions to the limitations that remain.