Programming Assignment 1: Single View Modeling

Due: Wed. Feb 5, midnight

In this assignment you are required to create 3D texture-mapped models from a single image using the method described in "Single View Metrology," by Criminisi, Reid, and Zisserman, ICCV 99.

See also their IJCV paper: "Single View Metrology," by Criminisi, Reid, and Zisserman, IJCV 2001.

Follow these steps:

Image acquisition (optional), or use the images provided for this assignment on the course web site.
Calculate vanishing points
Choose reference points
Compute textures and 3-D positions and create a VRML model

Image Acquisition

Here are a couple of images to work with:

Outdoor image in pgm

Same outdoor image in jpg

Art image in bmp

Same art image in jpg

If you take your own pictures for this assignment, use high-resolution (e.g. 800x800 or higher) images or scans of at least two different scenes. You are also required to experiment with at least one sketch or painting. Images may be of outdoor scenes (buildings) or indoor scenes (offices). You might also want to consider aerial photographs of a city (more occlusion may occur in this case, requiring some extrapolation of textures or geometry). The images need not be in color. Be sure to choose images that accurately model perspective projection, without fisheye distortion. You'll want to choose images that are complex enough to create an interesting model.

Calculating Vanishing Points

Choose a scene coordinate frame by defining lines in the scene that are parallel to the X, Y, and Z axes. For each axis, digitize more than two lines parallel to that axis. The intersection of these lines in the image defines the corresponding vanishing point.
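
As a minimal sketch of the underlying computation, here is the two-line case in homogeneous coordinates: the line through two image points is their cross product, and the intersection of two lines is the cross product of the lines. The function names are illustrative only, and with more than two digitized lines you would need some form of averaging or least-squares fit instead.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points given as (u, v)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(seg1, seg2):
    """Homogeneous intersection of the lines through two digitized segments.

    The third coordinate is (near) zero when the two image lines are
    parallel, i.e. the vanishing point is at infinity, so keep the point
    homogeneous until you know it is safe to divide.
    """
    return cross(line_through(*seg1), line_through(*seg2))

# Two segments, each given by its two endpoints.
vp = vanishing_point(((0.0, 0.0), (4.0, 2.0)), ((0.0, 6.0), (4.0, 5.0)))
print(vp[0] / vp[2], vp[1] / vp[2])
```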

To compute vanishing points, choose line segments that are as long as possible and far apart in the image. Use high resolution images, and implement a zoom feature to specify line endpoints with sub-pixel accuracy. A small number of "good" lines is probably better than many inaccurate lines. You will save quite a bit of time by adding a "save" feature to your program so that you don't have to recalculate vanishing points every time you load an image. You could write them out in a simple ASCII file format of your own design, for example.
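
One possible shape for such a "save" feature is sketched below. The file layout (one line per axis: a label followed by the three homogeneous coordinates) and the function names are assumptions made for illustration; any format you can read back reliably will do.

```python
def save_vanishing_points(path, vps):
    """Write {axis: (x, y, w)} as one 'axis x y w' line per vanishing point."""
    with open(path, "w") as f:
        for axis in sorted(vps):
            x, y, w = vps[axis]
            # repr() keeps enough digits for the floats to round-trip exactly.
            f.write(f"{axis} {x!r} {y!r} {w!r}\n")

def load_vanishing_points(path):
    """Read the file written by save_vanishing_points back into a dict."""
    vps = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or malformed lines
            vps[parts[0]] = tuple(float(s) for s in parts[1:])
    return vps
```

Storing the points in homogeneous form means a vanishing point at infinity (third coordinate zero) can be saved without special handling.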

Choose Reference Points

To avoid affine distortions in your model, you will need to set the scale parameters as described in lecture and in the paper by Criminisi et al. One way of doing this is to measure, in 3-D, when you shoot the picture, the positions of 4 points on the reference plane and one point off of that plane. The 4 reference plane points and their image projections define a 3x3 matrix H that maps u-v points to X-Y positions on the plane. The fifth point determines the scale factor alpha off of the plane, as described in lecture and in the paper. Alternatively, you can specify H and alpha without physical measurement by identifying a regular structure such as a cube and choosing its dimensions to be unit lengths. This latter approach is necessary for paintings and other scenes in which physical measurements are not feasible.
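
The matrix H can be estimated from the four correspondences by solving an 8x8 linear system. The sketch below is one way to do it, assuming the bottom-right entry of H can be fixed to 1 (true unless the image origin maps to a point at infinity on the plane). The function names are illustrative and not taken from the paper.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (three points collinear?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def homography_from_4_points(image_pts, plane_pts):
    """3x3 H (with H[2][2] = 1) mapping image (u, v) to plane (X, Y)."""
    A, b = [], []
    for (u, v), (X, Y) in zip(image_pts, plane_pts):
        # X = (h0*u + h1*v + h2) / (h6*u + h7*v + 1), and likewise for Y.
        A.append([u, v, 1, 0, 0, 0, -u * X, -v * X]); b.append(X)
        A.append([0, 0, 0, u, v, 1, -u * Y, -v * Y]); b.append(Y)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, u, v):
    """Map an image point to plane coordinates."""
    x, y, w = (H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3))
    return x / w, y / w
```

For a painting, the plane points could simply be the corners of a unit square, (0, 0), (1, 0), (1, 1), (0, 1), matched to the image corners of a floor tile or similar regular structure.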

Compute 3D Positions

The paper provides two different approaches for computing distances: in-plane measurements and out-of-plane measurements. You can combine them to increase the power of the method. For instance, once you have computed the height of one point X off of the reference plane P, you can compute the coordinates of any other point on the plane through X that is parallel to P. By choosing more than one reference plane, you can make even more measurements. Be creative, and describe on your web page what you did to make measurements.
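
For out-of-plane measurements, the relation in the paper, as I read it, is alpha * Z = -||b x t|| / ((l_hat . b) ||v x t||), where b and t are the images of the base and top of a vertical segment, v is the vertical vanishing point, and l_hat is the unit-normalized vanishing line of the reference plane. The sketch below works with the magnitude only, so that it does not depend on the sign or scale of the homogeneous vectors b and t. Check the sign convention against the paper before relying on it. All function names are illustrative.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def scaled_height(b, t, v, l):
    """|alpha * Z| for homogeneous base point b and top point t.

    v is the vertical vanishing point and l the vanishing line of the
    reference plane. Use the same homogeneous representative of v
    everywhere, because its scale is absorbed into alpha.
    """
    n = norm(l)
    l_hat = [c / n for c in l]
    return norm(cross(b, t)) / (abs(dot(l_hat, b)) * norm(cross(v, t)))

def calibrate_alpha(b_ref, t_ref, v, l, ref_height):
    """Recover alpha from one segment of known height (the fifth point)."""
    return scaled_height(b_ref, t_ref, v, l) / ref_height

def height(b, t, v, l, alpha):
    """Height above the reference plane of the point imaged at t."""
    return scaled_height(b, t, v, l) / alpha
```

The vanishing line l can be taken as the cross product of the two vanishing points of the reference plane.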

Compute Texture Maps

Use the points you have measured to define several planar patches in the scene. Note that even though your measurements may be in horizontal or vertical directions, you can include planes that are slanted, such as a roof.

The last step is to compute texture maps for each of these patches. If the patch is a rectangle in the scene, e.g., a wall or door, all that is needed is to warp the quadrilateral image region into a rectangular texture image, using a planar projective transformation. It is best to choose the width and height of the texture image to be about the same as those of the original quadrilateral, to avoid loss of resolution. If the warp you perform scales down the image significantly along any direction, you might find that bilinear interpolation does not filter sufficiently, and aliasing results. There are more elegant solutions, but a simple fix is to warp to a larger rectangle using a bilinear filter, and then filter that down to the desired size.
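
A minimal sketch of this warp for a grayscale image stored as a list of rows follows. It uses inverse warping: for each texture pixel, apply a 3x3 matrix that maps texture coordinates to image coordinates, then sample the image bilinearly. The matrix, its direction, and the function names are assumptions for illustration; the box filter at the end is one simple way to "filter down" an oversized texture.

```python
def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y), clamping at the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def warp_to_texture(img, H, tex_w, tex_h):
    """Fill a tex_w x tex_h texture; H maps texture (i, j) to image (x, y)."""
    tex = []
    for j in range(tex_h):
        row = []
        for i in range(tex_w):
            x, y, w = (H[r][0] * i + H[r][1] * j + H[r][2] for r in range(3))
            row.append(bilinear_sample(img, x / w, y / w))
        tex.append(row)
    return tex

def box_downsample(tex, factor):
    """Average factor x factor blocks: a crude low-pass filter before shrinking."""
    out_h, out_w = len(tex) // factor, len(tex[0]) // factor
    return [[sum(tex[j * factor + dj][i * factor + di]
                 for dj in range(factor) for di in range(factor)) / factor ** 2
             for i in range(out_w)]
            for j in range(out_h)]
```

For a color image, the same sampling would be applied to each channel separately.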

If the patch is a non-rectangular region such as the outline of a person, you will need to perform the following steps: (1) define a quadrilateral in the image containing the region you want, (2) warp this into a rectangular texture image, as before, and (3) edit the texture image and mark out "transparent" pixels by hand using image editing software. You could choose a distinctive color or pixel value as a flag to indicate transparency.
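
If you go the flag-color route, turning the flag into an alpha channel takes only a few lines. The sketch below assumes pixels are (r, g, b) tuples and uses magenta as the flag; both are arbitrary choices for illustration.

```python
def key_to_alpha(rgb_rows, key=(255, 0, 255)):
    """Return rows of (r, g, b, a) with a = 0 wherever the pixel equals key."""
    return [[(r, g, b, 0 if (r, g, b) == key else 255) for (r, g, b) in row]
            for row in rgb_rows]
```

Pick a flag color that does not occur anywhere in the part of the texture you want to keep.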

Create a VRML model

For each image create a VRML model (see documentation below). You should include two versions of the VRML model, one with the camera position shown and one without. The version without the camera will be easier to browse with the VRML viewer (because you can rotate about the center of the scene, not the scene + camera). Here are some commented sample VRML files:

Standing man with transparent pixels

Texture mapped polygon

And here are the two texture files required for the floor and the standing guy.

You can use any text editor to read the contents of a VRML file. Make sure you install a VRML browser (e.g. CosmoPlayer) before opening these VRML files.

You should also translate/rotate the model so that the initial view is similar to the input image, based on your knowledge of the camera position.
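
Because VRML is plain text, your program can write it directly. The sketch below emits one texture-mapped quadrilateral per patch in VRML 2.0 syntax, plus an optional Viewpoint node for the initial view. It is not derived from the sample files above, and the function names and the texture file name in the usage example are hypothetical.

```python
def vrml_textured_quad(corners, texture_url):
    """VRML 2.0 Shape for one planar patch.

    corners: four (x, y, z) points, ordered to match texture coordinates
    (0,0), (1,0), (1,1), (0,1); VRML puts (0,0) at the bottom-left of the image.
    """
    points = ", ".join("%g %g %g" % tuple(c) for c in corners)
    return (
        "Shape {\n"
        "  appearance Appearance {\n"
        '    texture ImageTexture { url "%s" }\n'
        "  }\n"
        "  geometry IndexedFaceSet {\n"
        "    coord Coordinate { point [ %s ] }\n"
        "    coordIndex [ 0, 1, 2, 3, -1 ]\n"
        "    texCoord TextureCoordinate { point [ 0 0, 1 0, 1 1, 0 1 ] }\n"
        "    texCoordIndex [ 0, 1, 2, 3, -1 ]\n"
        "    solid FALSE\n"  # render both sides of the patch
        "  }\n"
        "}\n"
    ) % (texture_url, points)

def vrml_scene(shapes, camera_position=None):
    """Assemble a complete file; optionally start the viewer at camera_position."""
    parts = ["#VRML V2.0 utf8"]
    if camera_position is not None:
        # Only the position is set here; an orientation field would be
        # needed as well if the camera does not look down the -Z axis.
        parts.append('Viewpoint { position %g %g %g description "input view" }'
                     % tuple(camera_position))
    parts.extend(shapes)
    return "\n".join(parts) + "\n"

floor = vrml_textured_quad([(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)], "floor.gif")
print(vrml_scene([floor], camera_position=(0.5, 1.5, 4)))
```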

Submit Results

I will give you the web site where you should upload your source code and executables.

Source images. Show them both in their original form and with annotations and marks showing which points and lines you digitized. If you use your own images, give details on where you got each image (name of building, book and page number, artist, etc.).
A still image of a new view of the reconstructed scene, taken from a viewpoint fairly far from that of the input image.
Some of your texture maps. Show some of the more interesting ones, commenting on any hand retouching you did (perhaps show before and after retouching, if it was significant).
Include at least one non-quadrilateral object to make the scene more interesting.
VRML files--for each input image, include one with the camera position marked and one without.
A picture (screen snapshot) of your user interface.
A description of your approach and analysis of the results. Comment on your design choices, what worked, what didn't. What hardware, operating system, and support libraries (e.g. fltk, OpenGL) did you use?
Describe extensions that would be nice to include if you had more time.

Extra Credit

- Merging models from multiple images. For instance, use multiple images of a building from different viewpoints and build a full 3D model.
- Since the accuracy of your model depends on the precision of the vanishing points, implement a robust technique for computing vanishing points that uses more than two lines.
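
One simple starting point for the second item is a least-squares fit: find the image point that minimizes the sum of squared perpendicular distances to all the digitized lines. The sketch below does that by solving 2x2 normal equations. It assumes the vanishing point is finite (it fails when the lines are nearly parallel in the image), and it does not reject outliers, so treat it as a baseline rather than a finished robust method. The function name is illustrative.

```python
def least_squares_vanishing_point(segments):
    """Point (x, y) closest, in the least-squares sense, to all the lines.

    segments: iterable of ((u1, v1), (u2, v2)) endpoint pairs.
    """
    sxx = sxy = syy = sx = sy = 0.0
    for (u1, v1), (u2, v2) in segments:
        a, b = v1 - v2, u2 - u1          # line normal
        c = u1 * v2 - u2 * v1            # so that a*x + b*y + c = 0
        n2 = a * a + b * b
        if n2 == 0:
            raise ValueError("segment endpoints coincide")
        # Accumulate normal equations for sum((a*x + b*y + c)^2 / n2).
        sxx += a * a / n2
        sxy += a * b / n2
        syy += b * b / n2
        sx -= a * c / n2
        sy -= b * c / n2
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; vanishing point at infinity")
    return ((syy * sx - sxy * sy) / det, (sxx * sy - sxy * sx) / det)
```

A formulation in homogeneous coordinates would also handle vanishing points at infinity.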

Resources

- Projective Geometry: The class web page has several links that you should definitely look at. To brush up on projective geometry, you may want to take a look at the latest version of the lecture notes. There is also a tutorial on projective geometry and a tutorial on metric rectification online.
- Source images: You can also browse the art and architecture books in the Library.
- VRML: The Virtual Reality Modeling Language, a file format for interactive 3-D models (a.k.a. virtual worlds) on the Internet. The VRML repository has specifications for the file format and information on free VRML plugins that permit a web browser to display VRML models. See this on choosing a VRML viewer. If you have trouble getting the CosmoPlayer browser, try this. More on the VRML file format. Note that we'll only be using a fraction of VRML's capabilities.
- Image Editing Tools: I recommend
  - Photoshop on Macintosh, PC
  - gimp on Unix
  Be sure to choose an image editor that supports transparent GIFs.