He says that you need a lot of points and a fancy transformation to correct images while taking into account the differences in elevation within the scene. While it is true that more points help, a better approach is to also take the elevation of each identified point into account using a digital elevation model (DEM). That increases the accuracy of the transformation considerably and reduces the number of points needed. The idea is that you build a transformation from R^3 -> R^2 (ground coordinates plus height to image row and column) instead of just R^2 -> R^2, usually a rational polynomial function.
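A minimal sketch of what such an R^3 -> R^2 rational polynomial mapping looks like, in the spirit of the RPC (rational polynomial coefficient) models that come with a lot of satellite imagery. Assumptions, all mine: cubic numerator and denominator over normalized (lon, lat, height), my own term ordering (real RPC metadata fixes a specific order), and a toy model whose coefficients are made up. The names `ground_to_image`, `coeff_vector` and the model dictionary keys are hypothetical.

```python
from itertools import product

# Exponents (i, j, k) of x**i * y**j * z**k for a full cubic in 3 variables:
# 20 terms. NOTE: this ordering is our own; real RPC metadata defines a
# fixed term order that coefficients must be matched against.
EXPONENTS = [e for e in product(range(4), repeat=3) if sum(e) <= 3]


def coeff_vector(nonzero):
    """Hypothetical helper: build a 20-coefficient list from {exponent: value}."""
    return [nonzero.get(e, 0.0) for e in EXPONENTS]


def rational_poly(num, den, x, y, z):
    """Evaluate (sum num_t * m_t) / (sum den_t * m_t) over the cubic monomials m_t."""
    terms = [x**i * y**j * z**k for i, j, k in EXPONENTS]
    n = sum(c * t for c, t in zip(num, terms))
    d = sum(c * t for c, t in zip(den, terms))
    return n / d


def ground_to_image(lon, lat, h, m):
    """Map ground (lon, lat, height) to image (row, col): an R^3 -> R^2 function."""
    # Normalize ground coordinates with the model's offsets and scales.
    x = (lon - m["lon_off"]) / m["lon_scale"]
    y = (lat - m["lat_off"]) / m["lat_scale"]
    z = (h - m["h_off"]) / m["h_scale"]
    row_n = rational_poly(m["row_num"], m["row_den"], x, y, z)
    col_n = rational_poly(m["col_num"], m["col_den"], x, y, z)
    # Undo the normalization to get pixel coordinates.
    return (row_n * m["row_scale"] + m["row_off"],
            col_n * m["col_scale"] + m["col_off"])


# Toy model with made-up coefficients: height shifts both row and column,
# mimicking relief displacement. Denominators are the constant 1.
toy = {
    "lon_off": 0.0, "lon_scale": 1.0,
    "lat_off": 0.0, "lat_scale": 1.0,
    "h_off": 0.0, "h_scale": 100.0,
    "row_off": 500.0, "row_scale": 500.0,
    "col_off": 500.0, "col_scale": 500.0,
    "row_num": coeff_vector({(0, 1, 0): -1.0, (0, 0, 1): 0.1}),
    "row_den": coeff_vector({(0, 0, 0): 1.0}),
    "col_num": coeff_vector({(1, 0, 0): 1.0, (0, 0, 1): 0.2}),
    "col_den": coeff_vector({(0, 0, 0): 1.0}),
}

print(ground_to_image(0.5, 0.5, 0.0, toy))    # (250.0, 750.0)
print(ground_to_image(0.5, 0.5, 100.0, toy))  # approximately (300.0, 850.0)
```

The height input is what a plain 2D fit lacks: the same (lon, lat) lands on different pixels at different elevations. When orthorectifying, you would typically loop over the output ground cells, look up h in the DEM, and sample the image at the returned (row, col).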

If anybody is interested, the term to search for is orthorectification.

Shameless plug: I recently published a post on my blog about how to calculate a projective transformation for an image if you know a few parameters of your camera (focal length and sensor size) and its position and orientation. My use case is satellite imagery, so this information is always available: http://maxwellrules.com/math/looking_through_a_pinhole.html
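For flat ground, a standard pinhole-camera version of that calculation fits in a few lines. This is a sketch of the textbook model, not necessarily the method in the linked post. Assumptions: world frame X east, Y north, Z up; camera frame x right, y down, z forward (the OpenCV convention); principal point at the image centre; no lens distortion; `R` is the world-to-camera rotation and `C` the camera centre. All function names are hypothetical.

```python
def intrinsics(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """Camera matrix K from focal length and sensor size (focal length in pixels)."""
    fx = focal_mm * width_px / sensor_w_mm
    fy = focal_mm * height_px / sensor_h_mm
    cx, cy = width_px / 2, height_px / 2  # assumed principal point: image centre
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def project_point(K, R, C, X):
    """Project world point X to pixel (u, v) with P = K [R | -R C]."""
    x_cam = matvec(R, [X[i] - C[i] for i in range(3)])  # world -> camera frame
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u, v, w = matvec(K, x_cam)
    return u / w, v / w


def ground_homography(K, R, C):
    """3x3 projective transformation taking ground points (X, Y) on Z = 0 to pixels."""
    t = [-x for x in matvec(R, C)]
    # Columns r1, r2 of R plus the translation t; r3 drops out because Z = 0.
    M = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return [[sum(K[i][k] * M[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(H, X, Y):
    u, v, w = matvec(H, [X, Y, 1.0])
    return u / w, v / w


# Example: nadir-looking camera 1000 m above the origin (image x = east, y = south).
K = intrinsics(focal_mm=10, sensor_w_mm=10, sensor_h_mm=10, width_px=1000, height_px=1000)
R_NADIR = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
C = [0.0, 0.0, 1000.0]
print(project_point(K, R_NADIR, C, [100.0, 50.0, 0.0]))  # (600.0, 450.0)
H = ground_homography(K, R_NADIR, C)
print(apply_homography(H, 100.0, 50.0))                   # (600.0, 450.0)
```

The homography only holds for points on the Z = 0 plane. For terrain with relief you would call `project_point` with each point's height taken from a DEM, which is the R^3 -> R^2 case described earlier.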