Wednesday, May 25, 2011

How to convert world to screen coordinates and vice versa

This is something every 3D developer should know: for a given point in 3D, what is the position of that point in 2D, e.g. on the screen in pixel coordinates? This problem comes up quite often (starting with mouse picking), and it's always handy to have the solution at hand :)


The first thing you should remember once and for all regarding screen coordinates: the upper left corner of the screen is (0,0) and the lower right is (width, height). No arguing! Now, let's say we have a 3D point in world coordinate space at (x, y, z). This is not relative to the camera, but absolute (so the camera can have coordinates (cx, cy, cz)). The camera defines the viewMatrix, and I suppose you have also defined a projectionMatrix (you'd better have!). The last thing you need is the width and height of the area you are rendering on (not the whole screen!). If you have all these things, then it's pretty easy:

Point2D get2dPoint(Point3D point3D, Matrix4 viewMatrix,
                   Matrix4 projectionMatrix, int width, int height) {

    Matrix4 viewProjectionMatrix = projectionMatrix * viewMatrix;
    // transform world to clipping coordinates
    // (assumes w is still 1 after the transform; with a perspective
    // projection, divide x and y by the resulting w first)
    point3D = viewProjectionMatrix.multiply(point3D);
    int winX = (int) Math.round(((point3D.getX() + 1) / 2.0) *
                                  width);
    // we use -point3D.getY() because the screen Y axis is
    // oriented top -> down
    int winY = (int) Math.round(((1 - point3D.getY()) / 2.0) *
                                  height);
    return new Point2D(winX, winY);
}

And that's it! Depending on your development environment, you might have a viewProjection matrix already computed, or your world coordinates might be relative to the camera position - don't forget to modify the code accordingly!
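
The function above assumes that w is still 1 after the transformation. With a perspective projection that is generally not the case, so here is a minimal JavaScript sketch of the same idea that includes the divide by w. The helper names (`multiplyMatrices`, `transformVector`, `perspective`, `worldToScreen`) are made up for illustration, and the sketch assumes column vectors, matrices stored as arrays of rows, and an OpenGL-style projection. Adapt it to whatever your math library uses.

```javascript
// Multiply two 4x4 matrices stored as arrays of rows.
function multiplyMatrices(a, b) {
  const out = [];
  for (let r = 0; r < 4; r++) {
    out.push([0, 0, 0, 0]);
    for (let c = 0; c < 4; c++) {
      for (let k = 0; k < 4; k++) out[r][c] += a[r][k] * b[k][c];
    }
  }
  return out;
}

// Multiply a 4x4 matrix by a column vector [x, y, z, w].
function transformVector(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3]);
}

// OpenGL-style perspective projection (hypothetical helper for this demo).
function perspective(fovYRadians, aspect, near, far) {
  const f = 1 / Math.tan(fovYRadians / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

function worldToScreen(point, viewMatrix, projectionMatrix, width, height) {
  const viewProjection = multiplyMatrices(projectionMatrix, viewMatrix);
  const clip = transformVector(viewProjection, [point[0], point[1], point[2], 1]);
  // Perspective divide: clip space -> normalized coordinates in [-1, 1].
  const ndcX = clip[0] / clip[3];
  const ndcY = clip[1] / clip[3];
  return {
    x: Math.round(((ndcX + 1) / 2) * width),
    y: Math.round(((1 - ndcY) / 2) * height), // screen Y grows downwards
  };
}

// Camera at the origin looking down -Z, so the view matrix is the identity.
const identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
const projection = perspective(Math.PI / 2, 800 / 600, 0.1, 100);

const center = worldToScreen([0, 0, -5], identity, projection, 800, 600);
console.log(center); // { x: 400, y: 300 }

const offCenter = worldToScreen([2, 2.5, -5], identity, projection, 800, 600);
console.log(offCenter); // right of and above the center
```

A point straight ahead of the camera lands in the middle of the 800x600 window, and a point up and to the right lands at a larger x and a smaller y, as expected for a top-left origin.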

For those of you who want or need to know the way back from 2D to 3D - well, it's just a matter of doing the inverse operations:



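Point3D get3dPoint(Point2D point2D, int clientWidth,
        int clientHeight, Matrix4 viewMatrix, Matrix4 projectionMatrix) {
    // mouse position relative to the 3D window, not the whole screen
    double winX = point2D.getX();
    double winY = point2D.getY();
    // pixel coordinates -> normalized coordinates in [-1, 1]
    double x = 2.0 * winX / clientWidth - 1;
    double y = -2.0 * winY / clientHeight + 1;
    Matrix4 viewProjectionInverse = inverse(projectionMatrix *
            viewMatrix);

    Point3D point3D = new Point3D(x, y, 0);
    return viewProjectionInverse.multiply(point3D);
}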

The only unclear thing here is the z coordinate - I have set it to 0. The reason is that from the arguments passed to our function we cannot figure out the Z coordinate exactly, because every number would be correct! Think quick - how many points of our 3D world map onto a single point on the glass when you look through a window? Infinitely many! To sum up: there is no unique mapping from a 2D point to a 3D point, but there is one from a 3D point to a 2D point.
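
One way to make the missing z explicit is to unproject the pixel twice, once at the near plane (normalized z = -1) and once at the far plane (normalized z = 1). The two results span the ray of all 3D points that land on that pixel. The following is a rough JavaScript sketch, assuming column vectors, matrices stored as arrays of rows, and an OpenGL-style projection. The helpers `invert4x4`, `transformVector`, `perspective` and `screenToRay` are hypothetical names, not part of any library.

```javascript
// Invert a 4x4 matrix (array of rows) with Gauss-Jordan elimination.
function invert4x4(m) {
  // Augment each row with the matching row of the identity matrix.
  const a = m.map((row, i) => [...row, ...[0, 1, 2, 3].map(j => (i === j ? 1 : 0))]);
  for (let col = 0; col < 4; col++) {
    // Partial pivoting: pick the row with the largest entry in this column.
    let pivot = col;
    for (let r = col + 1; r < 4; r++) {
      if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
    }
    if (Math.abs(a[pivot][col]) < 1e-12) throw new Error("matrix is singular");
    const tmp = a[col];
    a[col] = a[pivot];
    a[pivot] = tmp;
    const p = a[col][col];
    for (let c = 0; c < 8; c++) a[col][c] /= p;
    for (let r = 0; r < 4; r++) {
      if (r === col) continue;
      const factor = a[r][col];
      for (let c = 0; c < 8; c++) a[r][c] -= factor * a[col][c];
    }
  }
  return a.map(row => row.slice(4));
}

// Multiply a 4x4 matrix by a column vector [x, y, z, w].
function transformVector(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3]);
}

// OpenGL-style perspective projection (hypothetical helper for this demo).
function perspective(fovYRadians, aspect, near, far) {
  const f = 1 / Math.tan(fovYRadians / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

// Returns the points where the pixel's ray crosses the near and far planes.
function screenToRay(winX, winY, clientWidth, clientHeight, viewProjection) {
  const x = (2 * winX) / clientWidth - 1;
  const y = 1 - (2 * winY) / clientHeight;
  const inverse = invert4x4(viewProjection);
  const unproject = ndcZ => {
    const p = transformVector(inverse, [x, y, ndcZ, 1]);
    return [p[0] / p[3], p[1] / p[3], p[2] / p[3]]; // undo the perspective divide
  };
  return { near: unproject(-1), far: unproject(1) };
}

// Camera at the origin looking down -Z (identity view matrix),
// so the view-projection matrix is just the projection.
const viewProjection = perspective(Math.PI / 2, 800 / 600, 0.1, 100);
const ray = screenToRay(400, 300, 800, 600, viewProjection);
console.log(ray.near, ray.far); // roughly (0, 0, -0.1) and (0, 0, -100)
```

Every point between `near` and `far` projects to the same pixel, which is exactly the ambiguity described above. Picking a single 3D point then needs extra information, such as an intersection with scene geometry or a depth value.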

Hope I made some things clearer... You can find more info on the net, but for a more detailed overview I can recommend this picking tutorial.

26 comments:

  1. Clearly explained, thanks. Can you give the corresponding javascript syntax?

  2. This is a very generic solution, so there is no unique JavaScript syntax. What data (which matrices, and in what format) do you have available as input?

  3. I have a series of x,y,z coordinates as an array of vertices.
    (for example:
    0.205811 0.423383 0.403992
    0.432068 0.603759 0.630250
    1.032178 1.203868 1.230359)
    I'm having trouble going from 2d to 3d. I was following this algorithm http://stackoverflow.com/questions/5613718/click-to-zoom-in-webgl which seems similar to the idea outlined above, but the coordinates I'm getting don't correspond to the actual vertices I'm graphing. Any advice how to do this?

  4. Can you describe in more detail what you are doing - is it 3D -> 2D -> 3D, and are any of the coordinates matching (e.g. x and y are, z is not)?

  5. I'm plotting 3d data. I would like to be able to click on a screen coordinate (which is essentially 2d) and from that derive the closest corresponding 3d point. In the example it takes 2 arbitrary values for z: -1 and 0, then does a subtraction between the 2 matrices. None of the resulting values are matching. Maybe I'm misunderstanding the implementation?

  6. Determining the Z coordinate is a bit more tricky - you would have to check for a ray intersection with your object/plane. Nevertheless, the x and y coordinates should be correct. I'll try to explain the code a bit more, maybe you'll find the bug:

    double x = 2.0 * winX / clientWidth - 1;
    double y = - 2.0 * winY / clientHeight + 1;

    - winX and winY - mouse coordinates relative to the 3d window or canvas, not the whole screen or browser window

    - clientWidth and clientHeight the same - size of the canvas/3D window

    Matrix4 viewProjectionInverse =
    inverse(projectionMatrix *
    viewMatrix);
    Point3D point3D = new Point3D(x, y, 0);
    return viewProjectionInverse.multiply(point3D);

    Here it's just math, with z = 0 as reference (you should still get x and y correct). This order of multiplication is row major - maybe you're using column major?

    You can also check what you get when you transform the 3D point you got (correct or not) back to 2D.
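
The ray intersection mentioned at the start of this comment can be sketched as follows for the simplest case, a plane. This is only an illustration in JavaScript: `intersectRayPlane` is a made-up name, and it assumes you already have two world-space points on the picking ray (for example, the clicked pixel unprojected at two different depths).

```javascript
// Intersect the ray that starts at `near` and passes through `far` with the
// plane dot(normal, p) = offset. Returns null when there is no intersection.
function intersectRayPlane(near, far, normal, offset) {
  const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  const dir = [far[0] - near[0], far[1] - near[1], far[2] - near[2]];
  const denom = dot(normal, dir);
  if (Math.abs(denom) < 1e-12) return null; // ray is parallel to the plane
  const t = (offset - dot(normal, near)) / denom;
  if (t < 0) return null; // plane lies behind the start of the ray
  return [near[0] + t * dir[0], near[1] + t * dir[1], near[2] + t * dir[2]];
}

// Ray from (0, 0, 10) towards (2, 2, -10), intersected with the plane z = 0.
const hit = intersectRayPlane([0, 0, 10], [2, 2, -10], [0, 0, 1], 0);
console.log(hit); // [ 1, 1, 0 ]
```

For arbitrary meshes the same idea applies per triangle, and the closest hit along the ray is the picked point.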

  7. ok, here is my code:

    var world1 = [0,0,0,0] ;
    var world2 = [0,0,0,0] ;
    var dir = [0,0,0] ;
    var w = event.srcElement.clientWidth ;
    var h = event.srcElement.clientHeight ;
    // calculate x,y clip space coordinates
    var x = (event.offsetX-w/2)/(w/2) ;
    var y = -(event.offsetY-h/2)/(h/2) ;
    mat4.inverse(pvMatrix, pvMatrixInverse) ;
    // convert clip space coordinates into world space
    mat4.multiplyVec4(pvMatrixInverse, [x,y,0,1], world1) ;

    My screen x, y (relative to the canvas) are correct - I tried a number of different ways to reach this and got the same values each time.
    Then I take the inverse of my pvMatrix and multiply it by [x,y,0,1] (for the moment, let's say that all my z's are 0). The result should then be in world1. But when I click on 1,1,0 I do not get a matching result.

  8. The code looks fine to me - the only unknown value here is the pvMatrix. I guess it should be correct since you are rendering your scene, but double check that it has the correct value... I would try (if you have not already) to convert world1 back to screen space (do you get the event.offsetX and event.offsetY values again?). Otherwise it's the same math as in my code.

  9. Nice tutorial, clear and a good read.

    The only tiny thing I noticed, was:

    Matrix4 viewProjectionInverse = inverse(projectionMatrix * viewMatrix);

    Order of the matrix multiplication:

    Multiplication order should be switched to be:

    viewMatrix * projectionMatrix.

    Thanx for a great tutorial.

    Ben

  10. Hi Ben, thanks for the comment, appreciate it a lot! Regarding the matrix order: the multiplication order is actually a matter of convention. Depending on whether you interpret points as column or row vectors, and whether you use row-major or column-major matrices, the order can change. In this specific case I assumed the points to be row vectors and the matrices row major (e.g. the translation component at matrix indices 12, 13 and 14). Write down the operation point*world*view*projection on a piece of paper in math notation and you'll see what I mean.

    P.S. I anyways planned to have a post or two about transformations in detail, thanks for reminding me :)
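
The convention point can be checked numerically. The sketch below (JavaScript, with made-up helper names and a simple scale matrix standing in for a real projection) transforms the same point once as a column vector with `projection * view` and once as a row vector with the transposed matrices multiplied in the opposite order. Both give the same result.

```javascript
// Matrices are arrays of rows.
const transpose = m => m[0].map((_, c) => m.map(row => row[c]));

function multiplyMatrices(a, b) {
  return a.map(row => b[0].map((_, c) => row.reduce((sum, x, k) => sum + x * b[k][c], 0)));
}

// Column-vector convention: result = M * v
const matrixTimesColumn = (m, v) => m.map(row => row.reduce((sum, x, i) => sum + x * v[i], 0));

// Row-vector convention: result = v * M
const rowTimesMatrix = (v, m) => m[0].map((_, c) => v.reduce((sum, x, i) => sum + x * m[i][c], 0));

// Written for column vectors: the translation sits in the last column.
const view = [[1, 0, 0, -2], [0, 1, 0, 0], [0, 0, 1, -5], [0, 0, 0, 1]];
// A plain scale, used here only as a stand-in for a projection matrix.
const projection = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
const point = [1, 1, 1, 1];

// Column vectors: projection * view * point
const columnResult = matrixTimesColumn(multiplyMatrices(projection, view), point);

// Row vectors: point * view^T * projection^T (translation now in the last row)
const rowResult = rowTimesMatrix(point, multiplyMatrices(transpose(view), transpose(projection)));

console.log(columnResult); // [ -2, 2, -4, 1 ]
console.log(rowResult);    // [ -2, 2, -4, 1 ]
```

So `projection * view` and `view * projection` can both be right, as long as the order matches the way the matrices are laid out and the side on which the point is multiplied.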

  11. Nice tutorial Denis! Few points:

    1) Can you describe what exactly you did in the 3D to 2D conversion? The code is fine, but for me it's giving small offset errors, so to fix that I need to understand what the code is trying to do.

    2) mouse picking tutorial link is not working.

  12. Hi Amit, sorry for the broken link - I'll try to find a good substitute for it (it was some pdf on the net where I first saw these calculations a few years ago, and I can't remember what it was called anymore)...

    Regarding the 3D->2D conversion... the whole process can be divided into a few simple steps, assuming that we have a 3D point in "absolute" coordinates. First we move that 3D point into relative camera space by multiplying it with the view matrix. After that, the product is multiplied by the projection matrix to get the point into clipping coordinates.

    Since the projection and view matrix are known at the beginning, I multiplied them first and then multiplied the resulting matrix with the point (not necessary for one point, but imagine you had 50 000 points to convert for whatever reason).

    Once these steps are done we get a point in normalized coordinate space ([-1,1], [-1,1]). We transform this 2D point into the domain ([0,1], [0,1]). Now, multiplying this with the window width/height we will get the pixel coordinates of the screen.

    Minor errors have to be expected (especially due to the round off when getting the normalized coordinates). You could try to specify the round function better - maybe using this here
    http://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm

  13. "Once these steps are done we get a point in normalized coordinate space ([-1,1], [-1,1])". How?

    I did that by dividing the x, y, z clipping coordinates by the clipping coordinate W (which represents the scaling factor of the scene).

    Did you miss this step, or was it not really required?

  14. In theory it should be considered and the division should be made. In this case, though, I assumed that the w coordinate is 1 and consistent throughout the data (therefore no division; I should point this out in the post).

    Check these two links, maybe they help:

    http://www.opengl.org/documentation/specs/version1.1/glspec1.1/node23.html

    http://www.di.ubi.pt/~agomes/cg/teoricas/04e-windows.pdf

    Anyways, what is the exact issue you are having?

  15. I am working on a prototype which required me to do picking and a 3D->2D transformation. I have done both, but figuring out how to do the normalization took significant time. I think it would be a good idea to explain the normalization step in the post.

    I also took W=1 in the first place, but after multiplying [x,y,z,W] with the view matrix and projection matrix, W becomes >1, and that's why I strictly needed to divide x,y by W.

    Thanks for the help :-)
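
A small numeric illustration of that effect, as a JavaScript sketch with hypothetical helper names and an OpenGL-style perspective matrix assumed: the last row of such a matrix copies the negated eye-space z into w, so a point five units in front of the camera comes out with w = 5 rather than 1.

```javascript
// OpenGL-style perspective projection as an array of rows (hypothetical helper).
function perspective(fovYRadians, aspect, near, far) {
  const f = 1 / Math.tan(fovYRadians / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0], // this row produces w = -z
  ];
}

// Multiply a 4x4 matrix by a column vector [x, y, z, w].
function transformVector(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3]);
}

// Point in camera space, 5 units in front of the camera, with w = 1.
const clip = transformVector(perspective(Math.PI / 3, 16 / 9, 0.1, 100), [1, 2, -5, 1]);
console.log(clip[3]); // 5

// Only after dividing by w do x, y and z fall into the [-1, 1] range.
const ndc = clip.slice(0, 3).map(c => c / clip[3]);
console.log(ndc);
```

With an orthographic projection the last row is (0, 0, 0, 1), w stays 1, and the division changes nothing, which is the case where skipping it is harmless.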

  16. Hello all

    I read the conversion function get3dPoint but I didn't understand how to use it. For example I have an image of a point with coordinates (x, y) respectively (2.3) how can I find the Point3D (x, y, z).

    And thank you in advance

  17. Hi Denis,
    Many thanks for the tutorial. I was wondering if you have a spare moment to help me work this out. I have built the example in javascript, but I am having a bit of an issue converting back from a 3d point to xy. In other words applying the viewProjectionMatrix back on the 3d point to get the normalized screen coord.

    //starting with the normalised screen coords:
    screen3D[0] = 0.5;
    screen3D[1] = 0.5;
    screen3D[2] = 0;
    screen3D[3] = 1;

    var viewProjectionMatrix = Matrix.mult(projectionMatrix,viewMatrix);
    var invViewProjectionMatrix =Matrix.inverse(viewProjectionMatrix);

    //apply the inverse projection view matrix get us into world space
    var worldM = Matrix.multTranslate(invProjectionViewMatrix,screen3D);

    //I can mult this worldM matrix by viewProjectionMatrix now to get back to my normalized screen coords.

    var normScreenM = Matrix.multTranslate(projectionViewMatrix,worldM);

    //However taking the worldPoint coords....
    var worldPoint = Matrix.getTrans(worldM);
    //...and multiplying the projectionViewMatrix to it does not work
    var normScreenM2 = Matrix.multTranslate(projectionViewMatrix,worldPoint);

    So, I do not get the normalised screen coords from the product of (worldPoint * projectionViewMatrix). I only can get them back if I have the full matrix to multiply (worldM * projectionViewMatrix).
    I was trying to apply the inverse ops as a sanity check to make sure everything checked out. Unfortunately there is something I have missed. I can't get back to my normalised screen coords by multiplying with the worldPoint, so I'd really appreciate it if you have a moment/idea of what's wrong to help me work this brain banana out...

    Many thanks!
    Wil

  19. Very helpful article, thanks! :) I actually wrote a similar article for performing the same conversion, but I'm specifically working in Maya. For people who have trouble with the normalization part, maybe you will find this useful: http://www.gouvatsos.com/how-to-transform-3d-coordinates-to-2d-screenspace-and-vice-versa

  20. If the screen's origin is in the bottom left corner instead of top left, what would you need to change to convert from 2d->3d?

    Replies
    1. Change this line

      (1) double y = - 2.0 * winY / clientHeight + 1;

      to

      (2) double y = 2.0 * winY / clientHeight - 1;
      So, you should use the same approach as with the x-axis.

      Besides that, you will get the same result if you substitute "winY" with "clientHeight - winY" in (1).

      Hope it helps!

  21. Shouldn't you divide by z to get screen coords?

    I did this using glm, and it seems to work (actual code):

    glm::vec2 getScreenCoord(glm::vec3 pos){

    glm::vec4 p = cameraPerspectiveMatrix*glm::vec4(pos, 1.0);
    glm::vec2 s;
    s.x = int(((p.x/p.z + 1.0)/2.0) * winWidth + .5);
    s.y = int(((1.0 - p.y/p.z)/2.0) * winHeight + .5);
    return s;
    }

    Also, you could extract z-coord from z-buffer, something like

    GLfloat zbuf;
    getPixelInfo(x,y, GL_COLOR_ATTACHMENT0, GL_DEPTH_COMPONENT, GL_FLOAT, &zbuf);
    double znorm = 2.0 * zbuf - 1.0;
    double zreal = 2.0 * clipNear * clipFar / (clipFar + clipNear - znorm * (clipFar - clipNear));
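
The depth formula above can be sanity-checked with a round trip. This JavaScript sketch assumes a standard OpenGL perspective projection and the default depth range of [0, 1]. The function names are made up, and the forward mapping is the usual one for that projection, not something taken from this comment.

```javascript
// Distance in front of the camera -> normalized depth in [-1, 1]
// for a standard OpenGL perspective projection (assumed here).
function eyeDistanceToNdcDepth(distance, clipNear, clipFar) {
  return (clipFar + clipNear) / (clipFar - clipNear) -
         (2 * clipFar * clipNear) / ((clipFar - clipNear) * distance);
}

// Depth-buffer value in [0, 1] -> distance in front of the camera,
// using the same formula as in the comment.
function depthBufferToEyeDistance(zbuf, clipNear, clipFar) {
  const znorm = 2 * zbuf - 1;
  return (2 * clipNear * clipFar) / (clipFar + clipNear - znorm * (clipFar - clipNear));
}

const clipNear = 0.1;
const clipFar = 100;
const distance = 5;

// What the depth buffer would hold for a fragment 5 units away.
const zbuf = (eyeDistanceToNdcDepth(distance, clipNear, clipFar) + 1) / 2;
const recovered = depthBufferToEyeDistance(zbuf, clipNear, clipFar);
console.log(zbuf, recovered); // recovered is approximately 5
```

Note how close `zbuf` is to 1 even though the point is only 5 units away in a 0.1 to 100 range: depth precision is concentrated near the near plane.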

  22. Please refer to http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/MousePicking for a detailed explanation of how x and y are computed without needing z.

    As for glm, this is a general coding approach written with WebGL in mind, without the use of any special libraries.

  23. I have a problem: I can't find any algorithm or program that describes how to convert a 2D point to 3D.
    How do I find Z?
    point(x,y) --> point(x,y,z)
    Please help me.
