Where is PDF image rotation information stored?
I am trying to extract the images stored in a PDF as streams. While I can do this easily, I am not able to get accurate image rotation information. I am looking for specific information such as MediaBox, Rotate and landscape/portrait mode.
When I extract the image, its orientation does not match what the end user sees in a PDF reader.
I did a binary comparison of two PDFs (the image is rotated 90 degrees in the first and the same image is rotated 270 degrees in the second) and found a difference in one particular stream object. However, I am not able to make out what the information in that stream means.
Here are the two documents I am talking about:
http://bit.ly/eQZGKJ http://bit.ly/g43Whb
The position, size and orientation of the image when displayed on the page is determined by the current transformation matrix (CTM). You have to execute the entire page content stream to determine the CTM that is in place when the image is displayed. It's like a virtual rendering of the PDF page.
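As a rough illustration of what "executing the content stream" means for the CTM, here is a minimal Python sketch. It assumes the page content stream has already been decompressed into text, and it only understands the q, Q, cm and Do operators; the names multiply and image_ctms are made up for this example. A real interpreter also has to handle strings, inline images, form XObjects and the page's own Rotate entry, so treat this as a sketch rather than a working extractor.

```python
IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def multiply(m, n):
    """Return m x n for PDF matrices written as [a, b, c, d, e, f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def image_ctms(content):
    """Return (xobject_name, ctm) for every Do operator in the stream.

    Simplification: tokens are split on whitespace, which is only
    valid for streams without strings or inline image data.
    """
    ctm = list(IDENTITY)
    stack = []      # graphics states saved by q
    operands = []   # tokens seen since the last handled operator
    found = []
    for token in content.split():
        if token == "q":
            stack.append(ctm)
        elif token == "Q":
            if stack:
                ctm = stack.pop()
        elif token == "cm":
            # cm concatenates: new CTM = operand matrix x current CTM
            ctm = multiply([float(x) for x in operands[-6:]], ctm)
        elif token == "Do":
            found.append((operands[-1], ctm))
        else:
            operands.append(token)
            continue
        operands = []
    return found


stream = "q 2 0 0 2 10 20 cm q 1 0 0 1 5 5 cm /A Do Q /B Do Q"
for name, ctm in image_ctms(stream):
    print(name, ctm)
```

The stack matters because q and Q save and restore the graphics state, so an image drawn after a Q sees the CTM that was in effect before the matching q.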
Almost every image is drawn under a so-called CTM (current transformation matrix). It tells the reader the position, rotation, scaling and skewing of the image.
Check the cm operator, which is described in the PDF reference as "Modify the current transformation matrix (CTM) by concatenating the specified matrix (see Section 4.2.1, “Coordinate Spaces”). Although the operands specify a matrix, they are written as six separate numbers, not as an array." In your PDF documents:
- rotated1.pdf contains "0 550.08 -743.04 0 743.04 0 cm"
- rotated2.pdf contains "0 -550.08 743.04 0 0 550.08 cm"
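So, assuming no other transformation precedes these operators, the image is rotated 90° counterclockwise in rotated1.pdf and 90° clockwise (that is, 270° counterclockwise) in rotated2.pdf, measured in PDF's default coordinate system where the y axis points up. In both cases the matrix also scales the image (by 550.08 along its width and 743.04 along its height) and translates it so that it stays in the positive quadrant.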
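To read the rotation off such a matrix yourself, you can look at where the image's x axis ends up. The Python sketch below does that for the two matrices quoted above. The function name describe is made up for this example, and the calculation assumes the matrix contains only rotation, scaling and translation (no skew or mirroring) and that it is the complete CTM at the point where the image is drawn.

```python
import math


def describe(matrix):
    """Return (angle, width, height) for a PDF matrix [a, b, c, d, e, f].

    The angle is in degrees, counterclockwise, in PDF user space
    (y axis pointing up). Assumes no skew or mirroring.
    """
    a, b, c, d, e, f = matrix
    # The image's x axis (1, 0) is mapped to the direction (a, b).
    angle = math.degrees(math.atan2(b, a)) % 360
    width = math.hypot(a, b)    # length of the transformed x axis
    height = math.hypot(c, d)   # length of the transformed y axis
    return angle, width, height


rotated1 = [0, 550.08, -743.04, 0, 743.04, 0]
rotated2 = [0, -550.08, 743.04, 0, 0, 550.08]

for label, m in (("rotated1", rotated1), ("rotated2", rotated2)):
    angle, width, height = describe(m)
    print(label, round(angle), round(width, 2), round(height, 2))
```

If the page dictionary also has a Rotate entry, that rotation applies to the whole page on top of this and has to be combined separately.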
The image can also be subject to a clipping path, so you may only see part of it. MediaBox and Rotate relate to the whole page, not to individual images.