Can Selenium verify text inside a PDF loaded by the browser?

2023-01-13 03:37 问答作者：

My web application loads a pdf in the browser. I have figured out how to check that the pdf has loaded correctly using:

verifyAttribute xpath=//em开发者_StackOverflow社区bed/@src {URL of PDF goes here}

It would be really nice to be able to check the contents of the pdf with Selenium - for example verify that some text is present. Is there any way to do this?

While not natively supported, I have found a couple ways using the java driver. One way is to have the pdf open in your browser (having adobe acrobat installed) and then use keyboard shortcut keys to select all text (CTRL+A), then copy it to the clipboard (CTRL+C) and then you can verify the text in the clipboard. eg:

protected String getLastWindow() {
    return session().getEval("var windowId; for(var x in selenium.browserbot.openedWindows ){windowId=x;} ");
}

@Test
public void testTextInPDF() {
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);

    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000);

    session().keyDownNative("17"); // Stands for CTRL key
    session().keyPressNative("65"); // Stands for A "ascii code for A"
    session().keyUpNative("17"); //Releases CTRL key
    Thread.sleep(1000);

    session().keyDownNative("17"); // Stands for CTRL key
    session().keyPressNative("67"); // Stands for C "ascii code for C"
    session().keyUpNative("17"); //Releases CTRL key

    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}

Another way, still in java, is to download the pdf and then convert the pdf to text with PDFBox, see http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example on how to do this.

You cannot do this using WebDriver natively. However, PDFBox API can be used here to read content of PDF file. You will have to first of all shift a focus to browser window where PDF file is opened. You can then parse all the content of PDF file and search for the desired text string.

Here is a code to use PDFBox API to search within PDF document.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class pdfToTextConverter {

public static void pdfToText(String path_to_PDF_file, String Path_to_output_text_file) throws FileNotFoundException, IOException{
     //Parse text from a PDF into a string variable
     File f = new File("path_to_PDF_file");

     PDFParser parser = new PDFParser(new FileInputStream(f));
     parser.parse();

     COSDocument cosDoc = parser.getDocument();
     PDDocument pdDoc = new PDDocument(cosDoc);

     PDFTextStripper pdfStripper = new PDFTextStripper();
     String parsedText = pdfStripper.getText(pdDoc);

     System.out.println(parsedText);

     //Write parsed text into a file
     PrintWriter pw = new PrintWriter("Path_to_output_text_file");
     pw.print(parsedText);
     pw.close(); 

}

}


JAR Source
http://sourceforge.net/projects/pdfbox/files/latest/download?source=files

Unfortunately you can not do this at all with Selenium

There is a way.

Before you click the link you can obtain the href value element.FindElement(By.TagName("href")).Text
Then after the PDF loads you can get the Url driver.GetUrl();
Then you can just check to see if the url contains the href.

It's not the best, but it's better than nothing.

继续阅读：firefox pdf selenium selenium-ide testing

Can Selenium verify text inside a PDF loaded by the browser?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？