How to escape/strip special characters in the LaTeX document?
We implemented the online service where it is possible to generate PDF with predefined structure. The user can choose a LaTeX template and then compile it with an appropriate inputs.
The question we worry about is the security开发者_StackOverflow, that the malicious user was not able to gain shell access through the injection of special instruction into latex document.
We need some workaround for this or at least a list of special characters that we should strip from the input data.
Preferred language would be PHP, but any suggestions, constructions and links are very welcomed.
PS. in few word we're looking for mysql_real_escape_string for LaTeX
Here's some code to implement the Geoff Reedy answer. I place this code in the public domain.
<?
$test = "Test characters: # $ % & ~ _ ^ \ { }.";
header( "content-type:text/plain" );
print latexSpecialChars( $test );
exit;
function latexSpecialChars( $string )
{
$map = array(
"#"=>"\\#",
"$"=>"\\$",
"%"=>"\\%",
"&"=>"\\&",
"~"=>"\\~{}",
"_"=>"\\_",
"^"=>"\\^{}",
"\\"=>"\\textbackslash",
"{"=>"\\{",
"}"=>"\\}",
);
return preg_replace( "/([\^\%~\\\\#\$%&_\{\}])/e", "\$map['$1']", $string );
}
The only possibility (AFAIK) to perform harmful operations using LaTeX is to enable the possibility to call external commands using \write18
. This only works if you run LaTeX with the --shell-escape or --enable-write18 argument (depending on your distribution).
So as long as you do not run it with one of these arguments you should be safe without the need to filter out any parts.
Besides that, one is still able to write other files using the \newwrite
, \openout
and \write
commands. Having the user create and (over)write files might be unwanted? So you could filter out occurrences of these commands. But keeping blacklists of certain commands is prone to fail since someone with a bad intention can easily hide the actual command by obfusticating the input document.
Edit: Running the LaTeX command using a limited account (ie no writing to non latex/project related directories) in combination with disabling \write18
might be easier and more secure than keeping a blacklist of 'dangerous' commands.
According to http://www.tug.org/tutorials/latex2e/Special_Characters.html the special characters in latex are # $ % & ~ _ ^ \ { }
. Most can be escaped with a simple backslash but _
^
and \
need special treatment.
For caret use \^{}
(or \textasciicircum
), for tilde use \~{}
(or \textasciitilde
) and for backslash use \textbackslash
If you want the user input to appear as typewriter text, there is also the \verb
command which can be used like \verb+asdf$$&\~^+
, the +
can be any character but can't be in the text.
In general, achieving security purely through escaping command sequences is hard to do without drastically reducing expressivity, since it there is no principled way to distinguish safe cs's from unsafe ones: Tex is just not a clean enough programming language to allow this. I'd say abandon this approach in favour of eliminating the existence of security holes.
Veger's summary of the security holes in Latex conforms with mine: i.e., the issues are shell escapes and file creation.overwriting, though he has missed a shell escape vulnerability. Some additional points follow, then some recommendations:
- It is not enough to avoid actively invoking
--shell-escape
, since it can be implicitly enabled in texmf.cnf. You should explicitly pass--no-shell-escape
to override texmf.cnf; \write18
is a primitive of Etex, not Knuth's Tex. So you can avoid Latexes that implement it (which, unfortunately, is most of them);- If you are using Dvips, there is another risk:
\special
commands can create .dvi files that ask dvips to execute shell commands. So you should, if you use dvips, pass the-R2
command to forbid invoking of shell commands; - texmf.cnf allows you to specify where Tex can create files;
- You might not be able to avoid disabling creation of fonts if you want your clients much freedom in which fonts they may create. Take a look at the notes on security for Kpathsea; the default behaviour seems reasonable to me, but you could have a per user font tree, to prevent one user stepping on another users toes.
Options:
- Sandbox your client's Latex invocations, and allow them freedom to misbehave in the sandbox;
- Trust in kpathsea's defaults, and forbid shell escapes in latex and any other executables used to build the PDF output;
- Drastically reduce expressivity, forbidding your clients the ability to create font files or any new client-specified files. Run latex as a process that can only write to certain already existing files;
- You can create a format file in which the
\write18
cs, and the file creation css, are not bound, and only macros that invoke them safely, such as for font/toc/bbl creation, exist. This means you have to decide what functionality your clients have: they would not be able to freely choose which packages they import, but must make use of the choices you have imposed on them. Depending on what kind of 'templates' you have in mind, this could be a good option, allowing use of packages that use shell escapes, but you will need to audit the Tex/Latex code that goes into your format file.
Postscript
There's a TUGBoat article, Server side PDF generation based on LATEX templates, addressing another take on the question to the one I have taken, namely generating PDFs from form input using Latex.
You'd probably want to make sure that your \write18
is disabled.
See http://www.fceia.unr.edu.ar/lcc/cdrom/Instalaciones/LaTex/MiKTex/doc/ch04s08.html and http://www.texdev.net/2009/10/06/what-does-write18-mean/
精彩评论