Issue with iTextSharp
I have a PDF document with several hundred fields. All of the field names contain periods, such as "page1.line1.something".
I want to remove these periods and replace them with either an underscore or (better) nothing at all.
There appears to be a bug in iTextSharp where the RenameField method does not work if the field name contains a period, so the following does not work (it always returns false):
Dim formfields As AcroFields = stamper.AcroFields
Dim renametest As Boolean
renametest = formfields.RenameField("page1.line1.something", "page1_line1_something")
If the field does not have a period in it, it works fine.
Has anyone come across this and is there a workaround?
Is this an AcroForm or a LiveCycle Designer (XFA) form?
If it's XFA (which is likely given the field names), iText can't help you. It can only get/set field values when working with XFA.
Okay, an AcroForm. Rather than go the route used in your source, I suggest you directly manipulate the existing field dictionaries and the AcroForm field list.
I'm a Java native when it comes to iText, so you'll have to do some translation, but here goes:
A) Delete the AcroForm's field array. Leave the calculation order alone if present (/CO). I think.
PdfDictionary acroDict = reader.getCatalog().getAsDictionary(PdfName.ACROFORM);
acroDict.remove(PdfName.FIELDS);
B) Attach all the 'top level' fields to a new FIELDS array.
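// imports (package names as of iText 5): com.itextpdf.text.pdf.*, java.util.HashSet, java.util.Set
PdfArray newFldArray = new PdfArray();
acroDict.put(PdfName.FIELDS, newFldArray); // key first, then value
// you could wipe this between pages to speed things up a bit.
// Track object numbers rather than reference objects: each kid's /Parent entry
// is a separate PdfIndirectReference instance even when they point at the same field.
Set<Integer> radioFieldsAdded = new HashSet<Integer>();
int numPages = reader.getNumberOfPages();
for (int curPg = 1; curPg <= numPages; ++curPg) {
    PdfDictionary curPageDict = reader.getPageN(curPg);
    PdfArray annotArray = curPageDict.getAsArray(PdfName.ANNOTS);
    if (annotArray == null)
        continue;
    for (int annotIdx = 0; annotIdx < annotArray.size(); ++annotIdx) {
        PdfIndirectReference fieldReference = annotArray.getAsIndirectObject(annotIdx);
        if (fieldReference == null)
            continue;
        PdfDictionary field = (PdfDictionary) PdfReader.getPdfObject(fieldReference);
        // skip links and other annotations that are not form widgets
        if (!PdfName.WIDGET.equals(field.getAsName(PdfName.SUBTYPE)))
            continue;
        // if it's a radio button (/Ff usually sits on the radio field, i.e. the widget's parent)
        PdfDictionary parent = field.getAsDict(PdfName.PARENT); // looks up indirect reference for you.
        PdfNumber flags = field.getAsNumber(PdfName.FF);
        if (flags == null && parent != null)
            flags = parent.getAsNumber(PdfName.FF);
        if (parent != null && flags != null && (PdfFormField.FF_RADIO & flags.intValue()) != 0) {
            fieldReference = (PdfIndirectReference) field.get(PdfName.PARENT);
            field = parent;
            // only add each radio field once.
            if (!radioFieldsAdded.add(fieldReference.getNumber()))
                continue;
        }
        // you'll need to assemble the original field name manually and replace the bits
        // you don't like: Parent.T + '.' + child.T + '.' + ...
        // Do this BEFORE removing /Parent, because the name comes from the parent chain.
        String newFieldName = SomeFunction(field);
        field.remove(PdfName.PARENT);
        field.put(PdfName.T, new PdfString(newFieldName));
        // add the reference, not the dictionary
        newFldArray.add(fieldReference);
    }
}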
C) Clean up
reader.removeUnusedObjects();
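`SomeFunction` in step B is left unspecified. One way to write it is sketched below with a tiny stand-in class instead of iText's `PdfDictionary`, so it runs on its own: walk up the `/Parent` chain collecting each partial name (`/T`), join them with periods to get the fully qualified name, then replace or drop the periods. `FieldNode`, `fullName` and `newFieldName` are hypothetical names for illustration only; in iText the corresponding reads would presumably be `field.getAsString(PdfName.T)` and `field.getAsDict(PdfName.PARENT)`. Because the walk needs the parent chain, the name has to be computed before `field.remove(PdfName.PARENT)` runs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a field dictionary, NOT an iText class:
// partialName plays the role of /T, parent plays the role of /Parent.
final class FieldNode {
    final String partialName; // may be null, like a pure widget annotation without /T
    final FieldNode parent;

    FieldNode(String partialName, FieldNode parent) {
        this.partialName = partialName;
        this.parent = parent;
    }
}

final class FieldNames {
    // Builds the fully qualified name (e.g. "page1.line1.something") by walking
    // up the parent chain and joining the partial names with periods.
    static String fullName(FieldNode field) {
        Deque<String> parts = new ArrayDeque<>();
        for (FieldNode n = field; n != null; n = n.parent) {
            if (n.partialName != null) {
                parts.addFirst(n.partialName); // ancestors end up in front
            }
        }
        return String.join(".", parts);
    }

    // Hypothetical SomeFunction: "_" gives "page1_line1_something",
    // "" gives "page1line1something".
    static String newFieldName(FieldNode field, String separator) {
        return fullName(field).replace(".", separator);
    }
}
```

Widget-only nodes without a partial name contribute nothing, so asking for the name of a radio widget yields its radio field's name.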
Disadvantage:
More work.
Advantages:
Maintains all field types, attributes, and appearances, and doesn't change the file as a whole all that much. Uses less CPU and memory.
Your existing code ignores field scripts, all the field flags (read only, hidden, required, multiline text, etc.), lists/combos, radio buttons, and quite a few other odds and ends.
If you use periods in your field names, only the last part can be renamed; e.g. in page1.line1.something only "something" can be renamed. This is because "page1" and "line1" are not part of the field's own name: Adobe treats them as parent fields of "something", and the periods just separate the names in that hierarchy.
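As I understand the iText 5 / iTextSharp implementation (its documentation says only the last part of the name can be renamed), `renameField` compares everything up to the last period in the old and new names and returns false if those parts differ. The check below is a simplified stand-alone model of that comparison, not the library source; `RenameRule.prefixMatches` is a hypothetical name. It shows why renaming "page1.line1.something" to "page1_line1_something" is rejected while renaming it to "page1.line1.other" passes.

```java
// Simplified model of the prefix check behind RenameField; NOT the iText source.
final class RenameRule {
    // True when everything before the last period (the parent names) is identical,
    // i.e. only the final partial name would change.
    static boolean prefixMatches(String oldName, String newName) {
        int oldCut = oldName.lastIndexOf('.') + 1; // 0 when there is no period
        int newCut = newName.lastIndexOf('.') + 1;
        return oldName.substring(0, oldCut).equals(newName.substring(0, newCut));
    }
}
```

So a name without any periods can always pass this check, which matches the observation in the question; getting rid of the hierarchy itself requires rebuilding the fields.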
I needed to delete this hierarchy and replace it with a flattened structure.
I did this by:
- creating a PdfDictionary object for each field
- reading the annotations I needed for each field into an array
- deleting the field hierarchy in my (PdfStamper) document
- creating a new set of fields from my array data
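Whichever way the new fields are created, it may be worth computing all the flattened names up front and checking them for duplicates, especially with the "nothing at all" option: for example, "a.bc" and "ab.c" would both become "abc". A minimal stand-alone sketch, assuming the hierarchy has already been read into a simple tree where every node carries a partial name (`FieldTreeNode` and `FieldFlattener` are hypothetical classes, not iText's): it walks down from the top-level fields, records each terminal field's flattened name, and throws on a collision.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node standing in for a field (/T plus /Kids); NOT an iText class.
final class FieldTreeNode {
    final String partialName;
    final List<FieldTreeNode> kids = new ArrayList<>();

    FieldTreeNode(String partialName) {
        this.partialName = partialName;
    }

    // Adds a child field and returns this node so small trees can be built inline.
    FieldTreeNode add(FieldTreeNode kid) {
        kids.add(kid);
        return this;
    }
}

final class FieldFlattener {
    // Maps each terminal field's fully qualified name to its flattened name;
    // throws IllegalStateException if two fields would get the same new name.
    static Map<String, String> flatten(List<FieldTreeNode> roots, String separator) {
        Map<String, String> oldToNew = new LinkedHashMap<>();
        Map<String, String> newToOld = new HashMap<>();
        for (FieldTreeNode root : roots) {
            walk(root, "", separator, oldToNew, newToOld);
        }
        return oldToNew;
    }

    private static void walk(FieldTreeNode node, String prefix, String separator,
                             Map<String, String> oldToNew, Map<String, String> newToOld) {
        String fullName = prefix.isEmpty() ? node.partialName : prefix + "." + node.partialName;
        if (node.kids.isEmpty()) {
            // Terminal field: one of the fields that will be recreated at the top level.
            String flat = fullName.replace(".", separator);
            String existing = newToOld.putIfAbsent(flat, fullName);
            if (existing != null) {
                throw new IllegalStateException(
                    "'" + existing + "' and '" + fullName + "' both flatten to '" + flat + "'");
            }
            oldToNew.put(fullName, flat);
            return;
        }
        for (FieldTreeNode kid : node.kids) {
            walk(kid, fullName, separator, oldToNew, newToOld);
        }
    }
}
```

With "_" as the separator the same two fields stay distinct ("a_bc" and "ab_c"), which is one argument for the underscore over dropping the periods entirely.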
I have created some sample code for this if you want to see how I did it.