15 Answers, 1 is accepted
You can traverse the content of a PDF document and check only the elements containing text using the PdfProcessing library. To do this, import the document and then iterate through the Content of each RadFixedPage:
foreach (var page in this.pdfDocument.Pages)
{
    foreach (var item in page.Content)
    {
        var textFragment = item as Telerik.Windows.Documents.Fixed.Model.Text.TextFragment;
        if (textFragment != null)
        {
            var position = textFragment.Position;
        }
    }
}
Hope this is helpful.
Regards,
Tanya
Progress Telerik
I was unable to test it earlier, but I think this is what I am looking for.
I am still a beginner, so this may be a simple question:
How can I translate "var position" into positions in a string? I read about Matrix and IPosition, but I only found how to {set} values, not how to {get} them.
I hope you can help me.
Best regards,
André
Hi André,
I would like to start with a clarification of the format specifics so that we are on the same page. PDF is a fixed-document format, which means that all the elements inside are represented by separate geometries and glyphs. Each of these elements is positioned at a fixed place. Each word in a PDF document is represented by several glyphs drawn at adjacent positions.
The Position property of the TextFragment class represents the starting position of the fragment. Please note that a TextFragment instance might contain just a single letter of a word or several words, depending on how the PDF document was generated. Getters for the position-related properties are available, and you can use them to obtain the position of a specific element in the document.
Can you share more information on the exact scenario you are trying to achieve? Why do you need the coordinates, and exactly which coordinates would work for your case? Do you need them for each letter, or per word?
Regards,
Tanya
Progress Telerik
Hi Tanya,
I want to make an XML export of each letter or word of PDF invoices so I can scan for InvoiceNo, InvoiceDate, Total Amount, etc.
With the coordinates I can make a template per client for future invoices.
The export will be something like:
<PDFTekst><Woord>
<Pagina>1</Pagina>
<BeginX>42,51968</BeginX>
<BeginY>115,1646</BeginY>
<EindeX>90,52768</EindeX>
<EindeY>121,2126</EindeY>
<Tekst>InvoiceNo</Tekst>
</Woord><Woord>
<Pagina>1</Pagina>
<BeginX>92,75168</BeginX>
<BeginY>115,1646</BeginY>
<EindeX>103,4237</EindeX>
<EindeY>121,2126</EindeY>
<Tekst>202017283</Tekst>
</Woord><Woord>
<Pagina>1</Pagina>
<BeginX>42,51968</BeginX>
<BeginY>123,6685</BeginY>
<EindeX>85,63969</EindeX>
<EindeY>129,7325</EindeY>
<Tekst>Date</Tekst>
</Woord><Woord>
<Pagina>1</Pagina>
<BeginX>87,86369</BeginX>
<BeginY>123,6685</BeginY>
<EindeX>92,31168</EindeX>
<EindeY>129,7325</EindeY>
<Tekst>2020-02-13</Tekst>
</Woord></PDFTekst>
I have already succeeded with another library (not Telerik), but I want to build it with Telerik components.
I am able to build the words or lines out of characters based on their positions, so the positions of each letter are also fine with me. Words would be great, so I prefer that if it is possible.
I hope this clarifies my question.
Best regards,
André
Hello André,
You can create templates using interactive forms and this would be the easiest way to achieve the desired functionality. The template can be generated using the API of PdfProcessing and then visualized in PdfViewer. Would that be an option for you?
Regards,
Tanya
Progress Telerik
Hello Tanya,
I think you misunderstood my project. The templates I mentioned are only a record of the positions of the text fragments in a database. This is done by another program.
What I need for my Telerik project are the positions of every text fragment. In your first reply you already gave me an example of how to do this.
My only problem is how to translate the "var position" into the X-Begin, X-End, Y-Begin, and Y-End values. The variable seems to be a matrix, and I don't know how to get the coordinates out of it.
Can you show me, with a small example, how to get the positions out of the matrix as strings?
Thanks for helping me.
Best regards,
André
Hi André,
Please, excuse me for the misunderstanding.
The MatrixPosition object obtained from the TextFragment exposes the OffsetX and OffsetY properties. These properties determine the start position of the fragment. Its end position, however, is determined dynamically by the specific font settings applied to the content. That is why information about the end position of the content is not available.
Hope this answers your question.
Regards,
Tanya
Progress Telerik
Hi Tanya,
I had already found this link. It is all about creating content. I need to GET the values, not SET them.
Can you show me, with a small example, how to get the positions out of the matrix as strings?
Best regards,
André
Hi André,
You can use the following code to get the positions and the content of a TextFragment as a string:
string startX = textFragment.Position.Matrix.OffsetX.ToString();
string startY = textFragment.Position.Matrix.OffsetY.ToString();
string text = textFragment.Text;
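One detail worth keeping in mind for an XML export like the one shown earlier: ToString() without arguments formats numbers using the current culture, so the decimal separator can differ between machines. Below is a minimal sketch, assuming the element names from André's sample (Woord, Pagina, BeginX, BeginY, Tekst); the FragmentExport class and the ToWordElement method are hypothetical helpers written for illustration, not part of PdfProcessing. The end coordinates are left out because the thread states they are not available from the fragment.

```csharp
using System.Globalization;
using System.Xml.Linq;

public static class FragmentExport
{
    // Hypothetical helper: builds one <Woord> element from a fragment's start position and text.
    // Coordinates are formatted with the invariant culture, so '.' is always the decimal separator.
    public static XElement ToWordElement(int pageNumber, double startX, double startY, string text)
    {
        return new XElement("Woord",
            new XElement("Pagina", pageNumber),
            new XElement("BeginX", startX.ToString(CultureInfo.InvariantCulture)),
            new XElement("BeginY", startY.ToString(CultureInfo.InvariantCulture)),
            new XElement("Tekst", text));
    }
}
```

Inside the loop over page.Content it could be called as FragmentExport.ToWordElement(pageNumber, textFragment.Position.Matrix.OffsetX, textFragment.Position.Matrix.OffsetY, textFragment.Text), where pageNumber is a counter you maintain yourself.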
Regards,
Tanya
Progress Telerik
Hi Tanya,
Thanks. This was what I needed. I had tried this, but did not succeed. To use Matrix I had to add a reference to WindowsBase. This seems to have been my biggest problem.
Regards
André
Hi André,
Can you please share more details on what is preventing you from adding a reference to WindowsBase? The assembly should be available with your .NET Framework installation and it is a dependency of the PdfProcessing library.
Regards,
Tanya
Progress Telerik
Hi Tanya,
Nothing is preventing me from adding a reference to WindowsBase. Because I didn't have one, I had the problem using Matrix. Adding WindowsBase solved my problem.
I succeeded yesterday in making my application work, thanks to you.
Regards,
André
Hello André,
Thank you for the clarification. I am glad to hear that you managed to achieve the desired result.
Regards,
Tanya
Progress Telerik
Hi Tanya,
I just ran into a problem when opening a file, so I still have a question:
Is it possible to check whether a PDF file that I want to open is encrypted, and is there a way to open it when it uses standard encryption?
I know how to encrypt a PDF file when creating it, but not how to handle encryption when opening one. The examples I found were all based on creating/writing.
Regards,
André
Hello André,
You can open encrypted PDF documents using the ImportSettings of PdfProcessing's PdfFormatProvider. While an encrypted document is being imported, the UserPasswordNeeded event is fired so you can provide the password.
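A minimal sketch of what this could look like, assuming the PdfImportSettings class and the Password property on the event arguments behave as I recall from the PdfProcessing documentation; please verify the namespaces against your installed version. The EncryptedPdfLoader class, the Load method, and the passwordRequested flag are hypothetical names used only for illustration. Since the event is described as firing while an encrypted document is being imported, the flag can serve as a simple indication that a password was requested.

```csharp
using System.IO;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Import;
using Telerik.Windows.Documents.Fixed.Model;

public static class EncryptedPdfLoader
{
    // Hypothetical helper: imports a PDF and reports whether a user password was requested.
    public static RadFixedDocument Load(string path, string password, out bool passwordRequested)
    {
        bool requested = false; // a local is needed because lambdas cannot capture out parameters

        var settings = new PdfImportSettings();
        settings.UserPasswordNeeded += (sender, args) =>
        {
            requested = true;         // raised only when the document asks for a password
            args.Password = password; // supply the password for the import
        };

        var provider = new PdfFormatProvider { ImportSettings = settings };

        // Load the whole file into memory so the import does not depend on an open file handle.
        var input = new MemoryStream(File.ReadAllBytes(path));
        RadFixedDocument document = provider.Import(input);

        passwordRequested = requested;
        return document;
    }
}
```

A call such as EncryptedPdfLoader.Load("invoice.pdf", "secret", out bool wasEncrypted) would then return the document; the file name and password here are placeholders.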
On a side note, we are always trying to keep the sections and different conversations in the public forums in good order. Thus, I would like to ask you to submit different topics or raise support tickets for the different questions you might have. We believe that such a separation would be beneficial for both sides. Thank you for understanding.
Regards,
Tanya
Progress Telerik