Can PdfImageBox open searchable PDFs?

Started by iPhiTech, February 22, 2018, 01:26:22 PM

iPhiTech

Hello Sir, I'm here again.

If possible, can the PdfImageBox open a searchable PDF and highlight text in it, rather than only showing the selection highlight?
I'm getting an error when I try to load a searchable PDF.

Richard Moss

Hello,

Sounds like you're trying to go beyond the bounds of what the PdfImageBox sample was supposed to do, which is to convert a page in a PDF to an image and display it.

You haven't posted any details about the error, but if it's an error opening a PDF it's doubtful I can help - the PDF-to-image conversion is pretty much all done via Ghostscript, which is a third-party library that I used.

I really have no idea if this is possible or not. You're probably better off finding a library that can read the actual elements in a PDF document (iTextSharp comes to mind); that way you can perform your searching etc., although I don't know how you'd render them.

Regards;
Richard Moss

iPhiTech



When I try to open a searchable PDF, I get this error. How about if I show a message box saying that the PDF is searchable?

Richard Moss

Check what the value of result is and see if it matches a Ghostscript error code. It could be that you're passing the wrong parameters, for example.

Edit: Actually, I can see from the locals window that the result code is -100, which is a catchall for a fatal Ghostscript error. In that case, I don't know what to suggest - I actually haven't used Ghostscript for years and am unlikely to anytime soon.

iPhiTech


Richard Moss

You can use normal managed exception handling:

try
{
  // perform Ghostscript operation
}
catch (GhostScriptException ex)
{
  // do something with the exception
}

iPhiTech

Sir, any idea how to check whether a PDF is already searchable or not?

Richard Moss

Hello,

As I mentioned, I don't really work with PDFs, so off the top of my head I don't know. A few seconds of searching, however, brought up this answer on Stack Overflow which describes how you can extract text from PDF files. As I believe a "searchable" PDF is simply one that has text elements, this should cover what you need.

This uses the iTextSharp package. I tested it in a console application and it spat out all the text in the sample PDF file I provided, so this may suit your needs.

using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Path = System.IO.Path; // avoids a clash with iTextSharp.text.pdf.parser.Path

namespace PdfCheck
{
  class Program
  {
    static void Main(string[] args)
    {
      ListText(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "sample.pdf"));

      Console.ReadKey(true);
    }

    private static void ListText(string fileName)
    {
      StringBuilder text = new StringBuilder();

      using (PdfReader reader = new PdfReader(fileName))
      {
        // iTextSharp page numbers start at 1
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
          ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
          string currentText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

          // GetTextFromPage already returns a .NET string; round-tripping it through
          // Encoding.Default would only risk replacing non-ANSI characters with '?'
          text.Append(currentText);
        }
      }

      Console.WriteLine(text.ToString());
    }
  }
}
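
Since the question was whether a PDF is searchable rather than what text it contains, the same iTextSharp calls could probably be cut down to a yes/no check. This is only a rough sketch and not something from the PdfImageBox sample: the PdfSearchableCheck class and HasExtractableText method are names I made up for illustration, and it assumes "searchable" simply means that at least one page yields some non-whitespace text.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace PdfCheck
{
  // Hypothetical helper, not part of iTextSharp or the PdfImageBox sample
  static class PdfSearchableCheck
  {
    // Returns true as soon as any page produces non-whitespace text.
    // Assumption: a PDF with extractable text is what is meant by "searchable".
    public static bool HasExtractableText(string fileName)
    {
      using (PdfReader reader = new PdfReader(fileName))
      {
        // iTextSharp page numbers start at 1
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
          string pageText = PdfTextExtractor.GetTextFromPage(reader, page);

          if (!string.IsNullOrWhiteSpace(pageText))
          {
            return true; // no need to read the remaining pages
          }
        }
      }

      return false;
    }
  }
}
```

You could call PdfSearchableCheck.HasExtractableText(fileName) before handing the file to the PdfImageBox and show your message box if it returns true. A scanned PDF with no text layer should come back false under this assumption, but I haven't tried it against a range of documents.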


Hope this helps.

Regards;
Richard Moss