PdfImagebox can open searchable PDF

Started by iPhiTech, February 22, 2018, 01:26:22 PM

iPhiTech

Hello Sir, I'm here again.

If possible, can PdfImageBox open a searchable PDF and highlight text (not the selection highlight)?
I'm getting an error when I try to load a searchable PDF.

Richard Moss

Hello,

Sounds like you're trying to go beyond the bounds of what the PdfImageBox sample was supposed to do, which is to convert a page in a PDF to an image and display it.

You haven't posted any details about the error, but if it's an error opening a PDF it's doubtful I can help - the PDF to image conversion is pretty much all done via Ghostscript, which is a third-party library that I used.

I really have no idea if this is possible or not. You're probably better off finding a library that can read the actual elements in a PDF document (iTextSharp comes to mind). That way you can perform your searching etc., although I don't know how you'd render them.

Regards;
Richard Moss
Read "Before You Post" before posting. Do not send me private messages. Do not expect instant replies.

All responses are hand crafted. No AI involved. Possibly no I either.

iPhiTech



When I try to open a searchable PDF, I get this error. How about if I show a message box saying the PDF is searchable?

Richard Moss

Check what the value of result is and see if it matches a Ghostscript error code. It could be that you're passing the wrong parameters, for example.

Edit: Actually, I can see from the locals window that the result code is -100, which is a catchall for a fatal Ghostscript error. In that case, I don't know what to suggest - I actually haven't used Ghostscript for years and am unlikely to anytime soon.
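
As a rough illustration of checking the result code, the sketch below turns a Ghostscript return value into a readable message. The class and method names (GhostscriptResult, Describe) are made up for this example and are not part of the PdfImageBox sample. Only -100 is confirmed above. The assumptions that negative values are errors and that -101 means the interpreter quit are taken from Ghostscript's error header as I remember it, so check them against the version you are using.

```csharp
using System;

static class GhostscriptResult
{
  // Hypothetical helper: describes the integer returned by a Ghostscript API call.
  // Assumption: zero or positive means success, negative means an error.
  public static string Describe(int result)
  {
    if (result >= 0)
    {
      return "Success";
    }

    switch (result)
    {
      case -100:
        // catch-all fatal error, as seen in the locals window
        return "Fatal Ghostscript error (-100)";
      case -101:
        // assumption: the interpreter executed 'quit', usually not a real failure
        return "Ghostscript interpreter quit (-101)";
      default:
        return "Ghostscript error " + result;
    }
  }
}
```

Calling GhostscriptResult.Describe(result) before throwing or showing a message box at least gives you something more useful to display than the raw number.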

iPhiTech


Richard Moss

You can use normal managed exception handling:

try
{
  // perform ghostscript operation
}
catch (GhostScriptException ex)
{
  // do something with the exception, e.g. show ex.Message in a message box
}

iPhiTech

Sir, any idea how to check whether a PDF is already searchable or not?

Richard Moss

Hello,

As I mentioned, I don't really work with PDFs, so off the top of my head I don't know. A few seconds of searching, however, brought up this answer on Stack Overflow which describes how you can extract text from PDF files. As I believe a "searchable" PDF is simply one that has text elements, this should cover what you need.

This uses the iTextSharp package. I tested it in a console application and it spat out all the text in the sample PDF file I provided, so this may suit your needs.

using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Path = System.IO.Path;

namespace PdfCheck
{
  class Program
  {
    static void Main(string[] args)
    {
      // sample.pdf is expected to sit next to the executable
      ListText(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "sample.pdf"));

      // keep the console window open until a key is pressed
      Console.ReadKey(true);
    }

    // Writes the text found on every page of the PDF to the console
    private static void ListText(string fileName)
    {
      StringBuilder text = new StringBuilder();

      using (PdfReader reader = new PdfReader(fileName))
      {
        // iTextSharp page numbers are 1-based
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
          ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
          string currentText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

          // GetTextFromPage already returns a .NET string, so no encoding round trip is needed
          text.AppendLine(currentText);
        }
      }

      Console.WriteLine(text.ToString());
    }
  }
}
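
To turn the listing above into a yes/no check, something along these lines might do. This is only a sketch: IsSearchable and PdfSearchability are names I made up, and it assumes that "searchable" simply means at least one page gives back some non-whitespace text. A scanned PDF with an OCR text layer will count as searchable, and a PDF where the text has been converted to outlines will not.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace PdfCheck
{
  static class PdfSearchability
  {
    // Hypothetical helper: returns true as soon as any page yields non-whitespace text
    public static bool IsSearchable(string fileName)
    {
      using (PdfReader reader = new PdfReader(fileName))
      {
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
          string pageText = PdfTextExtractor.GetTextFromPage(reader, page, new SimpleTextExtractionStrategy());

          if (!string.IsNullOrWhiteSpace(pageText))
          {
            return true; // found extractable text, no need to read further pages
          }
        }
      }

      return false; // no page produced any text
    }
  }
}
```

You could then call PdfSearchability.IsSearchable(fileName) before handing the file to Ghostscript and show your message box if it returns true.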


Hope this helps.

Regards;
Richard Moss