Hello, sir, I'm here again.
Would it be possible to open a searchable PDF and highlight its text (not the selection highlight)?
I'm getting an error when I try to load a searchable PDF.
Hello,
Sounds like you're trying to go beyond the bounds of what the PdfImageBox sample was supposed to do, which is to convert a page in a PDF to an image and display it.
You haven't posted any details about the error, but if it's an error opening a PDF it's doubtful I can help - the PDF to image conversion is pretty much all done via Ghostscript, which is a third-party library that I used.
I really have no idea if this is possible or not. You're probably better off finding a library that can read the actual elements in a PDF document (iTextSharp comes to mind); that way you can perform your searching etc., although I don't know how you'd render them.
Regards;
Richard Moss
(https://i.imgur.com/uywDzhe.png)
When I try to open a searchable PDF, I get this error. How about if I show a message box saying that the PDF is searchable?
Check what the value of result is and see if it matches a Ghostscript error code (https://ghostscript.com/doc/current/API.htm#return_codes). It could be that you're passing the wrong parameters, for example.
Edit: Actually, I can see from the locals window that the result code is -100, which is a catchall for a fatal Ghostscript error. In that case, I don't know what to suggest - I actually haven't used Ghostscript for years and am unlikely to anytime soon.
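For what it's worth, checking the result code could look something like the sketch below. The class and method names (GhostscriptResult, Describe) are made up for illustration and are not part of the sample, and the numeric values other than -100 are the ones I recall from Ghostscript's gserrors.h, so verify them against the return codes table linked above before relying on them.

```csharp
using System;

// Hypothetical helper, not part of the PdfImageBox sample.
static class GhostscriptResult
{
    // Turns a Ghostscript API return code into a short description.
    public static string Describe(int code)
    {
        if (code >= 0)
        {
            return "success";
        }

        switch (code)
        {
            case -100: // gs_error_Fatal: the catch-all fatal error
                return "fatal Ghostscript error";
            case -101: // gs_error_Quit (assumed value): "quit" was executed, not a real failure
                return "interpreter quit";
            case -110: // gs_error_Info (assumed value): "gs -h" was executed
                return "usage information was printed";
            default:
                return "Ghostscript error " + code;
        }
    }
}
```

Calling GhostscriptResult.Describe(result) right after the Ghostscript call would then give you something readable to log or show to the user.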
Any suggestions on how to catch the error?
You can use normal managed exception handling:
    try
    {
        // perform ghostscript operation
    }
    catch (GhostScriptException ex)
    {
        // do something with the exception
    }
Sir, any idea how to check whether a PDF is already searchable or not?
Hello,
As I mentioned, I don't really work with PDFs, so off the top of my head I don't know. A few seconds of searching, however, brought up this answer (https://stackoverflow.com/a/5003230/148962) on Stack Overflow, which describes how you can extract text from PDF files. As I believe a "searchable" PDF is simply one that has text elements, this should cover what you need.
This uses the iTextSharp (https://www.nuget.org/packages/iTextSharp/) package. I tested it in a console application and it spat out all the text in the sample PDF file I provided, so this may suit your needs.
    using System;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;
    using Path = System.IO.Path;

    namespace PdfCheck
    {
        class Program
        {
            static void Main(string[] args)
            {
                // sample.pdf is expected to sit next to the executable
                ListText(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "sample.pdf"));
                Console.ReadKey(true);
            }

            // Writes the text of every page in the PDF to the console
            private static void ListText(string fileName)
            {
                StringBuilder text = new StringBuilder();

                using (PdfReader reader = new PdfReader(fileName))
                {
                    // iTextSharp page numbers start at 1
                    for (int page = 1; page <= reader.NumberOfPages; page++)
                    {
                        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

                        // The result is already a .NET string, so it needs no re-encoding;
                        // a round trip through Encoding.Default can lose characters
                        string currentText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

                        text.Append(currentText);
                    }
                }

                Console.WriteLine(text.ToString());
            }
        }
    }
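If all you need is a yes/no answer rather than the text itself, the same approach can be trimmed down to a check like the one below. This is only a sketch under the assumption above that "searchable" means "has extractable text"; PdfSearchableCheck and HasExtractableText are names I made up for illustration, and I haven't tried it against your files.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp or the PdfImageBox sample.
static class PdfSearchableCheck
{
    // Returns true as soon as any page yields non-whitespace text.
    public static bool HasExtractableText(string fileName)
    {
        using (PdfReader reader = new PdfReader(fileName))
        {
            // iTextSharp page numbers start at 1
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string pageText = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());

                if (!string.IsNullOrWhiteSpace(pageText))
                {
                    return true;
                }
            }
        }

        // No page produced text, e.g. a scan made up only of images
        return false;
    }
}
```

You could call PdfSearchableCheck.HasExtractableText(fileName) before handing the file to Ghostscript and show your message box when it returns true.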
Hope this helps.
Regards;
Richard Moss