Disclaimer: This is not intended to be a step-by-step guide. The goal of this project was to get more hands-on practice with the Azure AI capabilities while solving a real-world problem I annoyingly faced not too long ago.
Best practices were not strictly followed – everything is ClickOps’ed and some other things are not deployed the right way. I may add a section at the end addressing this topic.

/the problem leading to this project

Most company knowledge lives in documents nobody can find – especially HR documents, policies and regulations. You just want one answer to “How many PTO days do I get per year?” or “What’s the sick leave policy?”. You click on some promising PDFs, hit Ctrl+F ... nothing. You try a synonym ... still nothing. And that was only the 10th PDF you opened that day – that’s the problem I faced not too long ago.
The system you’re about to read about replaces that with a search box you can talk to in plain language. You ask your question and it returns the right PDF, the right page and a link to the source.

This blog walks you through how it works and which Azure-native service does what – not as a step-by-step build guide, but more as a tour of the moving parts and how they’re connected to each other. Some code snippets might make it through though.

/what is RAG – keeping it short

Since it’s not really that important for this post, I will keep this part short. RAG stands for Retrieval-Augmented Generation. A standard AI model only knows what it was trained on – it has no idea about our HR documents. With RAG you give your AI model a knowledge base in the form of an index, and it retrieves the information from that. I kept it pretty simple, but that’s the gist of it.

/the one idea that makes this work

Before we start with the Azure resources, architectures etc., it is important to understand the concept that makes this meaning-based search possible: embeddings.

An embedding is a way of turning text into a list of numbers (called a vector) that captures its meaning. The key property is that texts with similar meaning end up with similar vectors. So for example “accident insurance” and the German word “Unfallversicherung” will land close together, even though they share no words, because they mean the same thing.
“Close” is meant literally here: the numbers in the vectors are near each other, and that is something a computer can measure.

This is the difference between this system and old-fashioned keyword search. Keyword search can only find the exact words you typed into the search box.
Meaning-based search finds the right answer even when the words don’t match. The whole system here is essentially a machine for creating these meaning-numbers and then comparing them.
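
To make “comparing meaning-numbers” a bit more tangible, here is a minimal sketch of cosine similarity, one common way to measure how close two vectors are. The three tiny vectors are made up for illustration; real embeddings come from the model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings", purely for illustration.
accident_insurance = [0.9, 0.1, 0.2]
unfallversicherung = [0.85, 0.15, 0.25]
vacation_days = [0.1, 0.9, 0.3]

# The first pair points in nearly the same direction, the second pair does not.
print(cosine_similarity(accident_insurance, unfallversicherung))
print(cosine_similarity(accident_insurance, vacation_days))
```

A value near 1 means “almost the same direction”, a value near 0 means “unrelated”.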

/two pipelines, one search box

The end product is a very simple web page with a search box. The employee types something like “How many vacation days do I have per year?”, and the page returns the most relevant HR PDFs, each with the source filename, the page number, a readable snippet, and a button to open the original document. When you open it, you land on the page where the most relevant passage is.

Under the hood there are two independent pipelines:

The ingestion pipeline runs whenever a PDF file is added to storage. It extracts the text, splits it into chunks, converts each chunk into a numerical embedding vector, and writes everything into a search index. This happens once per document and is fully automatic.
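
The “splits it into chunks” step can be done in many ways. The sketch below shows one simple strategy, fixed-size chunks with a small overlap. The function name and the sizes are assumptions for illustration, not necessarily what the PoC uses.

```python
def split_into_chunks(text, max_chars=1000, overlap=100):
    """Split text into chunks of at most max_chars, overlapping by `overlap` characters."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back a little so a sentence cut at the border appears in both chunks.
        start = end - overlap
    return chunks

sample = "".join(str(i % 10) for i in range(25))
print(split_into_chunks(sample, max_chars=10, overlap=2))
```

The overlap is there so that a passage that straddles a chunk border is still findable in one piece.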

The query pipeline runs every time someone searches for something. It converts the user’s question into an embedding, asks the search index for the closest matches and returns the best passages with a link to the source, from which the PDF file can be downloaded.

/the architecture

The simple workflow is basically: PDFs land in Blob Storage, which raises an event through Event Grid that triggers an Azure Function. That function calls Document Intelligence to read the PDF, calls Azure OpenAI to turn each text chunk into an embedding, and stores the result in Azure AI Search. A second function in the same app serves an HTTP search endpoint that the web page calls. Every credential the functions need is fetched at runtime from Key Vault, which the Function App accesses through a managed identity.

/pipeline 1: the ingestion

This is the behind-the-scenes pipeline. Nobody triggers it by hand; it runs the moment a document (specifically one ending in “.pdf”) appears in the container named “policies”.

//blob storage – where the PDFs live

What it is: plain file storage in the cloud, lives in a “storage account”
Why it’s here: it’s the single source of truth for the original PDFs, every other resource just reads from it

This is just the usual General Purpose v2 storage account, with a container named policies holding the HR PDFs for this PoC. The best practice would be to create a Private Endpoint and make this storage account private – I was just a bit lazy and left it open for this PoC. NEVER LET YOUR PRODUCTION STORAGE ACCOUNT BE OPEN, especially if you host confidential files on it like HR documents.

//azure document intelligence – the reader

What it is: It’s the resource that reads documents and returns structured text.
Why it’s here: PDFs are not plain text, even when they look like it. This resource extracts the words from the visual layout of the PDF and tells you which page each passage came from. This is how a search result can later say “Page 2”.

# Hand Document Intelligence a temporary link to the PDF, then wait for the analysis to finish.
poller = doc_client.begin_analyze_document("prebuilt-layout", {"urlSource": blob_url})
result = poller.result()

This is the part where we hand Document Intelligence a temporary link to the PDF (blob_url) and then wait while it reads the whole thing. What comes back (result) contains the extracted text and its page numbers. This part of the function app is not sending the file’s contents around – we’re just pointing the reader at the file and saying “go read that there”.
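
As a rough sketch of what happens with the result afterwards: the layout result exposes paragraphs, and each paragraph carries its text plus the page it was found on. The helper name and the sample sentences below are made up, and the stand-in object only mimics the attributes the sketch reads, so it runs without calling Azure.

```python
from types import SimpleNamespace

def paragraphs_with_pages(result):
    """Yield (page_number, text) for every paragraph in an analysis result."""
    for paragraph in result.paragraphs or []:
        regions = paragraph.bounding_regions or []
        # A paragraph is attributed to the first page it appears on.
        page = regions[0].page_number if regions else None
        yield page, paragraph.content

# Stand-in for the object returned by poller.result(), with invented sample text.
fake_result = SimpleNamespace(paragraphs=[
    SimpleNamespace(content="Employees get 25 vacation days.",
                    bounding_regions=[SimpleNamespace(page_number=1)]),
    SimpleNamespace(content="Sick leave requires a certificate.",
                    bounding_regions=[SimpleNamespace(page_number=2)]),
])

for page, text in paragraphs_with_pages(fake_result):
    print(f"Page {page}: {text}")
```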

//azure openai – gives meaning

What it is: Microsoft’s hosted access to OpenAI models.
Why it’s here: It generates the embeddings (remember the “one idea that makes this work”? It comes from this resource). Each chunk of text from the document gets converted into a vector.

The Azure OpenAI resource was provisioned through the Azure AI Foundry portal.
I chose the text-embedding-3-large model. This model is also multilingual, which means that it doesn’t matter if the search is in English, German or my shaky Italian.

//azure AI search – stored data

What it is: An Azure-managed search engine that can also store data.
Why it’s here: It stores the document chunks and runs the actual search. It combines keyword search, vector search and semantic reranking, which pushes the most relevant results to the top. For each chunk, the resource stores a small record like this:

{
  "@search.score": 1,
  "id": "6a6f02f7-3480",
  "content": "Wegfallunfälle gelten als Berufsunfälle, sofern der direkte Weg ohne wesentliche Unterbrechung zurückgelegt wird. Mitarbeitende im Homeoffice sind während der Arbeitszeit ebenfalls gegen Berufsunfälle versichert.",
  "page_number": 1,
  "filename": "HR_Unfallversicherung.pdf",
  "blob_path": "HR_Unfallversicherung.pdf",
  "category": "policies",
  "language": "de"
}

//key vault – the safe

What it is: a secure store for secrets like keys, passwords or even connection strings
Why it’s here: so none of the secrets we need ever sit in the code itself. Each service our function app talks to needs a key, and the function app fetches them from the key vault at runtime. The function app proves that it’s allowed to access the secrets with a built-in managed identity and the respective RBAC role (Key Vault Secrets User).
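
A minimal sketch of how fetching secrets at runtime could look, assuming the azure-identity and azure-keyvault-secrets packages. The helper names and the secret name in the usage note are hypothetical. Key Vault secret names cannot contain underscores, so this sketch maps dashes to underscores for the dictionary keys.

```python
def make_secret_client(vault_url):
    """Create a Key Vault client that authenticates with the app's managed identity."""
    # Imported lazily so the helper below can be tried without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

def load_secrets(client, names):
    """Fetch each named secret and return them as a dict with underscore keys."""
    return {name.replace("-", "_"): client.get_secret(name).value for name in names}
```

With a vault in place, something like load_secrets(make_secret_client(vault_url), ["blob-connection-string"]) would return a dictionary with a blob_connection_string entry.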

//event grid – announcer

What it is: a service that announces when something happens.
Why it’s here: In this use case, when a PDF is uploaded, Event Grid sends a notification that wakes up the processing code. Without this, the function app would never know that something new has been uploaded.

//function app – connects everything together

What it is: It’s the serverless resource that runs your Python (or whatever language you prefer) on demand.
Why it’s here: It is basically the glue that holds everything together. Without it nothing really happens; it coordinates what happens at which stage. Since it is serverless, it comes to life when it needs to do something and goes quiet again afterwards.

The function is triggered automatically whenever a new file lands in the “policies” storage container, with Event Grid delivering the notification.
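
The trigger code itself is not reproduced here, so the following is only a sketch of the decision it has to make. Blob-created events from Event Grid carry a subject of the form /blobServices/default/containers/<container>/blobs/<blob name>. The helper names are hypothetical; in an Azure Functions handler the subject would come from the incoming event.

```python
SUBJECT_PREFIX = "/blobServices/default/containers/"

def parse_blob_subject(subject):
    """Split an Event Grid blob subject into (container, blob_name)."""
    if not subject.startswith(SUBJECT_PREFIX):
        raise ValueError(f"unexpected subject: {subject!r}")
    container, separator, blob_name = subject[len(SUBJECT_PREFIX):].partition("/blobs/")
    if not separator or not container or not blob_name:
        raise ValueError(f"unexpected subject: {subject!r}")
    return container, blob_name

def should_ingest(subject, container="policies"):
    """Only PDFs in the expected container are worth processing."""
    found_container, blob_name = parse_blob_subject(subject)
    return found_container == container and blob_name.lower().endswith(".pdf")

print(should_ingest("/blobServices/default/containers/policies/blobs/HR_Unfallversicherung.pdf"))
```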

It then generates a SAS URL so that Document Intelligence, for example, can go and read the blob.
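
Here is a sketch of how such a SAS URL could be built, assuming the azure-storage-blob package and a storage connection string that contains AccountName and AccountKey. The function name get_blob_sas_url appears later in this post, but the body, the default container and the parsing helper are my assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

def parse_connection_string(conn_str):
    """Turn 'Key=Value;Key=Value' into a dict; values may themselves contain '='."""
    pairs = (part.split("=", 1) for part in conn_str.split(";") if "=" in part)
    return {key: value for key, value in pairs}

def get_blob_sas_url(blob_path, conn_str, container="policies", hours=1):
    """Build a read-only URL for one blob that expires after `hours` hours."""
    # Imported lazily so the parsing helper above runs without the Azure SDK.
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas
    settings = parse_connection_string(conn_str)
    account = settings["AccountName"]
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    token = generate_blob_sas(
        account_name=account,
        container_name=container,
        blob_name=blob_path,
        account_key=settings["AccountKey"],
        permission=BlobSasPermissions(read=True),  # read access only
        expiry=datetime.now(timezone.utc) + timedelta(hours=hours),
    )
    return f"https://{account}.blob.{suffix}/{container}/{quote(blob_path)}?{token}"
```

The link only grants read access to that one blob and stops working after the expiry time.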

For each chunk, it builds one search record (the text itself, its vector and metadata such as page number and filename) and adds it to the batch that is uploaded to the Azure AI Search index.
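
A minimal sketch of building such a record. The field names follow the example record shown earlier, except content_vector, which is a hypothetical name for the vector field. The random UUID as id and the toy vector are assumptions as well.

```python
import uuid

def build_record(chunk_text, vector, page_number, filename,
                 category="policies", language="de"):
    """Build one search document for a single chunk of text."""
    return {
        "id": str(uuid.uuid4()),      # assumption: a random id per chunk
        "content": chunk_text,
        "content_vector": vector,     # hypothetical name of the vector field
        "page_number": page_number,
        "filename": filename,
        "blob_path": filename,
        "category": category,
        "language": language,
    }

batch = []
batch.append(build_record("Example chunk text", [0.1, 0.2, 0.3], 1, "HR_Unfallversicherung.pdf"))
# With the azure-search-documents SDK, the batch would then be sent with
# search_client.upload_documents(documents=batch).
print(batch[0]["filename"], batch[0]["page_number"])
```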

The first workflow ends here.

/pipeline 2: the search

This part is where the person actually sees something. It runs every time someone types a question into the frontend and hits search.

Ignore the absolutely horrible arrows, draw.io dynamic arrows are a bit wonky.

//the web page

I hosted the frontend of this app on my local machine with Python. I did not host it on Azure, since I decided local hosting would be sufficient for the PoC. I suppose you could host it on Azure with something like App Service (Web App).

You can host your index.html file locally with the following command. Be advised that you have to run it from the directory the HTML file resides in:

python -m http.server

The site is served on port 8000 by default. If you prefer another port, specify it with a <NUMBER> after “http.server”.

If you now open your browser and type in http://localhost:8000 you will see your frontend with which you can interact, hopefully.

//the search function and Azure AI Search

The same two resources from before (Azure Function and Azure AI Search) reappear here. The question the user types into the frontend is first turned into a vector by Azure OpenAI, using the exact same embedding model as the ingestion pipeline, so the two vectors can be compared to each other.
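
A sketch of that embedding call, assuming the openai Python package. The helper names, the API version and the deployment name are assumptions. The deployment name in particular is whatever you called your model deployment, which is not necessarily the model name.

```python
def make_openai_client(endpoint, api_key, api_version="2024-02-01"):
    """Create an Azure OpenAI client; the api_version here is an assumption."""
    # Imported lazily so embed_text can be tried with a stand-in client.
    from openai import AzureOpenAI
    return AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version=api_version)

def embed_text(text, client, deployment="text-embedding-3-large"):
    """Return the embedding vector for one piece of text."""
    response = client.embeddings.create(model=deployment, input=text)
    return response.data[0].embedding
```

The important part is that ingestion and search both go through the same model, otherwise the vectors live in different spaces and cannot be compared.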

Then the function app asks the search engine to find matches:

from azure.search.documents.models import VectorizedQuery

results = search_client.search(
    search_text=query,                      # 1. match keywords
    vector_queries=[VectorizedQuery(        # 2. match the meaning
        vector=embed(query, secrets),
        k_nearest_neighbors=TOP_RESULTS,
        fields=VECTOR_FIELD,
    )],
    query_type="semantic",                  # 3. re-rank by relevance
    semantic_configuration_name=SEMANTIC_CONFIG,
    query_caption="extractive",
    select=["content", "page_number", "filename", "blob_path", "category", "language"],
    top=TOP_RESULTS,
)

1: Matches the exact words typed into the search box. The classical way, nothing complicated here.
2: Is the meaning-based matching. It asks for the nearest vectors – the closer the vectors, the more similar the meaning.
3: Re-orders the findings so that the most relevant result appears first, in the top spot.
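
Before step 3 kicks in, the keyword hits and the vector hits have to be merged into one list. Azure AI Search does this with Reciprocal Rank Fusion. The sketch below only illustrates the idea with made-up document ids and the commonly used constant k=60; it is not the service’s actual implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids; each appearance scores 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # made-up result of step 1
vector_hits = ["doc_b", "doc_d", "doc_a"]    # made-up result of step 2
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that rank well in both lists float to the top, which is why a hit found by both keyword and meaning usually beats a hit found by only one of them.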

//the download link

The final touch is the “open PDF” button. The function generates a temporary link that works for one hour and then expires.

expiry=datetime.now(timezone.utc) + timedelta(hours=1),

A user can open the source document and download it, if so desired. See the last part of the code, where the download URL is handed to the frontend as part of each result:

items.append({
    "filename":     r.get("filename"),
    "page_number":  r.get("page_number"),
    "category":     r.get("category"),
    "language":     r.get("language"),
    "snippet":      snippet,
    "score":        round(float(score), 3),
    "download_url": get_blob_sas_url(blob_path, secrets["blob_connection_string"]) if blob_path else None,
})

/cost

Since every resource created in this lab is basically near zero cost (mostly pennies), the lab costs have added up to about 0.03 CHF over the last 2 weeks, with the storage account taking first place.

If you deployed this in production with more PDFs, more executions etc., the following table should give you some idea of what each resource costs:

Resource	Tier	Rough cost
Blob Storage (storage account)	Standard LRS	< 0.02 CHF per GB/month (first 50 TB)
Document Intelligence	S0	< 1.18 CHF per 1,000 pages, pennies for us
Azure OpenAI	Standard	depends on many factors, pennies at PoC volume
Azure Functions	Flex Consumption	free for the first 250k executions per month
Key Vault	Standard	< 0.024 CHF per 10k transactions

/possible next steps

As possible next steps I see the following points:

Make it available to others via Azure Static Web Apps with Entra sign-in
Add a PDF viewer that opens the source document at the matched page with the passage highlighted, instead of just a download link
Add generated answers (full RAG) – deploy a chat model and let it return written answers with citations from the PDF source. This just needs one additional model deployment and a small change to the search function.
Make the storage account private as mentioned before – that’s a security risk at the moment
Tune the relevance – while testing, some search results weren’t quite what I was looking for. For example, it presented Page 2 as the most relevant, but Page 1 would’ve answered my question better.

I might look at these options in the future, but for now I have seen what is possible with the native Azure tools, and most importantly, I have learnt a lot while doing this project. Writing this blog has brought everything together again and made me realize how fun this stuff is.

Keep you posted – bye bye, Casi.

Azure PoC Project: A Practical RAG Document Search Application