Setting up RAG

Deploy the Search Service (Skip this step if you already have a search service set up for AI Portal RAG)
1. Go to Deploy a Custom Template in the Azure Portal
2. Click "Build your own template in the editor"
3. Click "Load file" and upload the JSON file ragSearch.json, using these parameters:
   Customer Name: the name used to deploy your portal (can be found on the update page of your AI Portal)
   Storage Name: the name you’d like to give your default storage for RAG
4. Click Review + Create and then Create
Get key for Embeddings model
1. Head to Microsoft Foundry in the Azure Portal
2. In Use with Foundry → Foundry, click the Foundry project created for embeddings (named {name}-uf-aiportal-embeddings)
3. Click on the button that says Go To Foundry Portal
4. Once in the Foundry portal, under My Assets → Models + endpoints, find the text-embedding-3-large deployment and copy the key. You’ll use it in a later step. Also copy the endpoint field (it should look something like https://{NAME}-aiportal-embeddings.cognitiveservices.azure.com/).
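
Before moving on, it can help to confirm that the copied key and endpoint work together. The sketch below only builds the HTTP request for the Azure OpenAI embeddings REST route and does not send it. The helper name build_embedding_request, the api-version value, and the example endpoint and key are assumptions for illustration, and it assumes your Foundry endpoint exposes the /openai/deployments/... route.

```python
import json
import urllib.request


def build_embedding_request(endpoint, api_key, text,
                            deployment="text-embedding-3-large",
                            api_version="2024-02-01"):
    """Build (but do not send) a POST request asking for one embedding.

    The api_version default is an assumption; use whichever version
    your resource supports.
    """
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/embeddings?api-version={api_version}")
    body = json.dumps({"input": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("api-key", api_key)
    req.add_header("Content-Type", "application/json")
    return req


# Hypothetical values; substitute the endpoint and key you copied.
request = build_embedding_request(
    "https://contoso-aiportal-embeddings.cognitiveservices.azure.com/",
    "YOUR_API_KEY",
    "hello world",
)
print(request.full_url)

# To actually call the service, uncomment:
# with urllib.request.urlopen(request) as resp:
#     vector = json.load(resp)["data"][0]["embedding"]
#     print(len(vector))  # should match the "dimensions" value used in the index
```

A successful call suggests the key, endpoint, and deployment name are all usable in the skillset later on.
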
Add Data Source
1. Back in Microsoft Foundry in the Azure Portal, go to Use with Foundry → AI Search. Click the search service named {name}-rag-aiportal
2. In Search Management → Data Sources click Add a Data Source
3. Fill in the data source form, then click Create:
   Name: any identifying name (here I chose rag-datasource)
   Subscription: the subscription of your AI Portal
   Storage Account: {name}aiportal
   Blob Container: the Storage Name you chose when deploying the search service
4. Once created, open the data source, click Edit, and turn on Deletion Tracking.

Add Index

In Search Management → Indexes click Add Index (JSON)

Paste this JSON into the editor and click Create. Feel free to change the name docsindex to better match your storage.

{
  "name": "docsindex",
  "purviewEnabled": false,
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": true,
      "analyzer": "keyword",
      "synonymMaps": []
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "synonymMaps": []
    },
    {
      "name": "allowedUsers",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": true,
      "key": false,
      "synonymMaps": []
    },
    {
      "name": "sourceDoc",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "synonymMaps": []
    },
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "retrievable": false,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "dimensions": 3072,
      "vectorSearchProfile": "default-profile",
      "synonymMaps": []
    },
    {
      "name": "parentId",
      "type": "Edm.String",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "suggesters": [],
  "analyzers": [],
  "normalizers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
  },
  "semantic": {
    "configurations": [
      {
        "name": "default",
        "flightingOptIn": false,
        "rankingOrder": "BoostedRerankerScore",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "id"
          },
          "prioritizedContentFields": [
            {
              "fieldName": "content"
            }
          ],
          "prioritizedKeywordsFields": []
        }
      }
    ]
  },
  "vectorSearch": {
    "algorithms": [
      {
        "name": "default-hnsw",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        }
      }
    ],
    "profiles": [
      {
        "name": "default-profile",
        "algorithm": "default-hnsw"
      }
    ],
    "vectorizers": [],
    "compressions": []
  }
}
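
A few values in the index definition have to agree with each other: the vector field's dimensions must match the embedding model output (3072 here), and its vectorSearchProfile must name a profile whose algorithm is defined. The sketch below checks this on a trimmed copy of the definition before you paste it. The function name check_index and the trimmed sample are illustrative assumptions, not part of the portal or any SDK.

```python
def check_index(index, expected_dims):
    """Return a list of human-readable problems found in an index definition."""
    problems = []
    fields = index.get("fields", [])

    # Exactly one field should be marked as the document key.
    key_fields = [f["name"] for f in fields if f.get("key")]
    if len(key_fields) != 1:
        problems.append(f"expected exactly one key field, found {key_fields}")

    vector_search = index.get("vectorSearch", {})
    profiles = {p["name"]: p["algorithm"] for p in vector_search.get("profiles", [])}
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}

    for field in fields:
        if "dimensions" not in field:
            continue  # not a vector field
        name = field["name"]
        # The vector width must match what the embedding model produces.
        if field["dimensions"] != expected_dims:
            problems.append(f"{name}: dimensions {field['dimensions']} != {expected_dims}")
        # The field's profile and that profile's algorithm must both be defined.
        profile = field.get("vectorSearchProfile")
        if profile not in profiles:
            problems.append(f"{name}: unknown vector search profile {profile!r}")
        elif profiles[profile] not in algorithms:
            problems.append(f"{name}: profile {profile!r} uses an undefined algorithm")
    return problems


# Trimmed copy of the index definition, keeping only the parts being checked.
sample_index = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "key": False},
        {"name": "contentVector", "type": "Collection(Edm.Single)", "key": False,
         "dimensions": 3072, "vectorSearchProfile": "default-profile"},
    ],
    "vectorSearch": {
        "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "default-profile", "algorithm": "default-hnsw"}],
    },
}

print(check_index(sample_index, expected_dims=3072))  # prints []
```

If you switch to a different embedding model, pass that model's output size as expected_dims and update the index to match.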

Create a Skillset

In Search Management → Skillsets, click Add a Skillset. Paste the JSON below, updating these placeholders, then click Save:

INSERT_NAME_HERE: any name that identifies your skillset (I’d suggest {index name}-skillset, so mine would be docsindex-skillset)
INSERT_EMBEDDINGS_ENDPOINT_HERE: the endpoint copied in "Get key for Embeddings model"
INSERT_YOUR_API_KEY_HERE: the API key copied in "Get key for Embeddings model"
INSERT_INDEX_NAME_HERE: the index name from "Add Index"

{
  "name": "INSERT_NAME_HERE",
  "description": "Splits document text into token-based chunks and generates embeddings.",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
      "name": "extractText",
      "description": "Extract text from PDFs / Office docs / etc.",
      "context": "/document",
      "parsingMode": "default",
      "dataToExtract": "contentAndMetadata",
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "content",
          "targetName": "content"
        }
      ],
      "configuration": {}
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "SplitSkill",
      "description": "Split document text into pages based on Azure OpenAI tokens.",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1400,
      "pageOverlapLength": 150,
      "maximumPagesToTake": 0,
      "unit": "azureOpenAITokens",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content",
          "inputs": []
        },
        {
          "name": "languageCode",
          "source": "/document/language",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ],
      "azureOpenAITokenizerParameters": {
        "encoderModelName": "cl100k_base",
        "allowedSpecialTokens": [
          "[START]",
          "[END]"
        ]
      }
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "EmbeddingSkill",
      "description": "Generate embeddings for each page chunk.",
      "context": "/document/pages/*",
      "resourceUri": "INSERT_EMBEDDINGS_ENDPOINT_HERE",
      "apiKey": "INSERT_YOUR_API_KEY_HERE",
      "deploymentId": "text-embedding-3-large",
      "dimensions": 3072,
      "modelName": "text-embedding-3-large",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "contentVector"
        }
      ]
    }
  ],
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "INSERT_INDEX_NAME_HERE",
        "parentKeyFieldName": "parentId",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "content",
            "source": "/document/pages/*",
            "inputs": []
          },
          {
            "name": "contentVector",
            "source": "/document/pages/*/contentVector",
            "inputs": []
          },
          {
            "name": "sourceDoc",
            "source": "/document/metadata_storage_name",
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  }
}
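
To get a feel for what the SplitSkill settings mean, the arithmetic below estimates how many chunks (and therefore how many embedding calls and index entries) a document produces with maximumPageLength 1400 and pageOverlapLength 150. It models the split as a plain sliding window, which is an assumption: the real skill also tries to respect sentence boundaries, so actual counts may differ somewhat. The function name estimate_chunks is illustrative.

```python
import math


def estimate_chunks(total_tokens, max_len=1400, overlap=150):
    """Approximate chunk count for a sliding window of max_len tokens
    where consecutive windows share `overlap` tokens."""
    if total_tokens <= 0:
        return 0
    if total_tokens <= max_len:
        return 1
    stride = max_len - overlap  # each new chunk advances by 1250 tokens
    return 1 + math.ceil((total_tokens - max_len) / stride)


for tokens in (500, 1400, 1401, 10_000):
    print(tokens, "tokens ->", estimate_chunks(tokens), "chunks")
```

With these settings a 10,000-token document comes out at roughly 8 chunks, each stored as its own entry in the index with parentId pointing back to the source document.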

Add Indexer

In Search Management → Indexers, click Add Indexer (JSON). Copy and paste the JSON below and fill in these placeholders:

INSERT_NAME_HERE: a name for your indexer (I would use {index name}-indexer, so here it would be docsindex-indexer)
INSERT_DATASOURCE_NAME_HERE: the data source name from "Add Data Source"
INSERT_SKILLSET_NAME_HERE: the skillset name from "Create a Skillset"
INSERT_INDEX_NAME_HERE: the index name from "Add Index"

{
  "name": "INSERT_NAME_HERE",
  "description": null,
  "dataSourceName": "INSERT_DATASOURCE_NAME_HERE",
  "skillsetName": "INSERT_SKILLSET_NAME_HERE",
  "targetIndexName": "INSERT_INDEX_NAME_HERE",
  "disabled": null,
  "schedule": {
    "interval": "PT5M",
    "startTime": "2025-12-02T18:51:24.38Z"
  },
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "configuration": {
      "dataToExtract": "storageMetadata",
      "parsingMode": "default",
      "allowSkillsetToReadFileData": true,
      "failOnUnsupportedContentType": false,
      "failOnUnprocessableDocument": false
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "id",
      "mappingFunction": {
        "name": "base64Encode",
        "parameters": null
      }
    },
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "sourceDoc",
      "mappingFunction": null
    }
  ],
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/pages/*",
      "targetFieldName": "content",
      "mappingFunction": null
    },
    {
      "sourceFieldName": "/document/pages/*/contentVector",
      "targetFieldName": "contentVector",
      "mappingFunction": null
    },
    {
      "sourceFieldName": "/document/sourceDoc",
      "targetFieldName": "sourceDoc",
      "mappingFunction": null
    }
  ],
  "cache": null,
  "encryptionKey": null
}
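
It is easy to miss one of the INSERT_..._HERE placeholders when filling in the skillset or indexer JSON. As a quick pre-flight check, the sketch below parses the edited text and lists any placeholders still present. The helper name unfilled_placeholders and the shortened sample are assumptions made for illustration.

```python
import json
import re

# Matches the placeholder convention used in the templates, e.g. INSERT_NAME_HERE.
PLACEHOLDER = re.compile(r"INSERT_[A-Z_]+_HERE")


def unfilled_placeholders(json_text):
    """Return the sorted, de-duplicated placeholders left in json_text.

    Raises ValueError if the text is not valid JSON.
    """
    json.loads(json_text)  # validates syntax; the parsed value is not needed
    return sorted(set(PLACEHOLDER.findall(json_text)))


# Shortened indexer definition in which one placeholder was forgotten.
indexer_text = """
{
  "name": "docsindex-indexer",
  "dataSourceName": "rag-datasource",
  "skillsetName": "INSERT_SKILLSET_NAME_HERE",
  "targetIndexName": "docsindex"
}
"""

print(unfilled_placeholders(indexer_text))  # prints ['INSERT_SKILLSET_NAME_HERE']
```

An empty list means every placeholder was replaced and the JSON still parses.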

In Search Service → Overview, copy the search endpoint listed under URL.
In AI Portal, go to the Admin Panel → RAG Config and add a new config:
   Name: the user-friendly name for your RAG in AI Portal
   Document Return Count: the number of documents returned from the RAG model
   Embedding Deployment Endpoint: the endpoint for the embeddings deployment (should be something like https://{NAME}-aiportal-embeddings.cognitiveservices.azure.com/)
   Embedding Deployment: text-embedding-3-large
   Search Endpoint: the endpoint copied from the search service Overview page
   Search Index: the name you gave your index in "Add Index"
In AI Portal, go to Admin → Provider Config and attach the RAG Search to an existing provider, or create a new provider (e.g. copy the GPT-5.4 provider and under RAG Search select your new search).
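
If results look wrong in AI Portal, it can be useful to query the index directly and see what the indexer produced. The sketch below builds a keyword search request against the Azure AI Search REST API without sending it. How AI Portal itself queries the index is not covered here; the helper name build_search_request, the api-version value, the example endpoint, and the need for a search service API key (not otherwise used in this guide) are all assumptions.

```python
import json
import urllib.request


def build_search_request(search_endpoint, index_name, api_key, query,
                         top=5, api_version="2023-11-01"):
    """Build (but do not send) a keyword search request against one index.

    `top` plays the same role as Document Return Count in the RAG config.
    """
    url = (f"{search_endpoint.rstrip('/')}/indexes/{index_name}"
           f"/docs/search?api-version={api_version}")
    body = {"search": query, "top": top, "select": "content,sourceDoc"}
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="POST")
    req.add_header("api-key", api_key)
    req.add_header("Content-Type", "application/json")
    return req


# Hypothetical values; substitute your own endpoint, index name, and key.
search_request = build_search_request(
    "https://contoso-rag-aiportal.search.windows.net",
    "docsindex",
    "YOUR_SEARCH_KEY",
    "vacation policy",
)
print(search_request.full_url)

# To actually run the query, uncomment:
# with urllib.request.urlopen(search_request) as resp:
#     for hit in json.load(resp)["value"]:
#         print(hit["sourceDoc"], hit["content"][:80])
```

Each hit corresponds to one chunk, so several hits may share the same sourceDoc.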

Your RAG provider is now ready to use!

Notes:

To upload documents: in the Azure Portal, go to Storage Center → {NAME}aiportal. From there, go to Data Storage → Containers and open the container you named when deploying the custom template. Upload your documents to this container. The indexer scans for new documents every 5 minutes.