GET /api/knowledgebases/{knowledgebaseId}/datasets/{datasetId}/chunks
Preview Chunks from Knowledgebase
curl --request GET \
  --url https://app.pathors.com/api/knowledgebases/{knowledgebaseId}/datasets/{datasetId}/chunks \
  --header 'x-api-key: <x-api-key>'
[
  {
    "id": "<string>",
    "content": "<string>",
    "contentLength": 123,
    "isEnabled": true
  }
]

Retrieve all text chunks that were extracted from a specific dataset in a knowledgebase. This is useful for previewing how your content was processed and chunked.

Endpoint

GET /api/knowledgebases/{knowledgebaseId}/datasets/{datasetId}/chunks

Path Parameters

knowledgebaseId (string, required): The unique identifier of the knowledgebase containing the dataset
datasetId (string, required): The unique identifier of the dataset to retrieve chunks from

Headers

x-api-key (string, required): Your project API key for authentication

Example request:
curl -X GET \
  -H "x-api-key: your_api_key" \
  https://your-domain.com/api/knowledgebases/kb_abc123/datasets/dataset_xyz789/chunks
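
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the function names `build_chunks_request` and `fetch_chunks`, the `base_url` parameter, and the 30-second timeout are illustrative assumptions, not part of the API.

```python
import json
import urllib.parse
import urllib.request


def build_chunks_request(base_url, knowledgebase_id, dataset_id, api_key):
    """Build (but do not send) the GET request for a dataset's chunks."""
    # Percent-encode the IDs so unusual characters cannot break the path.
    kb = urllib.parse.quote(knowledgebase_id, safe="")
    ds = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/api/knowledgebases/{kb}/datasets/{ds}/chunks"
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def fetch_chunks(base_url, knowledgebase_id, dataset_id, api_key, timeout=30):
    """Send the request and return the decoded JSON array of chunks."""
    request = build_chunks_request(base_url, knowledgebase_id, dataset_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_chunks("https://your-domain.com", "kb_abc123", "dataset_xyz789", "your_api_key")` would perform the request shown in the curl example above.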

Response

Returns an array of text chunks extracted from the specified dataset.
id (string): Unique identifier for the chunk
content (string): The text content of the chunk
contentLength (number): Length of the chunk content in characters
isEnabled (boolean): Whether the chunk is enabled for search (can be disabled to exclude from results)

Example response:
[
  {
    "id": "chunk_abc123",
    "content": "This is the first chunk of text extracted from the document. It contains information about the company's mission and values.",
    "contentLength": 124,
    "isEnabled": true
  },
  {
    "id": "chunk_def456",
    "content": "The second chunk discusses our product offerings and how they solve customer problems in the market.",
    "contentLength": 100,
    "isEnabled": true
  }
]
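
If you consume the response in Python, it can help to map each array element onto a typed object. The sketch below assumes the four documented fields are always present; the `Chunk` class and `parse_chunks` function are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    id: str
    content: str
    content_length: int
    is_enabled: bool


def parse_chunks(payload):
    """Convert the JSON array returned by the endpoint into Chunk objects."""
    # Accept either raw JSON text or an already-decoded list.
    items = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of chunks")
    return [
        Chunk(
            id=item["id"],
            content=item["content"],
            content_length=item["contentLength"],
            is_enabled=item["isEnabled"],
        )
        for item in items
    ]
```

A missing field raises `KeyError`, which surfaces schema surprises early instead of silently producing incomplete objects.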

Error Responses

Status Code    Description
400            Missing knowledgebase ID or dataset ID
401            Invalid API key
403            Dataset does not belong to the specified knowledgebase
404            Knowledgebase, dataset, or chunks not found
500            Internal server error
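
A client can translate these status codes into descriptive errors. The following sketch mirrors the table above; the `ChunksApiError` class, the helper names, and the rule that only 5xx responses are worth retrying are assumptions made for illustration, not documented API behavior.

```python
# Descriptions copied from the error table above.
ERROR_DESCRIPTIONS = {
    400: "Missing knowledgebase ID or dataset ID",
    401: "Invalid API key",
    403: "Dataset does not belong to the specified knowledgebase",
    404: "Knowledgebase, dataset, or chunks not found",
    500: "Internal server error",
}


class ChunksApiError(Exception):
    """Hypothetical exception type carrying the HTTP status and its meaning."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def error_for_status(status):
    """Return an exception describing a non-success status code."""
    message = ERROR_DESCRIPTIONS.get(status, "Unexpected error")
    return ChunksApiError(status, message)


def is_retryable(status):
    """Assumption: only server-side failures (5xx) are worth retrying."""
    return 500 <= status <= 599
```

With `urllib`, you could catch `urllib.error.HTTPError` and raise `error_for_status(err.code)` in its place.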

Understanding Chunks

Chunking Process

  • Documents are automatically split into smaller, searchable pieces
  • Chunk size is determined by the knowledgebase configuration
  • Overlap between chunks ensures context continuity
  • Processing preserves semantic meaning across chunk boundaries

Chunk Properties

  • Content (content): The actual text extracted from the document
  • Length (contentLength): The character count of the chunk, useful for gauging chunk size
  • Status (isEnabled): Enabled chunks participate in search; disabled ones don’t

Search Integration

  • Each chunk becomes a searchable unit in semantic search
  • Chunks are converted to embeddings for similarity matching
  • Search queries return the most relevant chunks across all datasets
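
Similarity matching over embeddings is commonly done with cosine similarity. The sketch below ranks chunks against a query vector using toy two-dimensional vectors; the embedding model, vector dimensions, and similarity measure the service actually uses are not stated here, so treat every name and value as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_chunks(query_vector, chunk_vectors, top_k=3):
    """Return up to top_k (chunk_id, score) pairs, most similar first."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Here `chunk_vectors` is a hypothetical mapping from chunk ID to its embedding; in practice both the query and the chunks would be embedded by the same model.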

Use Cases

Content Review

  • Preview how your documents were processed
  • Verify important information was extracted correctly
  • Check for any processing errors or formatting issues

Search Optimization

  • Understand how content is structured for search
  • Identify chunks that might need better context
  • Optimize document structure for better chunking
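
One way to spot chunks that might lack context is to look at the distribution of `contentLength` values. This sketch works on the decoded response array; the helper names and the 50-character threshold are arbitrary assumptions, so pick a threshold that suits your content.

```python
from statistics import mean


def summarize_lengths(chunks):
    """Return count, min, max, and mean of contentLength across chunks."""
    lengths = [chunk["contentLength"] for chunk in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


def short_chunks(chunks, min_length=50):
    """Return the IDs of chunks shorter than min_length characters."""
    return [chunk["id"] for chunk in chunks if chunk["contentLength"] < min_length]
```

Very short chunks often come from headings, captions, or table fragments, which are good candidates for restructuring in the source document.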

Troubleshooting

  • Debug why certain content isn’t appearing in search results
  • Verify chunk content matches expectations
  • Check if chunks are properly enabled
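
The troubleshooting checks above can be scripted against the decoded response. The following sketch searches chunk content for a phrase and reports whether any matching chunk is enabled; the function names and the wording of the returned messages are illustrative assumptions.

```python
def find_chunks(chunks, phrase):
    """Return chunks whose content contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [chunk for chunk in chunks if needle in chunk["content"].casefold()]


def diagnose(chunks, phrase):
    """Explain why a phrase may be missing from search results."""
    matches = find_chunks(chunks, phrase)
    if not matches:
        return "not extracted: no chunk contains the phrase"
    disabled = [chunk["id"] for chunk in matches if not chunk["isEnabled"]]
    if len(disabled) == len(matches):
        return "disabled: every matching chunk is disabled (" + ", ".join(disabled) + ")"
    return "ok: at least one enabled chunk contains the phrase"
```

A "not extracted" result points to a processing problem, while a "disabled" result points to chunk status rather than content.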

Usage Notes

  • Only chunks from the specified dataset are returned
  • Results are ordered by their position in the original document
  • Large datasets may return many chunks
  • Chunk content reflects the processed and cleaned text, not the raw file content

Migration from Old Endpoint

If you’re migrating from the deprecated /api/datasets/{filename}/chunks endpoint:
  1. Get your knowledgebase ID using the Get Knowledgebases endpoint
  2. Get the dataset ID using the Get Datasets in Knowledgebase endpoint
  3. Update your API calls to use both IDs in the URL path
  4. The response format remains the same