Project Eden

Intelligent Retrieval-Augmented Generation System

Overview

What is Project Eden?

Project Eden is an intelligent Retrieval-Augmented Generation (RAG) system that transforms unstructured data files (PDFs, Excel spreadsheets, etc.) into a queryable knowledge base. It combines structured queries, semantic search, and agentic reasoning to answer questions from your data with high accuracy and proper citations.

Unlike traditional RAG systems, Project Eden uses LLM-powered planning to automatically understand your data structure, generate optimal queries, and route questions to the best retrieval strategy—all without manual configuration.

Automatic Schema Detection

Uses LLMs to analyze file structure, detect entities, infer data types, and normalize headers automatically—no manual schema definition required.

Multi-Strategy Search

Four retrieval modes: structured DSL queries, semantic vector search, hybrid filtering, and reciprocal rank fusion—each optimized for different query types.

Intelligent Query Routing

Automatically analyzes questions and selects the optimal retrieval strategy based on query characteristics (structured vs semantic, complexity, etc.).

Built-in Testing & Evaluation

Comprehensive test harness with automatic metrics, LLM-based quality evaluation, comparison tools, and performance regression tracking.

Rich CLI Interface

Full-featured command-line interface with color-coded outputs, progress indicators, file viewers, and detailed debug information.

Type-Safe DSL

Custom domain-specific language for structured queries with schema validation, type-safe accessors, and helpful error messages.

Quick Start

Setup & Process Your First File

# Install dependencies
pnpm install
pnpm build

# Configure environment
# Create .env with:
#   OPENAI_API_KEY=sk-...
#   DATABASE_URL=postgresql://...

# Initialize database
pnpm eden db:migrate

# Process a file
pnpm eden process fixtures/input/sample.xlsx

# Ask a question
pnpm eden ask "Show me all records with more than 5 bedrooms"

Key Capabilities

DSL Queries

Structured queries for precise filtering. Best for numeric constraints and exact field matching.

Vector Search

Semantic similarity search using embeddings. Perfect for conceptual queries and exploratory search.

Hybrid Filter

DSL pre-filtering combined with vector ranking. Filter first, then rank by semantic similarity.

Hybrid Fusion

Combines DSL and vector results using Reciprocal Rank Fusion for optimal relevance.

Auto Routing

Automatically selects the optimal retrieval strategy based on query characteristics.

Schema Detection

LLM-powered automatic schema detection. No manual configuration required.

Architecture Overview

Project Eden follows a three-phase pipeline: Planning → Ingestion → Persistence, followed by intelligent query routing and answer generation.

Data Flow

Raw File → Plan → Ingest → Persist → Query → Answer
    ↓       ↓       ↓        ↓        ↓       ↓
   XLSX  Schemas  Chunks   DB+Vec   Router   LLM

Phase 1: Planning
• LLM analyzes file structure
• Detects schemas and entities
• Generates normalization plan
• Infers data types and relationships

Phase 2: Ingestion
• Executes plan with deterministic tools
• Normalizes headers and data
• Extracts structured records
• Generates summaries and evidence

Phase 3: Persistence
• Generates embeddings (batched)
• Stores chunks, schemas, vectors
• Creates database indexes
• Makes data queryable

Query Phase:
• Router analyzes question
• Selects retrieval strategy
• Executes query (DSL/vector/hybrid)
• LLM synthesizes answer with citations

Handbook

System Architecture

Core Components

Project Eden is built on a modular architecture with clear separation of concerns. Each component handles a specific aspect of the RAG pipeline.

Planning System

runPlanner(options: PlannerOptions): Promise

Main entry point for the planning phase. Analyzes file structure and generates a processing plan.

Parameters
Name | Type | Description
fileId | string | Unique identifier for the file being processed
filePath | string | Path to the file to analyze
client | LlmClient | LLM client for schema detection
llmConfig | LlmConfig | LLM configuration settings
Returns
Returns
Promise resolving to an object containing the plan, schema_v0, schema_v1, and a quality grade
Examples
const result = await runPlanner({
  fileId: 'abc-123',
  filePath: './data.xlsx',
  client,
  llmConfig
});

Query Routing

selectRetrievalStrategy(question: string, context: SchemaContext): Promise<RetrievalStrategy>

Analyzes a question and selects the optimal retrieval strategy (DSL, vector, hybrid-filter, or hybrid-fusion).

Parameters
Name | Type | Description
question | string | Natural language question to answer
context | SchemaContext | Available schemas and DSL specification
Returns
RetrievalStrategy - Contains mode, dslQuery (if applicable), and semanticQuery (if applicable)
Examples
const strategy = await selectRetrievalStrategy(
  "Which chalet has 5 bedrooms?",
  schemaContext
);
// Returns: { mode: 'dsl', dslQuery: {...} }

Vector Search

vectorSearch(client: PoolClient, options: VectorSearchOptions): Promise

Performs cosine similarity search using pgvector. Returns results ranked by semantic similarity.

Parameters
Name | Type | Description
client | PoolClient | PostgreSQL client connection
options | VectorSearchOptions | Search options including queryVector, accountId, filters, and limit
Returns
Promise resolving to results with similarity scores and ranks
Examples
const results = await vectorSearch(client, {
  accountId: 'user-123',
  queryVector: embeddingVector,
  limit: 20,
  schemaIds: ['accommodation']
});
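
Under the hood, a pgvector cosine query looks roughly like the sketch below. This is illustrative rather than the project's actual SQL; pgvector's <=> operator returns cosine distance, so 1 - distance yields similarity.

import { Pool } from 'pg';

// Illustrative sketch: rank normalized_records by cosine similarity.
async function cosineSearchSketch(
  pool: Pool,
  accountId: string,
  queryVector: number[],
  limit: number
) {
  const sql = `
    SELECT id, schema_id, kind,
           1 - (embedding <=> $1::vector) AS similarity
    FROM normalized_records
    WHERE account_id = $2
    ORDER BY embedding <=> $1::vector
    LIMIT $3
  `;
  // pgvector accepts vectors as '[v1,v2,...]' string literals.
  const vectorLiteral = `[${queryVector.join(',')}]`;
  const { rows } = await pool.query(sql, [vectorLiteral, accountId, limit]);
  return rows;
}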

Database Schema

Core Tables

Project Eden uses PostgreSQL with the pgvector extension for vector similarity search. The schema is designed for efficient querying with JSONB for flexible data storage and GIN indexes for fast lookups.

files Table

Column | Type | Description
id | UUID | Primary key, auto-generated
account_id | UUID | Tenant identifier for multi-tenancy
name | TEXT | Original filename
mime | TEXT | MIME type (e.g., 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
bytes | BYTEA | File content (optional, for binary storage)
sha256 | BYTEA | SHA-256 hash for deduplication
created_at | TIMESTAMPTZ | Creation timestamp

normalized_records Table

Column | Type | Description
id | UUID | Primary key, auto-generated
account_id | UUID | Tenant identifier
file_id | UUID | Foreign key to files.id
schema_id | TEXT | Schema identifier (e.g., 'accommodation')
kind | TEXT | Entity type (e.g., 'property', 'amenity')
search_data | JSONB | Normalized record data (indexed with GIN)
evidence | JSONB | Original source data and provenance
embedding | VECTOR(1024) | Embedding vector for semantic search
created_at | TIMESTAMPTZ | Creation timestamp

file_plans Table

Column | Type | Description
file_id | UUID | Primary key, references files.id
plan | JSONB | Processing plan (tool execution steps)
schema_v0 | JSONB | Initial schema detected by planner
schema_v1 | JSONB | Materialized execution schema
quality_grade | TEXT | Planner confidence: 'A', 'B', or 'C'
created_at | TIMESTAMPTZ | Creation timestamp

Indexes

Efficient indexing is crucial for performance. Project Eden uses several index types optimized for different query patterns.

Index Definitions

-- GIN index for JSONB containment queries (DSL queries)
CREATE INDEX nr_gin ON normalized_records USING GIN (search_data jsonb_path_ops);

-- Composite indexes for filtering
CREATE INDEX nr_file_kind ON normalized_records(file_id, kind);
CREATE INDEX nr_account_kind ON normalized_records(account_id, kind);

-- IVFFlat index for vector similarity search
CREATE INDEX nr_vec ON normalized_records USING ivfflat (embedding);

-- Unique index for file deduplication
CREATE UNIQUE INDEX files_account_hash_idx ON files(account_id, sha256);

Extensions

Required PostgreSQL extensions must be installed before running migrations.

-- Required extensions
CREATE EXTENSION IF NOT EXISTS pgcrypto;      -- For UUID generation
CREATE EXTENSION IF NOT EXISTS vector;        -- For vector similarity search
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch; -- For fuzzy string matching

DSL Query Language

Overview

The Domain-Specific Language (DSL) provides a type-safe, composable query language for filtering JSONB records in PostgreSQL. It compiles to optimized SQL using JSONB operators and GIN indexes.
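
To make the compilation concrete, here is a minimal, hypothetical sketch of how an eq() filter on a string field could become a containment query (the real compiler lives in src/dsl/compile.ts and covers the full operator set):

// Hypothetical sketch: compile eq(string(path), value) to a JSONB
// containment predicate that the GIN jsonb_path_ops index can serve.
function compileEqSketch(path: string, value: string): { sql: string; param: string } {
  // Build a nested object from the dotted path, e.g.
  // 'property.country' + 'France' → {"property":{"country":"France"}}
  const nested = path
    .split('.')
    .reverse()
    .reduce<unknown>((acc, key) => ({ [key]: acc }), value);
  return {
    sql: 'SELECT * FROM normalized_records WHERE search_data @> $1::jsonb',
    param: JSON.stringify(nested),
  };
}

// compileEqSketch('property.country', 'France').param
// → '{"property":{"country":"France"}}'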

Type Accessors

All field references must be wrapped in a type accessor that matches the schema definition. This enables schema validation and proper SQL casting.

Type Accessors

Accessor | Schema Types | Example
string("path") | string, datetime, date, time | string("property.name")
number("path") | number | number("property.size_m2")
boolean("path") | boolean | boolean("property.has_wifi")
json("path") | json | json("property.amenities")

Comparators

Equality & Existence

eq(accessor, value)

Exact match. Matches NULL or missing fields when value is null.

Examples
eq(string("property.country"), "France")
eq(number("property.bedrooms"), 4)
eq(string("property.name"), null)

Numeric Comparisons

gt(accessor, value)

Greater than

Examples
gt(number("property.size_m2"), 100)

Set Membership

in(accessor, array(values...))

Value in list. Uses PostgreSQL ANY() for efficient comparison.

Examples
in(string("property.country"), array("France", "Switzerland", "Austria"))
in(number("property.bedrooms"), array(4, 5, 6))

String Pattern Matching

contains(accessor, substring)

Case-insensitive substring match. Compiles to SQL ILIKE '%substring%'.

Examples
contains(string("property.name"), "luxury")
contains(string("room_amenity.question"), "cleaning")

Logical Operators

Combine filters with logical operators for complex queries.

Logical Operators

and(...filters)   # All must match
or(...filters)    # At least one must match
not(filter)       # Negation

# Example: Large chalets in Three Valleys
and(
  eq(string("property.building_type"), "Chalet"),
  eq(string("property.skiarea"), "Three Valleys"),
  ge(number("property.size_m2"), 200)
)

# Example: Complex nested logic
and(
  or(
    eq(string("property.building_type"), "Chalet"),
    eq(string("property.building_type"), "Apartment")
  ),
  ge(number("property.size_m2"), 200),
  not(exists(string("property.shared_facilities")))
)

Sorting and Limiting

Control result ordering and count with sort() and limit() functions.

Sort and Limit

# Sort by field (ascending or descending)
sort(filter, "property.size_m2", "desc")

# Limit results (default: 5, max: 20)
limit(filter, 10)

# Combine: Top 3 largest properties
limit(
  sort(
    gt(number("property.size_m2"), 0),
    "property.size_m2",
    "desc"
  ),
  3
)

Schema Validation

All queries are validated against schema_v1 before execution:

  1. Field existence: Referenced paths must exist in schema
  2. Type compatibility: Type accessor must match schema type
  3. Early failure: Errors caught before database query
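
A minimal sketch of these checks, assuming schema_v1 has been flattened into a map from field paths to types (the real validator is src/dsl/validator.ts, and string() also accepts date/time fields per the accessor table):

// Illustrative validation: reject unknown paths and mismatched accessors
// before any SQL is generated.
type AccessorType = 'string' | 'number' | 'boolean' | 'json';

function validateAccessorSketch(
  schemaFields: Record<string, AccessorType>,
  path: string,
  accessor: AccessorType
): void {
  const fieldType = schemaFields[path];
  // 1. Field existence
  if (fieldType === undefined) {
    throw new Error(`Unknown field "${path}": not present in schema_v1`);
  }
  // 2. Type compatibility
  if (fieldType !== accessor) {
    throw new Error(
      `Type mismatch on "${path}": schema says ${fieldType}, accessor is ${accessor}`
    );
  }
  // 3. Early failure: throwing here means the query never reaches the database.
}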

Performance Optimization

Different operators have different performance characteristics:

Optimized Operators

# ✅ Fast: Uses GIN index
eq(string("property.country"), "France")
exists(string("property.sauna"))
in(string("country"), array("FR", "CH"))

# ⚠️ Slower: Requires JSONB extraction and casting
gt(number("property.size"), 100)  # Sequential scan

# 💡 Optimization tip: Filter first with indexed operators

# ✅ Good: Filter by country first (GIN), then scan for size
and(
  eq(string("property.country"), "France"),  # GIN index
  ge(number("property.size_m2"), 200)        # Sequential scan on subset
)

# ❌ Slower: Size scan first (no index), then filter
and(
  ge(number("property.size_m2"), 200),       # Full table scan
  eq(string("property.country"), "France")
)

Query Routing System

Strategy Selection

The query router analyzes each question using an LLM to determine the optimal retrieval strategy. The decision is based on query characteristics, available schemas, and query complexity.

Retrieval Strategies

Strategy | When Used | Best For | Example Query
DSL | Structured constraints, numeric filters | Precise filtering on known fields | "properties with exactly 5 bedrooms in France"
Vector | Semantic/conceptual queries | Descriptions, open-ended questions | "cozy mountain retreat", "what are the sauna policies?"
Hybrid-Filter | Hard constraint + semantic refinement | MUST be in France, find luxury ones | "luxury" among "bedrooms >= 4" results
Hybrid-Fusion | Mixed structured + semantic | 6+ people with microwave and TV | "properties in Méribel with hot tub"

Decision Criteria

The router considers several factors when selecting a strategy:

Numeric Constraints

Presence of counts, measurements, or comparisons (bedrooms >= 5, price < 500) suggests DSL mode

Structural vs Semantic

Questions about specific fields or values favor DSL; conceptual questions favor vector search

Complexity

Questions combining both structured filters and semantic concepts use hybrid approaches

Schema Availability

DSL queries require matching fields in available schemas; vector search works without schema knowledge

Reciprocal Rank Fusion (RRF)

The hybrid-fusion strategy uses Reciprocal Rank Fusion to combine results from DSL and vector searches. RRF provides a robust way to merge rankings without requiring score normalization.

RRF Algorithm

# RRF Score Formula
RRF(rank) = 1 / (k + rank)

# Where:
# - k = constant (default: 60)
# - rank = position in result set (1-indexed)

# Combined Score:
# - Result appears in DSL ranking:    score_dsl = 1 / (k + rank_dsl)
# - Result appears in vector ranking: score_vec = 1 / (k + rank_vec)
# - Final score = score_dsl + score_vec

# Results are sorted by combined RRF score, highest first
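
The same algorithm in compact TypeScript (an illustrative sketch; the project's implementation lives in src/vector/hybridFusion.ts):

// Fuse two ranked ID lists with Reciprocal Rank Fusion.
function rrfFuse(dslIds: string[], vectorIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  const add = (ids: string[]) =>
    ids.forEach((id, i) => {
      // Ranks are 1-indexed, so rank = i + 1; contribution is 1 / (k + rank).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  add(dslIds);
  add(vectorIds);
  // Sort by combined RRF score, highest first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}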

Processing Pipeline

Planning Phase

The planner uses LLMs to analyze file structure and generate a deterministic processing plan. This phase is critical for understanding data without manual configuration.

Schema Detection

Identifies entities (e.g., 'accommodation', 'amenity'), their attributes, and relationships between entities.

Type Inference

Determines data types for each attribute: string, number, boolean, date, datetime, time, or json.

Header Normalization

Maps messy headers (e.g., 'Bedrooms', 'bedrooms', 'Bed Rms') to normalized field names (e.g., 'bedrooms').

Plan Generation

Creates an ordered list of tool executions (normalize_headers, infer_orientation, segment_text, etc.) with parameters.

Quality Assessment

Assigns a quality grade (A/B/C) based on confidence in schema detection and plan correctness.
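
For orientation, a planner result might look roughly like the object below. The top-level field names follow the Zod schema shown later under Type Safety; the per-tool parameters are invented for illustration.

// Hypothetical shape of a planner result (tool parameters are illustrative;
// only the top-level fields are grounded in plannerResponseSchema).
const examplePlannerResult = {
  plan: [
    { tool: 'infer_orientation', params: { tableIndex: 0 } },
    { tool: 'normalize_headers', params: { tableIndex: 0, mappings: { 'Bed Rms': 'bedrooms' } } },
  ],
  schema_v0: { /* detected entities, attributes, and inferred types */ },
  quality_hypothesis: 'A', // planner confidence: 'A', 'B', or 'C'
};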

Ingestion Phase

The executor runs the plan deterministically, transforming raw data into normalized records.

Execution Tools

normalize_headers(tableIndex: number, mappings: Record<string, string>)

Normalizes column headers according to planner mappings. Maps original headers to normalized attribute IDs.

Parameters
Name | Type | Description
tableIndex | number | Index of table in tables array
mappings | Record<string, string> | Header mappings: original header → normalized ID
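
As a hypothetical illustration, the mappings argument for the messy-header example from the Planning Phase might look like this (header names invented):

// Hypothetical mappings: original header → normalized attribute ID.
const mappings: Record<string, string> = {
  'Bed Rms': 'bedrooms',
  'Size (m²)': 'size_m2',
  'Country': 'country',
};
// The executor would then run normalize_headers as a plan step,
// e.g. normalize_headers(0, mappings).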

Persistence Phase

Normalized records are enriched with embeddings and persisted to the database.

Chunk Enrichment

LLM generates summaries and extracts evidence from raw chunks for better retrieval context

Embedding Generation

Batched embedding generation using OpenAI's text-embedding-3-small model (1024 dimensions)

Database Insertion

Batched inserts with transaction support for atomicity

Index Creation

GIN indexes for JSONB queries, IVFFlat indexes for vector similarity search
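
A simplified sketch of the batched embedding step, assuming the official openai Node client (the batch size of 50 comes from the Performance Considerations section; the function name is hypothetical):

import OpenAI from 'openai';

// Illustrative sketch: embed texts in batches to reduce API round trips.
async function embedInBatches(texts: string[], batchSize = 50): Promise<number[][]> {
  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const res = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch,
      // text-embedding-3-small supports shortened outputs; 1024 matches
      // the VECTOR(1024) column in normalized_records.
      dimensions: 1024,
    });
    vectors.push(...res.data.map((d) => d.embedding));
  }
  return vectors;
}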

CLI Commands Reference

Database Commands

Database Management

pnpm eden db:migrate

Run database migrations to set up or update the database schema.

Returns
void
Throws
Error if migrations fail or the pgvector extension is not installed

Data Processing

Processing Pipeline

pnpm eden process <file> [--output <dir>] [--force-plan] [--debug]

End-to-end processing: plan generation, ingestion, and persistence in one command.

Parameters
Name | Type | Description
file | string | Path to file (PDF, XLSX, etc.)
--output | string, optional | Output directory for artifacts
--force-plan | boolean, optional | Force regeneration of plan
--debug | boolean, optional | Show detailed progress
Examples
pnpm eden process fixtures/input/accommodations.xlsx
pnpm eden process data.pdf --debug

Query Commands

Query Execution

pnpm eden ask <question> [--file-ids <ids>] [--schema-ids <ids>] [--limit <n>] [--debug]

Ask a natural language question with automatic strategy selection. Returns natural language answer with citations.

Parameters
Name | Type | Description
question | string | Natural language question
--limit | number, optional | Maximum results (default: 20)
--debug | boolean, optional | Show strategy selection reasoning
Examples
pnpm eden ask "Which chalet can host 10 people?"
pnpm eden ask "Do you have accommodations with a jacuzzi?" --limit 5

Testing & Evaluation

Testing Commands

pnpm eden test --input <file> [--output <path>] [--limit <n>] [--debug]

Run a test suite with a list of questions. Generates comprehensive results with statistics and metrics.

Parameters
Name | Type | Description
--input | string | Path to questions JSON file (required)
--output | string, optional | Output path for results
Examples
pnpm eden test --input fixtures/test/test-questions.json

Data Management

Data Operations

pnpm eden repair [--schema <id>] [--file-id <id>] [--batch-size <n>] [--force]

Repair and regenerate embeddings for existing data. Useful after changing embedding models.

Parameters
Name | Type | Description
--schema | string, optional | Specific schema to repair
--force | boolean, optional | Force regeneration even if embeddings exist

LLM Integration

Configuration

Project Eden uses a flexible LLM configuration system that supports multiple providers and models. Configuration is defined in config/llm.json.

LLM Configuration Schema

{ "providers": { "openai": { "apiKeyEnvVar": "OPENAI_API_KEY" }, "groq": { "apiKeyEnvVar": "GROQ_API_KEY" } }, "tasks": { "planner": { "provider": "openai", "model": "gpt-4", "maxOutputTokens": 8000 }, "answer": { "provider": "openai", "model": "gpt-4", "maxOutputTokens": 4000 }, "embed": { "provider": "openai", "model": "text-embedding-3-small" } }, "pricing": { "openai": { "gpt-4": { "inputPerMillion": 30.0, "outputPerMillion": 60.0 }, "text-embedding-3-small": { "perMillion": 0.02 } } }, "temperature": 0.2 }

LLM Tasks

LLM Task Types

Task | Purpose | Model | Output
planner | Schema detection and plan generation | gpt-4 | Plan JSON with schemas
repair | Data repair and validation | gpt-4 | Repaired records
classifier | Query classification | gpt-4 | Retrieval strategy
answer | Answer synthesis | gpt-4 | Natural language answer with citations
embed | Embedding generation | text-embedding-3-small | 1024-dimension vectors

Retry & Error Handling

Project Eden implements robust retry logic for LLM API calls with exponential backoff and configurable retry policies.

Retry Policy

interface RetryPolicy {
  maxRetries: number;        // Default: 3
  initialDelayMs: number;    // Default: 1000
  maxDelayMs: number;        // Default: 10000
  backoffMultiplier: number; // Default: 2
}

// Retries on:
// - Network errors
// - Rate limit errors (429)
// - Server errors (5xx)
// - Timeout errors
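
A minimal helper that applies this policy with exponential backoff (an illustrative sketch, not the project's actual implementation; real code would also check that the error is retryable):

// Retry an async operation with exponential backoff, capped at maxDelayMs.
async function withRetry<T>(
  fn: () => Promise<T>,
  policy: RetryPolicy = {
    maxRetries: 3,
    initialDelayMs: 1000,
    maxDelayMs: 10000,
    backoffMultiplier: 2,
  }
): Promise<T> {
  let delay = policy.initialDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= policy.maxRetries) throw err; // retry budget exhausted
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * policy.backoffMultiplier, policy.maxDelayMs);
    }
  }
}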

Testing & Evaluation

Testing Framework

Project Eden includes a comprehensive testing framework for regression testing and quality assurance.

Test Input Format

Questions JSON Format

{ "questions": [ "Can you recommend a chalet for 10 people?", "Which property is best for families?", "Do any chalets offer mountain views?" ] }

Automatic Metrics

The test framework automatically computes several metrics without requiring LLM calls:

Answer Similarity

Levenshtein distance between answers for consistency tracking

Mode Consistency

Tracks which retrieval strategy was used for each question

Performance Benchmarking

Timing breakdown (planning, retrieval, answer generation) and cost tracking

Confidence Distribution

Analysis of high/medium/low confidence answer distribution
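
A sketch of the answer-similarity metric, assuming Levenshtein distance normalized by the length of the longer answer (the harness's exact normalization is not documented here):

// Classic dynamic-programming Levenshtein distance, folded into a
// similarity score in [0, 1] where 1 means identical answers.
function answerSimilarity(a: string, b: string): number {
  const m = a.length;
  const n = b.length;
  if (m === 0 && n === 0) return 1;
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr = [i];
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return 1 - prev[n] / Math.max(m, n);
}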

LLM-Based Evaluation

Optional LLM evaluation provides deeper quality assessment (at additional API cost):

Evaluation Dimensions

Dimension | Description | Scale
Correctness | Factual accuracy of the answer | 1-5
Completeness | Whether all relevant information is included | 1-5
Relevance | How well the answer addresses the question | 1-5
Citation Quality | Accuracy and helpfulness of source citations | 1-5

Comparison Reports

The compare command generates detailed comparison reports between test runs, including:

Performance Comparison

Timing deltas, cost differences, and throughput metrics

Answer Consistency

Similarity scores and mode changes between runs

Quality Analysis

LLM evaluation scores with comparative reasoning

Recommendations

Actionable suggestions for improving system performance

Implementation Details

Technology Stack

Core Technologies

Technology | Purpose | Version
TypeScript | Primary language | 5.9.2
Node.js | Runtime | 18+
PostgreSQL | Database | 14+
pgvector | Vector similarity | 0.2.1
OpenAI API | LLM & Embeddings | 5.23.0
Zod | Schema validation | 3.24.1
yargs | CLI framework | 18.0.0
pdfjs-dist | PDF parsing | 5.4.149
xlsx | Excel parsing | 0.20.3

Project Structure

Directory Layout

project-eden/
├── src/
│   ├── cli/                    # CLI commands and UI
│   │   ├── commands/           # Individual command implementations
│   │   ├── ui/                 # Terminal UI components
│   │   └── viewers/            # File viewer implementations
│   ├── db/                     # Database client and migrations
│   ├── dsl/                    # DSL parser and query engine
│   │   ├── compile.ts          # DSL to SQL compiler
│   │   ├── parseJson.ts        # JSON DSL parser
│   │   ├── stringParser.ts     # String DSL parser
│   │   └── validator.ts        # Schema validation
│   ├── executor/               # Plan execution and ingestion
│   │   ├── executePlan.ts      # Tool execution orchestrator
│   │   ├── runIngestion.ts     # Main ingestion pipeline
│   │   └── semantic/           # LLM-powered enrichment
│   ├── llm/                    # LLM client and tasks
│   │   ├── client.ts           # LLM client abstraction
│   │   ├── prompts/            # Prompt templates
│   │   ├── schemas/            # Zod schemas for LLM responses
│   │   └── tasks/              # Task creation utilities
│   ├── loader/                 # File loaders (PDF, XLSX)
│   ├── persist/                # Database persistence layer
│   │   ├── embeddings.ts       # Embedding generation
│   │   └── repositories/       # Data access layer
│   ├── planner/                # Agentic planning system
│   │   ├── runPlanner.ts       # Main planner entry point
│   │   ├── schema.ts           # Schema detection logic
│   │   └── toolCatalog.ts      # Available execution tools
│   ├── router/                 # Query routing and answer generation
│   │   ├── askPipeline.ts      # Main ask command pipeline
│   │   └── executeRetrieval.ts # Strategy execution
│   ├── tools/                  # Execution tools
│   │   ├── normalizeHeaders.ts
│   │   ├── inferOrientation.ts
│   │   ├── segmentText.ts
│   │   └── extractTable.ts
│   └── vector/                 # Vector search implementation
│       ├── vectorSearch.ts
│       ├── hybridFilter.ts
│       └── hybridFusion.ts
├── db/migrations/              # SQL migrations
├── fixtures/                   # Sample data and test files
└── out/                        # Output artifacts

Type Safety

Project Eden is built with strict TypeScript and uses Zod for runtime validation. All LLM responses are validated against Zod schemas before use.

Schema Validation Pattern

// Define Zod schema
const plannerResponseSchema = z.object({
  plan: z.array(plannerToolSchema),
  schema_v0: plannerSchemaV0,
  quality_hypothesis: z.enum(['A', 'B', 'C'])
});

// Validate LLM response
const validatedResponse = plannerResponseSchema.parse(response.data);

// Type-safe usage
type PlannerResponse = z.infer<typeof plannerResponseSchema>;

Error Handling

The system uses custom error types for better error messages and debugging:

Error Types

// DSL validation errors
class DslValidationError extends Error {
  constructor(
    message: string,
    public field?: string,
    public suggestions?: string[]
  ) {
    super(message);
  }
}

// LLM errors
class LlmError extends Error {
  constructor(
    message: string,
    public code: string,
    public retryable: boolean
  ) {
    super(message);
  }
}

// Planner errors
class PlannerError extends Error {
  constructor(
    message: string,
    public qualityGrade: 'A' | 'B' | 'C'
  ) {
    super(message);
  }
}

Performance Considerations

Key performance optimizations:

Batched Embeddings

Embeddings generated in batches (default: 50) to reduce API calls and improve throughput

GIN Indexes

JSONB containment queries use GIN indexes for fast lookups

IVFFlat Indexes

Vector similarity search uses IVFFlat indexes for approximate nearest neighbor search

Parallel Query Execution

Hybrid-fusion executes DSL and vector queries in parallel before merging

Connection Pooling

PostgreSQL connection pooling for efficient database access
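
For instance, the parallel step of hybrid-fusion reduces to a single Promise.all. In this sketch the two search thunks stand in for the real DSL and vector query paths, and rrfFuse is the sketch from the RRF section:

// Run both retrieval paths concurrently, then merge their ranked ID lists.
async function hybridFusionSketch(
  runDsl: () => Promise<string[]>,
  runVector: () => Promise<string[]>
): Promise<string[]> {
  const [dslIds, vectorIds] = await Promise.all([runDsl(), runVector()]);
  return rrfFuse(dslIds, vectorIds);
}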

Limitations & Future Work

Current Limitations

Single Language Support

Currently optimized for English. Multi-language support planned.

PostgreSQL Only

Requires PostgreSQL with pgvector. Additional vector databases planned.

OpenAI Embeddings Only

Uses OpenAI embeddings. Local embedding models (sentence-transformers) planned.

No Incremental Updates

Full re-ingestion required for updates. Delta processing planned.

Limited PDF Support

Basic PDF parsing. Advanced table extraction and complex layouts planned.

Planned Improvements

Multi-Language Support

Support for multiple languages in planning, querying, and answers

Additional Vector Databases

Support for Pinecone, Weaviate, Qdrant, and other vector databases

Local Embedding Models

Integration with sentence-transformers for local embedding generation

Streaming Answers

Stream answers as they're generated for better UX

Web Interface

Browser-based UI for querying and data management

Incremental Updates

Delta processing for updating existing data without full re-ingestion

Advanced PDF Parsing

Better table extraction, image OCR, and complex layout handling

Multi-Modal Support

Support for images, audio, and other non-text content