INCIDB Technical Documentation

A guide to INCIDB's data architecture, regulatory integrations, and analytical workflows with PyArrow and DuckDB in Python.

1. Production Scale Metrics

INCIDB unifies high-precision chemical composition data across four core tables:

2. 8-Source Regulatory & Clinical Architecture

Every ingredient profile is cross-referenced against 8 international regulatory monographs:

3. Audit & Verification Results

All database releases pass rigorous 1-to-1 physical parity and string sanitization audits:

4. Quick Start — Python PyArrow & Pandas

Load the Parquet exports into memory and filter ingredients on their toxicological flags:

import pyarrow.parquet as pq
import pandas as pd

# Load formulations and canonical ingredients
products = pq.read_table('data/exports/parquet/products.parquet').to_pandas()
ingredients = pq.read_table('data/exports/parquet/ingredients.parquet').to_pandas()

# Keep ingredients flagged as a cancer hazard or an endocrine hazard (either flag set to 1)
high_risk = ingredients[(ingredients['cancer_hazard_flag'] == 1) | (ingredients['endocrine_hazard_flag'] == 1)]
print(f"Flagged {len(high_risk)} toxicological hazard compounds across {len(products):,} formulations.")

5. Quick Start — DuckDB In-Memory SQL

Run analytical joins directly over the Parquet files, with no database server required:

import duckdb

query = """
SELECT 
    b.name AS brand,
    p.name AS product,
    COUNT(pi.ingredient_id) AS total_ingredients,
    SUM(i.is_common_allergen) AS mocra_allergens
FROM 'data/exports/parquet/products.parquet' p
JOIN 'data/exports/parquet/brands.parquet' b ON p.brand_id = b.brand_id
JOIN 'data/exports/parquet/product_ingredients.parquet' pi ON p.product_id = pi.product_id
JOIN 'data/exports/parquet/ingredients.parquet' i ON pi.ingredient_id = i.ingredient_id
GROUP BY b.name, p.product_id, p.name
HAVING SUM(i.is_common_allergen) > 0
ORDER BY mocra_allergens DESC
LIMIT 5;
"""

print(duckdb.query(query).to_df())