Data Dictionary

Relational Schema Specification

INCIDB ships as a normalized relational schema of four tables, exported both as pipe-delimited (`|`) UTF-8 CSV files and as columnar Apache Parquet files written with `pyarrow`.
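A minimal sketch of loading one of the CSV exports with Python's standard `csv` module. It assumes the first row holds the field names listed in the tables below and that any value containing a `|` is wrapped in double quotes (standard CSV quoting); neither detail is stated above. The helper name `read_pipe_csv` is hypothetical. The Parquet files can instead be opened with `pyarrow.parquet.read_table`.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_pipe_csv(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a pipe-delimited CSV export into a list of row dicts.

    Assumes a header row with the documented field names and standard
    double-quote quoting for values that contain '|'. Every value comes
    back as a string; cast fields according to the data dictionary.
    """
    reader = csv.DictReader(handle, delimiter="|")
    return list(reader)


# Usage with an in-memory sample; for a real export, open the file with
# open("brands.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "brand_id|name|country_of_origin|is_cruelty_free|is_vegan\n"
    "1|Laneige|South Korea|0|0\n"
)
rows = read_pipe_csv(sample)
print(rows[0]["name"])
```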

1. brands.csv / brands.parquet

Contains canonical identities and ethical-certification flags for 5,994 cosmetic brands worldwide.

| Field Name | Data Type | Description | Sample Value |
|---|---|---|---|
| brand_id | INTEGER | Primary key (unique identifier) | 1 |
| name | STRING | Standardized commercial brand name | Laneige |
| country_of_origin | STRING | Country of formulation or headquarters | South Korea |
| is_cruelty_free | INTEGER | Cruelty-free certification (1 = Yes, 0 = No) | 0 |
| is_vegan | INTEGER | Entire line is 100% vegan (1 = Yes, 0 = No) | 0 |

2. ingredients.csv / ingredients.parquet

Contains 57,181 canonical chemical compounds enriched with toxicology, allergen, and regulatory data from 8 international bodies.

| Field Name | Data Type | Description | Sample Value |
|---|---|---|---|
| ingredient_id | INTEGER | Primary key (unique identifier) | 1 |
| inci_name | STRING | International Nomenclature of Cosmetic Ingredients (INCI) name | DIISOSTEARYL MALATE |
| cas_number | STRING | Chemical Abstracts Service (CAS) Registry Number | 50-81-7 |
| common_name | STRING | Common compound name in plain English | Diisostearyl Malate |
| primary_function | STRING | Functional category (CosIng taxonomy) | Skin-Conditioning Agent |
| comedogenic_rating | FLOAT | Pore-clogging likelihood on a 0.0 to 5.0 scale | 0.0 |
| ewg_hazard_score | FLOAT | EWG Skin Deep hazard score (1.0 = low to 10.0 = high) | 1.0 |
| is_common_allergen | INTEGER | Contact-dermatitis allergen under FDA MoCRA (1 = Yes, 0 = No) | 0 |
| is_fungal_acne_trigger | INTEGER | Feeds Malassezia yeast, triggering Malassezia folliculitis ("fungal acne") (1 = Yes, 0 = No) | 0 |
| description | STRING | Summary of the scientific or regulatory monograph | Standard cosmetic emollient... |
| cir_safety_verdict | STRING | Cosmetic Ingredient Review (CIR) scientific safety verdict | Safe as used |
| fda_warning | STRING | US FDA OTC monograph warning text | No warning required |
| cancer_hazard_flag | INTEGER | Carcinogenic hazard alert (1 = Flagged, 0 = Clean) | 0 |
| endocrine_hazard_flag | INTEGER | Endocrine-disruptor alert (1 = Flagged, 0 = Clean) | 0 |
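A minimal validation sketch for ingredient records. `is_valid_cas` applies the standard CAS Registry Number check-digit rule, and `ingredient_problems` checks the score ranges stated in the table above. Both helper names are hypothetical, and treating an empty `cas_number` as "missing" rather than "invalid" is an assumption, since the dictionary does not say whether every ingredient has one.

```python
import re
from typing import List

# CAS RN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CAS_PATTERN = re.compile(r"^([0-9]{2,7})-([0-9]{2})-([0-9])$")


def is_valid_cas(cas: str) -> bool:
    """Check the CAS Registry Number format and its check digit.

    The check digit equals the sum of the other digits, each multiplied
    by its position counted from the right (starting at 1), modulo 10.
    """
    match = _CAS_PATTERN.match(cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    total = sum(int(d) * pos for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def ingredient_problems(
    cas_number: str, comedogenic_rating: float, ewg_hazard_score: float
) -> List[str]:
    """Return human-readable problems for one ingredient record (empty if none)."""
    problems = []
    # Assumption: an empty cas_number means "not assigned", not an error.
    if cas_number and not is_valid_cas(cas_number):
        problems.append(f"bad CAS number: {cas_number}")
    if not 0.0 <= comedogenic_rating <= 5.0:
        problems.append(f"comedogenic_rating out of range: {comedogenic_rating}")
    if not 1.0 <= ewg_hazard_score <= 10.0:
        problems.append(f"ewg_hazard_score out of range: {ewg_hazard_score}")
    return problems


print(is_valid_cas("7732-18-5"))  # CAS number for water
print(ingredient_problems("50-81-7", 0.0, 1.0))
```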

3. products.csv / products.parquet

Contains 19,847 commercial cosmetic formulations harvested from prestige retailers, K-Beauty standards, and clinical registries.

| Field Name | Data Type | Description | Sample Value |
|---|---|---|---|
| product_id | INTEGER | Primary key (unique identifier) | 67455 |
| brand_id | INTEGER | Foreign key referencing brands.brand_id | 46067 |
| barcode_ean | STRING | Global Trade Item Number (GTIN / EAN-13) barcode | P309308 |
| name | STRING | Full commercial product title | Good Genes Lactic Acid Treatment |
| category | STRING | Primary category in the skincare taxonomy | Skincare |
| retail_price_usd | FLOAT | Retail price in US dollars | 85.0 |
| raw_ingredient_text | STRING | Complete, unparsed ingredient list as printed on the package label | Botanical Blend [Water/Eau/Aqua... |
| created_at | STRING | Ingestion timestamp (ISO 8601 date and time, space-separated) | 2026-07-02 14:01:48 |
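A minimal sketch for sanity-checking two `products` fields, assuming barcodes follow the standard GS1 check-digit rule and that `created_at` always has the `YYYY-MM-DD HH:MM:SS` shape shown in the sample. The helpers `is_valid_gtin` and `parse_created_at` are hypothetical. The sample value `P309308` is not a numeric GTIN, so a check like this would reject it, which may help separate true barcodes from other product identifiers.

```python
from datetime import datetime

_DIGITS = set("0123456789")


def is_valid_gtin(code: str) -> bool:
    """Validate a GTIN-8/12/13/14 string (EAN-13 included) by its check digit.

    Starting from the digit just left of the check digit and moving left,
    weights alternate 3, 1, 3, 1, ...; the check digit brings the weighted
    sum up to the next multiple of 10.
    """
    if len(code) not in (8, 12, 13, 14) or not set(code) <= _DIGITS:
        return False
    payload, check = code[:-1], int(code[-1])
    total = sum(
        int(d) * (3 if i % 2 == 0 else 1)
        for i, d in enumerate(reversed(payload))
    )
    return (10 - total % 10) % 10 == check


def parse_created_at(value: str) -> datetime:
    """Parse a space-separated ISO 8601 timestamp such as '2026-07-02 14:01:48'."""
    # datetime.fromisoformat accepts a space as the date/time separator.
    return datetime.fromisoformat(value)


print(is_valid_gtin("4006381333931"))  # a valid EAN-13
print(is_valid_gtin("P309308"))
print(parse_created_at("2026-07-02 14:01:48"))
```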

4. product_ingredients.csv / product_ingredients.parquet

Relational junction table containing 330,088 product-to-ingredient mappings, preserving each ingredient's exact label position and, where disclosed, its concentration.

| Field Name | Data Type | Description | Sample Value |
|---|---|---|---|
| product_id | INTEGER | Foreign key referencing products.product_id | 67455 |
| ingredient_id | INTEGER | Foreign key referencing ingredients.ingredient_id | 12404 |
| position_index | INTEGER | Exact 1-based position on the label (1 = first listed, highest concentration) | 1 |
| concentration_percentage | FLOAT | Explicitly disclosed concentration, in percent (only when disclosed) | 10.0 |
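Because `position_index` records label order, each product's ingredient list can be rebuilt by grouping junction rows on `product_id` and sorting by position. The sketch below works on plain tuples and a dict mapping `ingredient_id` to `inci_name`; the function name `label_order` and the toy IDs and names in the usage lines are hypothetical, not taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One junction row: (product_id, ingredient_id, position_index).
JunctionRow = Tuple[int, int, int]


def label_order(
    rows: Iterable[JunctionRow], inci_names: Dict[int, str]
) -> Dict[int, List[str]]:
    """Rebuild each product's ingredient list in label order.

    Groups rows by product_id, sorts by position_index (1 = first on the
    label), and maps each ingredient_id to its inci_name.
    """
    by_product: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for product_id, ingredient_id, position in rows:
        by_product[product_id].append((position, ingredient_id))
    return {
        product_id: [inci_names[ing] for _, ing in sorted(entries)]
        for product_id, entries in by_product.items()
    }


# Toy data for illustration only.
toy_rows = [(10, 2, 2), (10, 1, 1), (11, 3, 1)]
toy_names = {1: "AQUA", 2: "GLYCERIN", 3: "DIISOSTEARYL MALATE"}
print(label_order(toy_rows, toy_names))
```

A missing `ingredient_id` in the lookup raises `KeyError`, which doubles as a simple foreign-key integrity check.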