Formulation Data Engine
The documentation and compendial-lookup layer that sits beneath every Module 3 formulation narrative, tech transfer BOM, and finished product specification table. It crosswalks excipients to USP-NF / Ph. Eur. / JP monographs, authors release and shelf-life specs, and assembles the Compendial References section.
Short answer. The Formulation Data Engine authors and audits the documentation that backs a formulation — Bill of Materials, compendial references, specification tables — using current USP-NF, Ph. Eur., JP, BP, ICH Q6A, and 21 CFR references. It is not a formulation discovery tool; it documents what the sponsor has developed.
What it produces
- Bill of Materials: one row per component, with columns for component, INN, grade, function, quantity per unit, quantity per batch, % w/w, overages with justification, compendial monograph, example suppliers, and storage.
- Compendial References section: 8+ row table covering every applicable monograph. Columns: Article / Compendium / Monograph or Chapter / Revision in force / Applied to (DS / DP / Excipient / Method). Non-compendial components are explicitly marked "no compendial monograph" rather than filled with placeholder text.
- Finished Product Specifications: 10+ rows with release and shelf-life limits side by side. Covers appearance, identification, assay, impurities and degradation products, dissolution (solid oral), content uniformity, water content (where applicable), microbial limits per USP <61> and <62>, particulate matter for injectables, and container-closure integrity where applicable. Every cell is a concrete acceptance criterion with a citation: a numeric limit for quantitative attributes, a defined description or conformance statement for qualitative ones such as appearance and identification.
- Excipient compatibility: flags known incompatibilities (e.g., excipients carrying aldehyde groups or aldehyde impurities with secondary-amine APIs, metal stearates with weak-acid actives).
- ICH Q3A/B/C/D impurity crosswalks: thresholds matched to dose and route.
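As an illustration of what "thresholds matched to dose" means, the sketch below looks up the ICH Q3A(R2) drug-substance thresholds from the maximum daily dose. It is a minimal example, not the engine's implementation: the names `Q3AThresholds` and `q3a_thresholds` are hypothetical, and it covers Q3A only (Q3B drug-product thresholds use finer dose bands, and Q3C/Q3D work from permitted daily exposures rather than percentages).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Q3AThresholds:
    """Thresholds expressed as % of the drug substance (hypothetical container)."""
    reporting_pct: float
    identification_pct: float
    qualification_pct: float


def q3a_thresholds(max_daily_dose_g: float) -> Q3AThresholds:
    """Return ICH Q3A(R2) thresholds for a given maximum daily dose in grams.

    <= 2 g/day: reporting 0.05%; identification 0.10% or 1.0 mg/day intake,
                whichever is lower; qualification 0.15% or 1.0 mg/day intake,
                whichever is lower.
    >  2 g/day: reporting 0.03%; identification 0.05%; qualification 0.05%.
    """
    if max_daily_dose_g <= 0:
        raise ValueError("maximum daily dose must be positive")

    if max_daily_dose_g > 2.0:
        return Q3AThresholds(0.03, 0.05, 0.05)

    # Express the 1.0 mg/day total-daily-intake cap as a percentage of the dose.
    dose_mg = max_daily_dose_g * 1000.0
    intake_cap_pct = 1.0 / dose_mg * 100.0

    return Q3AThresholds(
        reporting_pct=0.05,
        identification_pct=min(0.10, intake_cap_pct),
        qualification_pct=min(0.15, intake_cap_pct),
    )


if __name__ == "__main__":
    for dose in (0.5, 1.5, 3.0):
        print(dose, q3a_thresholds(dose))
```

At a 0.5 g/day dose the percentage limits govern; at 1.5 g/day the 1.0 mg/day intake cap is lower than both percentage limits, so it sets the identification and qualification thresholds.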
What it does NOT do
- It does not run DoE, solubility screens, or stability chambers. It documents what the bench has produced.
- It does not invent excipient suppliers it cannot cite. The "Example Suppliers" column lists well-known GMP-grade vendors for the compendial article, not fabricated vendor names.
- It does not predict dissolution or shelf-life without real uploaded stability data.
Reference basis
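- USP-NF (current revision) — General Chapters <11>, <61>, <62>, <85>, <232>, <233>, <467>, <711>, <724>, <788>, <790>, <1224>, <1225>, <1226>. (Chapter <231> Heavy Metals was removed from USP-NF in 2018; elemental impurities are now covered by <232> and <233>.)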
- European Pharmacopoeia (Ph. Eur.) — 2.2, 2.6, 2.9 general methods.
- Japanese Pharmacopoeia (JP) — monographs and general tests.
- British Pharmacopoeia (BP).
- ICH Q6A (specifications), Q3A(R2) (impurities in new drug substances), Q3B(R2) (impurities in new drug products), Q3C(R9) (residual solvents), Q3D(R2) (elemental impurities), Q1A(R2) through Q1E (stability).
- 21 CFR 210/211 GMP; WHO TRS 986 Annex 2 (good manufacturing practices for pharmaceutical products).
Frequently asked questions
What does the Formulation Data Engine do?
It authors BOM tables, matches excipients to compendial monographs (USP-NF, Ph. Eur., JP, BP), generates release and shelf-life specification tables, and assembles the compendial-references section for Module 3 and tech transfer packages.
Which compendia are covered?
USP-NF, Ph. Eur., JP, BP, plus harmonized ICH Q6A specification attributes and 21 CFR 210/211 GMP references.
Does it formulate a new drug product?
No. It is a documentation and compendial-lookup engine, not a discovery tool. For upstream formulation design see the Drug Design Lab — and note that Predict & Optimize routines require real uploaded bench data.
What is the BOM structure?
Component, INN, Grade, Function, Qty per unit, Qty per batch, % w/w, Overages, Compendial Monograph, Example Suppliers, Storage. Minimum 6 rows for simple dosage forms; more for complex products. Every quantity is a concrete number.
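The column list above can be pictured as a simple record type with a few consistency checks. This is a sketch under stated assumptions, not the engine's schema: `BomRow`, `check_bom`, the unit choices (mg per unit, kg per batch), and the rule that % w/w sums to 100 within a tolerance are all hypothetical. Only the column names, the 6-row minimum, and the "no compendial monograph" marker come from this page.

```python
from dataclasses import dataclass
from typing import List

NO_MONOGRAPH = "no compendial monograph"  # marker used for non-compendial items


@dataclass
class BomRow:
    component: str
    inn: str
    grade: str
    function: str
    qty_per_unit_mg: float      # unit is an assumption for this sketch
    qty_per_batch_kg: float     # unit is an assumption for this sketch
    pct_w_w: float
    overage_pct: float
    compendial_monograph: str   # a citation, or NO_MONOGRAPH
    example_suppliers: List[str]
    storage: str


def check_bom(rows: List[BomRow], min_rows: int = 6, tol: float = 0.01) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []

    if len(rows) < min_rows:
        problems.append(f"BOM has {len(rows)} rows; expected at least {min_rows}")

    # Assumed rule: the listed components account for the whole unit mass.
    total = sum(r.pct_w_w for r in rows)
    if abs(total - 100.0) > tol:
        problems.append(f"% w/w sums to {total:.2f}, not 100")

    for r in rows:
        if not r.compendial_monograph.strip():
            problems.append(
                f"{r.component}: monograph cell is empty; "
                f"cite a monograph or write '{NO_MONOGRAPH}'"
            )
        if r.qty_per_unit_mg <= 0 or r.qty_per_batch_kg <= 0:
            problems.append(f"{r.component}: quantities must be positive numbers")

    return problems
```

Returning a list of findings rather than raising on the first one lets a reviewer see every gap in a draft BOM in a single pass.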