Data Sources, Licensing & Citations

The Rareytec platform is built on openly-licensed software and data. This page lists every external component we use, its license, and the citation it requires.

The platform integrates only open-licensed software (MIT, BSD, Apache-2.0, PSF) and openly licensed or public-domain data (CC-BY-4.0, U.S. public domain). Most of the property values served are Rareytec’s own predictions over public chemical identities. A first-class “source” field carries provenance on every fact, and any individual source can be switched off for a given delivery. Machine-learning-derived parameters are used only when their training data is itself unrestricted, because an ML output inherits the license of the data it was trained on.

An installation ships with a clean open baseline preinstalled (the CHAOS σ-profiles, the NIST/TRC ThermoML experimental data, and Rareytec’s own estimations), so it is useful on day one. Any further databases an operator is entitled to (e.g. the Dortmund Data Bank, DIPPR, Reaxys) are connected on an in-house installation and accessed live under that operator’s own license; their data is never copied into, or redistributed by, the platform.
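The “source” field and the per-delivery switch can be pictured with a minimal sketch. It is not the platform's actual schema: the Fact record, its field names, the source tags and the facts_for_delivery helper are all hypothetical, and the identity and values are placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One property value plus its provenance (hypothetical schema)."""
    identity: str   # chemical identity key, placeholder here
    prop: str       # property name
    value: float    # placeholder number, not real data
    source: str     # provenance tag carried on every fact


def facts_for_delivery(facts, disabled_sources):
    """Keep only the facts whose source is enabled for this delivery."""
    disabled = set(disabled_sources)
    return [f for f in facts if f.source not in disabled]


# Two facts for the same property, from two different (assumed) source tags.
facts = [
    Fact("EXAMPLE-KEY", "example_property", 1.0, "thermoml"),
    Fact("EXAMPLE-KEY", "example_property", 2.0, "rareytec_estimate"),
]

# Switching one source off removes its facts and leaves the rest untouched.
delivery = facts_for_delivery(facts, disabled_sources={"thermoml"})
```

Because the tag sits on each fact rather than on a table or file, switching a source off is a simple filter and needs no knowledge of where the fact is displayed.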

Software components

Component | License | Use
Python | PSF License | Runtime language
RDKit | BSD-3-Clause | Cheminformatics: structure parsing, InChIKey, formula, 2D depiction
ugropy | MIT | UNIFAC / Dortmund group fragmentation from SMILES (ILP set-cover assignment)
thermo / chemicals / fluids | MIT | Build-time verified source for published UNIFAC parameters and independent validation oracle (Caleb Bell)
NumPy | BSD-3-Clause | Numerics
Flask / Jinja2 / Werkzeug | BSD-3-Clause | Web framework + templating + auth hashing
Bootstrap | MIT | User-interface layout
Plotly.js | MIT | σ-profile and VLE charts
PubChemPy | MIT | Chemical-name resolution via PubChem

Component & property data

CHAOS database
CC-BY-4.0 — included with attribution

σ-profiles and σ-surfaces (COSMO-SAC cavity segments), dipole moments and molecular descriptors for the 53,078-component registry. Bundled (preinstalled) with the platform.

Cite: Jirasek et al., “CHAOS — A Consistent Large-scale Database for Sigma-Profiles and Other Molecular Descriptors,” arXiv:2511.19002; Zenodo doi:10.5281/zenodo.17691924. Licensed under CC-BY-4.0 (creativecommons.org/licenses/by/4.0/). Modified by Rareytec: indexed and reformatted into the component registry, with σ-profiles and cavity segments extracted for property prediction and 3-D visualisation.

NIST / TRC ThermoML Archive
U.S. public domain (NIST) — included with attribution

Experimental thermophysical data (VLE, density, viscosity, activity coefficients, excess properties, …) shown alongside predictions and used for validation. Bundled (preinstalled) with the platform; every data set keeps its original literature citation and DOI.

Cite: Data from the NIST/TRC ThermoML Archive (doi:10.18434/mds2-2422), a work of the U.S. Government not subject to copyright in the United States and distributed by NIST for reuse. Archive/format: Chirico, Frenkel, Diky et al., “ThermoML: An XML-Based Approach for Storage and Exchange of Experimental and Critically Evaluated Thermophysical and Thermochemical Property Data,” J. Chem. Eng. Data. The underlying measurements are credited to their respective authors through the per-data-set citation shown with each set.

PubChem
Public domain (U.S. NLM / NCBI)

Chemical identity — resolving names to structures.

Cite: Kim et al., “PubChem 2023 update,” Nucleic Acids Res. 2023, 51, D1373.

Model parameters

UNIFAC (original)
Open literature

Activity-coefficient model — original parameter set.

Cite: Hansen, Rasmussen, Fredenslund, Schiller, Gmehling, Ind. Eng. Chem. Res. 1991, 30, 2352–2355. Fredenslund, Jones, Prausnitz, AIChE J. 1975, 21, 1086.

Modified UNIFAC (Dortmund)
Published (free companion download)

Activity-coefficient model — published modified-UNIFAC (Dortmund) parameter matrix.

Cite: Constantinescu & Gmehling, “Further Development of Modified UNIFAC (Dortmund): Revision and Extension 6,” J. Chem. Eng. Data 2016, 61, 2738. Also in Gmehling, Kolbe, Kleiber, Rarey, “Chemical Thermodynamics for Process Simulation,” free companion site chemthermo.ddbst.com.

NIST-modified UNIFAC
NIST (published, public)

Activity-coefficient model — NIST critically-evaluated parameter set (89 main groups, 984 interactions).

Cite: Kang, Diky, Frenkel, “New modified UNIFAC parameters using critically evaluated phase equilibrium data,” Fluid Phase Equilibria 388 (2015) 128–141; doi:10.1016/j.fluid.2014.12.042.

UNIFAC 2.0
Open (free ancillary data)

Activity-coefficient model — machine-learning completion of the published original-UNIFAC parameter table.

Cite: Hayer, Wendel, Mandt, Hasse, Jirasek, “Advancing thermodynamic group-contribution methods by machine learning: UNIFAC 2.0,” Chemical Engineering Journal 504 (2025) 158667; arXiv:2408.05220.

Group fragmentation (ugropy)
MIT

SMILES → UNIFAC / Dortmund subgroup assignment.

Cite: ugropy: an optimal functional-group identification package, Ind. Eng. Chem. Res. 2025; doi:10.1021/acs.iecr.5c02552.

Importable under your own license

These sources are not bundled with the platform and are never redistributed by Rareytec. If your organisation already holds a license, they can be connected on your own installation and read live under that license; their data never enters our database or a delivered build.
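The split between bundled and live sources can be sketched as a simple build check. This is an illustration under assumptions, not the platform's implementation: SourceInfo, the bundled flag, build_manifest and check_build are hypothetical names, and the source list only repeats entries named on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceInfo:
    """Registry entry for one data source (hypothetical schema)."""
    name: str
    license: str
    bundled: bool  # True: shipped in the build; False: read live on the operator's installation


SOURCES = [
    SourceInfo("CHAOS", "CC-BY-4.0", bundled=True),
    SourceInfo("NIST/TRC ThermoML", "U.S. public domain", bundled=True),
    SourceInfo("DIPPR", "operator's own license", bundled=False),
    SourceInfo("Dortmund Data Bank", "operator's own license", bundled=False),
]


def build_manifest(sources):
    """Names of the sources whose data may be copied into a delivered build."""
    return [s.name for s in sources if s.bundled]


def check_build(source_names_in_build, sources):
    """Raise ValueError if a build contains data from a non-bundled source."""
    allowed = set(build_manifest(sources))
    leaked = sorted(set(source_names_in_build) - allowed)
    if leaked:
        raise ValueError(f"non-bundled sources found in build: {leaked}")
```

A check of this kind would run before a build is delivered, for example check_build(["CHAOS"], SOURCES) passes, while a build that lists "DIPPR" raises an error.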

NIST ThermoData Engine (TDE)
Importable — requires your own NIST TDE license

If your organisation owns a TDE license, its experimental SOURCE archive (compounds, references, pure & binary measurements) can be imported as a private data source on your installation. TDE's database is technically readable: older versions are a Microsoft Access/Jet file, and newer ones can be read via the Access database engine.

Import feasibility verified.

DIPPR (AIChE / BYU)
Importable — requires your own DIPPR license

If your organisation is entitled to DIPPR, its pure-component constants and temperature-dependent correlations can be connected as a private source under your license.

Not yet tested — import from the DIPPR distribution format has not been verified.

Dortmund Data Bank (DDB)
Importable — requires your own DDB / DDBST license

If your organisation holds a DDB license, its experimental phase-equilibrium, excess-property and pure-component data can be connected as a private source under your license — read live on your installation and shown alongside our predictions.

Not yet tested — import from the DDB distribution format has not been verified, and connection is subject to the terms of your DDB license.

Rareytec’s own contribution

  • Rarey–Nannoolal group-contribution estimations (normal boiling temperature, vapor pressure, critical properties) — our own scientific work. Cite: Nannoolal, Rarey, Ramjugernath, Cordes, Fluid Phase Equilibria 226 (2004) 45; and subsequent papers.
  • COSMO-SAC mixture predictions computed from CHAOS σ-profiles.
  • All platform predictions and the curated registry schema are Rareytec’s own work.

Sources we deliberately do not use

Dortmund Data Bank (DDB), DETHERM, DIPPR, Reaxys

Proprietary, paywalled commercial databases whose licenses restrict USE, not merely redistribution. To keep the platform’s data provenance clean and unencumbered, Rareytec itself does not use them for any purpose, including internal development or validation. Where an operator is entitled to one of these databases, its data is accessed only on that operator’s installation under the operator’s own license (see “Importable under your own license” above) and is never stored in our database.

Modified UNIFAC (Dortmund) — consortium-only complete matrix

We use the PUBLISHED modified-UNIFAC (Dortmund) parameter table (J. Chem. Eng. Data 2016 revision; free companion download of the Gmehling et al. textbook); see Model parameters above. We do NOT use the consortium-only complete/current commercial matrix that DDBST distributes to its members, nor the Dortmund Data Bank experimental databank.

Last reviewed 2026-06-16. If you believe a source is mis-attributed, contact Rareytec.