Languages
The language dataset adds three things that were missing from the core country data:
- A full base-language catalog with canonical codes such as en, fr, zh, gsw, and zza
- Locale variants such as en-GB, es-419, sr-Latn, and ca-ES-valencia
- Estimated speaker counts plus official-language country mappings so projects can rank, filter, and group languages without maintaining their own lists
Import
import {
  languages,
  languageVariants,
  allLanguages,
  officialLanguagesByCountry,
  canonicalizeLanguageCode,
  getLanguageByCode,
  getLanguageName,
  getLanguageVariants,
  searchLanguages,
  getLanguagesBySpeakerCount,
  getOfficialLanguagesByCountry,
  getOfficialLanguageCountries,
} from "arevdata";
import type {
  Language,
  LanguageNameLocale,
  LanguageOfficialCountry,
  LanguageOfficialStatus,
} from "arevdata";
Data shape
type LanguageNameLocale =
  | "en" | "ar" | "de" | "es" | "fr"
  | "hi" | "it" | "ja" | "ko" | "nl"
  | "pl" | "pt" | "ru" | "tr" | "zh";
type LanguageOfficialStatus =
  | "official"
  | "de_facto_official"
  | "official_regional";
interface LanguageOfficialCountry {
  countryCode: string; // ISO 3166-1 alpha-2
  officialStatus: LanguageOfficialStatus;
  populationPercent: number; // approximate share of speakers in that country
}
interface Language {
  code: string; // "en", "en-GB", "sr-Latn", "ca-ES-valencia"
  baseCode: string; // "en", "en", "sr", "ca"
  type: "language" | "variant";
  name: string; // English display name
  estimatedSpeakers: number; // CLDR-derived estimate, useful for ranking/filtering
  officialCountries: LanguageOfficialCountry[];
  script?: string;
  region?: string;
  variants?: string[];
}
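The baseCode field ties each variant back to its base language, which makes grouping straightforward. The following is a minimal sketch, assuming entries follow the Language shape above. The LanguageEntry type, the sample entries (including their display names), and the groupVariantsByBase helper are hypothetical and not part of the package.

```typescript
// Local copy of the relevant fields from the Language shape, so the sketch runs standalone.
interface LanguageEntry {
  code: string;
  baseCode: string;
  type: "language" | "variant";
  name: string;
}

// Hypothetical sample entries; in a real project these would come from allLanguages.
const sampleEntries: LanguageEntry[] = [
  { code: "en", baseCode: "en", type: "language", name: "English" },
  { code: "en-GB", baseCode: "en", type: "variant", name: "British English" },
  { code: "sr", baseCode: "sr", type: "language", name: "Serbian" },
  { code: "sr-Latn", baseCode: "sr", type: "variant", name: "Serbian (Latin)" },
];

// Hypothetical helper: collect variant codes under the base code they belong to.
function groupVariantsByBase(entries: LanguageEntry[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const entry of entries) {
    if (entry.type !== "variant") continue; // base languages are not listed as their own variants
    const codes = groups.get(entry.baseCode) ?? [];
    codes.push(entry.code);
    groups.set(entry.baseCode, codes);
  }
  return groups;
}

const variantGroups = groupVariantsByBase(sampleEntries);
```

A Map keyed by baseCode keeps insertion order, so the groups appear in the same order as the base languages in the input.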
Exports
| Export | Type | Description |
|---|---|---|
| languages | Language[] | Base-language catalog |
| languageVariants | Language[] | Locale variants / regional or script-specific forms |
| allLanguages | Language[] | Combined catalog |
| officialLanguagesByCountry | Record<string, Language[]> | Base-language official-language list keyed by country code |
Examples
Build a language selector
import { languages } from "arevdata";
const options = languages
  .filter((language) => language.estimatedSpeakers >= 1_000_000)
  .map((language) => ({
    value: language.code,
    label: language.name,
  }));
Render labels in the user’s UI language
import { getLanguageName } from "arevdata";
getLanguageName("en", "de"); // "Englisch"
getLanguageName("en-GB", "fr"); // "anglais britannique"
getLanguageName("sr-Latn", "ja"); // "セルビア語 (ラテン文字)"
Resolve aliases and normalize stored values
import { canonicalizeLanguageCode } from "arevdata";
canonicalizeLanguageCode("iw"); // "he"
canonicalizeLanguageCode("EN_gb"); // "en-GB"
canonicalizeLanguageCode("sh"); // "sr-Latn"
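Part of what the examples above show is purely mechanical: underscores become hyphens and each subtag gets its conventional casing. The sketch below illustrates only that step, following the usual BCP 47 casing conventions. The normalizeTagCasing helper is hypothetical and is not the package's implementation. It does not resolve aliases such as iw, which would need a lookup table, and it does not handle extension subtags.

```typescript
// Hypothetical helper: normalizes separators and subtag casing only (no alias resolution).
function normalizeTagCasing(input: string): string {
  const parts = input.trim().replace(/_/g, "-").split("-");
  return parts
    .map((part, index) => {
      // First subtag is the language: lowercase.
      if (index === 0) return part.toLowerCase();
      // Four letters is a script subtag: Title case.
      if (/^[A-Za-z]{4}$/.test(part)) {
        return part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
      }
      // Two letters is a region subtag: uppercase.
      if (/^[A-Za-z]{2}$/.test(part)) return part.toUpperCase();
      // Numeric regions (such as 419) and variant subtags: lowercase.
      return part.toLowerCase();
    })
    .join("-");
}

const normalizedExamples = ["EN_gb", "SR-latn", "ca-es-VALENCIA", "es-419"].map(normalizeTagCasing);
```

Stored values can be passed through a step like this before comparison, so that differently cased copies of the same tag compare equal.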
Show all locale variants of a base language
import { getLanguageVariants } from "arevdata";
const englishVariants = getLanguageVariants("en");
englishVariants.slice(0, 5).map((language) => language.code);
// ["en-AU", "en-GB", "en-CA", ...]
Filter out obscure languages
import { getLanguagesBySpeakerCount } from "arevdata";
const majorLanguages = getLanguagesBySpeakerCount(10_000_000);
Search by translated label or code
import { searchLanguages } from "arevdata";
searchLanguages("anglais", { locale: "fr" });
searchLanguages("british english", { includeVariants: true });
searchLanguages("spanish", { minSpeakers: 50_000_000 });
Map countries to official languages
import { getOfficialLanguagesByCountry } from "arevdata";
getOfficialLanguagesByCountry("BE").map((language) => language.name);
// ["Dutch", "French", "German"]
Map a language back to countries where it is official
import { getOfficialLanguageCountries } from "arevdata";
getOfficialLanguageCountries("ca");
// [
// { countryCode: "AD", officialStatus: "official", populationPercent: 51 },
// { countryCode: "ES", officialStatus: "official_regional", populationPercent: 19 },
// ]
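The officialStatus field lets callers separate country-wide official languages from regional ones. The sketch below reuses the values from the example output above. The OfficialCountry type is a local copy of LanguageOfficialCountry, and the countriesExcludingRegional helper is hypothetical. It assumes that official_regional marks a status limited to part of a country.

```typescript
type OfficialStatus = "official" | "de_facto_official" | "official_regional";

// Local copy of the LanguageOfficialCountry shape, so the sketch runs standalone.
interface OfficialCountry {
  countryCode: string;
  officialStatus: OfficialStatus;
  populationPercent: number;
}

// Values copied from the example output for "ca".
const catalanCountries: OfficialCountry[] = [
  { countryCode: "AD", officialStatus: "official", populationPercent: 51 },
  { countryCode: "ES", officialStatus: "official_regional", populationPercent: 19 },
];

// Hypothetical helper: country codes where the status is not regional-only.
function countriesExcludingRegional(entries: OfficialCountry[]): string[] {
  return entries
    .filter((entry) => entry.officialStatus !== "official_regional")
    .map((entry) => entry.countryCode);
}

const nationalCatalanCountries = countriesExcludingRegional(catalanCountries);
```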
Notes
- estimatedSpeakers is meant for ranking and filtering, not precise census reporting.
- Base-language estimates are aggregated from CLDR territory population percentages.
- Variant estimates are locale-scoped when that can be inferred, for example en-GB or es-419.
- officialCountries on raw language entries can include territories from CLDR.
- getOfficialLanguagesByCountry() and getOfficialLanguageCountries() are filtered to countries present in this package’s main countries dataset.
- Localized language labels live in the shared translation files, while the language data itself stays English-only.
- If a translated display name is unavailable for a very obscure code, the API falls back to the English label instead of omitting the language.
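The English fallback described in the last note can be pictured as a two-step lookup. This is a minimal sketch of that idea only. The sampleLabels and sampleEnglishNames tables and the labelWithFallback helper are hypothetical and do not reflect how the package stores or resolves its translations.

```typescript
// Hypothetical translation table: language code -> locale -> translated label.
const sampleLabels: Record<string, Record<string, string> | undefined> = {
  en: { de: "Englisch" },
};

// Hypothetical English display names, standing in for the name field on each entry.
const sampleEnglishNames: Record<string, string | undefined> = {
  en: "English",
  zza: "Zaza",
};

// Hypothetical helper: prefer the translated label, otherwise fall back to the English name.
function labelWithFallback(code: string, locale: string): string | undefined {
  return sampleLabels[code]?.[locale] ?? sampleEnglishNames[code];
}

const translatedLabel = labelWithFallback("en", "de"); // found in the translation table
const fallbackLabel = labelWithFallback("zza", "de"); // no translation, so the English name is used
```

Returning undefined only when the code itself is unknown keeps "missing translation" and "missing language" distinguishable for the caller.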
Data provenance
The static data in this repo was generated from:
- IANA Language Subtag Registry for canonical language identifiers and aliases
- Unicode CLDR territory population data for official-language and speaker estimates
- CLDR locale lists for named locale variants
Module layout
The language feature lives in srcarev/languages:
- languageData.ts for the committed English dataset
- languageFunctions.ts for runtime helpers
- translations/ for localized language labels used by getLanguageName()
- language.test.ts for module coverage
Related
- Countries — country alpha-2 codes used by the official-language mappings
- Cities — useful when building locale-aware selectors with capital-city defaults
- Continents & currencies — often paired with country and language selectors