This is the documentation page for Module:data consistency check

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Module:etymology languages/data

  • Bashkardi language (bsg-bas) has a canonical name that is not unique; it is also used by the code bsg.
  • Rudbari language (rdb-rud) has a canonical name that is not unique; it is also used by the code rdb.
  • Chali language (tks-cal) has a canonical name that is not unique; it is also used by the code tgf.

Module:families/data

  • Middle Iranian family (ira-mid) has no child families or languages.
  • Old Iranian family (ira-old) has no child families or languages.

Module:scripts/data

  • Blissymbols script (Blis) is not used by any language and has no characters listed for auto-detection.
  • Cypro-Minoan script (Cpmn) is not used by any language.
  • Hieratic script (Egyh) is not used by any language and has no characters listed for auto-detection.
  • Elymaic script (Elym) is not used by any language.
  • Hiragana script (Hira) is not used by any language.
  • Nyiakeng Puachue Hmong script (Hmnp) is not used by any language.
  • Kana script (Hrkt) is not used by any language.
  • Image-rendered script (Imag) is not used by any language and has no characters listed for auto-detection.
  • International Phonetic Alphabet script (Ipach) is not used by any language and has no characters listed for auto-detection.
  • Kpelle script (Kpel) is not used by any language and has no characters listed for auto-detection.
  • Loma script (Loma) is not used by any language and has no characters listed for auto-detection.
  • Moon script (Moon) is not used by any language and has no characters listed for auto-detection.
  • Morse code (Morse) is not used by any language and has no characters listed for auto-detection.
  • Musical notation script (Music) is not used by any language.
  • Nag Mundari script (Nagm) is not used by any language.
  • Unspecified script (None) is not used by any language and has no characters listed for auto-detection.
  • Rongorongo script (Roro) is not used by any language and has no characters listed for auto-detection.
  • Rumi numerals script (Rumin) is not used by any language.
  • flag semaphore (Semap) is not used by any language and has no characters listed for auto-detection.
  • Visible Speech script (Visp) is not used by any language and has no characters listed for auto-detection.
  • Vithkuqi script (Vith) is not used by any language.
  • Woleai script (Wole) is not used by any language and has no characters listed for auto-detection.
  • Yezidi script (Yezi) is not used by any language.
  • mathematical notation script (Zmth) is not used by any language.
  • symbol script (Zsym) is not used by any language.
  • undetermined script (Zyyy) is not used by any language and has no characters listed for auto-detection.
  • uncoded script (Zzzz) is not used by any language and has no characters listed for auto-detection.
  • The data key sort_by_scraping for Japanese script (Jpan) is invalid.

Checks performed


For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • The canonical name of a language, family or etymology-only language must not also appear in its own list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Each Wikidata item ID must be either a positive integer or a string starting with Q and ending with decimal digits.
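The cross-module rules above can be sketched in Python as follows. This is a minimal illustration, not the module's Lua code. It assumes that every entry from every data module has already been normalised to the keys canonicalName and (optionally) otherNames. It also reads "starting with Q and ending with decimal digits" as Q followed only by digits; that reading is an assumption.

```python
import re
from collections import Counter

# Assumed reading of the rule: "Q" followed only by decimal digits, e.g. "Q1860".
WIKIDATA_STRING = re.compile(r"Q[0-9]+")


def is_valid_wikidata_id(value):
    """Accept a positive integer or a string such as "Q1860"."""
    if isinstance(value, bool):  # bool is a subclass of int in Python; reject it
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        return WIKIDATA_STRING.fullmatch(value) is not None
    return False


def check_shared_rules(modules):
    """Check rules that span several data modules.

    `modules` maps a label such as "languages" or "families" to a dict of
    code -> entry, where each entry is assumed (hypothetically) to use the
    keys "canonicalName" and, optionally, "otherNames".
    """
    errors = []
    # Codes must not clash between languages, families and etymology-only languages.
    first_seen = {}
    for label, entries in modules.items():
        for code in entries:
            if code in first_seen:
                errors.append(f"code {code} is defined in both {first_seen[code]} and {label}")
            else:
                first_seen[code] = label
    for entries in modules.values():
        for code, entry in entries.items():
            others = entry.get("otherNames")
            if others is None:
                continue
            if not isinstance(others, list):  # a Lua array is modelled as a Python list
                errors.append(f"{code}: otherNames must be an array")
                continue
            if entry.get("canonicalName") in others:
                errors.append(f"{code}: canonical name also appears in otherNames")
            # Each other name may appear only once.
            for name, count in Counter(others).items():
                if count > 1:
                    errors.append(f"{code}: other name {name!r} is listed {count} times")
    return errors
```

Errors are collected rather than raised, so one run can report every problem at once.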

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
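Several of the per-language rules can be sketched in Python as below. This is an illustration only, built on these assumptions:

- Language entries are modelled as dicts in which the integer keys 1, 2 and 3 stand for the positional Lua fields.
- The allowed-key set is copied verbatim from the rule above.
- code_kind is a hypothetical helper. It guesses the two-letter / three-letter / exceptional split from the shape of the code and is not the module's actual logic.
- The standardChars pattern check is left out, because Lua patterns cannot be validated with Python's re.

```python
import re

# Copied from the list of permitted keys for Module:languages data.
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr",
}
ALLOWED_TYPES = {"regular", "reconstructed", "appendix-constructed"}


def code_kind(code):
    """Hypothetical classifier: two or three lowercase letters, else exceptional."""
    if re.fullmatch(r"[a-z]{2}", code):
        return "two-letter"
    if re.fullmatch(r"[a-z]{3}", code):
        return "three-letter"
    return "exceptional"


def check_replacement_table(code, key, value):
    """entry_name must be a table; sort_key may also be a string (a module name)."""
    if key == "sort_key" and isinstance(value, str):
        return []
    if not isinstance(value, dict):
        return [f"{code}: {key} must be a table"]
    has_arrays = "from" in value and "to" in value
    if not has_arrays and "remove_diacritics" not in value:
        return [f"{code}: {key} needs from/to arrays or remove_diacritics"]
    if has_arrays and len(value["from"]) < len(value["to"]):
        return [f"{code}: {key}.from is shorter than {key}.to"]
    return []


def check_language(code, data, submodule_kind):
    """Return error messages for one language entry found in `submodule_kind`."""
    errors = []
    if code_kind(code) != submodule_kind:
        errors.append(f"{code}: belongs in the {code_kind(code)} submodule")
    if not data.get(1):
        errors.append(f"{code}: missing canonical name (field 1)")
    if "type" in data and data["type"] not in ALLOWED_TYPES:
        errors.append(f"{code}: unrecognised type {data['type']!r}")
    for key in ("entry_name", "sort_key"):
        if key in data:
            errors.extend(check_replacement_table(code, key, data[key]))
    # override_translit only makes sense if a transliteration module exists.
    if data.get("override_translit") and "translit" not in data:
        errors.append(f"{code}: override_translit is set but translit is not")
    if "link_tr" in data and data["link_tr"] is not True:
        errors.append(f"{code}: link_tr must be true")
    for key in data:
        if key not in ALLOWED_KEYS:
            errors.append(f"{code}: invalid data key {key!r}")
    return errors
```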

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here because, if they are not met, module errors will quickly appear in entries whenever Module:utilities tries to generate a sort key for a category belonging to the language, or whenever full_link tries to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
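Keeping the two derived lookup tables in step with the data submodules amounts to a two-way comparison. The Python sketch below rests on hypothetical assumptions:

- Module:languages/code to canonical name maps codes to names.
- Module:languages/canonical names maps names to codes.
- Both have been loaded as Python dicts.

Inverting the first map is safe only because canonical names are required to be unique.

```python
def diff_maps(label, actual, expected):
    """List keys missing from, extra in, or mismatched in `actual`."""
    errors = []
    for key in sorted(expected.keys() - actual.keys()):
        errors.append(f"{label} is missing {key}")
    for key in sorted(actual.keys() - expected.keys()):
        errors.append(f"{label} has extra entry {key}")
    for key in sorted(expected.keys() & actual.keys()):
        if actual[key] != expected[key]:
            errors.append(f"{label}: {key} maps to {actual[key]}, expected {expected[key]}")
    return errors


def check_name_maps(data_submodules, code_to_name, name_to_code):
    """data_submodules: list of code -> entry dicts, with field 1 as the canonical name."""
    expected_code_to_name = {}
    for submodule in data_submodules:
        for code, entry in submodule.items():
            expected_code_to_name[code] = entry[1]
    # Inverting is safe only because canonical names must be unique.
    expected_name_to_code = {name: code for code, name in expected_code_to_name.items()}
    return (diff_maps("code to canonical name", code_to_name, expected_code_to_name)
            + diff_maps("canonical names", name_to_code, expected_name_to_code))
```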

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
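A Python sketch of the etymology-language rules follows. It assumes that the sets of valid language, family and etymology-language codes have already been collected. The function and parameter names are illustrative, not taken from the module. The sketch does not attempt the softer recommendation that an etymology language also be listed as the ancestor of a regular language.

```python
ETYMOLOGY_KEYS = {"canonicalName", "otherNames", "parent", "ancestors",
                  "wikipedia_article", "wikidata_item"}


def check_etymology_language(code, data, language_codes, family_codes, etym_codes):
    """Return error messages for one etymology-only language entry."""
    errors = []
    if not data.get("canonicalName"):
        errors.append(f"{code}: canonicalName is missing")
    # parent may be a language, a family or another etymology-only language.
    parent = data.get("parent")
    if parent is None:
        errors.append(f"{code}: parent is missing")
    elif parent not in language_codes | family_codes | etym_codes:
        errors.append(f"{code}: parent {parent} is not a valid code")
    # ancestors may only name languages or etymology-only languages.
    ancestors = data.get("ancestors")
    if ancestors is not None:
        if not isinstance(ancestors, list):
            errors.append(f"{code}: ancestors must be an array")
        else:
            for ancestor in ancestors:
                if ancestor not in language_codes | etym_codes:
                    errors.append(f"{code}: ancestor {ancestor} is not a valid code")
    for key in data:
        if key not in ETYMOLOGY_KEYS:
            errors.append(f"{code}: invalid data key {key!r}")
    return errors
```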

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • If family is given, it must be a valid family code.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
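The family rules need a membership count, since a family is only valid if some language or subfamily points to it. In this hypothetical Python sketch, an entry's parent family is read from "family" or, for languages, from positional field 3. The wording of the no-members error imitates the ira-mid and ira-old entries in the report at the top of this page.

```python
from collections import Counter

FAMILY_KEYS = {"canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item"}


def family_of(entry):
    """Parent family code: the "family" key, else positional field 3 (languages)."""
    return entry.get("family") or entry.get(3)


def check_families(families, languages):
    """families, languages: dicts of code -> entry (illustrative shapes)."""
    errors = []
    name_counts = Counter(entry.get("canonicalName") for entry in families.values())
    # Count how many families and languages name each family as their parent.
    member_counts = Counter(
        family_of(entry)
        for entry in list(families.values()) + list(languages.values())
        if family_of(entry) is not None
    )
    for code, entry in families.items():
        name = entry.get("canonicalName")
        if not name:
            errors.append(f"{code}: canonicalName is missing")
        elif name_counts[name] > 1:
            errors.append(f"{code}: canonical name {name} is not unique")
        parent = entry.get("family")
        if parent is not None and parent not in families:
            errors.append(f"{code}: family {parent} is not a valid family code")
        if member_counts[code] == 0:
            errors.append(f"{name} family ({code}) has no child families or languages.")
        for key in entry:
            if key not in FAMILY_KEYS:
                errors.append(f"{code}: invalid data key {key!r}")
    return errors
```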

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
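The script rules can be sketched the same way. The Python below is illustrative:

- It collects every script listed by any language, under "scripts" or positional field 4.
- It reports unused scripts, missing characters patterns and unknown keys, in wording modelled on the report at the top of this page.
- It uses canonicalName as-is. How the real report forms names such as "Hieratic script" is not shown here, so that part is an assumption.
- It skips the check that characters forms a valid Lua pattern.

```python
SCRIPT_KEYS = {"canonicalName", "otherNames", "parent", "systems",
               "wikipedia_article", "characters", "direction"}


def check_scripts(scripts, languages):
    """scripts, languages: dicts of code -> entry (illustrative shapes)."""
    # Every script code that at least one language lists among its scripts.
    used = set()
    for entry in languages.values():
        used.update(entry.get("scripts") or entry.get(4) or [])
    messages = []
    for code, entry in scripts.items():
        name = entry.get("canonicalName")
        if not name:
            messages.append(f"{code}: canonicalName is missing")
        problems = []
        if code not in used:
            problems.append("is not used by any language")
        if not entry.get("characters"):
            problems.append("has no characters listed for auto-detection")
        if problems:
            messages.append(f"{name} ({code}) " + " and ".join(problems) + ".")
        for key in entry:
            if key not in SCRIPT_KEYS:
                messages.append(f"The data key {key} for {name} ({code}) is invalid.")
    return messages
```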