Module manubot.cite.csl_item

Represent bibliographic information for a single publication.

From the CSL docs:

Next up are the bibliographic details of the items you wish to cite: the item metadata.

For example, the bibliographic entry for a journal article may show the names of the
authors, the year in which the article was published, the article title, the journal
title, the volume and issue in which the article appeared, the page numbers of the
article, and the article's Digital Object Identifier (DOI). All these details help
the reader identify and find the referenced work.

Reference managers make it easy to create a library of items. While many reference
managers have their own way of storing item metadata, most support common bibliographic
exchange formats such as BibTeX and RIS. The citeproc-js CSL processor introduced a
JSON-based format for storing item metadata in a way citeproc-js could understand.
Several other CSL processors have since adopted this CSL JSON format (also known as
citeproc JSON).

-- https://github.com/citation-style-language/documentation/blob/master/primer.txt

The terminology we've adopted is csl_data for a list of csl_item dicts, and csl_json for csl_data that is JSON-serialized.
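The three terms can be illustrated with plain Python objects. This is a minimal sketch using only the standard library; the field values are made up for illustration and do not come from a real record.

```python
import json

# A csl_item is a dict describing one work (values here are invented).
csl_item = {
    "id": "example-id",
    "type": "article-journal",
    "title": "An Example Title",
    "issued": {"date-parts": [[2019, 5]]},
}

# csl_data is a list of csl_item dicts.
csl_data = [csl_item]

# csl_json is csl_data serialized to a JSON string.
csl_json = json.dumps(csl_data, indent=2)

# Parsing csl_json gives back an equal csl_data list.
round_tripped = json.loads(csl_json)
```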

View Source
"""Represent bibliographic information for a single publication.

From the CSL docs:

    Next up are the bibliographic details of the items you wish to cite: the item metadata.

    For example, the bibliographic entry for a journal article may show the names of the

    authors, the year in which the article was published, the article title, the journal

    title, the volume and issue in which the article appeared, the page numbers of the

    article, and the article’s Digital Object Identifier (DOI). All these details help

    the reader identify and find the referenced work.

    Reference managers make it easy to create a library of items. While many reference

    managers have their own way of storing item metadata, most support common bibliographic

    exchange formats such as BibTeX and RIS. The citeproc-js CSL processor introduced a

    JSON-based format for storing item metadata in a way citeproc-js could understand.

    Several other CSL processors have since adopted this “CSL JSON” format (also known as

    “citeproc JSON”).

-- https://github.com/citation-style-language/documentation/blob/master/primer.txt

The terminology we've adopted is csl_data for a list of csl_item dicts, and csl_json

for csl_data that is JSON-serialized.

"""

import copy

import datetime

import logging

import re

from typing import Dict, List, Optional, Union

from manubot.cite.citekey import CiteKey

class CSL_Item(dict):

    """

    CSL_Item represents bibliographic information for a single citeable work.

    Technically, CSL_Item is a Python dictionary with extra methods

    that help clean and manipulate it.

    These methods relate to:

    - adding an `id` key and value for CSL item

    - correcting bibliographic information and its structure

    - adding and reading a custom note to CSL item

    More information on CSL JSON (a list of CSL_Items) is available at:

    - https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html

    - http://docs.citationstyles.org/en/1.0.1/specification.html#standard-variables

    - https://github.com/citation-style-language/schema/blob/master/csl-data.json

    """

    # The ideas for CSL_Item methods come from the following parts of code:

    #  - [ ] citekey_to_csl_item(citekey, prune=True)

    # The methods in CSL_Item class provide primitives to reconstruct

    # functions above.

    type_mapping = {

        "journal-article": "article-journal",

        "book-chapter": "chapter",

        "posted-content": "manuscript",

        "proceedings-article": "paper-conference",

        "standard": "entry",

        "reference-entry": "entry",

    }

    def __init__(self, dictionary=None, **kwargs) -> None:

        """

        Can use a dictionary, keywords, or both to create a CSL_Item object:

            CSL_Item(title='The Book')

            CSL_Item({'title': 'The Book'})

            csl_dict = {'title': 'The Book', 'ISBN': '321-321'}

            CSL_Item(csl_dict, type='entry')

            CSL_Item(title='The Book', ISBN='321-321', type='entry')

        A CSL_Item object is usually provided by a bibliographic information API,

        but constructing small CSL_Item objects is useful for testing.

        """

        if dictionary is None:

            dictionary = {}

        super().__init__(copy.deepcopy(dictionary))

        self.update(copy.deepcopy(kwargs))

    def set_id(self, id_) -> "CSL_Item":

        self["id"] = id_

        return self

    def correct_invalid_type(self) -> "CSL_Item":

        """

        Correct invalid CSL item type.

        Does nothing if `type` not present.

        For detail see https://github.com/CrossRef/rest-api-doc/issues/187

        """

        if "type" in self:

            # Replace a type found in CSL_Item.type_mapping.keys(),

            # leave type intact in other cases.

            t = self["type"]

            self["type"] = self.type_mapping.get(t, t)

        return self

    def set_default_type(self) -> "CSL_Item":

        """

        Set type to 'entry', if type not specified.

        """

        self["type"] = self.get("type", "entry")

        return self

    def prune_against_schema(self) -> "CSL_Item":

        """

        Remove fields that violate the CSL Item JSON Schema.

        """

        from .citeproc import remove_jsonschema_errors

        (csl_item,) = remove_jsonschema_errors([self], in_place=True)

        assert csl_item is self

        return self

    def validate_against_schema(self) -> "CSL_Item":

        """

        Confirm that the CSL_Item validates. If not, raises a

        jsonschema.exceptions.ValidationError.

        """

        from .citeproc import get_jsonschema_csl_validator

        validator = get_jsonschema_csl_validator()

        validator.validate([self])

        return self

    def clean(self, prune: bool = True) -> "CSL_Item":

        """

        Sanitize and touch-up a potentially dirty CSL_Item.

        The following steps are performed:

        - update incorrect values for "type" field when a correct variant is known

        - remove fields that violate the JSON Schema (if prune=True)

        - set default value for "type" if missing, since CSL JSON requires type

        - validate against the CSL JSON schema (if prune=True) to ensure output

          CSL_Item is clean

        """

        logging.debug(

            # trailing space keeps "with"/"without" separate from "CSL pruning"
            f"Starting CSL_Item.clean with{'' if prune else 'out'} "

            f"CSL pruning for id: {self.get('id', 'id not specified')}"

        )

        self.correct_invalid_type()

        if prune:

            self.prune_against_schema()

        self.set_default_type()

        if prune:

            self.validate_against_schema()

        return self

    def set_date(

        self,

        date: Union[None, str, datetime.date, datetime.datetime],

        variable: str = "issued",

    ) -> "CSL_Item":

        """

        date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD)

            or as a Python date object (datetime.date or datetime.datetime).

        variable: which variable to assign the date to.

        """

        date_parts = date_to_date_parts(date)

        if date_parts:

            self[variable] = {"date-parts": [date_parts]}

        return self

    def get_date(self, variable: str = "issued", fill: bool = False) -> Optional[str]:

        """

        Return a CSL date-variable as ISO formatted string:

        ('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

        variable: which CSL JSON date variable to retrieve

        fill: if True, set missing months to January

            and missing days to the first day of the month.

        """

        try:

            date_parts = self[variable]["date-parts"][0]

        except (IndexError, KeyError):

            return None

        return date_parts_to_string(date_parts, fill=fill)

    @property

    def note(self) -> str:

        """

        Return the value of the "note" field as a string.

        If "note" key is not set, return empty string.

        """

        return str(self.get("note") or "")

    @note.setter

    def note(self, text: str) -> None:

        if text:

            self["note"] = text

        else:

            # if text is None or an empty string, remove the "note" field

            self.pop("note", None)

    @property

    def note_dict(self) -> Dict[str, str]:

        """

        Return a dictionary with key-value pairs encoded by this CSL Item's note.

        Extracts both forms (line-entry and braced-entry) of key-value pairs from the CSL JSON "cheater syntax"

        https://github.com/Juris-M/citeproc-js-docs/blob/93d7991d42b4a96b74b7281f38e168e365847e40/csl-json/markup.rst#cheater-syntax-for-odd-fields

        Assigning to this dict will not update `self["note"]`.

        """

        note = self.note

        line_matches = re.findall(

            r"^(?P<key>[A-Z]+|[-_a-z]+): *(?P<value>.+?) *$", note, re.MULTILINE

        )

        braced_matches = re.findall(

            r"{:(?P<key>[A-Z]+|[-_a-z]+): *(?P<value>.+?) *}", note

        )

        return dict(line_matches + braced_matches)

    def note_append_text(self, text: str) -> None:

        """

        Append text to the note field (as a new line) of a CSL Item.

        If a line already exists equal to `text`, do nothing.

        """

        if not text:

            return

        note = self.note

        if re.search(f"^{re.escape(text)}$", note, flags=re.MULTILINE):

            # do not accumulate duplicate lines of text

            # https://github.com/manubot/manubot/issues/258

            return

        if note and not note.endswith("\n"):

            note += "\n"

        note += text

        self.note = note

    def note_append_dict(self, dictionary: dict) -> None:

        """

        Append key-value pairs specified by `dictionary` to the note field of a CSL Item.

        Uses the [CSL JSON "cheater syntax"](https://github.com/Juris-M/citeproc-js-docs/blob/93d7991d42b4a96b74b7281f38e168e365847e40/csl-json/markup.rst#cheater-syntax-for-odd-fields)

        to encode additional values not defined by the CSL JSON schema.

        """

        for key, value in dictionary.items():

            if not re.fullmatch(r"[A-Z]+|[-_a-z]+", key):

                logging.warning(

                    f"note_append_dict: skipping adding {key!r} because "

                    f"it does not conform to the variable_name syntax as per https://git.io/fjTzW."

                )

                continue

            if "\n" in value:

                logging.warning(

                    f"note_append_dict: skipping adding {key!r} because "

                    f"the value contains a newline: {value!r}"

                )

                continue

            self.note_append_text(f"{key}: {value}")

    def infer_id(self) -> "CSL_Item":

        """

        Detect and set a non-null/empty value for "id" or else raise a ValueError.

        """

        if self.get("standard_citation"):

            # "standard_citation" field is set with a non-null/empty value

            return self.set_id(self.pop("standard_citation"))

        if self.note_dict.get("standard_id"):

            # "standard_id" note field is set with a non-null/empty value

            return self.set_id(self.note_dict["standard_id"])

        if self.get("id"):

            # "id" field exists and is set with a non-null/empty value

            return self.set_id(self["id"])

        raise ValueError(

            "infer_id could not detect a field with a citation / standard_citation. "

            'Consider setting the CSL Item "id" field.'

        )

    def standardize_id(self) -> "CSL_Item":

        """

        Extract the standard_id (standard citation key) for a csl_item and modify the csl_item in-place to set its "id" field.

        The standard_id is extracted from a "standard_citation" field, the "note" field, or the "id" field.

        The extracted citation is checked for validity and standardized, after which it is the final "standard_id".

        Regarding csl_item modification, the csl_item "id" field is set to the standard_id and the note field

        is created or updated with key-value pairs for standard_id, original_id, and original_standard_id as needed.

        Note that the Manubot software generally refers to the "id" of a CSL Item as a citekey.

        However, in this context, we use "id" rather than "citekey" for consistency with CSL's "id" field.

        """

        original_id = self.get("id")

        self.infer_id()

        original_standard_id = self["id"]

        citekey = CiteKey(original_standard_id)

        standard_id = citekey.standard_id

        add_to_note = {}

        note_dict = self.note_dict

        if original_id and original_id != standard_id:

            if original_id != note_dict.get("original_id"):

                add_to_note["original_id"] = original_id

        if original_standard_id and original_standard_id != standard_id:

            if original_standard_id != note_dict.get("original_standard_id"):

                add_to_note["original_standard_id"] = original_standard_id

        if standard_id != note_dict.get("standard_id"):

            add_to_note["standard_id"] = standard_id

        self.note_append_dict(dictionary=add_to_note)

        self.set_id(standard_id)

        return self

def assert_csl_item_type(x) -> None:

    if not isinstance(x, CSL_Item):

        raise TypeError(f"Expected CSL_Item object, got {type(x)}")

def date_to_date_parts(

    date: Union[None, str, datetime.date, datetime.datetime],

) -> Optional[List[int]]:

    """

    Convert a date string or object to a date parts list.

    date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD)

        or as a Python date object (datetime.date or datetime.datetime).

    """

    if date is None:

        return None

    if isinstance(date, (datetime.date, datetime.datetime)):

        date = date.isoformat()

    if not isinstance(date, str):

        raise ValueError(f"date_to_date_parts: unsupported type for {date}")

    date = date.strip()

    re_year = r"(?P<year>[0-9]{4})"

    re_month = r"(?P<month>1[0-2]|0[1-9])"

    re_day = r"(?P<day>[0-3][0-9])"

    patterns = [

        f"{re_year}-{re_month}-{re_day}",

        f"{re_year}-{re_month}",

        f"{re_year}",

        ".*",  # regex to match anything

    ]

    for pattern in patterns:

        match = re.match(pattern, date)

        if match:

            break

    date_parts = []

    for part in "year", "month", "day":

        try:

            value = match.group(part)

        except IndexError:

            break

        if not value:

            break

        date_parts.append(int(value))

    if date_parts:

        return date_parts

    return None

def date_parts_to_string(date_parts, fill: bool = False) -> Optional[str]:

    """

    Return a CSL date-parts list as ISO formatted string:

    ('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

    date_parts: list or tuple like [year, month, day] as integers.

        Also supports [year, month] and [year] for situations where the day or month-and-day are missing.

    fill: if True, set missing months to January

        and missing days to the first day of the month.

    """

    if not date_parts:

        return None

    if not isinstance(date_parts, (tuple, list)):

        raise ValueError("date_parts must be a tuple or list")

    if fill:

        # Pad a copy so the caller's list is not modified and tuples are supported.

        date_parts = list(date_parts) + [1] * max(0, 3 - len(date_parts))

    widths = 4, 2, 2

    str_parts = []

    for i, part in enumerate(date_parts[:3]):

        width = widths[i]

        if isinstance(part, int):

            part = str(part)

        if not isinstance(part, str):

            break

        part = part.zfill(width)

        if len(part) != width or not part.isdigit():

            break

        str_parts.append(part)

    if not str_parts:

        return None

    iso_str = "-".join(str_parts)

    return iso_str

Functions

assert_csl_item_type

def assert_csl_item_type(
    x
) -> None
View Source
def assert_csl_item_type(x) -> None:

    if not isinstance(x, CSL_Item):

        raise TypeError(f"Expected CSL_Item object, got {type(x)}")

date_parts_to_string

def date_parts_to_string(
    date_parts,
    fill: bool = False
) -> Optional[str]

Return a CSL date-parts list as ISO formatted string:

('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

date_parts: list or tuple like [year, month, day] as integers. Also supports [year, month] and [year] for situations where the day or month-and-day are missing.

fill: if True, set missing months to January and missing days to the first day of the month.
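The formatting rule described above can be sketched in a few lines. `format_date_parts` is a hypothetical, simplified stand-in written for this illustration, not the library function: it assumes every part is a valid integer and skips the type checks that date_parts_to_string performs.

```python
from typing import Optional, Sequence


def format_date_parts(date_parts: Sequence[int], fill: bool = False) -> Optional[str]:
    """Hypothetical simplified analogue of date_parts_to_string."""
    if not date_parts:
        return None
    parts = list(date_parts)[:3]  # work on a copy of at most three parts
    if fill:
        parts += [1] * (3 - len(parts))  # missing month and day default to 1
    widths = (4, 2, 2)  # zero-padded widths for year, month, day
    return "-".join(str(part).zfill(width) for part, width in zip(parts, widths))


print(format_date_parts([2019]))                # 2019
print(format_date_parts([2019, 5]))             # 2019-05
print(format_date_parts([2019, 5], fill=True))  # 2019-05-01
print(format_date_parts([]))                    # None
```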

View Source
def date_parts_to_string(date_parts, fill: bool = False) -> Optional[str]:

    """

    Return a CSL date-parts list as ISO formatted string:

    ('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

    date_parts: list or tuple like [year, month, day] as integers.

        Also supports [year, month] and [year] for situations where the day or month-and-day are missing.

    fill: if True, set missing months to January

        and missing days to the first day of the month.

    """

    if not date_parts:

        return None

    if not isinstance(date_parts, (tuple, list)):

        raise ValueError("date_parts must be a tuple or list")

    if fill:

        # Pad a copy so the caller's list is not modified and tuples are supported.

        date_parts = list(date_parts) + [1] * max(0, 3 - len(date_parts))

    widths = 4, 2, 2

    str_parts = []

    for i, part in enumerate(date_parts[:3]):

        width = widths[i]

        if isinstance(part, int):

            part = str(part)

        if not isinstance(part, str):

            break

        part = part.zfill(width)

        if len(part) != width or not part.isdigit():

            break

        str_parts.append(part)

    if not str_parts:

        return None

    iso_str = "-".join(str_parts)

    return iso_str

date_to_date_parts

def date_to_date_parts(
    date: Union[NoneType, str, datetime.date, datetime.datetime]
) -> Optional[List[int]]

Convert a date string or object to a date parts list.

date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD) or as a Python date object (datetime.date or datetime.datetime).
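As a rough illustration of the conversion, the sketch below parses the same three string forms with a single regular expression. `to_date_parts` is a hypothetical name chosen for this example, and the single-pattern approach is a simplification of the pattern list used in the source.

```python
import datetime
import re
from typing import List, Optional, Union


def to_date_parts(date: Union[None, str, datetime.date]) -> Optional[List[int]]:
    """Hypothetical simplified analogue of date_to_date_parts."""
    if date is None:
        return None
    if isinstance(date, datetime.date):  # datetime.datetime is a subclass of date
        date = date.isoformat()
    if not isinstance(date, str):
        raise ValueError(f"unsupported type for {date}")
    # Year is required; month and day are optional, and day needs a month.
    match = re.match(
        r"([0-9]{4})(?:-(1[0-2]|0[1-9])(?:-([0-3][0-9]))?)?", date.strip()
    )
    if not match:
        return None
    return [int(group) for group in match.groups() if group]


print(to_date_parts("2019"))                        # [2019]
print(to_date_parts("2019-05"))                     # [2019, 5]
print(to_date_parts(datetime.date(2019, 5, 17)))    # [2019, 5, 17]
print(to_date_parts("not a date"))                  # None
```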

View Source
def date_to_date_parts(

    date: Union[None, str, datetime.date, datetime.datetime],

) -> Optional[List[int]]:

    """

    Convert a date string or object to a date parts list.

    date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD)

        or as a Python date object (datetime.date or datetime.datetime).

    """

    if date is None:

        return None

    if isinstance(date, (datetime.date, datetime.datetime)):

        date = date.isoformat()

    if not isinstance(date, str):

        raise ValueError(f"date_to_date_parts: unsupported type for {date}")

    date = date.strip()

    re_year = r"(?P<year>[0-9]{4})"

    re_month = r"(?P<month>1[0-2]|0[1-9])"

    re_day = r"(?P<day>[0-3][0-9])"

    patterns = [

        f"{re_year}-{re_month}-{re_day}",

        f"{re_year}-{re_month}",

        f"{re_year}",

        ".*",  # regex to match anything

    ]

    for pattern in patterns:

        match = re.match(pattern, date)

        if match:

            break

    date_parts = []

    for part in "year", "month", "day":

        try:

            value = match.group(part)

        except IndexError:

            break

        if not value:

            break

        date_parts.append(int(value))

    if date_parts:

        return date_parts

    return None

Classes

CSL_Item

class CSL_Item(
    dictionary=None,
    **kwargs
)

CSL_Item represents bibliographic information for a single citeable work.

Technically, CSL_Item is a Python dictionary with extra methods that help clean and manipulate it.

These methods relate to:

  • adding an id key and value for CSL item
  • correcting bibliographic information and its structure
  • adding and reading a custom note to CSL item

More information on CSL JSON (a list of CSL_Items) is available at:

  • https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
  • http://docs.citationstyles.org/en/1.0.1/specification.html#standard-variables
  • https://github.com/citation-style-language/schema/blob/master/csl-data.json
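The "dictionary with extra methods" design can be sketched with a cut-down class. `MiniItem` is a hypothetical analogue written for this illustration; it copies only the constructor and set_id behavior shown in the source and is not a substitute for CSL_Item.

```python
import copy


class MiniItem(dict):
    """Hypothetical cut-down analogue of CSL_Item (not the real class)."""

    def __init__(self, dictionary=None, **kwargs):
        # Deep copies keep the new item independent of the caller's objects.
        super().__init__(copy.deepcopy(dictionary or {}))
        self.update(copy.deepcopy(kwargs))

    def set_id(self, id_):
        self["id"] = id_
        return self  # returning self allows method chaining


source = {"title": "The Book", "author": [{"family": "Doe"}]}
item = MiniItem(source, type="entry").set_id("book-1")

# Editing nested data on the item does not leak back into the source dict.
item["author"][0]["family"] = "Roe"
print(source["author"][0]["family"])  # Doe
print(item["id"], item["type"])       # book-1 entry
```

Because the object is still a dict, it can be passed directly to code that expects plain CSL JSON data, such as json.dumps.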

View Source
class CSL_Item(dict):

    """

    CSL_Item represents bibliographic information for a single citeable work.

    Technically, CSL_Item is a Python dictionary with extra methods

    that help clean and manipulate it.

    These methods relate to:

    - adding an `id` key and value for CSL item

    - correcting bibliographic information and its structure

    - adding and reading a custom note to CSL item

    More information on CSL JSON (a list of CSL_Items) is available at:

    - https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html

    - http://docs.citationstyles.org/en/1.0.1/specification.html#standard-variables

    - https://github.com/citation-style-language/schema/blob/master/csl-data.json

    """

    # The ideas for CSL_Item methods come from the following parts of code:

    #  - [ ] citekey_to_csl_item(citekey, prune=True)

    # The methods in CSL_Item class provide primitives to reconstruct

    # functions above.

    type_mapping = {

        "journal-article": "article-journal",

        "book-chapter": "chapter",

        "posted-content": "manuscript",

        "proceedings-article": "paper-conference",

        "standard": "entry",

        "reference-entry": "entry",

    }

    def __init__(self, dictionary=None, **kwargs) -> None:

        """

        Can use a dictionary, keywords, or both to create a CSL_Item object:

            CSL_Item(title='The Book')

            CSL_Item({'title': 'The Book'})

            csl_dict = {'title': 'The Book', 'ISBN': '321-321'}

            CSL_Item(csl_dict, type='entry')

            CSL_Item(title='The Book', ISBN='321-321', type='entry')

        A CSL_Item object is usually provided by a bibliographic information API,

        but constructing small CSL_Item objects is useful for testing.

        """

        if dictionary is None:

            dictionary = {}

        super().__init__(copy.deepcopy(dictionary))

        self.update(copy.deepcopy(kwargs))

    def set_id(self, id_) -> "CSL_Item":

        self["id"] = id_

        return self

    def correct_invalid_type(self) -> "CSL_Item":

        """

        Correct invalid CSL item type.

        Does nothing if `type` not present.

        For detail see https://github.com/CrossRef/rest-api-doc/issues/187

        """

        if "type" in self:

            # Replace a type found in CSL_Item.type_mapping.keys(),

            # leave type intact in other cases.

            t = self["type"]

            self["type"] = self.type_mapping.get(t, t)

        return self

    def set_default_type(self) -> "CSL_Item":

        """

        Set type to 'entry', if type not specified.

        """

        self["type"] = self.get("type", "entry")

        return self

    def prune_against_schema(self) -> "CSL_Item":

        """

        Remove fields that violate the CSL Item JSON Schema.

        """

        from .citeproc import remove_jsonschema_errors

        (csl_item,) = remove_jsonschema_errors([self], in_place=True)

        assert csl_item is self

        return self

    def validate_against_schema(self) -> "CSL_Item":

        """

        Confirm that the CSL_Item validates. If not, raises a

        jsonschema.exceptions.ValidationError.

        """

        from .citeproc import get_jsonschema_csl_validator

        validator = get_jsonschema_csl_validator()

        validator.validate([self])

        return self

    def clean(self, prune: bool = True) -> "CSL_Item":

        """

        Sanitize and touch-up a potentially dirty CSL_Item.

        The following steps are performed:

        - update incorrect values for "type" field when a correct variant is known

        - remove fields that violate the JSON Schema (if prune=True)

        - set default value for "type" if missing, since CSL JSON requires type

        - validate against the CSL JSON schema (if prune=True) to ensure output

          CSL_Item is clean

        """

        logging.debug(

            # trailing space keeps "with"/"without" separate from "CSL pruning"
            f"Starting CSL_Item.clean with{'' if prune else 'out'} "

            f"CSL pruning for id: {self.get('id', 'id not specified')}"

        )

        self.correct_invalid_type()

        if prune:

            self.prune_against_schema()

        self.set_default_type()

        if prune:

            self.validate_against_schema()

        return self

    def set_date(

        self,

        date: Union[None, str, datetime.date, datetime.datetime],

        variable: str = "issued",

    ) -> "CSL_Item":

        """

        date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD)

            or as a Python date object (datetime.date or datetime.datetime).

        variable: which variable to assign the date to.

        """

        date_parts = date_to_date_parts(date)

        if date_parts:

            self[variable] = {"date-parts": [date_parts]}

        return self

    def get_date(self, variable: str = "issued", fill: bool = False) -> Optional[str]:

        """

        Return a CSL date-variable as ISO formatted string:

        ('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

        variable: which CSL JSON date variable to retrieve

        fill: if True, set missing months to January

            and missing days to the first day of the month.

        """

        try:

            date_parts = self[variable]["date-parts"][0]

        except (IndexError, KeyError):

            return None

        return date_parts_to_string(date_parts, fill=fill)

    @property

    def note(self) -> str:

        """

        Return the value of the "note" field as a string.

        If "note" key is not set, return empty string.

        """

        return str(self.get("note") or "")

    @note.setter

    def note(self, text: str) -> None:

        if text:

            self["note"] = text

        else:

            # if text is None or an empty string, remove the "note" field

            self.pop("note", None)

    @property

    def note_dict(self) -> Dict[str, str]:

        """

        Return a dictionary with key-value pairs encoded by this CSL Item's note.

        Extracts both forms (line-entry and braced-entry) of key-value pairs from the CSL JSON "cheater syntax"

        https://github.com/Juris-M/citeproc-js-docs/blob/93d7991d42b4a96b74b7281f38e168e365847e40/csl-json/markup.rst#cheater-syntax-for-odd-fields

        Assigning to this dict will not update `self["note"]`.

        """

        note = self.note

        line_matches = re.findall(

            r"^(?P<key>[A-Z]+|[-_a-z]+): *(?P<value>.+?) *$", note, re.MULTILINE

        )

        braced_matches = re.findall(

            r"{:(?P<key>[A-Z]+|[-_a-z]+): *(?P<value>.+?) *}", note

        )

        return dict(line_matches + braced_matches)

    def note_append_text(self, text: str) -> None:

        """

        Append text to the note field (as a new line) of a CSL Item.

        If a line already exists equal to `text`, do nothing.

        """

        if not text:

            return

        note = self.note

        if re.search(f"^{re.escape(text)}$", note, flags=re.MULTILINE):

            # do not accumulate duplicate lines of text

            # https://github.com/manubot/manubot/issues/258

            return

        if note and not note.endswith("\n"):

            note += "\n"

        note += text

        self.note = note

    def note_append_dict(self, dictionary: dict) -> None:

        """

        Append key-value pairs specified by `dictionary` to the note field of a CSL Item.

        Uses the [CSL JSON "cheater syntax"](https://github.com/Juris-M/citeproc-js-docs/blob/93d7991d42b4a96b74b7281f38e168e365847e40/csl-json/markup.rst#cheater-syntax-for-odd-fields)

        to encode additional values not defined by the CSL JSON schema.

        """

        for key, value in dictionary.items():

            if not re.fullmatch(r"[A-Z]+|[-_a-z]+", key):

                logging.warning(

                    f"note_append_dict: skipping adding {key!r} because "

                    f"it does not conform to the variable_name syntax as per https://git.io/fjTzW."

                )

                continue

            if "\n" in value:

                logging.warning(

                    f"note_append_dict: skipping adding {key!r} because "

                    f"the value contains a newline: {value!r}"

                )

                continue

            self.note_append_text(f"{key}: {value}")

    def infer_id(self) -> "CSL_Item":

        """

        Detect and set a non-null/empty value for "id" or else raise a ValueError.

        """

        if self.get("standard_citation"):

            # "standard_citation" field is set with a non-null/empty value

            return self.set_id(self.pop("standard_citation"))

        if self.note_dict.get("standard_id"):

            # "standard_id" note field is set with a non-null/empty value

            return self.set_id(self.note_dict["standard_id"])

        if self.get("id"):

            # "id" field exists and is set with a non-null/empty value

            return self.set_id(self["id"])

        raise ValueError(

            "infer_id could not detect a field with a citation / standard_citation. "

            'Consider setting the CSL Item "id" field.'

        )

    def standardize_id(self) -> "CSL_Item":

        """

        Extract the standard_id (standard citation key) for a csl_item and modify the csl_item in-place to set its "id" field.

        The standard_id is extracted from a "standard_citation" field, the "note" field, or the "id" field.

        The extracted citation is checked for validity and standardized, after which it is the final "standard_id".

        Regarding csl_item modification, the csl_item "id" field is set to the standard_id and the note field

        is created or updated with key-value pairs for standard_id, original_id, and original_standard_id as needed.

        Note that the Manubot software generally refers to the "id" of a CSL Item as a citekey.

        However, in this context, we use "id" rather than "citekey" for consistency with CSL's "id" field.

        """

        original_id = self.get("id")

        self.infer_id()

        original_standard_id = self["id"]

        citekey = CiteKey(original_standard_id)

        standard_id = citekey.standard_id

        add_to_note = {}

        note_dict = self.note_dict

        if original_id and original_id != standard_id:

            if original_id != note_dict.get("original_id"):

                add_to_note["original_id"] = original_id

        if original_standard_id and original_standard_id != standard_id:

            if original_standard_id != note_dict.get("original_standard_id"):

                add_to_note["original_standard_id"] = original_standard_id

        if standard_id != note_dict.get("standard_id"):

            add_to_note["standard_id"] = standard_id

        self.note_append_dict(dictionary=add_to_note)

        self.set_id(standard_id)

        return self

Ancestors (in MRO)

  • builtins.dict

Descendants

  • manubot.cite.arxiv.CSL_Item_arXiv

Class variables

type_mapping

Instance variables

note

Return the value of the "note" field as a string.

If "note" key is not set, return empty string.
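The getter and setter semantics can be shown on a small dict subclass. `NoteDemo` is a hypothetical class for this sketch that mirrors only the `note` property from the source listing.

```python
class NoteDemo(dict):
    """Hypothetical dict subclass mirroring the `note` property."""

    @property
    def note(self) -> str:
        # A missing or None value reads as an empty string.
        return str(self.get("note") or "")

    @note.setter
    def note(self, text) -> None:
        if text:
            self["note"] = text
        else:
            self.pop("note", None)  # empty or None removes the key


item = NoteDemo()
empty_read = item.note           # "" because no "note" key exists yet
item.note = "PMID: 12345"
stored = item["note"]            # the setter writes through to the dict
item.note = ""
key_removed = "note" not in item  # clearing the note drops the key
```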

note_dict

Return a dictionary with key-value pairs encoded by this CSL Item's note.

Extracts both forms (line-entry and braced-entry) of key-value pairs from the CSL JSON "cheater syntax" https://github.com/Juris-M/citeproc-js-docs/blob/93d7991d42b4a96b74b7281f38e168e365847e40/csl-json/markup.rst#cheater-syntax-for-odd-fields

Assigning to this dict will not update self["note"].
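The two entry forms can be illustrated with a small standalone parser. This is a sketch under assumptions, not the library's implementation: `parse_note` is a hypothetical helper, and the key pattern is borrowed from the one `note_append_dict` uses on this page.

```python
import re

# Key syntax assumed here: all-uppercase letters, or lowercase letters with "-" and "_".
KEY = r"[A-Z]+|[-_a-z]+"


def parse_note(note: str) -> dict:
    """Collect key-value pairs from line entries and braced entries in a note."""
    # Line entry: an entire line of the form "key: value".
    line_matches = re.findall(
        r"^(" + KEY + r"): *(.+?) *$", note, flags=re.MULTILINE
    )
    # Braced entry: "{:key: value}" embedded anywhere in the text.
    braced_matches = re.findall(r"\{:(" + KEY + r"): *(.+?) *\}", note)
    return dict(line_matches + braced_matches)


note = "Free text.\nstandard_id: doi:10.1/x\nSee {:original_id: foo} here."
pairs = parse_note(note)
```

Because the result is a freshly built dict, changing it cannot affect the note string it was parsed from, which matches the caveat above about assignment.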

Methods

clean

def clean(
    self,
    prune: bool = True
) -> 'CSL_Item'

Sanitize and touch-up a potentially dirty CSL_Item.

The following steps are performed:

- update incorrect values for "type" field when a correct variant is known
- remove fields that violate the JSON Schema (if prune=True)
- set default value for "type" if missing, since CSL JSON requires type
- validate against the CSL JSON schema (if prune=True) to ensure output CSL_Item is clean

View Source
    def clean(self, prune: bool = True) -> "CSL_Item":

        """

        Sanitize and touch-up a potentially dirty CSL_Item.

        The following steps are performed:

        - update incorrect values for "type" field when a correct variant is known

        - remove fields that violate the JSON Schema (if prune=True)

        - set default value for "type" if missing, since CSL JSON requires type

        - validate against the CSL JSON schema (if prune=True) to ensure output

          CSL_Item is clean

        """

        logging.debug(

            f"Starting CSL_Item.clean with{'' if prune else 'out'} "

            f"CSL pruning for id: {self.get('id', 'id not specified')}"

        )

        self.correct_invalid_type()

        if prune:

            self.prune_against_schema()

        self.set_default_type()

        if prune:

            self.validate_against_schema()

        return self

clear

def clear(
    ...
)

D.clear() -> None. Remove all items from D.

copy

def copy(
    ...
)

D.copy() -> a shallow copy of D

correct_invalid_type

def correct_invalid_type(
    self
) -> 'CSL_Item'

Correct invalid CSL item type.

Does nothing if type not present.

For detail see https://github.com/CrossRef/rest-api-doc/issues/187

View Source
    def correct_invalid_type(self) -> "CSL_Item":

        """

        Correct invalid CSL item type.

        Does nothing if `type` not present.

        For detail see https://github.com/CrossRef/rest-api-doc/issues/187

        """

        if "type" in self:

            # Replace a type found in CSL_Item.type_mapping.keys();

            # leave type intact in other cases.

            t = self["type"]

            self["type"] = self.type_mapping.get(t, t)

        return self
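The lookup-with-fallback pattern used here can be tried on a plain dict. In this sketch the two entries in `TYPE_MAPPING` are illustrative assumptions; the authoritative values are whatever `CSL_Item.type_mapping` defines.

```python
# Illustrative mapping only (an assumption); see CSL_Item.type_mapping for real values.
TYPE_MAPPING = {
    "journal-article": "article-journal",
    "book-chapter": "chapter",
}


def correct_invalid_type(item: dict) -> dict:
    """Replace a known-invalid "type" value; leave everything else untouched."""
    if "type" in item:
        t = item["type"]
        # dict.get(t, t) returns the corrected type, or t itself when unmapped.
        item["type"] = TYPE_MAPPING.get(t, t)
    return item


fixed = correct_invalid_type({"type": "journal-article"})
unchanged = correct_invalid_type({"type": "book"})
```

Using the original value as the `get` default means unknown types pass through unmodified rather than being dropped.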

fromkeys

def fromkeys(
    iterable,
    value=None,
    /
)

Create a new dictionary with keys from iterable and values set to value.

get

def get(
    self,
    key,
    default=None,
    /
)

Return the value for key if key is in the dictionary, else default.

get_date

def get_date(
    self,
    variable: str = 'issued',
    fill: bool = False
) -> Optional[str]

Return a CSL date-variable as ISO formatted string:

('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

variable: which CSL JSON date variable to retrieve

fill: if True, set missing months to January and missing days to the first day of the month.

View Source
    def get_date(self, variable: str = "issued", fill: bool = False) -> Optional[str]:

        """

        Return a CSL date-variable as ISO formatted string:

        ('YYYY', 'YYYY-MM', 'YYYY-MM-DD', or None).

        variable: which CSL JSON date variable to retrieve

        fill: if True, set missing months to January

            and missing days to the first day of the month.

        """

        try:

            date_parts = self[variable]["date-parts"][0]

        except (IndexError, KeyError):

            return None

        return date_parts_to_string(date_parts, fill=fill)
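To make the return values concrete, here is a self-contained sketch that operates on a plain dict. The helper names `date_parts_to_string_sketch` and `get_date_sketch` are hypothetical, and the formatting logic is an assumption about what `date_parts_to_string` does, not a copy of it.

```python
from typing import Optional


def date_parts_to_string_sketch(date_parts: list, fill: bool = False) -> Optional[str]:
    """Format [year, month, day] (month and day optional) as an ISO-style string."""
    parts = [int(p) for p in date_parts[:3]]
    if not parts:
        return None
    if fill:
        # Pad a missing month and/or day with 1.
        parts += [1] * (3 - len(parts))
    widths = (4, 2, 2)
    return "-".join(f"{part:0{width}d}" for part, width in zip(parts, widths))


def get_date_sketch(item: dict, variable: str = "issued", fill: bool = False) -> Optional[str]:
    """Mirror the lookup above: first entry of "date-parts", or None if absent."""
    try:
        date_parts = item[variable]["date-parts"][0]
    except (IndexError, KeyError):
        return None
    return date_parts_to_string_sketch(date_parts, fill=fill)


item = {"issued": {"date-parts": [[2019, 4]]}}
partial = get_date_sketch(item)            # year and month only
filled = get_date_sketch(item, fill=True)  # day padded to the 1st
```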

infer_id

def infer_id(
    self
) -> 'CSL_Item'

Detect and set a non-null/empty value for "id" or else raise a ValueError.

View Source
    def infer_id(self) -> "CSL_Item":

        """

        Detect and set a non-null/empty value for "id" or else raise a ValueError.

        """

        if self.get("standard_citation"):

            # "standard_citation" field is set with a non-null/empty value

            return self.set_id(self.pop("standard_citation"))

        if self.note_dict.get("standard_id"):

            # "standard_id" note field is set with a non-null/empty value

            return self.set_id(self.note_dict["standard_id"])

        if self.get("id"):

            # "id" field exists and is set with a non-null/empty value

            return self.set_id(self["id"])

        raise ValueError(

            "infer_id could not detect a field with a citation / standard_citation. "

            'Consider setting the CSL Item "id" field.'

        )
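The precedence order can be summarized in a standalone function. This is a simplified sketch: `infer_id_sketch` is a hypothetical name, `note_fields` stands in for the `note_dict` property, and unlike the method above it does not pop "standard_citation" or write to the item.

```python
def infer_id_sketch(item: dict, note_fields: dict) -> str:
    """Return the id chosen by the precedence order, or raise ValueError."""
    if item.get("standard_citation"):
        # 1. An explicit "standard_citation" field wins.
        return item["standard_citation"]
    if note_fields.get("standard_id"):
        # 2. Next, a "standard_id" recorded in the note.
        return note_fields["standard_id"]
    if item.get("id"):
        # 3. Finally, the existing "id" field.
        return item["id"]
    raise ValueError("no usable id found")


chosen = infer_id_sketch(
    {"id": "raw-id", "standard_citation": "doi:10.1/x"},
    {"standard_id": "doi:10.1/y"},
)
```

Each check uses truthiness rather than key presence, so empty strings and None fall through to the next source.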

items

def items(
    ...
)

D.items() -> a set-like object providing a view on D's items

keys

def keys(
    ...
)

D.keys() -> a set-like object providing a view on D's keys

note_append_dict

def note_append_dict(
    self,
    dictionary: dict
) -> None

Append key-value pairs specified by dictionary to the note field of a CSL Item.

Uses the CSL JSON "cheater syntax" to encode additional values not defined by the CSL JSON schema.

View Source
    def note_append_dict(self, dictionary: dict) -> None:

        """

        Append key-value pairs specified by `dictionary` to the note field of a CSL Item.

        Uses the [CSL JSON "cheater syntax"](https://github.com/Juris-M/citeproc-js-docs/blob/93d7991d42b4a96b74b7281f38e168e365847e40/csl-json/markup.rst#cheater-syntax-for-odd-fields)

        to encode additional values not defined by the CSL JSON schema.

        """

        for key, value in dictionary.items():

            if not re.fullmatch(r"[A-Z]+|[-_a-z]+", key):

                logging.warning(

                    f"note_append_dict: skipping adding {key!r} because "

                    f"it does not conform to the variable_name syntax as per https://git.io/fjTzW."

                )

                continue

            if "\n" in value:

                logging.warning(

                    f"note_append_dict: skipping adding {key!r} because "

                    f"the value contains a newline: {value!r}"

                )

                continue

            self.note_append_text(f"{key}: {value}")
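The key check is easy to misread, so a quick standalone test of the same pattern may help. `is_valid_note_key` is a hypothetical helper, not part of the class.

```python
import re


def is_valid_note_key(key: str) -> bool:
    """True for all-uppercase keys, or keys of lowercase letters, "-" and "_"."""
    return re.fullmatch(r"[A-Z]+|[-_a-z]+", key) is not None


results = {key: is_valid_note_key(key) for key in ["standard_id", "PMCID", "Original_ID", "key1"]}
```

Mixed-case keys and keys containing digits are rejected, so those entries would be skipped with a warning rather than written to the note.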

note_append_text

def note_append_text(
    self,
    text: str
) -> None

Append text to the note field (as a new line) of a CSL Item.

If a line already exists equal to text, do nothing.

View Source
    def note_append_text(self, text: str) -> None:

        """

        Append text to the note field (as a new line) of a CSL Item.

        If a line already exists equal to `text`, do nothing.

        """

        if not text:

            return

        note = self.note

        if re.search(f"^{re.escape(text)}$", note, flags=re.MULTILINE):

            # do not accumulate duplicate lines of text

            # https://github.com/manubot/manubot/issues/258

            return

        if note and not note.endswith("\n"):

            note += "\n"

        note += text

        self.note = note
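The same append-unless-present logic, written as a pure function on strings so its behavior is easy to check. `append_line` is a hypothetical name used only for this sketch.

```python
import re


def append_line(note: str, text: str) -> str:
    """Return note with text added as a new line, unless that line already exists."""
    if not text:
        return note
    # re.escape keeps characters such as "." or "(" in text from acting as regex syntax.
    if re.search(f"^{re.escape(text)}$", note, flags=re.MULTILINE):
        return note
    if note and not note.endswith("\n"):
        note += "\n"
    return note + text


note = append_line("", "standard_id: doi:10.1/x")
note = append_line(note, "original_id: foo")
note = append_line(note, "standard_id: doi:10.1/x")  # duplicate line, ignored
```

Anchoring the pattern with `^` and `$` under `re.MULTILINE` means only a whole-line match counts as a duplicate; a line that merely contains `text` does not.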

pop

def pop(
    ...
)

D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem

def popitem(
    self,
    /
)

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

prune_against_schema

def prune_against_schema(
    self
) -> 'CSL_Item'

Remove fields that violate the CSL Item JSON Schema.

View Source
    def prune_against_schema(self) -> "CSL_Item":

        """

        Remove fields that violate the CSL Item JSON Schema.

        """

        from .citeproc import remove_jsonschema_errors

        (csl_item,) = remove_jsonschema_errors([self], in_place=True)

        assert csl_item is self

        return self

set_date

def set_date(
    self,
    date: Union[NoneType, str, datetime.date, datetime.datetime],
    variable: str = 'issued'
) -> 'CSL_Item'

date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD) or as a Python date object (datetime.date or datetime.datetime).

variable: which variable to assign the date to.

View Source
    def set_date(

        self,

        date: Union[None, str, datetime.date, datetime.datetime],

        variable: str = "issued",

    ) -> "CSL_Item":

        """

        date: date either as a string (in the form YYYY, YYYY-MM, or YYYY-MM-DD)

            or as a Python date object (datetime.date or datetime.datetime).

        variable: which variable to assign the date to.

        """

        date_parts = date_to_date_parts(date)

        if date_parts:

            self[variable] = {"date-parts": [date_parts]}

        return self
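The conversion from the accepted inputs to CSL "date-parts" might look like the following. This is a sketch under assumptions: `date_to_date_parts_sketch` and `set_date_sketch` are hypothetical names, and the parsing rules are a guess at what `date_to_date_parts` does rather than its actual code.

```python
import datetime
import re
from typing import List, Optional, Union


def date_to_date_parts_sketch(
    date: Union[None, str, datetime.date, datetime.datetime]
) -> Optional[List[int]]:
    """Convert a date string or object to [year], [year, month] or [year, month, day]."""
    if date is None:
        return None
    if isinstance(date, (datetime.date, datetime.datetime)):
        date = date.isoformat()
    # Year is required; month and day are optional.
    match = re.match(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", date.strip())
    if not match:
        return None
    return [int(part) for part in match.groups() if part is not None]


def set_date_sketch(item: dict, date, variable: str = "issued") -> dict:
    """Store the parsed date under item[variable]; leave item alone if unparsable."""
    date_parts = date_to_date_parts_sketch(date)
    if date_parts:
        item[variable] = {"date-parts": [date_parts]}
    return item


item = set_date_sketch({}, "2019-04")
item = set_date_sketch(item, datetime.date(2020, 1, 15), variable="accessed")
```

As in the method above, a None or unparsable date leaves the item unchanged instead of writing an empty date variable.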

set_default_type

def set_default_type(
    self
) -> 'CSL_Item'

Set type to 'entry', if type not specified.

View Source
    def set_default_type(self) -> "CSL_Item":

        """

        Set type to 'entry', if type not specified.

        """

        self["type"] = self.get("type", "entry")

        return self

set_id

def set_id(
    self,
    id_
) -> 'CSL_Item'
View Source
    def set_id(self, id_) -> "CSL_Item":

        self["id"] = id_

        return self

setdefault

def setdefault(
    self,
    key,
    default=None,
    /
)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

standardize_id

def standardize_id(
    self
) -> 'CSL_Item'

Extract the standard_id (standard citation key) for a csl_item and modify the csl_item in-place to set its "id" field.

The standard_id is extracted from a "standard_citation" field, the "note" field, or the "id" field. The extracted citation is checked for validity and standardized, after which it is the final "standard_id".

Regarding csl_item modification, the csl_item "id" field is set to the standard_id, and the note field is created or updated with key-value pairs for standard_id, original_id, and original_standard_id as needed.

Note that the Manubot software generally refers to the "id" of a CSL Item as a citekey. However, in this context, we use "id" rather than "citekey" for consistency with CSL's "id" field.

View Source
    def standardize_id(self) -> "CSL_Item":

        """

        Extract the standard_id (standard citation key) for a csl_item and modify the csl_item in-place to set its "id" field.

        The standard_id is extracted from a "standard_citation" field, the "note" field, or the "id" field.

        The extracted citation is checked for validity and standardized, after which it is the final "standard_id".

        Regarding csl_item modification, the csl_item "id" field is set to the standard_id, and the note field

        is created or updated with key-value pairs for standard_id, original_id, and original_standard_id as needed.

        Note that the Manubot software generally refers to the "id" of a CSL Item as a citekey.

        However, in this context, we use "id" rather than "citekey" for consistency with CSL's "id" field.

        """

        original_id = self.get("id")

        self.infer_id()

        original_standard_id = self["id"]

        citekey = CiteKey(original_standard_id)

        standard_id = citekey.standard_id

        add_to_note = {}

        note_dict = self.note_dict

        if original_id and original_id != standard_id:

            if original_id != note_dict.get("original_id"):

                add_to_note["original_id"] = original_id

        if original_standard_id and original_standard_id != standard_id:

            if original_standard_id != note_dict.get("original_standard_id"):

                add_to_note["original_standard_id"] = original_standard_id

        if standard_id != note_dict.get("standard_id"):

            add_to_note["standard_id"] = standard_id

        self.note_append_dict(dictionary=add_to_note)

        self.set_id(standard_id)

        return self
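The note bookkeeping can be isolated from the class to see which keys get recorded. In this sketch `plan_note_additions` is a hypothetical helper, `note_fields` stands in for `note_dict`, and `str.lower` is only a toy stand-in for the standardization that `CiteKey` performs.

```python
from typing import Callable, Optional


def plan_note_additions(
    original_id: Optional[str],
    inferred_id: str,
    note_fields: dict,
    standardize: Callable[[str], str],
) -> dict:
    """Return the key-value pairs that would be appended to the note."""
    standard_id = standardize(inferred_id)
    add_to_note = {}
    # Record the pre-existing "id" only when standardization changed it.
    if original_id and original_id != standard_id:
        if original_id != note_fields.get("original_id"):
            add_to_note["original_id"] = original_id
    # Record the inferred id only when it differs from its standardized form.
    if inferred_id and inferred_id != standard_id:
        if inferred_id != note_fields.get("original_standard_id"):
            add_to_note["original_standard_id"] = inferred_id
    if standard_id != note_fields.get("standard_id"):
        add_to_note["standard_id"] = standard_id
    return add_to_note


# Toy standardizer (assumption): lowercase the key.
first_run = plan_note_additions("DOI:10.1/X", "DOI:10.1/X", {}, str.lower)
second_run = plan_note_additions(
    "doi:10.1/x", "doi:10.1/x", {"standard_id": "doi:10.1/x"}, str.lower
)
```

The second call returns an empty dict, which illustrates why repeating the operation on an already standardized item adds nothing further to the note.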

update

def update(
    ...
)

D.update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

validate_against_schema

def validate_against_schema(
    self
) -> 'CSL_Item'

Confirm that the CSL_Item validates. If not, raises a jsonschema.exceptions.ValidationError.

View Source
    def validate_against_schema(self) -> "CSL_Item":

        """

        Confirm that the CSL_Item validates. If not, raises a

        jsonschema.exceptions.ValidationError.

        """

        from .citeproc import get_jsonschema_csl_validator

        validator = get_jsonschema_csl_validator()

        validator.validate([self])

        return self

values

def values(
    ...
)

D.values() -> an object providing a view on D's values