django-pgbulk

django-pgplus, forked from django-manager-utils, provides several optimized bulk operations for Postgres:

  1. pgbulk.update - For updating a list of models in bulk. Although Django provides a bulk_update in 2.2, it performs individual updates for every row and does not perform a native bulk update.

  2. pgbulk.upsert - For doing a bulk update or insert. This function uses postgres UPDATE ON CONFLICT syntax to perform an atomic upsert operation. There are several options to this function that allow the user to avoid touching rows if they result in a duplicate update, along with returning which rows were updated, created, or untouched.

  3. pgbulk.sync - For syncing a list of models with a table. Does a bulk upsert and also deletes any rows in the source queryset that were not part of the input data.

pgbulk.update

pgbulk.update(queryset, model_objs, update_fields=None)[source]

Bulk updates a list of model objects that are already saved.

Parameters
  • queryset (QuerySet) – The queryset to use when bulk updating

  • model_objs (List[Model]) – Model object values to use for the update

  • update_fields (List[str], default=None) – A list of fields on the model objects to update. If None, all fields will be updated.

Example

Update an attribute of multiple models in bulk:

import pgbulk

pgbulk.update(
    MyModel,
    [
        MyModel(id=1, some_attr='some_val1'),
        MyModel(id=2, some_attr='some_val2')
    ],
    # These are the fields that will be updated. If not provided,
    # all fields will be updated
    ['some_attr']
)

pgbulk.upsert

pgbulk.upsert(queryset, model_objs, unique_fields, update_fields=None, returning=False, ignore_duplicate_updates=True, return_untouched=False)[source]

Perform a bulk upsert on a table.

Parameters
  • queryset (Model|QuerySet) – A model or a queryset that defines the collection to upsert

  • model_objs (List[Model]) – A list of Django models to upsert. All models in this list will be bulk upserted.

  • unique_fields (List[str]) – A list of fields that define the uniqueness of the model. The model must have a unique constraint on these fields

  • update_fields (List[str], default=None) – A list of fields to update whenever objects already exist. If an empty list is provided, it is equivalent to doing a bulk insert on the objects that don’t exist. If None, all fields will be updated.

  • returning (bool|List[str], default=False) – If True, returns all fields. If a list, only returns fields in the list. If False, do not return results from the upsert.

  • ignore_duplicate_updates (bool, default=True) – Don’t perform an update if all columns are identical to the row in the database.

  • return_untouched (bool, default=False) – When ignore_duplicate_updates is True, untouched rows will not be returned in upsert results. Set this to True to return untouched rows.

Examples

A basic bulk upsert on a model:

import pgbulk

pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr='some_val1'),
        MyModel(int_field=2, some_attr='some_val2')
    ],
    # These are the fields that identify the uniqueness constraint.
    ['int_field'],
    # These are the fields that will be updated if the row already
    # exists. If not provided, all fields will be updated
    ['some_attr']
)

Return the results of an upsert:

results = pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr='some_val1'),
        MyModel(int_field=2, some_attr='some_val2')
    ],
    ['int_field'],
    ['some_attr'],
    # ``True`` will return all columns. One can also explicitly
    # list which columns will be returned
    returning=True
)

# Print which results were created
print(results.created)

# Print which results were updated.
# By default, if an update results in no changes, it will not
# be updated and will not be returned.
print(results.updated)

Upsert values and update rows even when the update is meaningless (i.e. a duplicate update). This is turned off by default, but it can be enabled in case postgres triggers or other processes need to happen as a result of an update:

pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr='some_val1'),
        MyModel(int_field=2, some_attr='some_val2')
    ],
    ['int_field'],
    ['some_attr'],
    # Perform updates in the database even if it's a duplicate
    # update.
    ignore_duplicate_updates=False
)

Upsert values and ignore duplicate updates, but still return the rows that were untouched by the upsert:

results = pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr='some_val1'),
        MyModel(int_field=2, some_attr='some_val2')
    ],
    ['int_field'],
    ['some_attr'],
    returning=True,
    # Even though we don't perform an update on a duplicate,
    # return the row that was part of the upsert but untouched
    # This option is only meaningful when
    # ignore_duplicate_updates=True (the default)
    return_untouched=True,
)

# Print untouched rows
print(results.untouched)

pgbulk.sync

pgbulk.sync(queryset, model_objs, unique_fields, update_fields=None, returning=False, ignore_duplicate_updates=True, return_untouched=False)[source]

Perform a bulk sync on a table. A sync is an upsert on a list of model objects followed by a delete on the queryset whose elements were not in the list of models.

Parameters
  • queryset (Model|QuerySet) – A model or a queryset that defines the collection to sync. If a model is provided, all rows are candidates for the sync operation

  • model_objs (List[Model]) – A list of Django models to sync. All models in this list will be bulk upserted. Any models in the queryset that are not present in this list will be deleted.

  • unique_fields (List[str]) – A list of fields that define the uniqueness of the model. The model must have a unique constraint on these fields

  • update_fields (List[str], default=None) – A list of fields to update whenever objects already exist. If an empty list is provided, it is equivalent to doing a bulk insert on the objects that don’t exist. If None, all fields will be updated.

  • returning (bool|List[str]) – If True, returns all fields. If a list, only returns fields in the list

  • ignore_duplicate_updates (bool, default=True) – Don’t perform an update if the row is a duplicate.

Examples

Sync two elements to a table with three elements. The sync will upsert two elements and delete the third:

# Assume the MyModel table has objects with int_field=1,2,3
# We will sync this table to two elements
results = pgbulk.sync(
    MyModel.objects.all(),
    [
        MyModel(int_field=1, some_attr='some_val1'),
        MyModel(int_field=2, some_attr='some_val2')
    ],
    ['int_field'],
    ['some_attr'],
)

# Print created, updated, untouched, and deleted rows from the sync
print(results.created)
print(results.updated)
print(results.untouched)
print(results.deleted)