django-pgbulk¶
django-pgbulk
, forked from
django-manager-utils,
provides several optimized bulk operations for Postgres:
pgbulk.update
- For updating a list of models in bulk. Although Django provides abulk_update
in 2.2, it performs individual updates for every row and does not perform a native bulk update.pgbulk.upsert
- For doing a bulk update or insert. This function uses postgresUPDATE ON CONFLICT
syntax to perform an atomic upsert operation. There are several options to this function that allow the user to avoid touching rows if they result in a duplicate update, along with returning which rows were updated, created, or untouched. Users can also usemodels.F
objects on conflicts.pgbulk.sync
- For syncing a list of models with a table. Does a bulk upsert and also deletes any rows in the source queryset that were not part of the input data.
pgbulk.update¶
- pgbulk.update(queryset, model_objs, update_fields=None)[source]¶
Bulk updates a list of model objects that are already saved.
- Parameters
queryset (QuerySet) – The queryset to use when bulk updating
model_objs (List[Model]) – Model object values to use for the update
update_fields (List[str], default=None) – A list of fields on the model objects to update. If
None
, all fields will be updated.
Example
Update an attribute of multiple models in bulk:
import pgbulk pgbulk.update( MyModel, [ MyModel(id=1, some_attr='some_val1'), MyModel(id=2, some_attr='some_val2') ], # These are the fields that will be updated. If not provided, # all fields will be updated ['some_attr'] )
pgbulk.upsert¶
- pgbulk.upsert(queryset, model_objs, unique_fields, update_fields=None, returning=False, ignore_duplicate_updates=True, return_untouched=False)[source]¶
Perform a bulk upsert on a table.
- Parameters
queryset (Model|QuerySet) – A model or a queryset that defines the collection to upsert
model_objs (List[Model]) – A list of Django models to upsert. All models in this list will be bulk upserted.
unique_fields (List[str]) – A list of fields that define the uniqueness of the model. The model must have a unique constraint on these fields
update_fields (List[str], default=None) – A list of fields to update whenever objects already exist. If an empty list is provided, it is equivalent to doing a bulk insert on the objects that don’t exist. If
None
, all fields will be updated. If you want to perform an expression such as anF
object on a field when it is updated, use thepgbulk.UpdateField
class. See examples below.returning (bool|List[str], default=False) – If True, returns all fields. If a list, only returns fields in the list. If False, do not return results from the upsert.
ignore_duplicate_updates (bool, default=True) – Don’t perform an update if all columns are identical to the row in the database.
return_untouched (bool, default=False) – When
ignore_duplicate_updates
isTrue
, untouched rows will not be returned in upsert results. Set this toTrue
to return untouched rows.
Examples
A basic bulk upsert on a model:
import pgbulk pgbulk.upsert( MyModel, [ MyModel(int_field=1, some_attr='some_val1'), MyModel(int_field=2, some_attr='some_val2') ], # These are the fields that identify the uniqueness constraint. ['int_field'], # These are the fields that will be updated if the row already # exists. If not provided, all fields will be updated ['some_attr'] )
Return the results of an upsert:
results = pgbulk.upsert( MyModel, [ MyModel(int_field=1, some_attr='some_val1'), MyModel(int_field=2, some_attr='some_val2') ], ['int_field'], ['some_attr'], # ``True`` will return all columns. One can also explicitly # list which columns will be returned returning=True ) # Print which results were created print(results.created) # Print which results were updated. # By default, if an update results in no changes, it will not # be updated and will not be returned. print(results.updated)
Upsert values and update rows even when the update is meaningless (i.e. a duplicate update). This is turned off by default, but it can be enabled in case postgres triggers or other processes need to happen as a result of an update:
pgbulk.upsert( MyModel, [ MyModel(int_field=1, some_attr='some_val1'), MyModel(int_field=2, some_attr='some_val2') ], ['int_field'], ['some_attr'], # Perform updates in the database even if it's a duplicate # update. ignore_duplicate_updates=False )
Upsert values and ignore duplicate updates, but still return the rows that were untouched by the upsert:
results = pgbulk.upsert( MyModel, [ MyModel(int_field=1, some_attr='some_val1'), MyModel(int_field=2, some_attr='some_val2') ], ['int_field'], ['some_attr'], returning=True, # Even though we don't perform an update on a duplicate, # return the row that was part of the upsert but untouched # This option is only meaningful when # ignore_duplicate_updates=True (the default) return_untouched=True, ) # Print untouched rows print(results.untouched)
Use an expression for a field if an update happens. In the example below, we increment
some_int_field
by one whenever an update happens. Otherwise it defaults to zero:results = pgbulk.upsert( MyModel, [ MyModel(some_int_field=0, some_key='a'), MyModel(some_int_field=0, some_key='b') ], ['some_key'], [ # Use the UpdateField class to specify an expression for the update. # If provided, the expression will be executed when an update happens. pgbulk.UpdateField('some_int_field', expression=models.F('some_int_field') + 1) ], )
pgbulk.sync¶
- pgbulk.sync(queryset, model_objs, unique_fields, update_fields=None, returning=False, ignore_duplicate_updates=True, return_untouched=False)[source]¶
Perform a bulk sync on a table. A sync is an upsert on a list of model objects followed by a delete on the queryset whose elements were not in the list of models.
- Parameters
queryset (Model|QuerySet) – A model or a queryset that defines the collection to sync. If a model is provided, all rows are candidates for the sync operation
model_objs (List[Model]) – A list of Django models to sync. All models in this list will be bulk upserted. Any models in the queryset that are not present in this list will be deleted.
unique_fields (List[str]) – A list of fields that define the uniqueness of the model. The model must have a unique constraint on these fields
update_fields (List[str], default=None) – A list of fields to update whenever objects already exist. If an empty list is provided, it is equivalent to doing a bulk insert on the objects that don’t exist. If
None
, all fields will be updated.returning (bool|List[str]) – If True, returns all fields. If a list, only returns fields in the list
ignore_duplicate_updates (bool, default=True) – Don’t perform an update if the row is a duplicate.
Examples
Sync two elements to a table with three elements. The sync will upsert two elements and delete the third:
# Assume the MyModel table has objects with int_field=1,2,3 # We will sync this table to two elements results = pgbulk.sync( MyModel.objects.all(), [ MyModel(int_field=1, some_attr='some_val1'), MyModel(int_field=2, some_attr='some_val2') ], ['int_field'], ['some_attr'], ) # Print created, updated, untouched, and deleted rows from the sync print(results.created) print(results.updated) print(results.untouched) print(results.deleted)