db.batch_commit() mode #1539

Closed
HarryR opened this issue Mar 14, 2018 · 2 comments

HarryR commented Mar 14, 2018

db = SqliteDatabase(':memory:')
with db.batch_commit(10):
    for n in range(0, 1234):
        Model.create(**...)

Every Nth model creation commits the transaction; the equivalent Python without batch_commit would be:

with db.manual_commit():
    db.begin()
    for n, row in enumerate(...):
        Model.create(**row)
        if n % 100 == 0:
            db.commit()
            db.begin()
    db.commit()

Having a batch_commit helper would make it simpler to speed up batch inserts.

Alternatively there is Model.insert_many, but when creating multiple linked models at the same time a batch_commit method could still be useful.
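
For a single model, insert_many already covers the bulk case; a minimal sketch, assuming a hypothetical Person model with a name field:

# Person is a stand-in model; insert_many() issues a single bulk INSERT.
rows = [{'name': 'alice'}, {'name': 'bob'}, {'name': 'carol'}]
with db.atomic():
    Person.insert_many(rows).execute()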

Using db.atomic() works unless you have a large number of items to insert; in that case periodic commits are needed to ensure data is synchronised to disk, e.g.:

with db.atomic():
    for row in data:
        row.update(defaults)
        Model.create(**row)

It's probably easier to do this with a batch iterator instead of modifying peewee:

for work in batch(rows, n=1234):
    with db.atomic():
        for row in work:
            Model.create(**row)
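
Here batch() is assumed to be a simple chunking helper (it isn't defined above); a minimal sketch:

from itertools import islice

def batch(iterable, n):
    # Yield successive lists of at most n items from the iterable.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            break
        yield chunk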

Feel free to close; I answered my own question...

@coleifer
Owner

I originally thought that the 10 in the call to batch_commit() was the total number of batches, but on reading again I think you mean a batch size of 10, i.e. a commit every 10 rows?

Something like that should be doable, but it is probably cleaner to pass an iterator, e.g.:

for row in db.batch_commit(list_of_data, 100):
    Model.create(**row)

Continuing that thought, batch_commit would be a generator along these lines:

from itertools import zip_longest

# Helper: yield lists of up to n items from the iterable.
def chunked(iterable, n):
    marker = object()
    for group in (list(g) for g in zip_longest(*[iter(iterable)] * n,
                                               fillvalue=marker)):
        if group[-1] is marker:
            del group[group.index(marker):]
        yield group

# Method on Database: wrap each chunk of n objects in a transaction.
def batch_commit(self, it, n):
    for obj_group in chunked(it, n):
        with self.atomic():
            for obj in obj_group:
                yield obj

I didn't test this yet, so I'm not sure if it works. Just thinking out loud.
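
For reference, a quick check of the chunked() helper above:

print(list(chunked(range(7), 3)))
# -> [[0, 1, 2], [3, 4, 5], [6]]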

coleifer added a commit that referenced this issue Mar 14, 2018
@coleifer
Owner

Added in 3d4e6e4.
