Query & search registries

This guide walks through all the ways of finding metadata records in LaminDB registries.

# !pip install lamindb
!lamin init --storage ./test-registries
→ connected lamindb: testuser1/test-registries

We’ll need some toy data.

import lamindb as ln

# create toy data
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()

# see the content of the artifact registry
ln.Artifact.df()
→ connected lamindb: testuser1/test-registries
! no run & transform got linked, call `ln.track()` & re-run
! no run & transform got linked, call `ln.track()` & re-run
! no run & transform got linked, call `ln.track()` & re-run
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1

Look up metadata

For registries with fewer than 100k records, auto-completing a Lookup object is the most convenient way of finding a record.

For example, take the User registry:

# query the database for all users, optionally pass the field that creates the key
users = ln.User.lookup(field="handle")

# the lookup object is a NamedTuple
users
Lookup(testuser1=User(uid='DzTjkKse', handle='testuser1', name='Test User1', created_at=2024-11-11 10:40:30 UTC), dict=<bound method Lookup.dict of <lamin_utils._lookup.Lookup object at 0x7feb78cdd670>>)

With auto-complete, we find a specific user record:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', created_at=2024-11-11 10:40:30 UTC)

You can also get a dictionary:

users_dict = ln.User.lookup().dict()
users_dict
{'testuser1': User(uid='DzTjkKse', handle='testuser1', name='Test User1', created_at=2024-11-11 10:40:30 UTC)}
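
To see why auto-complete works on a lookup object, here is a minimal sketch in plain Python that builds a named tuple keyed by one field. It is not LaminDB's implementation: `build_lookup` and the dictionary records are hypothetical stand-ins, and the sketch assumes the field values are valid Python identifiers.

```python
from collections import namedtuple


def build_lookup(records, field):
    """Build a named tuple whose attribute names are the values of `field`."""
    # Hypothetical helper for illustration only; assumes every value of
    # `field` is unique and a valid Python identifier.
    keys = [record[field] for record in records]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


# Toy record mirroring the user shown above.
records = [{"uid": "DzTjkKse", "handle": "testuser1", "name": "Test User1"}]
users = build_lookup(records, field="handle")

# Attribute access is what editors and notebooks can auto-complete.
user_name = users.testuser1["name"]

# A dictionary view, similar in spirit to calling .dict() on the lookup.
users_dict = users._asdict()
```

Because the keys become attributes, this approach suits small registries; for large ones, a query is likely the better tool.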

Query exactly one record

get() raises an error if no record or more than one matching record is found.

# by the universal base62 uid
ln.User.get("DzTjkKse")

# by any expression involving fields
ln.User.get(handle="testuser1")
User(uid='DzTjkKse', handle='testuser1', name='Test User1', created_at=2024-11-11 10:40:30 UTC)

Query sets of records

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record in a row.

  • .all(): A QuerySet.

  • .one(): Exactly one record. Raises an error if there is none. Equivalent to the .get() method shown above.

  • .one_or_none(): Either one record or None if there is no query result.
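
The difference between .one() and .one_or_none() can be sketched with plain Python lists. This is an illustration, not LaminDB's code: the function names mirror the methods, `ValueError` is a placeholder exception type, and raising when several records match is an assumption that follows SQLAlchemy's semantics.

```python
def one(records):
    """Return the single record; raise if there are zero or several."""
    if len(records) != 1:
        # Placeholder exception type, chosen for this sketch only.
        raise ValueError(f"expected exactly one record, found {len(records)}")
    return records[0]


def one_or_none(records):
    """Return the single record, or None if there is no result."""
    if not records:
        return None
    # Assumption: several matches are still an error, as in SQLAlchemy.
    return one(records)


single = one(["My image"])
missing = one_or_none([])
```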

Note

filter() returns a QuerySet.

The ORMs in LaminDB are Django Models and any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
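
To make the translation into a SQL select statement concrete, the sketch below runs a comparable query by hand against an in-memory SQLite database. The table and column names are simplified assumptions modeled on the output above, not LaminDB's actual schema, and the SQL is only roughly what a filter such as suffix=".jpg", created_by=user might produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: only the columns needed for the example.
conn.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, "
    "size INTEGER, created_by_id INTEGER)"
)
conn.executemany(
    "INSERT INTO artifact (description, suffix, size, created_by_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("My image", ".jpg", 29358, 1),
        ("The iris collection", ".parquet", 5097, 1),
        ("My fastq", ".fastq.gz", 20, 1),
    ],
)

# Keyword arguments of a filter become conditions joined by AND.
rows = conn.execute(
    "SELECT id, description FROM artifact "
    "WHERE suffix = ? AND created_by_id = ?",
    (".jpg", 1),
).fetchall()
print(rows)  # [(1, 'My image')]
```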

Search for records

Search the toy data:

ln.Artifact.search("iris").df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1

Let us create 500 notebook objects with fake titles, save, and search them:

transforms = [ln.Transform(name=title, type="notebook") for title in ln.core.datasets.fake_bio_notebook_titles(n=500)]
ln.save(transforms)

# search
ln.Transform.search("intestine").df().head(5)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
4 7t1VAx2SKuW50000 None True Igm IgD visualize Epididymal basal cell invest... None None notebook None None None None None 2024-11-11 10:40:44.039254+00:00 1
13 YgKR4VACKH0U0000 None True Igg1 research intestine IgM IgG2 intestine. None None notebook None None None None None 2024-11-11 10:40:44.040112+00:00 1
22 vk2p80eYhzR70000 None True Intestine IgE Cytotoxic T cell IgD efficiency ... None None notebook None None None None None 2024-11-11 10:40:44.040966+00:00 1
31 xOZBdtZb1CZs0000 None True Igg2 Purkinje fiber research intestine IgM Dou... None None notebook None None None None None 2024-11-11 10:40:44.041837+00:00 1
35 UciCr1qE4TAs0000 None True Cytotoxic T Cell investigate intestine result ... None None notebook None None None None None 2024-11-11 10:40:44.042221+00:00 1

Note

Currently, the LaminHub UI search is more powerful than the search of the lamindb open-source package.

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations and leverage different comparators.

ln.Artifact.filter(created_by__handle__startswith="testuse").df()  
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1

The filter selects all artifacts created by users whose handle starts with "testuse".

Under the hood, in the SQL database, it joins the artifact table with the user table.
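
A simplified sketch of the kind of join such a relation filter implies, using an in-memory SQLite database. The table names, the second user, and the "Other file" row are hypothetical additions for illustration, and the SQL is written by hand rather than taken from what Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_account (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE artifact (
        id INTEGER PRIMARY KEY,
        description TEXT,
        created_by_id INTEGER REFERENCES user_account (id)
    );
    INSERT INTO user_account (id, handle) VALUES (1, 'testuser1'), (2, 'someone');
    INSERT INTO artifact (description, created_by_id) VALUES
        ('My image', 1), ('The iris collection', 1), ('Other file', 2);
    """
)

# Join on the foreign key, then compare a field of the related table.
query = """
    SELECT artifact.description
    FROM artifact
    JOIN user_account ON artifact.created_by_id = user_account.id
    WHERE user_account.handle LIKE ?
    ORDER BY artifact.id
"""
descriptions = [row[0] for row in conn.execute(query, ("testuse%",))]
print(descriptions)  # ['My image', 'The iris collection']
```

Each additional double-underscore hop across a relation would add another join of this kind.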

Comparators

You can qualify the type of comparison in a query by using a comparator.

Below is a list of the most important ones; Django supports about two dozen field comparators of the form field__comparator=value.
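
As a rough mental model of the field__comparator=value form, the sketch below splits each keyword into a field and a comparator and applies a matching plain-Python predicate to dictionary records. `COMPARATORS` and `matches` are hypothetical names, only a subset of comparators is covered, relation traversal is not handled, and Django itself compiles lookups to SQL rather than evaluating them in Python.

```python
import operator

# Illustrative subset of comparators mapped to plain-Python predicates.
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "icontains": lambda value, part: part.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record, **expressions):
    """Check a dict record against field__comparator=value expressions."""
    for expression, expected in expressions.items():
        # "size__lt" -> ("size", "lt"); a bare "suffix" means an exact match.
        field, _, comparator = expression.partition("__")
        predicate = COMPARATORS[comparator or "exact"]
        if not predicate(record[field], expected):
            return False
    return True


artifacts = [
    {"description": "My image", "suffix": ".jpg", "size": 29358},
    {"description": "The iris collection", "suffix": ".parquet", "size": 5097},
    {"description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]
small = [a["description"] for a in artifacts if matches(a, size__lt=1e4)]
print(small)  # ['The iris collection', 'My fastq']
```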

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1

less than/ greater than

Or subset to artifacts smaller than 10 kB. A comparator such as __lt is appended to the field name in the keyword argument.

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1

contains

ln.Transform.filter(name__contains="search").df().head(5)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
7 yV5XansSwjGY0000 None True Study Golgi cells efficiency research. None None notebook None None None None None 2024-11-11 10:40:44.039542+00:00 1
12 6RZUL07l3R9q0000 None True Ige IgD research Lungs Lungs IgG3. None None notebook None None None None None 2024-11-11 10:40:44.040017+00:00 1
13 YgKR4VACKH0U0000 None True Igg1 research intestine IgM IgG2 intestine. None None notebook None None None None None 2024-11-11 10:40:44.040112+00:00 1
15 Hf7eDPCF9K2C0000 None True Research investigate IgG2 IgG3. None None notebook None None None None None 2024-11-11 10:40:44.040302+00:00 1
31 xOZBdtZb1CZs0000 None True Igg2 Purkinje fiber research intestine IgM Dou... None None notebook None None None None None 2024-11-11 10:40:44.041837+00:00 1

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(5)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
7 yV5XansSwjGY0000 None True Study Golgi cells efficiency research. None None notebook None None None None None 2024-11-11 10:40:44.039542+00:00 1
12 6RZUL07l3R9q0000 None True Ige IgD research Lungs Lungs IgG3. None None notebook None None None None None 2024-11-11 10:40:44.040017+00:00 1
13 YgKR4VACKH0U0000 None True Igg1 research intestine IgM IgG2 intestine. None None notebook None None None None None 2024-11-11 10:40:44.040112+00:00 1
15 Hf7eDPCF9K2C0000 None True Research investigate IgG2 IgG3. None None notebook None None None None None 2024-11-11 10:40:44.040302+00:00 1
31 xOZBdtZb1CZs0000 None True Igg2 Purkinje fiber research intestine IgM Dou... None None notebook None None None None None 2024-11-11 10:40:44.041837+00:00 1

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
15 Hf7eDPCF9K2C0000 None True Research investigate IgG2 IgG3. None None notebook None None None None None 2024-11-11 10:40:44.040302+00:00 1
34 L4ZywPc4LhWx0000 None True Research IgG2 IgG2 efficiency IgD. None None notebook None None None None None 2024-11-11 10:40:44.042126+00:00 1
45 b6xVdVPsXdR40000 None True Research Fallopian tubes Tonsils classify rese... None None notebook None None None None None 2024-11-11 10:40:44.043170+00:00 1
74 3DwIoxktWTWO0000 None True Research Apocrine sweat gland cluster IgG1 IgG... None None notebook None None None None None 2024-11-11 10:40:44.050800+00:00 1
105 t0K7E9esqeLK0000 None True Research Epididymal basal cell IgE. None None notebook None None None None None 2024-11-11 10:40:44.053651+00:00 1
133 lH1Uim3iBmOE0000 None True Research research classify Bone marrow IgG2 IgE. None None notebook None None None None None 2024-11-11 10:40:44.059835+00:00 1
146 ltnYhRZyHzTC0000 None True Research IgD study intestinal classify IgM Fal... None None notebook None None None None None 2024-11-11 10:40:44.061057+00:00 1
169 KYxtBH3jEueS0000 None True Research research Golgi cells IgG result candi... None None notebook None None None None None 2024-11-11 10:40:44.063183+00:00 1
280 UUqj7J62umXA0000 None True Research IgG1 rank Regulatory T cell. None None notebook None None None None None 2024-11-11 10:40:44.080536+00:00 1
309 zKzs5t4I0sg80000 None True Research IgG4 Lungs efficiency IgG2 study. None None notebook None None None None None 2024-11-11 10:40:44.083295+00:00 1
318 UO8FEqFOklB70000 None True Research IgM intestinal IgG1. None None notebook None None None None None 2024-11-11 10:40:44.084140+00:00 1
347 XKiSrgyb7phI0000 None True Research IgG2 IgE IgM research IgG4 research. None None notebook None None None None None 2024-11-11 10:40:44.090496+00:00 1
379 QCHkYZZzUl9Q0000 None True Research IgM Golgi cells IgG Regulatory T cell. None None notebook None None None None None 2024-11-11 10:40:44.093457+00:00 1
381 2SMvi2GK05wx0000 None True Research Penis IgM IgM classify Cytotoxic T ce... None None notebook None None None None None 2024-11-11 10:40:44.093642+00:00 1
477 vxMgKaYSGlL80000 None True Research Apocrine sweat gland IgM Seminal vesi... None None notebook None None None None None 2024-11-11 10:40:44.109752+00:00 1

or

ln.Artifact.filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
1 mjpAC6hcA7OdpcFF0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 2024-11-11 10:40:34.652138+00:00 1
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1

negate/ unequal

ln.Artifact.filter(~ln.Q(suffix=".jpg")).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
2 unDx00VJiDuNvQwk0000 None True The iris collection None .parquet dataset 5097 K1jn6pPlqIC6ebZQfW84NQ None None md5 DataFrame 1 True 1 None None 2024-11-11 10:40:34.752946+00:00 1
3 lvSiRnhK62OP8aHz0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 2024-11-11 10:40:34.765474+00:00 1
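
How conditions combine with | and ~ can be sketched with a tiny stand-in class. This is not Django's Q: the real class builds an expression tree that is compiled to SQL, whereas this hypothetical version wraps plain-Python predicates over dictionary records.

```python
class Q:
    """Tiny stand-in for a composable query condition (not Django's Q)."""

    def __init__(self, predicate=None, **equals):
        # Either wrap a given predicate or build one from field=value pairs.
        self.predicate = predicate or (
            lambda record: all(record[k] == v for k, v in equals.items())
        )

    def __or__(self, other):
        return Q(lambda record: self.predicate(record) or other.predicate(record))

    def __and__(self, other):
        return Q(lambda record: self.predicate(record) and other.predicate(record))

    def __invert__(self):
        return Q(lambda record: not self.predicate(record))


records = [{"suffix": s} for s in (".jpg", ".parquet", ".fastq.gz")]

either = Q(suffix=".jpg") | Q(suffix=".fastq.gz")
not_jpg = ~Q(suffix=".jpg")

either_suffixes = [r["suffix"] for r in records if either.predicate(r)]
not_jpg_suffixes = [r["suffix"] for r in records if not_jpg.predicate(r)]
print(either_suffixes)   # ['.jpg', '.fastq.gz']
print(not_jpg_suffixes)  # ['.parquet', '.fastq.gz']
```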

Clean up the test instance.

!rm -r ./test-registries
!lamin delete --force test-registries
• deleting instance testuser1/test-registries