Database driver Parquet
This guide provides instructions on how to set up and use Parquet files with DBeaver. The Parquet driver allows you to work with Parquet data as if it were in a database. You can retrieve data, apply filters, sorting, and other operations, and even combine data from multiple files.
Before you start, you need to create a connection in DBeaver and select the appropriate Parquet driver. If you haven’t done this, see our Database Connection article.
Important: When using the Parquet driver, all connected Parquet files are read-only. To make changes, you need to update the original files outside DBeaver.
This section describes how to set up a connection using the Parquet driver. The connection settings page contains the following fields:
Field | Description |
---|---|
Connect by (Host/URL) | Choose whether to connect using a local path (Host) or a URL. |
File paths | Specify the location of the Parquet file(s). You can select a single Parquet file (File) or choose a directory containing multiple Parquet files (Folder). |
Driver name | This field is filled in automatically based on the selected driver type. |
Driver Settings | Configure any driver-specific settings here. |
Tip: When you use the Folder option, DBeaver scans the directory for Parquet files up to two levels deep and organizes the files into schemas based on the directory structure. For more information, see folder structure.
The Parquet driver supports both simple and complex SQL queries:
- Simple queries (e.g., `SELECT * FROM table`): Data is read directly from the Parquet file.
- Complex queries (e.g., queries using `WHERE`, `JOIN`, `ORDER BY`, or `GROUP BY`): The first time a complex query runs, the driver imports the entire Parquet file into an internal database to enable advanced SQL functions. Subsequent queries run faster because the data is already in the internal database.
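The split between simple and complex queries can be pictured with a small Python model. This is a hypothetical sketch and not DBeaver's implementation. The class `LazyImportingReader` and the keyword test in `is_complex` are assumptions made for illustration. An in-memory list of rows also stands in for a real Parquet file. The model shows the behaviour described above: simple queries return the file's rows as they are, and the first complex query triggers a one-time import into the internal database. Python's built-in `sqlite3` models that database, and later complex queries reuse it.

```python
import sqlite3
from typing import Iterable, List, Optional, Sequence

# Hypothetical heuristic: the clauses listed above as making a query "complex".
COMPLEX_KEYWORDS = ("WHERE", "JOIN", "ORDER BY", "GROUP BY")


def is_complex(sql: str) -> bool:
    """Return True if the query contains any of the complex-query keywords."""
    normalized = " ".join(sql.upper().split())  # collapse whitespace and newlines
    return any(keyword in normalized for keyword in COMPLEX_KEYWORDS)


class LazyImportingReader:
    """Hypothetical model of one Parquet file behind the driver.

    `rows` stands in for the file's contents; no real Parquet reading happens here.
    """

    def __init__(self, table: str, columns: Sequence[str], rows: Iterable[tuple]):
        self.table = table
        self.columns = list(columns)
        self.rows = list(rows)
        self.db: Optional[sqlite3.Connection] = None  # created on first complex query
        self.import_count = 0  # how many times the whole "file" was imported

    def _ensure_imported(self) -> sqlite3.Connection:
        """Import every row into an internal SQLite database, but only once."""
        if self.db is None:
            self.db = sqlite3.connect(":memory:")
            column_list = ", ".join(self.columns)
            placeholders = ", ".join("?" for _ in self.columns)
            self.db.execute(f"CREATE TABLE {self.table} ({column_list})")
            self.db.executemany(
                f"INSERT INTO {self.table} VALUES ({placeholders})", self.rows
            )
            self.import_count += 1
        return self.db

    def query(self, sql: str) -> List[tuple]:
        if not is_complex(sql):
            # Simple query: assumed to be `SELECT * FROM table`, so the rows
            # are returned directly without touching the internal database.
            return list(self.rows)
        # Complex query: import on first use, then let SQLite evaluate it.
        return self._ensure_imported().execute(sql).fetchall()


reader = LazyImportingReader(
    "sales", ["region", "amount"], [("north", 10), ("south", 5), ("north", 7)]
)
print(reader.query("SELECT * FROM sales"), reader.import_count)
print(reader.query("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"))
print(reader.query("SELECT amount FROM sales WHERE region = 'south'"), reader.import_count)
```

The keyword matching is deliberately crude. For example, a column named `somewhere` would count as `WHERE`. Its only purpose is to make the import-once behaviour visible.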
When working with a folder containing multiple Parquet files, DBeaver organizes them as follows:
Folder structure | Schema in DBeaver |
---|---|
Root files | Default schema |
Subfolder files | Schema named after the subfolder |
Files in deeper folders | Ignored |
If your folder looks like this:
Data/
├── employees.parquet
├── sales.parquet
└── Reports/
    ├── monthly.parquet
    └── yearly.parquet
DBeaver will create:
- Default schema: employees, sales
- Reports schema: monthly, yearly
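The schema rules above can be expressed as a short Python sketch. It illustrates the documented behaviour and is not DBeaver's code. Several details are assumptions:
- the function name `map_parquet_schemas`;
- the schema label `Default`;
- naming each table after its file name without the `.parquet` extension, as in the example.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def map_parquet_schemas(root: Path, default_schema: str = "Default") -> Dict[str, List[str]]:
    """Hypothetical sketch of the folder rules described above.

    - files directly in `root`         -> `default_schema`
    - files in a first-level subfolder -> schema named after that subfolder
    - files nested any deeper          -> ignored
    Table names are the file names without the `.parquet` extension.
    """
    schemas: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.parquet")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) == 1:
            schema = default_schema
        elif len(parts) == 2:
            schema = parts[0]
        else:
            continue  # deeper than one subfolder level: ignored
        schemas.setdefault(schema, []).append(path.stem)
    return schemas


if __name__ == "__main__":
    # Rebuild the example folder from the text in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Data"
        (root / "Reports").mkdir(parents=True)
        for name in ["employees", "sales", "Reports/monthly", "Reports/yearly"]:
            (root / f"{name}.parquet").touch()
        print(map_parquet_schemas(root))
```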
Tip: To focus on specific files, consider selecting individual files or folders when configuring the connection.
When you execute a complex query (such as one using `WHERE`, `JOIN`, `GROUP BY`, or `ORDER BY`) on a Parquet file for the first time, the Parquet driver processes the data by importing it into a temporary internal SQLite database.
By default, this internal database stores data temporarily on disk during your session and is cleared when DBeaver restarts. To speed up queries on the same file in future sessions, specify the `internalDbFilePath` option on the Driver properties tab (e.g., `C:\User\database.db`) so that the processed data is reused.
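The difference between the default temporary database and a fixed `internalDbFilePath` can be modelled in Python with `sqlite3`. The `InternalDbSession` class and its `internal_db_file_path` parameter are hypothetical. They only mirror the behaviour described above:
- Without a path, each session imports into a temporary file that is deleted when the session closes.
- With a path, a later session finds the tables from an earlier one and skips the import.

```python
import os
import sqlite3
import tempfile
from typing import Iterable, Optional, Sequence


class InternalDbSession:
    """Hypothetical model of one session's internal database.

    With no path, data goes into a temporary SQLite file that is deleted on
    close (the default behaviour). With a path, the same file is reused by
    later sessions (the effect of setting internalDbFilePath).
    """

    def __init__(self, internal_db_file_path: Optional[str] = None):
        self._temporary = internal_db_file_path is None
        if self._temporary:
            fd, path = tempfile.mkstemp(suffix=".db")
            os.close(fd)  # SQLite opens the (empty) file itself
        else:
            path = internal_db_file_path
        self.path = path
        self.conn = sqlite3.connect(path)

    def import_if_missing(self, table: str, columns: Sequence[str], rows: Iterable[tuple]) -> bool:
        """Import rows unless an earlier session already did; True if imported."""
        found = self.conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        ).fetchone()
        if found:
            return False  # reuse the data processed earlier
        placeholders = ", ".join("?" for _ in columns)
        self.conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        self.conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        self.conn.commit()
        return True

    def close(self) -> None:
        self.conn.close()
        if self._temporary:
            os.remove(self.path)  # temporary data does not survive the session
```

When two sessions use the same path, the second one can skip the import. The temporary path is new every time, so nothing carries over between sessions.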
DBeaver provides additional features that work with the Parquet driver but are not exclusive to it:
Category | Feature |
---|---|
Data Transfer | Data Export |
Data Visualization | Visual Query Builder, Charts |