feat: Add parquet file detection (#578)
Adds Parquet file detection. See the [docs](https://parquet.apache.org/docs/file-format/)
for the specification.

Co-authored-by: Keith Kelly <kkelly@morningconsult.com>
kwkelly and Keith Kelly authored Sep 26, 2024
1 parent 4cc383c commit c4abedc
Showing 4 changed files with 7 additions and 2 deletions.
2 changes: 2 additions & 0 deletions internal/magic/binary.go
@@ -21,6 +21,8 @@ var (
SWF = prefix([]byte("CWS"), []byte("FWS"), []byte("ZWS"))
// Torrent has bencoded text in the beginning.
Torrent = prefix([]byte("d8:announce"))
+// Par1 matches a Parquet file, which begins with the 4-byte magic number "PAR1".
+Par1 = prefix([]byte{0x50, 0x41, 0x52, 0x31})
)

// Java bytecode and Mach-O binaries share the same magic number.
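
For readers unfamiliar with magic-number detection, the check added above can be sketched as a plain byte-prefix comparison. This is a minimal illustration only: `hasParquetPrefix` and `parquetMagic` are hypothetical names, and the sketch does not reproduce the library's internal `prefix` helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// parquetMagic is the 4-byte marker "PAR1" (0x50 0x41 0x52 0x31).
var parquetMagic = []byte{0x50, 0x41, 0x52, 0x31}

// hasParquetPrefix is a hypothetical stand-in for the matcher in the diff:
// it reports whether raw starts with the Parquet magic bytes.
func hasParquetPrefix(raw []byte) bool {
	return bytes.HasPrefix(raw, parquetMagic)
}

func main() {
	fmt.Println(hasParquetPrefix([]byte("PAR1\x15\x04"))) // true
	fmt.Println(hasParquetPrefix([]byte("d8:announce")))  // false
	fmt.Println(hasParquetPrefix([]byte("PAR")))          // false: input too short
}
```

A prefix-only check is cheap but permissive: any input that happens to start with these four bytes will match.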
3 changes: 2 additions & 1 deletion supported_mimes.md
@@ -1,4 +1,4 @@
-## 175 Supported MIME types
+## 176 Supported MIME types
This file is automatically generated when running tests. Do not edit manually.

Extension | MIME type | Aliases
@@ -143,6 +143,7 @@ Extension | MIME type | Aliases
**.glb** | model/gltf-binary | -
**.cab** | application/x-installshield | -
**.jxr** | image/jxr | image/vnd.ms-photo
+**.parquet** | application/vnd.apache.parquet | application/x-parquet
**.txt** | text/plain | -
**.html** | text/html | -
**.svg** | image/svg+xml | -
Binary file added testdata/parquet.parquet
Binary file not shown.
4 changes: 3 additions & 1 deletion tree.go
@@ -23,7 +23,7 @@ var root = newMIME("application/octet-stream", "",
avi, flv, mkv, asf, aac, voc, m3u, rmvb, gzip, class, swf, crx, ttf, woff,
woff2, otf, ttc, eot, wasm, shx, dbf, dcm, rar, djvu, mobi, lit, bpg,
sqlite3, dwg, nes, lnk, macho, qcp, icns, hdr, mrc, mdb, accdb, zstd, cab,
-rpm, xz, lzip, torrent, cpio, tzif, xcf, pat, gbr, glb, cabIS, jxr,
+rpm, xz, lzip, torrent, cpio, tzif, xcf, pat, gbr, glb, cabIS, jxr, parquet,
// Keep text last because it is the slowest check.
text,
)
@@ -258,4 +258,6 @@ var (
xfdf = newMIME("application/vnd.adobe.xfdf", ".xfdf", magic.Xfdf)
glb = newMIME("model/gltf-binary", ".glb", magic.Glb)
jxr = newMIME("image/jxr", ".jxr", magic.Jxr).alias("image/vnd.ms-photo")
+parquet = newMIME("application/vnd.apache.parquet", ".parquet", magic.Par1).
+alias("application/x-parquet")
)
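
To show how the new entry might be consulted, here is a simplified sketch of an ordered candidate list with a fallback and an alias lookup. The `node` type and the `detect` and `is` functions are hypothetical and written for illustration; they are assumptions about the general shape of such a tree, not the library's actual types or API.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical, simplified stand-in for a MIME tree entry.
type node struct {
	mime    string
	ext     string
	aliases []string
	match   func(raw []byte) bool
}

// is reports whether name equals the node's MIME type or one of its aliases.
func (n node) is(name string) bool {
	if n.mime == name {
		return true
	}
	for _, a := range n.aliases {
		if a == name {
			return true
		}
	}
	return false
}

// detect returns the first candidate whose matcher accepts raw, or fallback.
func detect(raw []byte, candidates []node, fallback node) node {
	for _, c := range candidates {
		if c.match(raw) {
			return c
		}
	}
	return fallback
}

var (
	octetStream = node{mime: "application/octet-stream"}
	torrent     = node{
		mime:  "application/x-bittorrent",
		ext:   ".torrent",
		match: func(raw []byte) bool { return bytes.HasPrefix(raw, []byte("d8:announce")) },
	}
	parquet = node{
		mime:    "application/vnd.apache.parquet",
		ext:     ".parquet",
		aliases: []string{"application/x-parquet"},
		match:   func(raw []byte) bool { return bytes.HasPrefix(raw, []byte("PAR1")) },
	}
)

func main() {
	got := detect([]byte("PAR1\x15\x04"), []node{torrent, parquet}, octetStream)
	fmt.Println(got.mime, got.ext, got.is("application/x-parquet"))
	// prints: application/vnd.apache.parquet .parquet true
}
```

Candidates are tried in list order, so an input that no matcher accepts falls through to `application/octet-stream` in this sketch.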
