Edited by dargueta, 04 September 2009 - 03:26 PM.
Detect Character Set
Started by dargueta, Sep 03 2009 07:41 PM
14 replies to this topic
#1
Posted 03 September 2009 - 07:41 PM
How would one go about detecting the character set / encoding of an arbitrary text file in a portable manner? And how would one tell the difference between, say, UTF-8 and a binary file?
sudo rm -rf /
#2
Guest_h4x_*
Posted 04 September 2009 - 06:00 AM
Isn't UTF-8 a binary file?
The only way is to create character maps, assign the byte(s) to their graphical representation, and let the user choose. It's not the computer's job.
#3
Posted 04 September 2009 - 09:13 AM
H4x, you clearly don't know what you're talking about. The Gnome GEdit text editor does it on its own just fine. I was wondering how they do it.
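For what it's worth, here is a minimal sketch of one common heuristic for telling text from binary. I am not claiming this is what gedit actually does. It checks for a byte-order mark, treats NUL bytes as a sign of binary data, and then tries a strict UTF-8 decode. The function name `guess_encoding` and the labels it returns are made up for illustration, and it assumes `data` holds the whole file rather than a chunk cut in the middle of a multi-byte character.

```python
import codecs

# Byte-order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: return a best-guess label for a byte string."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Plain text almost never contains NUL bytes; most binary formats do.
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: could be a legacy 8-bit charset or binary data.
        return "unknown-8bit"
    # Pure ASCII is also valid UTF-8, so report it separately.
    return "ascii" if all(b < 0x80 for b in data) else "utf-8"
```

The "unknown-8bit" case is where real detectors have to fall back on statistics or ask the user, because legacy single-byte charsets cannot be told apart by validity alone.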
#4
Posted 04 September 2009 - 12:24 PM
That's a really good question. Linux systems seem to know the file type and character set automatically.
I certainly don't know how, but I found this link very interesting:
Tux Love: Hidden Linux : File mysteries :rolleyes:
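As a rough illustration of letting the system do the guessing: where the file(1) utility is installed, it can be asked for its charset guess directly. The wrapper name `file_charset` is hypothetical, and the sketch assumes a version of file that supports the `--mime-encoding` option.

```python
import os
import shutil
import subprocess
from typing import Optional


def file_charset(path: str) -> Optional[str]:
    """Hypothetical wrapper: ask file(1) for its charset guess, or None."""
    exe = shutil.which("file")
    if exe is None or not os.path.isfile(path):
        return None  # file(1) is not installed, or there is nothing to inspect
    try:
        result = subprocess.run(
            [exe, "--brief", "--mime-encoding", path],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    # --brief suppresses the "filename:" prefix, leaving only the guess.
    return result.stdout.strip() or None
```

The result is only a guess made by file(1), so a caller should be prepared for None or for a label it does not recognise.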
#5
Posted 04 September 2009 - 03:26 PM
#6
Posted 04 September 2009 - 04:09 PM
Interesting read, thanks.
proudly presenting my personal website and game website: F1Simulation. a thrilling Managed DirectX racing game... also my Ask Me
look at my tutorials about cropping images and Mono: bundling Mono with programs and lambda expressions
#7
Guest_h4x_*
Posted 04 September 2009 - 08:24 PM
No. The computer can't decide what type a file is. You must assign a program to open the file; this is what the extension is for. GNOME has an option to do that, with no sucky checks.
Doing otherwise is asking for problems (bugs). Let's keep things simple; don't mix AI with the computer.
#8
Posted 04 September 2009 - 09:06 PM
H4x...really. Did you even read the link debtboy posted?
If you're going to post something as fact, make sure it's not your personal opinion first. I have yet to see you post a single link or make a single quotation to back up your snide, highly biased, and blind opinions. I have no reason to listen to you--and neither does anyone else--unless you back up your own statements with evidence or concrete examples. I'd suggest you calm down, get off your high horse, and condescend to talk to the rest of us inferior programmers with a civil tone and back up what you say with documents or code examples that we can test and verify.
#9
Posted 05 September 2009 - 01:18 AM
Quote
You'll find a list of magic numbers in /usr/share/file/magic. You can add your own file types in /etc/magic (to make them system-wide) or $HOME/.magic locally. The format is described -- with no offence to feminists intended -- in man magic.
In debtboy's article I found a reference to magic numbers, which are used to determine a file's format. It would be nice if someone could post their /usr/share/file/magic file so we can see what is in there. I am not sure whether this is related to checking the charset, but at least it's a clue to investigate.
Man page for magic: MAGIC
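To make the idea of magic numbers concrete, here is a small sketch of how such rules can be matched. It is not how file(1) is implemented; the `RULES` table and the `identify` function are invented for illustration. The three rules are copied from top-level entries in the magic excerpt quoted later in this thread (`0 string Draw`, `0 belong 0x000003f3`, `4 string moov`).

```python
import struct
from typing import Optional

# (offset, expected bytes, description). "belong" means a big-endian
# 32-bit integer, so it is packed with ">I" to get the on-disk bytes.
RULES = [
    (0, b"Draw", "RISC OS Draw file data"),
    (0, struct.pack(">I", 0x000003F3), "AmigaOS loadseg()ble executable/binary"),
    (4, b"moov", "Apple QuickTime"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the description of the first matching rule, or None."""
    for offset, magic, description in RULES:
        # Slicing past the end of data just yields a shorter (non-matching) value.
        if data[offset:offset + len(magic)] == magic:
            return description
    return None
```

A real magic file goes further than this, with continuation lines (the `>` prefix), bit masks (`&`), and numeric comparisons. Note that none of these rules says anything about the charset of a plain text file, which has no magic number to look for.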
#10
Posted 05 September 2009 - 05:04 AM
I have a few magic files, in different places than mentioned of course;
each distro seems to relocate things, and my distro is Gentoo.
/usr/share/mime/magic (binary file)
/usr/share/misc/file/magic.mgc (binary file)
/usr/share/misc/file/magic.mime.mgc (binary file)
Now for the text files: I posted only a portion of each below,
because both files are rather large (the magic file is nearly 500K).
/usr/share/misc/file/magic
/usr/share/misc/file/magic.mime
/usr/share/misc/file/magic
# Magic
# Magic data for file(1) command.
# Machine-generated from src/cmd/file/magdir/*; edit there only!
# Format is described in magic(files), where:
# files is 5 on V7 and BSD, 4 on SV, and ?? in the SVID.
#------------------------------------------------------------------------------
# Localstuff: file(1) magic for locally observed files
#
# $File: Localstuff,v 1.4 2003/03/23 04:17:27 christos Exp $
# Add any locally observed files here. Remember:
# text if readable, executable if runnable binary, data if unreadable.
#------------------------------------------------------------------------------
# acorn: file(1) magic for files found on Acorn systems
#
# RISC OS Chunk File Format
# From RISC OS Programmer's Reference Manual, Appendix D
# We guess the file type from the type of the first chunk.
0 lelong 0xc3cbc6c5 RISC OS Chunk data
>12 string OBJ_ \b, AOF object
>12 string LIB_ \b, ALF library
# RISC OS AIF, contains "SWI OS_Exit" at offset 16.
16 lelong 0xef000011 RISC OS AIF executable
# RISC OS Draw files
# From RISC OS Programmer's Reference Manual, Appendix E
0 string Draw RISC OS Draw file data
# RISC OS new format font files
# From RISC OS Programmer's Reference Manual, Appendix E
0 string FONT\0 RISC OS outline font data,
>5 byte x version %d
0 string FONT\1 RISC OS 1bpp font data,
>5 byte x version %d
0 string FONT\4 RISC OS 4bpp font data
>5 byte x version %d
# RISC OS Music files
# From RISC OS Programmer's Reference Manual, Appendix E
0 string Maestro\r RISC OS music file
>8 byte x version %d
#------------------------------------------------------------------------------
# adi: file(1) magic for ADi's objects
# From Gregory McGarry <g.mcgarry@ieee.org>
#
0 leshort 0x521c COFF DSP21k
>18 lelong &02 executable,
>18 lelong ^02
>>18 lelong &01 static object,
>>18 lelong ^01 relocatable object,
>18 lelong &010 stripped
>18 lelong ^010 not stripped
#------------------------------------------------------------------------------
# adventure: file(1) magic for Adventure game files
#
# from Allen Garvin <earendil@faeryland.tamu-commerce.edu>
# Edited by Dave Chapeskie <dchapes@ddm.on.ca> Jun 28, 1998
# Edited by Chris Chittleborough <cchittleborough@yahoo.com.au>, March 2002
#
# ALAN
# I assume there are other, lower versions, but these are the only ones I
# saw in the archive.
0 beshort 0x0206 ALAN game data
>2 byte <10 version 2.6%d
# Infocom (see z-machine)
#------------------------------------------------------------------------------
# Z-machine: file(1) magic for Z-machine binaries.
#
# This will match ${TEX_BASE}/texmf/omega/ocp/char2uni/inbig5.ocp which
# appears to be a version-0 Z-machine binary.
#
# The (false match) message is to correct that behavior. Perhaps it is
# not needed.
#
16 belong&0xfe00f0f0 0x3030 Infocom game data
>0 ubyte 0 (false match)
>0 ubyte >0 (Z-machine %d,
>>2 ubeshort x Release %d /
>>18 string >\0 Serial %.6s)
#------------------------------------------------------------------------------
# Glulx: file(1) magic for Glulx binaries.
#
# I haven't checked for false matches yet.
#
0 string Glul Glulx game data
>4 beshort x (Version %d
>>6 byte x \b.%d
>>8 byte x \b.%d)
>36 string Info Compiled by Inform
# For Quetzal and blorb magic see iff
# TADS (Text Adventure Development System)
# All files are machine-independent (games compile to byte-code) and are tagged
# with a version string of the form "V2.<digit>.<digit>\0" (but TADS 3 is
# on the way).
# Game files start with "TADS2 bin\n\r\032\0" then the compiler version.
0 string TADS2\ bin TADS
>9 belong !0x0A0D1A00 game data, CORRUPTED
>9 belong 0x0A0D1A00
>>13 string >\0 %s game data
# Resource files start with "TADS2 rsc\n\r\032\0" then the compiler version.
0 string TADS2\ rsc TADS
>9 belong !0x0A0D1A00 resource data, CORRUPTED
>9 belong 0x0A0D1A00
>>13 string >\0 %s resource data
# Some saved game files start with "TADS2 save/g\n\r\032\0", a little-endian
# 2-byte length N, the N-char name of the game file *without* a NUL (darn!),
# "TADS2 save\n\r\032\0" and the interpreter version.
0 string TADS2\ save/g TADS
>12 belong !0x0A0D1A00 saved game data, CORRUPTED
>12 belong 0x0A0D1A00
>>(16.s+32) string >\0 %s saved game data
# Other saved game files start with "TADS2 save\n\r\032\0" and the interpreter
# version.
0 string TADS2\ save TADS
>10 belong !0x0A0D1A00 saved game data, CORRUPTED
>10 belong 0x0A0D1A00
>>14 string >\0 %s saved game data
#------------------------------------------------------------------------------
# allegro: file(1) magic for Allegro datafiles
# Toby Deshane <hac@shoelace.digivill.net>
#
0 belong 0x736C6821 Allegro datafile (packed)
0 belong 0x736C682E Allegro datafile (not packed/autodetect)
0 belong 0x736C682B Allegro datafile (appended exe data)
#------------------------------------------------------------------------------
# alliant: file(1) magic for Alliant FX series a.out files
#
# If the FX series is the one that had a processor with a 68K-derived
# instruction set, the "short" should probably become "beshort" and the
# "long" should probably become "belong".
# If it's the i860-based one, they should probably become either the
# big-endian or little-endian versions, depending on the mode they ran
# the 860 in....
#
0 short 0420 0420 Alliant virtual executable
>2 short &0x0020 common library
>16 long >0 not stripped
0 short 0421 0421 Alliant compact executable
>2 short &0x0020 common library
>16 long >0 not stripped
#------------------------------------------------------------------------------
# alpha architecture description
#
0 leshort 0603 COFF format alpha
>22 leshort&030000 !020000 executable
>24 leshort 0410 pure
>24 leshort 0413 paged
>22 leshort&020000 !0 dynamically linked
>16 lelong !0 not stripped
>16 lelong 0 stripped
>22 leshort&030000 020000 shared library
>24 leshort 0407 object
>27 byte x - version %d
>26 byte x .%d
>28 byte x -%d
# Basic recognition of Digital UNIX core dumps - Mike Bremford <mike@opac.bl.uk>
#
# The actual magic number is just "Core", followed by a 2-byte version
# number; however, treating any file that begins with "Core" as a Digital
# UNIX core dump file may produce too many false hits, so we include one
# byte of the version number as well; DU 5.0 appears only to be up to
# version 2.
#
0 string Core\001 Alpha COFF format core dump (Digital UNIX)
>24 string >\0 \b, from '%s'
0 string Core\002 Alpha COFF format core dump (Digital UNIX)
>24 string >\0 \b, from '%s'
#------------------------------------------------------------------------------
# amanda: file(1) magic for amanda file format
#
0 string AMANDA:\ AMANDA
>8 string TAPESTART\ DATE tape header file,
>>23 string X
>>>25 string >\ Unused %s
>>23 string >\ DATE %s
>8 string FILE\ dump file,
>>13 string >\ DATE %s
#------------------------------------------------------------------------------
# amigaos: file(1) magic for AmigaOS binary formats:
#
# From ignatios@cs.uni-bonn.de (Ignatios Souvatzis)
#
0 belong 0x000003fa AmigaOS shared library
0 belong 0x000003f3 AmigaOS loadseg()ble executable/binary
0 belong 0x000003e7 AmigaOS object/library data
#
0 beshort 0xe310 Amiga Workbench
>2 beshort 1
>>48 byte 1 disk icon
>>48 byte 2 drawer icon
>>48 byte 3 tool icon
>>48 byte 4 project icon
>>48 byte 5 garbage icon
>>48 byte 6 device icon
>>48 byte 7 kickstart icon
>>48 byte 8 workbench application icon
>2 beshort >1 icon, vers. %d
#
# various sound formats from the Amiga
# G=F6tz Waschk <waschk@informatik.uni-rostock.de>
#
0 string FC14 Future Composer 1.4 Module sound file
0 string SMOD Future Composer 1.3 Module sound file
0 string AON4artofnoise Art Of Noise Module sound file
1 string MUGICIAN/SOFTEYES Mugician Module sound file
58 string SIDMON\ II\ -\ THE Sidmon 2.0 Module sound file
0 string Synth4.0 Synthesis Module sound file
0 string ARP. The Holy Noise Module sound file
0 string BeEp\0 JamCracker Module sound file
0 string COSO\0 Hippel-COSO Module sound file
# Too simple (short, pure ASCII, deep), MPi
#26 string V.3 Brian Postma's Soundmon Module sound file v3
#26 string BPSM Brian Postma's Soundmon Module sound file v3
#26 string V.2 Brian Postma's Soundmon Module sound file v2
# The following are from: "Stefan A. Haubenthal" <polluks@web.de>
0 beshort 0x0f00 AmigaOS bitmap font
0 beshort 0x0f03 AmigaOS outline font
0 belong 0x80001001 AmigaOS outline tag
0 string ##\ version catalog translation
0 string EMOD\0 Amiga E module
8 string ECXM\0 ECX module
0 string/c @database AmigaGuide file
# Amiga disk types
#
0 string RDSK Rigid Disk Block
>160 string x on %.24s
0 string DOS\0 Amiga DOS disk
0 string DOS\1 Amiga FFS disk
0 string DOS\2 Amiga Inter DOS disk
0 string DOS\3 Amiga Inter FFS disk
0 string DOS\4 Amiga Fastdir DOS disk
0 string DOS\5 Amiga Fastdir FFS disk
0 string KICK Kickstart disk
# From: Alex Beregszaszi <alex@fsn.hu>
0 string LZX LZX compressed archive (Amiga)
#------------------------------------------------------------------------------
# animation: file(1) magic for animation/movie formats
#
# animation formats
# MPEG, FLI, DL originally from vax@ccwf.cc.utexas.edu (VaX#n8)
# FLC, SGI, Apple originally from Daniel Quinlan (quinlan@yggdrasil.com)
# SGI and Apple formats
0 string MOVI Silicon Graphics movie file
4 string moov Apple QuickTime
>12 string mvhd \b movie (fast start)
>12 string mdra \b URL
>12 string cmov \b movie (fast start, compressed header)
>12 string rmra \b multiple URLs
4 string mdat Apple QuickTime movie (unoptimized)
4 string wide Apple QuickTime movie (unoptimized)
4 string skip Apple QuickTime movie (modified)
4 string free Apple QuickTime movie (modified)
4 string idsc Apple QuickTime image (fast start)
4 string idat Apple QuickTime image (unoptimized)
4 string pckg Apple QuickTime compressed archive
4 string/B jP JPEG 2000 image
4 string ftyp ISO Media
>8 string isom \b, MPEG v4 system, version 1
>8 string iso2 \b, MPEG v4 system, part 12 revision
>8 string mp41 \b, MPEG v4 system, version 1
>8 string mp42 \b, MPEG v4 system, version 2
>8 string mp7t \b, MPEG v4 system, MPEG v7 XML
>8 string mp7b \b, MPEG v4 system, MPEG v7 binary XML
>8 string/B jp2 \b, JPEG 2000
>8 string 3gp \b, MPEG v4 system, 3GPP
>>11 byte 4 \b v4 (H.263/AMR GSM 6.10)
>>11 byte 5 \b v5 (H.263/AMR GSM 6.10)
>>11 byte 6 \b v6 (ITU H.264/AMR GSM 6.10)
>8 string mmp4 \b, MPEG v4 system, 3GPP Mobile
>8 string avc1 \b, MPEG v4 system, 3GPP JVT AVC
>8 string/B M4A \b, MPEG v4 system, iTunes AAC-LC
>8 string/B M4P \b, MPEG v4 system, iTunes AES encrypted
>8 string/B M4B \b, MPEG v4 system, iTunes bookmarked
>8 string/B qt \b, Apple QuickTime movie
# MPEG sequences
# Scans for all common MPEG header start codes
0 belong 0x00000001 JVT NAL sequence
>4 byte&0x1F 0x07 \b, H.264 video
>>5 byte 66 \b, baseline
>>5 byte 77 \b, main
>>5 byte 88 \b, extended
>>7 byte x \b @ L %u
0 belong&0xFFFFFF00 0x00000100 MPEG sequence
>3 byte 0xBA
>>4 byte &0x40 \b, v2, program multiplex
>>4 byte ^0x40 \b, v1, system multiplex
>3 byte 0xBB \b, v1/2, multiplex (missing pack header)
>3 byte&0x1F 0x07 \b, H.264 video
>>4 byte 66 \b, baseline
>>4 byte 77 \b, main
>>4 byte 88 \b, extended
>>6 byte x \b @ L %u
>3 byte 0xB0 \b, v4
>>5 belong 0x000001B5
>>>9 byte &0x80
>>>>10 byte&0xF0 16 \b, video
>>>>10 byte&0xF0 32 \b, still texture
>>>>10 byte&0xF0 48 \b, mesh
>>>>10 byte&0xF0 64 \b, face
>>>9 byte&0xF8 8 \b, video
>>>9 byte&0xF8 16 \b, still texture
>>>9 byte&0xF8 24 \b, mesh
>>>9 byte&0xF8 32 \b, face
>>4 byte 1 \b, simple @ L1
>>4 byte 2 \b, simple @ L2
>>4 byte 3 \b, simple @ L3
>>4 byte 4 \b, simple @ L0
>>4 byte 17 \b, simple scalable @ L1
>>4 byte 18 \b, simple scalable @ L2
>>4 byte 33 \b, core @ L1
>>4 byte 34 \b, core @ L2
>>4 byte 50 \b, main @ L2
>>4 byte 51 \b, main @ L3
>>4 byte 53 \b, main @ L4
>>4 byte 66 \b, n-bit @ L2
>>4 byte 81 \b, scalable texture @ L1
>>4 byte 97 \b, simple face animation @ L1
>>4 byte 98 \b, simple face animation @ L2
>>4 byte 99 \b, simple face basic animation @ L1
>>4 byte 100 \b, simple face basic animation @ L2
>>4 byte 113 \b, basic animation text @ L1
>>4 byte 114 \b, basic animation text @ L2
>>4 byte 129 \b, hybrid @ L1
>>4 byte 130 \b, hybrid @ L2
>>4 byte 145 \b, advanced RT simple @ L1
>>4 byte 146 \b, advanced RT simple @ L2
>>4 byte 147 \b, advanced RT simple @ L3
>>4 byte 148 \b, advanced RT simple @ L4
>>4 byte 161 \b, core scalable @ L1
>>4 byte 162 \b, core scalable @ L2
>>4 byte 163 \b, core scalable @ L3
>>4 byte 177 \b, advanced coding efficiency @ L1
>>4 byte 178 \b, advanced coding efficiency @ L2
>>4 byte 179 \b, advanced coding efficiency @ L3
>>4 byte 180 \b, advanced coding efficiency @ L4
>>4 byte 193 \b, advanced core @ L1
>>4 byte 194 \b, advanced core @ L2
>>4 byte 209 \b, advanced scalable texture @ L1
>>4 byte 210 \b, advanced scalable texture @ L2
>>4 byte 211 \b, advanced scalable texture @ L3
>>4 byte 225 \b, simple studio @ L1
>>4 byte 226 \b, simple studio @ L2
>>4 byte 227 \b, simple studio @ L3
>>4 byte 228 \b, simple studio @ L4
>>4 byte 229 \b, core studio @ L1
>>4 byte 230 \b, core studio @ L2
>>4 byte 231 \b, core studio @ L3
>>4 byte 232 \b, core studio @ L4
>>4 byte 240 \b, advanced simple @ L0
>>4 byte 241 \b, advanced simple @ L1
>>4 byte 242 \b, advanced simple @ L2
>>4 byte 243 \b, advanced simple @ L3
>>4 byte 244 \b, advanced simple @ L4
>>4 byte 245 \b, advanced simple @ L5
>>4 byte 247 \b, advanced simple @ L3b
>>4 byte 248 \b, FGS @ L0
>>4 byte 249 \b, FGS @ L1
>>4 byte 250 \b, FGS @ L2
>>4 byte 251 \b, FGS @ L3
>>4 byte 252 \b, FGS @ L4
>>4 byte 253 \b, FGS @ L5
>3 byte 0xB5 \b, v4
>>4 byte &0x80
>>>5 byte&0xF0 16 \b, video (missing profile header)
>>>5 byte&0xF0 32 \b, still texture (missing profile header)
>>>5 byte&0xF0 48 \b, mesh (missing profile header)
>>>5 byte&0xF0 64 \b, face (missing profile header)
>>4 byte&0xF8 8 \b, video (missing profile header)
>>4 byte&0xF8 16 \b, still texture (missing profile header)
>>4 byte&0xF8 24 \b, mesh (missing profile header)
>>4 byte&0xF8 32 \b, face (missing profile header)
>3 byte 0xB3
>>12 belong 0x000001B8 \b, v1, progressive Y'CbCr 4:2:0 video
>>12 belong 0x000001B2 \b, v1, progressive Y'CbCr 4:2:0 video
>>12 belong 0x000001B5 \b, v2,
>>>16 byte&0x0F 1 \b HP
>>>16 byte&0x0F 2 \b Spt
>>>16 byte&0x0F 3 \b SNR
>>>16 byte&0x0F 4 \b MP
>>>16 byte&0x0F 5 \b SP
>>>17 byte&0xF0 64 \b@HL
>>>17 byte&0xF0 96 \b@H-14
>>>17 byte&0xF0 128 \b@ML
>>>17 byte&0xF0 160 \b@LL
>>>17 byte &0x08 \b progressive
>>>17 byte ^0x08 \b interlaced
>>>17 byte&0x06 2 \b Y'CbCr 4:2:0 video
>>>17 byte&0x06 4 \b Y'CbCr 4:2:2 video
>>>17 byte&0x06 6 \b Y'CbCr 4:4:4 video
>>11 byte &0x02
>>>75 byte &0x01
>>>>140 belong 0x000001B8 \b, v1, progressive Y'CbCr 4:2:0 video
>>>>140 belong 0x000001B2 \b, v1, progressive Y'CbCr 4:2:0 video
>>>>140 belong 0x000001B5 \b, v2,
>>>>>144 byte&0x0F 1 \b HP
>>>>>144 byte&0x0F 2 \b Spt
>>>>>144 byte&0x0F 3 \b SNR
>>>>>144 byte&0x0F 4 \b MP
>>>>>144 byte&0x0F 5 \b SP
>>>>>145 byte&0xF0 64 \b@HL
>>>>>145 byte&0xF0 96 \b@H-14
>>>>>145 byte&0xF0 128 \b@ML
>>>>>145 byte&0xF0 160 \b@LL
>>>>>145 byte &0x08 \b progressive
>>>>>145 byte ^0x08 \b interlaced
>>>>>145 byte&0x06 2 \b Y'CbCr 4:2:0 video
>>>>>145 byte&0x06 4 \b Y'CbCr 4:2:2 video
>>>>>145 byte&0x06 6 \b Y'CbCr 4:4:4 video
>>76 belong 0x000001B8 \b, v1, progressive Y'CbCr 4:2:0 video
>>76 belong 0x000001B2 \b, v1, progressive Y'CbCr 4:2:0 video
>>76 belong 0x000001B5 \b, v2,
>>>80 byte&0x0F 1 \b HP
>>>80 byte&0x0F 2 \b Spt
>>>80 byte&0x0F 3 \b SNR
>>>80 byte&0x0F 4 \b MP
>>>80 byte&0x0F 5 \b SP
>>>81 byte&0xF0 64 \b@HL
>>>81 byte&0xF0 96 \b@H-14
>>>81 byte&0xF0 128 \b@ML
>>>81 byte&0xF0 160 \b@LL
>>>81 byte &0x08 \b progressive
>>>81 byte ^0x08 \b interlaced
>>>81 byte&0x06 2 \b Y'CbCr 4:2:0 video
>>>81 byte&0x06 4 \b Y'CbCr 4:2:2 video
>>>81 byte&0x06 6 \b Y'CbCr 4:4:4 video
>>4 belong&0xFFFFFF00 0x78043800 \b, HD-TV 1920P
>>>7 byte&0xF0 0x10 \b, 16:9
>>4 belong&0xFFFFFF00 0x50002D00 \b, SD-TV 1280I
>>>7 byte&0xF0 0x10 \b, 16:9
>>4 belong&0xFFFFFF00 0x30024000 \b, PAL Capture
>>>7 byte&0xF0 0x10 \b, 4:3
>>4 beshort&0xFFF0 0x2C00 \b, 4CIF
>>>5 beshort&0x0FFF 0x01E0 \b NTSC
>>>5 beshort&0x0FFF 0x0240 \b PAL
>>>7 byte&0xF0 0x20 \b, 4:3
>>>7 byte&0xF0 0x30 \b, 16:9
>>>7 byte&0xF0 0x40 \b, 11:5
>>>7 byte&0xF0 0x80 \b, PAL 4:3
>>>7 byte&0xF0 0xC0 \b, NTSC 4:3
>>4 belong&0xFFFFFF00 0x2801E000 \b, LD-TV 640P
>>>7 byte&0xF0 0x10 \b, 4:3
>>4 belong&0xFFFFFF00 0x1400F000 \b, 320x240
>>>7 byte&0xF0 0x10 \b, 4:3
>>4 belong&0xFFFFFF00 0x0F00A000 \b, 240x160
>>>7 byte&0xF0 0x10 \b, 4:3
>>4 belong&0xFFFFFF00 0x0A007800 \b, 160x120
>>>7 byte&0xF0 0x10 \b, 4:3
>>4 beshort&0xFFF0 0x1600 \b, CIF
>>>5 beshort&0x0FFF 0x00F0 \b NTSC
>>>5 beshort&0x0FFF 0x0120 \b PAL
>>>7 byte&0xF0 0x20 \b, 4:3
>>>7 byte&0xF0 0x30 \b, 16:9
>>>7 byte&0xF0 0x40 \b, 11:5
>>>7 byte&0xF0 0x80 \b, PAL 4:3
>>>7 byte&0xF0 0xC0 \b, NTSC 4:3
>>>5 beshort&0x0FFF 0x0240 \b PAL 625
>>>>7 byte&0xF0 0x20 \b, 4:3
>>>>7 byte&0xF0 0x30 \b, 16:9
>>>>7 byte&0xF0 0x40 \b, 11:5
>>4 beshort&0xFFF0 0x2D00 \b, CCIR/ITU
>>>5 beshort&0x0FFF 0x01E0 \b NTSC 525
>>>5 beshort&0x0FFF 0x0240 \b PAL 625
>>>7 byte&0xF0 0x20 \b, 4:3
>>>7 byte&0xF0 0x30 \b, 16:9
>>>7 byte&0xF0 0x40 \b, 11:5
>>4 beshort&0xFFF0 0x1E00 \b, SVCD
>>>5 beshort&0x0FFF 0x01E0 \b NTSC 525
>>>5 beshort&0x0FFF 0x0240 \b PAL 625
>>>7 byte&0xF0 0x20 \b, 4:3
>>>7 byte&0xF0 0x30 \b, 16:9
>>>7 byte&0xF0 0x40 \b, 11:5
>>7 byte&0x0F 1 \b, 23.976 fps
>>7 byte&0x0F 2 \b, 24 fps
>>7 byte&0x0F 3 \b, 25 fps
>>7 byte&0x0F 4 \b, 29.97 fps
>>7 byte&0x0F 5 \b, 30 fps
>>7 byte&0x0F 6 \b, 50 fps
>>7 byte&0x0F 7 \b, 59.94 fps
>>7 byte&0x0F 8 \b, 60 fps
>>11 byte &0x04 \b, Constrained
# MPEG ADTS Audio (*.mpx/mxa/aac)
# from dreesen@math.fu-berlin.de
# modified to fully support MPEG ADTS
# MP3, M1A
0 beshort&0xFFFE 0xFFFA MPEG ADTS, layer III, v1
# rates
>2 byte&0xF0 0x10 \b, 32 kBits
>2 byte&0xF0 0x20 \b, 40 kBits
>2 byte&0xF0 0x30 \b, 48 kBits
>2 byte&0xF0 0x40 \b, 56 kBits
>2 byte&0xF0 0x50 \b, 64 kBits
>2 byte&0xF0 0x60 \b, 80 kBits
>2 byte&0xF0 0x70 \b, 96 kBits
>2 byte&0xF0 0x80 \b, 112 kBits
>2 byte&0xF0 0x90 \b, 128 kBits
>2 byte&0xF0 0xA0 \b, 160 kBits
>2 byte&0xF0 0xB0 \b, 192 kBits
>2 byte&0xF0 0xC0 \b, 224 kBits
>2 byte&0xF0 0xD0 \b, 256 kBits
>2 byte&0xF0 0xE0 \b, 320 kBits
# timing
>2 byte&0x0C 0x00 \b, 44.1 kHz
>2 byte&0x0C 0x04 \b, 48 kHz
>2 byte&0x0C 0x08 \b, 32 kHz
# channels/options
>3 byte&0xC0 0x00 \b, Stereo
>3 byte&0xC0 0x40 \b, JntStereo
>3 byte&0xC0 0x80 \b, 2x Monaural
>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# MP2, M1A
0 beshort&0xFFFE 0xFFFC MPEG ADTS, layer II, v1
# rates
>2 byte&0xF0 0x10 \b, 32 kBits
>2 byte&0xF0 0x20 \b, 48 kBits
>2 byte&0xF0 0x30 \b, 56 kBits
>2 byte&0xF0 0x40 \b, 64 kBits
>2 byte&0xF0 0x50 \b, 80 kBits
>2 byte&0xF0 0x60 \b, 96 kBits
>2 byte&0xF0 0x70 \b, 112 kBits
>2 byte&0xF0 0x80 \b, 128 kBits
>2 byte&0xF0 0x90 \b, 160 kBits
>2 byte&0xF0 0xA0 \b, 192 kBits
>2 byte&0xF0 0xB0 \b, 224 kBits
>2 byte&0xF0 0xC0 \b, 256 kBits
>2 byte&0xF0 0xD0 \b, 320 kBits
>2 byte&0xF0 0xE0 \b, 384 kBits
# timing
>2 byte&0x0C 0x00 \b, 44.1 kHz
>2 byte&0x0C 0x04 \b, 48 kHz
>2 byte&0x0C 0x08 \b, 32 kHz
# channels/options
>3 byte&0xC0 0x00 \b, Stereo
>3 byte&0xC0 0x40 \b, JntStereo
>3 byte&0xC0 0x80 \b, 2x Monaural
>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# MPA, M1A
# updated by Joerg Jenderek
# GRR the original test are too common for many DOS files, so test 32 <= kbits <= 448
0 beshort&0xFFFE 0xFFFE
>2 ubyte&0xF0 >0x0F
>>2 ubyte&0xF0 <0xE1 MPEG ADTS, layer I, v1
# rate
>>>2 byte&0xF0 0x10 \b, 32 kBits
>>>2 byte&0xF0 0x20 \b, 64 kBits
>>>2 byte&0xF0 0x30 \b, 96 kBits
>>>2 byte&0xF0 0x40 \b, 128 kBits
>>>2 byte&0xF0 0x50 \b, 160 kBits
>>>2 byte&0xF0 0x60 \b, 192 kBits
>>>2 byte&0xF0 0x70 \b, 224 kBits
>>>2 byte&0xF0 0x80 \b, 256 kBits
>>>2 byte&0xF0 0x90 \b, 288 kBits
>>>2 byte&0xF0 0xA0 \b, 320 kBits
>>>2 byte&0xF0 0xB0 \b, 352 kBits
>>>2 byte&0xF0 0xC0 \b, 384 kBits
>>>2 byte&0xF0 0xD0 \b, 416 kBits
>>>2 byte&0xF0 0xE0 \b, 448 kBits
# timing
>>>2 byte&0x0C 0x00 \b, 44.1 kHz
>>>2 byte&0x0C 0x04 \b, 48 kHz
>>>2 byte&0x0C 0x08 \b, 32 kHz
# channels/options
>>>3 byte&0xC0 0x00 \b, Stereo
>>>3 byte&0xC0 0x40 \b, JntStereo
>>>3 byte&0xC0 0x80 \b, 2x Monaural
>>>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# MP3, M2A
0 beshort&0xFFFE 0xFFF2 MPEG ADTS, layer III, v2
# rate
>2 byte&0xF0 0x10 \b, 8 kBits
>2 byte&0xF0 0x20 \b, 16 kBits
>2 byte&0xF0 0x30 \b, 24 kBits
>2 byte&0xF0 0x40 \b, 32 kBits
>2 byte&0xF0 0x50 \b, 40 kBits
>2 byte&0xF0 0x60 \b, 48 kBits
>2 byte&0xF0 0x70 \b, 56 kBits
>2 byte&0xF0 0x80 \b, 64 kBits
>2 byte&0xF0 0x90 \b, 80 kBits
>2 byte&0xF0 0xA0 \b, 96 kBits
>2 byte&0xF0 0xB0 \b, 112 kBits
>2 byte&0xF0 0xC0 \b, 128 kBits
>2 byte&0xF0 0xD0 \b, 144 kBits
>2 byte&0xF0 0xE0 \b, 160 kBits
# timing
>2 byte&0x0C 0x00 \b, 22.05 kHz
>2 byte&0x0C 0x04 \b, 24 kHz
>2 byte&0x0C 0x08 \b, 16 kHz
# channels/options
>3 byte&0xC0 0x00 \b, Stereo
>3 byte&0xC0 0x40 \b, JntStereo
>3 byte&0xC0 0x80 \b, 2x Monaural
>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# MP2, M2A
0 beshort&0xFFFE 0xFFF4 MPEG ADTS, layer II, v2
# rate
>2 byte&0xF0 0x10 \b, 8 kBits
>2 byte&0xF0 0x20 \b, 16 kBits
>2 byte&0xF0 0x30 \b, 24 kBits
>2 byte&0xF0 0x40 \b, 32 kBits
>2 byte&0xF0 0x50 \b, 40 kBits
>2 byte&0xF0 0x60 \b, 48 kBits
>2 byte&0xF0 0x70 \b, 56 kBits
>2 byte&0xF0 0x80 \b, 64 kBits
>2 byte&0xF0 0x90 \b, 80 kBits
>2 byte&0xF0 0xA0 \b, 96 kBits
>2 byte&0xF0 0xB0 \b, 112 kBits
>2 byte&0xF0 0xC0 \b, 128 kBits
>2 byte&0xF0 0xD0 \b, 144 kBits
>2 byte&0xF0 0xE0 \b, 160 kBits
# timing
>2 byte&0x0C 0x00 \b, 22.05 kHz
>2 byte&0x0C 0x04 \b, 24 kHz
>2 byte&0x0C 0x08 \b, 16 kHz
# channels/options
>3 byte&0xC0 0x00 \b, Stereo
>3 byte&0xC0 0x40 \b, JntStereo
>3 byte&0xC0 0x80 \b, 2x Monaural
>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# MPA, M2A
0 beshort&0xFFFE 0xFFF6 MPEG ADTS, layer I, v2
# rate
>2 byte&0xF0 0x10 \b, 32 kBits
>2 byte&0xF0 0x20 \b, 48 kBits
>2 byte&0xF0 0x30 \b, 56 kBits
>2 byte&0xF0 0x40 \b, 64 kBits
>2 byte&0xF0 0x50 \b, 80 kBits
>2 byte&0xF0 0x60 \b, 96 kBits
>2 byte&0xF0 0x70 \b, 112 kBits
>2 byte&0xF0 0x80 \b, 128 kBits
>2 byte&0xF0 0x90 \b, 144 kBits
>2 byte&0xF0 0xA0 \b, 160 kBits
>2 byte&0xF0 0xB0 \b, 176 kBits
>2 byte&0xF0 0xC0 \b, 192 kBits
>2 byte&0xF0 0xD0 \b, 224 kBits
>2 byte&0xF0 0xE0 \b, 256 kBits
# timing
>2 byte&0x0C 0x00 \b, 22.05 kHz
>2 byte&0x0C 0x04 \b, 24 kHz
>2 byte&0x0C 0x08 \b, 16 kHz
# channels/options
>3 byte&0xC0 0x00 \b, Stereo
>3 byte&0xC0 0x40 \b, JntStereo
>3 byte&0xC0 0x80 \b, 2x Monaural
>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# MP3, M25A
0 beshort&0xFFFE 0xFFE2 MPEG ADTS, layer III, v2.5
# rate
>2 byte&0xF0 0x10 \b, 8 kBits
>2 byte&0xF0 0x20 \b, 16 kBits
>2 byte&0xF0 0x30 \b, 24 kBits
>2 byte&0xF0 0x40 \b, 32 kBits
>2 byte&0xF0 0x50 \b, 40 kBits
>2 byte&0xF0 0x60 \b, 48 kBits
>2 byte&0xF0 0x70 \b, 56 kBits
>2 byte&0xF0 0x80 \b, 64 kBits
>2 byte&0xF0 0x90 \b, 80 kBits
>2 byte&0xF0 0xA0 \b, 96 kBits
>2 byte&0xF0 0xB0 \b, 112 kBits
>2 byte&0xF0 0xC0 \b, 128 kBits
>2 byte&0xF0 0xD0 \b, 144 kBits
>2 byte&0xF0 0xE0 \b, 160 kBits
# timing
>2 byte&0x0C 0x00 \b, 11.025 kHz
>2 byte&0x0C 0x04 \b, 12 kHz
>2 byte&0x0C 0x08 \b, 8 kHz
# channels/options
>3 byte&0xC0 0x00 \b, Stereo
>3 byte&0xC0 0x40 \b, JntStereo
>3 byte&0xC0 0x80 \b, 2x Monaural
>3 byte&0xC0 0xC0 \b, Monaural
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Packet Pad
#>2 byte &0x01 \b, Custom Flag
#>3 byte &0x08 \b, Copyrighted
#>3 byte &0x04 \b, Original Source
#>3 byte&0x03 1 \b, NR: 50/15 ms
#>3 byte&0x03 3 \b, NR: CCIT J.17
# AAC (aka MPEG-2 NBC audio) and MPEG-4 audio
# Stored AAC streams (instead of the MP4 format)
0 string ADIF MPEG ADIF, AAC
>4 byte &0x80
>>13 byte &0x10 \b, VBR
>>13 byte ^0x10 \b, CBR
>>16 byte&0x1E 0x02 \b, single stream
>>16 byte&0x1E 0x04 \b, 2 streams
>>16 byte&0x1E 0x06 \b, 3 streams
>>16 byte &0x08 \b, 4 or more streams
>>16 byte &0x10 \b, 8 or more streams
>>4 byte &0x80 \b, Copyrighted
>>13 byte &0x40 \b, Original Source
>>13 byte &0x20 \b, Home Flag
>4 byte ^0x80
>>4 byte &0x10 \b, VBR
>>4 byte ^0x10 \b, CBR
>>7 byte&0x1E 0x02 \b, single stream
>>7 byte&0x1E 0x04 \b, 2 streams
>>7 byte&0x1E 0x06 \b, 3 streams
>>7 byte &0x08 \b, 4 or more streams
>>7 byte &0x10 \b, 8 or more streams
>>4 byte &0x40 \b, Original Stream(s)
>>4 byte &0x20 \b, Home Source
# Live or stored single AAC stream (used with MPEG-2 systems)
0 beshort&0xFFF6 0xFFF0 MPEG ADTS, AAC
>1 byte &0x08 \b, v2
>1 byte ^0x08 \b, v4
# profile
>>2 byte &0xC0 \b LTP
>2 byte&0xc0 0x00 \b Main
>2 byte&0xc0 0x40 \b LC
>2 byte&0xc0 0x80 \b SSR
# timing
>2 byte&0x3c 0x00 \b, 96 kHz
>2 byte&0x3c 0x04 \b, 88.2 kHz
>2 byte&0x3c 0x08 \b, 64 kHz
>2 byte&0x3c 0x0c \b, 48 kHz
>2 byte&0x3c 0x10 \b, 44.1 kHz
>2 byte&0x3c 0x14 \b, 32 kHz
>2 byte&0x3c 0x18 \b, 24 kHz
>2 byte&0x3c 0x1c \b, 22.05 kHz
>2 byte&0x3c 0x20 \b, 16 kHz
>2 byte&0x3c 0x24 \b, 12 kHz
>2 byte&0x3c 0x28 \b, 11.025 kHz
>2 byte&0x3c 0x2c \b, 8 kHz
# channels
>2 beshort&0x01c0 0x0040 \b, monaural
>2 beshort&0x01c0 0x0080 \b, stereo
>2 beshort&0x01c0 0x00c0 \b, stereo + center
>2 beshort&0x01c0 0x0100 \b, stereo+center+LFE
>2 beshort&0x01c0 0x0140 \b, surround
>2 beshort&0x01c0 0x0180 \b, surround + LFE
>2 beshort &0x01C0 \b, surround + side
#>1 byte ^0x01 \b, Data Verify
#>2 byte &0x02 \b, Custom Flag
#>3 byte &0x20 \b, Original Stream
#>3 byte &0x10 \b, Home Source
#>3 byte &0x08 \b, Copyrighted
# Live MPEG-4 audio streams (instead of RTP FlexMux)
0 beshort&0xFFE0 0x56E0 MPEG-4 LOAS
#>1 beshort&0x1FFF x \b, %u byte packet
>3 byte&0xE0 0x40
>>4 byte&0x3C 0x04 \b, single stream
>>4 byte&0x3C 0x08 \b, 2 streams
>>4 byte&0x3C 0x0C \b, 3 streams
>>4 byte &0x08 \b, 4 or more streams
>>4 byte &0x20 \b, 8 or more streams
>3 byte&0xC0 0
>>4 byte&0x78 0x08 \b, single stream
>>4 byte&0x78 0x10 \b, 2 streams
>>4 byte&0x78 0x18 \b, 3 streams
>>4 byte &0x20 \b, 4 or more streams
>>4 byte &0x40 \b, 8 or more streams
0 beshort 0x4DE1 MPEG-4 LO-EP audio stream
# FLI animation format
4 leshort 0xAF11 FLI file
>6 leshort x - %d frames,
>8 leshort x width=%d pixels,
>10 leshort x height=%d pixels,
>12 leshort x depth=%d,
>16 leshort x ticks/frame=%d
# FLC animation format
4 leshort 0xAF12 FLC file
>6 leshort x - %d frames
>8 leshort x width=%d pixels,
>10 leshort x height=%d pixels,
>12 leshort x depth=%d,
>16 leshort x ticks/frame=%d
# DL animation format
# XXX - collision with most `mips' magic
#
# I couldn't find a real magic number for these, however, this
# -appears- to work. Note that it might catch other files, too, so be
# careful!
#
# Note that title and author appear in the two 20-byte chunks
# at decimal offsets 2 and 22, respectively, but they are XOR'ed with
# 255 (hex FF)! The DL format is really bad.
#
#0 byte 1 DL version 1, medium format (160x100, 4 images/screen)
#>42 byte x - %d screens,
#>43 byte x %d commands
#0 byte 2 DL version 2
#>1 byte 1 - large format (320x200,1 image/screen),
#>1 byte 2 - medium format (160x100,4 images/screen),
#>1 byte >2 - unknown format,
#>42 byte x %d screens,
#>43 byte x %d commands
# Based on empirical evidence, DL version 3 files have several nulls following the
# \003. Most of them start with non-null values at hex offset 0x34 or so.
#0 string \3\0\0\0\0\0\0\0\0\0\0\0 DL version 3
# iso 13818 transport stream
#
# from Oskar Schirmer <schirmer@scara.com> Feb 3, 2001 (ISO 13818.1)
# (the following is a little bit restrictive and works fine for a stream
# that starts with PAT properly. it won't work for stream data, that is
# cut from an input device data right in the middle, but this shouldn't
# disturb)
# syncbyte 8 bit 0x47
# error_ind 1 bit -
# payload_start 1 bit 1
# priority 1 bit -
# PID 13 bit 0x0000
# scrambling 2 bit -
# adaptfld_ctrl 2 bit 1 or 3
# conti_count 4 bit 0
0 belong&0xFF5FFF1F 0x47400010 MPEG transport stream data
>188 byte !0x47 CORRUPTED
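The transport-stream rule above is a masked comparison of the first four bytes, followed by a check that the next 188-byte packet also starts with the sync byte. Here is a minimal Python sketch of that reading. The function name `looks_like_mpeg_ts` is made up for illustration, and this is not how file(1) itself is implemented (it interprets the rule generically in C).

```python
import struct
from typing import Optional

TS_MASK = 0xFF5FFF1F    # ignores error_ind, priority, scrambling, high adaptfld bit
TS_VALUE = 0x47400010   # sync 0x47, payload_start=1, PID=0, adaptfld low bit=1, count=0
PACKET_SIZE = 188       # fixed transport stream packet length


def looks_like_mpeg_ts(data: bytes) -> Optional[str]:
    """Return a description in the style of the magic entry, or None."""
    if len(data) < 4:
        return None
    # "belong" means a big-endian 32-bit value at the given offset.
    (header,) = struct.unpack(">I", data[:4])
    if (header & TS_MASK) != TS_VALUE:
        return None
    description = "MPEG transport stream data"
    # ">188 byte !0x47 CORRUPTED": the second packet must begin with 0x47.
    # If the data is too short to contain byte 188, the sub-test is skipped.
    if len(data) > PACKET_SIZE and data[PACKET_SIZE] != 0x47:
        description += " CORRUPTED"
    return description
```

As the original comment warns, this only recognizes a stream that starts cleanly with a PAT packet; a capture cut in the middle of a stream will not match.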
# DIF digital video file format <mpruett@sgi.com>
0 belong&0xffffff00 0x1f070000 DIF
>4 byte &0x01 (DVCPRO) movie file
>4 byte ^0x01 (DV) movie file
>3 byte &0x80 (PAL)
>3 byte ^0x80 (NTSC)
# Microsoft Advanced Streaming Format (ASF) <mpruett@sgi.com>
0 belong 0x3026b275 Microsoft ASF
# MNG Video Format, <URL:http://www.libpng.org/pub/mng/spec/>
0 string \x8aMNG MNG video data,
>4 belong !0x0d0a1a0a CORRUPTED,
>4 belong 0x0d0a1a0a
>>16 belong x %ld x
>>20 belong x %ld
# JNG Video Format, <URL:http://www.libpng.org/pub/mng/spec/>
0 string \x8bJNG JNG video data,
>4 belong !0x0d0a1a0a CORRUPTED,
>4 belong 0x0d0a1a0a
>>16 belong x %ld x
>>20 belong x %ld
# Vivo video (Wolfram Kleff)
3 string \x0D\x0AVersion:Vivo Vivo video data
# VRML (Virtual Reality Modelling Language)
0 string/b #VRML\ V1.0\ ascii VRML 1 file
0 string/b #VRML\ V2.0\ utf8 ISO/IEC 14772 VRML 97 file
#---------------------------------------------------------------------------
# HVQM4: compressed movie format designed by Hudson for Nintendo GameCube
# From Mark Sheppard <msheppard@climax.co.uk>, 2002-10-03
#
0 string HVQM4 %s
>6 string >\0 v%s
>0 byte x GameCube movie,
>0x34 ubeshort x %d x
>0x36 ubeshort x %d,
>0x26 ubeshort x %dµs,
>0x42 ubeshort 0 no audio
>0x42 ubeshort >0 %dHz audio
# From: "Stefan A. Haubenthal" <polluks@web.de>
0 string DVDVIDEO-VTS Video title set,
>0x21 byte x v%x
0 string DVDVIDEO-VMG Video manager,
>0x21 byte x v%x
#------------------------------------------------------------------------------
# apl: file(1) magic for APL (see also "pdp" and "vax" for other APL
# workspaces)
#
0 long 0100554 APL workspace (Ken's original?)
#------------------------------------------------------------------------------
# apple: file(1) magic for Apple file formats
#
0 string FiLeStArTfIlEsTaRt binscii (apple ][) text
0 string \x0aGL Binary II (apple ][) data
0 string \x76\xff Squeezed (apple ][) data
0 string NuFile NuFile archive (apple ][) data
0 string N\xf5F\xe9l\xe5 NuFile archive (apple ][) data
0 belong 0x00051600 AppleSingle encoded Macintosh file
0 belong 0x00051607 AppleDouble encoded Macintosh file
# magic for Newton PDA package formats
# from Ruda Moura <ruda@helllabs.org>
0 string package0 Newton package, NOS 1.x,
>12 belong &0x80000000 AutoRemove,
>12 belong &0x40000000 CopyProtect,
>12 belong &0x10000000 NoCompression,
>12 belong &0x04000000 Relocation,
>12 belong &0x02000000 UseFasterCompression,
>16 belong x version %d
0 string package1 Newton package, NOS 2.x,
>12 belong &0x80000000 AutoRemove,
>12 belong &0x40000000 CopyProtect,
>12 belong &0x10000000 NoCompression,
>12 belong &0x04000000 Relocation,
>12 belong &0x02000000 UseFasterCompression,
>16 belong x version %d
0 string package4 Newton package,
>8 byte 8 NOS 1.x,
>8 byte 9 NOS 2.x,
>12 belong &0x80000000 AutoRemove,
>12 belong &0x40000000 CopyProtect,
>12 belong &0x10000000 NoCompression,
# The following entries for the Apple II are for files that have
# been transferred as raw binary data from an Apple, without having
# been encapsulated by any of the above archivers.
#
# In general, Apple II formats are hard to identify because Apple DOS
# and especially Apple ProDOS have strong typing in the file system and
# therefore programmers never felt much need to include type information
# in the files themselves.
#
# Eric Fischer <enf@pobox.com>
# AppleWorks word processor:
#
# This matches the standard tab stops for an AppleWorks file, but if
# a file has a tab stop set in the first four columns this will fail.
#
# The "O" is really the magic number, but that's so common that it's
# necessary to check the tab stops that follow it to avoid false positives.
4 string O==== AppleWorks word processor data
>85 byte&0x01 >0 \b, zoomed
>90 byte&0x01 >0 \b, paginated
>92 byte&0x01 >0 \b, with mail merge
#>91 byte x \b, left margin %d
# AppleWorks database:
#
# This isn't really a magic number, but it's the closest thing to one
# that I could find. The 1 and 2 really mean "order in which you defined
# categories" and "left to right, top to bottom," respectively; the D and R
# mean that the cursor should move either down or right when you press Return.
#30 string \x01D AppleWorks database data
#30 string \x02D AppleWorks database data
#30 string \x01R AppleWorks database data
#30 string \x02R AppleWorks database data
# AppleWorks spreadsheet:
#
# Likewise, this isn't really meant as a magic number. The R or C means
# row- or column-order recalculation; the A or M means automatic or manual
# recalculation.
#131 string RA AppleWorks spreadsheet data
#131 string RM AppleWorks spreadsheet data
#131 string CA AppleWorks spreadsheet data
#131 string CM AppleWorks spreadsheet data
# Applesoft BASIC:
#
# This is incredibly sloppy, but will be true if the program was
# written at its usual memory location of 2048 and its first line
# number is less than 256. Yuck.
0 belong&0xff00ff 0x80000 Applesoft BASIC program data
#>2 leshort x \b, first line number %d
# ORCA/EZ assembler:
#
# This will not identify ORCA/M source files, since those have
# some sort of date code instead of the two zero bytes at 6 and 7
# XXX Conflicts with ELF
#4 belong&0xff00ffff 0x01000000 ORCA/EZ assembler source data
#>5 byte x \b, build number %d
# Broderbund Fantavision
#
# I don't know what these values really mean, but they seem to recur.
# Will they cause too many conflicts?
# Probably :-)
#2 belong&0xFF00FF 0x040008 Fantavision movie data
# Some attempts at images.
#
# These are actually just bit-for-bit dumps of the frame buffer, so
# there's really no reasonable way to distinguish them except for their
# address (if preserved) -- 8192 or 16384 -- and their length -- 8192
# or, occasionally, 8184.
#
# Nevertheless this will manage to catch a lot of images that happen
# to have a solid-colored line at the bottom of the screen.
8144 string \x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F Apple II image with white background
8144 string \x55\x2A\x55\x2A\x55\x2A\x55\x2A Apple II image with purple background
8144 string \x2A\x55\x2A\x55\x2A\x55\x2A\x55 Apple II image with green background
8144 string \xD5\xAA\xD5\xAA\xD5\xAA\xD5\xAA Apple II image with blue background
8144 string \xAA\xD5\xAA\xD5\xAA\xD5\xAA\xD5 Apple II image with orange background
# Beagle Bros. Apple Mechanic fonts
0 belong&0xFF00FFFF 0x6400D000 Apple Mechanic font
# Apple Universal Disk Image Format (UDIF) - dmg files.
# From Johan Gade.
# These entries are disabled for now until we fix the following issues.
#
# Note there might be some problems with the "VAX COFF executable"
# entry. Note this entry should be placed before the mac filesystem section,
# particularly the "Apple Partition data" entry.
#
# The intended meaning of these tests is, that the file is only of the
# specified type if both of the lines are correct - i.e. if the first
# line matches and the second doesn't then it is not of that type.
#
#0 long 0x7801730d
#>4 long 0x62626060 UDIF read-only zlib-compressed image (UDZO)
#
# Note that this entry is recognized correctly by the "Apple Partition
# data" entry - however since this entry is more specific - this
# information seems to be more useful.
#0 long 0x45520200
#>0x410 string disk\ image UDIF read/write image (UDRW)
# From: Toby Peterson <toby@apple.com>
0 string bplist00 Apple binary property list
# Apple binary property list (bplist)
# Assumes version bytes are hex.
# Provides content hints for version 0 files. Assumes that the root
# object is the first object (true for CoreFoundation implementation).
# From: David Remahl <dremahl@apple.com>
0 string bplist
>6 byte x \bCoreFoundation binary property list data, version 0x%c
>>7 byte x \b%c
>6 string 00 \b
>>8 byte&0xF0 0x00 \b
>>>8 byte&0x0F 0x00 \b, root type: null
>>>8 byte&0x0F 0x08 \b, root type: false boolean
>>>8 byte&0x0F 0x09 \b, root type: true boolean
>>8 byte&0xF0 0x10 \b, root type: integer
>>8 byte&0xF0 0x20 \b, root type: real
>>8 byte&0xF0 0x30 \b, root type: date
>>8 byte&0xF0 0x40 \b, root type: data
>>8 byte&0xF0 0x50 \b, root type: ascii string
>>8 byte&0xF0 0x60 \b, root type: unicode string
>>8 byte&0xF0 0x80 \b, root type: uid (CORRUPT)
>>8 byte&0xF0 0xa0 \b, root type: array
>>8 byte&0xF0 0xd0 \b, root type: dictionary
# Apple/NeXT typedstream data
# Serialization format used by NeXT and Apple for various
# purposes in YellowStep/Cocoa, including some nib files.
# From: David Remahl <dremahl@apple.com>
2 string typedstream NeXT/Apple typedstream data, big endian
>0 byte x \b, version %hhd
>0 byte <5 \b
>>13 byte 0x81 \b
>>>14 ubeshort x \b, system %hd
2 string streamtyped NeXT/Apple typedstream data, little endian
>0 byte x \b, version %hhd
>0 byte <5 \b
>>13 byte 0x81 \b
>>>14 uleshort x \b, system %hd
#------------------------------------------------------------------------------
# applix: file(1) magic for Applixware
# From: Peter Soos <sp@osb.hu>
#
0 string *BEGIN Applixware
>7 string WORDS Words Document
>7 string GRAPHICS Graphic
>7 string RASTER Bitmap
>7 string SPREADSHEETS Spreadsheet
>7 string MACRO Macro
>7 string BUILDER Builder Object
#------------------------------------------------------------------------------
# archive: file(1) magic for archive formats (see also "msdos" for self-
# extracting compressed archives)
#
# cpio, ar, arc, arj, hpack, lha/lharc, rar, squish, uc2, zip, zoo, etc.
# pre-POSIX "tar" archives are handled in the C code.
# POSIX tar archives
257 string ustar\0 POSIX tar archive
257 string ustar\040\040\0 GNU tar archive
# cpio archives
#
# Yes, the top two "cpio archive" formats *are* supposed to just be "short".
# The idea is to indicate archives produced on machines with the same
# byte order as the machine running "file" with "cpio archive", and
# to indicate archives produced on machines with the opposite byte order
# from the machine running "file" with "byte-swapped cpio archive".
#
# The SVR4 "cpio(4)" hints that there are additional formats, but they
# are defined as "short"s; I think all the new formats are
# character-header formats and thus are strings, not numbers.
0 short 070707 cpio archive
0 short 0143561 byte-swapped cpio archive
0 string 070707 ASCII cpio archive (pre-SVR4 or odc)
0 string 070701 ASCII cpio archive (SVR4 with no CRC)
0 string 070702 ASCII cpio archive (SVR4 with CRC)
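The byte-order remark in the cpio comments can be made concrete. A plain "short" is read in the byte order of the machine running file(1), so the same two bytes give either 070707 or its swapped form 0143561. The Python sketch below assumes a hypothetical helper `classify_cpio` with a `byteorder` parameter standing in for the host byte order; it is an illustration, not code from file(1).

```python
import sys
from typing import Optional

CPIO_MAGIC = 0o070707          # binary cpio magic as a 16-bit number
CPIO_MAGIC_SWAPPED = 0o143561  # the same two bytes read in the opposite order

ASCII_VARIANTS = {
    b"070707": "ASCII cpio archive (pre-SVR4 or odc)",
    b"070701": "ASCII cpio archive (SVR4 with no CRC)",
    b"070702": "ASCII cpio archive (SVR4 with CRC)",
}


def classify_cpio(data: bytes, byteorder: str = sys.byteorder) -> Optional[str]:
    """Apply the five cpio rules; byteorder plays the role of the host order."""
    if len(data) >= 2:
        value = int.from_bytes(data[:2], byteorder)
        if value == CPIO_MAGIC:
            return "cpio archive"
        if value == CPIO_MAGIC_SWAPPED:
            return "byte-swapped cpio archive"
    # The character-header formats are plain strings, so byte order is irrelevant.
    return ASCII_VARIANTS.get(data[:6])
```

For example, an archive written on a little-endian machine is reported as "cpio archive" when `byteorder` is "little" and as "byte-swapped cpio archive" when it is "big".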
# Debian package (needs to go before regular portable archives)
#
0 string =!<arch>\ndebian
>8 string debian-split part of multipart Debian package
>8 string debian-binary Debian binary package
>68 string >\0 (format %s)
# These next two lines do not work, because a bzip2 Debian archive
# still uses gzip for the control.tar (first in the archive). Only
# data.tar varies, and the location of its filename varies too.
# file/libmagic does not currently have support for ascii-string based
# (offsets) as of 2005-09-15.
#>81 string bz2 \b, uses bzip2 compression
#>84 string gz \b, uses gzip compression
#>136 ledate x created: %s
# other archives
0 long 0177555 very old archive
0 short 0177555 very old PDP-11 archive
0 long 0177545 old archive
0 short 0177545 old PDP-11 archive
0 long 0100554 apl workspace
0 string =<ar> archive
# MIPS archive (needs to go before regular portable archives)
#
0 string =!<arch>\n__________E MIPS archive
>20 string U with MIPS Ucode members
>21 string L with MIPSEL members
>21 string B with MIPSEB members
>19 string L and an EL hash table
>19 string B and an EB hash table
>22 string X -- out of date
0 string -h- Software Tools format archive text
#
# XXX - why are there multiple <ar> thingies? Note that 0x213c6172 is
# "!<ar", so, for new-style (4.xBSD/SVR2andup) archives, we have:
#
# 0 string =!<arch> current ar archive
# 0 long 0x213c6172 archive file
#
# and for SVR1 archives, we have:
#
# 0 string \<ar> System V Release 1 ar archive
# 0 string =<ar> archive
#
# XXX - did Aegis really store shared libraries, breakpointed modules,
# and absolute code program modules in the same format as new-style
# "ar" archives?
#
0 string =!<arch> current ar archive
>8 string __.SYMDEF random library
>0 belong =65538 - pre SR9.5
>0 belong =65539 - post SR9.5
>0 beshort 2 - object archive
>0 beshort 3 - shared library module
>0 beshort 4 - debug break-pointed module
>0 beshort 5 - absolute code program module
0 string \<ar> System V Release 1 ar archive
0 string =<ar> archive
#
# XXX - from "vax", which appears to collect a bunch of byte-swapped
# thingies, to help you recognize VAX files on big-endian machines;
# with "leshort", "lelong", and "string", that's no longer necessary....
#
0 belong 0x65ff0000 VAX 3.0 archive
0 belong 0x3c61723e VAX 5.0 archive
#
0 long 0x213c6172 archive file
0 lelong 0177555 very old VAX archive
0 leshort 0177555 very old PDP-11 archive
#
# XXX - "pdp" claims that 0177545 can have an __.SYMDEF member and thus
# be a random library (it said 0xff65 rather than 0177545).
#
0 lelong 0177545 old VAX archive
>8 string __.SYMDEF random library
0 leshort 0177545 old PDP-11 archive
>8 string __.SYMDEF random library
#
# From "pdp" (but why a 4-byte quantity?)
#
0 lelong 0x39bed PDP-11 old archive
0 lelong 0x39bee PDP-11 4.0 archive
# ARC archiver, from Daniel Quinlan (quinlan@yggdrasil.com)
#
# The first byte is the magic (0x1a), byte 2 is the compression type for
# the first file (0x01 through 0x09), and bytes 3 to 15 are the MS-DOS
# filename of the first file (null terminated). Since some types collide
# we only test some types on basis of frequency: 0x08 (83%), 0x09 (5%),
# 0x02 (5%), 0x03 (3%), 0x04 (2%), 0x06 (2%). 0x01 collides with terminfo.
0 lelong&0x8080ffff 0x0000081a ARC archive data, dynamic LZW
0 lelong&0x8080ffff 0x0000091a ARC archive data, squashed
0 lelong&0x8080ffff 0x0000021a ARC archive data, uncompressed
0 lelong&0x8080ffff 0x0000031a ARC archive data, packed
0 lelong&0x8080ffff 0x0000041a ARC archive data, squeezed
0 lelong&0x8080ffff 0x0000061a ARC archive data, crunched
# [JW] stuff taken from idarc, obviously ARC successors:
0 lelong&0x8080ffff 0x00000a1a PAK archive data
0 lelong&0x8080ffff 0x0000141a ARC+ archive data
0 lelong&0x8080ffff 0x0000481a HYP archive data
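The `lelong&0x8080ffff` rules above pack three checks into one comparison: the first byte must be 0x1a, the second byte is the compression type, and the high bits of the third and fourth bytes must be clear (they begin an MS-DOS filename, so they should be 7-bit characters). A rough Python sketch of that logic, with `describe_arc` as a made-up name:

```python
import struct
from typing import Optional

# Compression-type byte -> description, taken from the entries above.
ARC_TYPES = {
    0x08: "ARC archive data, dynamic LZW",
    0x09: "ARC archive data, squashed",
    0x02: "ARC archive data, uncompressed",
    0x03: "ARC archive data, packed",
    0x04: "ARC archive data, squeezed",
    0x06: "ARC archive data, crunched",
    0x0A: "PAK archive data",
    0x14: "ARC+ archive data",
    0x48: "HYP archive data",
}


def describe_arc(data: bytes) -> Optional[str]:
    """Return the matching description, or None if no rule applies."""
    if len(data) < 4:
        return None
    (word,) = struct.unpack("<I", data[:4])   # "lelong": little-endian 32-bit
    masked = word & 0x8080FFFF
    if (masked & 0xFF) != 0x1A:               # first byte: the 0x1a marker
        return None
    if (masked >> 16) != 0:                   # high bit set in filename byte 1 or 2
        return None
    return ARC_TYPES.get((masked >> 8) & 0xFF)
```

Type 0x01 is deliberately left out of the table, matching the comment about its collision with terminfo.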
# Acorn archive formats (Disaster prone simpleton, m91dps@ecs.ox.ac.uk)
# I can't create either SPARK or ArcFS archives so I have not tested this stuff
# [GRR: the original entries collide with ARC, above; replaced with combined
# version (not tested)]
#0 byte 0x1a RISC OS archive (spark format)
0 string \032archive RISC OS archive (ArcFS format)
0 string Archive\000 RISC OS archive (ArcFS format)
# All these were taken from idarc, many could not be verified. Unfortunately,
# there were many low-quality sigs, i.e. easy to trigger false positives.
# Please notify me of any real-world fishy/ambiguous signatures and I'll try
# to get my hands on the actual archiver and see if I find something better. [JW]
# probably many can be enhanced by finding some 0-byte or control char near the start
/usr/share/misc/magic.mime
# Magic data for KMimeMagic (originally for file(1) command)
#
# The format is 4-5 columns:
# Column #1: byte number to begin checking from, ">" indicates continuation
# Column #2: type of data to match
# Column #3: contents of data to match
# Column #4: MIME type of result
# Column #5: MIME encoding of result (optional)
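To show how the column layout described above maps onto an entry, here is a small Python sketch that splits one magic.mime line into its fields. The name `parse_mime_magic_line` and the dictionary keys are invented for this example; it only handles the simple whitespace-separated form with backslash-escaped spaces, not every feature of the real parser.

```python
import re
from typing import Optional

# A field is a run of non-space characters; "\x" (including "\ ") stays inside it.
FIELD = re.compile(r"(?:\\.|[^\s\\])+")


def parse_mime_magic_line(line: str) -> Optional[dict]:
    """Split one entry into its 3-5 columns; return None for comments/blanks."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = FIELD.findall(text)
    if len(fields) < 3:
        return None
    offset = fields[0]
    level = len(offset) - len(offset.lstrip(">"))  # count of leading ">"
    return {
        "level": level,                  # 0 = top-level test, 1+ = continuation
        "offset": offset.lstrip(">"),    # Column #1
        "type": fields[1],               # Column #2
        "test": fields[2],               # Column #3
        "mime": fields[3] if len(fields) > 3 else None,      # Column #4
        "encoding": fields[4] if len(fields) > 4 else None,  # Column #5
    }
```

A top-level line with no MIME type, such as the `RIFF` entry, comes back with `mime` set to `None`; the result is then supplied by one of its `>` continuation lines.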
#------------------------------------------------------------------------------
# Localstuff: file(1) magic for locally observed files
# Add any locally observed files here.
# Real Audio (Magic .ra\0375)
0 belong 0x2e7261fd audio/x-pn-realaudio
0 string .RMF application/vnd.rn-realmedia
#video/x-pn-realvideo
#video/vnd.rn-realvideo
#application/vnd.rn-realmedia
# sigh, there are many mimes for that but the above are the most common.
# Taken from magic, converted to magic.mime
# mime types according to http://www.geocities.com/nevilo/mod.htm:
# audio/it .it
# audio/x-zipped-it .itz
# audio/xm fasttracker modules
# audio/x-s3m screamtracker modules
# audio/s3m screamtracker modules
# audio/x-zipped-mod mdz
# audio/mod mod
# audio/x-mod All modules (mod, s3m, 669, mtm, med, xm, it, mdz, stm, itz, xmz, s3z)
# Taken from loader code from mikmod version 2.14
# by Steve McIntyre (stevem@chiark.greenend.org.uk)
# <doj@cubic.org> added title printing on 2003-06-24
0 string MAS_UTrack_V00
>14 string >/0 audio/x-mod
#audio/x-tracker-module
#0 string UN05 MikMod UNI format module sound data
0 string Extended\ Module: audio/x-mod
#audio/x-tracker-module
##>17 string >\0 Title: "%s"
21 string/c \!SCREAM! audio/x-mod
#audio/x-screamtracker-module
21 string BMOD2STM audio/x-mod
#audio/x-screamtracker-module
1080 string M.K. audio/x-mod
#audio/x-protracker-module
#>0 string >\0 Title: "%s"
1080 string M!K! audio/x-mod
#audio/x-protracker-module
#>0 string >\0 Title: "%s"
1080 string FLT4 audio/x-mod
#audio/x-startracker-module
#>0 string >\0 Title: "%s"
1080 string FLT8 audio/x-mod
#audio/x-startracker-module
#>0 string >\0 Title: "%s"
1080 string 4CHN audio/x-mod
#audio/x-fasttracker-module
#>0 string >\0 Title: "%s"
1080 string 6CHN audio/x-mod
#audio/x-fasttracker-module
#>0 string >\0 Title: "%s"
1080 string 8CHN audio/x-mod
#audio/x-fasttracker-module
#>0 string >\0 Title: "%s"
1080 string CD81 audio/x-mod
#audio/x-oktalyzer-tracker-module
#>0 string >\0 Title: "%s"
1080 string OKTA audio/x-mod
#audio/x-oktalyzer-tracker-module
#>0 string >\0 Title: "%s"
# Not good enough.
#1082 string CH
#>1080 string >/0 %.2s-channel Fasttracker "oktalyzer" module sound data
1080 string 16CN audio/x-mod
#audio/x-taketracker-module
#>0 string >\0 Title: "%s"
1080 string 32CN audio/x-mod
#audio/x-taketracker-module
#>0 string >\0 Title: "%s"
# Impulse Tracker module (it)
0 string IMPM audio/x-mod
#>4 string >\0 "%s"
#>40 leshort !0 compatible w/ITv%x
#>42 leshort !0 created w/ITv%x
#------------------------------------------------------------------------------
# end local stuff
#------------------------------------------------------------------------------
# xml based formats!
# svg
38 string \<\!DOCTYPE\040svg image/svg+xml
# xml
0 string \<?xml text/xml
#------------------------------------------------------------------------------
# Java
0 beshort 0xcafe
>2 beshort 0xbabe application/java
#------------------------------------------------------------------------------
# audio: file(1) magic for sound formats
#
# from Jan Nicolai Langfeldt <janl@ifi.uio.no>,
#
# Sun/NeXT audio data
0 string .snd
>12 belong 1 audio/basic
>12 belong 2 audio/basic
>12 belong 3 audio/basic
>12 belong 4 audio/basic
>12 belong 5 audio/basic
>12 belong 6 audio/basic
>12 belong 7 audio/basic
>12 belong 23 audio/x-adpcm
# DEC systems (e.g. DECstation 5000) use a variant of the Sun/NeXT format
# that uses little-endian encoding and has a different magic number
# (0x0064732E in little-endian encoding).
0 lelong 0x0064732E
>12 lelong 1 audio/x-dec-basic
>12 lelong 2 audio/x-dec-basic
>12 lelong 3 audio/x-dec-basic
>12 lelong 4 audio/x-dec-basic
>12 lelong 5 audio/x-dec-basic
>12 lelong 6 audio/x-dec-basic
>12 lelong 7 audio/x-dec-basic
# compressed (G.721 ADPCM)
>12 lelong 23 audio/x-dec-adpcm
# Bytes 0-3 of AIFF, AIFF-C, & 8SVX audio files are "FORM"
# AIFF audio data
8 string AIFF audio/x-aiff
# AIFF-C audio data
8 string AIFC audio/x-aiff
# IFF/8SVX audio data
8 string 8SVX audio/x-aiff
# Creative Labs AUDIO stuff
# Standard MIDI data
0 string MThd audio/unknown
#>9 byte >0 (format %d)
#>11 byte >1 using %d channels
# Creative Music (CMF) data
0 string CTMF audio/unknown
# SoundBlaster instrument data
0 string SBI audio/unknown
# Creative Labs voice data
0 string Creative\ Voice\ File audio/unknown
## is this next line right? it came this way...
#>19 byte 0x1A
#>23 byte >0 - version %d
#>22 byte >0 \b.%d
# [GRR 950115: is this also Creative Labs? Guessing that first line
# should be string instead of unknown-endian long...]
#0 long 0x4e54524b MultiTrack sound data
#0 string NTRK MultiTrack sound data
#>4 long x - version %ld
# Microsoft WAVE format (*.wav)
# [GRR 950115: probably all of the shorts and longs should be leshort/lelong]
# Microsoft RIFF
0 string RIFF
# - WAVE format
>8 string WAVE audio/x-wav
>8 string/B AVI video/x-msvideo
#
>8 string CDRA image/x-coreldraw
# AAC (aka MPEG-2 NBC)
0 beshort&0xfff6 0xfff0 audio/X-HX-AAC-ADTS
0 string ADIF audio/X-HX-AAC-ADIF
0 beshort&0xffe0 0x56e0 audio/MP4A-LATM
0 beshort 0x4De1 audio/MP4A-LATM
# MPEG Layer 3 sound files
# modified by Joerg Jenderek
# GRR the original tests are too common for many DOS files
# so test 1 <= kbits nibble <= E
0 beshort &0xffe0
>2 ubyte&0xF0 >0x0F
>>2 ubyte&0xF0 <0xE1 audio/mpeg
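The three-line MPEG audio rule above reads as: the top 11 bits of the first two bytes must all be set (the frame sync), and the upper nibble of the third byte (the "kbits nibble" in the comment) must lie between 1 and E. A minimal Python sketch of that check, with `looks_like_mpeg_audio` as a made-up name:

```python
def looks_like_mpeg_audio(data: bytes) -> bool:
    """Mirror the beshort/ubyte tests of the MPEG audio rule."""
    if len(data) < 3:
        return False
    sync = int.from_bytes(data[:2], "big")   # "beshort": big-endian 16-bit
    # "0 beshort &0xffe0": every bit in the mask must be set.
    if (sync & 0xFFE0) != 0xFFE0:
        return False
    nibble = data[2] & 0xF0
    # ">0x0F" and "<0xE1" together allow 0x10 through 0xE0 only.
    return 0x0F < nibble < 0xE1
```

Files that begin with an ID3 tag do not start with a frame sync, which is why the database carries a separate `ID3` string rule for them.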
#MP3 with ID3 tag
0 string ID3 audio/mpeg
# Ogg/Vorbis
0 string OggS application/ogg
#------------------------------------------------------------------------------
# c-lang: file(1) magic for C programs or various scripts
#
# XPM icons (Greg Roelofs, newt@uchicago.edu)
# ideally should go into "images", but entries below would tag XPM as C source
0 string /*\ XPM image/x-xpmi 7bit
# 3DS (3d Studio files) Conflicts with diff output 0x3d '='
#16 beshort 0x3d3d image/x-3ds
# this first will upset you if you're a PL/1 shop... (are there any left?)
# in which case rm it; ascmagic will catch real C programs
# C or REXX program text
#0 string /* text/x-c
# C++ program text
#0 string // text/x-c++
#------------------------------------------------------------------------------
# commands: file(1) magic for various shells and interpreters
#
#0 string :\ shell archive or commands for antique kernel text
0 string #!/bin/sh application/x-shellscript
0 string #!\ /bin/sh application/x-shellscript
0 string #!/bin/csh application/x-shellscript
0 string #!\ /bin/csh application/x-shellscript
# korn shell magic, sent by George Wu, gwu@clyde.att.com
0 string #!/bin/ksh application/x-shellscript
0 string #!\ /bin/ksh application/x-shellscript
0 string #!/bin/tcsh application/x-shellscript
0 string #!\ /bin/tcsh application/x-shellscript
0 string #!/usr/local/tcsh application/x-shellscript
0 string #!\ /usr/local/tcsh application/x-shellscript
0 string #!/usr/local/bin/tcsh application/x-shellscript
0 string #!\ /usr/local/bin/tcsh application/x-shellscript
# bash shell magic, from Peter Tobias (tobias@server.et-inf.fho-emden.de)
0 string #!/bin/bash application/x-shellscript
0 string #!\ /bin/bash application/x-shellscript
0 string #!/usr/local/bin/bash application/x-shellscript
0 string #!\ /usr/local/bin/bash application/x-shellscript
#
# zsh/ash/ae/nawk/gawk magic from cameron@cs.unsw.oz.au (Cameron Simpson)
0 string #!/bin/zsh application/x-shellscript
0 string #!/usr/bin/zsh application/x-shellscript
0 string #!/usr/local/bin/zsh application/x-shellscript
0 string #!\ /usr/local/bin/zsh application/x-shellscript
0 string #!/usr/local/bin/ash application/x-shellscript
0 string #!\ /usr/local/bin/ash application/x-shellscript
#0 string #!/usr/local/bin/ae Neil Brown's ae
#0 string #!\ /usr/local/bin/ae Neil Brown's ae
0 string #!/bin/nawk application/x-nawk
0 string #!\ /bin/nawk application/x-nawk
0 string #!/usr/bin/nawk application/x-nawk
0 string #!\ /usr/bin/nawk application/x-nawk
0 string #!/usr/local/bin/nawk application/x-nawk
0 string #!\ /usr/local/bin/nawk application/x-nawk
0 string #!/bin/gawk application/x-gawk
0 string #!\ /bin/gawk application/x-gawk
0 string #!/usr/bin/gawk application/x-gawk
0 string #!\ /usr/bin/gawk application/x-gawk
0 string #!/usr/local/bin/gawk application/x-gawk
0 string #!\ /usr/local/bin/gawk application/x-gawk
#
0 string #!/bin/awk application/x-awk
0 string #!\ /bin/awk application/x-awk
0 string #!/usr/bin/awk application/x-awk
0 string #!\ /usr/bin/awk application/x-awk
# update to distinguish from *.vcf files by Joerg Jenderek: joerg dot jenderek at web dot de
0 regex BEGIN[[:space:]]*[{] application/x-awk
# For Larry Wall's perl language. The ``eval'' line recognizes an
# outrageously clever hack for USG systems.
# Keith Waclena <keith@cerberus.uchicago.edu>
0 string #!/bin/perl application/x-perl
0 string #!\ /bin/perl application/x-perl
0 string eval\ "exec\ /bin/perl application/x-perl
0 string #!/usr/bin/perl application/x-perl
0 string #!\ /usr/bin/perl application/x-perl
0 string eval\ "exec\ /usr/bin/perl application/x-perl
0 string #!/usr/local/bin/perl application/x-perl
0 string #!\ /usr/local/bin/perl application/x-perl
0 string eval\ "exec\ /usr/local/bin/perl application/x-perl
#------------------------------------------------------------------------------
# compress: file(1) magic for pure-compression formats (no archives)
#
# compress, gzip, pack, compact, huf, squeeze, crunch, freeze, yabba, whap, etc.
#
# Formats for various forms of compressed data
# Formats for "compress" proper have been moved into "compress.c",
# because it tries to uncompress it to figure out what's inside.
# standard unix compress
0 string \037\235 application/x-compress
# gzip (GNU zip, not to be confused with [Info-ZIP/PKWARE] zip archiver)
0 string \037\213 application/x-gzip
0 string PK\003\004 application/x-zip
# RAR archiver (Greg Roelofs, newt@uchicago.edu)
0 string Rar! application/x-rar
# According to gzip.h, this is the correct byte order for packed data.
0 string \037\036 application/octet-stream
#
# This magic number is byte-order-independent.
#
0 short 017437 application/octet-stream
# XXX - why *two* entries for "compacted data", one of which is
# byte-order independent, and one of which is byte-order dependent?
#
# compacted data
0 short 0x1fff application/octet-stream
0 string \377\037 application/octet-stream
# huf output
0 short 0145405 application/octet-stream
# Squeeze and Crunch...
# These numbers were gleaned from the Unix versions of the programs to
# handle these formats. Note that I can only uncrunch, not crunch, and
# I didn't have a crunched file handy, so the crunch number is untested.
# Keith Waclena <keith@cerberus.uchicago.edu>
#0 leshort 0x76FF squeezed data (CP/M, DOS)
#0 leshort 0x76FE crunched data (CP/M, DOS)
# Freeze
#0 string \037\237 Frozen file 2.1
#0 string \037\236 Frozen file 1.0 (or gzip 0.5)
# lzh?
#0 string \037\240 LZH compressed data
257 string ustar\0 application/x-tar posix
257 string ustar\040\040\0 application/x-tar gnu
0 short 070707 application/x-cpio
0 short 0143561 application/x-cpio swapped
0 string =<ar> application/x-archive
0 string \!<arch> application/x-archive
>8 string debian application/x-debian-package
#------------------------------------------------------------------------------
#
# RPM: file(1) magic for Red Hat Packages Erik Troan (ewt@redhat.com)
#
0 beshort 0xedab
>2 beshort 0xeedb application/x-rpm
0 lelong&0x8080ffff 0x0000081a application/x-arc lzw
0 lelong&0x8080ffff 0x0000091a application/x-arc squashed
0 lelong&0x8080ffff 0x0000021a application/x-arc uncompressed
0 lelong&0x8080ffff 0x0000031a application/x-arc packed
0 lelong&0x8080ffff 0x0000041a application/x-arc squeezed
0 lelong&0x8080ffff 0x0000061a application/x-arc crunched
0 leshort 0xea60 application/x-arj
# LHARC/LHA archiver (Greg Roelofs, newt@uchicago.edu)
2 string -lh0- application/x-lharc lh0
2 string -lh1- application/x-lharc lh1
2 string -lz4- application/x-lharc lz4
2 string -lz5- application/x-lharc lz5
# [never seen any but the last; -lh4- reported in comp.compression:]
2 string -lzs- application/x-lha lzs
2 string -lh\ - application/x-lha lh
2 string -lhd- application/x-lha lhd
2 string -lh2- application/x-lha lh2
2 string -lh3- application/x-lha lh3
2 string -lh4- application/x-lha lh4
2 string -lh5- application/x-lha lh5
2 string -lh6- application/x-lha lh6
2 string -lh7- application/x-lha lh7
# Shell archives
10 string #\ This\ is\ a\ shell\ archive application/octet-stream x-shell
#------------------------------------------------------------------------------
# frame: file(1) magic for FrameMaker files
#
# This stuff came on a FrameMaker demo tape, most of which is
# copyright, but this file is "published" as witness the following:
#
0 string \<MakerFile application/x-frame
0 string \<MIFFile application/x-frame
0 string \<MakerDictionary application/x-frame
0 string \<MakerScreenFon application/x-frame
0 string \<MML application/x-frame
0 string \<Book application/x-frame
0 string \<Maker application/x-frame
#------------------------------------------------------------------------------
# html: file(1) magic for HTML (HyperText Markup Language) docs
#
# from Daniel Quinlan <quinlan@yggdrasil.com>
#
0 string \<HEAD text/html
0 string \<head text/html
0 string \<TITLE text/html
0 string \<title text/html
0 string \<html text/html
0 string \<HTML text/html
0 string \<!-- text/html
0 string \<h1 text/html
0 string \<H1 text/html
0 string/c \<!doctype\ html text/html
#------------------------------------------------------------------------------
# images: file(1) magic for image formats (see also "c-lang" for XPM bitmaps)
#
# originally from jef@helios.ee.lbl.gov (Jef Poskanzer),
# additions by janl@ifi.uio.no as well as others. Jan also suggested
# merging several one- and two-line files into here.
#
# XXX - byte order for GIF and TIFF fields?
# [GRR: TIFF allows both byte orders; GIF is probably little-endian]
#
# [GRR: what the hell is this doing in here?]
#0 string xbtoa btoa'd file
# PBMPLUS
# PBM file
0 string P1 image/x-portable-bitmap 7bit
# PGM file
0 string P2 image/x-portable-greymap 7bit
# PPM file
0 string P3 image/x-portable-pixmap 7bit
# PBM "rawbits" file
0 string P4 image/x-portable-bitmap
# PGM "rawbits" file
0 string P5 image/x-portable-greymap
# PPM "rawbits" file
0 string P6 image/x-portable-pixmap
# NIFF (Navy Interchange File Format, a modification of TIFF)
# [GRR: this *must* go before TIFF]
0 string IIN1 image/x-niff
# TIFF and friends
# TIFF file, big-endian
0 string MM image/tiff
# TIFF file, little-endian
0 string II image/tiff
# possible GIF replacements; none yet released!
# (Greg Roelofs, newt@uchicago.edu)
#
# GRR 950115: this was mine ("Zip GIF"):
# ZIF image (GIF+deflate alpha)
0 string GIF94z image/unknown
#
# GRR 950115: this is Jeremy Wohl's Free Graphics Format (better):
# FGF image (GIF+deflate beta)
0 string FGF95a image/unknown
#
# GRR 950115: this is Thomas Boutell's Portable Bitmap Format proposal
# (best; not yet implemented):
# PBF image (deflate compression)
0 string PBF image/unknown
# GIF
0 string GIF image/gif
# JPEG images
0 beshort 0xffd8 image/jpeg
# PC bitmaps (OS/2, Windoze BMP files) (Greg Roelofs, newt@uchicago.edu)
0 string BM image/bmp
#>14 byte 12 (OS/2 1.x format)
#>14 byte 64 (OS/2 2.x format)
#>14 byte 40 (Windows 3.x format)
#0 string IC icon
#0 string PI pointer
#0 string CI color icon
#0 string CP color pointer
#0 string BA bitmap array
# CDROM Filesystems
32769 string CD001 application/x-iso9660
# Newer StuffIt archives (grant@netbsd.org)
0 string StuffIt application/x-stuffit
#>162 string >0 : %s
# BinHex is the Macintosh ASCII-encoded file format (see also "apple")
# Daniel Quinlan, quinlan@yggdrasil.com
11 string must\ be\ converted\ with\ BinHex\ 4 application/mac-binhex40
##>41 string x \b, version %.3s
#------------------------------------------------------------------------------
# lisp: file(1) magic for lisp programs
#
# various lisp types, from Daniel Quinlan (quinlan@yggdrasil.com)
0 string ;; text/plain 8bit
# Emacs 18 - this is always correct, but not very magical.
0 string \012( application/x-elc
# Emacs 19
0 string ;ELC\023\000\000\000 application/x-elc
#------------------------------------------------------------------------------
# mail.news: file(1) magic for mail and news
#
# There are tests to ascmagic.c to cope with mail and news.
0 string Relay-Version: message/rfc822 7bit
0 string #!\ rnews message/rfc822 7bit
0 string N#!\ rnews message/rfc822 7bit
0 string Forward\ to message/rfc822 7bit
0 string Pipe\ to message/rfc822 7bit
0 string Return-Path: message/rfc822 7bit
0 string Received: message/rfc822
0 string Path: message/news 8bit
0 string Xref: message/news 8bit
0 string From: message/rfc822 7bit
0 string Article message/news 8bit
#------------------------------------------------------------------------------
# msword: file(1) magic for MS Word files
#
# Contributor claims:
# Reversed-engineered MS Word magic numbers
#
0 string \376\067\0\043 application/msword
# disable this one because it applies also to other
# Office/OLE documents for which msword is not correct. See PR#2608.
# from magic file of the apache
#0 string \320\317\021\340\241\261 application/msword
512 string \354\245\301 application/msword
0 string \333\245-\0\0\0 application/msword
#------------------------------------------------------------------------------
Edited by debtboy, 05 September 2009 - 06:05 AM.
swap quote tags for code tags to save space
#11
Posted 05 September 2009 - 05:53 AM
Wow, debtboy, could you put that into [code] tags instead of [quote] tags, so your post isn't monstrous?
Wow I changed my sig!
#12
Posted 05 September 2009 - 06:08 AM
Good idea, swapped tags as requested.
Much more manageable now.
The actual files are way larger than what was posted.
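For anyone wondering how those rules get used: each line in the listing is roughly "offset, type, value, MIME type", and the file is checked against the rules in order. Here is a minimal sketch in Python, assuming only a handful of the string rules and the one `beshort` rule from the listing. The names `RULES`, `guess_mime` and `guess_mime_from_path` are made up for illustration; the real file(1) code also handles more field types and continuation lines, which this does not attempt.

```python
# Hypothetical rule table: (offset, magic bytes, MIME type), copied from a few
# entries in the listing above. Order matters: the listing notes that NIFF
# ("IIN1") must be tested before little-endian TIFF ("II").
RULES = [
    (0, b"IIN1", "image/x-niff"),
    (0, b"MM", "image/tiff"),
    (0, b"II", "image/tiff"),
    (0, b"GIF", "image/gif"),
    (0, b"\xff\xd8", "image/jpeg"),  # "beshort 0xffd8" written as big-endian bytes
    (0, b"BM", "image/bmp"),
    (32769, b"CD001", "application/x-iso9660"),
]

# Enough bytes to cover the deepest rule (offset 32769 plus 5 magic bytes).
MAX_NEEDED = max(offset + len(magic) for offset, magic, _ in RULES)


def guess_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the MIME type of the first rule whose bytes match at its offset."""
    for offset, magic, mime in RULES:
        if data[offset:offset + len(magic)] == magic:
            return mime
    return default


def guess_mime_from_path(path: str) -> str:
    """Read only as many bytes as the rules need, then apply guess_mime."""
    with open(path, "rb") as handle:
        return guess_mime(handle.read(MAX_NEEDED))
```

Something like `guess_mime_from_path("photo.jpg")` would then report `image/jpeg` for a file starting with the bytes FF D8. Note this only recognises formats with a fixed signature; it says nothing about which character set a plain text file uses.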












