Detect Character Set

14 replies to this topic

#1
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
How would one go about detecting the character set / encoding of an arbitrary text file in a portable manner? And how would one tell the difference between, say, UTF-8 and a binary file?
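A rough heuristic for the UTF-8 versus binary part of the question can be sketched in Python. This is only an illustration under stated assumptions, not how GEdit or any particular tool does it: the function name `guess_encoding` and the labels it returns are made up for this example. It checks for a byte-order mark, treats NUL bytes as a sign of binary data, and then tests whether the bytes form valid ASCII or UTF-8.

```python
import codecs

# Byte-order marks, longest first, so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-guess label for a block of bytes (hypothetical helper).

    This is a heuristic: a match means "consistent with", never "proven".
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # NUL bytes are very rare in 8-bit text, so treat them as binary.
    if b"\x00" in data:
        return "binary"

    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # An incremental decoder (final=False by default) tolerates a
    # multi-byte sequence that was cut off at the end of the sample,
    # but still raises on byte sequences that can never be UTF-8.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data)
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"
```

Random binary data rarely happens to be valid UTF-8, which is why the last check works reasonably well in practice. Telling legacy 8-bit encodings (Latin-1, KOI8-R, and so on) apart is a different problem: every byte sequence is valid in them, so it needs statistics about the language, and this sketch does not attempt that.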

Edited by dargueta, 04 September 2009 - 03:26 PM.

sudo rm -rf /

#2
Guest_h4x_*

  • Guests
Isn't UTF-8 a binary file?
The only way is to create char maps, assign byte(s) to a graphical representation, and let the user choose. It's not the computer's job.

#3
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
H4x, you clearly don't know what you're talking about. The Gnome GEdit text editor does it on its own just fine. I was wondering how they do it.

#4
debtboy

    Programming God

  • Members
  • 916 posts
That's a really good question.
Linux systems seem to know the file type
and char set automatically.

I certainly don't know how, but I found this link:
Tux Love: Hidden Linux : File mysteries

very interesting... :rolleyes:

#5
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
Great link, never knew that.

#6
ArekBulski

    Speaks fluent binary

  • Members
  • 1,376 posts
Interesting read, thanks.

#7
Guest_h4x_*

  • Guests
No. A computer can't decide what type a file is. You must assign a program to run the file; this is what the extension is for. Gnome has an option to do that, no sucky checks.
Doing otherwise is asking for problems (bugs). Let's keep things simple, don't mix AI with computers.

#8
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
H4x...really. Did you even read the link debtboy posted?

If you're going to post something as fact, make sure it's not your personal opinion first. I have yet to see you post a single link or make a single quotation to back up your snide, highly biased, and blind opinions. I have no reason to listen to you--and neither does anyone else--unless you back up your own statements with evidence or concrete examples. I'd suggest you calm down, get off your high horse, and condescend to talk to the rest of us inferior programmers with a civil tone and back up what you say with documents or code examples that we can test and verify.

#9
ArekBulski

    Speaks fluent binary

  • Members
  • 1,376 posts

Quote

You'll find a list of magic numbers in /usr/share/file/magic. You can add your own file types in /etc/magic (to make them system-wide) or $HOME/.magic locally. The format is described -- with no offence to feminists intended -- in man magic.

In debtboy's article I found a reference to magic numbers, which are used to determine a file's format. It would be nice if someone could post their /usr/share/file/magic file so we can see what is in there. I am not sure whether this is related to checking the charset, but at least it's a clue to investigate.

Man page for magic: MAGIC
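To make the idea of magic numbers concrete, here is a minimal Python sketch of a matcher. It is an assumption-laden illustration, not the actual file(1) implementation: the `RULES` tuple layout and the `identify` function are invented for this example, and only the `string`, `belong`, `lelong`, `beshort` and `leshort` types with an optional `&mask` are handled. The four sample rules are copied from the Gentoo magic file quoted in the next post.

```python
import struct

# (offset, type, expected value, mask or None, description)
# The values come from real magic entries; the tuple layout is
# this sketch's own invention.
RULES = [
    (0, "string", b"Glul", None, "Glulx game data"),
    (0, "belong", 0x000003F3, None, "AmigaOS loadseg()ble executable/binary"),
    (0, "lelong", 0xC3CBC6C5, None, "RISC OS Chunk data"),
    (0, "beshort", 0xFFFA, 0xFFFE, "MPEG ADTS, layer III, v1"),
]

# struct format codes for the numeric magic types:
# "be"/"le" select the byte order, short is 2 bytes, long is 4 bytes.
FORMATS = {"belong": ">I", "lelong": "<I", "beshort": ">H", "leshort": "<H"}


def identify(data: bytes):
    """Return the description of the first matching rule, or None."""
    for offset, kind, expected, mask, description in RULES:
        if kind == "string":
            # Compare the raw bytes found at the given offset.
            if data[offset:offset + len(expected)] == expected:
                return description
            continue

        fmt = FORMATS[kind]
        if len(data) < offset + struct.calcsize(fmt):
            continue  # file too short for this rule
        (value,) = struct.unpack_from(fmt, data, offset)
        if mask is not None:
            value &= mask  # e.g. "beshort&0xFFFE" ignores the lowest bit
        if value == expected:
            return description
    return None
```

For example, `identify(b"\xff\xfb\x90\x00")` matches the MPEG rule, because `0xFFFB & 0xFFFE` equals `0xFFFA`. Real magic files also have continuation lines (the ones starting with `>`), which refine a match with further tests. This sketch leaves them out.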

#10
debtboy

    Programming God

  • Members
  • 916 posts
I have a few magic files, in different places than mentioned, of course;
each distro seems to relocate things. My distro is Gentoo.

/usr/share/mime/magic (binary file)
/usr/share/misc/file/magic.mgc (binary file)
/usr/share/misc/file/magic.mime.mgc (binary file)

Now for the text files. I have posted only a portion of each below,
because both files are rather large (the magic file is nearly 500K).

/usr/share/misc/file/magic
/usr/share/misc/file/magic.mime

/usr/share/misc/file/magic
# Magic
# Magic data for file(1) command.
# Machine-generated from src/cmd/file/magdir/*; edit there only!
# Format is described in magic(files), where:
# files is 5 on V7 and BSD, 4 on SV, and ?? in the SVID.

#------------------------------------------------------------------------------
# Localstuff:  file(1) magic for locally observed files
#
# $File: Localstuff,v 1.4 2003/03/23 04:17:27 christos Exp $
# Add any locally observed files here.  Remember:
# text if readable, executable if runnable binary, data if unreadable.
#------------------------------------------------------------------------------
# acorn:  file(1) magic for files found on Acorn systems
#

# RISC OS Chunk File Format
# From RISC OS Programmer's Reference Manual, Appendix D
# We guess the file type from the type of the first chunk.
0	lelong		0xc3cbc6c5	RISC OS Chunk data
>12	string		OBJ_		\b, AOF object
>12	string		LIB_		\b, ALF library

# RISC OS AIF, contains "SWI OS_Exit" at offset 16.
16	lelong		0xef000011	RISC OS AIF executable

# RISC OS Draw files
# From RISC OS Programmer's Reference Manual, Appendix E
0	string 		Draw		RISC OS Draw file data

# RISC OS new format font files
# From RISC OS Programmer's Reference Manual, Appendix E
0	string		FONT\0		RISC OS outline font data,
>5	byte		x		version %d
0	string		FONT\1		RISC OS 1bpp font data,
>5	byte		x		version %d
0	string		FONT\4		RISC OS 4bpp font data
>5	byte		x		version %d

# RISC OS Music files
# From RISC OS Programmer's Reference Manual, Appendix E
0	string		Maestro\r	RISC OS music file
>8	byte		x		version %d


#------------------------------------------------------------------------------
# adi: file(1) magic for ADi's objects
# From Gregory McGarry <g.mcgarry@ieee.org>
#
0	leshort		0x521c		COFF DSP21k
>18	lelong		&02		executable,
>18	lelong		^02
>>18	lelong		&01		static object,
>>18	lelong		^01		relocatable object,
>18	lelong		&010		stripped
>18	lelong		^010		not stripped

#------------------------------------------------------------------------------
# adventure: file(1) magic for Adventure game files
#
# from Allen Garvin <earendil@faeryland.tamu-commerce.edu>
# Edited by Dave Chapeskie <dchapes@ddm.on.ca> Jun 28, 1998
# Edited by Chris Chittleborough <cchittleborough@yahoo.com.au>, March 2002
#
# ALAN
# I assume there are other, lower versions, but these are the only ones I
# saw in the archive.
0	beshort	0x0206	ALAN game data
>2	byte	<10	version 2.6%d


# Infocom (see z-machine)
#------------------------------------------------------------------------------
# Z-machine:  file(1) magic for Z-machine binaries.
#
# This will match ${TEX_BASE}/texmf/omega/ocp/char2uni/inbig5.ocp which
# appears to be a version-0 Z-machine binary.
#
# The (false match) message is to correct that behavior.  Perhaps it is
# not needed.
#
16	belong&0xfe00f0f0	0x3030	Infocom game data
>0	ubyte			0	(false match)
>0	ubyte			>0	(Z-machine %d,
>>2	ubeshort		x	Release %d /
>>18	string			>\0	Serial %.6s)

#------------------------------------------------------------------------------
# Glulx:  file(1) magic for Glulx binaries.
#
# I haven't checked for false matches yet.
#
0	string			Glul	Glulx game data
>4	beshort			x	(Version %d
>>6	byte			x	\b.%d
>>8	byte			x	\b.%d)
>36	string			Info	Compiled by Inform



# For Quetzal and blorb magic see iff


# TADS (Text Adventure Development System)
#  All files are machine-independent (games compile to byte-code) and are tagged
#  with a version string of the form "V2.<digit>.<digit>\0" (but TADS 3 is
#  on the way).
#  Game files start with "TADS2 bin\n\r\032\0" then the compiler version.
0	string	TADS2\ bin	TADS
>9	belong  !0x0A0D1A00	game data, CORRUPTED
>9	belong	 0x0A0D1A00
>>13	string	>\0		%s game data
#  Resource files start with "TADS2 rsc\n\r\032\0" then the compiler version.
0	string	TADS2\ rsc	TADS
>9	belong  !0x0A0D1A00	resource data, CORRUPTED
>9	belong	 0x0A0D1A00
>>13	string	>\0		%s resource data
#  Some saved game files start with "TADS2 save/g\n\r\032\0", a little-endian
#  2-byte length N, the N-char name of the game file *without* a NUL (darn!),
# "TADS2 save\n\r\032\0" and the interpreter version. 
0	string	TADS2\ save/g	TADS
>12	belong	!0x0A0D1A00	saved game data, CORRUPTED
>12	belong	 0x0A0D1A00
>>(16.s+32) string >\0		%s saved game data
#  Other saved game files start with "TADS2 save\n\r\032\0" and the interpreter
#  version.
0	string	TADS2\ save	TADS
>10	belong	!0x0A0D1A00	saved game data, CORRUPTED
>10	belong	 0x0A0D1A00
>>14	string	>\0		%s saved game data

#------------------------------------------------------------------------------
# allegro:  file(1) magic for Allegro datafiles
# Toby Deshane <hac@shoelace.digivill.net>
#
0 belong 0x736C6821   Allegro datafile (packed)
0 belong 0x736C682E   Allegro datafile (not packed/autodetect)
0 belong 0x736C682B   Allegro datafile (appended exe data)

#------------------------------------------------------------------------------
# alliant:  file(1) magic for Alliant FX series a.out files
#
# If the FX series is the one that had a processor with a 68K-derived
# instruction set, the "short" should probably become "beshort" and the
# "long" should probably become "belong".
# If it's the i860-based one, they should probably become either the
# big-endian or little-endian versions, depending on the mode they ran
# the 860 in....
#
0	short		0420		0420 Alliant virtual executable
>2	short		&0x0020		common library
>16	long		>0		not stripped
0	short		0421		0421 Alliant compact executable
>2	short		&0x0020		common library
>16	long		>0		not stripped
#------------------------------------------------------------------------------
# alpha architecture description
#

0	leshort		0603		COFF format alpha
>22	leshort&030000	!020000		executable
>24	leshort		0410		pure
>24	leshort		0413		paged
>22	leshort&020000	!0		dynamically linked
>16	lelong		!0		not stripped
>16	lelong		0		stripped
>22	leshort&030000	020000		shared library
>24	leshort		0407		object
>27	byte		x		- version %d
>26	byte		x		.%d
>28	byte		x		-%d

# Basic recognition of Digital UNIX core dumps - Mike Bremford <mike@opac.bl.uk>
#
# The actual magic number is just "Core", followed by a 2-byte version
# number; however, treating any file that begins with "Core" as a Digital
# UNIX core dump file may produce too many false hits, so we include one
# byte of the version number as well; DU 5.0 appears only to be up to
# version 2.
#
0	string		Core\001	Alpha COFF format core dump (Digital UNIX)
>24	string		>\0		\b, from '%s'
0	string		Core\002	Alpha COFF format core dump (Digital UNIX)
>24	string		>\0		\b, from '%s'

#------------------------------------------------------------------------------
# amanda:  file(1) magic for amanda file format
#
0	string	AMANDA:\ 		AMANDA 
>8	string	TAPESTART\ DATE		tape header file,
>>23	string	X
>>>25	string	>\ 			Unused %s
>>23	string	>\ 			DATE %s
>8	string	FILE\ 			dump file,
>>13	string	>\ 			DATE %s
#------------------------------------------------------------------------------
# amigaos:  file(1) magic for AmigaOS binary formats:

#
# From ignatios@cs.uni-bonn.de (Ignatios Souvatzis)
#
0	belong		0x000003fa	AmigaOS shared library
0	belong		0x000003f3	AmigaOS loadseg()ble executable/binary
0	belong		0x000003e7	AmigaOS object/library data
#
0	beshort		0xe310		Amiga Workbench
>2	beshort		1		
>>48	byte		1		disk icon
>>48	byte		2		drawer icon
>>48	byte		3		tool icon
>>48	byte		4		project icon
>>48	byte		5		garbage icon
>>48	byte		6		device icon
>>48	byte		7		kickstart icon
>>48	byte		8		workbench application icon
>2	beshort		>1		icon, vers. %d
#
# various sound formats from the Amiga
# Götz Waschk <waschk@informatik.uni-rostock.de>
#
0	string		FC14		Future Composer 1.4 Module sound file
0	string		SMOD		Future Composer 1.3 Module sound file
0	string		AON4artofnoise	Art Of Noise Module sound file
1	string		MUGICIAN/SOFTEYES Mugician Module sound file
58	string		SIDMON\ II\ -\ THE	Sidmon 2.0 Module sound file
0	string		Synth4.0	Synthesis Module sound file
0	string		ARP.		The Holy Noise Module sound file
0	string		BeEp\0		JamCracker Module sound file
0	string		COSO\0		Hippel-COSO Module sound file
# Too simple (short, pure ASCII, deep), MPi
#26	string		V.3		Brian Postma's Soundmon Module sound file v3
#26	string		BPSM		Brian Postma's Soundmon Module sound file v3
#26	string		V.2		Brian Postma's Soundmon Module sound file v2

# The following are from: "Stefan A. Haubenthal" <polluks@web.de>
0	beshort		0x0f00		AmigaOS bitmap font
0	beshort		0x0f03		AmigaOS outline font
0	belong		0x80001001	AmigaOS outline tag
0	string		##\ version	catalog translation
0	string		EMOD\0		Amiga E module
8	string		ECXM\0		ECX module
0	string/c	@database	AmigaGuide file

# Amiga disk types
# 
0	string		RDSK		Rigid Disk Block
>160	string		x		on %.24s
0	string		DOS\0		Amiga DOS disk
0	string		DOS\1		Amiga FFS disk
0	string		DOS\2		Amiga Inter DOS disk
0	string		DOS\3		Amiga Inter FFS disk
0	string		DOS\4		Amiga Fastdir DOS disk
0	string		DOS\5		Amiga Fastdir FFS disk
0	string		KICK		Kickstart disk

# From: Alex Beregszaszi <alex@fsn.hu>
0	string		LZX		LZX compressed archive (Amiga)


#------------------------------------------------------------------------------
# animation:  file(1) magic for animation/movie formats
#
# animation formats
# MPEG, FLI, DL originally from vax@ccwf.cc.utexas.edu (VaX#n8)
# FLC, SGI, Apple originally from Daniel Quinlan (quinlan@yggdrasil.com)

# SGI and Apple formats
0	string		MOVI		Silicon Graphics movie file
4       string          moov            Apple QuickTime
>12     string          mvhd            \b movie (fast start)
>12     string          mdra            \b URL
>12     string          cmov            \b movie (fast start, compressed header)
>12     string          rmra            \b multiple URLs
4       string          mdat            Apple QuickTime movie (unoptimized)
4       string          wide            Apple QuickTime movie (unoptimized)
4       string          skip            Apple QuickTime movie (modified)
4       string          free            Apple QuickTime movie (modified)
4       string          idsc            Apple QuickTime image (fast start)
4       string          idat            Apple QuickTime image (unoptimized)
4       string          pckg            Apple QuickTime compressed archive
4	string/B	jP		JPEG 2000 image
4	string		ftyp		ISO Media
>8	string		isom		\b, MPEG v4 system, version 1
>8	string		iso2		\b, MPEG v4 system, part 12 revision
>8	string		mp41		\b, MPEG v4 system, version 1
>8	string		mp42		\b, MPEG v4 system, version 2
>8	string		mp7t		\b, MPEG v4 system, MPEG v7 XML
>8	string		mp7b		\b, MPEG v4 system, MPEG v7 binary XML
>8	string/B	jp2		\b, JPEG 2000
>8	string		3gp		\b, MPEG v4 system, 3GPP
>>11	byte		4		\b v4 (H.263/AMR GSM 6.10)
>>11	byte		5		\b v5 (H.263/AMR GSM 6.10)
>>11	byte		6		\b v6 (ITU H.264/AMR GSM 6.10)
>8	string		mmp4		\b, MPEG v4 system, 3GPP Mobile
>8	string		avc1		\b, MPEG v4 system, 3GPP JVT AVC
>8	string/B	M4A		\b, MPEG v4 system, iTunes AAC-LC
>8	string/B	M4P		\b, MPEG v4 system, iTunes AES encrypted
>8	string/B	M4B		\b, MPEG v4 system, iTunes bookmarked
>8	string/B	qt		\b, Apple QuickTime movie

# MPEG sequences
# Scans for all common MPEG header start codes
0        belong             0x00000001     JVT NAL sequence
>4       byte&0x1F          0x07           \b, H.264 video
>>5      byte               66             \b, baseline
>>5      byte               77             \b, main
>>5      byte               88             \b, extended
>>7      byte               x              \b @ L %u
0        belong&0xFFFFFF00  0x00000100     MPEG sequence
>3       byte               0xBA
>>4      byte               &0x40          \b, v2, program multiplex
>>4      byte               ^0x40          \b, v1, system multiplex
>3       byte               0xBB           \b, v1/2, multiplex (missing pack header)
>3       byte&0x1F          0x07           \b, H.264 video
>>4      byte               66             \b, baseline
>>4      byte               77             \b, main
>>4      byte               88             \b, extended
>>6      byte               x              \b @ L %u
>3       byte               0xB0           \b, v4
>>5      belong             0x000001B5
>>>9     byte               &0x80
>>>>10   byte&0xF0          16             \b, video
>>>>10   byte&0xF0          32             \b, still texture
>>>>10   byte&0xF0          48             \b, mesh
>>>>10   byte&0xF0          64             \b, face
>>>9     byte&0xF8          8              \b, video
>>>9     byte&0xF8          16             \b, still texture
>>>9     byte&0xF8          24             \b, mesh
>>>9     byte&0xF8          32             \b, face
>>4      byte               1              \b, simple @ L1
>>4      byte               2              \b, simple @ L2
>>4      byte               3              \b, simple @ L3
>>4      byte               4              \b, simple @ L0
>>4      byte               17             \b, simple scalable @ L1
>>4      byte               18             \b, simple scalable @ L2
>>4      byte               33             \b, core @ L1
>>4      byte               34             \b, core @ L2
>>4      byte               50             \b, main @ L2
>>4      byte               51             \b, main @ L3
>>4      byte               53             \b, main @ L4
>>4      byte               66             \b, n-bit @ L2
>>4      byte               81             \b, scalable texture @ L1
>>4      byte               97             \b, simple face animation @ L1
>>4      byte               98             \b, simple face animation @ L2
>>4      byte               99             \b, simple face basic animation @ L1
>>4      byte               100            \b, simple face basic animation @ L2
>>4      byte               113            \b, basic animation text @ L1
>>4      byte               114            \b, basic animation text @ L2
>>4      byte               129            \b, hybrid @ L1
>>4      byte               130            \b, hybrid @ L2
>>4      byte               145            \b, advanced RT simple @ L1
>>4      byte               146            \b, advanced RT simple @ L2
>>4      byte               147            \b, advanced RT simple @ L3
>>4      byte               148            \b, advanced RT simple @ L4
>>4      byte               161            \b, core scalable @ L1
>>4      byte               162            \b, core scalable @ L2
>>4      byte               163            \b, core scalable @ L3
>>4      byte               177            \b, advanced coding efficiency @ L1
>>4      byte               178            \b, advanced coding efficiency @ L2
>>4      byte               179            \b, advanced coding efficiency @ L3
>>4      byte               180            \b, advanced coding efficiency @ L4
>>4      byte               193            \b, advanced core @ L1
>>4      byte               194            \b, advanced core @ L2
>>4      byte               209            \b, advanced scalable texture @ L1
>>4      byte               210            \b, advanced scalable texture @ L2
>>4      byte               211            \b, advanced scalable texture @ L3
>>4      byte               225            \b, simple studio @ L1
>>4      byte               226            \b, simple studio @ L2
>>4      byte               227            \b, simple studio @ L3
>>4      byte               228            \b, simple studio @ L4
>>4      byte               229            \b, core studio @ L1
>>4      byte               230            \b, core studio @ L2
>>4      byte               231            \b, core studio @ L3
>>4      byte               232            \b, core studio @ L4
>>4      byte               240            \b, advanced simple @ L0
>>4      byte               241            \b, advanced simple @ L1
>>4      byte               242            \b, advanced simple @ L2
>>4      byte               243            \b, advanced simple @ L3
>>4      byte               244            \b, advanced simple @ L4
>>4      byte               245            \b, advanced simple @ L5
>>4      byte               247            \b, advanced simple @ L3b
>>4      byte               248            \b, FGS @ L0
>>4      byte               249            \b, FGS @ L1
>>4      byte               250            \b, FGS @ L2
>>4      byte               251            \b, FGS @ L3
>>4      byte               252            \b, FGS @ L4
>>4      byte               253            \b, FGS @ L5
>3       byte               0xB5           \b, v4
>>4      byte               &0x80
>>>5     byte&0xF0          16             \b, video (missing profile header)
>>>5     byte&0xF0          32             \b, still texture (missing profile header)
>>>5     byte&0xF0          48             \b, mesh (missing profile header)
>>>5     byte&0xF0          64             \b, face (missing profile header)
>>4      byte&0xF8          8              \b, video (missing profile header)
>>4      byte&0xF8          16             \b, still texture (missing profile header)
>>4      byte&0xF8          24             \b, mesh (missing profile header)
>>4      byte&0xF8          32             \b, face (missing profile header)
>3       byte               0xB3
>>12     belong             0x000001B8     \b, v1, progressive Y'CbCr 4:2:0 video
>>12     belong             0x000001B2     \b, v1, progressive Y'CbCr 4:2:0 video
>>12     belong             0x000001B5     \b, v2,
>>>16    byte&0x0F          1              \b HP
>>>16    byte&0x0F          2              \b Spt
>>>16    byte&0x0F          3              \b SNR
>>>16    byte&0x0F          4              \b MP
>>>16    byte&0x0F          5              \b SP
>>>17    byte&0xF0          64             \b@HL
>>>17    byte&0xF0          96             \b@H-14
>>>17    byte&0xF0          128            \b@ML
>>>17    byte&0xF0          160            \b@LL
>>>17    byte               &0x08          \b progressive
>>>17    byte               ^0x08          \b interlaced
>>>17    byte&0x06          2              \b Y'CbCr 4:2:0 video
>>>17    byte&0x06          4              \b Y'CbCr 4:2:2 video
>>>17    byte&0x06          6              \b Y'CbCr 4:4:4 video
>>11     byte               &0x02
>>>75    byte               &0x01
>>>>140  belong             0x000001B8     \b, v1, progressive Y'CbCr 4:2:0 video
>>>>140  belong             0x000001B2     \b, v1, progressive Y'CbCr 4:2:0 video
>>>>140  belong             0x000001B5     \b, v2,
>>>>>144 byte&0x0F          1              \b HP
>>>>>144 byte&0x0F          2              \b Spt
>>>>>144 byte&0x0F          3              \b SNR
>>>>>144 byte&0x0F          4              \b MP
>>>>>144 byte&0x0F          5              \b SP
>>>>>145 byte&0xF0          64             \b@HL
>>>>>145 byte&0xF0          96             \b@H-14
>>>>>145 byte&0xF0          128            \b@ML
>>>>>145 byte&0xF0          160            \b@LL
>>>>>145 byte               &0x08          \b progressive
>>>>>145 byte               ^0x08          \b interlaced
>>>>>145 byte&0x06          2              \b Y'CbCr 4:2:0 video
>>>>>145 byte&0x06          4              \b Y'CbCr 4:2:2 video
>>>>>145 byte&0x06          6              \b Y'CbCr 4:4:4 video
>>76    belong             0x000001B8     \b, v1, progressive Y'CbCr 4:2:0 video
>>76    belong             0x000001B2     \b, v1, progressive Y'CbCr 4:2:0 video
>>76    belong             0x000001B5     \b, v2,
>>>80   byte&0x0F          1              \b HP
>>>80   byte&0x0F          2              \b Spt
>>>80   byte&0x0F          3              \b SNR
>>>80   byte&0x0F          4              \b MP
>>>80   byte&0x0F          5              \b SP
>>>81   byte&0xF0          64             \b@HL
>>>81   byte&0xF0          96             \b@H-14
>>>81   byte&0xF0          128            \b@ML
>>>81   byte&0xF0          160            \b@LL
>>>81   byte               &0x08          \b progressive
>>>81   byte               ^0x08          \b interlaced
>>>81   byte&0x06          2              \b Y'CbCr 4:2:0 video
>>>81   byte&0x06          4              \b Y'CbCr 4:2:2 video
>>>81   byte&0x06          6              \b Y'CbCr 4:4:4 video
>>4      belong&0xFFFFFF00  0x78043800     \b, HD-TV 1920P
>>>7     byte&0xF0          0x10           \b, 16:9
>>4      belong&0xFFFFFF00  0x50002D00     \b, SD-TV 1280I
>>>7     byte&0xF0          0x10           \b, 16:9
>>4      belong&0xFFFFFF00  0x30024000     \b, PAL Capture
>>>7     byte&0xF0          0x10           \b, 4:3
>>4      beshort&0xFFF0     0x2C00         \b, 4CIF
>>>5     beshort&0x0FFF     0x01E0         \b NTSC
>>>5     beshort&0x0FFF     0x0240         \b PAL
>>>7     byte&0xF0          0x20           \b, 4:3
>>>7     byte&0xF0          0x30           \b, 16:9
>>>7     byte&0xF0          0x40           \b, 11:5
>>>7     byte&0xF0          0x80           \b, PAL 4:3
>>>7     byte&0xF0          0xC0           \b, NTSC 4:3
>>4      belong&0xFFFFFF00  0x2801E000     \b, LD-TV 640P
>>>7     byte&0xF0          0x10           \b, 4:3
>>4      belong&0xFFFFFF00  0x1400F000     \b, 320x240
>>>7     byte&0xF0          0x10           \b, 4:3
>>4      belong&0xFFFFFF00  0x0F00A000     \b, 240x160
>>>7     byte&0xF0          0x10           \b, 4:3
>>4      belong&0xFFFFFF00  0x0A007800     \b, 160x120
>>>7     byte&0xF0          0x10           \b, 4:3
>>4      beshort&0xFFF0     0x1600         \b, CIF
>>>5     beshort&0x0FFF     0x00F0         \b NTSC
>>>5     beshort&0x0FFF     0x0120         \b PAL
>>>7     byte&0xF0          0x20           \b, 4:3
>>>7     byte&0xF0          0x30           \b, 16:9
>>>7     byte&0xF0          0x40           \b, 11:5
>>>7     byte&0xF0          0x80           \b, PAL 4:3
>>>7     byte&0xF0          0xC0           \b, NTSC 4:3
>>>5     beshort&0x0FFF     0x0240         \b PAL 625
>>>>7    byte&0xF0          0x20           \b, 4:3
>>>>7    byte&0xF0          0x30           \b, 16:9
>>>>7    byte&0xF0          0x40           \b, 11:5
>>4      beshort&0xFFF0     0x2D00         \b, CCIR/ITU
>>>5     beshort&0x0FFF     0x01E0         \b NTSC 525
>>>5     beshort&0x0FFF     0x0240         \b PAL 625
>>>7     byte&0xF0          0x20           \b, 4:3
>>>7     byte&0xF0          0x30           \b, 16:9
>>>7     byte&0xF0          0x40           \b, 11:5
>>4      beshort&0xFFF0     0x1E00         \b, SVCD
>>>5     beshort&0x0FFF     0x01E0         \b NTSC 525
>>>5     beshort&0x0FFF     0x0240         \b PAL 625
>>>7     byte&0xF0          0x20           \b, 4:3
>>>7     byte&0xF0          0x30           \b, 16:9
>>>7     byte&0xF0          0x40           \b, 11:5
>>7      byte&0x0F          1              \b, 23.976 fps
>>7      byte&0x0F          2              \b, 24 fps
>>7      byte&0x0F          3              \b, 25 fps
>>7      byte&0x0F          4              \b, 29.97 fps
>>7      byte&0x0F          5              \b, 30 fps
>>7      byte&0x0F          6              \b, 50 fps
>>7      byte&0x0F          7              \b, 59.94 fps
>>7      byte&0x0F          8              \b, 60 fps
>>11     byte               &0x04          \b, Constrained

# MPEG ADTS Audio (*.mpx/mxa/aac)
# from dreesen@math.fu-berlin.de
# modified to fully support MPEG ADTS

# MP3, M1A
0       beshort&0xFFFE  0xFFFA         MPEG ADTS, layer III, v1
# rates
>2      byte&0xF0       0x10           \b,  32 kBits
>2      byte&0xF0       0x20           \b,  40 kBits
>2      byte&0xF0       0x30           \b,  48 kBits
>2      byte&0xF0       0x40           \b,  56 kBits
>2      byte&0xF0       0x50           \b,  64 kBits
>2      byte&0xF0       0x60           \b,  80 kBits
>2      byte&0xF0       0x70           \b,  96 kBits
>2      byte&0xF0       0x80           \b, 112 kBits
>2      byte&0xF0       0x90           \b, 128 kBits
>2      byte&0xF0       0xA0           \b, 160 kBits
>2      byte&0xF0       0xB0           \b, 192 kBits
>2      byte&0xF0       0xC0           \b, 224 kBits
>2      byte&0xF0       0xD0           \b, 256 kBits
>2      byte&0xF0       0xE0           \b, 320 kBits
# timing
>2      byte&0x0C       0x00           \b, 44.1 kHz
>2      byte&0x0C       0x04           \b, 48 kHz
>2      byte&0x0C       0x08           \b, 32 kHz
# channels/options
>3      byte&0xC0       0x00           \b, Stereo
>3      byte&0xC0       0x40           \b, JntStereo
>3      byte&0xC0       0x80           \b, 2x Monaural
>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCIT J.17

# MP2, M1A
0       beshort&0xFFFE  0xFFFC         MPEG ADTS, layer II, v1
# rates
>2      byte&0xF0       0x10           \b,  32 kBits
>2      byte&0xF0       0x20           \b,  48 kBits
>2      byte&0xF0       0x30           \b,  56 kBits
>2      byte&0xF0       0x40           \b,  64 kBits
>2      byte&0xF0       0x50           \b,  80 kBits
>2      byte&0xF0       0x60           \b,  96 kBits
>2      byte&0xF0       0x70           \b, 112 kBits
>2      byte&0xF0       0x80           \b, 128 kBits
>2      byte&0xF0       0x90           \b, 160 kBits
>2      byte&0xF0       0xA0           \b, 192 kBits
>2      byte&0xF0       0xB0           \b, 224 kBits
>2      byte&0xF0       0xC0           \b, 256 kBits
>2      byte&0xF0       0xD0           \b, 320 kBits
>2      byte&0xF0       0xE0           \b, 384 kBits
# timing
>2      byte&0x0C       0x00           \b, 44.1 kHz
>2      byte&0x0C       0x04           \b, 48 kHz
>2      byte&0x0C       0x08           \b, 32 kHz
# channels/options
>3      byte&0xC0       0x00           \b, Stereo
>3      byte&0xC0       0x40           \b, JntStereo
>3      byte&0xC0       0x80           \b, 2x Monaural
>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCIT J.17

# MPA, M1A
# updated by Joerg Jenderek
# GRR the original test are too common for many DOS files, so test 32 <= kbits <= 448
0	beshort&0xFFFE		0xFFFE	
>2	ubyte&0xF0	>0x0F		
>>2	ubyte&0xF0	<0xE1		MPEG ADTS, layer I, v1
# rate
>>>2      byte&0xF0       0x10           \b,  32 kBits
>>>2      byte&0xF0       0x20           \b,  64 kBits
>>>2      byte&0xF0       0x30           \b,  96 kBits
>>>2      byte&0xF0       0x40           \b, 128 kBits
>>>2      byte&0xF0       0x50           \b, 160 kBits
>>>2      byte&0xF0       0x60           \b, 192 kBits
>>>2      byte&0xF0       0x70           \b, 224 kBits
>>>2      byte&0xF0       0x80           \b, 256 kBits
>>>2      byte&0xF0       0x90           \b, 288 kBits
>>>2      byte&0xF0       0xA0           \b, 320 kBits
>>>2      byte&0xF0       0xB0           \b, 352 kBits
>>>2      byte&0xF0       0xC0           \b, 384 kBits
>>>2      byte&0xF0       0xD0           \b, 416 kBits
>>>2      byte&0xF0       0xE0           \b, 448 kBits
# timing
>>>2      byte&0x0C       0x00           \b, 44.1 kHz
>>>2      byte&0x0C       0x04           \b, 48 kHz
>>>2      byte&0x0C       0x08           \b, 32 kHz
# channels/options
>>>3      byte&0xC0       0x00           \b, Stereo
>>>3      byte&0xC0       0x40           \b, JntStereo
>>>3      byte&0xC0       0x80           \b, 2x Monaural
>>>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCITT J.17

# MP3, M2A
0       beshort&0xFFFE  0xFFF2         MPEG ADTS, layer III, v2
# rate
>2      byte&0xF0       0x10           \b,   8 kBits
>2      byte&0xF0       0x20           \b,  16 kBits
>2      byte&0xF0       0x30           \b,  24 kBits
>2      byte&0xF0       0x40           \b,  32 kBits
>2      byte&0xF0       0x50           \b,  40 kBits
>2      byte&0xF0       0x60           \b,  48 kBits
>2      byte&0xF0       0x70           \b,  56 kBits
>2      byte&0xF0       0x80           \b,  64 kBits
>2      byte&0xF0       0x90           \b,  80 kBits
>2      byte&0xF0       0xA0           \b,  96 kBits
>2      byte&0xF0       0xB0           \b, 112 kBits
>2      byte&0xF0       0xC0           \b, 128 kBits
>2      byte&0xF0       0xD0           \b, 144 kBits
>2      byte&0xF0       0xE0           \b, 160 kBits
# timing
>2      byte&0x0C       0x00           \b, 22.05 kHz
>2      byte&0x0C       0x04           \b, 24 kHz
>2      byte&0x0C       0x08           \b, 16 kHz
# channels/options
>3      byte&0xC0       0x00           \b, Stereo
>3      byte&0xC0       0x40           \b, JntStereo
>3      byte&0xC0       0x80           \b, 2x Monaural
>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCITT J.17

# MP2, M2A
0       beshort&0xFFFE  0xFFF4         MPEG ADTS, layer II, v2
# rate 
>2      byte&0xF0       0x10           \b,   8 kBits
>2      byte&0xF0       0x20           \b,  16 kBits 
>2      byte&0xF0       0x30           \b,  24 kBits
>2      byte&0xF0       0x40           \b,  32 kBits
>2      byte&0xF0       0x50           \b,  40 kBits
>2      byte&0xF0       0x60           \b,  48 kBits
>2      byte&0xF0       0x70           \b,  56 kBits
>2      byte&0xF0       0x80           \b,  64 kBits
>2      byte&0xF0       0x90           \b,  80 kBits
>2      byte&0xF0       0xA0           \b,  96 kBits
>2      byte&0xF0       0xB0           \b, 112 kBits
>2      byte&0xF0       0xC0           \b, 128 kBits
>2      byte&0xF0       0xD0           \b, 144 kBits
>2      byte&0xF0       0xE0           \b, 160 kBits
# timing
>2      byte&0x0C       0x00           \b, 22.05 kHz
>2      byte&0x0C       0x04           \b, 24 kHz
>2      byte&0x0C       0x08           \b, 16 kHz
# channels/options
>3      byte&0xC0       0x00           \b, Stereo
>3      byte&0xC0       0x40           \b, JntStereo
>3      byte&0xC0       0x80           \b, 2x Monaural
>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCITT J.17

# MPA, M2A
0       beshort&0xFFFE  0xFFF6         MPEG ADTS, layer I, v2
# rate
>2      byte&0xF0       0x10           \b,  32 kBits
>2      byte&0xF0       0x20           \b,  48 kBits
>2      byte&0xF0       0x30           \b,  56 kBits
>2      byte&0xF0       0x40           \b,  64 kBits
>2      byte&0xF0       0x50           \b,  80 kBits
>2      byte&0xF0       0x60           \b,  96 kBits
>2      byte&0xF0       0x70           \b, 112 kBits
>2      byte&0xF0       0x80           \b, 128 kBits
>2      byte&0xF0       0x90           \b, 144 kBits
>2      byte&0xF0       0xA0           \b, 160 kBits
>2      byte&0xF0       0xB0           \b, 176 kBits
>2      byte&0xF0       0xC0           \b, 192 kBits
>2      byte&0xF0       0xD0           \b, 224 kBits
>2      byte&0xF0       0xE0           \b, 256 kBits
# timing
>2      byte&0x0C       0x00           \b, 22.05 kHz
>2      byte&0x0C       0x04           \b, 24 kHz
>2      byte&0x0C       0x08           \b, 16 kHz
# channels/options
>3      byte&0xC0       0x00           \b, Stereo
>3      byte&0xC0       0x40           \b, JntStereo
>3      byte&0xC0       0x80           \b, 2x Monaural
>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCITT J.17

# MP3, M25A
0       beshort&0xFFFE  0xFFE2         MPEG ADTS, layer III,  v2.5
# rate  
>2      byte&0xF0       0x10           \b,   8 kBits
>2      byte&0xF0       0x20           \b,  16 kBits
>2      byte&0xF0       0x30           \b,  24 kBits
>2      byte&0xF0       0x40           \b,  32 kBits
>2      byte&0xF0       0x50           \b,  40 kBits
>2      byte&0xF0       0x60           \b,  48 kBits
>2      byte&0xF0       0x70           \b,  56 kBits
>2      byte&0xF0       0x80           \b,  64 kBits
>2      byte&0xF0       0x90           \b,  80 kBits
>2      byte&0xF0       0xA0           \b,  96 kBits
>2      byte&0xF0       0xB0           \b, 112 kBits
>2      byte&0xF0       0xC0           \b, 128 kBits
>2      byte&0xF0       0xD0           \b, 144 kBits
>2      byte&0xF0       0xE0           \b, 160 kBits
# timing
>2      byte&0x0C       0x00           \b, 11.025 kHz
>2      byte&0x0C       0x04           \b, 12 kHz
>2      byte&0x0C       0x08           \b, 8 kHz
# channels/options
>3      byte&0xC0       0x00           \b, Stereo
>3      byte&0xC0       0x40           \b, JntStereo
>3      byte&0xC0       0x80           \b, 2x Monaural
>3      byte&0xC0       0xC0           \b, Monaural
#>1     byte            ^0x01          \b, Data Verify
#>2     byte            &0x02          \b, Packet Pad
#>2     byte            &0x01          \b, Custom Flag
#>3     byte            &0x08          \b, Copyrighted
#>3     byte            &0x04          \b, Original Source
#>3     byte&0x03       1              \b, NR: 50/15 ms
#>3     byte&0x03       3              \b, NR: CCITT J.17

# AAC (aka MPEG-2 NBC audio) and MPEG-4 audio

# Stored AAC streams (instead of the MP4 format)
0       string          ADIF           MPEG ADIF, AAC
>4      byte            &0x80
>>13    byte            &0x10          \b, VBR
>>13    byte            ^0x10          \b, CBR
>>16    byte&0x1E       0x02           \b, single stream
>>16    byte&0x1E       0x04           \b, 2 streams
>>16    byte&0x1E       0x06           \b, 3 streams
>>16    byte            &0x08          \b, 4 or more streams
>>16    byte            &0x10          \b, 8 or more streams
>>4    byte            &0x80          \b, Copyrighted
>>13   byte            &0x40          \b, Original Source
>>13   byte            &0x20          \b, Home Flag
>4      byte            ^0x80
>>4     byte            &0x10          \b, VBR
>>4     byte            ^0x10          \b, CBR
>>7     byte&0x1E       0x02           \b, single stream
>>7     byte&0x1E       0x04           \b, 2 streams
>>7     byte&0x1E       0x06           \b, 3 streams
>>7     byte            &0x08          \b, 4 or more streams
>>7     byte            &0x10          \b, 8 or more streams
>>4    byte            &0x40          \b, Original Stream(s)
>>4    byte            &0x20          \b, Home Source

# Live or stored single AAC stream (used with MPEG-2 systems)
0       beshort&0xFFF6  0xFFF0         MPEG ADTS, AAC
>1      byte            &0x08          \b, v2
>1      byte            ^0x08          \b, v4
# profile
>>2     byte            &0xC0          \b LTP
>2      byte&0xc0       0x00           \b Main
>2      byte&0xc0       0x40           \b LC
>2      byte&0xc0       0x80           \b SSR
# timing
>2      byte&0x3c       0x00           \b, 96 kHz
>2      byte&0x3c       0x04           \b, 88.2 kHz
>2      byte&0x3c       0x08           \b, 64 kHz
>2      byte&0x3c       0x0c           \b, 48 kHz
>2      byte&0x3c       0x10           \b, 44.1 kHz
>2      byte&0x3c       0x14           \b, 32 kHz
>2      byte&0x3c       0x18           \b, 24 kHz
>2      byte&0x3c       0x1c           \b, 22.05 kHz
>2      byte&0x3c       0x20           \b, 16 kHz
>2      byte&0x3c       0x24           \b, 12 kHz
>2      byte&0x3c       0x28           \b, 11.025 kHz
>2      byte&0x3c       0x2c           \b, 8 kHz
# channels
>2      beshort&0x01c0  0x0040         \b, monaural
>2      beshort&0x01c0  0x0080         \b, stereo
>2      beshort&0x01c0  0x00c0         \b, stereo + center
>2      beshort&0x01c0  0x0100         \b, stereo+center+LFE
>2      beshort&0x01c0  0x0140         \b, surround
>2      beshort&0x01c0  0x0180         \b, surround + LFE
>2      beshort         &0x01C0        \b, surround + side
#>1     byte            ^0x01           \b, Data Verify
#>2     byte            &0x02           \b, Custom Flag
#>3     byte            &0x20           \b, Original Stream
#>3     byte            &0x10           \b, Home Source
#>3     byte            &0x08           \b, Copyrighted

# Live MPEG-4 audio streams (instead of RTP FlexMux)
0       beshort&0xFFE0  0x56E0         MPEG-4 LOAS
#>1     beshort&0x1FFF  x              \b, %u byte packet
>3      byte&0xE0       0x40
>>4     byte&0x3C       0x04           \b, single stream
>>4     byte&0x3C       0x08           \b, 2 streams
>>4     byte&0x3C       0x0C           \b, 3 streams
>>4     byte            &0x08          \b, 4 or more streams
>>4     byte            &0x20          \b, 8 or more streams
>3      byte&0xC0       0
>>4     byte&0x78       0x08           \b, single stream
>>4     byte&0x78       0x10           \b, 2 streams
>>4     byte&0x78       0x18           \b, 3 streams
>>4     byte            &0x20          \b, 4 or more streams
>>4     byte            &0x40          \b, 8 or more streams
0       beshort         0x4DE1         MPEG-4 LO-EP audio stream

# FLI animation format
4	leshort		0xAF11			FLI file
>6	leshort		x			- %d frames,
>8	leshort		x			width=%d pixels,
>10	leshort		x			height=%d pixels,
>12	leshort		x			depth=%d,
>16	leshort		x			ticks/frame=%d
# FLC animation format
4	leshort		0xAF12			FLC file
>6	leshort		x			- %d frames
>8	leshort		x			width=%d pixels,
>10	leshort		x			height=%d pixels,
>12	leshort		x			depth=%d,
>16	leshort		x			ticks/frame=%d

# DL animation format
# XXX - collision with most `mips' magic
#
# I couldn't find a real magic number for these, however, this
# -appears- to work.  Note that it might catch other files, too, so be
# careful!
#
# Note that title and author appear in the two 20-byte chunks
# at decimal offsets 2 and 22, respectively, but they are XOR'ed with
# 255 (hex FF)!  The DL format is really bad.
#
#0	byte	1	DL version 1, medium format (160x100, 4 images/screen)
#>42	byte	x	- %d screens,
#>43	byte	x	%d commands
#0	byte	2	DL version 2
#>1	byte	1	- large format (320x200,1 image/screen),
#>1	byte	2	- medium format (160x100,4 images/screen),
#>1	byte	>2	- unknown format,
#>42	byte	x	%d screens,
#>43	byte	x	%d commands
# Based on empirical evidence, DL version 3 files have several nulls following the
# \003.  Most of them start with non-null values at hex offset 0x34 or so.
#0	string	\3\0\0\0\0\0\0\0\0\0\0\0	DL version 3

# iso 13818 transport stream
#
# from Oskar Schirmer <schirmer@scara.com> Feb 3, 2001 (ISO 13818.1)
# (the following is a little bit restrictive and works fine for a stream
#  that starts with PAT properly. it won't work for stream data, that is
#  cut from an input device data right in the middle, but this shouldn't
#  disturb)
# syncbyte      8 bit	0x47
# error_ind     1 bit	-
# payload_start 1 bit	1
# priority      1 bit	-
# PID          13 bit	0x0000
# scrambling    2 bit	-
# adaptfld_ctrl 2 bit	1 or 3
# conti_count   4 bit	0
0	belong&0xFF5FFF1F	0x47400010	MPEG transport stream data
>188	byte			!0x47		CORRUPTED

# DIF digital video file format <mpruett@sgi.com>
0	belong&0xffffff00	0x1f070000      DIF
>4	byte			&0x01		(DVCPRO) movie file
>4	byte			^0x01		(DV) movie file
>3	byte			&0x80		(PAL)
>3	byte			^0x80		(NTSC)

# Microsoft Advanced Streaming Format (ASF) <mpruett@sgi.com>
0	belong			0x3026b275	Microsoft ASF

# MNG Video Format, <URL:http://www.libpng.org/pub/mng/spec/>
0	string			\x8aMNG		MNG video data,
>4	belong			!0x0d0a1a0a	CORRUPTED,
>4	belong			0x0d0a1a0a
>>16    belong	x				%ld x
>>20    belong	x				%ld

# JNG Video Format, <URL:http://www.libpng.org/pub/mng/spec/>
0	string			\x8bJNG		JNG video data,
>4	belong			!0x0d0a1a0a	CORRUPTED,
>4	belong			0x0d0a1a0a
>>16    belong	x				%ld x
>>20    belong	x				%ld

# Vivo video (Wolfram Kleff)
3	string		\x0D\x0AVersion:Vivo	Vivo video data

# VRML (Virtual Reality Modelling Language)
0       string/b        #VRML\ V1.0\ ascii	VRML 1 file
0	string/b	#VRML\ V2.0\ utf8	ISO/IEC 14772 VRML 97 file

#---------------------------------------------------------------------------
# HVQM4: compressed movie format designed by Hudson for Nintendo GameCube
# From Mark Sheppard <msheppard@climax.co.uk>, 2002-10-03
#
0	string		HVQM4		%s
>6	string		>\0		v%s
>0	byte		x		GameCube movie,
>0x34	ubeshort	x		%d x
>0x36	ubeshort	x		%d,
>0x26	ubeshort	x		%dµs,
>0x42	ubeshort	0		no audio
>0x42	ubeshort	>0		%dHz audio

# From: "Stefan A. Haubenthal" <polluks@web.de>
0	string		DVDVIDEO-VTS	Video title set,
>0x21	byte		x		v%x
0	string		DVDVIDEO-VMG	Video manager,
>0x21	byte		x		v%x

#------------------------------------------------------------------------------
# apl:  file(1) magic for APL (see also "pdp" and "vax" for other APL
#       workspaces)
#
0	long		0100554		APL workspace (Ken's original?)

#------------------------------------------------------------------------------
# apple:  file(1) magic for Apple file formats
#
0	string		FiLeStArTfIlEsTaRt	binscii (apple ][) text
0	string		\x0aGL			Binary II (apple ][) data
0	string		\x76\xff		Squeezed (apple ][) data
0	string		NuFile			NuFile archive (apple ][) data
0	string		N\xf5F\xe9l\xe5		NuFile archive (apple ][) data
0	belong		0x00051600		AppleSingle encoded Macintosh file
0	belong		0x00051607		AppleDouble encoded Macintosh file

# magic for Newton PDA package formats
# from Ruda Moura <ruda@helllabs.org>
0	string	package0	Newton package, NOS 1.x,
>12	belong	&0x80000000	AutoRemove,
>12	belong	&0x40000000	CopyProtect,
>12	belong	&0x10000000	NoCompression,
>12	belong	&0x04000000	Relocation,
>12	belong	&0x02000000	UseFasterCompression,
>16	belong	x		version %d

0	string	package1	Newton package, NOS 2.x,
>12	belong	&0x80000000	AutoRemove,
>12	belong	&0x40000000	CopyProtect,
>12	belong	&0x10000000	NoCompression,
>12	belong	&0x04000000	Relocation,
>12	belong	&0x02000000	UseFasterCompression,
>16	belong	x		version %d

0	string	package4	Newton package,
>8	byte	8		NOS 1.x,
>8	byte	9		NOS 2.x,
>12	belong	&0x80000000	AutoRemove,
>12	belong	&0x40000000	CopyProtect,
>12	belong	&0x10000000	NoCompression,

# The following entries for the Apple II are for files that have
# been transferred as raw binary data from an Apple, without having
# been encapsulated by any of the above archivers.
#
# In general, Apple II formats are hard to identify because Apple DOS
# and especially Apple ProDOS have strong typing in the file system and
# therefore programmers never felt much need to include type information
# in the files themselves.
#
# Eric Fischer <enf@pobox.com>

# AppleWorks word processor:
#
# This matches the standard tab stops for an AppleWorks file, but if
# a file has a tab stop set in the first four columns this will fail.
#
# The "O" is really the magic number, but that's so common that it's
# necessary to check the tab stops that follow it to avoid false positives.

4       string          O====   AppleWorks word processor data
>85     byte&0x01       >0      \b, zoomed
>90     byte&0x01       >0      \b, paginated
>92     byte&0x01       >0      \b, with mail merge
#>91    byte            x       \b, left margin %d

# AppleWorks database:
#
# This isn't really a magic number, but it's the closest thing to one
# that I could find.  The 1 and 2 really mean "order in which you defined
# categories" and "left to right, top to bottom," respectively; the D and R
# mean that the cursor should move either down or right when you press Return.

#30	string		\x01D	AppleWorks database data
#30	string		\x02D	AppleWorks database data
#30	string		\x01R	AppleWorks database data
#30	string		\x02R	AppleWorks database data

# AppleWorks spreadsheet:
#
# Likewise, this isn't really meant as a magic number.  The R or C means
# row- or column-order recalculation; the A or M means automatic or manual
# recalculation.

#131	string		RA	AppleWorks spreadsheet data
#131	string		RM	AppleWorks spreadsheet data
#131	string		CA	AppleWorks spreadsheet data
#131	string		CM	AppleWorks spreadsheet data

# Applesoft BASIC:
#
# This is incredibly sloppy, but will be true if the program was
# written at its usual memory location of 2048 and its first line
# number is less than 256.  Yuck.

0       belong&0xff00ff 0x80000 Applesoft BASIC program data
#>2     leshort         x       \b, first line number %d

# ORCA/EZ assembler:
# 
# This will not identify ORCA/M source files, since those have
# some sort of date code instead of the two zero bytes at 6 and 7
# XXX Conflicts with ELF
#4       belong&0xff00ffff       0x01000000      ORCA/EZ assembler source data
#>5      byte                    x               \b, build number %d

# Broderbund Fantavision
#
# I don't know what these values really mean, but they seem to recur.
# Will they cause too many conflicts?

# Probably :-)
#2	belong&0xFF00FF		0x040008	Fantavision movie data

# Some attempts at images.
#
# These are actually just bit-for-bit dumps of the frame buffer, so
# there's really no reasonable way to distinguish them except for their
# address (if preserved) -- 8192 or 16384 -- and their length -- 8192
# or, occasionally, 8184.
#
# Nevertheless this will manage to catch a lot of images that happen
# to have a solid-colored line at the bottom of the screen.

8144	string	\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F	Apple II image with white background
8144	string	\x55\x2A\x55\x2A\x55\x2A\x55\x2A	Apple II image with purple background
8144	string	\x2A\x55\x2A\x55\x2A\x55\x2A\x55	Apple II image with green background
8144	string	\xD5\xAA\xD5\xAA\xD5\xAA\xD5\xAA	Apple II image with blue background
8144	string	\xAA\xD5\xAA\xD5\xAA\xD5\xAA\xD5	Apple II image with orange background

# Beagle Bros. Apple Mechanic fonts

0	belong&0xFF00FFFF	0x6400D000	Apple Mechanic font

# Apple Universal Disk Image Format (UDIF) - dmg files.
# From Johan Gade.
# These entries are disabled for now until we fix the following issues.
#
# Note there might be some problems with the "VAX COFF executable" 
# entry. Note this entry should be placed before the mac filesystem section, 
# particularly the "Apple Partition data" entry.
#
# The intended meaning of these tests is, that the file is only of the 
# specified type if both of the lines are correct - i.e. if the first
# line matches and the second doesn't then it is not of that type.
#
#0	long	0x7801730d
#>4	long	0x62626060	UDIF read-only zlib-compressed image (UDZO)
#
# Note that this entry is recognized correctly by the "Apple Partition 
# data" entry - however since this entry is more specific - this
# information seems to be more useful.
#0	long	0x45520200
#>0x410	string	disk\ image	UDIF read/write image (UDRW)

# From: Toby Peterson <toby@apple.com>
0	string	bplist00	Apple binary property list

# Apple binary property list (bplist)
#  Assumes version bytes are hex.
#  Provides content hints for version 0 files. Assumes that the root
#  object is the first object (true for CoreFoundation implementation).
# From: David Remahl <dremahl@apple.com>
0		string	bplist
>6		byte	x	\bCoreFoundation binary property list data, version 0x%c
>>7		byte	x	\b%c
>6		string		00		\b
>>8		byte&0xF0	0x00	\b
>>>8	byte&0x0F	0x00	\b, root type: null
>>>8	byte&0x0F	0x08	\b, root type: false boolean
>>>8	byte&0x0F	0x09	\b, root type: true boolean
>>8		byte&0xF0	0x10	\b, root type: integer
>>8		byte&0xF0	0x20	\b, root type: real
>>8		byte&0xF0	0x30	\b, root type: date
>>8		byte&0xF0	0x40    \b, root type: data
>>8		byte&0xF0	0x50	\b, root type: ascii string
>>8		byte&0xF0	0x60	\b, root type: unicode string
>>8		byte&0xF0	0x80	\b, root type: uid (CORRUPT)
>>8		byte&0xF0	0xa0	\b, root type: array
>>8		byte&0xF0	0xd0	\b, root type: dictionary

# Apple/NeXT typedstream data
#  Serialization format used by NeXT and Apple for various
#  purposes in YellowStep/Cocoa, including some nib files.
# From: David Remahl <dremahl@apple.com>
2		string		typedstream	NeXT/Apple typedstream data, big endian
>0		byte		x		\b, version %hhd
>0		byte		<5		\b
>>13	byte		0x81	\b
>>>14	ubeshort	x		\b, system %hd
2		string		streamtyped NeXT/Apple typedstream data, little endian
>0		byte		x		\b, version %hhd
>0		byte		<5		\b
>>13	byte		0x81	\b
>>>14	uleshort	x		\b, system %hd

#------------------------------------------------------------------------------
# applix:  file(1) magic for Applixware
# From: Peter Soos <sp@osb.hu>
#
0	string		*BEGIN		Applixware
>7	string		WORDS			Words Document
>7	string		GRAPHICS		Graphic
>7	string		RASTER			Bitmap
>7	string		SPREADSHEETS		Spreadsheet
>7	string		MACRO			Macro
>7	string		BUILDER			Builder Object

#------------------------------------------------------------------------------
# archive:  file(1) magic for archive formats (see also "msdos" for self-
#           extracting compressed archives)
#
# cpio, ar, arc, arj, hpack, lha/lharc, rar, squish, uc2, zip, zoo, etc.
# pre-POSIX "tar" archives are handled in the C code.

# POSIX tar archives
257	string		ustar\0		POSIX tar archive
257	string		ustar\040\040\0	GNU tar archive

# cpio archives
#
# Yes, the top two "cpio archive" formats *are* supposed to just be "short".
# The idea is to indicate archives produced on machines with the same
# byte order as the machine running "file" with "cpio archive", and
# to indicate archives produced on machines with the opposite byte order
# from the machine running "file" with "byte-swapped cpio archive".
#
# The SVR4 "cpio(4)" hints that there are additional formats, but they
# are defined as "short"s; I think all the new formats are
# character-header formats and thus are strings, not numbers.
0	short		070707		cpio archive
0	short		0143561		byte-swapped cpio archive
0	string		070707		ASCII cpio archive (pre-SVR4 or odc)
0	string		070701		ASCII cpio archive (SVR4 with no CRC)
0	string		070702		ASCII cpio archive (SVR4 with CRC)

# Debian package (needs to go before regular portable archives)
#
0	string		=!<arch>\ndebian
>8	string		debian-split	part of multipart Debian package
>8	string		debian-binary	Debian binary package
>68	string		>\0		(format %s)
# These next two lines do not work, because a bzip2 Debian archive
# still uses gzip for the control.tar (first in the archive).  Only
# data.tar varies, and the location of its filename varies too.
# file/libmagic does not currently have support for ascii-string based
# (offsets) as of 2005-09-15.
#>81	string		bz2		\b, uses bzip2 compression
#>84	string		gz		\b, uses gzip compression
#>136	ledate		x		created: %s

# other archives
0	long		0177555		very old archive
0	short		0177555		very old PDP-11 archive
0	long		0177545		old archive
0	short		0177545		old PDP-11 archive
0	long		0100554		apl workspace
0	string		=<ar>		archive

# MIPS archive (needs to go before regular portable archives)
#
0	string	=!<arch>\n__________E	MIPS archive
>20	string	U			with MIPS Ucode members
>21	string	L			with MIPSEL members
>21	string	B			with MIPSEB members
>19	string	L			and an EL hash table
>19	string	B			and an EB hash table
>22	string	X			-- out of date

0	string		-h-		Software Tools format archive text

#
# XXX - why are there multiple <ar> thingies?  Note that 0x213c6172 is
# "!<ar", so, for new-style (4.xBSD/SVR2andup) archives, we have:
#
# 0	string		=!<arch>		current ar archive
# 0	long		0x213c6172	archive file
#
# and for SVR1 archives, we have:
#
# 0	string		\<ar>		System V Release 1 ar archive
# 0	string		=<ar>		archive
#
# XXX - did Aegis really store shared libraries, breakpointed modules,
# and absolute code program modules in the same format as new-style
# "ar" archives?
#
0	string		=!<arch>		current ar archive
>8	string		__.SYMDEF	random library
>0	belong		=65538		- pre SR9.5
>0	belong		=65539		- post SR9.5
>0	beshort		2		- object archive
>0	beshort		3		- shared library module
>0	beshort		4		- debug break-pointed module
>0	beshort		5		- absolute code program module
0	string		\<ar>		System V Release 1 ar archive
0	string		=<ar>		archive
#
# XXX - from "vax", which appears to collect a bunch of byte-swapped
# thingies, to help you recognize VAX files on big-endian machines;
# with "leshort", "lelong", and "string", that's no longer necessary....
#
0	belong		0x65ff0000	VAX 3.0 archive
0	belong		0x3c61723e	VAX 5.0 archive
#
0	long		0x213c6172	archive file
0	lelong		0177555		very old VAX archive
0	leshort		0177555		very old PDP-11 archive
#
# XXX - "pdp" claims that 0177545 can have an __.SYMDEF member and thus
# be a random library (it said 0xff65 rather than 0177545).
#
0	lelong		0177545		old VAX archive
>8	string		__.SYMDEF	random library
0	leshort		0177545		old PDP-11 archive
>8	string		__.SYMDEF	random library
#
# From "pdp" (but why a 4-byte quantity?)
#
0	lelong		0x39bed		PDP-11 old archive
0	lelong		0x39bee		PDP-11 4.0 archive

# ARC archiver, from Daniel Quinlan (quinlan@yggdrasil.com)
#
# The first byte is the magic (0x1a), byte 2 is the compression type for
# the first file (0x01 through 0x09), and bytes 3 to 15 are the MS-DOS
# filename of the first file (null terminated).  Since some types collide
# we only test some types on basis of frequency: 0x08 (83%), 0x09 (5%),
# 0x02 (5%), 0x03 (3%), 0x04 (2%), 0x06 (2%).  0x01 collides with terminfo.
0	lelong&0x8080ffff	0x0000081a	ARC archive data, dynamic LZW
0	lelong&0x8080ffff	0x0000091a	ARC archive data, squashed
0	lelong&0x8080ffff	0x0000021a	ARC archive data, uncompressed
0	lelong&0x8080ffff	0x0000031a	ARC archive data, packed
0	lelong&0x8080ffff	0x0000041a	ARC archive data, squeezed
0	lelong&0x8080ffff	0x0000061a	ARC archive data, crunched
# [JW] stuff taken from idarc, obviously ARC successors:
0	lelong&0x8080ffff	0x00000a1a	PAK archive data
0	lelong&0x8080ffff	0x0000141a	ARC+ archive data
0	lelong&0x8080ffff	0x0000481a	HYP archive data

# Acorn archive formats (Disaster prone simpleton, m91dps@ecs.ox.ac.uk)
# I can't create either SPARK or ArcFS archives so I have not tested this stuff
# [GRR:  the original entries collide with ARC, above; replaced with combined
#  version (not tested)]
#0	byte		0x1a		RISC OS archive (spark format)
0	string		\032archive	RISC OS archive (ArcFS format)
0       string          Archive\000     RISC OS archive (ArcFS format)

# All these were taken from idarc, many could not be verified. Unfortunately,
# there were many low-quality sigs, i.e. easy to trigger false positives.
# Please notify me of any real-world fishy/ambiguous signatures and I'll try
# to get my hands on the actual archiver and see if I find something better. [JW]
# probably many can be enhanced by finding some 0-byte or control char near the start

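For anyone wondering how entries like the ones above are actually applied: each top-level line boils down to "go to this offset, read this type, optionally mask it, compare with this value". Here's a rough Python sketch of that idea using a handful of entries copied by hand from the listing above. To be clear, this is not how file(1) or libmagic is implemented; the names `RULES`, `rule_matches` and `identify` are made up for illustration, and it ignores continuation lines (`>`), the `x` wildcard and every other feature of the real syntax.

```python
import struct

# (offset, kind, mask, expected, description)
# A hand-copied subset of the entries above, not a parser for magic(5).
RULES = [
    (257, "string", None, b"ustar\0", "POSIX tar archive"),
    (257, "string", None, b"ustar  \0", "GNU tar archive"),
    (0, "string", None, b"070707", "ASCII cpio archive (pre-SVR4 or odc)"),
    (0, "string", None, b"bplist00", "Apple binary property list"),
    (0, "belong", None, 0x3026B275, "Microsoft ASF"),
    (0, "beshort", 0xFFFE, 0xFFF2, "MPEG ADTS, layer III, v2"),
]

# struct format strings for the numeric types used in RULES
FORMATS = {"beshort": ">H", "belong": ">I", "leshort": "<H", "lelong": "<I"}


def rule_matches(data, offset, kind, mask, expected):
    """Return True if one rule matches the given bytes."""
    if kind == "string":
        return data[offset:offset + len(expected)] == expected
    fmt = FORMATS[kind]
    size = struct.calcsize(fmt)
    chunk = data[offset:offset + size]
    if len(chunk) < size:
        return False  # file too short for this test
    (value,) = struct.unpack(fmt, chunk)
    if mask is not None:
        value &= mask
    return value == expected


def identify(data):
    """Return the description of the first matching rule, or 'data'."""
    for offset, kind, mask, expected, description in RULES:
        if rule_matches(data, offset, kind, mask, expected):
            return description
    return "data"


if __name__ == "__main__":
    fake_tar = b"\0" * 257 + b"ustar\0" + b"00"
    print(identify(fake_tar))              # POSIX tar archive
    print(identify(b"\xff\xf3\x90\x00"))   # MPEG ADTS, layer III, v2
    print(identify(b"hello"))              # data
```

The first match wins, which is why the comments in the file keep saying things like "needs to go before regular portable archives": ordering matters when several signatures overlap.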
/usr/share/misc/magic.mime
# Magic data for KMimeMagic (originally for file(1) command)
#
# The format is 4-5 columns:
#    Column #1: byte number to begin checking from, ">" indicates continuation
#    Column #2: type of data to match
#    Column #3: contents of data to match
#    Column #4: MIME type of result
#    Column #5: MIME encoding of result (optional)

#------------------------------------------------------------------------------
# Localstuff:  file(1) magic for locally observed files
# Add any locally observed files here.

# Real Audio (Magic .ra\0375)
0	belong		0x2e7261fd	audio/x-pn-realaudio
0	string		.RMF		application/vnd.rn-realmedia

#video/x-pn-realvideo
#video/vnd.rn-realvideo
#application/vnd.rn-realmedia
#	sigh, there are many mimes for that but the above are the most common.

# Taken from magic, converted to magic.mime
# mime types according to http://www.geocities.com/nevilo/mod.htm:
#	audio/it	.it
#	audio/x-zipped-it	.itz
#	audio/xm	fasttracker modules
#	audio/x-s3m	screamtracker modules
#	audio/s3m	screamtracker modules
#	audio/x-zipped-mod	mdz
#	audio/mod	mod
#	audio/x-mod	All modules (mod, s3m, 669, mtm, med, xm, it, mdz, stm, itz, xmz, s3z)

# Taken from loader code from mikmod version 2.14
# by Steve McIntyre (stevem@chiark.greenend.org.uk)
# <doj@cubic.org> added title printing on 2003-06-24
0	string	MAS_UTrack_V00
>14	string	>\0		audio/x-mod
#audio/x-tracker-module

#0	string	UN05		MikMod UNI format module sound data

0	string	Extended\ Module: audio/x-mod
#audio/x-tracker-module
##>17	string	>\0		Title: "%s"

21	string/c	\!SCREAM!	audio/x-mod
#audio/x-screamtracker-module
21	string	BMOD2STM	audio/x-mod
#audio/x-screamtracker-module
1080	string	M.K.		audio/x-mod
#audio/x-protracker-module
#>0	string	>\0		Title: "%s"
1080	string	M!K!		audio/x-mod
#audio/x-protracker-module
#>0	string	>\0		Title: "%s"
1080	string	FLT4		audio/x-mod
#audio/x-startracker-module
#>0	string	>\0		Title: "%s"
1080	string	FLT8		audio/x-mod
#audio/x-startracker-module
#>0	string	>\0		Title: "%s"
1080	string	4CHN		audio/x-mod
#audio/x-fasttracker-module
#>0	string	>\0		Title: "%s"
1080	string	6CHN		audio/x-mod
#audio/x-fasttracker-module
#>0	string	>\0		Title: "%s"
1080	string	8CHN		audio/x-mod
#audio/x-fasttracker-module
#>0	string	>\0		Title: "%s"
1080	string	CD81		audio/x-mod
#audio/x-oktalyzer-tracker-module
#>0	string	>\0		Title: "%s"
1080	string	OKTA		audio/x-mod
#audio/x-oktalyzer-tracker-module
#>0	string	>\0		Title: "%s"
# Not good enough.
#1082	string	CH
#>1080	string	>/0		%.2s-channel Fasttracker "oktalyzer" module sound data
1080	string	16CN		audio/x-mod
#audio/x-taketracker-module
#>0	string	>\0		Title: "%s"
1080	string	32CN		audio/x-mod
#audio/x-taketracker-module
#>0	string	>\0		Title: "%s"

# Impulse tracker module (it)
0	string		IMPM		audio/x-mod
#>4	string		>\0		"%s"
#>40	leshort		!0		compatible w/ITv%x
#>42	leshort		!0		created w/ITv%x

#------------------------------------------------------------------------------
# end local stuff
#------------------------------------------------------------------------------

# xml based formats!

# svg

38	string		\<\!DOCTYPE\040svg	image/svg+xml


# xml
0	string		\<?xml			text/xml


#------------------------------------------------------------------------------
# Java

0	beshort		0xcafe
>2	beshort		0xbabe		application/java

#------------------------------------------------------------------------------
# audio:  file(1) magic for sound formats
#
# from Jan Nicolai Langfeldt <janl@ifi.uio.no>,
#

# Sun/NeXT audio data
0	string		.snd
>12	belong		1		audio/basic
>12	belong		2		audio/basic
>12	belong		3		audio/basic
>12	belong		4		audio/basic
>12	belong		5		audio/basic
>12	belong		6		audio/basic
>12	belong		7		audio/basic

>12	belong		23		audio/x-adpcm

# DEC systems (e.g. DECstation 5000) use a variant of the Sun/NeXT format
# that uses little-endian encoding and has a different magic number
# (0x0064732E in little-endian encoding).
0	lelong		0x0064732E
>12	lelong		1		audio/x-dec-basic
>12	lelong		2		audio/x-dec-basic
>12	lelong		3		audio/x-dec-basic
>12	lelong		4		audio/x-dec-basic
>12	lelong		5		audio/x-dec-basic
>12	lelong		6		audio/x-dec-basic
>12	lelong		7		audio/x-dec-basic
#                                       compressed (G.721 ADPCM)
>12	lelong		23		audio/x-dec-adpcm

# Bytes 0-3 of AIFF, AIFF-C, & 8SVX audio files are "FORM"
#					AIFF audio data
8	string		AIFF		audio/x-aiff
#					AIFF-C audio data
8	string		AIFC		audio/x-aiff
#					IFF/8SVX audio data
8	string		8SVX		audio/x-aiff



# Creative Labs AUDIO stuff
#					Standard MIDI data
0	string	MThd			audio/unknown
#>9 	byte	>0			(format %d)
#>11	byte	>1			using %d channels
#					Creative Music (CMF) data
0	string	CTMF			audio/unknown
#					SoundBlaster instrument data
0	string	SBI			audio/unknown
#					Creative Labs voice data
0	string	Creative\ Voice\ File	audio/unknown
## is this next line right?  it came this way...
#>19	byte	0x1A
#>23	byte	>0			- version %d
#>22	byte	>0			\b.%d

# [GRR 950115:  is this also Creative Labs?  Guessing that first line
#  should be string instead of unknown-endian long...]
#0	long		0x4e54524b	MultiTrack sound data
#0	string		NTRK		MultiTrack sound data
#>4	long		x		- version %ld

# Microsoft WAVE format (*.wav)
# [GRR 950115:  probably all of the shorts and longs should be leshort/lelong]
#					Microsoft RIFF
0	string		RIFF
#					- WAVE format
>8	string		WAVE		audio/x-wav
>8	string/B	AVI		video/x-msvideo
#
>8 	string		CDRA		image/x-coreldraw

# AAC (aka MPEG-2 NBC)
0       beshort&0xfff6    0xfff0          audio/X-HX-AAC-ADTS
0       string          ADIF            audio/X-HX-AAC-ADIF
0       beshort&0xffe0  0x56e0          audio/MP4A-LATM
0       beshort         0x4De1          audio/MP4A-LATM

# MPEG Layer 3 sound files
# modified by Joerg Jenderek
# GRR the original test are too common for many DOS files
# so test 1 <= kbits nibble <= E
0       beshort		&0xffe0		
>2	ubyte&0xF0	>0x0F		
>>2	ubyte&0xF0	<0xE1		audio/mpeg
#MP3 with ID3 tag
0	string		ID3		audio/mpeg
# Ogg/Vorbis
0	string		OggS		application/ogg

#------------------------------------------------------------------------------
# c-lang:  file(1) magic for C programs or various scripts
#

# XPM icons (Greg Roelofs, newt@uchicago.edu)
# ideally should go into "images", but entries below would tag XPM as C source
0	string		/*\ XPM		image/x-xpmi 7bit

# 3DS (3d Studio files) Conflicts with diff output 0x3d '='
#16	beshort		0x3d3d		image/x-3ds

# this first will upset you if you're a PL/1 shop... (are there any left?)
# in which case rm it; ascmagic will catch real C programs
#					C or REXX program text
#0	string		/*		text/x-c
#					C++ program text
#0	string		//		text/x-c++

#------------------------------------------------------------------------------
# commands:  file(1) magic for various shells and interpreters
#
#0       string          :\ shell archive or commands for antique kernel text
0       string          #!/bin/sh               application/x-shellscript
0       string          #!\ /bin/sh             application/x-shellscript
0       string          #!/bin/csh              application/x-shellscript
0       string          #!\ /bin/csh            application/x-shellscript
# korn shell magic, sent by George Wu, gwu@clyde.att.com
0       string          #!/bin/ksh              application/x-shellscript
0       string          #!\ /bin/ksh            application/x-shellscript
0       string          #!/bin/tcsh             application/x-shellscript
0       string          #!\ /bin/tcsh           application/x-shellscript
0       string          #!/usr/local/tcsh       application/x-shellscript
0       string          #!\ /usr/local/tcsh     application/x-shellscript
0       string          #!/usr/local/bin/tcsh   application/x-shellscript
0       string          #!\ /usr/local/bin/tcsh application/x-shellscript
# bash shell magic, from Peter Tobias (tobias@server.et-inf.fho-emden.de)
0       string          #!/bin/bash     		application/x-shellscript
0       string          #!\ /bin/bash           application/x-shellscript
0       string          #!/usr/local/bin/bash   application/x-shellscript
0       string          #!\ /usr/local/bin/bash application/x-shellscript

#
# zsh/ash/ae/nawk/gawk magic from cameron@cs.unsw.oz.au (Cameron Simpson)
0       string          #!/bin/zsh	        application/x-shellscript
0       string          #!/usr/bin/zsh	        application/x-shellscript
0       string          #!/usr/local/bin/zsh    application/x-shellscript
0       string          #!\ /usr/local/bin/zsh  application/x-shellscript
0       string          #!/usr/local/bin/ash    application/x-shellscript
0       string          #!\ /usr/local/bin/ash  application/x-shellscript
#0       string          #!/usr/local/bin/ae     Neil Brown's ae
#0       string          #!\ /usr/local/bin/ae   Neil Brown's ae
0       string          #!/bin/nawk             application/x-nawk
0       string          #!\ /bin/nawk           application/x-nawk
0       string          #!/usr/bin/nawk         application/x-nawk
0       string          #!\ /usr/bin/nawk       application/x-nawk
0       string          #!/usr/local/bin/nawk   application/x-nawk
0       string          #!\ /usr/local/bin/nawk application/x-nawk
0       string          #!/bin/gawk             application/x-gawk
0       string          #!\ /bin/gawk           application/x-gawk
0       string          #!/usr/bin/gawk         application/x-gawk
0       string          #!\ /usr/bin/gawk       application/x-gawk
0       string          #!/usr/local/bin/gawk   application/x-gawk
0       string          #!\ /usr/local/bin/gawk application/x-gawk
#
0       string          #!/bin/awk              application/x-awk
0       string          #!\ /bin/awk            application/x-awk
0       string          #!/usr/bin/awk          application/x-awk
0       string          #!\ /usr/bin/awk        application/x-awk
# update to distinguish from *.vcf files by Joerg Jenderek: joerg dot jenderek at web dot de
0	regex		BEGIN[[:space:]]*[{]	application/x-awk

# For Larry Wall's perl language.  The ``eval'' line recognizes an
# outrageously clever hack for USG systems.
#                               Keith Waclena <keith@cerberus.uchicago.edu>
0       string          #!/bin/perl                     application/x-perl
0       string          #!\ /bin/perl                   application/x-perl
0       string          eval\ "exec\ /bin/perl          application/x-perl
0       string          #!/usr/bin/perl                 application/x-perl
0       string          #!\ /usr/bin/perl               application/x-perl
0       string          eval\ "exec\ /usr/bin/perl      application/x-perl
0       string          #!/usr/local/bin/perl           application/x-perl
0       string          #!\ /usr/local/bin/perl         application/x-perl
0       string          eval\ "exec\ /usr/local/bin/perl application/x-perl

#------------------------------------------------------------------------------
# compress:  file(1) magic for pure-compression formats (no archives)
#
# compress, gzip, pack, compact, huf, squeeze, crunch, freeze, yabba, whap, etc.
#
# Formats for various forms of compressed data
# Formats for "compress" proper have been moved into "compress.c",
# because it tries to uncompress it to figure out what's inside.

# standard unix compress
0	string		\037\235	application/x-compress

# gzip (GNU zip, not to be confused with [Info-ZIP/PKWARE] zip archiver)
0       string          \037\213        application/x-gzip

0		string			PK\003\004		application/x-zip

# RAR archiver (Greg Roelofs, newt@uchicago.edu)
0	string		Rar!		application/x-rar

# According to gzip.h, this is the correct byte order for packed data.
0	string		\037\036	application/octet-stream
#
# This magic number is byte-order-independent.
#
0	short		017437		application/octet-stream

# XXX - why *two* entries for "compacted data", one of which is
# byte-order independent, and one of which is byte-order dependent?
#
# compacted data
0	short		0x1fff		application/octet-stream
0	string		\377\037	application/octet-stream
# huf output
0	short		0145405		application/octet-stream

# Squeeze and Crunch...
# These numbers were gleaned from the Unix versions of the programs to
# handle these formats.  Note that I can only uncrunch, not crunch, and
# I didn't have a crunched file handy, so the crunch number is untested.
#				Keith Waclena <keith@cerberus.uchicago.edu>
#0	leshort		0x76FF		squeezed data (CP/M, DOS)
#0	leshort		0x76FE		crunched data (CP/M, DOS)

# Freeze
#0	string		\037\237	Frozen file 2.1
#0	string		\037\236	Frozen file 1.0 (or gzip 0.5)

# lzh?
#0	string		\037\240	LZH compressed data

257	string		ustar\0		application/x-tar	posix
257	string		ustar\040\040\0		application/x-tar	gnu

0	short		070707		application/x-cpio
0	short		0143561		application/x-cpio	swapped

0	string		=<ar>		application/x-archive
0	string		\!<arch>	application/x-archive
>8	string		debian		application/x-debian-package

#------------------------------------------------------------------------------
#
# RPM: file(1) magic for Red Hat Packages   Erik Troan (ewt@redhat.com)
#
0       beshort         0xedab
>2      beshort         0xeedb          application/x-rpm

0	lelong&0x8080ffff	0x0000081a	application/x-arc	lzw
0	lelong&0x8080ffff	0x0000091a	application/x-arc	squashed
0	lelong&0x8080ffff	0x0000021a	application/x-arc	uncompressed
0	lelong&0x8080ffff	0x0000031a	application/x-arc	packed
0	lelong&0x8080ffff	0x0000041a	application/x-arc	squeezed
0	lelong&0x8080ffff	0x0000061a	application/x-arc	crunched

0	leshort	0xea60	application/x-arj

# LHARC/LHA archiver (Greg Roelofs, newt@uchicago.edu)
2	string	-lh0-	application/x-lharc	lh0
2	string	-lh1-	application/x-lharc	lh1
2	string	-lz4-	application/x-lharc	lz4
2	string	-lz5-	application/x-lharc	lz5
#	[never seen any but the last; -lh4- reported in comp.compression:]
2	string	-lzs-	application/x-lha	lzs
2	string	-lh\ -	application/x-lha	lh
2	string	-lhd-	application/x-lha	lhd
2	string	-lh2-	application/x-lha	lh2
2	string	-lh3-	application/x-lha	lh3
2	string	-lh4-	application/x-lha	lh4
2	string	-lh5-	application/x-lha	lh5
2	string	-lh6-	application/x-lha	lh6
2	string	-lh7-	application/x-lha	lh7
# Shell archives
10	string	#\ This\ is\ a\ shell\ archive	application/octet-stream	x-shell

#------------------------------------------------------------------------------
# frame:  file(1) magic for FrameMaker files
#
# This stuff came on a FrameMaker demo tape, most of which is
# copyright, but this file is "published" as witness the following:
#
0	string		\<MakerFile	application/x-frame
0	string		\<MIFFile	application/x-frame
0	string		\<MakerDictionary	application/x-frame
0	string		\<MakerScreenFon	application/x-frame
0	string		\<MML		application/x-frame
0	string		\<Book		application/x-frame
0	string		\<Maker		application/x-frame

#------------------------------------------------------------------------------
# html:  file(1) magic for HTML (HyperText Markup Language) docs
#
# from Daniel Quinlan <quinlan@yggdrasil.com>
#
0	string		\<HEAD	text/html
0	string		\<head	text/html
0	string		\<TITLE	text/html
0	string		\<title	text/html
0       string          \<html	text/html
0       string          \<HTML	text/html
0	string		\<!--	text/html
0	string		\<h1	text/html
0	string		\<H1	text/html
0	string/c	\<!doctype\ html	text/html

#------------------------------------------------------------------------------
# images:  file(1) magic for image formats (see also "c-lang" for XPM bitmaps)
#
# originally from jef@helios.ee.lbl.gov (Jef Poskanzer),
# additions by janl@ifi.uio.no as well as others. Jan also suggested
# merging several one- and two-line files into here.
#
# XXX - byte order for GIF and TIFF fields?
# [GRR:  TIFF allows both byte orders; GIF is probably little-endian]
#

# [GRR:  what the hell is this doing in here?]
#0	string		xbtoa		btoa'd file

# PBMPLUS
#					PBM file
0	string		P1		image/x-portable-bitmap	7bit
#					PGM file
0	string		P2		image/x-portable-greymap	7bit
#					PPM file
0	string		P3		image/x-portable-pixmap	7bit
#					PBM "rawbits" file
0	string		P4		image/x-portable-bitmap
#					PGM "rawbits" file
0	string		P5		image/x-portable-greymap
#					PPM "rawbits" file
0	string		P6		image/x-portable-pixmap

# NIFF (Navy Interchange File Format, a modification of TIFF)
# [GRR:  this *must* go before TIFF]
0	string		IIN1		image/x-niff

# TIFF and friends
#					TIFF file, big-endian
0	string		MM		image/tiff
#					TIFF file, little-endian
0	string		II		image/tiff

# possible GIF replacements; none yet released!
# (Greg Roelofs, newt@uchicago.edu)
#
# GRR 950115:  this was mine ("Zip GIF"):
#					ZIF image (GIF+deflate alpha)
0	string		GIF94z		image/unknown
#
# GRR 950115:  this is Jeremy Wohl's Free Graphics Format (better):
#					FGF image (GIF+deflate beta)
0	string		FGF95a		image/unknown
#
# GRR 950115:  this is Thomas Boutell's Portable Bitmap Format proposal
# (best; not yet implemented):
#					PBF image (deflate compression)
0	string		PBF		image/unknown

# GIF
0	string		GIF		image/gif

# JPEG images
0	beshort		0xffd8		image/jpeg

# PC bitmaps (OS/2, Windoze BMP files)  (Greg Roelofs, newt@uchicago.edu)
0	string		BM		image/bmp
#>14	byte		12		(OS/2 1.x format)
#>14	byte		64		(OS/2 2.x format)
#>14	byte		40		(Windows 3.x format)
#0	string		IC		icon
#0	string		PI		pointer
#0	string		CI		color icon
#0	string		CP		color pointer
#0	string		BA		bitmap array

# CDROM Filesystems
32769    string    CD001     application/x-iso9660

# Newer StuffIt archives (grant@netbsd.org)
0	string		StuffIt			application/x-stuffit
#>162	string		>0			: %s

# BinHex is the Macintosh ASCII-encoded file format (see also "apple")
# Daniel Quinlan, quinlan@yggdrasil.com
11	string	must\ be\ converted\ with\ BinHex\ 4	application/mac-binhex40
##>41	string	x					\b, version %.3s


#------------------------------------------------------------------------------
# lisp:  file(1) magic for lisp programs
#
# various lisp types, from Daniel Quinlan (quinlan@yggdrasil.com)
0	string	;;			text/plain	8bit
# Emacs 18 - this is always correct, but not very magical.
0	string	\012(			application/x-elc
# Emacs 19
0	string	;ELC\023\000\000\000	application/x-elc

#------------------------------------------------------------------------------
# mail.news:  file(1) magic for mail and news
#
# There are tests to ascmagic.c to cope with mail and news.
0	string		Relay-Version: 	message/rfc822	7bit
0	string		#!\ rnews	message/rfc822	7bit
0	string		N#!\ rnews	message/rfc822	7bit
0	string		Forward\ to 	message/rfc822	7bit
0	string		Pipe\ to 	message/rfc822	7bit
0	string		Return-Path:	message/rfc822	7bit
0	string		Received:	message/rfc822
0	string		Path:		message/news	8bit
0	string		Xref:		message/news	8bit
0	string		From:		message/rfc822	7bit
0	string		Article 	message/news	8bit
#------------------------------------------------------------------------------
# msword: file(1) magic for MS Word files
#
# Contributor claims:
# Reversed-engineered MS Word magic numbers
#

0	string		\376\067\0\043			application/msword
# disable this one because it applies also to other
# Office/OLE documents for which msword is not correct. See PR#2608.
# from magic file of the apache
#0	string		\320\317\021\340\241\261	application/msword
512	string		\354\245\301			application/msword
0	string		\333\245-\0\0\0			application/msword



#------------------------------------------------------------------------------
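
For anyone reading the listing above and wondering how it gets used: each active (uncommented) line is roughly an offset into the file, a data type, a value to compare against, and the MIME type to report on a match. Below is a minimal Python sketch of that idea, using a handful of entries copied from the listing. It is not how file(1) is actually implemented. The names `RULES`, `read_value`, and `guess_mime` are made up for illustration, only the `string` and `beshort` types are handled, continuation lines starting with `>` are ignored, and the `application/octet-stream` fallback is my own assumption.

```python
import struct

# A few rules copied from the listing above: (offset, kind, expected, mime).
# "string" compares raw bytes at the offset; "beshort" reads a big-endian
# unsigned 16-bit integer. Octal escapes such as \037\213 become \x1f\x8b.
RULES = [
    (0, "string", b"\x1f\x8b", "application/x-gzip"),
    (0, "string", b"PK\x03\x04", "application/x-zip"),
    (257, "string", b"ustar\x00", "application/x-tar"),
    (0, "string", b"GIF", "image/gif"),
    (0, "beshort", 0xFFD8, "image/jpeg"),
    (0, "string", b"OggS", "application/ogg"),
    (0, "string", b"#!/bin/sh", "application/x-shellscript"),
]


def read_value(data, offset, kind, expected):
    """Extract the value at `offset` in the form the rule compares against."""
    if kind == "string":
        # Take exactly as many bytes as the expected string has.
        return data[offset:offset + len(expected)]
    if kind == "beshort":
        chunk = data[offset:offset + 2]
        # A file that is too short simply cannot match this rule.
        return struct.unpack(">H", chunk)[0] if len(chunk) == 2 else None
    raise ValueError("unsupported rule type: " + kind)


def guess_mime(data, default="application/octet-stream"):
    """Return the MIME type of the first matching rule (hypothetical helper)."""
    for offset, kind, expected, mime in RULES:
        if read_value(data, offset, kind, expected) == expected:
            return mime
    return default  # assumption: unknown data is treated as opaque binary
```

Something like `guess_mime(open(path, "rb").read(512))` is enough for this small table, since the deepest offset it uses is 257. Rules are tried in order and the first match wins, which is why the listing insists that NIFF ("IIN1") must come before TIFF ("II").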

Edited by debtboy, 05 September 2009 - 06:05 AM.
swap quote tags for code tags to save space


#11
ZekeDragon

ZekeDragon

    Writes binary right handed and hex left handed

  • Moderators
  • 2,103 posts
Wow, debtboy, could you put that into [code] tags instead of [quote] tags, so your post isn't monstrous?
Wow I changed my sig!

#12
debtboy

debtboy

    Programming God

  • Members
  • 916 posts
Good idea, swapped tags as requested.
Much more manageable now.
The actual files are way larger than what was posted.