charset

Character set detection, decoding, and encoding streams.

Install

npm install @datastream/charset

charsetDetectStream PassThrough

Detects the character encoding of the data passing through by analyzing byte patterns.

Options

OptionTypeDefaultDescription
resultKeystring"charset"Key in pipeline result

Result

Returns the most likely charset with confidence score:

{ charset: 'UTF-8', confidence: 80 }

Supported charsets

UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, Shift_JIS, ISO-2022-JP, ISO-2022-CN, ISO-2022-KR, GB18030, EUC-JP, EUC-KR, Big5, ISO-8859-1, ISO-8859-2, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, windows-1250, windows-1251, windows-1252, windows-1254, windows-1256, KOI8-R

Example

import { pipeline } from '@datastream/core'
import { fileReadStream } from '@datastream/file'
import { charsetDetectStream, charsetDecodeStream } from '@datastream/charset'

const detect = charsetDetectStream()

const result = await pipeline([
  fileReadStream({ path: './data.csv' }),
  detect,
])

console.log(result.charset)
// { charset: 'UTF-8', confidence: 80 }

charsetDecodeStream Transform

Decodes binary data to text using the specified character encoding. Uses TextDecoderStream internally.

Options

OptionTypeDefaultDescription
charsetstringCharacter encoding name (e.g. "UTF-8", "ISO-8859-1")

Example

import { charsetDecodeStream } from '@datastream/charset'

charsetDecodeStream({ charset: 'ISO-8859-1' })

charsetEncodeStream Transform

Encodes text to binary using the specified character encoding. Uses TextEncoderStream internally.

Options

OptionTypeDefaultDescription
charsetstringCharacter encoding name

Example

import { charsetEncodeStream } from '@datastream/charset'

charsetEncodeStream({ charset: 'UTF-8' })