
StructuredFile/Parse Task

Overview

The StructuredFile/Parse@1 task converts fixed-width flat files, in which each field occupies a fixed range of character positions, into structured JSON. It is intended for the legacy file formats common in warehouse management systems, transportation, and EDI integrations.

Task Definition

task: "StructuredFile/Parse@1"
name: parseStructuredFile
inputs:
  fileData: "AASNTBCHUSPSWAREHOUSE123 SHR123456 MBL1234567890..."
  config: { ... }             # Detailed configuration object
  parseMode: "structured"     # Optional: "flat" or "structured" (default: "flat")
  encoding: "UTF-8"           # Optional: character encoding (default: "UTF-8")
  skipEmptyLines: true        # Optional: skip empty lines (default: true)
  trimFields: true            # Optional: trim whitespace from fields (default: true)
outputs:
  - name: "parsedData"
    mapping: "result"
  - name: "recordCount"
    mapping: "recordCount"
  - name: "recordTypes"
    mapping: "recordTypes"
  - name: "parseErrors"
    mapping: "errors"

Input Parameters

| Parameter      | Type    | Required | Default | Description |
|----------------|---------|----------|---------|-------------|
| fileData       | string  | Yes      | -       | Raw flat file content to parse |
| config         | object  | Yes      | -       | Configuration defining the file structure and parsing rules |
| parseMode      | string  | No       | "flat"  | Output format: "flat" returns an array of records; "structured" returns a hierarchical object |
| encoding       | string  | No       | "UTF-8" | Character encoding of the input file |
| skipEmptyLines | boolean | No       | true    | Whether to skip empty lines during parsing |
| trimFields     | boolean | No       | true    | Whether to trim whitespace from extracted field values |

Configuration Object

The configuration object is the core of the parser, defining how to interpret the flat file structure.

Basic Structure

config:
  # Define output structure (for structured mode)
  structured: true
  structure: [...]

  # Define record formats
  records: [...]

  # Global options
  options:
    includeFillers: false
    includeRecordType: true
    dateFormat: "YYYYMMDD"

Records Configuration

Each record type in the flat file must be defined in the records array:

records:
  - id: "A"                            # Single character that identifies this record type
    name: "header"                     # Logical name for the record type
    description: "File header record"  # Optional description
    required: true                     # Whether this record must appear in valid files
    minOccurrences: 1                  # Minimum times this record must appear
    maxOccurrences: 1                  # Maximum times this record can appear
    fields: [...]                      # Field definitions for this record
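
The following Python sketch illustrates how a parser could route each line to a record definition by its first character and then enforce minOccurrences and maxOccurrences. It is not the task's implementation: the function name classify_lines is hypothetical, and treating required: true as an implied minimum of 1 (when minOccurrences is absent) and reporting unknown record ids as errors are assumptions.

```python
from collections import Counter

# Hypothetical subset of a records configuration (only the keys this sketch uses).
RECORDS = [
    {"id": "A", "name": "header", "required": True, "minOccurrences": 1, "maxOccurrences": 1},
    {"id": "T", "name": "load", "required": True},
    {"id": "B", "name": "store"},
]


def classify_lines(lines, records, skip_empty_lines=True):
    """Tag each line with its record name, then check occurrence limits.

    Returns (tagged, counts, errors); tagged holds (line_number, name, line).
    """
    by_id = {record["id"]: record for record in records}
    tagged, errors = [], []
    for number, line in enumerate(lines, start=1):
        if skip_empty_lines and not line.strip():
            continue
        record = by_id.get(line[:1])  # the first character identifies the record type
        if record is None:
            errors.append({"line": number, "message": f"Unknown record type {line[:1]!r}"})
            continue
        tagged.append((number, record["name"], line))

    counts = Counter(name for _, name, _ in tagged)
    for record in records:
        seen = counts[record["name"]]
        # Assumption: required implies at least one occurrence unless minOccurrences says otherwise.
        minimum = record.get("minOccurrences", 1 if record.get("required") else 0)
        maximum = record.get("maxOccurrences")
        if seen < minimum:
            errors.append({"record": record["name"], "message": f"Expected at least {minimum}, found {seen}"})
        if maximum is not None and seen > maximum:
            errors.append({"record": record["name"], "message": f"Expected at most {maximum}, found {seen}"})
    return tagged, dict(counts), errors
```

Calling classify_lines(file_data.splitlines(), RECORDS) yields per-type counts shaped like the recordTypes output.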

Field Configuration

Each field within a record is defined with precise positioning and optional transformations:

fields:
  - name: "carrier_code"        # Field name in output
    description: "SCAC code"    # Optional field description
    start: 9                    # Starting position (1-based)
    length: 4                   # Number of characters
    type: "string"              # Data type (see Field Types section)
    required: true              # Whether field must have a value
    trim: true                  # Override global trim setting
    padCharacter: " "           # Character used for padding
    padDirection: "left"        # Padding direction: "left" or "right"
    defaultValue: ""            # Default if field is empty
    skip: false                 # Whether to exclude from output
    validation:                 # Optional validation rules
      pattern: "^[A-Z]{4}$"
      minLength: 4
      maxLength: 4
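
Because start is 1-based, a field with start: 9 and length: 4 covers characters 9 through 12, which is the Python slice line[8:12]. The sketch below shows that extraction together with trimming, padding removal, defaults, and the validation keys above. It is an illustration under assumptions, not the task's code: extract_field is a hypothetical name, padDirection: "left" is read as "padding sits on the left", and pattern is applied with re.search so anchors such as ^ and $ behave as written.

```python
import re


def extract_field(line, field, trim_fields=True):
    """Return (value, errors) for one field of a fixed-width record line.

    Positions are 1-based and inclusive, as in the field configuration.
    """
    start = field["start"] - 1                      # 1-based -> 0-based index
    value = line[start:start + field["length"]]     # a short line yields a shorter slice
    if field.get("trim", trim_fields):              # per-field trim overrides the global flag
        value = value.strip()
    pad = field.get("padCharacter")
    if pad:                                         # strip padding from the padded side
        value = value.lstrip(pad) if field.get("padDirection") == "left" else value.rstrip(pad)

    errors = []
    if value == "":
        if field.get("required"):
            errors.append("Required field is empty")
        return field.get("defaultValue", value), errors

    rules = field.get("validation", {})
    message = rules.get("message")
    if "pattern" in rules and not re.search(rules["pattern"], value):
        errors.append(message or f"Value {value!r} does not match {rules['pattern']}")
    if len(value) < rules.get("minLength", 0):
        errors.append(message or f"Shorter than {rules['minLength']} characters")
    if len(value) > rules.get("maxLength", len(value)):
        errors.append(message or f"Longer than {rules['maxLength']} characters")
    return value, errors
```

For the header line in the task definition, extract_field(line, {"start": 9, "length": 4}) returns ("USPS", []).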

Field Types

The parser supports several field types with automatic conversion:

String (default)

- name: "description"
  start: 10
  length: 30
  type: "string"
  transform: "uppercase"     # Optional: "uppercase", "lowercase", "capitalize"

Number

- name: "quantity"
  start: 40
  length: 6
  type: "number"
  divisor: 100               # Divide extracted value by this amount
  decimals: 2                # Number of decimal places
  thousandSeparator: ","     # Optional thousand separator
  defaultValue: 0
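
As a rough model of the number options, the sketch below removes the thousand separator, divides by divisor to restore implied decimals, and rounds to decimals places. to_number is a hypothetical helper; the rounding rule (Python's round(), which sends exact ties to the even neighbour) is an assumption, since the task's rounding behaviour is not documented here.

```python
def to_number(raw, divisor=1, decimals=None, thousand_separator=None, default=0):
    """Convert a fixed-width numeric field, applying an implied-decimal divisor."""
    text = raw.strip()
    if thousand_separator:
        text = text.replace(thousand_separator, "")
    if not text:
        return default                     # blank field -> defaultValue
    value = float(text) / divisor          # e.g. "001234" / 100 -> 12.34
    if decimals is None:
        return value
    # Assumption: Python's round() semantics; the task's rounding rule is not documented.
    return int(round(value)) if decimals == 0 else round(value, decimals)
```

With the quantity field of the complete example (divisor: 100000, decimals: 0), the raw text "100000" becomes 1.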

Date

- name: "ship_date"
  start: 10
  length: 8
  type: "date"
  inputFormat: "YYYYMMDD"        # Format in the file
  outputFormat: "YYYY-MM-DD"     # Desired output format
  timezone: "UTC"                # Optional timezone
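
A date field is effectively a parse with inputFormat followed by a format with outputFormat. The sketch below translates the YYYY/MM/DD style tokens into datetime format codes to show that round trip. The token table, the helper names, treating an all-zero date as blank, and ignoring timezone are all assumptions for illustration; they are not taken from the task's implementation.

```python
import re
from datetime import datetime

# Hypothetical token table: only the tokens used in this document, plus time-of-day.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}


def to_strftime(pattern):
    """Translate e.g. "YYYY-MM-DD" into "%Y-%m-%d"."""
    return re.sub("|".join(_TOKENS), lambda m: _TOKENS[m.group(0)], pattern)


def convert_date(raw, input_format="YYYYMMDD", output_format="YYYY-MM-DD", default=None):
    """Reformat a fixed-width date; blank (or all-zero) values return default.

    Invalid dates raise ValueError so the caller can record a parse error.
    The timezone option is not modelled here.
    """
    text = raw.strip()
    if not text or set(text) == {"0"}:   # assumption: "00000000" means "no date"
        return default
    parsed = datetime.strptime(text, to_strftime(input_format))
    return parsed.strftime(to_strftime(output_format))
```

For example, convert_date("20240115") returns "2024-01-15", matching the load_date values in the output examples.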

Boolean

- name: "is_hazmat"
  start: 50
  length: 1
  type: "boolean"
  trueValues: ["Y", "1", "T"]      # Values that represent true
  falseValues: ["N", "0", "F"]     # Values that represent false
  defaultValue: false
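
The sketch below shows one plausible reading of trueValues, falseValues, and defaultValue. to_boolean is a hypothetical helper, and raising an error for a value in neither list is an assumption. One detail worth noting: when trimming is on, a blank character becomes an empty string, so listing " " in falseValues (as the carton record in the complete example does) only matters when trimming is off.

```python
def to_boolean(raw, true_values=("Y", "1", "T"), false_values=("N", "0", "F"),
               default=False, trim=True):
    """Map a one-character flag to True/False using the configured value lists."""
    value = raw.strip() if trim else raw
    if value in true_values:
        return True
    if value in false_values:
        return False
    if value == "":
        return default                   # blank field -> defaultValue
    # Assumption: unrecognised flags are treated as errors rather than silently defaulted.
    raise ValueError(f"{raw!r} is not a recognised boolean value")
```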

Decimal/Currency

- name: "unit_price"
  start: 60
  length: 9
  type: "decimal"
  divisor: 1000              # Common for implied decimal places
  precision: 2               # Decimal precision
  currencySymbol: "$"        # Optional currency symbol in output
  format: "0,0.00"           # Number format pattern
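
Implied-decimal amounts are a natural fit for exact decimal arithmetic, because dividing in binary floating point can introduce small representation errors before rounding. The sketch below uses Python's decimal module to apply divisor and precision. to_decimal is a hypothetical helper; half-up rounding is an assumption, and currencySymbol and format are deliberately not modelled.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal(raw, divisor=1, precision=2, default=None):
    """Convert an implied-decimal field exactly, then round to `precision` places."""
    text = raw.strip()
    if not text:
        return default
    value = Decimal(text) / Decimal(divisor)   # exact: Decimal("12345") / 1000 = 12.345
    quantum = Decimal(10) ** -precision         # precision 2 -> Decimal("0.01")
    # Assumption: half-up rounding; the task's rounding mode is not documented.
    return value.quantize(quantum, rounding=ROUND_HALF_UP)
```

With divisor: 1000 and precision: 2, the raw text "000012345" becomes Decimal("12.35").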

Structure Configuration (Hierarchical Output)

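For parseMode: "structured", the structure array defines how records nest. Each entry names a record type and its nesting level. Root-level entries (level 0) use key for the output property, with singleton: true for a single object or collection: true for an array. Nested entries name their parent record type and either collection (an array property on the parent) or key (a single object on the parent, as the totals records use in the complete example):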

structure:
  - type: "header"        # Record type name
    level: 0              # Nesting level (0 = root)
    key: "header"         # Property name in output
    singleton: true       # Only one instance expected

  - type: "load"
    level: 0
    key: "loads"
    collection: true      # Multiple instances form an array

  - type: "store"
    level: 1
    parent: "load"        # Parent record type
    collection: "stores"  # Collection name within parent

  - type: "carton"
    level: 2
    parent: "store"
    collection: "cartons"

  - type: "carton_content"
    level: 3
    parent: "carton"
    collection: "items"
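
One way to build the hierarchy is a single pass over the records in file order, attaching each record to the most recently seen record of its parent type. The sketch below implements that idea with the structure of the complete example. It is an assumption-based illustration (build_structure is hypothetical, and the task may resolve parents differently); here an orphaned record, whose parent has not appeared yet, simply raises KeyError.

```python
# Structure rules mirroring the complete configuration example.
STRUCTURE = [
    {"type": "header", "level": 0, "key": "header", "singleton": True},
    {"type": "load", "level": 0, "key": "currentLoad", "singleton": True},
    {"type": "store", "level": 1, "parent": "load", "collection": "stores"},
    {"type": "carton", "level": 2, "parent": "store", "collection": "cartons"},
    {"type": "carton_content", "level": 3, "parent": "carton", "collection": "items"},
    {"type": "store_totals", "level": 2, "parent": "store", "key": "totals"},
    {"type": "trailer_totals", "level": 1, "parent": "load", "key": "totals"},
]


def build_structure(records, structure):
    """Nest flat records (dicts carrying a "_type") according to structure rules."""
    rules = {rule["type"]: rule for rule in structure}
    result, latest = {}, {}  # latest maps each record type to its most recent object
    for record in records:
        node = dict(record)
        rule = rules[node["_type"]]
        # Assumption: a child attaches to the most recently seen record of its parent type.
        container = latest[rule["parent"]] if "parent" in rule else result
        collection = rule.get("collection")
        if isinstance(collection, str):        # nested array, e.g. "stores"
            container.setdefault(collection, []).append(node)
        elif collection is True:               # root-level array under `key`
            container.setdefault(rule["key"], []).append(node)
        else:                                  # single object under `key`
            container[rule["key"]] = node
        latest[node["_type"]] = node
    return result
```

Feeding it the seven flat records from the flat-mode output produces the same nesting as the structured-mode output.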

Complete Configuration Example

task: "StructuredFile/Parse@1"
name: parseShipmentManifest
inputs:
  fileData: "{{ workflow.input.manifestFile }}"
  parseMode: "structured"
  config:
    structured: true
    structure:
      - type: "header"
        level: 0
        key: "header"
        singleton: true
      - type: "load"
        level: 0
        key: "currentLoad"
        singleton: true
      - type: "store"
        level: 1
        parent: "load"
        collection: "stores"
      - type: "carton"
        level: 2
        parent: "store"
        collection: "cartons"
      - type: "carton_content"
        level: 3
        parent: "carton"
        collection: "items"
      - type: "store_totals"
        level: 2
        parent: "store"
        key: "totals"
      - type: "trailer_totals"
        level: 1
        parent: "load"
        key: "totals"

    records:
      # Header Record
      - id: "A"
        name: "header"
        description: "ASN file header"
        required: true
        maxOccurrences: 1
        fields:
          - name: "record_type"
            start: 1
            length: 1
            type: "string"
            defaultValue: "A"
            skip: true
          - name: "file_type"
            start: 2
            length: 3
            type: "string"
            validation:
              pattern: "ASN"
          - name: "batch_code"
            start: 5
            length: 4
            type: "string"
          - name: "carrier_code"
            start: 9
            length: 4
            type: "string"
            required: true
            transform: "uppercase"
            validation:
              pattern: "^[A-Z]{4}$"
              message: "Carrier code must be 4 uppercase letters"
          - name: "origin_facility"
            start: 13
            length: 15
            type: "string"
            trim: true
          - name: "shipper_reference"
            start: 28
            length: 10
            type: "string"
          - name: "master_bol"
            start: 38
            length: 30
            type: "string"
            trim: true

      # Load/Trailer Record
      - id: "T"
        name: "load"
        description: "Trailer/Load information"
        required: true
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "load_date"
            start: 2
            length: 8
            type: "date"
            inputFormat: "YYYYMMDD"
            outputFormat: "YYYY-MM-DD"
            required: true
          - name: "trailer_number"
            start: 14
            length: 12
            type: "string"
            trim: true
            required: true
          - name: "seal_number"
            start: 26
            length: 10
            type: "string"
            trim: true
          - name: "ship_date"
            start: 36
            length: 8
            type: "date"
            inputFormat: "YYYYMMDD"
            outputFormat: "YYYY-MM-DD"

      # Store/Destination Record
      - id: "B"
        name: "store"
        description: "Store destination information"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "carrier_code"
            start: 2
            length: 4
            type: "string"
            transform: "uppercase"
          - name: "pool_location"
            start: 6
            length: 9
            type: "string"
            trim: true
          - name: "invoice_number"
            start: 17
            length: 6
            type: "string"
          - name: "bol_number"
            start: 23
            length: 12
            type: "string"
            trim: true

      # Carton Record
      - id: "C"
        name: "carton"
        description: "Individual carton/package"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "tracking_number"
            start: 2
            length: 28
            type: "string"
            trim: true
            required: true
            validation:
              minLength: 10
              message: "Tracking number must be at least 10 characters"
          - name: "weight"
            start: 30
            length: 7
            type: "decimal"
            divisor: 100
            precision: 2
            defaultValue: 0
          - name: "is_signature_required"
            start: 44
            length: 1
            type: "boolean"
            trueValues: ["Y", "1"]
            falseValues: ["N", "0", " "]
            defaultValue: false
          - name: "declared_value"
            start: 68
            length: 8
            type: "decimal"
            divisor: 1000
            precision: 2
            currencySymbol: "$"

      # Carton Content Record
      - id: "P"
        name: "carton_content"
        description: "Items within a carton"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "style_number"
            start: 2
            length: 15
            type: "string"
            trim: true
          - name: "sku"
            start: 27
            length: 13
            type: "string"
            trim: true
            required: true
          - name: "quantity"
            start: 40
            length: 6
            type: "number"
            divisor: 100000
            decimals: 0
          - name: "color"
            start: 46
            length: 30
            type: "string"
            trim: true
          - name: "size"
            start: 76
            length: 6
            type: "string"
            trim: true
          - name: "retail_price"
            start: 82
            length: 9
            type: "decimal"
            divisor: 1000
            precision: 2
            currencySymbol: "$"
          - name: "item_type"
            start: 100
            length: 8
            type: "string"
            trim: true
          - name: "description"
            start: 108
            length: 30
            type: "string"
            trim: true

      # Store Totals Record
      - id: "D"
        name: "store_totals"
        description: "Store-level summary"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "total_units"
            start: 2
            length: 7
            type: "number"
            divisor: 100000
          - name: "total_weight"
            start: 9
            length: 9
            type: "decimal"
            divisor: 1000
            precision: 3

      # Trailer Totals Record
      - id: "E"
        name: "trailer_totals"
        description: "Trailer-level summary"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "total_units"
            start: 2
            length: 7
            type: "number"
            divisor: 100000
          - name: "total_weight"
            start: 9
            length: 9
            type: "decimal"
            divisor: 1000
            precision: 3

    options:
      includeFillers: false       # Don't include FILLER fields
      includeRecordType: true     # Include each record's type ("_type" in the output examples below)
      strictValidation: true      # Enforce all validation rules
      continueOnError: false      # Stop on first error

outputs:
  - name: "manifest"
    mapping: "result"
  - name: "recordCounts"
    mapping: "recordTypes"
  - name: "hasErrors"
    mapping: "errors[0] != null"

Output Examples

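Flat Mode Output

In flat mode, result is an array with one object per record, in file order. The array below is truncated to the first three of the seven records counted in recordCount.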

{
  "result": [
    {
      "_type": "header",
      "file_type": "ASN",
      "batch_code": "TBCH",
      "carrier_code": "USPS",
      "origin_facility": "WAREHOUSE123",
      "shipper_reference": "SHR123456",
      "master_bol": "MBL1234567890"
    },
    {
      "_type": "load",
      "load_date": "2024-01-15",
      "trailer_number": "TRL123456",
      "seal_number": "SEAL789",
      "ship_date": "2024-01-15"
    },
    {
      "_type": "store",
      "carrier_code": "USPS",
      "pool_location": "TXPOOL",
      "invoice_number": "INV123",
      "bol_number": "BOL12345678"
    }
  ],
  "recordCount": 7,
  "recordTypes": {
    "header": 1,
    "load": 1,
    "store": 1,
    "carton": 1,
    "carton_content": 1,
    "store_totals": 1,
    "trailer_totals": 1
  },
  "errors": []
}

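Structured Mode Output

The same file parsed in structured mode, nested according to the structure section of the complete configuration example: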

{
  "result": {
    "header": {
      "_type": "header",
      "file_type": "ASN",
      "batch_code": "TBCH",
      "carrier_code": "USPS",
      "origin_facility": "WAREHOUSE123",
      "shipper_reference": "SHR123456",
      "master_bol": "MBL1234567890"
    },
    "currentLoad": {
      "_type": "load",
      "load_date": "2024-01-15",
      "trailer_number": "TRL123456",
      "seal_number": "SEAL789",
      "ship_date": "2024-01-15",
      "stores": [
        {
          "_type": "store",
          "carrier_code": "USPS",
          "pool_location": "TXPOOL",
          "invoice_number": "INV123",
          "bol_number": "BOL12345678",
          "cartons": [
            {
              "_type": "carton",
              "tracking_number": "0012345678901234567890123456",
              "weight": 12.34,
              "is_signature_required": true,
              "declared_value": 12.35,
              "items": [
                {
                  "_type": "carton_content",
                  "style_number": "STYLE123456789",
                  "sku": "MSK1234567890",
                  "quantity": 1,
                  "color": "BLACK",
                  "size": "M",
                  "retail_price": 12.35,
                  "item_type": "CTNBOX",
                  "description": "Men's T-Shirt Black"
                }
              ]
            }
          ],
          "totals": {
            "_type": "store_totals",
            "total_units": 1,
            "total_weight": 9.876
          }
        }
      ],
      "totals": {
        "_type": "trailer_totals",
        "total_units": 10,
        "total_weight": 98.765
      }
    }
  },
  "recordCount": 7,
  "recordTypes": {
    "header": 1,
    "load": 1,
    "store": 1,
    "carton": 1,
    "carton_content": 1,
    "store_totals": 1,
    "trailer_totals": 1
  },
  "errors": []
}

Error Handling

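The parser returns detailed error information when issues are encountered. Each entry reports the line number, record type, field, message, and the field's character positions (1-based and inclusive, so end = start + length - 1):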

{
  "errors": [
    {
      "line": 3,
      "record": "store",
      "field": "carrier_code",
      "message": "Carrier code must be 4 uppercase letters",
      "value": "ups",
      "position": {
        "start": 2,
        "end": 5
      }
    },
    {
      "line": 5,
      "record": "carton",
      "field": "tracking_number",
      "message": "Required field is empty",
      "position": {
        "start": 2,
        "end": 29
      }
    }
  ]
}
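
To make the position arithmetic concrete, the sketch below builds an error entry with the same shape from a field definition and adds a small summary that a workflow could log. Both field_error and summarize are hypothetical helpers written for this illustration; they are not part of the task.

```python
def field_error(line_number, record_name, field, message, value=None):
    """Build an error entry shaped like the examples above."""
    error = {
        "line": line_number,
        "record": record_name,
        "field": field["name"],
        "message": message,
        # Positions are 1-based and inclusive: a 28-character field at 2 ends at 29.
        "position": {"start": field["start"], "end": field["start"] + field["length"] - 1},
    }
    if value is not None:              # empty required fields carry no value
        error["value"] = value
    return error


def summarize(errors):
    """Count errors per record.field pair, e.g. for a workflow log line."""
    counts = {}
    for error in errors:
        key = f"{error['record']}.{error['field']}"
        counts[key] = counts.get(key, 0) + 1
    return counts
```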

Advanced Configuration Options

Conditional Fields

Define fields that are parsed only when a condition on another field holds:

fields:
  - name: "hazmat_code"
    start: 100
    length: 4
    type: "string"
    condition:
      field: "is_hazmat"
      operator: "equals"
      value: true
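
A condition can only be evaluated against values that have already been parsed, so the sketch below processes fields in order and checks each condition against the fields before it. It assumes that a field whose condition fails is omitted from the output (rather than set to null) and supports only the "equals" operator, the one shown here; parse_fields, condition_met, and simple_extract are hypothetical names.

```python
import operator

# Only "equals" appears in this documentation; no other operator names are assumed.
OPERATORS = {"equals": operator.eq}


def condition_met(condition, parsed):
    """True when the field has no condition or its condition holds."""
    if condition is None:
        return True
    actual = parsed.get(condition["field"])
    return OPERATORS[condition["operator"]](actual, condition["value"])


def simple_extract(line, field):
    """Hypothetical extractor: slice by 1-based position; a boolean is True only for "Y"."""
    start = field["start"] - 1
    raw = line[start:start + field["length"]].strip()
    return raw == "Y" if field.get("type") == "boolean" else raw


def parse_fields(line, fields, extract=simple_extract):
    """Parse fields in order, so a condition can only refer to an earlier field."""
    parsed = {}
    for field in fields:
        if condition_met(field.get("condition"), parsed):
            parsed[field["name"]] = extract(line, field)
    return parsed


FIELDS = [
    {"name": "is_hazmat", "start": 50, "length": 1, "type": "boolean"},
    {"name": "hazmat_code", "start": 100, "length": 4, "type": "string",
     "condition": {"field": "is_hazmat", "operator": "equals", "value": True}},
]
```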

Computed Fields

Add fields calculated from other fields:

fields:
  - name: "total_value"
    computed: true
    expression: "quantity * retail_price"
    type: "decimal"
    precision: 2
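
The expression language for computed fields is not specified here. The sketch below assumes simple arithmetic (+, -, *, /) over field names and numeric literals and evaluates it by walking Python's ast tree instead of calling eval(), so arbitrary code in an expression is rejected. evaluate and add_computed_fields are hypothetical helpers.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expression, record):
    """Evaluate +, -, *, / over field names and numeric literals only."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return record[node.id]          # KeyError if the field was not parsed
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"Unsupported expression element: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval").body)


def add_computed_fields(record, fields):
    """Append computed fields to an already parsed record."""
    for field in fields:
        if field.get("computed"):
            value = evaluate(field["expression"], record)
            if "precision" in field:
                value = round(value, field["precision"])
            record[field["name"]] = value
    return record
```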

Field Groups

Group related fields for cleaner output:

fields:
  - name: "dimensions"
    type: "group"
    fields:
      - name: "length"
        start: 50
        length: 5
        type: "number"
        divisor: 10
      - name: "width"
        start: 55
        length: 5
        type: "number"
        divisor: 10
      - name: "height"
        start: 60
        length: 5
        type: "number"
        divisor: 10
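
A group only changes the shape of the output: its sub-fields are still read from the same line, but they are placed in a nested object. The recursive sketch below illustrates this with the dimensions group. parse_with_groups is a hypothetical helper, and returning None for a blank number is an assumption.

```python
def parse_with_groups(line, fields):
    """Parse fields; a "group" field becomes a nested object of its sub-fields."""
    out = {}
    for field in fields:
        if field.get("type") == "group":
            out[field["name"]] = parse_with_groups(line, field["fields"])  # recurse
            continue
        start = field["start"] - 1  # 1-based position -> 0-based index
        raw = line[start:start + field["length"]].strip()
        if field.get("type") == "number":
            out[field["name"]] = float(raw) / field.get("divisor", 1) if raw else None
        else:
            out[field["name"]] = raw
    return out


DIMENSIONS = [{
    "name": "dimensions",
    "type": "group",
    "fields": [
        {"name": "length", "start": 50, "length": 5, "type": "number", "divisor": 10},
        {"name": "width", "start": 55, "length": 5, "type": "number", "divisor": 10},
        {"name": "height", "start": 60, "length": 5, "type": "number", "divisor": 10},
    ],
}]
```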

Custom Transformations

Translate raw codes into readable values with a map transform; values not listed in mapping fall back to default:

fields:
  - name: "status_code"
    start: 80
    length: 2
    type: "string"
    transform:
      type: "map"
      mapping:
        "01": "pending"
        "02": "in_transit"
        "03": "delivered"
        "04": "exception"
      default: "unknown"
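
The sketch below handles both transform forms used in this document: the string shortcuts ("uppercase", "lowercase", "capitalize") and the map object. apply_transform is a hypothetical helper, and mapping "capitalize" to Python's str.capitalize (which also lowercases the remaining letters) is an assumption.

```python
def apply_transform(value, transform):
    """Apply a string transform name or a {"type": "map", ...} transform."""
    if transform is None:
        return value
    if isinstance(transform, str):
        simple = {"uppercase": str.upper, "lowercase": str.lower, "capitalize": str.capitalize}
        return simple[transform](value)
    if transform.get("type") == "map":
        # Unlisted codes fall back to "default" (or pass through if no default is set).
        return transform["mapping"].get(value, transform.get("default", value))
    raise ValueError(f"Unsupported transform: {transform!r}")


STATUS = {
    "type": "map",
    "mapping": {"01": "pending", "02": "in_transit", "03": "delivered", "04": "exception"},
    "default": "unknown",
}
```

Lookups compare the extracted text, so keep the mapping keys quoted in YAML: common YAML loaders read an unquoted 01 as the integer 1, and the string "01" would then never match.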

Best Practices

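  1. Start Position Accuracy: Positions are 1-based. If the file specification uses 0-based offsets, add 1 to each start value.
  2. Field Length Validation: Make field lengths match the specification exactly; a field that is too long reads into (and overlaps) the next field.
  3. Type Safety: Use the appropriate field type (number, decimal, date, boolean) so values are converted and validated automatically.
  4. Error Handling: Check the errors output (mapped to parseErrors in the task definition) and set continueOnError deliberately for malformed records.
  5. Testing: Test with a range of sample files, including edge cases such as blank fields, short lines, and missing optional records.
  6. Performance: The task receives the whole file as a single fileData string; for very large files, consider streaming parse options if your platform provides them.
  7. Documentation: Document custom formats and keep sample files alongside the configuration.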

Common Issues and Solutions

Issue: Overlapping Fields

# Wrong - fields overlap
- name: "field1"
  start: 10
  length: 5      # Occupies positions 10-14
- name: "field2"
  start: 14      # Starts at position 14: overlap!
  length: 3

# Correct
- name: "field1"
  start: 10
  length: 5      # Occupies positions 10-14
- name: "field2"
  start: 15      # Starts at position 15
  length: 3
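
Overlaps like this are easy to catch before running the task. The sketch below is a hypothetical lint helper, not a feature of the task: it sorts a record's fields by start position and reports every pair whose 1-based inclusive ranges intersect, including a long field that overlaps several later ones. Fields without a start (computed fields, groups) are skipped, and group sub-fields are not inspected.

```python
def find_overlaps(fields):
    """Return (earlier, later) name pairs whose 1-based inclusive ranges overlap."""
    spans = sorted(
        (f["start"], f["start"] + f["length"] - 1, f["name"])
        for f in fields
        if "start" in f
    )
    overlaps = []
    for i, (start, end, name) in enumerate(spans):
        for other_start, _, other_name in spans[i + 1:]:
            if other_start > end:
                break  # sorted by start, so no later field can overlap this one
            overlaps.append((name, other_name))
    return overlaps
```

Running it over each record's fields list (including skip: true fields, which still occupy positions) flags the "Wrong" case above as [("field1", "field2")].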

Issue: Incorrect Divisor for Implied Decimals

# File contains "000123", representing $1.23

# Wrong
- name: "amount"
  type: "decimal"
  divisor: 10      # 123 / 10 = 12.3

# Correct
- name: "amount"
  type: "decimal"
  divisor: 100     # 123 / 100 = 1.23

Issue: Date Parsing Failures

# Match inputFormat to the file's date layout and tolerate blank dates
- name: "ship_date"
  start: 10
  length: 8
  type: "date"
  inputFormat: "YYYYMMDD"
  outputFormat: "YYYY-MM-DD"
  defaultValue: null      # Use null when the date field is empty
  validation:
    allowEmpty: true      # Allow empty date fields