StructuredFile/Parse Task
Overview
The StructuredFile/Parse@1 task converts position-based, fixed-width flat files into structured JSON. It is intended for legacy file formats that are common in warehouse management systems, transportation, and EDI integrations.
Task Definition
task: "StructuredFile/Parse@1"
name: parseStructuredFile
inputs:
  fileData: "AASNTBCHUSPSWAREHOUSE123 SHR123456 MBL1234567890..."
  config: { ... } # Detailed configuration object
  parseMode: "structured" # Optional: "flat" or "structured" (default: "flat")
  encoding: "UTF-8" # Optional: Character encoding (default: "UTF-8")
  skipEmptyLines: true # Optional: Skip empty lines (default: true)
  trimFields: true # Optional: Trim whitespace from fields (default: true)
outputs:
  - name: "parsedData"
    mapping: "result"
  - name: "recordCount"
    mapping: "recordCount"
  - name: "recordTypes"
    mapping: "recordTypes"
  - name: "parseErrors"
    mapping: "errors"
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| fileData | string | Yes | - | Raw flat file content to parse |
| config | object | Yes | - | Configuration defining file structure and parsing rules |
| parseMode | string | No | "flat" | Output format: "flat" returns an array of records, "structured" returns a hierarchical object |
| encoding | string | No | "UTF-8" | Character encoding of the input file |
| skipEmptyLines | boolean | No | true | Whether to skip empty lines during parsing |
| trimFields | boolean | No | true | Whether to trim whitespace from extracted field values |
Configuration Object
The configuration object is the core of the parser, defining how to interpret the flat file structure.
Basic Structure
config:
  # Define output structure (for structured mode)
  structured: true
  structure: [...]
  # Define record formats
  records: [...]
  # Global options
  options:
    includeFillers: false
    includeRecordType: true
    dateFormat: "YYYYMMDD"
Records Configuration
Each record type in the flat file must be defined in the records array:
records:
  - id: "A" # Single character that identifies this record type
    name: "header" # Logical name for the record type
    description: "File header record" # Optional description
    required: true # Whether this record must appear in valid files
    minOccurrences: 1 # Minimum times this record must appear
    maxOccurrences: 1 # Maximum times this record can appear
    fields: [...] # Field definitions for this record
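The occurrence rules above can be illustrated with a small Python sketch. It is not the task's implementation: it assumes the record id is the first character of each line (as in the examples in this document), and that a required record with no explicit minOccurrences must appear at least once. The function name check_occurrences and the sample lines are hypothetical.

```python
from collections import Counter

# Hypothetical, trimmed-down record definitions that reuse the config keys above.
RECORDS = [
    {"id": "A", "name": "header", "required": True, "minOccurrences": 1, "maxOccurrences": 1},
    {"id": "C", "name": "carton"},
]


def check_occurrences(lines, records):
    """Count lines per record type and collect occurrence-rule violations."""
    by_id = {record["id"]: record for record in records}
    counts = Counter()
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # comparable to skipEmptyLines: true
        record = by_id.get(line[0])  # assumption: the id is the first character
        if record is None:
            problems.append(f"line {number}: unknown record id {line[0]!r}")
            continue
        counts[record["name"]] += 1
    for record in records:
        seen = counts[record["name"]]
        # Assumption: a required record with no explicit minimum must appear once.
        minimum = record.get("minOccurrences", 1 if record.get("required") else 0)
        maximum = record.get("maxOccurrences")
        if seen < minimum:
            problems.append(f"{record['name']}: expected at least {minimum}, found {seen}")
        if maximum is not None and seen > maximum:
            problems.append(f"{record['name']}: expected at most {maximum}, found {seen}")
    return dict(counts), problems


# Made-up sample lines: one header, two cartons, a blank line, an unknown record.
counts, problems = check_occurrences(["AASNTBCH", "C0001", "C0002", "", "Z???"], RECORDS)
print(counts)
print(problems)
```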
Field Configuration
Each field within a record is defined with precise positioning and optional transformations:
fields:
  - name: "carrier_code" # Field name in output
    description: "SCAC code" # Optional field description
    start: 9 # Starting position (1-based)
    length: 4 # Number of characters
    type: "string" # Data type (see Field Types section)
    required: true # Whether field must have a value
    trim: true # Override global trim setting
    padCharacter: " " # Character used for padding
    padDirection: "left" # Padding direction: "left" or "right"
    defaultValue: "" # Default if field is empty
    skip: false # Whether to exclude from output
    validation: # Optional validation rules
      pattern: "^[A-Z]{4}$"
      minLength: 4
      maxLength: 4
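To make the 1-based start and length semantics concrete, here is a minimal Python sketch of slicing one field out of a line. The helper extract_field is hypothetical and only mirrors the start, length, and trim keys; the sample line is padded by hand so that the columns match the header layout used in this document.

```python
def extract_field(line: str, start: int, length: int, trim: bool = True) -> str:
    """Slice a fixed-width field using a 1-based start position."""
    if start < 1 or length < 1:
        raise ValueError("start and length must be >= 1")
    begin = start - 1  # convert the 1-based start to a 0-based index
    value = line[begin : begin + length]
    return value.strip() if trim else value


# Columns: 1 record type, 2-4 file type, 5-8 batch, 9-12 carrier, 13-27 facility.
header = "AASNTBCHUSPSWAREHOUSE123   SHR123456 MBL1234567890"
print(extract_field(header, start=9, length=4))    # USPS
print(extract_field(header, start=13, length=15))  # WAREHOUSE123
```

With trim disabled, the second call would return the facility name together with its three padding spaces.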
Field Types
The parser supports several field types with automatic conversion:
String (default)
- name: "description"
  start: 10
  length: 30
  type: "string"
  transform: "uppercase" # Optional: "uppercase", "lowercase", "capitalize"
Number
- name: "quantity"
  start: 40
  length: 6
  type: "number"
  divisor: 100 # Divide extracted value by this amount
  decimals: 2 # Number of decimal places
  thousandSeparator: "," # Optional thousand separator
  defaultValue: 0
Date
- name: "ship_date"
  start: 10
  length: 8
  type: "date"
  inputFormat: "YYYYMMDD" # Format in the file
  outputFormat: "YYYY-MM-DD" # Desired output format
  timezone: "UTC" # Optional timezone
Boolean
- name: "is_hazmat"
  start: 50
  length: 1
  type: "boolean"
  trueValues: ["Y", "1", "T"] # Values that represent true
  falseValues: ["N", "0", "F"] # Values that represent false
  defaultValue: false
Decimal/Currency
- name: "unit_price"
  start: 60
  length: 9
  type: "decimal"
  divisor: 1000 # Common for implied decimal places
  precision: 2 # Decimal precision
  currencySymbol: "$" # Optional currency symbol in output
  format: "0,0.00" # Number format pattern
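The divisor and precision keys describe an implied decimal point: the file stores digits only, and the parser scales them. The following Python sketch shows one way that arithmetic could work. The function parse_implied_decimal is hypothetical, and the half-up rounding and the treatment of blank input as zero are assumptions, since the rounding mode of the task is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP


def parse_implied_decimal(raw: str, divisor: int, precision: int) -> Decimal:
    """Scale a digits-only field by `divisor` and round to `precision` places."""
    digits = raw.strip() or "0"  # assumption: a blank field counts as zero
    value = Decimal(digits) / Decimal(divisor)
    quantum = Decimal(10) ** -precision  # precision=2 gives Decimal("0.01")
    return value.quantize(quantum, rounding=ROUND_HALF_UP)  # assumed rounding mode


print(parse_implied_decimal("000012345", divisor=1000, precision=2))  # 12.35
print(parse_implied_decimal("0001234", divisor=100, precision=2))     # 12.34
```

Decimal is used instead of float so that currency values do not pick up binary rounding noise.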
Structure Configuration (Hierarchical Output)
For parseMode: "structured", define how records relate to each other:
structure:
  - type: "header" # Record type name
    level: 0 # Nesting level (0 = root)
    key: "header" # Property name in output
    singleton: true # Only one instance expected
  - type: "load"
    level: 0
    key: "loads"
    collection: true # Multiple instances form an array
  - type: "store"
    level: 1
    parent: "load" # Parent record type
    collection: "stores" # Collection name within parent
  - type: "carton"
    level: 2
    parent: "store"
    collection: "cartons"
  - type: "carton_content"
    level: 3
    parent: "carton"
    collection: "items"
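One plausible way to read these rules is that each child record attaches to the most recently seen record of its parent type. The Python sketch below nests a flat record list on that assumption. It is an illustration only: build_hierarchy is a hypothetical helper, the sample values are made up, and a child that appears before its parent would raise a KeyError here.

```python
def build_hierarchy(flat_records, structure):
    """Nest flat records by attaching each one to the latest record of its parent type."""
    rules = {rule["type"]: rule for rule in structure}
    root = {}
    latest = {}  # most recent record seen for each type
    for record in flat_records:
        rule = rules[record["_type"]]
        node = dict(record)
        parent_type = rule.get("parent")
        container = latest[parent_type] if parent_type else root
        collection = rule.get("collection")
        if collection:
            # `collection: true` at the root falls back to the rule's key.
            name = collection if isinstance(collection, str) else rule["key"]
            container.setdefault(name, []).append(node)
        else:
            container[rule["key"]] = node
        latest[record["_type"]] = node
    return root


STRUCTURE = [
    {"type": "header", "level": 0, "key": "header", "singleton": True},
    {"type": "load", "level": 0, "key": "loads", "collection": True},
    {"type": "store", "level": 1, "parent": "load", "collection": "stores"},
]
# Made-up sample records in file order.
flat = [
    {"_type": "header", "file_type": "ASN"},
    {"_type": "load", "trailer_number": "TRL1"},
    {"_type": "store", "pool_location": "TXPOOL"},
    {"_type": "store", "pool_location": "CAPOOL"},
    {"_type": "load", "trailer_number": "TRL2"},
    {"_type": "store", "pool_location": "NYPOOL"},
]
tree = build_hierarchy(flat, STRUCTURE)
print([len(load["stores"]) for load in tree["loads"]])  # [2, 1]
```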
Complete Configuration Example
task: "StructuredFile/Parse@1"
name: parseShipmentManifest
inputs:
  fileData: "{{ workflow.input.manifestFile }}"
  parseMode: "structured"
  config:
    structured: true
    structure:
      - type: "header"
        level: 0
        key: "header"
        singleton: true
      - type: "load"
        level: 0
        key: "currentLoad"
        singleton: true
      - type: "store"
        level: 1
        parent: "load"
        collection: "stores"
      - type: "carton"
        level: 2
        parent: "store"
        collection: "cartons"
      - type: "carton_content"
        level: 3
        parent: "carton"
        collection: "items"
      - type: "store_totals"
        level: 2
        parent: "store"
        key: "totals"
      - type: "trailer_totals"
        level: 1
        parent: "load"
        key: "totals"
    records:
      # Header Record
      - id: "A"
        name: "header"
        description: "ASN file header"
        required: true
        maxOccurrences: 1
        fields:
          - name: "record_type"
            start: 1
            length: 1
            type: "string"
            defaultValue: "A"
            skip: true
          - name: "file_type"
            start: 2
            length: 3
            type: "string"
            validation:
              pattern: "ASN"
          - name: "batch_code"
            start: 5
            length: 4
            type: "string"
          - name: "carrier_code"
            start: 9
            length: 4
            type: "string"
            required: true
            transform: "uppercase"
            validation:
              pattern: "^[A-Z]{4}$"
              message: "Carrier code must be 4 uppercase letters"
          - name: "origin_facility"
            start: 13
            length: 15
            type: "string"
            trim: true
          - name: "shipper_reference"
            start: 28
            length: 10
            type: "string"
          - name: "master_bol"
            start: 38
            length: 30
            type: "string"
            trim: true
      # Load/Trailer Record
      - id: "T"
        name: "load"
        description: "Trailer/Load information"
        required: true
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "load_date"
            start: 2
            length: 8
            type: "date"
            inputFormat: "YYYYMMDD"
            outputFormat: "YYYY-MM-DD"
            required: true
          - name: "trailer_number"
            start: 14
            length: 12
            type: "string"
            trim: true
            required: true
          - name: "seal_number"
            start: 26
            length: 10
            type: "string"
            trim: true
          - name: "ship_date"
            start: 36
            length: 8
            type: "date"
            inputFormat: "YYYYMMDD"
            outputFormat: "YYYY-MM-DD"
      # Store/Destination Record
      - id: "B"
        name: "store"
        description: "Store destination information"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "carrier_code"
            start: 2
            length: 4
            type: "string"
            transform: "uppercase"
          - name: "pool_location"
            start: 6
            length: 9
            type: "string"
            trim: true
          - name: "invoice_number"
            start: 17
            length: 6
            type: "string"
          - name: "bol_number"
            start: 23
            length: 12
            type: "string"
            trim: true
      # Carton Record
      - id: "C"
        name: "carton"
        description: "Individual carton/package"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "tracking_number"
            start: 2
            length: 28
            type: "string"
            trim: true
            required: true
            validation:
              minLength: 10
              message: "Tracking number must be at least 10 characters"
          - name: "weight"
            start: 30
            length: 7
            type: "decimal"
            divisor: 100
            precision: 2
            defaultValue: 0
          - name: "is_signature_required"
            start: 44
            length: 1
            type: "boolean"
            trueValues: ["Y", "1"]
            falseValues: ["N", "0", " "]
            defaultValue: false
          - name: "declared_value"
            start: 68
            length: 8
            type: "decimal"
            divisor: 1000
            precision: 2
            currencySymbol: "$"
      # Carton Content Record
      - id: "P"
        name: "carton_content"
        description: "Items within a carton"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "style_number"
            start: 2
            length: 15
            type: "string"
            trim: true
          - name: "sku"
            start: 27
            length: 13
            type: "string"
            trim: true
            required: true
          - name: "quantity"
            start: 40
            length: 6
            type: "number"
            divisor: 100000
            decimals: 0
          - name: "color"
            start: 46
            length: 30
            type: "string"
            trim: true
          - name: "size"
            start: 76
            length: 6
            type: "string"
            trim: true
          - name: "retail_price"
            start: 82
            length: 9
            type: "decimal"
            divisor: 1000
            precision: 2
            currencySymbol: "$"
          - name: "item_type"
            start: 100
            length: 8
            type: "string"
            trim: true
          - name: "description"
            start: 108
            length: 30
            type: "string"
            trim: true
      # Store Totals Record
      - id: "D"
        name: "store_totals"
        description: "Store-level summary"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "total_units"
            start: 2
            length: 7
            type: "number"
            divisor: 100000
          - name: "total_weight"
            start: 9
            length: 9
            type: "decimal"
            divisor: 1000
            precision: 3
      # Trailer Totals Record
      - id: "E"
        name: "trailer_totals"
        description: "Trailer-level summary"
        fields:
          - name: "record_type"
            start: 1
            length: 1
            skip: true
          - name: "total_units"
            start: 2
            length: 7
            type: "number"
            divisor: 100000
          - name: "total_weight"
            start: 9
            length: 9
            type: "decimal"
            divisor: 1000
            precision: 3
    options:
      includeFillers: false # Don't include FILLER fields
      includeRecordType: false # Don't include record type in output
      strictValidation: true # Enforce all validation rules
      continueOnError: false # Stop on first error
outputs:
  - name: "manifest"
    mapping: "result"
  - name: "recordCounts"
    mapping: "recordTypes"
  - name: "hasErrors"
    mapping: "errors[0] != null"
Output Examples
Flat Mode Output
The result array below is abbreviated to the first three records; recordCount and recordTypes reflect all seven records in the file.
{
  "result": [
    {
      "_type": "header",
      "file_type": "ASN",
      "batch_code": "TBCH",
      "carrier_code": "USPS",
      "origin_facility": "WAREHOUSE123",
      "shipper_reference": "SHR123456",
      "master_bol": "MBL1234567890"
    },
    {
      "_type": "load",
      "load_date": "2024-01-15",
      "trailer_number": "TRL123456",
      "seal_number": "SEAL789",
      "ship_date": "2024-01-15"
    },
    {
      "_type": "store",
      "carrier_code": "USPS",
      "pool_location": "TXPOOL",
      "invoice_number": "INV123",
      "bol_number": "BOL12345678"
    }
  ],
  "recordCount": 7,
  "recordTypes": {
    "header": 1,
    "load": 1,
    "store": 1,
    "carton": 1,
    "carton_content": 1,
    "store_totals": 1,
    "trailer_totals": 1
  },
  "errors": []
}
Structured Mode Output
{
  "result": {
    "header": {
      "_type": "header",
      "file_type": "ASN",
      "batch_code": "TBCH",
      "carrier_code": "USPS",
      "origin_facility": "WAREHOUSE123",
      "shipper_reference": "SHR123456",
      "master_bol": "MBL1234567890"
    },
    "currentLoad": {
      "_type": "load",
      "load_date": "2024-01-15",
      "trailer_number": "TRL123456",
      "seal_number": "SEAL789",
      "ship_date": "2024-01-15",
      "stores": [
        {
          "_type": "store",
          "carrier_code": "USPS",
          "pool_location": "TXPOOL",
          "invoice_number": "INV123",
          "bol_number": "BOL12345678",
          "cartons": [
            {
              "_type": "carton",
              "tracking_number": "0012345678901234567890123456",
              "weight": 12.34,
              "is_signature_required": true,
              "declared_value": 12.35,
              "items": [
                {
                  "_type": "carton_content",
                  "style_number": "STYLE123456789",
                  "sku": "MSK1234567890",
                  "quantity": 1,
                  "color": "BLACK",
                  "size": "M",
                  "retail_price": 12.35,
                  "item_type": "CTNBOX",
                  "description": "Men's T-Shirt Black"
                }
              ]
            }
          ],
          "totals": {
            "_type": "store_totals",
            "total_units": 1,
            "total_weight": 9.876
          }
        }
      ],
      "totals": {
        "_type": "trailer_totals",
        "total_units": 10,
        "total_weight": 98.765
      }
    }
  },
  "recordCount": 7,
  "recordTypes": {
    "header": 1,
    "load": 1,
    "store": 1,
    "carton": 1,
    "carton_content": 1,
    "store_totals": 1,
    "trailer_totals": 1
  },
  "errors": []
}
Error Handling
The parser returns detailed error information when issues are encountered:
{
  "errors": [
    {
      "line": 3,
      "record": "store",
      "field": "carrier_code",
      "message": "Carrier code must be 4 uppercase letters",
      "value": "ups",
      "position": {
        "start": 2,
        "end": 5
      }
    },
    {
      "line": 5,
      "record": "carton",
      "field": "tracking_number",
      "message": "Required field is empty",
      "position": {
        "start": 2,
        "end": 29
      }
    }
  ]
}
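A downstream step will often want these entries as readable log lines. The Python sketch below formats error objects of the shape shown above; format_error is a hypothetical helper, and it assumes that value and position may be absent while line, record, field, and message are always present.

```python
def format_error(error: dict) -> str:
    """Render one parser error entry as a single log line."""
    location = f"line {error['line']}, {error['record']}.{error['field']}"
    position = error.get("position")
    if position:
        location += f" (cols {position['start']}-{position['end']})"
    detail = error["message"]
    if "value" in error:
        detail += f" (got {error['value']!r})"
    return f"{location}: {detail}"


# The two entries from the example above, written as Python dicts.
errors = [
    {"line": 3, "record": "store", "field": "carrier_code",
     "message": "Carrier code must be 4 uppercase letters",
     "value": "ups", "position": {"start": 2, "end": 5}},
    {"line": 5, "record": "carton", "field": "tracking_number",
     "message": "Required field is empty",
     "position": {"start": 2, "end": 29}},
]
for entry in errors:
    print(format_error(entry))
```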
Advanced Configuration Options
Conditional Fields
Define fields that are parsed only when a condition on another field holds:
fields:
  - name: "hazmat_code"
    start: 100
    length: 4
    type: "string"
    condition:
      field: "is_hazmat"
      operator: "equals"
      value: true
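A condition like this can be read as a check against values already parsed from the same record. The Python sketch below illustrates that reading. The helper should_parse is hypothetical, and only the "equals" operator is handled because it is the only one shown in this document.

```python
def should_parse(field: dict, parsed_so_far: dict) -> bool:
    """Return True when the field has no condition or its condition holds."""
    condition = field.get("condition")
    if condition is None:
        return True
    if condition["operator"] != "equals":
        # Other operators are not documented here, so the sketch rejects them.
        raise ValueError(f"unsupported operator: {condition['operator']}")
    return parsed_so_far.get(condition["field"]) == condition["value"]


hazmat_code = {
    "name": "hazmat_code", "start": 100, "length": 4, "type": "string",
    "condition": {"field": "is_hazmat", "operator": "equals", "value": True},
}
print(should_parse(hazmat_code, {"is_hazmat": True}))   # True
print(should_parse(hazmat_code, {"is_hazmat": False}))  # False
```

This reading implies that is_hazmat must be parsed before hazmat_code, which is an assumption about field ordering rather than a documented guarantee.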
Computed Fields
Add fields calculated from other fields:
fields:
  - name: "total_value"
    computed: true
    expression: "quantity * retail_price"
    type: "decimal"
    precision: 2
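The expression syntax accepted by the task is not specified here. As an illustration of the idea only, the Python sketch below evaluates simple arithmetic over already-parsed field values using the standard ast module instead of eval. The evaluate function and its supported operators are assumptions.

```python
import ast
import operator
from decimal import Decimal

# Assumed operator set; the task's real expression grammar may differ.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str, record: dict):
    """Evaluate +, -, *, / over field names and numeric literals."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.Name):
            return record[node.id]  # look the field up in the parsed record
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return visit(ast.parse(expression, mode="eval"))


record = {"quantity": 3, "retail_price": Decimal("12.35")}
print(evaluate("quantity * retail_price", record))  # 37.05
```

Walking the syntax tree keeps function calls and attribute access out of reach, which matters if expressions come from configuration files.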
Field Groups
Group related fields for cleaner output:
fields:
  - name: "dimensions"
    type: "group"
    fields:
      - name: "length"
        start: 50
        length: 5
        type: "number"
        divisor: 10
      - name: "width"
        start: 55
        length: 5
        type: "number"
        divisor: 10
      - name: "height"
        start: 60
        length: 5
        type: "number"
        divisor: 10
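A group presumably produces a nested object in the output. The Python sketch below shows that shape for the dimensions group, using a hypothetical recursive helper parse_fields and a made-up line whose columns 50 to 64 hold the three values.

```python
def parse_fields(line: str, fields: list) -> dict:
    """Parse fields from one line, nesting any field whose type is "group"."""
    output = {}
    for field in fields:
        if field.get("type") == "group":
            output[field["name"]] = parse_fields(line, field["fields"])
            continue
        begin = field["start"] - 1  # 1-based start to 0-based index
        raw = line[begin : begin + field["length"]]
        if field.get("type") == "number":
            output[field["name"]] = int(raw) / field.get("divisor", 1)
        else:
            output[field["name"]] = raw.strip()
    return output


FIELDS = [
    {"name": "dimensions", "type": "group", "fields": [
        {"name": "length", "start": 50, "length": 5, "type": "number", "divisor": 10},
        {"name": "width", "start": 55, "length": 5, "type": "number", "divisor": 10},
        {"name": "height", "start": 60, "length": 5, "type": "number", "divisor": 10},
    ]},
]
# Made-up line: 49 filler characters, then three 5-digit values.
line = " " * 49 + "00120" + "00085" + "00040"
print(parse_fields(line, FIELDS))
# {'dimensions': {'length': 12.0, 'width': 8.5, 'height': 4.0}}
```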
Custom Transformations
Apply custom transformations, such as a lookup map that translates coded values:
fields:
  - name: "status_code"
    start: 80
    length: 2
    type: "string"
    transform:
      type: "map"
      mapping:
        "01": "pending"
        "02": "in_transit"
        "03": "delivered"
        "04": "exception"
      default: "unknown"
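In effect, a map transform is a dictionary lookup with a fallback. A minimal Python equivalent, with the hypothetical helper name apply_map, could look like this:

```python
STATUS_MAPPING = {
    "01": "pending",
    "02": "in_transit",
    "03": "delivered",
    "04": "exception",
}


def apply_map(raw: str, mapping: dict, default=None):
    """Translate a raw code through the mapping, falling back to the default."""
    return mapping.get(raw, default)


print(apply_map("02", STATUS_MAPPING, default="unknown"))  # in_transit
print(apply_map("99", STATUS_MAPPING, default="unknown"))  # unknown
```

The keys are quoted strings, so a raw value of "01" matches but the number 1 would not.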
Best Practices
- Start Position Accuracy: Always use 1-based positioning as specified in file documentation
- Field Length Validation: Ensure field lengths match exactly to avoid data bleeding
- Type Safety: Use appropriate field types for automatic conversion and validation
- Error Handling: Implement proper error handling for malformed records
- Testing: Test with various file samples including edge cases
- Performance: For large files, consider streaming parse options
- Documentation: Document custom formats and maintain sample files
Common Issues and Solutions
Issue: Overlapping Fields
# Wrong - Fields overlap
- name: "field1"
  start: 10
  length: 5 # Ends at position 14
- name: "field2"
  start: 14 # Starts at position 14 - overlap!
  length: 3
# Correct
- name: "field1"
  start: 10
  length: 5 # Ends at position 14
- name: "field2"
  start: 15 # Starts at position 15
  length: 3
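Overlaps like this are easy to catch before a configuration is deployed. The Python sketch below is a hypothetical pre-flight check, not part of the task: it sorts fields by start position and reports any field that begins at or before the furthest end position seen so far.

```python
def find_overlaps(fields: list) -> list:
    """Return (earlier, later) name pairs for fields whose positions overlap."""
    ordered = sorted(fields, key=lambda f: f["start"])
    overlaps = []
    furthest = None  # (name, end) of the field reaching furthest right so far
    for field in ordered:
        end = field["start"] + field["length"] - 1  # inclusive, 1-based
        if furthest is not None and field["start"] <= furthest[1]:
            overlaps.append((furthest[0], field["name"]))
        if furthest is None or end > furthest[1]:
            furthest = (field["name"], end)
    return overlaps


wrong = [
    {"name": "field1", "start": 10, "length": 5},
    {"name": "field2", "start": 14, "length": 3},
]
correct = [
    {"name": "field1", "start": 10, "length": 5},
    {"name": "field2", "start": 15, "length": 3},
]
print(find_overlaps(wrong))    # [('field1', 'field2')]
print(find_overlaps(correct))  # []
```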
Issue: Incorrect Divisor for Implied Decimals
# File contains: "000123" representing $1.23
# Wrong
- name: "amount"
  type: "decimal"
  divisor: 10 # Results in 12.3
# Correct
- name: "amount"
  type: "decimal"
  divisor: 100 # Results in 1.23
Issue: Date Parsing Failures
# Tolerate empty or invalid dates instead of failing the record
- name: "ship_date"
  start: 10
  length: 8
  type: "date"
  inputFormat: "YYYYMMDD"
  outputFormat: "YYYY-MM-DD"
  defaultValue: null # Return null for invalid dates
  validation:
    allowEmpty: true # Allow empty date fields
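The intended behaviour of this configuration can be sketched in Python as follows. The helper parse_date is hypothetical, and it assumes that the YYYYMMDD and YYYY-MM-DD patterns correspond to the strptime codes %Y%m%d and %Y-%m-%d.

```python
from datetime import datetime


def parse_date(raw: str, default=None):
    """Convert YYYYMMDD to YYYY-MM-DD, returning the default for blank or invalid input."""
    text = raw.strip()
    if not text:
        return default  # comparable to allowEmpty: true
    try:
        return datetime.strptime(text, "%Y%m%d").strftime("%Y-%m-%d")
    except ValueError:
        return default  # comparable to defaultValue: null


print(parse_date("20240115"))  # 2024-01-15
print(parse_date("        "))  # None
print(parse_date("20240230"))  # None (February 30 does not exist)
```

Zero-filled dates such as "00000000" are common in legacy files and also fall through to the default, because year 0 is not a valid datetime year.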