Zero Variance: Proving COBOL-to-Java Semantic Equivalence with a Live Mainframe Emulator on AWS

DEV Community [Unofficial] June 16, 2026

By Banu Parasuraman

Distinguished Engineer | Account CTO | Mphasis

"The mainframe isn't going away. But the teams who understand it are. The window to modernize safely — before institutional knowledge walks out the door — is closing."

Introduction

Mainframe modernization is one of the most consequential — and most feared — programs in enterprise IT. Billions of dollars of financial transactions flow daily through COBOL batch jobs that have been running, largely unchanged, for three to four decades. The fear is rational: these systems work. The risk of breaking them during migration is real, and the consequences in financial services are severe.

The standard industry answer to this risk is the parallel run: run the old system and the new system simultaneously, compare their outputs, and only cut over when you can prove the new system produces identical results. In theory, elegant. In practice, parallel runs are notoriously difficult to instrument, especially when the legacy system is a mainframe batch job with VSAM files, JES2 job scheduling, and OS/VS COBOL business logic that predates most of its current operators.

This article documents a working parallel run proof-of-concept I built to address exactly this challenge. The goal was to answer a single engineering question with evidence, not architecture diagrams:

Can we prove, at the record level, that a Java Spring Batch job produces identical NAV calculations to an OS/VS COBOL batch job running on a real S/370 mainframe — before we touch a single line of production code?

The answer is yes. And it runs on AWS EC2.

The Technical Challenge

The legacy system in scope is a COBOL batch orchestration layer that drives a commercial financial platform. The batch processes position data — fund holdings, security quantities, and net asset values — through a series of computational waves, writing intermediate state to VSAM KSDS datasets and committing final results to an Oracle database.

The modernization target is Java 17 with Spring Batch 5.x, reading from the same Oracle input, computing the same NAV calculations, and writing to a parallel Oracle schema. The reconciliation question is simple: does COBOL_NAV == JAVA_NAV for every position, for every fund, for every batch run?

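To answer that question credibly, you need real COBOL compiled and executed as S/370 machine code under a real MVS, not a COBOL interpreter or a translation layer. That's where SDL Hercules comes in: it emulates the hardware, so the operating system, the compiler, and the load modules are the genuine articles.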

The Architecture

The POC runs on three AWS components connected by a private VPC:

┌─────────────────────────────────────┐
│  LEGACY LANE  (EC2 m5.xlarge)       │
│                                     │
│  SDL Hyperion 4.9.1                 │
│  MVS 3.8j / JES2                    │
│  OS/VS COBOL (IKFCBL00)             │
│  VSAM KSDS position store           │
│  Card reader socket :3505           │
└──────────────────┬──────────────────┘
                   │ Oracle write
                   ▼
┌─────────────────────────────────────┐     ┌─────────────────────────┐
│  MODERNIZATION LANE  (EC2 m5.large) │     │  Amazon RDS 19c         │
│                                     │     │                         │
│  Java 17 / Spring Boot 3.2          │────►│  cobol_pos.positions    │
│  Spring Batch 5.x                   │     │  java_pos.positions     │
│  NavCalculationProcessor            │     │  recon.parallel_run_diff│
│  OraclePositionWriter               │     │  admin.position_seed    │
└─────────────────────────────────────┘     └─────────────────────────┘

The key design principle: Java never touches VSAM. VSAM is internal to the COBOL batch — it is the scratchpad between waves. Java reads from the same Oracle input that COBOL reads, computes NAV independently, and writes to a separate Oracle schema. The reconciliation view diffs the two schemas. When variance is zero across a full batch window, the cutover flag flips.

This is architecturally honest. In production, COBOL writes to VSAM as intermediate working storage and commits final results to Oracle. Java replaces that entire pipeline — VSAM disappears when the last COBOL job retires, replaced by Spring Batch's in-memory step context and database row locking.

Building the Mainframe Emulator Layer

Why SDL Hyperion, Not Docker

The first architectural decision was how to run the mainframe emulator. Docker images exist for Hercules + MVS TK4-, and they work. But for this use case — demonstrating parallel run credibility to senior engineering stakeholders — a bare-metal build is more defensible. It shows the emulator is running the actual hardware instruction set, not a pre-packaged black box.

SDL Hyperion 4.9.1 was built from source on Ubuntu 22.04 with the SoftFloat-3a external package for IEEE 754 floating point emulation. The build sequence:

# Clone SDL Hyperion
git clone https://github.com/SDL-Hercules-390/hyperion.git
cd hyperion

# Build external packages (crypto, decNumber, SoftFloat, telnet)
# Critical: use absolute paths in extpkgs.sh.ini; tilde ~ does not expand
./extpkgs.sh c d s t

# Configure and build
./configure --enable-extpkgs=/home/ubuntu/mainframe/extpkgs/install
make -j$(nproc)
sudo make install
sudo ldconfig

Build time on an m5.xlarge is approximately 20 minutes. The result is Hercules 4.9.1 with full z/Architecture support, CCKD DASD emulation, and TN3270 terminal support.

MVS 3.8j TK4- as the Guest OS

MVS 3.8j TK4- Update 08 is the de facto standard free MVS distribution, maintained by the Tur(n)key project. It includes JES2, RAKF (a RACF-like security system), TSO with ISPF-like full-screen tools such as RFE and RPF, the OS/VS COBOL compiler (IKFCBL00), IDCAMS, and a full set of system utilities.

The primary mirror (ETH Zurich) was unavailable at time of build; the distribution was cloned from a GitHub mirror. The bundled Hercules binary was removed and replaced with our SDL 4.9.1 build.

Auto-IPL as a Systemd Service

For a POC that needs to survive EC2 reboots, the emulator must start automatically. The solution is a systemd service with an auto-IPL script:

# /etc/systemd/system/hercules-mvs.service
[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/mainframe/tk4/tk4-_v1.00_current
Environment=MODE=CONSOLE
Environment=RDRPORT=3505
ExecStart=/usr/local/bin/hercules -f conf/tk4-.cnf -r scripts/auto_ipl.rc

The auto_ipl.rc script pauses for device initialization, IPLs from device address 148 (ipl 148), and responds to the MVS system parameter prompt (/). MVS boots in approximately 20 seconds post-EC2-start, JES2 initializes, and the card reader socket on port 3505 begins accepting JCL.
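Because the reader only becomes useful some seconds after boot, a submission helper benefits from a retry loop. The following is a minimal sketch, not part of the original POC: the function name submit_deck and its retry defaults are assumptions, and it uses only the Python standard library. Note that an open port may not by itself prove that JES2 has finished initializing, so job output should still be checked afterwards.

```python
import socket
import time


def submit_deck(deck: bytes, host: str = "localhost", port: int = 3505,
                retries: int = 10, delay: float = 3.0) -> int:
    """Send a card deck to the socket reader, retrying while it is unreachable.

    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # Closing the connection marks the end of the deck.
            with socket.create_connection((host, port), timeout=5) as conn:
                conn.sendall(deck)
            return attempt
        except OSError as exc:  # refused or timed out: emulator not ready yet
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise ConnectionError(f"reader {host}:{port} not reachable") from last_error
```

A typical call would read the generated JCL file in binary mode and pass its contents as deck.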

Lesson learned: MVS 3.8j's IPL device address is 148, not 150 as some documentation suggests. The COBUCG procedure lives in SYS2.PROCLIB (on PUB000), not SYS1.PROCLIB (on MVSRES) — a detail that matters when patching compiler parameters.

The COBOL Batch Program

OS/VS COBOL Dialect Constraints

OS/VS COBOL (compiler IKFCBL00) predates COBOL 85 by several years. Several features that modern COBOL programmers take for granted are not available:

Feature                                   Status in IKFCBL00
----------------------------------------  ------------------------------------
ORGANIZATION IS INDEXED (VSAM)            Not supported; causes S0C4
PERFORM ... END-PERFORM inline            Not supported; use paragraph PERFORM
Substring reference modification (1:2)    Not supported; use REDEFINES
ASSIGN TO ddname for sequential           Must use UT-S-ddname format

The VSAM limitation is the most significant. The standard expectation, that COBOL writes directly to a VSAM KSDS using ORGANIZATION IS INDEXED, does not work in this compiler version. The correct pattern for this dialect is the two-step approach:

  1. COBOL writes fixed-length records to a sequential temp dataset (&&POS)
  2. IDCAMS REPRO loads &&POS into the VSAM KSDS

This is not a POC workaround — it is architecturally accurate. In production mainframe shops, COBOL batch jobs routinely write to intermediate sequential datasets that are then processed by IDCAMS or sort utilities. The two-step pattern reflects real-world practice.

The COBUCG Proc BUF Parameter

The most time-consuming debugging issue was the IKF0015I-C BUF PARM TOO SMALL FOR DD-CARD BLKSIZES compiler abandonment error. The root cause: the COBUCG procedure in SYS2.PROCLIB had a BUF=1024K parameter that was too small for the SYSUT work file block sizes, and the SYSUT DDs used SPACE=(460,...) — a value the compiler interprets as blocksize.

The fix required three changes to the COBUCG proc via IEBUPDTE:

  1. Increase BUF=1024K to BUF=4096K
  2. Change SYSUT SPACE from (460,...) to (800,...)
  3. Change SYSLIN SPACE from (80,...) to (800,...)

The calling JCL must also specify BUF=4096 in its PARM.COB override, because a PARM.COB override replaces the proc step's entire PARM string rather than appending to it.

The Final JCL Pattern

//IVCOBOL  JOB (ACCT),'POC JOB',CLASS=A,MSGCLASS=A,
//         USER=HERC01,PASSWORD=CUL8TR
//STEP1    EXEC COBUCG,PARM.COB='SUPMAP,NOSEQ,NOTRUNC,BUF=4096'
//COB.SYSLIB DD DSN=SYS1.COBLIB,DISP=SHR
//COB.SYSIN DD *
       ... COBOL source ...
       SELECT POS-FILE ASSIGN TO UT-S-POSOUT
       ORGANIZATION IS SEQUENTIAL.
       ...
       WRITE POS-RECORD
/*
//GO.POSOUT DD DSN=&&POS,DISP=(NEW,PASS),UNIT=SYSDA,
//             SPACE=(TRK,(5,2)),DCB=(RECFM=FB,LRECL=85,BLKSIZE=850)
//GO.SYSOUT DD SYSOUT=A
//STEP2    EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD SYSOUT=A
//SEQIN    DD DSN=&&POS,DISP=(OLD,DELETE)
//VSAMOUT  DD DSN=HERC01.IVPOS,DISP=SHR
//SYSIN    DD *
  REPRO INFILE(SEQIN) OUTFILE(VSAMOUT)
/*
//STEP3    EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD SYSOUT=A
//SYSIN    DD *
  LISTCAT ENTRIES(HERC01.IVPOS) ALL
/*

Final job result:

STEP1  COB   IKFCBL00  RC= 0000
STEP1  GO    LOADER    RC= 0000
STEP2        IDCAMS    RC= 0000
STEP3        IDCAMS    RC= 0000
IDC0005I NUMBER OF RECORDS PROCESSED WAS 6
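Checking these return codes by eye does not scale to repeated batch windows. The sketch below parses a step summary shaped like the excerpt above and reports whether every step ended with RC 0. The function names are hypothetical, and the line format is assumed to match the excerpt; real JES2 job logs may need a different pattern.

```python
import re

# Matches lines such as "STEP1  COB   IKFCBL00  RC= 0000".
RC_PATTERN = re.compile(r"^(?P<label>\S.*?)\s+RC=\s*(?P<rc>\d{4})\s*$")


def step_return_codes(job_output: str) -> dict[str, int]:
    """Map each 'step [procstep] program' label to its return code."""
    codes = {}
    for line in job_output.splitlines():
        match = RC_PATTERN.match(line.strip())
        if match:
            label = " ".join(match.group("label").split())
            codes[label] = int(match.group("rc"))
    return codes


def job_succeeded(job_output: str, max_rc: int = 0) -> bool:
    """True when at least one step was found and none exceeded max_rc."""
    codes = step_return_codes(job_output)
    return bool(codes) and all(rc <= max_rc for rc in codes.values())
```

Requiring at least one parsed step guards against treating an empty or truncated log as a success.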

VSAM KSDS Configuration

The VSAM dataset was defined with the UNIQUE parameter — critical in MVS 3.8j when VSAM suballocation pools are exhausted or uncatalogued:

  DEFINE CLUSTER (NAME(HERC01.IVPOS) -
      INDEXED -
      KEYS(8 0) -
      RECORDSIZE(85 85) -
      VOLUMES(PUB000) -
      CYLINDERS(1 1) -
      UNIQUE) -
    DATA (NAME(HERC01.IVPOS.DATA)) -
    INDEX (NAME(HERC01.IVPOS.INDEX)) -
    CATALOG(SYS1.UCAT.TSO)

UNIQUE allocates space directly on the DASD volume without requiring a pre-existing VSAM space entry — bypassing the IDC3025I INSUFFICIENT SUBALLOCATION DATA SPACE error that occurs when the catalog's freespace records have been consumed by previous delete/redefine cycles.
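The KEYS(8 0) and RECORDSIZE(85 85) values imply two checks worth running before a deck ever reaches REPRO: every record is exactly 85 bytes, and keys arrive in strictly ascending order, which an initial load into an empty KSDS typically requires. The sketch below is an illustration under those assumptions. The function name is hypothetical, and it compares keys in EBCDIC (code page 037 is assumed) because the mainframe collating sequence differs from ASCII.

```python
KEY_OFFSET = 0       # from KEYS(8 0)
KEY_LENGTH = 8       # from KEYS(8 0)
RECORD_LENGTH = 85   # from RECORDSIZE(85 85)


def check_repro_input(records: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the records look loadable."""
    problems = []
    previous_key = None
    for number, record in enumerate(records, start=1):
        if len(record) != RECORD_LENGTH:
            problems.append(
                f"record {number}: length {len(record)}, expected {RECORD_LENGTH}")
            continue
        # Compare keys as EBCDIC bytes: letters sort before digits there.
        key = record[KEY_OFFSET:KEY_OFFSET + KEY_LENGTH].encode("cp037")
        if previous_key is not None and key <= previous_key:
            problems.append(f"record {number}: key out of sequence or duplicate")
        previous_key = key
    return problems
```

A pair of keys such as "10000000" followed by "A0000000" passes an ASCII sort check but fails this one, since 'A' collates below '1' in EBCDIC.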

The Java Spring Batch Layer

Dual Datasource Configuration

Spring Boot's datasource autoconfiguration assumes a single primary datasource. With both PostgreSQL (for Spring Batch metadata) and Oracle (for business data), the configuration requires explicit @Primary annotation:

@Configuration
public class PrimaryDataSourceConfig {
    @Bean @Primary
    @ConfigurationProperties(prefix = "spring.datasource")
    public DataSourceProperties primaryDataSourceProperties() {
        return new DataSourceProperties();
    }

    @Bean @Primary
    public DataSource dataSource() {
        return primaryDataSourceProperties()
            .initializeDataSourceBuilder().build();
    }
}

@Configuration
public class OracleDataSourceConfig {
    @Bean
    @ConfigurationProperties(prefix = "oracle.datasource")
    public DataSourceProperties oracleDataSourceProperties() {
        return new DataSourceProperties();
    }

    @Bean(name = "oracleDataSource")
    public DataSource oracleDataSource() {
        return oracleDataSourceProperties()
            .initializeDataSourceBuilder().build();
    }
}

Without @Primary on the PostgreSQL datasource, Spring Batch's schema initializer picks up the Oracle datasource and fails with Failed to determine DatabaseDriver.

The NAV Calculation Processor

The core calculation mirrors COBOL's COMPUTE MARKET-VALUE = QUANTITY * NAV:

@Component
public class NavCalculationProcessor implements ItemProcessor<Position, Position> {
    @Override
    public Position process(Position position) throws Exception {
        BigDecimal marketValue = position.getQuantity()
            .multiply(position.getNav())
            .setScale(2, RoundingMode.HALF_UP);
        position.setMarketValue(marketValue);
        position.setSource("JAVA");
        return position;
    }
}

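The use of BigDecimal with an explicit RoundingMode is deliberate. COBOL's fixed-point decimal arithmetic is exact, but it rounds half away from zero (Java's HALF_UP) only when the statement carries the ROUNDED phrase; a COMPUTE without ROUNDED truncates the excess digits, which corresponds to RoundingMode.DOWN. The Java rounding mode therefore has to match the COBOL source statement by statement. Using double or float in Java introduces binary floating-point representation errors that will produce spurious mismatches in the reconciliation view, a subtle but critical correctness issue.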
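The effect of the rounding mode, and of binary floating point, is easy to see in isolation. The sketch below uses Python's decimal module as a stand-in for BigDecimal. The input values are illustrative and not taken from the POC data, and market_value is a hypothetical helper. Which mode is correct for a given calculation depends on whether the corresponding COBOL statement specifies ROUNDED.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

CENTS = Decimal("0.01")


def market_value(quantity: Decimal, nav: Decimal,
                 rounding: str = ROUND_HALF_UP) -> Decimal:
    """Multiply exactly, then reduce to two decimal places with the given mode."""
    return (quantity * nav).quantize(CENTS, rounding=rounding)


quantity, nav = Decimal("3"), Decimal("33.335")      # exact product: 100.005
print(market_value(quantity, nav))                   # 100.01 (half-up)
print(market_value(quantity, nav, ROUND_DOWN))       # 100.00 (truncation)

# Binary floating point cannot represent 2.675 exactly, so it rounds "wrong":
print(round(2.675, 2))                                         # 2.67
print(Decimal("2.675").quantize(CENTS, rounding=ROUND_HALF_UP))  # 2.68
```

A one-cent difference like the first pair is exactly what the reconciliation view would report as a MISMATCH.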

Composite Writer — Dual Output

The Spring Batch step writes each chunk to both Oracle (java_pos.positions) and PostgreSQL using a CompositeItemWriter, which calls its delegates in order:

@Bean
public Step positionStep(JobRepository jobRepository,
                         PlatformTransactionManager transactionManager) {
    CompositeItemWriter<Position> writer = new CompositeItemWriter<>();
    writer.setDelegates(List.of(oracleWriter, rdsWriter));

    return new StepBuilder("positionStep", jobRepository)
            .<Position, Position>chunk(10, transactionManager)
            .reader(oracleItemReader)     // reads admin.position_seed
            .processor(processor)         // computes QTY × NAV
            .writer(writer)               // writes to Oracle + PostgreSQL
            .build();
}

Oracle receives the primary business output. PostgreSQL stores a copy for audit and for the reconciliation dashboard.

The Oracle Reconciliation Schema

The parallel run schema separates COBOL and Java outputs into distinct schemas on the same Oracle RDS instance:

-- COBOL lane
CREATE TABLE cobol_pos.positions (
    fund_id       VARCHAR2(10),
    security_id   VARCHAR2(20),
    quantity      NUMBER(18,6),
    nav           NUMBER(18,6),
    market_value  NUMBER(18,2),
    as_of_date    DATE,
    source        VARCHAR2(10) DEFAULT 'COBOL',
    CONSTRAINT pk_cobol_pos PRIMARY KEY (fund_id, security_id, as_of_date)
);

-- Java lane
CREATE TABLE java_pos.positions (
    ... identical structure, source DEFAULT 'JAVA' ...
);

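-- Reconciliation view
-- FULL OUTER JOIN keeps positions present in only one lane; their NULL
-- variance falls through to 'MISMATCH' instead of dropping out of the view.
CREATE OR REPLACE VIEW recon.parallel_run_diff AS
SELECT
    COALESCE(c.fund_id, j.fund_id)              AS fund_id,
    COALESCE(c.security_id, j.security_id)      AS security_id,
    COALESCE(c.as_of_date, j.as_of_date)        AS as_of_date,
    c.market_value                              AS cobol_mv,
    j.market_value                              AS java_mv,
    ABS(c.market_value - j.market_value)        AS variance,
    CASE WHEN ABS(c.market_value - j.market_value) < 0.01
         THEN 'MATCH' ELSE 'MISMATCH' END       AS status
FROM cobol_pos.positions c
FULL OUTER JOIN java_pos.positions j
    ON c.fund_id = j.fund_id
    AND c.security_id = j.security_id
    AND c.as_of_date = j.as_of_date;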

The reconciliation view is the sign-off artifact. When this view returns zero rows with status = 'MISMATCH' across a full batch window, the cutover decision can be made with engineering confidence.
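The comparison logic can also be exercised outside the database, for example in a unit test of the harness. The sketch below is an assumption-laden illustration rather than the POC's code: reconcile is a hypothetical function, the two lanes are represented as dictionaries keyed by (fund_id, security_id, as_of_date), and a position that exists in only one lane is reported as a MISMATCH so that a missing row cannot pass unnoticed.

```python
from decimal import Decimal

Key = tuple[str, str, str]  # (fund_id, security_id, as_of_date)


def reconcile(cobol: dict[Key, Decimal], java: dict[Key, Decimal],
              tolerance: Decimal = Decimal("0.01")) -> list[dict]:
    """Compare market values per position across both lanes."""
    rows = []
    for key in sorted(cobol.keys() | java.keys()):
        cobol_mv, java_mv = cobol.get(key), java.get(key)
        if cobol_mv is None or java_mv is None:
            # Present in one lane only: never a match.
            variance, status = None, "MISMATCH"
        else:
            variance = abs(cobol_mv - java_mv)
            status = "MATCH" if variance < tolerance else "MISMATCH"
        rows.append({"key": key, "cobol_mv": cobol_mv, "java_mv": java_mv,
                     "variance": variance, "status": status})
    return rows


def mismatch_count(rows: list[dict]) -> int:
    """Number of rows that block sign-off."""
    return sum(1 for row in rows if row["status"] == "MISMATCH")
```

The sign-off condition is then simply mismatch_count(rows) == 0 for every batch window in scope.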

The Result

SELECT fund_id, security_id, cobol_mv, java_mv, variance, status
FROM recon.parallel_run_diff
ORDER BY fund_id, security_id;



FUND_ID   SECURITY_ID   COBOL_MV    JAVA_MV     VARIANCE   STATUS
--------- ------------- ----------  ----------  ----------  --------
FUND01    AAPL          146625      146625      0           MATCH
FUND01    IBM           185250      185250      0           MATCH
FUND01    MSFT          210050      210050      0           MATCH
FUND02    BAC           63450       63450       0           MATCH
FUND02    GS            93000       93000       0           MATCH
FUND02    JPM           64725       64725       0           MATCH

6 rows selected — VARIANCE = 0 — STATUS = MATCH

Zero variance. Six positions. COBOL and Java produce identical results.

Key Engineering Lessons

1. VSAM Is a Scratchpad, Not a Database

The most important architectural clarification for modernization teams: VSAM is the batch processing scratchpad, not the system of record. COBOL reads VSAM during a batch run and discards it afterward. Java does not need to read VSAM — it needs to produce the same Oracle output that COBOL produces after it has processed VSAM.

The parallel run proves semantic equivalence at the Oracle layer. VSAM disappears when the last COBOL job retires, replaced by Spring Batch's step-scoped ExecutionContext for intra-step state and database row locking for the concurrency control that VSAM lock files previously provided.

2. BigDecimal Is Non-Negotiable in Financial Calculations

Every financial calculation in the Java lane uses BigDecimal with explicit RoundingMode. COBOL's packed decimal arithmetic is deterministic: it truncates by default and rounds half away from zero when ROUNDED is specified. Binary floating-point types in Java cannot represent most decimal fractions exactly, so they will produce false mismatches in the reconciliation view, eroding confidence in the comparison.

An ArchUnit rule enforcing this constraint across the codebase is a practical investment:

@ArchTest
static final ArchRule noDoubleInFinancialCalculations =
    noFields()
        .that().areDeclaredInClassesThat()
            .resideInAPackage("..processor..")
        .should().haveRawType(double.class)
        .orShould().haveRawType(float.class);

3. The Cutover Gate Is a Feature Flag, Not a Flag Day

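The parallel run pattern is most powerful when cutover is decoupled from deployment. An AWS AppConfig property (modernization.cutover.enabled) controls which lane is the system of record. When the reconciliation view shows zero variance across a defined number of consecutive batch windows, the flag flips: no weekend outage and no code deployment, and rollback is just the flag flipped back. Because @ConditionalOnProperty is evaluated when the Spring application context starts, the new value takes effect at the next job launch rather than mid-run.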

@ConditionalOnProperty(name = "modernization.cutover.enabled",
                       havingValue = "true")
@Component
public class JavaPositionWriter implements ItemWriter<Position> { ... }

@ConditionalOnProperty(name = "modernization.cutover.enabled",
                       havingValue = "false", matchIfMissing = true)
@Component
public class CobolBridgeWriter implements ItemWriter<Position> { ... }
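The decision to flip the flag can itself be automated. The following is a minimal sketch of the "zero variance across N consecutive batch windows" rule, assuming the harness records one mismatch count per batch window in chronological order. The function name and the input shape are hypothetical.

```python
def cutover_ready(mismatch_counts: list[int], required_clean_runs: int) -> bool:
    """True when the most recent `required_clean_runs` windows had zero mismatches.

    mismatch_counts holds one entry per batch window, oldest first.
    """
    if required_clean_runs <= 0:
        raise ValueError("required_clean_runs must be positive")
    if len(mismatch_counts) < required_clean_runs:
        return False  # not enough history to decide
    recent = mismatch_counts[-required_clean_runs:]
    return all(count == 0 for count in recent)
```

Looking only at the trailing windows means a single mismatch resets the count, which is the conservative reading of "consecutive".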

4. Mainframe Emulators Are Underused in Modernization Programs

SDL Hercules running MVS 3.8j is not a toy. It executes real S/370 machine instructions, runs JES2 batch scheduling, supports real VSAM I/O, and compiles real OS/VS COBOL. For modernization programs where production mainframe access is restricted, a Hercules-based POC environment provides a credible, observable, reproducible platform for proving migration approaches before production engagement.

The cost: one m5.xlarge EC2 instance at approximately $0.19/hour. The credibility: significant.

5. JCL Is Unforgiving — Automate Its Generation

JCL has an 80-column fixed-format structure, specific column rules for continuation lines, and character encoding requirements (EBCDIC on the mainframe, but our card reader accepts ASCII with the ascii and trunc parameters). Manual JCL authoring on a modern keyboard produces subtle errors. All JCL in this POC was generated programmatically using Python's ljust(80) padding, then submitted via nc -w 3 localhost 3505 < job.jcl.

lines = [
    "//MYJOB    JOB (ACCT),'DESC',CLASS=A,MSGCLASS=A",
    "//STEP1    EXEC PGM=IEFBR14",
    "//"
]
with open('/tmp/job.jcl', 'w') as f:
    for line in lines:
        f.write(line.ljust(80) + '\n')
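Padding alone does not catch a statement that runs past the usable columns. The sketch below extends the same idea with a few checks, assuming the usual convention that JCL statement text stays within columns 1 to 71. The function name format_jcl and its parameters are hypothetical, and in-stream data lines (those not starting with //) are only required to fit the 80-column card.

```python
def format_jcl(lines: list[str], width: int = 80,
               last_text_column: int = 71) -> str:
    """Validate each card image and pad it to a fixed width."""
    cards = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            raise ValueError(f"line {number}: tab characters are not allowed")
        if not line.isascii():
            raise ValueError(f"line {number}: non-ASCII character")
        if line.startswith("//") and len(line) > last_text_column:
            raise ValueError(
                f"line {number}: JCL statement exceeds column {last_text_column}")
        if len(line) > width:
            raise ValueError(f"line {number}: longer than {width} columns")
        cards.append(line.ljust(width))
    return "\n".join(cards) + "\n"


deck = format_jcl([
    "//MYJOB    JOB (ACCT),'DESC',CLASS=A,MSGCLASS=A",
    "//STEP1    EXEC PGM=IEFBR14",
    "//",
])
```

Failing fast here is cheaper than diagnosing a JCL error from the job log after submission.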

Infrastructure as Code

The entire environment is provisioned via Terraform. One critical lesson: security groups with cross-references must use separate aws_security_group_rule resources, not inline security_groups references. The latter creates a circular dependency that Terraform cannot resolve:

# WRONG — causes Terraform cycle error
resource "aws_security_group" "sg_a" {
  ingress {
    security_groups = [aws_security_group.sg_b.id]  # circular!
  }
}

# CORRECT — separate resource breaks the cycle
resource "aws_security_group_rule" "a_allows_b" {
  type                     = "ingress"
  security_group_id        = aws_security_group.sg_a.id
  source_security_group_id = aws_security_group.sg_b.id
  from_port                = 3505
  to_port                  = 3505
  protocol                 = "tcp"
}

What This Enables

This POC framework generalizes beyond NAV calculation. Any COBOL batch program that reads input data and writes to a database can be placed in this parallel run harness:

  1. Define the input contract — what Oracle tables or files does the COBOL job read?
  2. Define the output contract — what Oracle tables does it write?
  3. Implement the Java equivalent — Spring Batch reads the same input, computes the same output
  4. Wire the reconciliation view — diff COBOL output vs Java output
  5. Set the cutover threshold — zero variance for N consecutive runs

The wave sequencer step, which models InvestOne's multi-wave batch processing as a Spring Batch flow built with FlowJobBuilder, is the next engineering milestone. Once wave sequencing is proven, the entire COBOL orchestration layer can be retired wave by wave, with each wave's cutover gated by its own reconciliation view.

Conclusion

Mainframe modernization does not require a big-bang migration or a heroic cutover weekend. It requires a systematic framework for proving equivalence — record by record, calculation by calculation — before any production system is touched.

SDL Hercules on EC2, combined with Spring Batch and Oracle RDS, provides exactly that framework. The COBOL runs on a real S/370 emulator. The VSAM stores real position data. The Java produces real NAV calculations. The Oracle reconciliation view shows the difference between them.

In this POC, that difference is zero.

The parallel run is not just a testing pattern. It is a risk management instrument — one that converts the binary choice between "migrate" and "don't migrate" into a graduated, observable, reversible process. When the variance is zero and the cutover flag flips, it is not a leap of faith. It is a confirmation of what the data has already proven.

About the Author

Banu Parasuraman is a Distinguished Engineer and Account CTO at Mphasis, specializing in Forward Deployed Engineering across financial services modernization programs. With over 20 years of experience spanning IBM, Wipro Digital, and General Motors, Banu works embedded in client accounts across application modernization, agentic AI, and enterprise platform initiatives.

© 2026 Banu Parasuraman. Published under Creative Commons Attribution 4.0. Technical details have been generalized to protect client confidentiality.
