Part 5: Input validation¶
In this fifth part of the Hello nf-core training course, we show you how to use the nf-schema plugin to validate pipeline inputs and parameters.
Why validation matters¶
Imagine running your pipeline for two hours, only to have it crash because a user provided a file with the wrong extension. Or spending hours debugging cryptic errors, only to discover that a parameter was misspelled. Without input validation, these scenarios are common.
Consider this example:
$ nextflow run my-pipeline --input data.txt --output results
...2 hours later...
ERROR ~ No such file: 'data.fq.gz'
Expected FASTQ format but received TXT
The pipeline accepted invalid inputs and ran for hours before failing. With proper validation:
$ nextflow run my-pipeline --input data.txt --output results
ERROR ~ Validation of pipeline parameters failed!
* --input (data.txt): File extension '.txt' does not match required pattern '.fq.gz' or '.fastq.gz'
* --output: required parameter is missing (expected: --outdir)
Pipeline failed before execution - please fix the errors above
The pipeline fails immediately with clear, actionable error messages. This saves time, compute resources, and frustration.
Two types of validation¶
nf-core pipelines validate two different kinds of input:
Parameter validation¶
This validates command-line parameters (flags like --outdir, --batch, --input):
- Checks parameter types, ranges, and formats
- Ensures required parameters are provided
- Validates file paths exist
- Defined in nextflow_schema.json
Input data validation¶
This validates the contents of input files (like sample sheets or CSV files):
- Checks column structure and data types
- Validates file references within the input file
- Ensures required fields are present
- Defined in assets/schema_input.json
Note
This section assumes you have completed Part 4: Make an nf-core module and have a working core-hello pipeline with nf-core-style modules.
If you didn't complete Part 4 or want to start fresh for this section, you can use the core-hello-part4 solution as your starting point:
This gives you a fully functional nf-core pipeline with modules ready for adding input validation.
1. The nf-schema plugin¶
The nf-schema plugin is a Nextflow plugin that provides comprehensive validation capabilities for any Nextflow pipeline. While nf-schema is a standalone tool that can be used in any Nextflow workflow, it's heavily integrated into the nf-core ecosystem and is the standard validation solution for all nf-core pipelines.
1.1. Core functionality¶
nf-schema provides several key functions:
- Parameter validation: Validates pipeline parameters against nextflow_schema.json
- Sample sheet validation: Validates input files against assets/schema_input.json
- Channel conversion: Converts validated sample sheets to Nextflow channels
- Help text generation: Automatically generates --help output from schema definitions
- Parameter summary: Displays which parameters differ from defaults
nf-schema is the successor to the deprecated nf-validation plugin and uses standard JSON Schema Draft 2020-12 for validation.
1.2. The two schema files¶
An nf-core pipeline uses two schema files for validation:
| Schema File | Purpose | Validates |
|---|---|---|
| nextflow_schema.json | Parameter validation | Command-line flags: --input, --outdir, --batch |
| assets/schema_input.json | Input data validation | Contents of sample sheets and input files |
Both schemas use JSON Schema format, a widely-adopted standard for describing and validating data structures.
1.3. When validation occurs¶
graph LR
A[User runs pipeline] --> B[Parameter validation]
B -->|✓ Valid| C[Input data validation]
B -->|✗ Invalid| D[Error: Fix parameters]
C -->|✓ Valid| E[Pipeline executes]
C -->|✗ Invalid| F[Error: Fix input data]
Validation happens before any pipeline processes run, providing fast feedback and preventing wasted compute time.
1.4. Configure validation to skip input file validation¶
The nf-core pipeline template comes with nf-schema already installed and configured:
- The nf-schema plugin is installed via the plugins{} block in nextflow.config
- Parameter validation is enabled by default via params.validate_params = true
- The validation is performed by the UTILS_NFSCHEMA_PLUGIN subworkflow during pipeline initialization
The validation behavior is controlled through the validation{} scope in nextflow.config.
Since we'll be working on parameter validation first (section 2) and won't configure the input data schema until section 3, we need to temporarily tell nf-schema to skip validating the input parameter's file contents.
Open nextflow.config and find the validation block (around line 246). Add ignoreParams to skip input file validation:
This configuration tells nf-schema to:
- defaultIgnoreParams: Skip validation of complex parameters like genomes (set by template developers)
- ignoreParams: Skip validation of the input parameter's file contents (temporary - we'll remove this in section 3)
- monochromeLogs: Control colored output in validation messages
Why ignore the input parameter?
The input parameter in nextflow_schema.json has "schema": "assets/schema_input.json" which tells nf-schema to validate the contents of the input CSV file against that schema.
Since we haven't configured that schema yet, we temporarily ignore this validation.
We'll remove this setting in section 3 after configuring the input data schema.
Takeaway¶
You now understand what nf-schema does, the two types of validation it provides, when validation occurs, and how to configure validation behavior. You've also temporarily disabled input file validation so we can focus on parameter validation first.
What's next?¶
Start by implementing parameter validation for command-line flags.
2. Parameter validation (nextflow_schema.json)¶
Let's start by adding parameter validation to our pipeline. This validates command-line flags like --input, --outdir, and --batch.
2.1. Examine the parameter schema¶
Let's look at a section of the nextflow_schema.json file that came with our pipeline template:
The parameter schema is organized into groups. Here's the input_output_options group:
Key validation features:
- type: Data type (string, integer, boolean, number)
- format: Special formats like file-path or directory-path
- exists: For file paths, check if the file exists
- pattern: Regular expression the value must match
- required: Array of parameter names that must be provided
- mimetype: Expected file mimetype for validation
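To make these keywords concrete, here is a minimal Python sketch of how a validator might apply type, pattern, and required to a set of parameters. It is not nf-schema's implementation: the schema fragment, the check_params helper, and the sample values are all hypothetical, and the format, exists, and mimetype checks are left out.

```python
import re

# Hypothetical, heavily simplified schema fragment (not the real template file).
SCHEMA = {
    "required": ["input", "outdir"],
    "properties": {
        "input": {"type": "string", "pattern": r"^\S+\.csv$"},
        "outdir": {"type": "string"},
        "max_retries": {"type": "integer"},
    },
}

# Map JSON Schema type names to Python types for this illustration.
TYPES = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def check_params(params, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    missing = [name for name in schema.get("required", []) if name not in params]
    if missing:
        errors.append("Missing required parameter(s): " + ", ".join(missing))
    for name, rules in schema.get("properties", {}).items():
        if name not in params:
            continue
        value = params[name]
        expected = TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if not isinstance(value, expected) or (
            isinstance(value, bool) and rules["type"] != "boolean"
        ):
            errors.append(f"--{name}: expected {rules['type']}")
            continue
        pattern = rules.get("pattern")
        if pattern and not re.search(pattern, value):
            errors.append(f"--{name} ({value}): does not match pattern {pattern}")
    return errors


print(check_params({"input": "greetings.txt"}, SCHEMA))
print(check_params({"input": "greetings.csv", "outdir": "results"}, SCHEMA))
```

The first call reports two problems (a missing outdir and an input value that fails the pattern), while the second returns an empty list. Collecting all problems before reporting, rather than stopping at the first one, is what lets a user fix everything in a single pass.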
Where do schema parameters come from?
The schema validation uses nextflow.config as the base for parameter definitions. Parameters declared elsewhere in your workflow scripts (like in main.nf or module files) are not automatically picked up by the schema validator.
This means you should always declare your pipeline parameters in nextflow.config, and then define their validation rules in nextflow_schema.json.
Notice the batch parameter we've been using isn't defined yet in the schema!
2.2. Add the batch parameter¶
While the schema is a JSON file that can be edited manually, manual editing is error-prone and not recommended. Instead, nf-core provides an interactive GUI tool, launched with the nf-core pipelines schema build command, that handles the JSON Schema syntax for you and validates your changes.
You'll see output like:
,--./,-.
___ __ __ __ ___ /,-._.--\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 3.4.1 - https://nf-co.re
INFO [✓] Default parameters match schema validation
INFO [✓] Pipeline schema looks valid (found 17 params)
INFO Writing schema with 17 params: 'nextflow_schema.json'
🚀 Launch web builder for customisation and editing? [y/n]:
Type y and press Enter to launch the interactive web interface.
Your browser will open showing the Parameter schema builder:

To add the batch parameter:
- Click the "Add parameter" button at the top
- Use the drag handle (⋮⋮) to move the new parameter up into the "Input/output options" group, below the input parameter
- Fill in the parameter details:
  - ID: batch
  - Description: Name for this batch of greetings
  - Type: string
  - Check the Required checkbox
  - Optionally, select an icon from the icon picker (e.g., fas fa-layer-group)

When you're done, click the "Finished" button at the top right.
Back in your terminal, you'll see:
INFO Writing schema with 18 params: 'nextflow_schema.json'
⣾ Use ctrl+c to stop waiting and force exit.
Press Ctrl+C to exit the schema builder.
The tool has now updated your nextflow_schema.json file with the new batch parameter, handling all the JSON Schema syntax correctly.
2.2.1. Verify the changes¶
You should see that the batch parameter has been added to the schema with the "required" field now showing ["input", "outdir", "batch"].
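If you prefer to check this programmatically, the following Python sketch collects every required list from a schema. It assumes the parameter groups sit under a top-level $defs key (falling back to definitions), which is an assumption about the file layout rather than something shown in this section. The inline SCHEMA_TEXT stands in for the real file, and required_params is a hypothetical helper.

```python
import json

# Stand-in for the contents of nextflow_schema.json; the layout shown here
# is an assumption made for this illustration.
SCHEMA_TEXT = """
{
  "$defs": {
    "input_output_options": {
      "required": ["input", "outdir", "batch"],
      "properties": {"input": {}, "outdir": {}, "batch": {}}
    }
  }
}
"""


def required_params(schema):
    """Map each parameter group name to its list of required parameters."""
    groups = schema.get("$defs") or schema.get("definitions") or {}
    return {name: group.get("required", []) for name, group in groups.items()}


schema = json.loads(SCHEMA_TEXT)
print(required_params(schema))
```

To run it against your own pipeline, load the real file instead of SCHEMA_TEXT, for example with json.load on an opened nextflow_schema.json.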
2.3. Test parameter validation¶
Now let's test that parameter validation works correctly.
First, try running without the required input and batch parameters:
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* Missing required parameter(s): input, batch
Perfect! The validation catches the missing required parameters before the pipeline runs.
Now try with a valid set of parameters:
The pipeline should run successfully, and the batch parameter is now validated.
Takeaway¶
You now know how to use the interactive nf-core pipelines schema build tool to add parameters to nextflow_schema.json and test parameter validation.
The web interface handles all the JSON Schema syntax for you, making it easy to manage complex parameter schemas without error-prone manual JSON editing.
What's next?¶
Now that parameter validation is working, let's add validation for the input data file contents.
3. Input data validation (schema_input.json)¶
Now let's add validation for the contents of our input CSV file. While parameter validation checks command-line flags, input data validation ensures the data inside the CSV file is structured correctly.
3.1. Understand the greetings.csv format¶
Let's remind ourselves what our input looks like:
This is a simple CSV with:
- One column (no header)
- One greeting per line
- Text strings with no special format requirements
3.2. Design the schema structure¶
For our use case, we want to:
- Accept CSV input with one column
- Treat each row as a greeting string
- Ensure greetings are not empty
- Ensure no whitespace-only entries
We'll structure this as an array of objects, where each object has a greeting field.
3.3. Update the schema file¶
The nf-core pipeline template includes a default assets/schema_input.json designed for paired-end sequencing data.
We need to replace it with a simpler schema for our greetings use case.
Open assets/schema_input.json and replace the properties and required sections:
The key changes:
- description: Updated to mention "greetings file"
- properties: Replaced sample, fastq_1, and fastq_2 with a single greeting field
- type: "string": Must be a text string
- pattern: "^\\S.*$": Must start with a non-whitespace character (but can contain spaces after that)
- errorMessage: Custom error message shown if validation fails
- required: Changed from ["sample", "fastq_1"] to ["greeting"]
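The effect of the pattern is easy to check in isolation. This sketch uses Python's re module as a stand-in for the regex engine used by the validator (an assumption: engines can differ on edge cases such as embedded newlines), with made-up sample values. Note that the doubled backslash in the schema is JSON string escaping; the regular expression itself is ^\S.*$.

```python
import re

# The regex from the schema, after JSON unescaping ("^\\S.*$" -> ^\S.*$).
GREETING_PATTERN = re.compile(r"^\S.*$")

# Hypothetical sample values, chosen to cover each case.
samples = ["Hello", "Hello world", " Hello", "   ", ""]

for value in samples:
    verdict = "valid" if GREETING_PATTERN.search(value) else "invalid"
    print(f"{value!r}: {verdict}")
```

Only the first two samples are reported as valid. The leading-space, whitespace-only, and empty values all fail because \S must match a first character that is not whitespace, which is how one pattern covers both the "not empty" and "not whitespace-only" requirements.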
3.4. Add a header to the greetings.csv file¶
When nf-schema reads a CSV file, it expects the first row to contain column headers that match the field names in the schema.
For our simple case, we need to add a greeting header as the first line of our greetings file.
Now the CSV file has a header that matches the field name in our schema.
Takeaway¶
You've created a JSON schema for the greetings input file and added the required header to the CSV file.
What's next?¶
Implement the validation in the pipeline code using samplesheetToList.
3.5. Implement samplesheetToList in the pipeline¶
Now we need to replace our simple CSV parsing with nf-schema's samplesheetToList function, which validates and converts the sample sheet.
The samplesheetToList function:
- Reads the input sample sheet (CSV, TSV, JSON, or YAML)
- Validates it against the provided JSON schema
- Returns a Groovy list where each entry corresponds to a row
- Throws helpful error messages if validation fails
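Conceptually, these steps resemble the Python sketch below. It is not the plugin's code and does not reproduce its exact return shape or full error format. read_greetings is a hypothetical helper, the CSV text is made up, and only the required-field and pattern checks for a single greeting column are modelled.

```python
import csv
import io
import re

REQUIRED = ["greeting"]          # mirrors the schema's "required" list
PATTERN = re.compile(r"^\S.*$")  # mirrors the schema's "pattern"


def read_greetings(csv_text):
    """Parse CSV text with a header row, validate each row, return the greetings."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    errors = []
    for number, row in enumerate(rows, start=1):
        missing = [field for field in REQUIRED if not row.get(field)]
        if missing:
            errors.append(
                f"Entry {number}: Missing required field(s): {', '.join(missing)}"
            )
        elif not PATTERN.search(row["greeting"]):
            errors.append(f"Entry {number}: greeting must not start with whitespace")
    if errors:
        # Fail before any downstream work starts, reporting every bad row at once.
        raise ValueError("Validation of file failed:\n" + "\n".join(errors))
    return [row["greeting"] for row in rows]


print(read_greetings("greeting\nHello\nBonjour\n"))

try:
    read_greetings("message\nHello\nBonjour\n")
except ValueError as err:
    print(err)
```

The second call fails for every entry because the header is message rather than greeting, so no row contains the required field. This is why the header row has to match the field names in the schema.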
Let's update the input handling code:
Open subworkflows/local/utils_nfcore_hello_pipeline/main.nf and locate the section where we create the input channel (around line 80).
We need to:
- Use the samplesheetToList function (already imported in the template)
- Validate and parse the input
- Extract just the greeting strings for our workflow
What are Nextflow plugins?
Plugins are extensions that add new functionality to the Nextflow language itself. They're installed via a plugins{} block in nextflow.config and can provide:
- New functions and classes that can be imported (like samplesheetToList)
- New DSL features and operators
- Integration with external services
The nf-schema plugin is specified in nextflow.config:
Once installed, you can import functions from plugins using include { functionName } from 'plugin/plugin-name' syntax.
First, note that the samplesheetToList function is already imported at the top of the file with include { samplesheetToList } from 'plugin/nf-schema' (the nf-core template includes this by default).
Now update the channel creation code:
Let's break down what changed:
- samplesheetToList(params.input, "${projectDir}/assets/schema_input.json"): Validates the input file against our schema and returns a list
- Channel.fromList(...): Converts the list into a Nextflow channel
Takeaway¶
You've successfully implemented input data validation using samplesheetToList and JSON schemas.
What's next?¶
Re-enable input validation in the config and test both parameter and input data validation to see them in action.
3.6. Re-enable input validation¶
Now that we've configured the input data schema, we can remove the temporary ignore setting we added in section 1.4.
Open nextflow.config and remove the ignoreParams line from the validation block:
Now nf-schema will validate both parameter types AND the input file contents.
3.7. Test input validation¶
Let's verify that our validation works by testing both valid and invalid inputs.
3.7.1. Test with valid input¶
First, confirm the pipeline runs successfully with valid input:
Note that we no longer need --validate_params false since validation is working!
------------------------------------------------------
WARN: The following invalid input values have been detected:
* --character: tux
executor > local (10)
[c1/39f64a] CORE_HELLO:HELLO:sayHello (1) | 4 of 4 ✔
[44/c3fb82] CORE_HELLO:HELLO:convertToUpper (4) | 4 of 4 ✔
[62/80fab2] CORE_HELLO:HELLO:CAT_CAT (test) | 1 of 1 ✔
[e1/4db4fd] CORE_HELLO:HELLO:cowpy | 1 of 1 ✔
-[core/hello] Pipeline completed successfully-
Great! The pipeline runs successfully and validation passes silently. The warning about --character is just informational since it's not defined in the schema. If you want, use what you've learned to add validation for that parameter too!
3.7.2. Test with invalid input¶
Now let's test that validation catches errors. Create a test file with an invalid column name:
This file uses message as the column name instead of greeting, which doesn't match our schema.
Try running the pipeline with this invalid input:
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* Missing required parameter(s): batch
* --input (/tmp/invalid_greetings.csv): Validation of file failed:
-> Entry 1: Missing required field(s): greeting
-> Entry 2: Missing required field(s): greeting
-> Entry 3: Missing required field(s): greeting
-- Check script 'subworkflows/nf-core/utils_nfschema_plugin/main.nf' at line: 68 or see '.nextflow.log' file for more details
Perfect! The validation caught the error and provided a clear, helpful error message pointing to:
- Which file failed validation
- Which entries have the problem (entries 1 to 3, that is, every data row)
- What the specific problem is (missing required field greeting)
The schema validation ensures that input files have the correct structure before the pipeline runs, saving time and preventing confusing errors later in execution.
Takeaway¶
You now know how to implement and test both parameter validation and input data validation. Your pipeline validates inputs before execution, providing fast feedback and clear error messages.
Further reading
To learn more about advanced validation features and patterns, check out the nf-schema documentation. The nf-core pipelines schema build command provides an interactive GUI for managing complex schemas.
What's next?¶
You've completed all five parts of the Hello nf-core training course!
Continue to the Summary to reflect on what you've built and learned.