Loading Data
With the infrastructure configured, it's time to load data into the portal. Conductor is a CLI tool that reads CSV files, loads each row into PostgreSQL (persistent storage), and then indexes each record into Elasticsearch as a structured document for search.
Conductor runs as a Docker container, so no Node.js installation is required. A wrapper script at the root of the repository handles the Docker details for you.
Run all ./conductor commands from the root of the prelude repository (i.e. where docker-compose.yml lives).
Optional: run conductor from any directory
By default ./conductor must be called from the repo root. If you'd prefer to run conductor from anywhere on your system, add the repo directory to your shell's PATH:
export PATH="$PATH:/path/to/prelude"
Add that line to your ~/.zshrc (Zsh) or ~/.bashrc (Bash) and reload:
source ~/.zshrc # or source ~/.bashrc
You can then run conductor upload ... from any directory. The script resolves the data/ folder relative to its own location in the repo, so paths like ./data/datatable1.csv still refer to the repo's data/ directory no matter where you invoke the command. To upload a file that lives outside the repo, pass an absolute path.
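As a quick sanity check after editing your shell config, you can confirm the directory actually landed on PATH. A minimal sketch (the /path/to/prelude path is a placeholder; substitute your real clone location):

```shell
# Append the repo directory to PATH (placeholder path, swap in your own),
# then check that it is actually present in the PATH string.
export PATH="$PATH:/path/to/prelude"
case ":$PATH:" in
  *":/path/to/prelude:"*) echo "prelude dir on PATH" ;;
  *) echo "not on PATH" ;;
esac
```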
Uploading Data
Run the upload command to load your data:
./conductor upload -f ./data/datatable1.csv -t datatable1 -i datatable1-index
Command breakdown
- upload: the Conductor command for the full CSV → PostgreSQL → Elasticsearch pipeline
- -f ./data/datatable1.csv: path to the input CSV file (relative to the repo root)
- -t datatable1: target PostgreSQL table name (must match the table created by your SQL schema)
- -i datatable1-index: target Elasticsearch index name (must match the index created by the setup service)
Additional options:
- -b, --batch-size <n>: records per batch (default: 5000)
- --delimiter <char>: CSV delimiter character (default: ,)
- --statement-timeout <ms>: max time allowed per database statement in milliseconds (default: 120000)
For a full reference run: ./conductor upload -h
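The options above can be combined. Here is a hedged sketch of uploading a tab-delimited file in smaller batches; the sample file and values are illustrative and not part of the workshop data:

```shell
# Create a tiny tab-delimited sample (illustrative fields only):
printf 'donor_id\tgender\nDO-001\tFemale\n' > /tmp/sample.tsv

# With the stack running, the upload would look like this (commented out
# here; $'\t' passes a literal tab as the delimiter in Bash/Zsh):
#   ./conductor upload -f /tmp/sample.tsv -t datatable1 -i datatable1-index \
#     --delimiter $'\t' -b 1000

# Header plus one record: two lines total.
wc -l < /tmp/sample.tsv
```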
What happens during upload
Conductor processes each CSV row in two stages. First, it inserts the raw records into the PostgreSQL table (providing persistent, queryable storage). Then it reads from PostgreSQL, wraps each record in a structured JSON document, and bulk-indexes it into Elasticsearch:
{
  "data": {
    "donor_id": "DO-001",
    "gender": "Female",
    "age_at_diagnosis": 45,
    "cancer_type": "Breast Cancer",
    "..."
  },
  "submission_metadata": {
    "submitter_id": "DO-001",
    "processed_at": "2026-04-21T14:30:00.000Z",
    "source_file": "datatable1.csv",
    "record_number": 1
  }
}
Your CSV fields go into the data object. Conductor adds submission_metadata automatically for tracking purposes. Records are inserted into PostgreSQL and indexed into Elasticsearch in batches.
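To make the wrapping concrete, here's a minimal sketch of that transformation as a Python one-off (field names are taken from the example document above; this illustrates the document shape only, not Conductor's actual implementation):

```shell
python3 - <<'EOF'
import json

# One parsed CSV row (fields from the example document above):
row = {"donor_id": "DO-001", "gender": "Female"}

# Wrap it the way the indexed document is shaped: CSV fields go under
# "data", tracking fields under "submission_metadata".
doc = {
    "data": row,
    "submission_metadata": {
        "submitter_id": row["donor_id"],
        "source_file": "datatable1.csv",
        "record_number": 1,
    },
}
print(json.dumps(doc, indent=2))
EOF
```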
Upload behaviour: re-runs, interruptions, and partial failures
Re-uploading the same file: Re-running upload against an already-loaded file is safe. Records that already exist in PostgreSQL are skipped automatically, so nothing is duplicated in either PostgreSQL or Elasticsearch.
If the upload is interrupted: If the process is stopped (Ctrl+C, network drop, etc.), any records already written to PostgreSQL are preserved. Re-run the same command to resume: previously loaded records are skipped and only the remaining rows are uploaded and indexed.
If Elasticsearch indexing fails: Records may be written to PostgreSQL but fail to reach Elasticsearch. If this happens, a warning is printed at the end of the run:
Some records were inserted into "<table>" but not indexed. Re-run indexing with:
conductor index-db -t <table> -i <index>
Run ./conductor index-db to re-index from PostgreSQL without re-parsing the CSV.
Verifying the Upload
Open http://localhost:3000 in your browser:
- Navigate to the data exploration page
- Verify records appear in the data table
- Test the facet filters: click on values in the sidebar and confirm the table updates
- Try sorting columns
Verify directly via Elasticsearch
Check the document count:
curl -u elastic:myelasticpassword http://localhost:9200/datatable1-index/_count?pretty
View a sample document:
curl -u elastic:myelasticpassword "http://localhost:9200/datatable1-index/_search?pretty&size=1"
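A quick way to check that everything made it across is to compare the CSV row count (minus the header line) against the index's document count. A small sketch, assuming a standard CSV with one header line and a trailing newline:

```shell
# Helper: expected document count = CSV lines minus the header line.
count_csv_rows() {
  echo $(( $(wc -l < "$1") - 1 ))
}

# Demo on a throwaway file with three records:
printf 'id,name\n1,a\n2,b\n3,c\n' > /tmp/demo.csv
count_csv_rows /tmp/demo.csv

# Then compare that number against the index (stack must be running):
#   curl -s -u elastic:myelasticpassword \
#     "http://localhost:9200/datatable1-index/_count?pretty"
```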
If you installed Elasticvue, connect to http://localhost:9200 with credentials elastic / myelasticpassword, navigate to Indices, and select datatable1-index to browse documents and verify the data structure.
Reloading Data (supplemental)
The right approach depends on what changed; expand the relevant scenario below for more information.
Mapping changed (data already in PostgreSQL)
If you updated the Elasticsearch mapping but your CSV data is unchanged, use index-db to re-index directly from PostgreSQL; there's no need to re-parse the CSV:
1. Delete the existing index:
curl -u elastic:myelasticpassword -X DELETE "http://localhost:9200/datatable1-index"
2. Restart to recreate the index from the updated mapping:
make restart
Windows (PowerShell): .\run.ps1 restart
3. Re-index from PostgreSQL:
./conductor index-db -t datatable1 -i datatable1-index
CSV corrected (data needs to be re-uploaded)
If you fixed errors in the CSV itself, you need to clear both PostgreSQL and Elasticsearch. The existing table already contains the old records and re-uploading would cause duplicates:
1. Delete the existing index:
curl -u elastic:myelasticpassword -X DELETE "http://localhost:9200/datatable1-index"
2. Truncate the PostgreSQL table:
docker exec postgres psql -U admin -d overtureDb -c "TRUNCATE TABLE datatable1;"
3. Restart to recreate the index:
make restart
Windows (PowerShell): .\run.ps1 restart
4. Re-upload from the corrected CSV:
./conductor upload -f ./data/datatable1.csv -t datatable1 -i datatable1-index
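If you expect to repeat this full reset often, the steps can be collected into a small script. This is a sketch assuming the default names used throughout this guide (datatable1, datatable1-index, and the admin/overtureDb PostgreSQL credentials); it is written to a file first so you can review it before running anything:

```shell
# Write the reset steps to a reviewable script (sketch; defaults assumed):
cat > /tmp/reset-datatable1.sh <<'EOF'
#!/usr/bin/env bash
# Full reset: clear the ES index and PG table, recreate, re-upload.
set -euo pipefail
curl -u elastic:myelasticpassword -X DELETE "http://localhost:9200/datatable1-index"
docker exec postgres psql -U admin -d overtureDb -c "TRUNCATE TABLE datatable1;"
make restart
./conductor upload -f ./data/datatable1.csv -t datatable1 -i datatable1-index
EOF

# Syntax-check without executing anything:
bash -n /tmp/reset-datatable1.sh && echo "script looks OK"
```

On Windows, swap make restart for .\run.ps1 restart before running it.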
Other Conductor commands
The standard upload command runs the full CSV → PostgreSQL → Elasticsearch pipeline in one pass, which is what you need for the workshop. Conductor also exposes two targeted commands that each operate on only one destination. These are useful when the two stages need to happen separately: for example, when PostgreSQL and Elasticsearch are managed independently, or when you need to debug one layer in isolation:
- upload-db: loads CSV data into PostgreSQL only, without indexing to Elasticsearch:
./conductor upload-db -f ./data/datatable1.csv -t datatable1
- upload-es: uploads CSV data directly to Elasticsearch, bypassing PostgreSQL. The index must already exist and the CSV headers must match the index mapping:
./conductor upload-es -f ./data/datatable1.csv -i datatable1-index
For a full list of available commands and options:
./conductor -h
Checkpoint
Before proceeding, confirm:
- ./conductor -h runs without errors
- The upload command completed successfully (check terminal output for record count)
- curl -u elastic:myelasticpassword http://localhost:9200/datatable1-index/_count?pretty returns a count matching your CSV row count
- The portal at http://localhost:3000 shows data in the table
- Facet filters work: clicking a value updates the table
Stuck? If the upload fails with a connection error, make sure both PostgreSQL and Elasticsearch are running: docker exec postgres pg_isready -U admin and curl -u elastic:myelasticpassword http://localhost:9200/_cluster/health?pretty. If the index or table doesn't exist, run make restart first.
Next: This is the stage where issues are most likely to surface. If something isn't working, that's expected; the next page walks through how to diagnose which layer of the stack has the problem.