Sequence: interconvert FASTA and tabular
Operation
Move sequence records between FASTA and a tabular form where the header is one column and the sequence is another, so the corpus’s tabular tools (galaxy-tabular-patterns) can read, filter, join, or rewrite records that FASTA tooling cannot easily touch. This is the most-reached-for sequence operation in IWC and the substrate for the relabel-fasta-headers-via-tabular recipe.
Two tools, inverse of each other:
- toolshed.g2.bx.psu.edu/repos/devteam/fasta_to_tabular/fasta2tab (“FASTA-to-Tabular”): FASTA → tabular.
- toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta (“Tabular-to-FASTA”): tabular → FASTA.
Parameter names below are corpus-inferred from tool_state; verify against the live form when authoring.
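To make the two shapes concrete, here is a minimal Python sketch of the round trip. It illustrates the data shapes only and is not the tools' implementation. The function names fasta_to_rows and rows_to_fasta are hypothetical, and the sketch assumes the single-header-column layout (whole header in column 1, sequence in column 2).

```python
def fasta_to_rows(text):
    """Parse FASTA text into (header, sequence) rows; wrapped sequence lines are joined."""
    rows = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                rows.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # drop the leading '>'
        elif header is not None:
            chunks.append(line)
    if header is not None:
        rows.append((header, "".join(chunks)))
    return rows


def rows_to_fasta(rows):
    """Write (header, sequence) rows back out as FASTA, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in rows)


# Made-up example records.
fasta = ">seq1 sample A\nACGT\nTTGA\n>seq2\nGGCC\n"
rows = fasta_to_rows(fasta)
# rows == [("seq1 sample A", "ACGTTTGA"), ("seq2", "GGCC")]
```

In this sketch the original line wrapping is not preserved, but the records are: converting the rows back to FASTA and parsing again yields the same rows.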
When to reach for it
- You need to edit FASTA headers — inject a sample id, strip a prefix, regex-rewrite — which is awkward on raw FASTA but trivial as column 1 of a table. Pair with relabel-fasta-headers-via-tabular.
- You need to join sequence records against a table (annotations, scores, ids) on the header. Convert one way (fasta2tab) and stay tabular (clinicalmp-discovery opens a protein DB to tabular and never returns to FASTA).
- You need a per-record table for counting or filtering that carries the sequence too. For length only, sequence-compute-length is leaner (it drops the sequence).
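The first two cases can be sketched in Python on rows that are already tabular. This is an illustration under stated assumptions, not a Galaxy tool: relabel and join_on_header are hypothetical helpers, and the ids, the "demo|" prefix, and the score are made-up example data.

```python
import re


def relabel(rows, sample_id):
    """Strip a leading 'demo|' prefix from column 1 and prepend a sample id."""
    out = []
    for header, seq in rows:
        stripped = re.sub(r"^demo\|", "", header)
        out.append((f"{sample_id}_{stripped}", seq))
    return out


def join_on_header(rows, annotations):
    """Inner-join rows against a header -> annotation mapping; unmatched rows drop out."""
    return [
        (header, seq, annotations[header])
        for header, seq in rows
        if header in annotations
    ]


# (header, sequence) rows, as they would look after FASTA -> tabular.
rows = [("demo|X00001", "MKWVTF"), ("demo|X00002", "MAAA")]

relabeled = relabel(rows, "S1")
# relabeled == [("S1_X00001", "MKWVTF"), ("S1_X00002", "MAAA")]

scores = {"demo|X00001": 0.97}
joined = join_on_header(rows, scores)
# joined == [("demo|X00001", "MKWVTF", 0.97)]
```

Both operations touch only column 1 as a plain string, which is why they are easy on the table and awkward on raw FASTA.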
Parameters
fasta2tab
- input: the FASTA (connected).
- descr_columns: how many columns the title line is split into, on whitespace. Corpus pins "1": the whole description line is column 1, the sequence is column 2.
- keep_first: on the tool form this is the number of title characters to keep (it applies only when descr_columns is 1), so it truncates the header, not the sequence. "0" (keep all) in the corpus.
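A small Python sketch of what the descr_columns split looks like on a single title line. It is a model for reasoning about column counts, not the tool's code: split_title is a hypothetical name, and it assumes that splitting happens on runs of whitespace, that the last column keeps the remainder of the title, and that short titles are padded with empty columns. Check the real output before relying on any of those details.

```python
def split_title(title, descr_columns=1):
    """Split a FASTA title (without '>') into exactly descr_columns columns.

    Assumptions: split on runs of whitespace, the last column keeps the
    remainder, and titles with too few tokens are padded with empty strings.
    """
    if descr_columns < 1:
        raise ValueError("descr_columns must be >= 1")
    if descr_columns == 1:
        return [title]  # whole title stays in one column
    parts = title.split(None, descr_columns - 1)
    return parts + [""] * (descr_columns - len(parts))


title = "seq1 Homo sapiens albumin"  # made-up example title
print(split_title(title, 1))  # ['seq1 Homo sapiens albumin']
print(split_title(title, 2))  # ['seq1', 'Homo sapiens albumin']
print(split_title(title, 3))  # ['seq1', 'Homo', 'sapiens albumin']
```

With the corpus setting of "1" the header survives as a single string, which is what keeps the table at two columns.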
tab2fasta
- input: the table (connected). Expects the header column(s) then the sequence column.
- Title/sequence column selection mirrors the descr_columns split produced by fasta2tab; round-trip with the same shape.
Idiomatic shape
Open records to a two-column table, ready for a column-1 header edit:
tool_id: toolshed.g2.bx.psu.edu/repos/devteam/fasta_to_tabular/fasta2tab/1.1.1
tool_state:
  input: { __class__: ConnectedValue }
  descr_columns: "1"
  keep_first: "0"
Then convert back after the edit:
tool_id: toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta/1.1.1
tool_state:
  input: { __class__: ConnectedValue } # the edited table
Pitfalls
- descr_columns: "1" is load-bearing for the round-trip. If you split the header into multiple columns going out (descr_columns > 1) but reassemble assuming one, tab2fasta rebuilds the header from the wrong column set. Keep the split symmetric.
- keep_first silently truncates. A non-zero value clips each header to its first N characters (the sequence is untouched), which can leave distinct records sharing a header; the corpus never uses it for interconversion. Leave it "0" unless you actually want a header prefix.
- The table is opaque to coordinates. Once in tabular form the record is just text columns; see galaxy-tabular-patterns. This is the right tool for header/string edits, the wrong one for anything needing sequence semantics (length, translate, masking live on their own pages).
- One-way is fine. Not every use returns to FASTA; clinicalmp-discovery converts once and joins. Don’t add a tab2fasta you don’t need.
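The symmetry pitfall can be checked mechanically before converting back. The Python sketch below is an illustrative guard, not part of either tool: malformed_rows and rebuild_fasta are hypothetical names, the example row is made up, and rejoining header columns with a single space is an assumption about how the header should be reassembled.

```python
def malformed_rows(rows, header_columns=1):
    """Return indices of rows that do not have header_columns + 1 (sequence) columns."""
    expected = header_columns + 1
    return [i for i, row in enumerate(rows) if len(row) != expected]


def rebuild_fasta(rows, header_columns=1):
    """Rebuild FASTA from rows, refusing tables whose shape does not match the split."""
    bad = malformed_rows(rows, header_columns)
    if bad:
        raise ValueError(f"rows {bad} do not have {header_columns + 1} columns")
    records = []
    for row in rows:
        header = " ".join(row[:header_columns])  # assumption: rejoin with one space
        records.append(f">{header}\n{row[header_columns]}\n")
    return "".join(records)


# A table produced with a two-column header split.
split_rows = [["seq1", "demo description", "ACGT"]]

print(rebuild_fasta(split_rows, header_columns=2))  # symmetric: rebuilds the record
print(malformed_rows(split_rows, header_columns=1))  # [0]: asymmetric shape is flagged
```

The same check also catches a header edit that accidentally introduces an extra tab into column 1, since that changes the column count of the row.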
See also
- relabel-fasta-headers-via-tabular — the recipe this operation underpins.
- sequence-compute-length — when you only want (id, length), not the sequence.
- galaxy-sequence-patterns — the sequence pattern map.
- galaxy-tabular-patterns — what to do while the records are tabular.
- iwc-sequence-operations-survey — corpus evidence.