The scripts in this section perform simple transformations of entire
files or lines in files.
To use a script, cut and paste the code from the light green or blue box into a
terminal window, change the bold, red text as needed, and hit Enter.
See More Information for notes on using these tools.
This tool is important because most of the Scriptome tools require
tab-separated data.
Warning: a few weird separators (like ' or ``) might not work. (Putting a
backslash before the separator might help.)
Example: Change comma-separated file.csv to tab-separated file.tab by
running the above script.
Input file (file.csv ) |
Fly,7
Human,14
Worm,28
Yeast,35
|
Output file (file.tab ) |
Fly 7
Human 14
Worm 28
Yeast 35
|
Screen Output |
Changed , to tab on 4 lines
|
Example 2: Given a list of Swiss-Prot identifiers, separate the
protein name and species abbreviation into two separate columns.
Run the above script using $sep="_"
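To make the transformation concrete, here is a minimal Python sketch of the same idea. It is not the Scriptome script itself: the function name change_sep_to_tab is hypothetical, and the identifier CYC_HUMAN is only an illustrative input for Example 2.

```python
def change_sep_to_tab(lines, sep=","):
    """Replace every occurrence of `sep` with a tab in each line.

    Returns the converted lines and the number of lines that changed.
    """
    converted = []
    changed = 0
    for line in lines:
        new_line = line.replace(sep, "\t")  # plain string replace, no regex
        if new_line != line:
            changed += 1
        converted.append(new_line)
    return converted, changed


# Example 1: the comma-separated table from file.csv.
rows, n = change_sep_to_tab(["Fly,7", "Human,14", "Worm,28", "Yeast,35"])
print("\n".join(rows))
print(f"Changed , to tab on {n} lines")

# Example 2: split a NAME_SPECIES identifier on the underscore.
print(change_sep_to_tab(["CYC_HUMAN"], sep="_")[0])
```

Because the sketch uses a plain string replacement rather than a pattern, separators do not need a backslash here; that caveat applies to the script above, not to this sketch.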
Replace the tabs in tab-separated data with some other separator. The separator
does not have to be one character: "---" would work, for example, or even "" to
merge all columns.
This tool is important because most of the Scriptome tools require
tab-separated data. After running one or more Scriptome tools, use this
script to export data back to other programs that expect, for example,
comma-separated data.
Warning: a few weird separators (like ' or ``) might not work. Also, if there's a
comma in your data, and you change to a comma separator, you'll get too
many columns.
Example: Change tab-separated file.tab to comma-separated file.csv by
running the above script.
Input file (file.tab ) |
Fly 7
Human 14
Worm 28
Yeast 35
|
Output file (file.csv ) |
Fly,7
Human,14
Worm,28
Yeast,35
|
Screen Output |
Changed tab to , on 4 lines
|
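The comma warning above is easy to see in a small Python sketch. This is not the Scriptome script; the function names change_tab_to_sep and tab_to_quoted_csv and the sample row "Mouse, house" are made up for illustration. The second function shows one possible workaround, letting Python's csv module quote any field that contains a comma.

```python
import csv
import io


def change_tab_to_sep(lines, sep=","):
    """Replace every tab in each line with `sep` (which may be any string)."""
    return [line.replace("\t", sep) for line in lines]


def tab_to_quoted_csv(lines):
    """Write tab-separated lines as CSV, quoting fields that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        writer.writerow(line.split("\t"))
    return buffer.getvalue().splitlines()


data = ["Fly\t7", "Mouse, house\t20"]
print(change_tab_to_sep(data))          # second line now looks like three columns
print(change_tab_to_sep(data, sep=""))  # empty separator merges the columns
print(tab_to_quoted_csv(data))          # the comma-containing field is quoted
```

Whether quoted fields are acceptable depends on the program that will read the file, so check its documentation before relying on this.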
Use choose_cols.
Choose some or all of the columns, in whatever order you want.
To switch the order of the first two columns of a ten-column file, you could
set @cols to be 1, 0, 2..9.
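As a rough illustration of column selection, the following Python sketch reorders tab-separated columns by index. It borrows the name choose_cols from the tool but is not the tool's code; the column labels c0 to c9 are placeholder data.

```python
def choose_cols(lines, cols):
    """Return each tab-separated line with only the columns in `cols`, in that order."""
    result = []
    for line in lines:
        fields = line.split("\t")
        result.append("\t".join(fields[i] for i in cols))
    return result


# Swap the first two columns of a ten-column row: 1, 0, then 2 through 9.
row = "\t".join(f"c{i}" for i in range(10))
cols = [1, 0] + list(range(2, 10))
print(choose_cols([row], cols)[0])
```

Leaving an index out of the list drops that column, and repeating an index duplicates it.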
Remove all spaces (but not tabs) from a line.
See Also: remove empty lines (Choose)
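A short Python sketch of the space-removal step, assuming a hypothetical function name remove_spaces and a made-up input line. Only the space character is deleted, so tab-separated columns stay intact.

```python
def remove_spaces(line):
    """Delete space characters only; tabs and other whitespace are kept."""
    return line.replace(" ", "")


print(repr(remove_spaces("gene 1\tAC GT TG")))  # prints 'gene1\tACGTTG'
```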
Change all characters in each line to upper case. Numbers and punctuation
will not be changed. (Change "uc" in the script to "lc" to get lower case.)
Example: Change a list of gene names to upper-case, to compare with another
list.
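The comparison use case might look like this in Python. This is a sketch only: the function name to_upper and both gene lists are invented for illustration.

```python
def to_upper(names):
    """Upper-case every name; digits and punctuation are left unchanged."""
    return [name.upper() for name in names]


list_a = ["brca1", "Tp53", "myc"]
list_b = ["BRCA1", "TP53", "EGFR"]

# After upper-casing, names that differ only in case compare as equal.
shared = sorted(set(to_upper(list_a)) & set(to_upper(list_b)))
print(shared)  # prints ['BRCA1', 'TP53']
```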
Change each FASTA sequence in a file into one line of three, tab-separated
columns: the ID (not including the '>'); the rest of the description line
(or an empty column if the description line contains only an ID);
and the sequence itself.
Once you have run this script, you can use the many Scriptome tools that
work on tab-separated data.
Note: converting from FASTA to tabular format and back will generate a file
with the same information, but the files may not be identical. This tool
replaces any tabs with single spaces (otherwise the tabular output file would
have too many columns) and removes any spaces from the amino acid
or nucleic acid sequences.
Example: Run the above script on seqs.fna to get seqs.tab.
Input file (seqs.fna ) |
>CG123 A small sequence
ACGTTGCA
GTTACCAG
>EG12
ACCGGA
>DG124 A smaller sequence
GTTACCAG
|
Output file (seqs.tab ) |
CG123 A small sequence ACGTTGCAGTTACCAG
EG12 ACCGGA
DG124 A smaller sequence GTTACCAG
|
Screen Output |
Converted 3 FASTA records in 7 lines to tabular format
Total sequence length: 30
|
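The conversion described above can be sketched in Python as follows. This is an illustration of the logic under the rules stated in the text (ID, then description, then sequence), not the Scriptome script; the function name fasta_to_tab is hypothetical.

```python
def fasta_to_tab(lines):
    """Convert FASTA lines to rows of ID, description and sequence joined by tabs."""
    records = []  # each record is [id, description, sequence]
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].replace("\t", " ")  # tabs would add extra columns
            seq_id, _, description = header.partition(" ")
            records.append([seq_id, description, ""])
        elif records and line.strip():
            # Sequence line: drop whitespace and append to the current record.
            records[-1][2] += "".join(line.split())
    return ["\t".join(record) for record in records]


fasta = [
    ">CG123 A small sequence", "ACGTTGCA", "GTTACCAG",
    ">EG12", "ACCGGA",
    ">DG124 A smaller sequence", "GTTACCAG",
]
rows = fasta_to_tab(fasta)
total = sum(len(row.split("\t")[2]) for row in rows)
for row in rows:
    print(row)
print(f"Converted {len(rows)} FASTA records in {len(fasta)} lines to tabular format")
print(f"Total sequence length: {total}")
```

Note that the EG12 row has an empty middle column, because its description line contains only an ID.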
Change each line in a three-column, tab-separated file (containing ID,
description and sequence - e.g., a file created by the FASTA-to-tab
tool above) into a FASTA record.
Note: converting to FASTA format and back will generate a file with
the same information, but the files may not be identical. This tool
puts a single space between the ID and the description,
and puts 60 characters per line in the sequence portion.
Example: Run the above script on seqs.tab to get seqs.fasta.
Input file (seqs.tab ) |
CG123 A small sequence ACGTTGCAGTTACCAG
EG12 ACCGGA
DG124 A smaller sequence GTTACCAG
|
Output file (seqs.fasta ) |
>CG123 A small sequence
ACGTTGCAGTTACCAG
>EG12
ACCGGA
>DG124 A smaller sequence
GTTACCAG
|
Screen Output |
Converted 3 tab-delimited lines to FASTA format
Total sequence length: 30
|
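Here is a minimal Python sketch of the reverse conversion, including the 60-characters-per-line wrapping. It assumes every input line has exactly three tab-separated columns; the function name tab_to_fasta and the width parameter are hypothetical and not taken from the Scriptome script.

```python
def tab_to_fasta(lines, width=60):
    """Turn rows of ID, description and sequence into wrapped FASTA lines."""
    out = []
    for line in lines:
        seq_id, description, sequence = line.rstrip("\n").split("\t")
        # Single space between ID and description; no trailing space if empty.
        out.append(f">{seq_id} {description}".rstrip())
        for start in range(0, len(sequence), width):
            out.append(sequence[start:start + width])
    return out


table = [
    "CG123\tA small sequence\tACGTTGCAGTTACCAG",
    "EG12\t\tACCGGA",
    "DG124\tA smaller sequence\tGTTACCAG",
]
print("\n".join(tab_to_fasta(table)))
```

A line with more or fewer than three columns raises a ValueError in this sketch, which is one way to notice stray tabs in the input.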
Change files with one or more sequences into a different format.
The input and output formats can be embl, fasta, gcg, genbank, swiss,
or a whole bunch of other formats: see
The Bioperl SeqIO HOWTO
for details.
Warning: Converting from genbank to FASTA (for example) will necessarily lose
some annotation information.
This script requires Bioperl to be installed (on whichever machine the
script runs on). Many biology computers will have it installed. If the script
breaks because it "can't locate Bio/Perl.pm", you can download Bioperl from
bioperl.org.
Change rows to columns and vice versa, for a tab-separated file.
Data should have the same number of columns in every row.
Example: Transpose the table original.tab to get transposed.tab.
Input file (original.tab ) |
Top Col2 Col3
Row2 r2c2 r2c3
Row3 r3c2 r3c3
Row4 r4c2 r4c3
Row5 r5c2 r5c3
|
Output file (transposed.tab ) |
Top Row2 Row3 Row4 Row5
Col2 r2c2 r3c2 r4c2 r5c2
Col3 r2c3 r3c3 r4c3 r5c3
|
Screen Output |
Transposed table: result has 5 columns and 3 rows
|
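Transposing a table is compact in Python, as this sketch shows. It is an illustration rather than the Scriptome script; the function name transpose is hypothetical, and the sketch rejects ragged input instead of guessing, in line with the requirement that every row have the same number of columns.

```python
def transpose(lines):
    """Swap rows and columns of tab-separated lines."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError("rows have different numbers of columns")
    # zip(*rows) yields the first field of every row, then the second, and so on.
    return ["\t".join(column) for column in zip(*rows)]


original = [
    "Top\tCol2\tCol3",
    "Row2\tr2c2\tr2c3",
    "Row3\tr3c2\tr3c3",
    "Row4\tr4c2\tr4c3",
    "Row5\tr5c2\tr5c3",
]
result = transpose(original)
print("\n".join(result))
print(f"Transposed table: result has {len(original)} columns and {len(result)} rows")
```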
Split one big FASTA file into multiple smaller ones. If the output filename template is small_NUMBER.fasta, the output files will be called small_1.fasta, small_2.fasta, etc.
Example: Split big.fasta, with five sequences, into two files, small_1.fasta and small_2.fasta. (Since there are only five sequences, the second file has only two sequences in it.)
Input file (big.fasta ) |
>seq1
ACCTTGTCGCA
>seq2
ACCTTGTCGCAAAGC
>seq3
ACCTTGTCGCACCGGAACGA
>seq4
ACCTTGTCGCACCGGAACGACCGGAACGA
>seq5
GTCGCA
|
Output file 1 (small_1.fasta ) |
>seq1
ACCTTGTCGCA
>seq2
ACCTTGTCGCAAAGC
>seq3
ACCTTGTCGCACCGGAACGA
|
Output file 2 (small_2.fasta ) |
>seq4
ACCTTGTCGCACCGGAACGACCGGAACGA
>seq5
GTCGCA
|
Screen Output |
Split 5 FASTA records in 10 lines, with total sequence length 81
Created 2 files like small_2.fasta
|
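The splitting logic can be sketched in Python like this. To keep the example safe to run, it returns a mapping from output filename to lines instead of writing files. The function name split_fasta and the seqs_per_file parameter are assumptions for illustration; the value 3 is inferred from the example output above.

```python
def split_fasta(lines, seqs_per_file=3, template="small_NUMBER.fasta"):
    """Group FASTA lines into chunks and map output filenames to their lines."""
    files = {}
    count = 0  # number of FASTA headers seen so far
    for line in lines:
        if line.startswith(">"):
            count += 1
        if count == 0:
            continue  # ignore anything before the first header
        file_number = (count - 1) // seqs_per_file + 1
        name = template.replace("NUMBER", str(file_number))
        files.setdefault(name, []).append(line)
    return files


big = [
    ">seq1", "ACCTTGTCGCA",
    ">seq2", "ACCTTGTCGCAAAGC",
    ">seq3", "ACCTTGTCGCACCGGAACGA",
    ">seq4", "ACCTTGTCGCACCGGAACGACCGGAACGA",
    ">seq5", "GTCGCA",
]
for name, chunk in split_fasta(big).items():
    print(name, sum(1 for line in chunk if line.startswith(">")), "sequences")
```

To produce real files, each list of lines could be written out under its filename, one line per row.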
As always, when in doubt, check your output files after each step!
Scriptome tools are in blue or green boxes. Cut and paste the text of the
tool into a terminal window. Then edit the line as needed.
Things that will often need to be edited are highlighted in
red. Input and output filenames will almost always need to be changed.
All scripts that work on tabular data assume the data is tab-separated.
Use a Change script to change, e.g., comma-separated data
to tab-separated before using these scripts.
When working with tabular data, remember that the first column is called
column 0, the second column is column 1, etc. The last column can
also be referred to as column -1, the second-to-last column as column -2, etc.
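Python lists happen to follow the same numbering convention, so a quick sketch can make it concrete. The three-column row below is made-up sample data.

```python
row = "Fly\t7\tinsect".split("\t")

print(row[0])   # column 0: the first column
print(row[1])   # column 1: the second column
print(row[-1])  # column -1: the last column
print(row[-2])  # column -2: the second-to-last column
```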