Change files

The scripts in this section perform simple transformations of entire files or lines in files.

To use a script, copy and paste the code from the light green or blue box into a terminal window, change the bold, red text as needed, and press Enter.

See More Information for notes on using these tools.

Change columns in each line of files

Change tabular data with a given separator to tab-separated values (change_any_separator_to_tab)

This tool is important because most of the Scriptome tools require tab-separated data.

Warning: a few weird separators (like ' or ``) might not work. (It might help to put a backslash before it.)

$sep Character(s) to change to tab
Input file(s)
Output file

perl -e ' $sep=","; while(<>) { s/\Q$sep\E/\t/g; print $_; } warn "Changed $sep to tab on $. lines\n" ' file.csv > file.tab

Example: Change comma-separated file.csv to tab-separated file.tab by running the above script.

Input file (file.csv)
 Fly,7
 Human,14
 Worm,28
 Yeast,35
Output file (file.tab)
 Fly    7
 Human  14
 Worm   28
 Yeast  35
Screen Output
 Changed , to tab on 4 lines

Example 2: Given a list of Swiss-Prot identifiers, separate the protein name and species abbreviation into two separate columns. Run the above script using $sep="_"

Change tab-separated data to use a different delimiter (change_tab_to_any_separator)

Replace the tabs in tab-separated data with some other separator. The separator does not have to be one character: "---" would work, for example, or even "" to merge all columns.

This tool is important because most of the Scriptome tools require tab-separated data. After running one or more Scriptome tools, use this script to export the results to other programs that expect, for example, comma-separated data.

Warning: a few weird separators (like ' or ``) might not work. Also, if there's a comma in your data, and you change to a comma separator, you'll get too many columns.

$sep Character(s) to change tab to
Input file(s)
Output file

perl -e ' $sep=","; while(<>) { s/\t/$sep/g; print $_; } warn "Changed tab to $sep on $. lines\n" ' file.tab > file.csv

Example: Change tab-separated file.tab to comma-separated file.csv by running the above script.

Input file (file.tab)
 Fly    7
 Human  14
 Worm   28
 Yeast  35
Output file (file.csv)
 Fly,7
 Human,14
 Worm,28
 Yeast,35
Screen Output
 Changed tab to , on 4 lines

Reorder columns

Use choose_cols. Choose some or all of the columns, in whatever order you want. To switch the order of the first two columns of a ten-column file, you could set @cols to be 1, 0, 2..9.

Change entire lines in files

Remove spaces in a line (change_remove_spaces)

Remove all spaces (but not tabs) from a line.

Input file(s)
Output file

perl -e ' while(<>) { s/ //g; print $_; } warn "Removed all spaces from $. lines\n" ' file.spaces > file.nospace

See Also: remove empty lines (Choose)

Change all characters to upper case (change_upper_case)

Change all characters in each line to upper case. Numbers and punctuation will not be changed. (Change "uc" in the script to "lc" to get lower case.)

Input file(s)
Output file

perl -e ' while(<>) { print uc($_); } warn "Changed $. lines to upper case\n" ' file.mixed > file.uc

Example: Change a list of gene names to upper-case, to compare with another list.

Change between different biological data formats

Change a FASTA file into tabular format (change_fasta_to_tab)

Change each FASTA sequence in a file into one line of three tab-separated columns: the ID (not including the '>'); the rest of the description line (or an empty column if the description line contains only an ID); and the sequence itself.

Once you have run this script, you can use the many Scriptome tools that work on tab-separated data.

Note: translating from FASTA to tabular format and back will generate a file with the same information, but the files may not be identical. This tool replaces any tabs with single spaces (otherwise the tabular output file would have too many columns) and removes any spaces from the amino acid or nucleic acid sequences.

Input file(s)
Output file

perl -e ' $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print "\n" } s/ |$/\t/; $count++; $_ .= "\t"; } else { s/ //g; $len += length($_) } print $_; } print "\n"; warn "\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n"; ' seqs.fna > seqs.tab

Example: Run the above script on seqs.fna to get seqs.tab.

Input file (seqs.fna)
 >CG123 A small sequence
 ACGTTGCA
 GTTACCAG
 >EG12
 ACCGGA
 >DG124  A smaller sequence
 GTTACCAG
Output file (seqs.tab)
 CG123  A small sequence        ACGTTGCAGTTACCAG
 EG12           ACCGGA
 DG124   A smaller sequence     GTTACCAG
Screen Output
 
 Converted 3 FASTA records in 7 lines to tabular format
 Total sequence length: 30

Change a tabular file into FASTA format (change_tab_to_fasta)

Change each line in a three-column, tab-separated file (containing ID, description and sequence - e.g., a file created by the above change_fasta_to_tab tool) to a FASTA sequence.

Note: translating to FASTA format and back will generate a file with the same information, but the files may not be identical. This tool will put a single space between the ID and the description, and will put 60 characters per line in the sequence portion.

Input file(s)
Output file

perl -e ' $len=0; while(<>) { s/\r?\n//; @F=split /\t/, $_; print ">$F[0]"; if (length($F[1])) { print " $F[1]" } print "\n"; $s=$F[2]; $len+= length($s); $s=~s/.{60}(?=.)/$&\n/g; print "$s\n"; } warn "\nConverted $. tab-delimited lines to FASTA format\nTotal sequence length: $len\n\n"; ' seqs.tab > seqs.fasta

Example: Run the above script on seqs.tab to get seqs.fasta.

Input file (seqs.tab)
 CG123  A small sequence        ACGTTGCAGTTACCAG
 EG12           ACCGGA
 DG124   A smaller sequence     GTTACCAG
Output file (seqs.fasta)
 >CG123 A small sequence
 ACGTTGCAGTTACCAG
 >EG12
 ACCGGA
 >DG124  A smaller sequence
 GTTACCAG
Screen Output
 
 Converted 3 tab-delimited lines to FASTA format
 Total sequence length: 30

Change from one biological format to another (change_bio_format_to_bio_format)

Change files with one or more sequences into a different format. The input and output formats can be embl, fasta, gcg, genbank, swiss, or a whole bunch of other formats: see the Bioperl SeqIO HOWTO for details.

Warning: Converting from genbank to FASTA (for example) will necessarily lose some annotation information.

This script requires Bioperl to be installed (on whichever machine the script runs on). Many biology computers will have it installed. If the script fails with an error such as "Can't locate Bio/SeqIO.pm", Bioperl is not installed; you can download it from bioperl.org.

$informat Input file format
$outformat Output file format
Input file(s)
Output file

perl -MBio::SeqIO -e ' $informat="genbank"; $outformat="fasta"; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile , -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn "Translated $count sequences from $informat to $outformat format\n" ' myseqs.genbank > myseqs.fasta

Change entire files

Transpose a table (change_transpose_table)

Change rows to columns and vice versa, for a tab-separated file. Data should have the same number of columns in every row.

Input file(s)
Output file

perl -e ' $unequal=0; $_=<>; s/\r?\n//; @out_rows = split /\t/, $_; $num_out_rows = $#out_rows+1; while(<>) { s/\r?\n//; @F = split /\t/, $_; foreach $i (0 .. $#F) { $out_rows[$i] .= "\t$F[$i]"; } if ($num_out_rows != $#F+1) { $unequal=1; } } END { foreach $row (@out_rows) { print "$row\n" } warn "\nWARNING! Rows in input had different numbers of columns\n" if $unequal; warn "\nTransposed table: result has $. columns and $num_out_rows rows\n\n" } ' original.tab > transposed.tab

Example: Transpose the table original.tab to get transposed.tab.

Input file
(original.tab)
 Top    Col2    Col3
 Row2   r2c2    r2c3
 Row3   r3c2    r3c3
 Row4   r4c2    r4c3
 Row5   r5c2    r5c3
Output file
(transposed.tab)
 Top    Row2    Row3    Row4    Row5
 Col2   r2c2    r3c2    r4c2    r5c2
 Col3   r2c3    r3c3    r4c3    r5c3
Screen Output
 
 Transposed table: result has 5 columns and 3 rows

Split big FASTA file into smaller files (change_split_fasta)

Split one big FASTA file into multiple smaller ones. If the output filename template is small_NUMBER.fasta, the output files will be called small_1.fasta, small_2.fasta, etc.

$split_seqs Number of sequences per output file
$out_template Template for output file name
Input file(s)

perl -e ' $split_seqs=3; $out_template="small_NUMBER.fasta"; $count=0; $filenum=0; $len=0; while (<>) { s/\r?\n//; if (/^>/) { if ($count % $split_seqs == 0) { $filenum++; $filename = $out_template; $filename =~ s/NUMBER/$filenum/g; if ($filenum > 1) { close SHORT } open (SHORT, ">$filename") or die $!; } $count++; } else { $len += length($_) } print SHORT "$_\n"; } close(SHORT); warn "\nSplit $count FASTA records in $. lines, with total sequence length $len\nCreated $filenum files like $filename\n\n"; ' big.fasta

Example: Split big.fasta, with five sequences, into two files, small_1.fasta and small_2.fasta. (Since there are only five sequences, the second file has only two sequences in it.)

Input file
(big.fasta)
 >seq1
 ACCTTGTCGCA
 >seq2
 ACCTTGTCGCAAAGC
 >seq3
 ACCTTGTCGCACCGGAACGA
 >seq4
 ACCTTGTCGCACCGGAACGACCGGAACGA
 >seq5
 GTCGCA
Output file 1
(small_1.fasta)
 >seq1
 ACCTTGTCGCA
 >seq2
 ACCTTGTCGCAAAGC
 >seq3
 ACCTTGTCGCACCGGAACGA
Output file 2
(small_2.fasta)
 >seq4
 ACCTTGTCGCACCGGAACGACCGGAACGA
 >seq5
 GTCGCA
Screen Output
 Split 5 FASTA records in 10 lines, with total sequence length 81
 Created 2 files like small_2.fasta


More Information

General Changing Notes

As always, when in doubt, check your output files after each step!

General Scriptome Notes

Scriptome tools are in blue or green boxes. Cut and paste the text of the tool into a terminal window. Then edit the line as needed. Things that will often need to be edited are highlighted in red. Input and output filenames will almost always need to be changed.

All scripts that work on tabular data assume the data is tab-separated. Use a Change script to change, e.g., comma-separated data to tab-separated before using these scripts.

When working with tabular data, remember that the first column is called column 0, the second column is column 1, etc. The last column can also be referred to as column -1, second-to-last column is -2, etc.

 
