The Scriptome is a toolbox helping biologists explore and manipulate their
data. For more information, see the Overview page.
An atom is a single Scriptome tool. It's called an atom because it does
one small thing, and because atoms can be combined to make molecules
(which we, breaking our analogy completely, call protocols).
Atoms can be found in the Tools section of the navigation bar on the left.
A protocol is an ordered set of Scriptome atoms which can solve a larger
problem. The analogy is to a biological protocol (such as those at
http://www.genome.ou.edu/protocol_book/protocol_index.html), which
is an ordered set of actions that solve a larger biological problem.
Protocols can be found in the Protocols section of the
navigation bar on the left, although users will often want to build
their own protocols.
You don't! You just cut and paste. Windows users without access to a
UNIX or Mac computer will need to download Perl (free, one-click install)
from http://activestate.com/Products/ActivePerl/.
First, choose whether you want the Windows or non-Windows tools. Find the right
tool, change parameters and filenames in the form as desired, cut and paste
the text in the colored box onto a Windows, UNIX, or Mac command line,
and hit Enter. For more information, see the Help page.
Each tool is a command that runs Perl. Rather than running
Perl programs (like "perl blah.pl infile > outfile"), we use the -e
option to write a short Perl program right on the command line, also
known as a "Perl one-liner". So each tool is usually in the form
perl -e 'Perl code' infile(s) > outfile
The '>' sends the output from running the Perl code on the input file(s)
to the given outfile instead of the screen.
Choose the Windows site when you first start, or use the links in the very
top left of the page to switch between Unix/Mac and Windows. (Linux
and Mac count as UNIX.) This will change the color of the navigation bar and
the light backgrounds for the code you cut and paste. UNIX is blue, Windows
is green.
See the information on the different
Scriptome mailing lists.
Bug reports and other feedback options
are described on the Resources page.
Send email to akarger@cgr.harvard.edu with a detailed request. We can't
guarantee that we'll create it, but we're more likely to if several people
ask for similar things. You might also want to discuss it on the
mailing list.
Send email to akarger@cgr.harvard.edu including code, sample input
and output files, and clear, concise documentation.
It's frustrating for programmers to see non-programming biologists spending
hours to edit files by hand (or even abandoning certain avenues of analysis),
when a short program can do the work in a few seconds. Yet more and more
questions require high-throughput analysis that makes some programming
necessary. (The Scriptome can also be viewed as a repository of frequently
requested scripts, so that we don't have to write the same script each
time a different biologist asks for it.)
See the Principles page for more information on why
certain design decisions were made (including hints to many questions below).
The Scriptome project is essentially a project to give non-programmers
some of the power of programming, without forcing them to learn a (classic)
programming language. This is a field of current research
in computer science, and we're hoping to use (steal) some computer scientists'
ideas on how to do this more effectively in the future.
Any interface, no matter how "intuitive", is another interface users need
to learn. Since users may be using the Scriptome only occasionally, why not
just use cut-and-paste? Also, given that we may need to write many tools
to solve diverse and evolving needs, a text/web interface allows much
faster tool development. Finally, the current interface requires no
downloads or installs, and is generally platform- and browser-independent.
That said, these tools are simple enough that they can be wrapped inside
graphical tools. We're working on some ideas....
We're all for biologists learning to program! However, many biologists -
especially those who need to
use computers for analysis only occasionally - do not consider it worth
the time to learn programming. Alternatively, the Scriptome can be viewed
as a first step towards programming: it can help biologists learn about
problem decomposition, high-throughput analysis, and debugging, while
avoiding the additional barrier of learning syntax and other programming
techniques (variables, arrays, hashes, regular expressions...); and it
provides short, working examples for student programmers to read and tweak.
Yes, many of these tools can be implemented with UNIX tools like grep,
cut, paste, join, awk, sed.... However: UNIX tools tend to give "concise"
feedback; they tend to have many, many options; and shell scripting can be
even more cryptic than Perl. It's possible to imagine a Scriptome with
cut-and-paste pure UNIX commands instead of Perl, which would get around some
of these problems. But using Perl means we can also write biology-specific
tools pretty easily (possibly stealing them from Bioperl).
Many biologists do use Excel, but they might not use fancy functions
or macros. The interface may not work for occasional users, or they may feel
that learning an "advanced" program like Access is a waste of time. And while
Excel and Access can do many of the jobs the Scriptome does, they are not
designed to handle FASTA formats, for example. (Excel 2003 will also choke on
files with more than 65,536 rows, and won't merge files without VB programming.)
Perl was specifically designed for "data munging" - formatting,
filtering, etc. - which is why it's so commonly used for bioinformatics.
It was also designed to allow fast code development, which is important
given the diverse and changing questions biologists will be asking.
We can take advantage of existing tools, most obviously Bioperl.
And Perl comes preinstalled on nearly every UNIX/Mac computer, so no install
is needed.
The Windows command line expects you to say perl -e "blah" while the UNIX
shell expects perl -e 'blah'. So we have to change single quotes to double.
But then any double quotes within the script would confuse Windows, so, e.g.,
instead of print "blah" we use the more mystical print qq~blah~.
Boring and annoying, but necessary.
The Scriptome was developed by the Computational Biology Group at Harvard
University's Bauer Center for Genomics Research. (Follow links at the top of
the page.)
- Amir Karger is the main developer.
- Jason Konrad is a non-biologist working on the Perl side of the Scriptome.
- Eitan Rubin, who provided many ideas about the principles, is now at
  Ben Gurion University, working on embedding the Scriptome in Excel.
- Chris Botka, who provided initial ideas for the Scriptome, is now
  at the Joslin Diabetes Center.
- Professor Rob Miller at MIT provided many user interface improvements.
- Many others provided ideas for tools or protocols.
We hope to have a real credits page someday.
Believe it or not, we're providing these tools for you absolutely free!
On the other hand, if you have some money available to fund objectives like
ours (creating tools for non-programming scientists, giving non-programmers
some of the power of programming), send an email to akarger@cgr.harvard.edu.
We have several zillion ideas. We might decide to change the interface if
too many biologists dislike the command line. There are plenty
of tools and protocols to write. We would like to add an "explain this"
button to each tool, so that Perl students can get a detailed explanation
of how the tools work.