The Scriptome Manifesto

The vision of the Scriptome project is to help biologists manipulate and explore their data. This page describes design principles for a solution that is effective, easy to learn and remember, and inexpensive to develop and maintain. It also discusses, more generally, the theory of equipping non-programmers with programming tools. (The principles apply even more broadly if you replace "biologists" below with "scientists".) If you just want to know how to use the Scriptome, you'll probably be more interested in the Help or Overview pages.

Help biologists manipulate data
Biologists may know exactly which small pieces of data they want, but may not be able to find them in numerous, large data files. Or they may want to convert data between two known formats, but find that editing the files by hand is too time-consuming. Help biologists perform these clearly defined data manipulations.
Help biologists explore data
With the vast amount of raw, uninterpreted data available, sometimes the best way to learn about data is to play with it. Help biologists explore their data, quickly assessing their results, trying different avenues of analysis, and finding connections between data sets.
Present an interface appropriate for occasional, non-programmer users
Provide an interface that stresses simplicity over power, enabling biologists to find and use the right tools. The interface should be easy to learn - and easy to remember after a month working in the lab. (For example, leverage biologists' familiarity with web or other computer tools, rather than building a fancy, "intuitive" interface. If possible, do not require them to remember command names or parameters.) Simplifying installation and upgrade will also make the system more attractive to non-programmers.
Solve "easy" problems
Many tools exist to run complex bioinformatics algorithms, but fewer tools are available for reformatting or filtering data. Although data manipulation may be trivial for experienced programmers, it can present a major obstacle for non-programming scientists.
Do a little at a time
A large program is unlikely to meet every biologist's needs, even if it has many complex parameters. Exploring data is also easier when each step is small: intermediate results can be checked, and it is easier to retrace steps if the biologist goes down the wrong path. Borrow the Unix philosophy of creating many small tools, each of which performs a small task (but provide only one or two parameters per tool, to avoid confusion).
Keep the biologist in the loop
Don't try to provide complete solutions for everything a biologist will want to do. Biologists know their problems better than anyone else, so give them tools to work on these problems by themselves. If tools are small and flexible, biologists can string them together to solve their diverse, evolving problems.
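As a rough illustration of small tools strung together, here is a minimal Python sketch. It is not taken from the Scriptome: the function names, the three-column table layout, and the sample data are all hypothetical. Each "tool" does one small job and takes at most two parameters, and the intermediate result is printed so it can be checked before the next step.

```python
# Hypothetical illustration only: these names and this data layout are
# assumptions, not part of the Scriptome itself.
from typing import Iterable, Iterator


def keep_rows_where_column_at_least(
    rows: Iterable[str], column: int, minimum: float
) -> Iterator[str]:
    """Small tool 1: keep tab-separated rows whose numeric column >= minimum."""
    for row in rows:
        line = row.rstrip("\n")
        fields = line.split("\t")
        if float(fields[column]) >= minimum:
            yield line


def table_to_fasta(rows: Iterable[str]) -> Iterator[str]:
    """Small tool 2: turn 'name<TAB>sequence<TAB>...' rows into FASTA records."""
    for row in rows:
        fields = row.split("\t")
        yield f">{fields[0]}\n{fields[1]}"


# Made-up input: name, sequence, score.
table = [
    "geneA\tATGGCC\t0.91",
    "geneB\tATGTTT\t0.12",
    "geneC\tATGAAA\t0.75",
]

# Step 1: filter, then inspect the intermediate result before going on.
filtered = list(keep_rows_where_column_at_least(table, column=2, minimum=0.5))
print(filtered)

# Step 2: reformat the surviving rows.
fasta = "\n".join(table_to_fasta(filtered))
print(fasta)
```

Because each step is separate, a biologist who dislikes the result of step 1 can change the threshold and rerun it without touching step 2, or swap in a different second tool altogether.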
Take advantage of existing tools
Complement existing tools rather than replacing them. For example, Bioperl has a huge amount of functionality with an accessible API. Building on such tools will be much simpler than creating a huge set of extra tools from scratch. (On the other hand, use only "standard" tools, so scientists don't need to install extra packages.)
Encourage development of programming skills
Biologists who do not program should still be able to reformat or filter their data. But many problems can realistically be solved only by using full-fledged programming techniques, such as looping, problem decomposition, and (process) debugging. Create a high-level, domain-specific language to help biologists learn these skills while (mostly) avoiding the ugly details of language syntax.
Catalyze the process of learning programming
One of the most effective ways of learning a language is to read and then tweak working code - especially when it solves problems that interest the student. Provide scientists with short, working, documented source code. This will help them learn programming without taking time off from their research - all while obtaining useful results on their own data manipulation problems.
Create a system that can evolve
New tools will constantly be needed to handle the diverse and changing array of questions biologists are asking (not to mention new data formats). Developing new tools and making them available to users must be simple and fast. Soliciting code from the bioinformatics community will speed development; soliciting input from the greater biology community will guide development of relevant tools.

The Frequently Asked Questions list describes how the particular implementation chosen for the Scriptome meets these design principles.