simulate_sublibrary

Overview

simulate_sublibrary takes a library file and generates a sublibrary file. A sublibrary file contains only a small number of unique sequences (usually 200-300) and many copies of each sequence. The sublibrary is useful as a test data set for evaluating the performance of your model. It can also be used with totalinfo to calculate the best possible performance of any model.

After you install `sortseq_tools`_, this program will be available to run at the command line.

Command-line usage

usage: sortseq simulate_sublibrary [-h] [-d {constant,exponential}]
                                   [-ns NUMSUB] [-nc NUMCOPIES]
                                   [-dc DECAYCONSTANT] [-t {dna,rna,protein}]
                                   [-o OUT]
Options:

-d=constant, --disttype=constant
 How the number of times each sequence appears should be distributed.
 Possible choices: constant, exponential
-ns=200, --numsub=200
 Number of different sublibrary sequences.
-nc=1000.0, --numcopies=1000.0
 Number of times to measure a given sequence. If an exponential distribution type is selected, this will be the maximum number of seqs.
-dc=100, --decayconstant=100
 Decay constant of the distribution, used if the exponential distribution is selected.
-t=dna, --type=dna
 Undocumented
 Possible choices: dna, rna, protein
-o, --out
 Undocumented
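
To make the interplay of --disttype, --numcopies and --decayconstant concrete, here is a minimal Python sketch of how per-sequence counts could be assigned. The function name sublibrary_counts and the exact exponential form, numcopies * exp(-i / decayconstant), are assumptions made for illustration. They are not taken from the sortseq_tools source, which may compute counts differently.

```python
import math


def sublibrary_counts(numsub=200, numcopies=1000.0,
                      disttype="constant", decayconstant=100):
    """Return one count per sublibrary sequence (illustrative sketch only)."""
    if disttype == "constant":
        # Every sequence receives the same number of copies.
        return [int(numcopies)] * numsub
    if disttype == "exponential":
        # Assumed form: the i-th sequence gets numcopies * exp(-i / decayconstant),
        # so the first sequence receives the maximum, numcopies.
        return [int(round(numcopies * math.exp(-i / decayconstant)))
                for i in range(numsub)]
    raise ValueError("unknown disttype: %r" % (disttype,))


constant_counts = sublibrary_counts()
exponential_counts = sublibrary_counts(disttype="exponential")
```

Under this assumed form, a smaller decay constant concentrates the counts on the first few sequences, while a large one approaches the constant case.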

Example Input and Output

The input should be a sequence library.

Example Input Table:

seq    ct
ATTAG  1
ACCTA  15
GGATT  9
...

Example Output Table:

seq    ct
ATTAG  6000
AGGAT  6000
...

By default, each chosen sequence is given a uniform number of counts.
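
The transformation from the input table to the output table above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the program's actual implementation: the function name simulate_sublibrary_sketch is hypothetical, and the choice to sample unique sequences uniformly at random (ignoring their library counts) is an assumption that the documentation does not confirm.

```python
import random


def simulate_sublibrary_sketch(library, numsub=200, numcopies=1000, seed=None):
    """Pick up to numsub unique sequences and give each the same count.

    library is a list of (seq, ct) pairs, as in the input table.
    Sampling uniformly over unique sequences is an assumption.
    """
    rng = random.Random(seed)
    unique_seqs = sorted({seq for seq, _ in library})
    chosen = rng.sample(unique_seqs, min(numsub, len(unique_seqs)))
    # Constant distribution: every chosen sequence gets numcopies counts.
    return [(seq, int(numcopies)) for seq in chosen]


example_library = [("ATTAG", 1), ("ACCTA", 15), ("GGATT", 9)]
example_sublibrary = simulate_sublibrary_sketch(
    example_library, numsub=2, numcopies=6000, seed=0)
```

Each row of example_sublibrary pairs one of the input sequences with a count of 6000, mirroring the shape of the example output table.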

Example command to perform the analysis:

sortseq simulate_sublibrary -i my_library.txt -o my_sublibrary.txt
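
After running the command, it can be worth checking that the output has the expected number of unique sequences and counts. The helper below is a small sketch that assumes the output is a whitespace-delimited table with a "seq ct" header row, as in the example tables. The function names parse_counts_table and summarize are hypothetical and are not part of sortseq_tools.

```python
def parse_counts_table(lines):
    """Parse a whitespace-delimited table that has a 'seq' and a 'ct' column."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    seq_col, ct_col = header.index("seq"), header.index("ct")
    # Counts may be written as floats (e.g. 1000.0), so convert via float.
    return [(row[seq_col], int(float(row[ct_col]))) for row in body]


def summarize(table):
    """Return the number of unique sequences and basic count statistics."""
    counts = [ct for _, ct in table]
    return {
        "unique": len({seq for seq, _ in table}),
        "total": sum(counts),
        "min": min(counts),
        "max": max(counts),
    }


example_lines = ["seq    ct", "ATTAG  6000", "AGGAT  6000"]
example_summary = summarize(parse_counts_table(example_lines))
```

To apply this to a real file, pass an open file handle for my_sublibrary.txt to parse_counts_table in place of example_lines. With the constant distribution, the "min" and "max" entries should be equal.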
