learn_model is a program in the sortseq_tools package that generates linear energy matrix models for sections of a sorted library.
After you install `sortseq_tools`_, this program will be available to run at the command line.
usage: sortseq learn_model [-h] [-s START] [-e END] [-t {dna,rna,protein}]
                           [-et {sortseq,mpra,selection}]
                           [-lm {least_squares,lasso,MImax}]
                           [--initialize {Rand,LeastSquares}] [-rn RUNNUM]
                           [-db DB_FILENAME] [-iter NUMITERATIONS] [-b BURNIN]
                           [-th THIN] [-i I] [-o OUT]
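As a rough illustration, the usage string above can be turned into a concrete invocation. The sketch below only assembles and prints a command line; it does not run the program. The helper name `build_learn_model_cmd`, the file name `library.txt`, and the region 0 to 20 are hypothetical, and passing a file path to `-i` is an assumption based on that option's help text.

```python
import shlex


def build_learn_model_cmd(input_path, start=0, end=None,
                          seq_type="dna", exp_type="sortseq",
                          method="least_squares"):
    """Assemble a `sortseq learn_model` command line from documented flags.

    Hypothetical helper for illustration; it is not part of sortseq_tools.
    """
    cmd = [
        "sortseq", "learn_model",
        "-s", str(start),
        "-t", seq_type,
        "-et", exp_type,
        "-lm", method,
        "-i", input_path,  # assumption: -i takes the path of the input table
    ]
    if end is not None:
        cmd += ["-e", str(end)]
    return cmd


cmd = build_learn_model_cmd("library.txt", start=0, end=20)
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and its value separate, so it could be handed to `subprocess.run` unchanged once `sortseq_tools` is installed.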
| -s=0, --start=0 | Position to start your analyzed region |
| -e, --end | Position to end your analyzed region |
| -t=dna, --type=dna | Undocumented. Possible choices: dna, rna, protein |
| -et=sortseq, --exptype=sortseq | Type of experiment. Possible choices: sortseq, mpra, selection |
| -lm=least_squares, --learningmethod=least_squares | Algorithm for determining matrix parameters. Possible choices: least_squares, lasso, MImax |
| --initialize=Rand | How to choose the starting point for MCMC. Possible choices: Rand, LeastSquares |
| -rn=0, --runnum=0 | For multiple runs, this changes the output database file name |
| -db, --db_filename | For MImax, the name of the SQLite database in which to save the trace, if you wish to save it |
| -iter=30000, --numiterations=30000 | For MImax, number of MCMC iterations |
| -b=1000, --burnin=1000 | For MImax, number of burn-in iterations |
| -th=10, --thin=10 | For MImax, the thinning interval: only 1 iteration is saved out of every THIN iterations |
| -i=False, --i=False | Read input from file instead of stdin |
| -o, --out | Undocumented |
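To make the interplay of `-iter`, `-b`, and `-th` concrete, here is a minimal sketch of burn-in and thinning applied to iteration indices. It assumes that burn-in iterations are discarded first and that every THIN-th remaining iteration is then kept; how learn_model actually combines these options internally is not stated here and may differ. The function name `kept_iterations` is hypothetical.

```python
def kept_iterations(num_iterations=30000, burnin=1000, thin=10):
    """Return the iteration indices that would be saved.

    Assumption: the first `burnin` iterations are dropped, then one
    iteration out of every `thin` is kept from the remainder.
    """
    return list(range(burnin, num_iterations, thin))


saved = kept_iterations()  # defaults mirror the documented option defaults
print(len(saved), saved[0], saved[-1])
```

Under this assumption, the default settings would retain 2900 of the 30000 iterations.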
The input table to this program must contain a sequence column (seq) and one count column for each bin (ct_0, ct_1, ...). For a Sort-Seq experiment, there can be any number of bins. For MPRA and selection experiments, the count columns must be ct_0 and ct_1.
Example Input Table:
seq ct_0 ct_1 ct_2 ...
ACG 1 5 7
GGT 8 5 5
...
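The input requirements can be checked before running the program. The following is a minimal sketch using only the standard library; the function name `check_input_table` is hypothetical, and it assumes a whitespace-delimited table with `seq` as the first column, as in the example above.

```python
import re


def check_input_table(text, exp_type="sortseq"):
    """Parse a whitespace-delimited table and validate its columns.

    Hypothetical helper; assumes `seq` is the first column and every
    other column is a count column named ct_<bin number>.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if header[0] != "seq":
        raise ValueError("first column must be 'seq'")
    count_cols = header[1:]
    if not count_cols or any(not re.fullmatch(r"ct_\d+", c) for c in count_cols):
        raise ValueError("count columns must be named ct_0, ct_1, ...")
    if exp_type in ("mpra", "selection") and count_cols != ["ct_0", "ct_1"]:
        raise ValueError("MPRA and selection tables need exactly ct_0 and ct_1")
    table = []
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row has the wrong number of fields: %r" % (row,))
        table.append((row[0], [int(x) for x in row[1:]]))
    return count_cols, table


example = """
seq ct_0 ct_1 ct_2
ACG 1 5 7
GGT 8 5 5
"""
cols, table = check_input_table(example)
print(cols, table)
```

Calling the same function with `exp_type="mpra"` on this three-bin table raises a `ValueError`, which reflects the two-column requirement for MPRA and selection experiments.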
Example Output Table:
pos val_A val_C val_G val_T
0 .04 -.3 -.2 .15
1 .2 .1 -.44 .05
...
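To show how an output table of this shape might be used, the sketch below scores a sequence by summing, position by position, the matrix entry for the base found there. This is the usual reading of a linear energy matrix, but the helpers `parse_matrix` and `energy` are hypothetical and not part of sortseq_tools, and the sign and units of the values depend on how the model was fit.

```python
def parse_matrix(text):
    """Parse a whitespace-delimited matrix table into {pos: {letter: value}}."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    # Column names look like val_A, val_C, ...; keep only the letter.
    letters = [name[len("val_"):] for name in header[1:]]
    matrix = {}
    for row in lines[1:]:
        matrix[int(row[0])] = dict(zip(letters, (float(v) for v in row[1:])))
    return matrix


def energy(seq, matrix, start=0):
    """Sum the matrix entries selected by each character of `seq`."""
    return sum(matrix[start + i][base] for i, base in enumerate(seq))


example = """
pos val_A val_C val_G val_T
0 .04 -.3 -.2 .15
1 .2 .1 -.44 .05
"""
m = parse_matrix(example)
print(round(energy("AC", m), 2))
```

With the two example rows, the sequence "AC" picks val_A at position 0 and val_C at position 1.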