Repeat Cleaner
Objective of the Script: This tool is designed to improve the quality of retroviral libraries. It reduces recombination caused by template switching, a phenomenon influenced by the number and length of repeat sequences in a nucleotide sequence. The script replaces codons in repetitive regions with high-frequency synonymous codons, which preserves the protein encoded by the open reading frame (ORF) while reducing repeat sequences. Replacement codons are chosen according to human codon usage. Use this tool for cDNA shorter than 1,500 nucleotides.
How to Use the Tool:
1. Input Your Sequence:
- Enter your open reading frame (ORF) nucleotide sequence in the provided text area.
- Enter DNA sequences only.
- Keep your cDNA short: fewer than 1,500 nt.
- Preferably, codon-optimize your sequence before using this tool.
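The input checks above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: the function name `validate_orf` is hypothetical, and the check that the length is a multiple of 3 is an added assumption based on the input being a complete ORF.

```python
def validate_orf(seq, max_len=1500):
    """Return the sequence without whitespace, upper-cased, or raise ValueError.

    Hypothetical helper: the tool's own checks may differ.
    """
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    # DNA only: anything other than A, C, G, T is rejected (including U and N).
    bad = sorted(set(cleaned) - set("ACGT"))
    if bad:
        raise ValueError("non-DNA characters found: " + "".join(bad))
    # ASSUMPTION: a complete ORF is made of whole codons.
    if len(cleaned) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    # The tool is intended for cDNA shorter than 1,500 nt.
    if len(cleaned) >= max_len:
        raise ValueError("sequence must be shorter than %d nt" % max_len)
    return cleaned


print(validate_orf("atg gcc\nTAA"))  # ATGGCCTAA
```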
2. Set Analysis Parameters:
- Choose the minimum word size for the repeat sequences you want to analyze.
- Set the minimum count, that is, how many times a word must occur to be reported as a repeat.
- For long DNA sequence inputs, avoid minimum word sizes below 8 nt.
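To make the two parameters concrete, here is a rough Python sketch of a word-frequency scan. The function name `find_repeats` and its defaults are assumptions for illustration, and the tool may count repeats differently. The sketch slides a window of `min_word_size` nucleotides along the sequence and keeps every word seen at least `min_count` times. Only 4^k distinct words of length k exist, so small word sizes produce many chance matches in long inputs.

```python
from collections import defaultdict


def find_repeats(seq, min_word_size=8, min_count=2):
    """Map each repeated word to the list of its 0-based start positions."""
    positions = defaultdict(list)
    # Slide a fixed-size window over the sequence, one nucleotide at a time.
    for i in range(len(seq) - min_word_size + 1):
        positions[seq[i:i + min_word_size]].append(i)
    # Keep only words that occur at least min_count times.
    return {w: p for w, p in positions.items() if len(p) >= min_count}


print(find_repeats("ATGGCCGCCGCCTAA", min_word_size=6))
# {'GCCGCC': [3, 6]}
```

Note that this sketch counts overlapping occurrences and fixed-length words only; extending a hit to the longest possible repeat is left out.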
3. Processing Your Sequence:
- Click the 'Analyze' button to start the process.
- The script will translate your sequence, then scan the DNA for repeated words of at least your minimum word size and count how often each one occurs.
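The translation step can be sketched with the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative, not taken from the tool, and the sketch assumes an upper-case sequence made of whole codons.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an upper-case DNA ORF into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


print(translate("ATGGCCGCCGCCTAA"))  # MAAA*
```

Translating both the input and the cleaned output and comparing the two strings is a simple way to confirm that the protein is unchanged.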
4. Codon Replacement Strategy:
- The script will identify repetitive codon sequences and replace the first instance of each repeat (counting from the 5’ end) with synonymous codons.
- Each replacement uses the next most frequent human codon for the same amino acid, so the amino acid sequence remains unchanged.
- The result is a new ‘cleaned’ sequence with fewer repeats.
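One possible reading of this strategy is sketched below. Several parts are assumptions rather than the tool's confirmed behaviour. The `RANKED` table is a partial, illustrative ranking covering three amino acids, and a full human codon usage table should be substituted in practice. "Next" is taken to mean the next codon down the ranking, wrapping to the top. Only one codon inside the first occurrence is changed.

```python
# ASSUMPTION: partial, illustrative ranking (most to least frequent in human).
# Replace with a complete human codon usage table for real use.
RANKED = {
    "A": ["GCC", "GCT", "GCA", "GCG"],
    "K": ["AAG", "AAA"],
    "E": ["GAG", "GAA"],
}
CODON_TO_AA = {c: aa for aa, codons in RANKED.items() for c in codons}


def next_codon(codon):
    """Return the next synonymous codon in the ranking, or None if unavailable."""
    aa = CODON_TO_AA.get(codon)
    if aa is None or len(RANKED[aa]) < 2:
        return None
    ranked = RANKED[aa]
    return ranked[(ranked.index(codon) + 1) % len(ranked)]


def break_first_occurrence(seq, start, word_size):
    """Swap one in-frame codon inside seq[start:start + word_size] for a synonym."""
    first = -(-start // 3) * 3  # first codon boundary at or after start
    # Visit only codons that lie entirely inside the repeat window.
    for pos in range(first, start + word_size - 2, 3):
        alt = next_codon(seq[pos:pos + 3])
        if alt is not None:
            return seq[:pos] + alt + seq[pos + 3:]
    return seq  # no replaceable codon found; sequence is unchanged


# The 6-nt word GCCGCC starts at positions 3 and 6; edit the 5'-most copy.
print(break_first_occurrence("ATGGCCGCCGCCTAA", start=3, word_size=6))
# ATGGCTGCCGCCTAA
```

In this example the first GCC (alanine) becomes GCT, which still encodes alanine and leaves no 6-nt word occurring twice.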
5. Iterative Improvement:
- You can re-run the improved sequence through the tool to potentially reduce repeats further.
- Be aware that each pass can create new repeats, and that some repeats are hard to eliminate because certain amino acids have few or no synonymous codons (e.g., ATG for methionine and TGG for tryptophan have none).
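The codons that can never be swapped follow directly from the standard genetic code. The short sketch below, using the hypothetical helper name `fixed_codons`, lists the sense codons whose amino acid has no synonymous alternative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fixed_codons():
    """Return sense codons whose amino acid is encoded by exactly one codon."""
    counts = Counter(CODON_TABLE.values())
    return sorted(c for c, aa in CODON_TABLE.items()
                  if aa != "*" and counts[aa] == 1)


print(fixed_codons())  # ['ATG', 'TGG']
```

A repeat made up largely of these codons cannot be broken by synonymous replacement within those codons, which is one reason repeated passes may stop improving the sequence.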
6. Manual Review and Tweaking:
- It's important to manually review the changes post-processing.
- Make additional adjustments as necessary to optimize your sequence.
Note: This tool is a powerful way to refine your sequences, but it is not infallible. Success depends on the codon alternatives available and on the length and specific structure of your sequence. The tool is also experimental and may make mistakes, so please use it at your own risk.