TIPs encoding process
1. Prepare your data
Before submitting your sequence file for encoding, ensure that your data is pre-processed according to your specific needs and research goals. Follow these steps to optimize your data for encoding:
Clean and correct sequences: Remove any ambiguous bases and validate your sequences against a reference genome, assembly, or SRA database.
Handle invalid characters: TIPs will attempt to represent any non-DNA character as ‘N’, which can change the representation of every 6-mer that contains such a character.
File format: Your sequence data must be in FASTA format and saved as a plain text (.txt) file.
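As a rough local pre-check before submission, the sketch below parses FASTA text, replaces non-DNA characters with 'N', and counts how many 6-mers that replacement would touch. It is not the TIPs implementation. The function names are hypothetical, and the code assumes that only A, C, G, and T count as DNA characters (so IUPAC ambiguity codes are masked too), that lowercase bases are uppercased, and that 6-mers are read as overlapping windows.

```python
VALID_BASES = set("ACGT")  # assumption: anything else is treated as non-DNA


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def mask_non_dna(seq):
    """Replace every character other than A, C, G, T with 'N'."""
    return "".join(c if c in VALID_BASES else "N" for c in seq)


def count_affected_kmers(seq, k=6):
    """Count overlapping k-mers that contain at least one 'N'."""
    return sum("N" in seq[i:i + k] for i in range(len(seq) - k + 1))


example = ">seq1\nACGTACGTAC\nGTXACGTACG\n"
for name, seq in parse_fasta(example):
    masked = mask_non_dna(seq)
    print(name, masked, count_affected_kmers(masked))
```

In this example a single invalid character ('X') falls inside six overlapping 6-mers, which illustrates why cleaning sequences beforehand matters more than the raw count of invalid characters suggests.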
2. Open a request
For security and resource management, all users must open a request before submitting their sequence file. This ensures equal access to TIPs server resources and promotes responsible use.
Log in or register: Use an institutional email (preferably one with ‘.edu’). Requests from commercial or personal email domains will be automatically denied.
Exception handling: If you do not have an institutional email, contact support or refer to the troubleshooting page for assistance.
Reference number: Once your request is approved, you will receive a unique reference number for tracking your encoding request and its associated output file.
3. Rename your file
Proper file naming is crucial for processing your submission correctly. Rename your text sequence file using the assigned reference number.
✅ Example: If your reference number is tips12345, rename your file as 'tips12345.txt'.
❌ Incorrect naming examples:
sequence_data.txt ❌
my_project.fasta ❌
tips12345.fasta ❌ (Use the .txt extension instead)
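The naming rule can be checked locally before upload. The sketch below is illustrative only: the helper names are hypothetical, and the pattern `tips` followed by digits is an assumption inferred from the single example reference number above, not a documented format.

```python
import re
from pathlib import Path

# Assumed shape of a reference number, inferred only from the example "tips12345".
REFERENCE_PATTERN = re.compile(r"tips\d+")


def expected_filename(reference_number):
    """Return the file name the submission should carry: <reference>.txt."""
    if not REFERENCE_PATTERN.fullmatch(reference_number):
        raise ValueError(f"unexpected reference number: {reference_number!r}")
    return f"{reference_number}.txt"


def is_valid_submission_name(filename, reference_number):
    """True only if the file name is exactly <reference>.txt."""
    return Path(filename).name == expected_filename(reference_number)


def rename_for_submission(path, reference_number):
    """Rename a local file to <reference>.txt in the same directory."""
    src = Path(path)
    dst = src.with_name(expected_filename(reference_number))
    if dst != src and dst.exists():
        raise FileExistsError(f"{dst} already exists")
    return src.rename(dst)


print(is_valid_submission_name("tips12345.txt", "tips12345"))
print(is_valid_submission_name("tips12345.fasta", "tips12345"))
```

Renaming changes only the file name. The contents must already be plain-text FASTA, as described in step 1.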
4. Upload your data
Once your file is properly formatted and renamed, proceed with the upload process:
Select cloud storage: Depending on system availability, you may be prompted to choose a preferred storage option.
Storage availability: If a selected storage type is unavailable, the system will notify you. Choose a different option to continue.
Follow submission instructions: Carefully read the upload guidelines and ensure compliance.
TIPs
Translator-Interpreter Pre-seeding, a family of encoding schemes for augmenting the representation of genetic sequences in machine learning.
© 2025. Marvin De los Santos. All rights reserved.