MiXCR v4.7.0
❗ Breaking changes
- Starting from version 4.7.0, MiXCR requires users to specify the assembling feature for all presets where it is not defined by the protocol. Use `--assemble-clonotypes-by [feature]`, or `--assemble-contigs-by [feature]` for fragmented data (such as RNA-seq or 10x VDJ data). This keeps assembling features consistent when integrating different samples or sample types, such as 10x single-cell VDJ and AIRR sequencing data, for downstream analyses like allele inference or SHM tree building. The previous behavior for fragmented data, which assembled sequences as long as possible, is still available with `--assemble-contigs-by-cell` for single-cell data or `--assemble-longest-contigs` for RNA-seq/exome-seq data.
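To illustrate the new requirement, here is a minimal Python sketch of a hypothetical wrapper (`build_analyze_args` is not part of MiXCR) that builds a `mixcr analyze` command line and refuses to do so unless one of the assembling options named above is present. It only assembles the argument list and does not run MiXCR. Other options a preset may need (for example, species) are deliberately omitted, and the preset, feature and file names are placeholders.

```python
from typing import Optional, Sequence

# Assembling options named in the v4.7.0 notes. The first two take a gene
# feature argument; the last two are flags that restore the pre-4.7.0
# "assemble as long as possible" behavior for fragmented data.
FEATURE_OPTIONS = ("--assemble-clonotypes-by", "--assemble-contigs-by")
FLAG_OPTIONS = ("--assemble-contigs-by-cell", "--assemble-longest-contigs")


def build_analyze_args(preset: str,
                       inputs: Sequence[str],
                       output_prefix: str,
                       option: str,
                       feature: Optional[str] = None) -> list[str]:
    """Build a `mixcr analyze` argument list that always names an assembling choice.

    Hypothetical helper: it validates and concatenates strings only.
    """
    if option in FEATURE_OPTIONS:
        # Feature-based options need an explicit gene feature, e.g. CDR3.
        if not feature:
            raise ValueError(f"{option} requires a gene feature, e.g. CDR3")
        assemble_args = [option, feature]
    elif option in FLAG_OPTIONS:
        # Flag-style options take no feature argument.
        if feature is not None:
            raise ValueError(f"{option} does not take a feature")
        assemble_args = [option]
    else:
        raise ValueError(f"unknown assembling option: {option}")
    # Layout: mixcr analyze <preset> [options] <inputs...> <output_prefix>
    return ["mixcr", "analyze", preset, *assemble_args, *inputs, output_prefix]
```

For example, `build_analyze_args("10x-sc-xcr-vdj", ["R1.fastq.gz", "R2.fastq.gz"], "sample1", "--assemble-contigs-by-cell")` requests the cell-level contig assembly that was the default for single-cell data before 4.7.0.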
🚀 Major fixes and upgrades
- Fixed `assemble` behavior for single-cell data: before the fix, consensuses were in rare cases assembled from reads coming from different cells. Reads from different cells are now strictly isolated.
- Significantly improved precision of V gene assignment. To support this, the `assemble` and `assembleContigs` steps now have individual `relativeMeanScore` and `maxHits` parameters.
- Improved robustness against differences in expression level between TCR/IG chains. Consensus assembly in `assemble` is now performed separately for each chain. This change is especially important for single-cell presets with cell-level assembly (most MiXCR presets for single-cell data).
- The options `--dont-correct-tag-with-name <tag_name>` or `--dont-correct-tag-type (Molecule|Cell|Sample)` can now be specified to skip tag correction. This trades some analysis quality and error-correction performance for significantly lower memory and runtime requirements in deeply sequenced datasets with many cell and molecular barcodes.
- Realignment of the left or right read boundaries with the global alignment algorithm can now be triggered with the `leftForceRealignmentTrigger` or `rightForceRealignmentTrigger` parameters in cases where reads do not cover the CDR3 region (rescuing alignments for fragmented data, such as single-cell data).
- A MiTool-based contig pre-assembly step is now integrated into the `10x-sc-xcr-vdj` preset, significantly improving overall analysis performance.
🛠️ Other improvements & fixes
- The default input quality filter (`badQualityThreshold`) in the `assemble` stage was decreased to 10, improving total analysis yield.
- Added validation in `assembleCells` that input files are assembled by a fixed feature.
- Export of trees and tree nodes now supports imputed features.
- Fixed parsing of optional arguments for `exportShmTreesWithNodes`: `-nMutationsRelative`, `-aaMutations`, `-nMutations`, `-aaMutationsRelative`, `-allNMutations`, `-allAAMutations`, `-allNMutationsCount`, `-allAAMutationsCount`.
- Fixed parsing of optional arguments for `exportClones` and `exportAlignments`: `-allNMutations`, `-allAAMutations`, `-allNMutationsCount`, `-allAAMutationsCount`.
- Fixed possible errors when exporting amino acid mutations in `exportShmTreesWithNodes`.
- Fixed the list of required options in the `listPresets` command.
- Fixed an error when building trees in cases where `JBeginTrimmed` started before `CDR3Begin`.
- Fixed usage of `--remove-step qc`.
- Added the `--remove-qc-check` option.
- Removed the `-topChains` field from the `exportShmTreesWithNodes` command. Use `-chains` instead.
- Removed default splitting of clones by V and J for presets where clones are assembled by full-length sequences.
- Fixed `NullPointerException` in some cases of building trees from combined SC and bulk data.
- Fixed `java.lang.IllegalArgumentException: While adding VEndTrimmed` in `exportClones`.
- Fixed the tree-combining step in `findShmTrees`: in some cases trees were not combined even when they could be.
- Fixed `NoSuchElementException` in some cases of combining SC trees.
- Fixed export of `-jBestIdentityPercent` in `exportShmTreesWithNodes`.
- Added validation when exporting `-aaFeature` for features containing UTRs.
- Fixed usage of the `exportPlots shmTrees` command.
- Fixed tree topology: previously, common V and J mutations were included in the root node; now the root includes only the reconstructed NDN. The previous behavior led to an underestimated distance from the germline. The germline sequence is now exported with the common mutations. To fully apply the fix to previously analyzed data, rerun the pipeline starting from
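The effect of the tree topology fix can be illustrated with a toy example. The sketch below is not MiXCR's tree-building code: it uses made-up, pre-aligned sequences and plain Hamming distance as a simplified stand-in for distance from the germline. When the root already carries the mutations shared by the whole clonal lineage, measuring a leaf against the root counts only its private mutations, so the distance from the germline comes out too low.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy, hypothetical sequences (not real V/J segments).
germline = "ACGTACGTAC"
# Root under the old topology: carries two mutations (positions 1 and 5)
# shared by every clone in the tree.
common = "AGGTAGGTAC"
# A leaf with the two common mutations plus one private mutation (position 8).
leaf = "AGGTAGGTTC"

# Old topology: distance measured from a root that already contains the
# common mutations sees only the private mutation.
old_distance = hamming(common, leaf)    # 1
# Fixed topology: the common mutations are counted against the germline.
new_distance = hamming(germline, leaf)  # 3
```

In this toy case the old root hides two of the three mutations, which is the kind of underestimate the fix addresses.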
🧬 Reference gene library changes
- IG references for new species:
  - Rabbit (IGH, IGK, IGL)
  - Sheep (IGH, IGK, IGL)
- Human reference corrections:
  - Duplicated entries removed: `IGHV1-69*00`, `IGHV1-69*01`, `IGHV3-23*00`, `IGHV3-23*01`
  - Fixed `CDR3Begin` position in `IGHV4-30-4`
  - Fixed `FR1Begin` position in `TRBV21-1`
  - The following human `TRAV` genes were renamed:
    - `TRAV14DV4` -> `TRAV14/DV4`
    - `TRAV23DV6` -> `TRAV23/DV6`
    - `TRAV29DV5` -> `TRAV29/DV5`
    - `TRAV36DV7` -> `TRAV36/DV7`
    - `TRAV38-1DV8` -> `TRAV38-1/DV8`
- Corrected mapping of V-gene UTRs in the Alpaca reference
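When comparing results produced before and after 4.7.0, gene names from older runs may need to be harmonized with the renamed human `TRAV` genes. The sketch below is a hypothetical helper (`rename_trav` is not a MiXCR function) built only from the renames listed above. It assumes hits appear as a gene name optionally followed by `*` and an allele suffix, for example `TRAV29DV5*00`, and leaves all other names untouched.

```python
# Old -> new human TRAV names, as listed in the v4.7.0 notes.
TRAV_RENAMES = {
    "TRAV14DV4": "TRAV14/DV4",
    "TRAV23DV6": "TRAV23/DV6",
    "TRAV29DV5": "TRAV29/DV5",
    "TRAV36DV7": "TRAV36/DV7",
    "TRAV38-1DV8": "TRAV38-1/DV8",
}


def rename_trav(hit: str) -> str:
    """Map an old TRAV gene or allele name to its 4.7.0 name.

    Everything after the first '*' (the allele suffix) is kept unchanged;
    names that are not in the rename table are returned as-is.
    """
    gene, sep, allele = hit.partition("*")
    return TRAV_RENAMES.get(gene, gene) + sep + allele
```

For example, `rename_trav("TRAV29DV5*00")` gives `"TRAV29/DV5*00"`, while an unaffected gene such as `TRAV1-2*01` passes through unchanged.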
📚 New Presets
- Added preset `takara-mouse-rna-bcr-umi-smarseq` for the new Takara SMART-Seq Mouse BCR (with UMIs) kit
- Added presets `idt-human-rna-bcr-umi-archer` and `idt-human-rna-tcr-umi-archer` for IDT Archer kits
- Presets for Cellecta kits that include TCR/BCR spike-in mix QC metrics: `cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-1-1-1`, `cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-16-4-1`, `cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-1-1-1`, `cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-16-4-1`