A versatile tool for masking and exploring sequences from plant species
FAQ
We provide hands-on training material for Kmasker plants, available under the DOI https://doi.org/10.5447/ipk/2019/17. It includes the input and output data as well as all commands used to generate them. Updated versions of the command files are available in the examples section of our GitHub project page. The GitHub repository also includes the documentation of the tool (https://github.com/tschmutzer/kmasker).
To avoid long run times, plotting is deactivated for datasets that contain more than 100 sequence IDs. To obtain plots, use a subset of your input data or a local installation of the Kmasker plants command-line version.
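One way to prepare such a subset is to keep only the first 100 records of a FASTA file. The following is a minimal sketch, not part of Kmasker plants; the function name `first_records` and its parameters are hypothetical and chosen only for illustration.

```python
def first_records(fasta_text, max_records=100):
    """Return the first `max_records` FASTA records of `fasta_text`.

    Hypothetical helper for illustration; not part of Kmasker plants.
    """
    kept = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            if count > max_records:
                break  # stop at the first header beyond the limit
        if count >= 1:  # skip anything before the first header
            kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


if __name__ == "__main__":
    example = ">a\nAC\nGT\n>b\nTT\n>c\nGG\n"
    print(first_records(example, max_records=2), end="")
```

The result can be saved to a new file and uploaded in place of the full dataset.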
This can have several causes; contact us for details. However, a few causes of failure have been observed repeatedly. First, do not upload sequences as Microsoft Word documents (*.doc or *.docx); sequences have to be in FASTA format. Second, only the four base characters ‘A’, ‘C’, ‘G’ and ‘T’, plus ‘N’, are permitted in your input sequence. Make sure your input meets this criterion.
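Both criteria can be checked locally before uploading. The sketch below is an illustration under stated assumptions, not the validation that Kmasker plants itself performs: the function name `check_fasta` is hypothetical, and the check is strict about upper case because the permitted characters are listed in upper case (whether lower-case bases are accepted is not stated here).

```python
ALLOWED = set("ACGTN")  # the characters listed as permitted


def check_fasta(text):
    """Return a list of problems found in FASTA text (empty if none).

    Hypothetical pre-upload check; not Kmasker's own validation.
    """
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        problems.append("input does not start with a FASTA header line ('>')")
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # header and blank lines carry no sequence characters
        bad = sorted(set(line) - ALLOWED)
        if bad:
            problems.append(f"line {number}: unexpected characters {bad}")
    return problems


if __name__ == "__main__":
    for problem in check_fasta(">seq1\nACGTNNAC\nACGU\n"):
        print(problem)
```

An empty result means the text starts with a header line and contains only the permitted characters.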
Please adjust the threshold for sequence length of the output. The default (100 bp) might discard your primer sequence.
No, please adjust your input before running Kmasker.
No, the frequency values in the OCC files are already normalized to the corresponding WGS sequencing depth.
All positions with frequency values larger than the ‘repeat threshold’ parameter are masked with ‘X’ in the output file. The repeat threshold (default: 5) therefore directly influences the masking. A large repeat threshold decreases the number of X, since only highly repetitive regions are masked. Conversely, a small value increases the proportion of X.
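The effect of the repeat threshold can be illustrated with a small sketch. It assumes one frequency value per sequence position and is not Kmasker's actual implementation; the function name `mask_repeats` and the example values are hypothetical.

```python
def mask_repeats(sequence, frequencies, repeat_threshold=5):
    """Replace every position whose frequency exceeds the threshold with 'X'.

    Illustrative only; assumes one frequency value per sequence position.
    """
    if len(sequence) != len(frequencies):
        raise ValueError("sequence and frequencies must have the same length")
    return "".join(
        "X" if freq > repeat_threshold else base
        for base, freq in zip(sequence, frequencies)
    )


if __name__ == "__main__":
    seq = "ACGTACGT"
    freqs = [1, 2, 8, 9, 3, 6, 1, 1]  # made-up values for illustration
    print(mask_repeats(seq, freqs, repeat_threshold=5))  # ACXXAXGT
    print(mask_repeats(seq, freqs, repeat_threshold=8))  # ACGXACGT
```

Raising the threshold from 5 to 8 in this toy example reduces the number of masked positions from three to one.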
This parameter is used as a filter after the index has been applied to your sequence and a frequency has been computed for every sequence position. From the complete set of low-frequency positions, the filter selects regions that are longer than the provided length. The shorter the ‘length threshold’, the more results pass the filter. If you require long sequences (e.g. for FISH), you may want to increase this threshold.
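The filtering step can be sketched as follows. This is an illustration under assumptions, not Kmasker's implementation: "low frequency" is taken to mean a value not above the repeat threshold, a region must be strictly longer than the length threshold to pass, and the function name `low_frequency_regions` is hypothetical.

```python
def low_frequency_regions(frequencies, repeat_threshold=5, length_threshold=100):
    """Return (start, end) pairs of low-frequency runs longer than the threshold.

    Coordinates are 0-based with an exclusive end. Illustrative only.
    """
    regions = []
    start = None
    for i, freq in enumerate(frequencies):
        if freq <= repeat_threshold:
            if start is None:
                start = i  # a new low-frequency run begins here
        elif start is not None:
            if i - start > length_threshold:
                regions.append((start, i))
            start = None
    # close a run that extends to the end of the sequence
    if start is not None and len(frequencies) - start > length_threshold:
        regions.append((start, len(frequencies)))
    return regions


if __name__ == "__main__":
    freqs = [1] * 4 + [9] * 2 + [1] * 10  # made-up values for illustration
    print(low_frequency_regions(freqs, length_threshold=3))  # [(0, 4), (6, 16)]
    print(low_frequency_regions(freqs, length_threshold=5))  # [(6, 16)]
```

With the shorter threshold both low-frequency runs pass the filter; with the longer one only the 10-position run remains.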
No, the limitation is the length of your input sequence.
In general, the error rate is not affected by the length threshold. If your setting is too ambitious, you may simply get no sequence output.
You can provide FASTA sequences containing your own curated repeat libraries. Kmasker will perform a blastn search and assign this information to the repeats detected in your input data (see the resulting GFF file). This option is only available in the local command-line installation of Kmasker plants.
If you use the most recent methods, published in 2019, please cite Beier et al. 2019 (The Plant Journal):
Link to publication: https://onlinelibrary.wiley.com/doi/10.1111/tpj.14645
DOI: https://doi.org/10.1111/tpj.14645
Original methods are described in Schmutzer et al. 2014:
Link to publication: https://www.karger.com/Article/FullText/356460
DOI: https://doi.org/10.1159/000356460