bash - Trim sequences and quality in a FASTQ file
I have a bunch of FASTQ files in a directory. I want to trim the sequence and the quality string by 2 nucleotides if the read has 51 base pairs and ends with ctg or ttg.
Here is the shell script I wrote, but I am getting errors and need a new shell script.
input:
@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttttg
+
#0<bfffffffff<bfffiifffffiiibfffffiifiiiiiffbffffff
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagctg
+
#0<bffffffffffiibffiiiiiifiiiffiifiiifiifiiffffiiff
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
#0<bffffffffffiibffiiiiiifiiiffiifi
output:
@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttt
+
#0<bfffffffff<bfffiifffffiiibfffffiifiiiiiffbffff
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagc
+
#0<bffffffffffiibffiiiiiifiiiffiifiiifiifiiffffii
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
#0<bffffffffffiibffiiiiiifiiiffiifi
script:
for sample in *.fastq;do
    name=$(echo ${sample} | sed 's/.fastq//')
    while read line;do
        if [ ${line:0:1} == "@" ] ; then
            head="${line}"
            $echo $head
        elif [ "${head}" ] && [ "${line}" ] ; then
            length=${#line}
            if [ "${length}" = 51 -a "${line}" =~ *ctg|*ttg ] ; then
                sequence= substr($line,0,49)
                #echo $sequence
            fi
        elif [ ${line:0:1} == "+" ] ; then
            plus="${line}"
            #echo $plus
        elif [ "${plus}" ] && [ "${line}" ] ; then
            quality= substr($line,0,49)
            #echo $quality
        fi
        print "${head}\n${sequence}\n${plus}\n${quality}" > ${name}_new.fq
    done < $sample
done
I don't 100% understand what you're doing, but I fixed a few things. Try the script below:
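#!/bin/bash
for sample in *.fastq; do
    # Strip the .fastq extension to build the output file name
    name="${sample/.fastq/}"
    while read -r line; do
        if [[ $line == '@'* ]]; then
            # Header line: remember it and write it out unchanged
            head="$line" && echo "$head" >> "${name}_new.fq"
        elif [[ -n $head && ${#line} == 51 && $line =~ (ctg|ttg)$ ]]; then
            # 51-character line ending in ctg or ttg: keep the first 49 characters
            sequence="${line:0:49}" && echo "$sequence" >> "${name}_new.fq"
        elif [[ $line == '+'* ]]; then
            # Separator line
            plus="$line" && echo "$line" >> "${name}_new.fq"
        else
            # Any other line (quality, or a sequence that did not match) is written unchanged
            quality="$line" && echo "$quality" >> "${name}_new.fq"
        fi
    done < "$sample"
done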
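As written, the loop above only shortens the sequence line, and the quality line goes through the else branch unchanged. If the quality string should lose the same 2 characters, one option is to read the file four lines at a time. This is only a rough sketch, assuming every record is exactly four lines (header, sequence, +, quality) and the reads are lowercase as in the post. The function name trim_fastq and the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: trim the last 2 characters of both the sequence and the quality
# line when the sequence is 51 characters long and ends in ctg or ttg.
# trim_fastq is a hypothetical helper name, not part of the original script.
trim_fastq() {
    local in_file=$1 out_file=$2
    local head seq plus qual
    # Pattern is lowercase to match the sample data; adjust for uppercase reads.
    local re='(ctg|ttg)$'
    # Read one four-line record per iteration, so a quality line that happens
    # to start with '@' or '+' is never mistaken for a header or separator.
    while IFS= read -r head && IFS= read -r seq &&
          IFS= read -r plus && IFS= read -r qual; do
        if [[ ${#seq} -eq 51 && $seq =~ $re ]]; then
            seq=${seq:0:49}
            qual=${qual:0:49}
        fi
        printf '%s\n%s\n%s\n%s\n' "$head" "$seq" "$plus" "$qual"
    done < "$in_file" > "$out_file"
}
```

It can be called once per file from the same kind of loop as before, for example trim_fastq "$sample" "${sample%.fastq}_new.fq". Because the output is redirected once for the whole loop, the output file is overwritten on each run instead of being appended to.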
example output
> cat sample_new.fq
> cat sample.fastq
@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttttg
+
#0<bfffffffff<bfffiifffffiiibfffffiifiiiiiffbffffff
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagctg
+
#0<bffffffffffiibffiiiiiifiiiffiifiiifiifiiffffiiff
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
#0<bffffffffffiibffiiiiiifiiiffiifi
> ./abovescript
> cat sample_new.fq
@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttt
+
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagc
+
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
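A FASTQ record needs a sequence and a quality string of the same length, so it may be worth sanity-checking a trimmed file before passing it on. The following is a minimal sketch, assuming four-line records and a file that ends with a newline. The name check_fastq and its messages are hypothetical, not from any existing tool.

```bash
#!/usr/bin/env bash
# check_fastq is a hypothetical helper: it prints the header of every record
# whose sequence and quality lines differ in length, and returns 1 if any
# such record (or an incomplete trailing record) is found.
check_fastq() {
    local file=$1 head seq plus qual status=0
    while IFS= read -r head; do
        # The three remaining lines of the record come from the same input.
        if ! { IFS= read -r seq && IFS= read -r plus && IFS= read -r qual; }; then
            echo "incomplete record: $head"
            status=1
            break
        fi
        if [[ ${#seq} -ne ${#qual} ]]; then
            echo "length mismatch: $head"
            status=1
        fi
    done < "$file"
    return "$status"
}
```

Running check_fastq sample_new.fq prints nothing and returns 0 when every record is consistent, so it can be used directly in an if statement.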