bash - Trim sequences and quality in a FASTQ file


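I have a bunch of FASTQ files in a directory. For every read that is 51 bases long and ends with ctg or ttg, I want to trim the last 2 nucleotides from the sequence and the last 2 characters from the matching quality string. All other reads should stay unchanged.

Here is the shell script I wrote, but I am getting errors. I need a working shell script.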

input:

@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttttg
+
#0<bfffffffff<bfffiifffffiiibfffffiifiiiiiffbffffff
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagctg
+
#0<bffffffffffiibffiiiiiifiiiffiifiiifiifiiffffiiff
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
#0<bffffffffffiibffiiiiiifiiiffiifi

output:

@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttt
+
#0<bfffffffff<bfffiifffffiiibfffffiifiiiiiffbffff
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagc
+
#0<bffffffffffiibffiiiiiifiiiffiifiiifiifiiffffii
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
#0<bffffffffffiibffiiiiiifiiiffiifi

script:

for sample in *.fastq;do
    name=$(echo ${sample} | sed 's/.fastq//')
    while read line;do
        if [ ${line:0:1} == "@" ] ; then
                head="${line}"
                $echo $head
        elif [ "${head}" ] && [ "${line}" ] ; then
                length=${#line}
                if [ "${length}" = 51 -a "${line}" =~ *ctg|*ttg ] ; then
                        sequence= substr($line,0,49)
                        #echo $sequence
                fi
        elif [ ${line:0:1} == "+" ] ; then
                plus="${line}"
                #echo $plus
        elif [ "${plus}" ] && [ "${line}" ] ; then
                quality= substr($line,0,49)
                #echo $quality
        fi
        print "${head}\n${sequence}\n${plus}\n${quality}" > ${name}_new.fq
   done < $sample
done

I don't 100% understand what you're doing, but I fixed a few things. Try the script below.

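#!/bin/bash
for sample in *.fastq; do
  # strip the .fastq extension to build the output file name
  name="${sample/.fastq/}"
  while read -r line; do
    if [[ $line == '@'* ]]; then
      # header line
      head="$line" && echo "$head" >> "${name}_new.fq"
    elif [[ -n $head && ${#line} == 51 && $line =~ (ctg|ttg)$ ]]; then
      # 51-character line ending in ctg or ttg: keep the first 49 characters
      sequence="${line:0:49}" && echo "$sequence" >> "${name}_new.fq"
    elif [[ $line == '+'* ]]; then
      # separator line
      plus="$line" && echo "$line" >> "${name}_new.fq"
    else
      # everything else (untrimmed sequences and quality strings) is copied as is
      quality="$line" && echo "$quality" >> "${name}_new.fq"
    fi
  done < "$sample"
done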
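
For reference, the script uses bash's own string handling where the question used awk-style `substr(...)` and `print`, which are not bash commands. Here is a minimal sketch of those three constructs on the first read from the sample input. The variable names `read_seq` and `trimmed` are illustrative only and do not appear in the scripts above.

```bash
#!/bin/bash
# First sequence line from the sample input in the question.
read_seq='nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttttg'

# ${#var} expands to the length of the string.
echo "length: ${#read_seq}"

# Inside [[ ]], =~ matches an extended regex; $ anchors it at the end of the line.
if [[ ${#read_seq} -eq 51 && $read_seq =~ (ctg|ttg)$ ]]; then
  # ${var:offset:length} is bash's substring expansion (offset starts at 0).
  trimmed="${read_seq:0:49}"
  # printf (or echo) writes the result; bash has no print command.
  printf '%s\n' "$trimmed"
fi
```

The regex is left unquoted on purpose: quoting the right-hand side of `=~` makes bash treat it as a literal string.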

example output

> cat sample_new.fq

> cat sample.fastq
@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttttg
+
#0<bfffffffff<bfffiifffffiiibfffffiifiiiiiffbffffff
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagctg
+
#0<bffffffffffiibffiiiiiifiiiffiifiiifiifiiffffiiff
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
#0<bffffffffffiibffiiiiiifiiiffiifi

> ./abovescript

> cat sample_new.fq
@hwi-st1072:187:c35yuacxx:7:1101:1609:1983 1:n:0:acagtg
nggagaaagagagtgtgtttttagggggagatttttaaaatggttgttt
+
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttattcgggaggttgagc
+
@hwi-st1072:187:c35yuacxx:7:1101:9351:2210 1:n:0:acagtg
cggttttgttttattttgtatgattaggagggttttggaggtttagttacc
+
bbbffffffffffiiiiiffiifiiiiiiiiiffiififiiffiiifiiii
@hwi-st1072:187:c35yuacxx:7:1101:1747:1995 1:n:0:acagtg
nggttgtggtggtgggtatttgtagttttatttat
+
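
Note that the script above shortens only the sequence line, while the question also asks for the matching quality string to be shortened. One possible way to do both is to read a whole record at a time. This is a sketch, not a tested drop-in replacement. It assumes every record is exactly four lines (header, sequence, `+`, quality) and that the bases are lowercase as in the sample. The function name `trim_fastq` is hypothetical.

```bash
#!/bin/bash
# trim_fastq (hypothetical helper): reads 4-line FASTQ records from stdin
# and writes them to stdout, trimming qualifying reads to 49 characters.
trim_fastq() {
  local head seq plus qual
  # Read one full record per iteration: header, sequence, '+', quality.
  while IFS= read -r head && IFS= read -r seq &&
        IFS= read -r plus && IFS= read -r qual; do
    # Only 51-base reads ending in ctg or ttg are trimmed (lowercase assumed).
    if [[ ${#seq} -eq 51 && $seq =~ (ctg|ttg)$ ]]; then
      seq="${seq:0:49}"
      qual="${qual:0:49}"   # keep the quality string the same length as the read
    fi
    printf '%s\n%s\n%s\n%s\n' "$head" "$seq" "$plus" "$qual"
  done
}

for sample in *.fastq; do
  [[ -e $sample ]] || continue   # skip the literal pattern when nothing matches
  # '>' truncates the output file, so re-running does not append duplicates.
  trim_fastq < "$sample" > "${sample%.fastq}_new.fq"
done
```

Reading by position rather than by first character also avoids mistaking a quality line that happens to start with `@` for a header.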
