Dharsy D. Rodriguez Velazquez

Contact info:

Bio

Social Sciences student-researcher in the Computer Science Department and member of the IDGeNe student program at the University of Puerto Rico, Rio Piedras Campus. During the program I will be working with my peers and the professor on the bioinformatics of gene expression.

Research Goals

To combine social sciences, public health (genetic epidemiology), and sociogenomics in order to understand societal problems.

Research Abstract

Gene expression in tissues is heterogeneous across cells, which produces a very large volume of data about normal and anomalous cells. Bioinformatics provides tools to analyze such large data sets: computing is its main instrument, used to analyze and interpret biological data. We will therefore use bioinformatics to evaluate differential expression in human cancer tissues, using publicly available datasets. We selected Homo sapiens samples from the colorectal tumor core and from adjacent non-malignant colon tissue. Our goal is to identify gene expression patterns. With this in mind, the data will be processed by binarization, specifically Binary Differential Analysis (BDA). BDA has the advantages of reproducibility, a reduced amount of information to process, and the detection of biologically relevant genes. In BDA, each gene is recorded as one if it was expressed (a non-zero value) and as zero if it was not expressed.

  1. Gerard A. Bouland, Ahmed Mahfouz, Marcel J.T. Reinders. The rise of sparser single-cell RNAseq datasets; consequences and opportunities. BioRxiv 2022.05.20.492823; doi: https://doi.org/10.1101/2022.05.20.492823

  2. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. and Regev, A. (2015) Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. Retrieved from: https://doi.org/10.1038/nbt.3192

  3. He D, Zakeri M, Sarkar H, Soneson C, Srivastava A, Patro R. (2022) Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods. Mar;19(3):316-322. doi: 10.1038/s41592-022-01408-3. Epub 2022 Mar 11. PMID: 35277707; PMCID: PMC8933848.
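
The binarization step described in the Research Abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the BDA implementation from reference 1: the count values, the gene names, and the helper names `binarize_counts` and `detection_rate` are hypothetical, and comparing the fraction of cells that detect each gene is a simplified stand-in for a real statistical test.

```python
# Hypothetical toy data: rows are cells, columns are genes.
# All counts and gene names are invented for illustration only.

def binarize_counts(matrix):
    """Turn every non-zero count into 1 and keep every zero as 0."""
    return [[1 if count > 0 else 0 for count in row] for row in matrix]

def detection_rate(binary_matrix):
    """Fraction of cells (rows) in which each gene (column) is detected."""
    n_cells = len(binary_matrix)
    n_genes = len(binary_matrix[0])
    return [sum(row[g] for row in binary_matrix) / n_cells for g in range(n_genes)]

genes = ["geneA", "geneB", "geneC"]
normal_counts = [[0, 5, 2], [0, 3, 0], [1, 7, 0], [0, 2, 0]]
tumor_counts = [[4, 6, 0], [9, 1, 0], [2, 0, 0], [7, 3, 1]]

normal_bin = binarize_counts(normal_counts)
tumor_bin = binarize_counts(tumor_counts)

normal_rate = detection_rate(normal_bin)
tumor_rate = detection_rate(tumor_bin)

# Report how often each gene is detected in each tissue type.
for gene, n_rate, t_rate in zip(genes, normal_rate, tumor_rate):
    print(f"{gene}: normal={n_rate:.2f} tumor={t_rate:.2f} diff={t_rate - n_rate:+.2f}")
```

In a real analysis the binarized matrix would come from the quantification output of the pipeline, and a proper statistical test would replace the simple difference in detection rates.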

Weekly UPDATES

June 5-9, 2023

-IDGeNe first meeting
-Meet & Greet All Summer Programs
-Workshop: Mindsets
-Workshop: Lab Etiquette and Lab Safety
-Workshop: Bioinformatics I and II
-Workshop: Software Carpentry
-Workshop: Sexual Harassment
-Mentoring: Dr. Adriana Baez
--Mentor-Mentee_Align_Expectations (WORD)
-Laboratory Work: We prepared a work plan for the summer:
+Cancer data
+Binarization
+Single cell information
+Request account for Boqueron cluster
+We discussed the rules of the laboratory: MegaProbe

June 12-16, 2023

-Created GitHub and Anaconda accounts
+Workshop: Bioinformatics III
-Educational Talk: Genomics and Society (Dr. Gerardo Arroyo: La Genómica y su impacto en las Ciencias Forenses)
+Meet the Mentors: Dr. Tugrul Giray
+Workshop: Software Carpentry
+Educational Talk: Impostor Syndrome
+Workshop: Software Carpentry
+Genomic Seminar: Evolution and development of patterns in butterfly wings
-Workshop: Ethics
-Movie Night: Justice, Equity, Diversity and Inclusion
-Educational Talk: Implicit Bias
-Techniques and Journal Club: T. Giray
-Deliverable: One Page Research Summary

/home/drodriguez1
wget https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh
drodriguez1@hulk:~$ conda create -n fastqc -c bioconda -c conda-forge fastqc
history
conda config --set channel_priority strict
(fastqc) drodriguez1@hulk:~$ conda config --add channels defaults

To activate this environment, use
$ conda activate fastqc
To deactivate an active environment, use
$ conda deactivate
(base) drodriguez1@hulk:~$ fastqc
Command 'fastqc' not found, but can be installed with:
apt install fastqc
Please ask your administrator.
(base) drodriguez1@hulk:~$ conda activate fastqc
(fastqc) drodriguez1@hulk:~$ fastqc
Exception in thread "main" java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.desktop/java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:208)
at java.desktop/java.awt.Window.(Window.java:548)
at java.desktop/java.awt.Frame.(Frame.java:423)
at java.desktop/java.awt.Frame.(Frame.java:388)
at java.desktop/javax.swing.JFrame.(JFrame.java:180)
at uk.ac.babraham.FastQC.FastQCApplication.(FastQCApplication.java:63)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:338)

June 19-24, 2023

-Educational Talk: Genomics and Society, Dr. A. Santiago-Cornier, Genomic Medicine
-Meet the Mentors: Dr. Jose Rodríguez Martínez
-Laboratory work: Read articles:
--Why Binarize?
--Data and Software
--Cancer Data
-Table data of colorectal cancer paper: [2023-06-19 Mon 13:07] Cancer data for scRNAseq
--Workshop: IDP
-Techniques: Rodriguez-Martinez
-Movie Night: Human Nature
-Workshop: Science Communication, Dr. Kevin Alicea
-Journal club: Rodriguez-Martinez
-Workshop: Replicathon

June 26-30, 2023

-Laboratory Work: TCGA atlas, GRCh38, review of genetics concepts, GEO data, ArrayExpress data, normal tissue and tumour core, R1_R2_Index, WinSCP, pipeline.

(base) drodriguez1@hulk:~$ ~ wget-ci~/archivo-2
(base) drodriguez1@hulk:~$ wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11657684/scrEXT001_hg19_S9_L008_I1_001.fastq.gz
scp boqueron.hpcf.upr.edu:/work/humberto/drodriguez1/align_data.

(fastqc) drodriguez1@hulk:~$ conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
Warning: 'defaults' already in 'channels' list, moving to the top
(fastqc) drodriguez1@hulk:~$
Login as: drodriguez1
drodriguez1@hulk.ccom.uprrp.edu's password:
Welcome to Ubuntu 22.04.2 LTS (GNU/Linux 5.19.0-46-generic x86_64)
Expanded Security Maintenance for Applications is not enabled.
3 updates can be applied immediately.
To see these additional updates run: apt list --upgradable
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status

(base) drodriguez1@hulk:~$ pwd
/home/drodriguez1
(base) drodriguez1@hulk:~$ conda create -n fastqc -c bioconda -c conda-forge fastqc
Collecting package metadata (current_repodata.json): done
Solving environment: done
history
conda create -n af simpleaf piscem
conda activate af
conda deactivate
screen
screen -r
screen -ls
screen -r
ls
scp boqueron.hpcf.upr.edu:/work/humbero/drodriguez1/align_data.
scp boqueron.hpcf.upr.edu:/work/humberto/drodriguez1/align_data.
nano align_data

boq 1

-Meet the Mentors: Dr. Imilce Rodríguez Hernández
-Techniques: Rodríguez-Hernández
-Workshop: Grant Funding (P Ordoñez)
-Movie Night: Code Bias
-Journal Club: Rodríguez-Hernández
-Workshop: How to write a personal statement (P Ordoñez)

July 3-8, 2023

-Laboratory: Boqueron and Hulk. We had an error: we had not told Hulk to save the FastQC output, so we fixed it, saving it in nano align. We were also confused because the fastq and fastqc files were mixed together (L1 and L2), so we moved them to another folder. We had an error running piscem: we were trying to write to home, and it was not running on the q screen. New Excel and EMTAB. We identified the new samples of the individuals.

hulk 1

-Meet the Mentors: Carmen Cadilla
-Techniques: Carmen Cadilla
-Lunch with Dr. Esther Peterson
-Deliverable: First Draft Personal Statement + IDP
-Journal Club: Carmen Cadilla
-IQ Hackathon (Engine 4)

screen
screen -ls
sreen
ls
screen -r
screen -d
conda activate af
conda create -n af simpleaf piscem
conda deactivate
Solving environment: done
refdata-gex-GRCh38-2020-A
Collecting package metadata (current_repodata.json): done

Last login: Thu Jul 3 13:02:07 2023 from 136.145.181.31
(base) drodriguez1@hulk:~$ pwd
/home/drodriguez1
(base) drodriguez1@hulk:~$ ls
af_home align_data align_data.save 'Bioinformatic of gene expression.Rproj' buildindex data data_quant fastqc '*.fastq.gz' fix-names.R human-2020-A_splici install.R load-af-data.R miniconda3 mrnaseq-subset.tar Mytest neuvos newfiles nombre-nuevo nombres nuevo pbmc1k_quant R refdata-gex-GRCh38-2020-A scrEXT001_hg19_S9_L008_I1_001.fastq.gz snap
(base) drodriguez1@hulk:~$ cd refdata-gex-GRCh38-2020-A/
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A$ ls
fasta genes pickle reference.json star
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A$ cd fasta
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A/fasta$ ls
genome.fa genome.fa.fai
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A/fasta$ cd genes
-bash: cd: genes: No such file or directory
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A/fasta$ cd ../genes
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A/genes$ ls
genes.gtf
(base) drodriguez1@hulk:~/refdata-gex-GRCh38-2020-A/genes$ less genes.gtf

description: evidence-based annotation of the human genome (GRCh38), version 32 (Ensembl 98)
provider: GENCODE
contact: gencode-help@ebi.ac.uk
format: gtf
date: 2019-09-05
chr1 HAVANA gene 29554 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; level 2; hgnc_id "HGNC:52482"; tag "ncRNA_host"; havana_gene "OTTHUMG00000000959.2";
chr1 HAVANA transcript 29554 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
chr1 HAVANA exon 29554 30039 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 1; exon_id "ENSE00001947070"; exon_version "1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
chr1 HAVANA exon 30564 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 2; exon_id "ENSE00001922571"; exon_version "1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
chr1 HAVANA exon 30976 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 3; exon_id "ENSE00001827679"; exon_version "1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
chr1 HAVANA transcript 30267 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
chr1 HAVANA exon 30267 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; exon_number 1; exon_id "ENSE00001841699"; exon_version "1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
chr1 HAVANA exon 30976 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; exon_number 2; exon_id "ENSE00001890064"; exon_version "1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
chr1 HAVANA gene 34554 36081 . - . gene_id "ENSG00000237613"; gene_version "2"; gene_type "lncRNA";
genes.gtf

July 10-14

-Laboratory: Hulk account to work on RStudio locally, alevin-fry data (Rscript install), Seurat objects tutorial, find the difference between normal tissue and tumor core. Downloaded the list of URLs to match the new table of samples with the columns, identifying samples by row. Created a new directory and installed Miniconda on Hulk.

boq 2

-Meet the Mentors: Dr. Alfredo Ghezzi
-Galaxy Workshop “Genomics”
-Genomics and Society: Dr. Joseph Vogel, Bounded Openness over Natural Information
-Galaxy Workshop “Transcriptomics”
-Galaxy Workshop “Metagenomics”
-Journal Club: Dr. Alfredo Ghezzi

(base) drodriguez1@hulk:~/pbmc1k_quant$ ls
af_map af_quant simpleaf_quant_log.json
(base) drodriguez1@hulk:~/pbmc1k_quant$

nano align_data

July 17-21, 2023

-Meet the Mentors: Dr. Josue Pérez Santiago
-Workshop: How to prepare an abstract (J García-Arrarás)
-Techniques: Dr. Josue Pérez Santiago
-Workshop: Statistics in Biology (ME Pérez-Hernández)
-Journal Club: Josue Pérez Santiago
-Deliverable: Updated Research Summary (w/ Lit review + Methods)

July 24-28, 2023

--Lunch with Dr. José García-Arrarás
--“How to prepare Oral and Poster Presentations” with Dr. José García-Arrarás at CRiiAS (Bring a poster from your lab)
--Techniques with Dr. José García-Arrarás at JGD-223
--Grad students panel with Christian Del Valle
--Journal Club with Dr. José García-Arrarás at JGD-223

July 31- August 4, 2023

--Lunch with Dr. Juan S. Ramírez-Lugo at CRiiAS
--Site visit: Ebony Madden, Ph.D., Health Equity and Workforce Diversity Program Director, National Human Genome Research Institute (NHGRI)
--Deliverable: DRAFT Presentation
--Deliverable: OFFICIAL Final presentation
--2023 SUMMER RESEARCH FEST

August 5-15

-Summer Break

August 16-18

-- Research for summer programs off-site: NIH-SIP

August 21-25, 2023

--Presentations of the results obtained in the off-site summer experience. Second-year students in the program will present the research carried out in their (off-site) summer program. Each student will present for 10 minutes.

--

August 28- September 1, 2023

-- Journal Club: "Sparser"

September 4-8, 2023

--Journal Club: "Binarized"

September 11-15, 2023

--Online Recruitment Seminar: Memorial Sloan Kettering Cancer Center (MSK)
--Journal Club: "Noise"

September 18-22, 2023

--Graduate Fair
--Journal Club: "Dimensional reduction"

September 25-29, 2023

--Scientific Seminar: "Phylogenomic tools to shed light on the evolution of ant-plant symbioses"
--Scientific Seminar: "Artificial Intelligence and Machine Learning applied to Biomedicine and Health Disparities Research"
--Journal Club: "Stringtie"

October 2-6, 2023

--Recruitment Seminar: UT Southwestern

October 9-13, 2023

--Workshop: Conquista el Tiempo, Alivia la Ansiedad: Estrategias Universitarias Efectiva
--Recruitment Seminar: Gerstner Sloan Kettering Graduate School (GSK)

October 16-20, 2023

-Research for summer program off-site NSF/REU

October 23-27, 2023

--Scientific Seminar: "Reconstituting human organogenesis and disease on microchips"

October 30- November 3, 2023

-- Scientific Seminar: "Data Science"

November 6-10, 2023

-- Research for summer program off-site The University of Oklahoma (SURP)

November 13-17, 2023

--Submission of the Summer 2023 Progress Report

November 20-24, 2023

November 27-December 1, 2023

--Scientific Seminar: "Cell Maps", Biology Departmental Seminar, Daniel Feliciano, Ph.D.
--Scientific and recruitment seminar offered by the University of New Hampshire (UNH)

December 4-8, 2023

December 11-15, 2023

December 18-22, 2023

-Christmas Break

December 25-29, 2023

--MIT prework, documents and survey

January 1, 2024: Introduction and Python Review

-Introduction and expectations with Mandana Sassanfar
-Magdalena Krubner "Review of Python basics"
-Magdalena Krubner "Simple coding with Python"

January 2, 2024: Python Applications/Machine Learning

-Review and discussion with Mandana Sassanfar
-Hannah Jobs "Plotting and Curve fitting using Python"
-Lunch with faculty: Lindsay Case, Kristen Knouse, Sinisa Hrvatin, Mehrdad Jazayeri, Francisco Sanchez, Brandon Weissbourd.
-Taylor Baum "Machine Learning applications to studying the brain"

January 3, 2024: Cognitive neuroscience/Signal Processing

-Dr. Federico Claudi "Brain Computer interface theory and practice"
-Lunch with faculty: Adam Martin, David Page, Eliezer Calo, Ed Boyden, Chris Burge, Matt Vander Heiden.
-Taylor Baum "Signal processing applied to neuroscience"

January 4, 2024: CRYO EM

-Review and discussion with Mandana Sassanfar
-Mary Ellen Wiltrout "MIT digital learning and on-line courses"
-Joey Davis "Introduction to CryoEM"
-Lunch with faculty: Joey Davis, Cathy Drennan, Becky Lamson, Amy Keating, Mark Harnett, Erin Chen
-Joey Davis and Lab members "CryoEM Data processing and analysis"

January 5, 2024: Cognitive neuroscience/Lab Tours

-Review and discussion with Mandana Sassanfar
-Lecture: Professor Nancy Kanwisher, "Functional Imaging of the Human Brain"
-Lunch with faculty: Steve Flavell, Nancy Kanwisher, Robert Desimone.
-3 concurrent tours: fMRI live demo with Steven Shannon at the Athinoula A. Martinos Imaging Center, OR tour of MEG facilities with Dimitri Pantazi, OR CryoEM tour with Sarah Sterling.
-Bryan Medina "What makes music so memorable"
-Edit Abstract for March (SACNAS) and April (NHGRI)

January 6, 2024: GENOMICS/GWAS/RNAseq

-Review and discussion with Mandana Sassanfar
-Lecture: Professor Olivia Corradin, "Human genetics and genomic studies of disease"
-Lunch with grad students and faculty: Olivia Corradin and Seychelle Vos
-Giselle Valdes & Hannah Jacob "RNAseq Data analysis"
-Notification to the Professor about summer off-site programs

January 7, 2024: GENOMICS/GWAS/RNAseq/Information Sessions

-Kamal Maher, "Spatial gene expression patterns using Python"
-Grad students/Postbacs panel

January 8-12:

--

January 15-19:

--SACNAS abstract deadline
--NHGRI abstract deadline
--Oklahoma application submitted
--Python Workshop
--Article: "These Are Not etc..."
--Starting Escambron Protocol

login as: drodriguez1
drodriguez1@boqueron.hpcf.upr.edu's password:
Access denied
drodriguez1@boqueron.hpcf.upr.edu's password:
Last login: Thu Jan 18 12:24:32 2024 from 136.145.181.31
(base) [drodriguez1@boqueron ~]$ cd $WORK
(base) [drodriguez1@boqueron drodriguez1]$ ls
af_home data humberto job.1894761.out refs align data_quant job.1894760.err multiqc_data align_data fastqc job.1894760.out multiqc_report.html buildindex human-2020-A_splici job.1894761.err pbmc1k_quant
(base) [drodriguez1@boqueron drodriguez1]$ curl -O https://github.com/dib-lab/khmer/tree/master/data
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Warning: Failed to create the file data
9 4567 9 405 0 0 801 0 0:00:05 --:--:-- 0:00:05 2700
curl: (23) Failed writing body (0 != 405)
(base) [drodriguez1@boqueron drodriguez1]$ pip install khmer

Collecting khmer
Downloading khmer-2.1.1-cp37-cp37m-manylinux1_x86_64.whl (11.5 MB)
|████████████████████████████████| 11.5 MB 829 kB/s
Collecting screed>=1.0
Downloading screed-1.1.3.tar.gz (144 kB)
|████████████████████████████████| 144 kB 35.7 MB/s
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Collecting bz2file
Downloading bz2file-0.98.tar.gz (11 kB)
Building wheels for collected packages: screed, bz2file
Building wheel for screed (PEP 517) ... done
Created wheel for screed: filename=screed-1.1.3-py2.py3-none-any.whl size=96016 sha256=6816a997dc0b7c85344fda7bb35ae62594f83e612acb6ac48bc3f0cd3b420144
Stored in directory: /home/humberto/drodriguez1/.cache/pip/wheels/b2/91/48/1a166ff196d74759328d2d1e74daa3e8989a279424e99a2a9d
Building wheel for bz2file (setup.py) ... done
Created wheel for bz2file: filename=bz2file-0.98-py3-none-any.whl size=6882 sha256=34e91244bb420673cdb11e9a92a189e56838c92aed3b09c6a0170a035481cea7
Stored in directory: /home/humberto/drodriguez1/.cache/pip/wheels/85/ce/8d/b5f76b602b16a8a39f2ded74189cf5f09fc4a87bea16c54a8b
Successfully built screed bz2file
Installing collected packages: screed, bz2file, khmer
Successfully installed bz2file-0.98 khmer-2.1.1 screed-1.1.3
(base) [drodriguez1@boqueron drodriguez1]$ pytest --pyargs khmer -m 'not known_failing and not jenkins and not huge and not linux'
-bash: pytest: command not found
(base) [drodriguez1@boqueron drodriguez1]$ curl -O https://github.com/dib-lab/khmer/tree/master/khmer
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
101 4479 101 4479 0 0 8715 0 --:--:-- --:--:-- --:--:-- 34190
(base) [drodriguez1@boqueron drodriguez1]$ conda create -n khmers
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <== current version: 4.8.3 latest version: 23.11.0

Please update conda by running

$ conda update -n base -c defaults conda

Package Plan

environment location: /home/humberto/drodriguez1/miniconda3/envs/khmers

Proceed ([y]/n)? y

Preparing transaction: done Verifying transaction: done Executing transaction: done

To activate this environment, use

$ conda activate khmers

To deactivate an active environment, use

$ conda deactivate

(base) [drodriguez1@boqueron drodriguez1]$ conda activate kmers
Could not find conda environment: kmers
You can list all discoverable environments with conda info --envs.

(base) [drodriguez1@boqueron drodriguez1]$ conda activate khmers
(khmers) [drodriguez1@boqueron drodriguez1]$ pip install khmers
ERROR: Could not find a version that satisfies the requirement khmers (from versions: none)
ERROR: No matching distribution found for khmers
(khmers) [drodriguez1@boqueron drodriguez1]$ ls
af_home buildindex fastqc job.1894760.err job.1894761.out multiqc_report.html align data human-2020-A_splici job.1894760.out khmer pbmc1k_quant align_data data_quant humberto job.1894761.err multiqc_data refs
(khmers) [drodriguez1@boqueron drodriguez1]$ pip install kmers
Collecting kmers
Downloading kmers-0.2.1-py3-none-any.whl (5.9 kB)
Collecting biopython
Downloading biopython-1.81.tar.gz (19.3 MB)
|████████████████████████████████| 19.3 MB 1.9 MB/s
Collecting numpy
Downloading numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
|████████████████████████████████| 15.7 MB 34.6 MB/s
Building wheels for collected packages: biopython
Building wheel for biopython (setup.py) ... done
Created wheel for biopython: filename=biopython-1.81-cp37-cp37m-linux_x86_64.whl size=3009565 sha256=030e2435c57dd57859b3187f23e7bd5b02b2bfb979d891fd852db0912d33a7e1
Stored in directory: /home/humberto/drodriguez1/.cache/pip/wheels/02/c0/ad/b31c4b0d92571e9a91055bf96c89887cf3e31db3bc3bd0ca14
Successfully built biopython
Installing collected packages: numpy, biopython, kmers
Successfully installed biopython-1.81 kmers-0.2.1 numpy-1.21.6
(khmers) [drodriguez1@boqueron drodriguez1]$ curl-o https://github.com/dib-lab/khmer/tree/master/khmer
-bash: curl-o: command not found
(khmers) [drodriguez1@boqueron drodriguez1]$ ls
af_home buildindex fastqc job.1894760.err job.1894761.out multiqc_report.html align data human-2020-A_splici job.1894760.out khmer pbmc1k_quant align_data data_quant humberto job.1894761.err multiqc_data refs
(khmers) [drodriguez1@boqueron drodriguez1]$ cd $WORK khmer
(khmers) [drodriguez1@boqueron drodriguez1]$ ls
af_home buildindex fastqc job.1894760.err job.1894761.out multiqc_report.html align data human-2020-A_splici job.1894760.out khmer pbmc1k_quant align_data data_quant humberto job.1894761.err multiqc_data refs
(khmers) [drodriguez1@boqueron drodriguez1]$ curl -o starcluster start -o -s 1 -i m2.2xlarge -n ami-999d49f0 pipeline

curl: (6) Couldn't resolve host 'start'
curl: (7) Failed to connect to 0.0.0.1: Invalid argument
curl: (6) Couldn't resolve host 'm2.2xlarge'
curl: (6) Couldn't resolve host 'ami-999d49f0'
curl: (6) Couldn't resolve host 'pipeline'
(khmers) [drodriguez1@boqueron drodriguez1]$ curl -o ami-999d49f0
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
(khmers) [drodriguez1@boqueron drodriguez1]$ curl -O http://public.ged.msu.edu.s3.amazonaws.com/2013-khmer-counting/2013-khmer-counting-data.tar.gz

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5731M 100 5731M 0 0 14.6M 0 0:06:32 0:06:32 --:--:-- 17.3M
(khmers) [drodriguez1@boqueron drodriguez1]$
(khmers) [drodriguez1@boqueron drodriguez1]$ curl -O https://s3.amazonaws.com/public.ged.msu.edu/mrnaseq-subset.tar
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
121 243 0 243 0 0 558 0 --:--:-- --:--:-- --:--:-- 2858
(khmers) [drodriguez1@boqueron drodriguez1]$ ls -l
total 5870669
-rw-r--r-- 1 drodriguez1 humberto 6010092907 Jan 18 13:29 2013-khmer-counting-data.tar.gz
drwxr-sr-x 3 drodriguez1 humberto 4096 Jul 5 2023 af_home
-rwxr-xr-x 1 drodriguez1 humberto 1122 Jul 5 2023 align
-rwxr-xr-x 1 drodriguez1 humberto 1115 Jul 17 2023 align_data
-rw-r--r-- 1 drodriguez1 humberto 266 Jul 19 2023 buildindex
drwxr-sr-x 2 drodriguez1 humberto 4096 Jul 17 2023 data
drwxr-sr-x 3 drodriguez1 humberto 4096 Jul 17 2023 data_quant
drwxr-sr-x 2 drodriguez1 humberto 4096 Jul 5 2023 fastqc
drwxr-sr-x 4 drodriguez1 humberto 4096 Jul 4 2023 human-2020-A_splici
drwxr-sr-x 2 drodriguez1 humberto 12288 Jul 5 2023 humberto
-rw-r--r-- 1 drodriguez1 humberto 78 Jul 5 2023 job.1894760.err
-rw-r--r-- 1 drodriguez1 humberto 13703 Jul 5 2023 job.1894760.out
-rw-r--r-- 1 drodriguez1 humberto 78 Jul 5 2023 job.1894761.err
-rw-r--r-- 1 drodriguez1 humberto 5237 Jul 5 2023 job.1894761.out
-rw-r--r-- 1 drodriguez1 humberto 4479 Jan 18 12:58 khmer
-rw-r--r-- 1 drodriguez1 humberto 243 Jan 18 13:32 mrnaseq-subset.tar
drwxr-sr-x 2 drodriguez1 humberto 4096 Jul 3 2023 multiqc_data
-rw-r--r-- 1 drodriguez1 humberto 1366899 Jul 3 2023 multiqc_report.html
drwxr-sr-x 4 drodriguez1 humberto 4096 Jul 5 2023 pbmc1k_quant
drwxr-sr-x 3 drodriguez1 humberto 4096 Jul 3 2023 refs
(khmers) [drodriguez1@boqueron drodriguez1]$ tar xf 2013-khmer-counting-data.tar.gz
(base) [drodriguez1@boqueron drodriguez1]$ conda activate khmers

January 22-26:

-- Course schedule to the director
-- Continued working with the article "These Are Not etc..." and starting the Escambron Protocol

login as: drodriguez1
drodriguez1@boqueron.hpcf.upr.edu's password:
Access denied
drodriguez1@boqueron.hpcf.upr.edu's password:
Last login: Thu Jan 18 15:00:45 2024 from 136.145.181.31
(base) [drodriguez1@boqueron ~]$ conda activate khmers
(khmers) [drodriguez1@boqueron ~]$ ls
archivo_2 buildindex.save buildindex.save.1 dc_workshop miniconda3 Miniconda3-latest-Linux-x86_64.sh Miniconda3-py310_23.3.1-0-Linux-x86_64.sh Miniconda3-py38_23.3.1-0-Linux-x86_64.sh miniconda.html nano.save nano.save.1 SRR1032911_1.fastq SRR1032911_2.fastq
(khmers) [drodriguez1@boqueron ~]$ ls
archivo_2 buildindex.save buildindex.save.1 dc_workshop miniconda3 Miniconda3-latest-Linux-x86_64.sh Miniconda3-py310_23.3.1-0-Linux-x86_64.sh Miniconda3-py38_23.3.1-0-Linux-x86_64.sh miniconda.html nano.save nano.save.1 SRR1032911_1.fastq SRR1032911_2.fastq
(khmers) [drodriguez1@boqueron ~]$ cd $WORK
(khmers) [drodriguez1@boqueron drodriguez1]$ ls
=1.0 fastqc mrnaseq-subset.tar 2013-khmer-counting-data.tar.gz human-2020-A_splici multiqc_data af_home humberto multiqc_report.html align job.1894760.err pbmc1k_quant align_data job.1894760.out pipeline buildindex job.1894761.err refs data job.1894761.out WARNING: data_quant khmer
(khmers) [drodriguez1@boqueron drodriguez1]$ tar tf 2013-khmer-counting-data.tar.gz
pipeline/
pipeline/iowa_prairie_0920.fa.1
pipeline/iowa_prairie_0920.fa.2
pipeline/iowa_prairie_0920.fa.3
(khmers) [drodriguez1@boqueron drodriguez1]$ mkdir -p {WORK}/ quality
(khmers) [drodriguez1@boqueron drodriguez1]$ cd $ {WORK}/ quality
-bash: cd: $: No such file or directory
(khmers) [drodriguez1@boqueron drodriguez1]$ ls
=1.0 human-2020-A_splici multiqc_report.html 2013-khmer-counting-data.tar.gz humberto pbmc1k_quant af_home job.1894760.err pipeline align job.1894760.out quality align_data job.1894761.err refs buildindex job.1894761.out WARNING: data khmer {WORK} data_quant mrnaseq-subset.tar fastqc multiqc_data
(khmers) [drodriguez1@boqueron drodriguez1]$ cd quality
(khmers) [drodriguez1@boqueron quality]$ ls
(khmers) [drodriguez1@boqueron quality]$ ln -fs /work/humberto/drodriguez1/2013-khmer-counting-data.tar.gz/.fastq.gz.
(khmers) [drodriguez1@boqueron quality]$ l
-bash: l: command not found
(khmers) [drodriguez1@boqueron pipeline]$ ls
ecoliMG1655.fa iowa_prairie_0920.fa.1 iowa_prairie_0920.fa.4 quality ecoli_ref.fastq iowa_prairie_0920.fa.2 iowa_prairie_0920.fa.5 .fastqc.gz, iowa_prairie_0920.fa.3 MH0001.trimmed.fa
(khmers) [drodriguez1@boqueron pipeline]$ rm *.fastqc.gz\,
(khmers) [drodriguez1@boqueron pipeline]$ rmdir quality
(khmers) [drodriguez1@boqueron drodriguez1]$ module load fastqc
(khmers) [drodriguez1@boqueron drodriguez1]$ fastqc .fastq.gz
Skipping '.fastq.gz' which didn't exist, or couldn't be read
(khmers) [drodriguez1@boqueron drodriguez1]$ cd quality
(khmers) [drodriguez1@boqueron quality]$ ln -fs ${WORK}/pipeline/fa^C
(khmers) [drodriguez1@boqueron quality]$ ls
(khmers) [drodriguez1@boqueron quality]$ ls ../pipeline
ecoliMG1655.fa iowa_prairie_0920.fa.2 iowa_prairie_0920.fa.5 ecoli_ref.fastq iowa_prairie_0920.fa.3 MH0001.trimmed.fa iowa_prairie_0920.fa.1 iowa_prairie_0920.fa.4
(khmers) [drodriguez1@boqueron quality]$ ln -fs ${WORK}/pipeline/fa
(khmers) [drodriguez1@boqueron quality]$ ls ../pipeline -l
total 16204776
-rw-r--r-- 1 drodriguez1 humberto 4706054 Jul 25 2013 ecoliMG1655.fa
-rw-r--r-- 1 drodriguez1 humberto 1141494078 Jun 28 2013 ecoli_ref.fastq
-rw-r--r-- 1 drodriguez1 humberto 1085661175 Jun 24 2013 iowa_prairie_0920.fa.1
-rw-r--r-- 1 drodriguez1 humberto 2169950952 Jun 24 2013 iowa_prairie_0920.fa.2
-rw-r--r-- 1 drodriguez1 humberto 3137253948 Jun 24 2013 iowa_prairie_0920.fa.3
-rw-r--r-- 1 drodriguez1 humberto 4049116668 Jun 24 2013 iowa_prairie_0920.fa.4
-rw-r--r-- 1 drodriguez1 humberto 5005467903 Jun 24 2013 iowa_prairie_0920.fa.5
lrwxrwxrwx 1 drodriguez1 humberto 50 Jan 22 12:34 MH0001.trimmed.fa -> /work/humberto/drodriguez1/pipeline/ecoliMG1655.fa
(khmers) [drodriguez1@boqueron quality]$ ln -fs ${WORK}/pipeline/*.fa.* .
(khmers) [drodriguez1@boqueron quality]$ ls
iowa_prairie_0920.fa.1 iowa_prairie_0920.fa.3 iowa_prairie_0920.fa.5 iowa_prairie_0920.fa.2 iowa_prairie_0920.fa.4
(khmers) [drodriguez1@boqueron quality]$ ls -l
total 0
lrwxrwxrwx 1 drodriguez1 humberto 58 Jan 22 12:37 iowa_prairie_0920.fa.1 -> /work/humberto/drodriguez1/pipeline/iowa_prairie_0920.fa.1
lrwxrwxrwx 1 drodriguez1 humberto 58 Jan 22 12:37 iowa_prairie_0920.fa.2 -> /work/humberto/drodriguez1/pipeline/iowa_prairie_0920.fa.2
lrwxrwxrwx 1 drodriguez1 humberto 58 Jan 22 12:37 iowa_prairie_0920.fa.3 -> /work/humberto/drodriguez1/pipeline/iowa_prairie_0920.fa.3
lrwxrwxrwx 1 drodriguez1 humberto 58 Jan 22 12:37 iowa_prairie_0920.fa.4 -> /work/humberto/drodriguez1/pipeline/iowa_prairie_0920.fa.4
lrwxrwxrwx 1 drodriguez1 humberto 58 Jan 22 12:37 iowa_prairie_0920.fa.5 -> /work/humberto/drodriguez1/pipeline/iowa_prairie_0920.fa.5
(khmers) [drodriguez1@boqueron quality]$ module load fastqc
(khmers) [drodriguez1@boqueron quality]$ fastqc .fastq.gz
Skipping '.fastq.gz' which didn't exist, or couldn't be read
(khmers) [drodriguez1@boqueron quality]$ fastqc *.fa.*
Failed to process iowa_prairie_0920.fa.1
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Failed to process iowa_prairie_0920.fa.2
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Failed to process iowa_prairie_0920.fa.3
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Failed to process iowa_prairie_0920.fa.4
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Failed to process iowa_prairie_0920.fa.5
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.(OfflineRunner.java:121)
(khmers) [drodriguez1@boqueron quality]$ less iowa_prairie_0920.fa.1

    >1461:1:1:1627:2143/2
    TCAGCTCGATTTGAAGTTCGCCCTGGTTCGGCGTCAGGACGATGGCGTCGATGAGGCCTCGAAGCGCCTCAGACGCTTCCGTGCG
    >1461:1:1:1540:2176/1
    TGGAGGAGCAGAACCTGGTGAGCTTTGAACGGGAAACCGACTCCCGGGGGAAAGGGAGCGTCGTC
    >1461:1:1:1352:2135/2
    AGCTGCCGTGCCTGCGCGGGCGAGATGTCGGATCCTGAATTCAAATACCGCGTCCTCTGGCCGTGCGGCAGTCCTTGGCG
    >1461:1:1:1593:2140/1
    GAGCCGCCTCCCGCTCGTCCCCGCGATCCCCGCCTTCCCGGGGACAAAAGCGGGGACCCGGATTTGCAAGCATCC
    >1461:1:1:2062:2174/1
    ACAAAGGCACTGGCCATTGCATTGCCGGCGAGGTCCGTGACCCCGGTGCTAATCGTGGCG
    >1461:1:1:3340:2123/1
    GCACAGATAATTTTGCCGGCCTTTCATCAACGCTGCTTCAAACTCGACTGGCAG
    >1461:1:1:2959:2028/1
    AACGACAAAAAGCCAAAAATAGAAACCGCCGTCGACGACATTGGTCGTGGAGCTCATCCCAAGTTCTACGAGTTGGTGTCCCAGAGCGAGTTGG
    >1461:1:1:2843:2136/1
    ACCTCGTAACGCCGGGCCTCCTCCGCCGGTTTCGGACGGGCCGCGACAGCCAGGCCGAAGCGTTCCCGGGCGGCGACCCAGTCGCCCGTGGCCTCCGCGA
    >1461:1:1:3109:2162/2
    AGCGCAGAACGCGCCAAAGCCAGACAAGCCGATGATCACTCCGAACCCCGACGGGACCTTCACGGCTCAAAAGCAGCCGAAGGATGCCAAGGCCAAGGCG
    >1461:1:1:2710:2196/1
    GGGAACAAGGTGATGGCTTACTTCGAGCGCCTCTGATCGGCGCGCCCTACGCCGCCTCGCGCTCCGGCACCTTCACCGCCGCGAGCG
    >1461:1:1:4205:2141/1
    GGCCTGTACGCGATGAGGTCGCCGCGCTCGCGGCGCTTCGGGGTCATCCTGATCGCGACCGGGCTCGTGTGGTCGCTGACCGCGTTCGGTGAGTCGTCCG
    >1461:1:1:3666:2093/1
    GTGATCAAGCCCATGGCGGTGTCGTCGAACTTCGGGTCGC
    >1461:1:1:4271:2210/2
    TGTCCTATCTGATCGACCCGCACACGCCGAAAAACGACGGCACCTTTCGGCCGATCGAGGTCATCGCCAAGCCCGGCACCGTGGTCTGGGCCAACC
    >1461:1:1:3877:2185/1
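
Note on the FastQC failure in this session: the exception "ID line didn't start with '@'" is raised because FastQC parsed the `iowa_prairie_0920.fa.*` files as FASTQ, where every record header begins with `@`, while these files are FASTA (no quality lines). A minimal sketch of a pre-check that could be run before FastQC follows. The function names `seq_format` and `fastqc_if_fastq` are hypothetical helpers written for this note, not part of FastQC or khmer, and the check looks only at the first character of the first line of an uncompressed file.

```shell
# seq_format (hypothetical helper): print "fastq", "fasta", or "unknown"
# depending on how the first line of an uncompressed file begins.
seq_format() {
    local file="$1" line=""
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # Read only the first line; large files are not scanned.
    IFS= read -r line < "$file" || true
    case "$line" in
        '@'*) echo "fastq" ;;
        '>'*) echo "fasta" ;;
        *)    echo "unknown" ;;
    esac
}

# fastqc_if_fastq (hypothetical helper): run fastqc only on files that
# look like FASTQ and report the rest on stderr.
fastqc_if_fastq() {
    local f fmt
    for f in "$@"; do
        fmt=$(seq_format "$f")
        if [ "$fmt" = "fastq" ]; then
            fastqc "$f"
        else
            echo "skipping $f: detected $fmt, FastQC expects FASTQ" >&2
        fi
    done
}
```

Called as `fastqc_if_fastq iowa_prairie_0920.fa.*` in the `quality` directory, this sketch would skip the FASTA links instead of producing a stack trace for each. Gzip-compressed inputs are not handled here and would be reported as "unknown".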

January 29-February 2, 2024

February 5-9, 2024

February 12-16, 2024

February 19-24, 2024:

February 26-March 1, 2024:

March 4-9, 2024

-- SACNAS, Local Conference, UPRRP

March 11-15, 2024:

March 18-22, 2024:

March 25-29, 2024:

April 1-5, 2024:

April 8-12, 2024:

April 15-19, 2024:

April 22-26, 2024:

April 29-May 3, 2024

May 6-10, 2024:

May 13-17, 2024:

May 20-24, 2024:

May 27-31, 2024: