Contact info:


"Motivational note. Craig McGill crash-landed in the acid swamps of Boreal 9, fought off arachnid kidney-poachers, and hijacked a tame starwhal. If he can do all that, you can survive one more day." - Subnautica PDA

Another another weekly-ish update:

After a failed attempt to fulfill my research goals during the last time, here we are again trying to make the world right once more.

Week 13: (08-12/Apr)

  • Back at it right after the exams.
  • Got word of possible short reads incoming for the Holothuria Glaberrima Selenka.
  • Started working on a possible outline for the tech report.

Week 12: (01-05/Apr)

  • Studying hard for this week's exams, not much work done.

Week 11: (25-29/Mar)

  • Started reading up on the problem of the interpretation of alternative splicings when utilizing short reads and how this can be addressed by utilizing a hybrid approach that incorporates both short & long reads:
    • In essence both short reads and long reads are combined to solve ambiguity by having sequences that contain all the region (specially at junctions sites where the exons are removed) with long reads and add robustness through higher sampling depth from short reads.
    • Yeast Whole Genome Sequencing with Nanopore & Illumina

Week 10: (18-22/Mar)

Week 9: (11-15/Mar)

  • Too many evaluations this week, here's a list:
    • General Biology 1 - Exam
    • Mini-review presentation
    • Mini-review written report
    • Fermentation Lab Report
    • Compilers Assingment
    • Big Data to Biology - Exam
  • Not much research done, if any. :(

Week 8: (04-08/Mar)

Week 7: (25-01/Feb-Mar)

Week 6: (18-22/Feb)

  • Topics in Biology: Big Data to Biology - First research sprint is due could not work on research this week.

Week 5: (11-15/Feb)

  • Fixed the ViewSonic Secondary display in the lab. :)
  • Started Reading the paper I was assigned for the journal club.

Week 4: (04-08/Feb)

  • Behind on Compilers Class Lexer assingment must catch up ot it, no research done this week.

Week 3: (28-01/Jan-Feb)

  • Performed a search for possible papers to present during journal club

Week 2: (21-25/Jan)

  • Disscused Possible reasearch topics
  • Possible Pojects:
    • De-nove Genome Assembly of Sea Cuecumber
    • Genomic Sequencing pipeline development
    • Butterfly multiple data integration software approach:
      • Gene expresion, RNA-Seq, Chip-Seq & Transcription Factors

Week 1: (14-18/Jan)

  • Joined Megraprobe


"War... war never changes!" - Ron Perlman

Another weekly-ish update:

This time around we'll be working with something a bit non-bioinformatics, this trimester (u_u) we'll be working with some cybersecurity. The project will focus on analysing the encryption and protocol for private gossip on the awesomely decent(ralized) scuttlebutt network.

Week 7: (23-08/Jun)

  • Due to unforeseen and undue events I made the hard decision of discontinuing my attempt of researching this trimester.

Week 6: (16-20/Apr)

  • Should get something meaningful graphed.

Week 5: (09-13/Apr)

  • Still wrapping my head around Fourier Transforms in the context of this research, althought it is more clearer now.
  • Should learn a lot more about Fourier Transforms.
  • Added to help clarify research goals.
  • Added as a guide for using sbot to navigate the gossip.

Week 4: (02-06/Apr)

  • Finally got sbot installed following this guide.
  • npm is very problematic sometimes
  • Do not remove python3 from your ubuntu box it might (probably will) break it
  • Discussed possible implementations of Fourier transform analysis in intended research setting.

Week 3: (26-28/Mar)

Week 2: (19-23/Mar)

  • Reading up a lot on private box
  • Explored around Scuttlebutt's Installation and possibly useful files

Week 1: (12-16/Mar)

  • Reading up on scuttlebutt in general, ScuttleButt-Protocol Guide
  • Found out about private box and encrytion hash convention used by scuttlebut


Guess who's back, back again
Shady's back, tell a friend
- Slim Shaddy

NEW-New-new weekly weekly update:

  • Man I wanted to say that!
During this semester we'll be working on a shared transcript discovery tool that observes the de-bruijn graph obtained from different transcripts in order to analyze differential expression based on reads from sea cucumber RNA, in order to study the parts of it's genome that are related to it's ability to regenerate some of it's tissue.

Sea Cucumber or Holothuria glaberrima Selenka

Holothuria glaberrima Selenka!

Image taken from BMC genomics [article](

Week 14: (26-30/Feb)

  • Classes have oficially ended
  • Software and report have been disscussed, reviewed, and finalized

Week 13: (19-23/Feb)

  • Last lab meeting of this semester
  • We dicussed the technical report
  • We dicussed the final details of the modifications to the algorithm and programs

Week 12: (12-16/Feb)

  • Verified optimized algorithms
  • Discussed future directions for the software created and the research

Week 11: (05-09/Feb)

  • Discussed possible approach for future research goals
    • Approach to comparing sequences utilizing a Differential Expression Coefficient
  • Implemented some optimizations to algorithm (Python generators anyone)
  • Went over technical report with Kevin

Week 10: (29-02/Jan-Feb)

  • Talked about trouble with generating sample data for validation
  • Looking for possible optimizations to algorithms

Week 9: (22-26/Jan)

  • Missed meeting due to work

Week 8: (15-19/Jan)

  • Missed meeting due to work

Week 7: (08-12/Jan)

  • Meeting suspended this week

Week 6: (01-05/Jan)

  • Meeting suspended this week

Week 5: (25-29/Dec)

  • First birthday ever spent in school such a weird feeling, thanks to everyone who took part in making it a special day.

Week 4: (18-22/Dec)

  • Branched the program for possible GFA2 adaptation
  • Dicussed possible approaches to differential expression on this software
  • Biological significance of data vs. memory efficiency and performace (time complexity)
  • Ease of interpretations of results
  • Future uses of data generate, to log or to garbage collect that is the question

Week 3: (11-15/Dec)

  • Settled on using GFA-1 format specifications for output

Week 2: (04-08/Dec)

Week 1: (27/Nov-01/Dec)

  • Had an unofficial first meeting with collaborators
  • Drew project goals
  • Discussed possible approaches for the study
  • Partially analyzed the code in pmelsted de-bruijn graph generator


New-New Weekly update:

iBRIC summer internship at University of Pittsburgh, it's gonna be a blast and I'll keep you posted.

Causal Bayesian Networks!

Image taken from Google

Week 10: (24-28/July)

  • Mock presentations due Monday and Tuesday
  • Presentation to Department of Biomedical Informatics on Wednesday
  • Poster due Wednesday
  • Research presentation due Wednesday
  • Poster presentation @Symposium due Friday

Week 9: (17-21/July)

  • Finally got some data on the comparison of the algorithms
  • Data seems to favor our hypothesis
  • Some errors where pointed out running analysis again
  • Final Poster Crunch time
  • Research presentation Crunch time
  • Weekly meeting with my mentors, we discussed:
  • Preliminary results
  • The poster draft
  • Due to time constrains we won't be working with the Sachs data

Week 8: (10-14/July)

  • Research topic update:
  • Enabling Constraint-Based Causal Discovery Algorithms to obtain more information from observational and experimental data by performing interventions.
  • Research round-table #2 went great.
  • Polished and submitted abstract
  • Modeling interventions in causal networks to enrich constraint-based causal search
  • Weekly meeting with one of my mentors, we discussed:
  • Relevant hurdles we need to address to improve the effectiveness of the algorithms
  • Research direction for the next few weeks

Week 7: (03-07/July)

  • Read part of a paper suggested by my graduate student mentor
  • Joint Causal Inference from Observational and Experimental Datasets (Magliacane, Sara
  • Worked on the abstract because it's due next Wednesday, July 12th
  • Created some graphs that'll hopefully be useful for the poster
  • Broke my operating systems graphical user interface (Ubuntu 16.04.02 LTS) by messing with the drivers
  • Nvidia 304 Driver broken package
    • Unistall the package and reinstall Ubuntu-Desktop
  • Used "$ startx" which changes the ownership of the xserver to root so noone else can use it :(
    • "sudo chown $USER:USER .Xauthority" should fix it
  • Fixed it on the weekend

Week 6: (26-30/June)

  • Watching most BD2K video resources on working with statistics, big data, & causal modeling
  • Found this awesome paper/primer by Chris J Needham, James R Bradford, Andrew J Bulpitt, and David R Westhead, Fran Lewitter (Editor) to aid me in the task of understanding and explaining causal discovery research
  • The date for the presentation of the abstract approaches, an early draft can be found here
  • Research round-table came up this Week, it went great, through constant discussion of my research topic I am able to refine my understanding of my work and it's importance, which in turns helps me a lot in presenting and advancing it

Week 5: (19-23/June)

  • Reference and Citation management talk
  • Ethics Forum presentation at Duquesne University:
  • Michael A. Bellesiles Ethics Case
  • Weekly meeting with graduate student mentor
  • Talked about representing experimental data in causal software
  • Currently playing with his code for interventional data in Tetrad than can be found in his repository
  • Working on understanding the relevance and implications of separating interventional/experimental data from observational data on the causal models.
  • Reading section 5.4.2 from Causation, Prediction, and Search by Peter Sprites et al. to better understand the PC Algorithm

Week 4: (12-16/June)

  • Center for Causal Discovery short course
  • Main topics Discussed:
  • Causal Networks
  • Bayesian Statistics
  • Bayesian Causal Networks
  • Tetrad Causal Discovery Software
  • Datathon Thursday
  • Journal Club Meeting on Friday

Week 3: (05-09/June)

  • During this week, I was instructed to do the following:
  • Get the Tetrad software up and running in my computer
  • Read up on and understand the (Peter Spirtes and Clark Glymour) PC Algorithm
  • Observational Data Representation within causal networks
  • Experimental Data Representation within causal networks
  • Ethics Forum Group presentation on Michael A. Bellesiles Case Arming America coming soon.
  • Research topic update:
  • Methods for integrating both Observational and Experimental Data in Constraint-Based Causal Discovery Algorithms for Causal Discovery in Causal Bayesian Networks.

Week 2: (30/May-02/June)

  • During this week we have been asked to write a brief proposal due Friday 2nd.
  • Proposal
  • Reading a lot of literature in order to understand Causal Bayesian Networks, d-separation and the Causal Markov condition.

Week 1: (22-26/May)

  • Was assigned to Dr. Gregory Cooper's Lab at the Pitt Department of Biomedical Informatics in the Center for Causal Discovery.
  • Was required to complete ethics in research training modules.
  • Responsible Conduct of Research
  • Biomedical Informatics Research ethics
  • Research topic:
  • Causal Discovery utilizing Causal Bayesian Networks Algorithms
  • Started reading up on all the required statistics and graph theory required to work with the topic.
  • Currently reading the first chapter of:

    Glymour, C. N., & Cooper, G. F. (Eds.). (1999). Computation, causation, and discovery. Aaai Press.


I have respawned :) Muahahahahhah! No but for real, I'm back, doing research at the Megaprobe-Lab in the hopes of expanding my knowledge of bioinformatics, and general programing.

New Weekly UPDATE (Jan 2017 - May 2017)

Week 11-19:

  • Classes still interrupted.

Week 10: (20-27/Mar)

  • Campus went on strike protesting budget cuts.
  • Classes suspended till further notice.

Week 9: (13-17/Mar)

Week 8: (06-10/Mar)

  • Professor suggested utilizing SnakeMake so far it's a really good prospect.
  • Read this SnakeMake paper for more info.

Week 7: (27/Feb-3/Mar)

  • Found a more pythonic way of finding information about device resources while reading up on the python standard library documentation.
  • Found interesting alternative to make called Do-it.

Week 6: (20-24/Feb)

  • Very anxious this Week.
  • Student strike.
  • Got sick :'(

Week 5: (13-17/Feb)

  • Did not work in the lab this week.
  • Most time was spent studying.

Week 4: (06-10/Feb)

  • Pipeline device resource detection script:
  • Return available CPU cores
  • Return available RAM free

Week 3: (01-05/Feb)

  • Working on pipeline makefile.
  • Learning about Trinity.

Week 2: (25-29/Jan)

  • Started work on pipeline makefile.
  • Looking at Diginorm.

Week 1: (19-22/Jan)

  • Started reading up on the khmer paper!
  • Joined the Lab :') YEY!


I confess my crimes against humanity and Eleutherodactylus coqui races alike, for I have vanquished the furious lab frog.

Now, I am become Death, the destroyer of worlds. J.R. Oppenheimer


Research Goals

Create and analyze correctly obtained De Bruijn graphs from de-novo sequence assemblies utilizing mutual.

Weekly UPDATE (Jan 2016 - May 2016)

Week 17: (09/May-13/May)

  • Got down to working on the technical report.
  • Cleaned and updated the repo:
  • Improved formatting on documents.

Week 16: (02/May-6/May)

  • Worked Hard on fixing parser issues.
  • Got some output, lightly checked/skimmed output for errors, did not find any.
  • Continued working on technical report.
  • To do:
  • Finish technical report.

Week 15: (025-29/Apr)

  • Began corroboration of correct parsing of LastGraph file:
  • Noticed wrong data being displayed in parser's GFA's output of Read Count.
  • Noticed Problem in Link report reported from Bandage's output vs. Parsers output.
  • Noticed Problem in reported Link's nodes orientation from Parser.
  • Changed the way parser stored the data from first storing it in an array to now utilizing networkx's built-in structure for storing attribute.
  • Fixed wrong read count report.
  • Kinda theoretically fixed links being reported and node orientation.
  • Parser is temporarily broken due to issues with networkx structure.
  • To do:
  • Fix issues that are preventing the parser from executing correctly.
  • Continue technical report.

Week 14: (18-22/Apr)

  • Started working on LastGraph parser in python:
  • Reads file
  • Finds NODE
  • Appends to list
  • Finds ARC
  • To done :)
  • Write all contigs obtained from De-Bruijn
  • Determine format for output
  • Presumably got a working parser now.
  • New to do:
  • Corroborate parser works
  • Keep improving it
  • Initiate technical report

Week 13: (11-15/Apr)

  • Hulk got back up (still has some fight left).
  • Ran Mutual twice simultaneously:
  • A) With the purpose of comparing organism's A (input velvet/oases output [velvetA]) De-Bruijn graph with organism's B (input velvet/oases output [velvetB]) De-Bruijn graph after the data had been cleaned with sickle and scythe.
  • B) With the purpose of comparing organism's A with organism's A (self comparison) expecting duplicate output from Mutual (outputa.fa = outputb.fa).
  • Created a database using Blast's formatdb on Mutual's output (outputa.fa) to compare other .fa files against it.
  • Blasted through Mutual's output (outputa.fa) database against Velvet's ouput (contigs.fa)
  • Blasted through Mutual's output (outputa.fa) database against Mutual's ouput (outputb.fa)
  • To do:
  • Analyse results see if comparisons results were accurate.

Week 12: (04-08/Apr)

  • Many evaluations this week also :'(, worked mostly on preparing a bioinformatics presentation that aids in explaining Mutual's Heuristic Pairwise Alignment algorithm implementation.
  • Was corrected by professor need to parse Velvelt's LastGraph file (that is to say it's De-Bruijn graph), not the fasta/fastq file.
  • Good news everyone! Found a LastGraph parser in written Ruby. Bad news everyone! I don't understand ruby. :{
  • Tried to ran Mutual but The Hulk(server) is down.
  • To do:
  • Turn to Velvet/Oases De-Bruijn parser.
  • Need to run clean data on Hulk.

Week 11: (28/Mar-01/Apr)

Week 10: (21-25/Mar)

  • Holy Week/Spring break.

Week 9: (14-18/Mar)

  • Ran clean data through Velvet/Oases.
  • Graphed Output of Organism A.
    • Noticeable results on graph.
  • Preparing to run files on mutual.
  • Student assembly paralyzed work.

Bandage graph of velvet/oases output.

Week 8: (07-11/Mar)

  • Git repo is finally in order. :)
  • Finally got to quality trimming:
  • Needed Illumina Adapters FASTA files to remove adapters, found them in Trimmomatic source code.
  • Helpful note utilize FastQC to help determine how to quality control your reads. Especially figuring out the encoding (In this case Illumina/Sanger Encoding).
  • Apparently scythe supports gunzipped fastq files.
  • Finally removed Adapters
  • Finally trimmed Edges
  • Created quality reports using FASTQC for each step.
  • To do:
  • Verify if quality can be further improved.
  • Re-run mutual with clean data and graph outputs
  • Close to source code examination.

Week 7: (29/Feb-4/Mar)

  • Reinstalled Ubuntu. :'D (So shinny!)
  • Updated Weekly.
  • Many a exams this week.
  • Began reading "The Cartoon Guide To Genetics" by Larry Gonick & Mark Wheelis.
  • Analyze best way to compare outputs from velvet/oases and mutual.
  • Options:
  • Find a way to graph mutuals output compare graphs with velvet's.
  • Also possibly compare with output of other that does analysis between multiple organisms.
  • Utilize blast to compare each contig.
  • Utilize clustel to compare contig alignment.
  • Still to do:
    • Remove Adapters with Scythe.
    • Trim edges with Sickle.

Week 6: (22-26/Feb)

  • Fought with git and github's repos in an effort to sync files, failed.
  • Switching strategy: Prioritizing some file to be copied and leaving other ones behind stored only locally in both hulk and personal machine.
  • Installed Scythe and Sickle.
  • Graphed Contigs from velvet's LastGraph utilizing Bandage.
  • Local Computer's Ubuntu OS broke (RIP) :'( Thankfully to prior advice given in a lab meeting, had created multiple back ups of research data.
  • Read a few more papers concerning bioinformatics and its history.

Bandage graph of velvet/oases output Starlet Sea Anemone for comparisson

Week 5: (15-20/Feb)

  • Worked on creating research documentation and a research log:
  • Wrote FAQ to help ease the reproduction of this research.
  • Experienced many issues syncing local and remote git repo.
  • Verified Mutual's output using two different blast services both returned correctly identified nemastotella similarity.
  • Verified Quality of input files using FastQC, file are very high quality reads.
  • To do:
  • Visualize contigs using Bandage.
  • Trim Illumina adapters from input sequences.

Week 4: (08-12/Feb)

Week 3: (01-05/Feb)

  • Initiated test running of mutual on local (DELL Inspiron 7548) machine, utilizing sample Sea Anemone Dataset.
  • Estimated run-time: 72 hour.
  • Expected output: A transcript with very long contigs due to the high similarity of the two test organisms.
  • Results: Two empty output files. >:'(
  • Possible issues: Computer might not meet requirements to performs such heavy work very efficiently.
  • Path of action: Proceeded to move operations elsewhere, The University of Puerto Rico's Computer Science Department very own Hulk.
  • Rest of the week(get it?) due to health problems.

Week 2: (25-29/Jan)

  • Installed required programs (velvet/oases) to test out mutual.
  • Acquired Sample Data Set of Nemastotella Embryonic Transcriptome (Starlet sea anemone).
  • Working on glossary of Bioinformatics related terms.
  • Continued reading up on de-novo sequences, mutual/velvet/oases and De-Bruijn Graph assembly of short reads.
  • Test ran mutual.

Week 1: (19-22/Jan)

  • Started reading up on de-novo sequencing and familiarizing with the terms used around it.