DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as the imbalanced dataset problem. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
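As one illustration of the sampling idea, the sketch below randomly undersamples the majority (non-binding) class. It is a minimal example, not a method prescribed by the assignment: the function name `undersample`, the `ratio` parameter, and the label convention (1 for a binding site, 0 otherwise) are all assumptions made here for illustration.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly drop negatives so that #negatives <= ratio * #positives.

    Assumes labels are 1 (binding site) or 0 (non-binding site).
    All positives are kept; only the majority class is thinned.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)  # avoid all positives appearing first
    return [examples[i] for i in kept], [labels[i] for i in kept]
```

A sampler like this would normally be applied to the training data only, so that the test data keeps its original class distribution.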

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
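The partitioning and the four measures can be sketched as follows. This is a hedged example under stated assumptions: the names `train_test_split` and `evaluate` and the 30% test fraction are hypothetical, labels are assumed to be 1 (binding site) or 0, F-measure is taken to be the balanced F1 score, and "recognition rate" is interpreted as overall accuracy.

```python
import random


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of the items and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate.

    Assumes binary labels (1 = binding site). Ratios with a zero
    denominator are reported as 0.0 rather than raising an error.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": rate,
    }
```

Calling `evaluate` once on the training predictions and once on the test predictions gives the two sets of figures the assignment asks for. On imbalanced data the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.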

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we will count them as correct predictions, because they have 50% and 75% overlaps with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we will count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If it is classified by a learner model as a binding site, it will be counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
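The overlap rule can be checked with a short sketch on sequence (1). The helper names `true_sites`, `max_overlap_fraction` and `is_motif_instance` are hypothetical. Two assumptions are made for illustration: true binding sites are marked by upper-case runs, as in sequence (1), and the overlap is measured as a fraction of the kmer length (here the kmers and the sites are both 8 bases long, so the choice of denominator does not matter).

```python
import re


def true_sites(seq):
    """Return (start, end) half-open spans of upper-case runs.

    Assumption: upper-case letters mark true binding sites.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[A-Z]+", seq)]


def max_overlap_fraction(start, k, sites):
    """Largest overlap between the kmer [start, start + k) and any
    true site, expressed as a fraction of the kmer length k."""
    end = start + k
    best = 0
    for s, e in sites:
        # Length of the intersection; negative when the spans are disjoint.
        best = max(best, min(end, e) - max(start, s))
    return best / k


def is_motif_instance(start, k, sites, threshold=0.5):
    """Apply the 'at least 50% overlap' rule from the text."""
    return max_overlap_fraction(start, k, sites) >= threshold


seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = true_sites(seq)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = seq.find(kmer)  # case-sensitive, 0-based position
    frac = max_overlap_fraction(start, len(kmer), sites)
    print(kmer, frac, is_motif_instance(start, len(kmer), sites))
# acaaACAC 0.5 True
# ACGGGAtc 0.75 True
# GAgaatta 0.25 False
```

The resulting true/false values serve as the ground-truth labels against which a classifier's predictions for each kmer position are compared.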


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd