DNA sequences, Computer Engineering

Assignment Help:

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
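As one illustration of the sampling idea, the sketch below randomly undersamples the majority (non-binding) class so that it is no larger than a chosen multiple of the minority (binding) class. The function name `undersample`, the `ratio` parameter and the 0/1 label encoding are assumptions made for this example; the assignment does not prescribe a particular sampling method.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly drop negatives until there are at most ratio * (number of positives).

    Assumes labels are 1 (binding site) and 0 (non-binding site).
    Returns new (examples, labels) lists in shuffled order.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    # Keep every positive; keep only a random subset of the negatives.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)

    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 5 positives among 100 candidate kmers (placeholder strings).
toy_examples = ["kmer%d" % i for i in range(100)]
toy_labels = [1 if i < 5 else 0 for i in range(100)]
balanced_x, balanced_y = undersample(toy_examples, toy_labels, ratio=1.0)
```

Undersampling should be applied to the training partition only, so that the test partition still reflects the original class distribution.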

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
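The four measures can be computed from the counts of true and false positives and negatives. The sketch below is a minimal version with two assumptions: "recognition rate" is taken to mean overall accuracy, and the F-measure is taken to be the balanced F1 score. The function name `evaluate` and the 0/1 label encoding are also chosen for this example only.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure (F1) and recognition rate (accuracy).

    y_true and y_pred are equal-length sequences of 0/1 labels,
    where 1 means "binding site".
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    total = tp + fp + fn + tn
    recognition_rate = (tp + tn) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


# Toy example: 2 true sites among 10 candidates, one found, one false alarm.
scores = evaluate([1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
```

On imbalanced data the recognition rate can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.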

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is counted as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count both as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because each has at least 50% overlap with a true binding site. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
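The overlap rule can be checked with a short sketch. It assumes, as in sequence (1), that true binding sites are written in uppercase and the background in lowercase, and it uses 0-based start positions. The helper names `site_intervals`, `max_overlap_fraction` and `is_true_instance` are invented for this example.

```python
def site_intervals(seq):
    """Return half-open (start, end) intervals of the uppercase runs in seq.

    Uppercase runs are treated as annotated true binding sites.
    """
    intervals = []
    start = None
    for i, ch in enumerate(seq):
        if ch.isupper():
            if start is None:
                start = i
        elif start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(seq)))
    return intervals


def max_overlap_fraction(kmer_start, k, sites):
    """Largest fraction of the kmer [kmer_start, kmer_start + k) covered by one site."""
    best = 0
    for s, e in sites:
        overlap = min(kmer_start + k, e) - max(kmer_start, s)
        best = max(best, overlap)  # negative overlaps (no intersection) are ignored
    return best / k


def is_true_instance(kmer_start, k, sites, threshold=0.5):
    """Apply the at-least-50%-overlap rule from the assignment text."""
    return max_overlap_fraction(kmer_start, k, sites) >= threshold


seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = site_intervals(seq)

# Locate the three 8mers from the example, ignoring case.
lower = seq.lower()
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    pos = lower.find(kmer.lower())
    print(kmer, pos, max_overlap_fraction(pos, 8, sites), is_true_instance(pos, 8, sites))
```

Under these assumptions the three 8mers have overlap fractions of 0.5, 0.75 and 0.25, so only the first two count as true motif instances, matching the example above.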



