Knowledge-Based Systems 71 (2014) 419–434. Journal homepage: www.elsevier.com/locate/knosys

Automatic melody composition based on a probabilistic model of music style and harmonic rules

Carles Roig, Lorenzo J. Tardón*, Isabel Barbancho, Ana M. Barbancho
Universidad de Málaga, ATIC Research Group, Andalucía Tech, ETSI Telecomunicación, Campus de Teatinos s/n, E29071 Málaga, Spain
* Corresponding author. E-mail addresses: [email protected] (C. Roig), [email protected] (L.J. Tardón), [email protected] (I. Barbancho), [email protected] (A.M. Barbancho).

Article history: Received 22 March 2014; Received in revised form 10 August 2014; Accepted 22 August 2014; Available online 28 August 2014.

Abstract: The aim of the present work is to perform a step towards the design of specific algorithms and methods for automatic music generation. A novel probabilistic model for the characterization of music learned from music samples is designed. This model makes use of automatically extracted music parameters, namely tempo, time signature, rhythmic patterns and pitch contours, to characterize music. Specifically, learned rhythmic patterns and pitch contours are employed to characterize music styles. Then, a novel autonomous music compositor that generates new melodies using the model developed will be presented. The methods proposed in this paper take into consideration different aspects related to the traditional way in which music is composed by humans, such as harmony evolution and structure repetitions, and apply them together with the probabilistic reutilization of rhythm patterns and pitch contours learned beforehand to compose music pieces.

Keywords: Automatic melody composition; Music style; Music analysis; Probabilistic model; Machine learning

http://dx.doi.org/10.1016/j.knosys.2014.08.018
0950-7051/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The main objective in the field of music information retrieval is to provide algorithms and methods for the analysis and description of musical pieces to computational systems [1]. The advances achieved allow computers to perform different tasks like automatic music transcription [2] or unattended genre classification, among others [3]. Furthermore, the computational model of the human experience in the musical field and the way in which humans process this information are topics of great interest for psychology and musicology [4].

In this context, the automatic generation of musical content is the topic considered in this paper. Often, music is defined as 'organized sound' [5]; this organization requires order, structure and a logical composition style learned by training [6]. Thus, a novel algorithm designed for learning composition rules and patterns will be shown in this paper. Specifically, three different musical aspects will be considered: time (tempo and time signature estimation [1]), rhythm (rhythmic pattern learning) and intonation (musical contour detection).

Different methods for music style classification have already been described [7,8]. However, unlike the cited works, in our case the classification is not performed to train a classifier but to build a data model upon the selected descriptors of a particular music style. It is not the characterization of music styles that is ultimately pursued in this work, but the utilization of the selected descriptions for the unattended generation of new melodies according to those styles.

In other words, in this work, style-based composition parameters will be learned from the low level features used for style classification. Thus, a main novelty that will be presented will be based on the analysis and post-processing of low level musical features to determine the style of the music that will be generated. The procedure developed improves the performance and adds a different level of complexity with respect to other approaches. An ad hoc database will be built from the melodic and rhythmic patterns found for different musical styles. Using this information, the proposed composer system will be able to generate new melodies according to the characterized style [9] in a way that will be similar to the one followed by a human composer.

As mentioned above, a set of descriptors must be analysed in order to model the style of the melodies. Concerning the temporal descriptors, namely tempo and time signature, previous related works can be found. The work presented by Uhle and Herre in [1] is focused on onset estimation by means of spectral analysis and a subsequent study to estimate the length of the bar. On the other hand, in [10], histograms are employed to find the most repeated inter onset value in order to estimate the tempo without performing a spectral analysis. Also, there are recent methods for structural analysis, such as the one described in [11], based on a time-span tree for the detection of structural similarity; this task can also be addressed by making use of the auto-similarity matrix [12]. This
420 C. Roig et al. / Knowledge-Based Systems 71 (2014) 419–434method can be employed for both the extraction of music patterns stage, which is crucial in our work, is based on music theoryand tempo estimation. rules. The low level temporal estimation process can be related to Additionally, a set of automatic processes for the extraction ofcertain stages of previous MIDI-to-score works like the one musical features from MIDI files has been designed to learn thedescribed in [13], which is focused on the complex study of the musical patterns required by the composition scheme. Thus, wemusical object in the MIDI representation through the analysis of have designed a key-independent, unattended and style-based pat-the accent and beat induction. Unlike this work, the goal of the tern learning system plus a composition process that replicate thetemporal estimation scheme that will be presented in this style of the training data assuming that the musical style is relatedmanuscript is the characterization of simple rhythmic patterns to rhythmic patterns and melodic contours. Although harmony and(melodies without rhythmic ornaments). chord progressions are also closely related to music style defini- tion, the chord analysis was postponed for further development The scheme designed in this work proposes an innovative and it is placed out of the scope of this manuscript.approach for tempo estimation based on the inter onset interval(IOI) histogram. Our approach is inspired by [1,10]. However, some This paper is organized in five sections. After the introduction,improvements are proposed in order to attain better accuracy and in Section 2, a description of the analysis module will be presented.lower computational cost. Specifically, the proposed system uses a In Section 3, the automatic composer will be explained and a com-multi-resolution method that reduces the number of cost function plete specification of all its subsystems will be given. In Section 4,evaluations and achieves better accuracy. the tests done to evaluate the system and the results obtained will be drawn. Finally, the last section will present the conclusions Regarding musical composition based on pattern reallocation coming out from this work.and variations, which is the key point of the present work, othermethods are described in the bibliography. In [14–16], Markov 2. Pattern analysermodels are used for the modeling and composition process. Theuse of genetic algorithms, such as in Biles’ GenJam system [17], The selected approach for the design of the generation schemehas also been considered in the automatic music composition con- is based on the music theory method called obstinato [9], astext. Biles’ system and ours are based on learning musical described in Sections 1 and 3. This method considers the composi-sequences (rhythm and pitch) plus a mutation. However the sys- tion of music on the basis of the repetition of patterns so that thetem we propose does not require a fitness evaluation stage since repetitions themselves constitute the melodic structure. In order toevery learned pattern can be adapted to fit the composition. Fur- learn the patterns that will be used to compose new melodies, it isthermore, GenJam implements pattern mutation capability trough necessary to analyse the available samples to identify and extractgenetic algorithms which increases the flexibility and variability of the required elements for the composition, namely: rhythmicthe content. 
However, this feature cannot be applied in our patterns, pitch contours, harmonic progressions and tempoapproach since our scheme pursues style-replication. information (needed for the proper analysis of the data). Similar developments are proposed in [22]. Methods based on probabilistic approaches such as Cope’sExperiments in Musical Intelligence (EMI) [18,19] also focus on Thus, a database of musical parameters is designed to modelthe creation of an automatic music composition framework. Both musical styles (as in [23]) and organize and store the data that willEMI framework and the proposed system are based on probabilis- be used for the composition of new melodies. Since the main objec-tic models (for pattern selection) and the training data stored in tive is to develop a valid music model for the automatic creation ofthe database comes from musical features extracted from MIDI composition with style replication, it is necessary to discoverfiles (rhythm, pitch, dynamics,. . .). However EMI performs which parameters can be used for the proper modeling of musicalbeat-to-beat pattern matching while our approach makes use of styles. We decided to base our approach on the probabilistic anal-complete measures. Although both schemes use actual motives, ysis of musical elements such as rhythm, pitch and harmony. Fig. 1the use of longer patterns, as in our approach, provides a clearer presents a scheme of the analysis system.style replication. Moreover, unlike Cope’s approach, that generatesnon-tonal compositions, the proposed system defines a harmonic Tempo can be extracted easily by analysing MIDI meta-dataand rhythmic structure to compose tonal music. messages [24]. However, this piece of information can be missing (undefined) or incorrect. In order to develop a generally usable Another music generation method based on probabilistic struc- robust scheme, an algorithm to estimate the tempo and time signa-tures is Inmamusys [20]. This method follows a global scheme sim- ture has been designed. Note that tempo information is critical inilar to the proposed system. Both schemes use previously learned order to correctly identify rhythmic figures.patterns for generating new compositions by replicating and real-locating them. However, the learning stages and features are differ- Rhythm, pitch motives (melody) and harmony (which is consid-ent, the composition rules are distinct and, also, our system ered at the generation stage) are main pieces of informationperforms a post-processing process to allow all motives in the necessary for music generation [25]. This information will be codeddatabase fit in a composition. The solution to this problem adopted and stored in the database.in Inmamusys is to restrict the combinations to a subset of motivesthat had been previously tagged as compatible. The database will be divided into three levels hierarchically organized containing (1) time signatures, (2) rhythm patterns, Fractals, fuzzy logic and experts systems have also been used and (3) pitch contours (see Fig. 2). The symbolic notation used infor the composition of melodies [21]. However, unlike the pro- this database to store the duration of the rhythmic figures corre-posed system, the melodies generated by the scheme described sponds to the ratio between the lengths of the notes and the lengthin [21] are not related to any learned style and the composition of a whole note. In this way, figure identification is tempo indepen-scheme is radically different. dent and the complexity is reduced. 
So, the whole notes will be denoted as 1, the half notes as 0.5, and so on. Recall that the Summing up, the main contribution of the composition stage of specific duration (in s) of the figures is important only whenthe proposed system is the reduction of the importance of fitness the waveform is created. Melodic contours will be converted towhen reallocating musical motives. Biles solves this issue by using the equivalent MIDI note number [24].genetic algorithms that force the mutation of the patterns to fit[17]. In our proposal, the reallocation of randomly selected As an example, the rhythm instance associated with a 2/4patterns plus a global melody contour arrangement stage allow measure with C–D–E notes being a quarter note and two eighthall the elements to fit together. The global music arrangement
notes will generate an instance [60 62 64] under the element [0.25 0.125 0.125], which belongs to the element [2 4] of level 1 (Fig. 2).

Fig. 1. Scheme of the analysis system.

The hierarchy of the levels in the database has been established from the most static to the most dynamic feature in a melody. First, note that although the time signature can change in the same song, this possibility is not taken into account in the present analysis. A single time signature is extracted for each song; it will be the responsibility of the user to feed the training system with correct data. If this condition is not fulfilled, rhythmic patterns will be wrongly learned and the style characterization scheme will be mistaken.

The rhythmic patterns define the song structure. They are expected to be repeated during the song, so the analysis process will discover and count the number of instances of each different rhythmic pattern. Note that each measure can have a different pitch contour. The probabilistic model defined is obtained by counting the number of instances of each rhythmic motive in the song analysed. There are some very common style-independent rhythmic patterns; conversely, there are other rhythmic patterns well related to a particular music style. Observe that it would be possible to identify music styles by using this probabilistic model [26]. However, music style classification is not in the scope of this work, but the models are. Thus, our system will estimate a probabilistic distribution of the appearance of rhythmic patterns for modeling each style. As it will be explained later, these probabilities, found for each rhythmic pattern, will be used by the composition scheme independently for each of the measures that compose the generated musical piece.
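A minimal sketch of this three-level organization may help to fix ideas. The listing below is our own illustration (the paper does not publish an implementation); the layout and helper names are assumptions: time signatures index rhythmic patterns, whose durations are expressed as fractions of a whole note, and each rhythmic pattern accumulates the pitch contours (MIDI note numbers) found for it.

```python
# Illustrative sketch of the hierarchical database described above; the layout
# and helper names are assumptions, not the authors' actual implementation.
from collections import defaultdict

# level 1: time signature -> level 2: rhythmic pattern -> level 3: pitch contours
database = defaultdict(lambda: defaultdict(list))

def add_measure(time_signature, rhythm, contour):
    """Store one analysed measure; repeated rhythms accumulate contour versions."""
    database[time_signature][rhythm].append(contour)

# The 2/4 example from the text: a quarter note and two eighth notes on C-D-E.
add_measure((2, 4), (0.25, 0.125, 0.125), (60, 62, 64))

print(database[(2, 4)][(0.25, 0.125, 0.125)])  # [(60, 62, 64)]
```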
Fig. 2. Scheme of the organization of music characterization parameters in the database of the analysis system.

Fig. 3. Comparison between two different rhythmic pattern distributions (pattern probability vs. rhythm pattern identifier; upper panel: Advanced Level, lower panel: Basic Level).

In Fig. 3, the distribution of the appearance rate of the rhythmic patterns¹ of two different music styles can be observed. Each of the patterns corresponds to a complete measure. As it will be described later, the segmentation into measures (controlled by the time signature estimation) defines the rhythmic patterns that will be stored and used for music modeling and composition. The comparison shows the results of the analysis of two classical academic music datasets. The upper figure shows a higher level of complexity (corresponding to advanced music learning courses), while the lower one is simpler, since it was obtained after the analysis of basic music learning lessons. These figures illustrate the presence of some common patterns; also, the different patterns appear with different probability or do not appear at all. A main hypothesis in the present work is that the absence or presence, with specific probabilities, of certain rhythmic patterns can characterize a musical style [26].

The composition scheme will be based on the probabilistic representation of the rhythmic patterns, so that the melody generation system will select rhythmic patterns to replicate a style: a new musical composition will be created in which the distribution of the rhythmic patterns will be similar to the ones in the training data set of the target style.

¹ The 36 patterns presented in Fig. 3 were obtained by using the rhythm extraction module shown in Fig. 1. In further sections, this system will be described. The samples used were MIDI files created manually and extracted from [27]. These excerpts can be classified as Academic music, i.e. classical music used for teaching and learning in Western Conservatories.
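As a rough illustration of how such a style model can be estimated, the relative frequency of each rhythmic pattern is obtained by counting its instances over all analysed measures of a style. The sketch below follows the counting procedure described above under our own assumptions; it is not the authors' code.

```python
# Sketch: estimate the per-style probability of each rhythmic pattern by counting.
from collections import Counter

def rhythm_distribution(measures):
    """measures: iterable of rhythmic patterns (tuples of note lengths expressed
    as fractions of a whole note), one entry per analysed measure of one style."""
    counts = Counter(measures)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

# Toy example: a style dominated by the four-quarter-note measure.
measures = [(0.25, 0.25, 0.25, 0.25)] * 6 + [(0.5, 0.25, 0.25)] * 3 + [(0.5, 0.5)]
print(rhythm_distribution(measures))
```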
Fig. 4. Sample histogram of the inter onset intervals of a MIDI song (occurrence of IOIs vs. IOI values in seconds). The lowest IOI present in the histogram defines the approximate tactus (0.2 s).

Finally, the lowest level in the database corresponds to the melodic contour. This is the most variable feature in a song and it is related to the style through the analysis of the tones and the intervals in a song. However, the tone and interval histograms are more related to chords, harmony and tonality.

Recall that the probabilistic distribution of rhythmic patterns is considered to be the basis of the music style model. However, the pitch contours will be needed for music generation, so they are also analysed and stored in the database. The probability of appearance of each pitch contour will not be stored. Thus, all the pitch contours detected in a given music style for a certain rhythmic pattern will be equally considered. Details will be drawn in Section 3.2.

Now, we describe in detail the analysis stages that extract the information used to create the music characterization database described.

2.1. Temporal estimations

First of all, tempo and time signature have to be estimated from the MIDI file in order to correctly perform a bar separation process to properly split rhythmic patterns (since the measure length is considered the minimum element for the composition of music in this work, as in [20]). The information that will be used to this end will be the inter onset intervals (IOI) of the notes. Note that the estimated tempo will be used only in the analysis stage. The tempo is neither learned nor stored, since the same musical composition can be played with different tempos.

The estimation stages are described next.

2.1.1. Tempo estimation

The proposed algorithm for tempo estimation is based on the work presented in [10]. However, in our scenario, there is no guarantee that a percussive signal will be available to help the tempo estimation process (observe that this type of signal is commonly periodic, so that the pulse can be easily estimated). In our case, the analysed IOIs are directly extracted from the melody, which does not necessarily follow a stable periodic beat. However, the relation between IOIs will be a divisor of the fundamental beat, or tactus [28,29].

Initially, the IOI with the lowest value in the histogram, which represents the appearance of IOI values of a certain melody, will be used to approximate the beat (it will be the initial value of the fine search that will be defined later, see Fig. 4). However, some considerations must be made:

- Rests are not explicitly extracted. MIDI files contain information about the notes solely (Note On and Note Off events [24]). However, resting periods are indirectly extracted and taken into consideration to properly estimate the tempo.
- The tactus estimated can be too short or too long. Nevertheless, the value estimated will be a multiple or a divisor of the actual tactus. Considering a certain logical range for the tempo, this value can be adjusted.

Because of the discrete nature of the histogram [10], the tactus obtained will differ from the real one, so a novel post-processing or fine adjustment of the tactus, based on the minimization of the onset deviations, is performed. The method proposed in [10,1] is based on the evaluation of a cost function for each of the tactus candidates defined in a temporal window around the initial estimation. The tactus that attains the lowest error is selected. In order to achieve the desired accuracy, the time step between candidates should be small, and the time window around the initial candidate should be wide (related to the histogram resolution). Thus, this scheme is computationally expensive. This fact is a main drawback when a large number of long melodies are used for training.

The method proposed in this paper reduces the computational load by means of a multi-resolution analysis scheme of the candidate temporal window. The algorithm proposed evaluates only a subset of the candidates in the selected temporal window.² The algorithm divides the window into four intervals. For each interval, 5 candidates (equally separated) are evaluated. The interval with the tactus candidate that attains the lowest error is divided again into 4 zones and, like in the previous step, 5 candidates per zone are evaluated. This process is repeated 6 times. These parameters were heuristically selected after performing a large number of tests.

² The time window is the same as in [1]: 66% of the initial tactus.

The cost function employed computes, for each onset, the time deviation between the real onset (obtained from the MIDI) and the nearest onset when a certain beat is assumed. Thus, the normalized cumulative temporal deviation of the j-th candidate (or, simply, error, e_j) is defined as follows:

$$e_j = \frac{1}{T_j}\sum_{i=1}^{n}\min_{k}\left|\,t_j(k) - IOI_i\,\right| \qquad (1)$$
where n is the number of notes in the MIDI stream, T_j is the j-th tactus candidate, t_j(k) is the multiple of the j-th beat candidate that is the nearest one to IOI_i, and IOI_i is the i-th inter onset interval. The objective is to find the tactus that minimizes e_j to define the final tempo.

Note that, since smaller tactus generate smaller deviations, the error function is normalized by the candidate value. Fig. 5 shows a comparison between normalized and non-normalized errors.

The proposed algorithm is both less computationally expensive and more accurate than the approach in [1]. In Table 1, a comparison between the algorithm proposed and the one in [1] is shown. Our system reduces the number of cost function evaluations significantly (specially for large tactus values) and improves the resolution (separation between consecutive candidate tactus).

In Fig. 6, the cost function for the tactus candidates is presented together with the initial tactus estimation and the optimized one.

The method described is useful for general MIDI files: MIDI files synthetically generated (quantized tempo) and actual performances, in which the note durations have slight variations due to the human performer.

Once the tactus is found, it has to be associated with a certain rhythmic element in order to obtain the duration of the quarter note (by convenience), so as to express the tempo in quarters per minute. Due to the wide range of tempos in music, from Largo (40–60 quarters per minute) to Presto (180–200 quarters per minute) [30], the tempo range needs to be restricted in order to establish an accurate relation between durations and musical elements. This requisite is illustrated by the following observation: a tactus of 0.25 s can correspond to a sixteenth note at 60 quarters per minute, and to an eighth note at 120 quarters per minute. We will assume that the tempo of the training data is Moderato (76–108 quarters per minute). Nevertheless, the system will generate a mapping of figures by means of the calculation of the duration of each rhythmic figure using the set range. If the number of quarter notes per minute is between 76 and 108, then the mapping will be as shown in Table 2.

Fig. 5. Comparison between normalized (top) versus non-normalized (bottom) cumulative temporal deviation (deviation vs. candidate tactus in seconds).

Fig. 6. Global temporal deviation for several tactus candidates. The initial tactus value obtained in the first stage (tactus_o = 0.2 s) attains a global error of 4.0485. The optimized tactus (tactus_opt = 0.1508 s) attains an error of 0.0514.

Table 1. Performance comparison between tactus estimation schemes. The tactus is measured in seconds. The comparison was made using samples created by using a software tool and extracted from [27]. The rows labeled as 'a' correspond to the performance of the algorithm in [1], and the ones labeled as 'b' correspond to the proposed system.

| Sample | Tactus | Initial tactus | Optimized tactus | Error (Eq. (1)) | # Evals | Resolution (s) |
| #1a | 0.5 | 0.5979 | 0.4842 | 3.47 | 401 | 1.00·10⁻³ |
| #1b | 0.5 | 0.5979 | 0.4996 | 1.98 | 120 | 1.95·10⁻⁵ |
| #2a | 0.25 | 0.3479 | 0.2497 | 11.66 | 234 | 1.00·10⁻³ |
| #2b | 0.25 | 0.3479 | 0.2499 | 9.46 | 120 | 1.38·10⁻⁵ |
| #3a | 0.25 | 0.2897 | 0.2503 | 6.35 | 195 | 1.00·10⁻³ |
| #3b | 0.25 | 0.2897 | 0.2500 | 5.2457 | 120 | 9.58·10⁻⁶ |
| #4a | 0.15 | 0.1856 | 0.1505 | 1.184 | 125 | 1.00·10⁻³ |
| #4b | 0.15 | 0.1856 | 0.1499 | 0.986 | 120 | 6.07·10⁻⁶ |
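The following sketch illustrates the cost function of Eq. (1) and the coarse-to-fine candidate search just described (four zones, five candidates per zone, six passes). It is our own reading of the procedure, not the authors' code, and the parameter names are hypothetical.

```python
# Sketch of the tactus refinement: Eq. (1) plus the multi-resolution search.

def error(tactus, iois):
    """Normalized cumulative deviation (Eq. (1)): distance of every IOI to the
    nearest positive multiple of the candidate tactus, summed and divided by it."""
    return sum(abs(max(1, round(ioi / tactus)) * tactus - ioi) for ioi in iois) / tactus

def refine_tactus(initial, iois, window=0.66, zones=4, per_zone=5, passes=6):
    """Split the search window into zones, evaluate a few candidates per zone and
    recurse on the zone containing the best candidate."""
    lo, hi = initial * (1 - window / 2), initial * (1 + window / 2)
    best = initial
    for _ in range(passes):
        step = (hi - lo) / (zones * per_zone)
        candidates = [lo + (i + 0.5) * step for i in range(zones * per_zone)]
        best = min(candidates, key=lambda t: error(t, iois))
        zone = (hi - lo) / zones
        index = min(int((best - lo) / zone), zones - 1)
        lo, hi = lo + index * zone, lo + (index + 1) * zone
    return best

iois = [0.31, 0.15, 0.46, 0.15, 0.29, 0.60]   # toy inter onset intervals (s)
print(refine_tactus(0.2, iois))
```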
Table 2. Rhythmic figure assignment of the tactus for note duration x (in seconds), according to the established tempo range (76–108 quarters per minute); 0.56 = 60/108 and 0.78 = 60/76 are the quarter-note durations at the extremes of the range.

| Duration (s) | Rhythmic figure assigned |
| x > 4·0.78 | Too long |
| 4·0.56 < x < 4·0.78 | Whole note |
| 2·0.56 < x < 2·0.78 | Half note |
| 0.56 < x < 0.78 | Quarter note |
| (1/2)·0.56 < x < (1/2)·0.78 | Eighth note |
| (1/4)·0.56 < x < (1/4)·0.78 | Sixteenth note |
| x < (1/4)·0.56 | Too short |

If the tactus obtained is labeled as too short or too long (Table 2), then the system assumes that the tempo was incorrectly estimated and a new range will be considered. If the tactus is too short, the tempo range will be raised to Allegro values (105–132 quarters per minute). On the contrary, the range will be decreased to Adagio (40–80 quarters per minute). As stated in [31], the estimation algorithm often estimates a doubled or halved tempo. This behavior is caused by the perceptual tempo concept [32]. This means that the estimation of the real tempo is subjective: some editors may use a shorter rhythmic figure and reduce the original tempo, while others can do the contrary for the same composition (with the same performance velocity). So, this type of 'error' in the tempo estimation scheme is not really important.

In order to evaluate the algorithm proposed, two tests have been performed. The first experiment was designed to evaluate the accuracy of the tempo estimator leaving aside the influence of the human performance. In other words, the input dataset was a group of 30 s MIDI files (extracted from [27]), with known tempo, synthetically created (using Steinberg's Cubase 5.0) so that the duration of each of the rhythmic elements was exactly quantified according to the tempo (e.g. if the tempo is 100 bpm, then the quarter note duration is exactly 0.6 s). The results of the first set of experiments are shown in Table 3.

The second experiment uses MIDI transcriptions of actual performances. In this case, the durations of the rhythmic figures are not quantized. Actually, they are affected by normal deviations derived from the human interpretation. This means that the onset positions are not perfectly aligned with the beat and the durations are not exact multiples or divisors of the tactus.

This second experiment is carried out using MIDI files corresponding to the same melodies as in the previous experiment but, in this case, played by a human performer and recorded using a MIDI controller (Kawai CA91). The results of this experiment are shown in Table 4. It can be observed that the system accurately estimates the tempo without a percussive reference and with non-quantized IOIs.

Table 3. Evaluation of the tempo estimator. Experiment 1: synthetic MIDI files with quantized durations. This table presents the real tempo, the estimation obtained by the algorithm and the number of inter onset intervals used to generate the histogram.

| Sample | Actual tempo | Estimated tempo | # IOIs used |
| #1 | 125 | 124.75 | 89 |
| #2 | 125 | 124.85 | 87 |
| #3 | 125 | 124.83 | 90 |
| #4 | 125 | 124.21 | 117 |
| #5 | 100 | 100.15 | 92 |
| #6 | 100 | – | 120 |
| #7 | – | 99.27 | 91 |
| #8 | 90 | 90.02 | 101 |
| #9 | 90 | 89.84 | 65 |
| #10 | 90 | 89.99 | 54 |
| #11 | 85 | 84.98 | 61 |
| #12 | 85 | 84.87 | 62 |

Table 4. Evaluation of the tempo estimator. Experiment 2: non-quantized MIDI files obtained from actual human performance. This table shows the real tempo, the estimation obtained by the algorithm and the number of IOIs used to generate the histogram.

| Sample | Actual tempo | Estimated tempo | # IOIs used |
| #1 | 125 | 124.15 | 89 |
| #2 | 125 | 124.05 | 87 |
| #3 | 125 | 124.28 | 90 |
| #4 | 125 | 123.97 | 117 |
| #5 | 100 | 100.76 | 92 |
| #6 | 100 | – | 120 |
| #7 | – | 98.74 | 91 |
| #8 | 90 | 90.15 | 101 |
| #9 | 90 | 89.49 | 65 |
| #10 | 90 | 89.67 | 54 |
| #11 | 85 | 84.26 | 61 |
| #12 | 85 | 84.02 | 62 |

After the tempo has been obtained, the duration of the metric figures is well known and it is possible to decode a MIDI file into a sequence of musical figures. Specifically, some musical rhythmic features can be extracted in order to estimate the most probable time signature. Using that information, the MIDI file can be split into measures, which are the basic ingredients required by our scheme to compose music.

2.1.2. Time signature estimation

The estimation of the time signature is based on the detection of repeated structures in a melody. The repetition of melodic and rhythmic patterns is a common practice when composing music. In our scheme, bar repetitions will be observed by using a multi-resolution analysis [33,34] to find the bar length that best suits the input melody among a set of candidates. The main idea is to split the melody into bars of different tentative lengths.

After performing the different splits, some descriptors related to a statistical analysis of the split bars are extracted. Our scheme uses the Rhythm Self Similarity Matrix, RSSM, as described in [35]. The rhythmic elements will be split into the number of tactus corresponding to their durations in order to build the RSSM using the tactus as measurement unit. Thus, the inter onset interval (IOI) of each note is divided into tactus (beats), as illustrated in Fig. 7.

Fig. 7. Example of the division into tactus of different rhythmic elements.

Each figure subdivision will be labeled according to the duration of the figure (number of beats). Rhythmic elements shorter than the pulse will be labeled 0. The rests are included in the inter onset intervals (IOI). In the following equation, a numeric representation of this labeling procedure, using the example shown in Fig. 7, is presented:

$$IOI = [0.5,\ 0.375,\ 0.125] \Rightarrow IOI' = [4, 4, 4, 4, 3, 3, 3, 1] \qquad (2)$$

where IOI is the original inter onset interval vector and IOI' is the labeled beat vector: 0.5 is divided into four beats and labeled as 4 (assuming that the pulse corresponds to an eighth note, a half note is subdivided into 4), 0.375 as 3, and 0.125 as 1. So, IOI' can be defined as:

$$IOI' = [IOI'(1), \ldots, IOI'(j)] \quad \text{for } j = 1, \ldots, M \qquad (3)$$
where IOI'(j) is the j-th tactus from the transcription and M is the number of tactus in the input MIDI file.

The inter onset interval vector obtained is split according to the length of the different beat candidates. These lengths will correspond to a number of beats ranging from k = 2 to k = 12 (note that the candidate index is the same as the measure length associated with the tactus candidate: λ_k = k, where λ_k is the measure length associated with the k-th tactus candidate). These values were defined according to the most common time signature numerators [36].

This process will create a set of vectors called Bar Split Vectors, denoted BSV_i^k and defined as:

$$BSV_i^k = [IOI'((i-1)\lambda_k + 1), \ldots, IOI'(i\,\lambda_k)] \quad \text{for } i = 1, \ldots, N \qquad (4)$$

where k indexes the tactus candidate, i identifies the measure vector obtained from the splitting of IOI' at the i-th place for the bar length associated with the k-th tactus candidate, λ_k, and N is the number of measures found.

After splitting IOI' into bars for all the candidates, the self similarity matrix, RSSM_k, corresponding to the extracted bars will be computed. The matrix defined in this work is boolean: the element in the i-th row and j-th column, RSSM_k(i, j) (i, j ∈ [0..N], where N is the number of split bars), will store the similarity between the i-th and the j-th bars, which is defined as:

$$RSSM_k(i,j) = \begin{cases} 1 & BSV_i^k = BSV_j^k \\ 0 & BSV_i^k \neq BSV_j^k \end{cases} \qquad (5)$$

RSSM examples of a certain content are shown in Fig. 8.

Fig. 8. Example of k Rhythm Self-Similarity Matrices (RSSM). The elements in white represent the repeated bars. The elements in the main diagonal are not taken into account for the analysis, since these elements compare a bar against itself.

Using the different RSSMs computed, the following descriptors are extracted:

- Number of repeated bars: cardinal that counts the bars repeated in the split performed for a certain length candidate. The most repeated bar should perform a proper separation with higher probability.
- Number of repeated bar instances: average number of instances per repeated bar. The larger the number of instances of a particular bar, the more probable the separation will be considered.
- Number of ties between bars: number of notes divided between two bars. The lower the number of ties between bars, the more probable the separation should be. However, note that it is common to find ties between bars in some scores. Thus, this descriptor is not critical; nevertheless, it can provide a useful piece of information: a large number of ties between bars is a sign of wrong separation.
- Number of detected bars: the number of bars detected is not a strict constraint; however, the number of bars is a power of two in most cases.

Observe that the longer the bar candidate, the lower the number of detected bars for the same IOI'. In order to take this fact into account, the number of repeated bars, bar instances and ties is normalized by the total number of bars detected.
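A compact sketch of how the bar split vectors and the boolean RSSM of Eqs. (4) and (5) can be computed is given below; it is an illustration under our own assumptions, and the helper names are not from the paper.

```python
# Sketch of Eq. (4) (bar split vectors) and Eq. (5) (boolean RSSM).

def bar_split_vectors(ioi_beats, bar_length):
    """Split the beat-labeled vector IOI' into consecutive bars of bar_length beats."""
    n_bars = len(ioi_beats) // bar_length
    return [tuple(ioi_beats[i * bar_length:(i + 1) * bar_length]) for i in range(n_bars)]

def rssm(bars):
    """RSSM(i, j) = 1 when bars i and j contain exactly the same beat labels."""
    return [[1 if bi == bj else 0 for bj in bars] for bi in bars]

# The example of Eq. (2): IOI' = [4, 4, 4, 4, 3, 3, 3, 1], tried with a 4-beat bar.
ioi_prime = [4, 4, 4, 4, 3, 3, 3, 1]
for row in rssm(bar_split_vectors(ioi_prime, 4)):
    print(row)
```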
In order to classify the time signature using these descriptors, we make use of the classifier based on the Sequential Minimal Optimization (SMO) algorithm [37] of Weka [38]. The classifier was trained with 10-fold cross validation on a dataset comprising 100 instances. These instances were extracted after the analysis of 100 MIDI files coming from the human performance of 100 different scores using a MIDI keyboard (Kawai CA91). The scores were obtained from music textbooks studied during official music studies at Spanish conservatories [27].

The time signature of these instances was manually labeled according to the number of tactus per measure. In this process, two important considerations must be taken into account. First, since the goal of the time signature module is to find the optimal measure length rather than to identify the actual time signature, signatures like 2/2 and 4/4, or 3/4 and 6/8, will be considered to be equivalent, since their measure length is the same. Second, the tactus can vary from score to score, depending on the music score edition: a score with an eighth note as tactus can be rewritten using a quarter note as tactus. So, there will be cases where the tactus estimated is doubled or halved, and the number of tactus will not strictly coincide with the ground-truth. These cases could be considered a mistake; however, a more relaxed criterion can be used, since the main objective of this stage is to define a bar length to divide a composition into measures with musical meaning.
Thus, the estimation of linear combinations (specially powers of two) of the length of the actual measure is acceptable, because the patterns discovered will be meaningful from the musical point of view, as they will belong to the same rhythmic structure [36]. The distribution of the ground-truth time signatures used for training is shown in Fig. 9.

Fig. 9. Distribution of the time signatures used during the training of the time signature estimation scheme (number of instances per time signature numerator, from Two to Twelve). The samples used were extracted from an advanced level conservatory textbook [27].

The cross-validation process attained a success rate of 69% in the classification of nine bar lengths. In Table 5, the confusion matrix is presented. It must be noticed that most of the mistakes appear between doubles and halves.³ If these results are considered as correct, then the success rate becomes 93%.

Table 5. Confusion matrix of the time signature (bar length) estimation using a multi-resolution approach. The algorithm used for the classification was SMO with 10-fold cross-validation; most off-diagonal entries correspond to doubled or halved bar lengths.

³ In the case of the Three, Twelve is four times Three, so it can also be considered a correct combination of measures.

2.2. Rhythm extraction

Rhythm is probably the musical aspect most closely related to the definition of the structure of the composition. Also, the particular use of special rhythmic sequences like syncopations and ornaments can be related to the definition of music styles.

The rhythm estimation system designed works with any time structure. Note that if the tempo or time signature were not correctly obtained, the rhythmic patterns that will be found will not be the correct ones; however, they will be perfectly useful for the composition stage.

The parameters obtained by the tempo estimator module (see previous section) are used to quantize the duration of the IOI' of the MIDI file. In order to coherently store different patterns obtained with distinct tempos (different MIDI files with different speeds), the patterns are scaled to a common convenient tempo: 240 bpm (any other value could have been selected). At this speed, a quarter note lasts 0.25 s and a whole note 1 s. Thus, the scaling transformation can be expressed as:

$$T' = T\,\frac{1}{4}\,\frac{bpm}{60} \qquad (6)$$

where T' is the new duration of the figures, T is the original duration in seconds in the MIDI file, and bpm is the actual tempo in quarters per minute.

At this point, only the IOI information is obtained; rests are missed. In order to identify all the rhythmic elements in the melody, including the rests, some processes should be done: first, the difference between the Note Off and Note On events is used to determine the duration of the note, and the difference between the Note On event and the previous Note Off event is used to define the separation between two notes. Second, the duration information vector is created by the concatenation of the duration samples together with the separation information. If the separation is 0, no extra element is added; otherwise, the corresponding duration is added and treated as a common note (in the duration vector). At the same time, a pitch vector is generated. Both vectors have the same length and the rests are identified with null pitch in the pitch vector.

Now, splitting into measures is easily accomplished by applying thresholds to the cumulative sum of the values in the duration vector and using the outcome of the Tempo Estimation Module [30]. The measure splitter can face two different cases: (a) the accumulation of durations equals the threshold and (b) the accumulation overpasses the threshold. In the first case, the measure splitter immediately divides the bar, since the notes detected make a complete measure. The second case implies the existence of a tie between bars. However, as the composition method proposed in this work is based on the relocation of complete measures, the note durations must be fixed by removing the tie through the bar line. The note tied across the bar line will be split into two notes: one with the proper duration to complete the previous bar and another one with the remaining duration, which will be part of the following bar.
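The two operations just described, the tempo normalization of Eq. (6) and the threshold-based measure splitting with tie removal, can be sketched as follows. This is our own simplified reading of the procedure; the helper names are not the authors'.

```python
# Sketch of Eq. (6) and of the measure splitter with tie removal.

def normalize_duration(t_seconds, bpm):
    """Eq. (6): rescale a duration so every file refers to 240 bpm
    (quarter note = 0.25, whole note = 1, in the stored representation)."""
    return t_seconds * (1 / 4) * (bpm / 60)

def split_into_measures(durations, measure_length, eps=1e-9):
    """Cut a duration vector into measures; a note crossing the bar line (a tie)
    is split into a part that completes the bar and a remainder."""
    measures, current, fill = [], [], 0.0
    for d in durations:
        while fill + d > measure_length + eps:      # tie across the bar line
            head = measure_length - fill
            current.append(head)
            measures.append(current)
            current, fill, d = [], 0.0, d - head
        current.append(d)
        fill += d
        if abs(fill - measure_length) < eps:        # bar exactly completed
            measures.append(current)
            current, fill = [], 0.0
    if current:
        measures.append(current)
    return measures

# A 4/4 measure (length 1.0 in whole-note units) with a half note tied over the bar.
print(split_into_measures([0.5, 0.25, 0.5, 0.75], 1.0))
# -> [[0.5, 0.25, 0.25], [0.25, 0.75]]
```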
The rhythmic patterns obtained by the splitting scheme are stored. Every time a rhythmic motive is repeated, the new pitch contour is linked to the rhythmic instance. Patterns with more contour versions have appeared more times, and they will be selected with higher probability than others by the composer system, replicating the probabilistic model of the rhythmic patterns in the composed melodies. In Section 3, the probabilistic model established at this point will be used for the pattern selection routine, since the random behavior of the rhythmic content should be replicated in the output musical compositions.

2.3. Pitch contour extraction

The pitch information is easily obtained from the MIDI messages. However, the pitch contour [30] is more important for the generation scheme than the notes themselves. As a matter of fact, the exact notes are not really necessary, because a harmony corrector has been implemented in the melody generator to adapt the melody to the chord progression.

In order to maintain the personality and the style of the reused motives, the pitch contour must be preserved [39]. However, the
428 C. Roig et al. / Knowledge-Based Systems 71 (2014) 419–434use of variations instead of the unmodified patterns provides fur- In order to fix melodic gaps between patterns, the systemther flexibility. The patterns can be adapted to new harmonies makes use of an automatic method for the evaluation of the musicand the output melody can be set up to any desired key signature. expectancy and the adaptation of the melody based on Schellberg’s simplification of Realization–Expectation model [41]. This method In our system, each discovered pitch contour defines a usable will be described in depth in further sections.pitch contour for a certain rhythmic pattern of a given style. Recallthat the appearance rate of the rhythmic patterns is learned since it Finally, an automatic method for harmonic mutation isis crucial for the definition of the music style [26]. Conversely, the implemented. The usage of a chord-based database would restrictmeasured probability of appearance of the usable pitch contour, is the utilization of certain patterns to the cases in which the targetnot considered. Thus, all the possible pitch contours for a certain chord is found. This approach would require a massive trainingrhythmic structure and style are considered equally likely. with melodic patterns in order to have enough elements in the database to guarantee a sufficient level of variability in the output3. Melody generator melodies. Furthermore, the Narmour’s expectancy adjustment would not be as accurate as required. So, a harmonic adapter has After the analysis stages, a database with rhythmic and pitch been implemented. The designed approach is based on theinformation and predefined chord progressions will be available. harmonic adaptation method based on level changing [42] definedThe Melody Generator will use all this data to create new melodies in the classic music theory literature.replicating the style of the songs analysed. The Melody Generatordesigned is based on the concatenation of patterns in the database. Thus, the process to create a melody using the learned database starts with the definition of the score parameters (time signature, The user can set the size of the score, the initial tonality (the tempo, tonality, length) (step 1 in Fig. 10) together with the rhyth-proposed system is able to introduce tone modulation [36] by mic and harmonic structure. The selection of time signaturechanging the tonality along the score), the time signature and implies the selection of a subset of the database for the subsequentthe database of parameters corresponding to a certain style. Fur- steps (see Fig. 2) according to the information extracted from thethermore, rhythmic repetitions and harmony progression can be samples analysed. Then, the required rhythmic patterns aredefined by the user or selected from a list of predefined sequences. selected to fill the score slots (one pattern per each measure inThe specific parameters that can be selected by the user are pre- the score). For each rhythmic pattern a contour is selected fromsented in Table 6. Note that the user can select the tempo and the database. Next, the melodic line gaps are fixed using the expec-the instrument that will be used to play the generated melody. tancy method. Later, the harmony is adapted. A graphical represen-However, this choice is not actually considered part of the melody tation of this process is depicted in Fig. 10, a block diagram isgeneration scheme since the same melody can be played with dif- presented in Fig. 
11 and details on each of the steps will be givenferent instruments and tempos. in the next sections. The system makes use of some rules to guide the pattern selec- 3.1. Harmony progressiontion process, ensure harmony adaptation and guarantee the conti-nuity of the pitch contour. The user can design a particular chord progression from scratch. This choice would probably, restrict the usability of the composi- Concerning the pattern selection, our main assumption for style tion scheme to musicians and people with strong musical back-replication is based on the probabilistic model of the rhythmic pat- ground (the chord progression is a very important parameter forterns and the selected pitch contours. On the one hand, we assume the musical success, there are combinations of chords that do notthat there are some basic rhythmic patterns that appear indepen- sound well together while others do [36]). So, a library of harmonicdently of the style of music, but there are also some particular progressions that sound well together has also been defined [43].rhythmic structures that commonly appear in a particular style. The selection of harmonic progressions is performed at step 2 inOn the other hand, we assume that all the pitch contours found Fig. 10.during the analysis stage for a certain style and rhythmic patternare equally probable [26]. So, the histogram of the rhythmic The predefined progressions used in our system follow thepatterns is obtained (see Fig. 12). Then, the selection of rhythmic Western music theory and have been taken from [43]. These arepatterns will be performed randomly, according to the probability presented in Table 7.distribution derived from the histogram found for a certain musicstyle. Afterwards, a suitable pitch contour will be selected. The fact that some harmonic progressions sound well while others do not depends on the listener’s expectation [40], which is A main problem found when concatenating independent pat- related to the cultural environment and the listener’s preferences.terns is the melodic gap between them. This gap is perceived According to some harmonic progression rules in Western classicalannoying by both trained and untrained musicians. According to music, tonic chords (I and vi) can evolve towards any chord, dom-Narmour’s Realization–Expectation model [40], there is a number inant chords (V, vii, and sometimes iii) must go to tonic and preof facts that objectively make the listener like or dislike the music dominant chords (ii and IV) must go to a dominant chord. The pre-played. This is related to the expectancy. If the sequence of tones is defined progressions used in our system (Table 7) follow Westernunexpected, it will be more likely that the listener will dislike the music theory rules and have been taken from [43]. Note that thesemelody, and vice versa. rules cannot satisfy the expectation of a non-Western listener, and they could perceive that this progression does not sound well.Table 6Parameters selectable by the user. Additionally, a set of rhythmic structures at phrase level is also provided. These structures can be selected by the user and, then,Global level Phrase level Measure level the composition configuration will be set. In Table 8, the set of Predefined Rhythmic Pattern rhythmic structures provided is presented. Note that the configu- Initial Tonality Tonality ration can be changed by the user. 
However, the structures pro- Time Signature Predefined Harmonic Pattern Chord vided are suggested to the user because they are commonly used. Number of Bars Rhythmic Pattern Style Database (pattern repetitions) Finally, in order to adapt the patterns selected, an innovative Tempo algorithm for melody transposition based on music theory rules Instrument has been designed (step 4 in Fig. 10). This algorithm will be described in Section 3.3.
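The progression rules quoted above (tonic chords may move anywhere, dominant chords must resolve to the tonic, predominant chords must move to a dominant chord) can be illustrated with a small validator. This is a hypothetical helper for illustration only; the system itself relies on the predefined progressions of Table 7 rather than on such a check.

```python
# Illustrative check of the Western progression rules mentioned above
# (simplified; chord symbols follow the paper's Roman-numeral notation).

TONIC = {"I", "vi"}
DOMINANT = {"V7", "vii", "iii"}
PREDOMINANT = {"ii", "IV"}

def follows_rules(progression):
    for current, nxt in zip(progression, progression[1:]):
        if current in DOMINANT and nxt not in TONIC:
            return False                      # dominant must resolve to tonic
        if current in PREDOMINANT and nxt not in DOMINANT:
            return False                      # predominant must move to dominant
    return True

print(follows_rules(["I", "ii", "V7", "I"]))   # True
print(follows_rules(["I", "IV", "vi", "I"]))   # False: IV not followed by a dominant
```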
Fig. 10. Graphical representation of the composition phases, from database to final melody.

3.2. Pattern selection

Since the database contains several rhythmic patterns organized per time signature, a subset of usable patterns that fulfil the time signature requirement will be selected. Among these patterns, several ones will be chosen according to the established rhythmic structure (step 2 in Fig. 10). For example, if the rhythmic structure is A–B–B–A, only two patterns will be searched in the database; if the structure is A–B–A–C, like in Fig. 10, three different patterns will be searched.

The previous establishment of the rhythmic structures means that the rhythms of the measures labeled with the same letter will be based on the same rhythmic pattern.

In the next sections, the pattern selection process for each measure will be explained.
Fig. 11. Scheme of the melody generator.

Table 7. Predefined harmony progressions used for melody generation, for both major and minor tonalities.

| Major | Minor |
| I I IV V7 | i iv i iv |
| I IV V7 I | i iv V7 i |
| I V7 IV V7 | V7 iv i i |
| I V7 ii IV | ii° V7 i i |
| I V7 IV IV | V7 i V7 i |
| V7 IV I I | i iv V7 i |
| I iii IV V7 | i ii° i ii° |

Table 8. Predefined rhythmic structures used for melody generation.

| Rhythmic structure (4 measures) |
| AAAA |
| ABAB |
| ABBA |
| AABB |
| AABC |
| ABCA |
| ABCD |

Fig. 12. Histogram of the number of versions of each pattern found in a dataset with advanced academic music style samples [27] (number of versions per pattern vs. pattern identifier).

3.2.1. Rhythmic pattern selection

The selection of rhythmic patterns is random, using the estimated probabilistic pattern distribution (see Fig. 12 as an example of a pattern distribution). The rhythmic patterns will be obtained according to the conditional rhythmic probability density function (pdf) learned from the analysis phase:

$$f(R\,|\,S) = \sum_{i=1}^{R_s} \frac{N_{R_i}}{N_{R_T}}\,\delta(R - R_i) \qquad (7)$$

where S represents the music style, R_s is the number of different rhythmic patterns discovered for the music style S, N_{R_i} is the number of times the pattern R_i appeared in the training set and N_{R_T} is the total number of rhythmic patterns in the training set of style S. δ(·) is the Dirac delta function.

Samples of a random variable defined by the pdf in Eq. (7) are drawn to select the rhythmic patterns that will be used by the melody generator. In order to obtain more natural results, the control module will not select the same patterns for measures labeled differently (A, B, C, ...). Thus, the number of different rhythmic patterns to select will be defined according to the initial score configuration (step 3.1 in Fig. 10).
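The pattern selection step can be sketched with the standard library sampler; this is an illustration under our own assumptions, not the authors' code. The uniform contour choice discussed in the following subsection is included for completeness.

```python
# Sketch of drawing a rhythmic pattern from the style-conditional pdf of Eq. (7)
# and then choosing one of its stored pitch contours uniformly at random.
import random

def select_rhythm(pattern_counts):
    """pattern_counts: {rhythmic pattern: number of training instances} for one style."""
    patterns = list(pattern_counts)
    weights = [pattern_counts[p] for p in patterns]
    return random.choices(patterns, weights=weights, k=1)[0]

def select_contour(contours):
    """All contours learned for the chosen rhythmic pattern are equally probable."""
    return random.choice(contours)

style_counts = {(0.25, 0.25, 0.25, 0.25): 12, (0.5, 0.25, 0.25): 5, (0.5, 0.5): 3}
contours = {(0.25, 0.25, 0.25, 0.25): [(60, 62, 64, 65), (67, 65, 64, 62)],
            (0.5, 0.25, 0.25): [(60, 64, 67)],
            (0.5, 0.5): [(60, 67)]}
rhythm = select_rhythm(style_counts)
print(rhythm, select_contour(contours[rhythm]))
```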
3.2.2. Pitch contour selection

After each measure in the structure has been assigned a rhythmic pattern, a pitch contour must be selected for each measure (step 3.2 in Fig. 10) among the ones available (level three of the database structure shown in Fig. 2).

As already mentioned in Section 2, the relative frequencies of the pitch contours for each rhythmic pattern and style are not stored. Thus, all the pitch contours discovered for each rhythmic pattern and music style are considered to be equally probable for the composition. Then, the following conditional probability density function of the pitch contour C given the rhythmic pattern R and the music style S is defined:

$$f(C\,|\,R,S) = \frac{1}{N_{C_T}} \sum_{i=1}^{N_{C_T}} \delta(C - C_i) \qquad (8)$$

where N_{C_T} is the number of different pitch contour patterns discovered in the training set for the rhythmic pattern R and the music style S. C_i represents each of the usable pitch contour patterns for each rhythmic pattern in each style. Observe that the set of usable pitch contours is different (with different size and different elements) for each rhythmic pattern in each style.

A sample of a random variable that behaves according to the conditional pdf in Eq. (8) is drawn for each measure to select the pitch contour to use for the composition.

At this stage, the pitch progression selected may not (very likely will not) be in accordance with the proper harmony set up. At a later stage, the chord transposition system will adapt the selected contour to the harmony progression and it will ensure the continuity of the melodic line (step 4 in Fig. 10).

3.3. Chord transposition

The chord transposition system performs the proper changes to the notes to ensure that the harmony is the one selected and to guarantee the continuity of the melodic line according to the expectation model [40,41]. Our approach implements a level changing harmony adaptation method [42] based on music theory, plus additional constraints derived from the expectation model (stage 4 in Fig. 10).

As stated before, the simple concatenation of independent patterns, like the ones selected according to the method described in Section 3.2.1, causes the appearance of transitions that do not sound natural to both trained and lay listeners. A simplification of the adaptation of Narmour's Implication–Realization model of expectation [40], done by Schellenberg [41], will be used to handle this issue. The idea is to generate a Narmour candidate that properly follows the melodic line in the posterior measure, after the harmonic adaptation. Thus, after a measure is completed, the first note of the following measure (under the hypothesis that the first note in the measure is a harmonic note) will be the nearest chord note to the candidate.

In order to generate each new note, the system carries out perceptual analyses of the last two notes of each measure. In this context, perceptual means that they are relevant for the expectation of the continuation of the melody. These two notes (implication) are used to evaluate a third note (realization), which will be the candidate note [41]. Some perceptual descriptors used for the generation of the candidate note are the following [41]:

- Interval: a small interval (less than a tritone) [44] implies that the next note should follow the direction of the pitch progression. A change in the direction would not achieve the expectation.
- Pitch jump: the pitch jump after a small interval should be similar to the previous one. For example, an interval of 1 tone should be followed by another interval of 1 or 2 tones in the same direction, according to the previous rule.
- Progression of the intervals:
  – If the implication interval is less than 2 semitones, then the third note should be nearer the first note of the implication. This choice implies a change of direction of the pitch progression.
  – After a change in the direction or a large interval, the realization interval should be smaller than a tritone.

As a result, the first note of the pattern will be replaced by the candidate. If the first measure in the melody is being analysed, the first note will be replaced by the tonic. Then, the nearest progression that achieves the correct harmony will be created. The nearest progression is defined as the one that differs from the original progression in the smallest possible pitch distances and with the minimum number of note changes.

Recall that the position of the notes is key for the chord transposition stage. So, the chord transposition block first identifies the chord notes (these are considered responsible for the harmony definition) and the non-chord notes (which do not necessarily belong to the chord, commonly called passing notes) [36]. This process is done by observing the position of each note in the measure. The notes placed in downbeats will be considered chord notes; the notes placed in upbeats will be considered non-chord ones.

Note that, in some particular cases, the real chord notes are not placed in the measure downbeats [36]; however, since the most important feature is the pitch contour, the model is simplified to consider the most common case, in which the chord notes are placed in measure downbeats.

The chord transposition subsystem applies two different procedures to the two different types of notes:

- Accented notes must belong to the established chord.
  – First chord note (or Narmour candidate [40]): this note is assigned to the nearest pitch that belongs to the chord.
  – Secondary chord notes: in order to keep the original pitch contour, secondary accented notes are moved to the nearest pitch in the contour direction.
- Unaccented notes do not necessarily belong to the chord. In this case, the original interval between the previous note and the current note is replicated.

The score shown after stage 4 in Fig. 10 draws the result of the application of the chord transposition process to the pitch contours shown in the same figure immediately after stage 3.2.

When the pitch and harmony adaptation process is finished for every measure, the process of creation of a new melody is completed.

4. Evaluation

The system performance was evaluated by building an open survey⁴ based on the framework described in [15]. The survey showed three different sets of melodies, so that each participant in the survey evaluated each of the sets separately. Each set consisted of three melodies: two of them corresponded to scores composed by actual human composers and the other one was generated by the proposed system. All the melodies (as well as the accompaniment) were performed and recorded by a human musician, so that the melodies in each set shared the same playing style.

The selection of the real human and automatic melodies was done randomly. There is no special distinction between the three melody sets regarding the data sets selected or the parameters employed in the composition system.

The instructions provided to the evaluators indicated that they had to pick the melody that, in their opinion, was generated automatically, for each set of three melodies. They were informed that, in each set, two of the melodies had been composed by human composers and one had been composed by using our automatic melody composition approach. Every participant evaluated all the three sets and none of them knew the composition rules.

The melodies generated by our software were the third in the first group, the second in the second group and the second in the third group.

⁴ www.atic.uma.es/composition
As a result, the first note of the pattern will be replaced by the candidate. If the first measure of the melody is being analysed, the first note will be replaced by the tonic. Then, the nearest progression that achieves the correct harmony will be created. The nearest progression is defined as the one that differs from the original progression by the smallest possible pitch distances and with the minimum number of note changes.

Recall that the position of the notes is key for the chord transposition stage. So, the chord transposition block first identifies the chord notes (these are considered responsible for the harmony definition) and the non-chord notes (which do not necessarily belong to the chord, commonly called passing notes) [36]. This process is done by observing the position of each note in the measure: the notes placed on downbeats will be considered chord notes, and the notes placed on upbeats will be considered non-chord ones.

Note that, in some particular cases, the real chord notes are not placed on the measure downbeats [36]; however, since the most important feature is the pitch contour, the model is simplified to the most common case, in which the chord notes are placed on measure downbeats.

The chord transposition subsystem applies two different procedures to the two different types of notes (a sketch of both procedures is given at the end of this subsection):

- Accented notes must belong to the established chord.
  – First chord note (or Narmour candidate [40]): this note is assigned to the nearest pitch that belongs to the chord.
  – Secondary chord notes: in order to keep the original pitch contour, secondary accented notes are moved to the nearest pitch in the contour direction.
- Unaccented notes do not necessarily belong to the chord. In this case, the original interval between the previous note and the current note is replicated.

The score shown after stage 4 in Fig. 10 depicts the result of applying the chord transposition process to the pitch contours shown in the same figure immediately after stage 3.2.

When the pitch and harmony adaptation process is finished for every measure, the process of creation of a new melody is completed.
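A minimal sketch of these two procedures is given below, assuming MIDI pitch numbers, chords given as pitch-class sets and a measure that starts on a downbeat. The helper names and the use of the original pattern pitch (rather than the Narmour candidate computed above) for the first chord note are simplifications introduced for this illustration.

```python
def nearest_chord_pitch(pitch, chord_pitch_classes, direction=0):
    """Return the chord pitch closest to `pitch`. If `direction` is +1 or -1,
    pitches in that direction (or the pitch itself) are preferred, so the
    original contour is kept. Chord pitches are given as pitch classes."""
    candidates = [p for p in range(pitch - 12, pitch + 13)
                  if p % 12 in chord_pitch_classes]
    if direction:
        filtered = [p for p in candidates if (p - pitch) * direction >= 0]
        candidates = filtered or candidates
    return min(candidates, key=lambda p: abs(p - pitch))

def transpose_measure(notes, accents, chord_pitch_classes):
    """Adapt one measure to the active chord.
    `notes` are MIDI pitches, `accents[i]` is True for downbeat (chord) notes,
    `chord_pitch_classes` is e.g. {0, 4, 7} for a C major chord. This is a
    sketch of the two procedures above, not the exact implementation."""
    out = []
    for i, (pitch, accented) in enumerate(zip(notes, accents)):
        if accented:
            if not out:
                # First chord note: snap to the nearest chord pitch.
                out.append(nearest_chord_pitch(pitch, chord_pitch_classes))
            else:
                # Secondary chord note: nearest chord pitch in the contour direction.
                direction = 1 if pitch >= notes[i - 1] else -1
                out.append(nearest_chord_pitch(pitch, chord_pitch_classes, direction))
        else:
            # Unaccented (non-chord) note: replicate the original interval from
            # the previous note (the measure is assumed to start on a downbeat).
            out.append(out[-1] + (pitch - notes[i - 1]))
    return out

# Example: adapt a measure to a C major chord (pitch classes C, E, G).
print(transpose_measure([61, 63, 66, 65], [True, False, True, False], {0, 4, 7}))
```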
4. Evaluation

The system performance was evaluated by building an open survey (www.atic.uma.es/composition) based on the framework described in [15]. The survey showed three different sets of melodies, so that each participant evaluated each of the sets separately. Each set consisted of three melodies: two of them corresponded to scores composed by actual human composers and the other one was generated by the proposed system. All the melodies (as well as the accompaniment) were performed and recorded by a human musician so that the melodies in each set shared the same playing style.

The selection of the real human and automatic melodies was done randomly. There is no special distinction between the three melody sets regarding the data sets selected or the parameters employed in the composition system.

The instructions provided to the evaluators indicated that, for each set of three melodies, they had to pick the melody that, in their opinion, was generated automatically. They were informed that, in each set, two of the melodies had been composed by human composers and one had been composed using our automatic melody composition approach. Every participant evaluated all three sets and none of them knew the composition rules.

The melodies generated by our software were the third in the first group, the second in the second group and the second in the third group. In Fig. 13, the scores of the automatically generated melodies are shown.

Fig. 13. Generated melodies for the evaluation process.

The participants had the possibility of writing a comment about the melodies and the evaluation. According to the feedback received, most participants decided to select the melody they found less appealing as the automatically composed one. A total of 88 persons participated in the survey. According to the declarations of the participants, 17% had no musical background, while 83% had.

Table 9 shows the results of the survey. The melody set identifier refers to each of the three sets of melodies to be evaluated, incorrect identification shows the percentage of participants that picked the wrong sample, and correct identification shows the percentage of evaluators that chose the melody composed by our system. The numbers in brackets show the absolute number of evaluations in each category. Since 88 participants evaluated the three groups of melodies, the total number of tests is 264. Globally, the automatically generated melody was identified in only 33% of the experiments. Since the majority of the evaluators admitted having selected the most unpleasant sample in the set, it can be considered that the automatically composed scores not only resemble human compositions, but can also sound more appealing than certain human ones.

Table 9
Identification of melodies composed by the proposed system. Survey results.
Melody set identifier   Incorrect identification   Correct identification
1                       71% (63)                   29% (25)
2                       70% (62)                   30% (26)
3                       58% (51)                   42% (37)
Total                   67% (176)                  33% (88)

An additional subjective survey was proposed (www.atic.uma.es/stylereplication) to evaluate the performance of the system developed regarding music style replication. Three databases were generated using the described system according to three different styles: Pop, Academic and Flamenco. Table 10 describes the data used for training.

Table 10
Training data for the style replication survey.
Style      Number of songs   Description
Pop        50                Hal Leonard, Pop Score Books, 2014 [45]
Academic   30                Carl Czerny, 30 Etudes de Mécanisme, Op. 849, 2010 [46]
Flamenco   20                Various Authors, Viva el Pasodoble, 2005 [47]

In this survey, several groups of audio samples were presented. In each group, the samples labeled as training set represented data used to extract the required parameters to model the style. The samples labeled as generated set represented the melodies created by the proposed system.

Each participant was asked to give a score ranging from 1 to 5 (where 1 means different and 5 very similar) to assess the similarity between the styles of the training set and the generated set.

In Table 11, the results of this survey are presented. Forty participants gave their opinion about the similarity between training sets and generated data sets.

Table 11
Style replication survey. Score.
Style      Mean   Variance
Pop        3.60   0.7
Academic   3.20   1.3
Flamenco   3.93   1.1

On the one hand, the survey shows higher similarity results for the Flamenco set (3.93 out of 5). On the other hand, the style that attained the lowest results is the Academic set, with a score of 3.20 out of 5. The variance of the answers is high for the Academic case (σ² = 1.3) when compared against the variance obtained for the Pop style (0.7). In view of these results, three conclusions can be drawn:

- The label regarding the style of each group influenced the participants, according to the feedback received. A widely known style, like Pop, together with popular songs used in the training set, helped the participants to set the proper context in order to evaluate the similarity. On the other hand, a genre unfamiliar to most participants (Flamenco) and the ambiguous definition of the Academic style complicated the establishment of a proper context.
- The genre that attained the highest score was Flamenco. Note that this style is characterized by particular rhythmic patterns and harmonic progressions. Taking into account that the proposed system is mainly based on the analysis and identification of rhythmic patterns, we can conclude that this result of the survey confirms the good performance of the designed composer and the validity of the hypotheses employed.
- Regarding the melodic line, which is also modeled by the proposed system, it should be observed that Pop music is characterized by a continuous melodic line (with small melodic intervals), whereas Flamenco has outstanding melodic ornaments. In view of the results, these features were successfully modeled by the proposed scheme and successfully replicated by the melody composition system.

Fig. 14. Probability distribution of the rhythm patterns of the training data (top) and of a set of output compositions (bottom). [Both panels plot pattern probability against the rhythm pattern identifier.]

Finally, an objective evaluation of the style replication process has been performed. Since the music style is assumed to be defined by the distribution of the rhythmic patterns, the sample distributions of the rhythmic patterns of the training data and of a set of composed melodies have been compared. The experiment shows the goodness of the style replication scheme, since similar sample distributions are found (see Fig. 14). We performed a Pearson's chi-square test [48] to evaluate the following hypothesis: the samples from the training data and the samples extracted from the generated melodies come from the same distribution (null hypothesis). The results of this test state that the null hypothesis cannot be rejected at the commonly used significance level of 5% [49].
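A goodness-of-fit check of this kind can be sketched with SciPy as follows. The pattern counts are illustrative and do not reproduce the data behind Fig. 14.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical counts of each rhythmic pattern identifier in the training set
# and in a set of generated melodies (the figures are illustrative only).
training_counts = np.array([40, 25, 15, 10, 6, 4])
generated_counts = np.array([35, 28, 14, 12, 7, 4])

# Expected counts under the null hypothesis: the generated melodies follow the
# training distribution (rescaled to the number of generated observations).
expected = training_counts / training_counts.sum() * generated_counts.sum()

stat, p_value = chisquare(f_obs=generated_counts, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
# The null hypothesis is not rejected at the 5% level when p >= 0.05.
```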
5. Conclusion

A system that analyses melodies and extracts information about the style and the level of complexity of the melodies has been designed. An automatic music composer has been proposed that is able to create new melodies of a selected music style by using the parameters extracted by the analysis system.

It has been considered that the music style (or genre) can be defined by studying the probability distribution of the rhythmic patterns that appear in a musical composition. In [26], Temperley performed a study relating rhythm and style; following his work, sets of musical pieces have been characterized by analysing their rhythmic structures and time signatures. The developed system implements an unattended probabilistic analysis of musical features plus a novel approach to reuse these data to compose new musical content with style replication.

An open survey prepared to evaluate the system showed that fewer than 29% of the participants were able to identify at least two of the three automatically created scores among the other melodies composed by humans in the sets of melodies prepared for the evaluation survey. Also, according to the results found in the experiments, the composition scheme described successfully replicates the music style of the songs analysed.

Our scheme performs well according to the results of the subjective evaluation and to the objective evaluation of the performance of the style replication scheme. The system designed can be improved by adding other functionalities and capabilities, such as the generation of polyphonic output including melody and accompaniment, the ability to analyse polyphonic scores in order to learn new style-based accompaniments, and the addition of mechanisms to model melodic expressiveness.

Nevertheless, the system described has proved able to generate compositions that are difficult to distinguish from others created by human composers.

Acknowledgement

This work was supported by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R and by the Ministerio de Educación, Cultura y Deporte through the 'Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I-D+i 2008-2011, prorrogado por Acuerdo de Consejo de Ministros de 7 de octubre de 2011'. The work has been done in the context of Campus de Excelencia Internacional Andalucía Tech, Universidad de Málaga.

Lorenzo J. Tardón and Isabel Barbancho are very grateful to George Tzanetakis for his support during their stay at the University of Victoria, Victoria BC, Canada.
Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.knosys.2014.08.018.

References

[1] C. Uhle, J. Herre, Estimation of tempo, micro time and time signature from percussive music, in: Proceedings of the Digital Audio Effects Workshop (DAFx 2003), 2003.
[2] M. Ryynänen, A. Klapuri, Automatic transcription of melody, bass line, and chords in polyphonic music, Comp. Music J. 32 (2008) 72–86.
[3] Z. Rás, A. Wieczorkowska (Eds.), Advances in Music Information Retrieval, Studies in Computational Intelligence, vol. 274, Springer-Verlag, Berlin/Heidelberg, 2010.
[4] S. Koelsch, W. Siebel, Towards a neural basis of music perception, Trends in Cognitive Sciences 9 (2005) 578–584.
[5] R. Goldman, Ionisation; density, 21.5; integrales; octandre; hyperprism; poeme electronique, Mus. Quart. (1961) 133–134.
[6] G. Nierhaus, Algorithmic Composition: Paradigms of Automated Music Generation, Springer-Verlag, Wien/New York, 2010.
[7] M. Shan, F. Kuo, M. Chen, Music style mining and classification by melody, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2002), vol. 1, 2002, pp. 97–100.
[8] Z. Cataltepe, Y. Yaslan, A. Sonmez, Music genre classification using MIDI and audio features, EURASIP J. Appl. Sig. Process. (2007).
[9] E. Molina, Hacer música... para aprender a componer, Eufonía, Didáctica de la Música (2011) 53–64.
[10] F. Gouyon, P. Herrera, P. Cano, Pulse-dependent analyses of percussive music, in: Proceedings of ICASSP 2002, vol. 4, 2002, pp. 396–401.
[11] S. Tojo, K. Hirata, Structural similarity based on time-span tree, in: Proceedings of the 9th International Symposium on Computer Music Modelling and Retrieval (CMMR 2012), Queen Mary University of London, 2012, pp. 645–660.
[12] M. Müeller, D. Ellis, A. Klapuri, G. Richard, Signal processing for music analysis, IEEE J. Select. Topics Sig. Process. 5 (2011) 1088–1110.
[13] E. Cambouropoulos, From MIDI to traditional musical notation, in: Proceedings of the AAAI Workshop on Artificial Intelligence and Music: Towards Formal Models for Composition, Performance and Analysis, vol. 30, 2000.
[14] A. van der Merwe, W. Schulze, Music generation with Markov models, IEEE Multimedia 18 (2011) 78–85.
[15] M. Pearce, G. Wiggins, Towards a framework for the evaluation of machine compositions, in: Proceedings of the AISB'01 Symposium on AI and Creativity in Arts and Science, 2001, pp. 22–32.
[16] D. Conklin, Music generation from statistical models, in: Proceedings of the Symposium on Artificial Intelligence and Creativity in the Arts and Sciences (AISB 2003), 2003, pp. 30–35.
[17] E. Miranda, J. Biles, Evolutionary Computer Music, Springer, 2007.
[18] D. Cope, Computer modeling of musical intelligence in EMI, Comp. Music J. 16 (1992) 69–83.
[19] D. Cope, Computer Models of Musical Creativity, MIT Press, Cambridge, 2005.
[20] M. Delgado, W. Fajardo, M. Molina-Solana, Inmamusys: intelligent multiagent music system, Exp. Syst. Appl. 36 (2009) 4574–4580.
[21] O. López-Ortega, S. López-Popa, Fractals, fuzzy logic and expert systems to assist in the construction of musical pieces, Exp. Syst. Appl. 39 (2012) 11911–11923.
[22] M. Shan, S. Chiu, Algorithmic compositions based on discovered musical patterns, Multimedia Tools Appl. 46 (2010) 1–23.
[23] P. Ponce de León, J. Iñesta, Statistical description models for melody analysis and characterization, in: Proceedings of the International Computer Music Conference 2004, University of Miami, USA, 2004, pp. 149–156.
[24] MIDI Manufacturers Association, The Complete MIDI 1.0 Detailed Specification, 1996.
[25] R. Brindle, Musical Composition, Oxford University Press, 1986.
[26] D. Temperley, Music and Probability, MIT Press, Cambridge, 2007.
[27] F. Sierra, Lecciones de Entonación, Real Musical, 1992.
[28] F. Lerdahl, R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, MA, 1983.
[29] W. Fitch, A. Rosenfeld, Perception and production of syncopated rhythms, Music Percep. 25 (2007) 43–58.
[30] W. Appel, Harvard Dictionary of Music, second ed., The Belknap Press of Harvard University, Cambridge, Massachusetts, 2000.
[31] K. Seyerlehner, G. Widmer, D. Schnitzer, From rhythm patterns to perceived tempo, in: International Society for Music Information Retrieval Conference (ISMIR 2007), 2007, pp. 519–524.
[32] M. McKinney, D. Moelants, Ambiguity in Tempo Perception: What Draws Listeners to Different Metrical Levels?, University of California Press, 2006, pp. 155–166.
[33] M. Gainza, D. Barry, E. Coyle, Automatic bar line segmentation, in: 123rd Convention of the Audio Engineering Society, Convention Paper, 2007.
[34] M. Gainza, E. Coyle, Time signature detection by using a multi-resolution audio similarity matrix, in: 122nd Convention of the Audio Engineering Society, Convention Paper, 2007.
[35] J. Foote, M. Cooper, Visualizing musical structure and rhythm via self-similarity, in: Proceedings of the 2001 International Computer Music Conference, 2001, pp. 419–422.
[36] B. Benward, Music: In Theory and Practice, seventh ed., vol. 1, McGraw-Hill, 2003.
[37] J. Platt, Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Technical Report MSR-TR-98-14, Microsoft Research, 1998.
[38] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. Witten, The WEKA data mining software: an update, SIGKDD Explor. 11 (2009).
[39] W. Dowling, D. Fujitani, Contour, interval and pitch recognition in memory for melodies, J. Acoust. Soc. Am. 49 (1971) 524–531.
[40] E. Narmour, The Analysis and Cognition of Melodic Complexity: The Implication–Realization Model, University of Chicago Press, 1992.
[41] E. Schellenberg, Simplifying the implication–realization model of musical expectancy, Music Percep. 14 (1997) 295–318.
[42] D. Roca, E. Molina, Vademecum Musical, Instituto de Educación Musical, 2006.
[43] R. Ottman, Elementary Harmony: Theory and Practice, University of Michigan Press, USA, 1998.
[44] A. Yilmaz, Z. Telatar, Note-against-note two-voice counterpoint by means of fuzzy logic, Knowl.-Based Syst. 23 (2010) 256–266.
[45] H. Leonard, Hal Leonard's Pop Score Sheet Books, Hal Leonard Corporation, 2014.
[46] C. Czerny, 30 Etudes de Mécanisme, Op. 849, Peters, 2010.
[47] Various Authors, Viva el Pasodoble, Nueva Carisch España, 2005.
[48] A. Field, Discovering Statistics Using IBM SPSS Statistics, Sage, 2013.
[49] S. Stigler, Fisher and the 5% level, Chance 21 (2008) 12.