Abstract
1. Introduction
2. Acoustic Feature Extraction
3. Sequence-to-Sequence Model-Based Music Melody Synthesis
4. RNN-Based Melody Synthesis
5. Experiments and Result Analysis
6. Conclusions
Data Availability
References
Abstract
Computer music creation boasts broad application prospects. It generally relies on artificial intelligence (AI) and machine learning (ML) to generate music scores that match the original mono-symbol score model, or to memorize/recognize the rhythms and beats of the music. However, very few music melody synthesis models are based on artificial neural networks (ANNs), and some ANN-based models cannot adapt to the transposition invariance of the original rhythm training set. To overcome this defect, this paper develops an automatic synthesis technology for music teaching melodies based on a recurrent neural network (RNN). Firstly, a strategy was proposed to extract acoustic features from the music melody. Next, a sequence-to-sequence model was adopted to synthesize general music melodies. After that, an RNN was established to synthesize the music melody together with the singing melody, so as to find suitable singing segments for the music melody in a teaching scenario. The RNN can synthesize the music melody with a short delay based solely on static acoustic features, eliminating the need for dynamic features. The proposed model was proven valid through experiments.
Introduction
With the rapid development of modern computer science, many researchers have shifted their focus to computer-based algorithmic composition and automatic music melody generation systems. Research results on music melody synthesis and music modeling methods are being applied in various fields. Research on computer music creation aims to quantify and combine the emotional tendencies of music with the aid of computers and mathematical algorithms. The specific tasks include aided composition, sound simulation and storage, and music analysis and creation [1, 2]. Computer music creation generally relies on artificial intelligence (AI) and machine learning (ML) to generate music scores that match the original mono-symbol score model, or to memorize/recognize the rhythms and beats of the music. Despite its broad application prospects, AI-based composition, which does not require an extensive set of musical knowledge rules, remains at the theoretical stage [3, 4].
Speech processing has been widely applied in composition and songwriting, record production, and entertainment. Unlike simple speech synthesis, music melody synthesis has two additional processing steps: tone detection and transformation [5, 6]. Wenner et al. [7] preprocessed the musical melody synthesis corpus through automatic note segmentation and voiced/unvoiced sound recognition, constructed a high-quality music melody synthesis system, and proposed a music melody adjustment algorithm, which functions as an adaptive filter capable of detecting musical note cycles.
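A common basis for detecting musical note cycles, as in the adaptive filter described above, is short-time autocorrelation: the lag at which a frame correlates most strongly with itself corresponds to the fundamental period. The sketch below is illustrative only (not the algorithm of [7]); the function name, frame length, and frequency bounds are assumptions.

```python
import math

def detect_note_cycle(frame, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental period (note cycle) of an audio frame
    via autocorrelation. Returns the period in seconds."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove DC offset before correlating
    # Search only lags corresponding to plausible fundamental frequencies.
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(x) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlation of the frame with a lagged copy of itself.
        score = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# Usage: a 220 Hz sine sampled at 16 kHz should yield a period near 1/220 s.
sr = 16000
frame = [math.sin(2 * math.pi * 220.0 * i / sr) for i in range(800)]
period = detect_note_cycle(frame, sr)
```

In practice, production pitch trackers refine this idea with normalization and interpolation, but the lag-search above captures the core of cycle detection.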
Conclusions
Based on the RNN algorithm, this paper probes deep into the automatic synthesis of music teaching melodies. After extracting the acoustic features from music melodies, the authors established a sequence-to-sequence model for synthesizing general music melodies. To find suitable singing segments for a given music melody in the teaching scenario, an RNN was set up to synthesize the music melody together with the singing melody. After that, the convergence of different network models was compared through experiments, which verified the feasibility of our model. In addition, the results of different models were compared before and after adding the singing melody, and the difference between the melody generated by our model and the original music melody was quantified accurately. Furthermore, the phoneme duration prediction error of each model configuration was obtained through experiments, both before and after applying the time constraint. The relevant results confirm the superiority of our model over DCNN and LSTM in modeling music melody sequences.
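The recurrent modeling of melody sequences summarized above can be illustrated with a minimal sketch: a single recurrent cell carries a hidden state over the note sequence, and each generated note is fed back as the next input. This toy example is not the paper's trained model; the vocabulary size, hidden size, and random weights are all illustrative assumptions standing in for learned parameters.

```python
import math
import random

random.seed(0)

VOCAB = 8    # toy pitch-class vocabulary (assumption)
HIDDEN = 4   # hidden-state size (assumption)

# Randomly initialized weights stand in for trained parameters.
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(VOCAB)] for _ in range(HIDDEN)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
W_hy = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(VOCAB)]

def rnn_step(note, h):
    """One recurrence: h' = tanh(W_xh*x + W_hh*h); returns (h', next-note probs)."""
    x = [1.0 if i == note else 0.0 for i in range(VOCAB)]  # one-hot input
    h_new = [math.tanh(sum(W_xh[j][i] * x[i] for i in range(VOCAB)) +
                       sum(W_hh[j][k] * h[k] for k in range(HIDDEN)))
             for j in range(HIDDEN)]
    logits = [sum(W_hy[o][j] * h_new[j] for j in range(HIDDEN)) for o in range(VOCAB)]
    # Softmax over the vocabulary, shifted by the max logit for stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return h_new, [e / z for e in exps]

def generate(seed_note, length):
    """Greedy synthesis: feed each predicted note back as the next input."""
    h = [0.0] * HIDDEN
    melody = [seed_note]
    for _ in range(length - 1):
        h, probs = rnn_step(melody[-1], h)
        melody.append(max(range(VOCAB), key=probs.__getitem__))
    return melody

melody = generate(seed_note=0, length=8)
```

The same feedback loop underlies the low-latency property noted in the abstract: each output depends only on the current static features and the hidden state, not on future frames.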