Code for the paper:

Reconstructing Phylogenetic Networks via Cherry Picking and Machine Learning
Giulia Bernardini, Leo van Iersel, Esther Julien and Leen Stougie

To run the code, the following packages have to be installed: numpy, pandas, scikit-learn, networkx, and joblib.
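
Before running anything, it can help to check that these packages are importable. The snippet below is a minimal sketch, not part of the repository; the helper name missing_modules is hypothetical. Note that scikit-learn is imported under the module name sklearn.

```python
import importlib.util

# Import names for the packages listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "networkx", "joblib"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```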

Cherry picking heuristic (CPH)

The CPH is implemented in CPH.py. To experiment with it, run the following in a terminal:

python main_heuristic.py <instance num.> <ML model name> <leaf number> <bool (0/1) for exact input> <option>

option: if exact input is 0, option is the reticulation number; otherwise (exact input is 1), option is the forest size.

EXAMPLE:

python main_heuristic.py 0 N10_maxL100_random_balanced 20 0 50
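
When running many instances, the command line can be assembled programmatically. The following is a minimal sketch that only mirrors the argument order documented above; the function build_cph_command and its parameter names are hypothetical and not part of the repository.

```python
def build_cph_command(instance_num, model_name, num_leaves, exact_input, option):
    """Build the argument list for main_heuristic.py in the documented order.

    `option` is the reticulation number when `exact_input` is False (0)
    and the forest size when `exact_input` is True (1).
    """
    return [
        "python",
        "main_heuristic.py",
        str(instance_num),
        model_name,
        str(num_leaves),
        "1" if exact_input else "0",
        str(option),
    ]


# Rebuilds the example command shown above.
command = build_cph_command(0, "N10_maxL100_random_balanced", 20, False, 50)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the repository root.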

The tree sets used for testing can be generated with test_data_gen.py.

Generate training data

The code for initializing and updating the features can be found in Features.py.

To train the random forest, we first have to generate training data. The code for this is in train_data_gen.py. Run the following in a terminal:

python train_data_gen.py <number of networks> <maxL>

maxL: maximum number of leaves per network

EXAMPLE:

python train_data_gen.py 10 20

Train random forest

The folder LearningCherries contains the code for training a random forest. The folder LearningCherries/TrainedRFModels contains some trained random forests in joblib format.

Preprocessed data

The Data folder consists of:

- Results of CPH instances. The instances are divided into four categories: FTS (Full Tree Set), Real, RealSmall, and RTS (Restricted Tree Set).
  - FINAL_heuristic_scores_ML[<random forest model used>]_TEST[<test instance type used>]_<number of instances>.pickle
- Test for different test instances
- Train for generated train data
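
The result file name pattern above encodes the model, the test instance type, and the number of instances. As a minimal sketch, these fields could be recovered with a regular expression. This assumes the square brackets are literal characters in the file name; the helper parse_result_name and the example file name are hypothetical.

```python
import re

# Assumption: the square brackets in the documented pattern are literal
# characters in the file name.
RESULT_NAME = re.compile(
    r"^FINAL_heuristic_scores_ML\[(?P<model>[^\]]+)\]"
    r"_TEST\[(?P<test_type>[^\]]+)\]"
    r"_(?P<num_instances>\d+)\.pickle$"
)


def parse_result_name(filename):
    """Split a result file name into its fields, or return None if it does not match."""
    match = RESULT_NAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["num_instances"] = int(info["num_instances"])
    return info


# Hypothetical file name, built from the pattern for illustration only.
example = "FINAL_heuristic_scores_ML[N10_maxL100_random_balanced]_TEST[FTS]_100.pickle"
print(parse_result_name(example))
```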

estherjulien/HybridML
