Andrés Goens

Andrés Goens
	E-mail	andres.goens@tu-dresden.de

Publications

2024
Marcus Rossel, Andrés Goens, "Bridging Syntax and Semantics of Lean Expressions in E-Graphs", Proceedings of the 2024 EGRAPHS workshop, Jun 2024. [Bibtex & Downloads]

Bridging Syntax and Semantics of Lean Expressions in E-Graphs

Reference

Marcus Rossel, Andrés Goens, "Bridging Syntax and Semantics of Lean Expressions in E-Graphs", Proceedings of the 2024 EGRAPHS workshop, Jun 2024.

Bibtex

@InProceedings{rossel_egraphs24,
title={Bridging Syntax and Semantics of Lean Expressions in E-Graphs},
author={Marcus Rossel and Andrés Goens},
booktitle = {Proceedings of the 2024 EGRAPHS workshop},
year={2024},
month = jun,
location = {Copenhagen, Denmark},
url = {https://arxiv.org/abs/2405.10188},
numpages = {6},
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3768

×

2023
Jeronimo Castrillon, Karol Desnos, Andrés Goens, Christian Menard, "Dataflow Models of Computation for Programming Heterogeneous Multicores", Springer Nature Singapore, pp. 1–40, Singapore, Jan 2023. [doi] [Bibtex & Downloads]

Dataflow Models of Computation for Programming Heterogeneous Multicores

Reference

Jeronimo Castrillon, Karol Desnos, Andrés Goens, Christian Menard, "Dataflow Models of Computation for Programming Heterogeneous Multicores", Springer Nature Singapore, pp. 1–40, Singapore, Jan 2023. [doi]

Abstract
The hardware complexity of modern integrated circuits keeps increasing at a steady pace. Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) integrate general-purpose processing elements, domain-specific processors, dedicated hardware accelerators, reconfigurable logic, as well as complex memory hierarchies and interconnect. While offering unprecedented computational power and energy efficiency, MPSoCs are notoriously difficult to program. This chapter presents Models of Computation (MoCs) as an appealing alternative to traditional programming methodologies to harness the full capacities of modern MPSoCs. By raising the level of abstraction, MoCs make it possible to specify complex systems with little knowledge of the target architecture. The properties of MoCs make it possible for tools to automatically generate efficient implementations for heterogeneous MPSoCs, relieving developers from time-consuming manual exploration. This chapter focuses on a specific MoC family called dataflow MoCs. Dataflow MoCs represent systems as graphs of computational entities and communication channels. This graph-based system specification enables intuitive description of parallelism and supports many analysis and optimization techniques for deriving safe and highly efficient implementations on MPSoCs.

Bibtex

@InBook{castrillon_hca22,
author = {Jeronimo Castrillon and Karol Desnos and Andrés Goens and Christian Menard},
booktitle = {Handbook of Computer Architecture},
date = {2023-01},
pages = {1--40},
title = {Dataflow Models of Computation for Programming Heterogeneous Multicores},
doi = {10.1007/978-981-15-6401-7_45-2},
isbn = {978-981-15-6401-7},
url = {https://doi.org/10.1007/978-981-15-6401-7_45-2},
editor = {Anupam Chattopadhyay et al.},
publisher = {Springer Nature Singapore},
address = {Singapore},
abstract = {The hardware complexity of modern integrated circuits keeps increasing at a steady pace. Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) integrate general-purpose processing elements, domain-specific processors, dedicated hardware accelerators, reconfigurable logic, as well as complex memory hierarchies and interconnect. While offering unprecedented computational power and energy efficiency, MPSoCs are notoriously difficult to program. This chapter presents Models of Computation (MoCs) as an appealing alternative to traditional programming methodologies to harness the full capacities of modern MPSoCs. By raising the level of abstraction, MoCs make it possible to specify complex systems with little knowledge of the target architecture. The properties of MoCs make it possible for tools to automatically generate efficient implementations for heterogeneous MPSoCs, relieving developers from time-consuming manual exploration. This chapter focuses on a specific MoC family called dataflow MoCs. Dataflow MoCs represent systems as graphs of computational entities and communication channels. This graph-based system specification enables intuitive description of parallelism and supports many analysis and optimization techniques for deriving safe and highly efficient implementations on MPSoCs.},
month = jan,
year = {2023},
}

Downloads

2301_Castrillon_dataflow-programmig_preview-www [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3393

×

2021
Robert Khasanov, Julian Robledo, Christian Menard, Andr'es Goens, Jeronimo Castrillon, "Domain-specific hybrid mapping for energy-efficient baseband processing in wireless networks", In ACM Transactions on Embedded Computing Systems (TECS). Special issue of the International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES), Association for Computing Machinery, vol. 20, no. 5s, New York, NY, USA, Sep 2021. [doi] [Bibtex & Downloads]

Domain-specific hybrid mapping for energy-efficient baseband processing in wireless networks

Reference

Robert Khasanov, Julian Robledo, Christian Menard, Andr'es Goens, Jeronimo Castrillon, "Domain-specific hybrid mapping for energy-efficient baseband processing in wireless networks", In ACM Transactions on Embedded Computing Systems (TECS). Special issue of the International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES), Association for Computing Machinery, vol. 20, no. 5s, New York, NY, USA, Sep 2021. [doi]

Abstract
Advancing telecommunication standards continuously push for larger bandwidths, lower latencies, and faster data rates. The receiver baseband unit not only has to deal with a huge number of users expecting connectivity but also with a high workload heterogeneity. As a consequence of the required flexibility, baseband processing has seen a trend towards software implementations in cloud Radio Access Networks (cRANs). The flexibility gained from software implementation comes at the price of impoverished energy efficiency. This paper addresses the trade-off between flexibility and efficiency by proposing a domain-specific hybrid mapping algorithm. Hybrid mapping is an established approach from the model-based design of embedded systems that allows us to retain flexibility while targeting heterogeneous hardware. Depending on the current workload, the runtime system selects the most energy-efficient mapping configuration without violating timing constraints. We leverage the structure of baseband processing, and refine the scheduling methodology, to enable efficient mapping of 100s of tasks at the millisecond granularity, improving upon state-of-the-art hybrid approaches. We validate our approach on an Odroid XU4 and virtual platforms with application-specific accelerators on an open-source prototype. On different LTE workloads, our hybrid approach shows significant improvements both at design time and at runtime. At design-time, mappings of similar quality to those obtained by state-of-the-art methods are generated around four orders of magnitude faster. At runtime, multi-application schedules are computed 37.7% faster than the state-of-the-art without compromising on the quality.

Bibtex

@Article{khasanov_cases21,
author = {Robert Khasanov and Julian Robledo and Christian Menard and Andrés Goens and Jeronimo Castrillon},
title = {Domain-specific hybrid mapping for energy-efficient baseband processing in wireless networks},
doi = {10.1145/3476991},
issn = {1539-9087},
number = {5s},
url = {https://doi.org/10.1145/3476991},
volume = {20},
abstract = {Advancing telecommunication standards continuously push for larger bandwidths, lower latencies, and faster data rates. The receiver baseband unit not only has to deal with a huge number of users expecting connectivity but also with a high workload heterogeneity. As a consequence of the required flexibility, baseband processing has seen a trend towards software implementations in cloud Radio Access Networks (cRANs). The flexibility gained from software implementation comes at the price of impoverished energy efficiency. This paper addresses the trade-off between flexibility and efficiency by proposing a domain-specific hybrid mapping algorithm. Hybrid mapping is an established approach from the model-based design of embedded systems that allows us to retain flexibility while targeting heterogeneous hardware. Depending on the current workload, the runtime system selects the most energy-efficient mapping configuration without violating timing constraints. We leverage the structure of baseband processing, and refine the scheduling methodology, to enable efficient mapping of 100s of tasks at the millisecond granularity, improving upon state-of-the-art hybrid approaches. We validate our approach on an Odroid XU4 and virtual platforms with application-specific accelerators on an open-source prototype. On different LTE workloads, our hybrid approach shows significant improvements both at design time and at runtime. At design-time, mappings of similar quality to those obtained by state-of-the-art methods are generated around four orders of magnitude faster. At runtime, multi-application schedules are computed 37.7% faster than the state-of-the-art without compromising on the quality.},
address = {New York, NY, USA},
articleno = {60},
issue_date = {October 2021},
journal = {ACM Transactions on Embedded Computing Systems (TECS). Special issue of the International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES)},
location = {Virtual conference},
month = sep,
numpages = {26},
publisher = {Association for Computing Machinery},
year = {2021},
}

Downloads

2110_Khasanov_CASES [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3155

×
Alexander Brauckmann, Andr'es Goens, Jeronimo Castrillon, "PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning", Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 17-29, Sep 2021. [doi] [Bibtex & Downloads]

PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning

Reference

Alexander Brauckmann, Andr'es Goens, Jeronimo Castrillon, "PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning", Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 17-29, Sep 2021. [doi]

Abstract
The polyhedral model allows a structured way of defining semantics-preserving transformations to improve the performance of a large class of loops. Finding profitable points in this space is a hard problem which is usually approached by heuristics that generalize from domain-expert knowledge. Existing search space formulations in state-of-the-art heuristics depend on the shape of particular loops, making it hard to leverage generic and more powerful optimization techniques from the machine learning domain. In this paper, we propose a shape-agnostic formulation for the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP). Instead of using transformations, the formulation is based on an abstract space of possible schedules. In this formulation, states model partial schedules, which are constructed by actions that are reusable across different loops. With a simple heuristic to traverse the space, we demonstrate that our formulation is powerful enough to match and outperform state-of-the-art heuristics. On the Polybench benchmark suite, we found the search space to contain transformations that lead to a speedup of 3.39x over LLVM O3, which is 1.34x better than the best transformations found in the search space of isl, and 1.83x better than the speedup achieved by the default heuristics of isl. Our generic MDP formulation enables future work to use reinforcement learning to learn optimization heuristics over a wide range of loops. This also contributes to the emerging field of machine learning in compilers, as it exposes a novel problem formulation that can push the limits of existing methods.

Bibtex

@InProceedings{brauckmann_pact21,
author = {Brauckmann, Alexander and Goens, Andrés and Castrillon, Jeronimo},
booktitle = {Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT)},
title = {PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning},
month = sep,
doi = {10.1109/PACT52795.2021.00009},
pages = {17-29},
url = {https://ieeexplore.ieee.org/document/9563041},
year = {2021},
abstract = {The polyhedral model allows a structured way of defining semantics-preserving transformations to improve the performance of a large class of loops. Finding profitable points in this space is a hard problem which is usually approached by heuristics that generalize from domain-expert knowledge. Existing search space formulations in state-of-the-art heuristics depend on the shape of particular loops, making it hard to leverage generic and more powerful optimization techniques from the machine learning domain. In this paper, we propose a shape-agnostic formulation for the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP). Instead of using transformations, the formulation is based on an abstract space of possible schedules. In this formulation, states model partial schedules, which are constructed by actions that are reusable across different loops. With a simple heuristic to traverse the space, we demonstrate that our formulation is powerful enough to match and outperform state-of-the-art heuristics. On the Polybench benchmark suite, we found the search space to contain transformations that lead to a speedup of 3.39x over LLVM O3, which is 1.34x better than the best transformations found in the search space of isl, and 1.83x better than the speedup achieved by the default heuristics of isl. Our generic MDP formulation enables future work to use reinforcement learning to learn optimization heuristics over a wide range of loops. This also contributes to the emerging field of machine learning in compilers, as it exposes a novel problem formulation that can push the limits of existing methods.},
}

Downloads

2109_Brauckmann_PACT [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3182

×
Andr'es Goens, Jeronimo Castrillon, "Embeddings of Task Mappings to Multicore Systems", Proceedings of the 21st IEEE International Conference on Embedded Computer Systems: Architectures Modeling and Simulation (SAMOS), Springer-Verlag, pp. 161–176, Berlin, Heidelberg, Jul 2021. [doi] [Bibtex & Downloads]

Embeddings of Task Mappings to Multicore Systems

Reference

Andr'es Goens, Jeronimo Castrillon, "Embeddings of Task Mappings to Multicore Systems", Proceedings of the 21st IEEE International Conference on Embedded Computer Systems: Architectures Modeling and Simulation (SAMOS), Springer-Verlag, pp. 161–176, Berlin, Heidelberg, Jul 2021. [doi]

Abstract
The problem of finding good mappings is central to designing and executing applications efficiently in embedded systems. In heterogeneous multicores, which are ubiquitous today, this problem yields an intractably large design space of possible mappings. Most methods explore this space using heuristics, many of which implicitly use geometric notions in mappings. In this paper we explore the geometry of the mapping problem explicitly, for finding embeddings of the mapping space that capture its structure. This allows us to formulate new mapping strategies by leveraging the geometry of the mapping space, as well as improving existing heuristics that do so implicitly. We evaluate our approach on a novel mapping heuristic based on gradient descent, as well as multiple existing meta-heuristics. For complex architectures, our methods improved the results of established exploration meta-heuristics by about an order of magnitude in average.

Bibtex

@InProceedings{goens_samos21,
author = {Andrés Goens and Jeronimo Castrillon},
booktitle = {Proceedings of the 21st IEEE International Conference on Embedded Computer Systems: Architectures Modeling and Simulation (SAMOS)},
date = {2021-07},
title = {Embeddings of Task Mappings to Multicore Systems},
doi = {10.1007/978-3-031-04580-6_11},
isbn = {978-3-031-04579-0},
location = {Samos, Greece},
organization = {IEEE},
pages = {161--176},
publisher = {Springer-Verlag},
url = {https://doi.org/10.1007/978-3-031-04580-6_11},
abstract = {The problem of finding good mappings is central to designing and executing applications efficiently in embedded systems. In heterogeneous multicores, which are ubiquitous today, this problem yields an intractably large design space of possible mappings. Most methods explore this space using heuristics, many of which implicitly use geometric notions in mappings. In this paper we explore the geometry of the mapping problem explicitly, for finding embeddings of the mapping space that capture its structure. This allows us to formulate new mapping strategies by leveraging the geometry of the mapping space, as well as improving existing heuristics that do so implicitly. We evaluate our approach on a novel mapping heuristic based on gradient descent, as well as multiple existing meta-heuristics. For complex architectures, our methods improved the results of established exploration meta-heuristics by about an order of magnitude in average.},
address = {Berlin, Heidelberg},
month = jul,
numpages = {16},
year = {2021},
}

Downloads

2107_Goens_SAMOS [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3071

×
Andr'es Goens, Timo Nicolai, Jeronimo Castrillon, "mpsym: Improving Design-Space Exploration of Clustered Manycores with Arbitrary Topologies", In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), IEEE Press, vol. 41, no. 6, pp. 1592-1605, Jul 2021. [doi] [Bibtex & Downloads]

mpsym: Improving Design-Space Exploration of Clustered Manycores with Arbitrary Topologies

Reference

Andr'es Goens, Timo Nicolai, Jeronimo Castrillon, "mpsym: Improving Design-Space Exploration of Clustered Manycores with Arbitrary Topologies", In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), IEEE Press, vol. 41, no. 6, pp. 1592-1605, Jul 2021. [doi]

Bibtex

@Article{goens_tcad21,
author = {Andrés Goens and Timo Nicolai and Jeronimo Castrillon},
date = {2021-07},
journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)},
title = {mpsym: Improving Design-Space Exploration of Clustered Manycores with Arbitrary Topologies},
month = jul,
numpages = {14 pp},
volume={41},
number={6},
pages={1592-1605},
doi = {10.1109/TCAD.2021.3102512},
issn = {1937-4151},
url = {https://ieeexplore.ieee.org/document/9506807},
publisher = {IEEE Press},
year = {2021},
}

Downloads

2107_Goens_TCAD [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3171

×
Andres Wilhelm Goens Jokisch, "Improving Model-Based Software Synthesis: A Focus on Mathematical Structures", PhD thesis, TU Dresden, 172 pp., May 2021. (Cloud & Heat Dissertationspreis) [Bibtex & Downloads]

Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

Reference

Andres Wilhelm Goens Jokisch, "Improving Model-Based Software Synthesis: A Focus on Mathematical Structures", PhD thesis, TU Dresden, 172 pp., May 2021. (Cloud & Heat Dissertationspreis)

Bibtex

@PhdThesis{goens_phd21,
author = {Goens Jokisch, Andres Wilhelm},
institution = {TU Dresden},
title = {Improving Model-Based Software Synthesis: A Focus on Mathematical Structures},
pages = {172 pp.},
month = may,
year = {2021},
url = {https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-748845},

}

Downloads

2021_Goens_PhD [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3086

×
Christian Menard, Andr'es Goens, Gerald Hempel, Robert Khasanov, Julian Robledo, Felix Teweleitt, Jeronimo Castrillon, "Mocasin—Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores", Proceedings of the 2021 Drone Systems Engineering and Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Association for Computing Machinery, pp. 66–73, New York, NY, USA, Jan 2021. (Video Presentation) [doi] [Bibtex & Downloads]

Mocasin—Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores

Reference

Christian Menard, Andr'es Goens, Gerald Hempel, Robert Khasanov, Julian Robledo, Felix Teweleitt, Jeronimo Castrillon, "Mocasin—Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores", Proceedings of the 2021 Drone Systems Engineering and Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Association for Computing Machinery, pp. 66–73, New York, NY, USA, Jan 2021. (Video Presentation) [doi]

Abstract
We present Mocasin, an open-source rapid prototyping framework for researching, implementing and validating new algorithms and solutions in the field of mapping software to heterogeneous multi-cores. In contrast to the many existing tools that often specialize for a particular use-case, Mocasin is an open, flexible and generic research environment that abstracts over the approaches taken by other tools. Mocasin is designed to support a wide range of models of computation and input formats, implements manifold mapping strategies and provides an adjustable high-level simulator for performance estimation. This infrastructure serves as a flexible vehicle for exploring new approaches and as a blueprint for building customized tools. We highlight the key design aspects of Mocasin that enable its flexibility and illustrate its capabilities in a case-study showing how Mocasin can be used for building a customized tool for researching runtime mapping strategies in an LTE uplink receiver.

Bibtex

@InProceedings{menard_rapido21,
author = {Christian Menard and Andrés Goens and Gerald Hempel and Robert Khasanov and Julian Robledo and Felix Teweleitt and Jeronimo Castrillon},
title = {Mocasin---Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores},
booktitle = {Proceedings of the 2021 Drone Systems Engineering and Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2021},
address = {New York, NY, USA},
month = jan,
publisher = {ACM},
doi = {10.1145/3444950.3447285},
isbn = {9781450389525},
location = {Budapest, Hungary},
pages = {66–73},
publisher = {Association for Computing Machinery},
series = {DroneSE and RAPIDO '21},
url = {https://doi.org/10.1145/3444950.3447285},
abstract = {We present Mocasin, an open-source rapid prototyping framework for researching, implementing and validating new algorithms and solutions in the field of mapping software to heterogeneous multi-cores. In contrast to the many existing tools that often specialize for a particular use-case, Mocasin is an open, flexible and generic research environment that abstracts over the approaches taken by other tools. Mocasin is designed to support a wide range of models of computation and input formats, implements manifold mapping strategies and provides an adjustable high-level simulator for performance estimation. This infrastructure serves as a flexible vehicle for exploring new approaches and as a blueprint for building customized tools. We highlight the key design aspects of Mocasin that enable its flexibility and illustrate its capabilities in a case-study showing how Mocasin can be used for building a customized tool for researching runtime mapping strategies in an LTE uplink receiver.},
numpages = {8},

}

Downloads

2101_Menard_RAPIDO [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2944

×

2020
Robert Wittig, Andrés Goens, Christian Menard, Emil Matus, Gerhard P. Fettweis, Jeronimo Castrillon, "Modem Design in the Era of 5G and Beyond: The Need for a Formal Approach", Proceedings of the 27th International Conference on Telecommunications (ICT), pp. 1-5, Oct 2020. [doi] [Bibtex & Downloads]

Modem Design in the Era of 5G and Beyond: The Need for a Formal Approach

Reference

Robert Wittig, Andrés Goens, Christian Menard, Emil Matus, Gerhard P. Fettweis, Jeronimo Castrillon, "Modem Design in the Era of 5G and Beyond: The Need for a Formal Approach", Proceedings of the 27th International Conference on Telecommunications (ICT), pp. 1-5, Oct 2020. [doi]

Abstract
In the era of 5G and beyond, adaptive workloads and the need for energy efficiency drive are becoming increasingly vital. Changes in parameters of the physical layer algorithm can cascade throughout the algorithm, requiring additional changes to keep a correct functionality within the timing bounds. These factors drive the process of designing systems for mobile communication towards reconfigurability. In this paper we analyze the trade-offs involved in changing algorithmic parameters and show how reconfigurable systems can be used to produce energy-efficient systems. We argue that we ought to resort to formal models to tame this reconfigurability and examine where existing formal models fall short.

Bibtex

@InProceedings{goens_ict20,
author = {Robert Wittig and Andr{\'e}s Goens and Christian Menard and Emil Matus and Gerhard P. Fettweis and Jeronimo Castrillon},
booktitle = {Proceedings of the 27th International Conference on Telecommunications (ICT)},
title = {Modem Design in the Era of 5G and Beyond: The Need for a Formal Approach},
location = {Virtual. Bali, Indonesia},
month = oct,
abstract = {In the era of 5G and beyond, adaptive workloads and the need for energy efficiency drive are becoming increasingly vital. Changes in parameters of the physical layer algorithm can cascade throughout the algorithm, requiring additional changes to keep a correct functionality within the timing bounds. These factors drive the process of designing systems for mobile communication towards reconfigurability. In this paper we analyze the trade-offs involved in changing algorithmic parameters and show how reconfigurable systems can be used to produce energy-efficient systems. We argue that we ought to resort to formal models to tame this reconfigurability and examine where existing formal models fall short.},
year = {2020},
pages={1-5},
doi={10.1109/ICT49546.2020.9239539},
url = {https://ieeexplore.ieee.org/document/9239539},
}

Downloads

2010_Wittig_ICT [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2648

×
Alexander Brauckmann, Andrés Goens, Jeronimo Castrillon, "ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers", In Proceeding: 2020 Forum for Specification and Design Languages (FDL), pp. 1-4, Sep 2020. [doi] [Bibtex & Downloads]

ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers

Reference

Alexander Brauckmann, Andrés Goens, Jeronimo Castrillon, "ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers", In Proceeding: 2020 Forum for Specification and Design Languages (FDL), pp. 1-4, Sep 2020. [doi]

Abstract
Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.

Bibtex

@InProceedings{brauckmann_fdl20,
author = {Alexander Brauckmann and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers},
booktitle = {2020 Forum for Specification and Design Languages (FDL)},
year = {2020},
location = {Kiel, Germany},
month = sep,
pages={1-4},
doi={10.1109/FDL50818.2020.9232946},
url = {https://ieeexplore.ieee.org/document/9232946},
abstract = {Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.},
}

Downloads

2009_Brauckmann_FDL [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2842

×
Christian Menard, Andrés Goens, Marten Lohstroh, Jeronimo Castrillon, "Achieving Determinism in Adaptive AUTOSAR", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 822–827, Mar 2020. (Best paper award candidate A-Track, Video Presentation) [doi] [Bibtex & Downloads]

Achieving Determinism in Adaptive AUTOSAR

Reference

Christian Menard, Andrés Goens, Marten Lohstroh, Jeronimo Castrillon, "Achieving Determinism in Adaptive AUTOSAR", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 822–827, Mar 2020. (Best paper award candidate A-Track, Video Presentation) [doi]

Abstract
AUTOSAR AP is an emerging industry standard that tackles the challenges of modern automotive software design, but does not provide adequate mechanisms to enforce deterministic execution. This poses profound challenges to testing and maintenance of the application software, which is particularly problematic for safety-critical applications. In this paper, we analyze the problem of nondeterminism in AP and propose a framework for the design of deterministic automotive software that transparently integrates with the AP communication mechanisms. We illustrate our approach in a case study based on the brake assistant demonstrator application that is provided by the AUTOSAR consortium. We show that the original implementation is nondeterministic and discuss a deterministic solution based on our framework.

Bibtex

@InProceedings{menard_date20,
author = {Christian Menard and Andr{\'e}s Goens and Marten Lohstroh and Jeronimo Castrillon},
title = {Achieving Determinism in Adaptive AUTOSAR},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
month = mar,
publisher = {IEEE},
location = {Grenoble, France},

abstract = {AUTOSAR AP is an emerging industry standard that tackles the challenges of modern automotive software design, but does not provide adequate mechanisms to enforce deterministic execution. This poses profound challenges to testing and maintenance of the application software, which is particularly problematic for safety-critical applications. In this paper, we analyze the problem of nondeterminism in AP and propose a framework for the design of deterministic automotive software that transparently integrates with the AP communication mechanisms. We illustrate our approach in a case study based on the brake assistant demonstrator application that is provided by the AUTOSAR consortium. We show that the original implementation is nondeterministic and discuss a deterministic solution based on our framework.},
isbn = {978-3-9819263-4-7},
pages = {822--827},
doi = {10.23919/DATE48585.2020.9116430},
url = {https://ieeexplore.ieee.org/abstract/document/9116430},
}

Downloads

2003_Menard_DATE [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2545

×
Asif Ali Khan, Andrés Goens, Fazal Hameed, Jeronimo Castrillon, "Generalized Data Placement Strategies for Racetrack Memories", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 1502–1507, Mar 2020. (Video Presentation) [doi] [Bibtex & Downloads]

Generalized Data Placement Strategies for Racetrack Memories

Reference

Asif Ali Khan, Andrés Goens, Fazal Hameed, Jeronimo Castrillon, "Generalized Data Placement Strategies for Racetrack Memories", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 1502–1507, Mar 2020. (Video Presentation) [doi]

Abstract
Ultra-dense non-volatile racetrack memories (RTMs) have been investigated at various levels in the memory hierarchy for improved performance and reduced energy consumption. However, the innate shift operations in RTMs hinder their applicability to replace low-latency on-chip memories. Recent research has demonstrated that intelligent placement of memory objects in RTMs can significantly reduce the amount of shifts with no hardware overhead, albeit for specific system setups. However, existing placement strategies may lead to sub-optimal performance when applied to different architectures. In this paper we look at generalized data placement mechanisms that improve upon existing ones by taking into account the underlying memory architecture and the timing and liveliness information of memory objects. We propose a novel heuristic and a formulation using genetic algorithms that optimize key performance parameters. We show that, on average, our generalized approach improves the number of shifts, performance and energy consumption by 4.3x, 46% and 55% respectively compared to the state-of-the-art.

Bibtex

@InProceedings{khan_date20,
author = {Asif Ali Khan and Andr{\'e}s Goens and Fazal Hameed and Jeronimo Castrillon},
title = {Generalized Data Placement Strategies for Racetrack Memories},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
publisher = {IEEE},
location = {Grenoble, France},
month = mar,
isbn = {978-3-9819263-4-7},
pages = {1502--1507},
doi = {10.23919/DATE48585.2020.9116245},
url = {https://ieeexplore.ieee.org/document/9116245},

abstract = {Ultra-dense non-volatile racetrack memories (RTMs) have been investigated at various levels in the memory hierarchy for improved performance and reduced energy consumption. However, the innate shift operations in RTMs hinder their applicability to replace low-latency on-chip memories. Recent research has demonstrated that intelligent placement of memory objects in RTMs can significantly reduce the amount of shifts with no hardware overhead, albeit for specific system setups. However, existing placement strategies may lead to sub-optimal performance when applied to different architectures. In this paper we look at generalized data placement mechanisms that improve upon existing ones by taking into account the underlying memory architecture and the timing and liveliness information of memory objects. We propose a novel heuristic and a formulation using genetic algorithms that optimize key performance parameters. We show that, on average, our generalized approach improves the number of shifts, performance and energy consumption by 4.3x, 46% and 55% respectively compared to the state-of-the-art.},
}

Downloads

2003_Khan_DATE [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2547

×
Marten Lohstroh, Íñigo Íncer Romero, Andrés Goens, Patricia Derler, Jeronimo Castrillon, Edward A. Lee, Alberto Sangiovanni-Vincentelli, "Reactors: A Deterministic Model for Composable Reactive Systems", Cyber Physical Systems. Model-Based Design – Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019) (Chamberlain, Roger and Edin Grimheden, Martin and Taha, Walid), Springer International Publishing, pp. 59–85, Cham, Feb 2020. [doi] [Bibtex & Downloads]

Reactors: A Deterministic Model for Composable Reactive Systems

Reference

Marten Lohstroh, Íñigo Íncer Romero, Andrés Goens, Patricia Derler, Jeronimo Castrillon, Edward A. Lee, Alberto Sangiovanni-Vincentelli, "Reactors: A Deterministic Model for Composable Reactive Systems", Cyber Physical Systems. Model-Based Design – Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019) (Chamberlain, Roger and Edin Grimheden, Martin and Taha, Walid), Springer International Publishing, pp. 59–85, Cham, Feb 2020. [doi]

Abstract
This paper describes a component-based concurrent model of computation for reactive systems. The components in this model, featuring ports and hierarchy, are called reactors. The model leverages a semantic notion of time, an event scheduler, and a synchronous-reactive style of communication to achieve determinism. Reactors enable a programming model that ensures determinism, unless explicitly abandoned by the programmer. We show how the coordination of reactors can safely and transparently exploit parallelism, both in shared-memory and distributed systems.

Bibtex

@InProceedings{Lohstroh_cyphy19,
author = {Marten Lohstroh and {\'I}{\~n}igo {\'I}ncer Romero and Andr\'{e}s Goens and Patricia Derler and Jeronimo Castrillon and Edward A. Lee and Alberto Sangiovanni-Vincentelli},
title = {Reactors: A Deterministic Model for Composable Reactive Systems},
editor= {Chamberlain, Roger and Edin Grimheden, Martin and Taha, Walid},
booktitle={Cyber Physical Systems. Model-Based Design -- Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019)},
year = {2020},
location = {New York City, NY, USA},
month = feb,
publisher={Springer International Publishing},
address={Cham},
pages={59--85},
abstract={This paper describes a component-based concurrent model of computation for reactive systems. The components in this model, featuring ports and hierarchy, are called reactors. The model leverages a semantic notion of time, an event scheduler, and a synchronous-reactive style of communication to achieve determinism. Reactors enable a programming model that ensures determinism, unless explicitly abandoned by the programmer. We show how the coordination of reactors can safely and transparently exploit parallelism, both in shared-memory and distributed systems.},
isbn={978-3-030-41131-2},
url = {https://link.springer.com/chapter/10.1007/978-3-030-41131-2_4},
doi = {10.1007/978-3-030-41131-2_4},
numpages = {27pp},
}

Downloads

1910_Lohstroh_CyPhy [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2532

×
Alexander Brauckmann, Andrés Goens, Sebastian Ertel, Jeronimo Castrillon, "Compiler-Based Graph Representations for Deep Learning Models of Code", Proceedings of the 29th ACM SIGPLAN International Conference on Compiler Construction (CC 2020), Association for Computing Machinery, pp. 201–211, New York, NY, USA, Feb 2020. [doi] [Bibtex & Downloads]

Compiler-Based Graph Representations for Deep Learning Models of Code

Reference

Alexander Brauckmann, Andrés Goens, Sebastian Ertel, Jeronimo Castrillon, "Compiler-Based Graph Representations for Deep Learning Models of Code", Proceedings of the 29th ACM SIGPLAN International Conference on Compiler Construction (CC 2020), Association for Computing Machinery, pp. 201–211, New York, NY, USA, Feb 2020. [doi]

Bibtex

@InProceedings{brauckmann_cc20,
author = {Alexander Brauckmann and Andr\'{e}s Goens and Sebastian Ertel and Jeronimo Castrillon},
title = {Compiler-Based Graph Representations for Deep Learning Models of Code},
booktitle = {Proceedings of the 29th ACM SIGPLAN International Conference on Compiler Construction (CC 2020)},
year = {2020},
isbn = {9781450371209},
url = {https://doi.org/10.1145/3377555.3377894},
doi = {10.1145/3377555.3377894},
series = {CC 2020},
pages = {201–211},
numpages = {11},
publisher = {Association for Computing Machinery},
location = {San Diego, CA, USA},
month = feb,
address = {New York, NY, USA},
keywords = {conf},
}

Downloads

2002_Brauckmann_CC [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2561

×

2019
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism", Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell, ACM, pp. 146–161, New York, NY, USA, Aug 2019. [doi] [Bibtex & Downloads]

STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism

Reference

Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism", Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell, ACM, pp. 146–161, New York, NY, USA, Aug 2019. [doi]

Abstract
Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.
Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.
In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.

Bibtex

@InProceedings{ertel_haskell19,
author = {Ertel, Sebastian and Adam, Justus and Rink, Norman A. and Goens, Andr{\'e}s and Castrillon, Jeronimo},
title = {{STCLang}: State Thread Composition as a Foundation for Monadic Dataflow Parallelism},
booktitle = {Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell},
year = {2019},
series = {Haskell 2019},
pages = {146--161},
address = {New York, NY, USA},
month = aug,
publisher = {ACM},
abstract = {Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.
Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.
In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.},
acmid = {3342600},
doi = {10.1145/3331545.3342600},
isbn = {978-1-4503-6813-1},
keywords = {conf},
location = {Berlin, Germany},
numpages = {16},
url = {http://doi.acm.org/10.1145/3331545.3342600}
}

Downloads

1908_Ertel_Haskell [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2476

×
Andrés Goens, Christian Menard, Jeronimo Castrillon, "On Compact Mappings for Multicore Systems", Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS) (D. Pnevmatikatos and M. Pelcat and M. Jung), Springer, Cham, vol. 11733, pp. 325–335, Jul 2019. [doi] [Bibtex & Downloads]

On Compact Mappings for Multicore Systems

Reference

Andrés Goens, Christian Menard, Jeronimo Castrillon, "On Compact Mappings for Multicore Systems", Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS) (D. Pnevmatikatos and M. Pelcat and M. Jung), Springer, Cham, vol. 11733, pp. 325–335, Jul 2019. [doi]

Bibtex

@InProceedings{goens_samos19,
author = {Andr{\'e}s Goens and Christian Menard and Jeronimo Castrillon},
title = {On Compact Mappings for Multicore Systems},
booktitle = {Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS)},
year = {2019},
editor = {D. Pnevmatikatos and M. Pelcat and M. Jung},
volume = {11733},
pages = {325--335},
month = jul,
organization = {IEEE},
publisher = {Springer, Cham},
doi = {10.1007/978-3-030-27562-4_23},
isbn = {978-3-030-27561-7},
location = {Pythagorion, Greece},
numpages = {11},
url = {https://link.springer.com/chapter/10.1007/978-3-030-27562-4_23}
}

Downloads

1907_Goens_SAMOS [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2456

×
Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, Jeronimo Castrillon, "A Case Study on Machine Learning for Synthesizing Benchmarks", Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), ACM, pp. 38–46, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]

A Case Study on Machine Learning for Synthesizing Benchmarks

Reference

Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, Jeronimo Castrillon, "A Case Study on Machine Learning for Synthesizing Benchmarks", Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), ACM, pp. 38–46, New York, NY, USA, Jun 2019. [doi]

Abstract
Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.

Bibtex

@InProceedings{goens_mapl19,
author = {Andr\'{e}s Goens and Alexander Brauckmann and Sebastian Ertel and Chris Cummins and Hugh Leather and Jeronimo Castrillon},
title = {A Case Study on Machine Learning for Synthesizing Benchmarks},
booktitle = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL)},
year = {2019},
series = {MAPL 2019},
doi = {10.1145/3315508.3329976},
url = {http://doi.acm.org/10.1145/3315508.3329976},
acmid = {3329976},
isbn = {978-1-4503-6719-6/19/06},
pages = {38--46},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
keywords = {conf},
location = {Phoenix, AZ, USA},
numpages = {9},
abstract = {Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.},
}

Downloads

1906_Goens_MAPL [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2450

×
Marten Lohstroh, Martin Schoeberl, Andrés Goens, Armin Wasicek, Christopher Gill, Marjan Sirjani, Edward A Lee, "Actors Revisited for Time-Critical Systems", Proceedings of the 56th annual Design Automation Conference, ACM, 4pp, Las Vegas, NV, USA, Jun 2019. [doi] [Bibtex & Downloads]

Actors Revisited for Time-Critical Systems

Reference

Marten Lohstroh, Martin Schoeberl, Andrés Goens, Armin Wasicek, Christopher Gill, Marjan Sirjani, Edward A Lee, "Actors Revisited for Time-Critical Systems", Proceedings of the 56th annual Design Automation Conference, ACM, 4pp, Las Vegas, NV, USA, Jun 2019. [doi]

Bibtex

@InProceedings{lohstroh_dac19,
title={Actors Revisited for Time-Critical Systems},
author={Lohstroh, Marten and Schoeberl, Martin and Goens, Andr{\'e}s and Wasicek, Armin and Gill, Christopher and Sirjani, Marjan and Lee, Edward A},
year={2019},
booktitle = {Proceedings of the 56th annual Design Automation Conference},
year = {2019},
series = {DAC 2019},
pages = {4pp},
address = {Las Vegas, NV, USA},
month = jun,
publisher = {ACM},
keywords = {conf},
location = {Las Vegas, NV, USA},
numpages = {4},
url = {http://doi.acm.org/10.1145/3316781.3323469},
doi = {10.1145/3316781.3323469},
}

Downloads

No Downloads available for this publication

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2453

×
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''", In CoRR, vol. abs/1906.12098, Jun 2019. [Bibtex & Downloads]

Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''

Reference

Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''", In CoRR, vol. abs/1906.12098, Jun 2019.

Bibtex

@Article{ertel_haskellsup19,
author = {Sebastian Ertel and Justus Adam and Norman A. Rink and Andr{\'{e}}s Goens and Jeronimo Castrillon},
title = {Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''},
journal = {CoRR},
year = {2019},
volume = {abs/1906.12098},
month = jun,
archiveprefix = {arXiv},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1906-12098},
eprint = {1906.12098},
url = {http://arxiv.org/abs/1906.12098}
}

Downloads

1906_Ertel_Haskellsupp [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2486

×

2018
Andrés Goens, Christian Menard, Jeronimo Castrillon, "On the Representation of Mappings to Multicores", Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18), pp. 184–191, Vietnam National University, Hanoi, Vietnam, Sep 2018. [doi] [Bibtex & Downloads]

On the Representation of Mappings to Multicores

Reference

Andrés Goens, Christian Menard, Jeronimo Castrillon, "On the Representation of Mappings to Multicores", Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18), pp. 184–191, Vietnam National University, Hanoi, Vietnam, Sep 2018. [doi]

Abstract
Application requirements for embedded systems are growing rapidly, as is the complexity of systems designed to execute them. A common abstraction used to tame this growing complexity is that of a mapping, which assigns parts of an application to different hardware resources. Modern flows need to explore an intractably large design space of mappings, and be able to quickly find near-optimal mappings for different objectives, sometimes at runtime. With systems featuring thousands of cores in the near horizon, we need methods to make this exploration step truly scalable. In this paper we argue that the mathematical representation of a mapping is central to achieve this. We present different representations and how these could be applied to different contexts and objectives, like complex design- space exploration meta-heuristics or efficient runtime systems.

Bibtex

@InProceedings{goen_mcsoc18,
author = {Andr\'{e}s Goens and Christian Menard and Jeronimo Castrillon},
title = {On the Representation of Mappings to Multicores},
booktitle = {Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18)},
year = {2018},
address = {Vietnam National University, Hanoi, Vietnam},
month = sep,
pages = {184--191},
doi = {10.1109/MCSoC2018.2018.00039},
url = {https://ieeexplore.ieee.org/document/8540232},
isbn = {978-1-5386-6689-0/18/},
abstract = {Application requirements for embedded systems are growing rapidly, as is the complexity of systems designed to execute them. A common abstraction used to tame this growing complexity is that of a mapping, which assigns parts of an application to different hardware resources. Modern flows need to explore an intractably large design space of mappings, and be able to quickly find near-optimal mappings for different objectives, sometimes at runtime. With systems featuring thousands of cores in the near horizon, we need methods to make this exploration step truly scalable. In this paper we argue that the mathematical representation of a mapping is central to achieve this. We present different representations and how these could be applied to different contexts and objectives, like complex design- space exploration meta-heuristics or efficient runtime systems.},
}

Downloads

1809_Goens_MCSoC [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=2152

×
Jeronimo Castrillon, Matthias Lieber, Sascha Klüppelholz, Marcus Völp, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Hermann Härtig, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Akash Kumar, Wolfgang Lehner, Linda Leuschner, Siqi Ling, Steffen Märcker, Christian Menard, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, Sascha Wunderlich, "A Hardware/Software Stack for Heterogeneous Systems", In IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 243-259, Jul 2018. [doi] [Bibtex & Downloads]

A Hardware/Software Stack for Heterogeneous Systems

Reference

Jeronimo Castrillon, Matthias Lieber, Sascha Klüppelholz, Marcus Völp, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Hermann Härtig, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Akash Kumar, Wolfgang Lehner, Linda Leuschner, Siqi Ling, Steffen Märcker, Christian Menard, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, Sascha Wunderlich, "A Hardware/Software Stack for Heterogeneous Systems", In IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 243-259, Jul 2018. [doi]

Abstract
Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.

Bibtex

@Article{castrillon_tmscs17,
author = {Jeronimo Castrillon and Matthias Lieber and Sascha Kl{\"u}ppelholz and Marcus V{\"o}lp and Nils Asmussen and Uwe Assmann and Franz Baader and Christel Baier and Gerhard Fettweis and Jochen Fr{\"o}hlich and Andr\'{e}s Goens and Sebastian Haas and Dirk Habich and Hermann H{\"a}rtig and Mattis Hasler and Immo Huismann and Tomas Karnagel and Sven Karol and Akash Kumar and Wolfgang Lehner and Linda Leuschner and Siqi Ling and Steffen M{\"a}rcker and Christian Menard and Johannes Mey and Wolfgang Nagel and Benedikt N{\"o}then and Rafael Pe{\~n}aloza and Michael Raitza and J{\"o}rg Stiller and Annett Ungeth{\"u}m and Axel Voigt and Sascha Wunderlich},
title = {A Hardware/Software Stack for Heterogeneous Systems},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
year = {2018},
month = jul,
volume={4},
number={3},
pages={243-259},
abstract = {Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.},
doi = {10.1109/TMSCS.2017.2771750},
issn = {2332-7766},
url = {http://ieeexplore.ieee.org/document/8103042/}
}

Downloads

1711_Castrillon_TMSCS [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1597

×
Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon, "Compiling for Concise Code and Efficient I/O", Proceedings of the 27th International Conference on Compiler Construction (CC 2018), ACM, pp. 104–115, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]

Compiling for Concise Code and Efficient I/O

Reference

Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon, "Compiling for Concise Code and Efficient I/O", Proceedings of the 27th International Conference on Compiler Construction (CC 2018), ACM, pp. 104–115, New York, NY, USA, Feb 2018. [doi]

Abstract
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.

Bibtex

@InProceedings{ertel_cc18,
author = {Sebastian Ertel and Andr\'{e}s Goens and Justus Adam and Jeronimo Castrillon},
title = {Compiling for Concise Code and Efficient I/O},
booktitle = {Proceedings of the 27th International Conference on Compiler Construction (CC 2018)},
series = {CC 2018},
year = {2018},
month = feb,
location = {Vienna, Austria},
publisher = {ACM},
numpages = {12},
pages = {104--115},
doi = {10.1145/3178372.3179505},
url = {https://dl.acm.org/citation.cfm?id=3179505},
acmid = {3179505},
address = {New York, NY, USA},
abstract = {Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.},
}

Downloads

cc-2018-slides [PDF]

1802_Ertel_CC [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1793

×
Robert Khasanov, Andrés Goens, Jeronimo Castrillon, "Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap", Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 20–25, New York, NY, USA, Jan 2018. [doi] [Bibtex & Downloads]

Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap

Reference

Robert Khasanov, Andrés Goens, Jeronimo Castrillon, "Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap", Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 20–25, New York, NY, USA, Jan 2018. [doi]

Abstract
Modern embedded systems are rapidly increasing their complexity, both in terms of numbers of cores, as well as heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation.
Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, poses restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.

Bibtex

@InProceedings{khasanov_parma18,
author = {Robert Khasanov and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap},
booktitle = {Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
series = {PARMA-DITAM '18},
isbn = {978-1-4503-6444-7},
pages = {20--25},
year = {2018},
month = jan,
numpages = {6},
url = {http://doi.acm.org/10.1145/3183767.3183790},
doi = {10.1145/3183767.3183790},
acmid = {3183790},
publisher = {ACM},
address = {New York, NY, USA},
location = {Manchester, United Kingdom},
abstract = {Modern embedded systems are rapidly increasing their complexity, both in terms of numbers of cores, as well as heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation.
Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, poses restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.},
}

Downloads

1801_Khasanov_PARMA-DITAM [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1781

×
Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers", Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Jan 2018. [Bibtex & Downloads]

Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers

Reference

Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers", Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Jan 2018.

Bibtex

@InProceedings{goens_multiprog18,
author = {Andr{\'e}s Goens and Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers},
booktitle = {Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2018},
url = {http://research.ac.upc.edu/multiprog/multiprog2018/papers/MULTIPROG-2018_Goens.pdf},
month = jan,
location = {Manchester, United Kingdom}
}

Downloads

1801_Goens_MULTIRPOG [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1782

×

2017
Andrés Goens, Sergio Siccha, Jeronimo Castrillon, "Symmetry in Software Synthesis", In ACM Transactions on Architecture and Code Optimization (TACO),, ACM, vol. 14, no. 2, pp. 20:1–20:26, New York, NY, USA, Jul 2017. [doi] [Bibtex & Downloads]

Symmetry in Software Synthesis

Reference

Andrés Goens, Sergio Siccha, Jeronimo Castrillon, "Symmetry in Software Synthesis", In ACM Transactions on Architecture and Code Optimization (TACO),, ACM, vol. 14, no. 2, pp. 20:1–20:26, New York, NY, USA, Jul 2017. [doi]

Abstract
With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do it in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, like Network-on-Chip interconnect or heterogeneous cores. In this paper, we present a formal framework that can determine the inherent symmetry of architectures and applications algorithmically and leverage these for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for the platforms with a handful of cores of today and moderate-size benchmarks, our approach consistently yields reductions of the overall execution time of algorithms, accelerating them by a factor up to 10 in our experiments, or improving the quality of the results.

Bibtex

@article{goens_taco17symmetry,
author = {Goens, Andr{\'e}s and Siccha, Sergio and Castrillon, Jeronimo},
title = {Symmetry in Software Synthesis},
journal = {ACM Transactions on Architecture and Code Optimization (TACO),},
issue_date = {July 2017},
volume = {14},
number = {2},
month = jul,
year = {2017},
issn = {1544-3566},
pages = {20:1--20:26},
articleno = {20},
numpages = {26},
url = {http://doi.acm.org/10.1145/3095747},
doi = {10.1145/3095747},
acmid = {3095747},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Scalability, automation, clusters, design-space exploration, group theory, heterogeneous, inverse-semigroups, mapping, metaheuristics, network-on-chip, symmetry},
eprint = "arXiv:1704.06623",
abstract = {With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do it in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, like Network-on-Chip interconnect or heterogeneous cores. In this paper, we present a formal framework that can determine the inherent symmetry of architectures and applications algorithmically and leverage these for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for the platforms with a handful of cores of today and moderate-size benchmarks, our approach consistently yields reductions of the overall execution time of algorithms, accelerating them by a factor up to 10 in our experiments, or improving the quality of the results.}
}

Downloads

1704_Goens_TACO-arxiv [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1453

×
Andrés Goens, Robert Khasanov, Marcus Hähnel, Till Smejkal, Hermann Härtig, Jeronimo Castrillon, "TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17), ACM, pp. 11–20, New York, NY, USA, Jun 2017. [doi] [Bibtex & Downloads]

TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings

Reference

Andrés Goens, Robert Khasanov, Marcus Hähnel, Till Smejkal, Hermann Härtig, Jeronimo Castrillon, "TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17), ACM, pp. 11–20, New York, NY, USA, Jun 2017. [doi]

Bibtex

@InProceedings{goens_scopes17,
author = {Andr\'{e}s Goens and Robert Khasanov and Marcus H{\"a}hnel and Till Smejkal and Hermann H{\"a}rtig and Jeronimo Castrillon},
title = {TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17)},
year = {2017},
month = jun,
series = {SCOPES '17},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
pages = {11--20},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078663},
doi = {10.1145/3078659.3078663},
acmid = {3078663},
publisher = {ACM},
address = {New York, NY, USA}
}

Downloads

1706_Goens_SCOPES [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1451

×
Gerald Hempel, Andrés Goens, Josefine Asmus, Jeronimo Castrillon, Ivo F. Sbalzarini, "Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17), ACM, pp. 21–30, New York, NY, USA, Jun 2017. [doi] [Bibtex & Downloads]

Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering

Reference

Gerald Hempel, Andrés Goens, Josefine Asmus, Jeronimo Castrillon, Ivo F. Sbalzarini, "Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17), ACM, pp. 21–30, New York, NY, USA, Jun 2017. [doi]

Bibtex

@InProceedings{hempel_scopes17,
author = {Gerald Hempel and Andr\'{e}s Goens and Josefine Asmus and Jeronimo Castrillon and Ivo F. Sbalzarini},
title = {Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17)},
year = {2017},
series = {SCOPES '17},
pages = {21--30},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
acmid = {3078667},
doi = {10.1145/3078659.3078667},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078667}
}

Downloads

1706_Hempel_SCOPES [PDF]

Related Paths
Biological Systems Path, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1452

×
Andrés Goens, Jeronimo Castrillon, "Optimizing for Data-Parallelism in Kahn Process Networks", In Proceeding: ACM SRC at International Symposium on Code Generationand Optimization (CGO), Feb 2017. [Bibtex & Downloads]

Optimizing for Data-Parallelism in Kahn Process Networks

Reference

Andrés Goens, Jeronimo Castrillon, "Optimizing for Data-Parallelism in Kahn Process Networks", In Proceeding: ACM SRC at International Symposium on Code Generationand Optimization (CGO), Feb 2017.

Bibtex

@inproceedings{goens17cgo,
author = {Andr\'{e}s Goens and Jeronimo Castrillon},
title = {Optimizing for Data-Parallelism in Kahn Process Networks},
year = {2017},
month = feb,
booktitle= {ACM SRC at International Symposium on
Code Generationand Optimization (CGO)},
location = {Austin, TX, USA},
}

Downloads

1701_Goens_SRCCGO [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1544

×

2016
Marcus Völp, Sascha Klüppelholz, Jeronimo Castrillon, Hermann Härtig, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andres Goens, Sebastian Haas, Dirk Habich, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Wolfgang Lehner, Linda Leuschner, Matthias Lieber, Siqi Ling, Steffen Märcker, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, "The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware", Proceedings of the 1st International Workshop on Post-Moore's Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, USA, Nov 2016. [Bibtex & Downloads]

The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware

Reference

Marcus Völp, Sascha Klüppelholz, Jeronimo Castrillon, Hermann Härtig, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andres Goens, Sebastian Haas, Dirk Habich, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Wolfgang Lehner, Linda Leuschner, Matthias Lieber, Siqi Ling, Steffen Märcker, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, "The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware", Proceedings of the 1st International Workshop on Post-Moore's Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, USA, Nov 2016.

Abstract
Future systems based on post-CMOS technologies,
will be wildly heterogeneous, with properties largely unknown today.,
This paper presents our design of a new hardware/software stack to address the,
challenge of preparing software development for such systems.,
It combines well-understood technologies from different areas, e.g., network-on-chips,
capability operating systems, flexible programming models and model checking.,
We describe our approach and provide details on key technologies.

Bibtex

@InProceedings{voelp16_pmes,
author = {Marcus V{\"o}lp and Sascha Kl{\"u}ppelholz and Jeronimo Castrillon and Hermann H{\"a}rtig and Nils Asmussen and Uwe Assmann and Franz Baader and Christel Baier and Gerhard Fettweis and Jochen Fr{\"o}hlich and Andres Goens and Sebastian Haas and Dirk Habich and Mattis Hasler and Immo Huismann and Tomas Karnagel and Sven Karol and Wolfgang Lehner and Linda Leuschner and Matthias Lieber and Siqi Ling and Steffen M{\"a}rcker and Johannes Mey and Wolfgang Nagel and Benedikt N{\"o}then and Rafael Pe{\~n}aloza and Michael Raitza and J{\"o}rg Stiller and Annett Ungeth{\"u}m and Axel Voigt},
title = {The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware},
booktitle = {Proceedings of the 1st International Workshop on Post-Moore's Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)},
year = {2016},
address = {Salt Lake City, USA},
month = nov,
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1611_Voelp_PMES.pdf},
abstract = {Future systems based on post-CMOS technologies,
will be wildly heterogeneous, with properties largely unknown today.,
This paper presents our design of a new hardware/software stack to address the,
challenge of preparing software development for such systems.,
It combines well-understood technologies from different areas, e.g., network-on-chips,
capability operating systems, flexible programming models and model checking.,
We describe our approach and provide details on key technologies.},
}

Downloads

1611_Voelp_PMES [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=923

×
Christian Menard, Andrés Goens, Jeronimo Castrillon, "High-Level NoC Model for MPSoC Compilers", Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS'16), pp. 1-6, Copenhagen, Denmark, Nov 2016. [doi] [Bibtex & Downloads]

High-Level NoC Model for MPSoC Compilers

Reference

Christian Menard, Andrés Goens, Jeronimo Castrillon, "High-Level NoC Model for MPSoC Compilers", Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS'16), pp. 1-6, Copenhagen, Denmark, Nov 2016. [doi]

Bibtex

@InProceedings{menard_norcas16,
author = {Christian Menard and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {High-Level NoC Model for MPSoC Compilers},
booktitle = {Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS'16)},
year = {2016},
pages={1-6},
doi = {10.1109/NORCHIP.2016.7792876},
series = {NORCAS},
address = {Copenhagen, Denmark},
month = nov,
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1611_Menard_NORCAS.pdf}
}

Downloads

1611_Menard_NORCAS [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=1180

×
Andres Goens, Robert Khasanov, Jeronimo Castrillon, Simon Polstra, Andy Pimentel, "Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study", Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 281-288, Ecole Centrale de Lyon, Lyon, France, Sep 2016. [doi] [Bibtex & Downloads]

Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study

Reference

Andres Goens, Robert Khasanov, Jeronimo Castrillon, Simon Polstra, Andy Pimentel, "Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study", Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 281-288, Ecole Centrale de Lyon, Lyon, France, Sep 2016. [doi]

Abstract
Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level are an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1x and 5x, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.

Bibtex

@InProceedings{goen_mcsoc16,
author= {Andres Goens and Robert Khasanov and Jeronimo Castrillon and Simon Polstra and Andy Pimentel},
title= {Why Comparing System-level {MPSoC} Mapping Approaches is Difficult: a Case Study},
booktitle= {Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16)},
year= {2016},
address= {Ecole Centrale de Lyon, Lyon, France},
month= sep,
pages = {281-288},
doi = {10.1109/MCSoC.2016.48},
abstract = {Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level are an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1x and 5x, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.},
days= {21},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1609_Goens_MCSoC.pdf}
}

Downloads

1609_Goens_MCSoC [PDF]

Related Paths
HAEC, Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=806

×
Andrés Goens, Jeronimo Castrillon, Maximilian Odendahl, Rainer Leupers, "An Optimal Allocation of Memory Buffers for Complex Multicore Platforms", In Journal of Systems Architecture, Elsevier, vol. 66-67, pp. 69–83, May 2016. [doi] [Bibtex & Downloads]

An Optimal Allocation of Memory Buffers for Complex Multicore Platforms

Reference

Andrés Goens, Jeronimo Castrillon, Maximilian Odendahl, Rainer Leupers, "An Optimal Allocation of Memory Buffers for Complex Multicore Platforms", In Journal of Systems Architecture, Elsevier, vol. 66-67, pp. 69–83, May 2016. [doi]

Abstract
In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints.
In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. Besides, we use an architecture model that allows to perform an allocation that is aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear pro- gramming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that allowed to find an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.

Bibtex

@Article{goens_jsa16,
Title={An Optimal Allocation of Memory Buffers for Complex Multicore Platforms},
Author={Goens, Andr\'{e}s and Castrillon, Jeronimo and Odendahl, Maximilian and Leupers, Rainer},
Journal={Journal of Systems Architecture},
volume={66-67},
pages={69--83},
doi={10.1016/j.sysarc.2016.05.002},
publisher={Elsevier},
Year={2016},
month=may,
abstract={In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints.
In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. Besides, we use an architecture model that allows to perform an allocation that is aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear pro- gramming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that allowed to find an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.}
}

Downloads

No Downloads available for this publication

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=684

×

2015
Andrés Goens, Jeronimo Castrillon, "Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs", In Proceeding: System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523 (Götz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aurélio and Al Faruque, Mohammad Abdullah and Rettberg, Achim), Springer International Publishing, pp. 116–127, Foz do Iguaçu, Brazil, Nov 2015. [doi] [Bibtex & Downloads]

Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs

Reference

Andrés Goens, Jeronimo Castrillon, "Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs", In Proceeding: System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523 (Götz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aurélio and Al Faruque, Mohammad Abdullah and Rettberg, Achim), Springer International Publishing, pp. 116–127, Foz do Iguaçu, Brazil, Nov 2015. [doi]

Abstract
Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions on the program behavior specific to an execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure help us formalize the behavior of multiple executions of a given dynamic application. By defining metrics we have a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries we manage to reduce the design space in the experiments by two orders of magnitude.

Bibtex

@InProceedings{goens_iess15,
author = {Goens, Andr\'{e}s and Castrillon, Jeronimo},
title = {Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs},
booktitle = {System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523},
year = {2015},
editor = {G{\"o}tz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aur{\'e}lio and Al Faruque, Mohammad Abdullah and Rettberg, Achim},
pages = {116--127},
address = {Foz do Igua{\c{c}}u, Brazil},
month = nov,
publisher = {Springer International Publishing},
doi = {10.1007/978-3-319-90023-0_10},
url = {https://link.springer.com/chapter/10.1007%2F978-3-319-90023-0_10},
isbn={978-3-319-90023-0},
abstract = {Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions on the program behavior specific to an execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure help us formalize the behavior of multiple executions of a given dynamic application. By defining metrics we have a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries we manage to reduce the design space in the experiments by two orders of magnitude.},
}

Downloads

1511_Goens_IESS [PDF]

Related Paths
Orchestration Path

Permalink

https://cfaed.tu-dresden.de/publications?pubId=463

×

2014
Maximilian Odendahl, Andrés Goens, Rainer Leupers, Gerd Ascheid, Benjamin Ries, Berthold Vöcking, Tomas Henriksson, "Optimized buffer allocation in multicore platforms", Proceedings of the conference on Design, Automation & Test in Europe, pp. 324, 2014. [Bibtex & Downloads]

Optimized buffer allocation in multicore platforms

Reference

Maximilian Odendahl, Andrés Goens, Rainer Leupers, Gerd Ascheid, Benjamin Ries, Berthold Vöcking, Tomas Henriksson, "Optimized buffer allocation in multicore platforms", Proceedings of the conference on Design, Automation & Test in Europe, pp. 324, 2014.

Bibtex

@inproceedings{odendahl2014optimized,
title={Optimized buffer allocation in multicore platforms},
author={Odendahl, Maximilian and Goens, Andr{\'e}s and Leupers, Rainer and Ascheid, Gerd and Ries, Benjamin and V{\"o}cking, Berthold and Henriksson, Tomas},
booktitle={Proceedings of the conference on Design, Automation \& Test in Europe},
pages={324},
year={2014},
organization={European Design and Automation Association}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=457

×

Go back

Andrés Goens

2024

2023

2021

2020

2019

2018